Software Projects

VBScript Excel Macro for Parsing URL Strings

Posted in Uncategorized by rmt on April 11, 2010

This VBScript (Microsoft Excel macro) function is based on part of a script that organizes a list of URLs.

The script takes a list of URLs and extracts the top-level and second-level domains using regular expression matching. For example, it extracts microsoft.com out of a URL such as msdn.microsoft.com/en-us/aa570309.aspx.

The company this was written for had a set of URLs gathered from an outside data source indicating the most-trafficked websites in their industry and formatted as an ordered range of cells in Excel. They needed to compare this set of URLs against a list of partner URLs to visualize their partner coverage in different geographical areas.

To do this comparison, the URLs needed to be standardized to show only the top-level and second-level domains. The final step (which is not shown here) was to do an array-based comparison and graphically show their coverage.

VBScript supports regular expression matching, which makes this URL parsing straightforward:

'Truncate URLs to show top and second-level domains only, e.g. "microsoft.com"
Function ExtractDomain(Str As String) As String
 'Remove everything after the first slash
 Str = ParseURL(Str, "/(.+)", "")

 'Remove trailing slashes
 Str = ParseURL(Str, "/$", "")

 'Extract top and second-level domains
 Str = ParseURL(Str, "[a-zA-Z0-9._%+-]+\.([a-zA-Z0-9._%+-]+\.)", "$1")

 'Make lowercase
 ExtractDomain = LCase(Str)
End Function

'Search for the a regular expression in a string, and replace the regular expression
Function ParseURL(Source As String, _
 MatchString As String, ReplacementString As String) As String
 Option Explicit
 #Const LateBind = True
 #If Not LateBind Then
 Dim re As RegExp
 Set re = New VBScript_RegExp_55.RegExp
 #Else
 Dim re As Object
 Set re = CreateObject("vbscript.regexp")
 #End If
 With re
 .MultiLine = False
 .Global = True
 .Pattern = MatchString
 End With
 ParseURL = re.Replace(Source, ReplacementString)
End Function

Screenshot

Screenshot of Excel Window

Template contains a VBScript macro that parses URLs to show only top-level and second level domains. The contents of column B were automatically extracted from column A by the macro.

Download Link

Excel Template Containing VBScript

(Works with Excel 97-2003 and higher)

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s