Parsing domain names form URL

I have a need to parse domain names out of URL strings, Has anyone done this before or can point me to the right directions?

 

 

thanks,

Rok

March 21st, 2011 6:02pm

Are you using SSIS to achieve this? If there is a pattern in the url string (which should be there I suppose) then you can extract domain name from the url string using string
Free Windows Admin Tool Kit Click here and download it now
March 21st, 2011 6:06pm

1. Define a variable

2. Get the domain name from url using Request.Url.Host

3. Assign that value to the variable declared in Step1 (Like Dts.Variables("variableName")=<ValueFromStep2>).

March 21st, 2011 8:38pm

You could also use a regex function in a Script Component. (but you need to understand .net)

You only need to find the correct regular expression. For example: http://regexlib.com/Search.aspx?k=URL

 

Free Windows Admin Tool Kit Click here and download it now
March 21st, 2011 10:33pm

Thanks all for your responses and tips! Although there were different ways to tackle, I needed the fastest way to accomplish it because of time constraint and it was for one time run only.  I picked up this UDF from some website and tailored it to my need, Now I cannot seem to find that address so that I can cite it here . But whoever the person  who wrote it, saved me lot of  time and got my small project done. So God Bless him!!!!

--I'm just pasting it here so somebody might use it later----

CREATE FUNCTION [dbo].[usp_Parse_domain_name_v5]  (@strURL varchar(1000))
RETURNS varchar(1000)
AS
BEGIN
IF CHARINDEX('http://',@strURL) > 0 OR CHARINDEX('https://',@strURL) > 0

SELECT @strURL = REPLACE(@strURL,'https://','')
SELECT @strURL = REPLACE(@strURL,'http://','')
SELECT @strURL = REPLACE(@strURL,'www','')
SELECT @strURL = REPLACE(@strURL,'www.','')
SELECT @strURL = REPLACE(@strURL,'ftp://','')
-- Remove everything after "/" if one exists
IF CHARINDEX('/',@strURL) > 0 (SELECT @strURL = LEFT(@strURL,CHARINDEX('/',@strURL)-1))

 --Optional: Remove subdomains but differentiate between www.google.com and www.google.com.au
--IF (LEN(@strURL)-LEN(REPLACE(@strURL,'.','')))/LEN('.') < 3 -- if there are less than 3 periods
--SELECT @strURL = PARSENAME(@strURL,2) + '.' + PARSENAME(@strURL,1)
--ELSE -- It's likely a google.co.uk, or google.com.au
--SELECT @strURL = PARSENAME(@strURL,3) + '.' + PARSENAME(@strURL,2) + '.' + PARSENAME(@strURL,1)
RETURN @strURL
END

GO



  • Marked as answer by rok1 Tuesday, March 22, 2011 10:47 PM
March 23rd, 2011 1:46am

Try this instead...

ALTER FUNCTION dbo.spExtractDomainFromURL ( @strURL NVARCHAR(1000) )
RETURNS NVARCHAR(100)
--Posted at http://stackoverflow.com/a/20808097/391101
AS
    BEGIN
        --Strip Protocol
        SELECT  @strURL = SUBSTRING(@strURL, CHARINDEX('://', @strURL) + 3, 999)

        -- Strip www subdomain
        IF LEFT(@strURL, 4) = 'www.'
            SELECT  @strURL = SUBSTRING(@strURL, 5, 999)

        -- Strip Path
        IF CHARINDEX('/', @strURL) > 0
            SELECT  @strURL = LEFT(@strURL, CHARINDEX('/', @strURL) - 1)

        --Unless you iterate through a list of TLDs, you can't differentiate between subdomain.example.com and example.com.au
        --I decided against this because it's slower, and the TLD list requires maintenance

        RETURN @strURL
    END

  • Edited by harvest316 Friday, December 27, 2013 9:40 PM
  • Proposed as answer by harvest316 Friday, December 27, 2013 9:40 PM
Free Windows Admin Tool Kit Click here and download it now
December 28th, 2013 12:02am

URLtoDomain saved me hours when I had to disavow bad links (in order to do that, you have to parse domains from URLs). Maybe you'll find it useful too.
January 1st, 2014 11:00pm

Have you got a link for this URLtoDomain thingy?

Free Windows Admin Tool Kit Click here and download it now
January 2nd, 2014 1:30am

It works perfectly.
February 19th, 2015 2:01am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics