Splitting a string in five sic columns, faster solution

Hi guys any idea how can I reach the results

Country Lastname   Firstname DOB PB SB

having a string like

BLR Siratsiuk Illia 09/04/1994 10.54   10.66

ESP Rodrguez ngel David 25/04/1980   10.14 10.32

I thought about a charindex space but it doesn't work if I have double first name.

Any advice? Thanks

June 19th, 2015 9:26am

Hello - Can you try this:

  • I used string splitter function (Source: SQL Server Central ) to segregate the row contents and then
  • Used a table variable to form the columns
-- First Create This Function
ALTER FUNCTION [dbo].[fnSplitString] 
( 
    @string NVARCHAR(MAX), 
    @delimiter CHAR(1) 
) 
RETURNS @output TABLE(id smallint identity(1,1), splitdata NVARCHAR(MAX) 
) 
BEGIN 
    DECLARE @start INT, @end INT 
    SELECT @start = 1, @end = CHARINDEX(@delimiter, @string) 
    WHILE @start < LEN(@string) + 1 BEGIN 
        IF @end = 0  
            SET @end = LEN(@string) + 1
       
        INSERT INTO @output (splitdata)  
        VALUES(SUBSTRING(@string, @start, @end - @start)) 
        SET @start = @end + 1 
        SET @end = CHARINDEX(@delimiter, @string, @start)
        
    END 
    RETURN 
END

-- Rest of it
DECLARE @strData varchar(500)
-- SET @strData = 'BLR Siratsiuk Illia 09/04/1994 10.54 10.66'
SET @strData = 'ESP Rodrguez ngel David 25/04/1980 10.14 10.32'
DECLARE @tblData TABLE ( ID SmallInt, Val VARCHAR(100) )
DECLARE @strData2 VARCHAR(200)

INSERT INTO @tblData
SELECT * FROM dbo.fnSplitString(@strData, ' ')

SELECT @strData2 = 
COALESCE(@strData2 + ' ','') + Val FROM @tblData 
WHERE ID >= 3 AND ID <= (SELECT MAX(ID)-3 FROM @tblData ) 


SELECT * FROM
( SELECT Val AS Country FROM @tblData WHERE ID = 1) AS Country,
( SELECT @strData2 AS LastName ) AS LastName,
( SELECT Val AS FirstName FROM @tblData WHERE ID = 2) AS FirstName,
( SELECT Val AS SB FROM @tblData WHERE ID IN ( SELECT MAX(ID) FROM @tblData ) ) AS SB,
( SELECT Val AS PB FROM @tblData WHERE ID IN ( SELECT MAX(ID)-1 FROM @tblData ) ) AS PB,
( SELECT Val AS DOB FROM @tblData WHERE ID IN ( SELECT MAX(ID)-2 FROM @tblData ) ) AS DOB

Split Function URL: http://www.sqlservercentral.com/blogs/querying-microsoft-sql-server/2013/09/19/how-to-split-a-string-by-delimited-char-in-sql-server/

Hope this helps !

Free Windows Admin Tool Kit Click here and download it now
June 19th, 2015 10:47am

The answer partly depends:  if this example is your exact question, it's short enough to do in a somewhat simple way (keeping in mind 100 percent accuracy will never be possible for names or addresses).   Split functions convert strings into relational data, which although technically superior, is overkill, you want to split into multiple columns, not into a relational model (in this case).

If it's a more general question on a situation with lots more virtual columns, including names and addresses, then it's harder, because there is no way to absolutely and completely deal with double first names, hyphenated last names, and last names like "de la cruz".

Here's one of about a million ways to do it.  I like using APPLY, because it makes code cleaner, and particularly with string functions, helps you avoid calling string functions repeatedly, or nesting one inside another over and over.  My example dodged the firstname/lastname topic, but you could use the same basic method to imperfectly split on the first space in the name area.

Declare @ExampleData Table (Instr Varchar(999))
Insert @ExampleData
Select * From (  Values 
	('BLR Siratsiuk Illia 09/04/1994 10.54   10.66')
	, ('ESP Rodrguez ngel David 25/04/1980   10.14 10.32')
	, ('USA De La Hoya Jean David 30/11/1979 10.22 10.23')
	) as VT(VV)

Select  Substring(Instr, 1, NameStart - 1) as Countr
	, Substring(Instr, NameStart, DOBStart - NameStart) as Name
	, Substring(Instr, DOBStart + 1, 10) as DOB
	, Substring(Instr, PBStart,  SBStart + 1 - PBStart) as PB
	, Substring(Instr, SBStart + 1, 5) as SB
	, CA_001.*, ED.*
  From @Exampledata ED
 Cross apply (Select Charindex(' ', Instr) + 1 as NameStart
				, patindex('% [0-9][0-9][//]%', Instr) as DOBStart
				, patindex('% [0-9][0-9][.]%', Instr) as PBStart
				, len(Instr) - 5 as SBStart
			 ) as CA_001 

Results:

Countr	Name	DOB	PB	SB
BLR 	Siratsiuk Illia	09/04/1994	 10.54   	10.66
ESP 	Rodrguez ngel David	25/04/1980	 10.14 	10.32
USA 	De La Hoya Jean David	30/11/1979	 10.22 	10.23

At times, reversing the string adds some additional ways to deal with the data.  I bet someone will post an answer that uses reverse as part of the solution.

  • Edited by johnqflorida Friday, June 19, 2015 12:56 PM Fixed typo
June 19th, 2015 12:52pm

The problem isn't specific "2 first names"... The fact is that a person can have a middle name, a 2 part last name... Really and almost unlimited number of names... (I have 2 middle names myself).

The problem is that it's nearly impossible to determine with name parts are which when you're receiving then in a concatenated fashion like you have above...

My suggestion is to not even try to split the name up into first & last. Just keep it together as a single full name.

Since it looks like the name is always falling between the country code and the dob (which is a date) this become doable... Take a look at the following code...

SET DATEFORMAT 'DMY'

IF OBJECT_ID('tempdb..#temp') IS NOT NULL
DROP TABLE #temp;

IF OBJECT_ID('tempdb..#SplitValues') IS NOT NULL
DROP TABLE #SplitValues;


SELECT 
	x.ID,
	x.SomeString
	INTO #temp
FROM (VALUES 
	(1, 'BLR Siratsiuk Illia 09/04/1994 10.54   10.66'),
	(2, 'ESP Rodrguez ngel David 25/04/1980   10.14 10.32')
	) x (ID, SomeString);

-- Use a string splitting function to split the value on spaces and dump the results into a temp table
SELECT
	t.ID,
	ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY sc.ItemNumber) AS ItemNumber,	--Maintains an unbroken ItemNumber even after empty strings are removed
	sc.Item,
	ISDATE(sc.Item) AS [IsDate]
	INTO #SplitValues
FROM
	#temp t
	CROSS APPLY dbo.SplitCSV8K(t.SomeString, ' ') sc
WHERE 
	sc.Item <> '';	-- remove the empty strings that result from multiple, consecutive, spaces in the string

WITH GetStringLengths AS (	-- the CTE simply calculates the numeric break points that will be necessary for the final select.
	SELECT 
		sv.ID,
		MAX(sv.ItemNumber) AS LastNameItemNumber,
		SUM(CASE WHEN sv.ItemNumber = 1 THEN LEN(sv.Item) ELSE 0 END) +2 AS CountryLength,
		SUM(CASE WHEN sv.ItemNumber > 1 THEN LEN(sv.Item) ELSE 0 END) + COUNT(*) -1 AS NameLength
	FROM 
		#SplitValues sv
	WHERE 
		sv.ItemNumber < (SELECT TOP 1 sv2.ItemNumber FROM #SplitValues sv2 WHERE sv.ID = sv2.ID AND sv2.[IsDate] = 1)
	GROUP BY 
		sv.ID
) 
	-- The final select statement. (Note CASTing to VARCHAR(n) isn't 100% necessary but DOB, PB & SB should be cast to their correct datatypes.)
	SELECT
		sv.ID,
		CAST(MAX(CASE WHEN sv.ItemNumber = 1 THEN sv.Item END) AS VARCHAR(10)) AS Country,
		CAST(MAX(SUBSTRING(t.SomeString, gsl.CountryLength, gsl.NameLength)) AS VARCHAR(20)) AS FullName,
		CAST(MAX(CASE WHEN sv.ItemNumber = gsl.LastNameItemNumber + 1 THEN sv.Item END) AS DATE) AS DOB,
		CAST(MAX(CASE WHEN sv.ItemNumber = gsl.LastNameItemNumber + 2 THEN sv.Item END) AS DECIMAL(9,2)) AS PB,
		CAST(MAX(CASE WHEN sv.ItemNumber = gsl.LastNameItemNumber + 3 THEN sv.Item END) AS DECIMAL(9,2)) AS SB
	FROM
		#SplitValues sv
		JOIN #temp t
			ON sv.ID = t.ID
		JOIN GetStringLengths gsl
			ON t.ID = gsl.ID
	GROUP BY
		sv.ID;

The Results...

ID          Country    FullName                  DOB        PB        SB
----------- ---------- ------------------------- ---------- --------- ---------
1           BLR        Siratsiuk Illia           1994-04-09 10.54     10.66
2           ESP        Rodrguez ngel David     1980-04-25 10.14     10.32

Note: The splitter I use is a renames copy of Jeff Moden's "DelimitedSplit8k" function which can be found here... http://www.sqlservercentral.com/articles/Tally+Table/72993/

HTH,

Jason

Free Windows Admin Tool Kit Click here and download it now
June 21st, 2015 12:40am

First thing to look is whether the format is consistent

ie whether you'll have all the parts coming in every string. If any of the values is missing in any of strings then you cant handle all of them using a single logic

So I would approach this kind of scenario as below

1. Check if the string format is consistent

2. If not, try to group them into batches based on their format

3. Come up with a parsing logic for each group and then apply the logic separately to get values for each group

The parsing part you can use any of the below logic

http://visakhm.blogspot.ae/2010/02/parsing-delimited-string.html

http://visakhm.blogspot.ae/2013/01/delimited-string-split-xml-parsing.html

June 21st, 2015 2:07am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics