Importing CSV file to SQL Server, Code Page and Data Type Issues
I have a series of CSV files that I need to import daily into a SQL server database. I am having a particular issue with a field: LoanNumber. LoanNumber can be either a very long number or text, my issue is the data needs to eventually get into a varchar field and depending on how I try to move it through SSIS, either the text gets Nulled or some of the numbers end up as text in scientific notation. The origin of these files are CSV files I download. The resulting file is a CSV that uses Code Page 65001 (UTF-8) where some LoanNumber's are text, some numeric and some in scientific notation, the Text is Qualified by "" and comma delimiter is a ','. My goal is to get this field into a SQL Server table as a varchar, which requires Code Page 1252 as part of the transformation. Part of my issue is the different ways Excel handles the CSV file depending on how it is opened. If I use Windows Explorer and double click on the file name, it will open in Excel with no wizard or prompting, I can widen and change the LoanNumber column to a "Number" type which converts all the scientific notation to numbers and save the file. What is odd though, is even after saving the file using the save button, and without making additional changes, when I close the file Excel prompt's me to save again. If I save with this prompt, Excel saves the CSV as an ANSI 1252 file that is tab delimited (according to SSIS) even though it is still a CSV file. On the other hand, if I have Excel open and try to open the CSV file, it gives me the text conversion wizard. My question is this: is there a way in Excel to force a CSV file to be saved in the ANSI 1252, tab delimited format, or is there some other way to save the file that is easy for SSIS to interpret? I have been converting all of my CSV files to Excel files, and using SSIS with Excel data sources, but Excel will implicitly convert some of my numbers to Scientific Notation at times, even if the column is a Number column I've found, so I don't think Excel data sources for SSIS are the right answer.
August 7th, 2012 1:43pm

here is something you can try: http://publib.boulder.ibm.com/infocenter/ts7700/cust/index.jsp?topic=%2Fcom.ibm.storage.ts7740.doc%2Fts7740_ua_preserve_leading_zeros.html Please mark the post as answered if it answers your question
Free Windows Admin Tool Kit Click here and download it now
August 7th, 2012 2:05pm

Excel is not unicode aware, and its driver. You just need some data cleansing and 1st of all try to set the connection property to it with IMEX=1 to instruct it to work for import.Arthur My Blog
August 7th, 2012 3:42pm

DotNetMonster, thanks for the link; I am still validating it will wok, but so far so good as at least a potential solution. ArthurZ, I appreciate the response, but unclean data isn't my problem: 2005983478650234 and FirstLastName041512 are both potentially valid Loan Numbers in my data. I don't understand the connection property comment; I'll do my homework on IMEX=1 and see if I can figure it out.
Free Windows Admin Tool Kit Click here and download it now
August 7th, 2012 4:04pm

You need to persist the metadata, the "scientific notation" you get is because the text gets interpreted as a number, just set this field to string. Furthermore, if you get a CSV files why would you use Excel? Just import the CSV, they are text files easier to deal with than Excel Matt.Arthur My Blog
August 7th, 2012 8:03pm

Thanks Arthur, I'll try today just using the CSV files without ever opening them in Excel. Assuming that Excel is the scientific notation culprit (which makes sense, thanks!), I think the only issue I'll have to conquer is moving the data from Code Page 65001 (which is what a native CSV file comes to me as) to ANSI 1252, and deciding where in the process to do so. Thanks again for the help.
Free Windows Admin Tool Kit Click here and download it now
August 8th, 2012 8:18am

Thanks Arthur, I'll try today just using the CSV files without ever opening them in Excel. Assuming that Excel is the scientific notation culprit (which makes sense, thanks!), I think the only issue I'll have to conquer is moving the data from Code Page 65001 (which is what a native CSV file comes to me as) to ANSI 1252, and deciding where in the process to do so. Thanks again for the help.
August 8th, 2012 8:29am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics