
I am stuck with a CSV file of over 100,000 rows that contains product images from a provider. Here are the details of the issue; I would really appreciate some tips to help resolve it. Thanks.

The file has one row per product and the following four columns: ID,URL,HEIGHT,WIDTH. Example: 1,http://i.img.com,100,200

The problem starts when a product has multiple images. Instead of having one row per image, the file adds more columns to the same row.
Example: 1,http://i.img.com,100,200,//i.img.com,20,100,//i.img.com,30,50

Note that only the first image URL starts with "http://"; the remaining images start with "//".

There is no telling how many images a product has, so there is no way to know how many columns a given row will have, or what the maximum number of columns is.
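To make the layout concrete: each row is an ID followed by some number of URL,HEIGHT,WIDTH triples, so a well-formed row always has 1 + 3k fields even though k varies. A minimal Python sketch of splitting one row, assuming no quoted fields or embedded commas; `parse_row` is a hypothetical helper, and rewriting the "//" URLs with an "http:" prefix is an assumption about how they should be stored.

```python
from typing import List, Tuple

Image = Tuple[str, int, int]  # (url, height, width)


def parse_row(line: str) -> Tuple[str, List[Image]]:
    """Split one provider row into the product ID and its list of images.

    Assumes the layout described above: an ID followed by repeating
    URL,HEIGHT,WIDTH triples, with no quoted fields or embedded commas.
    """
    fields = line.rstrip("\r\n").split(",")
    product_id, rest = fields[0], fields[1:]
    # Whatever the image count, the fields after the ID must come in threes.
    if len(rest) % 3 != 0:
        raise ValueError(f"expected ID followed by URL,HEIGHT,WIDTH triples: {line!r}")
    images: List[Image] = []
    for i in range(0, len(rest), 3):
        url, height, width = rest[i], rest[i + 1], rest[i + 2]
        # Assumption: protocol-relative "//" URLs should be stored as http: URLs.
        if url.startswith("//"):
            url = "http:" + url
        images.append((url, int(height), int(width)))
    return product_id, images


print(parse_row("1,http://i.img.com,100,200,//i.img.com,20,100,//i.img.com,30,50"))
```

For the example row this returns the ID "1" and three (url, height, width) tuples, all with "http://" URLs.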

How can I import this using SSIS or the SQL Server Import and Export Wizard?

I also need to do this at regular intervals.

Thank you for your help.

  • Have you tried contacting the provider to see whether they can supply the extract in a more usable format? This looks suspiciously like pivot table output - if they can supply you with the input instead of the output, it may be easier to import. Commented Sep 21, 2011 at 15:30

1 Answer


I don't think you can use any standard SSIS task or wizard to do this. You will have to write custom code that parses each line. You can do this in SSIS with VB.NET code in a Script Component, or you can import the file into a staging table with a single column that holds each whole row and do the parsing in SQL. SSIS will probably be faster for this kind of operation.
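As a sketch of the "custom code that parses each line" idea, here it is in Python rather than an SSIS script (a swapped-in tool, not what the answer describes): it rewrites the wide file into one row per image with a fixed set of columns, which an ordinary flat-file import can then load. The function name `flatten_images`, the output column names, and the `IMAGE_NO` sequence column are hypothetical choices, and the code assumes no field contains an embedded comma.

```python
import csv
import io
from typing import Iterable, TextIO


def flatten_images(src: Iterable[str], dst: TextIO) -> int:
    """Rewrite wide provider rows as one row per image.

    Output columns (hypothetical names): ID, IMAGE_NO, URL, HEIGHT, WIDTH.
    Returns the number of image rows written.
    """
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["ID", "IMAGE_NO", "URL", "HEIGHT", "WIDTH"])
    written = 0
    for fields in csv.reader(src):
        if not fields:
            continue  # skip blank lines
        product_id, rest = fields[0], fields[1:]
        if len(rest) % 3 != 0:
            raise ValueError(f"ragged row for product {product_id}: {fields}")
        for n in range(len(rest) // 3):
            url, height, width = rest[3 * n : 3 * n + 3]
            if url.startswith("//"):
                url = "http:" + url  # assumption: treat "//" URLs as http
            writer.writerow([product_id, n + 1, url, height, width])
            written += 1
    return written


# Demo on the example row; for real files, pass open(..., newline="") handles.
sample = io.StringIO("1,http://i.img.com,100,200,//i.img.com,20,100,//i.img.com,30,50\n")
out = io.StringIO()
flatten_images(sample, out)
print(out.getvalue())
```

Because the output always has the same five columns, a single SSIS package or saved wizard mapping should be reusable on every scheduled run, with this preprocessing step run just before it.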

Another possibility is to preprocess the file using a regex or a search-and-replace command. Try to get double quotes around the image list; then you should be able to import the whole file, with the quoted part going into a single column. Finding the start of the list should be easy enough, since you can search for "http://". Determining where the closing quote goes might be more of a problem.
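A minimal Python sketch of the quoting step, assuming the image list begins at the first "http://" and runs to the end of the line; `quote_image_list` is a hypothetical name. If something else could follow the list on the same line, placing the closing quote is the harder problem mentioned above, and this sketch does not handle it.

```python
import csv
import io
import re

# Everything from the first "http://" to the end of the line is treated as the image list.
IMAGE_LIST = re.compile(r"(http://.*)$")


def quote_image_list(line: str) -> str:
    """Wrap the image list in double quotes so it imports as one column.

    Any double quotes already inside the list are doubled, following the usual
    CSV escaping convention. Lines without "http://" are returned unchanged.
    """
    line = line.rstrip("\r\n")
    return IMAGE_LIST.sub(lambda m: '"' + m.group(1).replace('"', '""') + '"', line, count=1)


quoted = quote_image_list("1,http://i.img.com,100,200,//i.img.com,20,100,//i.img.com,30,50\n")
print(quoted)
# A standard CSV parser now sees exactly two columns: the ID and the image list.
print(next(csv.reader(io.StringIO(quoted))))
```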

A third possible solution is to get the source to fix the data. Even if they can't put the images in separate rows (or supply another file with one row per image, which would be ideal), maybe they can add the double quotes as part of the export. That would likely be less error-prone than the search-and-replace method.

Good luck!


1 Comment

Thanks, Tom. I guess I will have to bite the bullet on this one.
