
I am currently working on a prototype package to crunch some data. There is a nightly CSV output from an Informix box which is intended to be made redundant. My plan is to read this CSV using BIDS, do stuff with the data, such as some basic cleaning and calculations, and then insert the results into a SQL Server 2008 table.

I have only had mild exposure to SSIS, not quite enough to know which approach is best. I currently have a script task that reads the data into a DataTable object, but I have stopped for fear that this may not be the best approach.

To summarise:

Import CSV > do stuff (calcs etc.) > insert into a new table.

Which combination of components would quickly and easily achieve this?

EDIT:

The data has nothing unique about it.

20140722|0000771935|000000000000012654|0000012775|      40.000-|      289.20-|       346.800 |        346.80 |       346.800 |GBP  |0|
20140722|0000771935|000000000000012654|0000012775|      40.000-|      289.20-|       346.800 |        346.80 |       346.800 |GBP  |0|
20140722|0000771935|000000000000012654|0000012775|      40.000-|      289.20-|       346.800 |        346.80 |       346.800 |GBP  |0|

That is a snippet of some of the rows. The format of some fields can vary.

000000000000012654 can become F021 or X00F5

This refers to SKU data and order/pallet quantity. These three rows are for a particular customer order/SKU/date/order quantity/price/discounts/currency etc.

As you can see, the rows are all the same. The data has been like this for 15 years, and why it has not been grouped is beyond my understanding. I'm fairly new to this business and this is a task they gave me. I expect these columns are SELECTed from a view that makes the rows unique. This is all I have to work with. Strange requirement.

2 Answers


Personally I wouldn't use SSIS, as I prefer scripting solutions over UI solutions.

You can import the CSV into a staging table using any number of methods: BCP.EXE, BULK INSERT, OPENROWSET (and SSIS, of course).

Then you can run the required UPDATE/INSERT statements on your staging table, writing log rows to a table if required.

Then move the data to the final table, again using UPDATE/INSERT.

If you use BULK INSERT then this could all be written inside one stored procedure.
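
For example, a minimal sketch of what that stored procedure might look like, assuming a hypothetical staging table dbo.OrderStaging and final table dbo.OrderFact shaped roughly like the pipe-delimited sample in the question (all names, column types and the file path are assumptions to adapt):

-- Hypothetical staging table matching the pipe-delimited layout; load everything as text first.
CREATE TABLE dbo.OrderStaging
(
    OrderDate   CHAR(8),
    CustomerNo  VARCHAR(10),
    Sku         VARCHAR(20),
    OrderNo     VARCHAR(10),
    Qty         VARCHAR(20),   -- trailing-minus values such as '40.000-'
    Discount    VARCHAR(20),
    GrossValue  VARCHAR(20),
    NetValue    VARCHAR(20),
    LineValue   VARCHAR(20),
    Currency    CHAR(5),
    Flag        CHAR(1),
    Trailer     VARCHAR(1)     -- absorbs the empty field created by the trailing '|'
);
GO

CREATE PROCEDURE dbo.LoadNightlyCsv
    @FilePath NVARCHAR(260)    -- e.g. N'C:\Downloads\nightly.csv' (assumed location)
AS
BEGIN
    SET NOCOUNT ON;

    TRUNCATE TABLE dbo.OrderStaging;

    -- BULK INSERT requires a literal file name, so build the statement dynamically.
    DECLARE @sql NVARCHAR(MAX) =
        N'BULK INSERT dbo.OrderStaging
          FROM ''' + @FilePath + N'''
          WITH (FIELDTERMINATOR = ''|'', ROWTERMINATOR = ''\n'');';
    EXEC (@sql);

    -- Clean up in the staging table, e.g. convert the trailing minus sign on Qty.
    UPDATE dbo.OrderStaging
    SET Qty = CASE WHEN Qty LIKE '%-'
                   THEN '-' + LTRIM(RTRIM(REPLACE(Qty, '-', '')))
                   ELSE LTRIM(RTRIM(Qty)) END;

    -- Move to the final table; DISTINCT collapses the duplicate rows shown in the question.
    INSERT INTO dbo.OrderFact (OrderDate, CustomerNo, Sku, OrderNo, Qty, Currency)
    SELECT DISTINCT OrderDate, CustomerNo, Sku, OrderNo,
           CAST(Qty AS DECIMAL(18, 3)), Currency
    FROM dbo.OrderStaging;
END
GO

You would then point a SQL Server Agent job (or the SSIS package) at dbo.LoadNightlyCsv once the nightly file has landed.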

If you would like more details, post back and I will expand further.

I like the staging table approach because all of your workings can be seen in the staging table, as opposed to SSIS where calculations are performed on the fly and get pushed straight into the final table.



The way I have done it is to call a REST web service to get the CSV. This is done through the first script task in the SSIS package. You may also want to update a schedule table as one of the first steps, to record batch run info if more than one file is downloaded.

Once the CSV files are in a folder on the server, e.g. C:\Downloads, you add a flat file connection to the downloaded CSV file. Then create a data flow task containing the flat file source alongside an OLE DB source (for the database table that holds the data).

Then what you want is a Sort underneath each source, followed by a Merge Join with a left outer join on ID. You can create a Conditional Split underneath this that routes rows to new or existing depending on whether the ID is null or not (it will be null due to the outer join). Then, underneath the Conditional Split, you use an OLE DB Command for updates to existing rows (ID not null) and an OLE DB Destination for inserts (ID null, i.e. new records); see the update sketch after the diagram below.

Here is the structure of one that I have done, scheduled through SQL Server Agent:

flat file source (csv file)                       oledbsource (db table)
           |                                                 |
           |                                                 |
sort (by ID)                                      sort (by ID)
       |                                                 |
       |--------------------------------------------------
                               |
                           merge join (left outer)
                               |
                               |
                           conditional split (ID null = new, not null = existing)
                               |
                   [**your calculations here]
                               |
       existing-----------------------------------------new
       |                                                  |
oledb command (update table command)               oledb destination (insert)   
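
On the existing branch, the OLE DB Command runs a parameterised statement; each ? marker is bound to a pipeline column on the component's column mappings page. A rough example, with hypothetical table and column names:

-- Hypothetical update run by the OLE DB Command for existing rows;
-- the ? markers are mapped to pipeline columns in the component.
UPDATE dbo.OrderFact
SET    Qty      = ?,
       Price    = ?,
       Currency = ?
WHERE  ID = ?;

The new branch simply maps the pipeline columns to the table columns in the OLE DB Destination.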

Regards, Rob

3 Comments

Good suggestion. Pardon the lack of clarity on my part here, but this data does not always have an ID. It's data based on views which needs to go into its own table. So with that in mind...
Are there any candidate key combinations you could do the merge join on? e.g. a combination of fields that would give a unique value? If not, maybe you could extract this data into a staging table first, then call a TRUNCATE SQL command on the staging table to clear the data out once you have your data in your operational/reporting table. Then you could add some data flow tasks for data cleansing and data translation if required.
I'm afraid not. This is what I looked for in the data. See my edit
