I have around 300 Excel files with sales data but different schemas (one has a column named "Product Name", another has only "Product"). They all contain the same information about sales from different shops. The files are created manually by several people, so typos are also possible. Is there a nice way to import this data, or do I have to create 300 ETL packages in SSIS?
- So, will the number and order of columns be the same? – Jayasurya Satheesh, Nov 30, 2017 at 8:40
- Strategically, I would 1) scan all files and extract all column names into a table, 2) build a dictionary mapping source column name <=> target (= schema) column name, 3) import all files using the column-name translation table. – MikeD, Nov 30, 2017 at 8:53
- @JayasuryaSatheesh Nope. The problem is that files can have totally different schemas: one shop can send sales and purchases in one file on different sheets, and another can send them in 2 files. – Mikołaj Klimas, Nov 30, 2017 at 9:08
- Mikolaj, take a look at my answer on this topic: stackoverflow.com/questions/47437513/… – Hadi, Nov 30, 2017 at 18:58
- @KeithL I'm that junior person :D – Mikołaj Klimas, Dec 2, 2017 at 13:24
1 Answer
You can do this in these steps:
- Get all Excel files
- For each file
  - Script task to get the column names and sheet names
  - Store the column names and sheet names in SSIS variables
  - For each sheet
    - SQL task to create the stage table if it does not exist
    - Script task to read from the Excel sheet and insert into the table

EDIT: You can't easily do anything about the typos. The easiest out-of-the-box approach I can think of is to keep a dictionary table of expected values and use a fuzzy match transformation (Fuzzy Lookup in SSIS) to check the incoming names against that table.
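The dictionary-plus-fuzzy-match idea can be sketched outside SSIS as follows. This is a minimal Python sketch, not an SSIS transformation: it uses the standard library's `difflib` for approximate matching, and the target schema, the alias dictionary, and the `0.8` cutoff are all hypothetical values chosen for illustration.

```python
import difflib
import re

# Hypothetical target schema and known aliases; replace with the real ones.
TARGET_COLUMNS = ["product", "quantity", "unit_price", "sale_date"]
ALIASES = {
    "product name": "product",
    "qty": "quantity",
    "price": "unit_price",
    "date": "sale_date",
}


def normalize(name):
    """Lower-case the name and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def map_column(source_name, cutoff=0.8):
    """Return the target column for a source header, or None if nothing is close."""
    key = normalize(source_name)
    # Every spelling we know about, mapped to its target column.
    candidates = {normalize(t): t for t in TARGET_COLUMNS}
    candidates.update(ALIASES)
    if key in candidates:
        return candidates[key]
    # Fall back to approximate matching to absorb typos.
    match = difflib.get_close_matches(key, list(candidates), n=1, cutoff=cutoff)
    return candidates[match[0]] if match else None


def translate_header(source_columns):
    """Map every source header; unmatched headers map to None for manual review."""
    return {col: map_column(col) for col in source_columns}


if __name__ == "__main__":
    print(translate_header(["Product Name", "Prodct Name", "Quantitty", "xyz"]))
```

Headers that come back as `None` are the ones worth adding to the dictionary by hand. Raising the cutoff makes matching stricter, at the cost of more manual review.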
It is easy to find out how to read the sheet names and column names of an Excel file dynamically with C#. I've done something similar with VB, but below are examples of how to do it in C#.
Sheet names
Column names
Also, this guy loads all the files dynamically with the out-of-the-box SSIS Excel data flow.
To create the table, you need to build the CREATE TABLE statement dynamically from the column names, and then build the INSERT statement for the table you just created.
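Building those two statements from a list of column names could look roughly like this. It is a sketch in Python rather than the C# an SSIS Script Task would use, and several choices are assumptions: every column is staged as `NVARCHAR(255)`, the table name `stage_sales` is made up, and the `?` parameter markers suit ODBC-style drivers (other providers use named parameters instead).

```python
def quote_name(name):
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_create_table(table, columns, col_type="NVARCHAR(255)"):
    """Build a 'create if not exists' statement for a staging table."""
    quoted_table = quote_name(table)
    # The object name is passed to OBJECT_ID as a string literal,
    # so single quotes inside it must be doubled.
    literal = quoted_table.replace("'", "''")
    cols = ",\n    ".join(f"{quote_name(c)} {col_type} NULL" for c in columns)
    return (
        f"IF OBJECT_ID(N'{literal}', N'U') IS NULL\n"
        f"CREATE TABLE {quoted_table} (\n    {cols}\n);"
    )


def build_insert(table, columns):
    """Build a parameterized INSERT with one '?' marker per column."""
    col_list = ", ".join(quote_name(c) for c in columns)
    markers = ", ".join("?" for _ in columns)
    return f"INSERT INTO {quote_name(table)} ({col_list}) VALUES ({markers});"


if __name__ == "__main__":
    cols = ["product", "quantity"]
    print(build_create_table("stage_sales", cols))
    print(build_insert("stage_sales", cols))
```

Staging everything as text keeps the load from failing on badly typed cells. Conversion to proper types can then happen in a later step, once the data is in SQL Server.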