I am using Excel power query to combine and process data from multiple csv files located in a folder. All csv files have the same column structure (shown below) but a different number of rows or records.
| Year | Period | Week | District | Route | SaleAmt |
|---|---|---|---|---|---|
| 2024 | 10 | 1 | 123 | ABC | 1000 |
Here is the link to the sample data files: sample data files
Once data from all csv files is combined into one table, I create a new column "PK" by concatenating the values in fields "Period", "Week", "District" and "Route".
Next, to get the list of unique "PK" values in the combined data, I group by the "PK" column and aggregate the "SaleAmt" column using the operation "All Rows". This will create a new column containing tables of all rows grouped by the unique "PK" values.
Finally, I need to generate a table structure as follows, which maps the "SaleAmt" value from each csv file to the corresponding "PK" value:
| PK | Year | Period | Week | District | Route | File-1 SaleAmt | File-2 SaleAmt | File-3 SaleAmt |
|---|---|---|---|---|---|---|---|---|
| 101123ABC | 2024 | 10 | 1 | 123 | ABC | 1000 | 500 | 800 |
| 11469HBG | 2024 | 11 | 4 | 69 | HBG | null | null | 999 |
Note:
- Certain values in the "PK" column may not be present in some data files. In that case, the "SaleAmt" column corresponding to the data file should show as "null" or "not found" in the final table.
Here is what I have done so far: (M-code below)
let
Source = Folder.Files("Folder containing csv files"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
#"Added Custom" = Table.AddColumn(#"Expanded Table Column1", "PK", each Text.Combine({[Period],[Week],[District],[Route]},"")),
#"Grouped Rows" = Table.Group(#"Added Custom", {"PK"}, {{"SaleAmt", each _, type table [Source.Name=text, Year=text, Period=text, Week=text, District=text, Route=text, SaleAmt=text, PK=text]}})
in
#"Grouped Rows"
How do I get to the final table structure after creating a unique list of "PK" values? Any help / pointers will be appreciated.