Read from files and parse SQL table

Question

All of the following must be done in C#. Writing to the SQL table (SQL Server) will be done using methods in System.Data.Odbc.

Let's assume I have two .csv files, fi1 and fi2. The first csv file has two columns id and val1, and the second csv has two columns as well, id and val2.

I would like to read the two files and write the output to one SQL table with the following columns: id, val1, val2.

The problem is that the two files may have different entries in the id columns: in other words, some id's may have a val1 value but no val2 value, and vice versa, or they might have both values.

The table should contain the union of the id columns in the two files.

Example:

File 1

[image: contents of File 1]

File 2

[image: contents of File 2]

This is how I would like the final SQL table to look:

[image: final SQL table]

Note that each file might contain duplicates, and we want to exclude the duplicates when writing to the SQL table.

The thought I had is to create two dictionaries, dict1 and dict2, where the key would be the id, and the value would be val1 and val2. Dictionaries will be used to make sure that duplicates are not included:

 Dictionary<string, string> dict1 = new Dictionary<string, string>();
 string[] header1 = new string[]{};

 using (StreamReader rdr = new StreamReader(fi1))
 {
     header1 = rdr.ReadLine().Split(',');
     while (!rdr.EndOfStream)
     {
          string ln = rdr.ReadLine();
          string[] split_ln = ln.Split(',');
          dict1.Add(split_ln[0], split_ln[1]);
     }
 }

 Dictionary<string, string> dict2 = new Dictionary<string, string>();
 string[] header2 = new string[]{};

 using (StreamReader rdr = new StreamReader(fi2))
 {
     header2 = rdr.ReadLine().Split(',');
     while (!rdr.EndOfStream)
     {
          string ln = rdr.ReadLine();
          string[] split_ln = ln.Split(',');
          dict2.Add(split_ln[0], split_ln[1]);
     }
 }

However, after adding each file to a dictionary, I am not sure how to match the id's of both dictionaries.

Would anyone have a good hint as to how to deal with this problem?

Why is anyone voting to close this post? It has a reproducible example and shows what I have tried. I am not sure what the problem is here.
– Mayou, Commented Dec 6, 2013 at 14:39
Not really, I have never used a Tuple. Could you please tell me how you suggest using it? Thanks!
– Mayou, Commented Dec 6, 2013 at 14:47
Do you know if your data is clean? Do all keys in File1 have a match in File2? Is there a 1-1, 1-many, or many-many relationship between the files? And what should be done in the case where there is not a match?
– Aaron Palmer, Commented Dec 6, 2013 at 14:57
The keys don't have to have a match. All I want is to take the union of the keys, and if a key doesn't have a corresponding value in one of the files, just assign null or something.
– Mayou, Commented Dec 6, 2013 at 14:58
Why is it necessary to go through C#? I would import both CSVs into SQL Server and then use T-SQL to join them.
– Aaron Palmer, Commented Dec 6, 2013 at 15:17

theDarse · Accepted Answer · 2013-12-06 15:18:58Z

1

I would actually use a list of tuples to hold the values here instead of a dictionary, so that all the information is in one place rather than spread across matching keys. Each tuple corresponds to one table record.

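    // Requires: using System; using System.Collections.Generic; using System.IO;
    // Each tuple is one table record: (id, val1, val2).
    var records = new List<Tuple<string, string, string>>();
    using (StreamReader rdr = new StreamReader(fi1))
    {
        rdr.ReadLine(); // skip the header row
        while (!rdr.EndOfStream)
        {
            string[] split_ln = rdr.ReadLine().Split(',');
            // Skip ids that were already read, so duplicates in file 1 are excluded.
            if (!records.Exists(r => r.Item1 == split_ln[0]))
            {
                records.Add(Tuple.Create(split_ln[0], split_ln[1], (string)null));
            }
        }
    }
    using (StreamReader rdr = new StreamReader(fi2))
    {
        rdr.ReadLine(); // skip the header row
        while (!rdr.EndOfStream)
        {
            string[] split_ln = rdr.ReadLine().Split(',');
            int index = records.FindIndex(r => r.Item1 == split_ln[0]);
            if (index >= 0)
            {
                // The id is already in the list. Tuples are immutable, so
                // replace the record with a copy that also carries val2.
                records[index] = Tuple.Create(split_ln[0], records[index].Item2, split_ln[1]);
            }
            else
            {
                // The id appears only in file 2, so val1 stays null.
                records.Add(Tuple.Create(split_ln[0], (string)null, split_ln[1]));
            }
        }
    }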
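If the files are large, searching the list once per line gets slow, since every lookup scans the whole list. A possible alternative, closer to the two-dictionary idea in the question, is to read each file into its own dictionary and then walk the union of the keys. The sketch below is only an illustration under some assumptions: the names `CsvMerge`, `ReadPairs`, and `Merge` are made up for this example, every data line is assumed to be well formed (`id,value` with no quoted commas), and the first occurrence of a duplicate id is the one kept.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvMerge
{
    // Reads "id,value" lines (after a header row) into a dictionary.
    // The first occurrence of an id wins; later duplicates are ignored.
    public static Dictionary<string, string> ReadPairs(TextReader reader)
    {
        var pairs = new Dictionary<string, string>();
        reader.ReadLine(); // skip the header row
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // ignore blank lines
            string[] parts = line.Split(',');
            // Dictionary.Add throws on a duplicate key, so check first.
            if (!pairs.ContainsKey(parts[0]))
            {
                pairs.Add(parts[0], parts[1]);
            }
        }
        return pairs;
    }

    // Builds one (id, val1, val2) record per id in the union of both key sets.
    public static List<Tuple<string, string, string>> Merge(
        Dictionary<string, string> dict1, Dictionary<string, string> dict2)
    {
        // A sorted set gives the union of the ids in a stable order.
        var ids = new SortedSet<string>(dict1.Keys, StringComparer.Ordinal);
        ids.UnionWith(dict2.Keys);

        var records = new List<Tuple<string, string, string>>();
        foreach (string id in ids)
        {
            string val1, val2;
            // TryGetValue leaves the out variable null when the id is missing.
            dict1.TryGetValue(id, out val1);
            dict2.TryGetValue(id, out val2);
            records.Add(Tuple.Create(id, val1, val2));
        }
        return records;
    }
}
```

With a `StreamReader` opened on each of `fi1` and `fi2`, the call would look like `CsvMerge.Merge(CsvMerge.ReadPairs(reader1), CsvMerge.ReadPairs(reader2))`. Each resulting tuple maps to one row of the target table, with `null` standing in for a missing `val1` or `val2`.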


2 Comments

Mayou Over a year ago

This sounds very simple and great! However, would it be possible to add an item to the tuple that only has val1 (in file1) and no value val2 in file2?

theDarse Over a year ago

It actually does this: all the values in file 1 are added to the list. Then, when file 2 is read, it checks the items added from file 1 and updates them if necessary, leaving alone the ones that have no equivalent in file 2.
