All of the following must be done in C#. Writing to the SQL table (SQL Server) will be done using methods in System.Data.Odbc.

Let's assume I have two .csv files, fi1 and fi2. The first csv file has two columns id and val1, and the second csv has two columns as well, id and val2.

I would like to read the two files and write the output to one SQL table with the following columns: id, val1, val2.

The problem is that the two files may have different entries in the id columns: in other words, some id's may have a val1 value but no val2 value, and vice versa, or they might have both values.

The table should contain the union of the id columns in the two files.

Example:

File 1

[image: contents of File 1]

File 2

[image: contents of File 2]

The final SQL table should look like this:

[image: desired final table]

Note that each file might contain duplicates, and we want to exclude the duplicates when populating the SQL table.

The thought I had is to create two dictionaries, dict1 and dict2, where the key would be the id, and the value would be val1 (in dict1) or val2 (in dict2). Dictionaries will be used to make sure that duplicates are not included:

 Dictionary<string, string> dict1 = new Dictionary<string, string>();
 string[] header1 = new string[]{};

 using (StreamReader rdr = new StreamReader(fi1))
 {
     header1 = rdr.ReadLine().Split(',');
     while (!rdr.EndOfStream)
     {
          string ln = rdr.ReadLine();
          string[] split_ln = ln.Split(',');
          // The indexer overwrites a repeated id; Add() would throw on a duplicate key.
          dict1[split_ln[0]] = split_ln[1];
     }
 }

 Dictionary<string, string> dict2 = new Dictionary<string, string>();
 string[] header2 = new string[]{};

 using (StreamReader rdr = new StreamReader(fi2))
 {
     header2 = rdr.ReadLine().Split(',');
     while (!rdr.EndOfStream)
     {
          string ln = rdr.ReadLine();
          string[] split_ln = ln.Split(',');
          // The indexer overwrites a repeated id; Add() would throw on a duplicate key.
          dict2[split_ln[0]] = split_ln[1];
     }
 }

However, after adding each file to a dictionary, I am not sure how to match the id's of both dictionaries.

Would anyone have a good hint as to how to deal with this problem?
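One possible way to do the matching step, sketched under the assumption that dict1 and dict2 have already been filled as above: iterate over the union of the two key sets and look each id up in both dictionaries, leaving the value as null when an id is missing from one side. The class name CsvMerge, the method name Merge, and the sample data are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvMerge
{
    // Combines two id -> value maps into (id, val1, val2) records.
    // Every id found in either map appears exactly once; a value that
    // is missing on one side is represented as null.
    public static List<Tuple<string, string, string>> Merge(
        IDictionary<string, string> dict1,
        IDictionary<string, string> dict2)
    {
        var merged = new List<Tuple<string, string, string>>();

        // Union() yields each distinct key once.
        foreach (string id in dict1.Keys.Union(dict2.Keys))
        {
            string val1, val2;
            // TryGetValue leaves the out variable null when the key is absent.
            dict1.TryGetValue(id, out val1);
            dict2.TryGetValue(id, out val2);
            merged.Add(Tuple.Create(id, val1, val2));
        }
        return merged;
    }

    public static void Main()
    {
        // Made-up sample data standing in for the two parsed files.
        var dict1 = new Dictionary<string, string> { { "a", "1" }, { "b", "2" } };
        var dict2 = new Dictionary<string, string> { { "b", "20" }, { "c", "30" } };

        foreach (var row in Merge(dict1, dict2))
        {
            Console.WriteLine("{0},{1},{2}",
                row.Item1, row.Item2 ?? "NULL", row.Item3 ?? "NULL");
        }
    }
}
```

Because both inputs are dictionaries, duplicates within a file are already collapsed before the merge, so each id produces a single record.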

  • Why is anyone voting to close this post? It has both a reproducible example and shows what I have tried. I am not sure what's the problem here. Commented Dec 6, 2013 at 14:39
  • Not really, I have never used a Tuple. Could you please tell me how you suggest using it? Thanks! Commented Dec 6, 2013 at 14:47
  • Do you know if your data is clean? Do all keys in File1 have a match in File2? Is there a 1-1, 1-many, or many-many relationship between the files? And what should be done in the case where there is not a match? Commented Dec 6, 2013 at 14:57
  • The keys don't have to have a match. All I want is to take the union of the keys, and if a key doesn't have a corresponding value in one of the files, just assign null or something. Commented Dec 6, 2013 at 14:58
  • Why is it necessary to go through C#? I would import both CSVs into SQL Server and then use T-SQL to join them. Commented Dec 6, 2013 at 15:17

1 Answer

I would actually use a list of tuples to hold the values here instead of a dictionary, so that all the information is in one place and there is no need to match keys afterwards. Each tuple corresponds to one table record:

    // Requires: using System; using System.Collections.Generic; using System.IO;
    // Each tuple is one table record: (id, val1, val2).
    var rows = new List<Tuple<string, string, string>>();

    using (StreamReader rdr = new StreamReader(fi1))
    {
        rdr.ReadLine(); // skip the header row
        while (!rdr.EndOfStream)
        {
            string[] split_ln = rdr.ReadLine().Split(',');
            // Keep only the first occurrence of each id in file 1.
            if (!rows.Exists(r => r.Item1 == split_ln[0]))
            {
                rows.Add(Tuple.Create(split_ln[0], split_ln[1], (string)null));
            }
        }
    }

    using (StreamReader rdr = new StreamReader(fi2))
    {
        rdr.ReadLine(); // skip the header row
        while (!rdr.EndOfStream)
        {
            string[] split_ln = rdr.ReadLine().Split(',');
            int idx = rows.FindIndex(r => r.Item1 == split_ln[0]);
            if (idx >= 0)
            {
                // id already present: keep val1 and fill in val2.
                rows[idx] = Tuple.Create(rows[idx].Item1, rows[idx].Item2, split_ln[1]);
            }
            else
            {
                // id appears only in file 2: val1 stays null.
                rows.Add(Tuple.Create(split_ln[0], (string)null, split_ln[1]));
            }
        }
    }
2 Comments

This sounds very simple and great! However, would it be possible to add an item to the tuple that only has val1 (in file1) and no value val2 in file2?
It actually does this: all the values in file 1 are added to the list. Then, while file 2 is being read, it checks the items added from file 1 and updates them if necessary, but leaves alone the ones that have no file 2 equivalent.
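For the last step, writing the merged records to SQL Server through System.Data.Odbc, a minimal sketch might look like the following. The table name MergedValues, the column width of 255, the class and method names, and the connection string are all assumptions for illustration and are not taken from the question. ODBC uses positional `?` placeholders, so the parameters must be added in the same order as the columns, and a missing value has to be sent as DBNull.Value rather than a C# null.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Odbc;

public static class MergedTableWriter
{
    // A C# null cannot be bound as a parameter value; DBNull.Value is SQL NULL.
    public static object ToDbValue(string value)
    {
        return (object)value ?? DBNull.Value;
    }

    // Inserts one row per (id, val1, val2) tuple inside a single transaction.
    public static void InsertRows(string connectionString,
                                  IEnumerable<Tuple<string, string, string>> rows)
    {
        // Hypothetical table and columns; adjust to the real schema.
        const string sql =
            "INSERT INTO MergedValues (id, val1, val2) VALUES (?, ?, ?)";

        using (var conn = new OdbcConnection(connectionString))
        {
            conn.Open();
            using (OdbcTransaction tx = conn.BeginTransaction())
            using (var cmd = new OdbcCommand(sql, conn, tx))
            {
                foreach (var row in rows)
                {
                    cmd.Parameters.Clear();
                    // Parameter names are ignored by ODBC; only the order matters.
                    cmd.Parameters.Add("@id", OdbcType.NVarChar, 255).Value = row.Item1;
                    cmd.Parameters.Add("@val1", OdbcType.NVarChar, 255).Value = ToDbValue(row.Item2);
                    cmd.Parameters.Add("@val2", OdbcType.NVarChar, 255).Value = ToDbValue(row.Item3);
                    cmd.ExecuteNonQuery();
                }
                tx.Commit();
            }
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: pass an ODBC connection string as the first argument");
            return;
        }
        // Made-up sample rows: one complete, one without val2, one without val1.
        var rows = new List<Tuple<string, string, string>>
        {
            Tuple.Create("a", "1", (string)null),
            Tuple.Create("b", "2", "20"),
            Tuple.Create("c", (string)null, "30")
        };
        InsertRows(args[0], rows);
    }
}
```

Wrapping the loop in one transaction means either all rows are written or none are, which makes a failed import easy to retry.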
