2

As part of a data cleansing exercise I need to correct the formatting of a csv file.

Due to poor formatting/lack of quotes an extra comma in a description field is breaking my DTS package.

So, to get around this I have created a simple C# script to find any line in the csv that contains more columns than the header row.

When the row contains more columns than the header I want to merge array item [10] and [11] into one column and then write the line to my new file - keeping all the other existing columns as they are.

Code:

    var columns = splitExpression.Split(line).Where(s => s != delimiter).ToArray();

    if (headers == null) headers = new string[columns.Length];

    if (columns.Length != headers.Length)
    {
        // TODO - Linq to write comma separated string but merge column 10 and 11 of the array
        // writer.WriteLine(string.Join(delimiter, columns));
    }
    else
    {
        writer.WriteLine(string.Join(delimiter, columns));
    }

Unfortunately, my Linq writing skills are somewhat lacking, can someone please help me fill in the TODO.

1 Answer 1

3

Simply use for columns list instead of array. That will allow you to remove unnecessary columns after merge:

var columns = splitExpression.Split(line).Where(s => s != delimiter).ToList();

if (headers == null) headers = new string[columns.Count];

if (columns.Count != headers.Length)
{
    columns[10] = columns[10] + columns[11]; // combine columns here
    columns.RemoveAt(11);       
}

writer.WriteLine(string.Join(delimiter, columns));
Sign up to request clarification or add additional context in comments.

1 Comment

That works perfectly thank you Sergey... To be honest I'm a bit embarrassed I didn't see that in the first place

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.