
I am using a ParseCSV function to parse a CSV file in C#.

The last column in a row of the CSV file contains: NM 120922C00002500 (followed by a lot of spaces)

In the ParseCSV function I pass an inputString, which is the result of reading the CSV file.

A part of the inputstring is:

"1",000066,"07/30/2012","53193315D4","B ","99AAXXPB0"," "," "," ","CALL NM 09/22/12 00002.500 ","MG",100.00,1.050000,310,32550.00,25530.70,360,37800.00,30477.78,"C",2.50000,09/22/2012,"NM","NM 120922C00002500".

In the ParseCSV function, I am doing the following:

string csvParsingRegularExpressionOld = Prana.Global.ConfigurationHelper.Instance.GetAppSettingValueByKey("CSVParsingRegularExpression");
string csvParsingRegularExpression = csvParsingRegularExpressionOld.Replace("\\\\", "\\");

The value of csvParsingRegularExpression comes out as:

((?<field>[^",\r\n]*)|"(?<field>([^"]|"")*)")(,|(?<rowbreak>\r\n|\n|$))

Then I follow up with:

Regex re = new Regex(csvParsingRegularExpression);

MatchCollection mc = re.Matches(inputString);

foreach (Match m in mc)
{
    field = m.Result("${field}").Replace("\"\"", "\"");
}

But here field contains an empty string when it reaches the last value, "NM 120922C00002500". What could be a possible solution to this problem?

I don't know whether the problem is with the CSV file or with the regex method Matches.

  • What is csvParsingRegularExpressionOld? Commented Aug 1, 2012 at 10:06
  • Also, this is not the real code... "\" does not compile... Commented Aug 1, 2012 at 10:10
  • @digEmall "\"\"" compiles fine. It means a string that contains two double quotes. The two middle ones are escaped, thus part of the string. The last quote is not escaped, thus will mark the end of the string. Commented Aug 1, 2012 at 10:26
  • @Tormod - @digEmAll means the "\" on the second line. Commented Aug 1, 2012 at 10:39
  • Is it really necessary to use regular expressions to parse this CSV? string.Split(',')? FileHelpers? The Jet provider? Why, of all the possibilities, regular expressions? Commented Aug 1, 2012 at 11:57

3 Answers


Don't use Regex to read CSV.

http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
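
As another option besides the linked reader, the framework itself ships a CSV-capable parser, Microsoft.VisualBasic.FileIO.TextFieldParser. The following is only a minimal sketch, assuming the project can reference the Microsoft.VisualBasic assembly; the CsvLineReader class and ReadLine method are names made up here for illustration, not part of any library.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper (name chosen for this example only).
public static class CsvLineReader
{
    // Splits a single CSV line into its fields, honoring quoted fields
    // so that a comma inside quotes does not start a new field.
    public static string[] ReadLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }
}
```

Calling CsvLineReader.ReadLine on a line such as "C",2.50000,"a,b" should give three fields, with the quotes removed and the embedded comma kept. Text that follows a closing quote without a delimiter (like the period at the end of the sample in the question) is not valid for this parser either, so the input would need to be cleaned first.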

If you aren't set on using regex, here is a small class I wrote, followed by its usage:

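using System;
using System.Collections.Generic;
using System.Text;

public class ParseHelper
{
    public char TextDelimiter { get; set; }
    public char TextQualifier { get; set; }
    public char EscapeCharacter { get; set; }

    public List<string> Parse(string str, bool keepTextQualifiers = false)
    {
        List<string> returnedValues = new List<string>();

        bool inQualifiers = false;
        StringBuilder currentWord = new StringBuilder();

        for (int i = 0; i < str.Length; i++)
        {
            // Escape character: take the next character literally.
            if (str[i] == EscapeCharacter)
            {
                // Guard against an escape character at the very end of the input.
                if (i + 1 >= str.Length)
                    throw new FormatException("The input string, 'str', ends with an escape character.");

                i++;
                currentWord.Append(str[i]);
                continue;
            }

            // Text qualifier: toggle the "inside quotes" state.
            if (str[i] == TextQualifier)
            {
                if (keepTextQualifiers)
                    currentWord.Append(TextQualifier);

                inQualifiers = !inQualifiers;
                continue;
            }

            // Delimiter outside of qualifiers: the current field is complete.
            if (str[i] == TextDelimiter && !inQualifiers)
            {
                returnedValues.Add(currentWord.ToString());
                currentWord.Clear();
                continue;
            }

            currentWord.Append(str[i]);
        }

        // A qualifier that was opened but never closed.
        if (inQualifiers)
            throw new FormatException("The input string, 'str', is not properly formatted.");

        // Add the last field, which has no trailing delimiter.
        returnedValues.Add(currentWord.ToString());

        return returnedValues;
    }
}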

Usage, based on your case:

ParseHelper ph = new ParseHelper() {
    TextDelimiter = ',',
    TextQualifier = '"',
    EscapeCharacter = '\\'};
List<string> parsedLine = ph.Parse(unparsedLine);

You're not matching the last group because it ends with a period outside the quotes. If you add the period to the terminating group of your regex it works:

(\"?(?<field>[^",\r|\n]*)\"?\,?)*\.?(?<rowbreak>[\r|\n]*)

Although, as other comments have pointed out, it's not a great idea to roll your own parser if the data is really valid CSV (I didn't bother to check whether the given sample matches the spec). There are plenty of parsers available and you're likely to miss some edge cases.
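
To see the effect of the trailing period, here is a minimal sketch that runs the regex from the question over a shortened line. The sample input and the CsvRegexDemo and Fields names are made up for illustration; only the pattern is taken from the question.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class (name chosen for this example only).
public static class CsvRegexDemo
{
    // The pattern quoted in the question, written as a C# verbatim string
    // (each double quote inside the pattern is doubled).
    private const string Pattern =
        @"((?<field>[^"",\r\n]*)|""(?<field>([^""]|"""")*)"")(,|(?<rowbreak>\r\n|\n|$))";

    // Returns the value of the "field" group for every match, in order,
    // with doubled quotes collapsed as in the question's loop.
    public static List<string> Fields(string input)
    {
        var fields = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            fields.Add(m.Groups["field"].Value.Replace("\"\"", "\""));
        }
        return fields;
    }
}
```

For the input "C",2.5,"NM 1209" the third value should come back as NM 1209, while for "C",2.5,"NM 1209". (with the period) the quoted alternative can no longer reach a comma or row break, so the third value should be just the period. In both cases the regex should also produce one more zero-length match at the end of the input, so a loop that only keeps the last value of field ends up with an empty string; collecting the values into a list, as above, avoids that.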

1 Comment

Try the updated version, it should work. I tested it using regexpal.com by removing group names, since the tool doesn't support them. If you plug in (\"?([^",\r|\n]*)\"?\,?)*\.?([\r|\n]*) it should work.
