2

I have a string in the following format in a comma-delimited file:

someText, "Text with, delimiter", moreText, "Text Again"

What I need is a method that looks through the string and replaces any commas inside quoted text with a dollar sign ($).

After the method runs, the string should be:

someText, "Text with$ delimiter", moreText, "Text Again"

I'm not very good with regex, but I would like to know how to use a regular expression to search for a pattern (a comma between quotes) and then replace that comma with a dollar sign.

  • This looks like CSV. Is that just a coincidence? If this is CSV, you should know that CSV is not a 'regular' language and thus cannot be completely and correctly parsed via a regular expression in all cases. See comments and answers to this question: stackoverflow.com/questions/1189416/… Commented Jul 21, 2011 at 0:25
  • If this is just a hack on the way to Split(','), you should certainly use a CSV parser. What would you do if the string contained a $, by the way (1,2,"$5.4",6)? Commented Jul 21, 2011 at 4:45
  • @Daniel - Actually, valid CSV is a regular language (as long as you don't require every row to have the same number of columns when that number isn't known in advance). It doesn't contain any nesting or any context to consider. Commented Jul 21, 2011 at 4:52
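To make the `$` concern above concrete, here is a small sketch (the helper name `ReplaceQuotedCommas` and the sample line are illustrative assumptions; the loop is the quote-toggling idea from the first answer): once quoted commas are masked as `$`, a `$` that was already in the data looks exactly like a placeholder, so restoring the commas after `Split(',')` corrupts it.

```csharp
using System;
using System.Text;

// Hypothetical helper: swap commas inside double quotes for '$' by toggling
// an "inside quotes" flag on every '"' character.
static string ReplaceQuotedCommas(string text)
{
    var sb = new StringBuilder(text.Length);
    var inQuotes = false;
    foreach (var c in text)
    {
        if (c == '"') inQuotes = !inQuotes;
        sb.Append(inQuotes && c == ',' ? '$' : c);
    }
    return sb.ToString();
}

var original = "1,\"2,3\",\"$5.4\",6";
var masked = ReplaceQuotedCommas(original);

// Split on the real delimiters, then try to turn each placeholder back into a comma.
var fields = masked.Split(',');
var restored = Array.ConvertAll(fields, f => f.Replace('$', ','));

Console.WriteLine(masked);                       // 1,"2$3","$5.4",6
Console.WriteLine(string.Join(" | ", restored)); // 1 | "2,3" | ",5.4" | 6
```

The quoted comma in `"2,3"` survives the round trip, but the genuine `$` in `"$5.4"` comes back as a comma.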

5 Answers

3

Personally, I'd avoid regexes here - assuming that there aren't nested quote marks, this is quite simple to write up as a for-loop, which I think will be more efficient:

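using System.Text;

static string ReplaceQuotedCommas(string someText)
{
    var inQuotes = false;
    var sb = new StringBuilder(someText.Length);

    for (var i = 0; i < someText.Length; ++i)
    {
        // Each double quote toggles whether we are inside a quoted section.
        if (someText[i] == '"')
        {
            inQuotes = !inQuotes;
        }

        // Commas inside quotes become '$'; everything else is copied unchanged.
        if (inQuotes && someText[i] == ',')
        {
            sb.Append('$');
        }
        else
        {
            sb.Append(someText[i]);
        }
    }

    return sb.ToString();
}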
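A side effect of the toggle approach, shown in a small sketch (the local-function wrapper and the sample line are made up for illustration): CSV-style escaped quotes, written as `""` inside a quoted field, flip the flag twice and leave it unchanged, so commas after them are still treated as quoted. A backslash-escaped quote (`\"`) would instead flip it once, so the text up to the next quote gets misclassified.

```csharp
using System;
using System.Text;

// Same quote-toggling loop as the answer, wrapped in a local function.
static string ReplaceQuotedCommas(string someText)
{
    var inQuotes = false;
    var sb = new StringBuilder(someText.Length);
    foreach (var c in someText)
    {
        if (c == '"') inQuotes = !inQuotes;        // every quote flips the state
        sb.Append(inQuotes && c == ',' ? '$' : c); // only quoted commas change
    }
    return sb.ToString();
}

// The middle field uses doubled quotes as an escape: "He said ""hi, there"""
var input = "a, \"He said \"\"hi, there\"\"\", b";
Console.WriteLine(ReplaceQuotedCommas(input));
// a, "He said ""hi$ there""", b
```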

2 Comments

Yeah, I was afraid that regexes wouldn't be an option, given how many possible patterns there are to match. However, this is a pretty good algorithm for stepping through the string itself.
@Hans Gruber - It's actually pretty easy with a regular expression. Regex.Replace allows you to provide a delegate for doing the replacement once you've found the match, as shown in my answer.
1

This is the kind of problem where regexes fall short; do this instead:

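    using System.Text;

    static string ReplaceQuotedCommas(string str)
    {
        var sb = new StringBuilder(str);
        var insideQuotes = false;

        for (var i = 0; i < sb.Length; i++)
        {
            switch (sb[i])
            {
                case '"':
                    insideQuotes = !insideQuotes;
                    break;
                case ',':
                    // Overwrite this single comma in place; the length does not change.
                    if (insideQuotes)
                        sb.Replace(',', '$', i, 1);
                    break;
            }
        }

        return sb.ToString();
    }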

You can also use a CSV parser to parse the string and write it again with replaced columns.
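A minimal sketch of that idea, with a hand-rolled splitter standing in for a real CSV library (the `SplitCsvLine` name is hypothetical; it assumes RFC 4180-style quoting where a literal quote inside a quoted field is written as `""`, ignores line breaks inside fields, and drops spaces between a delimiter and the next field): each field is parsed, its commas are replaced, and the line is written back with the originally quoted fields re-quoted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical minimal CSV line splitter (single line, RFC 4180-style quoting).
// Returns each field's unquoted value and whether it was quoted in the input.
static List<(string Text, bool Quoted)> SplitCsvLine(string line)
{
    var fields = new List<(string Text, bool Quoted)>();
    var sb = new StringBuilder();
    bool inQuotes = false, quoted = false;
    for (var i = 0; i < line.Length; i++)
    {
        var c = line[i];
        if (inQuotes)
        {
            if (c != '"') sb.Append(c);
            else if (i + 1 < line.Length && line[i + 1] == '"') { sb.Append('"'); i++; } // "" is an escaped quote
            else inQuotes = false;                                                      // closing quote
        }
        else if (c == '"') { inQuotes = true; quoted = true; }
        else if (c == ',') { fields.Add((sb.ToString(), quoted)); sb.Clear(); quoted = false; }
        else if (c == ' ' && sb.Length == 0 && !quoted) continue; // skip padding before a field
        else sb.Append(c);
    }
    fields.Add((sb.ToString(), quoted));
    return fields;
}

var line = "someText, \"Text with, delimiter\", moreText, \"Text Again\"";

// Parse, replace commas inside each value, then write the line back out.
// The ", " separator is an assumption chosen to match the question's sample.
var rewritten = string.Join(", ", SplitCsvLine(line).Select(f =>
{
    var text = f.Text.Replace(',', '$');
    return f.Quoted ? "\"" + text.Replace("\"", "\"\"") + "\"" : text;
}));

Console.WriteLine(rewritten); // someText, "Text with$ delimiter", moreText, "Text Again"
```

Going through a parser keeps all the quoting rules in one place; the trade-off is that the writer, not the input, decides the output formatting.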


1

Here's how to do it with Regex.Replace:

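        using System.Text.RegularExpressions;

        // The lazy ".*?" stops at the next quote, so each quoted field is matched separately.
        string output = Regex.Replace(
            input,
            "\".*?\"",
            m => m.ToString().Replace(',', '$'));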
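A self-contained version of that one-liner for trying it out (the `input` value is the question's sample line, and `m.Value` is used in place of the equivalent `m.ToString()`): the lazy `.*?` ends each match at the next double quote, so each quoted field is handled separately. A greedy `.*` is shown for contrast; it runs from the first quote to the last one and also replaces the real delimiters in between.

```csharp
using System;
using System.Text.RegularExpressions;

var input = "someText, \"Text with, delimiter\", moreText, \"Text Again\"";

// The lambda is converted to a MatchEvaluator; its return value replaces each match.
string output = Regex.Replace(
    input,
    "\".*?\"",
    m => m.Value.Replace(',', '$'));
Console.WriteLine(output); // someText, "Text with$ delimiter", moreText, "Text Again"

// For contrast: a greedy ".*" spans from the first quote to the last one.
string greedy = Regex.Replace(input, "\".*\"", m => m.Value.Replace(',', '$'));
Console.WriteLine(greedy); // someText, "Text with$ delimiter"$ moreText$ "Text Again"
```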

Of course, if you want to ignore escaped double quotes, it gets more complicated, especially when the escape character can itself be escaped.

Assuming the escape character is \, then when trying to match the double quotes, you'll want to match only quotation marks which are preceded by an even number of escape characters (including zero). The following pattern will do that for you:

string pattern = @"(?<=((^|[^\\])(\\\\){0,}))"".*?(?<=([^\\](\\\\){0,}))""";
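A quick comparison of that pattern with the plain `".*?"` one (the sample input is made up and uses `\` as the escape character, as assumed above): with the plain pattern, the escaped quote before `hi` ends a match early, so the comma inside the quoted text is never touched, while the escape-aware pattern skips over `\"` and reaches the real closing quote.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up sample in which \" is an escaped quote inside a quoted field:
// a, "he said \"hi, there\"", b
var input = "a, \"he said \\\"hi, there\\\"\", b";

string naivePattern = "\".*?\"";
// Pattern from the answer: a quote only counts if preceded by an even number of backslashes.
string escapeAware = @"(?<=((^|[^\\])(\\\\){0,}))"".*?(?<=([^\\](\\\\){0,}))""";

string Mask(string pattern) =>
    Regex.Replace(input, pattern, m => m.Value.Replace(',', '$'));

Console.WriteLine(Mask(naivePattern)); // a, "he said \"hi, there\"", b
Console.WriteLine(Mask(escapeAware));  // a, "he said \"hi$ there\"", b
```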

At this point, you might prefer to abandon regular expressions ;)

UPDATE:

In reply to your comment, it is easy to make the operation configurable for different quotation marks, delimiters and placeholders.

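        string quote = "\"";
        string delimiter = ",";
        string placeholder = "$";

        // Escape the quote so characters such as '|' or '.' are matched literally.
        string escapedQuote = Regex.Escape(quote);

        string output = Regex.Replace(
            input,
            escapedQuote + ".*?" + escapedQuote,
            m => m.ToString().Replace(delimiter, placeholder));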
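A sketch of the same idea wrapped in a reusable method (the name `MaskQuotedDelimiters` and the sample lines are illustrative). It runs the quote through `Regex.Escape`: without that, a quote character such as `|` would turn the pattern into an alternation that matches the empty string, and nothing would be replaced. The delimiter and placeholder go through `string.Replace`, so they need no escaping.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper; quote, delimiter and placeholder are all plain strings.
static string MaskQuotedDelimiters(string input, string quote, string delimiter, string placeholder)
{
    // Regex.Escape makes the quote match literally even if it is a regex metacharacter.
    string q = Regex.Escape(quote);
    return Regex.Replace(input, q + ".*?" + q,
        m => m.Value.Replace(delimiter, placeholder));
}

Console.WriteLine(MaskQuotedDelimiters("a; 'b; c'; d", "'", ";", "$")); // a; 'b$ c'; d
Console.WriteLine(MaskQuotedDelimiters("a;|b;c|;d", "|", ";", "$"));    // a;|b$c|;d
```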

1 Comment

Hmm... let's say I wanted to let the user specify the file's delimiter (anything other than a comma) and the quote character as well. How would I change this regular expression to make it dynamic?
0

If you'd like to go the regex route, here's what you're looking for:

var result = Regex.Replace( text, "(\"[^,]*),([^,]*\")", "$1$$$2" );

The problem with regex in this case is that it won't catch "this, has, two commas".
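A quick check of that caveat (the sample lines are made up): because `[^,]*` cannot cross a second comma, a field with two commas is left alone. The same pattern can also pair a closing quote with the next field's opening quote, so with two adjacent quoted fields it replaces the real delimiter between them.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern and replacement; "$$" in the replacement is a literal '$'.
static string Mask(string text) =>
    Regex.Replace(text, "(\"[^,]*),([^,]*\")", "$1$$$2");

Console.WriteLine(Mask("someText, \"Text with, delimiter\", moreText")); // someText, "Text with$ delimiter", moreText
Console.WriteLine(Mask("x, \"this, has, two commas\""));                  // x, "this, has, two commas"
Console.WriteLine(Mask("\"a\", \"b\""));                                  // "a"$ "b"
```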

1 Comment

This won't work for: someText, ""Text with, delimiter"", ""text,comma"", moreText, ""Text Again"", ""text,comma""
-2

Could you give this a try: "[\w ],[\w ]" (double quotes included)? Be careful with the replacement, though: replacing the match directly will replace the whole string enclosed in the double quotes, not just the comma.
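A sketch of that pitfall. It assumes the character classes get a `*` quantifier (as written, `[\w ]` matches exactly one character on each side of the comma, so the pattern would not match the question's sample), and it contrasts a plain replacement string, which replaces the entire quoted field, with a `MatchEvaluator` lambda that changes only the comma inside each match.

```csharp
using System;
using System.Text.RegularExpressions;

var input = "someText, \"Text with, delimiter\", moreText, \"Text Again\"";
// Assumed variant of the answer's pattern with * quantifiers added: "[\w ]*,[\w ]*"
var pattern = "\"[\\w ]*,[\\w ]*\"";

// Plain replacement: the whole match, quotes included, becomes "$" ("$$" is a literal $).
var whole = Regex.Replace(input, pattern, "$$");
Console.WriteLine(whole);     // someText, $, moreText, "Text Again"

// Evaluator: keep the matched text and change only its comma.
var commaOnly = Regex.Replace(input, pattern, m => m.Value.Replace(',', '$'));
Console.WriteLine(commaOnly); // someText, "Text with$ delimiter", moreText, "Text Again"
```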
