2

I have a CSV file whose columns contain the values '\\\n' and '\\\t', which are an escaped newline and an escaped tab. However, I want to split each row into a string array.

How do I split specifically on '\n' but not on '\\\n'?

I am looking at Regex.Split; is that the right direction? I tried Regex.Split(input, @"[^\\]\n"); and the result seems correct, except that the character in front of each split point is always missing, presumably because of the [^\\].

1
  • Does your file contain \r as well? Commented Aug 10, 2013 at 16:59

4 Answers

5

If you want to use Regex.Split, then @"(?<!\\)\\n" matches the literal two-character text \n but not \\n (nor \\\n, for that matter), and it will not cut anything off. The negative lookbehind (?<!\\) is not part of the match, so the preceding character is not removed.
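
A minimal sketch of that pattern in use, assuming the file really contains a literal backslash followed by the letter n as the row separator. The class name, method name, and sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralEscapeSplit
{
    // Splits on the two-character text "backslash + n", unless that
    // backslash is itself preceded by another backslash.
    public static string[] SplitRows(string input)
    {
        return Regex.Split(input, @"(?<!\\)\\n");
    }

    public static void Main()
    {
        // Verbatim string: every \ below is a literal backslash character.
        string input = @"a,b\nc\\nd,e\nf";

        foreach (string row in SplitRows(input))
        {
            Console.WriteLine(row);
        }
        // Three rows are printed; the middle one still contains \\n,
        // because that occurrence is preceded by a backslash.
    }
}
```

Note that nothing is trimmed from the end of "a,b": the lookbehind only inspects the preceding character, it does not consume it.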


1 Comment

Glad it worked. I used gskinner.com/RegExr for ad hoc regex testing; it helped me learn.
2

If you're parsing a CSV file, please try to use the TextFieldParser that's already in the framework. It will save you the headache of dealing with all the specific problems that come up when parsing a delimited file.


As mentioned below, it's part of Microsoft.VisualBasic.dll, which ships with the framework by default; you just need to add a reference. And even though it's called VisualBasic, it's in no way VB-specific.
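
A rough sketch of what reading with TextFieldParser might look like, assuming a comma-delimited file with optional double-quoted fields and a project reference to Microsoft.VisualBasic. The class name, method name, and sample data are invented for the example. As far as I know, the parser does not interpret backslash escapes itself, so a literal \n inside a field is returned unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvReading
{
    // Reads every record from the reader and returns one string[] per row.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }

    public static void Main()
    {
        // Regular string: \n is a real line break between records,
        // \\n is a literal backslash followed by n inside a field.
        string csv = "id,note\n1,\"line one\\nline two\"\n2,plain\n";

        foreach (string[] fields in ReadRows(new StringReader(csv)))
        {
            Console.WriteLine(string.Join(" | ", fields));
        }
    }
}
```

For a file on disk, a StreamReader (or the TextFieldParser constructor that takes a path) can be used in place of the StringReader.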

1 Comment

Just to add, TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace, so a reference to the Microsoft.VisualBasic assembly has to be added to the project.
1
Regex.Split(input, @"[^\\]\n");

The problem with the regex above is that a character class in square brackets matches exactly one character, and whatever it matches becomes part of the match itself. The character directly preceding \n is therefore treated as part of the delimiter and dropped from the results.

I think what you are looking for is a negative look-behind, which is used as follows:

(?<!DO NOT MATCH THIS)match

Lookbehinds and lookaheads assert that some text is (or, in their negative forms, is not) present at a position, without including that text in the match.

I assume what you are looking for is something like this:

Regex.Split(input, @"(?<!\\)\n");
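
To make the difference concrete, here is a small side-by-side sketch of the two patterns. It assumes the input contains real newline characters, some of them preceded by a backslash; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineSplitComparison
{
    // Original attempt: the character class consumes the character
    // before the newline, so that character disappears from the output.
    public static string[] SplitConsuming(string input)
    {
        return Regex.Split(input, @"[^\\]\n");
    }

    // Lookbehind version: only the newline itself is the delimiter.
    public static string[] SplitWithLookbehind(string input)
    {
        return Regex.Split(input, @"(?<!\\)\n");
    }

    public static void Main()
    {
        // Regular string: \n is a real newline, \\ is one literal backslash.
        string input = "ab\ncd\\\nef\ngh";

        Console.WriteLine(SplitConsuming(input)[0]);      // prints "a"
        Console.WriteLine(SplitWithLookbehind(input)[0]); // prints "ab"
    }
}
```

Both versions leave the backslash-escaped newline inside the middle element; only the first one loses the "b" and the "f" next to the unescaped newlines.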

Hope that helps!

0

How about this:

(?<=^|^[^\\]|[^\\]{2})\\(n|t)

This will account for \ns and \ts that are at the beginning or in the second position of the input string.
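
A hedged sketch of how this pattern behaves with Regex.Split, assuming the input holds literal backslash sequences. One thing to be aware of: in .NET, Regex.Split adds the text of capturing groups to the result array, so the example also includes a variant with a non-capturing group (?:n|t). That variant is my adjustment, not part of the answer above, and the names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedSequenceSplit
{
    // Pattern exactly as given above; (n|t) is a capturing group.
    public static string[] SplitCapturing(string input)
    {
        return Regex.Split(input, @"(?<=^|^[^\\]|[^\\]{2})\\(n|t)");
    }

    // Same pattern with a non-capturing group, so the matched n or t
    // is not added to the result array.
    public static string[] SplitNonCapturing(string input)
    {
        return Regex.Split(input, @"(?<=^|^[^\\]|[^\\]{2})\\(?:n|t)");
    }

    public static void Main()
    {
        // Verbatim string: \n and \t here are literal backslash sequences.
        string input = @"\nab\tcd";

        // Capturing:     "", "n", "ab", "t", "cd"
        Console.WriteLine(string.Join("|", SplitCapturing(input)));

        // Non-capturing: "", "ab", "cd"
        Console.WriteLine(string.Join("|", SplitNonCapturing(input)));
    }
}
```

The leading empty string appears because the first delimiter sits at the very start of the input.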
