I wanted to know whether there is a solution to the problem described in the title.

Example:

In my project I have to parse a lot of messages. These messages contain formatting characters like "\n" or "\r". The end of each message is always signed with the author's name.

Now I want to remove the signature from each message. The problem is that the end of a message could look like

  • \r\n\rDaniel Walters\n\r\n
  • \n\r\n\r\n\rDaniel

or something else

The problem is that I don't know how to identify these varying endings. I tried removing the trailing "\n" and "\r" characters by calling string.EndsWith() in a loop, but that only strips the final line breaks and leaves "\r\n\rDaniel Walters" behind. Then I tried removing the author's name (which I parse in an earlier step), but that does not work either: sometimes the parsed author is "Daniel Walters" while the signature is only "Daniel".

Any ideas how to solve this? Is there an easier, smarter solution than looping through the string?

  • Loving the irony of including a signature/salutation in a post asking how to remove them, thus proving that you can't just "fix the user". They're too broken. Commented Jan 23, 2013 at 19:37
  • Skadier, what code are you currently using to do this parsing? Can you show what you have already tried? Commented Jan 23, 2013 at 19:38
  • It has nothing to do with irony, Servy. I have to replace these signatures with a customized one for the newspaper of my graduating class. Unfortunately, the platform we are using does not provide any way to do this, and I do not want to do it manually (we have 1000+ messages with "wrong" signatures). Commented Jan 23, 2013 at 19:45

5 Answers

Answer (score: 2)

You can build a regular expression that matches the first name, optionally followed by the last name, with any amount of whitespace before and after it at the very end of the message, and replace that match with an empty string.

Example:

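using System;
using System.Text.RegularExpressions;

string message = "So long and thanks for all the fish  \t\t\r Arthur \t Dent  \r\r\n  ";
string firstName = "Arthur";
string lastName = "Dent";

// Whitespace, the first name, an optional last name, then nothing but whitespace until the end
string pattern = "\\s+" + Regex.Escape(firstName) + "(\\s+" + Regex.Escape(lastName) + ")?\\s*$";

message = Regex.Replace(message, pattern, String.Empty);
// message is now "So long and thanks for all the fish"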

(Yes, I know it was really the dolphins saying that.)

2 Comments

I will try this. Looks promising at first sight.
Rather than making the last name optional, you could use an alternation such as (firstname|lastname) and remove all matches, so that even a last name without a first name is removed.
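A minimal sketch of the alternation idea from the last comment, written as a hypothetical helper (SignatureStripper.RemoveTrailingName is not part of any library). Instead of removing every occurrence of the name, it keeps the match anchored to the end of the message with \z, so a name mentioned earlier in the body is left alone. It assumes the signature consists only of the author's first and/or last name.

```csharp
using System.Text.RegularExpressions;

public static class SignatureStripper
{
    // Hypothetical helper: removes a trailing run of the author's name parts
    // ("Daniel", "Walters", "Daniel Walters", ...) together with the surrounding whitespace.
    public static string RemoveTrailingName(string message, string firstName, string lastName)
    {
        // Alternation of the escaped name parts, e.g. (?:Daniel|Walters)
        string part = "(?:" + Regex.Escape(firstName) + "|" + Regex.Escape(lastName) + ")";

        // Whitespace, one or more name parts separated by whitespace,
        // optional trailing whitespace, and then the absolute end of the string (\z)
        string pattern = @"\s+" + part + @"(?:\s+" + part + @")*\s*\z";

        return Regex.Replace(message, pattern, string.Empty);
    }
}
```

With this, RemoveTrailingName(message, "Daniel", "Walters") strips "Daniel Walters", "Daniel" or "Walters" only when it is the last thing in the message.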
Answer (score: 1)

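You could try something like the following (untested):

using System;

string str = "\r\n\rDaniel Walters\n\r\n";
// Ordinal comparison: culture-sensitive EndsWith/StartsWith may give unexpected results with "\r\n" on .NET 5+ (ICU)
while (str.EndsWith("\r", StringComparison.Ordinal) || str.EndsWith("\n", StringComparison.Ordinal))
{
  // \r and \n are both one character long, so drop one character from the end
  str = str.Substring(0, str.Length - 1);
}
while (str.StartsWith("\r", StringComparison.Ordinal) || str.StartsWith("\n", StringComparison.Ordinal))
{
  // likewise, drop one character from the start
  str = str.Substring(1);
}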

2 Comments

I did something similar to this, but there is more than just the signature: before the signature there is a whole lot of other text. So this does not work, but thanks for your solution.
Note that this is just a less efficient version of TrimEnd.
Answer (score: 1)

You'll have to determine what "looks like" a signature. Are there specific criteria that always apply?

  • Always preceded by at least 3 line-break characters (\r or \n)
  • Starts with a capital letter
  • Has no following text

A regex like this might work for those criteria:

/[\r\n]{3,}[A-Z][\w ]+[\r\n]*(?!\w)/

Adjust according to your needs.
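The criteria above could be expressed in C# roughly as follows. This is a sketch that assumes those three criteria really hold; it anchors the match with \z instead of a lookahead so that only the very end of the message can match. CriteriaSignature and Strip are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class CriteriaSignature
{
    // At least three \r/\n characters, then a line starting with a capital letter
    // made of word characters and spaces, then only line breaks until the end (\z).
    private static readonly Regex Signature =
        new Regex(@"[\r\n]{3,}[A-Z][\w ]*[\r\n]*\z");

    public static string Strip(string message) => Signature.Replace(message, string.Empty);
}
```

Note that [A-Z] only accepts ASCII capital letters; \p{Lu} would also accept accented capitals if the authors' names need it.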

Edited to add: This should match the last "paragraph" of a document.

/([\r\n]+[\w ]+[\r\n]*)(?!.)/
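One caveat with the lookahead: because . does not match \n by default, (?!.) also succeeds right before any line feed, so the pattern can match an earlier paragraph instead of the last one. In .NET, anchoring with \z avoids that. A sketch, assuming the last non-empty line is always the signature (LastParagraph and Remove are hypothetical names):

```csharp
using System.Text.RegularExpressions;

public static class LastParagraph
{
    // A run of line breaks, the final line, any trailing line breaks, then the end of the string.
    private static readonly Regex Pattern = new Regex(@"[\r\n]+[^\r\n]*[\r\n]*\z");

    public static string Remove(string document) => Pattern.Replace(document, string.Empty);
}
```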

4 Comments

Unfortunately it varies a lot, and I only noticed that very late, but thanks for this example.
That actually doesn't quite work, the lookahead needs tweaking. But it'll get you started. When doing text processing, you'll need to determine unique characteristics about what you're trying to match and use them.
You could also chop off the last paragraph from every page with something like this.
I just added to my answer. That regex will delete the last paragraph (delimited by any number of \r and \n) in your document. This assumes there is no additional text after the signature.
Answer (score: 0)

You can do this as well. I am not sure whether your pattern changes, but this will return "Daniel Walters":

string replaceStr = "\r\n\rDaniel Walters\n\r\n";
replaceStr = replaceStr.TrimStart(new char[] { '\r', '\n' });
replaceStr = replaceStr.TrimEnd(new char[] { '\r', '\n' });

Or, if you want to use the Trim method (which removes all leading and trailing whitespace, not just \r and \n), you can do the following:

string replaceStr = "\r\n\rDaniel Walters\n\r\n";
replaceStr = replaceStr.Trim();

Answer (score: 0)

A different approach is to split your message at the newline characters, removing the empty entries, and then reassemble the string without the last line, on the assumption that the signature is always on the last line.

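using System;
using System.Linq;
string removeLastLine = "Text on the firstline\r\ntest on second line\rtexton third line\r\n\rDaniel Walters\n\r\n";
// Split at every \r or \n and drop the empty entries between consecutive line breaks
string[] lines = removeLastLine.Split(new char[] {'\r', '\n'}, StringSplitOptions.RemoveEmptyEntries);
// Keep everything except the last line, assumed to be the signature
lines = lines.Take(lines.Length - 1).ToArray();
string result = string.Join(Environment.NewLine, lines);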
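The split approach drops blank lines and normalizes the remaining line breaks to Environment.NewLine. If the body should keep its original formatting, one option is to cut the string at the last line break instead. This is a sketch under the same assumption that the signature is always the last non-empty line; LastLineRemover and RemoveLastLine are hypothetical names.

```csharp
using System;

public static class LastLineRemover
{
    public static string RemoveLastLine(string text)
    {
        // Drop trailing line breaks so the last character belongs to the last non-empty line
        string trimmed = text.TrimEnd('\r', '\n');

        // Position of the line break just before that last line
        int lastBreak = trimmed.LastIndexOfAny(new[] { '\r', '\n' });

        // No line break left: the whole text is a single line, so nothing precedes it
        if (lastBreak < 0)
            return string.Empty;

        // Cut off the last line, then the run of line breaks that separated it from the body
        return trimmed.Substring(0, lastBreak + 1).TrimEnd('\r', '\n');
    }
}
```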
