2

I have a malformed csv file which I need to fix:

  • The file is supposed to have one record per line but because of this formatting issue, it has a MS-DOS newline character (^M).
  • To make matters worse, the last field of the CSV file is a text field and it also contains this MS-DOS newline character so I can't simply replace ^M character.
  • But the good news is that the first field of the file is DATE field (MM/DD/YY)

So I tried to replace (\r\nMM/DD/YY) pattern by (\rMM/DD/YY) but it didn't work. Here is my code snippet:

fixed_content = re.sub(r"""\r\n\d{2})/\d{2}/\d{2}""", r"""\r\1/\2/\3""", malformed_content)

My problems are:

  1. I don't know how to represent ^M character as a pattern. I used \r\n
  2. I don't know how to refer to previous matches in the new replacing pattern. I used \1 for first MM pattern, \2 for next DD pattern and \3 for last YY pattern.
3
  • Previous matches must be grouped in brackets (...) for backtracking. Commented Nov 9, 2014 at 22:09
  • Thanks. Let me retry and see tomorrow! Commented Nov 10, 2014 at 5:40
  • 2
    For 1. that seems to be the correct way to represent the MS-DOS newline character. 2. Your replacement shouldn't have worked because your regex is malformed (contains unbalanced parentheses). You will have to capture every match you want to get back, so something like: r"""\r\n(\d{2})/(\d{2})/(\d{2})""" but it's so much easier to just do re.sub(r"""\r\n(\d{2}/\d{2}/\d{2})""", r"""\r\1""", malformed_content) Commented Nov 10, 2014 at 6:55

1 Answer 1

1

To match a date string in the form DD/MM/YY you can use the following regexp:

 \d{2}\/\d{2}\/\d{2}

If you then want to backreference the matched string you have to put it between brackets (...) like so:

 (\d{2}\/\d{2}\/\d{2})

The overall substitution command would then become:

fixed_content = re.sub(r"""\r\n(\d{2}\/\d{2}\/\d{2})""", r"""\r\1""", malformed_content)

Please note that I escaped the backslash \/ as this is sometimes required (in cases in which the backslash is used as a delimiter between match/replace strings). Modify to fit your needs.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.