python 3 regex pattern replacement

Question

I have a malformed csv file which I need to fix:

The file is supposed to have one record per line but because of this formatting issue, it has a MS-DOS newline character (^M).
To make matters worse, the last field of the CSV file is a text field and it also contains this MS-DOS newline character so I can't simply replace ^M character.
But the good news is that the first field of the file is DATE field (MM/DD/YY)

So I tried to replace (\r\nMM/DD/YY) pattern by (\rMM/DD/YY) but it didn't work. Here is my code snippet:

fixed_content = re.sub(r"""\r\n\d{2})/\d{2}/\d{2}""", r"""\r\1/\2/\3""", malformed_content)

My problems are:

I don't know how to represent ^M character as a pattern. I used \r\n
I don't know how to refer to previous matches in the new replacing pattern. I used \1 for first MM pattern, \2 for next DD pattern and \3 for last YY pattern.

Previous matches must be grouped in brackets (...) for backtracking. — JoErNanO
– JoErNanO, Commented Nov 9, 2014 at 22:09
For 1. that seems to be the correct way to represent the MS-DOS newline character. 2. Your replacement shouldn't have worked because your regex is malformed (contains unbalanced parentheses). You will have to capture every match you want to get back, so something like: r"""\r\n(\d{2})/(\d{2})/(\d{2})""" but it's so much easier to just do re.sub(r"""\r\n(\d{2}/\d{2}/\d{2})""", r"""\r\1""", malformed_content) — Jerry
– Jerry, Commented Nov 10, 2014 at 6:55

JoErNanO · Accepted Answer · 2014-11-10 10:56:06Z

1

To match a date string in the form DD/MM/YY you can use the following regexp:

 \d{2}\/\d{2}\/\d{2}

If you then want to backreference the matched string you have to put it between brackets (...) like so:

 (\d{2}\/\d{2}\/\d{2})

The overall substitution command would then become:

fixed_content = re.sub(r"""\r\n(\d{2}\/\d{2}\/\d{2})""", r"""\r\1""", malformed_content)

Please note that I escaped the backslash \/ as this is sometimes required (in cases in which the backslash is used as a delimiter between match/replace strings). Modify to fit your needs.

answered Nov 10, 2014 at 10:56

JoErNanO

2,4881 gold badge27 silver badges28 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

python 3 regex pattern replacement

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related