Removing whitespace and carriage return from a text file with Python

Question

I have a DataFrame with 5 columns. During data cleaning I ran into a problem caused by carriage returns in the source text file, as shown in the example below.

Input:

001|Baker St.
London|3|4|7
002|Penny Lane
Liverpool|88|5|7

Output:

001|Baker St. London|3|4|7
002|Penny Lane Liverpool|88|5|7

Any suggestions are welcome.

Carriage return is represented by \r while newline is \n, so replacing all \r with nothing should work.
– Kraigolas, Commented Jun 11, 2021 at 11:08
Please replace images of text with actual text so we can copy it and use it. Also edit your question with your coding attempt, as comments don't format it correctly.
– Mark Tolonen, Commented Jun 11, 2021 at 14:02

Andreas · Accepted Answer · 2021-06-11 11:28:57Z

1

You can replace the \r like this:

# newline="" stops Python from translating \r to \n while reading
with open("your.csv", "r", newline="") as myfile:
    data = myfile.read().replace('\r', '')
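
One caveat worth checking: in Python 3, a file opened in text mode with the default newline=None has a lone \r translated to \n while it is read, so there may be no \r left to replace. Below is a minimal sketch of the difference, using a throwaway temp file whose name and contents are made up for illustration. It replaces the \r with a space, which is an assumption based on the desired output in the question.

```python
import os
import tempfile

# Hypothetical sample: the first record contains a stray carriage return.
raw = b"001|Baker St.\rLondon|3|4|7\n002|Penny Lane|88|5|7\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "wb") as f:
        f.write(raw)

    # Default text mode: universal newlines turn the lone \r into \n.
    with open(path, "r") as f:
        translated = f.read()

    # newline="" disables the translation, so \r survives and can be replaced.
    with open(path, "r", newline="") as f:
        untouched = f.read()

cleaned = untouched.replace("\r", " ")
print(repr(translated))
print(repr(untouched))
print(repr(cleaned))
```

If the broken records show up as extra lines rather than as \r characters, the translation has probably already happened, and re-reading with newline="" is one way to confirm it.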

Example:

from io import StringIO

# second entry contains a carriage return \r
s = """91|AAA|2010|3
92|BB\rB|2011|4
93|CCC|2012|5
"""

# StringIO simulates a loaded csv file:

# carriage return still there
StringIO(s).read()
# '91|AAA|2010|3\n92|BB\rB|2011|4\n93|CCC|2012|5\n'

# carriage return gone
StringIO(s).read().replace('\r', '')
# '91|AAA|2010|3\n92|BBB|2011|4\n93|CCC|2012|5\n'

With Pandas:

import pandas as pd

data = StringIO(StringIO(s).read().replace('\r', ''))
pd.read_csv(data, sep='|')  # the first row is used as the header by default

Out[35]: 
   91  AAA  2010  3
0  92  BBB  2011  4
1  93  CCC  2012  5

edited Jun 11, 2021 at 11:28

answered Jun 11, 2021 at 11:23


1 Comment

Learner Over a year ago

Thanks for the answer, but I want to fix the carriage return only for a specific row (highlighted in yellow).
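
If only some records are broken, one option is to rejoin a line with the following one whenever it has too few fields. The sketch below assumes every complete record has exactly 5 pipe-separated fields and that | never appears inside a field. merge_broken_records is a hypothetical helper name, not something taken from the answers here.

```python
def merge_broken_records(lines, n_fields=5, sep="|"):
    """Join physical lines until each record has n_fields fields.

    Assumes a complete record contains exactly n_fields - 1 separators
    and that the separator never appears inside a field.
    """
    records = []
    pending = ""
    for line in lines:
        piece = line.strip("\r\n")
        if not piece:
            continue  # skip blank lines
        # A continuation line is glued on with a single space.
        pending = f"{pending} {piece}" if pending else piece
        if pending.count(sep) >= n_fields - 1:
            records.append(pending)
            pending = ""
    if pending:
        records.append(pending)  # keep an incomplete trailing record as-is
    return records


sample = [
    "001|Baker St.\r\n",
    "London|3|4|7\n",
    "002|Penny Lane\n",
    "Liverpool|88|5|7\n",
    "003|Abbey Road London|1|2|3\n",
]
for record in merge_broken_records(sample):
    print(record)
```

Records that are already complete pass through unchanged, so only the broken rows are touched. The joined lines could then be handed to pandas through StringIO("\n".join(records)), with header=None and dtype=str if the leading zeros in the first column should be kept.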

mijiturka · Accepted Answer · 2021-06-11 11:29:17Z

The built-in strip() method that string objects provide does this. You can call it like this as you iterate over the lines:

cleaned_up_line = line.strip()

As the Python str.strip() docs explain, it removes leading and trailing whitespace, including spaces, tabs, newlines, and carriage returns. Characters in the middle of the string are left untouched.

For example:

In [7]: with open('file', 'r') as f:
   ...:     a = f.readlines()
   ...:     print(a)
   ...:
['the\n', 'file\n\r', 'is\n\r', 'here\n', '\n']

In [8]: with open('file', 'r') as f:
   ...:     a = [line.strip() for line in f.readlines()]
   ...:     print(a)
   ...:
['the', 'file', 'is', 'here', '']
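
Since strip() only works on the two ends of a string, a carriage return in the middle of a line needs an explicit replace. A short sketch with a made-up sample string, which also shows rstrip("\r\n") as a narrower alternative that leaves leading and trailing spaces alone:

```python
line = "  Baker St.\rLondon \r\n"

# strip() removes whitespace (spaces, tabs, \r, \n) from both ends only.
stripped = line.strip()
print(repr(stripped))   # 'Baker St.\rLondon'

# rstrip("\r\n") removes only trailing line-break characters.
trimmed = line.rstrip("\r\n")
print(repr(trimmed))    # '  Baker St.\rLondon '

# An embedded \r has to be replaced explicitly.
fixed = stripped.replace("\r", " ")
print(repr(fixed))      # 'Baker St. London'
```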

C K · Accepted Answer · 2021-06-11 11:13:09Z

0

You could match it with a regex and remove it, e.g. re.sub('[\r\n]', '', inputline). This needs import re, and it strips newlines as well as carriage returns.
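
Applied to a whole file rather than a single line, the character class [\r\n] matches every line break and so glues separate records together. Here is a hedged sketch, with invented sample text, that compares it with a pattern targeting only a \r that is not part of a \r\n line ending. Replacing with a space is an assumption based on the desired output in the question.

```python
import re

text = "001|Baker St.\rLondon|3|4|7\r\n002|Penny Lane|88|5|7\r\n"

# Removes every \r and \n, so both records collapse into one string.
everything = re.sub(r"[\r\n]", "", text)

# Replaces only a \r that is NOT followed by \n; \r\n line endings stay intact.
lone_cr_only = re.sub(r"\r(?!\n)", " ", text)

print(repr(everything))
print(repr(lone_cr_only))
```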

answered Jun 11, 2021 at 11:13

