0

I have a DataFrame with 5 columns. While cleaning the data, I ran into a problem caused by carriage returns in the source text file, as shown in the example below.

Input :

001|Baker St.
London|3|4|7
002|Penny Lane
Liverpool|88|5|7

Output:

001|Baker St. London|3|4|7
002|Penny Lane Liverpool|88|5|7

Any suggestions are welcome.

3
  • Carriage return is represented by \r while newline is \n, so replacing all \r with nothing should work. Commented Jun 11, 2021 at 11:08
  • Please replace images of text with actual text so we can copy it and use it. Also edit your question with your coding attempt as comments don’t format it correctly. Commented Jun 11, 2021 at 14:02
  • Hey Mark, ok it's done thanks Commented Jun 11, 2021 at 14:06

3 Answers 3

1

You can replace the \r like this:

# newline="" stops Python from translating \r to \n while reading
with open("your.csv", "r", newline="") as myfile:
    data = myfile.read().replace('\r', '')

Example:

from io import StringIO

# second entry contains a carriage return \r
s = """91|AAA|2010|3
92|BB\rB|2011|4
93|CCC|2012|5
"""

# StringIO simulates a loaded csv file:

# carriage return still there
StringIO(s).read()
# '91|AAA|2010|3\n92|BB\rB|2011|4\n93|CCC|2012|5\n'

# carriage return gone
StringIO(s).read().replace('\r', '')
# '91|AAA|2010|3\n92|BBB|2011|4\n93|CCC|2012|5\n'

With Pandas:

import pandas as pd

data = StringIO(StringIO(s).read().replace('\r', ''))
pd.read_csv(data, sep='|')

Out[35]: 
   91  AAA  2010  3
0  92  BBB  2011  4
1  93  CCC  2012  5
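The output requested in the question keeps a space where the line break was. A minimal sketch of that variant follows, assuming the break inside the address field is a bare carriage return (\r); the sample string is modelled on the question's data and is not the asker's real file. It uses the standard-library csv module only to show the parsed rows; the cleaned text could be passed to pd.read_csv in the same way as above.

```python
import csv
from io import StringIO

# Hypothetical sample modelled on the question. Assumption: the break
# inside the address field is a bare carriage return (\r).
raw = "001|Baker St.\rLondon|3|4|7\n002|Penny Lane\rLiverpool|88|5|7\n"

# Replace the carriage return with a space so the two parts of the
# address stay separated instead of being glued together.
cleaned = raw.replace("\r", " ")

# Parse the cleaned text as pipe-delimited rows.
rows = list(csv.reader(StringIO(cleaned), delimiter="|"))
for row in rows:
    print(row)
```

If the break turns out to be \n or \r\n rather than a bare \r, this replacement is not enough, because a plain newline also ends every complete record.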

1 Comment

Thanks for the answer, but I want to fix the carriage return only for specific rows (highlighted in yellow).
1

The built-in strip() method of string objects does this. You can call it on each line as you iterate over the file:

cleaned_up_line = line.strip()

As the Python str.strip() docs explain, calling it with no arguments removes all leading and trailing whitespace, including spaces, tabs, newlines (\n), and carriage returns (\r). It does not touch characters in the middle of the string.

For example:

In [7]: with open('file', 'r') as f: 
   ...:     a = f.readlines() 
   ...:     print(a) 
   ...:                                                                                              
['the\n', 'file\n\r', 'is\n\r', 'here\n', '\n']

In [8]: with open('file', 'r') as f: 
   ...:     a = [line.strip() for line in f.readlines()] 
   ...:     print(a) 
   ...:                                                                                              
['the', 'file', 'is', 'here', '']
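Because strip() only works on the ends of a string, a carriage return in the middle of a field survives it. The short sketch below illustrates the difference on a made-up line (the sample value is an assumption, not data from the question):

```python
# Hypothetical line with a carriage return inside the second field
# and a Windows-style line ending (\r\n) at the end.
line = "92|BB\rB|2011|4\r\n"

# strip() removes the trailing \r\n but leaves the embedded \r alone.
stripped = line.strip()
print(repr(stripped))

# rstrip("\r\n") is a narrower choice: it removes only line-ending
# characters on the right and keeps any other whitespace.
trimmed = line.rstrip("\r\n")
print(repr(trimmed))

# To drop the embedded carriage return as well, combine with replace().
no_cr = stripped.replace("\r", "")
print(repr(no_cr))
```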


0

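You could match it with a regex and remove it, e.g. re.sub('[\r\n]', '', inputline). Note that this removes every carriage return and newline in inputline, so apply it to one record at a time rather than to the whole file, or all rows will be joined into one.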
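To repair only the rows that were actually split, as the asker requested in the comment on the first answer, one option is to join physical lines until a record has the expected number of fields. The sketch below is one possible approach, not a method from any of the answers. The helper name merge_broken_rows and its parameters are hypothetical, and it assumes every record has exactly 5 fields and that no field contains the | separator.

```python
def merge_broken_rows(lines, n_fields=5, sep="|"):
    """Join consecutive lines until each record has n_fields fields.

    Assumptions (hypothetical helper): every complete record has
    exactly n_fields fields and no field contains the separator.
    """
    merged = []
    buffer = ""
    for line in lines:
        part = line.strip("\r\n")  # drop line-ending characters only
        if not part:
            continue  # skip empty lines
        # Join a continuation line to the pending record with a space.
        buffer = f"{buffer} {part}" if buffer else part
        # A complete record contains n_fields - 1 separators.
        if buffer.count(sep) >= n_fields - 1:
            merged.append(buffer)
            buffer = ""
    if buffer:
        merged.append(buffer)  # keep a trailing incomplete record
    return merged


# Sample lines taken from the question's input.
lines = [
    "001|Baker St.\n",
    "London|3|4|7\n",
    "002|Penny Lane\n",
    "Liverpool|88|5|7\n",
]
result = merge_broken_rows(lines)
print("\n".join(result))
```

Rows that are already complete pass through unchanged, so the helper can be run over the whole file before handing the joined text to pd.read_csv.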
