2

I am trying to convert an Excel file to a CSV file. The data in the Excel file looks like this:

My code to convert to csv:

import pandas as pd
import glob
for excel_file in glob.glob('C:/Talend/DEV/MARKET_OPTIMISATION/IMS/*Extract*.xls'):
    print(excel_file)
    data_xls = pd.read_excel(excel_file, 'Untitled', index=0,skiprows=1, sep='|',encoding='utf-8')
    #data_xlx.pop
    data_xls1=data_xls.replace('\r\n','')
    data_xls1.to_csv('C:/Talend/DEV/MARKET_OPTIMISATION/IMS/IMS_Raw_data.csv',sep='|',encoding='utf-8')

The output of the above code is:

[image: screenshot of the current output]

But I need the output to look like this:

[image: screenshot of the desired output]

Can anyone please help me remove the line breaks from the Excel file?

Thank you in advance.

6 Answers

3

In your dataframe, the newlines are in the column names. The replace method of the dataframe only affects the data, not the column names.

So in your example, you should explicitly change the column names:

data_xls = pd.read_excel(excel_file, 'Untitled', index=0,skiprows=1, sep='|',encoding='utf-8')
data_xls.columns = data_xls.columns.map(lambda x: x.replace('\r','').replace('\n', ''))
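To see the difference between cell values and column names without needing the Excel file, here is a minimal sketch. The frame and its header names are made up for illustration, and it uses the `.str` accessor on the column index rather than `map`; it also swaps each line break for a space instead of deleting it, which is an assumption about the desired output.

```python
import pandas as pd

# Hypothetical frame standing in for the Excel sheet: the header cells contain line breaks.
df = pd.DataFrame({"Product\r\nName": ["A", "B"], "Sales\nValue": [1, 2]})

# DataFrame.replace only looks at cell values, so the headers come back unchanged.
unchanged = df.replace("\r\n", "")

# Clean the headers explicitly: collapse each run of \r or \n into one space.
df.columns = df.columns.str.replace(r"[\r\n]+", " ", regex=True).str.strip()

print(list(unchanged.columns))
print(list(df.columns))
```

After this step `to_csv` writes a single-line header, because the line breaks are no longer part of the column names.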

1 Comment

@Ballesta, I am facing one more problem here. My data contains values like 'NA', and when converting the xls file to csv they are treated as null values. Can you please suggest how to read the data as is?
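One possible way to keep 'NA' as text is the `keep_default_na=False` option, which both `pd.read_csv` and `pd.read_excel` document. The sketch below uses `read_csv` on an in-memory string so it runs without an Excel file; the column names and values are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical pipe-separated data where "NA" is a real code, not a missing value.
raw = "code|region\nNA|North America\nEU|Europe\n"

# Default behaviour: the string "NA" is parsed as a missing value (NaN).
default = pd.read_csv(io.StringIO(raw), sep="|")

# keep_default_na=False turns off the built-in list of NA markers,
# so "NA" is kept as the literal string.
as_is = pd.read_csv(io.StringIO(raw), sep="|", keep_default_na=False)

print(default["code"].tolist())
print(as_is["code"].tolist())
```

The same keyword can be passed to `pd.read_excel`; if some markers should still count as missing, they can be listed in `na_values`.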
0

Try replacing \r and \n separately:

mystring = mystring.replace('\n', ' ').replace('\r', '')

If that fails, .split() the string and then .join() the list elements.
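The split-and-join fallback can be sketched as follows. `collapse_whitespace` and the sample string are hypothetical names used only for illustration.

```python
def collapse_whitespace(text: str) -> str:
    """Split on any whitespace run (spaces, tabs, \\r, \\n) and rejoin with single spaces."""
    return " ".join(text.split())


# Hypothetical header text containing a Windows line break and a tab.
sample = "Product\r\nName\t(local)"
cleaned = collapse_whitespace(sample)
print(cleaned)
```

Calling `str.split()` with no argument splits on every kind of whitespace and drops empty pieces, so it handles `\r`, `\n` and `\t` in one pass.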

1 Comment

I tried using replace separately as well, but it didn't work.
0

You can use something like this:

import re
mystring = re.sub(r"[\r\n]", "", mystring)

0

You have to use regex=True in your command. Without it, replace only matches cells whose entire value equals the search string:

data_xls = data_xls.replace('\n', ' ', regex=True)

Better still, replace any run of whitespace with a single space:

data_xls = data_xls.replace(r'\s+', ' ', regex=True)
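A small sketch of why `regex=True` matters here: a plain `DataFrame.replace` compares whole cell values, so a line break inside a longer string is left alone. The frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a line break embedded inside a cell value.
df = pd.DataFrame({"note": ["line one\r\nline two", "plain"]})

# Without regex=True the pattern must equal the entire cell, so nothing changes.
literal = df.replace("\r\n", "")

# With regex=True the pattern is applied inside each string value.
substring = df.replace(r"\s+", " ", regex=True)

print(repr(literal.loc[0, "note"]))
print(repr(substring.loc[0, "note"]))
```

Note that this still only touches cell values; column names need to be cleaned separately.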

0

You need to replace \t (tabs). This will put each record on a single line.

mystring = mystring.replace('\t','')

0

You need to replace \t (tabs). This will put each record on a single line.

mystring = mystring.replace('\t','')

You can paste part of your data here so we can see which characters are hidden in it.
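To check which hidden characters are actually present before deciding what to replace, something like the following may help. `hidden_whitespace` is a hypothetical helper, and the sample string is invented.

```python
def hidden_whitespace(text: str) -> list:
    """Return the distinct whitespace characters in text, other than a plain space."""
    return sorted({ch for ch in text if ch.isspace() and ch != " "})


# Hypothetical cell value; repr() makes the escape sequences visible.
sample = "a\r\nb\tc d"
print(repr(sample))
print(hidden_whitespace(sample))
```

Running this over the column names and a few cell values shows whether the problem is `\r`, `\n`, `\t`, or some other whitespace character.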
