
I am using the code below. All my CSV files have the same structure, yet the resulting dataframe contains two date columns where the source files have only one.

For a few rows the date value is in the first date column, while for the rest of the rows it ends up in the second date column.

Any idea why two date columns are being generated from a single column in the source CSV files?

import glob
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob("/Users/tcssig/Desktop/Files/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df, ignore_index=True)

In [76]: all_data.columns
Out[76]: Index(['0', '0.1', 'Channel_ID', 'Date', 'Date ', 'Duration (HH:MM)','Episode #', 'Image', 'Language', 'Master House ID', 'Parental Rating','Program Category', 'Program Title', 'StartTime_ET', 'StartTime_ET2','Synopsis'],
 dtype='object')
  • Probably some of your CSV files have the Date column name with a trailing space. Commented Sep 6, 2016 at 12:39
  • Thanks, it worked. Commented Sep 7, 2016 at 2:33

1 Answer


Because the second column name has a trailing space:

'Date', 'Date '
             ^

So you need to normalise the column names before appending:

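import glob
import pandas as pd

frames = []
for f in glob.glob("/Users/tcssig/Desktop/Files/*.csv"):
    df = pd.read_csv(f)
    df.columns = df.columns.str.strip()  # 'Date ' -> 'Date'
    frames.append(df)
# DataFrame.append was removed in pandas 2.0; concatenate once instead
all_data = pd.concat(frames, ignore_index=True)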

Here I use str.strip to remove any leading and trailing whitespace from the column names.
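To see the effect without touching the real files, here is a minimal sketch that reproduces the problem with two in-memory CSVs. The sample data, the `load_all` helper and the `strip_names` flag are made up for illustration; the only assumption carried over from the question is that one file's header reads `Date ` with a trailing space.

```python
import io
import pandas as pd

# Two hypothetical in-memory "files": the second header has a trailing space.
csv_a = "Channel_ID,Date\n1,2016-09-01\n"
csv_b = "Channel_ID,Date \n2,2016-09-02\n"


def load_all(sources, strip_names):
    """Read each CSV text and combine them, optionally stripping column names."""
    frames = []
    for text in sources:
        df = pd.read_csv(io.StringIO(text))
        if strip_names:
            df.columns = df.columns.str.strip()
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


raw = load_all([csv_a, csv_b], strip_names=False)
clean = load_all([csv_a, csv_b], strip_names=True)

# Column names that change when stripped are the ones carrying stray whitespace.
suspects = [c for c in raw.columns if c != c.strip()]

print(raw.columns.tolist())    # includes both 'Date' and 'Date '
print(suspects)                # the names to fix
print(clean.columns.tolist())  # a single 'Date' column
```

Printing the column names as a list (rather than the frame itself) makes the trailing space visible inside the quotes, which is usually the quickest way to spot this kind of mismatch.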


1 Comment

Thanks a lot, I corrected that in some of my CSV files, and it worked.
