I have multiple CSV files in a folder. I want to loop over every file ending in .csv and merge them all into one Excel file. Here are two example files:

first.csv

     date    a    b
0  2019.1  1.0  NaN
1  2019.2  NaN  2.0
2  2019.3  3.0  2.0
3  2019.4  3.0  NaN

second.csv

     date    c     d
0  2019.1  1.0   NaN
1  2019.2  5.0   2.0
2  2019.3  3.0   7.0
3  2019.4  6.0   NaN
4  2019.5  NaN  10.0

...

My desired output looks like this, with the files merged on date:

        date    a     b    c    d
0  2019/1/31  1.0   NaN  1.0  NaN
1  2019/2/28  NaN   2.0  5.0  2.0
2  2019/3/31  3.0   2.0  3.0  7.0
3  2019/4/30  3.0   NaN  6.0  NaN
4  2019/5/31  NaN   NaN  NaN  10.0

I have written the following code, but the date conversion and the merging of the DataFrames are clearly incorrect:

import numpy as np
import pandas as pd
import glob

dfs = pd.DataFrame()
for file_name in glob.glob("*.csv"):
    # print(file_name)
    df = pd.read_csv(file_name, engine='python', skiprows=2, encoding='utf-8')
    df = df.dropna()
    df = df.dropna(axis = 1)
    df['date'] = pd.to_datetime(df['date'], format='%Y.%m')
    ...
    dfs = pd.merge(df1, df2, on = 'date', how= "outer")

# save the data frame
writer = pd.ExcelWriter('output.xlsx')
dfs.to_excel(writer,'sheet1')
writer.save()

Please help me. Thank you.

2 Comments
  • What output are you getting now? Commented Oct 24, 2019 at 8:37
  • The merge doesn't work, so there is no result so far. For concatenation, this link is useful: stackoverflow.com/questions/56033013/… Commented Oct 24, 2019 at 8:47

2 Answers


Try it like this:

import glob

import pandas as pd
from pandas.tseries.offsets import MonthEnd

dfs = pd.DataFrame()
for file_name in glob.glob("*.csv"):
    df = pd.read_csv(file_name, engine='python', skiprows=2, encoding='utf-8')
    # Normalize the header names so that every file exposes a 'date' column
    df.columns = df.columns.str.lower().str.replace('dates', 'date')
    df = df.dropna()
    df = df.dropna(axis = 1)
    # Parse 'YYYY.M' as the first day of the month, then shift to the month end
    df['date'] = pd.to_datetime(df['date'].astype(str), format='%Y.%m') + MonthEnd(1)
    if dfs.empty:
        # First file: there is nothing to merge with yet
        dfs = df.copy()
    else:
        dfs = dfs.merge(df, on='date', how="outer")
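
The following is a minimal, self-contained sketch of the merge approach above, run on in-memory copies of the two sample tables from the question instead of files on disk. The FIRST and SECOND strings and the load helper are hypothetical stand-ins, and the skiprows and dropna steps are left out because these samples have no extra header rows. Reading date as a string is an added assumption, made so that a value such as 2019.10 is not read as the float 2019.1.

```python
import io
from functools import reduce

import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Hypothetical in-memory stand-ins for first.csv and second.csv
FIRST = """date,a,b
2019.1,1.0,
2019.2,,2.0
2019.3,3.0,2.0
2019.4,3.0,
"""
SECOND = """date,c,d
2019.1,1.0,
2019.2,5.0,2.0
2019.3,3.0,7.0
2019.4,6.0,
2019.5,,10.0
"""


def load(source):
    """Read one table and convert its 'date' column to month-end timestamps."""
    # Keep 'date' as text so that '2019.10' is not collapsed to the float 2019.1
    df = pd.read_csv(source, dtype={"date": str})
    # '%Y.%m' yields the first day of the month; MonthEnd(1) moves it to the last day
    df["date"] = pd.to_datetime(df["date"], format="%Y.%m") + MonthEnd(1)
    return df


frames = [load(io.StringIO(text)) for text in (FIRST, SECOND)]

# Outer-merge the frames pairwise on 'date' so that no month is lost
merged = reduce(
    lambda left, right: left.merge(right, on="date", how="outer"),
    frames,
)
merged = merged.sort_values("date").reset_index(drop=True)
print(merged)
```

With real files, the list of frames could be built from glob.glob("*.csv") instead, and the result could be written out with merged.to_excel('output.xlsx', index=False), which needs an Excel engine such as openpyxl to be installed.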

6 Comments

Thank you. Another question: if the date columns in different DataFrames are not named identically (for example dates, Date, or DATE), how can I rename them all to date before merging?
@ahbon You can make them all lowercase and replace possible dates with df.columns = df.columns.str.lower().str.replace('dates', 'date'). Added to answer.
Can str.replace map multiple strings to date? For example, replace date, dates, time, etc. with date.
@ahbon Sure, just use | in pattern like 'dates|time|times'
When I merge a lot of DataFrames, the date column contains duplicated dates. Do you know why?
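
The renaming suggested in the comments above can be sketched as follows. The helper name normalize_date_column and the exact pattern are assumptions for illustration. Note that in pandas 2.0 and later, str.replace treats the pattern as a literal string unless regex=True is passed, so a pattern containing | needs that flag. The pattern is anchored with ^ and $ so that a column such as updates is not partially rewritten.

```python
import pandas as pd


def normalize_date_column(df):
    """Return a copy whose columns are lowercased, with date-like names mapped to 'date'."""
    out = df.copy()
    out.columns = (
        out.columns.str.lower()
        # Anchored alternation: matches only date, dates, time or times as a whole name
        .str.replace(r"^(?:dates?|times?)$", "date", regex=True)
    )
    return out


# Hypothetical frame with an upper-case, plural date header
example = pd.DataFrame({"DATES": ["2019.1"], "a": [1.0], "Updates": [0]})
print(list(normalize_date_column(example).columns))
```

Like the one-liner in the answer, this lowercases every column name, not just the date column.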

Use concat with a DatetimeIndex created in read_csv through the index_col and parse_dates parameters (0 selects the first column of data). To improve performance, shift the index to the last day of the month only once, as the final step:

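import glob

import pandas as pd

dfs = []
for file_name in glob.glob("*.csv"):
    # Read the first column as the index and ask pandas to parse it as dates
    df = pd.read_csv(file_name,
                     engine='python',
                     skiprows=2,
                     encoding='utf-8',
                     index_col=0,
                     parse_dates=[0])
    # Additional per-file processing, if necessary

    dfs.append(df)

# Align all frames on the shared date index, one block of columns per file
df = pd.concat(dfs, axis=1)
# Shift every date to the last day of its month
df.index = df.index + pd.offsets.MonthEnd()
print(df)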
              a    b    c     d
date                           
2019-01-31  1.0  NaN  1.0   NaN
2019-02-28  NaN  2.0  5.0   2.0
2019-03-31  3.0  2.0  3.0   7.0
2019-04-30  3.0  NaN  6.0   NaN
2019-05-31  NaN  NaN  NaN  10.0

2 Comments

Sorry, I just tried your method. I get: ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible for this line: df.index = df.index + pd.offsets.MonthEnd().
@ahbon - I create the DatetimeIndex with the index_col parameter, and the parse_dates parameter is also applied to the first column.
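
One possible cause of the ValueError reported above, offered as an assumption because the actual files are not shown, is that the first column was not parsed into dates, so the index is not a DatetimeIndex and cannot take a MonthEnd offset. The sketch below converts the dates explicitly with a known format before concatenating. The FIRST and SECOND strings and the load_indexed helper are hypothetical stand-ins for the real files.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for first.csv and second.csv
FIRST = """date,a,b
2019.1,1.0,
2019.2,,2.0
2019.3,3.0,2.0
2019.4,3.0,
"""
SECOND = """date,c,d
2019.1,1.0,
2019.2,5.0,2.0
2019.3,3.0,7.0
2019.4,6.0,
2019.5,,10.0
"""


def load_indexed(source):
    """Read one table and index it by explicitly parsed dates."""
    # Read 'date' as text, then parse it with an explicit 'YYYY.M' format
    df = pd.read_csv(source, dtype={"date": str})
    df["date"] = pd.to_datetime(df["date"], format="%Y.%m")
    return df.set_index("date")


frames = [load_indexed(io.StringIO(text)) for text in (FIRST, SECOND)]

# Fail early if any index is not a DatetimeIndex
for frame in frames:
    if not isinstance(frame.index, pd.DatetimeIndex):
        raise TypeError("expected a DatetimeIndex before adding MonthEnd")

# Align the frames side by side on the date index (outer join by default)
combined = pd.concat(frames, axis=1)
# Shift each first-of-month date to the last day of that month
combined.index = combined.index + pd.offsets.MonthEnd()
print(combined)
```

Checking the index type before adding the offset makes a parsing problem visible at the point where it occurs, not later in the arithmetic.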
