I have a number of txt files in a directory that I'd like to combine. The following is an example of three files named df_A, df_B and df_C:

df_A
       0  1    2
0  James  1  yes
1   Jake  3   No
2   Jane  2  Yes

df_B
       0  1    2
0   Jane  2   No
1    Job  6   No
2  James  1  Yes

df_C
       0  1    2
0   Jack  4   No
1  Jenny  7  Yes
2  James  1   No
3   John  9  Yes

And I'd like the final dataframe to look like this:

ID  Name    df_A    df_B    df_C
1   James   yes     Yes     No
3   Jake    No      NA      NA
2   Jane    Yes     No      NA
6   Job     NA      No      NA
4   Jack    NA      NA      No
7   Jenny   NA      NA      Yes
9   John    NA      NA      Yes

This is the code I have so far:

new_df = pd.DataFrame(columns = ['Name', 'ID'])

for filename in os.listdir('/path'):
    if filename.endswith('.txt'):
        course = os.path.splitext(filename)[0]

        new_df = pd.concat([combined_df,pd.DataFrame(columns=[course])])
        data = pd.read_csv(filename, sep="\t", header=None)

        for i in data[data.columns[1]]:
            if i not in new_df['ID']:
                new_df['ID'].append(i)

2 Answers

For these three dataframes, just assign column names so that the last column of each one is unique. Then call pd.concat followed by groupby to get your output.

dfA.columns = ['Name', 'ID', 'df_A']
dfB.columns = ['Name', 'ID', 'df_B']
dfC.columns = ['Name', 'ID', 'df_C']

pd.concat([dfA, dfB, dfC])\
      .groupby('Name', as_index=False, sort=False).first()\
      .set_index('ID').fillna('')

     Name df_A df_B df_C
ID                      
1   James  yes  Yes   No
3    Jake   No          
2    Jane  Yes   No     
6     Job        No     
4    Jack             No
7   Jenny            Yes
9    John            Yes
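
To check the approach above end to end, here is a self-contained sketch. It rebuilds the three sample frames in memory (an assumption; the question reads them from files) and applies the same concat + groupby chain. The names `frames` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question, built in memory instead of read from files.
dfA = pd.DataFrame([["James", 1, "yes"], ["Jake", 3, "No"], ["Jane", 2, "Yes"]])
dfB = pd.DataFrame([["Jane", 2, "No"], ["Job", 6, "No"], ["James", 1, "Yes"]])
dfC = pd.DataFrame([["Jack", 4, "No"], ["Jenny", 7, "Yes"],
                    ["James", 1, "No"], ["John", 9, "Yes"]])

# Give every frame the same two key columns and a unique value column.
frames = {"df_A": dfA, "df_B": dfB, "df_C": dfC}
for label, df in frames.items():
    df.columns = ["Name", "ID", label]

# Stack the frames, then keep the first non-null value per name in each column.
result = (
    pd.concat(list(frames.values()))
    .groupby("Name", as_index=False, sort=False)
    .first()
    .set_index("ID")
    .fillna("")
)
print(result)
```

Grouping on 'Name' alone assumes each name maps to exactly one ID. If two people could share a name, grouping on ['Name', 'ID'] would probably be the safer choice.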

In the general case, say you have a list of dataframes, df_list. You can then assign the column names in a loop.

df_list = [dfA, dfB, dfC, ...]
for i, df in enumerate(df_list):
    df.columns = ['Name', 'ID', 'df_{}'.format(chr(ord('A') + i))]

pd.concat(df_list).groupby('Name', 
        as_index=False, sort=False).first().set_index('ID')
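
If the frames come from files, as in the question, the value column can be labelled with each file name instead of a generated letter. The following is a rough sketch under assumptions: the helper name `combine_txt_files`, the tab separator, and the temporary demo directory are all made up for illustration and may need adjusting to the real files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_txt_files(directory, sep="\t"):
    """Hypothetical helper: one value column per .txt file, labelled by file stem."""
    frames = []
    # sorted() makes the column order deterministic.
    for path in sorted(Path(directory).glob("*.txt")):
        df = pd.read_csv(path, sep=sep, header=None)
        df.columns = ["Name", "ID", path.stem]
        frames.append(df)
    # Note: pd.concat raises ValueError if no .txt files were found.
    return (
        pd.concat(frames)
        .groupby("Name", as_index=False, sort=False)
        .first()
        .set_index("ID")
    )


# Demo with two small tab-separated files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "df_A.txt").write_text("James\t1\tyes\nJake\t3\tNo\n")
    Path(tmp, "df_B.txt").write_text("Job\t6\tNo\nJames\t1\tYes\n")
    combined = combine_txt_files(tmp)

print(combined)
```

Collecting the frames in a list and concatenating once keeps the groupby out of the loop, so the files themselves are never modified.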

3 Comments

Your code works but I'd really love to avoid manipulating the files. So, I thought of tweaking your code. I did this: import os import pandas as pd finalDF = pd.DataFrame() for filename in os.listdir(os.getcwd()): if filename.endswith('.txt'): target = os.path.splitext(filename)[0] df = pd.read_csv(filename, header=None, delimiter=' ') df.columns = ['Name', 'ID', target] finalDF = pd.concat([finalDF, df]).groupby('Name', as_index=False, sort=False).first().set_index('ID').fillna('')
@AndyG Feel free to edit my answer with whatever works for you, and mark it accepted (upvote it as well, if you wish).
@AndyG I'm sorry, I can't understand your code as you've pasted it. If this code works, then open a new post with your follow up question, please?
3
import os
import pandas as pd

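# Start from an empty frame that already has the two key columns.
combinedDF = pd.DataFrame(columns=['Name','ID'])

for filename in os.listdir(os.getcwd()):
    if filename.endswith('.txt'):
        # The files have no header row, so pass header=None.
        df = pd.read_csv(filename, header=None, delimiter=' ')
        # Name the value column after the file, minus the '.txt' suffix.
        df.columns=['Name','ID',filename[:-4]]
        # An outer merge keeps every (Name, ID) pair seen in any file.
        combinedDF = combinedDF.merge(df, on=['Name', 'ID'], how='outer')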
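
The same outer-merge idea can be tried without touching the file system. This is a minimal sketch, assuming the sample data from the question is already loaded into frames; `functools.reduce` stands in for the loop, and the names `cols`, `frames` and `merged` are illustrative.

```python
from functools import reduce

import pandas as pd

cols = ["Name", "ID"]  # shared key columns
frames = [
    pd.DataFrame([["James", 1, "yes"], ["Jake", 3, "No"], ["Jane", 2, "Yes"]],
                 columns=cols + ["df_A"]),
    pd.DataFrame([["Jane", 2, "No"], ["Job", 6, "No"], ["James", 1, "Yes"]],
                 columns=cols + ["df_B"]),
    pd.DataFrame([["Jack", 4, "No"], ["Jenny", 7, "Yes"],
                  ["James", 1, "No"], ["John", 9, "Yes"]],
                 columns=cols + ["df_C"]),
]

# Fold the list into one frame, outer-merging on the shared key columns.
merged = reduce(
    lambda left, right: left.merge(right, on=cols, how="outer"),
    frames,
)
print(merged)
```

Rows that are missing from a file come back as NaN; appending .fillna('') gives blanks instead. The row order of an outer merge may differ from the order produced by the concat approach.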
