
I am trying to read all the .txt files in a folder, each in the format shown below, and concatenate them into a single pandas DataFrame.

sample1.txt

ID                                    a123
Delivery_person_ID             VADRES03DEL01
Delivery_person_Age                    24.00
Delivery_person_Ratings                 4.30
Name: 1, dtype: object

sample2.txt

ID                                    b123
Delivery_person_ID             VADRES03DEL02
Delivery_person_Age                    22.00
Delivery_person_Ratings                 4.10
Name: 2, dtype: object

Here is my code:

import glob

import pandas as pd

folder_path = '/drive/My Drive/dataset/train'
file_list = glob.glob(folder_path + "/*.txt")
main_dataframe = pd.read_fwf(file_list[0], header=None)

for i in range(1, len(file_list)):
    df = pd.read_fwf(file_list[i], header=None)
    main_dataframe = pd.concat([main_dataframe, df], axis=0)

print(main_dataframe.head(30))

Output:

                         0              1
0                       ID           a123
1       Delivery_person_ID  VADRES03DEL01
2      Delivery_person_Age          24.00
3  Delivery_person_Ratings           4.30
4   Name: 1, dtype: object            NaN
0                       ID           b123
1       Delivery_person_ID  VADRES03DEL02
2      Delivery_person_Age          22.00
3  Delivery_person_Ratings           4.10
4   Name: 2, dtype: object            NaN

But I need the DataFrame to have one row per person, in this format:

     ID  Delivery_person_ID  Delivery_person_Age  Delivery_person_Ratings
0  a123       VADRES03DEL01                24.00                     4.30
1  b123       VADRES03DEL02                22.00                     4.10
  • Is the "dtype: object" part actually in the text file? Commented Sep 2, 2022 at 8:12
  • Correct, it is in the text file, as shown above under sample1.txt and sample2.txt. Commented Sep 2, 2022 at 8:13
  • Just use transpose: main_dataframe = main_dataframe.T Commented Sep 2, 2022 at 8:14
  • When I transpose, the columns get split across multiple lines... see my edit above. How can I output all the columns on the same line? Commented Sep 2, 2022 at 8:27

2 Answers


The input text file has an unusual layout; this code should handle it:

import pandas as pd

# Read in the text file (the first line, "ID  a123", becomes the header)
df = pd.read_fwf("./test.txt")
# Remove the trailing "Name: 1, dtype: object" row
df = df.drop(df.index[3])
# Transpose it
df = df.T
# Use the first row as the column names
df.columns = df.iloc[0]
# Remove the column names from the data
df = df.drop(df.index[0])

An input text file that looks like this:

ID                                    a123
Delivery_person_ID             VADRES03DEL01
Delivery_person_Age                    24.00
Delivery_person_Ratings                 4.30
Name: 1, dtype: object

Would be converted to this:

ID   Delivery_person_ID Delivery_person_Age Delivery_person_Ratings
a123      VADRES03DEL01               24.00                    4.30
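To try these steps without a file on disk, the sketch below rebuilds sample1.txt in memory with io.StringIO (the exact column widths are an assumption chosen to mimic the sample). It also makes one detail of the output above visible: because read_fwf treats the first line as the header here, a123 ends up as the row index and ID is only the name of the columns axis, not a regular column. The last few lines show one way to turn it back into an ID column.

```python
import io

import pandas as pd

# Rebuild sample1.txt in memory: labels left-aligned, values right-aligned.
# The widths (31 and 13) are an assumption that mimics the question's file.
rows = [
    ("ID", "a123"),
    ("Delivery_person_ID", "VADRES03DEL01"),
    ("Delivery_person_Age", "24.00"),
    ("Delivery_person_Ratings", "4.30"),
]
text = "".join(f"{label:<31}{value:>13}\n" for label, value in rows)
text += "Name: 1, dtype: object\n"

df = pd.read_fwf(io.StringIO(text))  # "ID  a123" is taken as the header row
df = df.drop(df.index[3])            # drop the "Name: 1, dtype: object" row
df = df.T                            # field names are now in the first row
df.columns = df.iloc[0]              # promote that row to column names
df = df.drop(df.index[0])            # and remove it from the data

print(df.index.tolist())  # ['a123']  (the ID value is the row index)
print(df.columns.name)    # ID        (only the name of the columns axis)

# Optional: turn the index into a regular "ID" column
df.index.name = "ID"
df.columns.name = None
tidy = df.reset_index()
print(tidy)
```

Note that the values are still strings such as "24.00" at this point, since each value column mixed text and numbers when it was read.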

From here, you can do the same for each text file and then use pd.concat() to merge each new text-file DataFrame into the main DataFrame; from your code, I can see that you already know how to do this.
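A minimal sketch of that last step, assuming every file has exactly the five-line layout from the question. The helper names read_one and read_folder are hypothetical, and the temporary folder only exists to make the demo self-contained. Instead of growing the frame inside a loop, it collects one small frame per file and calls pd.concat once.

```python
import glob
import os
import tempfile

import pandas as pd


def read_one(path: str) -> pd.DataFrame:
    """Hypothetical helper: the answer's steps for one file (one row, indexed by ID)."""
    df = pd.read_fwf(path)          # first line ("ID  <value>") is the header
    df = df.drop(df.index[3])       # drop the "Name: N, dtype: object" row
    df = df.T
    df.columns = df.iloc[0]
    return df.drop(df.index[0])


def read_folder(folder_path: str) -> pd.DataFrame:
    """Hypothetical helper: read every .txt file and stack them, one row per file."""
    file_list = sorted(glob.glob(os.path.join(folder_path, "*.txt")))
    # One concat over all per-file frames avoids re-copying a growing frame
    return pd.concat([read_one(path) for path in file_list])


# Demo only: write the question's two sample files to a temporary folder
SAMPLES = [
    [("ID", "a123"), ("Delivery_person_ID", "VADRES03DEL01"),
     ("Delivery_person_Age", "24.00"), ("Delivery_person_Ratings", "4.30")],
    [("ID", "b123"), ("Delivery_person_ID", "VADRES03DEL02"),
     ("Delivery_person_Age", "22.00"), ("Delivery_person_Ratings", "4.10")],
]

with tempfile.TemporaryDirectory() as folder:
    for n, rows in enumerate(SAMPLES, start=1):
        with open(os.path.join(folder, f"sample{n}.txt"), "w") as fh:
            fh.writelines(f"{label:<31}{value:>13}\n" for label, value in rows)
            fh.write(f"Name: {n}, dtype: object\n")
    result = read_folder(folder)

print(result)
```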


1 Comment

Great, thanks. Let me modify my code and check...

After reading each text file into a pandas DataFrame, transpose it before concatenating:

import glob

import pandas as pd

folder_path = '/drive/My Drive/dataset/train'
file_list = glob.glob(folder_path + "/*.txt")
main_dataframe = pd.read_fwf(file_list[0], header=None).T

for i in range(1, len(file_list)):
    df = pd.read_fwf(file_list[i], header=None).T
    main_dataframe = pd.concat([main_dataframe, df], axis=0)

print(main_dataframe.head(30))

Edit

import glob

import pandas as pd

folder_path = '/drive/My Drive/dataset/train'
file_list = glob.glob(folder_path + "/*.txt")


def read_clean_df(file_name) -> pd.DataFrame:
    df = pd.read_fwf(file_name, header=None).T  # row 0: labels, row 1: values
    df.pop(4)                                   # drop the "Name: N, dtype: object" column
    df.columns = df.iloc[0]                     # use the labels as column names
    df = df[1:]                                 # keep only the values row
    df.reset_index(drop=True, inplace=True)
    return df


main_dataframe = read_clean_df(file_list[0])

for file_name in file_list[1:]:
    df = read_clean_df(file_name)  # read the current file, not file_list[0] again
    main_dataframe = pd.concat([main_dataframe, df], axis=0)

main_dataframe.reset_index(drop=True, inplace=True)
print(main_dataframe.head(30))
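If the fixed-width parsing or the hard-coded df.pop(4) proves fragile, a different technique is to skip read_fwf entirely and parse each file as "label value" pairs by splitting every line on its last run of whitespace. This is a sketch under the assumption that labels and values never contain spaces themselves; the helper names parse_record and load_records are hypothetical, and the temporary folder is only there to make the demo runnable.

```python
import glob
import os
import tempfile

import pandas as pd


def parse_record(path: str) -> dict:
    """Hypothetical helper: one 'label   value' file -> dict, skipping the footer."""
    record = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("Name:"):
                continue  # blank line or the "Name: N, dtype: object" footer
            label, value = line.rsplit(maxsplit=1)
            record[label] = value
    return record


def load_records(folder_path: str) -> pd.DataFrame:
    """Hypothetical helper: one row per file, columns in the order of the labels."""
    file_list = sorted(glob.glob(os.path.join(folder_path, "*.txt")))
    df = pd.DataFrame([parse_record(path) for path in file_list])
    # Everything is read as text; convert the numeric fields explicitly
    for col in ("Delivery_person_Age", "Delivery_person_Ratings"):
        df[col] = pd.to_numeric(df[col])
    return df


# Demo only: write the question's two sample files to a temporary folder
SAMPLES = [
    [("ID", "a123"), ("Delivery_person_ID", "VADRES03DEL01"),
     ("Delivery_person_Age", "24.00"), ("Delivery_person_Ratings", "4.30")],
    [("ID", "b123"), ("Delivery_person_ID", "VADRES03DEL02"),
     ("Delivery_person_Age", "22.00"), ("Delivery_person_Ratings", "4.10")],
]

with tempfile.TemporaryDirectory() as folder:
    for n, rows in enumerate(SAMPLES, start=1):
        with open(os.path.join(folder, f"sample{n}.txt"), "w") as fh:
            fh.writelines(f"{label:<31}{value:>13}\n" for label, value in rows)
            fh.write(f"Name: {n}, dtype: object\n")
    result = load_records(folder)

print(result)
```

Building a list of dicts and passing it to pd.DataFrame once also sidesteps the repeated pd.concat inside the loop.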

3 Comments

When I transpose, the columns get split across multiple lines... see my edit above. How can I output all the columns on the same line?
It is already on one line; it only wraps onto multiple lines because of the console/screen width limit.
@ggcoder look at the updated code
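If the problem is only how the frame is printed, pandas can be told not to wrap wide frames: the display.expand_frame_repr option set to False keeps each row on one line, and display.max_columns set to None stops columns from being hidden behind "...". DataFrame.to_string() also renders the whole frame without wrapping by default. The small frame below is typed in by hand from the question's desired output, just for illustration.

```python
import pandas as pd

# Illustrative frame matching the question's desired layout
df = pd.DataFrame(
    {
        "ID": ["a123", "b123"],
        "Delivery_person_ID": ["VADRES03DEL01", "VADRES03DEL02"],
        "Delivery_person_Age": [24.00, 22.00],
        "Delivery_person_Ratings": [4.30, 4.10],
    }
)

# Option 1: change the display settings for the rest of the session
pd.set_option("display.expand_frame_repr", False)  # don't wrap wide frames
pd.set_option("display.max_columns", None)          # show every column
print(df)

# Option 2: render the frame as one string; no wrapping by default
text = df.to_string()
print(text)
```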
