Extract a column from text file and store it in dataframe in Python

Question

I have thousands of text files in a folder, and I'm reading them with glob. I want to extract the first column of each file (the files have no header row) and store it in a dataframe, because I need to build tables from calculations across multiple files. Here is sample data from two files:

File 1:

O.U20,99.73000,75538,99.73500,51794,57821,99.73167,1062,4,,,,99.73173
O.Z20,99.70000,58974,99.70500,6748,35815,99.70250,468,3,99.70500,1132,2,99.70048
O.H21,99.79500,4274,99.80000,47043,49961,,,,99.79750,3424,3,99.79236
O.M21,99.81000,48584,99.81500,7062,37456,99.81167,243,3,99.81500,234,2,99.80975
S3.U20,3.000,1132,3.500,69740,3831,,,,3.250,1380,2,3.125
S3.Z20,-9.500,58855,-9.000,27304,3295,-9.250,468,2,-9.000,3730,2,-9.188

File 2:

O.U20,99.73000,75711,99.73500,51794,57821,99.73167,1062,4,,,,99.73173
O.Z20,99.70000,59142,99.70500,6748,35815,99.70250,468,3,99.70500,1132,2,99.70048
O.H21,99.79500,4447,99.80000,47043,49961,,,,99.79750,3424,3,99.79236
O.M21,99.81000,48765,99.81500,7062,37456,99.81167,243,3,99.81500,234,2,99.80975
S3.U20,3.000,1132,3.500,69740,3831,,,,3.250,1380,2,3.125
S3.Z20,-9.500,58855,-9.000,27477,3295,-9.250,468,2,-9.000,3730,2,-9.188

This is the code I'm working on:

import glob
for file in glob.glob("C:/Users/Data/*"):
    print(file)
    myfile = open(file,"r")
    lines = myfile.readlines()
    for line in lines:
         print(line.strip()[0])

However, this prints the output twice, which is another issue, since I want it printed just once.

I want the output to be

O.U20
O.Z20
O.H21
O.M21
S3.U20
S3.Z20

in a dataframe, so that I can create further tables. I thought of using multiple columns; however, the O symbols have 5 characters and the S3 symbols have 6 characters.

Isn’t that a CSV file? Is the current output not what you should expect? Have you done any debugging?
– AMC, Commented Jul 10, 2020 at 4:30
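A small sketch of what the loop in the question does with one sample line may help here. Indexing a string with [0] returns its first character, while splitting on the comma first returns the whole first field. The variable names below are only illustrative.

```python
# One line taken from the sample data in the question.
line = "S3.U20,3.000,1132,3.500,69740,3831,,,,3.250,1380,2,3.125\n"

# Indexing a string returns a single character, not a field.
first_char = line.strip()[0]

# Splitting on the comma first yields the whole first field,
# however many characters the symbol has.
first_field = line.strip().split(",", 1)[0]

print(first_char)   # S
print(first_field)  # S3.U20
```

Because the split happens at the first comma, the differing symbol lengths do not matter.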

Bahram Jannesar · Accepted Answer · 2020-07-10 03:30:38Z

First convert the .txt files to .csv; after that you can read them with pandas and turn them into dataframes:

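import glob
import pandas as pd

# Copy every .txt file to a .csv file with the same base name.
for each in glob.glob('*.txt'):
    with open(each, 'r') as src:
        content = src.readlines()
    # each[0:-4] drops the '.txt' extension before '.csv' is appended.
    with open('{}.csv'.format(each[0:-4]), 'w') as dst:
        dst.writelines(content)

# Read each CSV; the files have no header row, and column 0 becomes the index.
# Note: dataframe is overwritten on every pass, so it ends up holding the last file read.
for each in glob.glob('*.csv'):
    dataframe = pd.read_csv(each, skiprows=0, header=None, index_col=0)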

Then reset the index so that the first column becomes a regular column again:

dataframe.reset_index(inplace=True)

output:

>>> print(dataframe[0])
0     O.U20
1     O.Z20
2     O.H21
3     O.M21
4    S3.U20
5    S3.Z20
Name: 0, dtype: object
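As an alternative sketch, not part of the original answer: pd.read_csv parses comma-separated text whatever the file extension is, so the first column can also be read straight from the text files. The function name read_symbols and the column name "symbol" are made up for illustration. It assumes every file matching the pattern is a comma-separated file without a header row.

```python
import glob

import pandas as pd


def read_symbols(pattern):
    """Return the unique first-column values of all files matching pattern."""
    columns = [
        # header=None: the files have no header row.
        # usecols=[0]: parse only the first field of every line.
        pd.read_csv(path, header=None, usecols=[0])[0]
        for path in sorted(glob.glob(pattern))
    ]
    if not columns:
        # No file matched: return an empty frame with the same column name.
        return pd.DataFrame(columns=["symbol"])
    # Stack the per-file columns and keep each symbol only once.
    symbols = pd.concat(columns, ignore_index=True).drop_duplicates()
    return symbols.reset_index(drop=True).to_frame(name="symbol")
```

Calling read_symbols("C:/Users/Data/*") with the folder from the question would return a single dataframe that exists outside any loop, with each symbol listed once even when it appears in every file.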

answered Jul 10, 2020 at 3:30

3 Comments

hyeri Over a year ago

Hi, thanks for the reply. Can you please tell me what '{}.csv'.format(each[0:-4]) does? I see that it has created new CSV files in the folder I'm taking the text files from. Also, the code runs fine but doesn't print anything. Is there any way I can access the dataframe outside of the loop, as I have to append multiple columns to it later? Thanks

Bahram Jannesar Over a year ago

'{}.csv'.format(each[0:-4]) builds the CSV file name: each[0:-4] removes the '.txt' extension, so the file is saved as 'file.csv' rather than 'file.txt.csv'.
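To make the slice concrete, here is a short sketch using the placeholder name 'file.txt'. It also shows os.path.splitext from the standard library, which is not used in the answer but does the same job without assuming the extension is exactly four characters long.

```python
import os

name = "file.txt"

# Slicing off the last four characters drops ".txt".
sliced = "{}.csv".format(name[0:-4])
print(sliced)  # file.csv

# Without the slice, the old extension stays in the name.
unsliced = "{}.csv".format(name)
print(unsliced)  # file.txt.csv

# os.path.splitext separates the extension whatever its length.
stem, ext = os.path.splitext(name)
print(stem + ".csv")  # file.csv
```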

hyeri Over a year ago

Oh I'm really very sorry, I made a mistake, now it's running but it's showing the output for all the files when I want it to show the list just once :/ EDIT: I solved the issue, Many thanks for the help Bahram :)
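For the follow-up about using the data outside the loop and adding columns later, one possible sketch is to keep one column per file and line them up by symbol. This is an assumption about what the tables across files might look like, not something stated in the thread. The function name column_across_files and the choice of column 2 (the third field) are purely illustrative.

```python
import glob
import os

import pandas as pd


def column_across_files(pattern, column=2):
    """Build a table with one row per symbol and one column per file."""
    per_file = {}
    for path in sorted(glob.glob(pattern)):
        # No header row; the first field (the symbol) becomes the index.
        frame = pd.read_csv(path, header=None, index_col=0)
        # Keep only the requested field, keyed by the file's base name.
        per_file[os.path.basename(path)] = frame[column]
    if not per_file:
        # No file matched the pattern.
        return pd.DataFrame()
    # Align the per-file Series on the symbol index, side by side.
    table = pd.concat(per_file, axis=1)
    table.index.name = "symbol"
    return table
```

The symbols appear once, as the row index, and each file contributes one column, so calculations across files become column operations on the returned table.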
