I have thousands of files in a folder, and I'm reading them with glob. I want to extract the first column of each file (the text files have no header row) and store it in a DataFrame, because I need to build tables from calculations across multiple files. Here is sample data from two files:

File 1:

O.U20,99.73000,75538,99.73500,51794,57821,99.73167,1062,4,,,,99.73173
O.Z20,99.70000,58974,99.70500,6748,35815,99.70250,468,3,99.70500,1132,2,99.70048
O.H21,99.79500,4274,99.80000,47043,49961,,,,99.79750,3424,3,99.79236
O.M21,99.81000,48584,99.81500,7062,37456,99.81167,243,3,99.81500,234,2,99.80975
S3.U20,3.000,1132,3.500,69740,3831,,,,3.250,1380,2,3.125
S3.Z20,-9.500,58855,-9.000,27304,3295,-9.250,468,2,-9.000,3730,2,-9.188

File 2:

O.U20,99.73000,75711,99.73500,51794,57821,99.73167,1062,4,,,,99.73173
O.Z20,99.70000,59142,99.70500,6748,35815,99.70250,468,3,99.70500,1132,2,99.70048
O.H21,99.79500,4447,99.80000,47043,49961,,,,99.79750,3424,3,99.79236
O.M21,99.81000,48765,99.81500,7062,37456,99.81167,243,3,99.81500,234,2,99.80975
S3.U20,3.000,1132,3.500,69740,3831,,,,3.250,1380,2,3.125
S3.Z20,-9.500,58855,-9.000,27477,3295,-9.250,468,2,-9.000,3730,2,-9.188

This is the code I'm working on:

import glob
for file in glob.glob("C:/Users/Data/*"):
    print(file)
    myfile = open(file,"r")
    lines = myfile.readlines()
    for line in lines:
         print(line.strip()[0])

However, this prints the following, and prints it twice (a separate issue, as I want the output only once):

    O
    O
    O
    O
    S
    S

I want the output to be

O.U20
O.Z20
O.H21
O.M21
S3.U20
S3.Z20

in a DataFrame, so that I can build further tables. I thought of using multiple columns, but the symbols differ in length: O.U20 has 5 characters and S3.U20 has 6.

  • Isn’t that a CSV file? Is the current output not what you should expect? Have you done any debugging? Commented Jul 10, 2020 at 4:30
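
To illustrate the point behind this comment: the loop prints a single character because indexing a string with [0] returns its first character, not its first comma-separated field. A minimal sketch, using one row from the sample data (the variable names are only illustrative):

```python
# One row copied from the sample data in the question.
line = "S3.U20,3.000,1132,3.500,69740,3831,,,,3.250,1380,2,3.125\n"

# Indexing a string gives a single character.
first_char = line.strip()[0]

# Splitting on the delimiter first gives the whole first field.
first_field = line.strip().split(",")[0]

print(first_char)   # S
print(first_field)  # S3.U20
```

Splitting on "," works here because no field in the sample contains a quoted comma; for files where that can happen, the csv module or pandas.read_csv is the safer choice.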

1 Answer

First, copy each .txt file to a .csv file (pandas.read_csv does not actually require the .csv extension, so this step is optional); then read the files with pandas into a DataFrame:

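import glob
import pandas as pd

# Copy each .txt file to a .csv file with the same base name.
for each in glob.glob('*.txt'):
    with open(each, 'r') as src:
        content = src.readlines()
    with open('{}.csv'.format(each[0:-4]), 'w') as dst:
        dst.writelines(content)

# Note: dataframe is reassigned on every pass, so after the loop
# it holds only the last file that was read.
for each in glob.glob('*.csv'):
    dataframe = pd.read_csv(each, skiprows=0, header=None, index_col=0)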

then:

dataframe.reset_index(inplace=True)

output:

>>> print(dataframe[0])
0     O.U20
1     O.Z20
2     O.H21
3     O.M21
4    S3.U20
5    S3.Z20
Name: 0, dtype: object
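
As a possible alternative, here is a minimal sketch that skips the copy step, reads only column 0 of every file, and keeps the combined result available after the loop. The function name read_first_columns, the column names "symbol" and "source", and the temporary demo files are assumptions made for illustration; the rows are taken from the sample data in the question.

```python
import glob
import os
import tempfile

import pandas as pd


def read_first_columns(pattern):
    """Return one DataFrame holding column 0 of every file matching pattern."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        # header=None: the files have no header row.
        # usecols=[0]: parse only the first column.
        frame = pd.read_csv(path, header=None, usecols=[0])
        frame.columns = ["symbol"]
        # Remember which file each row came from.
        frame["source"] = os.path.basename(path)
        frames.append(frame)
    if not frames:
        return pd.DataFrame(columns=["symbol", "source"])
    return pd.concat(frames, ignore_index=True)


# Demo data: three rows from each sample file in the question.
rows_1 = [
    "O.U20,99.73000,75538,99.73500,51794,57821,99.73167,1062,4,,,,99.73173",
    "S3.U20,3.000,1132,3.500,69740,3831,,,,3.250,1380,2,3.125",
    "S3.Z20,-9.500,58855,-9.000,27304,3295,-9.250,468,2,-9.000,3730,2,-9.188",
]
rows_2 = [
    "O.U20,99.73000,75711,99.73500,51794,57821,99.73167,1062,4,,,,99.73173",
    "S3.U20,3.000,1132,3.500,69740,3831,,,,3.250,1380,2,3.125",
    "S3.Z20,-9.500,58855,-9.000,27477,3295,-9.250,468,2,-9.000,3730,2,-9.188",
]

with tempfile.TemporaryDirectory() as folder:
    for name, rows in (("file1.txt", rows_1), ("file2.txt", rows_2)):
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("\n".join(rows) + "\n")
    combined = read_first_columns(os.path.join(folder, "*.txt"))

print(combined)
```

Collecting the per-file frames in a list and calling pd.concat once is generally cheaper than growing a DataFrame inside the loop, which matters with thousands of files.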

3 Comments

Hi, thanks for the reply. Can you please tell me what '{}.csv'.format(each[0:-4]) does? I see that it has created new CSV files in the folder I'm reading the text files from. Also, the code runs fine but doesn't print anything. Is there any way to access the dataframe outside of the loop? I have to append multiple columns to it later. Thanks
'{}.csv'.format(each[0:-4]) builds the CSV file name: it strips the '.txt' extension from the original name, so the file is saved as 'file.csv' rather than 'file.txt.csv'.
Oh, I'm really sorry, I made a mistake. Now it's running, but it shows the output for all the files when I want it to show the list just once :/ EDIT: I solved the issue. Many thanks for the help, Bahram :)
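
The slice in that expression can be tried on its own. A small sketch, with made-up file names and a hypothetical helper csv_name, showing that each[0:-4] drops the last four characters and that os.path.splitext is a variant that does not assume a four-character extension:

```python
import os

name = "file1.txt"

# name[0:-4] drops the last four characters, here ".txt".
print(name[0:-4])                   # file1
print("{}.csv".format(name[0:-4]))  # file1.csv

# The slice assumes a four-character extension; a longer one is cut badly.
print("quotes.text"[0:-4])          # quotes.


def csv_name(path):
    """Swap whatever extension path has for .csv (hypothetical helper)."""
    stem, _extension = os.path.splitext(path)
    return stem + ".csv"


print(csv_name("quotes.text"))      # quotes.csv
```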
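
The thread does not say how this was solved. One possible approach, sketched here under that caveat, is to collect the first field of every row and keep each symbol only the first time it appears. The helper name unique_first_fields is hypothetical, and the in-memory "files" hold shortened rows from the sample data.

```python
import csv
import io


def unique_first_fields(sources):
    """Return the distinct first fields across all sources, in first-seen order."""
    seen = {}
    for source in sources:
        for row in csv.reader(source):
            if row:  # skip blank lines
                # dicts preserve insertion order, so this keeps the first occurrence.
                seen.setdefault(row[0], None)
    return list(seen)


# Stand-ins for two open files; rows are truncated for brevity.
file1 = io.StringIO("O.U20,99.73000,75538\nS3.U20,3.000,1132\n")
file2 = io.StringIO("O.U20,99.73000,75711\nS3.Z20,-9.500,58855\n")

symbols = unique_first_fields([file1, file2])
print(symbols)  # ['O.U20', 'S3.U20', 'S3.Z20']
```

With pandas, calling drop_duplicates() on the combined first column should give a comparable result.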
