I want to find the list of columns present in more than 500 CSV files. I want to read only the column names, put them in a DataFrame or list, and later export them to CSV.

1 Answer

pandas can easily extract only certain columns and write out a new CSV file for you:

import pandas as pd

data = pd.read_csv('input.csv', usecols=[1, 3], index_col=1)
data.to_csv('output.csv')

If you wanted to extract the same two columns from multiple files into a single output, you could do the following:

import pandas as pd
import glob

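frames = []

for csv_file in glob.glob('*.csv'):
    # Skip each file's header row and assign fixed names so every file lines up
    data = pd.read_csv(csv_file, usecols=[1, 3], index_col=1, skiprows=1, header=None, names=['Col1', 'Col2'])
    frames.append(data)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
combined_data = pd.concat(frames)
combined_data.to_csv('output.csv')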

This might need tweaking depending on whether or not your files have headers and the same column names.
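
As one possible tweak, here is a minimal sketch for the case where every file does have a header row and you would rather select columns by name than by position. The function name `combine_named_columns` and the extra `source_file` column are my own illustrative choices, not something from your files:

```python
import pandas as pd


def combine_named_columns(paths, columns):
    """Read only `columns` from each CSV in `paths` and stack the results."""
    frames = []
    for path in paths:
        # usecols also accepts column labels when the file has a header row
        frame = pd.read_csv(path, usecols=columns)
        # Illustrative extra column so each row can be traced back to its file
        frame['source_file'] = path
        frames.append(frame)
    # ignore_index gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

You could call it as `combine_named_columns(glob.glob('*.csv'), ['id', 'age'])`, with your own column names in place of `id` and `age`. Note that `read_csv` raises a `ValueError` if a file lacks one of the requested columns, so this only suits files that share those names.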


To extract the headers from all of the files (without loading the whole file), you could use Python's built-in csv module as follows:

import csv
import glob

with open('headers.csv', 'w', newline='') as f_headers:
    csv_headers = csv.writer(f_headers)

    for csv_file in glob.glob('input*.csv'):
        with open(csv_file, 'r', newline='') as f_csv:
            csv_headers.writerow([csv_file] + next(csv.reader(f_csv)))

This would also put the filename as the first entry on each row.
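
If you also want to know which columns appear in every file, the same header-reading idea can be extended. The following is only a rough sketch: the helper names (`read_header`, `summarise_headers`, `write_column_counts`) and the output column titles are hypothetical, and it assumes the first row of each file is the header:

```python
import csv
from collections import Counter


def read_header(path):
    """Return the first row of a CSV file without reading the rest."""
    with open(path, 'r', newline='') as f_csv:
        return next(csv.reader(f_csv), [])


def summarise_headers(paths):
    """Collect each file's header, per-column file counts, and the columns common to all files."""
    headers = {path: read_header(path) for path in paths}
    # set(header) ensures a column repeated within one file is counted once
    counts = Counter(col for header in headers.values() for col in set(header))
    common = sorted(col for col, n in counts.items() if n == len(headers))
    return headers, counts, common


def write_column_counts(counts, out_path):
    """Export the column names and the number of files containing each one."""
    with open(out_path, 'w', newline='') as f_out:
        writer = csv.writer(f_out)
        writer.writerow(['column', 'file_count'])
        for col, n in sorted(counts.items()):
            writer.writerow([col, n])
```

For example, `headers, counts, common = summarise_headers(glob.glob('input*.csv'))` (after `import glob`) gives you a dict of headers per file, a count per column, and the sorted list of columns found in every file; `write_column_counts(counts, 'column_counts.csv')` then exports the counts.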

5 Comments

Thanks, but I wanted the column names. This also helps with my other requirements.
So in effect you are trying to create a CSV file containing all of the header rows from your 500 files?
I just want to know the columns present in all 500 CSV files.
I have added a simple CSV reader which should help. It adds the filename as the first entry on each row.
Thanks, this is what I wanted.