I want to find list of columns present in more than 500 csv files. Read only columns & put in dataframe or list and later I want to export to csv.
1 Answer
pandas can easily extract only certain columns and write out a new CSV file for you:
import pandas as pd
data = pd.read_csv('input.csv', usecols=[1, 3], index_col=1)
data.to_csv('output.csv')
If you wanted to extract the same two columns from multiple files into a single output, you could do the following:
import pandas as pd
import glob
combined_data = pd.DataFrame()
for csv_file in glob.glob('*.csv'):
data = pd.read_csv(csv_file, usecols=[1, 3], index_col=1, skiprows=1, header=None, names=['Col1', 'Col2'])
combined_data = combined_data.append(data)
combined_data.to_csv('output.csv')
This might need tweaking depending on whether or not your files have headers and the same column names.
To extract the headers from all of the files (without loading the whole file), you could simply use the CSV library as follows:
import csv
import glob
with open('headers.csv', 'w', newline='') as f_headers:
csv_headers = csv.writer(f_headers)
for csv_file in glob.glob('input*.csv'):
with open(csv_file, 'r', newline='') as f_csv:
csv_headers.writerow([csv_file] + next(csv.reader(f_csv)))
This would also put the filename as the first entry on each row.
5 Comments
question.it
thanks, but I wanted column name, this also helps for my other requirements.
Martin Evans
So in effect you are trying to create a CSV file containing all of the header rows from your 500 files?
question.it
I just want to know columns present in all the 500 csv files
Martin Evans
I have added a simple CSV reader which should help. It adds the filename as the first entry on each row.
question.it
thanks this is I wanted