
I have a list of 200 or so files in a folder. Each has the same number of columns, but there can be some variation in the naming. For instance, I can have Global ID or Global id or Global Id. Is there a way to control for case in pandas column names so that it doesn't matter what it equals? Currently it gets through the first 15 or so files out of 200 and then errors because it doesn't find Global ID.

Caveat that I'm a beginner and still learning.

import pandas as pd
import glob

with open('test99.txt', 'a') as out:
    # Raw string so the backslashes in the Windows path are not treated as escape sequences
    list_of_files = glob.glob(r'M:\AD HOC Docs\Client\Blinded\*')
    for file_name in list_of_files:
        df = pd.read_table(file_name, low_memory=False)
        # Take the third underscore-separated piece of the file path as the client name
        df['Client'] = file_name.split("_")[2].strip()
        Final = df[['Client', 'ClientID', 'Global ID', 'Internal ID', 'campaign type', 'engagement type', 'file_name']]
        Final.to_csv(out, index=False)
  • Have you tried looping through and renaming the columns? Commented Oct 11, 2016 at 19:30
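
A minimal sketch of what that comment suggests, assuming the only variation in the headers is case and stray whitespace (the lower-cased names below mirror columns from the question):

import pandas as pd

df = pd.read_table(file_name, low_memory=False)  # file_name comes from the loop in the question
# Normalise the headers so 'Global ID', 'Global id' and 'Global Id' all become 'global id'
df.columns = df.columns.str.strip().str.lower()
# Refer to the lower-cased names from here on
Final = df[['clientid', 'global id', 'internal id']]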

1 Answer


Use header=None, names=[list of column names you want to use] as additional arguments to read_table to ignore the header row and to get consistent names.
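
A minimal sketch of that call, assuming the columns appear in the same order in every file (names= assigns names by position) and adding skiprows=1 so the file's own header row is not read in as a data row; the column list is taken from the question:

import pandas as pd

# The names every file should end up with, regardless of how its own header is cased
cols = ['ClientID', 'Global ID', 'Internal ID', 'campaign type', 'engagement type', 'file_name']

df = pd.read_table(file_name,          # file_name comes from the loop in the question
                   low_memory=False,
                   header=None,        # do not take column names from the file
                   names=cols,         # use these names instead
                   skiprows=1)         # skip the file's own header row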


2 Comments

OK, I can try that. What would happen if, for example, I have 1 file with an extra column by mistake?
Pass the same list to the usecols argument as well.
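
Following that suggestion, the only change to the call sketched above is one extra argument (same assumptions as before); how well this copes with a mis-ordered file still depends on the data, since with header=None the names are assigned by position:

df = pd.read_table(file_name,
                   low_memory=False,
                   header=None,
                   names=cols,
                   usecols=cols,       # restrict the result to the columns named in cols
                   skiprows=1)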
