Select column from multiple DataFrames based on same header prefix

Question

I have a function that iterates over the rows of a csv for the Age column and if an age is negative, it will print the Key and the Age value to a text file.

import pandas as pd

def neg_check():
    results = []

    file_path = input('Enter file path: ')
    file_data = pd.read_csv(file_path, encoding='utf-8')

    # Collect (Key, Age) for every row with a negative age
    for index, row in file_data.iterrows():
        if row['Age'] < 0:
            results.append((row['Key'], row['Age']))
    # The with block closes the file on exit, so no explicit close() is needed
    with open('results.txt', 'w') as outfile:
        outfile.write("\n".join(map(str, results)))

To make this code reusable, how can I modify it so that it checks the rows of whichever column starts with "Age"? My files have many columns that start with "Age" but end differently. I tried the following:

if row.startswith['Age'] < 0:

and

if row[row.startswith('Age')] < 0:

but both throw AttributeError: 'Series' object has no attribute 'startswith'.

My csv files:

sample 1

Key   Sex      Age
1     Male     46
2     Female   34

sample 2

Key   Sex      AgeLast
1     Male     46
2     Female   34

sample 3

Key   Sex      AgeFirst
1     Male     46
2     Female   34

cs95 · Accepted Answer · 2018-12-06 19:07:26Z


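I would do this in one step, and there are a few options. The snippets below use 'AgeAt' as the prefix; substitute whatever prefix your columns share (for the samples in the question, 'Age'). One option is filter, keeping in mind that like= matches the substring anywhere in the column name, not only at the start: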

v = df[df.filter(like='AgeAt').iloc[:, 0] < 0]

Or,

c = df.columns[df.columns.str.startswith('AgeAt')][0]
v = df[df[c] < 0]

Finally, to write to CSV, use

if not v.empty:
    v.to_csv('invalid.csv')

Looping over your data is not necessary with pandas.
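As a rough sketch of how the pieces above could be combined for the sample files in the question, the helper below selects every column whose name starts with a prefix and keeps the rows where any of them is negative. The function name negative_age_rows, the default prefix "Age", and the in-memory sample frame are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd

def negative_age_rows(df, prefix="Age"):
    """Return Key plus the prefix-matched columns for rows with a negative value.

    Hypothetical helper: assumes a 'Key' column exists and that the
    matched columns are numeric.
    """
    # Boolean mask over the column names, then the matching names themselves
    age_cols = df.columns[df.columns.str.startswith(prefix)]
    if len(age_cols) == 0:
        # No matching column: return an empty frame with the same columns
        return df.iloc[0:0]
    # True for each row where at least one matched column is below zero
    mask = (df[age_cols] < 0).any(axis=1)
    return df.loc[mask, ["Key", *age_cols]]

# Made-up data shaped like "sample 2", with one negative age added
sample = pd.DataFrame({
    "Key": [1, 2, 3],
    "Sex": ["Male", "Female", "Male"],
    "AgeLast": [46, -34, 20],
})
bad = negative_age_rows(sample)
```

Using any(axis=1) rather than indexing with [0] means the same call also works if a file happens to contain more than one column with the prefix.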

edited Dec 6, 2018 at 19:07

answered Dec 5, 2018 at 20:25

cs95


3 Comments

n8-da-gr8 Over a year ago

Works great, thanks @coldspeed! Can you explain what the [0] means in the first line or direct me to the documentation please?

cs95 Over a year ago

@n8_ df.columns[df.columns.str.startswith('AgeAt')] returns a list of (in your case) one column name. From this list, I extract the first element with [0].

n8-da-gr8 Over a year ago

After some testing, this creates the file even if no negative ages were found (just writes headers). How can I modify to only write to file if negative ages are found? Using an if-statement returns: ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
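The ValueError in the last comment comes from using the DataFrame itself as the condition of an if-statement. A minimal sketch of the difference, assuming a small made-up frame with no negative ages and a hypothetical helper write_if_any:

```python
import pandas as pd

def write_if_any(frame, path):
    """Write frame to path only when it has rows; return whether it was written.

    Hypothetical helper for illustration.
    """
    if frame.empty:
        return False
    frame.to_csv(path, index=False)
    return True

df = pd.DataFrame({"Key": [1, 2], "Sex": ["Male", "Female"], "Age": [46, 34]})
v = df[df["Age"] < 0]  # no negative ages, so v has zero rows

# Evaluating a DataFrame as a boolean (as `if v:` does) raises ValueError
try:
    bool(v)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Checking .empty gives a plain bool, so nothing is written here
wrote_file = write_if_any(v, "invalid.csv")
```

With this check, a header-only file is no longer created when no rows match.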
