Count how many named columns in CSV [python pandas]

Question

I have a CSV whose headers look something like this:

name, id, avoid1, avoid2, avoid3, avoidN, choice1, choice2, choice3, choiceN

The number of choice and avoid columns is not known in advance. There might be only a single choice1, or they might go all the way up to choice100. The same goes for avoid. They will always be labeled choiceN and avoidN.

I want to know how to determine how many 'choice' columns there are and how many 'avoid' columns there are. The two counts will probably differ: just because choiceN goes up to choice5 doesn't mean avoidN goes up to avoid5; it could stop at avoid2 or go up to avoid20. The closest I got was counting all the columns using:

print(df.count(axis='columns'))

but that just tells me how many total columns there are, which only brings me a third way there.

jezrael · Accepted Answer · 2020-07-20 06:53:52Z


Use str.extract on the column names, drop the non-matching values with Index.dropna, and finally count the remaining prefixes with Index.value_counts:

print(df.columns.str.extract('(avoid|choice)', expand=False).dropna().value_counts())
choice    4
avoid     4
dtype: int64
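For a version you can run end to end, here is a minimal sketch of the same approach. The sample CSV text, the variable names, and the stricter pattern (prefix followed by digits only) are assumptions made for illustration; they are not taken from the question's real file. The column names are stripped first because headers written as "name, id, ..." keep the space after each comma.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the real file (an assumption).
csv_text = (
    "name, id, avoid1, avoid2, choice1, choice2, choice3\n"
    "Ann, 1, a, b, x, y, z\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Headers such as " avoid1" keep the space after the comma, so strip them first.
cols = df.columns.str.strip()

# Capture only the prefix of headers shaped like avoid<digits> or choice<digits>.
# Headers that do not match become NaN and are removed by dropna().
prefixes = cols.str.extract(r'^(avoid|choice)\d+$', expand=False).dropna()
counts = prefixes.value_counts()

# Pull each count out as a plain integer, defaulting to 0 if a prefix is absent.
n_avoid = int(counts.get('avoid', 0))
n_choice = int(counts.get('choice', 0))
print(n_avoid, n_choice)
```

Anchoring the pattern with ^ and $ is a design choice, not a requirement: it keeps a column such as choice_notes from being counted, whereas the unanchored '(avoid|choice)' pattern would count it.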
