I have a CSV whose headers look something like this:

name, id, avoid1, avoid2, avoid3, avoidN, choice1, choice2, choice3, choiceN

The number of choice and avoid columns is not known in advance. There may be only choice1, or the columns may go up to choice100. The same is true for avoid. They will always be labeled choiceN and avoidN.

I want to know how to determine how many 'choice' columns and how many 'avoid' columns there are. The two counts will probably differ: just because choiceN goes up to choice5 doesn't mean avoidN goes up to avoid5; it could stop at avoid2 or go up to avoid20. The closest I got was counting all the columns using:

print(df.count(axis='columns'))

but that just tells me how many total columns there are, which only gets me a third of the way there.

1 Answer

Use str.extract on the column names, drop the non-matching values with Index.dropna, and finally count what is left with Index.value_counts:

print(df.columns.str.extract('(avoid|choice)', expand=False).dropna().value_counts())
choice    4
avoid     4
dtype: int64
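
If you need the two counts as plain integers, something like the following sketch may help. The sample column list is made up to mirror the header in the question, and the stricter pattern (anchored, with digits required and surrounding spaces tolerated) is my own assumption about what the labels look like, so adjust it to your real file:

```python
import pandas as pd

# Hypothetical frame with a header shaped like the one in the question
cols = ["name", "id", "avoid1", "avoid2", "avoid3", "choice1", "choice2"]
df = pd.DataFrame(columns=cols)

# Keep only names of the form avoid<digits> or choice<digits>;
# every other column name becomes NaN and is dropped
prefixes = df.columns.str.extract(r"^\s*(avoid|choice)\d+\s*$", expand=False)
counts = prefixes.dropna().value_counts()

# .get with a default avoids a KeyError when one group is absent
n_avoid = int(counts.get("avoid", 0))
n_choice = int(counts.get("choice", 0))
print(n_avoid, n_choice)  # 3 2
```

Anchoring the pattern means a column such as avoid_notes would not be counted, which the looser '(avoid|choice)' pattern would include.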