Slicing Pandas DataFrame by column label using list of strings

Question

I have two files. One contains the metadata/labels; the other contains the actual count data, with labels that correspond to the metadata file. I went through the metadata file, sliced out the labels I wanted using Pandas, and exported them to a list.

How can I take that list of labels and use that to slice a Pandas DataFrame by column label?

I've done something similar with row labels, but that was using Pandas .isin() function, which can't be used on columns.

Edit: When I slice out rows based on whether the row name is found in a list, I use a one-liner similar to this:

row_list = ['row_name1', 'row_name2', 'row_name3']
sliced_rows = df[df['row_names'].isin(row_list)]

df = 
row_names   1   2   3   4
row_name1   0   2   0   6
row_name5   0   0   1   0
row_name2   0   0   0   0
row_name17  0   5   6   5

So here I'd get row_name1 and row_name2.

I'm trying to do the same thing, but with the names labelling the columns instead of the rows.

So the matrix would look something like this.

label   column_name1    column_name2    column_name3    column_name4
1   0   2   0   6
2   0   0   1   0
3   0   0   0   0
4   0   5   6   5

And I'd select by column based on whether or not the name of that column was in a list for the entire dataframe.

Sorry, can you post an actual example? Are you asking how to construct a column list so you can select just those columns from a df? Something like col_list = [col for col in df if col in other_col_list]?
– EdChum, Commented Apr 8, 2015 at 19:45

EdChum · Accepted Answer · 2015-04-08 20:36:56Z

Actually you can use isin:

In [34]:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df
Out[34]:
          A         B         C         D
0  0.540783  0.206722  0.627336  0.865066
1  0.204596  1.317936  0.624362 -0.573012
2  0.124457  1.052614 -0.152633 -0.021625
3  0.415278  1.469842  0.581196  0.143085
4  0.043743 -1.191018 -0.202574  0.479122
In [37]:

col_list=['A','D']
df[df.columns[df.columns.isin(col_list)]]
Out[37]:
          A         D
0  0.540783  0.865066
1  0.204596 -0.573012
2  0.124457 -0.021625
3  0.415278  0.143085
4  0.043743  0.479122

So what you can do is call isin on the columns and pass your list; this produces a boolean array:

In [38]:

df.columns.isin(col_list)
Out[38]:
array([ True, False, False,  True], dtype=bool)
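As a small sketch of what that mask is: it is a NumPy boolean array with one entry per column, in column order. The extra label "Z" and the use of `~` to invert the mask are illustrative additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Toy frame; the values do not matter here, only the column labels.
df = pd.DataFrame(np.zeros((2, 4)), columns=list("ABCD"))

# "Z" is a hypothetical label that is not a column; isin just never matches it.
col_list = ["A", "D", "Z"]

mask = df.columns.isin(col_list)
print(type(mask).__name__, mask.tolist())  # ndarray [True, False, False, True]

# Inverting the mask gives the columns that are NOT in the list.
print(df.columns[~mask].tolist())  # ['B', 'C']
```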

You then use the boolean mask to mask your columns:

In [39]:

df.columns[df.columns.isin(col_list)]
Out[39]:
Index(['A', 'D'], dtype='object')

You now have an Index of column labels that you can use to subset the df.
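For the matrix in the question, the same idea might look like the sketch below. The `wanted` list and the label "column_name17" are made up for illustration, and passing the mask to `.loc` is an equivalent shortcut rather than the exact code shown above. It also compares against selecting with the list directly, which in recent pandas versions raises a KeyError when a label is missing.

```python
import pandas as pd

# The example matrix from the question, with "label" as the row index.
df = pd.DataFrame(
    {
        "column_name1": [0, 0, 0, 0],
        "column_name2": [2, 0, 0, 5],
        "column_name3": [0, 1, 0, 6],
        "column_name4": [6, 0, 0, 5],
    },
    index=pd.Index([1, 2, 3, 4], name="label"),
)

# Hypothetical list exported from the metadata file; one label has no column.
wanted = ["column_name1", "column_name2", "column_name17"]

# Boolean mask over the columns; labels that do not exist are ignored.
sliced = df.loc[:, df.columns.isin(wanted)]
print(sliced.columns.tolist())  # ['column_name1', 'column_name2']

# Selecting with the list directly fails if any label is missing.
try:
    df[wanted]
    direct_ok = True
except KeyError:
    direct_ok = False
print(direct_ok)  # False
```

If every label in the list is known to exist, `df[wanted]` is the shorter option and keeps the order of the list; the mask keeps the order of the DataFrame's columns.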

answered Apr 8, 2015 at 20:36

EdChum
1 Comment

Bulworth Over a year ago

Excellent, thank you! I had read somewhere that it was only for rows and accepted it without even trying.