I have two files. One contains the metadata/labels; the other contains the actual count data, with labels that correspond to the metadata file. I went through the metadata file, sliced out the labels I wanted using Pandas, and exported them into a list.

How can I take that list of labels and use that to slice a Pandas DataFrame by column label?

I've done something similar with row labels, but that used the Pandas .isin() method, which as far as I could tell can't be used on columns.

Edit: When I'm slicing out rows based on whether the name of the row is found in a list, I use a one-liner similar to this:

row_list = ['row_name1', 'row_name2', 'row_name3']
sliced_rows = df[df['row_names'].isin(row_list)]

df = 
row_names   1   2   3   4
row_name1   0   2   0   6
row_name5   0   0   1   0
row_name2   0   0   0   0
row_name17  0   5   6   5

So here I'd get row_name1 and row_name2.

I'm trying to do the same thing, but with the names labelling the columns instead of the rows.

So the matrix would look something like this.

label   column_name1    column_name2    column_name3    column_name4
1   0   2   0   6
2   0   0   1   0
3   0   0   0   0
4   0   5   6   5

I'd then select columns across the entire DataFrame based on whether or not each column's name is in a list.

  • Sorry, can you post an actual example? Are you asking how to construct a column list so you can select just those columns from a df? Something like col_list = [col for col in df if col in other_col_list]? Commented Apr 8, 2015 at 19:45

1 Answer


Actually you can use isin:

In [34]:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df
Out[34]:
          A         B         C         D
0  0.540783  0.206722  0.627336  0.865066
1  0.204596  1.317936  0.624362 -0.573012
2  0.124457  1.052614 -0.152633 -0.021625
3  0.415278  1.469842  0.581196  0.143085
4  0.043743 -1.191018 -0.202574  0.479122
In [37]:

col_list=['A','D']
df[df.columns[df.columns.isin(col_list)]]
Out[37]:
          A         D
0  0.540783  0.865066
1  0.204596 -0.573012
2  0.124457 -0.021625
3  0.415278  0.143085
4  0.043743  0.479122

What this does is call isin on the column index and pass it your list, which produces a boolean array:

In [38]:

df.columns.isin(col_list)
Out[38]:
array([ True, False, False,  True], dtype=bool)

You then use that boolean mask to index the columns:

In [39]:

df.columns[df.columns.isin(col_list)]
Out[39]:
Index(['A', 'D'], dtype='object')

You now have an Index of column labels that you can use to subset the df.
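As a rough sketch of the same idea applied to a frame shaped like the one in the question, the boolean mask can also be passed straight to .loc, and Index.intersection gives the matching labels directly. The data and the name `wanted` below are made up for illustration, and 'column_name9' is deliberately absent from the frame to show that labels missing from the columns are simply skipped:

```python
import pandas as pd

# Hypothetical count matrix, laid out like the example in the question
df = pd.DataFrame(
    {
        "column_name1": [0, 0, 0, 0],
        "column_name2": [2, 0, 0, 5],
        "column_name3": [0, 1, 0, 6],
        "column_name4": [6, 0, 0, 5],
    },
    index=pd.Index([1, 2, 3, 4], name="label"),
)

# Hypothetical label list; 'column_name9' does not exist in df
wanted = ["column_name1", "column_name3", "column_name9"]

# Option 1: boolean mask over the columns, applied with .loc
mask = df.columns.isin(wanted)
by_mask = df.loc[:, mask]

# Option 2: keep only the labels present in both the columns and the list
by_intersection = df[df.columns.intersection(wanted)]

print(by_mask)
```

If every label in the list is known to exist, df[wanted] is enough on its own; in recent pandas versions it raises a KeyError when a label is missing, which is where the mask or the intersection is more forgiving.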


1 Comment

Excellent, thank you! I had read somewhere that it was only for rows and accepted it without even trying.
