
Is there a way to select a subset of columns using text matching or regular expressions?

In R it would be like this:

# iris is the 'Stairway to Heaven' of R's built-in data sets; it is available without loading
iris[grep("Length", names(iris))] # Prints only columns whose names contain "Length"

2 Answers


You can use the DataFrame.filter method for this (pass axis=1 to filter on the column names). It offers several ways to match:

  • Substring match, equivalent to if 'Length' in col:

    df.filter(like='Length', axis=1)
    
  • Using a regex (note that the pattern is applied with re.search, not re.match, so you may need to anchor or adjust it):

    df.filter(regex=r'\.Length$', axis=1)
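A minimal, self-contained sketch of both calls, assuming pandas is installed. The DataFrame below is hypothetical: it uses R-style column names (as in R's iris) with made-up values, just to show which columns each option keeps.

```python
import pandas as pd

# Hypothetical iris-like frame with R-style column names (values are made up).
df = pd.DataFrame(
    {
        "Sepal.Length": [5.1, 4.9],
        "Sepal.Width": [3.5, 3.0],
        "Petal.Length": [1.4, 1.4],
        "Petal.Width": [0.2, 0.2],
        "Species": ["setosa", "setosa"],
    }
)

# Substring match: keeps every column whose name contains "Length".
by_substring = df.filter(like="Length", axis=1)

# Regex match (searched anywhere in the name): keeps names ending in ".Length".
by_regex = df.filter(regex=r"\.Length$", axis=1)

print(list(by_substring.columns))  # ['Sepal.Length', 'Petal.Length']
print(list(by_regex.columns))      # ['Sepal.Length', 'Petal.Length']
```

Both calls keep the columns in their original order; with these names the two options happen to agree, but the regex form lets you be stricter about where "Length" may appear.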
    

2 Comments

Very good info, @joris. But I also need to get column names that contain other characters along with the base name. For example, my column names are "Length_1", "Length_2", "Width_1", "Width_2", etc. My filter call is df.filter(like=col+'_', axis=1), where col takes values like "Length", "Width", etc., but it is not returning any columns. Any idea what I should correct?
You should be able to do that with a regular expression, e.g. regex=r"Length|Width".
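As a sketch of that suggestion (assuming pandas), the pattern can be built from a list of base names and anchored so that only names such as "Length_1" or "Width_2" match. The column names and the `prefixes` list are hypothetical, modeled on the example in the comment above.

```python
import re

import pandas as pd

# Hypothetical columns modeled on the comment's example.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=["Length_1", "Length_2", "Width_1", "Width_2", "Height_1"],
)

prefixes = ["Length", "Width"]  # hypothetical list of base names to keep

# Builds ^(?:Length|Width)_\d+$ ; re.escape guards against regex metacharacters
# in the base names.
pattern = r"^(?:" + "|".join(re.escape(p) for p in prefixes) + r")_\d+$"

subset = df.filter(regex=pattern, axis=1)
print(list(subset.columns))  # ['Length_1', 'Length_2', 'Width_1', 'Width_2']
```

The ^ and $ anchors keep a name like "MaxLength_1" from slipping through; drop them if partial matches are what you want.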

Using Python's in operator inside a list comprehension, it would work like this:

# Assumes iris is already loaded as a DataFrame called 'iris' with a proper header
subset = iris[[col for col in iris.columns if 'Length' in col]]
print(subset.head())

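Or, using a regular expression (with re.search rather than re.match, because re.match only matches at the start of the name):

import re
subset = iris[[col for col in iris.columns if re.search(r'\.Length$', col)]]
print(subset.head())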

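The first is simpler and typically faster; the second is more precise, because the pattern can constrain where "Length" appears (here, only at the end of the name).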
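The choice between re.match and re.search matters for a pattern like r'\.Length$': re.match only tries to match at position 0 of the string, so it finds nothing in a name such as "Sepal.Length", while re.search scans the whole string. A small standard-library check, assuming R-style column names:

```python
import re

# Assumed R-style iris column names.
columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"]
pattern = r"\.Length$"

# re.match anchors at the start of the string, so no name matches here.
with_match = [c for c in columns if re.match(pattern, c)]

# re.search looks anywhere in the string, which is what this pattern needs.
with_search = [c for c in columns if re.search(pattern, c)]

print(with_match)   # []
print(with_search)  # ['Sepal.Length', 'Petal.Length']
```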
