62

I have a data.table with a character column and want to select only the rows whose value in that column contains a given substring, the equivalent of SQL's WHERE x LIKE '%substring%'.

E.g.

> Months = data.table(Name = month.name, Number = 1:12)
> Months["mb" %in% Name]
Empty data.table (0 rows) of 2 cols: Name,Number

How would I select only the rows where Name contains "mb"?

2 Answers

104

data.table has a like function.

> Months[like(Name,"mb")]
        Name Number
1: September      9
2:  November     11
3:  December     12

Or use %like%, which reads more nicely:

> Months[Name %like% "mb"]
        Name Number
1: September      9
2:  November     11
3:  December     12

Note that %like% and like() use grepl (which returns a logical vector) rather than grep (which returns integer locations), so the result can be combined with other logical conditions:

> Months[Number<12 & Name %like% "mb"]
        Name Number
1: September      9
2:  November     11

and you get the full power of regular expression search (not just the % or * wildcards), too.
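
As a small sketch of what regular expressions add here (the two patterns below are illustrative choices, not taken from the answer itself), anchors and alternation behave as they do in grepl:

```r
library(data.table)

Months <- data.table(Name = month.name, Number = 1:12)

# "^J" anchors the match at the start of the string,
# so only names beginning with "J" are kept
starts_with_j <- Months[Name %like% "^J"]

# "mb|ua" matches either substring, which is one way
# to search for several strings in a single pattern
mb_or_ua <- Months[Name %like% "mb|ua"]

print(starts_with_j$Name)
print(mb_or_ua$Name)
```

Because the pattern is a regular expression, characters such as . or ( have special meaning and need escaping if they are meant literally.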

6 Comments

Is there a way to use this command to update the table without <- ? I was thinking of something like Months[ Name== like(Name, "mb") ,]
@RafaelPereira Have you looked at ?data.table (examples), read the documentation and taken the DataCamp course? Months[like(Name,"mb"), someCol:=someValue]
Thank you for the suggestions @Matt-Dowle. Maybe I wasn't clear enough. I meant to ask you this.
@RafaelPereira I see, yes. As Frank has now pointed to there, I'd like this feature too: stackoverflow.com/a/10791729/403310
Can you use this to search for multiple strings?
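
The `:=` suggestion in the comments above can be sketched as follows. This is only an illustration: HasMb is a hypothetical column name standing in for someCol, and TRUE stands in for someValue.

```r
library(data.table)

Months <- data.table(Name = month.name, Number = 1:12)

# HasMb is a hypothetical column used only for this illustration.
# := adds (or modifies) the column in place for the matching rows,
# so no assignment with <- is needed.
Months[Name %like% "mb", HasMb := TRUE]

# Rows that did not match are left as NA in the new column
print(Months)
```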
10

The operator %in% does not do partial string matching; it tests whether values exist in another set of values, e.g. "a" %in% c("a","b","c").
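
A minimal sketch of why the attempt in the question returns no rows, reusing the Months table from the question (the variable names exact_mb and exact_may are only for illustration):

```r
library(data.table)

Months <- data.table(Name = month.name, Number = 1:12)

# %in% compares whole values: is the string "mb" one of the month names?
exact_mb  <- "mb" %in% Months$Name    # no month is called exactly "mb"
exact_may <- "May" %in% Months$Name   # "May" is a complete value in Name

print(exact_mb)
print(exact_may)
```

Since "mb" %in% Name evaluates to a single FALSE rather than one value per row, the subset in the question selects nothing.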

To do partial string matching you can use the grep() function. grep() returns the indices of the elements of Name that contain "mb"; you then subset the rows by that index:

Months[grep("mb", Name)]    # data.table syntax slightly easier
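
For comparison, a short sketch of what grep() returns next to grepl(), and of keeping the matching rows by assigning the subset to a name (idx, lgl and mb_rows are illustrative names, not from the answer):

```r
library(data.table)

Months <- data.table(Name = month.name, Number = 1:12)

idx <- grep("mb", Months$Name)    # integer positions of the matching elements
lgl <- grepl("mb", Months$Name)   # one TRUE/FALSE per element

# Both forms select the same rows when used as the row index
by_index   <- Months[idx]
by_logical <- Months[lgl]

# Assigning the subset to a name keeps the matching rows for later use
mb_rows <- Months[grep("mb", Name)]
print(mb_rows)
```

Inside the square brackets of a data.table, Name is looked up as a column of Months, which is why the shorter Months[grep("mb", Name)] form works without Months$.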

4 Comments

Great, thanks. In fact Months[grep("mb", Name)] seems to work.
That should only work if you've defined Name as a separate vector somewhere else in your workspace. Be careful which variables you are using
Isn't this just working because this is a data.table vs data.frame?
Well then, how can we save the actual rows?