Issue while subsetting a data frame based on a column value in R

Question

I am having trouble subsetting a data frame in R. The data frame is att2, and I want to subset it on its filter_name column. The unique values of this column are shown below.

unique(att2[["filter_name"]])
# [1] title             Type        Operating_System         Occasion           Brand
148 Levels: Accessories Age Antennae Art_Style Aspect_ratio ... Zoom

This shows that Brand is a value in the filter_name column. But when I subset the data frame with the code below, I get 0 rows:

att3 <- subset(att2, filter_name == 'Brand')
> att3
[1] a      b         c  filter_name
<0 rows> (or 0-length row.names)

I am not able to find out the reason. Has anyone faced this kind of issue?

Could you paste a reproducible example? To answer the question you've asked: no, I have not faced this issue.
– Arun, Commented Feb 5, 2013 at 6:33
Please refer to the example in my post above. I think there might be a data issue, but I am not sure. I am still checking it on my end as well.
– Kunal Batra, Commented Feb 5, 2013 at 6:43

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2013-02-05 07:44:15Z

2

All that we can do is guess at what the source of your problem might be.

Here's my best guess: the values in your "filter_name" column have leading or trailing whitespace, so a comparison against "Brand" will not match until you strip that whitespace.

Here's a minimal example that reproduces your problem if my guess is correct:

First, some sample data:

mydf <- data.frame(Param =  c("   Brand   ", "Operating System", 
                              "Type ", "   Brand   ", "Type ", 
                              "Type ", "   Brand   ", "Type ", 
                              "   Brand   "), Value = 1:9)
unique(mydf[["Param"]])
# [1]    Brand         Operating System Type            
# Levels:    Brand    Operating System Type 

subset(mydf, Param == "Brand")
# [1] Param Value
# <0 rows> (or 0-length row.names)

Use print with the quote = TRUE argument to see the whitespace in your data.frame:

print(mydf, quote = TRUE)
#                Param Value
# 1      "   Brand   "   "1"
# 2 "Operating System"   "2"
# 3            "Type "   "3"
# 4      "   Brand   "   "4"
# 5            "Type "   "5"
# 6            "Type "   "6"
# 7      "   Brand   "   "7"
# 8            "Type "   "8"
# 9      "   Brand   "   "9"
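If the data frame is too large to inspect by eye, the same check can be done programmatically. The following is only a minimal sketch on made-up data: the three-row mydf is a hypothetical, cut-down version of the sample above, and trimws() assumes R 3.2.0 or later.

```r
# Hypothetical three-row sample; stringsAsFactors = TRUE mimics the
# factor column shown in the question.
mydf <- data.frame(Param = c("   Brand   ", "Operating System", "Type "),
                   Value = 1:3, stringsAsFactors = TRUE)

vals <- as.character(mydf$Param)

# A value is "padded" if trimming leading/trailing whitespace changes it.
padded <- vals != trimws(vals)

unique(vals[padded])
# [1] "   Brand   " "Type "

# Character counts also give the padding away: "Brand" should be 5, not 11.
nchar(vals)
# [1] 11 16  5
```

Any TRUE in padded means an exact comparison such as Param == "Brand" can silently fail for that row.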

If that happens to be your problem, then a quick gsub should fix it:

mydf$Param <- gsub("^\\s+|\\s+$", "", mydf$Param)
unique(mydf[["Param"]])
# [1] "Brand"            "Operating System" "Type"  

subset(mydf, Param == "Brand")
#   Param Value
# 1 Brand     1
# 4 Brand     4
# 7 Brand     7
# 9 Brand     9
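As an alternative to the gsub call, base R (3.2.0 and later) has trimws(), which strips leading and trailing whitespace by default. This is a sketch on a hypothetical four-row sample, not the data from the question; the explicit as.character() is there only to make the factor-to-character conversion visible.

```r
# Hypothetical sample with padded values stored as a factor.
mydf <- data.frame(Param = c("   Brand   ", "Operating System",
                             "Type ", "   Brand   "),
                   Value = 1:4, stringsAsFactors = TRUE)

# Convert to character and strip whitespace on both sides.
mydf$Param <- trimws(as.character(mydf$Param))

brand_rows <- subset(mydf, Param == "Brand")
brand_rows
#   Param Value
# 1 Brand     1
# 4 Brand     4
```

Note that, like the gsub version, this leaves Param as a character column rather than a factor; wrap the result in factor() if the rest of your code expects factor levels.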

You may also want to look into the strip.white argument in read.table and family which defaults to FALSE. Try re-reading in your data with strip.white = TRUE and then try your subsetting.
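To illustrate the effect of strip.white, here is a small sketch that reads the same made-up CSV text twice. The csv_text string is hypothetical and stands in for your real file; with an actual file you would pass its path instead of text =.

```r
# Hypothetical CSV content with padded, unquoted character fields.
csv_text <- "Param,Value\n   Brand   ,1\nType ,2\n   Brand   ,3\n"

# Default: whitespace around unquoted fields is kept.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# strip.white = TRUE removes it while reading.
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                  strip.white = TRUE)

sum(raw$Param == "Brand")
# [1] 0
sum(clean$Param == "Brand")
# [1] 2
```

strip.white only affects unquoted fields, so values that were written out inside quotes with padding would still need trimming afterwards.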

answered Feb 5, 2013 at 7:44

A5C1D2H2I1M1N2O1R2T1


1 Comment

A5C1D2H2I1M1N2O1R2T1 Over a year ago

@KunalBatra, you need to work on making your examples reproducible. The easiest way is to post the output of something like dput(head(att3)) so that others can copy and paste the code into their session and see if they can help track down the problem. Otherwise, it really is just guesswork!
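As a rough illustration of why dput output is useful here, the sketch below round-trips a small data frame through its dput text. The two-row mydf is made up for the example, and the capture.output/parse step merely simulates someone else pasting the text into their own session.

```r
# Hypothetical two-row sample with padded factor values.
mydf <- data.frame(Param = c("   Brand   ", "Type "), Value = 1:2,
                   stringsAsFactors = TRUE)

# The text a poster would paste into the question.
code <- paste(capture.output(dput(head(mydf))), collapse = "\n")

# What a reader gets back after pasting that text into R.
rebuilt <- eval(parse(text = code))

# The padding survives the round trip, so the problem stays reproducible.
as.character(rebuilt$Param)
# [1] "   Brand   " "Type "
```

Unlike printed output, the dput text keeps exact strings and column types, which is what makes hidden whitespace visible to other people.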

Community · Accepted Answer · 2017-05-23 12:12:30Z

0

First off, you should really read this Stack Overflow post on how to ask good questions.

As for your question, something like this should work (it is hard to say more when you do not post a reproducible example, as Arun also points out above):

 att2 <- (data.frame(v=rnorm(10), filter_name=c('Brand','Not Brand')))

 att2[att2$filter_name == 'Brand', ]
            v filter_name
1 -1.84217530       Brand
3 -0.36199449       Brand
5 -0.54431665       Brand
7 -0.05659442       Brand
9  1.29753513       Brand

 subset(att2, filter_name == 'Brand')
            v filter_name
1 -1.84217530       Brand
3 -0.36199449       Brand
5 -0.54431665       Brand
7 -0.05659442       Brand
9  1.29753513       Brand

Here is a lot more on subsetting.

edited May 23, 2017 at 12:12


answered Feb 5, 2013 at 7:26

Eric Fail


2 Comments

A5C1D2H2I1M1N2O1R2T1 Over a year ago

Sorry, but I'm not sure why you have posted this as an answer. The OP has shown that they know how to subset, but they are running into a problem. Your best contribution here is adding a link to how to ask a good question, and showing an alternative way to subset.

Eric Fail Over a year ago

@AnandaMahto, it wasn't clear to me, and I simply wanted to start a conversation with some suggestions. I thought the OP might expand on his question.

agstudy · Accepted Answer · 2013-02-05 07:50:33Z

0

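Using the stringr package, you can do something like this:

   library(stringr)
   # Trim into a new column on att2 itself, then subset on the trimmed values
   att2$filter_name_trim <- str_trim(att2$filter_name)
   att3 <- subset(att2, filter_name_trim == 'Brand')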

answered Feb 5, 2013 at 7:50

agstudy


4 Comments

A5C1D2H2I1M1N2O1R2T1 Over a year ago

But that depends on knowing that the problem lies in whitespace to begin with ;)

agstudy Over a year ago

@AnandaMahto, of course; see my comment. Next time I will refresh first. And my solution hides the complexity of "your simple regular expression" :)

A5C1D2H2I1M1N2O1R2T1 Over a year ago

I'm just joking. Here's from the source of str_trim though: pattern <- switch(side, left = "^\\s+", right = "\\s+$", both = "^\\s+|\\s+$"), so it all boils down to the same thing...

agstudy Over a year ago

@AnandaMahto I know you are joking. Me too.
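The switch expression quoted in the comments above can be tried out in base R without installing stringr. The helper below, trim_side, is a hypothetical name invented for this sketch; it is not part of stringr and only reuses the three patterns quoted from the str_trim source.

```r
# Hypothetical helper: picks one of the quoted patterns and removes matches.
trim_side <- function(x, side = c("both", "left", "right")) {
  side <- match.arg(side)
  pattern <- switch(side,
                    left  = "^\\s+",
                    right = "\\s+$",
                    both  = "^\\s+|\\s+$")
  gsub(pattern, "", x)
}

trim_side("  Brand  ")
# [1] "Brand"
trim_side("  Brand  ", side = "left")
# [1] "Brand  "
trim_side("  Brand  ", side = "right")
# [1] "  Brand"
```

The "both" pattern is the same regular expression used in the gsub call in the first answer, which is the point being made in the comment.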
