I am facing an issue while subsetting a data frame in R. The data frame is att2, and I want to subset it on its filter_name column. The unique values of this column are shown below.

unique(att2[["filter_name"]])
# [1] title             Type        Operating_System         Occasion           Brand
148 Levels: Accessories Age Antennae Art_Style Aspect_ratio ... Zoom

This shows that Brand is a value in the filter_name column. But when I subset the data frame using the code below, I get 0 rows:

att3 <- subset(att2, filter_name == 'Brand')
> att3
[1] a      b         c  filter_name
<0 rows> (or 0-length row.names)

I am not able to find the reason. Has anyone faced this kind of issue?

  • Could you paste a reproducible example? As for the question you've asked: no, I have not faced this issue. Commented Feb 5, 2013 at 6:33
  • Please refer to the example in my post above. I think there might be a data issue, but I am not sure. I am still checking it from my end as well. Commented Feb 5, 2013 at 6:43
  • I don't have access to att2 to test it. Commented Feb 5, 2013 at 6:45

3 Answers

All that we can do is guess at what the source of your problem might be.

Here's my best guess: the values in your "filter_name" column have leading or trailing whitespace, so the stored value is not exactly "Brand" and the comparison will not match until you strip the whitespace.

Here's a minimal example that reproduces your problem if my guess is correct:

First, some sample data:

mydf <- data.frame(Param =  c("   Brand   ", "Operating System", 
                              "Type ", "   Brand   ", "Type ", 
                              "Type ", "   Brand   ", "Type ", 
                              "   Brand   "), Value = 1:9)
unique(mydf[["Param"]])
# [1]    Brand         Operating System Type            
# Levels:    Brand    Operating System Type 

subset(mydf, Param == "Brand")
# [1] Param Value
# <0 rows> (or 0-length row.names)

Use print with the quote = TRUE argument to see the whitespace in your data.frame:

print(mydf, quote = TRUE)
#                Param Value
# 1      "   Brand   "   "1"
# 2 "Operating System"   "2"
# 3            "Type "   "3"
# 4      "   Brand   "   "4"
# 5            "Type "   "5"
# 6            "Type "   "6"
# 7      "   Brand   "   "7"
# 8            "Type "   "8"
# 9      "   Brand   "   "9"
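If the data frame is too large to inspect by eye, the same check can be done programmatically. The following is only a sketch on a hypothetical vector x that stands in for the suspect column; it flags every value that starts or ends with whitespace.

```r
# Hypothetical values standing in for the suspect column
x <- c("   Brand   ", "Operating System", "Type ")

# TRUE for every value with leading or trailing whitespace
padded <- grepl("^\\s|\\s$", x)

# Show only the offending values, quoted so the padding is visible
print(x[padded], quote = TRUE)

# Character counts also give padding away: "Brand" should have 5
nchar(x)
```

On a factor column, running the same grepl() call on levels(att2$filter_name) should be enough, since every stored value is one of the levels.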

If that happens to be your problem, then a quick gsub should fix it:

mydf$Param <- gsub("^\\s+|\\s+$", "", mydf$Param)
unique(mydf[["Param"]])
# [1] "Brand"            "Operating System" "Type"  

subset(mydf, Param == "Brand")
#   Param Value
# 1 Brand     1
# 4 Brand     4
# 7 Brand     7
# 9 Brand     9
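As an alternative to the regular expression, base R (3.2.0 and later) ships trimws(), which by default removes spaces, tabs, and newlines from both ends of a string. A minimal sketch on a smaller, made-up version of the sample data:

```r
# Small made-up data frame with padded values
mydf <- data.frame(Param = c("   Brand   ", "Operating System",
                             "Type ", "   Brand   "),
                   Value = 1:4)

# as.character() makes this work whether Param is a factor or character
mydf$Param <- trimws(as.character(mydf$Param))

brand_rows <- subset(mydf, Param == "Brand")
brand_rows
```

Note that, like the gsub approach, this leaves Param as a character column rather than a factor.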

You may also want to look into the strip.white argument in read.table and family which defaults to FALSE. Try re-reading in your data with strip.white = TRUE and then try your subsetting.
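To illustrate the effect of strip.white, here is a small sketch that reads the same CSV text twice. The CSV content is invented for the example, and the text argument is used only so that no file is needed; with real data you would pass your file name instead.

```r
# Hypothetical CSV content with padded values, standing in for the real file
csv_text <- "filter_name,value\n   Brand   ,1\nType ,2\n   Brand   ,3"

# Default behaviour: whitespace around fields is kept
raw <- read.csv(text = csv_text, strip.white = FALSE)

# strip.white = TRUE trims leading and trailing whitespace from unquoted fields
cleaned <- read.csv(text = csv_text, strip.white = TRUE)

nrow(subset(raw, filter_name == "Brand"))      # no match on the padded values
nrow(subset(cleaned, filter_name == "Brand"))  # matches once the padding is gone
```

Keep in mind that strip.white only applies when a separator is specified, which read.csv does for you.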

1 Comment

@KunalBatra, you do have to work on making your examples better by making sure they are reproducible. The easiest way would be to post the output of something like dput(head(att3)) so that others can copy and paste the code into their session and see if they can help track down the problem. Otherwise, it really is just guesswork!

First off, you should really read this Stack Overflow post on how to ask good questions.

As for your question, something like this should work (it is hard to say more without a reproducible example, as Arun also points out above):

 att2 <- (data.frame(v=rnorm(10), filter_name=c('Brand','Not Brand')))

 att2[att2$filter_name == 'Brand', ]
            v filter_name
1 -1.84217530       Brand
3 -0.36199449       Brand
5 -0.54431665       Brand
7 -0.05659442       Brand
9  1.29753513       Brand

 subset(att2, filter_name == 'Brand')
            v filter_name
1 -1.84217530       Brand
3 -0.36199449       Brand
5 -0.54431665       Brand
7 -0.05659442       Brand
9  1.29753513       Brand

Here is a lot more on subsetting.

2 Comments

Sorry, but I'm not sure why you have posted this as an answer. The OP has shown that they know how to subset, but they are running into a problem. Your best contribution here is adding a link to how to ask a good question, and showing an alternative way to subset.
@AnandaMahto, it wasn't clear to me, and I simply wanted to start a conversation with some suggestions. I thought the OP might expand on his question.

Using the stringr package, you can do something like:

   library(stringr)
   att2$filter_name_trim <- str_trim(att2$filter_name)  # trim both ends
   att3 <- subset(att2, filter_name_trim == 'Brand')

4 Comments

But that depends on knowing that the problem lies in whitespace to begin with ;)
@AnandaMahto Of course, see my comment. Next time I will refresh first. And my solution hides the complexity of "your simple regular expression" :)
I'm just joking. Here's from the source of str_trim though: pattern <- switch(side, left = "^\\s+", right = "\\s+$", both = "^\\s+|\\s+$"), so it all boils down to the same thing...
@AnandaMahto I know you are joking. Me too.
