R loop assign output to new vector

Question

I am working in R trying to generate several distinct vectors using a for loop.

First I created a small reproducible example data frame called df.

cluster.assignment <- c("1 Unknown", "1 Unknown", "2 Neuron", "3 PBMC", "4 Basket")
Value1 <- c("a","b","c","d","e")
Value2 <- c("191","234","178","929","123")
df <- data.frame(cluster.assignment,Value1,Value2)

df

  cluster.assignment Value1 Value2
1          1 Unknown      a    191
2          1 Unknown      b    234
3           2 Neuron      c    178
4             3 PBMC      d    929
5           4 Basket      e    123

Next I create a variable named clusters that includes keys to the datasets that I am interested in.

clusters <- c("1 ","4 ")

Here is my attempt to extract rownames of the data of interest in df using a for loop.

for (COI in clusters) { 
  name2 <- c(gsub(" ","", paste("Cluster", COI, sep = "_")))
  assign(Cluster_1, name2, envir = parent.frame())
  name2 <- grep(COI, df$cluster.assignment)
}

Desired output is two vectors called Cluster_1 and Cluster_4.

Cluster_1 would contain the values 1 and 2

Cluster_4 would contain the value 5

I can't seem to figure out how to assign the name of the COI variable to be the name of the output vector.

COI takes the value of each element of clusters in turn: first it is "1 " and then it is "4 ". A number followed by a space is an exceptionally bad variable name. Is this really what you want, to use the value of COI as the name of the output?
– Gregor Thomas, Commented Sep 4, 2018 at 19:00
In this case yes, because I am mining an existing dataset generated by someone else.
– Paul, Commented Sep 4, 2018 at 19:03

Gregor Thomas · Accepted Answer · 2018-09-04 19:28:47Z

1

I would suggest against using assign. Instead, I'll create a named list. See this answer for a long discussion of why lists are better than sequentially named variables. If, at any point, you decide you want to convert the list to objects in the global environment, you can use list2env, but doing so will probably just make more work.

## subset the data to the parts we care about, use `split` to separate it
## into a list
subdf = df[grepl(paste(clusters, collapse = "|"), df$cluster.assignment), ]
result = split(subdf, subdf$cluster.assignment, drop = TRUE)
result
# $`1 Unknown`
#   cluster.assignment Value1 Value2
# 1          1 Unknown      a    191
# 2          1 Unknown      b    234
# 
# $`4 Basket`
#   cluster.assignment Value1 Value2
# 5           4 Basket      e    123

## name the list as desired
names(result) = paste("Cluster", trimws(clusters), sep = "_")
result
# $Cluster_1
#   cluster.assignment Value1 Value2
# 1          1 Unknown      a    191
# 2          1 Unknown      b    234
# 
# $Cluster_4
#   cluster.assignment Value1 Value2
# 5           4 Basket      e    123

## if only the row names are needed, use lapply
result = lapply(result, row.names)
result
# $Cluster_1
# [1] "1" "2"
# 
# $Cluster_4
# [1] "5"
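If a loop is still preferred, the same named-list idea can be sketched with a for loop that stores the matching row indices (1 and 2, then 5) under a constructed name. This is only a minimal sketch under assumptions: the names list_name and e are hypothetical, fixed = TRUE is my choice so that the key is matched literally, and list2env is pointed at a scratch environment rather than the global one.

```r
# Rebuild the example data so the sketch is self-contained
cluster.assignment <- c("1 Unknown", "1 Unknown", "2 Neuron", "3 PBMC", "4 Basket")
df <- data.frame(cluster.assignment, stringsAsFactors = FALSE)
clusters <- c("1 ", "4 ")

# Collect the matching row indices in a named list instead of calling assign()
result <- list()
for (COI in clusters) {
  list_name <- paste("Cluster", trimws(COI), sep = "_")  # e.g. "Cluster_1"
  result[[list_name]] <- grep(COI, df$cluster.assignment, fixed = TRUE)
}

result$Cluster_1  # 1 2
result$Cluster_4  # 5

# Optional: turn the list elements into separate objects.
# `e` is a hypothetical scratch environment; use globalenv() for the workspace.
e <- new.env()
list2env(result, envir = e)
get("Cluster_4", envir = e)  # 5
```

Keeping everything in result means later steps can iterate over the list with lapply instead of looking up variables by name.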

A few other notes: I assume you are including the spaces in clusters to prevent, e.g., "1" from matching "12 foo". You might consider using the regex word boundary "\\b1\\b" instead, as "1 " will still match, say, "11 foo" or "21 bar". Better yet, you could use strsplit or similar to create a new column with just the numeric key you want to match.
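To make the matching pitfall concrete, here is a small sketch on a made-up vector x (the labels "11 foo", "21 bar", and "12 foo" are the hypothetical ones from the paragraph above, not part of the question's data). It compares the trailing-space pattern, the word-boundary pattern, and an exact comparison on a key extracted with strsplit. The perl = TRUE argument is my addition; the default regex engine also understands \\b.

```r
# Hypothetical labels, chosen to show over-matching
x <- c("1 Unknown", "11 foo", "21 bar", "12 foo")

# Trailing-space pattern: also hits "11 foo" and "21 bar"
space_match <- grepl("1 ", x, fixed = TRUE)
space_match     # TRUE TRUE TRUE FALSE

# Word-boundary pattern: only a standalone 1 matches
boundary_match <- grepl("\\b1\\b", x, perl = TRUE)
boundary_match  # TRUE FALSE FALSE FALSE

# Extract the text before the first space and compare it exactly
key <- vapply(strsplit(x, " ", fixed = TRUE), `[`, character(1), 1)
key             # "1" "11" "21" "12"
key_match <- key == "1"
key_match       # TRUE FALSE FALSE FALSE
```

The extracted key could be stored as a new column of the data frame, which makes later lookups a plain equality or %in% test rather than a regex.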

edited Sep 4, 2018 at 19:28

answered Sep 4, 2018 at 19:11

Gregor Thomas


1 Comment

Paul Over a year ago

Oh my, I see now why the spaces are so bad. Thanks for your suggestions and very informative answer I will give them a try!

Shirin Yavari · Answer · answered Sep 4, 2018 at 19:04 (edited by Pang, 2018-09-20 02:25:29Z)

0

I don't see the necessity to create a for loop for this unless you have your own reasons, but the following code gives you what you want:

library(data.table)
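# %like% (from data.table) keeps the rows whose label matches the pattern
Cluster_1 <- df[df$cluster.assignment %like% "1 ", c("Value1", "Value2")]
Cluster_4 <- df[df$cluster.assignment %like% "4 ", c("Value1", "Value2")]
# open both results in the data viewer
View(Cluster_1); View(Cluster_4)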

You can remove or alter c("Value1", "Value2") to get the columns that you want in the final output.

edited Sep 20, 2018 at 2:25

Pang


answered Sep 4, 2018 at 19:04

Shirin Yavari


1 Comment

Paul Over a year ago

I should have specified that this is a small portable example. Unfortunately in real life I need to repeat this over hundreds of different COI values. So a loop to iterate the process and make it portable across datasets is required. The heart of the question really is how do we do this in a for loop or some other high throughput way.
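For the many-keys case raised in this comment, one loop-free possibility is to group the row indices by key in a single pass and then pick out the keys of interest. This is only a sketch under assumptions: it presumes every label starts with its key followed by a space, and the names key, rows_by_key, and wanted are hypothetical.

```r
# Rebuild the example data so the sketch is self-contained
df <- data.frame(
  cluster.assignment = c("1 Unknown", "1 Unknown", "2 Neuron", "3 PBMC", "4 Basket"),
  stringsAsFactors = FALSE
)

# Everything before the first space is treated as the key (an assumption)
key <- sub(" .*$", "", df$cluster.assignment)  # "1" "1" "2" "3" "4"

# Group all row indices by key in one pass
rows_by_key <- split(seq_len(nrow(df)), key)

# Select the keys of interest by name and label them as in the question
wanted <- c("1", "4")
result <- rows_by_key[wanted]
names(result) <- paste0("Cluster_", wanted)

result$Cluster_1  # 1 2
result$Cluster_4  # 5
```

Because the selection is by name, wanted can hold hundreds of keys without any change to the code. A key that is absent from the data yields a NULL element, so it may be worth checking wanted %in% names(rows_by_key) first.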
