Split and create a new dataframe for each variable in a specific column

Question

I'm not sure how to make progress on this problem. I'm using this version of the mtcars dataset:

structure(list(index = 1:32, car = c("Mazda RX4", "Mazda RX4 Wag", 
"Datsun 710", "Hornet 4 Drive", "Hornet Sportabout", "Valiant", 
"Duster 360", "Merc 240D", "Merc 230", "Merc 280", "Merc 280C", 
"Merc 450SE", "Merc 450SL", "Merc 450SLC", "Cadillac Fleetwood", 
"Lincoln Continental", "Chrysler Imperial", "Fiat 128", "Honda Civic", 
"Toyota Corolla", "Toyota Corona", "Dodge Challenger", "AMC Javelin", 
"Camaro Z28", "Pontiac Firebird", "Fiat X1-9", "Porsche 914-2", 
"Lotus Europa", "Ford Pantera L", "Ferrari Dino", "Maserati Bora", 
"Volvo 142E"), mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 
24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 
30.4, 33.9, 21.5, 15.5, 15.2, 13.3, 19.2, 27.3, 26, 30.4, 15.8, 
19.7, 15, 21.4), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 
8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4), 
    disp = c(160, 160, 108, 258, 360, 225, 360, 146.7, 140.8, 
    167.6, 167.6, 275.8, 275.8, 275.8, 472, 460, 440, 78.7, 75.7, 
    71.1, 120.1, 318, 304, 350, 400, 79, 120.3, 95.1, 351, 145, 
    301, 121), hp = c(110, 110, 93, 110, 175, 105, 245, 62, 95, 
    123, 123, 180, 180, 180, 205, 215, 230, 66, 52, 65, 97, 150, 
    150, 245, 175, 66, 91, 113, 264, 175, 335, 109), drat = c(3.9, 
    3.9, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92, 
    3.07, 3.07, 3.07, 2.93, 3, 3.23, 4.08, 4.93, 4.22, 3.7, 2.76, 
    3.15, 3.73, 3.08, 4.08, 4.43, 3.77, 4.22, 3.62, 3.54, 4.11
    ), wt = c(2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 
    3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2, 
    1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 
    1.513, 3.17, 2.77, 3.57, 2.78), qsec = c(16.46, 17.02, 18.61, 
    19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3, 18.9, 17.4, 17.6, 
    18, 17.98, 17.82, 17.42, 19.47, 18.52, 19.9, 20.01, 16.87, 
    17.3, 15.41, 17.05, 18.9, 16.7, 16.9, 14.5, 15.5, 14.6, 18.6
    ), vs = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 
    0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1), am = c(1, 
    1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 
    0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1), gear = c(4, 4, 4, 3, 
    3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 
    3, 3, 4, 5, 5, 5, 5, 5, 4), carb = c(4, 4, 1, 1, 2, 1, 4, 
    2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 
    2, 2, 4, 6, 8, 2)), row.names = c(NA, -32L), class = c("tbl_df", 
"tbl", "data.frame"))

Here is the pseudo-code that I have written.

for (i in 1:length(car)) {
  mtcars %>%
  filter(car == car[i]) %>%
  mtcars_i <- mtcars
}

The idea is that I would like to create 32 different datasets, each with the name of its car in the dataset's label:

mtcars_mazda_rx4
mtcars_hornet_sportabout
etc.

Here mtcars_mazda_rx4 would be a dataframe with all the same variables but only one observation, where car == "Mazda RX4", i.e. mtcars[mtcars$car == "Mazda RX4", ]

Is there a way to create a for loop that filters the dataframe by a specific variable, and then outputs a new dataframe with that variable name identified in the new df?

M-- · Accepted Answer · 2022-10-19 17:53:26Z

2

A different approach using split(); I load the tidyverse (for the pipe and set_names) to make the solution more legible:

library(tidyverse)

mtcars %>% 
#  rownames_to_column("car") %>% ## run this line if you are using original mtcars
  split(., .$car) %>% 
  set_names(., nm = paste0("mtcars_", names(.))) %>% 
  list2env(., envir=.GlobalEnv)
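
The question asks for names such as mtcars_mazda_rx4, whereas splitting on the raw labels gives names that contain spaces and capitals (for example `mtcars_Mazda RX4`, which needs backticks to use). A minimal base R sketch of one way to clean the names first, assuming a small stand-in data frame `cars_df` and a hypothetical helper `to_snake()`, neither of which comes from the answer above:

```r
# Small stand-in for the question's data; only `car` and `mpg` are kept here.
cars_df <- data.frame(
  car = c("Mazda RX4", "Hornet Sportabout", "Porsche 914-2"),
  mpg = c(21, 18.7, 26),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case the label and replace every run of
# characters that are not letters or digits with a single underscore,
# e.g. "Mazda RX4" -> "mazda_rx4".
to_snake <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)  # trim leading/trailing underscores
}

# One data frame per distinct value of `car`, kept together in a named list.
by_car <- split(cars_df, cars_df$car)
names(by_car) <- paste0("mtcars_", to_snake(names(by_car)))

# Optional: copy the list elements into the global environment.
# list2env(by_car, envir = .GlobalEnv)
```

Keeping the pieces in a named list (`by_car[["mtcars_mazda_rx4"]]`) is often easier to loop over later than 32 separate objects; the `list2env()` call is left commented out for that reason.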

edited Oct 19, 2022 at 17:53

answered Oct 19, 2022 at 17:43

M--


2 Comments

hachiko Over a year ago

hm I'm getting a weird error that says Error: caused by error in 'repaired_names()': ! Names must be unique. x These names are duplicated: "Plant.Name" at locations 23 and 25

M-- Over a year ago

You forgot to share your error. By any chance, does it say that column names should not be duplicated? Here, I am using the original mtcars dataset, which has the car names in its row names. For your dataset, you don't need the commented-out line (rownames_to_column). Just remove that line.

akrun · Accepted Answer · 2022-10-19 17:19:47Z

1

We can use assign

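library(dplyr)

# Loop over the distinct values of the car column; a bare `car` is not an
# object outside of dplyr verbs, so reach it through the data frame
for (nm in unique(mtcars$car)) {
  tmp <- mtcars %>%
    filter(car == nm)
  # Creates an object such as `mtcars_Mazda RX4` in the current environment
  assign(paste0('mtcars_', nm), tmp)
}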
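
A bare column name such as `car` only resolves inside dplyr verbs like `filter()`; anywhere else it has to be reached through the data frame, which is one likely source of an "object ... not found" error when the loop is moved to another dataset. A minimal sketch of the same `assign` idea as a reusable base R function that takes the column name as a string. The function `assign_by_column()`, its arguments, and the toy `plants` data are all hypothetical and only for illustration:

```r
# Hypothetical helper: split `df` on the column named `col` and assign one
# data frame per distinct non-missing value into the environment `envir`.
assign_by_column <- function(df, col, prefix = "df_", envir = globalenv()) {
  values <- unique(df[[col]])
  values <- values[!is.na(values)]  # skip missing labels
  for (v in values) {
    keep <- !is.na(df[[col]]) & df[[col]] == v
    assign(paste0(prefix, v), df[keep, , drop = FALSE], envir = envir)
  }
  # Return the names that were created, invisibly
  invisible(paste0(prefix, values))
}

# Made-up data with a missing label
plants <- data.frame(
  Plant.Name = c("Alpha", "Beta", NA, "Alpha"),
  output = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Assign into a fresh environment here so the global workspace stays clean
target <- new.env()
created <- assign_by_column(plants, "Plant.Name", prefix = "plant_", envir = target)
```

Passing `envir` explicitly makes it clear where the new objects end up; calling the function with the default `envir = globalenv()` would create them in the workspace instead.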

answered Oct 19, 2022 at 17:19

akrun


5 Comments

akrun Over a year ago

@hachiko you may use car <- na.omit(car) and then loop

hachiko Over a year ago

I've discovered that I have the same error with each of my most recent posts - trying to take the code from the mtcars version to a larger dataframe -- I get an error message that says "object Plant.Name not found" (Plant.Name is my version of "car") I'm having trouble figuring out what that means - do you have any ideas?

hachiko Over a year ago

na.omit is a good idea instead of %>% filter(!is.na(car)) because it looks a little faster

akrun Over a year ago

@hachiko But na.omit adds an attribute. although in this case, it is okay

akrun Over a year ago

@hachiko can you show values of car
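
The remark in the comments that na.omit adds an attribute can be checked on a small vector. A minimal sketch, assuming a made-up character vector `car` with one missing value:

```r
# Made-up labels with one missing value
car <- c("Mazda RX4", NA, "Datsun 710")

# na.omit() drops the NA but records what it removed in an "na.action" attribute
cleaned <- na.omit(car)
attributes(cleaned)

# Logical indexing drops the NA and leaves a plain character vector
plain <- car[!is.na(car)]
attributes(plain)
```

For looping over the values, either form works; the extra attribute only matters if the result is later compared with `identical()` or passed to code that inspects attributes.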
