How to split a data frame into multiple dataframes with each two columns as a new dataframe?

Question

I have the dataframe below, with 10 columns.

V1  V2  V3  V4  V5  V6  V7  V8  V9  V10
1   2   3   4   5   6   7   8   9   10
11  12  13  14  15  16  17  18  19  20
21  22  23  24  25  26  27  28  29  30
31  32  33  34  35  36  37  38  39  40
41  42  43  44  45  46  47  48  49  50
51  52  53  54  55  56  57  58  59  60
61  62  63  64  65  66  67  68  69  70
71  72  73  74  75  76  77  78  79  80
81  82  83  84  85  86  87  88  89  90
91  92  93  94  95  96  97  98  99  100

I want to split that dataframe into chunks, with each consecutive pair of columns as a new dataframe: V1 & V2 in one dataframe, V3 & V4 in another, and so on.

How can I achieve this easily in R?

G. Grothendieck · Accepted Answer · 2016-03-02 17:46:48Z

5

Try tapply with an INDEX argument of 1, 1, 2, 2, etc.

tapply(as.list(DF), gl(ncol(DF)/2, 2), as.data.frame)

giving a list with one two-column data frame per pair of columns.
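As a minimal sketch of how the result of that call can be inspected (assuming DF is built as in the note at the end of this answer; L is just an illustrative name for the result):

```r
# Reproducible input: 10 x 10 data frame with columns V1..V10
DF <- as.data.frame(matrix(1:100, 10, byrow = TRUE))

# Group the columns as 1, 1, 2, 2, ... and rebuild a data frame per group
L <- tapply(as.list(DF), gl(ncol(DF)/2, 2), as.data.frame)

length(L)        # one element per pair of columns
names(L[[2]])    # column names of the second chunk
head(L[[1]], 3)  # first rows of the V1/V2 chunk
```

Each element is extracted with [[ ]], as with an ordinary list.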

Another possibility, if the data frame is all numeric as in the question, is to reshape it into an array:

a <- array(unlist(DF), c(nrow(DF), 2, ncol(DF)/2))

in which case a[,,i] is the ith matrix for i = 1, ..., ncol(DF)/2.
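A short sketch of why the reshape lines up with the column pairs, again assuming the DF from the note below:

```r
DF <- as.data.frame(matrix(1:100, 10, byrow = TRUE))

# unlist() concatenates the columns in order, so consecutive pairs of
# columns fill consecutive 10 x 2 slices of the array
a <- array(unlist(DF), c(nrow(DF), 2, ncol(DF)/2))

dim(a)       # 10 2 5
a[1, , 2]    # 3 4  (first row of V3 and V4)

# The second slice matches columns V3 and V4 of DF, minus dimnames
all.equal(a[, , 2], unname(as.matrix(DF[3:4])))
```

Note that the slices are plain matrices, so the column names V1, V2, ... are not carried over.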

Note: The input DF in reproducible form is:

DF <- as.data.frame(matrix(1:100, 10, byrow = TRUE))

edited Mar 2, 2016 at 17:46

answered Mar 2, 2016 at 14:13

G. Grothendieck


9 Comments

Frank Over a year ago

Or split.default(DF, gl(ncol(DF)/2, 2)) ?

LearneR Over a year ago

@G.Grothendieck: tapply(as.list(DF), gl(ncol(DF)/2, 2), as.data.frame) works very well for splitting the big dataframe into chunks. But instead of creating one list with 5 elements, is there a way to output the 5 dataframes into 5 new variables?

G. Grothendieck Over a year ago

That is normally not a good idea but if you must then: for(i in seq_along(L)) assign(paste0("DF", i), L[[i]]) where L is the list created by tapply.

LearneR Over a year ago

That worked, it created the new dataframes. Thanks a lot! But for data with millions of records it will probably use a lot of memory (you already said it's probably not a good idea), and it looks like I'm cornered into using it. Alternatively, is there a way to run a function on that large 10-column dataset, passing only 2 columns at a time into each run (first run with V1 & V2, second run with V3 & V4, and so on)? That's why I wanted 5 different dataframes with 2 columns each.

G. Grothendieck Over a year ago

Replace as.data.frame in the tapply expression with your function.
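A minimal sketch of that suggestion, assuming DF as in the answer's note. The function pair_diff is hypothetical and stands in for whatever per-pair computation is needed; it receives a list holding the two columns of each chunk.

```r
DF <- as.data.frame(matrix(1:100, 10, byrow = TRUE))

# Hypothetical per-pair function: sums the differences between the
# second and first column of a chunk (cols is a list of two vectors)
pair_diff <- function(cols) sum(cols[[2]] - cols[[1]])

# One call per pair of columns: (V1, V2), (V3, V4), ...
res <- tapply(as.list(DF), gl(ncol(DF)/2, 2), pair_diff)
res
```

Because each call returns a single number here, tapply simplifies the result to one value per pair; with this data every value is 10, since adjacent columns differ by 1 in each of the 10 rows.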


akrun · Accepted Answer · 2016-03-02 13:27:31Z

4

We can use seq to create an index of every other column, loop over it with lapply, and subset the dataset by each pair of columns.

lst1 <- lapply(seq(1, ncol(df1), by=2), function(i) 
                   df1[i: pmin((i+1), ncol(df1))])

Or use split

lst2 <- lapply(split(seq_along(df1),
                     as.numeric(gl(ncol(df1), 2, ncol(df1)))),
               function(i) df1[i])
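A quick sanity check of the two approaches, as a sketch. It assumes df1 can be stood in for by the same 10 x 10 integer data frame (built here with matrix() rather than the structure() call in the data section), and it also shows how the seq version behaves with an odd number of columns.

```r
# Stand-in for df1 (assumption: same values as the data section)
df1 <- as.data.frame(matrix(1:100, 10, byrow = TRUE))

lst1 <- lapply(seq(1, ncol(df1), by = 2), function(i)
                   df1[i:pmin(i + 1, ncol(df1))])
lst2 <- lapply(split(seq_along(df1),
                     as.numeric(gl(ncol(df1), 2, ncol(df1)))),
               function(i) df1[i])

# Both give five 2-column data frames; lst2 additionally carries names
sapply(lst1, ncol)
all.equal(lst1, unname(lst2))

# With an odd number of columns, pmin() keeps the last chunk in bounds
odd <- lapply(seq(1, 9, by = 2), function(i) df1[i:pmin(i + 1, 9)])
sapply(odd, ncol)   # 2 2 2 2 1
```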

If we need 5 individual datasets in the global environment, use list2env (not recommended, though):

list2env(setNames(lst1, paste0("newdf", seq_along(lst1))),
           envir=.GlobalEnv)
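If separate named objects are really needed, one possible alternative to writing into .GlobalEnv is to place them in a dedicated environment. This is only a sketch: env and the stand-in df1 are illustrative, not part of the original answer.

```r
# Stand-in for df1 (assumption: same values as the data section)
df1 <- as.data.frame(matrix(1:100, 10, byrow = TRUE))
lst1 <- lapply(seq(1, ncol(df1), by = 2), function(i)
                   df1[i:pmin(i + 1, ncol(df1))])

# Create the objects in their own environment instead of the workspace
env <- new.env()
list2env(setNames(lst1, paste0("newdf", seq_along(lst1))), envir = env)

ls(env)                # newdf1 ... newdf5
names(env$newdf3)      # "V5" "V6"
```

This keeps the workspace uncluttered while still allowing access by name, e.g. env$newdf3 or get("newdf3", envir = env).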

data

df1 <- structure(list(V1 = c(1L, 11L, 21L, 31L, 41L, 51L, 61L, 71L, 
81L, 91L), V2 = c(2L, 12L, 22L, 32L, 42L, 52L, 62L, 72L, 82L, 
92L), V3 = c(3L, 13L, 23L, 33L, 43L, 53L, 63L, 73L, 83L, 93L), 
    V4 = c(4L, 14L, 24L, 34L, 44L, 54L, 64L, 74L, 84L, 94L), 
    V5 = c(5L, 15L, 25L, 35L, 45L, 55L, 65L, 75L, 85L, 95L), 
    V6 = c(6L, 16L, 26L, 36L, 46L, 56L, 66L, 76L, 86L, 96L), 
    V7 = c(7L, 17L, 27L, 37L, 47L, 57L, 67L, 77L, 87L, 97L), 
    V8 = c(8L, 18L, 28L, 38L, 48L, 58L, 68L, 78L, 88L, 98L), 
    V9 = c(9L, 19L, 29L, 39L, 49L, 59L, 69L, 79L, 89L, 99L), 
    V10 = c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L, 100L
    )), .Names = c("V1", "V2", "V3", "V4", "V5", "V6", "V7", 
"V8", "V9", "V10"), class = "data.frame", row.names = c(NA, -10L
))

edited Mar 2, 2016 at 13:27

answered Mar 2, 2016 at 12:52

akrun


2 Comments

LearneR Over a year ago

But even when I do this, it forms a list with 5 elements that is not in the expected format (V1 & V2 as a separate dataframe, V3 & V4 as another, and so on). Each of the 5 elements contains the two columns, but in row format.

akrun Over a year ago

@LearneR As I understand you need a data.frame with V1, V2 as columns, similarly another one with V3, V4 etc. I get a list of 5 data.frames with 2 columns each. I assume that your input dataset is a data.frame.
