I have the data frame below, which has 10 columns.

V1  V2  V3  V4  V5  V6  V7  V8  V9  V10
1   2   3   4   5   6   7   8   9   10
11  12  13  14  15  16  17  18  19  20
21  22  23  24  25  26  27  28  29  30
31  32  33  34  35  36  37  38  39  40
41  42  43  44  45  46  47  48  49  50
51  52  53  54  55  56  57  58  59  60
61  62  63  64  65  66  67  68  69  70
71  72  73  74  75  76  77  78  79  80
81  82  83  84  85  86  87  88  89  90
91  92  93  94  95  96  97  98  99  100

I want to split that data frame into chunks of two adjacent columns each, with every chunk as a new data frame. For example, V1 & V2 go into one data frame, V3 & V4 into another, and so on:

V1  V2
1   2
11  12
21  22
31  32
41  42
51  52
61  62
71  72
81  82
91  92

V3  V4
3   4
13  14
23  24
33  34
43  44
53  54
63  64
73  74
83  84
93  94

How can I achieve this easily in R?

2 Answers

Try tapply with an INDEX argument of 1, 1, 2, 2, etc.

tapply(as.list(DF), gl(ncol(DF)/2, 2), as.data.frame)

giving the following (only the first four of the five list elements are shown here):

$`1`
   V1 V2
1   1  2
2  11 12
3  21 22
4  31 32
5  41 42
6  51 52
7  61 62
8  71 72
9  81 82
10 91 92

$`2`
   V3 V4
1   3  4
2  13 14
3  23 24
4  33 34
5  43 44
6  53 54
7  63 64
8  73 74
9  83 84
10 93 94

$`3`
   V5 V6
1   5  6
2  15 16
3  25 26
4  35 36
5  45 46
6  55 56
7  65 66
8  75 76
9  85 86
10 95 96

$`4`
   V7 V8
1   7  8
2  17 18
3  27 28
4  37 38
5  47 48
6  57 58
7  67 68
8  77 78
9  87 88
10 97 98

Another possibility, if the data frame is all numeric as in the question, is to reshape it into a 3-dimensional array:

a <- array(unlist(DF), c(nrow(DF), 2, ncol(DF)/2))

in which case a[,,i] is the i-th two-column matrix, for i = 1, ..., ncol(DF)/2.
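As a quick check of the array layout, here is a minimal sketch that rebuilds the example input and compares one slice with the matching column pair. It assumes an all-numeric data frame with an even number of columns, as in the question.

```r
# Rebuild the example input from the question (10 rows, columns V1..V10).
DF <- as.data.frame(matrix(1:100, 10, byrow = TRUE))

# unlist() lays the columns end to end, so each slice along the third
# dimension holds two adjacent columns.
a <- array(unlist(DF), c(nrow(DF), 2, ncol(DF) / 2))

# The second slice should hold the same values as columns V3 and V4.
slice_matches <- all(a[, , 2] == as.matrix(DF[3:4]))
dim(a)
slice_matches
```

The array drops the column names, so this form suits numeric work where only the position of each pair matters.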

Note: The input DF in reproducible form is:

DF <- as.data.frame(matrix(1:100, 10, byrow = TRUE))

9 Comments

Or split.default(DF, gl(ncol(DF)/2, 2)) ?
@G.Grothendieck: tapply(as.list(DF), gl(ncol(DF)/2, 2), as.data.frame) seems very good for splitting the big data frame into chunks. But instead of creating one list with 5 elements, is there a way to output the 5 data frames into 5 new variables?
That is normally not a good idea but if you must then: for(i in seq_along(L)) assign(paste0("DF", i), L[[i]]) where L is the list created by tapply.
That worked, it created the new data frames. Thanks a lot! But for data with millions of records this will probably use a huge amount of memory (you already said it's probably not a good idea), and it looks like I'm cornered into using it. Alternatively, can you suggest a way to run a function on that large 10-column dataset while passing only 2 columns at a time into each run (first run with V1 & V2, second run with V3 & V4, and so on)? That's why I wanted 5 different data frames with 2 columns each.
Replace as.data.frame in the tapply expression with your function.
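A minimal sketch of that idea, using the split.default form from the first comment together with lapply rather than tapply. The function pair_diff is hypothetical and only stands in for whatever per-pair function is actually needed.

```r
# Rebuild the example input from the question.
DF <- as.data.frame(matrix(1:100, 10, byrow = TRUE))

# Hypothetical per-pair function (an assumption for illustration):
# takes a two-column data frame and returns second column minus first.
pair_diff <- function(d) d[[2]] - d[[1]]

# split.default() splits the columns (not the rows) by the grouping factor,
# so each piece is a two-column data frame: V1 & V2, V3 & V4, and so on.
pairs <- split.default(DF, gl(ncol(DF) / 2, 2))

# Run the function once per column pair; results are collected in a list.
res <- lapply(pairs, pair_diff)
str(res)
```

This keeps everything in one list, so no separate variables are created in the global environment.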

We can use seq to create the starting index of each column pair, loop over those indices, and subset the dataset:

lst1 <- lapply(seq(1, ncol(df1), by=2), function(i) 
                   df1[i: pmin((i+1), ncol(df1))])

Or use split

lst2 <- lapply(split(seq_along(df1), as.numeric(gl(ncol(df1), 2, ncol(df1)))),
               function(i) df1[i])
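The pmin guard in the first snippet suggests the column count might be odd. Here is a small sketch of that case, using ceiling(seq_along(x) / 2) as an alternative grouping index; the five-column data frame df_odd is a made-up input, not from the question.

```r
# Hypothetical five-column input, so the last chunk has a single column.
df_odd <- as.data.frame(matrix(1:50, 10, byrow = TRUE))

# Grouping index 1 1 2 2 3: adjacent columns are paired and a leftover
# column ends up in a group of its own.
grp <- ceiling(seq_along(df_odd) / 2)

# Split the column positions by group, then subset the data frame by position.
chunks <- lapply(split(seq_along(df_odd), grp), function(i) df_odd[i])

# Number of columns in each chunk.
chunk_widths <- sapply(chunks, ncol)
chunk_widths
```

With an even column count this index gives the same pairs as the gl-based version above.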

If we need 5 individual datasets in the global environment, use list2env (not recommended, though):

list2env(setNames(lst1, paste0("newdf", seq_along(lst1))),
           envir=.GlobalEnv)

data

df1 <- structure(list(V1 = c(1L, 11L, 21L, 31L, 41L, 51L, 61L, 71L, 
81L, 91L), V2 = c(2L, 12L, 22L, 32L, 42L, 52L, 62L, 72L, 82L, 
92L), V3 = c(3L, 13L, 23L, 33L, 43L, 53L, 63L, 73L, 83L, 93L), 
    V4 = c(4L, 14L, 24L, 34L, 44L, 54L, 64L, 74L, 84L, 94L), 
    V5 = c(5L, 15L, 25L, 35L, 45L, 55L, 65L, 75L, 85L, 95L), 
    V6 = c(6L, 16L, 26L, 36L, 46L, 56L, 66L, 76L, 86L, 96L), 
    V7 = c(7L, 17L, 27L, 37L, 47L, 57L, 67L, 77L, 87L, 97L), 
    V8 = c(8L, 18L, 28L, 38L, 48L, 58L, 68L, 78L, 88L, 98L), 
    V9 = c(9L, 19L, 29L, 39L, 49L, 59L, 69L, 79L, 89L, 99L), 
    V10 = c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L, 100L
    )), .Names = c("V1", "V2", "V3", "V4", "V5", "V6", "V7", 
"V8", "V9", "V10"), class = "data.frame", row.names = c(NA, -10L
))

2 Comments

But even if I do this, it forms a list with 5 elements that is not in the expected format (with V1 & V2 as a separate data frame, V3 & V4 as another, and so on). Each element contains the two columns, but in row format.
@LearneR As I understand you need a data.frame with V1, V2 as columns, similarly another one with V3, V4 etc. I get a list of 5 data.frames with 2 columns each. I assume that your input dataset is a data.frame.
