Replacing string data frame

Question

I have a file like this

1880.1.1    74
1881.1.1    74
1882.1.1    75
1883.1.1    79
1884.1.1    111
1885.1.1    145

and I want to create a dataframe like this

1880    1    1  74
1881    1    1  74
1882    1    1  75
1883    1    1  79
1884    1    1  111
1885    1    1  145

but when I try with the gsub function, I fail. Many thanks!

You have to escape the period; try gsub("\\.", " ", "1880.1.1").
– David, Commented Sep 9, 2013 at 14:51
Since you didn't show us how your gsub is failing, I'm going to guess you aren't escaping the period. It should look like gsub('\\.', ...). However, I don't think gsub is the function you want. Instead, look at strsplit, and please share more of the code that you have tried.
– Justin, Commented Sep 9, 2013 at 14:52
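
To illustrate the escaping issue raised in the two comments above, here is a minimal sketch. The vector `x` is a made-up sample in the same shape as the question's first column; the asker's actual gsub call was never posted, so this only shows the likely cause.

```r
# Hypothetical sample in the same shape as the question's first column
x <- c("1880.1.1", "1881.1.1")

# Unescaped: "." is a regex wildcard, so every character is replaced
unescaped <- gsub(".", " ", x)

# Escaped: "\\." matches only a literal period
escaped <- gsub("\\.", " ", x)

# fixed = TRUE treats the pattern as a plain string, so no escaping is needed
literal <- gsub(".", " ", x, fixed = TRUE)

print(escaped)
```

Either of the last two forms leaves the digits intact, but the result is still a single string per row, which is why the comments point toward strsplit for getting separate columns.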

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2013-09-09 14:56:40Z

You can use concat.split from my "splitstackshape" package for a more convenient way to do what you're trying to do. Assuming your data.frame is called "mydf" and the first column is called "V1", you can do:

> library(splitstackshape)
> concat.split(mydf, "V1", sep = ".", drop = TRUE)
   V2 V1_1 V1_2 V1_3
1  74 1880    1    1
2  74 1881    1    1
3  75 1882    1    1
4  79 1883    1    1
5 111 1884    1    1
6 145 1885    1    1

Here, "mydf" is defined as:

mydf <- structure(list(V1 = c("1880.1.1", "1881.1.1", "1882.1.1", "1883.1.1", 
  "1884.1.1", "1885.1.1"), V2 = c(74L, 74L, 75L, 79L, 111L, 145L)), 
  .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, -6L))

The equivalent in base R is to use something like the following:

> cbind(read.table(text = as.character(mydf$V1), sep = "."), mydf[-1])
    V1 V2 V3  V2
1 1880  1  1  74
2 1881  1  1  74
3 1882  1  1  75
4 1883  1  1  79
5 1884  1  1 111
6 1885  1  1 145
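
The base R output above ends up with two columns named V2. A small variation, sketched here under the assumption that the three parts are a year, a month, and a day (the question does not say what they mean, so these names are hypothetical), passes col.names to read.table and names the remaining column explicitly:

```r
# Shortened copy of the sample data from this answer
mydf <- data.frame(V1 = c("1880.1.1", "1881.1.1", "1882.1.1"),
                   V2 = c(74L, 74L, 75L),
                   stringsAsFactors = FALSE)

# Parse the dotted strings as if they were lines of a "."-separated file;
# the column names are illustrative assumptions, not from the question
parts <- read.table(text = mydf$V1, sep = ".",
                    col.names = c("year", "month", "day"))

# Attach the remaining column under an explicit, non-clashing name
result <- cbind(parts, value = mydf$V2)
print(result)
```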

Jilber Urbina · Accepted Answer · 2013-09-09 15:13:46Z

2

Although Ananda's base R solution is simpler and nicer, here's another approach using strsplit:

> data.frame(t(sapply(strsplit(mydf[,"V1"], "\\." ), as.numeric)), X4=mydf[, "V2"])
    X1 X2 X3  X4
1 1880  1  1  74
2 1881  1  1  74
3 1882  1  1  75
4 1883  1  1  79
5 1884  1  1 111
6 1885  1  1 145

edited Sep 9, 2013 at 15:13

answered Sep 9, 2013 at 15:05


4 Comments

dayne Over a year ago

I did not know as.numeric would coerce the data to a matrix. Thanks for the lesson!

A5C1D2H2I1M1N2O1R2T1 Over a year ago

@dayne, it's not the as.numeric that's coercing to a matrix. You can have almost anything there that won't change the values (c, as.vector, ...). It's just that sapply will simplify to a matrix whenever possible (as it was in this case).

dayne Over a year ago

@AnandaMahto Thanks! I really should have known that. In this case is sapply or mapply more appropriate? They both seem to behave identically, using either the as.numeric or cbind/rbind approach.

A5C1D2H2I1M1N2O1R2T1 Over a year ago

@dayne, Not sure, really. Probably depends on how you define "more appropriate" :). I don't know which function is more efficient. I haven't used mapply much.
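
The point in the comments above about sapply simplifying "whenever possible" can be checked with a short sketch. The inputs are made up for illustration, and vapply is shown only as one stricter alternative, not as something the commenters recommended.

```r
# Two strings split into three pieces each
parts <- strsplit(c("1880.1.1", "1881.1.1"), ".", fixed = TRUE)

# Equal-length results: sapply simplifies to a matrix, one column per input,
# regardless of which value-preserving function is applied
m <- sapply(parts, identity)

# Unequal-length results: nothing to simplify to, so a list comes back
ragged <- sapply(list(1:3, 1:2), identity)

# vapply declares the expected shape up front and errors if a result differs
v <- vapply(parts, as.numeric, numeric(3))

print(dim(m))
print(class(ragged))
```

Because each input becomes a column, the answers in this thread wrap the result in t() to get one row per original string.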

dayne · Accepted Answer · 2013-09-09 15:05:24Z

1

Here is a strsplit approach. I used @Ananda's data.

> data.frame(t(mapply(cbind,strsplit(mydf[,1],split='[.]'))),mydf[,2])
    X1 X2 X3 mydf...2.
1 1880  1  1        74
2 1881  1  1        74
3 1882  1  1        75
4 1883  1  1        79
5 1884  1  1       111
6 1885  1  1       145

answered Sep 9, 2013 at 15:05

