
I have a data frame that looks like this:

df <- data.frame(one=c("s1_below_10", "s2_below_20"), 
                 two=c("s3_above_10","s4_above_10"))

I want to replace each string with the number that precedes the first underscore. In other words, the desired output is:

1   3
2   4

How can I perform this replacement efficiently? The dataset is very large. Thanks for your help.

2 Answers


The basic gsub call would be something like:

gsub("^.+?(\\d+)_.+","\\1",df$one)
[1] "1" "2"

You can then apply it to each column with lapply:

data.frame(lapply(df, gsub, pattern = "^.+?(\\d+)_.+", replacement = "\\1"))
  one two
1   1   3
2   2   4
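Note that gsub() returns character columns, while the desired output in the question is numeric. Below is a minimal sketch of one way to get integer columns, assuming every value has digits directly before its first underscore. The helper name extract_id and the pattern ^[^_]*?(\\d+)_.* are my own choices, not part of the answer above. Because each string needs only one replacement, sub() is enough.

```r
df <- data.frame(one = c("s1_below_10", "s2_below_20"),
                 two = c("s3_above_10", "s4_above_10"))

# Hypothetical helper: keep the run of digits that sits immediately before the
# first "_" and convert it to integer. [^_]*? cannot cross an underscore, so
# "s12_below_10" yields 12 rather than just 2.
extract_id <- function(x) {
  as.integer(sub("^[^_]*?(\\d+)_.*", "\\1", x, perl = TRUE))
}

# df[] <- ... replaces the columns in place and keeps the data.frame class
df[] <- lapply(df, extract_id)
df
```

sub() returns strings that do not match the pattern unchanged. as.integer() then turns those into NA with a coercion warning, so malformed values are easy to spot.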

1 Comment

A pattern that starts with a greedy '.+' (as in '^.+(\\d+)_.+') will only grab the last digit of a multi-digit number, which may or may not be what was wanted. This happens because the first '.+' matches as much as possible. Changing '.+' to '.+?' makes it non-greedy and lets '\\d+' match all of the digits.
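The effect described in the comment is easy to check on a value with a two-digit prefix. The input "s12_below_10" is a made-up example. perl = TRUE is added here only so that the well-defined PCRE backtracking behaviour applies; the answer's own calls use R's default regex engine.

```r
x <- c("s12_below_10", "s3_above_10")

# Greedy: the leading .+ grabs as much as it can, so only the last digit
# before "_" is left for the capture group.
greedy <- sub("^.+(\\d+)_.+", "\\1", x, perl = TRUE)

# Non-greedy: .+? stops as early as possible, so (\\d+) captures all digits.
lazy <- sub("^.+?(\\d+)_.+", "\\1", x, perl = TRUE)

greedy  # returns c("2", "3")
lazy    # returns c("12", "3")
```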

If the values you want are always the second character of the string (which seems to be true of all your examples), you can do this with substr:

data.frame(lapply(df, substr, 2, 2))

Output:

  one two
1   1   3
2   2   4
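substr() avoids regular expressions entirely. However, it silently returns the wrong value if a number has more than one digit or does not start at position 2. Below is a minimal sketch, assuming you would rather fail loudly in that case: check the layout first, then extract and convert to integer. The check pattern ^.[0-9]_ is my own encoding of that assumption.

```r
df <- data.frame(one = c("s1_below_10", "s2_below_20"),
                 two = c("s3_above_10", "s4_above_10"))

# substr() would quietly truncate a two-digit id: "s12_below_20" -> "1"
truncated <- substr("s12_below_20", 2, 2)

# The shortcut assumes exactly one character, then one digit, then "_".
fits_substr <- all(vapply(df, function(col) all(grepl("^.[0-9]_", col)),
                          logical(1)))

if (!fits_substr) stop("some values do not have a single digit in position 2")

# Extract the second character of every value and convert it to integer
out <- data.frame(lapply(df, function(col) as.integer(substr(col, 2, 2))))
out
```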
