
I'm working with semi-structured wiki data from a project I inherited from a colleague, and I'm having some trouble getting it tidy. It has a ton of issues, but one of the first things I need to do is create sensible column names.

Suppose I have a data frame like this:

df <- data.frame(x1 = "ID: 4",
                 x2 = "Start Date: 1946/11/13",
                 x3 = "End Date: 1946/12/31")
     x1                     x2                   x3
1 ID: 4 Start Date: 1946/11/13 End Date: 1946/12/31

I'd like to extract the text before the colon in each value, use it to rename the columns, and keep only the part after the colon as the value, so that my data frame looks like this:

ID Start_Date End_Date
4  1946/11/13 1946/12/31

So far, I've learned that I can use str_extract from the stringr package to pull out the strings of interest, but I'm stumbling over how to use the resulting list to rename the columns.

library(tidyverse)

map(df, function(x) {str_extract(x,"[^:]+") %>% str_replace(" ", "_")}) 

Thanks for checking out this question :)
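The missing step is turning the list returned by map() into a plain character vector and assigning it to names(df). As a minimal sketch, here is the same idea in base R (sub() and gsub() stand in for the stringr calls, and it assumes every cell has the form "Label: value"); the last lapply() also strips the label from the values, as in the desired output:

```r
df <- data.frame(x1 = "ID: 4",
                 x2 = "Start Date: 1946/11/13",
                 x3 = "End Date: 1946/12/31",
                 stringsAsFactors = FALSE)

# One label per column, taken from the first row: text before the first colon
lbls <- lapply(df, function(x) sub(":.*", "", x[1]))

# Flatten the list to a character vector and replace spaces with underscores
names(df) <- gsub(" ", "_", unlist(lbls, use.names = FALSE))

# Keep only the part after "Label: " in every cell
df[] <- lapply(df, function(x) sub("^[^:]*:\\s*", "", x))

df
#   ID Start_Date   End_Date
# 1  4 1946/11/13 1946/12/31
```

The values stay character; something like type.convert() could be applied afterwards if numeric IDs are wanted.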

  • Do you have one row only or multiple rows? Commented Sep 17, 2019 at 21:39
  • @M-- The actual data will have multiple rows, but the strings preceding the colon (e.g. id, start_date) shouldn't change. Commented Sep 17, 2019 at 21:43
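Since the labels are expected to stay the same down each column, one option is to check that before renaming. A sketch of that in base R, using a hypothetical helper rename_from_prefix() and a made-up second row purely for illustration (it assumes every cell looks like "Label: value"):

```r
# Hypothetical helper: name columns after their "Label:" prefix, keep the values.
# Stops if any column carries more than one distinct label.
rename_from_prefix <- function(df) {
  df[] <- lapply(df, as.character)  # guard against factor columns (R < 4.0 default)
  lbls <- vapply(df, function(x) {
    lab <- unique(sub(":.*", "", x))  # text before the first colon, per row
    if (length(lab) != 1) stop("column has more than one label: ", paste(lab, collapse = ", "))
    lab
  }, character(1))
  names(df) <- gsub(" ", "_", lbls, fixed = TRUE)
  df[] <- lapply(df, function(x) sub("^[^:]*:\\s*", "", x))  # strip "Label: "
  df
}

# Second row is invented for the example
df <- data.frame(x1 = c("ID: 4", "ID: 5"),
                 x2 = c("Start Date: 1946/11/13", "Start Date: 1947/01/02"),
                 x3 = c("End Date: 1946/12/31", "End Date: 1947/03/15"))
rename_from_prefix(df)
#   ID Start_Date   End_Date
# 1  4 1946/11/13 1946/12/31
# 2  5 1947/01/02 1947/03/15
```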

3 Answers

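# column names: text before the colon, with whitespace replaced by "_"
nm = gsub("\\s", "_", sapply(df[1,], function(x) gsub("(.*):.*", "\\1", x)))
# values: text after the colon (and the optional space following it)
setNames(data.frame(lapply(df, function(x) gsub(".*:\\s?(.*)", "\\1", x))), nm)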
#  ID Start_Date   End_Date
#1  4 1946/11/13 1946/12/31
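Both patterns above rely on .* being greedy, which is fine for the sample data. If a value could ever contain a second colon (the time below is hypothetical, not from the question), (.*):.* puts everything up to the last colon into the label, and .*:\\s?(.*) keeps only what follows the last colon. Anchoring on the first colon with [^:]* avoids both:

```r
x <- "Time: 12:30"  # hypothetical value containing a second colon

sub("(.*):.*", "\\1", x)          # greedy: up to the last colon -> "Time: 12"
sub("([^:]*):.*", "\\1", x)       # up to the first colon        -> "Time"

sub(".*:\\s?(.*)", "\\1", x)      # greedy: after the last colon -> "30"
sub("^[^:]*:\\s?(.*)", "\\1", x)  # after the first colon        -> "12:30"
```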

We can use a regular expression to drop everything from the ":" character onward, which leaves only the label, and then assign the results as the column names of your data frame:

df <- data.frame(x1 = "ID: 4",
                 x2 = "Start Date: 1946/11/13",
                 x3 = "End Date: 1946/12/31")

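# drop the colon and everything after it; the pattern has no capture group,
# so the replacement is simply an empty string
labels <- sapply(df[1, ], sub, pattern = '(?=:).*', replacement = '', perl = TRUE)
# "Start Date" -> "Start_Date"
labels <- gsub(' ', '_', labels)
colnames(df) <- labels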

> colnames(df)
[1] "ID"         "Start_Date" "End_Date"  
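Because '(?=:).*' contains no capture group, an empty replacement is enough to remove the colon and everything after it. The lookahead is also optional here: the match starts at the colon either way, so a plain ':.*' with the default regex engine gives the same labels. A quick base R check on the three sample values:

```r
x <- c("ID: 4", "Start Date: 1946/11/13", "End Date: 1946/12/31")

# Lookahead version (PCRE): match begins at the colon and runs to the end
with_lookahead <- sub("(?=:).*", "", x, perl = TRUE)

# Plain version (default engine): same match without the lookahead
plain <- sub(":.*", "", x)

identical(with_lookahead, plain)  # TRUE
gsub(" ", "_", plain)             # c("ID", "Start_Date", "End_Date")
```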

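library(magrittr)  # provides %>% (also available after library(tidyverse))

df <- data.frame(x1 = "ID: 4",
                 x2 = "Start Date: 1946/11/13",
                 x3 = "End Date: 1946/12/31", stringsAsFactors = FALSE)

# column names: text before the first colon, first space replaced by "_"
names(df) <- sapply(df[1,], function(x) {stringr::str_extract(x,"[^:]+") %>% stringr::str_replace(" ", "_")})
# append row 1 with its "Label: " prefixes removed, then keep only that new row
# (note: this keeps a single row, so it suits one-row data only)
df <- rbind(df, sapply(df[1,], function(x) {stringr::str_extract(x,":.+$") %>% stringr::str_replace(": ", "")}))
df <- df[2, ]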
