I have a dataframe:

                     value
2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = "jbohl"
2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = "msmith". limit test
2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = "msmith"

How can I split the column "value" into 6 columns, delimited by the timestamp and the different bracket types? The desired result should look like this:

        timestamp           col2      col3          col4               col5              message
2020-11-20 09:10:28:005    DEBUG     <main>      {EVENT-upload}     [Item_create]      increase values: user = "jbohl"
2020-11-20 09:11:10:055    DEBUG     <main>      {EVENT-upload}     [Item_create]      redirect: user = "msmith". limit test
2020-11-20 09:10:28:174    INFO      <main>      {EVENT-upload}     [INPUT]            new set: id = 12442, user = "msmith"

dput:

df <- structure(list(value = c("2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl", "2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test", "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith" )), class = "data.frame", row.names = c(NA, -3L)) 

1 Answer

You can use tidyr's extract() and supply a regular expression with one capture group per output column.

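# The literal braces are escaped (\\{ and \\}) so the pattern is accepted
# by regex engines that reject a bare "{", such as ICU (used by stringi).
tidyr::extract(df, value,
               c('timestamp', paste0('col', 2:5), 'message'), 
               '(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*([A-Z]+)\\s*(<.*?>)\\s*(\\{.*?\\})\\s*(\\[.*?\\])\\s*(.*)')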

#               timestamp  col2   col3           col4          col5
#1 2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create]
#2 2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create]
#3 2020-11-20 09:10:28:174  INFO <main> {EVENT-upload}       [INPUT]

#                              message
#1       increase values: user = jbohl
#2 redirect: user = msmith. limit test
#3  new set: id = 12442, user = msmith

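timestamp - extracts the digits that follow the pattern num-num-num num:num:num:num

col2 - extracts the run of uppercase letters that follows (the log level)

col3 - extracts the shortest text enclosed in < and >, including the angle brackets

col4 - extracts the shortest text enclosed in { and }, including the braces

col5 - extracts the shortest text enclosed in [ and ], including the square brackets

message - all the remaining text.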
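If tidyr is unavailable, or its regex engine rejects the pattern, the same capture groups can be applied with base R only. The following is a minimal sketch, not part of the original answer: it assumes every row matches the pattern (a non-matching row would return an empty match and break the rbind step), and the names pattern, parts and result are purely illustrative. It uses negated character classes such as [^>]* instead of lazy .*? quantifiers, and passes perl = TRUE so that \\d and \\s are handled by PCRE.

```r
# Same sample data as in the question
df <- data.frame(value = c(
  "2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl",
  "2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test",
  "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"
), stringsAsFactors = FALSE)

# One capture group per output column; literal braces and brackets are escaped
pattern <- paste0(
  "^(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s+",  # timestamp
  "([A-Z]+)\\s+",                               # col2: uppercase log level
  "(<[^>]*>)\\s+",                              # col3: <...>
  "(\\{[^}]*\\})\\s+",                          # col4: {...}
  "(\\[[^\\]]*\\])\\s*",                        # col5: [...]
  "(.*)$"                                       # message: remaining text
)

# regexec() returns the full match plus each group; regmatches() extracts them
m <- regmatches(df$value, regexec(pattern, df$value, perl = TRUE))

# Stack into a character matrix and drop the first column (the full match)
parts <- do.call(rbind, m)
result <- as.data.frame(parts[, -1, drop = FALSE], stringsAsFactors = FALSE)
names(result) <- c("timestamp", paste0("col", 2:5), "message")

print(result)
```

For real log files it would be worth checking lengths(m) before binding, so that lines that do not match the pattern can be reported instead of silently misaligning the result.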

data

df <- structure(list(value = c("2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl", 
"2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test", 
"2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"
)), class = "data.frame", row.names = c(NA, -3L))
7 Comments

It gives this error: Error in stringi::stri_match_first_regex(x, regex) : Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)
@french_fries Provide data using dput. It works for the data shared in my post.
df <- structure(list(value = c("2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl", "2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test", "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith" )), class = "data.frame", row.names = c(NA, -3L))
one that you provided
It works for me with packageVersion('tidyr') #‘1.1.2’. What is yours?