0

I have lines that look like this:

 01:04:43.064 [12439] <2> xyz
 01:04:43.067 [12439] <2> a lmn
 01:04:43.068 [12439] <4> j klm
 x_times_wait to <3000>
 01:04:43.068 [12439] <4> j klm
 enter_object <5000> main k

I want a regex that extracts only the values after the angle brackets, and only for lines that start with a timestamp.

This is what I have tried, assuming that these lines are in a data frame called nn:

 library(stringr)
 split <- str_split_fixed(nn[,1], ">", 2)
 split2 <- data.frame(split[,2])

The problem is that split2 gives:

   xyz
   a lmn
   j klm

   j klm
   main k

How can I make sure that the empty line and "main k" are not returned?

2
  • Solved using the stringr package: aa <- str_extract(as.character(nn[,1]), "[0-9][0-9]:.*"), followed by str_split_fixed. Commented Dec 18, 2014 at 21:25
  • Thanks everyone for the awesome answers! Commented Dec 18, 2014 at 21:33

4 Answers

3
\d+(?::\d+){2}\.\d+\s+\[[^\]]+\]\s+<\d+>(.+)$

Instead of splitting, try matching the whole line and grabbing capture group 1. See the demo:

https://regex101.com/r/vN3sH3/16

or

Split on the lookbehind (?<=<\d>) and take the second column, as in split2.
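
As a rough sketch of the "match and grab group 1" idea in base R, something like the following might work. The `^` anchor and the `\\s*` before the capture group are my additions (assumptions, not part of the regex above), so that only lines starting with a timestamp match and the leading space is dropped. The names `x`, `pattern`, `m` and `values` are purely illustrative.

```r
# Sample lines from the question.
x <- c("01:04:43.064 [12439] <2> xyz", "01:04:43.067 [12439] <2> a lmn",
       "01:04:43.068 [12439] <4> j klm", "x_times_wait to <3000>",
       "01:04:43.068 [12439] <4> j klm", "enter_object <5000> main k")

# The regex above, escaped for an R string. Assumed additions: a leading ^
# and \\s* before the capture group.
pattern <- "^\\d+(?::\\d+){2}\\.\\d+\\s+\\[[^\\]]+\\]\\s+<\\d+>\\s*(.+)$"

# regexec() records capture groups; regmatches() returns character(0)
# for lines that do not match.
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Keep only matching lines and take group 1 (element 2; element 1 is the full match).
values <- vapply(m[lengths(m) > 0], function(g) g[2], character(1))
values
```

Lines without a timestamp never match, so they are dropped instead of producing empty strings.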

2

If a timestamp is defined as 1 or more digits followed by a :, followed by 1 or more digits and another : and then 1 or more digits, then perhaps this method would work for you.

x <- c("01:04:43.064 [12439] <2> xyz", "01:04:43.067 [12439] <2> a lmn",   
       "01:04:43.068 [12439] <4> j klm", "x_times_wait to <3000>",  
       "01:04:43.068 [12439] <4> j klm", "enter_object <5000> main k")

sub(".*> ", "", x[grepl("\\d+:\\d+:\\d+", x)])
# [1] "xyz"   "a lmn" "j klm" "j klm"

This drops all the non-timestamp elements first, then extracts the value after the > from each remaining element.
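
The same filter-then-strip idea could be applied to the data frame from the question. This is only a sketch: the column name `line`, the stricter anchored timestamp pattern, and the non-greedy `^.*?>` are assumptions of mine, not part of the answer above.

```r
# Hypothetical stand-in for the question's data frame nn.
nn <- data.frame(
  line = c("01:04:43.064 [12439] <2> xyz", "01:04:43.067 [12439] <2> a lmn",
           "01:04:43.068 [12439] <4> j klm", "x_times_wait to <3000>",
           "01:04:43.068 [12439] <4> j klm", "enter_object <5000> main k"),
  stringsAsFactors = FALSE
)

# TRUE for rows that begin with an hh:mm:ss.mmm timestamp (anchored with ^).
is_ts <- grepl("^\\d{2}:\\d{2}:\\d{2}\\.\\d{3}", nn[, 1])

# Strip everything up to and including the first ">" plus any following spaces.
split2 <- data.frame(
  value = sub("^.*?>\\s*", "", nn[is_ts, 1], perl = TRUE),
  stringsAsFactors = FALSE
)
split2$value
```

The non-greedy `.*?` stops at the first `>`, so a value that itself contains `>` would be kept whole; the greedy `.*> ` in the answer would cut at the last one instead.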

0

Here's an approach in base R:

The regex:

^(\\d{2}:){2}\\d{2}\\.\\d{3}.*>\\s*\\K.+

You can use it with gregexpr:

unlist(regmatches(vec, gregexpr("^(\\d{2}:){2}\\d{2}\\.\\d{3}.*>\\s*\\K.+", 
                                vec, perl = TRUE)))
# [1] "xyz"   "a lmn" "j klm" "j klm"

where vec is the vector containing your strings.
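
Since each line holds at most one match, `regexpr` may be enough here. The sketch below assumes a `vec` built from the sample lines in the question. With `regexpr`, `regmatches` returns a plain character vector and silently drops non-matching elements, so `unlist` is not needed.

```r
# Assumed sample input, copied from the question.
vec <- c("01:04:43.064 [12439] <2> xyz", "01:04:43.067 [12439] <2> a lmn",
         "01:04:43.068 [12439] <4> j klm", "x_times_wait to <3000>",
         "01:04:43.068 [12439] <4> j klm", "enter_object <5000> main k")

# \\K (PCRE only, hence perl = TRUE) discards everything matched so far,
# so only the text after the last ">" and any following spaces is reported.
pat <- "^(\\d{2}:){2}\\d{2}\\.\\d{3}.*>\\s*\\K.+"

res <- regmatches(vec, regexpr(pat, vec, perl = TRUE))
res
```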

0

Using rex may make this type of task a little simpler.

string <- "01:04:43.064 [12439] <2> xyz
01:04:43.067 [12439] <2> a lmn
01:04:43.068 [12439] <4> j klm
x_times_wait to <3000>
01:04:43.068 [12439] <4> j klm
enter_object <5000> main k"

library(rex)

timestamp <- rex(n(digit, 2), ":", n(digit, 2), ":", n(digit, 2), ".", n(digit, 3))

re <- rex(timestamp, space,
          "[", digits, "]", space,
          "<", digits, ">", space,
          capture(anything))

re_matches(string, re, global = TRUE)

#> [[1]]
#>       1
#> 1   xyz
#> 2 a lmn
#> 3 j klm
#> 4 j klm
