1

Using the Hive command regexp_extract I am trying to change the following strings from:

201703170455 to 2017-03-17:04:55

and from:

2017031704555675 to 2017-03-17:04:55.0010

I am doing this in sparklyr trying to use this code that works with gsub in R:

  newdf<-df%>%mutate(Time1 = regexp_extract(Time, "(....)(..)(..)(..)(..)", "\\1-\\2-\\3:\\4:\\5"))

and this code:

newdf<-df%>mutate(TimeTrans = regexp_extract("(....)(..)(..)(..)(..)(....)", "\\1-\\2-\\3:\\4:\\5.\\6"))

but does not work at all. Any suggestions of how to do this using regexp_extract?

1 Answer 1

5

Apache Spark uses Java regular expression dialect not R, and groups should be referenced with $. Furthermore regexp_replace is used to extract a single group by a numeric index.

You can use regexp_replace:

df <- data.frame(time = c("201703170455", "2017031704555675"))
sdf <- copy_to(sc, df)

sdf %>% 
  mutate(time1 = regexp_replace(
    time, "^(....)(..)(..)(..)(..)$", "$1-$2-$3 $4:$5" )) %>%
  mutate(time2 = regexp_replace(
    time, "^(....)(..)(..)(..)(..)(....)$", "$1-$2-$3 $4:$5.$6"))

Source:   query [2 x 3]
Database: spark connection master=local[8] app=sparklyr local=TRUE

# A tibble: 2 x 3
              time            time1                 time2
             <chr>            <chr>                 <chr>
1     201703170455 2017-03-17 04:55          201703170455
2 2017031704555675 2017031704555675 2017-03-17 04:55.5675
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.