Extract a number from a string using regex

Question

I have a data.frame like this:

SO <- data.frame(coiffure_IDF$SIREN, coiffure_IDF$L6_NORMALISEE)

  coiffure_IDF.SIREN  coiffure_IDF.L6_NORMALISEE
1 54805015            75008 PARIS
2 300086907           94210 ST MAUR DES FOSSES
3 300090453           94220 CHARENTON LE PONT
4 300209608           75007 PARIS
5 300570553           95880 ENGHIEN LES BAINS
6 301123626           75019 PARIS
7 301362349           92300 LEVALLOIS PERRET
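For anyone who wants to reproduce this, the printed table can be rebuilt as a plain data frame. This is a sketch that assumes the values are exactly as printed and that the address column is character rather than factor; the column names simply mimic what data.frame() produced from coiffure_IDF$SIREN and coiffure_IDF$L6_NORMALISEE.

```r
# Rebuild the sample shown above (values copied by hand from the printout).
SO <- data.frame(
  coiffure_IDF.SIREN = c(54805015, 300086907, 300090453, 300209608,
                         300570553, 301123626, 301362349),
  coiffure_IDF.L6_NORMALISEE = c("75008 PARIS", "94210 ST MAUR DES FOSSES",
                                 "94220 CHARENTON LE PONT", "75007 PARIS",
                                 "95880 ENGHIEN LES BAINS", "75019 PARIS",
                                 "92300 LEVALLOIS PERRET"),
  stringsAsFactors = FALSE  # keep the addresses as character strings
)
str(SO)
```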

I want to get this:

  coiffure_IDF.SIREN  codpos_norm  ville
1 54805015            75008        PARIS
2 300086907           94210        ST MAUR DES FOSSES
3 300090453           94220        CHARENTON LE PONT
4 300209608           75007        PARIS
5 300570553           95880        ENGHIEN LES BAINS
6 301123626           75019        PARIS
7 301362349           92300        LEVALLOIS PERRET

So I used a regex:

SO2 <- SO %>% extract(col = "coiffure_IDF.L6_NORMALISEE", into = c("codpos_norm", "ville"), regex = "(\\d+)\\s+(\\S+)")

The "codpos_norm" column comes out right, but "ville" is truncated: in row 2 I only get "ST" instead of "ST MAUR DES FOSSES", in row 3 only "CHARENTON", and so on. I tried adding more \\s+ and \\S+ to the regex, but R complained that there were too many groups and that there must be exactly 2.

What can I do?
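The truncation described in the question can be reproduced with base R alone: \S+ matches a single run of non-whitespace characters, so the second group stops at the first space inside the town name, and adding more groups cannot help because town names have a varying number of words. A minimal sketch using regexec() and regmatches() on two of the sample addresses (the vector name addr is only for illustration):

```r
addr <- c("75008 PARIS", "94210 ST MAUR DES FOSSES")

# Same pattern as in the question: group 2 is one run of non-spaces
m <- regmatches(addr, regexec("(\\d+)\\s+(\\S+)", addr))

# Each element is c(full match, group 1, group 2); take group 2
sapply(m, `[`, 3)  # "PARIS" "ST"
```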

Do you mean you need regex="(\\d+)\\s+(.+)"? .+ matches any 1 or more chars; use .* if empty values are expected.
– Wiktor Stribiżew, Commented Aug 3, 2018 at 10:21
You can easily solve this with data.table::tstrsplit(), but since I have no data to work with I can't help you.
– Andre Elrico, Commented Aug 3, 2018 at 10:27
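Along the lines of this comment, here is a sketch with data.table::tstrsplit(), which splits each string and transposes the pieces into columns. Splitting on every space would give a different number of pieces per row, so this assumes the split should happen only at the whitespace directly after a digit (a Perl lookbehind, passed through to strsplit()). It would misbehave if a town name itself contained a digit followed by a space.

```r
library(data.table)

addr <- c("75008 PARIS", "94210 ST MAUR DES FOSSES", "94220 CHARENTON LE PONT")

# Split only at the whitespace that follows a digit; perl = TRUE is
# forwarded to strsplit() and enables the (?<=\\d) lookbehind.
parts <- tstrsplit(addr, "(?<=\\d)\\s+", perl = TRUE)

res <- data.frame(codpos_norm = parts[[1]], ville = parts[[2]],
                  stringsAsFactors = FALSE)
res
```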

Wiktor Stribiżew · Accepted Answer · 2018-08-03 10:24:50Z


You need to match the rest of the string in Group 2: the \S construct only matches non-whitespace chars, so (\S+) stops at the first space. Use .+ to match any 1+ chars up to the end of the string:

library(tidyr)  # provides extract() and re-exports %>%
SO2 <- SO %>% extract(col = "coiffure_IDF.L6_NORMALISEE", into = c("codpos_norm", "ville"), regex = "(\\d+)\\s+(.+)")
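A self-contained version of that call, run on three rows of the sample data typed in by hand (so treat it as an illustration, not the asker's exact data). The pattern still has exactly two groups, which is what extract() requires for a two-element into, and (.+) takes everything after the first run of whitespace:

```r
library(tidyr)  # provides extract() and re-exports %>%

SO <- data.frame(
  coiffure_IDF.SIREN = c(54805015, 300086907, 300090453),
  coiffure_IDF.L6_NORMALISEE = c("75008 PARIS", "94210 ST MAUR DES FOSSES",
                                 "94220 CHARENTON LE PONT"),
  stringsAsFactors = FALSE
)

# Group 1: the digits; group 2: everything after the whitespace
SO2 <- SO %>%
  extract(col = "coiffure_IDF.L6_NORMALISEE",
          into = c("codpos_norm", "ville"),
          regex = "(\\d+)\\s+(.+)")
SO2
```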

You may use .* instead to also allow an empty Group 2 (when there is no text after the whitespace).
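The difference between .+ and .* only shows up when nothing follows the whitespace. A small base R check with regexec() on a made-up edge case (a postcode followed by a space and no town, not taken from the question's data): with .+ the pattern does not match at all, which extract() reports as NA for that row, while .* matches and leaves Group 2 empty.

```r
x <- "75008 "  # hypothetical value: postcode, a space, no town name

# With .+ group 2 needs at least one character, so there is no match
regmatches(x, regexec("(\\d+)\\s+(.+)", x))[[1]]

# With .* group 2 may be empty, so the match succeeds
regmatches(x, regexec("(\\d+)\\s+(.*)", x))[[1]]
```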

answered Aug 3, 2018 at 10:24

Wiktor Stribiżew
