
I have a data.frame like this: SO <- data.frame(coiffure_IDF$SIREN, coiffure_IDF$L6_NORMALISEE)

  coiffure_IDF.SIREN coiffure_IDF.L6_NORMALISEE
1 54805015           75008 PARIS
2 300086907          94210 ST MAUR DES FOSSES
3 300090453          94220 CHARENTON LE PONT
4 300209608          75007 PARIS
5 300570553          95880 ENGHIEN LES BAINS
6 301123626          75019 PARIS
7 301362349          92300 LEVALLOIS PERRET

I want to get this:

  coiffure_IDF.SIREN codpos_norm ville
1 54805015           75008       PARIS
2 300086907          94210       ST MAUR DES FOSSES
3 300090453          94220       CHARENTON LE PONT
4 300209608          75007       PARIS
5 300570553          95880       ENGHIEN LES BAINS
6 301123626          75019       PARIS
7 301362349          92300       LEVALLOIS PERRET

So I used a regex with extract():

SO2 <- SO %>% extract(col = "coiffure_IDF.L6_NORMALISEE", into = c("codpos_norm", "ville"), regex = "(\\d+)\\s+(\\S+)")

The "codpos_norm" column comes out right, but "ville" does not: in row 2 I only get "ST" instead of "ST MAUR DES FOSSES", in row 3 only "CHARENTON", and so on. I tried adding more \\s+ and \\S+ to the regex, but R told me there were too many groups and that the pattern must have only 2 groups.
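A quick base-R way to see both symptoms, using a few of the addresses from the question (the vector addr is only an illustrative sample). regexec() returns the full match plus one entry per capture group, so element 3 is group 2. With (\\S+) that group stops at the first space, and adding another \\s+(\\S+) creates a third group, which is why extract() complains when into has only two names. The extra group also makes one-word cities such as "PARIS" stop matching.

```r
# Illustrative sample of the address column (values copied from the question)
addr <- c("75008 PARIS", "94210 ST MAUR DES FOSSES", "94220 CHARENTON LE PONT")

# Pattern from the question: (\S+) captures only up to the first whitespace
m <- regmatches(addr, regexec("(\\d+)\\s+(\\S+)", addr))
ville <- vapply(m, `[`, character(1), 3)  # element 3 = capture group 2
print(ville)

# Adding another \s+(\S+) adds a third capture group;
# strings with a one-word city no longer match at all (character(0))
m3 <- regmatches(addr, regexec("(\\d+)\\s+(\\S+)\\s+(\\S+)", addr))
print(lengths(m3))  # full match + 3 groups = 4 where it matches, 0 otherwise
```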

What can I do?

  • Do you mean you need regex="(\\d+)\\s+(.+)"? .+ will extract any 1 or more chars. Or .* if empty values are expected. Commented Aug 3, 2018 at 10:21
  • Yes, it works, I didn't know about it. Commented Aug 3, 2018 at 10:23
  • Post your data as the output of dput(SO). Commented Aug 3, 2018 at 10:25
  • You can easily solve this with data.table::tstrsplit(), but since I have no data to work with, I can't help you. Commented Aug 3, 2018 at 10:27
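The split-based idea from the last comment can be sketched in base R without data.table (this is not tstrsplit() itself, just the same idea). Splitting on every space would break "ST MAUR DES FOSSES" into pieces, so the sketch first replaces only the first run of whitespace with a tab (sub() changes only the first match) and then splits on that tab. It assumes the addresses never contain a tab character; addr is an illustrative sample.

```r
# Illustrative sample of the address column (values copied from the question)
addr <- c("75008 PARIS", "94210 ST MAUR DES FOSSES", "94220 CHARENTON LE PONT")

# Turn only the FIRST whitespace run into a tab, then split on the tab.
# Assumption: the data itself contains no tab characters.
parts <- strsplit(sub("\\s+", "\t", addr), "\t", fixed = TRUE)

codpos <- vapply(parts, `[`, character(1), 1)  # postcode
ville  <- vapply(parts, `[`, character(1), 2)  # everything after it
print(data.frame(codpos, ville))
```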

1 Answer


You need to match the rest of the string in Group 2. The \S construct only matches non-whitespace chars, so it stops at the first space. Use .+ to match one or more of any chars up to the end of the string:

library(tidyr)  # extract() comes from tidyr, which also re-exports %>%
SO2 <- SO %>% extract(col = "coiffure_IDF.L6_NORMALISEE", into = c("codpos_norm", "ville"), regex = "(\\d+)\\s+(.+)")
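If tidyr is not available, the same pattern can be applied with base R's sub(), pulling out one capture group at a time. This is a swapped-in technique, not what extract() does internally, and the SO data frame below is a hypothetical reconstruction of three rows from the question (column names as printed there). The pattern is anchored with ^ and $ so that sub() replaces the whole string with the chosen group.

```r
# Hypothetical reconstruction of part of the question's data
SO <- data.frame(
  coiffure_IDF.SIREN = c(54805015, 300086907, 300090453),
  coiffure_IDF.L6_NORMALISEE = c("75008 PARIS", "94210 ST MAUR DES FOSSES",
                                 "94220 CHARENTON LE PONT"),
  stringsAsFactors = FALSE
)

pattern <- "^(\\d+)\\s+(.+)$"  # group 1 = postcode, group 2 = rest of the string
addr <- SO$coiffure_IDF.L6_NORMALISEE

# Replace the whole matched string with a single captured group
SO2 <- data.frame(
  coiffure_IDF.SIREN = SO$coiffure_IDF.SIREN,
  codpos_norm = sub(pattern, "\\1", addr),
  ville       = sub(pattern, "\\2", addr),
  stringsAsFactors = FALSE
)
print(SO2)
```

One difference to keep in mind: for a row that does not match the pattern, sub() leaves the original string unchanged in both columns, so such rows would need separate handling.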

Use .* instead if the text after the whitespace may be missing: unlike .+, it also matches an empty string (for example when the value is a postcode followed only by whitespace).
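A small base-R check of the .+ versus .* difference, on a made-up value "75008 " (postcode followed only by a space) that is not in the question's data. With .+ the pattern needs at least one character after the whitespace, so that string does not match at all; with .* it matches and group 2 is the empty string.

```r
# "75008 " is a made-up edge case: postcode followed only by whitespace
x <- c("75008 PARIS", "75008 ")

plus <- regmatches(x, regexec("(\\d+)\\s+(.+)", x))  # needs >= 1 char after the space
star <- regmatches(x, regexec("(\\d+)\\s+(.*)", x))  # allows an empty group 2

print(lengths(plus))  # second string has no match, so character(0)
print(star[[2]])      # full match, "75008", ""
```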
