I have a data.frame like this :
SO <- data.frame(coiffure_IDF$SIREN, coiffure_IDF$L6_NORMALISEE )
coiffure_IDF.SIREN coiffure_IDF.L6_NORMALISEE
1 54805015 75008 PARIS
2 300086907 94210 ST MAUR DES FOSSES
3 300090453 94220 CHARENTON LE PONT
4 300209608 75007 PARIS
5 300570553 95880 ENGHIEN LES BAINS
6 301123626 75019 PARIS
7 301362349 92300 LEVALLOIS PERRET
I want to have this :
coiffure_IDF.SIREN codpos_norm ville
1 54805015 75008 PARIS
2 300086907 94210 ST MAUR DES FOSSES
3 300090453 94220 CHARENTON LE PONT
4 300209608 75007 PARIS
5 300570553 95880 ENGHIEN LES BAINS
6 301123626 75019 PARIS
7 301362349 92300 LEVALLOIS PERRET
so I used regex :
SO2<- SO %>% extract(col="coiffure_IDF.L6_NORMALISEE", into=c("codpos_norm", "ville"), regex="(\\d+)\\s+(\\S+)")
so I have the right column is "codpos_norm" but in "ville" in line 2 I just have "ST" in stead of "ST MAUR DES FOSSES". In line 3 just "CHARENTON", etc
so I tried to add some \\s+ and \\S+ in the regex but R told me that they are to many groups and that it has to have only 2 groups.
What could I do ?
regex="(\\d+)\\s+(.+)"?.+will extract any 1 or more chars. Or.*if empty values are expected.dput(SO)data.table::tstrplit()but since I have no data to work with i can't help you.