How to create dictionary from a string using a regex to get groups? [closed]

Question

Closed. This question needs to be more focused. It is not currently accepting answers.


Closed 5 years ago.


I have a complex task that I want to accomplish: from a string, I want to be able to classify words into particular categories.

s = 'Age 63 years, female 35%; race or ethnic group: White 68%, Black 5%, Asian 19%, other 8%'
d = function(s)  # the function I am trying to write
print(d)
# desired output:
# {"age": "63 years",
#  "gender": "female 35%",
#  "race": "White 68%, Black 5%, Asian 19%, other 8%"}

I must note that not all strings are in the same format. There is a finite set of categories overall (age, gender, race, region), but some strings only contain one or two of the four.

Here are some other toy strings:

s2 = 'Age 71 years, male 64%'
s3 = '''Age 64 years, female 21%,
Race or ethnicity: White 66%, Black 5%, Asian 18%, other 11%
Region: N. America 7%, Latin America 17%, W. Europe or other 24%, central Europe 33%, Asia-Pacific 18%'''

As you can see, there are some patterns:

Age has no ':' after its label.
Gender is given as either female or male.
Race and region labels are followed by a ':'.

I am interested in collecting all the information corresponding to each category, as shown in my toy example with the race category.

What I need:

Write a regex pattern with the appropriate capturing groups to obtain the results.
Transform the matches into a dictionary: I have seen a solution that uses the .groupdict() method to do so.
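Since .groupdict() came up, here is a minimal sketch of one way it could fit, assuming the category labels in the toy strings ("Age", "race"/"ethnic", "Region") are representative. PATTERN and to_dict are hypothetical names. Rather than one long pattern with an optional group per category (a lazy wildcard in front of an optional group tends to leave that group empty), it joins one alternative per category with |, walks the string with re.finditer(), and merges each match's groupdict(), dropping the groups that are None.

```python
import re

# Hypothetical combined pattern: one alternative per category, each with a
# named group. The labels are assumed from the toy strings in this question only.
PATTERN = re.compile(
    r'\bAge\s+(?P<age>\d+\s+years?)'                   # no ':' after "Age"
    r'|\b(?P<gender>(?:female|male)\s+\d+\s*%)'        # gender is female or male
    r'|\b(?:race|ethnic)[^:\n]*:\s*(?P<race>[^;\n]+)'  # label followed by ':'
    r'|\bRegion\s*:\s*(?P<region>[^;\n]+)',            # label followed by ':'
    re.IGNORECASE,
)


def to_dict(s):
    """Merge the groupdict() of every match, keeping only groups that matched."""
    result = {}
    for m in PATTERN.finditer(s):
        # Each match fills exactly one named group; the others are None.
        result.update({k: v.strip() for k, v in m.groupdict().items() if v is not None})
    return result


s = 'Age 63 years, female 35%; race or ethnic group: White 68%, Black 5%, Asian 19%, other 8%'
print(to_dict(s))
# {'age': '63 years', 'gender': 'female 35%', 'race': 'White 68%, Black 5%, Asian 19%, other 8%'}
```

Categories that do not appear in a string simply never produce a match, so their keys are left out of the result.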

I am having trouble writing a regex pattern that returns the groups described above.

I have seen an interesting solution to a similar problem, "python regex: create dictionary from string", but I am having trouble applying it to mine.

It seems like you are looking for a general solution based on one example string. Are all strings in this exact form? – Mark, commented Oct 24, 2020 at 18:40
Please clarify on which conditions the separation takes place. (For age, must the word "age" appear? Are the only possible genders male or female? Is the order always as in the example?) Second, have you tried anything? Please share it. – Yossi Levi, commented Oct 24, 2020 at 18:40
Hi @MarkMeyer and @YossiLevi! I updated my question to address your questions. Thanks in advance! – Eric Yamga, commented Oct 24, 2020 at 18:52
You can take the first pattern, Age.\d{1,}.\w{1,}, find it, remove it from the string, then process the remaining substring to get the next pattern: find female 35% using ^\w+\s{1,}\d{1,}%, remove it, and so on. – dmitryro, commented Oct 24, 2020 at 19:01
To find the first one, you can use ^\w+\s{0,}\d{1,}\s{0,}\w+, remove it from the string, then apply ^\w+\s{1,}\d{1,}% to the remaining substring, and so on. – dmitryro, commented Oct 24, 2020 at 19:07
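The remove-and-continue idea from the comments could be sketched roughly like this. consume and STEPS are hypothetical names; the race and region steps are a guess at the "and so on", and the leading [\s,;]* in each pattern is an added assumption so that the separator left behind by the previous removal (", " or ";") does not block the next match. re.match() anchors at the start of the string, which plays the role of the ^ in the suggested patterns; the age pattern is also adjusted to capture only what follows "Age".

```python
import re

# (category, pattern) pairs adapted from the comments. re.match anchors at the
# start like '^'; the leading [\s,;]* is an assumption that skips the separator
# left over after the previous removal.
STEPS = [
    ('age', re.compile(r'[\s,;]*Age.(\d{1,}.\w{1,})')),
    ('gender', re.compile(r'[\s,;]*(\w+\s{1,}\d{1,}%)')),
    ('race', re.compile(r'[\s,;]*(?:race|ethnic)[^:]*:\s*([^;\n]+)', re.I)),
    ('region', re.compile(r'[\s,;]*Region:\s*([^;\n]+)', re.I)),
]


def consume(s):
    """Match each pattern at the start of what is left, then cut it off."""
    result = {}
    rest = s
    for name, pattern in STEPS:
        m = pattern.match(rest)
        if m:
            result[name] = m.group(1)
            rest = rest[m.end():]  # continue on the remaining substring
    return result


print(consume('Age 71 years, male 64%'))
# {'age': '71 years', 'gender': 'male 64%'}
```

Because each step only looks at the front of what is left, the categories must appear in the order of STEPS; a gender that came after the race section, for example, would be missed.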

stunner.agk · Accepted Answer · 2020-10-25 02:18:57Z


Instead of looking for one golden regex that handles every case, you could pass your input string through a set of regexes, each trying to extract one of the categories you mentioned in the question. Something like:

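import re

d = {}
ageMatch = re.match(r'Age\s+(\d+)\s+years?', s, re.I)
if ageMatch:
    d['age'] = ageMatch.group(1) + ' years'  # group(1) is the number of years

# re.search, not re.match: re.match only matches at the start of s, where "Age" is
genderMatch = re.search(r'\b(male|female)\s+(\d+)\s*%', s, re.I)
if genderMatch:
    # group(1) is the gender, group(2) the percentage
    d['gender'] = genderMatch.group(1) + ' ' + genderMatch.group(2) + '%'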
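Extending this idea to all four categories might look like the following sketch. CATEGORY_PATTERNS and extract_categories are hypothetical names, and the label wording ("race"/"ethnic", "Region") is assumed from the toy strings only. Each pattern is searched independently with re.search(), so the patterns do not depend on one another, and a missing category simply leaves its key out of the dict.

```python
import re

# One small regex per category (hypothetical names; labels assumed from the toy strings).
CATEGORY_PATTERNS = {
    'age': re.compile(r'\bAge\s+(\d+\s+years?)', re.I),
    # (?:fe)?male with a leading \b, so "male" is never matched inside "female"
    'gender': re.compile(r'\b((?:fe)?male\s+\d+\s*%)', re.I),
    # race and region: a label, then ':', then everything up to ';' or end of line
    'race': re.compile(r'\b(?:race|ethnic)[^:\n]*:\s*([^;\n]+)', re.I),
    'region': re.compile(r'\bRegion\s*:\s*([^;\n]+)', re.I),
}


def extract_categories(s):
    """Return a dict containing only the categories found in s."""
    d = {}
    for name, pattern in CATEGORY_PATTERNS.items():
        m = pattern.search(s)
        if m:
            d[name] = m.group(1).strip()
    return d


print(extract_categories('Age 71 years, male 64%'))
# {'age': '71 years', 'gender': 'male 64%'}
```

Adding a new category is then a matter of adding one entry to the dict, without touching the other patterns.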


Thank you for the tip! That is actually a great alternative!! – Eric Yamga, over a year ago
