I have a countries.txt file that contains the following sample text:
[Country "Kenya"]\n[CapitalCity "Nairobi"]\n\n
[Country "Uganda"]\n[CapitalCity "Kampala"]\n\n
[Country "Tanzania"]\n[CapitalCity "Dodoma"]\n\n
Each country can have up to 20 attributes. For simplicity, I have only included Country and CapitalCity. I need a regex that works in Python and returns the following for the sample data above:
a) n matches, in the above case n=3
b) Each match should have m groups, in this case m=2: Country and CapitalCity
I have read https://www.regular-expressions.info/captureall.html but cannot get it to work for my use case.
I have tried this:
(\[([A-Za-z]+)\s\"([^\"]*)\"\]\\n\\n)+
on regex101 (https://regex101.com/r/cujIDd/1), but it does not give me the Country.
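To make the symptom reproducible outside regex101, here is a minimal sketch in Python. It assumes the `\n` sequences in the sample are literal backslash-n characters (which is what `\\n` in the pattern matches); the names `attempted` and `relaxed` are only illustrative. It shows two separate effects: the pattern above requires `\n\n` after every attribute, so the Country line never matches, and even with that relaxed, a repeated capture group keeps only its last iteration.

```python
import re

# One record from the sample, with "\n" as literal backslash-n text (assumption).
sample = r'[Country "Kenya"]\n[CapitalCity "Nairobi"]\n\n'

# The pattern from the question, unchanged.
attempted = re.compile(r'(\[([A-Za-z]+)\s\"([^\"]*)\"\]\\n\\n)+')
m1 = attempted.search(sample)
# The Country line is followed by a single \n, so it is skipped entirely.
print(m1.group(2), m1.group(3))  # CapitalCity Nairobi

# Relaxed variant: allow one or more \n after each attribute.
relaxed = re.compile(r'(?:\[([A-Za-z]+)\s"([^"]*)"\](?:\\n)+)+')
m2 = relaxed.match(sample)
# The whole record now matches, but the groups hold only the last repetition.
print(m2.groups())  # ('CapitalCity', 'Nairobi')
```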
EDIT: Expected input and output
Example 1: input
[Country "Kenya"]\n[CapitalCity "Nairobi"]\n\n
[Country "Uganda"]\n[CapitalCity "Kampala"]\n\n
[Country "Tanzania"]\n[CapitalCity "Dodoma"]\n\n
expected output
matches: 3
match 1: Country: Kenya
CapitalCity: Nairobi
match 2: Country: Uganda
CapitalCity: Kampala
match 3: Country: Tanzania
CapitalCity: Dodoma
Example 2: input
[Country "Kenya"]\n[CapitalCity "Nairobi"]\n[President "Kenyatta"]\n\n
[Country "Uganda"]\n[CapitalCity "Kampala"]\n[President "Museveni"]\n\n
[Country "Tanzania"]\n[CapitalCity "Dodoma"]\n[President "Magufuli"]\n\n
expected output
matches: 3
match 1: Country: Kenya
CapitalCity: Nairobi
President: Kenyatta
match 2: Country: Uganda
CapitalCity: Kampala
President: Museveni
match 3: Country: Tanzania
CapitalCity: Dodoma
President: Magufuli
Example 3: input
[Country "Kenya"]\n[CapitalCity "Nairobi"]\n[President "Kenyatta"]\n[Continent "Africa"]\n\n
[Country "Uganda"]\n[CapitalCity "Kampala"]\n[President "Museveni"]\n[Continent "Africa"]\n\n
[Country "Tanzania"]\n[CapitalCity "Dodoma"]\n[President "Magufuli"]\n[Continent "Africa"]\n\n
expected output
matches: 3
match 1: Country: Kenya
CapitalCity: Nairobi
President: Kenyatta
Continent: Africa
match 2: Country: Uganda
CapitalCity: Kampala
President: Museveni
Continent: Africa
match 3: Country: Tanzania
CapitalCity: Dodoma
President: Magufuli
Continent: Africa
You get the flow
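For reference, the expected output above can be produced with a two-step approach rather than a single pattern. This is only a sketch under assumptions: `parse_countries`, `RECORD`, `ATTR` and `SEP` are hypothetical names, and the separator is accepted either as literal `\n` text or as a real newline, since the sample does not make clear which one the file contains. The first regex finds each record (n matches), the second pulls the name/value pairs out of that record (m groups).

```python
import re

# Separator: literal backslash-n text or a real newline (assumption, see above).
SEP = r'(?:\\n|\n)'
# One record: one or more [Name "value"] attributes, each followed by a
# separator, then one extra separator that closes the record.
RECORD = re.compile(r'(?:\[[A-Za-z]+\s"[^"]*"\]' + SEP + r')+' + SEP)
# One attribute inside a record.
ATTR = re.compile(r'\[([A-Za-z]+)\s"([^"]*)"\]')

def parse_countries(text):
    """Return one dict per record, mapping attribute name to value."""
    return [dict(ATTR.findall(m.group(0))) for m in RECORD.finditer(text)]

sample = "\n".join([
    r'[Country "Kenya"]\n[CapitalCity "Nairobi"]\n\n',
    r'[Country "Uganda"]\n[CapitalCity "Kampala"]\n\n',
    r'[Country "Tanzania"]\n[CapitalCity "Dodoma"]\n\n',
])

records = parse_countries(sample)
print("matches:", len(records))  # matches: 3
for i, record in enumerate(records, start=1):
    print("match", i, record)
```

Because the attribute names are captured rather than hard-coded, the same two patterns should also cover Example 2 and Example 3 without changes.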
... \n in your capture group.
1) What does "n matches" mean here? 2) Does your input file literally contain the string \n? 3) Add the complete expected output to the question; it will give a better understanding of your question.