
I need to match a pattern in a string. The string varies, so the pattern needs some flexibility.
What I need to do is extract the words that occur with "layout". They occur in 4 different forms:

1 word -- layout, eg: hsr layout

2 words -- layout, eg: golden garden layout

digit-word -- layout, eg: 19th layout

digit-word word -- layout, eg: 20th garden layout

As can be seen, the digits part needs to be optional, and a single regex must handle all four cases. Here's what I did:

import re
p = re.compile(r'(?:\d*)?\w+\s(?:\d*)?\w+l[ayout]*')
text = "opp when the 19th hsr layut towards"
q = re.findall(p,text)

I need 19th hsr layout from this string, but the above code returns an empty list. What is the problem with my code above?

Some string examples are:

str1 = " 25/4 16th june road ,watertank layout ,blr"  #extract watertank layout 
str2 = " jacob circle 16th rusthumbagh layout , 5th cross" #extract 16th rusthumbagh layout
str3 = " oberoi splendor garden blossoms layout , 5th main road"  #extract garden blossoms layout
str4 = " belvedia heights , 15th layout near Jaffrey gym" #extract 15th layout
  • What exactly do you mean by 'connected with "layout"'? Commented Mar 5, 2014 at 5:08
  • Words that occur with the word layout; the 4 conditions are given above. Commented Mar 5, 2014 at 5:08
  • Your example text says "layut", not "layout" Commented Mar 5, 2014 at 5:08
  • Well, yes. I accounted for it using l[ayout]* to cover any mistakes in the text. Commented Mar 5, 2014 at 5:09
  • Do you want this? r'(?:\w+\s+){1,2}layout' Commented Mar 5, 2014 at 5:10

2 Answers

Use r'(?:\w+\s+){1,2}layout' as I commented:

>>> import re
>>> p = re.compile(r'(?:\w+\s+){1,2}layout')
>>> p.findall(" 25/4 16th june road ,watertank layout ,blr")
['watertank layout']
>>> p.findall(" jacob circle 16th rusthumbagh layout , 5th cross")
['16th rusthumbagh layout']
>>> p.findall(" oberoi splendor garden blossoms layout , 5th main road")
['garden blossoms layout']
>>> p.findall(" belvedia heights , 15th layout near Jaffrey gym")
['15th layout']

{1,2} repeats the group (?:\w+\s+) one or two times, so at most 2 words are matched before layout.
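To make the difference concrete, here is a minimal sketch comparing the question's pattern with the one above on the question's own sample text. The `tolerant` pattern is a hypothetical variant, not part of the original answer: it keeps the question's `l[ayout]*` idea so that a misspelling such as "layut" still matches.

```python
import re

# Pattern from the question: \w+ must sit directly before the "l",
# so a "layout" that follows a space can never match.
original = re.compile(r'(?:\d*)?\w+\s(?:\d*)?\w+l[ayout]*')

# Pattern from this answer: one or two words, then the literal "layout".
strict = re.compile(r'(?:\w+\s+){1,2}layout')

# Hypothetical variant (an assumption, not from the answer): same structure,
# but reuses the question's l[ayout]* to tolerate misspellings.
tolerant = re.compile(r'(?:\w+\s+){1,2}l[ayout]*')

text = "opp when the 19th hsr layut towards"
print(original.findall(text))  # []
print(strict.findall(text))    # [] because the text says "layut", not "layout"
print(tolerant.findall(text))  # ['19th hsr layut']

# With three words before "layout", only the last two are kept.
three_words = " oberoi splendor garden blossoms layout , 5th main road"
print(strict.findall(three_words))  # ['garden blossoms layout']
```

Note that `l[ayout]*` is very loose: it accepts any word that merely starts with "l" (for example, "main road left" would yield "main road l"), so the tolerant variant is only reasonable if such words are unlikely in the input.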


1 Comment

@Sword, \w matches a word character. Word characters include letters, digits and _.
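A quick way to check this is to tokenize a short string with `\w+`. The sample string below is made up for illustration:

```python
import re

# \w+ groups letters, digits and underscores; "/" and spaces split tokens.
tokens = re.findall(r'\w+', "19th hsr_layout 25/4")
print(tokens)  # ['19th', 'hsr_layout', '25', '4']
```

This is why no separate `\d` handling is needed for "19th": the digits and the letters are all word characters, so `\w+` consumes them as one word.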

This seems to work:

import re

l = [" 25/4 16th june road ,watertank layout ,blr",
" jacob circle, 16th rusthumbagh layout , 5th cross",
" oberoi splendor , garden blossoms layout , 5th main road",
" belvedia heights , 15th layout near Jaffrey gym",]

for ll in l:
    print(re.search(r',([\w\s]+)layout', ll).groups())

Output:

('watertank ',)
(' 16th rusthumbagh ',)
(' garden blossoms ',)
(' 15th ',)
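The captured groups above keep their surrounding whitespace, and `re.search` returns `None` when nothing matches, which would make `.groups()` raise an `AttributeError`. A minimal sketch that handles both follows; the helper name `words_before_layout` is hypothetical:

```python
import re

def words_before_layout(text):
    """Return the words between a comma and 'layout', stripped, or None if absent."""
    match = re.search(r',([\w\s]+)layout', text)
    if match is None:
        return None  # no comma-delimited "... layout" segment found
    return match.group(1).strip()

print(words_before_layout(" 25/4 16th june road ,watertank layout ,blr"))      # watertank
print(words_before_layout(" belvedia heights , 15th layout near Jaffrey gym")) # 15th
print(words_before_layout(" jacob circle 16th rusthumbagh layout , 5th cross")) # None
```

The last call uses the string exactly as given in the question, without a comma before "16th". This approach relies on a comma preceding the words of interest, so it returns `None` there.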
