
Following this question, I was thinking of adding one more level of hierarchy to the string. For example, this is my string:

sometext
somemore    text here

some  other text

              course: course1

some details
TestName: test1
some other details
Id              Name                marks
____________________________________________________
1               student1            65
2               student2            75
3               MyName              69
4               student4            43

some details
TestName: test3
some other details
Id              Name                marks
____________________________________________________
1               student1            23
3               MyName              63
4               student4            64


              course: course2

some details
TestName: test2
some other details
Id              Name                marks
____________________________________________________
1               student1            84
2               student3            73

some details
TestName: test5
some other details
Id              Name                marks
____________________________________________________
1               MyName              84
2               student2            73


              course: course4

some details
TestName: test1
some other details
Id              Name                marks
____________________________________________________
1               student1            58
2               student3            89

some details
TestName: test2
some other details
Id              Name                marks
____________________________________________________
1               student1            97
3               MyName              60
8               student6            82

I want to get the details of MyName, with output like (course1,test1,69),(course1,test3,63),(course2,test5,84),(course4,test2,60) or something similar.

I was unable to do it in a single step, and hence came up with this:

import re
eachcourse = re.split(r'course: \w+',string1)
courselist = re.findall(r'course: (\w+)',string1)
li =[]
for i,course in enumerate(courselist):
    match = re.findall(r".*?TestName: (\w+)(?:(?!\bTestName\b).)*MyName\s+(\d+).*?",eachcourse[i+1],re.DOTALL)
    li.append((course,match))
print(li)

which gives me

[('course1', [('test1', '69'), ('test3', '63')]), ('course2', [('test5', '84')]), ('course4', [('test2', '60')])]

Is there a better and cleaner way?

Thanks.

1 Answer

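import re

# string1 is the input string from the question.
courses = re.findall(r"\bcourse: (\w+)(.*?)(?=(?:\bcourse:|$))", string1, flags=re.DOTALL)

# For each (course name, course body) pair, collect the (test, marks) pairs for MyName.
print([[name] + re.findall(r"TestName: (\w+)(?:(?!\bTestName\b).)*MyName\s*(\d+)", body, flags=re.DOTALL) for name, body in courses])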

You can try this. Though the format is not exactly the same, it is usable.
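If the flat (course, test, marks) format from the question is needed, the same two-step idea can be wrapped in a small function. This is only a sketch: find_marks and the shortened sample string are hypothetical names and data made up for illustration, and it assumes the student name starts with a word character and appears at most once per test.

```python
import re

def find_marks(text, name):
    """Return (course, test, marks) tuples for one student (hypothetical helper)."""
    # Step 1: split the text into (course name, course body) pairs.
    courses = re.findall(r"\bcourse: (\w+)(.*?)(?=\bcourse:|\Z)", text, flags=re.DOTALL)
    # Step 2: build the per-test pattern from the name; re.escape keeps
    # any special characters in the name from being read as regex syntax.
    test_re = re.compile(
        r"TestName: (\w+)(?:(?!\bTestName\b).)*?\b" + re.escape(name) + r"\s+(\d+)",
        flags=re.DOTALL,
    )
    # Flatten into one tuple per (course, test) in which the name occurs.
    return [(course, test, int(marks))
            for course, body in courses
            for test, marks in test_re.findall(body)]

# Shortened, made-up sample in the same layout as the question's string.
sample = """
  course: course1
TestName: test1
Id  Name      marks
1   student1  65
3   MyName    69
TestName: test3
1   student1  23
3   MyName    63

  course: course2
TestName: test2
1   student1  84
TestName: test5
1   MyName    84
"""

print(find_marks(sample, "MyName"))
```

Because the name is passed in as an argument, the same function can be called again for a second student.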


5 Comments

Wonderful!! Thanks a lot! Just one more doubt: is this approach preferable when I am using very large strings, say 25 pages of data? I notice that the time it takes to return a result depends on the length of the string, of course, and also on the number of occurrences of MyName in it. It varies from 0.05 secs to 50 secs: for example, 18 occurrences in a 25-page string take 0.05 secs, while 1 occurrence takes 50.2 secs. I just need advice on whether this is the best possible way.
@Deepa this should work, but regex generally does not give good performance. The best method could be to parse it with csv or some other parser :)
Oh ok, thanks! Just one more clarification, please. Supposing I need to retrieve the details of, say, two students, then I need to repeat this for the second name, right?
@Deepa yeah, right. You can store the name in a variable and build the regex on the fly.
I am really grateful for all your help. Thanks a lot! :)
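
For the large-input and several-students cases raised in these comments, one possible alternative is to skip the regex and walk the lines once, remembering the current course and test. The sketch below is not from the answer: scan_marks and the sample data are hypothetical, and it assumes names contain no whitespace and every table row has exactly three columns (Id, Name, marks).

```python
def scan_marks(text, names):
    """Single pass over the lines (hypothetical helper, not from the answer)."""
    wanted = set(names)
    course = test = None
    results = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("course:"):
            # A new course starts; forget the test from the previous course.
            course = stripped.split(":", 1)[1].strip()
            test = None
        elif stripped.startswith("TestName:"):
            test = stripped.split(":", 1)[1].strip()
        else:
            fields = stripped.split()
            # Assumed row layout: Id, Name, marks.
            if len(fields) == 3 and fields[1] in wanted and fields[2].isdigit():
                results.append((fields[1], course, test, int(fields[2])))
    return results

# Shortened, made-up sample in the same layout as the question's string.
sample = """
  course: course1
some details
TestName: test1
Id  Name      marks
1   student1  65
3   MyName    69

  course: course2
TestName: test5
1   MyName    84
2   student2  73
"""

print(scan_marks(sample, ["MyName", "student1"]))
```

Each line is looked at once, so the work grows with the length of the string rather than with how often a name occurs, and all requested names are collected in the same pass.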
