
I'm writing a Python function that takes a chunk of text, parsed from a text file using f.readlines, and splits it into a list. The text contains dividers, and I want to split it at exactly these locations. Below is an example of the text file in question.

@model:2.4.0=Skeleton "Skeleton"
@compartments
 Cell=1.0 "Cell"
@species
 Cell:[A]=100.0 "A"
 Cell:[B]=1.0 "B"
 Cell:[C]=0.0 "C"
 Cell:[D]=0.0 "D"
@parameters
kcat=4000
km = 146
v2_k = 88
@reactions
@r=v1 "v1"
 A -> C : B
 Cell * kcat * B * A / (km + A) 
@r=v2 "v2"
 C -> C+D
 Cell * v2_k * C

My desired output is a Python dictionary that has the divider names as keys and all the content between each divider and the next as values. For example, the first element of the sections dictionary should be:

sections['@model']=:2.4.0=Skeleton "Skeleton"

Current Code

def split_sections(SBshorthand_file):
    '''
    Takes a SBshorthand file and returns a dictionary of each of the sections. 
    Keys of the dictionary are the dividers.
    Values of dictionary are the content between dividers. 
    '''
    SBfile=parse_SBshorthand_read(SBshorthand_file) #simple parsing function. uses f.read()
    dividers=["@model", "@units", "@compartments", "@species", "@parameters", "@rules", "@reactions", "@events"]
    sections={}
    for i in  dividers:
        pattern=re.compile(i)
        if re.findall(pattern,SBfile) == []:
            pass
#            print 'Section \'{}\' not present in {}'.format(i,SBshorthand_file)
        else:
            SBfile2=re.sub(pattern,'\n'+i,SBfile)
            print SBfile2

However, this does not do what I want. Does anybody have any ideas on how to fix my code? Thanks.

-----------------Edit--------------------

Please note that the '@reactions' section contains a number of reactions, each of which starts with @r, but they all need to be grouped under the '@reactions' key.

2 Answers

import re

x="""@model:2.4.0=Skeleton "Skeleton"
@compartments
Cell=1.0 "Cell"
@species
Cell:[A]=100.0 "A"
Cell:[B]=1.0 "B"
Cell:[C]=0.0 "C"
Cell:[D]=0.0 "D"
@parameters
kcat=4000
km = 146
v2_k = 88
@reactions
@r=v1 "v1"
A -> C : B
Cell * kcat * B * A / (km + A)
@r=v2 "v2"
C -> C+D
Cell * v2_k * C"""


print(dict(re.findall(r"(?:^|(?<=\n))(@\w+)([\s\S]*?)(?=\n@(?!r\b)\w+|$)",x)))

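You can use re.findall directly to get what you want. The negative lookahead (?!r\b) stops @r lines from being treated as dividers, so every reaction stays under the @reactions key.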
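As a rough sketch, the same pattern can be wrapped in a function that returns the dictionary the question asks for. The name `split_sections` is borrowed from the question, the `SECTION_RE` constant and the shortened `sample` text are assumptions made for illustration, and stripping the surrounding whitespace from each value is a choice of this sketch rather than part of the original one-liner.

```python
import re

# Match a divider at the start of a line, then lazily capture everything
# up to the next divider that is not an "@r" reaction tag, or the end of text.
SECTION_RE = re.compile(r"(?:^|(?<=\n))(@\w+)([\s\S]*?)(?=\n@(?!r\b)\w+|$)")


def split_sections(text):
    """Return {divider: content}, with surrounding whitespace stripped."""
    return {key: value.strip() for key, value in SECTION_RE.findall(text)}


# Shortened, hypothetical version of the file from the question.
sample = (
    '@model:2.4.0=Skeleton "Skeleton"\n'
    "@compartments\n"
    ' Cell=1.0 "Cell"\n'
    "@reactions\n"
    '@r=v1 "v1"\n'
    " A -> C : B\n"
    '@r=v2 "v2"\n'
    " C -> C+D\n"
)

sections = split_sections(sample)
print(sections["@model"])
print(sorted(sections))
```

Both `@r` entries end up inside `sections["@reactions"]`, and a divider such as `@rules` would still be recognised, because `r\b` only matches an `r` that is not followed by another word character.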


1 Comment

Sorry, I accidentally edited your post instead of the question and it won't let me change it back. Your code works, but would you happen to know how to exclude the '@r' tag (as I've explained in the edit)? Thanks

You can use capture groups as follows:

re.findall(r"(?s)(@\w+)[\s:]\s*(.*?)(?=@(?!r\b)|$)", text)  # text holds the file contents


where capture group 1 matches the key and capture group 2 matches the value.
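A minimal sketch of how the two groups become dictionary entries, assuming the file contents are already held in a string (the hypothetical `text` below). The pattern is written for this sketch: it uses named groups instead of numbered ones, and it treats `@r` lines as part of the value so that reactions are not split out as separate keys.

```python
import re

# Illustrative pattern (an assumption made for this sketch):
#   group "key"   -> the divider name, e.g. "@species"
#   group "value" -> everything up to the next non-"@r" divider or end of text
PATTERN = re.compile(r"(?s)(?P<key>@\w+)[\s:]\s*(?P<value>.*?)(?=@(?!r\b)|$)")

# Hypothetical excerpt of the file from the question.
text = (
    "@species\n"
    ' Cell:[A]=100.0 "A"\n'
    ' Cell:[B]=1.0 "B"\n'
    "@parameters\n"
    "kcat=4000\n"
    "km = 146\n"
)

sections = {}
for match in PATTERN.finditer(text):
    # Strip the newline left between the value and the next divider.
    sections[match.group("key")] = match.group("value").strip()

print(sections["@parameters"])
```

Note that this variant stops at any `@` that is not an `@r` tag, so it assumes `@` never appears inside a section's content.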
