python - Regular expression to extract certain text data from a file

Question

I have a text file that was converted from a PDF. From this text, I would like to extract the descriptions that follow the string "FIGURE". Below are some sample lines of the text data:

FIGURE 1-1. An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).

Derendorf5e_CH01.indd 4Derendorf5e_CH01.indd 4 5/25/19 11:07 PM5/25/19 11:07 PM

CHAPTER 1 • Therapeutic Relevance 5

Another way of looking at these two subdisciplines is that pharmacokinetics deals with what the body does to the drug (absorption, distribution, metabolism, excretion), whereas pharmacodynamics describes what the drug does to the body (both desired and undesired effects). From this definition, one could wrongly conclude that these are opposite disci- plines, whereas in reality, they go hand-in-hand. Figure 1-3 shows that pharmacokinetics deals with concentration–time relationships, whereas pharmacodynamics describes the relationship between drug concentration and both good (desired) and bad (adverse) effects. Each of these two puzzle pieces by itself is insufficient to guide therapy and optimize dosing; only when pharmacokinetics and pharmacodynamics are linked (PK/PD) and integrated do they become therapeutically useful. This integration is commonly achieved by developing mathematical models (PK/PD models) that capture the observed relationships and allow prediction and identification of optimum dosing regimens.

FIGURE 1-2. A rational approach to the design of a dosage regimen. The pharmacokinetics and pharmacodynam- ics of the drug are first defined. Then, responses to the drug, coupled with pharmacokinetic information, are used as feedback ( dashed lines ) to modify the dosage regimen to achieve optimal ther- apy. For some drugs, active metabolites formed in the body may also need to be taken into account.

I have read the PDF file into text and tried applying re.search to the text data with several regex combinations, but with no luck.

# Get files text content
text = file_data['content']
#print(text)
text1 = re.search('FIGURE[ ]*[0-9]-[0-9]. (.*)',text,re.MULTILINE)

G1Rao · Accepted Answer · 2019-08-07 10:27:58Z


text1 = re.findall(r'FIGURE\s*[0-9]+-[0-9]+\. (.*)', text, re.MULTILINE)
>>> import re
>>> t="""FIGURE 1-1. An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).
...
... Derendorf5e_CH01.indd 4Derendorf5e_CH01.indd 4 5/25/19 11:07 PM5/25/19 11:07 PM
...
... CHAPTER 1 • Therapeutic Relevance 5
...
... Another way of looking at these two subdisciplines is that pharmacokinetics deals with what the body does to the drug (absorption, distribution, metabolism, excretion), whereas pharmacodynamics describes what the drug does to the body (both desired and undesired effects). From this definition, one could wrongly conclude that these are opposite disci- plines, whereas in reality, they go hand-in-hand. Figure 1-3 shows that pharmacokinetics deals with concentration–time relationships, whereas pharmacodynamics describes the relationship between drug concentration and both good (desired) and bad (adverse) effects. Each of these two puzzle pieces by itself is insufficient to guide therapy and optimize dosing; only when pharmacokinetics and pharmacodynamics are linked (PK/PD) and integrated do they become therapeutically useful. This integration is commonly achieved by developing mathematical models (PK/PD models) that capture the observed relationships and allow prediction and identification of optimum dosing regimens.
...
... FIGURE 1-2. A rational approach to the design of a dosage regimen. The pharmacokinetics and pharmacodynam- ics of the drug are first defined. Then, responses to the drug, coupled with pharmacokinetic information, are used as feedback ( dashed lines ) to modify the dosage regimen to achieve optimal ther- apy. For some drugs, active metabolites formed in the body may also need to be taken into account."""
>>> re.findall(r'FIGURE\s*[0-9]+-[0-9]+\. (.*)', t, re.MULTILINE)
['An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).', 'A rational approach to the design of a dosage regimen. The pharmacokinetics and pharmacodynam- ics of the drug are first defined. Then, responses to the drug, coupled with pharmacokinetic information, are used as feedback ( dashed lines ) to modify the dosage regimen to achieve optimal ther- apy. For some drugs, active metabolites formed in the body may also need to be taken into account.']
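A possible variation on the call above, in case the figure number is needed alongside each description. This is only a sketch: the helper name `extract_figures` and the short sample string are made up for illustration, and the pattern assumes each caption sits on a single line, as in the interpreter session.

```python
import re

# Hypothetical helper, not part of the original answer.
# Group 1 captures the figure number (e.g. "1-1"), group 2 the rest of the line.
FIGURE_RE = re.compile(r"FIGURE\s*(\d+-\d+)\.\s+(.*)")


def extract_figures(text):
    """Return {figure number: description} for every 'FIGURE n-m.' line."""
    return {m.group(1): m.group(2).strip() for m in FIGURE_RE.finditer(text)}


# Made-up sample; the lowercase "Figure 1-3" in body text is not matched
# because the pattern is case-sensitive.
sample = (
    "FIGURE 1-1. First caption.\n"
    "\n"
    "Body text. Figure 1-3 shows something.\n"
    "\n"
    "FIGURE 1-2. Second caption."
)
figures = extract_figures(sample)
print(figures)
```

Because `.` does not match a newline by default, each description still ends at the end of its line.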

edited Aug 7, 2019 at 10:27

answered Aug 7, 2019 at 10:04


4 Comments

Rashmi Over a year ago

It gives only the first line of each FIGURE description as output. I want the complete text description.

G1Rao Over a year ago

What do you mean by complete text description? Can you show me the expected output?

Rashmi Over a year ago

I want output like the following: An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).

G1Rao Over a year ago

I am not sure what format your data is in. If your data is multiline, as in the interpreter session above, the code will work.
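The difference discussed in these comments likely comes down to where the line breaks fall: `.` does not match a newline by default, so `(.*)` stops at the end of the first line of a caption. If the extracted text wraps each caption over several lines, one option is to capture up to the next blank line instead. The sketch below assumes that a blank line (or the end of the text) marks the end of a caption, which may not hold for every PDF extraction; `extract_captions` and the sample string are hypothetical.

```python
import re

# Assumption: a caption ends at the first blank line or at the end of the text.
# re.DOTALL lets "." cross line breaks; the lazy ".*?" stops at the lookahead.
CAPTION_RE = re.compile(r"FIGURE\s*\d+-\d+\.\s+(.*?)(?=\n\s*\n|\Z)", re.DOTALL)


def extract_captions(text):
    """Return each caption as one line, with internal line breaks collapsed."""
    return [" ".join(m.split()) for m in CAPTION_RE.findall(text)]


# Made-up sample in which the first caption is wrapped over two lines.
wrapped = (
    "FIGURE 1-1. An empirical approach to the design\n"
    "of a dosage regimen.\n"
    "\n"
    "CHAPTER 1\n"
    "\n"
    "FIGURE 1-2. A rational approach."
)
captions = extract_captions(wrapped)
print(captions)
```

If captions are not separated from the body text by blank lines, a different end marker would be needed in the lookahead.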
