Extract Numeric Data from a Text file in Python

Question

Say I have a text file with the data/string:

Dataset #1: X/Y= 5, Z=7 has been calculated
Dataset #2: X/Y= 6, Z=8 has been calculated
Dataset #10: X/Y =7, Z=9 has been calculated

I want the output written to a CSV file as:

X/Y, X/Y, X/Y

Which should display:

5, 6, 7

Here is my current approach. I am using string.find, but it feels like an awkward way to solve this problem:

data = open('TestData.txt').read()
#index of string
counter = 1

if (data.find('X/Y=')==1):      
#extracts segment out of string
    line = data[r+6:r+14]
    r = data.find('X/Y=')
    counter += 1 
    print line
else: 
    r = data.find('X/Y')
    line = data[r+6:r+14]
    for x in range(0,counter):
    print line


print counter

Problem: for some reason, I only get the value 5. When I set up a loop, I get an infinite stream of 5's.

Padraic Cunningham · Accepted Answer · 2014-05-30 16:36:36Z

3

If you want the numbers and your text file is formatted like the first two lines, i.e. X/Y= 6 and not X/Y =7:

import re

result = []
with open("TestData.txt") as f:
    for line in f:
        # Match one or more digits that are preceded by "Y=" and one whitespace character "\s".
        s = re.search(r'(?<=Y=\s)\d+', line)
        if s:  # re.search returns None when there is no match
            result.append(s.group())
print(result)
['5', '6', '7']

To match the pattern in your comment, you should escape the period as \. or you will also match strings like 1.2+3, because an unescaped "." has a special meaning in re: it matches any character.

So re.search(r'(?<=Counting Numbers =\s)\d\.\d\.\d',s).group() will return only 1.2.3
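The difference between the escaped and unescaped period can be checked with a small sketch. The two sample strings below are assumptions made up for illustration (the exact text from the comment is not shown on this page), and first_match is a hypothetical helper.

```python
import re

# Hypothetical sample strings; the exact text from the comment is not shown here.
samples = ["Counting Numbers = 1.2.3", "Counting Numbers = 1.2+3"]

# Unescaped "." matches any character; "\." matches only a literal period.
unescaped = re.compile(r'(?<=Counting Numbers =\s)\d.\d.\d')
escaped = re.compile(r'(?<=Counting Numbers =\s)\d\.\d\.\d')


def first_match(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group() if m else None


unescaped_hits = [first_match(unescaped, s) for s in samples]
escaped_hits = [first_match(escaped, s) for s in samples]
print(unescaped_hits)  # both strings match
print(escaped_hits)    # only the string with literal periods matches
```

Checking for None before calling .group() avoids an AttributeError on lines that do not match.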

To be more explicit, you can use s = re.search(r'(?<=X/Y=\s)\d+', line), which uses the full X/Y=\s pattern.

With the original line from your comment and the updated line added, this would return:

['5', '6', '7', '5', '5']

The (?<=Y=\s) part is called a positive lookbehind assertion.

(?<=...)

Matches if the current position in the string is preceded by a match for ... that ends at the current position

There are lots of nice examples in the re documentation. The text matched inside the lookbehind parentheses is not included in the returned match.
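A lookbehind in Python's re module has to match a fixed-width string, so it cannot absorb optional whitespace such as the X/Y =7 variant in the question. One possible alternative, shown here as a sketch rather than as the method used above, is a capture group with \s* on both sides of the "=". The sample lines are copied from the question; extract_xy is a hypothetical helper name.

```python
import re

# The three sample lines from the question, including the "X/Y =7" variant.
lines = [
    "Dataset #1: X/Y= 5, Z=7 has been calculated",
    "Dataset #2: X/Y= 6, Z=8 has been calculated",
    "Dataset #10: X/Y =7, Z=9 has been calculated",
]

# \s* allows any amount of whitespace on either side of "=";
# group 1 holds the digits that follow.
pattern = re.compile(r'X/Y\s*=\s*(\d+)')


def extract_xy(text_lines):
    """Collect the X/Y value from each line that contains one (hypothetical helper)."""
    values = []
    for line in text_lines:
        m = pattern.search(line)
        if m:
            values.append(m.group(1))
    return values


xy_values = extract_xy(lines)
print(xy_values)
```

Because the pattern requires the full X/Y prefix, a standalone Y=2 earlier on the line is ignored.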

edited May 30, 2014 at 16:36

answered May 29, 2014 at 0:13

10 Comments

user3685687 Over a year ago

So how would this work for a more complex text file? Say that I now have an extra Y in my text file: Dataset #1: Y=2, X/Y= 5, Z=7 has been calculated

Padraic Cunningham Over a year ago

I added the output. What comes before is irrelevant: all we are looking for is Y= 5, and Yes does not match our pattern.

user3685687 Over a year ago

Sorry, I just updated my comment, I get the arguments now. Thanks a lot for clearing that up with me. Would you be able to observe the updated comment? Say I have Dataset #1: Y= 2, X/Y= 5 has been calculated. But I want to only have the value of X/Y= 5

Padraic Cunningham Over a year ago

@user3685687, again it will work as the pattern has to have Y= 5.

user3685687 Over a year ago

Is there a way to make the Pattern X/Y= ?

khampson · Accepted Answer · 2014-05-29 00:24:35Z

1

Since it appears that the entities are all on a single line, I would recommend using readline in a loop to read the file line-by-line and then using a regex to parse out the components you're looking for from that line.

Edit re: OP's comment:

One regex pattern that could be used to capture the number given the specified format in this case would be: X/Y\s*=\s*(.+),
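As a rough sketch of how this could fit together with the CSV output the question asks for: the code below loops over the lines, applies the pattern above, and writes the captured values as one CSV row. It uses an in-memory io.StringIO in place of TestData.txt so it runs on its own; xy_values and write_row are hypothetical helper names. Note that (.+) is greedy, so this assumes each line contains only one comma after the X/Y value, as in the sample data.

```python
import csv
import io
import re

# Pattern from the answer above. (.+) is greedy, so this assumes a single
# comma after the X/Y value on each line, as in the sample data.
pattern = re.compile(r'X/Y\s*=\s*(.+),')


def xy_values(lines):
    """Return the captured X/Y value from every line that matches."""
    found = []
    for line in lines:
        m = pattern.search(line)
        if m:
            found.append(m.group(1).strip())
    return found


def write_row(row, out):
    """Write the values as a single CSV row to an open file-like object."""
    csv.writer(out).writerow(row)


# In-memory stand-in for the text file, using the lines from the question.
sample = io.StringIO(
    "Dataset #1: X/Y= 5, Z=7 has been calculated\n"
    "Dataset #2: X/Y= 6, Z=8 has been calculated\n"
    "Dataset #10: X/Y =7, Z=9 has been calculated\n"
)

values = xy_values(sample)
buffer = io.StringIO()
write_row(values, buffer)
print(buffer.getvalue())
```

With real files, the same two helpers could be called on open("TestData.txt") for reading and on an output file opened for writing (in Python 3, with newline="" as the csv module documentation recommends).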

edited May 29, 2014 at 0:24

answered May 28, 2014 at 23:59

2 Comments

user3685687 Over a year ago

When using regex to parse out the components, what exactly would you use for the regex arguments? r= re.compile("^X/Y=|,$")?

user3685687 Over a year ago

Got it! That helped clear up the question I had in the bottom solution. Thank you!
