Python Data Extraction from Text File

Question

The problem is extracting data from a lot of junk in a text file. First, I need to pull out this particular section of the file:

%T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51 ... (and it goes on for quite a while)

Then I need to pull out specifically the third value of each group (e.g. 1:0.00:6425.12), that is 6425.12, 6231.12 and 3234.51, write those values to a new text file, and then do some further editing on them.

I was looking into using regular expressions for this. Can anyone show sample code? It should be quite straightforward for an experienced programmer.

How is the string you've posted different from all the other strings in the file? You need to know that to craft a regex that picks out this string rather than the next one, which may (or may not) look like %T 526 1:0.00:... – mgilson, Commented May 9, 2012 at 15:16
Yes, consider using regular expressions. en.wikibooks.org/wiki/Python_Programming/Regular_Expression – Has QUIT--Anony-Mousse, Commented May 9, 2012 at 15:20
OK, sorry guys. So far I have been doing a lot of testing and Googling. I managed to separate that particular section from the rest of the junk using startswith and write it to a new text file. Now the problem is which Python function to use to extract specifically the third value of each group (6425.12, 6231.12, 3234.51, ...). I don't have the entire text file with me now; it is on another computer. I can post it tomorrow. But basically, I need help pulling the third value out of each group. – Poker Prof, Commented May 9, 2012 at 15:25
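A minimal sketch of the startswith approach described in the comment above. It assumes the wanted section is every line that begins with the marker "%T"; the real file layout was never posted, so that marker and the helper names select_section and copy_section are hypothetical.

```python
from typing import Iterable, List

# Assumption: the wanted section is identified by lines starting with "%T".
# The real file layout was never posted, so this marker is a guess.
MARKER = "%T"


def select_section(lines: Iterable[str], marker: str = MARKER) -> List[str]:
    """Return only the lines that begin with the marker, dropping the junk."""
    return [line for line in lines if line.startswith(marker)]


def copy_section(src_path: str, dst_path: str, marker: str = MARKER) -> int:
    """Copy the marked lines from src_path to dst_path; return how many were copied."""
    with open(src_path) as src:  # iterating a file yields lines with "\n" kept
        kept = select_section(src, marker)
    with open(dst_path, "w") as dst:
        dst.writelines(kept)  # newlines are already present, so write as-is
    return len(kept)


if __name__ == "__main__":
    sample = [
        "some junk header\n",
        "%T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51\n",
        "more junk\n",
    ]
    print(select_section(sample))
```

On a real file this would be called as copy_section("input.txt", "section.txt"), with both filenames being placeholders. Because startswith only checks the beginning of the line, a stray "%T" in the middle of a junk line is not picked up.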

mgilson · Accepted Answer · 2012-05-09 15:20:20Z

2

You don't need re to get the numbers...

s = '%T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51'
columns = s.split()[2:]  # Create a list of all the columns except the first 2.
numbers = [c.split(':')[-1] for c in columns]  # Split each column on ':' and take the last piece.

However, we need a little more information about the structure of the file before we can determine how to pick out the string s in the first place.
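Building on the two-split idea above, here is a small sketch that wraps it in a function and converts each value to float so the numbers can be edited numerically afterwards. As in the sample string, it assumes the first two whitespace-separated tokens (%T and 525) are not data and every later token has the form index:value:value; the function name third_values is hypothetical.

```python
from typing import List


def third_values(line: str) -> List[float]:
    """Return the last ':'-separated piece of every column after the first two, as floats."""
    columns = line.split()[2:]  # skip the first two tokens ("%T" and "525" in the sample)
    return [float(c.split(":")[-1]) for c in columns]


s = "%T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51"
print(third_values(s))  # [6425.12, 6231.12, 3234.51]
```

Keep the strings instead of calling float if the values must be written back exactly as they appeared; for example, float("1.50") prints as 1.5.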

answered May 9, 2012 at 15:20

mgilson



3 Comments

Poker Prof Over a year ago

Elegant and simple. I was overcomplicating things in my head, when the trick was just two calls to split plus indexing with [] to pick out elements of the string. Enlightening! I wish I could upvote, but I can't since I'm new here. Thanks to all!

Poker Prof Over a year ago

I captured the %T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51 ... section using startswith and wrote it to a new text file. Do I then need to convert the new text file into a string? Are there other ways to capture this section? OK, I will try to provide more info on the original text file.

mgilson Over a year ago

@MelvinAng I'm sorry, I don't understand what your last comment is asking. If startswith is good enough, use it; I doubt you'd get any performance gain from re. As for writing the numbers to a new text file, you can use the join method to convert my list numbers into a string, e.g. ','.join(numbers) creates a string with ',' between each pair of numbers.
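A sketch that combines the comments above: read the file written with startswith line by line (there is no need to turn the whole file into one string), pull out the third value of each group, and write each line's values with ','.join. The function names extract_numbers and extract_to_file and the one-row-per-line output layout are assumptions for illustration.

```python
from typing import List


def extract_numbers(line: str) -> List[str]:
    """Last ':'-separated piece of every column after the first two (kept as text)."""
    return [c.split(":")[-1] for c in line.split()[2:]]


def extract_to_file(src_path: str, dst_path: str) -> None:
    """Write one comma-separated row of numbers per input line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:  # iterating a file yields one line at a time
            numbers = extract_numbers(line)
            if numbers:  # skip blank lines or lines with no data columns
                dst.write(",".join(numbers) + "\n")


if __name__ == "__main__":
    row = extract_numbers("%T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51")
    print(",".join(row))  # 6425.12,6231.12,3234.51
```

Keeping the values as strings means they are written out exactly as they appeared in the input.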

John Gaines Jr. · Accepted Answer · 2012-05-09 15:20:54Z

2

I don't think I'd resort to regex for this; it looks pretty simple.

with open(...) as file:
    for line in file:
        for word in line.split():
            if ':' in word:
                print(word.split(':')[2])  # do something with it here
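Since the question originally asked about regular expressions, here is a regex version of the same extraction for comparison. This is a swapped-in technique, not part of this answer, and it assumes each group looks like index:value:value with plain decimal numbers, as in the sample line; the pattern and the name third_values_re are hypothetical.

```python
import re
from typing import List

# Assumed group shape: "<index>:<value>:<value>", e.g. "1:0.00:6425.12".
# Group 3 captures the value after the second colon.
FIELD = re.compile(r"(\d+):([-\d.]+):([-\d.]+)")


def third_values_re(line: str) -> List[str]:
    """Return the third ':'-separated value of every matching group in line."""
    return [m.group(3) for m in FIELD.finditer(line)]


line = "%T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51"
print(third_values_re(line))  # ['6425.12', '6231.12', '3234.51']
```

The split version above is easier to read. One thing the pattern adds is that it skips tokens containing ':' that lack this three-part shape (such as 12:30), where word.split(':')[2] would raise an IndexError.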

answered May 9, 2012 at 15:20

John Gaines Jr.

