
The problem is extracting data from a bunch of junk in a text file. First, I need to pull out this particular section from the text file:

%T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51 (the line continues like this for quite a while)

Then I need to pull out the third number from each colon-separated group (e.g. 1:0.00:6425.12), that is 6425.12, 6231.12 and 3234.51, write those numbers to a new text file, and then do some further editing on this data.

I was looking into using regular expressions for this. Can anyone show some sample code? It should be quite straightforward for an experienced programmer.
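Since the question asks specifically about regular expressions, here is a minimal sketch of that route using the standard `re` module. It assumes every group has the form `<index>:<number>:<number>` and that the numbers consist only of digits and a decimal point (no signs or exponents); the pattern captures only the third part of each group.

```python
import re

# Sample record from the question; the real line has many more groups.
s = "%T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51"

# Assumption: each group is "<digits>:<number>:<number>" with plain unsigned numbers.
# The single capturing group keeps only the third part of each match.
pattern = re.compile(r"\d+:[\d.]+:([\d.]+)")

# With exactly one group in the pattern, findall returns the captured strings.
third_values = pattern.findall(s)
print(third_values)  # ['6425.12', '6231.12', '3234.51']
```

The leading `%T 525` is skipped automatically because `525` is not followed by a colon.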

  • How is the string you've posted different from all the other strings in the file? That is necessary to be able to craft a regex sufficient for picking out that string instead of the next string which may (or may not) look like %T 526 1:0.00:... Commented May 9, 2012 at 15:16
  • Yes, consider using regular expressions. en.wikibooks.org/wiki/Python_Programming/Regular_Expression Commented May 9, 2012 at 15:20
  • OK, sorry guys. So far I have been doing a lot of testing and Googling. I managed to pull that particular section out of the other junk using startswith and write it into a new text file. Now the problem is which Python function to use to extract the third value from each group (6425.12, 6231.12, 3234.51, ...). I do not have the entire text file with me now; it is on another computer. I can post it tomorrow. But basically, I need help pulling the third value out of each group. Commented May 9, 2012 at 15:25
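The startswith step mentioned in the last comment can be sketched as a small filter over the file's lines. This is only an illustration: the helper name `extract_t_records`, the assumption that the wanted lines begin with `%T`, and the file names in the commented-out usage are all hypothetical, since the real file layout was never posted.

```python
def extract_t_records(lines, prefix="%T"):
    """Hypothetical helper: keep only lines that start with `prefix`, minus the newline."""
    return [line.rstrip("\n") for line in lines if line.startswith(prefix)]


# Hypothetical file contents standing in for the real (unposted) file.
sample = [
    "some header junk\n",
    "%T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51\n",
    "more junk\n",
]

records = extract_t_records(sample)
print(records)

# With a real file (file names are hypothetical), a file object can be passed
# directly because iterating over it yields one line at a time:
# with open("input.txt") as src, open("records.txt", "w") as dst:
#     for record in extract_t_records(src):
#         dst.write(record + "\n")
```

If several `%T` lines exist (as the first comment suspects), this returns all of them, so a stricter test such as `startswith("%T 525")` may be needed.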

2 Answers


You don't need re to get the numbers...

s='%T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51'
columns=s.split()[2:]  #Create a list of all the columns except the first 2.
numbers=[c.split(':')[-1] for c in columns]  #Split each column on ':' and take the last piece.
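The list produced above still holds strings. Since the asker wants to "do some other editing on this data", a possible next step is converting them to numbers; the use of `float` here is an assumption about what that editing needs.

```python
s = '%T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51'
columns = s.split()[2:]                        # drop "%T" and "525"
numbers = [c.split(':')[-1] for c in columns]  # last piece of each group
print(numbers)  # ['6425.12', '6231.12', '3234.51']

# Convert to floats only if numeric work is needed later.
values = [float(n) for n in numbers]
print(values)  # [6425.12, 6231.12, 3234.51]
```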

However, we need a little more information about the structure of the file before we can determine how to pick out the string s in the first place.


3 Comments

Elegant and simple. I was complicating matters in my head, when the trick was just to split twice and use [] indexing to pick out the elements. Enlightening! I wish I could vote this up, but I can't since I'm new here. Thanks to all!
I captured the "%T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51 and so on" section using startswith and wrote it into a new text file. Do I then need to convert the new text file into a string? Are there other ways to capture this section? OK, I will try to provide more info on the original text file.
@MelvinAng I'm sorry, I don't understand what your last comment is asking. If startswith is good enough, use it; I doubt you'd get any performance gain from re. As for writing the numbers into a new text file, you can use the join method to convert my list "numbers" into a string, e.g. ','.join(numbers) will create a string with ',' between each of the numbers.
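The join suggestion in the last comment, followed by writing the result to a file, might look like this. The output file name is hypothetical, and the sketch writes into the system temp directory only so that it runs anywhere.

```python
import os
import tempfile

numbers = ['6425.12', '6231.12', '3234.51']

# join builds one string with ',' between the items (no trailing comma).
line = ','.join(numbers)
print(line)  # 6425.12,6231.12,3234.51

# Hypothetical output file placed in the temp directory for this sketch.
out_path = os.path.join(tempfile.gettempdir(), "numbers.txt")
with open(out_path, "w") as out:
    out.write(line + "\n")

# Read it back to confirm what was written.
with open(out_path) as f:
    written = f.read().strip()
print(written)  # 6425.12,6231.12,3234.51
```

Using '\n'.join(numbers) instead would put one number per line, which is often easier to edit further.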

I don't think I'd resort to regex for this; it looks pretty simple.

with open(...) as file:
    for line in file:
        for word in line.split():
            if ':' in word:
                print(word.split(':')[2])  # do something with it here
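The loop above raises an IndexError if a junk line happens to contain a word with only one colon. A slightly more defensive sketch (the function name `third_fields` is hypothetical) collects the values instead of printing them and skips words with fewer than three parts. It accepts a list of lines or an open file object alike.

```python
def third_fields(lines):
    """Hypothetical helper: yield the third ':'-separated part of each word that has one."""
    for line in lines:
        for word in line.split():
            parts = word.split(':')
            if len(parts) >= 3:  # skip words like "%T", "525" or "a:b"
                yield parts[2]


# Sample input; a real file object could be passed instead of this list.
sample = [
    "%T 525 1:0.00:6425.12 2:0.01:6231.12 3:0.00:3234.51\n",
    "junk line with a:b in it\n",
]
print(list(third_fields(sample)))  # ['6425.12', '6231.12', '3234.51']
```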

