
I'm quite new to Python and generally used to Java. I'm currently trying to parse a text file output by Praat that is always in the same format and generally looks like this, with a few more features:

-- Voice report for 53. Sound T1_1001501_vowels --
Date: Tue Aug  7 12:15:41 2018

Time range of SELECTION
    From 0 to 0.696562 seconds (duration: 0.696562 seconds)
Pitch:
   Median pitch: 212.598 Hz
   Mean pitch: 211.571 Hz
   Standard deviation: 23.891 Hz
   Minimum pitch: 171.685 Hz
   Maximum pitch: 265.678 Hz
Pulses:
   Number of pulses: 126
   Number of periods: 113
   Mean period: 4.751119E-3 seconds
   Standard deviation of period: 0.539182E-3 seconds
Voicing:
   Fraction of locally unvoiced frames: 5.970%   (12 / 201)
   Number of voice breaks: 1
   Degree of voice breaks: 2.692%   (0.018751 seconds / 0.696562 seconds)

I would like to output something that looks like this:

0.696562,212.598,211.571,23.891,171.685,265.678,126,113,4.751119E-3,0.539182E-3,5.970,1,2.692

So essentially I want to print a string containing only the number that follows the colon on each line (up to the next whitespace), separated by commas. I know this might be a stupid question, but I just can't figure it out in Python; any help would be much appreciated!
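A minimal sketch of that rule, assuming the whole report is already in a string and that every value is non-negative: the regex anchors on the first colon of each line, so the times in the Date line (12:15:41) are not picked up, and it stops before a trailing %. The names VALUE_AFTER_COLON and extract_values are hypothetical.

```python
import re

# Everything up to the FIRST colon, optional spaces, then one number:
# digits, an optional fraction, and an optional exponent such as E-3
VALUE_AFTER_COLON = re.compile(r'[^:]*:\s*(\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)')


def extract_values(report):
    """Return the number after the first colon of each line, as strings."""
    values = []
    for line in report.splitlines():
        match = VALUE_AFTER_COLON.match(line)
        if match:
            values.append(match.group(1))
    return values
```

For the sample report, ",".join(extract_values(report)) should give the comma-separated line shown above, keeping spellings such as 4.751119E-3 unchanged.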

  • Can you include what you have tried so far? Commented Aug 7, 2018 at 19:17

3 Answers


Okay, here is something simple that you will need to tweak a little to make it work for you.

import re

with open("file.txt", "r") as f:
    lines = [s.strip() for s in f.readlines()]
    numbers_list = []
    for line in lines:
        # \d+ matches each run of digits on the line
        numbers_list.append(re.findall(r'\d+', line))
    print(numbers_list)

where file.txt is your file.
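To see what \d+ does with a decimal value, here is a quick comparison on one line of the report. The second pattern is an illustrative alternative (not part of the answer) that also accepts a fractional part and an exponent; both patterns still return the counts in parentheses, so some extra filtering is needed if those should be excluded.

```python
import re

line = "Fraction of locally unvoiced frames: 5.970%   (12 / 201)"

# Runs of digits only: the decimal point splits 5.970 into two matches
digit_runs = re.findall(r'\d+', line)

# Digits with an optional fraction and an optional exponent (e.g. 4.751119E-3)
numbers = re.findall(r'\d+(?:\.\d+)?(?:[eE][-+]?\d+)?', line)

print(digit_runs)  # ['5', '970', '12', '201']
print(numbers)     # ['5.970', '12', '201']
```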


2 Comments

I think the problem with grabbing every number is that some lines contain numbers I don't want included: for example, I don't want the numbers in parentheses under Voicing, but I do want the "E" in the scientific notation of some numbers to be kept. This is why I've been trying to get specifically the substring between ":" and the whitespace that follows it.
You could run another regex before the one that extracts the integers, or use an if statement to check whether ":" is in the line, for example. As I said, this will get you all the numbers; then it's up to you how you want to tweak it.
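A sketch of the if-statement idea, assuming only the first number after the first ":" is wanted: lines without a colon are skipped, and the number pattern is matched only at the start of the text after the colon, so the counts in parentheses are ignored. first_number_after_colon and NUMBER are hypothetical names.

```python
import re

# Illustrative pattern for one number: digits, optional fraction, optional exponent
NUMBER = re.compile(r'\d+(?:\.\d+)?(?:[eE][-+]?\d+)?')


def first_number_after_colon(line):
    """Return the number right after the first ':' on the line, or None."""
    if ':' not in line:
        return None
    _, _, tail = line.partition(':')
    # match() only succeeds if the number starts the (stripped) tail
    match = NUMBER.match(tail.strip())
    return match.group() if match else None
```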

Maybe:

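with open("file.txt") as f:  # "file.txt" stands in for the report file
    text = f.read()

for line in text.splitlines():
    line = line.strip()
    head, sepa, tail = line.partition(":")
    if sepa:
        parts = tail.split(maxsplit=1)
        # keep the first token after the colon only if it looks numeric
        if parts and all(ch.isdigit() or ch in ".eE%-+" for ch in parts[0]):
            num = parts[0].replace("%", " ")
            try:
                print(float(num.strip()))
            except ValueError:
                print("invalid number:", num)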

Out:

0.696562
212.598
211.571
23.891
171.685
265.678
126.0
113.0
0.004751119
0.000539182
5.97
1.0
2.692
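Because float() normalizes the values (4.751119E-3 prints as 0.004751119 and 5.970 as 5.97), this output differs from the line the question asks for. A sketch that applies the same filtering but keeps each token's original text and joins everything with commas; values_as_csv_line is a hypothetical name, and float() is used only to validate each token.

```python
def values_as_csv_line(text):
    """Join the numeric token after each colon with commas, keeping its spelling."""
    tokens = []
    for line in text.splitlines():
        head, sep, tail = line.strip().partition(":")
        if not sep:
            continue
        parts = tail.split(maxsplit=1)
        if parts and all(ch.isdigit() or ch in ".eE%-+" for ch in parts[0]):
            token = parts[0].rstrip("%")
            try:
                float(token)  # validation only; the string itself is kept
            except ValueError:
                continue
            tokens.append(token)
    return ",".join(tokens)
```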

1 Comment

@ling-analysis There are two constructs worth learning for a Python newcomer here: 1) the generator expression I used inside all(); 2) the try...except clause. (I have changed the bare 'except' to "except ValueError".)
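A short illustration of those two constructs, using hypothetical helper names: the generator expression feeds all() one character check at a time, and the try...except catches only the ValueError that float() raises for text such as "e-", which passes the character filter but is not a number.

```python
def looks_numeric(token):
    # Generator expression: all() consumes it lazily and stops at the first False
    return all(ch.isdigit() or ch in ".eE%-+" for ch in token)


def to_float(token):
    # try/except: attempt the conversion, handle only the error float() raises
    try:
        return float(token.rstrip("%"))
    except ValueError:
        return None
```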

Thank you for the help everyone! I actually came up with this solution:

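import csv

input_path = 't2_5.txt'
input_name = input_path[:-4]  # drop the ".txt" extension

def parse(filepath):
    data = []
    with open(filepath, 'r') as file:
        # skip the three header lines (title, date, blank line)
        file.readline()
        file.readline()
        file.readline()
        for line in file:
            line = line.rstrip('\n')
            # the values sit on indented lines, right after ": "
            if line.startswith(' '):
                start = line.find(':') + 2
                end = line.find(' ', start)
                if end == -1:  # the value is the last thing on the line
                    end = len(line)
                if line[end - 1] == '%':  # drop a trailing percent sign
                    end -= 1
                data.append(line[start:end])
    # Python 3: write CSV in text mode with newline='' (not 'wb')
    with open(input_name + '_output.csv', 'w', newline='') as csvfile:
        wr = csv.writer(csvfile)
        wr.writerow(data)

parse(input_path)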
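The same slicing rule can be tried on the sample report without creating any files. This hypothetical variant takes any iterable of lines and returns the values; it does not need to skip the header lines, because the title, Date and blank lines do not start with a space, and it renders the CSV row into a string with io.StringIO.

```python
import csv
import io


def parse_lines(lines):
    """Same slicing rule as parse(), over any iterable of lines (hypothetical helper)."""
    data = []
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith(' '):
            start = line.find(':') + 2
            end = line.find(' ', start)
            if end == -1:  # the value is the last thing on the line
                end = len(line)
            if line[end - 1] == '%':  # drop a trailing percent sign
                end -= 1
            data.append(line[start:end])
    return data


def to_csv_row(values):
    """Render one CSV row as a string (csv.writer ends rows with \\r\\n by default)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(values)
    return buffer.getvalue()
```

For example, to_csv_row(parse_lines(report.splitlines())) returns the row as text, where report is the file's contents as a string.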
