
I am new to parsing in Python. I want to parse the following kind of text:

value one = 5

value two = 10

%some text here

value three = 15

%some text

value one = 12

value two = 13

%some text here

value three = 11

...and so on. I want to extract "value one", "value two" and "value three" and arrange them in a table for further processing. Any ideas on how to do this?

This is what I have tried so far. It gives me the error: UnboundLocalError: local variable 'value_two' referenced before assignment

import re
import pandas as pd
val_dict = { 'value_one':re.compile(r'value one = (?P<value_one>.*)\n'),
           'value_two':re.compile(r'value two = (?P<value_two>.*)\n'),
           'value_three':re.compile(r'value three = (?P<value_three>.*)\n')}

def _parse_line(line):
    for key, val in val_dict.items():
        match = val.search(line)
        if match:
            return key, match
    # if there are no matches
    return None, None


def parse_file(filepath):


    data = []  
    with open(filepath, 'r') as file_object:
        line = file_object.readline()
        while line:

            key, match = _parse_line(line)

            if key == 'value_one':
                value_one = match.group('value_one')
                value_one = int(value_one)

            if key == 'value_two':
                value_two = match.group('value_two')
                value_two = int(value_two)

            if key == 'value_three':
                value_three = match.group('value_three')
                value_three = int(value_three)

            row = {
                'value one': value_one,
                'value two': value_two,
                'value three': value_three,
            }
            # append the dictionary to the data list
            data.append(row)
            line = file_object.readline()


        data = pd.DataFrame(data)

        data.set_index(['value one', 'value two', 'value three'], inplace=True)

        data = data.groupby(level=data.index.names).first()

        data = data.apply(pd.to_numeric, errors='ignore')
        return data

if __name__ == '__main__':
    filepath = 'test3.txt'
    data = parse_file(filepath)
  • What was your implementation from your research? Commented May 22, 2019 at 17:58
  • What have you tried so far? And what's the expected output? Commented May 22, 2019 at 17:59
  • Is it a complete example? How are you calling _parse_line, and how are you handling what it returns? Commented May 22, 2019 at 18:09
  • Do your comment lines always start with a percent sign? Commented May 22, 2019 at 18:11
  • Apologies, first-time user; the whole code is pasted above. Commented May 22, 2019 at 18:18

2 Answers


Your problem is that each line contains only one of 'value one', 'value two' or 'value three'. After the first line only the variable value_one is defined, but you build the row from all three variables on every line, hence the error.
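A stripped-down sketch reproduces the error. It is hypothetical (first_rows is an illustrative name, and it uses str.split instead of your regexes) but keeps the key point: the row is built on every iteration, while each iteration assigns only one of the variables.

```python
def first_rows(lines):
    rows = []
    for line in lines:
        # each branch assigns only one variable, as in the question's loop
        if line.startswith('value one'):
            value_one = int(line.split('=')[1])
        if line.startswith('value two'):
            value_two = int(line.split('=')[1])
        # this runs on every line, including the first one,
        # where value_two has not been assigned yet
        rows.append({'value one': value_one, 'value two': value_two})
    return rows

try:
    first_rows(['value one = 5', 'value two = 10'])
except UnboundLocalError as exc:
    print(exc)
```

On the first line only value_one has been assigned, so reading value_two raises UnboundLocalError (the exact wording of the message differs between Python versions).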

You should only append a row once you have a full sequence. Keeping your imports and _parse_line as they are, you could change parse_file to:

def parse_file(filepath):
    data = []
    with open(filepath, 'r') as file_object:
        row = {}                                # prepare an empty row
        for line in file_object:
            key, match = _parse_line(line)      # search for keys in the line
            if key == 'value_one':
                value_one = int(match.group('value_one'))
                if 'value one' in row:          # key seen again: the current row is complete
                    data.append(row)            # append it to the data list
                    row = {}                    # and start a new one
                row['value one'] = value_one    # we have a match: store the value in row

            if key == 'value_two':
                value_two = int(match.group('value_two'))
                if 'value two' in row:
                    data.append(row)
                    row = {}
                row['value two'] = value_two

            if key == 'value_three':
                value_three = int(match.group('value_three'))
                if 'value three' in row:
                    data.append(row)
                    row = {}
                row['value three'] = value_three

        if row:                                 # do not forget the last row
            data.append(row)
    return pd.DataFrame(data)
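For reference, here is a self-contained sketch of the same "start a new row when a key repeats" idea that runs without _parse_line. It assumes every value is a non-negative integer; parse_rows and PATTERN are illustrative names, and it uses one regex with a group instead of one pattern per key. Unlike the question's patterns, it does not require a trailing \n, so a last line without a newline still matches.

```python
import io
import re

# hypothetical single pattern covering all three keys
PATTERN = re.compile(r'^value (one|two|three) = (\d+)\s*$')

def parse_rows(file_object):
    """Group 'value X = N' lines into rows, starting a new row whenever a key repeats."""
    rows = []
    row = {}
    for line in file_object:
        match = PATTERN.match(line)
        if not match:               # blank lines and %comments are skipped
            continue
        key = 'value ' + match.group(1)
        if key in row:              # key seen again: the previous row is complete
            rows.append(row)
            row = {}
        row[key] = int(match.group(2))
    if row:                         # do not forget the last row
        rows.append(row)
    return rows

sample = io.StringIO(
    'value one = 5\n\nvalue two = 10\n\n%some text here\n\nvalue three = 15\n\n'
    '%some text\n\nvalue one = 12\n\nvalue two = 13\n\n%some text here\n\nvalue three = 11\n'
)
print(parse_rows(sample))
```

The result is a plain list of dicts, so pd.DataFrame(parse_rows(f)) turns it into the table.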

I have also removed the last part because, IMHO, it is no longer a matter of parsing a text file to build a dataframe; it is just pandas dataframe processing.


1 Comment

What changes would I need to make if I encounter a different ordering, say: value one, value two, value three, value three, value one, value two, value three?

You can try something like this:

import re
import pandas as pd

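with open('text.txt') as fd:
    data = fd.read()                     # read the whole file into one string

# one regex per column; (\d+) captures the integer after the '='
val_to_pattern = {
    'value_one': r'value one = (\d+)',
    'value_two': r'value two = (\d+)',
    'value_three': r'value three = (\d+)',
}

# re.findall returns every captured value (as strings), in file order
val_dict = {}
for key, patt in val_to_pattern.items():
    val_dict[key] = re.findall(patt, data)

df = pd.DataFrame.from_dict(val_dict)    # one column per key, one row per occurrence
print(df)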

The result:

  value_one value_two value_three
0         5        10          15
1        12        13          11
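Note that re.findall returns strings, so the columns above hold text rather than numbers, and the rows line up only if every key occurs the same number of times. A minimal sketch of the same approach that converts to int and refuses ragged columns (extract_columns is an illustrative name, and it assumes each block contains all three values):

```python
import re

val_to_pattern = {
    'value_one': r'value one = (\d+)',
    'value_two': r'value two = (\d+)',
    'value_three': r'value three = (\d+)',
}

def extract_columns(text):
    """Return {column: [int, ...]}, raising if the columns differ in length."""
    columns = {key: [int(v) for v in re.findall(patt, text)]
               for key, patt in val_to_pattern.items()}
    counts = {key: len(values) for key, values in columns.items()}
    if len(set(counts.values())) > 1:
        # a missing or extra line would otherwise break the row alignment
        raise ValueError(f'unequal value counts: {counts}')
    return columns
```

With this, pd.DataFrame(extract_columns(data)) gives integer columns directly.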

