Parsing text file to data frame in python

Question

I am new to parsing in Python. I want to parse text of the following form:

value one = 5

value two = 10

%some text here

value three = 15

%some text

value one = 12

value two = 13

%some text here

value three = 11

... and so on. I want to extract value one, value two and value three and arrange them in a tabular format for processing. Any ideas on how to do it?

This is what I have tried so far. It raises the error: local variable 'value_two' referenced before assignment

import re
import pandas as pd
val_dict = { 'value_one':re.compile(r'value one = (?P<value_one>.*)\n'),
           'value_two':re.compile(r'value two = (?P<value_two>.*)\n'),
           'value_three':re.compile(r'value three = (?P<value_three>.*)\n')}

def _parse_line(line):


    for key, val in val_dict.items():
        match = val.search(line)
        if match:
            return key, match
    # if there are no matches
    return None, None


def parse_file(filepath):


    data = []  
    with open(filepath, 'r') as file_object:
        line = file_object.readline()
        while line:

            key, match = _parse_line(line)

            if key == 'value_one':
                value_one = match.group('value_one')
                value_one = int(value_one)

            if key == 'value_two':
                value_two = match.group('value_two')
                value_two = int(value_two)

            if key == 'value_three':
                value_three = match.group('value_three')
                value_three = int(value_three)

            row = {
                        'value one': value_one,
                        'value two': value_two,
                        'value three': value_three 
                    }
                # append the dictionary to the data list
            data.append(row)
            line = file_object.readline()


        data = pd.DataFrame(data)

        data.set_index(['value one', 'value two', 'value three'], inplace=True)

        data = data.groupby(level=data.index.names).first()

        data = data.apply(pd.to_numeric, errors='ignore')
        return data

if __name__ == '__main__':
    filepath = 'test3.txt'
    data = parse_file(filepath)

What have you tried so far? And what's the expected output? – Santosh Karki, commented May 22, 2019 at 17:59
Is this a complete example? How are you calling _parse_line, and how are you handling the dicts it returns? – mad_, commented May 22, 2019 at 18:09

Serge Ballesta · Accepted Answer · 2019-05-22 20:39:51Z

Your problem is that a single line can contain only one of 'value one', 'value two' or 'value three'. So on the first line only the variable value_one is defined, but you try to use all three, hence the error.
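To see the failure in isolation, here is a minimal sketch rather than the code from the question. The helper build_row is hypothetical; it assigns at most one of the three local names per call and then reads all of them, which is what happens for every line of the file. The exact wording of the error message depends on the Python version.

```python
def build_row(key, value):
    # Only one of the three names is assigned per call, mirroring the
    # one-match-per-line situation in the question.
    if key == 'value_one':
        value_one = value
    if key == 'value_two':
        value_two = value
    if key == 'value_three':
        value_three = value
    # All three names are read here, but at most one was assigned above.
    return {'value one': value_one,
            'value two': value_two,
            'value three': value_three}


try:
    build_row('value_one', 5)
    error_name = None
except UnboundLocalError as exc:
    # UnboundLocalError is the exception behind the
    # "referenced before assignment" message.
    error_name = type(exc).__name__
    print(error_name, '-', exc)
```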

You should only append a row when you have a full sequence. You could try to change your code to:

def parse_file(filepath):
    data = []  
    with open(filepath, 'r') as file_object:
        row = {}                                # prepare an empty row
        for line in file_object:
            key, match = _parse_line(line)
            # search for keys in the line
            if key == 'value_one':
                value_one = match.group('value_one')
                value_one = int(value_one)
                if 'value one' in row:          # we already have a full row
                    data.append(row)            # append it to the data list
                    row = {}                    # and reset it
                row['value one'] = value_one    # we have a match: store the value in row

            if key == 'value_two':
                value_two = match.group('value_two')
                value_two = int(value_two)
                if 'value two' in row:
                    data.append(row)
                    row = {}
                row['value two'] = value_two

            if key == 'value_three':
                value_three = match.group('value_three')
                value_three = int(value_three)
                if 'value three' in row:
                    data.append(row)
                    row = {}
                row['value three'] = value_three

        if row != {}:                      # do not forget the last row
            data.append(row)
        data = pd.DataFrame(data)
        return data

I have also removed the last part as IMHO it is no longer a matter of parsing a text file to build a dataframe but is just pandas dataframe processing.

What changes would I probably have to make if I encounter a different ordering, say: value one, value two, value three, value three, value one, value two, value three?
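For a different ordering, the rule used in the loop above (a label that is already in the current row closes that row) still applies as it stands. The sketch below is a compact variant under stated assumptions: the single combined pattern LINE_PATTERN and the function parse_lines are illustrative names, not part of the original code, and the input is an in-memory list of lines instead of a file.

```python
import re

# Assumption: one combined pattern instead of three separate ones.
LINE_PATTERN = re.compile(r'value (one|two|three) = (\d+)')


def parse_lines(lines):
    """Group matches into rows; a repeated label starts a new row."""
    rows = []
    row = {}
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match is None:
            continue                 # lines such as '%some text' are skipped
        column = 'value ' + match.group(1)
        if column in row:            # label seen again: previous row is done
            rows.append(row)
            row = {}
        row[column] = int(match.group(2))
    if row:                          # keep the trailing, possibly partial, row
        rows.append(row)
    return rows


# The ordering from the comment: one, two, three, three, one, two, three.
sample = """value one = 1
value two = 2
value three = 3
value three = 4
value one = 5
value two = 6
value three = 7
""".splitlines()

rows = parse_lines(sample)
print(rows)
```

With this input the second row begins at the repeated value three, and the last row holds only a value three. Passing such a list of dicts to pd.DataFrame leaves the missing cells as NaN, so whether this grouping is the desired one depends on what a "row" means in your file.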

Perplexabot · Accepted Answer · 2019-05-22 21:50:22Z

You can try something like this:

import re
import pandas as pd

with open('text.txt') as fd:
    data = fd.read()

val_to_pattern = {
    'value_one': r'value one = (\d+)',
    'value_two': r'value two = (\d+)',
    'value_three': r'value three = (\d+)',
}

val_dict = {}
for key, patt in val_to_pattern.items():
    val_dict[key] = re.findall(patt, data)

df = pd.DataFrame.from_dict(val_dict)
print(df)

The result:

  value_one value_two value_three
0         5        10          15
1        12        13          11
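Two caveats with this approach: re.findall returns strings, so the columns hold text and not numbers, and building a data frame from a dict of lists requires every list to have the same length. The following sketch converts the matches to int and checks the lengths first. It reads from an in-memory string instead of text.txt, and the names columns, lengths and same_length are only illustrative.

```python
import re

# Sample input from the question, held in memory instead of a file.
text = """value one = 5
value two = 10
%some text here
value three = 15
%some text
value one = 12
value two = 13
%some text here
value three = 11
"""

val_to_pattern = {
    'value_one': r'value one = (\d+)',
    'value_two': r'value two = (\d+)',
    'value_three': r'value three = (\d+)',
}

# Convert each captured string to int while collecting the columns.
columns = {
    key: [int(v) for v in re.findall(patt, text)]
    for key, patt in val_to_pattern.items()
}

# Check that every label occurred the same number of times.
lengths = {key: len(values) for key, values in columns.items()}
same_length = len(set(lengths.values())) == 1

print(columns)
print('equal column lengths:', same_length)
```

If same_length is true, columns can be passed to pd.DataFrame.from_dict as before. If it is false, one of the labels is missing somewhere in the file, and a row-by-row approach is the safer choice.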

edited May 22, 2019 at 21:50

answered May 22, 2019 at 21:21
