I have a generic index file that I open with

f = open(indexfile, "r")

The resulting object is a _io.TextIOWrapper whose contents look like this:

GROUP_FIELD_NAME:ID
GROUP_FIELD_VALUE:1 
GROUP_FIELD_NAME:NAME
GROUP_FIELD_VALUE:Joe 
GROUP_OFFSET:0
GROUP_LENGTH:1234
GROUP_FILENAME:/tmp/something1
GROUP_FIELD_NAME:ID
GROUP_FIELD_VALUE:2 
GROUP_FIELD_NAME:NAME
GROUP_FIELD_VALUE:Jenny 
GROUP_OFFSET:1235
GROUP_LENGTH:12
GROUP_FILENAME:/tmp/something2

Some data fields have to be extracted by combining a corresponding _NAME and _VALUE line, while others only require looking at the line itself (_OFFSET, _LENGTH, _FILENAME). For example, by looping through each line and populating lists, something like this:

import pandas as pd

ID = []
NAME = []
GROUP_LENGTH = []
GROUP_OFFSET = []
GROUP_FILENAME = []

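lines = [line.strip() for line in f]
for i, line in enumerate(lines):
    key, _, value = line.partition(':')
    if key == 'GROUP_OFFSET':
        GROUP_OFFSET.append(value)  # plain field: the value is on the same line
    elif line == 'GROUP_FIELD_NAME:ID':
        # paired field: the value is on the next GROUP_FIELD_VALUE line
        ID.append(lines[i + 1].partition(':')[2])
    # ... and likewise for the remaining fields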


a = {'ID': ID,
     'NAME': NAME,
     'GROUP_LENGTH': GROUP_LENGTH,
     'GROUP_OFFSET': GROUP_OFFSET,
     'GROUP_FILENAME': GROUP_FILENAME     
     }

df = pd.DataFrame.from_dict(a, orient='index')

df = df.transpose()

How can I get to something like this:

ID     NAME    GROUP_LENGTH    GROUP_OFFSET    GROUP_FILENAME
1      Joe     1234            0               /tmp/something1
2      Jenny   12              1235            /tmp/something2
  • The file is imported using f = open(indexfile, "r"), and the resulting object is a _io.TextIOWrapper Commented Sep 20, 2019 at 11:11

2 Answers

Accumulate the records in collections.OrderedDict objects, starting a new record each time an ID field appears:

import pandas as pd
from collections import OrderedDict

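with open('input.ind') as f:
    records = []
    for line in f:
        # split on the first ':' only, so values may contain colons
        name, val = line.strip().split(':', 1)
        if name == 'GROUP_FIELD_NAME':
            if val == 'ID':
                # an ID field marks the start of a new record
                records.append(OrderedDict())
            # the field's value is on the next (GROUP_FIELD_VALUE) line
            records[-1][val] = next(f).strip().split(':', 1)[1]
        else:
            # plain fields (GROUP_OFFSET, GROUP_LENGTH, GROUP_FILENAME)
            records[-1][name] = val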

df = pd.DataFrame(records)
print(df)

The expected output:

  ID   NAME GROUP_OFFSET GROUP_LENGTH   GROUP_FILENAME
0  1    Joe            0         1234  /tmp/something1
1  2  Jenny         1235           12  /tmp/something2
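
Since the question already has an open file object f, the same loop could be wrapped in a function that accepts any iterable of lines. This is only a minimal sketch, assuming every record starts with an ID field as in the sample; parse_index and the io.StringIO stand-in are illustrative names, not part of the answer above. It also converts the two numeric columns, because the parsed values are strings.

```python
import io

import pandas as pd


def parse_index(lines):
    """Build a DataFrame from an iterable of 'KEY:VALUE' lines."""
    lines = iter(lines)  # lets next() pull the paired value line
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        name, val = line.split(':', 1)
        if name == 'GROUP_FIELD_NAME':
            if val == 'ID':
                records.append({})  # assumption: ID opens each record
            # the value sits on the following GROUP_FIELD_VALUE line
            records[-1][val] = next(lines).strip().split(':', 1)[1]
        else:
            records[-1][name] = val
    return pd.DataFrame(records)


# In-memory stand-in for the open index file (contents from the question).
sample = io.StringIO(
    "GROUP_FIELD_NAME:ID\n"
    "GROUP_FIELD_VALUE:1 \n"
    "GROUP_FIELD_NAME:NAME\n"
    "GROUP_FIELD_VALUE:Joe \n"
    "GROUP_OFFSET:0\n"
    "GROUP_LENGTH:1234\n"
    "GROUP_FILENAME:/tmp/something1\n"
    "GROUP_FIELD_NAME:ID\n"
    "GROUP_FIELD_VALUE:2 \n"
    "GROUP_FIELD_NAME:NAME\n"
    "GROUP_FIELD_VALUE:Jenny \n"
    "GROUP_OFFSET:1235\n"
    "GROUP_LENGTH:12\n"
    "GROUP_FILENAME:/tmp/something2\n"
)

df = parse_index(sample)
# The parsed values are strings; convert the numeric columns explicitly.
numeric = ['GROUP_OFFSET', 'GROUP_LENGTH']
df[numeric] = df[numeric].astype(int)
print(df)
```

With the file from the question, the call would be df = parse_index(f).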

1 Comment

This worked out great! Thank you for a brilliant solution.
If you want to get a DataFrame directly, I suggest using read_csv with the sep parameter set to ':'.

You should now have a DataFrame with two columns: one with the names and the other with the values.
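
As a rough illustration of that first step, the sketch below reads the sample from the question through an in-memory buffer. The column names key and value are made up for the example, and header=None is an assumption that follows from the file having no header row. Note that this only works as long as no value contains a ':' itself.

```python
import io

import pandas as pd

# In-memory stand-in for the index file (contents from the question).
sample = io.StringIO(
    "GROUP_FIELD_NAME:ID\n"
    "GROUP_FIELD_VALUE:1 \n"
    "GROUP_FIELD_NAME:NAME\n"
    "GROUP_FIELD_VALUE:Joe \n"
    "GROUP_OFFSET:0\n"
    "GROUP_LENGTH:1234\n"
    "GROUP_FILENAME:/tmp/something1\n"
    "GROUP_FIELD_NAME:ID\n"
    "GROUP_FIELD_VALUE:2 \n"
    "GROUP_FIELD_NAME:NAME\n"
    "GROUP_FIELD_VALUE:Jenny \n"
    "GROUP_OFFSET:1235\n"
    "GROUP_LENGTH:12\n"
    "GROUP_FILENAME:/tmp/something2\n"
)

# header=None: the file has no header row.
# names=...: illustrative column names for the two fields.
# dtype=str: keep every value as text so nothing is coerced to a number.
kv = pd.read_csv(sample, sep=':', header=None,
                 names=['key', 'value'], dtype=str)

# Some values carry a trailing space in the sample; strip it.
kv['value'] = kv['value'].str.strip()
print(kv)
```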

Then you can use, for example, groupby to group the rows and run operations on each group. An example from the official documentation:

>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
...                               'Parrot', 'Parrot'],
...                    'Max Speed': [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0

Finally, with transpose you can obtain the final DataFrame.
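
For this particular file, the reshaping step might look like the sketch below. It starts from the two-column frame and, as a deliberate substitution, uses pivot rather than groupby and transpose. The column names key, value, column and record are illustrative, and the record numbering assumes that every record starts with its ID field.

```python
import pandas as pd

# Two-column frame as it would come out of reading the index with sep=':'
# (values copied from the question's sample, already stripped).
pairs = [
    ('GROUP_FIELD_NAME', 'ID'), ('GROUP_FIELD_VALUE', '1'),
    ('GROUP_FIELD_NAME', 'NAME'), ('GROUP_FIELD_VALUE', 'Joe'),
    ('GROUP_OFFSET', '0'), ('GROUP_LENGTH', '1234'),
    ('GROUP_FILENAME', '/tmp/something1'),
    ('GROUP_FIELD_NAME', 'ID'), ('GROUP_FIELD_VALUE', '2'),
    ('GROUP_FIELD_NAME', 'NAME'), ('GROUP_FIELD_VALUE', 'Jenny'),
    ('GROUP_OFFSET', '1235'), ('GROUP_LENGTH', '12'),
    ('GROUP_FILENAME', '/tmp/something2'),
]
kv = pd.DataFrame(pairs, columns=['key', 'value'])

# A GROUP_FIELD_VALUE row takes its column name from the row above it;
# every other row already carries its column name in 'key'.
is_value = kv['key'] == 'GROUP_FIELD_VALUE'
kv['column'] = kv['key'].mask(is_value, kv['value'].shift())

# The GROUP_FIELD_NAME rows have done their job and can be dropped.
kv = kv[kv['key'] != 'GROUP_FIELD_NAME'].copy()

# Assumption: each record starts with its ID field, so a running count
# of ID rows numbers the records.
kv['record'] = (kv['column'] == 'ID').cumsum()

# One row per record, one column per field.
df = kv.pivot(index='record', columns='column', values='value')
df = df[['ID', 'NAME', 'GROUP_LENGTH', 'GROUP_OFFSET', 'GROUP_FILENAME']]
print(df)
```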

2 Comments

The file is not a CSV, but some Generic Indexer format (see ibm.com/support/knowledgecenter/en/SSQHWE_9.5.0/…). Can I "force" it to be read as a CSV?
I think you can try. Failing that, try converting or exporting it to a .txt file first.
