Read text file with a given structure delimitation with Python

Question

I am trying to read a text file in which each line contains several fixed-width fields. I know the width of each field in advance: the first field takes n1 characters, the second n2 characters, and so on.

This is what I have so far, for one line:

# Line
line = 'AAABBCCCCDDDDDE'

# Array structure
slice_structure = [3, 2, 4, 5, 1]

sliced_array = []
cursor = 0
for n in slice_structure:
    sliced_array.append(line[cursor:cursor + n])
    cursor += n

print(sliced_array)

The output is the following:

['AAA', 'BB', 'CCCC', 'DDDDD', 'E']

My intention is to create a function with this code and call it for every line of the file. I am sure there must be a better way to do this.

Thanks in advance.

Can you please clarify your example? Are your field names always a repetition of the same character, as above, or a string of text that should be split at a set number of characters?
– s3dev, Commented Oct 15, 2019 at 18:14

Vasilis G. · Accepted Answer · 2019-10-15 17:58:51Z

2

If every field is a run of one repeated character, as in your example, you can use groupby on each line you read from the file:

from itertools import groupby

line = 'AAABBCCCCDDDDDE'

result = ["".join(g) for _, g in groupby(line)]

print(result)

Result:

['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
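
One caveat worth illustrating: groupby splits wherever the character changes, not at the stated widths, so it matches the layout only when each field is a single repeated character and neighbouring fields differ. The sketch below uses a made-up record ('JOHNNY42') and an assumed layout of a 6-character name followed by a 2-character age; neither comes from the question.

```python
from itertools import groupby

# Hypothetical record whose fields are real text, not repeated characters
line = 'JOHNNY42'
widths = [6, 2]  # assumed layout: 6-char name, 2-char age

# groupby starts a new group every time the character changes
by_groupby = ["".join(g) for _, g in groupby(line)]

# Width-based slicing, as in the question
by_width = []
cursor = 0
for n in widths:
    by_width.append(line[cursor:cursor + n])
    cursor += n

print(by_groupby)  # ['J', 'O', 'H', 'NN', 'Y', '4', '2']
print(by_width)    # ['JOHNNY', '42']
```

For general fixed-width data, a width-based approach such as the ones in the other answers is probably the safer choice.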


s3dev · Accepted Answer · 2019-10-15 19:44:57Z

1

If your field names are actually text (rather than a repeated character) and you want to split your string by the values in your slice list, here's a simple / readable approach:

# Line
line = 'AAABBCCCCDDDDDE'
# Array structure
slice_structure = [3, 2, 4, 5, 1]
# Results list
result = []

# Take the first i characters as a field, then drop them from the line
for i in slice_structure:
    result.append(line[:i])
    line = line[i:]

print(result)

Output:

['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
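
Since the question asks for a function that is called for every line of the file, here is a minimal sketch of that step. The function name split_fixed_width and the second sample line are made up for illustration, and io.StringIO stands in for a real file opened with open(...).

```python
import io


def split_fixed_width(line, widths):
    """Split line into consecutive fields of the given widths."""
    fields = []
    cursor = 0
    for n in widths:
        fields.append(line[cursor:cursor + n])
        cursor += n
    return fields


# Stand-in for a real file object; the content is invented sample data
fake_file = io.StringIO('AAABBCCCCDDDDDE\nFFFGGHHHHIIIIIJ\n')

widths = [3, 2, 4, 5, 1]
# Strip the trailing newline so it does not end up in the last field
rows = [split_fixed_width(raw.rstrip('\n'), widths) for raw in fake_file]

print(rows)
```

Unlike the loop above, this version uses a cursor instead of reassigning line, so the original string is left untouched.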


CypherX · Accepted Answer · 2019-10-15 20:22:04Z

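You could do it using either of the following two methods. Both start from the cumulative sum of the field widths, which gives the position where each field ends.

Method-1:
Uses list.insert to place separators ('|') at the field boundaries and then splits the string on those separators. This assumes '|' never appears in the data itself.

Method-2:
Uses a list comprehension over the start and stop positions.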

import numpy as np

# Line
line = 'AAABBCCCCDDDDDE'
# Array structure
slice_structure  = [3,2,4,5,1]
ss = np.array(slice_structure).cumsum()

# Method-1
# >> Uses list.insert to place some separators ('|')
#    and then split the string using these separators.
l = list(line)
for p in np.flip(ss[:-1]):
    l.insert(p,'|')
final_1 = ''.join(l).split('|')
print('Method-1: {}'.format(final_1))

# Method-2
# >> Uses list comprehension
stop_pos = ss.tolist()
start_pos = [0] + ss[:-1].tolist()
final_2 = [line[start:stop] for start, stop in zip(start_pos, stop_pos)]
print('Method-2: {}'.format(final_2))

Output:

Method-1: ['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
Method-2: ['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
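
If NumPy is not otherwise needed, the same start and stop positions can be computed with itertools.accumulate from the standard library. This is a sketch of an alternative to Method-2, not part of the original answer:

```python
from itertools import accumulate

line = 'AAABBCCCCDDDDDE'
slice_structure = [3, 2, 4, 5, 1]

# Running total of the widths gives the end position of each field
stops = list(accumulate(slice_structure))  # [3, 5, 9, 14, 15]
# Each field starts where the previous one ended
starts = [0] + stops[:-1]

fields = [line[start:stop] for start, stop in zip(starts, stops)]
print(fields)  # ['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
```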

stovfl · Accepted Answer · 2019-10-16 11:53:52Z

1

Question: unpack record fields structured with a given number of characters each.

from struct import unpack

record = 'AAABBCCCCDDDDDE'

fields = [item.decode() for item in
          unpack('3s2s4s5s1s', bytes(record, 'utf-8'))]

print(fields)
# Output: ['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
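
The format string can also be built from a list of widths rather than written by hand. The sketch below assumes the data is pure ASCII: struct counts bytes, not characters, so a multi-byte UTF-8 character would shift the field boundaries. The variable names widths, fmt and data are illustrative.

```python
from struct import calcsize, unpack

widths = [3, 2, 4, 5, 1]
# Build the struct format from the widths: '3s2s4s5s1s'
fmt = ''.join(f'{n}s' for n in widths)

record = 'AAABBCCCCDDDDDE'
data = record.encode('ascii')  # assumption: ASCII-only data

# unpack raises struct.error unless the buffer size matches the format exactly
assert calcsize(fmt) == len(data)

fields = [item.decode('ascii') for item in unpack(fmt, data)]
print(fields)  # ['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
```

Because unpack insists on an exact length, lines that are shorter or longer than the declared layout fail loudly instead of being silently truncated, which may or may not be what you want.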

edited Oct 16, 2019 at 11:53

answered Oct 15, 2019 at 18:41
