
I am trying to read a text file containing several fields, each with a fixed number of characters. I know that the first field takes n1 characters, the second field n2 characters, and so on.

This is what I have so far, for one line:

# Line
line = 'AAABBCCCCDDDDDE'

# Array structure
slice_structure = [3, 2, 4, 5, 1]

sliced_array = []
cursor = 0
for n in slice_structure:
    sliced_array.append(line[cursor:cursor+n])
    cursor += n

print(sliced_array)

The output is:

['AAA', 'BB', 'CCCC', 'DDDDD', 'E']

My intention is to create a function with this code and call it for every line of the file. I am sure there must be a better way to do this.
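A minimal sketch of that plan, assuming each line should be split with the same widths: the loop above becomes a function, and a second function applies it to every line of an open file. The names `split_fixed_width` and `parse_lines` are hypothetical, and `io.StringIO` stands in for a real file object so the example is self-contained. The trailing newline is stripped before slicing so it does not end up in the last field.

```python
import io

def split_fixed_width(line, widths):
    """Split one line into consecutive fields of the given widths (hypothetical helper)."""
    fields = []
    cursor = 0
    for n in widths:
        fields.append(line[cursor:cursor + n])
        cursor += n
    return fields

def parse_lines(lines, widths):
    """Apply split_fixed_width to every line of an iterable, e.g. an open file."""
    return [split_fixed_width(line.rstrip('\n'), widths) for line in lines]

# io.StringIO stands in for an opened text file in this sketch
fake_file = io.StringIO('AAABBCCCCDDDDDE\nFFFGGHHHHIIIIIJ\n')
rows = parse_lines(fake_file, [3, 2, 4, 5, 1])
print(rows)
```

With a real file the call would look like `with open(path) as f: rows = parse_lines(f, slice_structure)`, where `path` is whatever file you are reading. For very large files, turning the list comprehension into a generator expression avoids holding every row in memory at once.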

Thanks in advance.

Can you please clarify your example? Are your field names always a repetition of the same character, as above, or arbitrary text that should be split after a set number of characters? Commented Oct 15, 2019 at 18:14

4 Answers


You can use groupby for every line you're reading from that file:

from itertools import groupby

line = 'AAABBCCCCDDDDDE'

result = ["".join(g) for _, g in groupby(line)]

print(result)

Result:

['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
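Note that `groupby` splits on runs of identical characters, not on the widths in `slice_structure`, so it only reproduces the intended fields when every field is a single repeated character that differs from its neighbours. The sketch below uses a made-up line in which two adjacent fields share the same character, to show where the two approaches diverge.

```python
from itertools import groupby

line = 'AAAAABBB'   # intended fields of width 3, 2, 3: 'AAA', 'AA', 'BBB'
widths = [3, 2, 3]

# groupby merges all adjacent equal characters into one run
by_runs = [''.join(g) for _, g in groupby(line)]

# slicing by widths respects the declared field boundaries
by_widths = []
cursor = 0
for n in widths:
    by_widths.append(line[cursor:cursor + n])
    cursor += n

print(by_runs)    # ['AAAAA', 'BBB']
print(by_widths)  # ['AAA', 'AA', 'BBB']
```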

If your field names are actually text (rather than a repeated character) and you want to split your string by the widths in your slice list, here's a simple, readable approach:

# Line
line = 'AAABBCCCCDDDDDE'
# Array structure
slice_structure  = [3,2,4,5,1]
# Results list
result = []

for i in slice_structure:
    result.append(line[:i])
    line = line[i:]

print(result)

Output:

['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
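A variant that avoids rebuilding the remaining string on every step consumes one shared iterator with `itertools.islice`: each call pulls the next `n` characters. This is a swapped-in technique, not the approach above, sketched with the same example data.

```python
from itertools import islice

line = 'AAABBCCCCDDDDDE'
slice_structure = [3, 2, 4, 5, 1]

chars = iter(line)  # one iterator shared by all islice calls
# each islice call takes the next n characters from where the last one stopped
result = [''.join(islice(chars, n)) for n in slice_structure]

print(result)  # ['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
```

Like plain slicing, this yields short or empty trailing fields if a line is shorter than the total width, rather than raising an error.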

You could do it using the following two methods.

Method-1:
Uses list.insert to place some separators ('|') and then split the string using these separators.

Method-2:
Uses list comprehension.

import numpy as np

# Line
line = 'AAABBCCCCDDDDDE'
# Array structure
slice_structure  = [3,2,4,5,1]
ss = np.array(slice_structure).cumsum()

# Method-1
# >> Uses list.insert to place some separators ('|')
#    and then split the string using these separators.
l = list(line)
# Insert from the rightmost offset first so earlier offsets are not shifted
for p in np.flip(ss[:-1]):
    l.insert(p, '|')
final_1 = ''.join(l).split('|')
print('Method-1: {}'.format(final_1))

# Method-2
# >> Uses list comprehension
stop_pos = ss.tolist()
start_pos = [0] + ss[:-1].tolist()
final_2 = [line[start:stop] for start, stop in zip(start_pos, stop_pos)]
print('Method-2: {}'.format(final_2))

Output:

Method-1: ['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
Method-2: ['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
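If NumPy is not otherwise needed, Method-2 can be written with the standard library alone. The sketch below assumes Python 3.8 or newer, where `itertools.accumulate` accepts `initial=0`, which yields the start and stop offsets in a single list.

```python
from itertools import accumulate

line = 'AAABBCCCCDDDDDE'
slice_structure = [3, 2, 4, 5, 1]

# cumulative offsets including the leading 0: [0, 3, 5, 9, 14, 15]
bounds = list(accumulate(slice_structure, initial=0))

# pair each offset with the next one to get (start, stop) for every field
final_3 = [line[start:stop] for start, stop in zip(bounds, bounds[1:])]
print(final_3)  # ['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
```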

Question: unpack record fields structured with a given number of characters each.

from struct import unpack

record = 'AAABBCCCCDDDDDE'

fields = [item.decode() for item in 
          unpack('3s2s4s5s1s', bytes(record, 'utf-8'))]

print(fields)

Output:

['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
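To apply this to every line of a file, the format string can be built from `slice_structure` and compiled once with `struct.Struct`. `unpack` raises `struct.error` unless the byte string is exactly `Struct.size` bytes long, so the newline has to be stripped and every record must be full width. The helper name `unpack_record` is hypothetical, and encoding as ASCII assumes single-byte characters: a non-ASCII character takes several bytes in UTF-8 and would shift the field offsets.

```python
import struct

slice_structure = [3, 2, 4, 5, 1]

# build '3s2s4s5s1s' from the widths and compile it once for all lines
record_struct = struct.Struct(''.join(f'{n}s' for n in slice_structure))

def unpack_record(line):
    """Split one fixed-width record into fields (hypothetical helper)."""
    data = line.rstrip('\n').encode('ascii')  # assumes single-byte characters
    return [field.decode('ascii') for field in record_struct.unpack(data)]

print(record_struct.size)                  # 15
print(unpack_record('AAABBCCCCDDDDDE\n'))  # ['AAA', 'BB', 'CCCC', 'DDDDD', 'E']
```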
