1

I have a series that looks like this:

01 1ABCD     E    1   4.011   3.952   7.456 -0.3096  1.0132  0.2794

02 1ABCD     F    2   4.088   3.920   7.517  0.3839 -0.5482 -1.3874

...

I want to split it into 10 columns based on the length: the first 4 characters including spaces = column 1, the seconds 5 characters = column 2, ..., the last 8 characters = column10

The result should be something like this:

column1 column2 column3 .... column10
01 1 ABCD E ..... 0.2794
02 1 ABCD F .... -1.3874

How can I do this in python?

Thanks

Mehrnoosh

1 Answer 1

3

An elegant solution is to:

  • Start with a list of sizes (how many chars should be in each "segment").
  • Create a (compiled) Regex pattern with named capturing groups, each capturing a stated number of chars.
  • Use str.extract to extract the required substrings from your Series. Group names will be used as names of output columns.

Assuming that s is the source Series, the code to do it is:

import re

# Define size of each group
sizes = [4, 4, 6, 5, 8, 8, 8, 8, 8, 8]
# Generate the pattern string and compile it
pat = re.compile(''.join([ f'(?P<Column{idx}>.{{{n}}})'
    for idx, n in enumerate(sizes, start=1) ]))
# Generate the result
result = s.str.extract(pat)

The result is:

  Column1 Column2 Column3 Column4   Column5   Column6   Column7   Column8  Column9  Column10
0    01 1    ABCD       E       1     4.011     3.952     7.456   -0.3096   1.0132    0.2794 
1    02 1    ABCD       F       2     4.088     3.920     7.517    0.3839  -0.5482   -1.3874 

But note that all columns in result are of object type (actually they are strings). So to perform any sensible processing of them, you should probably:

  • strip spaces from each column (both leading and trailing),
  • convert some columns to either int or float.
Sign up to request clarification or add additional context in comments.

2 Comments

I also thought about read_fwf (e.g.) but the OP stated that he already has a Series with these strings.
ya, then ok. Maybe like alternative if Series is from csv shoul dbe used pd.read_fwf, +1

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.