4

I would like to split each row into new columns based on several indices:

6ABCDE0218594STRING

to

6 ABCDE 021 8594 STRING

This seems like it'd have been asked at least once before, but I keep finding only variations on the question (separating by a delimiter as in pandas: How do I split text in a column into multiple rows?, separating into new rows using rather than new columns, again with a delimiter: Split pandas dataframe string entry to separate rows).

I apologize in advance if this is a duplicate!

1
  • You can select parts of a string with simply use 'df.new=df.originallongstring.str()[1:5]' (untested) Commented Aug 11, 2014 at 20:59

3 Answers 3

4

One way is to use a regex and str.extract to pull out the columns:

In [11]: df = pd.DataFrame([['6ABCDE0218594STRING']])

You could just do it with index, so something like this:

In [12]: df[0].str.extract('(.)(.{5})(.{3})(.{4})(.*)')
Out[12]:
   0      1    2     3       4
0  6  ABCDE  021  8594  STRING

Or you could be a bit more cautious and ensure each column is the correct form:

In [13]: df[0].str.extract('(\d)(.{5})(\d{3})(\d{4})(.*)')
Out[13]:
   0      1    2     3       4
0  6  ABCDE  021  8594  STRING

Note: You can also use named groups (see the docs).

Sign up to request clarification or add additional context in comments.

3 Comments

Thank you! I'm working on this now -- had a bit of a hiccup because I was using pandas 0.12.
@Andy Hayden is there a way I can replicate this to a large set on indices on to a data frame?
Interestingly, this seems to convert strings consisting of only spaces into NaNs. Is there a way to avoid that, or should I just convert NaNs back to spaces?
0

Try this:

string = '6ABCDE0218594STRING'
indices = [1,5,3,4]
myList = []

for index in indices:
    token, string = string[:index],string[index:]
    myList.append(token)

myList.append(string)

>>> Output: ['6', 'ABCDE', '021', '8594', 'STRING']

Comments

0

Or in case you don't know the number of digits, letters etc.:

import re

m = re.match('(\d*)([A-Z]*)(\d*)([A-Z]*)', '6ABCDE0218594STRING').groups()
print m[0], m[1], m[2], m[3]

Output:

6 ABCDE 0218594 STRING

2 Comments

Yeah, regex is great! However, one issue with this particular application is that I will often need to break up a series of integers too (e.g. 0123456 -> 012 34 56).
Oh, I overlooked that one number is split up. Then it is, of course, not working as required. :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.