Python 3 Split single column into multiple columns with no commas

Question

Using Python 3.6, I have the results of a text reader that gives me repeating lines like this:

df

         Col 1
    0    Text A1
    1    Text B1
    2    Text C1
    3    Text D1
    4    Text E1
    5    Text A2
    6    Text B2
    7    Text C2
    8    Text D2
    9    Text E2
   10    Text A3
   11    Text B3
   12    Text C3
   13    Text D3
   14    Text E3

- * Added edit: Some of the above texts are blanks. There are no commas that I can do a str.split() and I'm not sure reshaping is the right way to go. The information repeats every 5 entries and I'm trying to separate them into columns so that it looks like:

     Col1     Col2     Col3     Col4     Col5
0    Text A1  Text B1  Text C1  Text D1  Text E1
1    Text A2  Text B2  Text C2  Text D2  Text E2
2    Text A3  Text B3  Text C3  Text D3  Text E3

What is the pythonic way to reshape or split into 5 columns not relying on punctuation from the text?

DSM · Accepted Answer · 2018-08-23 17:45:53Z

5

TBH, if you know they repeat every 5, I would reshape:

In [36]: pd.DataFrame(df.values.reshape(-1, 5), columns=[f"Col {i}" for i in range(1,6)])
Out[36]: 
     Col 1    Col 2    Col 3    Col 4    Col 5
0  Text A1  Text B1  Text C1  Text D1  Text E1
1  Text A2  Text B2  Text C2  Text D2  Text E2
2  Text A3  Text B3  Text C3  Text D3  Text E3

Personally though I'm wary of missing values, and so I'd probably groupby on some function of the strings, e.g.

pd.concat([v.reset_index(drop=True) 
           for _, v in df.groupby(df["Col 1"].str.rstrip(string.digits))], axis=1)

or something.

answered Aug 23, 2018 at 17:45

DSM

355k67 gold badges606 silver badges504 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Arthur D. Howland Over a year ago

Yes some of those 'text x' entries are blanks now that you mention it. It may give a pivot a hard time. Never seen the columns=[f"Col {i}... construction before, thank you it works!

BENY · Accepted Answer · 2018-08-23 17:44:08Z

4

By using pd.crosstab

pd.crosstab(index=np.arange(len(df))//5,columns = np.arange(len(df))%5,values=df['Col 1'],aggfunc='sum')
Out[382]: 
col_0         0         1         2         3         4
row_0                                                  
0       Text A1   Text B1   Text C1   Text D1   Text E1
1       Text A2   Text B2   Text C2   Text D2   Text E2
2       Text A3   Text B3   Text C3   Text D3   Text E3

answered Aug 23, 2018 at 17:44

BENY

324k22 gold badges176 silver badges250 bronze badges

Comments

rafaelc · Accepted Answer · 2018-08-23 17:47:52Z

4

IIUC, can groupby the arange and just use pd.DataFrame default constructor

n = df.shape[0] // 5
pd.DataFrame(df.groupby(np.repeat(np.arange(n), 5)).col.apply(lambda s: s.tolist()).tolist())


    0       1       2       3       4
0   Text A1 Text B1 Text C1 Text D1 Text E1
1   Text A2 Text B2 Text C2 Text D2 Text E2
2   Text A3 Text B3 Text C3 Text D3 Text E3

edited Aug 23, 2018 at 17:47

answered Aug 23, 2018 at 17:42

rafaelc

59.4k15 gold badges64 silver badges87 bronze badges

1 Comment

Arthur D. Howland Over a year ago

This one works too, had to replace ".col.apply" to "my_columnname.apply"

cs95 · Accepted Answer · 2018-08-23 17:55:08Z

4

Group by index mod 5 and concat horizontally. The last step is just fixing the column names with rename.

df = pd.concat([g.reset_index(drop=True) 
         for _, g in df.groupby(df.index % 5)], axis=1)
df.columns = [f'Col {i}' for i in range(1, 6)]

     Col 1    Col 2    Col 3    Col 4    Col 5
0  Text A1  Text B1  Text C1  Text D1  Text E1
1  Text A2  Text B2  Text C2  Text D2  Text E2
2  Text A3  Text B3  Text C3  Text D3  Text E3

edited Aug 23, 2018 at 17:55

answered Aug 23, 2018 at 17:42

cs95

406k106 gold badges744 silver badges797 bronze badges

Comments

Onyambu · Accepted Answer · 2018-08-23 18:26:10Z

1

What if some of the repeat was not actually 5? ie some repeated 4 while others 5 etc?? What if you do not know the number of repeats?: You can just fix the data and then reshape into wide

a = df.assign(ID = df['Col 1'].replace('\\D','',regex=True),
              col = df['Col 1'].replace('.*(.)\\d','col \\1',regex=True))

pd.crosstab(a.ID,a.col,a['Col 1'],aggfunc=lambda x:x) #Using an identity function

col    col A    col B    col C    col D    col E
ID                                              
1    Text A1  Text B1  Text C1  Text D1  Text E1
2    Text A2  Text B2  Text C2  Text D2  Text E2
3    Text A3  Text B3  Text C3  Text D3  Text E3

edited Aug 23, 2018 at 18:26

answered Aug 23, 2018 at 18:14

Onyambu

80.3k3 gold badges29 silver badges65 bronze badges

Collectives™ on Stack Overflow

Python 3 Split single column into multiple columns with no commas

5 Answers 5

1 Comment

Comments

1 Comment

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

1 Comment

Comments

1 Comment

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related