2

Using Python 3.6, I have the results of a text reader that gives me repeating lines like this:

df

         Col 1
    0    Text A1
    1    Text B1
    2    Text C1
    3    Text D1
    4    Text E1
    5    Text A2
    6    Text B2
    7    Text C2
    8    Text D2
    9    Text E2
   10    Text A3
   11    Text B3
   12    Text C3
   13    Text D3
   14    Text E3

- * Added edit: Some of the above texts are blanks. There are no commas that I can do a str.split() and I'm not sure reshaping is the right way to go. The information repeats every 5 entries and I'm trying to separate them into columns so that it looks like:

     Col1     Col2     Col3     Col4     Col5
0    Text A1  Text B1  Text C1  Text D1  Text E1
1    Text A2  Text B2  Text C2  Text D2  Text E2
2    Text A3  Text B3  Text C3  Text D3  Text E3

What is the pythonic way to reshape or split into 5 columns not relying on punctuation from the text?

5 Answers 5

5

TBH, if you know they repeat every 5, I would reshape:

In [36]: pd.DataFrame(df.values.reshape(-1, 5), columns=[f"Col {i}" for i in range(1,6)])
Out[36]: 
     Col 1    Col 2    Col 3    Col 4    Col 5
0  Text A1  Text B1  Text C1  Text D1  Text E1
1  Text A2  Text B2  Text C2  Text D2  Text E2
2  Text A3  Text B3  Text C3  Text D3  Text E3

Personally though I'm wary of missing values, and so I'd probably groupby on some function of the strings, e.g.

pd.concat([v.reset_index(drop=True) 
           for _, v in df.groupby(df["Col 1"].str.rstrip(string.digits))], axis=1)

or something.

Sign up to request clarification or add additional context in comments.

1 Comment

Yes some of those 'text x' entries are blanks now that you mention it. It may give a pivot a hard time. Never seen the columns=[f"Col {i}... construction before, thank you it works!
4

By using pd.crosstab

pd.crosstab(index=np.arange(len(df))//5,columns = np.arange(len(df))%5,values=df['Col 1'],aggfunc='sum')
Out[382]: 
col_0         0         1         2         3         4
row_0                                                  
0       Text A1   Text B1   Text C1   Text D1   Text E1
1       Text A2   Text B2   Text C2   Text D2   Text E2
2       Text A3   Text B3   Text C3   Text D3   Text E3

Comments

4

IIUC, can groupby the arange and just use pd.DataFrame default constructor

n = df.shape[0] // 5
pd.DataFrame(df.groupby(np.repeat(np.arange(n), 5)).col.apply(lambda s: s.tolist()).tolist())


    0       1       2       3       4
0   Text A1 Text B1 Text C1 Text D1 Text E1
1   Text A2 Text B2 Text C2 Text D2 Text E2
2   Text A3 Text B3 Text C3 Text D3 Text E3

1 Comment

This one works too, had to replace ".col.apply" to "my_columnname.apply"
4

Group by index mod 5 and concat horizontally. The last step is just fixing the column names with rename.

df = pd.concat([g.reset_index(drop=True) 
         for _, g in df.groupby(df.index % 5)], axis=1)
df.columns = [f'Col {i}' for i in range(1, 6)]

     Col 1    Col 2    Col 3    Col 4    Col 5
0  Text A1  Text B1  Text C1  Text D1  Text E1
1  Text A2  Text B2  Text C2  Text D2  Text E2
2  Text A3  Text B3  Text C3  Text D3  Text E3

Comments

1

What if some of the repeat was not actually 5? ie some repeated 4 while others 5 etc?? What if you do not know the number of repeats?: You can just fix the data and then reshape into wide

a = df.assign(ID = df['Col 1'].replace('\\D','',regex=True),
              col = df['Col 1'].replace('.*(.)\\d','col \\1',regex=True))

pd.crosstab(a.ID,a.col,a['Col 1'],aggfunc=lambda x:x) #Using an identity function

col    col A    col B    col C    col D    col E
ID                                              
1    Text A1  Text B1  Text C1  Text D1  Text E1
2    Text A2  Text B2  Text C2  Text D2  Text E2
3    Text A3  Text B3  Text C3  Text D3  Text E3

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.