2

I am struggling with the following issue.

My DF is:

df = pd.DataFrame(
[
    ['7890-1', '12345N', 'John', 'Intermediate'],
    ['7890-4', '30909N', 'Greg', 'Intermediate'], 
    ['3300-1', '88117N', 'Mark', 'Advanced'],
    ['2502-2', '90288N', 'Olivia', 'Elementary'],
    ['7890-2', '22345N', 'Joe', 'Intermediate'],
    ['7890-3', '72245N', 'Ana', 'Elementary']
],
columns=['Id', 'Code', 'Person', 'Level'])
print(df)

I would like to get such a result:

Id Code 1 Person 1 Level 1 Code 2 Person 2 Level 2 Code 3 Person 3 Level 3 Code 4 Person 4 Level 4
0 7890 12345N John Intermediate 22345N Joe Intermediate 72245N Ana Elementary 30909N Greg Intermediate
1 3300 88117N Mark Advanced NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 2502 NaN NaN NaN 90288N Olivia Elementary NaN NaN NaN NaN NaN NaN

2 Answers 2

1

I'd start with the same approach as @Andrej Kesely but then sort by index after unstacking and map over the column names with ' '.join.

df[["Id", "No"]] = df["Id"].str.split("-", expand=True)
df_wide = df.set_index(["Id", "No"]).unstack(level=1).sort_index(axis=1,level=1)
df_wide.columns = df_wide.columns.map(' '.join)

Output

      Code 1       Level 1 Person 1  Code 2       Level 2 Person 2  Code 3  \
Id                                                                           
2502     NaN           NaN      NaN  90288N    Elementary   Olivia     NaN   
3300  88117N      Advanced     Mark     NaN           NaN      NaN     NaN   
7890  12345N  Intermediate     John  22345N  Intermediate      Joe  72245N   

         Level 3 Person 3  Code 4       Level 4 Person 4  
Id                                                        
2502         NaN      NaN     NaN           NaN      NaN  
3300         NaN      NaN     NaN           NaN      NaN  
7890  Elementary      Ana  30909N  Intermediate     Greg  
Sign up to request clarification or add additional context in comments.

Comments

0

Try:

df[["Id", "Id2"]] = df["Id"].str.split("-", expand=True)
x = df.set_index(["Id", "Id2"]).unstack(level=1)
x.columns = [f"{a} {b}" for a, b in x.columns]
print(
    x[sorted(x.columns, key=lambda k: int(k.split()[-1]))]
    .reset_index()
    .to_markdown()
)

Prints:

Id Code 1 Person 1 Level 1 Code 2 Person 2 Level 2 Code 3 Person 3 Level 3 Code 4 Person 4 Level 4
0 2502 nan nan nan 90288N Olivia Elementary nan nan nan nan nan nan
1 3300 88117N Mark Advanced nan nan nan nan nan nan nan nan nan
2 7890 12345N John Intermediate 22345N Joe Intermediate 72245N Ana Elementary 30909N Greg Intermediate

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.