My dataframe contains three array columns of the same length:

| col_1                   | col_2                   | col_3                        |
|-------------------------|-------------------------|------------------------------|
|['a', 'b', 'c', 'd', 'e']|['n', 'n', 'n', 'd', 'd']|['cl', 'cl', 'cl', 'en', 'en']|
|['d', 'e', 'f', 'g', 'h']|['d', 'd', 'd', 'n', 'n']|['de', 'de', 'de', 'me', 'me']|
|['a', 'd', 'e', 'b', 'c']|['p', 'p', 'c', 'c', 'c']|['it', 'it', 'cs', 'cs', 'cs']|

I want to join the strings in col_1, grouping by the values in col_3. The corresponding string from col_2 should be kept for each group. My output dataframe should look like this:

| col_1       | col_2    | col_3      |
|-------------|----------|------------|
|['abc', 'de']|['n', 'd']|['cl', 'en']|
|['def', 'gh']|['d', 'n']|['de', 'me']|
|['ad', 'ebc']|['p', 'c']|['it', 'cs']|

The columns of the final dataframe should still have the same length.

1 Answer

You can explode and use a double groupby.agg:

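out = (df
   .explode(list(df))             # explode all columns (needs pandas >= 1.3), use a specific list if needed
   .reset_index()                 # keep the original row number as an 'index' column
   .groupby(['index', 'col_3'])   # one group per original row and col_3 value
   .agg(''.join)                  # concatenate the strings of col_1 and col_2 in each group
   .reset_index('col_3')          # turn col_3 back into a column, keep 'index' as the index
   .groupby(level=0).agg(list)    # collect the groups of each original row into lists
   [df.columns]                   # restore the original column order
)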

output:

           col_1      col_2     col_3
index                                
0      [abc, de]  [nnn, dd]  [cl, en]
1      [def, gh]  [ddd, nn]  [de, me]
2      [ebc, ad]  [ccc, pp]  [cs, it]
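
In the output above, col_2 is joined as well, which gives 'nnn' rather than 'n'. If col_2 is constant within each col_3 group (an assumption that holds for the sample data, but is not checked here), one possible variant is to pass a dict to agg and keep only the first col_2 value per group. This is a sketch, not the only way to do it:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [['a', 'b', 'c', 'd', 'e'],
                             ['d', 'e', 'f', 'g', 'h'],
                             ['a', 'd', 'e', 'b', 'c']],
                   'col_2': [['n', 'n', 'n', 'd', 'd'],
                             ['d', 'd', 'd', 'n', 'n'],
                             ['p', 'p', 'c', 'c', 'c']],
                   'col_3': [['cl', 'cl', 'cl', 'en', 'en'],
                             ['de', 'de', 'de', 'me', 'me'],
                             ['it', 'it', 'cs', 'cs', 'cs']]})

out = (df
   .explode(list(df))
   .reset_index()
   .groupby(['index', 'col_3'])
   # join col_1, but keep a single representative value for col_2
   # (assumes col_2 does not vary inside a col_3 group)
   .agg({'col_1': ''.join, 'col_2': 'first'})
   .reset_index('col_3')
   .groupby(level=0).agg(list)
   [list(df.columns)]
)
```

The groups are still sorted by col_3 within each row, so the last row comes out as [ebc, ad] / [c, p] / [cs, it].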

Or maybe you want to group by both col_2/col_3:

out = (df
   .explode(list(df))
   .reset_index()
   .groupby(['index', 'col_2', 'col_3'])
   .agg(''.join)
   .reset_index(['col_2', 'col_3'])
   .groupby(level=0).agg(list)
   [df.columns]
)

output:

           col_1   col_2     col_3
index                             
0      [de, abc]  [d, n]  [en, cl]
1      [def, gh]  [d, n]  [de, me]
2      [ebc, ad]  [c, p]  [cs, it]
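
Note that groupby sorts the group keys by default, which is why the lists above are not in the order of the expected output (for example [de, abc] instead of [abc, de]). A minimal sketch that keeps the order of first appearance, assuming that is what is wanted, passes sort=False to the first groupby:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [['a', 'b', 'c', 'd', 'e'],
                             ['d', 'e', 'f', 'g', 'h'],
                             ['a', 'd', 'e', 'b', 'c']],
                   'col_2': [['n', 'n', 'n', 'd', 'd'],
                             ['d', 'd', 'd', 'n', 'n'],
                             ['p', 'p', 'c', 'c', 'c']],
                   'col_3': [['cl', 'cl', 'cl', 'en', 'en'],
                             ['de', 'de', 'de', 'me', 'me'],
                             ['it', 'it', 'cs', 'cs', 'cs']]})

out = (df
   .explode(list(df))
   .reset_index()
   # sort=False keeps the groups in order of first appearance
   .groupby(['index', 'col_2', 'col_3'], sort=False)
   .agg(''.join)
   .reset_index(['col_2', 'col_3'])
   # the original row numbers are already ascending, so the default sort is harmless here
   .groupby(level=0).agg(list)
   [list(df.columns)]
)
```

With the sample input this gives [abc, de], [def, gh], [ad, ebc] for col_1, with col_2 and col_3 in the matching order.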

Used input:

import pandas as pd

df = pd.DataFrame({'col_1': [['a', 'b', 'c', 'd', 'e'],
                             ['d', 'e', 'f', 'g', 'h'],
                             ['a', 'd', 'e', 'b', 'c']],
                   'col_2': [['n', 'n', 'n', 'd', 'd'],
                             ['d', 'd', 'd', 'n', 'n'],
                             ['p', 'p', 'c', 'c', 'c']],
                   'col_3': [['cl', 'cl', 'cl', 'en', 'en'],
                             ['de', 'de', 'de', 'me', 'me'],
                             ['it', 'it', 'cs', 'cs', 'cs']]})