1

I have a dataframe that looks like this:

df
   col1  col2  col3
0     1   "A"    10
1     1   "B"    20
2     1   "C"    30
...
n     k   "A"    15
n+1   k   "B"    10
n+2   k   "C"     5

I would like to compare the col3 values of rows that share the same col1 value, for specific pairs of col2 values ("A" vs "B" and "A" vs "C").

Suppose I generate a resulting data analysis dataframe; it would look like this:

da_df
       col1    col2
0    "AvsB"    50.0  #10/20*100
1    "AvsC"    33.3  #10/30*100
...
2k   "AvsB"     150  #15/10*100
2k+1 "AvsC"     300  #15/5*100

How can I do it without for loops?

2 Answers

0

You can pivot, then loop over the combinations of columns:

from itertools import combinations

import pandas as pd

comb = combinations(df['col2'].unique(), r=2)

tmp = df.pivot(index='col1', columns='col2', values='col3')

out = (pd.concat({f'{x}vs{y}': tmp[x].div(tmp[y]).mul(100)
                  for x, y in comb},
                 names=['combination'])
         .reset_index('combination', name='value')
         .sort_index()
      )

Output:

     combination       value
col1                        
1           AvsB   50.000000
1           AvsC   33.333333
1           BvsC   66.666667
k           AvsB  150.000000
k           AvsC  300.000000
k           BvsC  200.000000
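
For a quick check, here is a minimal self-contained sketch of the same pivot-and-combine approach. The sample frame is an assumption: it uses the two groups from the question, with the placeholder group `k` written as the concrete value 2. The stable sort is also an addition, used so that the combinations keep their order within each group.

```python
from itertools import combinations

import pandas as pd

# Assumed sample data: the question's group "k" is written as 2 here.
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': ['A', 'B', 'C', 'A', 'B', 'C'],
    'col3': [10, 20, 30, 15, 10, 5],
})

# One row per col1 value, one column per col2 label.
tmp = df.pivot(index='col1', columns='col2', values='col3')

# Ratio (in percent) for every unordered pair of col2 labels.
out = (pd.concat({f'{x}vs{y}': tmp[x].div(tmp[y]).mul(100)
                  for x, y in combinations(tmp.columns, r=2)},
                 names=['combination'])
         .reset_index('combination', name='value')
         .sort_index(kind='mergesort')  # stable: keeps AvsB, AvsC, BvsC order per group
      )
print(out)
```

Note that `pivot` raises an error if a (col1, col2) pair occurs more than once; `pivot_table` with an aggregation function would be needed in that case.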

Restrict to specific combinations

Note that if you're only interested in specific combinations, you can pass them in manually:

comb = [('A', 'B'), ('A', 'C')]

tmp = df.pivot(index='col1', columns='col2', values='col3')

out = (pd.concat({f'{x}vs{y}': tmp[x].div(tmp[y]).mul(100) for x,y in comb},
                 names=['combination'])
         .reset_index('combination', name='value')
         .sort_index()
      )

Output:

     combination       value
col1                        
1           AvsB   50.000000
1           AvsC   33.333333
k           AvsB  150.000000
k           AvsC  300.000000
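
When every comparison shares the same numerator (here `A`), the comprehension can be replaced by a single broadcast division. The following is a sketch under that assumption and is not part of the original answer; the sample data and the names `ref` and `others` are illustrative.

```python
import pandas as pd

# Assumed sample data: the question's group "k" is written as 2 here.
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': ['A', 'B', 'C', 'A', 'B', 'C'],
    'col3': [10, 20, 30, 15, 10, 5],
})

tmp = df.pivot(index='col1', columns='col2', values='col3')

ref = 'A'            # shared numerator (illustrative name)
others = ['B', 'C']  # denominators to compare against (illustrative name)

# rdiv computes tmp[ref] / tmp[others], aligned on the row index (axis=0),
# so all ratios are produced in one vectorized call.
ratios = tmp[others].rdiv(tmp[ref], axis=0).mul(100).add_prefix(f'{ref}vs')

# Back to long format: one row per (col1, combination).
out = (ratios.reset_index()
             .melt(id_vars='col1', var_name='combination', value_name='value')
             .sort_values(['col1', 'combination'], ignore_index=True)
      )
print(out)
```

This only covers the "one reference versus several others" case; arbitrary pairs still need the list of combinations shown above.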

0

This is similar to mozway's answer, but I think my code is simpler and should be efficient.

import pandas as pd
import numpy as np

# Create Col1 by replicating 5 cases 3 times each
col1_values = np.repeat(['Case1', 'Case2', 'Case3', 'Case4', 'Case5'], 3)

# Create Col2 with 3 different observations for each value of Col1
col2_values = np.tile(['A', 'B', 'C'], 5)

# Create Col3 as numeric values
col3_values = np.random.rand(15)*100

# Create the DataFrame
df = pd.DataFrame({'col1': col1_values, 'col2': col2_values, 'col3': col3_values})
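# Pivot to one row per col1, with one col3 column per col2 label
df1 = pd.pivot_table(df, values=['col3'], index=['col1'], columns=['col2'], aggfunc='mean').reset_index()
# Flatten the MultiIndex columns, e.g. ('col3', 'A') -> 'col3_A', and strip the trailing '_' left on 'col1_'
df1.columns = df1.columns.map('_'.join).str.strip('_')
# Ratios of A and B relative to C
df1 = df1.assign(AvsC=lambda x: x['col3_A']/x['col3_C'])
df1 = df1.assign(BvsC=lambda x: x['col3_B']/x['col3_C'])

print(df1)

    col1     col3_A     col3_B     col3_C      AvsC      BvsC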
0  Case1  39.101404  28.564307  69.350939  0.563819  0.411881
1  Case2  19.903692  50.989200  94.227183  0.211231  0.541130
2  Case3  14.830127  28.185849  14.900414  0.995283  1.891615
3  Case4  29.918815  36.282592  84.880638  0.352481  0.427454
4  Case5  74.119780  95.239943  99.377965  0.745837  0.958361
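
If more pairs are needed, the repeated `assign` calls can be collapsed by unpacking a dictionary of computed columns into a single `assign`. This is only a sketch: the fixed seed, the smaller sample, and the `pairs` list are assumptions added for illustration, and it still iterates over the pairs in Python.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (assumption) so the run is repeatable
df = pd.DataFrame({
    'col1': np.repeat(['Case1', 'Case2', 'Case3'], 3),
    'col2': np.tile(['A', 'B', 'C'], 3),
    'col3': rng.random(9) * 100,
})

# Wide table: one row per col1, one column per col2 label (mean of duplicates).
wide = df.pivot_table(values='col3', index='col1', columns='col2', aggfunc='mean')

# Hypothetical list of (numerator, denominator) pairs to compare.
pairs = [('A', 'C'), ('B', 'C')]

# Build all ratio columns at once instead of one assign call per pair.
result = wide.assign(**{f'{x}vs{y}': wide[x] / wide[y] for x, y in pairs})
print(result.reset_index())
```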

8 Comments

Your code is not simpler, it's less generic. You hardcoded the AvsC/BvsC. Imagine having A/B/C/D/E and needing all combinations, now you'll have to type 10 lines of assign. I updated my answer to show how to use only specific combinations.
Then we would use a loop inside the assign function.
I just don't understand the claim of being simpler, I'm using a pivot, then the core of my code is essentially {f'{x}vs{y}': tmp[x]/tmp[y]*100 for x,y in comb}. There are many ways of doing a loop, here I'm using a dictionary comprehension, but I really don't get how this is really different…
Why bother about the claim? Let people decide.
That seems like an example of a constructive suggestion or, at the very least, a call to justify the answer. Stating that something is "simpler" without backing up why that's the case is needless.