0

I have below datasets

host          infra    domain  user   entity   bios  ip
 
abcd.com      Lake      Hov    bat     sos     nvm   x.x.x.x
kpm.com       Lake      Hov    bat     sos     nvm   x.x.x.x
ffger.com     Data      JOV    sat     sim     nvm   x.x.x.x
ffger.com     Data      TOV    sat     sim     nvm   NAN
kko.com       Lake      POV    et      som     nvm   NAN
spm.com       Lake      Hov    bat     sos     nvm   NAN

I want to group by infra and with domain include all NAN

so one df would be if df.group_by(infra,domain) like below

abcd.com      Lake      Hov    bat     sos     nvm   x.x.x.x
kpm.com       Lake      Hov    bat     sos     nvm   x.x.x.x
spm.com       Lake      Hov    bat     sos     nvm   NAN
2
  • what's the expected outcome? do you want to create multiple dfs? Commented Jan 27, 2022 at 6:35
  • yes I want multiple dff Commented Jan 27, 2022 at 6:37

2 Answers 2

1

You can create dictionary of DataFrames:

dfs = {f'{"_".join(name)}':df for name, df in df.groupby(['infra','domain'])}

print (dfs['Lake_Hov'])
       host infra domain user entity bios       ip
0  abcd.com  Lake    Hov  bat    sos  nvm  x.x.x.x
1   kpm.com  Lake    Hov  bat    sos  nvm  x.x.x.x
5   spm.com  Lake    Hov  bat    sos  nvm      NAN

If need loops by groups:

for name, df in tuple(df.groupby(['infra','domain'], sort=False)):
    print (df)
           host infra domain user entity bios       ip
    0  abcd.com  Lake    Hov  bat    sos  nvm  x.x.x.x
    1   kpm.com  Lake    Hov  bat    sos  nvm  x.x.x.x
    5   spm.com  Lake    Hov  bat    sos  nvm      NAN
            host infra domain user entity bios       ip
    2  ffger.com  Data    JOV  sat    sim  nvm  x.x.x.x
            host infra domain user entity bios   ip
    3  ffger.com  Data    TOV  sat    sim  nvm  NAN
          host infra domain user entity bios   ip
    4  kko.com  Lake    POV   et    som  nvm  NAN
Sign up to request clarification or add additional context in comments.

2 Comments

@RajarshiDas - Use print (len(dfs))
@RajarshiDas - If need check keys print (list(dfs.keys()))
0

I think you can use the dropna parameter of the groupby function for the same.

pd.__version__
# '1.1.0.dev0+2004.g8d10bfb6f'

# Example from the docs
df

   a    b  c
0  1  2.0  3
1  1  NaN  4
2  2  1.0  3
3  1  2.0  2

# without NA (the default)
df.groupby('b').sum()

     a  c
b        
1.0  2  3
2.0  2  5
# with NA
df.groupby('b', dropna=False).sum()

     a  c
b        
1.0  2  3
2.0  2  5
NaN  1  4

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.