I am trying to create a pd.DataFrame from another one that contains data for multiple bases (e.g. A, C, G and U1 out of A, C, G, T, U1 and U2). When I extract exactly one base, it works fine (the g0 line in the code below), but the g1 line returns a dataframe containing only a single row instead of all matching rows.

#import seaborn as sns
import pandas as pd
import os

with open (os.path.join (os.environ ['HOME'], 'data.csv'), 'r') as f :
    df = pd.read_csv (f, index_col = 0, header = 0, thousands = None, decimal = '.')

g0 = df.loc [df ['base'] == 'A']
g1 = df.loc [df ['base'].keys () == df ['base'].keys ().any (['A', 'C', 'G', 'U1'])]

print ('df\n', df)
print ()
print ('g0\n', g0)
print ()
print ('g1\n', g1)

#sns.catplot (data = df, x = 'base', y = 'energy', hue = 'charge')

My output:

df
    environment base charge   energy  type
0          pbs    A    neg  0.34835     1
1          pbs    C    neg  0.40194     2
2          pbs    G    neg  0.34959     1
3          pbs    T    neg  0.40738     2
4          pbs   U1    neg  0.34904     2
5          pbs   U2    neg  0.40016     2
6          pbs    A    neu  0.40151     3
7          pbs    C    neu  0.34494     3
8          pbs    G    neu  0.40193     3
9          pbs    T    neu  0.34458     3
10         pbs   U1    neu  0.34646     3
11         pbs   U2    neu  0.40871     3
12         pbs    A    pos  0.34047     2
13         pbs    C    pos  0.40157     2
14         pbs    G    pos  0.34232     2
15         pbs    T    pos  0.40854     2
16         pbs   U1    pos  0.34611     2
17         pbs   U2    pos  0.34414     2

g0
    environment base charge   energy  type
0          pbs    A    neg  0.34835     1
6          pbs    A    neu  0.40151     3
12         pbs    A    pos  0.34047     2

g1
   environment base charge   energy  type
1         pbs    C    neg  0.40194     2

My desired output:

df
    environment base charge   energy  type
0          pbs    A    neg  0.34835     1
1          pbs    C    neg  0.40194     2
2          pbs    G    neg  0.34959     1
3          pbs    T    neg  0.40738     2
4          pbs   U1    neg  0.34904     2
5          pbs   U2    neg  0.40016     2
6          pbs    A    neu  0.40151     3
7          pbs    C    neu  0.34494     3
8          pbs    G    neu  0.40193     3
9          pbs    T    neu  0.34458     3
10         pbs   U1    neu  0.34646     3
11         pbs   U2    neu  0.40871     3
12         pbs    A    pos  0.34047     2
13         pbs    C    pos  0.40157     2
14         pbs    G    pos  0.34232     2
15         pbs    T    pos  0.40854     2
16         pbs   U1    pos  0.34611     2
17         pbs   U2    pos  0.34414     2

g0
    environment base charge   energy  type
0          pbs    A    neg  0.34835     1
6          pbs    A    neu  0.40151     3
12         pbs    A    pos  0.34047     2

g1
   environment base charge   energy  type
0          pbs    A    neg  0.34835     1
1          pbs    C    neg  0.40194     2
2          pbs    G    neg  0.34959     1
4          pbs   U1    neg  0.34904     2
6          pbs    A    neu  0.40151     3
7          pbs    C    neu  0.34494     3
8          pbs    G    neu  0.40193     3
10         pbs   U1    neu  0.34646     3
12         pbs    A    pos  0.34047     2
13         pbs    C    pos  0.40157     2
14         pbs    G    pos  0.34232     2
16         pbs   U1    pos  0.34611     2

Additional information:
The data is stored in a *.csv file. The dataframes (containing a) all, b) several, or c) some other set of the bases) will be plotted by several categories, e.g. charge and environment.
I'm planning to plot with seaborn.

~/data.csv:

,environment,base,charge,energy,type
0,pbs,A,neg,0.34835,1
1,pbs,C,neg,0.40194,2
2,pbs,G,neg,0.34959,1
3,pbs,T,neg,0.40738,2
4,pbs,U1,neg,0.34904,2
5,pbs,U2,neg,0.40016,2
6,pbs,A,neu,0.40151,3
7,pbs,C,neu,0.34494,3
8,pbs,G,neu,0.40193,3
9,pbs,T,neu,0.34458,3
10,pbs,U1,neu,0.34646,3
11,pbs,U2,neu,0.40871,3
12,pbs,A,pos,0.34047,2
13,pbs,C,pos,0.40157,2
14,pbs,G,pos,0.34232,2
15,pbs,T,pos,0.40854,2
16,pbs,U1,pos,0.34611,2
17,pbs,U2,pos,0.34414,2

Most of my attempts at creating g1 return either True, False, or an empty dataframe.

  • Welcome to StackOverflow. Good answers need good questions. Have a look at how to provide a great pandas example as well as how to provide a minimal, complete, and verifiable example. You can edit your questions to make it easier to get help. Commented May 22, 2020 at 11:51
  • Do you also need the python packages installed? If so, is the version relevant/crucial? Commented May 22, 2020 at 14:25

1 Answer

Thank you, the data helped. You can use isin as follows:

g1 = df.loc[df["base"].isin(["A", "C", "G", "U1"])]
print(g1)
   environment base charge  energy  type
1          pbs    A    neg   0.348     1
2          pbs    C    neg   0.402     2
3          pbs    G    neg   0.350     1
5          pbs   U1    neg   0.349     2
7          pbs    A    neu   0.402     3
8          pbs    C    neu   0.345     3
9          pbs    G    neu   0.402     3
11         pbs   U1    neu   0.346     3
13         pbs    A    pos   0.340     2
14         pbs    C    pos   0.402     2
15         pbs    G    pos   0.342     2
17         pbs   U1    pos   0.346     2
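
To see why the attempt in the question misses most rows, note that Series.keys() returns the index (the row labels 0, 1, 2, ...), not the values in the column, so that comparison never looks at the bases at all. The sketch below is a minimal, self-contained illustration using only the first six rows of the data from the question; the names csv_text, wanted, mask and rest are chosen here for illustration and are not part of the original code.

```python
import io

import pandas as pd

# First six rows of the data from the question, inlined so the example
# runs without ~/data.csv.
csv_text = """,environment,base,charge,energy,type
0,pbs,A,neg,0.34835,1
1,pbs,C,neg,0.40194,2
2,pbs,G,neg,0.34959,1
3,pbs,T,neg,0.40738,2
4,pbs,U1,neg,0.34904,2
5,pbs,U2,neg,0.40016,2
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# keys() gives the row labels, not the bases.
print(list(df["base"].keys()))

# isin builds one boolean per row: True where the base is in the list.
wanted = ["A", "C", "G", "U1"]
mask = df["base"].isin(wanted)

g1 = df.loc[mask]     # rows whose base is in `wanted`
rest = df.loc[~mask]  # the complement, here T and U2

print(g1)
print(rest)
```

Keeping the mask in its own variable makes it easy to reuse or invert it with ~ when several subsets of bases are needed for plotting.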