I am trying to create a pd.DataFrame from another one that contains data for multiple bases (e.g. A, C, G and U1 out of A, C, G, T, U1 and U2). When I extract exactly one base, it works fine (g0 in the code below), but the attempt to extract several bases (g1) returns a dataframe containing only a single row.
#import seaborn as sns
import pandas as pd
import os
with open (os.path.join (os.environ ['HOME'], 'data.csv'), 'r') as f :
    df = pd.read_csv (f, index_col = 0, header = 0, thousands = None, decimal = '.')
g0 = df.loc [df ['base'] == 'A']
g1 = df.loc [df ['base'].keys () == df ['base'].keys ().any (['A', 'C', 'G', 'U1'])]
print ('df\n', df)
print ()
print ('g0\n', g0)
print ()
print ('g1\n', g1)
#sns.catplot (data = df, x = 'base', y = 'energy', hue = 'charge')
My output:
df
environment base charge energy type
0 pbs A neg 0.34835 1
1 pbs C neg 0.40194 2
2 pbs G neg 0.34959 1
3 pbs T neg 0.40738 2
4 pbs U1 neg 0.34904 2
5 pbs U2 neg 0.40016 2
6 pbs A neu 0.40151 3
7 pbs C neu 0.34494 3
8 pbs G neu 0.40193 3
9 pbs T neu 0.34458 3
10 pbs U1 neu 0.34646 3
11 pbs U2 neu 0.40871 3
12 pbs A pos 0.34047 2
13 pbs C pos 0.40157 2
14 pbs G pos 0.34232 2
15 pbs T pos 0.40854 2
16 pbs U1 pos 0.34611 2
17 pbs U2 pos 0.34414 2
g0
environment base charge energy type
0 pbs A neg 0.34835 1
6 pbs A neu 0.40151 3
12 pbs A pos 0.34047 2
g1
environment base charge energy type
1 pbs C neg 0.40194 2
My desired output:
df
environment base charge energy type
0 pbs A neg 0.34835 1
1 pbs C neg 0.40194 2
2 pbs G neg 0.34959 1
3 pbs T neg 0.40738 2
4 pbs U1 neg 0.34904 2
5 pbs U2 neg 0.40016 2
6 pbs A neu 0.40151 3
7 pbs C neu 0.34494 3
8 pbs G neu 0.40193 3
9 pbs T neu 0.34458 3
10 pbs U1 neu 0.34646 3
11 pbs U2 neu 0.40871 3
12 pbs A pos 0.34047 2
13 pbs C pos 0.40157 2
14 pbs G pos 0.34232 2
15 pbs T pos 0.40854 2
16 pbs U1 pos 0.34611 2
17 pbs U2 pos 0.34414 2
g0
environment base charge energy type
0 pbs A neg 0.34835 1
6 pbs A neu 0.40151 3
12 pbs A pos 0.34047 2
g1
environment base charge energy type
0 pbs A neg 0.34835 1
1 pbs C neg 0.40194 2
2 pbs G neg 0.34959 1
4 pbs U1 neg 0.34904 2
6 pbs A neu 0.40151 3
7 pbs C neu 0.34494 3
8 pbs G neu 0.40193 3
10 pbs U1 neu 0.34646 3
12 pbs A pos 0.34047 2
13 pbs C pos 0.40157 2
14 pbs G pos 0.34232 2
16 pbs U1 pos 0.34611 2
Additional information:
The data is stored in a *.csv file. The dataframes (containing a) all, b) several, or c) some other subset of the bases) are going to be plotted by several categories, e.g. charge and environment.
I'm planning to plot with seaborn.
~/data.csv:
,environment,base,charge,energy,type
0,pbs,A,neg,0.34835,1
1,pbs,C,neg,0.40194,2
2,pbs,G,neg,0.34959,1
3,pbs,T,neg,0.40738,2
4,pbs,U1,neg,0.34904,2
5,pbs,U2,neg,0.40016,2
6,pbs,A,neu,0.40151,3
7,pbs,C,neu,0.34494,3
8,pbs,G,neu,0.40193,3
9,pbs,T,neu,0.34458,3
10,pbs,U1,neu,0.34646,3
11,pbs,U2,neu,0.40871,3
12,pbs,A,pos,0.34047,2
13,pbs,C,pos,0.40157,2
14,pbs,G,pos,0.34232,2
15,pbs,T,pos,0.40854,2
16,pbs,U1,pos,0.34611,2
17,pbs,U2,pos,0.34414,2
Most of my attempts at creating g1 return either True, False, or an empty dataframe.
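For reference, here is a minimal sketch of one way the selection for g1 might be expressed. It assumes a row-wise membership test is what is wanted and uses `Series.isin` for it. The CSV content is inlined through `io.StringIO` so the sketch runs without `~/data.csv`, and the names `CSV_TEXT`, `wanted_bases` and `mask` are hypothetical, introduced only for illustration.

```python
import io

import pandas as pd

# Same content as ~/data.csv, inlined so the sketch runs without the file.
CSV_TEXT = """,environment,base,charge,energy,type
0,pbs,A,neg,0.34835,1
1,pbs,C,neg,0.40194,2
2,pbs,G,neg,0.34959,1
3,pbs,T,neg,0.40738,2
4,pbs,U1,neg,0.34904,2
5,pbs,U2,neg,0.40016,2
6,pbs,A,neu,0.40151,3
7,pbs,C,neu,0.34494,3
8,pbs,G,neu,0.40193,3
9,pbs,T,neu,0.34458,3
10,pbs,U1,neu,0.34646,3
11,pbs,U2,neu,0.40871,3
12,pbs,A,pos,0.34047,2
13,pbs,C,pos,0.40157,2
14,pbs,G,pos,0.34232,2
15,pbs,T,pos,0.40854,2
16,pbs,U1,pos,0.34611,2
17,pbs,U2,pos,0.34414,2
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0, header=0)

# Hypothetical name: the bases that should end up in g1.
wanted_bases = ['A', 'C', 'G', 'U1']

# Series.isin yields one boolean per row: True where 'base' is in the list.
mask = df['base'].isin(wanted_bases)

# Boolean indexing with .loc keeps exactly the rows where the mask is True.
g1 = df.loc[mask]

print('g1\n', g1)
```

The difference from the g0 line is that `==` compares each row against one value, whereas `isin` compares each row against a collection of values. Either way the result is a boolean Series with the same length as df, which `.loc` can use to select rows.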