I have been using the pandas library and its crosstab function to build a frequency DataFrame to work with my data. In the following code I read in a CSV, create a DataFrame, and then build a crosstab, which is a frequency table. I then take a cross-section of the crosstab to pull out specific columns and the data beneath them.
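import pandas as pd
from pandas import DataFrame

def dataforgraphs():
    d = readcsv()  # readcsv() is defined elsewhere
    df = DataFrame(d)
    d0 = df[0]
    d1 = df[1]
    d2 = df[2]
    d3 = df[3]
    d4 = df[4]
    # Frequency table: dates as rows, (RigStat, Prov, Obj) as column levels
    cta = pd.crosstab(d0, [d2, d1, d3], rownames=['Date'], colnames=['RigStat', 'Prov', 'Obj'], margins=False)
    # Keep only the columns for province 'AB'
    ndfAB = cta.xs('AB', level='Prov', axis=1)
    # Pull out the BIT, GAS and OIL columns
    ABrigs = ndfAB.xs(['BIT', 'GAS', 'OIL'], axis=1)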
From here, my problem is that I cannot pull a cross-section for the hypothetical column that would hold all the blank values, i.e. the rows not labeled 'BIT', 'GAS' or 'OIL'. In an Excel pivot table I can do this by checking the (blank) box when selecting the columns to include. I want to do the same thing here and get a frequency count of all the blank values.
Any suggestions?
Currently I get the following output, which has only the three specified columns and their frequencies.
            OIL  GAS  BIT
Date
01-01-2007    1    6    3
01-02-2007    2    4    4
01-03-2007    1    6    3
01-04-2007    5    6    4
01-05-2007    1    7    3
01-06-2007    6    6    6
01-07-2007    1    8    3
01-08-2007    5    6    6
01-09-2007    1    6    3
01-10-2007    1    7    3
Instead, I would like to get the following, which adds a column counting all the blank values, i.e. those not listed as OIL, GAS or BIT (or as anything, for that matter).
            OIL  GAS  BIT  "blank"
Date
01-01-2007    1    6    3       10
01-02-2007    2    4    4       11
01-03-2007    1    6    3       12
01-04-2007    5    6    4       10
01-05-2007    1    7    3        1
01-06-2007    6    6    6        4
01-07-2007    1    8    3        5
01-08-2007    5    6    6        2
01-09-2007    1    6    3        5
01-10-2007    1    7    3        2
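To make the target concrete, here is a minimal sketch of one way such a "blank" column might be produced: count the missing Obj values per date separately and attach them to the ordinary crosstab. The small DataFrame and the names `counts`, `blank` and `result` are made up for illustration, and the sketch assumes the blanks arrive as missing values (None/NaN) rather than as empty strings.

```python
import pandas as pd

# Hypothetical sample data; None stands for a row with no Obj listed.
df = pd.DataFrame({
    "Date": ["01-01-2007", "01-01-2007", "01-01-2007",
             "01-02-2007", "01-03-2007", "01-03-2007"],
    "Obj": ["OIL", "GAS", None, "OIL", None, None],
})

# Ordinary crosstab: rows whose Obj is missing do not show up in any column.
counts = pd.crosstab(df["Date"], df["Obj"])

# Count the missing Obj values for each date on their own.
blank = df["Obj"].isna().groupby(df["Date"]).sum()

# Make sure dates that only have blank rows are kept, then add the column.
result = counts.reindex(blank.index, fill_value=0)
result["blank"] = blank.astype(int)

print(result)
```

The reindex step matters only when some date has no labeled Obj at all; without it that date would be absent from the crosstab and its blank count would be dropped.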
The data going into the pandas crosstab is structured like this:
Date        Obj  Operator  Type  Address
01-01-2007  OIL  ABC       HZ    112 W Ave
01-01-2007  GAS  ABC       HZ    112 W Ave
01-01-2007  GAS  ABV       HZ    113 W Ave
01-01-2007  BIT  NCH       HZ    114 W Ave
01-01-2007       CNR       HZ    115 W Ave
01-02-2007  OIL  CNRL      HZ    112 W Ave
01-02-2007  OIL  CNRL      HZ    112 W Ave
01-02-2007  OIL  CNRL      HZ    112 W Ave
01-03-2007       CNRL      HZ    112 W Ave
01-03-2007       CNRL      HZ    112 W Ave
From this, pandas crosstab creates a frequency table that captures the frequency of OIL, GAS and BIT by date, but I can't find a way to get the count of blank values. Notice that some rows don't have an Obj listed. Those are the values the crosstab does not capture, and they are the ones I would like to be able to query.
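For reference, a rough sketch of what an Excel-style (blank) category could look like on the sample rows above: give the empty Obj cells a placeholder label before calling crosstab, so they are counted like any other value. The "(blank)" label and the names `obj` and `freq` are assumptions for illustration, and only three of the five columns are reproduced.

```python
import pandas as pd

# The sample rows from above (Type and Address omitted); None marks a missing Obj.
rows = [
    ("01-01-2007", "OIL", "ABC"),
    ("01-01-2007", "GAS", "ABC"),
    ("01-01-2007", "GAS", "ABV"),
    ("01-01-2007", "BIT", "NCH"),
    ("01-01-2007", None, "CNR"),
    ("01-02-2007", "OIL", "CNRL"),
    ("01-02-2007", "OIL", "CNRL"),
    ("01-02-2007", "OIL", "CNRL"),
    ("01-03-2007", None, "CNRL"),
    ("01-03-2007", None, "CNRL"),
]
df = pd.DataFrame(rows, columns=["Date", "Obj", "Operator"])

# Normalize missing and whitespace-only values to "", then label them "(blank)".
obj = df["Obj"].fillna("").str.strip()
obj = obj.where(obj != "", "(blank)")

# With the placeholder in place, the blank rows get their own column.
freq = pd.crosstab(df["Date"], obj, rownames=["Date"], colnames=["Obj"])

print(freq)
```

Because "(blank)" is now an ordinary column label, it can be selected alongside 'BIT', 'GAS' and 'OIL' in the same way as the other columns.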
Any suggestions?