I am running Python 3.5.2 and pandas 0.19.1. I use read_fwf() to read in a large data file that was originally formatted in FORTRAN. Some of its rows look like this:
SiC4+ e- C2 c-SiC2 1.500e-07 -5.000e-01 0.000e+00 2.00e+00 0.00e+00 logn 8 10 280 3 746 1 1
SiC4+ e- C l-SiC3 1.500e-07 -5.000e-01 0.000e+00 2.00e+00 0.00e+00 logn 8 10 280 3 747 1 1
O e- O- 1.500e-15 0.000e+00 0.000e+00 2.00e+00 0.00e+00 logn 8 10 280 3 744 1 1
S e- S- 5.000e-15 0.000e+00 0.000e+00 2.00e+00 0.00e+00 logn 8 10 280 3 745 1 1
To read this in, I'm using this code:
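import pandas as pd
# species: dict mapping chemical names to index numbers (loaded from another file)
convert = lambda x: int(species[x]) if x != '' else None
reactions = pd.read_fwf('data.dat', sep=r'\s+', converters={0: convert, 1: convert, 2: convert, 3: convert})
reactions.fillna(0, inplace=True)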
The converters take the first 4 columns' chemical names and replace them with index numbers (from another file), and any missing data is replaced with index number zero. This works fine.
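The converter step can be sketched on a tiny in-memory sample. This is a minimal sketch under assumptions: the hypothetical SPECIES dict stands in for the question's species mapping (its entries are taken from the sample input and converted output shown here), and the explicit 10-character colspecs are made up to fit the sample, not the real file. The converter returns 0 for blank fields directly instead of None.

```python
import io
import pandas as pd

# Hypothetical excerpt of the species index table; the question loads the
# real mapping from another file. Entries match the sample input/output.
SPECIES = {"SiC4+": 116, "e-": 76, "C2": 7, "c-SiC2": 30, "O": 4, "O-": 74}

def species_index(field):
    """Convert a species name to its index; a blank field becomes 0."""
    field = field.strip()
    return int(SPECIES[field]) if field else 0

# Two made-up fixed-width rows: four 10-character name fields, then alpha.
# The second row has a blank Output2 field.
text = "\n".join([
    "SiC4+".ljust(10) + "e-".ljust(10) + "C2".ljust(10) + "c-SiC2".ljust(10) + "1.500e-07".rjust(10),
    "O".ljust(10) + "e-".ljust(10) + "O-".ljust(10) + "".ljust(10) + "1.500e-15".rjust(10),
])

reactions = pd.read_fwf(
    io.StringIO(text),
    colspecs=[(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)],  # assumed widths
    header=None,
    converters={col: species_index for col in range(4)},
)
print(reactions)
```

Returning 0 from the converter makes fillna(0) unnecessary for these four columns, although it may still be wanted for blank fields elsewhere in the file.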
The 6th and 15th columns, however, come out wrong. After reading, the DataFrame looks like this:
116 76 7 30 1.500000e-07 0.5 0.0 2.0 0.0 logn 8 10 280 3 46 1 1
116 76 1 41 1.500000e-07 0.5 0.0 2.0 0.0 logn 8 10 280 3 47 1 1
4 76 74 0 1.500000e-15 0.0 0.0 2.0 0.0 logn 8 10 280 3 44 1 1
5 76 75 0 5.000000e-15 0.0 0.0 2.0 0.0 logn 8 10 280 3 45 1 1
What is going on here? The 6th column loses its negative sign, and the 15th column loses its leading '7'. I can't find a reason why this is happening, and it doesn't make sense. Other columns in the file that have leading negative signs are left untouched.
Update
The answer below is not wrong, but to make it work for me I needed one very important change to the file header. The first 7 columns of my file look like this (with headers):
Input1 Input2 Output1 Output2 alpha beta gamma
NC3 CRP C2 CN 2.000e+03 0.000e+00 0.000e+00
C2N2 CRP CN CN 2.000e+03 0.000e+00 0.000e+00
NC7 CRP C6 CN 2.000e+03 -1.000e+00 0.000e+00
read_fwf() read in the headers and the spaces between them, and must have assumed that the beta column started two characters after the end of the alpha column (where the header text "beta" began), completely ignoring the negative sign on some of the values in beta.
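The boundary inference described above can be modeled in a few lines of plain Python. This is a simplified sketch, not pandas' own code: it assumes that with the default colspecs='infer', column edges come from positions that are non-blank in at least one of the rows read for inference, header line included. The header strings and values are made up to mirror the alpha/beta layout.

```python
import re

def infer_colspecs(lines):
    """Return (start, end) column boundaries: maximal runs of positions
    that are non-blank in at least one of the given lines."""
    width = max(len(line) for line in lines)
    filled = [False] * width
    for line in lines:
        for match in re.finditer(r"\S+", line):
            for i in range(match.start(), match.end()):
                filled[i] = True
    specs, start = [], None
    for i, is_filled in enumerate(filled + [False]):  # sentinel closes the last run
        if is_filled and start is None:
            start = i
        elif not is_filled and start is not None:
            specs.append((start, i))
            start = None
    return specs

def slice_fields(line, specs):
    """Cut a line at the given boundaries; characters in the gaps are discarded."""
    return [line[a:b].strip() for a, b in specs]

# Made-up rows mirroring the alpha/beta layout.
header_before = "alpha".rjust(10) + "  beta"  # 'beta' starts at position 12
header_after = "alpha".rjust(10) + " beta"    # 'beta' pulled one space left, to 11
positive = "2.000e+03".rjust(10) + "0.000e+00".rjust(11)
negative = "2.000e+03".rjust(10) + "-1.000e+00".rjust(11)  # '-' sits at position 11

# Assume only the header and a positive row are used for inference,
# while the negative row appears later in the file.
specs_before = infer_colspecs([header_before, positive])
specs_after = infer_colspecs([header_after, positive])

print(specs_before, slice_fields(negative, specs_before))  # [(1, 10), (12, 21)] ['2.000e+03', '1.000e+00']
print(specs_after, slice_fields(negative, specs_after))    # [(1, 10), (11, 21)] ['2.000e+03', '-1.000e+00']
```

Under the same assumption, the missing leading '7' in the 15th column would fit the same pattern: if the rows used for inference only held two-digit values there, a later three-digit value starts one position before the inferred column.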
I shifted the header position for every column where this could be a problem, and that fixed it:
Input1 Input2 Output1 Output2 alpha beta gamma
NC3 CRP C2 CN 2.000e+03 0.000e+00 0.000e+00
C2N2 CRP CN CN 2.000e+03 0.000e+00 0.000e+00
NC7 CRP C6 CN 2.000e+03 -1.000e+00 0.000e+00
Notice that the headers for beta and gamma are shifted one space to the left. This makes each column start early enough for read_fwf() to include the negative sign.
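Instead of adjusting the header by hand, another option is to give read_fwf() the column boundaries explicitly through colspecs (or widths), so that nothing is inferred from the header or the data. A minimal sketch on a made-up two-column sample; the boundaries are assumptions chosen to fit that sample, not the question's real file (for a FORTRAN file they would presumably follow its FORMAT statement).

```python
import io
import pandas as pd

# Made-up sample: the header 'beta' starts at position 12, but the
# negative value's minus sign sits at position 11.
text = "\n".join([
    "alpha".rjust(10) + "  beta",
    "2.000e+03".rjust(10) + "0.000e+00".rjust(11),
    "2.000e+03".rjust(10) + "-1.000e+00".rjust(11),
])

# Explicit (start, end) boundaries: each slice covers the full field width,
# so the minus sign is kept regardless of where the header text sits.
df = pd.read_fwf(io.StringIO(text), colspecs=[(0, 10), (10, 21)])
print(df)
```

Passing widths=[10, 11] would describe the same boundaries, since consecutive widths are turned into adjacent (start, end) pairs.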
sep=: you are passing a separator, but the point of read_fwf is that you have a column-organized file, not a separator-organized file. So I don't think you ever want to combine read_fwf with the sep= argument. If you want to use a separator, just use read_csv.
sep= would have been the problem. I had thought it benign since it was included in the docs for read_fwf().
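To see why this particular file needs fixed positions rather than a separator, here is a plain-Python sketch comparing whitespace splitting with fixed-position slicing on the O + e- row, whose Output2 field is blank. The column names come from the header in the update; the widths in COLSPECS are hypothetical.

```python
COLUMNS = ["Input1", "Input2", "Output1", "Output2", "alpha", "beta", "gamma"]
# Hypothetical widths: four 8-character name fields, then three 11-character numbers.
COLSPECS = [(0, 8), (8, 16), (16, 24), (24, 32), (32, 43), (43, 54), (54, 65)]

def by_whitespace(line):
    """Assign tokens to columns left to right, padding missing ones with None,
    roughly what a separator-based reader does with a short row."""
    tokens = line.split()
    tokens += [None] * (len(COLUMNS) - len(tokens))
    return dict(zip(COLUMNS, tokens))

def by_position(line):
    """Slice each column at fixed positions; an all-blank slice becomes None."""
    return {name: line[a:b].strip() or None for name, (a, b) in zip(COLUMNS, COLSPECS)}

row = ("O".ljust(8) + "e-".ljust(8) + "O-".ljust(8) + "".ljust(8)
       + "1.500e-15".rjust(11) + "0.000e+00".rjust(11) + "0.000e+00".rjust(11))

print(by_whitespace(row)["Output2"])  # 1.500e-15 (alpha shifted into Output2)
print(by_position(row)["Output2"])    # None (the blank field is preserved)
```

pd.read_csv(..., sep=r'\s+') splits on the same runs of whitespace, so it would shift this row's values in the same way, which is why the blank name fields push this data toward a fixed-width reader.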