I'm trying to extract data from a CSV file that contains some missing values:
Num,Sym,Element,Group,Weight,Density,Melting,Boiling,Heat,Eneg,Radius,Oxidation
1,H,Hydrogen,1,1.008,0.00008988,14.01,20.28,14.304,2.2,53,"[1,-1]"
2,He,Helium,18,4.002602,0.0001785,0.956,4.22,5.193,No_Data,31,[0]
etc
In this case the missing value is the electronegativity of helium, a noble gas. I want to parse this data all at once (i.e. when I read it in) and cast each field to the appropriate data type so I can perform calculations as needed, using this function:
import csv

def read_periodic_table():
    per_table = {}
    with open("element_list.csv", "r") as f:
        my_reader = csv.reader(f)
        next(my_reader)  # Just skipping the header
        try:
            while True:
                tl = next(my_reader)
                per_table[tl[1]] = (int(tl[0]), tl[2], int(tl[3]), float(tl[4]),
                                    float(tl[5]), float(tl[6]), float(tl[7]),
                                    float(tl[8]), float(tl[9]), float(tl[10]),
                                    list(tl[11]))
        except StopIteration:
            return per_table
This works fine, except where data is missing (as above), in which case float() raises an exception (a ValueError). I get why there is an error: you can't really cast "No_Data" to a floating-point number.
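As a quick check of the failure itself, the snippet below tries the conversion on its own. On CPython, float() applied to a non-numeric string raises ValueError (TypeError is what you get for an unsupported type such as None), so that is the exception a handler would need to catch. The variable name caught is only for illustration.

```python
# Check which exception float() raises for the placeholder string.
try:
    float("No_Data")
except ValueError as exc:
    caught = type(exc).__name__
    print(caught, exc)
```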
I've read these questions:
- Definitive way to parse alphanumeric CSVs in Python with scipy/numpy
- How do you deal with missing data using numpy/scipy?
which could probably answer my question, except I'd like to avoid using extra libraries for just one function.
The only way I can think of to handle this is with try/except blocks, and a lot of them. Something like this:
num = tl[0]
name = tl[2]
group = tl[3]
try:
    weight = float(tl[4])
except ValueError:
    weight = "No_Data"
finally:
    try:
        density = float(tl[5])
    except ValueError:
        density = "No_Data"
    finally:
        try:
            ...
Which, for what I hope are obvious reasons, I'd rather avoid. Is there a way to accomplish this using only the standard library? If the answer is "No, not very easily/well", then that's fine and I'll just use numpy/pandas. I'd just like to avoid that if possible. Alternatively, if there is a fantastic answer with numpy/pandas and a compelling reason why using an extra library wouldn't be bad, I'd take that too.
The reason I don't want to use a third-party library is that several people, including myself, will be working on this, and quite a few more people will be using it afterwards. I'd rather not make them all install another library to make this work.
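For concreteness, one possible standard-library-only shape for this is sketched below. It is a rough illustration rather than a definitive answer, and it rests on several assumptions: Python 3, None as the stand-in for a missing value, and a per-column list of casters. The names MISSING, CASTERS, and convert are hypothetical, and the function takes an open file object instead of a hard-coded path so it can be tried on an in-memory sample built from the two rows shown above. ast.literal_eval is used for the Oxidation column because list() applied to a string splits it into single characters.

```python
import ast
import csv
import io

MISSING = "No_Data"  # placeholder string used in the file


def convert(value, caster):
    """Cast one field, or return None when the field is the placeholder."""
    if value == MISSING:
        return None
    return caster(value)


# One caster per column, in header order (Num, Sym, Element, ..., Oxidation).
# ast.literal_eval turns the text "[1,-1]" into the list [1, -1].
CASTERS = [int, str, str, int] + [float] * 7 + [ast.literal_eval]


def read_periodic_table(f):
    """Return {symbol: tuple of converted fields} from an open CSV file."""
    reader = csv.reader(f)
    next(reader)  # skip the header row
    per_table = {}
    for row in reader:
        fields = [convert(v, c) for v, c in zip(row, CASTERS)]
        # Key by symbol (column 1); keep the remaining columns as the value.
        per_table[fields[1]] = tuple(fields[:1] + fields[2:])
    return per_table


# In-memory sample using the header and the two data rows from the question.
sample = io.StringIO(
    "Num,Sym,Element,Group,Weight,Density,Melting,Boiling,Heat,Eneg,Radius,Oxidation\n"
    '1,H,Hydrogen,1,1.008,0.00008988,14.01,20.28,14.304,2.2,53,"[1,-1]"\n'
    "2,He,Helium,18,4.002602,0.0001785,0.956,4.22,5.193,No_Data,31,[0]\n"
)
table = read_periodic_table(sample)
print(table["He"])
```

With this layout a single check in convert replaces the chain of try/except blocks, and any calculation that uses a field has to handle None explicitly. A real file would be passed in the same way, for example via open("element_list.csv", newline="").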