I have a dataset where some of the sample identifiers (in the index column) can be interpreted as numbers, for example 20010104123140E5 and 2001010412314529. I tried to specify that the index column has type string, but pandas.read_csv still converts the identifiers to floats. See the example below.
Does anyone know how I can get around this? Or am I doing something wrong here?
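import pandas as pd

# Write a small tab-separated test file whose ids look like numbers
with open('test.data', mode='w') as outfile:
    outfile.write('id\tval\n20010104123140E5\t1\n2001010412314529\t2')

# Request string ids explicitly; the index still comes back as floats for me
df = pd.read_csv('test.data', dtype={'id': 'str', 'val': 'float'}, sep='\t', index_col='id')
print(df)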
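
One workaround that might be worth trying is sketched here. It assumes that `dtype` is honored for ordinary columns even when it is not applied to the index column: read `id` as a regular column and call `set_index` afterwards. The `io.StringIO` buffer and the variable names `raw` and `df` are only for illustration and stand in for the file above.

```python
import io

import pandas as pd

# Same two rows as in the test file, kept in memory for illustration
raw = 'id\tval\n20010104123140E5\t1\n2001010412314529\t2'

# Both ids are valid float literals, which is presumably why they get converted
print(float('20010104123140E5'))

# Assumption: dtype is applied when 'id' is a regular column,
# so the index is set only after parsing
df = pd.read_csv(io.StringIO(raw), sep='\t', dtype={'id': str, 'val': float})
df = df.set_index('id')

print(df)
print(list(df.index))
```

If this behaves as assumed, the index keeps the identifiers as the original strings, including the one containing `E`.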