I am using pandas to read Excel and CSV files. These files contain both strings and numbers, not just numbers. The problem is that all my strings get converted to NaN, which I do not want. I do not know the column types ahead of time (working that out is the job of the system I am building), so I can't tell pandas what they will be; that must come later. For now I just want to read every cell in as a string.

Here is my code:

if csv:  # check whether to read in an Excel file or a CSV
  frame = pandas.read_csv(io.StringIO(data))
else:
  frame = pandas.read_excel(io.StringIO(data))
tbl = []
print frame.dtypes
for (i, col) in enumerate(frame):
  tmp = [col]
  for (j, value) in enumerate(frame[col]):
    tmp.append(unicode(value))
  tbl.append(tmp)

I just need to produce a column-wise 2D list; I can do everything else from there. I also need to handle Unicode (the data is already Unicode).

How do I construct 'tbl' so that cells that should be strings do not come out as 'NaN'?

  • Is the problem occurring with CSV files or Excel files? Add a sample file to the question so we can reproduce the problem. Commented Jul 11, 2014 at 18:09
  • Did you read the documentation for parsers.read_csv? Did you try experimenting with its arguments? Commented Jul 11, 2014 at 18:40
  • Yes, I did. That is how I found the function, and experimenting with it is how I found this issue. Commented Jul 11, 2014 at 19:06
  • To clarify, I can't use dtype because I do not know what the header names will be until I read in the file. Commented Jul 11, 2014 at 19:36
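
On the dtype point in the last comment: as far as I know, dtype does not have to be a per-column mapping. Passing a single type such as str applies it to every column, so no header names are needed up front. Below is a minimal sketch, assuming Python 3 and a reasonably recent pandas; the sample data and the helper name read_as_strings are made up for illustration. keep_default_na=False is there so that empty cells stay as '' rather than becoming NaN.

```python
import io

import pandas as pd

# Hypothetical sample; in the question, `data` comes from the uploaded file.
data = u"name,score,note\nalice,1,ok\nbob,2.50,\n"


def read_as_strings(text):
    """Read CSV text while keeping every cell as a string (illustrative helper)."""
    return pd.read_csv(
        io.StringIO(text),
        dtype=str,              # one dtype for all columns, no header names needed
        keep_default_na=False,  # keep empty cells as '' instead of NaN
    )


frame = read_as_strings(data)

# Column-wise 2D list: each inner list is [header, cell, cell, ...].
tbl = [[col] + frame[col].tolist() for col in frame.columns]
print(tbl)
```

Recent pandas versions also accept dtype in read_excel, but an Excel file is binary, so it would need io.BytesIO (or a path) rather than io.StringIO.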

1 Answer

In general, when you can't know the dtypes or column names of a CSV ahead of time, a CSV sniffer can be helpful.

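import csv
[...]
# Sniff the delimiter and quoting rules from the first 1024 characters, then rewind.
dialect = csv.Sniffer().sniff(f.read(1024))
f.seek(0)

# Hand the detected dialect to pandas.
frame = pandas.read_csv(io.StringIO(data), dialect=dialect)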

2 Comments

I have to be able to use Unicode, so I can't use Python's csv module (I am using Python 2.7). But close! I could certainly make use of a Unicode version of that.
Haven't tried this, but looks promising: stackoverflow.com/a/10275281/2907617