14

I want to import a csv file into a pandas dataframe. There is a column with IDs, which consist of only numbers, but not every row has an ID.

   ID      xyz
0  12345     4.56
1           45.60
2  54231   987.00

I want to read this column as a string, but even if I specify it with

df=pd.read_csv(filename,dtype={'ID': str})

I get

   ID         xyz
0  '12345.0'    4.56
1   NaN        45.60
2  '54231.0'  987.00

Is there an easy way to get the ID as a string without the decimal, like '12345', without having to edit the strings after importing the table?

4
  • Are empty values possible in the numeric columns? Commented Nov 13, 2018 at 12:34
  • If your concern is output format, then fix this when you export the data (e.g. to_csv, to_string), not by changing your underlying data (which looks fine) to awkward types. Commented Nov 13, 2018 at 13:18
  • I think you can upgrade your pandas version and everything will work fine. Commented Nov 13, 2018 at 13:18
  • I mean my underlying data is a csv file with an ID that is not meant to be treated as numeric but, as the name suggests, as an identification. String seems to be the best representation for that. Commented Nov 13, 2018 at 13:24

3 Answers

9

One solution is to convert the column after you have imported the DataFrame:

df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)

Or, since there are NaN values (which cannot be cast to int), use:

df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
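
To see the NaN-safe variant in isolation, here is a minimal sketch. The small DataFrame is a hypothetical stand-in for what read_csv returns when the ID column is parsed as float, and the helper name id_to_str is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking read_csv output when ID was parsed as float
df = pd.DataFrame({"ID": [12345.0, np.nan, 54231.0],
                   "xyz": [4.56, 45.60, 987.00]})

def id_to_str(value):
    """Return the ID as text without a decimal part; leave missing values untouched."""
    return value if pd.isnull(value) else str(int(value))

df["ID"] = df["ID"].apply(id_to_str)
print(df["ID"].tolist())
```

Because the values pass through int(), this only suits IDs that are whole numbers; an ID with leading zeros has already lost them by the time it has been parsed as a float.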

4 Comments

Doesn't work, because I have empty cells, and NaN values can't be converted to int.
That worked, thank you. I was trying something similar, but yours works way better.
This saved me after half an hour of looking through other answers that did not work. Thank you!
What should we do if we have a cell with the value "00212"? @joe
1

A possible solution, if there are no missing values in the numeric columns: add the parameter keep_default_na=False so that empty values are not converted to NaN. Note that this turns off NaN conversion for all of the data, not only for the ID column. See also the docs:

import pandas as pd
from io import StringIO  # pd.compat.StringIO is no longer available in newer pandas

temp = u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep=";", dtype={'ID': str}, keep_default_na=False)
print(df)
      ID     xyz
0  12345    4.56
1          45.60
2  54231  987.00

EDIT:

For me, your solution works perfectly in pandas 0.23.4, which suggests a bug in older pandas versions:

import pandas as pd
from io import StringIO  # pd.compat.StringIO is no longer available in newer pandas

temp = u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep=";", dtype={'ID': str})
print(df)
      ID     xyz
0  12345    4.56
1    NaN   45.60
2  54231  987.00
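
If the file itself already contains values such as 12345.0, dtype=str will faithfully keep the ".0". One way to clean this up while reading, rather than afterwards, is the converters parameter of read_csv. The following is only a sketch under that assumption: the sample text and the helper clean_id are hypothetical, not taken from the original file.

```python
from io import StringIO

import pandas as pd

# Hypothetical file content in which the IDs were saved with a ".0" suffix
raw = "ID;xyz\n12345.0;4.56\n;45.60\n54231.0;987.00\n"

def clean_id(text):
    """Drop a trailing '.0' from the raw cell text; other text is returned unchanged."""
    return text[:-2] if text.endswith(".0") else text

# converters receive the raw cell text for the named column during parsing
df = pd.read_csv(StringIO(raw), sep=";", converters={"ID": clean_id})
print(df)
```

A converter sees each cell as text before any numeric parsing happens, so an ID like 00212 would keep its leading zeros with this approach.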

7 Comments

It works for your example, but not for my csv file. The only difference from the previous result is that NaN became an empty string. I am really confused; I checked my file again, but there are definitely no floats in it.
@GeorgB - what is the expected output in the ID column instead of an empty string?
The empty cells don't matter, as long as I have an easy way to filter them out. I only need the non-empty IDs as strings without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. I just have the feeling there is a way to do it while reading in the file and not afterwards.
@GeorgB - df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x))) is your solution?
Thanks for this solution! Your EDIT: section with dtype={'ID': str} solved the issue for me! I was losing leading zeros that I wanted to keep, so I needed to read it with the correct schema. Great suggestion!
0

Specify float format when writing to csv

Since your underlying problem is output format when exporting data, no manipulation is required. Just use:

df.to_csv('file.csv', float_format='%.0f')

Since you want only specific columns to have this formatting you can use to_string:

def format_int(x):
    return f'{x:.0f}' if x==x else ''

with open('file.csv', 'w') as fout:
    fout.write(df.to_string(formatters={'ID': format_int}))
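
A quick sketch of why the per-column formatter matters: float_format in to_csv applies to every float column, so the xyz values get rounded as well. The DataFrame below is a hypothetical stand-in for the imported data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking the imported data (ID parsed as float)
df = pd.DataFrame({"ID": [12345.0, np.nan, 54231.0],
                   "xyz": [4.56, 45.60, 987.00]})

# With no path given, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(float_format="%.0f")
print(csv_text)
```

In the printed text the first data row reads 0,12345,5: the ID loses its ".0", but 4.56 is rounded to 5 too, which is usually not what you want for the other columns.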

Keep numeric data numeric

There is a column with IDs, which consist of only numbers

If your column only includes numbers, don't convert to strings! Your desire to convert to strings looks like an XY problem. Numeric identifiers should stay numeric.

Float NaN prompts upcasting

Your issue is that NaN values can't coexist with integers in a NumPy-backed integer series. Since NaN is a float, Pandas upcasts the whole column to float. This is natural, because the object dtype alternative is inefficient and not recommended.

If viable, you can use a sentinel value, e.g. -1 to indicate nulls:

df['ID'] = pd.to_numeric(df['ID'], errors='coerce').fillna(-1).astype(int)

print(df)

      ID     xyz
0  12345    4.56
1     -1   45.60
2  54231  987.00
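
As an alternative to a sentinel value, newer pandas versions offer a nullable integer dtype, "Int64" (capital I), which can hold missing values without upcasting to float. This is a sketch that assumes pandas 1.0 or later (needed for the "string" dtype used in the last step); the DataFrame is a hypothetical stand-in for the imported data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking the imported data (ID parsed as float)
df = pd.DataFrame({"ID": [12345.0, np.nan, 54231.0],
                   "xyz": [4.56, 45.60, 987.00]})

# Nullable integer dtype: NaN becomes <NA>, the other values become real integers
df["ID"] = df["ID"].astype("Int64")

# Optional: a text version of the IDs, with missing entries still marked as <NA>
ids_as_text = df["ID"].astype("string")
print(df)
print(ids_as_text.tolist())
```

This keeps the IDs numeric, as recommended above, while still allowing a clean conversion to text without a trailing ".0" and without having to strip out -1 before saving.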

5 Comments

"If your column only includes numbers, don't convert to strings!" - if the OP needs to convert numbers to strings, why not? What is wrong with that?
@jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."
OK, please add your comment about the XY problem under the question, but if a numeric column needs to be converted to strings, that is absolutely not wrong.
I need them as strings, or at least integers that can be converted to strings. I will try your method if I don't find another option, but I'll have to remove the -1 every time I save the file.
I downvoted because "don't convert to strings!" is a wrong statement.
