I have a csv file (data.csv) like this:

A,B,C,D,E
1.50,2.70,"2,481","1,569",2.15
2020-1-1,2020-1-2,2020-1-3,2020-1-4,2020-1-5
John, Jeff, Ruben, Cath, James

I tried df = pd.read_csv("data.csv", thousands=',') and got this df:

          A        B        C        D        E
0       1.5      2.7    2,481    1,569     2.15
1  2020-1-1 2020-1-2 2020-1-3 2020-1-4 2020-1-5
2      John     Jeff    Ruben     Cath    James

It looks OK, but all the numbers and dates in df are actually strings, whereas Excel reads and converts them correctly.

How can we read numbers, dates and strings from a csv file correctly?

  • A column usually holds only one type. Check whether your column A contains both dates and numbers, i.e. more than one type. Commented Feb 20, 2021 at 17:02
  • Yes, but the actual CSV file is laid out this way, and there are many such CSV files. Commented Feb 22, 2021 at 1:49

1 Answer

The preferred way to handle this is to read the file in normally, take the transpose, and then convert the data column by column, like this:

from pandas import read_csv

pth = "data.csv"  # path to the file from the question
DF = read_csv(pth).T
DF
       0         1       2
A   1.50  2020-1-1    John
B   2.70  2020-1-2    Jeff
C  2,481  2020-1-3   Ruben
D  1,569  2020-1-4    Cath
E   2.15  2020-1-5   James

DF[0] = DF[0].str.replace(",","").astype(float)
DF
         0         1       2
A     1.50  2020-1-1    John
B     2.70  2020-1-2    Jeff
C  2481.00  2020-1-3   Ruben
D  1569.00  2020-1-4    Cath
E     2.15  2020-1-5   James

Then you also have series (columns) with the correct type:

DF[0]
A       1.50
B       2.70
C    2481.00
D    1569.00
E       2.15
Name: 0, dtype: float64  #<<<<< float
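
The same column-wise idea can be extended to the date and name columns. The following is only a minimal sketch: it reads the question's sample data from an in-memory string instead of data.csv, the column names value, date and name are made up for illustration, and it assumes the dates are written in year-month-day order.

```python
import io

import pandas as pd

# Sample data copied from the question; io.StringIO stands in for data.csv.
csv_text = '''A,B,C,D,E
1.50,2.70,"2,481","1,569",2.15
2020-1-1,2020-1-2,2020-1-3,2020-1-4,2020-1-5
John, Jeff, Ruben, Cath, James
'''

# Read every cell as a string, then transpose so each kind of data is a column.
DF = pd.read_csv(io.StringIO(csv_text), dtype=str).T
DF.columns = ["value", "date", "name"]  # hypothetical names, not in the file

# Numbers: drop the thousands separator, then convert to float.
DF["value"] = DF["value"].str.replace(",", "", regex=False).astype(float)
# Dates: assumes year-month-day order, as in the sample data.
DF["date"] = pd.to_datetime(DF["date"])
# Strings: remove the blank that follows each comma in the names row.
DF["name"] = DF["name"].str.strip()

print(DF.dtypes)
```

After this, each column has its own dtype, so numeric and date operations work on DF["value"] and DF["date"] without further conversion.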

If you are really hell-bent on keeping the original shape, you could also do it like this:

df = read_csv(pth)
df.iloc[0,:] = df.iloc[0,:].str.replace(",", "").astype(float)
df
          A         B         C         D         E
0       1.5       2.7    2481.0    1569.0      2.15
1  2020-1-1  2020-1-2  2020-1-3  2020-1-4  2020-1-5
2      John      Jeff     Ruben      Cath     James

Then you could do this:

df.iloc[0,0] + df.iloc[0,2]
2482.5

But the row itself would still have dtype object rather than float, which may be a disadvantage at some point:

df.iloc[0,:]
A       1.50
B       2.70
C    2481.00
D    1569.00
E       2.15
Name: 0, dtype: object  #<<<<< object
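
If a float row is needed while keeping the original shape, one possible workaround is to pull the row out into its own Series and convert that, leaving df untouched. This is a sketch under assumptions: the sample data is read from an in-memory string rather than data.csv, and the name numbers is made up for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question; io.StringIO stands in for data.csv.
csv_text = '''A,B,C,D,E
1.50,2.70,"2,481","1,569",2.15
2020-1-1,2020-1-2,2020-1-3,2020-1-4,2020-1-5
John, Jeff, Ruben, Cath, James
'''

df = pd.read_csv(io.StringIO(csv_text))

# Extract row 0 as a standalone Series and convert it to float.
# df is not modified, so its columns still hold the original strings.
numbers = df.iloc[0].str.replace(",", "", regex=False).astype(float)

print(numbers.dtype)
print(numbers.sum())
```

The trade-off is that the converted values live outside df, so they have to be kept in sync by hand if df changes.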

1 Comment

Thanks! I think this is the only workaround until a future pandas version fixes this defect and works as well as Excel.
