convert pandas float series to int

Question

I am discretizing my series for a learner. I really need the series to be of integer type, and I really need to avoid for loops.

How do I convert this series from float to int?

Here is my function that is currently failing:

def discretize_series(s,count,normalized=True):
    def discretize(value,bucket_size):
        return value % bucket_size
    if normalized:
        maximum = 1.0
    else:
        minimum = np.min(s)
        s = s[:] - minimum
        maximum = np.max(s)
    bucket_size = maximum / float(count)
    s = int((s[:] - s[:] % bucket_size)/bucket_size)
    return s

Here is the line that causes the function to fail:

    s = int((s[:] - s[:] % bucket_size)/bucket_size)

The int() call raises a casting error: I am unable to cast the pandas series to an int series.

If I remove the int(), the function works, so I may just see if I can get it to work anyway.
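
The failure described above can be reproduced with a minimal sketch. The Series values here are made up for illustration, and the exact error text may differ between pandas versions:

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
s = pd.Series([0.1, 0.5, 0.9])

try:
    int(s)  # the built-in int() expects a scalar, not a whole Series
    failed = False
except TypeError as exc:
    failed = True
    print(exc)
```

With more than one element in the Series, the built-in int() has no single value to convert, which is why the call fails.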

In the first branch, minimum is greater than zero, so subtracting it from all values sets the minimum to zero. In the second branch, minimum is less than zero, so adding the abs(min) shifts the data up to zero...
– Chris, Commented Dec 7, 2015 at 23:42
And unless pandas does something fishy, you might not be needing those [:] there.
– Andras Deak -- Слава Україні, Commented Dec 7, 2015 at 23:44
What does "is currently failing" mean? Does it only fail in case normalized==True? You might have to set s=s/np.max(s) in that case. And you can still have trouble if np.max(s)<0. Is that possible?
– Andras Deak -- Слава Україні, Commented Dec 7, 2015 at 23:49
No, it fails because of the int() function in line 11. It cannot change the series from float to int.
– Chris, Commented Dec 8, 2015 at 0:03
So, are you sure it works right for the normalized==True case, if your input series has a maximum of, say, 10? For instance, for a count of 2, you'd have a bucket_size of 0.5. But then for the maximal value of s you'd have (10 - 10%0.5)/0.5==20, much more than 2. I would expect that you have to do the same shifting to 0, but you also have to divide by the maximum.
– Andras Deak -- Слава Україні, Commented Dec 8, 2015 at 0:22

Andras Deak -- Слава Україні · Accepted Answer · 2020-03-12 21:08:16Z

28

The regular Python int function only works for scalars. You should use a numpy function to round the data, either

s = np.round((s - s % bucket_size) / bucket_size) # to round to the nearest integer; or
s = np.fix((s - s % bucket_size) / bucket_size)   # to round towards 0

and if you actually want to convert to an integer type, use

s = s.astype(int)

to cast your array.
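
Putting the two steps together, one way the question's function could look is sketched below. This is only an illustration under stated assumptions: it reuses the name discretize_series from the question, drops the unused inner helper and the [:] slices, and runs on made-up sample values.

```python
import numpy as np
import pandas as pd


def discretize_series(s, count, normalized=True):
    """Map each float in s to an integer bucket index (a sketch)."""
    if normalized:
        maximum = 1.0
    else:
        s = s - s.min()      # shift the data so the minimum is zero
        maximum = s.max()
    bucket_size = maximum / float(count)
    # Round on the whole Series, then cast the dtype to integer.
    buckets = np.round((s - s % bucket_size) / bucket_size)
    return buckets.astype(int)


# Hypothetical sample data, assumed to lie in [0, 1).
s = pd.Series([0.0, 0.3, 0.5, 0.8, 0.99])
codes = discretize_series(s, count=4)
print(codes.tolist())  # [0, 1, 2, 3, 3]
```

Note that a value exactly equal to the maximum lands in bucket index count rather than count - 1 (1.0 would map to 4 here), so whether to clip it depends on what the learner expects.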

edited Mar 12, 2020 at 21:08

answered Dec 8, 2015 at 0:16

Andras Deak -- Слава Україні


Curious Watcher · Answer · 2022-01-14 16:42:47Z

2

N.B. Because pandas is built on top of numpy, this approach is, in theory, somewhat less efficient than calling numpy directly. Please consider numpy if efficiency is the priority.

That said, when a significant amount of the work is already done with pandas data frames, an additional conversion to numpy means writing extra code. So if one is performing an analysis in, say, a Jupyter notebook, it is reasonable to let the library do a bit of work under the hood.

Big thank you to @Chris for noticing this.

`pandas` version (theoretically less efficient than `numpy`)

Create a list with float values:

import pandas as pd

y = [0.1234, 0.6789, 0.5678]

Convert the list of float values to a `pandas` Series:

s = pd.Series(data=y)

Round the values to three decimal places:

print(s.round(3))

returns

0    0.123
1    0.679
2    0.568
dtype: float64

Convert to integer (this drops the fractional part):

print(s.astype(int))

returns

0    0
1    0
2    0
dtype: int64
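
The all-zero result above comes from astype(int) discarding the fractional part instead of rounding. A small sketch of the difference, using made-up values that are not from the answer:

```python
import pandas as pd

# Hypothetical sample data chosen to show both behaviours.
s = pd.Series([0.4, 1.6, -1.6])

truncated = s.astype(int)        # drops the fractional part (towards zero)
rounded = s.round().astype(int)  # rounds to the nearest whole number, then casts

print(truncated.tolist())  # [0, 1, -1]
print(rounded.tolist())    # [0, 2, -2]
```

Rounding before the cast is usually what is wanted when the floats are meant to represent whole numbers.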

Chain the creation and rounding steps:

pd.Series(data=y).round(3)
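
The same chaining style can carry the values all the way to an integer dtype. The sketch below assumes, purely for illustration, that the values should be scaled to tenths first; the factor 10 is a hypothetical choice and not part of the answer.

```python
import pandas as pd

y = [0.1234, 0.6789, 0.5678]

# Scale to tenths (assumed factor), round to the nearest whole number, then cast.
codes = pd.Series(data=y).mul(10).round().astype(int)
print(codes.tolist())  # [1, 7, 6]
```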

edited Jan 14, 2022 at 16:42

answered Jan 12, 2022 at 23:03

Curious Watcher


3 Comments

Chris Over a year ago

Not sure this answers the question. It may be the case that the pandas library is using numpy calls to fulfill this interface.

Curious Watcher Over a year ago

@Chris, thank you for the comment. A quick web search showed you are right: ... pandas is an open-source library built on top of numpy providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language... There were few requirements about numpy, but sure, we could delete it; please let me know when it is time.

Chris Over a year ago

No problem; it doesn't matter to me, actually. It is a well-written answer. I am only saying that the accepted answer will stay that way because it answers a more fundamental question.
