13

I am discretizing my series for a learner. I really need the series to be in float, and I really need to avoid for loops.

How do I convert this series from float to int?

Here is my function that is currently failing:

def discretize_series(s,count,normalized=True):
    def discretize(value,bucket_size):
        return value % bucket_size
    if normalized:
        maximum = 1.0
    else:
        minimum = np.min(s)
        s = s[:] - minimum
        maximum = np.max(s)
    bucket_size = maximum / float(count)

Here is the line that causes the function to fail:

    s = int((s[:] - s[:] % bucket_size)/bucket_size)

The int() induces a casting error: I am unable to cast the pandas series as an int series.

    return s

If I remove the int(), the function works, so I may just see if I can get it to work anyway.

  • In the first branch, minimum is greater than zero, so subtracting it from all values sets the minimum to zero. In the second branch, minimum is less than zero, so adding the abs(min) shifts the data up to zero... Commented Dec 7, 2015 at 23:42
  • And unless pandas does something fishy, you might not be needing those [:] there. Commented Dec 7, 2015 at 23:44
  • What does "is currently failing" mean? Does it only fail in case normalized==True? You might have to set s=s/np.max(s) in that case. And you can still have trouble if np.max(s)<0. Is that possible? Commented Dec 7, 2015 at 23:49
  • No, it fails because of the int() function in line 11. It cannot change the series from float to int. Commented Dec 8, 2015 at 0:03
  • So, are you sure it works right for the normalized==True case, if your input series has a maximum of, say, 10? For instance, for a count of 2, you'd have a bucket_size of 0.5. But then for the maximal value of s you'd have (10 - 10%0.5)/0.5==20, much more than 2. I would expect that you have to do the same shifting to 0, but you also have to divide by the maximum. Commented Dec 8, 2015 at 0:22

2 Answers

28

The regular Python int function only works on scalars. Instead, use a NumPy function to round the data, either

s = np.round((s - s % bucket_size) / bucket_size)  # round to the nearest integer; or
s = np.fix((s - s % bucket_size) / bucket_size)    # round towards 0

and if you actually want to convert to an integer type, use

s = s.astype(int)

to cast your array.
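Putting the two steps together, a minimal sketch of the function from the question might look like the following. It assumes the values are non-negative after shifting, and the final clip (which folds the maximum value into the last bucket) is my own addition, not something the question asked for.

```python
import numpy as np
import pandas as pd


def discretize_series(s, count, normalized=True):
    """Map each float in s to an integer bucket index, without loops."""
    if normalized:
        # Assumes the values already lie in [0, 1].
        maximum = 1.0
    else:
        # Shift the data so the smallest value sits at zero.
        s = s - s.min()
        maximum = s.max()
    bucket_size = maximum / float(count)
    # Element-wise rounding, then a dtype cast instead of the scalar int().
    buckets = np.round((s - s % bucket_size) / bucket_size).astype(int)
    # Assumption: the maximum value belongs to the last bucket.
    return buckets.clip(upper=count - 1)


s = pd.Series([0.05, 0.30, 0.55, 0.99])
print(discretize_series(s, 4).tolist())  # [0, 1, 2, 3]
```

The result is an integer Series with the same index as the input, so it can be assigned straight back into a data frame column.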

2

N.B. Because pandas is built on top of NumPy, this answer is in theory less efficient than calling NumPy directly. Consider NumPy if efficiency matters.

That said, a lot of work is done directly on pandas data frames, and converting to NumPy means writing extra code. When doing an analysis in, say, a Jupyter notebook, it is reasonable to let the library do a bit of work under the hood.

Big thank you to @Chris for noticing this.


pandas version (theoretically less efficient than numpy)

Create a list with float values:

import pandas as pd
y = [0.1234, 0.6789, 0.5678]

Convert the list of float values to a pandas Series:

s = pd.Series(data=y)

Round the values to three decimal places:

print(s.round(3))

returns

0    0.123
1    0.679
2    0.568
dtype: float64

Convert to integer (the cast truncates towards zero):

print(s.astype(int))

returns

0    0
1    0
2    0
dtype: int64

Pipe it all

pd.Series(data=y).round(3)
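Casting the raw values truncates everything below 1 to 0, as the astype(int) output above shows. For the bucketing in the question, one option that stays in pandas is to divide by the bucket size before casting. This is only a sketch: the bucket count of 4 is a made-up value, and the values are assumed to lie in [0, 1).

```python
import pandas as pd

y = [0.1234, 0.6789, 0.5678]
count = 4                    # hypothetical number of buckets
bucket_size = 1.0 / count    # assumes the values lie in [0, 1)

# Floor-divide by the bucket size, then cast the float result to int.
buckets = pd.Series(data=y).floordiv(bucket_size).astype(int)
print(buckets.tolist())  # [0, 2, 2]
```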

3 Comments

Not sure this answers the question. It may be the case that the pandas library is using numpy calls to fulfill this interface.
@Chris, thank you for the comment, doing a quick web search revealed you are right: ... pandas is an open-source library built on top of numpy providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language... There were little requirements about numpy, but sure, we could delete it, please let me know when it is time.
no problem; doesn't matter to me actually. It is a well written answer. Only saying that the accepted answer will stay that way because it answers a more fundamental question.
