I found an exercise on Kaggle that states the task as follows:

We'd like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand - we'd like to translate them into simple star ratings. A score of 95 or higher counts as 3 stars, a score of at least 85 but less than 95 is 2 stars. Any other score is 1 star.

Also, the Canadian Vintners Association bought a lot of ads on the site, so any wines from Canada should automatically get 3 stars, regardless of points.

Create a series star_ratings with the number of stars corresponding to each review in the dataset.

Data.head(): [image not included]

And I wrote this code with the hope to serve my purpose:

def stars(reviews):
    for i,pons in enumerate(reviews.points):
        if pons < 85:
            reviews.points[i] = "1 star"
        elif pons <95:
            reviews.points[i] = "2 stars"
        elif (pons >= 95):
            reviews.points[i] = "3 stars"
    for i,cons in reviews.country:
        if cons == "Canada":
            reviews.points[i] = "3 stars"

star_ratings = reviews.apply(stars, axis = "columns")

This code did not work for me; I keep getting the following error:

TypeError: 'int' object is not iterable

Why does my for loop keep giving me this error?

    It looks like reviews.points is an integer. You can't iterate over a single integer. Commented Dec 15, 2020 at 15:02

1 Answer

When you use df.apply(axis='columns'), the provided function will be applied to each row of the input DataFrame.

The reviews argument holds a pd.Series representing a single row. Therefore reviews.points is a single cell, not a column.
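
To see this concretely, here is a minimal sketch using a small made-up DataFrame (the two rows are hypothetical stand-ins, not taken from the Kaggle dataset, and the inspect helper is only for illustration). It suggests that the function receives one row at a time, so the points attribute is a single number, and that looping over a single integer raises the error from the question.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle wine reviews data
reviews = pd.DataFrame({"country": ["Italy", "Canada"], "points": [87, 82]})

seen_types = []

def inspect(review):
    # With axis="columns", `review` is a pd.Series holding one row
    seen_types.append(type(review).__name__)
    return review.points  # a single number, not a column

points_per_row = reviews.apply(inspect, axis="columns")

# Iterating over a single integer reproduces the error from the question
try:
    for _ in 87:
        pass
except TypeError as exc:
    error_message = str(exc)
```

Here seen_types only ever contains "Series", and error_message holds the same text as the TypeError in the question.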

Here's one way that you could rewrite the function:

def stars(review):
    # Canadian wines always get 3 stars, regardless of points
    if review.country == "Canada":
        return "3 stars"
    # Return the rating; apply collects the return values into a Series
    if review.points >= 95:
        return "3 stars"
    elif review.points >= 85:
        return "2 stars"
    return "1 star"

star_ratings = reviews.apply(stars, axis="columns")

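Another way, which will be much more performant, is to ditch the apply and use vectorized operations. Note that pd.cut closes bins on the right by default, so right=False is needed for scores of exactly 85 and 95 to land in the higher bin:

import numpy as np
import pandas as pd
star_ratings = reviews.copy()
# right=False gives the bins [-inf, 85), [85, 95), [95, inf)
star_ratings['points'] = pd.cut(star_ratings['points'], bins=[-np.inf, 85, 95, np.inf], labels=['1 star', '2 stars', '3 stars'], right=False)
star_ratings.loc[star_ratings['country'] == 'Canada', 'points'] = '3 stars'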
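
If you want a standalone Series, as the exercise asks, one more vectorized option is np.select. The following is a minimal sketch under stated assumptions: the sample rows are hypothetical and only there to exercise the boundary scores (84, 85, 94, 95) and the Canada rule.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows standing in for the Kaggle dataset
reviews = pd.DataFrame({
    "country": ["Italy", "Canada", "France", "US", "Spain"],
    "points": [84, 82, 85, 95, 94],
})

# Conditions are checked in order; the first one that matches wins
conditions = [
    (reviews["country"] == "Canada") | (reviews["points"] >= 95),
    reviews["points"] >= 85,
]
choices = ["3 stars", "2 stars"]

# Rows matching no condition fall back to the default
star_ratings = pd.Series(
    np.select(conditions, choices, default="1 star"),
    index=reviews.index,
    name="star_ratings",
)
```

Putting the Canada rule into the first condition avoids a separate overwrite step, and reusing reviews.index keeps the result aligned with the original rows.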