0

I'm trying to convert a string (such as 10.99) into a float inside a for loop, and I can't figure out how to do it with the data I've scraped from a website. I also need to use the result in a division with another float (in the same loop). Below is an example of what I'm trying to do:

import re

test_data = ['\n\t\t\t\t£10.00 per 100ML', '\xa0', '\n\t\t\t\t£0.40 per EACH', '\xa0', '\xa0', '\xa0', '\xa0', '\n\t\t\t\t£0.54 per EACH', '\n\t\t\t\t£1.33 per EACH']
price_data = [100, 10.99, 20.99, 25.25, 30, 35, 40, 54, 3]

for items in zip(test_data, price_data):
    characters = re.sub("\[p].*$|[^\d\.]", "", items[0])
    price_per_unit = characters[0:5]

    price = items[1]

    number_of_units = price / float(price_per_unit)

I then get the error:

    number_of_units = price / float(price_per_unit)
ValueError: could not convert string to float: 

What's the best way to turn price_per_unit into a float and calculate number_of_units?

Thanks for your help :)

EDIT: Working solution below for anyone else who's interested:

import re

test_data = ['\n\t\t\t\t£10.00 per 100ML', '\xa0', '\n\t\t\t\t£0.40 per EACH', '\xa0', '\xa0', '\xa0', '\xa0', '\n\t\t\t\t£0.54 per EACH', '\n\t\t\t\t£1.33 per EACH']
price_data = [100, 10.99, 20.99, 25.25, 30, 35, 40, 54, 3]

for items in zip(test_data, price_data):
    price = items[1]

    characters = re.sub("\[p].*$|[^\d\.]", "", items[0])
    price_per_unit = characters[0:5]
    if price_per_unit.replace('.', '', 1).isdigit():
        price_per_unit_formatted = float(price_per_unit)
        number_of_units = price / price_per_unit_formatted
    else:
        price_per_unit = None
        number_of_units = None
2
  • 2
    You need to cast both operands into floats to be compatible. Commented Dec 4, 2019 at 16:34
  • 1
    Your re.sub() call is not working well; it doesn't return any data. I don't understand what you were trying to achieve. Commented Dec 4, 2019 at 16:49

5 Answers

3

Your problem is not with the float() function. When the code processes test_data, each '\xa0' entry is reduced to an empty string '', and an empty string cannot be converted to a floating-point value.

Hope this helps.
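To see this concretely, here is a minimal sketch that runs the question's substitution on a single '\xa0' entry and then attempts the conversion. The variable names raw, cleaned and error_message are only illustrative, and the pattern is written as a raw string, which is equivalent to the original.

```python
import re

raw = '\xa0'  # a non-breaking space, as found in the scraped data

# Same pattern as in the question: everything that is not a digit or a dot is removed.
cleaned = re.sub(r"\[p].*$|[^\d\.]", "", raw)
print(repr(cleaned))  # ''

error_message = None
try:
    float(cleaned)
except ValueError as exc:
    # float('') raises ValueError, which is the error from the question.
    error_message = str(exc)
    print(error_message)
```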

1

As Bill said, your problem is that some of the price_per_unit values are ''. A simple workaround is to check that price_per_unit really is a number before converting it, for instance like this:

if price_per_unit.replace('.', '', 1).isdigit():
    number_of_units = price / float(price_per_unit)

This skips the '' values and leaves the calculation unchanged for everything else.
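An alternative sketch, assuming you would rather let float() itself decide what counts as a number: wrap the conversion in try/except. The helper name to_float_or_none is hypothetical and does not come from the question.

```python
def to_float_or_none(text):
    """Return float(text), or None if text cannot be converted."""
    try:
        return float(text)
    except ValueError:
        return None


print(to_float_or_none("10.00"))  # 10.0
print(to_float_or_none(""))       # None
```

Note that float() also accepts strings such as '1e3' or 'nan', so this is more permissive than the isdigit() check above.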

1 Comment

Super helpful, thank you @David S! I've taken your suggestion and added it to my code, and it's working perfectly.
1

The '\xa0' entries become empty strings, so you should handle that case:

import re

test_data = ['\n\t\t\t\t£10.00 per 100ML', '\xa0', '\n\t\t\t\t£0.40 per EACH', '\xa0', '\xa0', '\xa0', '\xa0', '\n\t\t\t\t£0.54 per EACH', '\n\t\t\t\t£1.33 per EACH']
price_data = [100, 10.99, 20.99, 25.25, 30, 35, 40, 54, 3]

for items in zip(test_data, price_data):
    # Strip everything that is not a digit or a dot.
    characters = re.sub(r"\[p].*$|[^\d\.]", "", items[0])

    price_per_unit = characters[:5]
    if price_per_unit == '':
        print('empty')
        continue  # skip this entry; 'break' would end the whole loop here
    price = items[1]

    number_of_units = price / float(price_per_unit)
    print(number_of_units)

1

If I understand what you're asking for:

import re

# test_data and price_data are the lists defined in the question.
number_of_units = []
for items in zip(test_data, price_data):
    if items[0] != '\xa0':
        characters = re.sub(r"\[p].*$|[^\d\.]", "", items[0])
        price_per_unit = characters[0:5]

        price = items[1]

        number_of_units.append(price / float(price_per_unit))
    else:
        # No unit price for this entry: count it as one unit.
        number_of_units.append(1)

number_of_units  # [10.0, 1, 52.474999999999994, 1, 1, 1, 1, 100.0, 2.255639097744361]

I treated each '\xa0' element as 1 indivisible unit.

I also used a list to store every value generated in the loop; with your code, only the last value would be kept.
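A variation on the loop above, sketched under the assumption that every unit price follows a '£' sign: re.search pulls the number out directly instead of stripping characters, and None (rather than 1) marks entries with no unit price. The name PRICE_PATTERN and the shortened sample lists are illustrative only.

```python
import re

# Shortened sample of the question's data, for illustration.
test_data = ['\n\t\t\t\t£10.00 per 100ML', '\xa0', '\n\t\t\t\t£0.40 per EACH']
price_data = [100, 10.99, 20.99]

# Assumed pattern: a pound sign followed by digits with an optional decimal part.
PRICE_PATTERN = re.compile(r"£(\d+(?:\.\d+)?)")

number_of_units = []
for text, price in zip(test_data, price_data):
    match = PRICE_PATTERN.search(text)
    if match:
        number_of_units.append(price / float(match.group(1)))
    else:
        number_of_units.append(None)  # no unit price found in this entry

print(number_of_units)
```

Capturing only the number after the '£' also avoids picking up digits from the unit text, such as the 100 in '100ML'.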

0

When re.sub runs on an entry of test_data such as '\xa0', it leaves an empty string in your price_per_unit variable. To avoid the error you can replace the empty string with '1'; replacing it with '0' would raise a division-by-zero error.

import re

test_data = ['\n\t\t\t\t£10.00 per 100ML', '\xa0', '\n\t\t\t\t£0.40 per EACH', '\xa0', '\xa0', '\xa0', '\xa0', '\n\t\t\t\t£0.54 per EACH', '\n\t\t\t\t£1.33 per EACH']
price_data = [100, 10.99, 20.99, 25.25, 30, 35, 40, 54, 3]

for items in zip(test_data, price_data):
    characters = re.sub(r"\[p].*$|[^\d\.]", "", items[0])
    price_per_unit = characters[0:5]

    price = items[1]

    # An empty string cannot be converted, so fall back to '1'.
    if price_per_unit == '':
        price_per_unit = '1'

    print('---------')
    number_of_units = price / float(price_per_unit)
    print(number_of_units)

To understand an error like this, it helps to print the variable involved at the point where the error occurs. That's how I found out why this was happening.
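When printing for this purpose, repr() tends to be more revealing than a plain print, because invisible characters and empty strings show up explicitly. A small sketch (the variable names are illustrative):

```python
value = '\xa0'
empty = ''

print(value)        # looks like a blank line
print(repr(value))  # '\xa0'
print(repr(empty))  # ''
```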
