
I have a table like this

   +------+------------+
   | fruit|fruit_number|
   +------+------------+
   | apple|          20|
   |orange|          33|
   |  pear|          27|
   | melon|          31|
   |  plum|           8|
   |banana|           4|
   +------+------------+

I want to add a percentage column for each row, but when I sum up that percentage column I do not get 100%. Here is the code I wrote in PySpark:

    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext, HiveContext, Row
    from pyspark.sql.types import StringType, IntegerType, StructType, StructField, LongType
    from pyspark.sql.functions import sum, mean, col

    sqlContext = HiveContext(sc)

    rdd = sc.parallelize([('apple', 20),
                          ('orange', 33),
                          ('pear', 27),
                          ('melon', 31),
                          ('plum', 8),
                          ('banana', 4)])
    schema = StructType([StructField('fruit', StringType(), True),
                         StructField('fruit_number', IntegerType(), True)])
    df = sqlContext.createDataFrame(rdd, schema)
    df.registerTempTable('fruit_df_sql')

    # total_num = 123
    df_percent = sqlContext.sql("""select fruit, round(fruit_number/123*100, 2) as cnt_percent
        from fruit_df_sql
        order by cnt_percent desc""")

    df_percent.agg(sum('cnt_percent')).show()

but I got a result like this

     +----------------+
     |sum(cnt_percent)|
     +----------------+
     |           99.99|
     +----------------+

not 100%. How can I handle this precision error? Thank you.
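As a sanity check (not part of the original post), the same 99.99 can be reproduced with plain Python using the question's numbers, which shows the gap comes from rounding each row before summing, not from Spark itself:

```python
counts = [20, 33, 27, 31, 8, 4]
total = sum(counts)  # 123

# Round each row's share to 2 decimals, then sum, exactly as the query does.
rounded = [round(c / total * 100, 2) for c in counts]
print(round(sum(rounded), 2))  # 99.99
```

Each row's share is truncated at two decimal places, so the discarded fractions (up to half a hundredth per row) accumulate into the 0.01 shortfall.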

1 Answer


Change round's second parameter to 1 and the precision error will disappear. Unfortunately, 123 is not an easy number to divide by, and by increasing the precision you will increase your error.
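If the percentages must add up to exactly 100 regardless of the divisor, one standard technique (a plain-Python sketch, not from the original answer) is the largest-remainder method: floor every share in hundredths of a percent, then hand the leftover hundredths to the rows with the largest remainders:

```python
from decimal import Decimal

counts = {"apple": 20, "orange": 33, "pear": 27,
          "melon": 31, "plum": 8, "banana": 4}
total = sum(counts.values())  # 123

# Exact shares in hundredths of a percent (so 100% == 10000 units).
shares = {k: Decimal(v) * 10000 / Decimal(total) for k, v in counts.items()}

# Floor every share, then count how many hundredths are left over.
floored = {k: int(s) for k, s in shares.items()}
leftover = 10000 - sum(floored.values())

# Give one extra hundredth to the rows with the largest fractional remainders.
for k in sorted(shares, key=lambda k: shares[k] - floored[k],
                reverse=True)[:leftover]:
    floored[k] += 1

percents = {k: Decimal(v) / 100 for k, v in floored.items()}
print(sum(percents.values()))  # sums to exactly 100.00
```

Each row still differs from its exact share by less than 0.01, but the rounding errors are distributed so they cancel instead of accumulating.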
