
I had this line of code in Python:

d = float(round(100.00 - (null_count / total) * 100, 2))

I wanted to convert it into PySpark code so I wrote this:

d = round((100.00-(null_count/total)*100).cast("float"), 2)

but this gives the error

'float' object has no attribute 'cast'

  • What are null_count and total in the PySpark code? Are they column names? cast can change the data type of a column, not a variable; a variable will still use Python methods. Commented Sep 22, 2022 at 12:40

1 Answer


In programming, you must know your data types (classes).

You wanted to use this cast method:

Column.cast(dataType: Union[pyspark.sql.types.DataType, str]) → pyspark.sql.column.Column

Reading that signature piece by piece:

A.cast(B) → C

A: The class the method belongs to. It's the pyspark.sql.column.Column class (a.k.a. pyspark.sql.Column).
B: The input to the method. According to the documentation line above, it can be either a pyspark.sql.types.DataType or a str.
C: The output class. According to the documentation line above, it's pyspark.sql.column.Column.
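The same A.cast(B) → C reading works for any method signature. As a plain-Python analogy (no Spark involved), here is the same kind of type mismatch with str.upper:

```python
s = "hello"
print(type(s.upper()).__name__)   # str: the signature is str.upper() -> str

x = 3.14
print(hasattr(x, "upper"))        # False: the float class has no upper method
# x.upper() would raise: AttributeError: 'float' object has no attribute 'upper'
```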

In your case, your actual A has the wrong data type to be chained with cast.
In other words, the class of A doesn't have a cast method.
Since your A = number1 - number2 / number3 * number4 is a plain Python float object, the error tells you precisely that: "'float' object has no attribute 'cast'".
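You can verify this without Spark at all. Assuming null_count and total are plain Python numbers (sample values assumed here), the arithmetic produces a float, which indeed has no cast attribute:

```python
null_count, total = 2, 50   # assumed sample values

a = 100.00 - (null_count / total) * 100
print(type(a).__name__)      # float
print(hasattr(a, "cast"))    # False, hence the AttributeError
```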


Regarding the translation of your Python code to PySpark: a literal translation doesn't really make sense, because you are doing the calculation on plain variables (only two of them). pyspark.sql.Column objects are called columns because they contain many different values. So you must create a dataframe (columns alone are not enough for actual calculations) and put some values into its columns for a PySpark translation of the formula to make sense.

I'll just show you how it could work if you had just one row.

Creating Spark session (not needed if you run the code in PySpark shell):

from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()

Creating and printing the dataframe:

df = spark.createDataFrame([(2, 50)], ['null_count', 'total'])
df.show()
# +----------+-----+
# |null_count|total|
# +----------+-----+
# |         2|   50|
# +----------+-----+

Adding a column using your logic, but working with Spark columns instead of Python variables.

df = df.withColumn('d', F.round(100 - F.col('null_count') / F.col('total') * 100, 2).cast('float'))
df.show()
# +----------+-----+----+
# |null_count|total|   d|
# +----------+-----+----+
# |         2|   50|96.0|
# +----------+-----+----+

Python's round was also replaced with PySpark's F.round, because the argument to the function is now a Spark column expression (i.e. a column) rather than a single value or variable.


4 Comments

Excellent explanation, man. Please recommend where I can learn the basics of PySpark.
Thank you. I've reviewed what they have at Tutorialspoint, so I cannot recommend that. It seems they don't use dataframes, and overall everything there feels too complicated and outdated. I suggest looking for a course or book that is as up-to-date as possible. Try not to touch RDDs until you get a feel for dataframes. Dataframes are more user-friendly, and I think these days they are used more frequently. If you can get a recently published book, that would be great, as books tend to be more thought-through than online courses. Summary: dataframes, an up-to-date resource, maybe a fresh book.
Actually, there is one problem with the snippet you gave me: I don't want to add a column, I just need to print the output of d, and null_count and total are variables, not columns.
Then don't use Spark. Spark is not a tool for simple calculations on just a few variables. Plain Python works well for your use case.
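To make that concrete: with plain Python variables (sample values assumed), the original one-liner from the question already gives the result with no Spark needed:

```python
null_count, total = 2, 50   # assumed sample values, matching the dataframe above

d = float(round(100.00 - (null_count / total) * 100, 2))
print(d)   # 96.0
```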
