
I have a dataframe as follows:

|Property ID|Location|Price|Bedrooms|Bathrooms|Size|Price SQ Ft|Status|

When I filter it by Bedrooms or Bathrooms, it gives the correct result:

df = spark.read.csv('/FileStore/tables/realestate.txt', header=True, inferSchema=True, sep='|')
df.filter(df.Bedrooms==2).show()

But when I filter it by Property ID as df.filter(df.Property ID==1532201).show(), I get an error. Is it because there is a space between Property and ID?
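For reference, a minimal way to reproduce this without the original file is to build a tiny DataFrame with the same column names; the sample rows below are made up, not the real data:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Made-up sample rows standing in for realestate.txt
rows = [
    (1532201, "Downtown", 350000, 2, 2, 1100, 318.18, "Active"),
    (1499102, "Suburb",   275000, 3, 2, 1600, 171.88, "Sold"),
]
cols = ["Property ID", "Location", "Price", "Bedrooms",
        "Bathrooms", "Size", "Price SQ Ft", "Status"]
df = spark.createDataFrame(rows, cols)

df.filter(df.Bedrooms == 2).show()   # works: no space in the column name

# df.filter(df.Property ID == 1532201)  # fails: not even valid Python syntax
print(df.columns)                     # confirms the column name contains a space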

2 Answers


Yes, the space between Property and ID is the cause of the issue. Another approach you can follow is to reference the column with F.col:

from pyspark.sql import functions as F
df.filter(F.col('Property ID')==1532201).show()

1 Comment

OK, thanks. I was able to do it using df.filter(col("Property ID")==1499102).show().

You can also use the square bracket notation to select the column:

df.filter(df['Property ID'] == 1532201).show()

Or use a raw SQL expression string to filter (note the backticks around the column name):

df.filter('`Property ID` = 1532201').show()
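
For what it's worth, all three working variants select the same rows. Here is a minimal sketch against a made-up two-row DataFrame (the sample values are assumptions, not the real data):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Made-up sample with only the columns needed for the comparison
df = spark.createDataFrame(
    [(1532201, 2), (1499102, 3)],
    ["Property ID", "Bedrooms"],
)

# All three filters are equivalent for a column name containing a space
df.filter(F.col("Property ID") == 1532201).show()
df.filter(df["Property ID"] == 1532201).show()
df.filter("`Property ID` = 1532201").show()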
