
For a stream read from FileStore, I'm trying to check whether the value in the first column of the first row is equal to some string. Unfortunately, whenever I access this column in any way, e.g. by calling .toList() on it, it throws:

    if df["Name"].iloc[0].item() == "Bob":
TypeError: 'Column' object is not callable

I'm calling the customProcessing function from:

    df.writeStream\
      .format("delta")\
      .foreachBatch(customProcessing)\
[...]

Inside this function I'm trying to get the value, but none of the ways of getting the data work; the same error is thrown.

    def customProcessing(df, epochId):
      
      if df["Name"].iloc[0].item() == "Bob":
[...]

Is there a way to read single columns? Or is this writeStream-specific, meaning I'm unable to use conditions on that input?

  • @mck thanks! That works. It was odd that the error message does not suggest that iloc is unsupported here. If you write an answer, I'll mark it as the solution. Thanks! Commented Dec 17, 2020 at 11:21

1 Answer


There is no iloc for Spark DataFrames - this is not pandas; there is also no concept of an index.

If you want to get the first item, you could try:

    df.select('Name').limit(1).collect()[0][0] == "Bob"
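
Putting that together with the foreachBatch handler from the question, a minimal sketch could look like the following (the function and column names follow the question; the empty-batch guard is my own assumption, since a micro-batch may contain no rows):

    from pyspark.sql import DataFrame

    def customProcessing(batch_df: DataFrame, epoch_id: int) -> None:
        # collect() on a one-row projection returns a list of Row objects;
        # a micro-batch can be empty, so guard before indexing (assumption).
        rows = batch_df.select("Name").limit(1).collect()
        if rows and rows[0][0] == "Bob":
            # handle the "Bob" case here
            pass

Note that collect() brings the selected rows to the driver, which is fine here because the projection is limited to a single row.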


