
I am using Spark with Java (not Scala or Python).

I have to change my code so that my Spark query selects all columns rather than a specific set of columns (like using select *). Before, when I selected a specific set of columns, it was easy to know the exact position/index of each column because it followed the order of my select. Now that I am selecting everything, I no longer know the order.

I need the position/index of particular columns so that I can use .isNullAt(), which takes a position/index rather than the string column name.

Does dataframe.columns() give me an array whose indices match the positions used by the DataFrame/Row methods that require an index? If so, could I search that array for my string column name to get back the correct index?


1 Answer


From your question I'm guessing you're trying to get the index of a field in a row so you can check nullity.

Indeed, you could use ds.columns(): it returns the column names in schema order, so you can take the index from there.
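
For illustration, a minimal sketch of that approach (assuming a Dataset<Row> named df and a column named "my_column_name", both placeholders):

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// columns() returns the column names in schema order, so the
// array position matches the index that Row methods expect.
int idx = Arrays.asList(df.columns()).indexOf("my_column_name");

Row first = df.first();
boolean isNull = first.isNullAt(idx);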

Nevertheless, I would advise another method, as it keeps the logic inside row processing and is more robust: use .fieldIndex(String fieldName)

row.isNullAt(row.fieldIndex("my_column_name"))

For more, see https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Row.html#fieldIndex(java.lang.String)
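
For example, inside a row-level operation (a sketch; df and the column name are assumptions, not from the original post):

import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Keep only rows where the column is non-null. fieldIndex is
// resolved from each row's own schema, so this keeps working
// even if the column order of the select changes.
Dataset<Row> nonNull = df.filter(
    (FilterFunction<Row>) row -> !row.isNullAt(row.fieldIndex("my_column_name"))
);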
