How can I use a variable inside a lambda function?

for a_name in name_field_names:
    results = sqlContext.sql("SELECT * FROM noise_data")
    stringsDS = results.map(lambda p:p.(a_name))

The lambda function expects a literal column name after the dot, whereas I only have the name in a variable.

How do I pass the value of the a_name variable to the lambda function?

1 Answer

To get a field from a Row by name, use bracket notation:

from pyspark.sql import Row

row = Row(a = "foo", b = "bar")
row["a"]
'foo'

or getattr:

getattr(row, "b")
'bar'
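Applied to the question, the same lookup works inside the lambda. Below is a minimal sketch that uses collections.namedtuple as a stand-in for pyspark.sql.Row so it runs without Spark; the Person type and the sample data are hypothetical, not taken from the question:

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row (assumption: used only so this runs without Spark).
Person = namedtuple("Person", ["first_name", "last_name"])
rows = [Person("Ada", "Lovelace"), Person("Alan", "Turing")]

a_name = "last_name"  # the column name is held in a variable

# getattr looks up the attribute whose name is the string stored in a_name,
# which is what the invalid syntax p.(a_name) was trying to express.
values = list(map(lambda p: getattr(p, a_name), rows))
print(values)  # ['Lovelace', 'Turing']
```

With a real Row, p[a_name] should behave the same way as getattr(p, a_name), as the bracket example above shows.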

You can also skip map and use select:

sqlContext.sql("SELECT * FROM noise_data").select(a_name)

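Also remember that Python closures are late-binding: a variable referenced inside a lambda is looked up when the lambda is called, not when it is defined. Relying on a loop variable from the enclosing scope inside a function created in that loop is therefore not a good idea. If you want map, you should rather capture the current value of a_name, for example with attrgetter: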

from operator import attrgetter

for a_name in name_field_names:
    results = ...
    results.rdd.map(attrgetter(a_name))
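
The late-binding behaviour can be seen without Spark. The following is a minimal sketch in plain Python; Record, rec, and names are hypothetical stand-ins for a Row and its field names. It compares a lambda that closes over the loop variable with two ways of capturing the value at definition time:

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for a Row with two fields.
Record = namedtuple("Record", ["a", "b"])
rec = Record(a="foo", b="bar")
names = ["a", "b"]

late, bound, getters = [], [], []
for name in names:
    # Late binding: `name` is looked up when the lambda is called.
    late.append(lambda r: getattr(r, name))
    # A default argument is evaluated now, so the current value is captured.
    bound.append(lambda r, name=name: getattr(r, name))
    # attrgetter stores the attribute name inside the returned callable.
    getters.append(attrgetter(name))

late_results = [f(rec) for f in late]        # both lambdas see name == "b"
bound_results = [f(rec) for f in bound]
getter_results = [g(rec) for g in getters]

print(late_results)    # ['bar', 'bar']
print(bound_results)   # ['foo', 'bar']
print(getter_results)  # ['foo', 'bar']
```

The problem only shows up when the functions are called after the loop variable has changed. Spark transformations such as map are lazy, so this can happen if the mapped RDDs are collected after the loop finishes.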