
I am working with Databricks DataFrames (PySpark).

I have a DataFrame that contains an array of string values.

I need to combine each value from the DataFrame with a value from a Python list that I have.

What I want is to put the DataFrame values into a Python list, like this:

listArray = []

listArray.append(dataframeArrayValue)

print(listArray)
Output:
     [value1, value2, value3]

The problem is that this kind of works, but for some reason I cannot work with the string values that get added to the new list (listArray).

My idea is that I am going to build a URL, where I need to use SQL to get the beginning of that URL. That first part is what I put in the DataFrame array. The last part of the URL is stored in a Python list.

I want to loop through both lists and put the results in a new, empty list.

Something like this:

display(dfList)
Output:
      [dfValue1, dfValue2, dfValue3]

print(pyList)
      [pyValue1, pyValue2, pyValue3]

I want to put them together like this:

dfValue1 + pyValue1, etc.

And getting a list like this:

newArrayContainingBoth = []

# loop with append

result:

print(newArrayContainingBoth)
Output:
[dfValue1+pyValue1, dfValue2+pyValue2, dfValue3+pyValue3]
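A minimal sketch of that loop, assuming both are already plain Python lists of equal length (all values here are illustrative):

dfList = ['dfValue1', 'dfValue2', 'dfValue3']
pyList = ['pyValue1', 'pyValue2', 'pyValue3']

newArrayContainingBoth = []
# zip() pairs the elements of both lists by position
for first, last in zip(dfList, pyList):
    newArrayContainingBoth.append(first + last)

print(newArrayContainingBoth)
# ['dfValue1pyValue1', 'dfValue2pyValue2', 'dfValue3pyValue3']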

I hope my question was clear enough.

4 Comments
  • How did you loop? Commented Oct 26, 2018 at 14:09
  • Did you try? newArrayContainingBoth = dfList + pyList Commented Oct 26, 2018 at 15:43
  • I have not made a loop yet. A problem is that the df looks something like this: [value1, value2], but when I try to get the first element with dfList[0], I get [value1, value2]. I don't know why, because theoretically it is supposed to give me value1 only. Sorry for the bad English. Commented Oct 28, 2018 at 21:40
  • On a note, are you sure df = [value1, value2]? Can you show some sample values of df? Also, if you do python_list = df.collect(), all you have in python_list is a list. Commented Oct 28, 2018 at 21:50

1 Answer


Try these steps:

  • Use explode() to get strings out of that array. Then,
  • collect() the result as a list,
  • extract the string part from each Row,
  • split() by a comma (","),
  • and finally, use it.

First, import explode()

from pyspark.sql.functions import explode 

Assuming your data is in a DataFrame "df"

columns = ['nameOffjdbc', 'some_column']
rows = [
        (['/file/path.something1'], 'value1'),
        (['/file/path.something2'], 'value2')
        ]

df = spark.createDataFrame(rows, columns)
df.show(2, False)
+-----------------------+-----------+
|nameOffjdbc            |some_column|
+-----------------------+-----------+
|[/file/path.something1]|value1     |
|[/file/path.something2]|value2     |
+-----------------------+-----------+

Select the column nameOffjdbc from DataFrame 'df'

dfArray = df.select('nameOffjdbc')
print(dfArray)
DataFrame[nameOffjdbc: array<string>]

Explode the column nameOffjdbc

dfArray = dfArray.withColumn('nameOffjdbc', explode('nameOffjdbc'))
dfArray.show(2, False)
+---------------------+
|nameOffjdbc          |
+---------------------+
|/file/path.something1| 
|/file/path.something2|
+---------------------+

Now collect it into newDfArray (this is the Python list you need).

newDfArray = dfArray.collect()
print(newDfArray)
[Row(nameOffjdbc=u'/file/path.something1'), 
     Row(nameOffjdbc=u'/file/path.something2')]

Since it is (and will be) in the format [Row(column=u'value')], we need to get the value (string) part of it. Hence:

pyList = ",".join(str('{0}'.format(value.nameOffjdbc)) for value in newDfArray)
print(pyList, type(pyList))
('/file/path.something1,/file/path.something2', <type 'str'>)
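As a side note, this join-then-split round trip can be skipped entirely: a list comprehension over the collected Rows yields the final Python list directly. A sketch, using the same newDfArray as above (pyListAlt is an illustrative name):

# Pull the string field straight out of each Row.
pyListAlt = [str(row.nameOffjdbc) for row in newDfArray]
print(pyListAlt)
# ['/file/path.something1', '/file/path.something2']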

Split the value by a comma ",", which will create a list out of a string.

pyList = pyList.split(',')
print(pyList, type(pyList))
(['/file/path.something1', '/file/path.something2'], <type 'list'>)

Use it

print(pyList[0])
/file/path.something1

print(pyList[1])
/file/path.something2

If you want to loop

for items in pyList:
    print(items)
/file/path.something1
/file/path.something2

In a nutshell, the following code is all you need.

from pyspark.sql.functions import explode

columns = ['nameOffjdbc', 'some_column']
rows = [
    (['/file/path.something1'], 'value1'),
    (['/file/path.something2'], 'value2')
    ]
df = spark.createDataFrame(rows, columns)

dfArray = df.select('nameOffjdbc')

dfArray = dfArray.withColumn('nameOffjdbc', explode('nameOffjdbc')).collect()
pyList = ",".join(str('{0}'.format(value.nameOffjdbc)) for value in dfArray).split(',')

NOTE: collect() always collects a DataFrame's values into a list of Rows.
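To tie this back to the original question: once pyList is a plain Python list, the URL assembly is a single pass over both lists with zip(). A sketch, where urlSuffixes is a hypothetical stand-in for the asker's second, pure-Python list:

# Hypothetical suffixes standing in for the second list from the question.
urlSuffixes = ['?id=1', '?id=2']

# Pair each collected URL prefix with its suffix and concatenate.
newArrayContainingBoth = [prefix + suffix for prefix, suffix in zip(pyList, urlSuffixes)]
print(newArrayContainingBoth)
# ['/file/path.something1?id=1', '/file/path.something2?id=2']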


4 Comments

Thank you! It works nicely! The only problem is that when I get to your "Use it" step, my print looks like this: print(pyList[0]) gives [u'/file/path.something1'. Do you know why? Or does it not matter that it looks like that?
I don't want the [, u and ' to be part of my string.
Update: I fixed that problem with .replace()
You need to follow every step and then use it.
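A note on that [u' artifact: it is the repr of a Row (or a list of Rows) being printed, not part of the string itself. Following every step above, indexing the Row and reading its field yields the bare string without any .replace() workaround. A sketch using the earlier newDfArray:

# Read the field off the Row directly to get the bare string.
first = newDfArray[0].nameOffjdbc
print(first)
# /file/path.something1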
