I am looping through multiple web services, which works fine:

customers= json.loads(GetCustomers())

for o in customers["result"]:
  if o["customerId"] is not None:
    custRoles = GetCustomersRoles(o["customerId"])
    custRolesObj = json.loads(custRoles)

    if custRolesObj["result"] is not None:
      for l in custRolesObj["result"]:
        print(str(l["custId"]) + ", " + str(o["salesAmount"]))

This works, and the printed output is correct. But now I need to create a DataFrame out of this. I read that we cannot "create a DataFrame with two columns and add row by row while looping".

But how would I solve this?

Update

I hope this is the correct way to create a list?

customers= json.loads(GetCustomers())
result = []

for o in customers["result"]:
  if o["customerId"] is not None:
    custRoles = GetCustomersRoles(o["customerId"])
    custRolesObj = json.loads(custRoles)

    if custRolesObj["result"] is not None:
      for l in custRolesObj["result"]:
          result.append(make_opportunity(str(l["customerId"]), str(o["salesAmount"])))

If this is correct, how do I create a DataFrame out of it?

  • Store your results in a list of tuples (or lists) and then create the spark DataFrame at the end. You can add a row inside a loop but it would be terribly inefficient. Commented Oct 11, 2018 at 18:57
  • As @pault stated, I would definitely not add (or append) rows to a dataframe inside of a for loop. It will be terribly inefficient. Much more performant to create the dataframe all at once outside of the loop after assembling your data. On that note, you should include a sample of your data in your OP. Commented Oct 11, 2018 at 19:01
  • @pault: Could you give me a sample for a two column scenario please? Commented Oct 11, 2018 at 19:10
  • df = spark.createDataFrame([('a', 1), ('b', 2), ('c', 3)], ["letter", "number"]). Also take a look at this post. Commented Oct 11, 2018 at 19:31
  • That's what I know now, and I know how to implement it in theory. But how do I do this in practice, with a small code snippet? That would be the answer I am looking for. Commented Oct 12, 2018 at 5:46

1 Answer

I solved my problem by using the following code:

customers= json.loads(GetCustomers())
result = []

for o in customers["result"]:
  if o["customerId"] is not None:
    custRoles = GetCustomersRoles(o["customerId"])
    custRolesObj = json.loads(custRoles)

    if custRolesObj["result"] is not None:
      for l in custRolesObj["result"]:
          result.append([str(l["customerId"]), str(o["salesAmount"])])

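from pyspark.sql import SparkSession

# Reuses the existing session if one is already running (e.g. in a notebook)
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(result, ['customerId', 'salesAmount'])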
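
For anyone who wants to try the pattern without the real web services, here is a minimal sketch of the same idea: collect the rows in a plain Python list inside the loop, then build the DataFrame once at the end. The functions `get_customers` and `get_customer_roles` and their sample data are hypothetical stand-ins for `GetCustomers` and `GetCustomersRoles`, and `to_dataframe` assumes you pass in an existing `SparkSession`. Unlike the code above, this sketch keeps `salesAmount` as a float instead of a string, on the assumption that a numeric column is what you want.

```python
import json


def get_customers():
    # Hypothetical stand-in for GetCustomers(): returns a JSON string.
    return json.dumps({"result": [
        {"customerId": 1, "salesAmount": 100.0},
        {"customerId": None, "salesAmount": 5.0},
        {"customerId": 2, "salesAmount": 250.5},
    ]})


def get_customer_roles(customer_id):
    # Hypothetical stand-in for GetCustomersRoles(): returns a JSON string.
    roles = {1: [{"customerId": 1}, {"customerId": 1}], 2: None}
    return json.dumps({"result": roles.get(customer_id)})


def collect_rows():
    """Build one (customerId, salesAmount) tuple per role entry."""
    rows = []
    for customer in json.loads(get_customers())["result"]:
        if customer["customerId"] is None:
            continue  # skip customers without an id
        roles = json.loads(get_customer_roles(customer["customerId"]))["result"]
        for role in roles or []:  # treat a null result as "no roles"
            rows.append((str(role["customerId"]), float(customer["salesAmount"])))
    return rows


def to_dataframe(spark, rows):
    # Create the DataFrame once, after the loop, from the collected rows.
    # `spark` is assumed to be an existing SparkSession.
    return spark.createDataFrame(rows, ["customerId", "salesAmount"])


rows = collect_rows()
print(rows)
```

With a running session, `df = to_dataframe(spark, rows)` followed by `df.show()` should display the two columns. The `roles or []` guard replaces the separate `is not None` check from the original loop.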