I'm trying to create duplicate rows while iterating over a dataframe. I have two nested for loops: the outer loop feeds values into an API, and the inner loop extracts values from the JSON response.

I want to duplicate the current row N times, once for each item in the list returned for that row. For example:

Name    Date      Sales     
John    1/1/17    100
Bob     1/2/17    200

import urllib
import simplejson

items = []
for row in df['Sales']:
    url = 'www.samplewebsite.com/values=xyz/APIKEY=MYAPIKEY'
    result = simplejson.load(urllib.urlopen(url))
    # walk the nested JSON and collect each item for this row
    for i in range(0, len(result['column a'][0]['column b'])):
        items.append(result['column a'][0]['column b'][i]['item'])

In this particular loop, two lists are created (one for John, the other for Bob):

items = ['Paper','Paper Clips','Pencils']
items = ['Notebook','Stapler','Highlighter','Pen']

Desired output:

Name    Date      Sales     Item
John    1/1/17    100       Paper
John    1/1/17    100       Paper Clips
John    1/1/17    100       Pencils
Bob     1/2/17    200       Notebook
Bob     1/2/17    200       Stapler
Bob     1/2/17    200       Highlighter
Bob     1/2/17    200       Pen

Thank you in advance!

1 Answer

There are a handful of ways to do this. From inside your loop you could, after extracting each item, push one item and one name into a main dataframe. Or you could put all the items for one name into a small df and append that to the main df after each name. Or you could gather up everything and build the dataframe once at the very end.

Here's how you would put all the items belonging to one name into a df and then append it to the main df. You'd have to do this inside the loop, once for each name:

import pandas as pd

# set this up before the loop
mainDF = pd.DataFrame(columns=['Name', 'Items'])

# these get populated inside the loop
name = 'John'
items = ['Paper', 'Paper Clips', 'Pencils']

# inside the loop, create a df to hold one name and all the items belonging to that name
df = pd.DataFrame(columns=['Name', 'Items'])

# populate: assign the items first so the rows exist, then fill every row with the one name
df['Items'] = items
df['Name'] = name

# then append the per-name df to the main df
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
mainDF = pd.concat([mainDF, df], ignore_index=True)
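
The third option mentioned above (gather everything up, then build the dataframe once at the end) could be sketched roughly like this. It is only an illustration under some assumptions: `fetch_items` and `ITEMS_BY_NAME` are hypothetical stand-ins for the API call and JSON parsing in the question, and the sample frame is copied from the question's example data.

```python
import pandas as pd

# Source frame, taken from the example in the question.
df = pd.DataFrame({
    "Name": ["John", "Bob"],
    "Date": ["1/1/17", "1/2/17"],
    "Sales": [100, 200],
})

# Hypothetical stand-in for the API results: maps a name to its items.
ITEMS_BY_NAME = {
    "John": ["Paper", "Paper Clips", "Pencils"],
    "Bob": ["Notebook", "Stapler", "Highlighter", "Pen"],
}


def fetch_items(name):
    """Placeholder for the request and JSON parsing done in the question."""
    return ITEMS_BY_NAME[name]


rows = []
for record in df.to_dict("records"):
    for item in fetch_items(record["Name"]):
        # Copy the original row and attach one item to the copy.
        rows.append({**record, "Item": item})

# Build the final frame once, after the loops have finished.
result = pd.DataFrame(rows, columns=["Name", "Date", "Sales", "Item"])
print(result)
```

Collecting plain dicts and constructing the frame once avoids growing a dataframe inside the loop, which tends to get slow as the number of rows increases.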

2 Comments

Thanks JD! The only issue I run into is that the Name is Null for all of the duplicated rows.
Hmm... did you add the items first? That should give you the right number of rows. Then set the Name column of the inner df to the name you are interested in. That should assign the name to all rows, regardless of whether it's a dupe or not. Where is the Null coming into play, in the inner or the outer df?
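
One plausible cause of the Null names, sketched below, is the order of the two assignments on an empty dataframe. This is a guess at what happened rather than a confirmed diagnosis; the variable names `name_first` and `items_first` are made up for the comparison.

```python
import pandas as pd

items = ["Paper", "Paper Clips", "Pencils"]

# Name first: the frame has no rows yet, so the scalar has nothing to fill.
name_first = pd.DataFrame(columns=["Name", "Items"])
name_first["Name"] = "John"
name_first["Items"] = items  # this creates the rows; Name is filled with NaN

# Items first: the list creates the rows, then the scalar is broadcast to all of them.
items_first = pd.DataFrame(columns=["Name", "Items"])
items_first["Items"] = items
items_first["Name"] = "John"

print(name_first)
print(items_first)
```

If the first frame matches what you are seeing, swapping the two assignments (or building the frame in one step with `pd.DataFrame({'Name': name, 'Items': items})`) should fill the name on every row.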
