Pandas - Duplicate Rows at Iteration

Question

I'm trying to duplicate rows while iterating over a dataframe. I have two nested for loops: the outer loop feeds values into an API, and the inner loop extracts values from the JSON output.

I want to duplicate the current row so that it becomes N rows, where N is the number of items in the list returned for that row. For example:

Name    Date      Sales     
John    1/1/17    100
Bob     1/2/17    200

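import urllib       # urllib.urlopen is the Python 2 API
import simplejson

for row in df.Sales:
    items = []  # start a fresh list for each row
    url = 'www.samplewebsite.com/values=xyz/APIKEY=MYAPIKEY'
    result = simplejson.load(urllib.urlopen(url))
    # collect the 'item' value from every entry under 'column b'
    for i in range(len(result['column a'][0]['column b'])):
        items.append(result['column a'][0]['column b'][i]['item'])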

For this data, the loop should produce two lists (one for John, the other for Bob):

items = ['Paper','Paper Clips','Pencils']
items = ['Notebook','Stapler','Highlighter','Pen']

Desired output:

Name    Date      Sales     Item
John    1/1/17    100       Paper
John    1/1/17    100       Paper Clips
John    1/1/17    100       Pencils
Bob     1/2/17    200       Notebook
Bob     1/2/17    200       Stapler
Bob     1/2/17    200       Highlighter
Bob     1/2/17    200       Pen

Thank you in advance!

JD Long · Accepted Answer · 2017-03-29 20:48:02Z


There are a handful of ways to do this. Inside your loop, after extracting each item, you could push one item and one name into a main dataframe. Or you could put all the items for one name into a df along with that name, and then append that to the main df after each name. Or you could gather up everything and append it all at the very end.

Here's how you would put all items belonging to one name into a df, then append it to a main df. You'd have to do this inside the loop, once for each name:

import pandas as pd

# set this up before the loop
mainDF = pd.DataFrame(columns=['Name', 'Items'])

# these get populated inside the loop
name = 'John'
items = ['Paper', 'Paper Clips', 'Pencils']

# inside the loop, create a df to hold one name and all the items belonging to that name
df = pd.DataFrame(columns=['Name', 'Items'])

# populate: assign the items first (this creates the rows), then fill every row with the one name
df['Items'] = items
df['Name'] = name

# then append the above df to the main df
# (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement)
mainDF = pd.concat([mainDF, df], ignore_index=True)
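
For the third option mentioned above (gather everything, then build the result at the end), here is a minimal sketch. It assumes the sample data from the question, and `fetch_items` together with `FAKE_RESPONSES` are hypothetical stand-ins for the API call and JSON parsing, not anything from the original code.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Name": ["John", "Bob"],
    "Date": ["1/1/17", "1/2/17"],
    "Sales": [100, 200],
})

# Hypothetical stand-in for the API responses in the question.
FAKE_RESPONSES = {
    "John": ["Paper", "Paper Clips", "Pencils"],
    "Bob": ["Notebook", "Stapler", "Highlighter", "Pen"],
}

def fetch_items(name):
    """Hypothetical helper: return the list of items for one row."""
    return FAKE_RESPONSES[name]

# Collect one plain dict per (row, item) pair, then build the frame once.
records = []
for row in df.itertuples(index=False):
    for item in fetch_items(row.Name):
        records.append(
            {"Name": row.Name, "Date": row.Date, "Sales": row.Sales, "Item": item}
        )

result = pd.DataFrame(records, columns=["Name", "Date", "Sales", "Item"])
print(result)
```

Building the frame once from a list of dicts avoids growing a dataframe inside the loop, which tends to get slow as the number of rows increases.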


2 Comments

Walt Reed Over a year ago

Thanks JD! Only issue I run into is that the Name is Null for all of the duplicated rows.

JD Long Over a year ago

Hmm, did you add the items first? That should give you the right number of rows. Then set the Name column of the inner df to the name you are interested in. That should assign the name to all rows, regardless of whether it's a dupe or not. Where is the Null coming into play, in the inner or the outer df?
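
One possible explanation for the Null names (a guess, not confirmed in this thread) is the order of the two assignments. The sketch below compares both orders on an empty frame; `name_first` and `items_first` are illustrative variable names only.

```python
import pandas as pd

items = ['Paper', 'Paper Clips', 'Pencils']

# Name first: the frame has no rows yet, so the scalar has nothing to fill.
name_first = pd.DataFrame(columns=['Name', 'Items'])
name_first['Name'] = 'John'
name_first['Items'] = items  # rows are created here; Name is left missing

# Items first: the list creates three rows, then the scalar fills all of them.
items_first = pd.DataFrame(columns=['Name', 'Items'])
items_first['Items'] = items
items_first['Name'] = 'John'

print(name_first)
print(items_first)
```

If the inner df shows missing names like `name_first` does, swapping the two assignments so the items go in first should be enough.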
