
I have this kind of data:

[{"id": 1, "name": "Alex", "projects": ["A", "B", "C"]}, {"id": 2, "name": "Bob", "projects": None}]

And I need an .xlsx file in this format:

Each element of the list should go in a separate row, but without the other columns getting duplicated (merged cells instead).


But I need to achieve this in a dynamic way: I won't have the cell indices statically, and finding those cell indices would be a bit too complex.

I use pandas with xlsxwriter as the engine to generate the .xlsx file. I can use other modules if needed.

  • Hello, did you already try something yourself? Commented Apr 30 at 9:58
  • Check the explode function and the rest of the Reshaping and pivot tables page. Once you load the data into a dataframe you probably only need df.explode('projects') Commented Apr 30 at 10:08

3 Answers


You can "explode" the list values to rows using the explode function. You'll find this and other ways to reshape dataframes on the Reshaping and pivot tables page.

By default, to_excel generates merged cells for multi-index rows, so you need to use a MultiIndex with this dataframe.

The resulting code is rather short:

import pandas as pd

data = [{"id": 1, "name": "Alex", "projects": ["A", "B", "C"]},
        {"id": 2, "name": "Bob", "projects": None}]

df = pd.DataFrame(data)
df = df.explode('projects')  # one row per project; None stays as a single row
df = df.set_index(['id', 'name', 'projects'])
df.to_excel(r'c:\spikes\exploded.xlsx')

Which generates

Generated Excel output (screenshot): https://i.sstatic.net/DpniEW4E.png
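For reference, since the screenshot may not render, this sketch prints the exploded frame before it is indexed and written. Note that explode leaves the None entry as a single row:

```python
import pandas as pd

data = [{"id": 1, "name": "Alex", "projects": ["A", "B", "C"]},
        {"id": 2, "name": "Bob", "projects": None}]

# explode turns each list element into its own row; scalars such as None pass through unchanged
df = pd.DataFrame(data).explode('projects')
print(df)
# four rows in total: A, B, C for Alex and a single None row for Bob
```

Once `id`, `name` and `projects` are set as the index, to_excel merges the repeated index cells automatically.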

Loading from a database

A comment suggests the data is loaded from a database. In that case it's probably easier to use read_sql to load the flat data and avoid explode. The index columns can be specified in read_sql directly. The code could look like this:

df = pd.read_sql(sql, conn, index_col=['id', 'name', 'projects'])
df.to_excel(r'c:\spikes\exploded.xlsx')

The projects can be grouped afterwards if needed, e.g.

df_grouped = df.groupby(['id', 'name']).agg({'projects': list})
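A minimal sketch of that grouping, with hypothetical flat rows standing in for the read_sql result:

```python
import pandas as pd

# flat rows, as they might come back from read_sql (hypothetical sample)
flat = pd.DataFrame({'id': [1, 1, 1, 2],
                     'name': ['Alex', 'Alex', 'Alex', 'Bob'],
                     'projects': ['A', 'B', 'C', None]})

# collapse the project rows back into one list per (id, name) pair
df_grouped = flat.groupby(['id', 'name']).agg({'projects': list})
print(df_grouped)
# id 1 gets ['A', 'B', 'C']; id 2 gets [None]
```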

6 Comments

Can you explain why there is only one id but multiple Alex's? And how can I make Alex appear only once as well?
Oops, I didn't notice. projects must be added to the index as well; leaf nodes aren't merged.
It gives an error if projects is added to the index, with the following message: TypeError: unhashable type: 'list'
Some parentheses were missing in the first snippet. I fixed this too. BTW if you load the data from a database it's easier to use read_sql to read the flat data and save to Excel. If you need to group the projects for something else you can use df.groupby(['id','name']).agg({'projects':list})
it works if df = df.explode('projects') is used instead of df.explode('projects'). And the pipeline method works fine. Thank you very much @panagiotis

You can iterate over the rows of the pandas DataFrame and append each row.

import pandas as pd
from openpyxl import Workbook

# Sample data
data = [
    {"id": 1, "name": "Alex", "projects": ["A", "B", "C"]},
    {"id": 2, "name": "Bob", "projects": None},
    {"id": 3, "name": "Robby", "projects": ["Z"]}
]

# Prepare a list to hold the rows for the DataFrame
rows = []

# Process the data
for entry in data:
    id_value = entry["id"]
    name_value = entry["name"]
    projects = entry["projects"]
    
    if projects is None:
        rows.append({"id": id_value, "name": name_value, "project": None})
    else:
        for project in projects:
            rows.append({"id": id_value, "name": name_value, "project": project})

# Create a DataFrame
df = pd.DataFrame(rows)

# Create a new Excel workbook and select the active worksheet
wb = Workbook()
ws = wb.active

# Write the header
header = df.columns.tolist()
ws.append(header)

# Write the data
for index, row in df.iterrows():
    ws.append(row.tolist())

# Merge cells for 'id' and 'name' where applicable
row = 0
while row < len(df):
    start = row
    # advance to the end of the run of rows sharing this id
    while row < len(df) and df.iloc[row]['id'] == df.iloc[start]['id']:
        row += 1
    if row - start > 1:
        # +2 because openpyxl is 1-indexed and row 1 holds the header
        ws.merge_cells(start_row=start + 2, start_column=1, end_row=row + 1, end_column=1)  # Merge 'id'
        ws.merge_cells(start_row=start + 2, start_column=2, end_row=row + 1, end_column=2)  # Merge 'name'

# Save the workbook
wb.save("output.xlsx")

print("Done!")

3 Comments

There's no need to do that. pandas already has methods to reshape dataframes, handle lists and generate merged cells.
I tried this; there are some other complications as well, but the main reason I cannot use this is that every time I fetch data from the database, convert it into a Python list and create the dataframe, the order of the columns changes. It got more complicated afterwards, so I came here to find new solutions. As a last resort I will return to this approach.
@AykhanAghayev why don't you use read_sql to load the dataframe directly? You won't have to use explode then. You can use other operations to reshape the dataframe and group the projects afterwards. You can use pivot_table or group with list as the aggregate function, as this question shows.

An answer similar to Mario's, but with a single pass through the data (and it keeps the "project" column as the last one):

data = [{"id": 1, "name": "Alex", "projects": ["A", "B", "C"]},
        {"id": 2, "name": "Bob", "projects": ["D", "E"]},
        {"id": 3, "name": "Charlie", "projects": None},
        {"id": 4, "name": "Devin", "projects": []},
        {"id": 5, "name": "Ellen", "projects": ["F", "G"]},]

# create a new excel file
import openpyxl
wb = openpyxl.Workbook()

# fill the first row with the keys of the first dictionary
for col, key in enumerate(data[0].keys()):
    wb.active.cell(row=1, column=col + 1).value = key

current_row = 2
for item in data:

    item_without_projects = item.copy()
    item_without_projects.pop("projects")

    # handle the case where projects is None or an empty list
    if item["projects"] is None or len(item["projects"]) == 0:
        for col, key in enumerate(item_without_projects.keys()):
            wb.active.cell(row=current_row, column=col + 1).value = str(item[key])
            
        current_row += 1
        continue

    # fill a row with data for each project
    for project in item["projects"]:
        for col, key in enumerate(item_without_projects.keys()):
            wb.active.cell(row=current_row, column=col + 1).value = str(item[key])
        # the project value goes in the column right after the other fields
        wb.active.cell(row=current_row, column=len(item_without_projects) + 1).value = project
        current_row += 1

    # merge the newly created cells
    for col, key in enumerate(item_without_projects.keys()):
        wb.active.merge_cells(start_row=current_row - len(item["projects"]), start_column=col + 1, end_row=current_row - 1, end_column=col + 1)
        # style: set text alignment to center
        cell = wb.active.cell(row=current_row - len(item["projects"]), column=col + 1)
        cell.alignment = openpyxl.styles.Alignment(horizontal='center', vertical='center')

# write the workbook
wb.save("data.xlsx")

This clearly doesn't use pandas, but from the other comments you posted I understand .explode won't solve your problem. I'd be happy to help further if you can describe the data-loading issues in more detail.

2 Comments

explode works. And even if it didn't, there's no reason to process rows one by one instead of using DataFrame operations.
I'm not saying it doesn't work; I'm saying he pointed out the problem was something else, in the data loading step.
