Splitting multiple columns into rows in pandas dataframe

Question

I have a pandas dataframe as follows:

ticker    account      value         date
aa       assets       100,200       20121231, 20131231
bb       liabilities  50, 150       20141231, 20131231

I would like to split df['value'] and df['date'] so that the dataframe looks like this:

ticker    account      value         date
aa       assets       100           20121231
aa       assets       200           20131231 
bb       liabilities  50            20141231
bb       liabilities  150           20131231

Would greatly appreciate any help.

Does this answer your question? Efficient way to unnest (explode) multiple list columns in a pandas DataFrame
– Pygirl, Commented Jan 23, 2021 at 13:20

jezrael · Accepted Answer · 2016-07-29 05:25:29Z

16

You can first split the columns, turn the pieces into a Series with stack, and remove whitespace with strip:

s1 = df.value.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
s2 = df.date.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)

Then concat both Series to df1:

df1 = pd.concat([s1,s2], axis=1, keys=['value','date'])

Remove old columns value and date and join:

print (df.drop(['value','date'], axis=1).join(df1).reset_index(drop=True))
  ticker      account value      date
0     aa       assets   100  20121231
1     aa       assets   200  20131231
2     bb  liabilities    50  20141231
3     bb  liabilities   150  20131231
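The steps above can be collected into one self-contained sketch. The sample frame is typed in from the question, and split_to_rows is a hypothetical helper name, not a pandas function. It assumes every listed column splits into the same number of pieces in a given row.

```python
import pandas as pd

# Sample data copied from the question, including the spaces after commas.
df = pd.DataFrame({
    'ticker': ['aa', 'bb'],
    'account': ['assets', 'liabilities'],
    'value': ['100,200', '50, 150'],
    'date': ['20121231, 20131231', '20141231, 20131231'],
})

def split_to_rows(frame, cols, sep=','):
    # Hypothetical helper: assumes each column in `cols` yields the same
    # number of pieces per row, otherwise the concat below will not line up.
    parts = {
        col: (frame[col].str.split(sep, expand=True)  # one column per piece
                        .stack()                      # pieces become rows
                        .str.strip()                  # drop stray whitespace
                        .reset_index(level=1, drop=True))
        for col in cols
    }
    exploded = pd.concat(parts, axis=1)  # dict keys become column names
    return frame.drop(columns=cols).join(exploded).reset_index(drop=True)

result = split_to_rows(df, ['value', 'date'])
print(result)
```

The join works because each stacked Series keeps the original row label, so the untouched columns are repeated once per piece.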

answered Jul 29, 2016 at 5:25

jezrael


1 Comment

tan Over a year ago

Thank you jezrael and piRSquared for both answers!! jezrael, your approach worked efficiently.

Community · Accepted Answer · 2017-05-23 11:54:41Z

9

I'm noticing this question a lot. That is, how do I split this column that has a list into multiple rows? I've seen it called exploding. Here are some links:

So I wrote a function that will do it.

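import numpy as np
import pandas as pd

def explode(df, columns):
    # Repeat each index label once per list element in that row.
    idx = np.repeat(df.index, df[columns[0]].str.len())
    # reindex_axis was removed in pandas 1.0; selecting the columns first is equivalent.
    a = df[columns].T.values
    concat = np.concatenate([np.concatenate(a[i]) for i in range(a.shape[0])])
    p = pd.DataFrame(concat.reshape(a.shape[0], -1).T, idx, columns)
    # join aligns the remaining columns on the repeated index labels.
    return df.drop(columns, axis=1).join(p).reset_index(drop=True)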

But before we can use it, we need lists (or iterable) in a column.

Setup

df = pd.DataFrame([['aa', 'assets',      '100,200', '20121231,20131231'],
                   ['bb', 'liabilities', '50,50',   '20141231,20131231']],
                  columns=['ticker', 'account', 'value', 'date'])

df

split value and date columns:

df.value = df.value.str.split(',')
df.date = df.date.str.split(',')

df

Now we could explode on either column or both, one after the other.

Solution

explode(df, ['value','date'])

Timing

I removed strip from @jezrael's timing because I could not effectively add it to mine. Stripping is still a necessary step for this question, since the OP has spaces after the commas in the strings. My aim was to provide a generic way to explode a column that already contains iterables, and I think I've accomplished that.
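If the whitespace needs to go before exploding, one option is to clean each piece while building the lists. This is only a sketch: split_clean is a hypothetical helper name, and it assumes the column holds strings with no missing values.

```python
import pandas as pd

def split_clean(series, sep=','):
    # Hypothetical helper: split each string, then strip every piece.
    # Assumes no NaN entries in `series`.
    return series.str.split(sep).apply(lambda items: [item.strip() for item in items])

df = pd.DataFrame({'value': ['100, 200', '50 ,150']})
df['value'] = split_clean(df['value'])
print(df['value'].tolist())
```

After this step the column holds clean lists, so a generic explode function can be applied without a separate strip.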

code

def get_df(n=1):
    return pd.DataFrame([['aa', 'assets',      '100,200,200', '20121231,20131231,20131231'],
                         ['bb', 'liabilities', '50,50',   '20141231,20131231']] * n,
                        columns=['ticker', 'account', 'value', 'date'])

small 2 row sample

medium 200 row sample

large 2,000,000 row sample
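To reproduce timings locally, a harness along these lines could be used. It is a sketch that times a single candidate (the built-in multi-column explode, which assumes pandas 1.3 or later); with_builtin_explode is a hypothetical name, and any other candidate can be swapped in. No measured numbers are claimed here.

```python
import timeit
import pandas as pd

def get_df(n=1):
    return pd.DataFrame([['aa', 'assets', '100,200,200', '20121231,20131231,20131231'],
                         ['bb', 'liabilities', '50,50', '20141231,20131231']] * n,
                        columns=['ticker', 'account', 'value', 'date'])

def with_builtin_explode(df):
    # Hypothetical candidate: split both columns, then explode them together.
    # Passing a list of columns to explode assumes pandas >= 1.3.
    out = df.assign(value=df['value'].str.split(','),
                    date=df['date'].str.split(','))
    return out.explode(['value', 'date']).reset_index(drop=True)

repeats = 10
for n in (1, 100):  # 2-row and 200-row samples
    df = get_df(n)
    seconds = timeit.timeit(lambda: with_builtin_explode(df), number=repeats)
    print(f'{2 * n} rows: {seconds / repeats:.6f} s per call')
```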

edited May 23, 2017 at 11:54

answered Jul 29, 2016 at 7:00

piRSquared

3 Comments

jezrael Over a year ago

I am very curious about the timings ;) iteritems is slow, but on the other hand there are a lot of operations like stack, concat and join, so maybe the two are comparable.

jezrael Over a year ago

I see a difference in the solutions: I use strip. Can you add it to your solution too and then try the timings again? I think you forgot it.

piRSquared Over a year ago

@jezrael updated. Notice what I wrote just below the Timing heading.

titipata · Accepted Answer · 2017-05-13 02:19:26Z

I wrote an explode function based on the previous answers. It might be useful for anyone who wants to grab it and use it quickly.

import pandas as pd

def explode(df, cols, split_on=','):
    """
    Explode dataframe on the given columns, split on the given delimiter
    """
    cols_sep = list(set(df.columns) - set(cols))
    df_cols = df[cols_sep]
    explode_len = df[cols[0]].str.split(split_on).map(len)
    repeat_list = []
    # as_matrix() was removed in pandas 1.0; to_numpy() yields the same row arrays
    for r, e in zip(df_cols.to_numpy(), explode_len):
        repeat_list.extend([list(r)]*e)
    df_repeat = pd.DataFrame(repeat_list, columns=cols_sep)
    df_explode = pd.concat([df[col].str.split(split_on, expand=True).stack().str.strip().reset_index(drop=True)
                            for col in cols], axis=1)
    df_explode.columns = cols
    return pd.concat((df_repeat, df_explode), axis=1)

example given from @piRSquared:

df = pd.DataFrame([['aa', 'assets', '100,200', '20121231,20131231'],
                   ['bb', 'liabilities', '50,50', '20141231,20131231']],
                  columns=['ticker', 'account', 'value', 'date'])
explode(df, ['value', 'date'])

output

+-----------+------+-----+--------+
|    account|ticker|value|    date|
+-----------+------+-----+--------+
|     assets|    aa|  100|20121231|
|     assets|    aa|  200|20131231|
|liabilities|    bb|   50|20141231|
|liabilities|    bb|   50|20131231|
+-----------+------+-----+--------+

Pygirl · Accepted Answer · 2020-09-20 15:43:59Z

2

Pandas >= 0.25

df.value = df.value.str.split(',')
df.date = df.date.str.split(',')
df = df.explode('value').explode("date").reset_index(drop=True)

df:

    ticker  account      value  date
0   aa      assets       100    20121231
1   aa      assets       100    20131231
2   aa      assets       200    20121231
3   aa      assets       200    20131231
4   bb      liabilities  50     20141231
5   bb      liabilities  50     20131231
6   bb      liabilities  50     20141231
7   bb      liabilities  50     20131231
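Note that chaining explode one column at a time pairs every value with every date, which is why the output above has 8 rows. If the goal is the pairwise result from the question (first value with first date, and so on), pandas 1.3 and later accept a list of columns in a single explode call. A minimal sketch, assuming each row has the same number of values and dates, and using the question's 50, 150 values without the spaces:

```python
import pandas as pd

df = pd.DataFrame([['aa', 'assets', '100,200', '20121231,20131231'],
                   ['bb', 'liabilities', '50,150', '20141231,20131231']],
                  columns=['ticker', 'account', 'value', 'date'])

df['value'] = df['value'].str.split(',')
df['date'] = df['date'].str.split(',')

# Exploding both columns together keeps the i-th value with the i-th date.
# Requires pandas >= 1.3 and equal list lengths within each row.
paired = df.explode(['value', 'date']).reset_index(drop=True)
print(paired)
```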

answered Sep 20, 2020 at 15:43

Pygirl

marc_s · Accepted Answer · 2017-08-24 20:35:41Z

Because I'm too new, I'm not allowed to write a comment, so I'm writing an "answer".

@titipata your answer worked really well, but in my opinion there is a small "mistake" in your code that I'm not able to find myself.

I worked with the example from this question and changed just the values.

df = pd.DataFrame([['title1', 'publisher1', '1.1,1.2', '1'],
               ['title2', 'publisher2', '2', '2.1,2.2']],
              columns=['titel', 'publisher', 'print', 'electronic'])

explode(df, ['print', 'electronic'])

    publisher   titel   print   electronic
0   publisher1  title1  1.1     1
1   publisher1  title1  1.2     2.1
2   publisher2  title2  2       2.2

As you can see, row 1 of the 'electronic' column should contain the value '1', not '2.1'.

Because of that, the whole dataset would change. I hope someone can help me find a solution for this.
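The shift happens because the columns are stacked independently, so a row whose columns split into different numbers of pieces pushes later values up. One possible workaround, sketched below, pads the shorter list in each row by repeating its last item before exploding. explode_broadcast is a hypothetical helper name, the repeat-the-last-item rule is an assumption about the intended result, and the list form of DataFrame.explode assumes pandas 1.3 or later.

```python
import pandas as pd

def explode_broadcast(df, cols, split_on=','):
    # Hypothetical helper: split `cols`, pad shorter lists per row by
    # repeating their last item, then explode all columns together.
    out = df.copy()
    for col in cols:
        out[col] = out[col].str.split(split_on).apply(
            lambda items: [item.strip() for item in items])
    # The longest list in each row decides how many output rows it yields.
    widths = [max(len(items) for items in row)
              for row in zip(*(out[col] for col in cols))]
    for col in cols:
        padded = [items + [items[-1]] * (width - len(items))
                  for items, width in zip(out[col], widths)]
        out[col] = pd.Series(padded, index=out.index)
    return out.explode(cols).reset_index(drop=True)

df = pd.DataFrame([['title1', 'publisher1', '1.1,1.2', '1'],
                   ['title2', 'publisher2', '2', '2.1,2.2']],
                  columns=['titel', 'publisher', 'print', 'electronic'])
result = explode_broadcast(df, ['print', 'electronic'])
print(result)
```

With this rule, both title1 rows carry electronic '1' and both title2 rows carry print '2'. If missing values are preferred over repetition, the padding step could append None instead.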
