
I have a DataFrame like this:

Names;        Count;  PartNr
R1, R2,...Rn; n;       1234-5678
C1, C2;       2;       1234-6789

The list should be exported to a CSV file for import into another, proprietary piece of software. That software accepts at most 100 characters in the "Names" column. If a row has more data than that, I should wrap the row: duplicate it and put the remaining names in the new row, so that there are multiple rows with a maximum of 100 characters each in the Names column. The absolute count of the parts should appear only in the first of these rows; in the following rows the Count value should be set to zero.

Names;             Count; PartNr
R1, R2,...Ra;        n;     1234-5678
Ra+1, Ra+2,...Rb;    0;     1234-5678   
Rb+1, Rb+2,...Rn;    0;     1234-5678
C1, C2;              2;     1234-6789

Is there a nice way to do this directly in pandas?

I tried to iterate through the rows, but I'm not allowed to modify the DataFrame I'm iterating over, so that didn't work. Any better solutions?

The DataFrames are between 10 and 1000 rows long, and only a few rows have overly long Names, so performance isn't really important.

  • Have you tried iterating through df_copy = df.copy() instead of the original dataframe df? Commented Jan 16, 2017 at 15:40
  • You mean copying the original one, and then iterating and append again? Sounds good, i will try it. Commented Jan 16, 2017 at 15:41
  • Also, if you're allowed to "wrap" the Rn values, is there a reason why you wouldn't just put a single value on each row? Commented Jan 16, 2017 at 15:41
  • Yeah, that would be possible too, but the file would get very confusing in my opinion. There are only a few lines which should get 'duplicated'. Commented Jan 16, 2017 at 15:48

1 Answer


I don't think there is a particularly nice way to do this in Pandas. Whenever a DataFrame holds a column of lists, and you want to do some kind of calculation which requires iteration over the lists, you are forced to call a Python function once for each item (i.e. each list) in the column. This hurts performance, since there is no way to apply Pandas' fast vectorized operations here. The moral of the story is: avoid putting lists into a DataFrame if possible.

(Of course, in your case, it seems you want to prepare a CSV in a particular format for another application. So if that requires lists in a DataFrame, so be it.)
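(As an aside: if one name per row were acceptable, as one commenter suggested, note that newer pandas versions (0.25+) can flatten a list column with explode. A minimal sketch, using made-up sample data:)

```python
import pandas as pd

df = pd.DataFrame({'Names': [['R1', 'R2', 'R3'], ['C1', 'C2']],
                   'PartNr': ['1234-5678', '1234-6789']})

# explode turns each list element into its own row,
# repeating the other columns (requires pandas >= 0.25)
flat = df.explode('Names')
print(flat)
```
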

As you mentioned, you can iterate over the rows. You could use iterrows or itertuples. I'll use itertuples since it tends to be faster:

import pandas as pd

Rs = ['R{}'.format(i) for i in range(1,251)]
Cs = ['C1', 'C2']
df = pd.DataFrame({'Names': [Rs, Cs], 'Count': ['n',0], 
                   'PartNr':['1234-5678','1234-6789']})

def chunks(seq, n):
    # http://stackoverflow.com/a/312464/190597 (Ned Batchelder)
    """ Yield successive n-sized chunks from seq."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

result = []
for row in df.itertuples():
    result.append(pd.DataFrame({'Names': list(chunks(row.Names, 100)),
                                'Count': row.Count,
                                'PartNr': row.PartNr}))

result = pd.concat(result, axis=0, ignore_index=True)
print(result)

yields

  Count                                              Names     PartNr
0     n  [R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11,...  1234-5678
1     n  [R101, R102, R103, R104, R105, R106, R107, R10...  1234-5678
2     n  [R201, R202, R203, R204, R205, R206, R207, R20...  1234-5678
3     0                                           [C1, C2]  1234-6789

Inside the itertuples for-loop, a new DataFrame is constructed for each row. The DataFrames are collected into a list called result. After the for-loop is done, the list of DataFrames is concatenated into a single DataFrame.
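Note that the code above chunks the lists by number of names (100 items), while your requirement is 100 characters in the Names column, and it repeats the Count in every wrapped row instead of zeroing it after the first. The same itertuples pattern can cover both; a sketch, assuming the names in the CSV are joined with ", " and using made-up sample data:

```python
import pandas as pd

def chunk_names(names, limit=100, sep=', '):
    """Greedily pack names into strings of at most `limit` characters."""
    chunks, current = [], []
    for name in names:
        if current and len(sep.join(current + [name])) > limit:
            chunks.append(sep.join(current))
            current = [name]
        else:
            current.append(name)
    if current:
        chunks.append(sep.join(current))
    return chunks

df = pd.DataFrame({'Names': [['R{}'.format(i) for i in range(1, 251)],
                             ['C1', 'C2']],
                   'Count': [250, 2],
                   'PartNr': ['1234-5678', '1234-6789']})

result = []
for row in df.itertuples():
    names = chunk_names(row.Names)
    # only the first wrapped row keeps the real count; the rest get 0
    counts = [row.Count] + [0] * (len(names) - 1)
    result.append(pd.DataFrame({'Names': names,
                                'Count': counts,
                                'PartNr': row.PartNr}))

result = pd.concat(result, ignore_index=True)
# result.to_csv('parts.csv', sep=';', index=False)
```

Here Names ends up as plain strings rather than lists, so to_csv with sep=';' produces the semicolon-separated layout from your example directly.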
