How to iterate over DataFrame and generate a new DataFrame

Question

I have a data frame that looks like this:

The objective is to check whether there is any value in L and, if so, extract the values of the L and P columns:

P  L
1  3
4  6
4  7

Note that L might contain more than one value; in that case, I would need one row for each value.

Below is my current script; it does not generate the expected result.

df2 = []
ego
other
newrow = []

for item in data_DF.iterrows():
    if item[1]["L"] is not None:
        ego = item[1]['P']
        other = item[1]['L']
        newrow = ego + other + "\n"
        df2.append(newrow)

data_DF2 = pd.DataFrame(df2)

Are your values in L lists, a string of numbers, etc.? Can you post raw input data and code to reproduce your df?
– EdChum, Commented Dec 3, 2015 at 11:10
You need to post the raw data, so we can see if the values in L are '' (empty string), NaN, or something else. And did they come in from pd.read_csv(), and if so, which dtypes and arguments were specified? You can tell read_csv how you want it to handle NaNs, and you can define '' as a NaN value. So you can prevent this issue from ever arising.
– smci, Commented Jan 21, 2022 at 13:45
This is avoidable, and a possible non-issue. You're probably creating the issue yourself, possibly with pd.read_csv(). You haven't given enough detail to tell.
– smci, Commented Jan 21, 2022 at 15:23
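
To illustrate the read_csv point raised in the comments, here is a minimal sketch. The CSV text is an assumption made up for illustration (the question never shows its raw input); keep_default_na and dtype are real read_csv parameters. It shows that an empty field is read as NaN by default and stays an empty string when the default NaN handling is switched off.

```python
import io

import pandas as pd

# Hypothetical raw input; the original question does not show its data.
raw = 'P,Q,L\n1,2,3\n2,3,\n4,5,"6,7"\n'

# Default behaviour: the empty field in L is parsed as NaN.
default = pd.read_csv(io.StringIO(raw), dtype={"L": str})

# With keep_default_na=False the empty field stays an empty string.
kept = pd.read_csv(io.StringIO(raw), dtype={"L": str}, keep_default_na=False)

print(default["L"].isna().tolist())
print(kept["L"].tolist())
```

Which of the two you get decides whether a missing-value check such as pd.isnull() or a comparison with '' is the right test for "no value in L".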

Stefan · Accepted Answer · 2015-12-03 18:19:42Z

2

First, you can extract all rows of the L and P columns where L is not missing like so:

df2 = df[~pd.isnull(df.L)].loc[:, ['P', 'L']].set_index('P')

Next, you can deal with the multiple values in some of the remaining L rows as follows:

df2 = df2.L.str.split(',', expand=True).stack()
df2 = df2.reset_index().drop('level_1', axis=1).rename(columns={0: 'L'}).dropna()
df2.L = df2.L.str.strip()

To explain: with P as index, the code splits the string content of the L column on ',' and distributes the individual elements across various columns. It then stacks the various new columns into a single new column, and cleans up the result.
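
The steps described above can be put together as one runnable sketch. The sample frame is an assumption for illustration (it mirrors the P, Q, L data printed in another answer in this thread), and the intermediate name stacked and the final reset_index(drop=True) are additions that do not appear in the answer.

```python
import numpy as np
import pandas as pd

# Assumed sample data: L holds comma-separated strings and one missing value.
df = pd.DataFrame({"P": [1, 2, 4], "Q": [2, 3, 5], "L": ["3", np.nan, "6,7"]})

# Step 1: keep rows where L is present, keep only P and L, index by P.
df2 = df[~pd.isnull(df.L)].loc[:, ["P", "L"]].set_index("P")

# Step 2: split on ',' into one column per element, then stack them
# into a single column keyed by (P, position).
stacked = df2.L.str.split(",", expand=True).stack()

# Step 3: bring P back as a column, drop the position level, name the
# value column L, and remove any padding left over from the split.
result = (
    stacked.reset_index()
    .drop("level_1", axis=1)
    .rename(columns={0: "L"})
    .dropna()
    .reset_index(drop=True)
)
result["L"] = result["L"].str.strip()

print(result)
```

The trailing dropna() matters because rows with fewer elements than the longest row are padded with missing values by the split, and whether stack() discards them can depend on the pandas version.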

edited Dec 3, 2015 at 18:19

answered Dec 3, 2015 at 12:05

Stefan


jezrael · Accepted Answer · 2015-12-03 12:19:49Z

1

First, I split the multiple values in column L into a new Series s, whose index repeats the original index once for each value. Then I drop the unneeded columns L and Q, join s back to the original df, and drop the rows with NaN values.

print(df)
   P  Q    L
0  1  2    3
1  2  3  NaN
2  4  5  6,7

s = df['L'].str.split(',').apply(pd.Series).stack()
s.index = s.index.droplevel(-1) # to line up with df's index
s.name = 'L'
print(s)
0    3
2    6
2    7
Name: L, dtype: object

df = df.drop(['L', 'Q'], axis=1)
df = df.join(s)
print(df)
   P    L
0  1    3
1  2  NaN
2  4    6
2  4    7
df = df.dropna().reset_index(drop=True)
print(df)
   P  L
0  1  3
1  4  6
2  4  7
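
For comparison, the same split, realign, and drop sequence can be sketched with DataFrame.explode. This is a different technique from the stack-and-join code above, not what the answer uses, and it assumes pandas 0.25 or later. The sample frame copies the values printed above; the name out is made up for illustration.

```python
import numpy as np
import pandas as pd

# Same values as the frame printed in the answer.
df = pd.DataFrame({"P": [1, 2, 4], "Q": [2, 3, 5], "L": ["3", np.nan, "6,7"]})

out = (
    df.drop(columns="Q")
    # Turn each comma-separated string into a list; NaN stays NaN.
    .assign(L=lambda d: d["L"].str.split(","))
    # One row per list element, with P repeated alongside it.
    .explode("L")
    # Discard the rows that had no value in L.
    .dropna(subset=["L"])
    .reset_index(drop=True)
)
out["L"] = out["L"].str.strip()

print(out)
```

Because explode repeats the other columns for every element, no separate join back onto the original frame is needed.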

answered Dec 3, 2015 at 12:19

jezrael


pavelpok · Accepted Answer · 2022-01-21 13:40:59Z

0

I was solving a similar issue when I needed to create a new dataframe as a subset of a larger dataframe. Here's how I went about generating the second dataframe:

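import pandas as pd

# DataFrame.append was removed in pandas 2.0, so collect the matching rows
# in a list and build the new DataFrame once at the end.
rows = []
for i, row in df1.iterrows():
    if row['company_id'] == 12345 or row['company_id'] == 56789:
        rows.append(row)

df2 = pd.DataFrame(rows).reset_index(drop=True)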
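
The row-by-row filter above can also be written without a loop, using a boolean mask built with Series.isin. This is a sketch of an alternative, not the answer's own method; the sample df1 and its revenue column are hypothetical, and only the company_id column name and the two IDs come from the answer.

```python
import pandas as pd

# Hypothetical sample data; only company_id and the two IDs are from the answer.
df1 = pd.DataFrame(
    {"company_id": [12345, 11111, 56789], "revenue": [10, 20, 30]}
)

# Keep the rows whose company_id is one of the wanted values,
# then renumber the index from zero.
wanted = [12345, 56789]
df2 = df1[df1["company_id"].isin(wanted)].reset_index(drop=True)

print(df2)
```

On large frames this usually runs much faster than iterrows(), since the comparison is done on the whole column at once.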

answered Jan 21, 2022 at 13:40

pavelpok

