14

How can I add a row count and drop rows by index? I want to add a new column that holds the number of rows in a data frame, and I want to drop rows using their index.

for i in range(len(df)):
    if (df['col1'][i] == df['col2'][i]) and (df['col4'][i] == df['col3'][i]):
        pass
    elif (df['col1'][i] == df['col3'][i]) and (df['col4'][i] == df['col2'][i]): 
        df['col1'][i] = df['col2'][i]
        df['col4'][i] = df['col3'][i]
    else:
       df = df.drop(i)
1
  • Please provide enough code so others can better understand or reproduce the problem. Commented Mar 16, 2022 at 8:22

2 Answers

18

Polars discourages in-place mutation and favors pure data handling: you create a new DataFrame instead of modifying an existing one.

So it helps to think about the data you want to keep rather than the rows you want to remove.

The example below keeps all rows except the second one. Note that the slice approach is the faster of the two and copies no data.

import polars as pl

df = pl.DataFrame({
    "a": [1, 2, 3],
    "b": [True, False, None]
}).with_row_index()

print(df)

# filter on condition
df_a = df.filter(pl.col("index") != 1)

# stack two slices
df_b = df[:1].vstack(df[2:])

# or via explicit slice syntax
# df_b = df.slice(0, 1).vstack(df.slice(2))

assert df_a.equals(df_b)

print(df_a)

Outputs:

shape: (3, 3)
┌───────┬─────┬───────┐
│ index ┆ a   ┆ b     │
│ ---   ┆ --- ┆ ---   │
│ u32   ┆ i64 ┆ bool  │
╞═══════╪═════╪═══════╡
│ 0     ┆ 1   ┆ true  │
│ 1     ┆ 2   ┆ false │
│ 2     ┆ 3   ┆ null  │
└───────┴─────┴───────┘

shape: (2, 3)
┌───────┬─────┬──────┐
│ index ┆ a   ┆ b    │
│ ---   ┆ --- ┆ ---  │
│ u32   ┆ i64 ┆ bool │
╞═══════╪═════╪══════╡
│ 0     ┆ 1   ┆ true │
│ 2     ┆ 3   ┆ null │
└───────┴─────┴──────┘


4 Comments

It says slice is faster but do you mean slice? Since there's no method call for slice here. :)
[:3] is syntactic sugar for slice(0, 3)
Right. I was looking at your answer and trying to understand what part of the code you said was better, so hence the question.
I shall add an example with an explicit slice as well. :+1:
0

I think what you want is an index column, which also tells you how many rows the data frame has. To remove certain rows, add the index and then drop them with a mask:

df.with_row_index().filter(~pl.col("index").is_in(your_index_points))
