Bumped by Community user

occurred Apr 30, 2019 at 2:01

Tweeted twitter.com/StackCodeReview/status/1002821075296833536

occurred Jun 2, 2018 at 7:55

It's about Pandas selecting rows. CSV is just a real-world constraint which informs a decision, as explained.

edited Apr 1, 2018 at 20:34

xtian

Munging select rows in CSV files using Pandas

added 27 characters in body; edited tags; edited title

edited Apr 1, 2018 at 19:26

Jamal

Munging CSV files and resaving them as CSV

python csv pandas

asked Apr 1, 2018 at 18:34

xtian

Sum only some duplicate rows (help me grok Pandas)

I'm learning Pandas. I have a project where I need to munge some CSV files and resave them as CSV. I could use dictionaries and the csv module, but I decided to use DataFrames to get more exposure to and practice with Pandas.

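The task is to sum() the AMOUNT values for some keys while leaving others unsummed. My solution was to add a column called "Line": rows that share the same Line number are summed into a single row. For example, the two Carrots rows have different Line numbers (4 and 5), so they stay separate.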

import pandas as pd

sales = [{"Line": 1, "KEY": "Apples", "AMOUNT": 3.99},
         {"Line": 1, "KEY": "Apples", "AMOUNT": 3.99},
         {"Line": 2, "KEY": "Oranges", "AMOUNT": 5.99},
         {"Line": 3, "KEY": "Pears", "AMOUNT": 2.99},
         {"Line": 3, "KEY": "Pears", "AMOUNT": 2.99},
         {"Line": 4, "KEY": "Carrots", "AMOUNT": .99},
         {"Line": 5, "KEY": "Carrots", "AMOUNT": .99},
        ]

df = pd.DataFrame(sales)

Now, Pandas groupby might seem like a perfect solution, but since I need to export the munged file as CSV, it seemed like something I would do only to have to undo.
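For comparison, here is a minimal sketch suggesting that a groupby result may not need undoing before export. It assumes one output row per Line, with AMOUNT summed and the first KEY of each group kept. Passing `as_index=False` leaves "Line" as an ordinary column, so the aggregated frame is already flat and can go straight to `to_csv`. The `io.StringIO` buffer is only a stand-in for a real output file.

```python
import io

import pandas as pd

sales = [{"Line": 1, "KEY": "Apples", "AMOUNT": 3.99},
         {"Line": 1, "KEY": "Apples", "AMOUNT": 3.99},
         {"Line": 2, "KEY": "Oranges", "AMOUNT": 5.99},
         {"Line": 3, "KEY": "Pears", "AMOUNT": 2.99},
         {"Line": 3, "KEY": "Pears", "AMOUNT": 2.99},
         {"Line": 4, "KEY": "Carrots", "AMOUNT": .99},
         {"Line": 5, "KEY": "Carrots", "AMOUNT": .99}]
df = pd.DataFrame(sales)

# One row per Line: sum AMOUNT and keep the first KEY seen in each group.
# as_index=False keeps "Line" as a regular column instead of the index,
# so the result is already a flat table.
summed = df.groupby("Line", as_index=False).agg({"KEY": "first", "AMOUNT": "sum"})

# Write to an in-memory buffer (standing in for a real file path);
# index=False leaves the 0..n-1 row labels out of the CSV.
buf = io.StringIO()
summed.to_csv(buf, index=False)
print(buf.getvalue())
```

This puts the columns in the order Line, KEY, AMOUNT and renumbers the index. With `index=False`, however, the index never reaches the CSV.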

# Find duplicate Line entries
# Subset df into just the rows whose `Line` value is duplicated
df_tmp = df[df.duplicated(subset="Line", keep=False)]

# Save a list of Line numbers to sum
line_dups = df_tmp['Line'].unique()

for x in line_dups:
    # Sum every line in the DF; a single-row sum is unchanged
    # asum = df.loc[df['Line'] == x, 'AMOUNT'].sum()
    # or
    # Subset the subset
    df_tmp2 = df_tmp[df_tmp["Line"] == x]
    # Sum the sub-subset
    asum = df_tmp2['AMOUNT'].sum()
    # Set the summed value on every row with the same Line value
    df.loc[df['Line'] == x, 'AMOUNT'] = asum

# Take the inverse of the duplicate mask on the original df,
# keeping only the first row of each Line (done once, after the loop)
df2 = df[~df.duplicated(subset="Line", keep='first')]

The solution works. Here is df2:

   AMOUNT      KEY  Line
0    7.98   Apples     1
2    5.99  Oranges     2
3    5.98    Pears     3
5    0.99  Carrots     4
6    0.99  Carrots     5

And yet, this solution is sort of hairy with all of the temporary DataFrames. Can a more seasoned Pandas user suggest something that would help me better understand Pandas?
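One possible way to drop the temporary DataFrames is sketched below. It swaps the explicit loop for pandas' `groupby(...).transform("sum")`. This is a swapped-in technique, not the approach used above. `transform` returns one value per original row, aligned to df's index, so every row receives its Line group's total and single-row groups keep their own value. `drop_duplicates` then plays the role of `~df.duplicated(subset="Line", keep="first")`.

```python
import pandas as pd

sales = [{"Line": 1, "KEY": "Apples", "AMOUNT": 3.99},
         {"Line": 1, "KEY": "Apples", "AMOUNT": 3.99},
         {"Line": 2, "KEY": "Oranges", "AMOUNT": 5.99},
         {"Line": 3, "KEY": "Pears", "AMOUNT": 2.99},
         {"Line": 3, "KEY": "Pears", "AMOUNT": 2.99},
         {"Line": 4, "KEY": "Carrots", "AMOUNT": .99},
         {"Line": 5, "KEY": "Carrots", "AMOUNT": .99}]
df = pd.DataFrame(sales)

# transform("sum") yields a Series aligned to df's original index:
# each row gets the total AMOUNT of its Line group, and rows whose
# Line is unique keep their own value. No per-Line subsets are built.
df["AMOUNT"] = df.groupby("Line")["AMOUNT"].transform("sum")

# Keep only the first row of each Line group; equivalent to
# df[~df.duplicated(subset="Line", keep="first")].
df2 = df.drop_duplicates(subset="Line", keep="first")
print(df2)
```

Because nothing is re-indexed, df2 keeps the original row labels (0, 2, 3, 5, 6) and the same AMOUNT values as the loop version. If the result is written with `to_csv(..., index=False)`, those labels stay out of the file.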

pandas
