
I have a CSV containing the following:

ID    Name    Series    Value
250   A       3         20
250   A       3         40
250   A       3         60
251   B       4         16
251   B       4         18
251   B       4         24
251   B       4         42

The Series column denotes how many rows belong together. Taking the first row (not the header row), Series = 3, so I need to gather the number of rows specified by Series, inclusive of the current row, such that the values are grouped like so (by Value):

[(20, 40, 60), (16, 18, 24, 42)]

Essentially, I am moving down through the CSV sequentially, while Series tells me how many of the next rows (including the current one) to gather. If we use the first row again, the value is 3, so my grouping must total 3 rows starting at the current row.

I've read in the CSV and converted it from a reader to a list, but can't quite come up with a solution for actively changing the index of iteration over the rows based upon the value found in Series. If I try:

for row in rows...

I end up iterating over every row; I would have to modify rows as I go, and altering a list while iterating over it is a bad idea. If I try:

for x in range(1, len(rows)...

I cannot devise a method to change where the current x should be.
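For reference, the pattern that allows this is a while loop with a manually controlled index, rather than a for loop. A minimal sketch, assuming rows is a list of lists with Series at column index 2 and Value at column index 3:

```python
def group_by_series(rows):
    """Group rows into tuples of Values, consuming Series rows at a time."""
    groups = []
    i = 0
    while i < len(rows):
        n = int(rows[i][2])                              # how many rows belong together
        groups.append(tuple(int(r[3]) for r in rows[i:i + n]))
        i += n                                           # jump past the whole group
    return groups

rows = [
    ['250', 'A', '3', '20'],
    ['250', 'A', '3', '40'],
    ['250', 'A', '3', '60'],
    ['251', 'B', '4', '16'],
    ['251', 'B', '4', '18'],
    ['251', 'B', '4', '24'],
    ['251', 'B', '4', '42'],
]
print(group_by_series(rows))  # [(20, 40, 60), (16, 18, 24, 42)]
```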

  • Are the values in the Series column already grouped? I.e., it won't go 2,2,2, 3,3,3, 2,1,... Commented Jan 17, 2018 at 19:36
  • @juanpa.arrivillaga No. Series denotes from the current row, how many more rows go together. So if the first row is 3, I need a group of that row, plus the next 2 rows which would give me 3 total rows. Numbers can repeat again later in the file but are not part of the previous groupings. Commented Jan 17, 2018 at 19:50

3 Answers


If you can't use pandas, just use the typical grouping idiom with collections.defaultdict:

import csv
import collections

with open("path/to/file.csv") as f:
    reader = csv.DictReader(f)
    grouped = collections.defaultdict(list)
    for row in reader:
        grouped[row['Series']].append(int(row['Value']))

This will give you a handy dictionary from series to values:

In [26]: grouped
Out[26]: defaultdict(list, {'3': [20, 40, 60], '4': [16, 18, 24, 42]})

If you must have a list of tuples:

In [28]: list(map(tuple, grouped.values()))
Out[28]: [(20, 40, 60), (16, 18, 24, 42)]
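One caveat, per the clarification in the comments on the question: keying on the Series value merges any later group that happens to reuse the same value. A quick sketch with hypothetical repeated data:

```python
import collections

# Hypothetical data where the Series value 3 recurs later in the file
rows = (
    [{'Series': '3', 'Value': v} for v in ('20', '40', '60')]
    + [{'Series': '4', 'Value': v} for v in ('16', '18', '24', '42')]
    + [{'Series': '3', 'Value': v} for v in ('140', '160', '180')]
)

grouped = collections.defaultdict(list)
for row in rows:
    grouped[row['Series']].append(int(row['Value']))

print(grouped['3'])  # [20, 40, 60, 140, 160, 180] -- both runs of 3 are merged
```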

If you were going to use a pandas.DataFrame, I would use:

In [35]: [tuple(g.Value) for _,g in df.groupby('Series')]
Out[35]: [(20, 40, 60), (16, 18, 24, 42)]

Edit After Comments

So, after elaborating on your problem a bit more, there are a couple of approaches. Here's one ugly one, using itertools.islice to advance the iterator:

import csv
import io
from itertools import islice

# csvstring holds the CSV text; substitute open("path/to/file.csv") to read a file
with io.StringIO(csvstring) as f:
    reader = csv.DictReader(f)
    grouped = []
    for row in reader:
        n = int(row['Series']) - 1
        val = int(row['Value'])
        next_vals = (int(r['Value']) for r in islice(reader, n))
        grouped.append((val,) + tuple(next_vals))
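To see this work end to end, here is a self-contained run on the sample data: islice consumes the next rows off the same reader, so the outer loop naturally resumes at the first row of the next group.

```python
import csv
import io
from itertools import islice

csvstring = """ID,Name,Series,Value
250,A,3,20
250,A,3,40
250,A,3,60
251,B,4,16
251,B,4,18
251,B,4,24
251,B,4,42
"""

with io.StringIO(csvstring) as f:
    reader = csv.DictReader(f)
    grouped = []
    for row in reader:
        n = int(row['Series']) - 1               # rows still to take after this one
        vals = (int(row['Value']),)              # current row's value
        # islice advances the shared reader, skipping these rows in the outer loop
        grouped.append(vals + tuple(int(r['Value']) for r in islice(reader, n)))

print(grouped)  # [(20, 40, 60), (16, 18, 24, 42)]
```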

You may also use itertools.groupby, which groups consecutive runs, so a Series value recurring later in the file starts a new group (though two back-to-back groups with the same Series value would still merge):

import itertools
import operator
import csv

with open('path/to/file.csv') as f:
    reader = csv.DictReader(f)
    grouped = itertools.groupby(reader, operator.itemgetter('Series'))
    result = []
    for _, g in grouped:
        result.append(tuple(int(r['Value']) for r in g))

The results:

In [48]: result
Out[48]: [(20, 40, 60), (16, 18, 24, 42)]

Note, just for the purposes of illustration: you don't need itertools to do this; you could just use for loops in the following way:

import csv

with open('path/to/file.csv') as f:
    reader = csv.DictReader(f)
    grouped = []
    for row in reader:
        n = int(row['Series']) - 1
        sub = [int(row['Value'])]
        for _ in range(n):
            sub.append(int(next(reader)['Value']))  # advance the iterator using next
        grouped.append(tuple(sub))

How about using pandas?

import pandas as pd

df = pd.read_csv('test.csv')
unique = tuple(df['Series'].unique())
data = [tuple(df[df.Series == i].Value) for i in unique]
print(data)

output is

[(20, 40, 60), (16, 18, 24, 42)]
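Note that grouping on the raw Series value merges any group whose Series value recurs later in the file. If that matters, a sketch using the shift/cumsum run-grouping idiom handles it (the DataFrame below mirrors the sample data):

```python
import pandas as pd

df = pd.DataFrame({
    'Series': [3, 3, 3, 4, 4, 4, 4],
    'Value':  [20, 40, 60, 16, 18, 24, 42],
})

# a new run id starts whenever Series changes from the previous row
runs = (df['Series'] != df['Series'].shift()).cumsum()
data = [tuple(g) for _, g in df['Value'].groupby(runs)]
print(data)  # [(20, 40, 60), (16, 18, 24, 42)]
```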

  • No access to it on the system being used. Good solution, but need a native one.

Repeating series values break dict-based grouping, so this answer uses only lists. I've added a repeating series to the sample data:


import csv

t = """ID    Name    Series    Value
250   A       3         20
250   A       3         40
250   A       3         60
251   B       4         16
251   B       4         18
251   B       4         24
251   B       4         42
250   A       3        140
250   A       3        160"""


results = list()
tempList = list()
lastKey = None

reader = csv.DictReader(t.splitlines(), skipinitialspace=True, delimiter=' '  )
for row in reader:
    actKey = row["Series"]
    actVal = row["Value"]

    if not lastKey or lastKey != actKey: # new series starts here
        lastKey = actKey
        if tempList:                     # avoids result starting with []
            results.append(tempList)
        tempList = [actVal]              # this value goes into the new list
        continue

    tempList.append(actVal)              # same key as last one, simply add value 


if tempList:
    results.append(tempList)             # if not empty, add last ones to result 

print(results)

Output:

[['20', '40', '60'], ['16', '18', '24', '42'], ['140', '160']]
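If the exact target format from the question is needed (tuples of ints rather than lists of strings), the result can be converted afterwards:

```python
results = [['20', '40', '60'], ['16', '18', '24', '42'], ['140', '160']]
converted = [tuple(map(int, sub)) for sub in results]
print(converted)  # [(20, 40, 60), (16, 18, 24, 42), (140, 160)]
```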

  • Problem with this is that later in the file, a Series may be repeated, but the elements don't belong together.
  • @pstatix How about you collect all your "can't do this" conditions and add them to the question firsthand - the solutions to your problem are much easier to come by that way.
  • @juanpa.arrivillaga I would if his data contained inner "" that are difficult to parse. He said he already solved the "gathering data in lists" part, so I am just using a quick hack to get data into lists and start from there. I am not really familiar with csv; what other benefits than parsing does it give me?
  • @pstatix fixed, but juanpa's solutions might be the wiser choice.
  • @PatrickArtner heh, yeah. I mean, you could have adapted your manual parsing solution to work lazily; however, look how much more readable the code becomes once you encapsulate the csv-parsing logic. To me, that is the major advantage.
