
I have a CSV containing the following:

ID    Name    Series    Value
250   A       3         20
250   A       3         40
250   A       3         60
251   B       4         16
251   B       4         18
251   B       4         24
251   B       4         42

The Series column denotes how many rows belong together. Taking the first row (not the header row), Series = 3, so I need to gather the number of rows specified by Series, inclusive of the current row, such that the values are grouped like so (by Value):

[(20, 40, 60), (16, 18, 24, 42)]

Essentially, I am moving down through the CSV sequentially, while Series tells me how many of the next rows (including the current one) to gather. If we use the first row again, the value is 3, so my grouping must total 3 rows starting at the current row.

I've read in the CSV and converted it from a reader to a list, but can't quite come up with a solution for actively changing the index of iteration over the rows based upon the value found in Series. If I try:

for row in rows...

I end up iterating over every row; I would have to modify rows as I go, and altering a list while iterating over it is a bad idea. If I try:

for x in range(1, len(rows)...

I cannot devise a method to change where the current x should be.
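For reference, the pattern that allows this is a while loop with a manually controlled index, rather than a for loop. A minimal sketch, assuming rows is a list of lists with Series at column index 2 and Value at column index 3:

```python
def group_by_series(rows):
    """Group rows into tuples of Values, consuming Series rows at a time."""
    groups = []
    i = 0
    while i < len(rows):
        n = int(rows[i][2])                              # how many rows belong together
        groups.append(tuple(int(r[3]) for r in rows[i:i + n]))
        i += n                                           # jump past the whole group
    return groups

rows = [
    ['250', 'A', '3', '20'],
    ['250', 'A', '3', '40'],
    ['250', 'A', '3', '60'],
    ['251', 'B', '4', '16'],
    ['251', 'B', '4', '18'],
    ['251', 'B', '4', '24'],
    ['251', 'B', '4', '42'],
]
print(group_by_series(rows))  # [(20, 40, 60), (16, 18, 24, 42)]
```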

  • Are the values in the Series column already grouped? I.e., it won't go 2,2,2, 3,3,3, 2,1,... Commented Jan 17, 2018 at 19:36
  • @juanpa.arrivillaga No. Series denotes from the current row, how many more rows go together. So if the first row is 3, I need a group of that row, plus the next 2 rows which would give me 3 total rows. Numbers can repeat again later in the file but are not part of the previous groupings. Commented Jan 17, 2018 at 19:50

3 Answers


If you can't use pandas, just use the typical grouping idiom with collections.defaultdict:

import csv
import collections

with open("path/to/file.csv") as f:
    reader = csv.DictReader(f)
    grouped = collections.defaultdict(list)
    for row in reader:
        grouped[row['Series']].append(int(row['Value']))

This will give you a handy dictionary from series to values:

In [26]: grouped
Out[26]: defaultdict(list, {'3': [20, 40, 60], '4': [16, 18, 24, 42]})

If you must have a list of tuples:

In [28]: list(map(tuple, grouped.values()))
Out[28]: [(20, 40, 60), (16, 18, 24, 42)]
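One caveat, per the clarification in the comments on the question: keying on the Series value merges any later group that happens to reuse the same value. A quick sketch with hypothetical repeated data:

```python
import collections

# Hypothetical data where the Series value 3 recurs later in the file
rows = (
    [{'Series': '3', 'Value': v} for v in ('20', '40', '60')]
    + [{'Series': '4', 'Value': v} for v in ('16', '18', '24', '42')]
    + [{'Series': '3', 'Value': v} for v in ('140', '160', '180')]
)

grouped = collections.defaultdict(list)
for row in rows:
    grouped[row['Series']].append(int(row['Value']))

print(grouped['3'])  # [20, 40, 60, 140, 160, 180] -- both runs of 3 are merged
```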

If you were going to use a pandas.DataFrame, I would use:

In [35]: [tuple(g.Value) for _,g in df.groupby('Series')]
Out[35]: [(20, 40, 60), (16, 18, 24, 42)]

Edit After Comments

So, after elaborating on your problem a bit more, there are a couple of approaches. Here's one ugly one, using itertools.islice to advance the iterator:

import csv
import io
from itertools import islice

# csvstring holds the CSV text; substitute open("path/to/file.csv") to read a file
with io.StringIO(csvstring) as f:
    reader = csv.DictReader(f)
    grouped = []
    for row in reader:
        n = int(row['Series']) - 1
        val = int(row['Value'])
        next_vals = (int(r['Value']) for r in islice(reader, n))
        grouped.append((val,) + tuple(next_vals))
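To see this work end to end, here is a self-contained run on the sample data: islice consumes the next rows off the same reader, so the outer loop naturally resumes at the first row of the next group.

```python
import csv
import io
from itertools import islice

csvstring = """ID,Name,Series,Value
250,A,3,20
250,A,3,40
250,A,3,60
251,B,4,16
251,B,4,18
251,B,4,24
251,B,4,42
"""

with io.StringIO(csvstring) as f:
    reader = csv.DictReader(f)
    grouped = []
    for row in reader:
        n = int(row['Series']) - 1               # rows still to take after this one
        vals = (int(row['Value']),)              # current row's value
        # islice advances the shared reader, skipping these rows in the outer loop
        grouped.append(vals + tuple(int(r['Value']) for r in islice(reader, n)))

print(grouped)  # [(20, 40, 60), (16, 18, 24, 42)]
```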

You may also use itertools.groupby, which groups consecutive runs, so a Series value recurring later in the file starts a new group (though two back-to-back groups with the same Series value would still merge):

import itertools
import operator
import csv

with open('path/to/file.csv') as f:
    reader = csv.DictReader(f)
    grouped = itertools.groupby(reader, operator.itemgetter('Series'))
    result = []
    for _, g in grouped:
        result.append(tuple(int(r['Value']) for r in g))

The results:

In [48]: result
Out[48]: [(20, 40, 60), (16, 18, 24, 42)]

Note, just for the purposes of illustration: you don't need itertools to do this; you could just use for loops in the following way:

import csv

with open('path/to/file.csv') as f:
    reader = csv.DictReader(f)
    grouped = []
    for row in reader:
        n = int(row['Series']) - 1
        sub = [int(row['Value'])]
        for _ in range(n):
            sub.append(int(next(reader)['Value']))  # advance the iterator using next
        grouped.append(tuple(sub))

How about using pandas?

import pandas as pd

df = pd.read_csv('test.csv')
unique = tuple(df['Series'].unique())
data = [tuple(df[df.Series == i].Value) for i in unique]
print(data)

output is

[(20, 40, 60), (16, 18, 24, 42)]
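Note that grouping on the raw Series value merges any group whose Series value recurs later in the file. If that matters, a sketch using the shift/cumsum run-grouping idiom handles it (the DataFrame below mirrors the sample data):

```python
import pandas as pd

df = pd.DataFrame({
    'Series': [3, 3, 3, 4, 4, 4, 4],
    'Value':  [20, 40, 60, 16, 18, 24, 42],
})

# a new run id starts whenever Series changes from the previous row
runs = (df['Series'] != df['Series'].shift()).cumsum()
data = [tuple(g) for _, g in df['Value'].groupby(runs)]
print(data)  # [(20, 40, 60), (16, 18, 24, 42)]
```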

  • No access to it on the system being used. Good solution, but need a native one.

Repeating series values break dict-based grouping, so this answer uses only lists. I've added a repeating series to the sample data:


import csv

t = """ID    Name    Series    Value
250   A       3         20
250   A       3         40
250   A       3         60
251   B       4         16
251   B       4         18
251   B       4         24
251   B       4         42
250   A       3        140
250   A       3        160"""


results = list()
tempList = list()
lastKey = None

reader = csv.DictReader(t.splitlines(), skipinitialspace=True, delimiter=' '  )
for row in reader:
    actKey = row["Series"]
    actVal = row["Value"]

    if not lastKey or lastKey != actKey: # new series starts here
        lastKey = actKey
        if tempList:                     # avoids result starting with []
            results.append(tempList)
        tempList = [actVal]              # this value goes into the new list
        continue

    tempList.append(actVal)              # same key as last one, simply add value 


if tempList:
    results.append(tempList)             # if not empty, add last ones to result 

print(results)

Output:

[['20', '40', '60'], ['16', '18', '24', '42'], ['140', '160']]
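If the exact target format from the question is needed (tuples of ints rather than lists of strings), the result can be converted afterwards:

```python
results = [['20', '40', '60'], ['16', '18', '24', '42'], ['140', '160']]
converted = [tuple(map(int, sub)) for sub in results]
print(converted)  # [(20, 40, 60), (16, 18, 24, 42), (140, 160)]
```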

  • Problem with this is that later in the file, a Series may be repeated, but the elements don't belong together.
  • @pstatix How about you collect all your "can't do this" conditions and add them to the question firsthand - the solutions to your problem are much easier to come by that way.
  • @juanpa.arrivillaga I would if his data contained inner "" that are difficult to parse. He said he already solved the "gathering data in lists" part, so I am just using a quick hack to get data into lists and start from there. I am not really familiar with csv; what other benefits than parsing does it give me?
  • @pstatix fixed, but juanpa's solutions might be the wiser choice.
  • @PatrickArtner heh, yeah. I mean, you could have adapted your manual parsing solution to work lazily; however, look how much more readable the code becomes once you encapsulate the csv-parsing logic. To me, that is the major advantage.
