Creating a dictionary of arrays, using values in one of the columns as the key

Question

I've been trying to do this for a while, with little success so far. I have a large (>400,000 data points) 2D array in Python. The array could be split into a series of smaller groups of rows based on the date (dd/mm/yyyy).

To achieve my end goal, one of the things I want to do is turn a numpy.ndarray (similar to the one shown below, but obviously much larger) into a dictionary of keys (one for each day of the month) and corresponding arrays (consisting of all of the original array's data for each particular day).

[['16/06/2015 00:00' 'card' 'Smith' 'John' 'Full Time']
 ['16/06/2015 00:00' 'card' 'Doe' 'Jane' 'Part Time']
 ['17/07/2015 00:00' 'card' 'Doe' 'Jane' 'Part Time']
 ['18/06/2015 00:00' 'card' 'Smith' 'John' 'Full Time']
 ['30/06/2015 00:00' 'card' 'Bob' 'Roberts' 'Full Time']
 ['30/06/2015 00:00' 'card' 'Smith' 'John' 'Full Time']
 ['30/06/2015 00:00' 'card' 'Bob' 'Roberts' 'Full Time']]

I am not sure how to get the array above to appear in the same code format as the one I am importing, but as I mentioned, it should appear as a numpy.ndarray.

The code below raises the error "IndexError: arrays used as indices must be of integer (or boolean) type", which is a problem because my data is made up of strings.

import numpy as np

Array1 = np.genfromtxt('PATH', delimiter="\t", dtype=str)
y = {}
for row in Array1:
    v = Array1[row[1:]]
    k = row[0]
    y[k]=v

If you need any more information, please just ask and I will try to provide anything required. I am fairly novice to all this.

'16/06/2015 00:00' 'card' 'Smith' 'John' 'Full Time' evaluates to the single string '16/06/2015 00:00cardSmithJohnFull Time'. Is that taken into account?
– TigerhawkT3, Commented Nov 24, 2015 at 10:16
Shouldn't that be taken into account by the 'delimiter="\t"' when I generate the array from text?
– Jamie, Commented Nov 24, 2015 at 10:22
@TigerhawkT3, that is a numpy array so it is not a single string
– Padraic Cunningham, Commented Nov 24, 2015 at 10:28
Okay; wasn't sure if it was a numpy structure, a pure Python structure, or pseudocode.
– TigerhawkT3, Commented Nov 24, 2015 at 10:29

Daniel Roseman · Accepted Answer · 2015-11-24 10:19:35Z

1

The error message would be pointing to the first line of the loop: as it says, that's not how you index an array. row is already the list of values in the row; you already know how to get a single item, via just row[0], and to get a list it's exactly the same: row[1:]. So your code would just be:

v = row[1:]

Note that you could simplify this to just

y[row[0]] = row[1:]

and in fact the whole loop could be done as a dict comprehension:

y = {row[0]:row[1:] for row in Array1}
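One caveat with the comprehension above: dictionary keys are unique, so when several rows share the same date, each assignment replaces the previous one and only the last row for that date survives. The following is a minimal sketch of that effect and of one possible way to group rows instead. The array `sample` and the names `overwritten` and `grouped` are hypothetical stand-ins, not the asker's data.

```python
import numpy as np

# Small stand-in for the asker's array (hypothetical sample data).
sample = np.array([
    ['16/06/2015 00:00', 'card', 'Smith', 'John', 'Full Time'],
    ['16/06/2015 00:00', 'card', 'Doe', 'Jane', 'Part Time'],
    ['18/06/2015 00:00', 'card', 'Smith', 'John', 'Full Time'],
])

# Plain comprehension: the second '16/06/2015 00:00' row overwrites the first.
overwritten = {row[0]: row[1:] for row in sample}

# Grouping: a boolean mask selects every row whose first column equals the key,
# and the slice 1: drops the date column from the stored values.
grouped = {
    key: sample[sample[:, 0] == key, 1:]
    for key in np.unique(sample[:, 0])
}

print(overwritten['16/06/2015 00:00'])    # only the last matching row
print(grouped['16/06/2015 00:00'].shape)  # (2, 4)
```

The mask is recomputed once per distinct date, which may be slow for an array with many distinct dates; a single pass with a defaultdict avoids that.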

answered Nov 24, 2015 at 10:19

Daniel Roseman

1 Comment

Jamie Over a year ago

Hi, @Daniel Roseman, thanks for the response! I have done what you suggest and it clears up the error, so thanks very much for your help! However, when I look up a specific key (for example, "print y['16/06/2015 00:00']"), it only seems to return one of the datapoints instead of the whole list of datapoints associated with that column value. Do you have any suggestions as to why that might be happening?

Padraic Cunningham · Accepted Answer · 2015-11-24 10:45:07Z

1

Just create the dict directly from the file, using the csv module to parse it. You need to handle repeated keys like "16/06/2015 00:00", which can be done with a defaultdict; otherwise you will only have the last value associated with each key:

import csv
from collections import defaultdict
with open("infile") as f:
    d = defaultdict(list)
    for row in csv.reader(f, delimiter="\t"):
        # Key the dict by the date; calling extend on row[0] itself (a str) raises AttributeError
        d[row[0]].extend(row[1:])

Creating an array just to then create a dict is pointless. If you want a dict, just create the dict as above.
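If per-day arrays are still wanted at the end, as the question describes, the grouped lists can be converted afterwards. The sketch below assumes a tab-separated input and uses an in-memory `io.StringIO` object as a stand-in for the real file; `sample_file`, `rows_by_date`, and `arrays_by_date` are hypothetical names. It uses append so that each record stays a separate row.

```python
import csv
import io
from collections import defaultdict

import numpy as np

# Stand-in for the tab-separated input file (hypothetical sample data).
sample_file = io.StringIO(
    "16/06/2015 00:00\tcard\tSmith\tJohn\tFull Time\n"
    "16/06/2015 00:00\tcard\tDoe\tJane\tPart Time\n"
    "18/06/2015 00:00\tcard\tSmith\tJohn\tFull Time\n"
)

rows_by_date = defaultdict(list)
for row in csv.reader(sample_file, delimiter="\t"):
    # append keeps each record as its own list under the date key
    rows_by_date[row[0]].append(row[1:])

# Optional step: turn each group into a 2D array, one row per record.
arrays_by_date = {date: np.array(rows) for date, rows in rows_by_date.items()}

print(arrays_by_date["16/06/2015 00:00"].shape)  # (2, 4)
```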

edited Nov 24, 2015 at 10:45

answered Nov 24, 2015 at 10:39

Padraic Cunningham

3 Comments

Jamie Over a year ago

Hi @Padraic Cunningham, thanks for the response! I am getting the error "AttributeError: 'str' object has no attribute 'extend'". Do you have any idea why that might be popping up? Thanks.

Padraic Cunningham Over a year ago

@Jamie, no worries. If you want to keep each row separate from the others, you can append instead of extending; extending will give you a flat list of values, which may or may not be what you want.
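The difference between the two calls can be sketched with two made-up records under the same key (the names `flat` and `nested` are hypothetical):

```python
from collections import defaultdict

record_a = ["card", "Smith", "John", "Full Time"]
record_b = ["card", "Doe", "Jane", "Part Time"]

flat = defaultdict(list)
nested = defaultdict(list)
for record in (record_a, record_b):
    flat["16/06/2015 00:00"].extend(record)    # adds the four fields one by one
    nested["16/06/2015 00:00"].append(record)  # adds the record as a single item

print(len(flat["16/06/2015 00:00"]))    # 8 individual fields
print(len(nested["16/06/2015 00:00"]))  # 2 records of 4 fields each
```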

Jamie Over a year ago

thanks for the insight! I've managed to get my program running correctly! :D
