
I've been trying to do this for a while, with little success so far. I have a large (>400,000 data points) 2D array in Python. The array could be split into a series of smaller groups of rows based on the date (dd/mm/yyyy).

As a step toward my end goal, I want to convert a numpy.ndarray (similar to the one below, but much larger) into a dictionary with one key for each day of the month, where each value is an array containing all of the original array's rows for that day.

[['16/06/2015 00:00'    'card' 'Smith' 'John' 'Full Time']
['16/06/2015 00:00' 'card'  'Doe'   'Jane'  'Part Time']
['17/07/2015 00:00' 'card'  'Doe'   'Jane'  'Part Time']
['18/06/2015 00:00' 'card' 'Smith' 'John' 'Full Time']
['30/06/2015 00:00' 'card'  'Bob'   'Roberts'   'Full Time']
['30/06/2015 00:00' 'card'  'Smith' 'John'  'Full Time']
['30/06/2015 00:00' 'card'  'Bob'   'Roberts'   'Full Time']]

I am not sure how to get the array above to appear in the same code format as the one I am importing, but as I mentioned, it should appear as a numpy.ndarray.

My code, shown below, raises the error "IndexError: arrays used as indices must be of integer (or boolean) type", which is a problem because my data is made up of strings.

Array1 = np.genfromtxt('PATH', delimiter="\t", dtype=(str))
y = {}
for row in Array1:
    v = Array1[row[1:]]
    k = row[0]
    y[k]=v

If you need any more information, please just ask and I will try to provide it. I am fairly new to all this.

  • '16/06/2015 00:00' 'card' 'Smith' 'John' 'Full Time' evaluates to the single string '16/06/2015 00:00cardSmithJohnFull Time'. Is that taken into account? Commented Nov 24, 2015 at 10:16
  • Shouldn't that be taken into account by the 'delimiter="\t"' when I generate the array from text? Commented Nov 24, 2015 at 10:22
  • @TigerhawkT3, that is a numpy array so it is not a single string Commented Nov 24, 2015 at 10:28
  • Okay; wasn't sure if it was a numpy structure, a pure Python structure, or pseudocode. Commented Nov 24, 2015 at 10:29
  • @TigerhawkT3, Array1 = np.genfromtxt(........ Commented Nov 24, 2015 at 10:29

2 Answers

The error message points to the first line of the loop body: as it says, that's not how you index an array. row already holds the values in that row. You already get a single item with row[0], and getting the rest of the row works exactly the same way: row[1:]. So your code would just be:

v = row[1:]
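
As a minimal sketch of the difference, the snippet below uses a small hypothetical stand-in for Array1 (the name sample and its two rows are assumptions for illustration, not the real data). It shows that indexing the array with a slice of strings fails, while slicing the row itself simply returns the remaining columns:

```python
import numpy as np

# Hypothetical two-row stand-in for the array loaded by genfromtxt.
sample = np.array([
    ["16/06/2015 00:00", "card", "Smith", "John", "Full Time"],
    ["17/07/2015 00:00", "card", "Doe", "Jane", "Part Time"],
])

row = sample[0]

# Using an array of strings as an index into the 2D array is what fails.
try:
    sample[row[1:]]
    raised = False
except IndexError:
    raised = True

# Slicing the row itself gives the remaining columns directly.
rest = row[1:]
```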

Note that you could simplify this to just

y[row[0]] = row[1:]

and in fact the whole loop could be done as a dict comprehension:

y = {row[0]:row[1:] for row in Array1}
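
One caveat, sketched here on hypothetical sample rows (the names sample, last_only and grouped are illustrative assumptions): a plain dict holds one value per key, so when several rows share the same date, each later row replaces the earlier one. Collecting the rows in a list per key is one way to keep them all:

```python
import numpy as np
from collections import defaultdict

# Hypothetical sample with a repeated date in the first column.
sample = np.array([
    ["16/06/2015 00:00", "card", "Smith", "John", "Full Time"],
    ["16/06/2015 00:00", "card", "Doe", "Jane", "Part Time"],
    ["18/06/2015 00:00", "card", "Smith", "John", "Full Time"],
])

# Plain dict: the second 16/06 row replaces the first.
last_only = {row[0]: row[1:] for row in sample}

# Grouping: every row for a given key is kept in a list.
grouped = defaultdict(list)
for row in sample:
    grouped[row[0]].append(row[1:])
```

With this sample, last_only has two keys and grouped["16/06/2015 00:00"] holds both rows for that date.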

1 Comment

Hi, @Daniel Roseman, thanks for the response! I have done what you suggest and it clears up the error, so thanks very much for your help! However, when I look up a specific key (for example, "print y['16/06/2015 00:00']"), it only seems to return one of the datapoints, instead of the whole list of datapoints associated with that column value. Do you have any suggestions as to why that might be happening?

Just create the dict directly from the file, using the csv module to parse it. You need to handle repeated keys like "16/06/2015 00:00", which can be done with a defaultdict; otherwise only the last value will be associated with each key:

import csv
from collections import defaultdict
with open("infile") as f:
    d = defaultdict(list)
    for row in csv.reader(f, delimiter="\t"):
        # Key on the date column; a missing key starts out as an empty list.
        d[row[0]].extend(row[1:])

Creating an array just to then create a dict is pointless; if you want a dict, just create the dict as above.
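
A self-contained sketch of the same idea follows, under a few assumptions: the tab-separated text is a hypothetical sample held in memory via io.StringIO rather than a real file, the key is reduced to the date part only (the question asks for one key per day), and each row is stored as its own list with append instead of being flattened with extend. The names text and by_day are illustrative.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-separated sample standing in for the real file.
text = (
    "16/06/2015 00:00\tcard\tSmith\tJohn\tFull Time\n"
    "16/06/2015 00:00\tcard\tDoe\tJane\tPart Time\n"
    "18/06/2015 00:00\tcard\tSmith\tJohn\tFull Time\n"
)

by_day = defaultdict(list)
for row in csv.reader(io.StringIO(text), delimiter="\t"):
    day = row[0].split()[0]      # "16/06/2015 00:00" -> "16/06/2015"
    by_day[day].append(row[1:])  # append keeps each row as its own list
```

Here by_day["16/06/2015"] is a list of two rows. Swapping append for extend would instead give one flat list of eight strings for that day.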

3 Comments

Hi @Padraic Cunningham, thanks for the response! I am getting the error "AttributeError: 'str' object has no attribute 'extend'". Do you have any idea why that might be popping up? Thanks.
@Jamie, no worries. If you want to keep each row separate from the others, you can append instead of extending; extending will give you a flat list of values, which may or may not be what you want.
thanks for the insight! I've managed to get my program running correctly! :D
