I have a NumPy array containing millions of hourly (x, y) points. The "columns" of the array are x, y, hour, and day of week (all ints). Here is an example of what the array looks like:

array([[1, 2, 0, 0],
       [3, 5, 0, 0],
       [6, 3, 1, 0],
       [6, 2, 3, 0],
       [4, 3, 3, 1]])

I have created a grid of zeros that I can increment for all values in the array:

import numpy as np

grid = np.zeros((8, 8))
# the grid is indexed [y, x]; columns 0 and 1 of xy_new hold x and y
for x, y in xy_new[:, :2]:
    grid[y, x] += 1

but I need to be able to do this for each hour by day of week (i.e., Sunday at hour 0, Sunday at hour 1, etc.).

How do I subset the array by hour and day of week?

I have tried modifying the answers to "Make subset of array, based on values of two other arrays in Python" and "Subsetting data in Python", but have not been successful. Any help would be greatly appreciated!

  • and what's the question? Commented Nov 20, 2015 at 15:36
  • How do I subset the array by day and hour to count the number of times each point is accessed? Commented Nov 20, 2015 at 16:29

1 Answer

Presumably you want to wind up with 24 times 7, or 168, sets of accumulated counts for pairs of x and y. Suppose you have your data in an N by 4 array gdat. First, make a week-hour index:

# columns are x, y, hour, day of week, so the day is column 3 and the hour is column 2
whr = 24*gdat[:,3] + gdat[:,2]

You can now select the gdat rows for each hour in your week. For example, for hour zero of Sunday:

gdat0 = gdat[whr == 0]

Do whatever summing you need with gdat0 and move on to the next hour.
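A minimal sketch of that select-and-sum step, using the sample rows from the question. It assumes the column order x, y, hour, day of week and the 8 by 8 grid from the question. The helper name count_grid is made up for illustration, and np.add.at is swapped in for the explicit Python loop:

```python
import numpy as np

# Sample rows from the question; columns assumed to be x, y, hour, day of week.
gdat = np.array([[1, 2, 0, 0],
                 [3, 5, 0, 0],
                 [6, 3, 1, 0],
                 [6, 2, 3, 0],
                 [4, 3, 3, 1]])

# Week-hour index: day times 24 plus hour, so 0 is day 0 at hour 0
# and 167 is day 6 at hour 23.
whr = 24 * gdat[:, 3] + gdat[:, 2]

def count_grid(rows, shape=(8, 8)):
    """Count how often each (x, y) occurs in rows; the grid is indexed [y, x]."""
    grid = np.zeros(shape, dtype=int)
    # np.add.at accumulates correctly even when the same (y, x) pair repeats,
    # which a plain grid[ys, xs] += 1 would not.
    np.add.at(grid, (rows[:, 1], rows[:, 0]), 1)
    return grid

gdat0 = gdat[whr == 0]       # rows for day 0, hour 0
grid0 = count_grid(gdat0)    # counts for that single week-hour
```

Looping the mask over range(168) gives one grid per week-hour. Each mask scans all N rows, so this does 168 passes over the data.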

Note that np.unique is probably a faster way to count occurrences of x, y pairs. You can play the same game of making a composite index for x and y, but you have to know how they are bounded. Supposing x runs from 0 to 120 and y runs from 0 to 5, y fits in 3 bits, so you could make a composite index using bit fields:

# combine the fields with bitwise OR; AND would zero out most of the bits
xy = (gdat0[:,0] << 3) | (gdat0[:,1])

Obviously, if y has a larger range you need to shift more than 3 bits, and you may need to offset x and y to avoid negative values.

Then, use unique to return the unique values and counts for the values in xy.

xyval, xycnt = np.unique(xy, return_counts=True)

You then retrieve the x and y value pairs from xyval using bitwise operators, xyval >> 3 and xyval & 7.
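As a rough end-to-end illustration of the pack, count, and unpack round trip, here is a sketch on a handful of made-up points. The array pts and the constant Y_BITS are hypothetical names, and y is assumed to be non-negative and to fit in 3 bits:

```python
import numpy as np

# Hypothetical (x, y) rows for one hour; y is assumed to lie in 0..7.
pts = np.array([[1, 2],
                [3, 5],
                [1, 2],
                [6, 3],
                [1, 2]])

Y_BITS = 3  # widen this if y can exceed 7

# Pack x into the high bits and y into the low bits of a single integer key.
# Bitwise OR combines the two fields without disturbing either one.
xy = (pts[:, 0] << Y_BITS) | pts[:, 1]

# One sorted key per distinct (x, y) pair, plus how often each occurred.
xyval, xycnt = np.unique(xy, return_counts=True)

# Unpack the keys back into coordinates.
xs = xyval >> Y_BITS
ys = xyval & ((1 << Y_BITS) - 1)
```

Here the pair (1, 2) appears three times, so its count is 3 and the other two pairs have a count of 1.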

Repeat for every hour in the week. Since storage will be an issue if N is huge, you probably want to re-use gdat0 on each iteration.

EDIT: The short data sample you posted is time-sequential. If all your data are time-sequential, you don't need to "select" for each hour. All you need is the index at which each new value of whr first appears, and np.unique(whr, return_index=True) will find those for you as well!
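A sketch of that idea on the sample rows from the question, assuming the rows really are sorted by day and then hour, so the week-hour index never decreases. The names starts, stops, and blocks are illustrative only:

```python
import numpy as np

# Sample rows from the question; columns assumed to be x, y, hour, day of week.
gdat = np.array([[1, 2, 0, 0],
                 [3, 5, 0, 0],
                 [6, 3, 1, 0],
                 [6, 2, 3, 0],
                 [4, 3, 3, 1]])

# Day times 24 plus hour. This approach only works if whr is non-decreasing.
whr = 24 * gdat[:, 3] + gdat[:, 2]
assert np.all(np.diff(whr) >= 0), "rows must be time-sequential"

# For each distinct week-hour, the row index where it first appears.
hours, starts = np.unique(whr, return_index=True)

# Each block ends where the next one starts; the last ends at the final row.
stops = np.append(starts[1:], len(whr))

# Map each week-hour present in the data to its contiguous slice of rows.
blocks = {int(h): gdat[a:b] for h, a, b in zip(hours, starts, stops)}
```

Because the slices are views into gdat, no per-hour copies are made, which helps when N is in the millions.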

1 Comment

Thank you! The first method you provided worked quite well once I changed gdat0 = gdat[:, whr == 0] to gdat0 = gdat[whr == 0]. The way you had it created an error: index out of range in dimension 1
