I have a 2 * N integer array ids representing intervals, where N is about a million. It looks like this
0 2 1 ...
3 4 3 ...
The ints in the arrays can be 0, 1, ... , M-1, where M <= 2N - 1. (Detail: if M = 2N, then the ints span all the 2N integers; if M < 2N, then there are some integers that have the same values.)
I need to calculate a kind of inverse map from ids. What I called "inverse map" is to see ids as intervals and capture the relation from their inner points with their indices.
Intuition Intuitively,
0 2 1
3 4 3
can be seen as
0 -> 0, 1, 2
1 -> 2, 3
2 -> 1, 2
where the right-hand-side endpoints are excluded for my problem. The "inverse" map would be
0 -> 0
1 -> 0, 2
2 -> 0, 1, 2
3 -> 1
Code I have a piece of Python code that attempts to calculate the inverse map in a dictionary inv below:
for i in range(ids.shape[1]):
for j in range(ids[0][i], ids[1][i]):
inv[j].append(i)
where each inv[j] is an array-like data initialized as empty before the nested loop. Currently I use python's built-in arrays to initialize it.
for i in range(M): inv[i]=array.array('I')
Question The nested loop above works like a mess. In my problem setting (in image processing), my first loop has a million iterations; second one about 3000 iterations. Not only it takes much memory (because inv is huge), it is also slow. I would like to focus on speed in this question. How can I accelerate this nested loop above, e.g. with vectorization?