I'd like to find some container object to hold scalar values (it doesn't really matter whether they are integral, fractional, or floating point types). The following snippet roughly outlines the use cases and the criteria it needs to fulfill. In this snippet I'm using a numpy array, but I'd like to avoid that, as it needs a lot of memory because it will only be sparsely filled. Other than that, it fulfills my requirements.
- The object should hold scalar values that are indexed by a d-dimensional index with values of 0, 1, 2, ..., n-1. (Imagine something like 5 <= d <= 100, 20 <= n <= 200.)
- The object should be mutable, i.e. the values need to be updatable. (Edit: Not actually necessary, see discussion below.)
- All possible indices should be initialized with zero at the start. That is, all indices that have not been accessed should implicitly be assumed to hold zero.
- It should be efficient to sum across one or multiple dimensions.
Is there a built-in Python object that satisfies this, or is there a data structure that can efficiently be implemented in Python?
So far I have found scipy's COO arrays, which satisfy some of the criteria, but they only support 1- and 2-d indexing.
For context: the idea is to build a frequency table of certain objects, and then use this as a distribution to sample from, marginalize, etc.
```python
import numpy as np

# just some placeholder data
data = [((1,0,5),6), ((2,6,5),100), ((5,3,1),1), ((2,0,5),4), ((2,6,5),100)]

# structure mapping [0, n-1]^d coordinates to scalars
data_structure = np.zeros((10, 10, 10))

# needs to be mutable  # EDIT: doesn't need to be, see discussion below
for coords, value in data:
    data_structure[coords] += value  # EDIT: can be done as a preprocessing step

# needs to be able to efficiently sum across dimensions
for k in range(100):
    x = data_structure.sum(axis=(1, 2))
    y = data_structure.sum(axis=(0,))
    z = data_structure.sum(axis=(0, 2))
```
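One way to meet these requirements in plain Python, sketched here under the assumption that a dict keyed by coordinate tuples is acceptable: missing keys are implicitly zero, so memory scales with the number of nonzero entries, and a marginal sum only has to iterate over those entries. The `marginal` helper below is a hypothetical name, not part of any library.

```python
from collections import Counter

# same placeholder data as the numpy snippet
data = [((1, 0, 5), 6), ((2, 6, 5), 100), ((5, 3, 1), 1),
        ((2, 0, 5), 4), ((2, 6, 5), 100)]

# sparse d-dimensional "array": coordinate tuple -> scalar,
# with missing coordinates implicitly zero
table = Counter()
for coords, value in data:
    table[coords] += value  # duplicates accumulate, as in the numpy version

def marginal(table, keep_axes):
    """Sum out all dimensions except those listed in keep_axes."""
    out = Counter()
    for coords, value in table.items():
        out[tuple(coords[a] for a in keep_axes)] += value
    return out

# analogous to data_structure.sum(axis=(1, 2)): keep only axis 0
x = marginal(table, (0,))
```

The cost of each marginal sum is proportional to the number of nonzero entries, independent of n^d, which is the main advantage over the dense numpy array here.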
scipy.sparse.coo_matrix is convenient for data definition, but isn't used for indexing or computation. However, scipy efficiently converts it to csr format. lil and dok are good for iterative assignment; dok uses a Python dict under the covers. All of these live in the 2-d sparse array world, where sparsity (% of nonzero values) on the order of 5% or less is best. The reason scipy.sparse.coo does not implement indexing is that coordinates can be specified in any order, so finding any one index (which may be 0) is inefficient. When converted to CSR, points are sorted by row, and within rows by column; duplicates are summed. A lot of the CSR indexing is actually implemented via matrix multiplication, as is row or column summation. Efficient CSR calculation methods were developed years ago by mathematicians working on large linear equation systems. There is a [sparse-matrix] tag if you want to explore these topics more.

So the scipy.sparse types are out of the question anyway, as they all only implement matrices (2-D arrays)?
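The COO-to-CSR behaviour described above can be sketched as follows, assuming scipy is available; the coordinate values here are made up for illustration. Duplicate coordinates are allowed in COO input and are summed on conversion, which matches the accumulation step in the question.

```python
import numpy as np
from scipy import sparse

# COO input: note the duplicate coordinate (2, 3), listed twice
rows = np.array([1, 2, 2, 0])
cols = np.array([0, 3, 3, 1])
vals = np.array([5, 7, 7, 2])

coo = sparse.coo_matrix((vals, (rows, cols)), shape=(4, 4))
csr = coo.tocsr()  # entries sorted by row, then column; duplicates summed

value = csr[2, 3]           # the two (2, 3) entries were summed to 14
row_sums = csr.sum(axis=1)  # efficient row summation on the CSR format
```

This only covers the 2-d case, of course, which is exactly the limitation raised in the question.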