Compress a sparse matrix

Question

Compress a sparse matrix using the compressed sparse row format (CSR, CRS or Yale format).

These are all the same form of compression (ignore new Yale).

Input may be any 2D data structure (list of lists, etc.), e.g.

[[0 0 0 0],
 [5 8 0 0],
 [0 0 3 0],
 [0 6 0 0]]

The output should be three 1D data structures (lists, etc.) holding A, IA and JA, for example:

[5, 8, 3, 6]
[0, 0, 2, 3, 4]
[0, 1, 2, 1]

The process is described by Wikipedia:

The array A is of length NNZ and holds all the nonzero entries of M in left-to-right top-to-bottom ("row-major") order.

The array IA is of length m + 1. It is defined by this recursive definition:

IA[0] = 0

IA[i] = IA[i − 1] + (number of nonzero elements on the (i − 1)-th row in the original matrix)

Thus, the first m elements of IA store the index into A of the first nonzero element in each row of M, and the last element IA[m] stores NNZ, the number of elements in A, which can also be thought of as the index in A of the first element of a phantom row just beyond the end of the matrix M. The values of the i-th row of the original matrix are read from the elements A[IA[i]] to A[IA[i + 1] − 1] (inclusive on both ends), i.e. from the start of one row to the last index just before the start of the next.

The third array, JA, contains the column index in M of each element of A and hence is of length NNZ as well.
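
The three definitions above can be sketched, ungolfed, in Python. This is only an illustration, assuming the input is a list of lists of numbers; the names compress and expand are made up for this example. expand takes the column count as an extra argument because the three arrays alone do not record it.

```python
def compress(matrix):
    """Return (A, IA, JA) for a list-of-lists matrix (hypothetical helper)."""
    A, IA, JA = [], [0], []
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                A.append(value)   # nonzero values in row-major order
                JA.append(col)    # column index of each stored value
        IA.append(len(A))         # running count of nonzeros after this row
    return A, IA, JA


def expand(A, IA, JA, n_cols):
    """Rebuild the dense matrix; n_cols must be supplied by the caller."""
    matrix = []
    for i in range(len(IA) - 1):
        row = [0] * n_cols
        # Row i occupies A[IA[i]] .. A[IA[i + 1] - 1].
        for k in range(IA[i], IA[i + 1]):
            row[JA[k]] = A[k]
        matrix.append(row)
    return matrix


m = [[0, 0, 0, 0],
     [5, 8, 0, 0],
     [0, 0, 3, 0],
     [0, 6, 0, 0]]
print(compress(m))  # ([5, 8, 3, 6], [0, 0, 2, 3, 4], [0, 1, 2, 1])
```

Calling expand(*compress(m), 4) should give back the original matrix m.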

If your language doesn't support actual data structures, input and output may be text.
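
For text input, one possible approach is to pull the numbers out of each line with a regular expression. The sketch below assumes the bracketed, space-separated layout used in the examples, with one matrix row per line; parse_matrix is a made-up name and not part of the challenge.

```python
import re


def parse_matrix(text):
    """Parse the bracketed text layout into a list of lists (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        # Matches optionally signed integers and plain decimals; exponent
        # notation is not handled.
        tokens = re.findall(r"-?\d+(?:\.\d+)?", line)
        if tokens:
            rows.append([float(t) if "." in t else int(t) for t in tokens])
    return rows


text = """[[0 0 0 0],
 [5 -9 0 0],
 [0 0 0.3 0],
 [0 -400 0 0]]"""
print(parse_matrix(text))
# [[0, 0, 0, 0], [5, -9, 0, 0], [0, 0, 0.3, 0], [0, -400, 0, 0]]
```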

Test cases

Input 1:

[[0 0 0 0],
 [5 8 0 0],
 [0 0 3 0],
 [0 6 0 0]]

Output 1:

[ 5, 8, 3, 6 ]
[ 0, 0, 2, 3, 4 ]
[ 0, 1, 2, 1 ]

Input 2:

[[10 20 0 0 0 0],
 [0 30 0 40 0 0],
 [0 0 50 60 70 0],
 [0 0 0 0 0 80]]

Output 2:

[ 10 20 30 40 50 60 70 80 ]
[  0  2  4  7  8 ]
[  0  1  1  3  2  3  4  5 ]

Input 3:

[[0 0 0],
 [0 0 0],
 [0 0 0]]

Output 3:

[ ]
[ 0 0 0 0 ]
[ ]

Input 4:

[[1 1 1],
 [1 1 1],
 [1 1 1]]

Output 4:

[ 1 1 1 1 1 1 1 1 1 ]
[ 0 3 6 9 ]
[ 0 1 2 0 1 2 0 1 2 ]

Input 5:

[[0 0 0 0],
 [5 -9 0 0],
 [0 0 0.3 0],
 [0 -400 0 0]]

Output 5:

[ 5, -9, 0.3, -400 ]
[ 0, 0, 2, 3, 4 ]
[ 0, 1, 2, 1 ]

Assume inputs may contain any real number. You need not consider mathematical symbols or exponential representation (e.g. 5,000 will never be entered as 5e3), and you will not need to handle inf, -inf, NaN or any other 'pseudo-numbers'. You may output a different representation of a number (5,000 may be output as 5e3 if you so choose).

Scoring

This is code-golf; fewest bytes wins.

Isn't IA[0] = 0 completely unnecessary? It's only needed to define IA[i] = IA[i − 1]..., yet we could simply say to use 0 whenever i − 1 < 0. That is, IA[0] is always equal to 0, therefore it can be compressed out (yes, I realize that this is a critique of the algorithm, not this challenge).
– Draco18s no longer trusts SE, Commented Jul 5, 2017 at 21:01
Neat! Hadn't run into either format before, but I'm glad to see someone else did see that before (I shouldn't be the kind of person who spots trivial optimizations in algorithms this old).
– Draco18s no longer trusts SE, Commented Jul 6, 2017 at 13:03
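
The idea raised in the first comment, dropping the leading 0 and treating it as implicit, might look like the following Python sketch. The name row_bounds and the trimmed array are hypothetical; the challenge itself still requires the full IA.

```python
def row_bounds(ia_trimmed, i):
    """Start/end offsets of row i when the leading 0 of IA is omitted."""
    start = ia_trimmed[i - 1] if i > 0 else 0   # implicit IA[0] = 0
    end = ia_trimmed[i]
    return start, end


ia_trimmed = [0, 2, 3, 4]   # IA = [0, 0, 2, 3, 4] without its first element
print(row_bounds(ia_trimmed, 0))  # (0, 0)
print(row_bounds(ia_trimmed, 1))  # (0, 2)
print(row_bounds(ia_trimmed, 3))  # (3, 4)
```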

Luis Mendo · Accepted Answer · 2017-07-05 20:43:10Z

MATL, 19 bytes

!3#f!Dx0Gg!XsYshDq!

Input uses ; as row separator.

Try it online! Or verify all test cases: 1, 2, 3, 4, 5.

Explanation

!     % Implicit input. Transpose
3#f   % 3-output version of find: it takes all nonzero values and pushes
      % their column indices, row indices, and values, as column vectors
!     % Transpose into a row vector
D     % Display (and pop) vector of values
x     % Delete vector of row indices
0     % Push 0
G     % Push input
g     % Convert to logical: nonzeros become 1
!     % Transpose
Xs    % Sum of columns. Gives a row vector
Ys    % Cumulative sum
h     % Prepend the 0 that's below on the stack
D     % Display (and pop) that vector
q     % Subtract 1 from the vector of column indices of the original matrix
!     % Transpose into a row vector. Implicitly display
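
The IA part of the steps above (count the nonzeros per row, take the cumulative sum, put a 0 in front) could be sketched in Python roughly as follows. This is not a translation of the MATL program; row_pointers is a made-up name, and the initial keyword of itertools.accumulate needs Python 3.8 or later.

```python
from itertools import accumulate


def row_pointers(matrix):
    """IA as a running total of per-row nonzero counts, seeded with 0."""
    counts = [sum(1 for v in row if v != 0) for row in matrix]
    # initial=0 stands in for the 0 that the MATL program pushes and then
    # concatenates in front of the cumulative sum.
    return list(accumulate(counts, initial=0))


print(row_pointers([[0, 0, 0, 0], [5, 8, 0, 0], [0, 0, 3, 0], [0, 6, 0, 0]]))
# [0, 0, 2, 3, 4]
```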

alephalpha · Accepted Answer · 2017-07-05 13:00:25Z

Mathematica, 78 bytes

{a=SparseArray@#;a@"NonzeroValues",a@"RowPointers",Join@@a@"ColumnIndices"-1}&

See this answer on mathematica.stackexchange.com.

answered Jul 5, 2017 at 13:00


Erik the Outgolfer · Accepted Answer · 2017-07-05 12:33:12Z

Jelly, 24 bytes

n0S€0;;\S€,T€Ẏ$’$
Ẏḟ0W;Ç

Try it online!

answered Jul 5, 2017 at 12:33


Adám · Accepted Answer · 2017-07-05 12:59:42Z

APL (Dyalog), 31 28 chars or 36 33 bytes*

Requires ⎕IO←0 for zero-based indexing. I/O is a list of lists.

{(∊d)(0,+\≢¨d←⍵~¨0)(∊⍸¨⍵≠0)}

Try it online!

{…} anonymous function where the argument is represented by ⍵

(…)(…)(…) return a list of three things:

⍵≠0 Boolean where the argument differs from 0
⍸¨ ɩndices of those for each sub-list
∊ ϵnlist (flatten) to combine into single list

⍵~¨0 remove zeros from each sub-list of the argument
d← store that as d
≢¨ tally each
+\ cumulative sum
0, prepend a zero

∊d ϵnlist (flatten) d to combine into single list

* To run in Dyalog Classic, simply replace ⍸ with ⎕U2378.

edited Jul 5, 2017 at 12:59

answered Jul 5, 2017 at 12:06


Nice, I don't understand the input format though? f 4 4⍴ and then the values?
– AncientSwordRage, Commented Jul 5, 2017 at 12:08
@Pureferret the Code defines the function f. The Input is really a REPL, which calls f on the result of 4 4⍴… which reshapes the data into a 4×4 matrix.
– Adám, Commented Jul 5, 2017 at 12:09
Rho for reshapes. I get it!
– AncientSwordRage, Commented Jul 5, 2017 at 12:12
@Pureferret I've updated the Try it online! link to better show test cases.
– Adám, Commented Jul 5, 2017 at 12:21

nimi · Accepted Answer · 2017-07-05 18:20:28Z

Haskell, 87 bytes

f s|a<-filter(/=0)<$>s=(id=<<a,scanl(+)0$length<$>a,s>>= \t->[i|(i,e)<-zip[0..]t,e/=0])

Try it online!

How it works:

a<-filter(/=0)<$>s           -- let a be the list of lists with all 0 removed
                             -- e.g. [[1,0,0],[0,3,4]] -> [[1],[3,4]]

                             -- return a triple of

id=<<a                       -- a concatenated into a single list -> A 

scanl(+)0$length<$>a         -- partial sums of the length of the sublists of a
                             -- starting with an additional 0 -> IA

s>>=                         -- map the lambda over the sublists of s and concatenate
                             -- into a single list
   \t->[i|(i,e)<-zip[0..]t,e/=0]  -- the indices of the non-zero elements -> JA

Community · Accepted Answer · 2020-06-17 09:04:33Z

Python+SciPy, 79 bytes

(I guess built-ins were not forbidden)

from scipy.sparse import*
A=csr_matrix(input())
print A.data,A.indptr,A.indices

Accepts input in the format [[0, 0, 0, 0],[5, 8, 0, 0],[0, 0, 3, 0],[0, 6, 0, 0]]

edited Jun 17, 2020 at 9:04 by Community Bot

answered Aug 6, 2017 at 18:19 by Karl Napf

Giuseppe · Accepted Answer · 2020-11-05 17:19:26Z

R, 70 bytes

function(m,M=t(m))list(M[x<-!!M],diffinv(colSums(x)),which(x,T)[,1]-1)

Try it online!

This is an old matrix challenge that didn't get an R answer for 3 years! Having to do this in row-major rather than column-major order costs 8 bytes (,M=t(m)).

answered Nov 5, 2020 at 17:19


JStrahl · Accepted Answer · 2020-11-07 18:46:15Z

Python 3, 96 bytes

lambda m:list(map(list,zip(*[[e,i,j] for i,r in enumerate(m) for j,e in enumerate(r) if e!=0])))

Try it online!
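
Note that, as written, the second list this lambda returns holds the row index of every nonzero entry (the coordinate form) rather than the m + 1 row pointers IA defined in the question. A hedged sketch of how such row indices could be turned into IA, assuming the number of rows is known; row_indices_to_ia is a hypothetical name:

```python
def row_indices_to_ia(row_indices, n_rows):
    """Convert per-entry row indices into CSR row pointers (hypothetical helper)."""
    counts = [0] * n_rows
    for r in row_indices:
        counts[r] += 1            # nonzeros per row
    ia = [0]
    for c in counts:
        ia.append(ia[-1] + c)     # IA[i] = IA[i - 1] + nonzeros in row i - 1
    return ia


# Row indices the lambda yields for the first test case.
print(row_indices_to_ia([1, 1, 2, 3], 4))  # [0, 0, 2, 3, 4]
```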

answered Nov 7, 2020 at 18:46


Jörg Hülsermann · Accepted Answer · 2017-07-05 13:05:51Z

PHP, 107 bytes

<?for($y=[$c=0];$r=$_GET[+$l++];)foreach($r as$k=>$v)!$v?:[$x[]=$v,$z[]=$k,$y[$l]=++$c];var_dump($x,$y,$z);

Try it online!

PHP, 109 bytes

<?$y=[$c=0];foreach($_GET as$r){foreach($r as$k=>$v)if($v){$x[]=$v;$z[]=$k;$c++;}$y[]=$c;}var_dump($x,$y,$z);

Try it online!

edited Jul 5, 2017 at 13:05

answered Jul 5, 2017 at 12:52


Does this need the numbers to be strings?
– AncientSwordRage, Commented Jul 5, 2017 at 14:06
@Pureferret Any input in PHP is a string or an array of strings. I have not cast the input, so if you want the output to be purely int, replace $x[]=$v with $x[]=+$v.
– Jörg Hülsermann, Commented Jul 5, 2017 at 14:21

Justin Mariner · Accepted Answer · 2017-07-05 22:35:40Z

JavaScript (ES6), 117 bytes

a=>[a.map((b,i)=>(b=b.filter((x,c)=>x&&o.push(c)),m[i+1]=m[i]+b.length,b),m=[0],o=[]).reduce((x,y)=>x.concat(y)),m,o]

Input is a 2D array of numbers and output is an array of [A, IA, JA].

Explained

a=>[
    a.map((b,i) => (                                // map each matrix row
            b = b.filter((x,c) => x                 // filter to only non-zero elements
                && o.push(c)                        // and add this index to JA
            ),
            m[i+1] = m[i] + b.length,               // set next value of IA
            b                                       // and return filtered row
        ),
        m=[0],o=[]                          // initialize IA (m) and JA (o)
    ).reduce((x,y) => x.concat(y)),                 // flatten the non-zero matrix
m,o]                                                // append IA and JA

Tests

let f=
a=>[a.map((b,i)=>(b=b.filter((x,c)=>x&&o.push(c)),m[i+1]=m[i]+b.length,b),m=[0],o=[]).reduce((x,y)=>x.concat(y)),m,o]

let run=x=>O.innerHTML+=f(x).map(s=>`[${s.join`, `}]`).join`\n`+"\n\n"
run([[0,0,0,0],[5,8,0,0],[0,0,3,0],[0,6,0,0]])
run([[10,20,0,0,0,0],[0,30,0,40,0,0],[0,0,50,60,70,0],[0,0,0,0,0,80]])
run([[0,0,0],[0,0,0],[0,0,0]])
run([[1,1,1],[1,1,1],[1,1,1]])
run([[0,0,0,0],[5,-9,0,0],[0,0,0.3,0],[0,-400,0,0]])

<pre id=O></pre>

Shaggy · Accepted Answer · 2020-11-08 16:59:09Z

Japt, 31 27 17 bytes

Hopefully this output format is OK.

cf pUmè iT å+ Ucð

Try it

cf pUmè iT å+ Ucð     :Implicit input of 2D array U
c                     :Flat map
 f                    :  Filter
   p                  :Push the following two elements
    Um                :  First element: Map U
      è               :    Count the truthy elements in each
        i             :  Prepend
         T            :    Zero
           å+         :  Cumulatively reduce by addition
              Uc      :  Second element: Flat map U
                ð     :    0-based indices of truthy elements

edited Nov 8, 2020 at 16:59

answered Jul 5, 2017 at 11:59


I just ran the other examples though and it works
– AncientSwordRage, Commented Jul 5, 2017 at 12:08

ovs · Accepted Answer · 2017-07-05 15:02:10Z

Python 2, 115 bytes

lambda m:zip(*[[v,i]for k in m for i,v in enumerate(k)if v])+[reduce(lambda a,b:a+[len(b)-b.count(0)+a[-1]],m,[0])]

Try it online!

Output is [A, JA, IA]
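
In Python 3, zip returns an iterator and reduce lives in functools, so the lambda above does not run unchanged there. A rough, ungolfed Python 3 sketch of the same idea follows; compress_py3 is a made-up name, and the guard for an all-zero matrix is an addition of this sketch rather than something taken from the answer.

```python
from functools import reduce


def compress_py3(m):
    """Ungolfed Python 3 take on the lambda above; returns [A, JA, IA]."""
    pairs = [(v, i) for row in m for i, v in enumerate(row) if v]
    # zip(*pairs) yields nothing for an all-zero matrix, so fall back to
    # two empty lists in that case.
    a, ja = (list(t) for t in zip(*pairs)) if pairs else ([], [])
    # Fold over the rows, appending the running nonzero count each time.
    ia = reduce(lambda acc, row: acc + [acc[-1] + len(row) - row.count(0)], m, [0])
    return [a, ja, ia]


print(compress_py3([[0, 0, 0, 0], [5, 8, 0, 0], [0, 0, 3, 0], [0, 6, 0, 0]]))
# [[5, 8, 3, 6], [0, 1, 2, 1], [0, 0, 2, 3, 4]]
```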

answered Jul 5, 2017 at 15:02


Sean · Accepted Answer · 2017-07-05 23:56:38Z

Perl 6, 84 bytes

{.flatmap(*.grep(+*)),(0,|[\+] .map(+*.grep(+*))),.flat.kv.flatmap:{$^a%.[0]xx?$^b}}

Try it online!

The single matrix argument is in $_.

.flatmap(*.grep(+*)) selects the nonzero elements of the entire matrix.
[\+] .map(+*.grep(+*)) is the triangular reduction of the number of nonzero elements in each row (what some languages call a scan). (0,|...) prepends a zero to that list.
.flat.kv produces an indexed list of all elements of the matrix. .flatmap: { $^a % .[0] xx ?$^b } flat-maps over the modulus of each index by the number of columns in the array (.[0], the number of elements in the first row), replicated by the element itself, interpreted as a boolean. That is, nonzero elements are replicated once, and zero elements are replicated zero times (i.e., removed).
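
The modulus trick described above for JA can be illustrated with a short Python sketch. It assumes a non-empty, rectangular list of lists; column_indices is a hypothetical name.

```python
def column_indices(matrix):
    """JA via flat indices: position in the flattened matrix modulo row width."""
    n_cols = len(matrix[0])       # assumes a non-empty, rectangular matrix
    flat = [v for row in matrix for v in row]
    # Keep the flat index of every nonzero element, reduced modulo n_cols.
    return [k % n_cols for k, v in enumerate(flat) if v != 0]


print(column_indices([[0, 0, 0, 0], [5, 8, 0, 0], [0, 0, 3, 0], [0, 6, 0, 0]]))
# [0, 1, 2, 1]
```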

MarcMush · Accepted Answer · 2021-04-16 16:45:53Z

Julia, 66 63 bytes

using SparseArrays
n*a=getfield(sparse(a'),n)
!a=5a,3a.-1,4a.-1

SparseArrays is a standard library that stores matrices in CSC (compressed sparse column) rather than CSR format, which is why the input has to be transposed (a').

-6 bytes if 1-indexing were allowed.

Overloads * with getfield to save a few bytes.
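
Why transposing helps can be checked with a small Python sketch: compressing the transpose column by column visits the entries in the same order as compressing the original row by row. The helpers csc and transpose are made up for this illustration and are unrelated to the Julia library's internals.

```python
def csc(matrix):
    """Compressed sparse column: walk columns first, record row indices."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    values, col_ptr, row_idx = [], [0], []
    for j in range(n_cols):
        for i in range(n_rows):
            if matrix[i][j] != 0:
                values.append(matrix[i][j])
                row_idx.append(i)
        col_ptr.append(len(values))   # running count after each column
    return values, col_ptr, row_idx


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


m = [[0, 0, 0, 0], [5, 8, 0, 0], [0, 0, 3, 0], [0, 6, 0, 0]]
# CSC of the transpose gives the CSR arrays (A, IA, JA) of m itself.
print(csc(transpose(m)))  # ([5, 8, 3, 6], [0, 0, 2, 3, 4], [0, 1, 2, 1])
```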

Try it online!

edited Apr 16, 2021 at 16:45

answered Apr 16, 2021 at 16:26


user · Accepted Answer · 2021-04-16 17:01:40Z

Scala, 82 bytes

x=>x.flatMap(_.zipWithIndex).filter(_._1!=0).unzip->x.scanLeft(0)(_+_.count(_!=0))

Try it in Scastie!

Returns ((A, JA), IA)

Explanation:

x =>     //The sparse matrix to be smooshed
x.flatMap(      //Do the following operation to each row, then flatten the result
  _.zipWithIndex //Zip each element with its column (makes tuples)
).filter(      //Keep the tuples where
  _._1!=0      //the first element isn't 0
).unzip        //Unzip into a tuple of two lists, A and JA
->             //Make that the first element of another 2-tuple with IA:
x.scanLeft(0)(  //Starting with 0
  _+_.count(_!=0)  //Add the number of nonzero elements in each row
)               //Keep the intermediate results, giving us IA
