RandomizedSparsification

Implementation of the randomized sparsification algorithm from [1].

Index

Algorithm

RandomizedSparsification API

RandomizedSparsification.sparsify — Function
sparsify(A; r, multithreaded=false, progress=false)

Sparsify the input matrix A using the randomized element-wise sparsification algorithm with r samples.
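As a self-contained illustration of what element-wise sparsification does (a sketch, not the package implementation — here with a purely ℓ2-based sampling distribution, whereas [1] defines the exact one used by `sparsify`):

```julia
using LinearAlgebra, SparseArrays, Random

# Sketch of randomized element-wise sparsification: sample r entries of A
# with probabilities p_ij ∝ A_ij² / ‖A‖_F² and rescale so that E[Â] = A.
function sparsify_sketch(A::AbstractMatrix, r::Integer; rng=Random.default_rng())
    m, n = size(A)
    p = abs2.(A) ./ norm(A)^2             # sampling probabilities (ℓ2-based)
    cdf = cumsum(vec(p))                  # CDF over linear indices
    Â = spzeros(m, n)
    for _ in 1:r
        u = rand(rng)
        k = min(searchsortedfirst(cdf, u), length(cdf))  # inverse-CDF sampling
        i, j = Tuple(CartesianIndices(A)[k])
        Â[i, j] += A[i, j] / (r * p[i, j])  # rescale for an unbiased estimator
    end
    Â
end
```

Each sampled entry is rescaled by $1/(r\,p_{ij})$, so the estimator is unbiased entry-wise and has at most r stored entries.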

source
RandomizedSparsification.r_ε — Function
r_ε(A, ε, [rank])
r_ε(d₁, d₂, ε, rank)

Compute a sufficient number of samples r to achieve a given error tolerance ε, in expectation, when sparsifying the matrix A ($A \in \mathbb{R}^{d_1 \times d_2}$). Optionally use rank as an upper bound for $\operatorname{srank}$. The number of samples returned by this function is sufficient but not necessarily the smallest for which the error bound holds.
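Assuming the package is loaded, the two functions compose as follows (a usage sketch; the matrix and the tolerance value are illustrative):

```julia
using RandomizedSparsification

A = randn(1_000, 1_000)
r = r_ε(A, 0.1)      # samples sufficient for the expected-error bound at ε = 0.1
Â = sparsify(A; r)   # sparsified estimate of A
```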

source
Utility functions
RandomizedSparsification.srank — Function
srank(A::AbstractMatrix)

Compute the stable rank of the input matrix A.

The stable rank is defined as the ratio of the squared Frobenius norm to the squared spectral norm of the matrix.

\[ \frac{\|A\|^2_\text{F}}{\|A\|^2_\text{op}}\]
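The definition above translates directly to LinearAlgebra's `norm` (Frobenius) and `opnorm` (spectral); `srank_demo` below is an illustrative stand-in, not the package function:

```julia
using LinearAlgebra

# Stable rank: squared Frobenius norm over squared spectral norm.
srank_demo(A::AbstractMatrix) = norm(A)^2 / opnorm(A)^2
```

For example, a rank-1 matrix has stable rank 1, while the n×n identity has stable rank n.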

source
RandomizedSparsification.ρ_s — Function
ρ_s(A::AbstractSparseArray)

Compute the proportional sparsity of the input matrix A.

\[ \frac{\#\{(i,j)\mid A_{ij} \neq 0 \}}{\#\{(i,j)\}}\]
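With SparseArrays this is a one-liner (`ρ_s_demo` is an illustrative stand-in; note that `nnz` counts stored entries, which may include explicitly stored zeros):

```julia
using SparseArrays

# Proportional sparsity: stored nonzeros over total number of entries.
ρ_s_demo(A::AbstractSparseMatrix) = nnz(A) / length(A)
```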

source
Internals
RandomizedSparsification.FindIndices — Module

This module contains functions that determine indices from the cumulative sum of the probability matrix P_ij_cs (i.e. the cumulative distribution function on the indices) and numbers sampled uniformly at random from $[0,1]$, with as many samples as the user prescribes to meet the required accuracy bounds.

Danger
  1. The methods do not check whether the number of samples taken fits into memory. This must be handled earlier in the call stack: the desired number of samples must be split into batches that fit into memory for the particular index-location algorithm used.
  2. It is assumed that P_ij_cs fits into memory. If it does not, it would need to be computed efficiently at evaluation time for a given index, which is not yet implemented.

The functions that find the summation indices in the sparsification algorithm have the signature:

_indices(
    P_ij_cs::AbstractVector{<:Real},    # cumulative distribution function on matrix indices
    rands01::AbstractVector{<:Real},    # sampled values in [0,1]
    m::Integer, n::Integer;             # original matrix size
    progress=false                      # report progress to the user
)::AbstractVector{CartesianIndex{2}}

The indices are thus returned as a vector [ij::CartesianIndex{2}...], with repetitions. This vector is passed to the sparse function to build the sparsified matrix.
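A minimal implementation of this signature (ignoring the progress option) can run a binary search into the CDF for each sampled value; `indices_sketch` below is hypothetical and need not match the package's actual methods:

```julia
# For each u in rands01, find the first CDF value ≥ u (inverse-CDF sampling)
# and convert the resulting linear index to a CartesianIndex{2}.
function indices_sketch(P_ij_cs::AbstractVector{<:Real},
                        rands01::AbstractVector{<:Real},
                        m::Integer, n::Integer)
    map(rands01) do u
        k = min(searchsortedfirst(P_ij_cs, u), length(P_ij_cs))
        CartesianIndices((m, n))[k]      # linear → Cartesian index (column-major)
    end
end
```

Repeated samples yield repeated indices, which is exactly what the downstream call to `sparse` expects.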

source
RandomizedSparsification._sparsify_context_data — Function
_sparsify_context_data(A)

Compute the context data, A_1_norm, A_F_norm, P_ij and P_ij_cs (the cumulative distribution on the matrix elements), needed by the randomized sparsification algorithm in sparsify.
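A sketch of this computation, with the caveat that the exact distribution P_ij is the one defined in [1] — the ℓ1/ℓ2 hybrid below is an assumption chosen only because both norms appear in the context data:

```julia
using LinearAlgebra

# Compute the listed context quantities. The hybrid ℓ1/ℓ2 weights for P_ij
# are an illustrative assumption, not necessarily the package's distribution.
function context_data_sketch(A::AbstractMatrix)
    A_1_norm = sum(abs, A)               # entry-wise ℓ1 norm
    A_F_norm = norm(A)                   # Frobenius norm
    P_ij = abs.(A) ./ (2A_1_norm) .+ abs2.(A) ./ (2A_F_norm^2)
    P_ij_cs = cumsum(vec(P_ij))          # CDF over linear indices
    (; A_1_norm, A_F_norm, P_ij, P_ij_cs)
end
```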

source
RandomizedSparsification.FindIndices._indices_groupsort — Function
_indices_groupsort(P_ij_cs, rands01, m, n)

Algorithm

  1. Define a DataFrame holding P_ij_cs and rands01 in a :vals column, with a boolean :group column distinguishing values from P_ij_cs from those from rands01.
  2. Sort the rows by the :vals column, interleaving the rands01 values.
  3. Determine each index by counting the number of preceding :group == P_ij_cs values for a given :group == rands01 value.
Perf
`DataFrames.jl` should only add a small overhead compared to performing the underlying sort manually.
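The same group-sort idea without DataFrames, to make the counting explicit (`groupsort_positions` is hypothetical and returns linear indices; ties between CDF values and samples occur with probability zero for continuous samples):

```julia
# Tag each value with its group, sort everything once, and for each sampled
# value count how many CDF values precede it; that count + 1 is its linear index.
function groupsort_positions(P_ij_cs, rands01)
    vals  = vcat(P_ij_cs, rands01)
    group = vcat(falses(length(P_ij_cs)), trues(length(rands01)))  # true ⇔ rands01
    pos   = zeros(Int, length(rands01))
    npred = 0                            # running count of CDF values seen so far
    for k in sortperm(vals)
        if group[k]
            pos[k - length(P_ij_cs)] = npred + 1
        else
            npred += 1
        end
    end
    pos
end
```

For non-tied inputs this agrees with a per-sample `searchsortedfirst` into the CDF, but performs a single sort over both vectors instead of one binary search per sample.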
source

Bibliography

[1]
A. Kundu and P. Drineas. A Note on Randomized Element-wise Matrix Sparsification (2014). Accessed on Jan 6, 2025.