# Sampling¶

In contrast to the finite case, where the ML community has put a lot of effort into improving the efficiency and tractability of the sampling routine, much less has been done in the continuous setting.

## Exact sampling¶

In this section, we describe the main techniques for exact sampling from continuous DPPs.

As for finite DPPs, the most prominent one relies on the fact that generic DPPs are mixtures of projection DPPs.

### Projection DPPs: the chain rule¶

Let’s focus on sampling from a projection \(\operatorname{DPP}(K)\) with a real-valued orthogonal projection kernel \(K:\mathbb{X}\times \mathbb{X}\to \mathbb{R}\) and reference measure \(\mu\), that is,

\[
K(x, y) = K(y, x)
\quad\text{and}\quad
\int_{\mathbb{X}} K(x, z) K(z, y) \mu(dz) = K(x, y).
\]

In this setting, recall that the number of points is \(\mu\)-almost surely equal to \(r=\operatorname{rank}(K)\).

To generate a valid sample \(X=\{x_{1}, \dots, x_{r}\} \sim \operatorname{DPP}(K)\), [HKPVirag06] Proposition 19 showed that it is sufficient to apply the chain rule to sample \((x_1, \dots, x_r)\) with joint distribution

\[
(x_1, \dots, x_r)
\sim \frac{1}{r!} \det [K(x_p, x_q)]_{p,q=1}^{r} \, \mu^{\otimes r}(dx_1, \dots, dx_r),
\tag{27}
\]

and then forget the order in which the points were selected.

The original projection DPP sampler of [HKPVirag06] Algorithm 18 was given in an abstract form, which can be implemented using the following strategy. Write the determinant in (27) as a telescoping product of ratios of determinants and use Schur complements to get

\[
(x_1, \dots, x_r)
\sim \frac{K(x_1, x_1)}{r} \mu(dx_1)
\prod_{i=2}^{r}
\frac{K(x_i, x_i) - \mathbf{K}_{i-1}(x_i)^{\top} \mathbf{K}_{i-1}^{-1} \mathbf{K}_{i-1}(x_i)}{r-(i-1)}
\mu(dx_i),
\tag{28}
\]

where \(\mathbf{K}_{i-1} = [K(x_p,x_q)]_{p,q=1}^{i-1}\) and \(\mathbf{K}_{i-1}(x) = (K(x,x_1), \dots, K(x,x_{i-1}))^{\top}\).

Important

The expression (27) indeed defines a probability distribution, with normalization constant \(r!\). In particular, this distribution is exchangeable, i.e., invariant under permutation of the coordinates.

The successive ratios that appear in (28) are the normalized conditional densities (w.r.t. \(\mu\)) that drive the chain rule. The associated normalizing constants \(r-(i-1)\) are independent of the previous points.
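On a finite ground set, the chain rule (28) can be written down directly, since each conditional is a Schur complement of the kernel matrix. The following is a minimal sketch of this idea for a discretized domain of \(N\) points; the function name and interface are illustrative, not DPPy's API:

```python
import numpy as np

def sample_projection_dpp_chain_rule(K, rng):
    """Sample a projection DPP on a discrete ground set via the chain rule.

    K : (N, N) orthogonal projection kernel matrix (K @ K = K, K = K.T).
    Returns the list of r = rank(K) selected indices.
    """
    N = K.shape[0]
    r = int(round(np.trace(K)))             # rank = trace for a projection
    sample = []
    cond = K.diagonal().copy()              # conditionals start at K(x, x)
    for i in range(r):
        # In exact arithmetic cond sums to r - i, the normalizing
        # constant of the i-th conditional; normalize for safety.
        probs = cond / cond.sum()
        x = rng.choice(N, p=probs)
        sample.append(x)
        # Schur-complement update of the remaining conditional densities:
        # K(x, x) - K_i(x)^T K_i^{-1} K_i(x).
        Ki = K[np.ix_(sample, sample)]
        cond = K.diagonal() - np.einsum(
            'ij,jk,ik->i', K[:, sample], np.linalg.inv(Ki), K[:, sample])
        cond[sample] = 0.0                  # selected points have density 0
        cond = np.clip(cond, 0.0, None)     # guard against round-off
    return sample
```

Note that, as stressed above, no eigendecomposition is needed: only the kernel matrix is accessed.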

Sampling **projection** DPPs does not require the eigendecomposition of the kernel!

Hint

MLers will recognize (28) as the incremental posterior variance of a noise-free Gaussian Process (GP) model with kernel \(K\), see [RW06] Equation 2.26.

#### Geometrical interpretation¶

When the eigendecomposition of the kernel is available, the chain rule can be interpreted and implemented from a geometrical perspective, see, e.g., [LMollerR12] Algorithm 1.

Write the Gram formulation of the kernel as

\[
K(x, y) = \Phi(x)^{\top} \Phi(y),
\]

where \(\Phi(x) \triangleq (\phi_{1}(x), \dots, \phi_{r}(x))\) denotes the *feature vector* associated to \(x\in \mathbb{X}\).

The joint distribution (27) then reads

\[
(x_1, \dots, x_r)
\sim \frac{1}{r!} \det [\Phi(x_p)^{\top} \Phi(x_q)]_{p,q=1}^{r} \, \mu^{\otimes r}(dx_1, \dots, dx_r).
\tag{29}
\]

Hint

The joint distribution (29) characterizes the fact that **projection** \(\operatorname{DPP}(K)\) favors sets of \(r=\operatorname{rank}(K)\) points \(\left(x_{1}, \dots, x_{r} \right)\) whose feature vectors \(\Phi(x_1), \dots, \Phi(x_r)\) span a large volume.
This is another way of understanding the repulsive/diversity feature of DPPs.

Then, the previous telescoping product of ratios of determinants in (28) can be understood as the base \(\times\) height formula applied to compute \(\operatorname{Volume}^2 \{\Phi(x_1), \dots, \Phi(x_r)\}\), so that

\[
(x_1, \dots, x_r)
\sim \frac{\left\| \Phi(x_1) \right\|^2}{r} \mu(dx_1)
\prod_{i=2}^{r}
\frac{\operatorname{distance}^2 \left( \Phi(x_i), \operatorname{span}\{\Phi(x_1), \dots, \Phi(x_{i-1})\} \right)}{r-(i-1)}
\mu(dx_i),
\tag{30}
\]

where \(\mathbf{K}_{i-1} = \left[\left\langle \Phi(x_p), \Phi(x_q) \right\rangle\right]_{p,q=1}^{i-1}\).

Hint

The overall procedure is akin to a sequential Gram-Schmidt orthogonalization of \(\Phi(x_{1}), \dots, \Phi(x_{r})\).
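As a toy numerical check of this correspondence, the following sketch (with synthetic feature vectors; all names are illustrative) verifies that the Schur complement appearing in (28) coincides with the squared Gram-Schmidt residual appearing in (30):

```python
import numpy as np

rng = np.random.default_rng(1)
r, n_prev = 4, 3                        # feature dimension, # previous points
Phi = rng.standard_normal((r, n_prev))  # columns are Phi(x_1), ..., Phi(x_3)
phi_x = rng.standard_normal(r)          # feature vector of a candidate point x

# Schur complement from (28): K(x,x) - K_{i-1}(x)^T K_{i-1}^{-1} K_{i-1}(x)
K_prev = Phi.T @ Phi                    # Gram matrix of the previous points
k_x = Phi.T @ phi_x
schur = phi_x @ phi_x - k_x @ np.linalg.solve(K_prev, k_x)

# Squared distance of Phi(x) to span{Phi(x_1), ..., Phi(x_3)}, as in (30)
Q, _ = np.linalg.qr(Phi)                # orthonormal basis of the span
residual = phi_x - Q @ (Q.T @ phi_x)
dist2 = residual @ residual

assert np.isclose(schur, dist2)
```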

Attention

In contrast to the finite case, where the conditionals are simply probability vectors, the chain rule formulations (28) and (30) require sampling from a continuous distribution. This can be done with a rejection sampling mechanism, but finding a good proposal density with tight rejection bounds is a challenging problem [LMollerR12] Section 2.4. It is nevertheless achievable in some specific cases, see, e.g., Multivariate Jacobi Ensemble.
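As a reminder of the generic mechanism, here is a minimal rejection sampler; the Beta(2, 2) target and uniform proposal are toy choices for illustration, not the conditionals of an actual DPP:

```python
import numpy as np

def rejection_sample(target_pdf, proposal_sampler, proposal_pdf, M, rng):
    """Draw one sample from target_pdf, given an envelope constant M
    such that M * proposal_pdf(x) >= target_pdf(x) for all x."""
    while True:
        x = proposal_sampler(rng)
        # Accept x with probability target(x) / (M * proposal(x))
        if rng.random() * M * proposal_pdf(x) <= target_pdf(x):
            return x

# Toy example: Beta(2, 2) density 6 x (1 - x) on [0, 1], uniform proposal.
# The tightest envelope constant is the density's maximum, M = 1.5 at x = 1/2.
rng = np.random.default_rng(0)
draws = [rejection_sample(lambda x: 6.0 * x * (1.0 - x),
                          lambda r: r.random(),
                          lambda x: 1.0,
                          1.5, rng)
         for _ in range(1000)]
```

The difficulty alluded to above is precisely that, for the conditionals of (28), a proposal with a small envelope constant \(M\) is hard to exhibit in general.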

See also

- Algorithm 18 in [HKPVirag06] for the original abstract **projection** DPP sampler
- Projection DPPs: the chain rule in the finite case

Some Orthogonal Polynomial Ensembles (specific instances of projection DPPs) can be sampled in \(\mathcal{O}(r^2)\) by computing the eigenvalues of properly randomised tridiagonal matrices.
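For instance, the \(\beta=2\) Hermite ensemble (the eigenvalue distribution of the GUE, an Orthogonal Polynomial Ensemble) can be sampled this way via the Dumitriu-Edelman tridiagonal model. A sketch, with the function name and normalization chosen here for illustration (conventions vary across references):

```python
import numpy as np
from scipy.linalg import eigvalsh_tridiagonal

def hermite_ensemble_tridiag(r, rng, beta=2):
    """Eigenvalues of the beta-Hermite ensemble via its tridiagonal model
    (Dumitriu & Edelman). For beta = 2 these are the GUE eigenvalues, an
    Orthogonal Polynomial Ensemble, hence a projection DPP with r points."""
    diag = rng.normal(0.0, np.sqrt(2.0), size=r)
    # Off-diagonal entries are chi_{beta * k} variates, k = r-1, ..., 1;
    # a chi_k draw is sqrt of a Gamma(k/2, scale=2) draw.
    ks = beta * np.arange(r - 1, 0, -1)
    off = np.sqrt(rng.gamma(shape=ks / 2.0, scale=2.0))
    # O(r^2) cost: eigenvalues of a symmetric tridiagonal matrix.
    return eigvalsh_tridiagonal(diag, off) / np.sqrt(2.0)
```

The quadratic cost comes from the tridiagonal eigensolver, to be compared with the cubic cost of eigendecomposing a dense kernel matrix.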

The multivariate Jacobi ensemble's `sample()` method relies on the chain rule described by (30).

### Generic DPPs: the spectral method¶

The procedure stems from the fact that generic DPPs are mixtures of projection DPPs, suggesting the following two-step algorithm. Given the spectral decomposition of the kernel

\[
K(x, y) = \sum_{i=1}^{\infty} \lambda_i \phi_i(x) \overline{\phi_i(y)},
\tag{31}
\]

**Step 1.** Draw \(B_i\sim\operatorname{\mathcal{B}er}(\lambda_i)\) independently and let \(\{i_1,\dots,i_{N}\} = \{i~;~B_i=1\}\),

**Step 2.** Sample from the *projection* DPP with kernel \(\tilde{K}(x,y) = \sum_{n=1}^{N}\phi_{i_n}(x) \overline{\phi_{i_n}(y)}\).

Important

Step 1. selects a component of the mixture, see [LMollerR12] Section 2.4.1

Step 2. generates a sample from the projection \(\operatorname{DPP}(\tilde{K})\), cf. previous section.
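On a finite ground set, the two steps can be sketched as follows; the function name and interface are illustrative, not DPPy's actual API:

```python
import numpy as np

def sample_dpp_spectral(eig_vals, eig_vecs, rng):
    """Two-step spectral sampler for a DPP on a finite ground set.

    eig_vals : (M,) eigenvalues of the kernel matrix, all in [0, 1].
    eig_vecs : (N, M) matrix whose columns are the eigenvectors.
    """
    # Step 1: Bernoulli(lambda_i) draws select a projection component
    # of the mixture.
    keep = rng.random(eig_vals.shape) < eig_vals
    V = eig_vecs[:, keep]                   # (N, r) selected eigenvectors
    r = V.shape[1]
    # Step 2: chain rule on the projection kernel K~ = V V^T.
    K = V @ V.T
    sample = []
    cond = K.diagonal().copy()
    for i in range(r):
        probs = cond / cond.sum()           # cond sums to r - i exactly
        x = int(rng.choice(len(probs), p=probs))
        sample.append(x)
        # Schur-complement update of the conditional densities.
        Ki = K[np.ix_(sample, sample)]
        cond = K.diagonal() - np.einsum(
            'ij,jk,ik->i', K[:, sample], np.linalg.inv(Ki), K[:, sample])
        cond[sample] = 0.0
        cond = np.clip(cond, 0.0, None)
    return sample
```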

Attention

Contrary to projection DPPs, the general case requires the eigendecomposition of the kernel (31).

See also

Spectral method for sampling finite DPPs.

## Approximate sampling¶

See also

Approximation of \(K(x,y)=K(x-y)\) by Fourier series [LMollerR12] Section 4