# Sampling¶

## Exact sampling¶

The procedure stems from the fact that generic DPPs are mixtures of projection DPPs, suggesting the following two-step algorithm. Given the spectral decomposition of the kernel

\[
K(x,y) = \sum_{i=1}^{\infty} \lambda_i \phi_{i}(x) \overline{\phi_{i}(y)},
\]

**Step 1.** Draw \(B_i\sim\operatorname{\mathcal{B}er}(\lambda_i)\) independently and denote \(\{i_1,\dots,i_{N}\} = \{i~;~B_i=1\}\),

**Step 2.** Sample from the *projection* DPP with kernel \(\tilde{K}(x,y) = \sum_{n=1}^{N}\phi_{i_n}(x) \overline{\phi_{i_n}(y)}\).
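As a minimal sketch of Step 1., assume the spectral decomposition has been truncated to finitely many eigenpairs; the eigenvalues below are illustrative placeholders, not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative eigenvalues of the kernel; each must lie in [0, 1].
# In practice they come from the spectral decomposition of K.
lambdas = np.array([0.9, 0.7, 0.4, 0.1])

# Step 1: independent Bernoulli draws B_i ~ Ber(lambda_i)
# select a projection component of the mixture.
B = rng.uniform(size=lambdas.size) < lambdas
selected = np.flatnonzero(B)  # {i : B_i = 1}

# Step 2 would then sample the projection DPP with kernel
#   K~(x, y) = sum_{i in selected} phi_i(x) conj(phi_i(y)),
# which has exactly len(selected) points.
print(selected)
```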

Important

Step 1. selects a component of the mixture; see [LMollerR12] Section 2.4.1.

Step 2. generates a sample from the projection \(\operatorname{DPP}(\tilde{K})\). This can be done using Algorithm 18 of [HKPVirag06], which provides a generic projection DPP sampler, described in the next section.

Attention

The strong requirement of the procedure is that the eigendecomposition of the continuous kernel in (27) must be available.

See also

Spectral method for sampling finite DPPs.

### Projection DPPs: the chain rule¶

In this section, we describe a generic procedure to perform Step 2., i.e., to sample projection DPPs. It was originally proposed, in an abstract form, as Algorithm 18 of [HKPVirag06].

For simplicity, consider a projection \(\operatorname{DPP}(K)\) with a real-valued orthogonal projection kernel. We note \(r=\operatorname{rank}(K)\) and write

\[
K(x,y) = \sum_{i=1}^{r} \phi_{i}(x) \phi_{i}(y) = \Phi(x)^{\top} \Phi(y),
\]

where \(\Phi(x) \triangleq (\phi_{1}(x), \dots, \phi_{r}(x))\) denotes the *feature vector* associated to \(x\in \mathbb{X}\).

Important

The eigendecomposition of the kernel is not mandatory for sampling **projection** \(\operatorname{DPP}(K)\).

Recall that the number of points of this projection \(\operatorname{DPP}(K)\) is \(\mu\)-almost surely equal to \(r=\operatorname{rank}(K)\).

Using the invariance by permutation of the determinant and the fact that \(K\) is an orthogonal projection kernel, it is sufficient to apply the chain rule to sample \((x_1, \dots, x_r)\) with joint distribution

\[
\frac{1}{r!} \det \left[K(x_p,x_q)\right]_{p,q=1}^{r} \mu^{\otimes r}(d x_{1:r}), \qquad (28)
\]

and forget the order in which the points were selected, to obtain a valid sample \(X=\{x_{1}, \dots, x_{r}\} \sim \operatorname{DPP}(K)\).

Hint

In the end, the joint distribution (28) shows that projection DPPs favor sets of \(r=\operatorname{rank}(K)\) items whose feature vectors span large volumes. This is another way of understanding repulsiveness.

The chain rule can be interpreted from a geometrical perspective

\[
\frac{K(x_1,x_1)}{r}\, \mu(d x_1) \prod_{i=2}^{r} \frac{1}{r-(i-1)} \frac{\det \mathbf{K}_{i}}{\det \mathbf{K}_{i-1}}\, \mu(d x_i), \qquad (29)
\]

where \(\mathbf{K}_{i-1} = [K(x_p,x_q)]_{p,q=1}^{i-1}\), and each ratio \(\det \mathbf{K}_{i} / \det \mathbf{K}_{i-1}\) equals the squared distance of the feature vector \(\Phi(x_i)\) to \(\operatorname{span}\{\Phi(x_1), \dots, \Phi(x_{i-1})\}\).

Using Woodbury’s formula, the ratios of determinants in (29) can be expanded into

\[
\frac{K(x_1,x_1)}{r}\, \mu(d x_1) \prod_{i=2}^{r} \frac{K(x_i,x_i) - \mathbf{K}_{i-1}(x_i)^{\top} \mathbf{K}_{i-1}^{-1} \mathbf{K}_{i-1}(x_i)}{r-(i-1)}\, \mu(d x_i), \qquad (30)
\]

where \(\mathbf{K}_{i-1} = [K(x_p,x_q)]_{p,q=1}^{i-1}\) and \(\mathbf{K}_{i-1}(x) = (K(x,x_1), \dots, K(x,x_{i-1}))^{\top}\).
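The Schur-complement form of the ratio in (30) is straightforward to evaluate numerically. The sketch below uses an illustrative rank-3 projection kernel built from orthonormal Fourier features on \([0, 1)\); this kernel is an assumption for the example, not part of the original text:

```python
import numpy as np

def conditional_ratio(K, x, previous):
    """Evaluate det(K_i)/det(K_{i-1}) = K(x,x) - k^T K_{i-1}^{-1} k
    via the Schur complement, where k = (K(x, x_1), ..., K(x, x_{i-1}))
    and `previous` holds the points sampled so far."""
    if not previous:
        return K(x, x)
    K_prev = np.array([[K(p, q) for q in previous] for p in previous])
    k = np.array([K(x, p) for p in previous])
    return K(x, x) - k @ np.linalg.solve(K_prev, k)

# Illustrative projection kernel of rank r = 3 on [0, 1), built from
# orthonormal Fourier features (an assumption for this sketch).
r = 3
def K(x, y):
    phi_x = np.array([1.0, np.sqrt(2) * np.cos(2 * np.pi * x),
                      np.sqrt(2) * np.sin(2 * np.pi * x)])
    phi_y = np.array([1.0, np.sqrt(2) * np.cos(2 * np.pi * y),
                      np.sqrt(2) * np.sin(2 * np.pi * y)])
    return phi_x @ phi_y

print(conditional_ratio(K, 0.3, []))     # = K(0.3, 0.3) = r, i.e. ~ 3
print(conditional_ratio(K, 0.3, [0.3]))  # already-selected point -> ratio ~ 0
```

The second call illustrates repulsiveness: the conditional density vanishes at points already in the sample.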

Important

The expression (28) indeed defines a probability distribution, with normalization constant \(r!\). In particular, this distribution is exchangeable, i.e., invariant to permutations of \(x_{1}, \dots, x_{r}\).

The successive ratios that appear in (29) and (30) are the normalized conditional densities (w.r.t. \(\mu\)) that drive the chain rule. The associated normalizing constants \(r-(i-1)\) are independent of the previous points.

Caution

The main difference from the finite case is that one needs to be able to sample from the conditionals that appear in (30). This can be done using a rejection sampling mechanism, where finding a good proposal density is a challenging but achievable problem, see, e.g., Multivariate Jacobi Ensemble.
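When the diagonal of the kernel is constant, \(K(x,x) = r\), the conditionals in (30) are dominated by the base measure itself, and a uniform proposal suffices. The sketch below runs the full chain rule with this rejection step; the rank-3 Fourier-feature kernel and the uniform proposal are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative projection kernel of rank r = 3 on [0, 1) with constant
# diagonal K(x, x) = r (an assumption for this sketch).
r = 3
def phi(x):
    return np.array([1.0, np.sqrt(2) * np.cos(2 * np.pi * x),
                     np.sqrt(2) * np.sin(2 * np.pi * x)])

def K(x, y):
    return phi(x) @ phi(y)

def sample_projection_dpp():
    """Chain rule with rejection: at step i, propose x ~ Uniform[0, 1)
    and accept with probability ratio(x) / r, where
    ratio(x) = det(K_i)/det(K_{i-1}) <= K(x, x) = r
    is the Schur complement appearing in (30)."""
    points = []
    while len(points) < r:
        x = rng.uniform()
        if points:
            K_prev = np.array([[K(p, q) for q in points] for p in points])
            k = np.array([K(x, p) for p in points])
            ratio = K(x, x) - k @ np.linalg.solve(K_prev, k)
        else:
            ratio = K(x, x)
        if rng.uniform() < ratio / r:  # rejection step
            points.append(x)
    return points

sample = sample_projection_dpp()
print(len(sample))  # a projection DPP has exactly r points
```

For kernels with non-constant diagonal, the same scheme works with any proposal density dominating \(K(x,x)\) up to a computable constant.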

See also

Algorithm 18 [HKPVirag06]

Projection DPPs: the chain rule in the finite case

Orthogonal Polynomial Ensembles and their specific samplers

Multivariate Jacobi Ensemble, whose `sample()` method relies on the chain rule described by (30)

## Approximate sampling¶

See also

Approximation of \(K(x,y)=K(x-y)\) by Fourier series [LMollerR12] Section 4