# Properties¶

Throughout this section, we assume $$\mathbf{K}$$ and $$\mathbf{L}$$ satisfy the sufficient conditions (3) and (4) respectively.

## Relation between correlation and likelihood kernels¶

1. Considering the DPP defined by $$\mathbf{L} \succeq 0_N$$, the associated correlation kernel $$\mathbf{K}$$ (1) can be derived as

(6)$\mathbf{K} = \mathbf{L}(I+\mathbf{L})^{—1} = I - (I+\mathbf{L})^{—1}.$

Theorem 2.2 [KT12].

2. Considering the DPP defined by $$0_N \preceq \mathbf{K} \prec I_N$$, the associated likelihood kernel $$\mathbf{L}$$ (2) can be derived as

(7)$\mathbf{L} = \mathbf{K}(I-\mathbf{K})^{—1} = -I + (I-\mathbf{K})^{—1}.$

Equation 25 [KT12].

Important

Thus, except for correlation kernels $$\mathbf{K}$$ with some eigenvalues equal to $$1$$, both $$\mathbf{K}$$ and $$\mathbf{L}$$ are diagonalizable in the same basis

(8)$\mathbf{K} = U \Lambda U^{\dagger}, \quad \mathbf{L} = U \Gamma U^{\dagger} \qquad \text{with} \qquad \lambda_n = \frac{\gamma_n}{1+\gamma_n}.$

Note

For DPPs with projection correlation kernel $$\mathbf{K}$$, the likelihood kernel $$\mathbf{L}$$ cannot be computed via (7), since $$\mathbf{K}$$ has at least one eigenvalue equal to $$1$$ ($$\mathbf{K}^2=\mathbf{K}$$).

Nevertheless, if you recall that the number of points of a projection DPP, then its likelihood reads

$\mathbb{P}[\mathcal{X}=S] = \det \mathbf{K}_S 1_{|S|=\operatorname{rank}(\mathbf{K})} \quad \forall S\subset [N].$
from numpy.random import randn, rand
from scipy.linalg import qr
from dppy.finite_dpps import FiniteDPP

r, N = 4, 10
eig_vals = rand(r)  # 0< <1
eig_vecs, _ = qr(randn(N, r), mode='economic')

DPP = FiniteDPP('correlation', **{'K_eig_dec': (eig_vals, eig_vecs)})
DPP.compute_L()

# - L (likelihood) kernel computed via:
# - eig_L = eig_K/(1-eig_K)
# - U diag(eig_L) U.T


## Generic DPPs as mixtures of projection DPPs¶

Projection DPPs are the building blocks of the model in the sense that generic DPPs are mixtures of projection DPPs.

Important

Consider $$\mathcal{X} \sim \operatorname{DPP}(\mathbf{K})$$ and write the spectral decomposition of the corresponding kernel as

$\mathbf{K} = \sum_{n=1}^N \lambda_n u_n u_n^{\dagger}.$

Then, denote $$\mathcal{X}^B \sim \operatorname{DPP}(\mathbf{K}^B)$$ with

$\mathbf{K}^B = \sum_{n=1}^N B_n u_n u_n^{\dagger}, \quad \text{where} \quad B_n \overset{\text{i.i.d.}}{\sim} \mathcal{B}er(\lambda_n),$

where $$\mathcal{X}^B$$ is obtained by first choosing $$B_1, \dots, B_N$$ independently and then sampling from $$\operatorname{DPP}(\mathbf{K}^B)$$ the DPP with orthogonal projection kernel $$\mathbf{K}^B$$.

Finally, we have $$\mathcal{X} \overset{d}{=} \mathcal{X}^B$$.

## Number of points¶

For projection DPPs, i.e., when $$\mathbf{K}$$ is an orthogonal projection matrix, one can show that $$|\mathcal{X}|=\operatorname{rank}(\mathbf{K})=\operatorname{Trace}(\mathbf{K})$$ almost surely (see, e.g., Lemma 17 of [HKPVirag06] or Lemma 2.7 of [KT12]).

In the general case, based on the fact that generic DPPs are mixtures of projection DPPs, we have

(9)$|\mathcal{X}| = \sum_{n=1}^N \operatorname{\mathcal{B}er} \left( \lambda_n \right) = \sum_{n=1}^N \operatorname{\mathcal{B}er} \left( \frac{\gamma_n}{1+\gamma_n} \right).$

Note

From (9) it is clear that $$|\mathcal{X}|\leq \operatorname{rank}(\mathbf{K})=\operatorname{rank}(\mathbf{L})$$.

### Expectation¶

(10)$\mathbb{E}[|\mathcal{X}|] = \operatorname{trace} \mathbf{K} = \sum_{n=1}^N \lambda_n = \sum_{n=1}^N \frac{\gamma_n}{1+\gamma_n}.$

The expected size of a DPP with likelihood matrix $$\mathbf{L}$$ is also related to the effective dimension $$d_{\text{eff}}(\mathbf{L}) = \operatorname{trace} (\mathbf{L}(\mathbf{L}+\mathbf{I})^{-1})= \operatorname{trace} \mathbf{K} = \mathbb{E}[|\mathcal{X}|]$$ of $$\mathbf{L}$$, a quantity with many applications in randomized numerical linear algebra and statistical learning theory (see e.g., [DerezinskiCV19]).

### Variance¶

(11)$\operatorname{\mathbb{V}ar}[|\mathcal{X}|] = \operatorname{trace} \mathbf{K} - \operatorname{trace} \mathbf{K}^2 = \sum_{n=1}^N \lambda_n(1-\lambda_n) = \sum_{n=1}^N \frac{\gamma_n}{(1+\gamma_n)^2}.$

Expectation and variance of Linear statistics.

import numpy as np
from scipy.linalg import qr
from dppy.finite_dpps import FiniteDPP

rng = np.random.RandomState(1)

r, N = 5, 10
eig_vals = rng.rand(r) # 0< <1
eig_vecs, _ = qr(rng.randn(N, r), mode='economic')

dpp_K = FiniteDPP('correlation', projection=False,
**{'K_eig_dec': (eig_vals, eig_vecs)})

nb_samples = 2000
for _ in range(nb_samples):
dpp_K.sample_exact(random_state=rng)

sizes = list(map(len, dpp_K.list_of_samples))
print('E[|X|]:\n emp={:.3f}, theo={:.3f}'
.format(np.mean(sizes), np.sum(eig_vals)))
print('Var[|X|]:\n emp={:.3f}, theo={:.3f}'
.format(np.var(sizes), np.sum(eig_vals*(1-eig_vals))))

E[|X|]:
emp=1.581, theo=1.587
Var[|X|]:
emp=0.795, theo=0.781


### Special cases¶

1. When the correlation kernel $$\mathbf{K}$$ (1) of $$\operatorname{DPP}(\mathbf{K})$$ is an orthogonal projection kernel, i.e., $$\operatorname{DPP}(\mathbf{K})$$ is a projection DPP, we have

(12)$|\mathcal{X}| = \operatorname{rank}(\mathbf{K}) = \operatorname{trace}(\mathbf{K}), \quad \text{almost surely}.$
import numpy as np
from scipy.linalg import qr
from dppy.finite_dpps import FiniteDPP

r, N = 4, 10
eig_vals = np.ones(r)
eig_vecs, _ = qr(rng.randn(N, r), mode='economic')

DPP = FiniteDPP('correlation', projection=True,
**{'K_eig_dec': (eig_vals, eig_vecs)})

for _ in range(1000):
DPP.sample_exact()

sizes = list(map(len, DPP.list_of_samples))
# np.array(DPP.list_of_samples).shape = (1000, 4)

assert([np.mean(sizes), np.var(sizes)] == [r, 0])


Important

Since $$|\mathcal{X}|=\operatorname{rank}(\mathbf{K})$$ points, almost surely, the likelihood of the projection $$\operatorname{DPP}(\mathbf{K})$$ reads

(13)$\mathbb{P}[\mathcal{X}=S] = \det \mathbf{K}_S 1_{|S|=\operatorname{rank} \mathbf{K}}.$

In other words, the projection DPP having for correlation kernel the orthogonal projection matrix $$\mathbf{K}$$ coincides with the k-DPP having likelihood kernel $$\mathbf{K}$$ when $$k=\operatorname{rank}(\mathbf{K})$$.

2. When the likelihood kernel $$\mathbf{L}$$ of $$\operatorname{DPP}(\mathbf{L})$$ (2) is an orthogonal projection kernel we have

(14)$|\mathcal{X}| \sim \operatorname{Binomial}(\operatorname{rank}(\mathbf{L}), 1/2).$
import numpy as np
from scipy.stats import binom, chisquare
from scipy.linalg import qr
import matplotlib.pyplot as plt
from dppy.finite_dpps import FiniteDPP

r, N = 5, 10
e_vals = np.ones(r)
e_vecs, _ = qr(np.random.randn(N, r), mode='economic')

dpp_L = FiniteDPP('likelihood',
projection=True,
**{'L_eig_dec': (e_vals, e_vecs)})

nb_samples = 1000
dpp_L.flush_samples
for _ in range(nb_samples):
dpp_L.sample_exact()

sizes = list(map(len, dpp_L.list_of_samples))

p = 0.5 # binomial parameter
rv = binom(r, p)

fig, ax = plt.subplots(1, 1)

x = np.arange(0, r + 1)

pdf = rv.pmf(x)
ax.plot(x, pdf,
'ro', ms=8,
label=r'pdf $Bin({}, {})$'.format(r, p))

hist = np.histogram(sizes, bins=np.arange(0, r + 2), density=True)[0]
ax.vlines(x, 0, hist,
colors='b', lw=5, alpha=0.5,
label='hist of sizes')

ax.legend(loc='best', frameon=False)

plt.title('p_value = {:.3f}'.format(chisquare(hist, pdf)[1]))
plt.show()


Fig. 5 Distribution of the numbe of points of $$\operatorname{DPP}(\mathbf{L})$$ with orthogonal projection kernel $$\mathbf{L}$$ with rank $$5$$.

## Geometrical insights¶

Kernels satisfying the sufficient conditions (3) and (4) can be expressed as

$\mathbf{K}_{ij} = \langle \phi_i, \phi_j \rangle \quad \text{and} \quad \mathbf{L}_{ij} = \langle \psi_i, \psi_j \rangle,$

where each item is represented by a feature vector $$\phi_i$$ (resp. $$\psi_i$$).

The geometrical view is then straightforward.

$\mathbb{P}[S\subset \mathcal{X}] = \det \mathbf{K}_S = \operatorname{Vol}^2 \{\phi_s\}_{s\in S}.$

$\mathbb{P}[\mathcal{X} = S] \propto \det \mathbf{L}_S = \operatorname{Vol}^2 \{\psi_s\}_{s\in S}.$

That is to say, DPPs favor subsets $$S$$ whose corresponding feature vectors span a large volume i.e. DPPs sample softened orthogonal bases.

## Diversity¶

The determinantal structure of DPPs encodes the notion of diversity. Deriving the pair inclusion probability, also called the 2-point correlation function using (1), we obtain

$\begin{split}\mathbb{P}[\{i, j\} \subset \mathcal{X}] &= \begin{vmatrix} \mathbb{P}[i \in \mathcal{X}] & \mathbf{K}_{i j}\\ \overline{\mathbf{K}_{i j}} & \mathbb{P}[j \in \mathcal{X}] \end{vmatrix}\\ &= \mathbb{P}[i \in \mathcal{X}] \mathbb{P}[j \in \mathcal{X}] - |\mathbf{K}_{i j}|^2,\end{split}$

so that, the larger $$|\mathbf{K}_{i j}|$$ less likely items $$i$$ and $$j$$ co-occur. If $$K_{ij}$$ models the similarity between items $$i$$ and $$j$$, DPPs are thus random diverse sets of elements.

## Conditioning¶

Like many other statistics of DPPs, the conditional probabilities can be expressed my means of a determinant and involve the correlation kernel $$\mathbf{K}$$ (1).

For any disjoint subsets $$S, T \subset [N]$$, i.e., such that $$S\cap T = \emptyset$$ we have

(15)$\mathbb{P}[T \subset \mathcal{X} \mid S \subset \mathcal{X}] = \det\left[\mathbf{K}_T - \mathbf{K}_{TS} \mathbf{K}_S^{-1} \mathbf{K}_{ST}\right],$
(16)$\mathbb{P}[T \subset \mathcal{X} \mid S \cap \mathcal{X} = \emptyset] = \det\left[\mathbf{K}_T - \mathbf{K}_{TS} (\mathbf{K}_S - I)^{-1} \mathbf{K}_{ST}\right].$