
Welcome to DPPy’s documentation!

Determinantal point processes (DPPs) are specific probability distributions over clouds of points, which have been popular as models or computational tools across physics, probability, statistics, random matrices, and more recently machine learning. DPPs are often used to induce diversity or repulsiveness among the points of a sample.

Sampling from DPPs is more tractable than sampling generic point processes with interaction, but it remains a nontrivial matter and a research area of its own.

As a contraction of DPPs and Python, DPPy is an effort to gather Python implementations of exact and approximate samplers for DPPs.

The purpose of this documentation is to both provide a quick survey of DPPs and relate each mathematical property with its implementation in DPPy. The documentation can thus be read in different ways:

  • if you read the sections in the order they appear, they will first take you through mathematical definitions and quick illustrations of how these definitions are encoded in DPPy.

  • for a more traditional library documentation, please refer to the corresponding API sections documenting the methods of each object, along with pointers to the mathematical definitions when needed.

  • you can also directly jump to the Jupyter notebooks, which showcase the use of some DPPy objects in more detail.

Gallery: a sample of the 2D Multivariate Jacobi ensemble; a histogram of the eigenvalues of the \(\beta=2\)-Laguerre ensemble (LUE); the correlation kernel \(\mathbf{K}\) of the DPP associated with the uniform measure on spanning trees of a graph; the graph associated with \(\mathbf{K}\).

Installation instructions

See the installation instructions on GitHub.

How to cite this work?

We wrote a companion paper to DPPy, which was accepted for publication in the MLOSS track of JMLR.

If you use this package, please consider citing it with this piece of BibTeX:

@article{GPBV19,
  author = {Gautier, Guillaume and Polito, Guillermo and Bardenet, R{\'{e}}mi and Valko, Michal},
  journal = {Journal of Machine Learning Research - Machine Learning Open Source Software (JMLR-MLOSS)},
  title = {{DPPy: DPP Sampling with Python}},
  keywords = {Computer Science - Machine Learning, Computer Science - Mathematical Software, Statistics - Machine Learning},
  url = {http://jmlr.org/papers/v20/19-179.html},
  year = {2019},
  archivePrefix = {arXiv},
  arxivId = {1809.07258},
  note = {Code at http://github.com/guilgautier/DPPy/ Documentation at http://dppy.readthedocs.io/}
}

Documentation contents

Finite DPPs

Definition

A finite point process \(\mathcal{X}\) on \([N] \triangleq \{1,\dots,N\}\) can be understood as a random subset. It is defined either via its:

  • inclusion probabilities (also called correlation functions)

    \[\mathbb{P}[S\subset \mathcal{X}], \text{ for } S\subset [N],\]
  • likelihood

    \[\mathbb{P}[\mathcal{X}=S], \text{ for } S\subset [N].\]

Hint

The determinantal feature of DPPs stems from the fact that such inclusion probabilities (resp. likelihoods) are given by the principal minors of the corresponding correlation kernel \(\mathbf{K}\) (resp. likelihood kernel \(\mathbf{L}\)).

Inclusion probabilities

We say that \(\mathcal{X} \sim \operatorname{DPP}(\mathbf{K})\) with correlation kernel a complex matrix \(\mathbf{K}\) if

(1)\[\mathbb{P}[S\subset \mathcal{X}] = \det \mathbf{K}_S, \quad \forall S\subset [N],\]

where \(\mathbf{K}_S = [\mathbf{K}_{ij}]_{i,j\in S}\), i.e., the square submatrix of \(\mathbf{K}\) obtained by keeping only the rows and columns indexed by \(S\).

Likelihood

We say that \(\mathcal{X} \sim \operatorname{DPP}(\mathbf{L})\) with likelihood kernel a complex matrix \(\mathbf{L}\) if

(2)\[\mathbb{P}[\mathcal{X}=S] = \frac{\det \mathbf{L}_S}{\det [I+\mathbf{L}]}, \quad \forall S\subset [N].\]
Existence

Some common sufficient conditions to guarantee existence are:

(3)\[\mathbf{K} = \mathbf{K}^{\dagger} \quad \text{and} \quad 0_N \preceq \mathbf{K} \preceq I_N,\]
(4)\[\mathbf{L} = \mathbf{L}^{\dagger} \quad \text{and} \quad \mathbf{L} \succeq 0_N,\]

where the dagger \(\dagger\) symbol means conjugate transpose.

Note

In the following, unless otherwise specified:

  • we work under the sufficient conditions (3) and (4),

  • \(\left(\lambda_{1}, \dots, \lambda_{N} \right)\) denote the eigenvalues of \(\mathbf{K}\),

  • \(\left(\gamma_{1}, \dots, \gamma_{N} \right)\) denote the eigenvalues of \(\mathbf{L}\).

# from numpy import sqrt
from numpy.random import rand, randn
from scipy.linalg import qr
from dppy.finite_dpps import FiniteDPP

r, N = 4, 10
e_vecs, _ = qr(randn(N, r), mode='economic')

# Inclusion K
e_vals_K = rand(r)  # in [0, 1]
dpp_K = FiniteDPP('correlation', **{'K_eig_dec': (e_vals_K, e_vecs)})
# or
# K = (e_vecs * e_vals_K).dot(e_vecs.T)
# dpp_K = FiniteDPP('correlation', **{'K': K})
dpp_K.plot_kernel()

# Likelihood L
e_vals_L = e_vals_K / (1.0 - e_vals_K)
dpp_L = FiniteDPP('likelihood', **{'L_eig_dec': (e_vals_L, e_vecs)})
# or
# L = (e_vecs * e_vals_L).dot(e_vecs.T)
# dpp_L = FiniteDPP('likelihood', **{'L': L})
# Phi = (e_vecs * sqrt(e_vals_L)).T
# dpp_L = FiniteDPP('likelihood', **{'L_gram_factor': Phi})  # L = Phi.T Phi
dpp_L.plot_kernel()

(Source code)

Heatmaps of the kernels defined above, produced by dpp_K.plot_kernel() and dpp_L.plot_kernel().

Projection DPPs

Important

DPPs defined by an orthogonal projection correlation kernel \(\mathbf{K}\) are called projection DPPs.

Recall that orthogonal projection matrices are notably characterized by

  1. \(\mathbf{K}^2=\mathbf{K}\) and \(\mathbf{K}^{\dagger}=\mathbf{K}\),

  2. or equivalently by \(\mathbf{K}=U U^{\dagger}\) with \(U^{\dagger} U=I_r\) where \(r=\operatorname{rank}(\mathbf{K})\).

They are indeed valid kernels since they meet the above sufficient conditions: they are Hermitian with eigenvalues \(0\) or \(1\).

from numpy import ones
from numpy.random import randn
from scipy.linalg import qr
from dppy.finite_dpps import FiniteDPP

r, N = 4, 10

eig_vals = ones(r)
A = randn(r, N)
eig_vecs, _ = qr(A.T, mode='economic')

proj_DPP = FiniteDPP('correlation', projection=True,
                     **{'K_eig_dec': (eig_vals, eig_vecs)})
# or
# proj_DPP = FiniteDPP('correlation', projection=True, **{'A_zono': A})
# K = eig_vecs.dot(eig_vecs.T)
# proj_DPP = FiniteDPP('correlation', projection=True, **{'K': K})
k-DPPs

A \(k\!\operatorname{-DPP}\) can be defined as \(\operatorname{DPP}(\mathbf{L})\) (2) conditioned on a fixed sample size \(|\mathcal{X}|=k\); we denote it \(k\!\operatorname{-DPP}(\mathbf{L})\).

It is naturally defined through its joint probabilities

(5)\[\mathbb{P}_{k\!\operatorname{-DPP}}[\mathcal{X}=S] = \frac{1}{e_k(L)} \det \mathbf{L}_S 1_{|S|=k},\]

where the normalizing constant \(e_k(L)\) is the elementary symmetric polynomial of order \(k\) evaluated at the eigenvalues of \(\mathbf{L}\),

\[\begin{split}e_k(\mathbf{L}) \triangleq e_k(\gamma_1, \dots, \gamma_N) = \sum_{\substack{S \subset [N]\\|S|=k}} \prod_{s\in S} \gamma_{s} = \sum_{\substack{S \subset [N]\\|S|=k}} \det L_S.\end{split}\]

Note

Obviously, one must take \(k \leq \operatorname{rank}(L)\) otherwise \(\det \mathbf{L}_S = 0\) for \(|S| = k > \operatorname{rank}(L)\).

Warning

k-DPPs are not DPPs in general. Viewed as a \(\operatorname{DPP}\) conditioned on a fixed sample size \(|\mathcal{X}|=k\), the only case where the two coincide is when the original DPP is a projection \(\operatorname{DPP}(\mathbf{K})\) and \(k=\operatorname{rank}(\mathbf{K})\), see (13).

Properties

Throughout this section, we assume \(\mathbf{K}\) and \(\mathbf{L}\) satisfy the sufficient conditions (3) and (4) respectively.

Relation between correlation and likelihood kernels
  1. Considering the DPP defined by \(\mathbf{L} \succeq 0_N\), the associated correlation kernel \(\mathbf{K}\) (1) can be derived as

    (6)\[\mathbf{K} = \mathbf{L}(I+\mathbf{L})^{-1} = I - (I+\mathbf{L})^{-1}.\]

    See also

    Theorem 2.2 [KT12].

  2. Considering the DPP defined by \(0_N \preceq \mathbf{K} \prec I_N\), the associated likelihood kernel \(\mathbf{L}\) (2) can be derived as

    (7)\[\mathbf{L} = \mathbf{K}(I-\mathbf{K})^{-1} = -I + (I-\mathbf{K})^{-1}.\]

    See also

    Equation 25 [KT12].

Important

Thus, except for correlation kernels \(\mathbf{K}\) with some eigenvalues equal to \(1\), both \(\mathbf{K}\) and \(\mathbf{L}\) are diagonalizable in the same basis

(8)\[\mathbf{K} = U \Lambda U^{\dagger}, \quad \mathbf{L} = U \Gamma U^{\dagger} \qquad \text{with} \qquad \lambda_n = \frac{\gamma_n}{1+\gamma_n}.\]
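
A quick numerical sanity check of these relations with NumPy (random placeholder kernel, not DPPy code): build \(\mathbf{L}\) from a random eigendecomposition and verify that \(\mathbf{L}(I+\mathbf{L})^{-1}\) has the same eigenvectors, with eigenvalues \(\gamma_n/(1+\gamma_n)\).

import numpy as np
from scipy.linalg import qr

r, N = 4, 10
eig_vecs, _ = qr(np.random.randn(N, r), mode='economic')
gamma = np.random.rand(r)  # eigenvalues of L

L = (eig_vecs * gamma).dot(eig_vecs.T)
K = L.dot(np.linalg.inv(np.eye(N) + L))  # K = L (I + L)^{-1}
K_bis = (eig_vecs * (gamma / (1.0 + gamma))).dot(eig_vecs.T)

print(np.allclose(K, K_bis))  # True: same eigenvectors, eigenvalues gamma / (1 + gamma)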

Note

For DPPs with projection correlation kernel \(\mathbf{K}\), the likelihood kernel \(\mathbf{L}\) cannot be computed via (7), since \(\mathbf{K}\) has at least one eigenvalue equal to \(1\) (\(\mathbf{K}^2=\mathbf{K}\)).

Nevertheless, recalling that the number of points of a projection DPP is \(\operatorname{rank}(\mathbf{K})\) almost surely, its likelihood reads

\[\mathbb{P}[\mathcal{X}=S] = \det \mathbf{K}_S 1_{|S|=\operatorname{rank}(\mathbf{K})} \quad \forall S\subset [N].\]
from numpy.random import randn, rand
from scipy.linalg import qr
from dppy.finite_dpps import FiniteDPP

r, N = 4, 10
eig_vals = rand(r)  # 0< <1
eig_vecs, _ = qr(randn(N, r), mode='economic')

DPP = FiniteDPP('correlation', **{'K_eig_dec': (eig_vals, eig_vecs)})
DPP.compute_L()

# - L (likelihood) kernel computed via:
# - eig_L = eig_K/(1-eig_K)
# - U diag(eig_L) U.T
Generic DPPs as mixtures of projection DPPs

Projection DPPs are the building blocks of the model in the sense that generic DPPs are mixtures of projection DPPs.

Important

Consider \(\mathcal{X} \sim \operatorname{DPP}(\mathbf{K})\) and write the spectral decomposition of the corresponding kernel as

\[\mathbf{K} = \sum_{n=1}^N \lambda_n u_n u_n^{\dagger}.\]

Then, denote \(\mathcal{X}^B \sim \operatorname{DPP}(\mathbf{K}^B)\) with

\[\mathbf{K}^B = \sum_{n=1}^N B_n u_n u_n^{\dagger}, \quad \text{where} \quad B_n \overset{\text{i.i.d.}}{\sim} \mathcal{B}er(\lambda_n),\]

where \(\mathcal{X}^B\) is obtained by first choosing \(B_1, \dots, B_N\) independently and then sampling from \(\operatorname{DPP}(\mathbf{K}^B)\) the DPP with orthogonal projection kernel \(\mathbf{K}^B\).

Finally, we have \(\mathcal{X} \overset{d}{=} \mathcal{X}^B\).
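
A minimal NumPy sketch of this mixture representation (random placeholder kernel; the spectral method presented later implements this mechanism within DPPy):

import numpy as np
from scipy.linalg import qr

r, N = 4, 10
U, _ = qr(np.random.randn(N, r), mode='economic')
eig_vals = np.random.rand(r)              # eigenvalues of K, in [0, 1]

B = np.random.rand(r) < eig_vals          # B_n ~ Ber(lambda_n), independently
K_B = U[:, B].dot(U[:, B].T)              # orthogonal projection kernel K^B
# Sampling DPP(K) then amounts to sampling from the projection DPP(K^B)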

Number of points

For projection DPPs, i.e., when \(\mathbf{K}\) is an orthogonal projection matrix, one can show that \(|\mathcal{X}|=\operatorname{rank}(\mathbf{K})=\operatorname{Trace}(\mathbf{K})\) almost surely (see, e.g., Lemma 17 of [HKPVirag06] or Lemma 2.7 of [KT12]).

In the general case, based on the fact that generic DPPs are mixtures of projection DPPs, we have

(9)\[|\mathcal{X}| = \sum_{n=1}^N \operatorname{\mathcal{B}er} \left( \lambda_n \right) = \sum_{n=1}^N \operatorname{\mathcal{B}er} \left( \frac{\gamma_n}{1+\gamma_n} \right).\]

Note

From (9) it is clear that \(|\mathcal{X}|\leq \operatorname{rank}(\mathbf{K})=\operatorname{rank}(\mathbf{L})\).

Expectation
(10)\[\mathbb{E}[|\mathcal{X}|] = \operatorname{trace} \mathbf{K} = \sum_{n=1}^N \lambda_n = \sum_{n=1}^N \frac{\gamma_n}{1+\gamma_n}.\]
Variance
(11)\[\operatorname{\mathbb{V}ar}[|\mathcal{X}|] = \operatorname{trace} \mathbf{K} - \operatorname{trace} \mathbf{K}^2 = \sum_{n=1}^N \lambda_n(1-\lambda_n) = \sum_{n=1}^N \frac{\gamma_n}{(1+\gamma_n)^2}.\]

See also

Expectation and variance of Linear statistics.

import numpy as np
from scipy.linalg import qr
from dppy.finite_dpps import FiniteDPP

rng = np.random.RandomState(1)

r, N = 5, 10
eig_vals = rng.rand(r) # 0< <1
eig_vecs, _ = qr(rng.randn(N, r), mode='economic')

dpp_K = FiniteDPP('correlation', projection=False,
                **{'K_eig_dec': (eig_vals, eig_vecs)})

nb_samples = 2000
for _ in range(nb_samples):
    dpp_K.sample_exact(random_state=rng)

sizes = list(map(len, dpp_K.list_of_samples))
print('E[|X|]:\n emp={:.3f}, theo={:.3f}'
      .format(np.mean(sizes), np.sum(eig_vals)))
print('Var[|X|]:\n emp={:.3f}, theo={:.3f}'
      .format(np.var(sizes), np.sum(eig_vals*(1-eig_vals))))
E[|X|]:
 emp=1.581, theo=1.587
Var[|X|]:
 emp=0.795, theo=0.781
Special cases
  1. When the correlation kernel \(\mathbf{K}\) (1) of \(\operatorname{DPP}(\mathbf{K})\) is an orthogonal projection kernel, i.e., \(\operatorname{DPP}(\mathbf{K})\) is a projection DPP, we have

    (12)\[|\mathcal{X}| = \operatorname{rank}(\mathbf{K}) = \operatorname{trace}(\mathbf{K}), \quad \text{almost surely}.\]
    import numpy as np
    from scipy.linalg import qr
    from dppy.finite_dpps import FiniteDPP
    
    r, N = 4, 10
    eig_vals = np.ones(r)
    eig_vecs, _ = qr(np.random.randn(N, r), mode='economic')
    
    DPP = FiniteDPP('correlation', projection=True,
                    **{'K_eig_dec': (eig_vals, eig_vecs)})
    
    for _ in range(1000):
        DPP.sample_exact()
    
    sizes = list(map(len, DPP.list_of_samples))
    # np.array(DPP.list_of_samples).shape = (1000, 4)
    
    assert([np.mean(sizes), np.var(sizes)] == [r, 0])
    

    Important

    Since \(\mathcal{X}\) contains \(\operatorname{rank}(\mathbf{K})\) points almost surely, the likelihood of the projection \(\operatorname{DPP}(\mathbf{K})\) reads

    (13)\[\mathbb{P}[\mathcal{X}=S] = \det \mathbf{K}_S 1_{|S|=\operatorname{rank} \mathbf{K}}.\]

    In other words, the projection DPP having for correlation kernel the orthogonal projection matrix \(\mathbf{K}\) coincides with the k-DPP having likelihood kernel \(\mathbf{K}\) when \(k=\operatorname{rank}(\mathbf{K})\).

  2. When the likelihood kernel \(\mathbf{L}\) of \(\operatorname{DPP}(\mathbf{L})\) (2) is an orthogonal projection kernel we have

    (14)\[|\mathcal{X}| \sim \operatorname{Binomial}(\operatorname{rank}(\mathbf{L}), 1/2).\]
    import numpy as np
    from scipy.stats import binom, chisquare
    from scipy.linalg import qr
    import matplotlib.pyplot as plt
    from dppy.finite_dpps import FiniteDPP
    
    
    r, N = 5, 10
    e_vals = np.ones(r)
    e_vecs, _ = qr(np.random.randn(N, r), mode='economic')
    
    dpp_L = FiniteDPP('likelihood',
                      projection=True,
                      **{'L_eig_dec': (e_vals, e_vecs)})
    
    nb_samples = 1000
    dpp_L.flush_samples()
    for _ in range(nb_samples):
        dpp_L.sample_exact()
    
    sizes = list(map(len, dpp_L.list_of_samples))
    
    p = 0.5 # binomial parameter
    rv = binom(r, p)
    
    fig, ax = plt.subplots(1, 1)
    
    x = np.arange(0, r + 1)
    
    pdf = rv.pmf(x)
    ax.plot(x, pdf,
            'ro', ms=8,
            label=r'pdf $Bin({}, {})$'.format(r, p))
    
    hist = np.histogram(sizes, bins=np.arange(0, r + 2), density=True)[0]
    ax.vlines(x, 0, hist,
              colors='b', lw=5, alpha=0.5,
              label='hist of sizes')
    
    ax.legend(loc='best', frameon=False)
    
    plt.title('p_value = {:.3f}'.format(chisquare(hist, pdf)[1]))
    plt.show()
    

    (Source code, png, hires.png, pdf)

    Distribution of the number of points of \(\operatorname{DPP}(\mathbf{L})\) with orthogonal projection kernel \(\mathbf{L}\) of rank \(5\).

Geometrical insights

Kernels satisfying the sufficient conditions (3) and (4) can be expressed as

\[\mathbf{K}_{ij} = \langle \phi_i, \phi_j \rangle \quad \text{and} \quad \mathbf{L}_{ij} = \langle \psi_i, \psi_j \rangle,\]

where each item is represented by a feature vector \(\phi_i\) (resp. \(\psi_i\)).

The geometrical view is then straightforward.

  1. The inclusion probabilities read

    \[\mathbb{P}[S\subset \mathcal{X}] = \det \mathbf{K}_S = \operatorname{Vol}^2 \{\phi_s\}_{s\in S}.\]
  2. The likelihood reads

    \[\mathbb{P}[\mathcal{X} = S] \propto \det \mathbf{L}_S = \operatorname{Vol}^2 \{\psi_s\}_{s\in S}.\]

That is to say, DPPs favor subsets \(S\) whose corresponding feature vectors span a large volume, i.e., DPPs sample softened orthogonal bases.
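
As a small illustration of this geometrical picture (random feature vectors and an arbitrary subset, chosen only for the example), one can check with NumPy that \(\det \mathbf{L}_S\) is the squared volume of the parallelotope spanned by \(\{\psi_s\}_{s\in S}\), computed here from a QR decomposition:

import numpy as np

d, N = 5, 10
Psi = np.random.randn(d, N)           # feature vectors psi_1, ..., psi_N as columns
L = Psi.T.dot(Psi)

S = [1, 4, 8]
R = np.linalg.qr(Psi[:, S], mode='r')
vol_squared = np.prod(np.diag(R)**2)  # squared volume spanned by {psi_s, s in S}

print(np.allclose(np.linalg.det(L[np.ix_(S, S)]), vol_squared))  # True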

Diversity

The determinantal structure of DPPs encodes the notion of diversity. Deriving the pair inclusion probability, also called the 2-point correlation function, using (1), we obtain

\[\begin{split}\mathbb{P}[\{i, j\} \subset \mathcal{X}] &= \begin{vmatrix} \mathbb{P}[i \in \mathcal{X}] & \mathbf{K}_{i j}\\ \overline{\mathbf{K}_{i j}} & \mathbb{P}[j \in \mathcal{X}] \end{vmatrix}\\ &= \mathbb{P}[i \in \mathcal{X}] \mathbb{P}[j \in \mathcal{X}] - |\mathbf{K}_{i j}|^2,\end{split}\]

so that the larger \(|\mathbf{K}_{i j}|\), the less likely items \(i\) and \(j\) are to co-occur. If \(\mathbf{K}_{ij}\) models the similarity between items \(i\) and \(j\), DPPs are thus random sets of diverse elements.

Conditioning

Like many other statistics of DPPs, conditional probabilities can be expressed by means of determinants involving the correlation kernel \(\mathbf{K}\) (1).

For any disjoint subsets \(S, T \subset [N]\), i.e., such that \(S\cap T = \emptyset\), we have

(15)\[\mathbb{P}[T \subset \mathcal{X} \mid S \subset \mathcal{X}] = \det\left[\mathbf{K}_T - \mathbf{K}_{TS} \mathbf{K}_S^{-1} \mathbf{K}_{ST}\right],\]
(16)\[\mathbb{P}[T \subset \mathcal{X} \mid S \cap \mathcal{X} = \emptyset] = \det\left[\mathbf{K}_T - \mathbf{K}_{TS} (\mathbf{K}_S - I)^{-1} \mathbf{K}_{ST}\right].\]
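
Formula (15) follows from \(\mathbb{P}[T \subset \mathcal{X} \mid S \subset \mathcal{X}] = \mathbb{P}[S \cup T \subset \mathcal{X}] / \mathbb{P}[S \subset \mathcal{X}]\) combined with the Schur complement identity for determinants. A minimal NumPy check on a random kernel (indices chosen arbitrarily for the example):

import numpy as np
from scipy.linalg import qr

r, N = 6, 10
U, _ = qr(np.random.randn(N, r), mode='economic')
K = (U * np.random.rand(r)).dot(U.T)     # a valid correlation kernel

S, T = [0, 1], [4, 7]                    # arbitrary disjoint subsets
ST = S + T

ratio = np.linalg.det(K[np.ix_(ST, ST)]) / np.linalg.det(K[np.ix_(S, S)])
schur = K[np.ix_(T, T)] - K[np.ix_(T, S)].dot(
    np.linalg.inv(K[np.ix_(S, S)])).dot(K[np.ix_(S, T)])

print(np.allclose(ratio, np.linalg.det(schur)))  # True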

Exact sampling

Consider a finite DPP defined by its correlation kernel \(\mathbf{K}\) (1) or likelihood kernel \(\mathbf{L}\) (2). There exist three main types of exact sampling procedures:

  1. The spectral method requires the eigendecomposition of the correlation kernel \(\mathbf{K}\) or the likelihood kernel \(\mathbf{L}\). It is based on the fact that generic DPPs are mixtures of projection DPPs together with the application of the chain rule to sample projection DPPs. It is presented in Section Spectral method.

  2. A Cholesky-based procedure which requires the correlation kernel \(\mathbf{K}\) (even non-Hermitian!). It boils down to applying the chain rule on sets, where each item in turn is decided to be included in or excluded from the sample. It is presented in Section Cholesky-based method.

  3. Recently, [DerezinskiCV19] have also proposed an alternative method to get exact samples: first sample an intermediate distribution and correct the bias by thinning the intermediate sample using a carefully designed DPP. It is presented in Section Intermediate sampling method.

Note

  • There exist specific samplers for special DPPs, like the ones presented in Section Exotic DPPs.

Important

In the next section, we describe Algorithm 18 of [HKPVirag06], based on the chain rule, which was originally designed to sample continuous projection DPPs. It has naturally found an application in the finite setting for sampling projection \(\operatorname{DPP}(\mathbf{K})\). However, we insist on the fact that this chain rule mechanism is specific to orthogonal projection kernels. In particular, it cannot be applied blindly to sample a general \(k\!\operatorname{-DPP}(\mathbf{L})\), but it is valid when \(\mathbf{L}\) is an orthogonal projection kernel.

This crucial point is developed in the following Caution section.

Projection DPPs: the chain rule

In this section, we describe the generic projection DPP sampler, originally derived by [HKPVirag06] Algorithm 18.

Recall that the number of points of a projection \(\operatorname{DPP}(\mathbf{K})\) is, almost surely, equal to \(r=\operatorname{rank}(\mathbf{K})\), so that the likelihood (13) of \(S=\{s_1, \dots, s_r\}\) reads

\[\mathbb{P}[\mathcal{X}=S] = \det \mathbf{K}_S.\]

Using the invariance by permutation of the determinant and the fact that \(\mathbf{K}\) is an orthogonal projection matrix, it is sufficient to apply the chain rule to sample \((s_1, \dots, s_r)\) with joint distribution

\[\mathbb{P}[(s_1, \dots, s_r)] = \frac{1}{r!} \mathbb{P}[\mathcal{X}=\{s_1, \dots, s_r\}] = \frac{1}{r!} \det \mathbf{K}_S,\]

and forget about the sequential feature of the chain rule to get a valid sample \(\{s_1, \dots, s_r\} \sim \operatorname{DPP}(\mathbf{K})\).

Considering \(S=\{s_1, \dots, s_r\}\) such that \(\mathbb{P}[\mathcal{X}=S] = \det \mathbf{K}_S > 0\), the following generic formulation of the chain rule

\[\mathbb{P}[(s_1, \dots, s_r)] = \mathbb{P}[s_1] \prod_{i=2}^{r} \mathbb{P}[s_{i} | s_{1:i-1}],\]

can be expressed as a telescopic ratio of determinants

(17)\[\mathbb{P}[(s_1, \dots, s_r)] = \dfrac{\mathbf{K}_{s_1,s_1}}{r} \prod_{i=2}^{r} \dfrac{1}{r-(i-1)} \frac{\det \mathbf{K}_{S_i}} {\det \mathbf{K}_{S_{i-1}}},\]

where \(S_{i-1} = \{s_{1}, \dots, s_{i-1}\}\).

Using Woodbury's formula, the ratios of determinants in (17) can be expanded into

(18)\[\mathbb{P}[(s_1, \dots, s_r)] = \dfrac{\mathbf{K}_{s_1,s_1}}{r} \prod_{i=2}^{r} \dfrac{ \mathbf{K}_{s_i, s_i} - \mathbf{K}_{s_i, S_{i-1}} {\mathbf{K}_{S_{i-1}}}^{-1} \mathbf{K}_{S_{i-1}, s_i} }{r-(i-1)}.\]
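
To make the mechanism concrete, here is an unoptimized NumPy sketch of the chain rule (18) for an orthogonal projection kernel \(\mathbf{K}\) (an illustration only; DPPy's 'GS' and 'Schur' modes used below are optimized implementations of this idea):

import numpy as np

def chain_rule_projection_dpp(K, rng=np.random):
    # Sketch of the chain rule (18) for an orthogonal projection kernel K
    N = K.shape[0]
    r = int(round(np.trace(K)))  # rank(K) = trace(K) for a projection kernel
    sample = []

    for _ in range(r):
        if sample:
            K_S_inv = np.linalg.inv(K[np.ix_(sample, sample)])
            # Schur complements K_ii - K_{i,S} K_S^{-1} K_{S,i}, cf. (18)
            cond = np.diag(K) - np.einsum(
                'ij,jk,ik->i', K[:, sample], K_S_inv, K[:, sample])
            cond[sample] = 0.0  # already selected items cannot be picked again
        else:
            cond = np.diag(K).copy()
        cond = np.clip(cond, 0.0, None)
        sample.append(rng.choice(N, p=cond / cond.sum()))

    return sample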

Hint

MLers will recognize in (18) the incremental posterior variance of the Gaussian Process (GP) associated to \(\mathbf{K}\), see [RW06] Equation 2.26.

Caution

The connection between the chain rule (18) and Gaussian Processes is valid in the case where the GP kernel is an orthogonal projection kernel, see also the Caution section below.

Geometrical interpretation

Hint

Since \(\mathbf{K}\) is an orthogonal projection matrix, the following Gram factorizations provide an insightful geometrical interpretation of the chain rule mechanism (17):

  1. Using \(\mathbf{K} = \mathbf{K}^2\) and \(\mathbf{K}^{\dagger}=\mathbf{K}\), we have \(\mathbf{K} = \mathbf{K} \mathbf{K}^{\dagger}\), so that the chain rule (17) becomes

(19)\[\begin{split}\mathbb{P}[(s_1, \dots, s_r)] &= \frac{1}{r!} \operatorname{Volume}^2( \mathbf{K}_{s_{1},:}, \dots, \mathbf{K}_{s_{r},:} )\\ &= \dfrac{\left\| \mathbf{K}_{s_1,:} \right\|^2}{r} \prod_{i=2}^{r} \dfrac{ \operatorname{distance}^2 \left(\mathbf{K}_{s_{i},:}, \operatorname{Span} \left\{ \mathbf{K}_{s_{1},:}, \dots, \mathbf{K}_{s_{i-1},:} \right\} \right) }{r-(i-1)}.\end{split}\]
  2. Using the eigendecomposition, we can write \(\mathbf{K} = U U^{\dagger}\) where \(U^{\dagger} U = I_r\), so that the chain rule (17) becomes

(20)\[\begin{split}\mathbb{P}[(s_1, \dots, s_r)] &= \frac{1}{r!} \operatorname{Volume}^2( U_{s_{1},:}, \dots, U_{s_{r},:} )\\ &= \dfrac{\left\| U_{s_1,:} \right\|^2}{r} \prod_{i=2}^{r} \dfrac{ \operatorname{distance}^2 \left(U_{s_{i},:}, \operatorname{Span} \left\{ U_{s_{1},:}, \dots, U_{s_{i-1},:} \right\} \right) }{r-(i-1)}.\end{split}\]

In other words, the chain rule formulated as (19) and (20) is akin to performing Gram-Schmidt orthogonalization of the feature vectors \(\mathbf{K}_{i,:}\) or \(U_{i,:}\). These formulations can also be understood as an application of the base \(\times\) height formula. In the end, projection DPPs favor sets of \(r=\operatorname{rank}(\mathbf{K})\) items whose feature vectors span large volumes. This is another way of understanding diversity.

See also

MCMC samplers

In practice

The cost of getting one sample from a projection DPP is of order \(\mathcal{O}(N\operatorname{rank}(\mathbf{K})^2)\), whenever \(\operatorname{DPP}(\mathbf{K})\) is defined through

  • \(\mathbf{K}\) itself; sampling relies on formulations (19) or (18)

    import numpy as np
    from scipy.linalg import qr
    from dppy.finite_dpps import FiniteDPP
    
    seed = 0
    rng = np.random.RandomState(seed)
    
    r, N = 4, 10
    eig_vals = np.ones(r)  # For projection DPP
    eig_vecs, _ = qr(rng.randn(N, r), mode='economic')
    
    DPP = FiniteDPP(kernel_type='correlation',
                    projection=True,
                    **{'K': (eig_vecs * eig_vals).dot(eig_vecs.T)})
    
    for mode in ('GS', 'Schur', 'Chol'):  # default: GS
    
        rng = np.random.RandomState(seed)
        DPP.flush_samples()
    
        for _ in range(10):
            DPP.sample_exact(mode=mode, random_state=rng)
    
        print(DPP.sampling_mode)
        print(DPP.list_of_samples)
    
    GS
    [[5, 7, 2, 1], [4, 6, 2, 9], [9, 2, 6, 4], [5, 9, 0, 1], [0, 8, 6, 7], [9, 6, 2, 7], [0, 6, 2, 9], [5, 2, 1, 8], [5, 4, 0, 8], [5, 6, 9, 1]]
    Schur
    [[5, 7, 2, 1], [4, 6, 2, 9], [9, 2, 6, 4], [5, 9, 0, 1], [0, 8, 6, 7], [9, 6, 2, 7], [0, 6, 2, 9], [5, 2, 1, 8], [5, 4, 0, 8], [5, 6, 9, 1]]
    Chol
    [[5, 7, 6, 0], [4, 6, 5, 7], [9, 5, 0, 1], [5, 9, 2, 4], [0, 8, 1, 7], [9, 0, 5, 1], [0, 6, 5, 9], [5, 0, 1, 9], [5, 0, 2, 8], [5, 6, 9, 1]]
    

    See also

    • sample_exact()

    • [HKPVirag06] Theorem 7, Algorithm 18 and Proposition 19, for the original idea

    • [Pou19] Algorithm 3, for the equivalent Cholesky-based perspective with cost of order \(\mathcal{O}(N \operatorname{rank}(\mathbf{K})^2)\)

  • its eigenvectors \(U\), i.e., \(\mathbf{K}=U U^{\dagger}\) with \(U^{\dagger}U = I_{\operatorname{rank}(\mathbf{K})}\); sampling relies on (20)

    import numpy as np
    from scipy.linalg import qr
    from dppy.finite_dpps import FiniteDPP
    
    seed = 0
    rng = np.random.RandomState(seed)
    
    r, N = 4, 10
    eig_vals = np.ones(r)  # For projection DPP
    eig_vecs, _ = qr(rng.randn(N, r), mode='economic')
    
    DPP = FiniteDPP(kernel_type='correlation',
                    projection=True,
                    **{'K_eig_dec': (eig_vals, eig_vecs)})

    rng = np.random.RandomState(seed)

    for _ in range(10):
        # mode='GS': Gram-Schmidt (default)
        DPP.sample_exact(mode='GS', random_state=rng)
    
    print(DPP.list_of_samples)
    
    [[5, 7, 2, 1], [4, 6, 2, 9], [9, 2, 6, 4], [5, 9, 0, 1], [0, 8, 6, 7], [9, 6, 2, 7], [0, 6, 2, 9], [5, 2, 1, 8], [5, 4, 0, 8], [5, 6, 9, 1]]
    

    See also

    • sample_exact()

    • [HKPVirag06] Theorem 7, Algorithm 18 and Proposition 19, for the original idea

    • [KT12] Algorithm 1, for a first interpretation of the spectral counterpart of [HKPVirag06] Algorithm 18 running in \(\mathcal{O}(N \operatorname{rank}(\mathbf{K})^3)\)

    • [Gil14] Algorithm 2, for the \(\mathcal{O}(N \operatorname{rank}(\mathbf{K})^2)\) implementation

    • [TBA18] Algorithm 3, for a technical report on DPP sampling

Spectral method
Main idea

The procedure stems from Theorem 7 of [HKPVirag06], i.e., the fact that generic DPPs are mixtures of projection DPPs, which suggests the following two-step algorithm, given the spectral decomposition of the correlation kernel \(\mathbf{K}\)

\[\mathbf{K} = U \Lambda U^{\dagger} = \sum_{n=1}^{N} \lambda_n u_n u_n^{\dagger}.\]

Step 1. Draw independent Bernoulli random variables \(B_n \sim \operatorname{\mathcal{B}er}(\lambda_n)\) for \(n=1,\dots, N\) and collect \(\mathcal{B}=\left\{ n ~;~ B_n=1 \right\}\),

Step 2. Sample from the projection DPP with correlation kernel \(U_{:\mathcal{B}} {U_{:\mathcal{B}}}^{\dagger} = \sum_{n\in \mathcal{B}} u_n u_n^{\dagger}\), see the section above.

Note

Step 1. selects a component of the mixture while Step 2. requires sampling from the corresponding projection DPP, cf. Projection DPPs: the chain rule

In practice
  • Sampling a projection \(\operatorname{DPP}(\mathbf{K})\) from the eigendecomposition of \(\mathbf{K}=U U^{\dagger}\) with \(U^{\dagger}U = I_{\operatorname{rank}(\mathbf{K})}\) was presented in the section above

  • Sampling \(\operatorname{DPP}(\mathbf{K})\) from \(0_N \preceq\mathbf{K} \preceq I_N\) can be done by following

    Step 0. compute the eigendecomposition of \(\mathbf{K} = U \Lambda U^{\dagger}\) in \(\mathcal{O}(N^3)\).

    Step 1. draw independent Bernoulli random variables \(B_n \sim \operatorname{\mathcal{B}er}(\lambda_n)\) for \(n=1,\dots, N\) and collect \(\mathcal{B}=\left\{ n ~;~ B_n=1 \right\}\)

    Step 2. sample from the projection DPP with correlation kernel defined by its eigenvectors \(U_{:, \mathcal{B}}\)

    Important

    Step 0. must be performed once and for all in \(\mathcal{O}(N^3)\). Then the average cost of getting one sample by applying Steps 1. and 2. is \(\mathcal{O}(N \mathbb{E}\left[|\mathcal{X}|\right]^2)\), where \(\mathbb{E}\left[|\mathcal{X}|\right]=\operatorname{trace}(\mathbf{K})=\sum_{n=1}^{N} \lambda_n\).

    from numpy.random import RandomState
    from scipy.linalg import qr
    from dppy.finite_dpps import FiniteDPP
    
    rng = RandomState(0)
    
    r, N = 4, 10
    eig_vals = rng.rand(r)  # eigenvalues in (0, 1): not a projection DPP
    eig_vecs, _ = qr(rng.randn(N, r), mode='economic')

    DPP = FiniteDPP(kernel_type='correlation',
                    projection=False,
                    **{'K': (eig_vecs*eig_vals).dot(eig_vecs.T)})

    for _ in range(10):
        # mode='GS': Gram-Schmidt (default)
        DPP.sample_exact(mode='GS', random_state=rng)
    
    print(DPP.list_of_samples)
    
    [[7, 0, 1, 4], [6], [0, 9], [0, 9], [8, 5], [9], [6, 5, 9], [9], [3, 0], [5, 1, 6]]
    
  • Sampling \(\operatorname{DPP}(\mathbf{L})\) from \(\mathbf{L} \succeq 0_N\) can be done by following

    Step 0. compute the eigendecomposition of \(\mathbf{L} = V \Gamma V^{\dagger}\) in \(\mathcal{O}(N^3)\).

    Step 1. is adapted to: draw independent Bernoulli random variables \(B_n \sim \operatorname{\mathcal{B}er}(\frac{\gamma_n}{1+\gamma_n})\) for \(n=1,\dots, N\) and collect \(\mathcal{B}=\left\{ n ~;~ B_n=1 \right\}\)

    Step 2. is adapted to: sample from the projection DPP with correlation kernel defined by its eigenvectors \(V_{:,\mathcal{B}}\).

    Important

    Step 0. must be performed once and for all in \(\mathcal{O}(N^3)\). Then the average cost of getting one sample by applying Steps 1. and 2. is \(\mathcal{O}(N \mathbb{E}\left[|\mathcal{X}|\right]^2)\), where \(\mathbb{E}\left[|\mathcal{X}|\right]=\operatorname{trace}(\mathbf{L(I+L)^{-1}})=\sum_{n=1}^{N} \frac{\gamma_n}{1+\gamma_n}\)

    from numpy.random import RandomState
    from scipy.linalg import qr
    from dppy.finite_dpps import FiniteDPP
    
    rng = RandomState(0)
    
    r, N = 4, 10
    phi = rng.randn(r, N)
    
    DPP = FiniteDPP(kernel_type='likelihood',
                    projection=False,
                    **{'L': phi.T.dot(phi)})

    for _ in range(10):
        # mode='GS': Gram-Schmidt (default)
        DPP.sample_exact(mode='GS', random_state=rng)
    
    print(DPP.list_of_samples)
    
    [[3, 1, 0, 4], [9, 6], [4, 1, 3, 0], [7, 0, 6, 4], [5, 0, 7], [4, 0, 2], [5, 3, 8, 4], [0, 5, 2], [7, 0, 2], [6, 0, 3]]
    
  • Sampling a \(\operatorname{DPP}(\mathbf{L})\) for which each item is represented by a \(d\leq N\) dimensional feature vector, all stored in a feature matrix \(\Phi \in \mathbb{R}^{d\times N}\), so that \(\mathbf{L}=\Phi^{\top} \Phi \succeq 0_N\), can be done by following

    Step 0. compute the so-called dual kernel \(\tilde{\mathbf{L}}=\Phi \Phi^{\dagger}\in \mathbb{R}^{d\times d}\) and eigendecompose it \(\tilde{\mathbf{L}} = W \Delta W^{\top}\). This corresponds to a cost of order \(\mathcal{O}(Nd^2 + d^3)\).

    Step 1. is adapted to: draw independent Bernoulli random variables \(B_i \sim \operatorname{\mathcal{B}er}(\delta_i)\) for \(i=1,\dots, d\) and collect \(\mathcal{B}=\left\{ i ~;~ B_i=1 \right\}\)

    Step 2. is adapted to: sample from the projection DPP with correlation kernel defined by its eigenvectors \(\Phi^{\top} W_{:,\mathcal{B}} \Delta_{\mathcal{B}}^{-1/2}\). The computation of the eigenvectors requires \(\mathcal{O}(Nd|\mathcal{B}|)\).

    Important

    Step 0. must be performed once and for all in \(\mathcal{O}(Nd^2 + d^3)\). Then the average cost of getting one sample by applying Steps 1. and 2. is \(\mathcal{O}(Nd\mathbb{E}\left[|\mathcal{X}|\right] + N \mathbb{E}\left[|\mathcal{X}|\right]^2)\), where \(\mathbb{E}\left[|\mathcal{X}|\right]=\operatorname{trace}(\mathbf{\tilde{L}(I+\tilde{L})^{-1}})=\sum_{i=1}^{d} \frac{\delta_i}{1+\delta_i}\)
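
    The form of the eigenvectors in Step 2. comes from the correspondence between the spectra of \(\mathbf{L}\) and \(\tilde{\mathbf{L}}\): if \(\tilde{\mathbf{L}} w = \delta w\), then \(\Phi^{\top} w / \sqrt{\delta}\) is a unit-norm eigenvector of \(\mathbf{L}\) with the same eigenvalue. A minimal NumPy check (random features used only as placeholders):

    import numpy as np

    d, N = 4, 10
    Phi = np.random.randn(d, N)
    L = Phi.T.dot(Phi)                 # N x N likelihood kernel
    L_dual = Phi.dot(Phi.T)            # d x d dual kernel

    delta, W = np.linalg.eigh(L_dual)
    V = Phi.T.dot(W) / np.sqrt(delta)  # candidate eigenvectors of L

    print(np.allclose(L.dot(V), V * delta))    # same nonzero eigenvalues
    print(np.allclose(V.T.dot(V), np.eye(d)))  # orthonormal columns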

    See also

    For a different perspective see

    • [Gil14] Section 2.4.4 and Algorithm 3

    • [KT12] Section 3.3.3 and Algorithm 3

    from numpy.random import RandomState
    from scipy.linalg import qr
    from dppy.finite_dpps import FiniteDPP
    
    rng = RandomState(0)
    
    r, N = 4, 10
    phi = rng.randn(r, N)  # L = phi.T phi, L_dual = phi phi.T
    
    DPP = FiniteDPP(kernel_type='likelihood',
                    projection=False,
                    **{'L_gram_factor': phi})

    for _ in range(10):
        # mode='GS': Gram-Schmidt (default)
        DPP.sample_exact(mode='GS', random_state=rng)
    
    print(DPP.list_of_samples)
    
    L_dual = Phi Phi.T was computed: Phi (dxN) with d<N
    [[9, 0, 2, 3], [0, 1, 5, 2], [7, 0, 9, 4], [2, 0, 3], [6, 4, 0, 3], [5, 0, 6, 3], [0, 6, 3, 9], [4, 0, 9], [7, 3, 9, 4], [9, 4, 3]]
    
Cholesky-based method
Main idea

This method requires access to the correlation kernel \(\mathbf{K}\) to perform a bottom-up chain rule on sets: starting from the empty set, each item in turn is decided to be added or excluded from the sample. This can be summarized as the exploration of the binary probability tree displayed in Fig. 7.

Fig. 7: Probability tree corresponding to the chain rule on sets

Example: for \(N=5\), if \(\left\{ 1, 4 \right\}\) was sampled, the path in the probability tree would correspond to

\[\begin{split}\mathbb{P}\!\left[\mathcal{X} = \left\{ 1, 4 \right\}\right] = &\mathbb{P}\!\left[ 1\in \mathcal{X} \right]\\ &\times\mathbb{P}\!\left[ 2\notin \mathcal{X} \mid 1\in \mathcal{X} \right]\\ &\times\mathbb{P}\!\left[ 3\notin \mathcal{X} \mid 1\in \mathcal{X}, 2\notin \mathcal{X} \right]\\ &\times\mathbb{P}\!\left[ 4\in \mathcal{X} \mid 1\in \mathcal{X}, \left\{ 2, 3 \right\} \cap \mathcal{X} = \emptyset \right]\\ &\times\mathbb{P}\!\left[ 5\notin \mathcal{X} \mid \left\{ 1, 4 \right\} \subset \mathcal{X}, \left\{ 2, 3 \right\} \cap \mathcal{X} = \emptyset \right],\end{split}\]

where each conditional probability can be written in closed form using (15) and (16), namely

\[\begin{split}\mathbb{P}[T \subset \mathcal{X} \mid S \subset \mathcal{X}] &= \det\left[\mathbf{K}_T - \mathbf{K}_{TS} \mathbf{K}_S^{-1} \mathbf{K}_{ST}\right]\\ \mathbb{P}[T \subset \mathcal{X} \mid S \cap \mathcal{X} = \emptyset] &= \det\left[\mathbf{K}_T - \mathbf{K}_{TS} (\mathbf{K}_S - I)^{-1} \mathbf{K}_{ST}\right].\end{split}\]

Important

These quantities can be computed efficiently, as they appear in the computation of the Cholesky-type \(LDL^{\dagger}\) or \(LU\) factorization of the correlation kernel \(\mathbf{K}\), in the Hermitian or non-Hermitian case, respectively. See [Pou19] for the details.

Note

The sparsity of \(\mathbf{K}\) can be leveraged to derive faster samplers using the correspondence between the chain rule on sets and Cholesky-type factorizations, see e.g., [Pou19] Section 4.

In practice

Important

  • The method is fully generic since it applies to any (valid), even non-Hermitian, correlation kernel \(\mathbf{K}\).

  • Each sample costs \(\mathcal{O}(N^3)\).

  • Nevertheless, the connection between the chain rule on sets and Cholesky-type factorizations is nicely reflected in how simple the corresponding sampler is to implement, as sketched below.

# Poulson (2019, Algorithm 1): chain rule on sets via an LU-type
# factorization of the correlation kernel K (sketch, K given as a NumPy array)
import numpy as np

def poulson_sampler(K, rng=np.random):
    N = K.shape[0]
    A = K.copy()
    sample = []

    for j in range(N):

        if rng.rand() < A[j, j]:  # include item j w.p. A_jj (a conditional probability)
            sample.append(j)
        else:
            A[j, j] -= 1

        # One step of the LDL/LU factorization: update the Schur complement
        A[j+1:, j] /= A[j, j]
        A[j+1:, j+1:] -= np.outer(A[j+1:, j], A[j, j+1:])

    return sample, A
from numpy.random import RandomState
from scipy.linalg import qr
from dppy.finite_dpps import FiniteDPP

rng = RandomState(1)

r, N = 4, 10
eig_vals = rng.rand(r)
eig_vecs, _ = qr(rng.randn(N, r), mode='economic')

DPP = FiniteDPP(kernel_type='correlation',
                projection=False,
                **{'K': (eig_vecs*eig_vals).dot(eig_vecs.T)})

for _ in range(10):
    DPP.sample_exact(mode='Chol', random_state=rng)

print(DPP.list_of_samples)
[[2, 9], [0], [2], [6], [4, 9], [2, 7, 9], [0], [1, 9], [0, 1, 2], [2]]

Intermediate sampling method
Main idea

This method is based on the concept of a distortion-free intermediate sample, where we draw a larger sample of points in such a way that we can then downsample to the correct DPP distribution. We assume access to the likelihood kernel \(\mathbf{L}\) (although a variant of this method also exists for projection DPPs). Crucially, the sampling relies on an important connection between DPPs and so-called ridge leverage scores (RLS, see [AM15]), which are commonly used for sampling in randomized linear algebra. Namely, the marginal probability that item \(i\) belongs to \(\mathcal{X} \sim \operatorname{DPP}(\mathbf{L})\) is also the \(i\)-th ridge leverage score of \(\mathbf{L}\) (with ridge parameter equal to 1):

\[\mathbb{P}[i \in \mathcal{X}] = \big[\mathbf{L}(I + \mathbf{L})^{-1}\big]_{ii}=\tau_i,\quad i\text{th 1-ridge leverage score}.\]

Suppose that we draw a sample of \(t\) points i.i.d. proportionally to the ridge leverage scores, i.e., \(\sigma=(\sigma_1, \sigma_2,...,\sigma_t)\) such that \(\mathbb{P}[\sigma_j=i]\propto\tau_i\). Intuitively, this sample is similar to \(\mathcal{X}\sim \operatorname{DPP}(\mathbf{L})\) except that it “ignores” all the dependencies between the points. However, if we sample sufficiently many points i.i.d. according to the RLS, then a proper sample \(\mathcal{X}\) will likely be contained within it. This can be formally shown for \(t = O(\mathbb{E}[|\mathcal{X}|]^2)\). When \(\mathbb{E}[|\mathcal{X}|]^2\ll N\), this allows us to reduce the DPP kernel from the \(N\times N\) matrix \(\mathbf{L}\) to a much smaller \(t\times t\) matrix \(\tilde{\mathbf{L}}\). Making this sampling exact requires considerably more care, because even with a large \(t\) there is always a small probability that the i.i.d. sample \(\sigma\) is not sufficiently diverse. We guard against this possibility by rejection sampling.

Important

Use this method for sampling \(\mathcal{X} \sim \operatorname{DPP}(\mathbf{L})\) when \(\mathbb{E}\left[|\mathcal{X}|\right]\ll\sqrt{N}\).

  • Preprocessing costs \(\mathcal{O}\big(N\cdot \text{poly}(\mathbb{E}\left[|\mathcal{X}|\right])\, \text{polylog}(N)\big)\).

  • Each sample costs \(\mathcal{O}\big(\mathbb{E}[|\mathcal{X}|]^6\big)\).

In practice
from numpy.random import RandomState
from dppy.finite_dpps import FiniteDPP
from dppy.utils import example_eval_L_linear

rng = RandomState(1)

r, N = 4, 10

DPP = FiniteDPP('likelihood',
        **{'L_eval_X_data': (example_eval_L_linear, rng.randn(N, r))})

for _ in range(10):
    DPP.sample_exact(mode='vfx', random_state=rng, verbose=False)

print(DPP.list_of_samples)
[[5, 1, 0, 3], [9, 0, 8, 3], [6, 4, 1], [5, 1, 2], [2, 1, 3], [3, 8, 4, 0], [0, 8, 1], [7, 8], [1, 8, 2, 0], [5, 8, 3]]

The verbose=False flag is used to suppress the default progress bars when running in batch mode (e.g. when generating these docs).

Given the RLS \(\tau_1,\dots,\tau_N\), the normalization constant \(\det(I+\mathbf{L})\), and access to the entries of the likelihood kernel \(\mathbf{L}\), the intermediate sampling method proceeds as follows:

\[\begin{split}&\textbf{repeat}&\\ &&\text{sample }t \sim\mathrm{Poisson}(k^2\,\mathrm{e}^{1/k}),\hspace{3.5cm}\text{where }k=\mathbb{E}[|\mathcal{X}|]\\ &&\text{sample }\sigma_1,\dots,\sigma_t\sim (\tau_1,\dots\tau_N),\\ &&\text{sample } \textit{Acc}\sim\!\text{Bernoulli}\Big(\frac{\mathrm{e}^{k}\det(I+\tilde{\mathbf{L}}_{\sigma})}{\mathrm{e}^{t/k}\det(I+\mathbf{L})}\Big),\quad\text{where }\tilde{L}_{ij} = \frac1{k\sqrt{\tau_i\tau_j}}L_{ij},\\ &\textbf{until }&\textit{Acc}=\text{true}\\ &\textbf{return }&\mathcal{X}=\{\sigma_i:i\in \tilde{\mathcal{X}}\}\quad\text{where }\tilde{\mathcal{X}}\sim \operatorname{DPP}(\tilde{\mathbf{L}}_{\sigma})\end{split}\]

It can be shown that \(\mathcal{X}\) is distributed exactly according to \(\operatorname{DPP}(\mathbf{L})\) and the expected number of rejections is a small constant. The intermediate likelihood kernel \(\tilde{\mathbf L}_\sigma\) forms a \(t\times t\) DPP subproblem that can be solved using any other DPP sampler.

  • Since the size of the intermediate sample is \(t=\mathcal{O}(\mathbb{E}[|\mathcal{X}|]^2)\), the primary cost of the sampling is computing \(\det(I+\tilde{\mathbf L}_\sigma)\), which takes \(\mathcal{O}(t^3)=\mathcal{O}(\mathbb{E}[|\mathcal{X}|]^6)\) time. This is also the expected cost of sampling from \(\operatorname{DPP}(\tilde{\mathbf{L}}_{\sigma})\) if we use, for example, the spectral method.

  • The algorithm requires precomputing the RLS \(\tau_1,\dots,\tau_N\) and \(\det(I+\mathbf L)\). Computing them exactly takes \(\mathcal{O}(N^3)\); however, surprisingly, if we use sufficiently accurate approximations then the exactness of the sampling can be retained (details in [DerezinskiCV19]). Efficient methods for approximating leverage scores (see [RCCR18]) bring the precomputing cost down to \(\mathcal{O}(N \text{poly}(\mathbb{E}\left[|\mathcal{X}|\right]) \text{polylog}(N))\).

  • When \(\mathbb{E}[|\mathcal{X}|]\) is sufficiently small, the entire sampling procedure only looks at a small fraction of the entries of \(\mathbf{L}\). This makes the method useful when we want to avoid constructing the entire likelihood kernel.

  • When the likelihood kernel is given implicitly via a matrix \(\mathbf{X}\) such that \(\mathbf{L}=\mathbf{X}\mathbf{X}^\top\) (dual formulation), a version of this method is given by [Derezinski19].

  • A variant of this method also exists for projection DPPs [DWH18].

\(k\)-DPPs
Main idea

Recall from (5) that \(k\!\operatorname{-DPP}(\mathbf{L})\) can be viewed as a \(\operatorname{DPP}(\mathbf{L})\) constrained to have a fixed cardinality \(k \leq \operatorname{rank}(\mathbf{L})\).

To generate a sample of \(k\!\operatorname{-DPP}(\mathbf{L})\), one natural solution would be a rejection mechanism: draw \(S \sim \operatorname{DPP}(\mathbf{L})\) and keep it only if \(|S| = k\). However, the rejection rate may be very poor depending on how \(k\) compares with the distribution of the number of points (9) of \(S \sim \operatorname{DPP}(\mathbf{L})\).

An alternative solution was found by [KT12] Section 5.2.2. The procedure relies on a slight modification of Step 1. of the Spectral method which requires the computation of the elementary symmetric polynomials.

In practice

Sampling \(k\!\operatorname{-DPP}(\mathbf{L})\) from \(\mathbf{L} \succeq 0_N\) can be done by following

Step 0.
  1. compute the eigendecomposition of \(\mathbf{L} = V \Gamma V^{\dagger}\) in \(\mathcal{O}(N^3)\)

  2. evaluate the elementary symmetric polynomials in the eigenvalues of \(\mathbf{L}\): \(E[l, n]:=e_l(\gamma_1, \dots, \gamma_n)\) for \(0\leq l\leq k\) and \(0\leq n\leq N\). These computations can be done recursively in \(\mathcal{O}(N k)\) using Algorithm 7 of [KT12]; a minimal sketch of this recursion is given below.
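
Below is a minimal NumPy sketch of that recursion, \(e_l(\gamma_1, \dots, \gamma_n) = e_l(\gamma_1, \dots, \gamma_{n-1}) + \gamma_n e_{l-1}(\gamma_1, \dots, \gamma_{n-1})\) (an illustrative helper, not DPPy's implementation):

import numpy as np

def elementary_symmetric_polynomials(gammas, k):
    # E[l, n] = e_l(gamma_1, ..., gamma_n) for 0 <= l <= k and 0 <= n <= N
    N = len(gammas)
    E = np.zeros((k + 1, N + 1))
    E[0, :] = 1.0  # e_0 = 1 by convention

    for l in range(1, k + 1):
        for n in range(1, N + 1):
            E[l, n] = E[l, n - 1] + gammas[n - 1] * E[l - 1, n - 1]

    return E  # E[k, N] is the normalization constant e_k(L) in (5)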

Step 1. is replaced by Algorithm 8 of [KT12] which we illustrate with the following pseudo-code

# Algorithm 8 of Kulesza and Taskar (2012).
# Pseudo-code: in particular, Python 0-based indexing is not respected everywhere

B = set()
l = k

for n in range(N, 0, -1):

    if Unif(0, 1) < gamma[n] * E[l - 1, n - 1] / E[l, n]:
        l -= 1
        B.add(n)

        if l == 0:
            break

Step 2. is adapted to: sample from the projection DPP with correlation kernel defined by its eigenvectors \(V_{:,\mathcal{B}}\), with a cost of order \(\mathcal{O}(N k^2)\).

Important

Step 0. must be performed once and for all in \(\mathcal{O}(N^3 + Nk)\). Then the cost of getting one sample by applying Steps 1. and 2. is \(\mathcal{O}(N k^2)\).

import numpy as np
from dppy.finite_dpps import FiniteDPP

rng = np.random.RandomState(1)

r, N = 5, 10
# Random feature vectors
Phi = rng.randn(r, N)
DPP = FiniteDPP('likelihood', **{'L': Phi.T.dot(Phi)})

k = 4
for _ in range(10):
    DPP.sample_exact_k_dpp(size=k, random_state=rng)

print(DPP.list_of_samples)
[[1, 8, 5, 7], [3, 8, 5, 9], [5, 3, 1, 8], [5, 8, 2, 9], [1, 2, 9, 6], [1, 0, 2, 3], [7, 0, 3, 5], [8, 3, 7, 6], [0, 2, 3, 7], [1, 3, 7, 5]]

See also

  • sample_exact_k_dpp()

  • Step 0. requires [KT12] Algorithm 7 for the recursive evaluation of the elementary symmetric polynomials \([e_l(\gamma_1, \dots, \gamma_n)]_{l=1, n=1}^{k, N}\) in the eigenvalues of \(\mathbf{L}\)

  • Step 1. calls [KT12] Algorithm 8 for selecting the eigenvectors for Step 2.

Caution

Attention

Since the number of points of \(k\!\operatorname{-DPP}(\mathbf{L})\) is fixed, like for projection DPPs, it might be tempting to sample \(k\!\operatorname{-DPP}(\mathbf{L})\) using a chain rule in the way it was applied in (18) to sample projection DPPs. However, it is incorrect: sampling sequentially

(21)\[s_1 \propto \mathbf{L}_{s,s}, \quad\text{then}\quad s_{i} \mid s_1, \dots, s_{i-1} \propto \mathbf{L}_{s, s} - \mathbf{L}_{s, S_{i-1}} {\mathbf{L}_{S_{i-1}}}^{-1} \mathbf{L}_{S_{i-1}, s}, \quad \text{for } 2\leq i \leq k,\]

where \(S_{i-1} = \{s_{1}, \dots, s_{i-1}\}\), and forgetting about the order in which \(s_{1}, \dots, s_{k}\) were selected does not provide a subset \(\{s_{1}, \dots, s_{k}\} \sim k\!\operatorname{-DPP}(\mathbf{L})\) in the general case. Nevertheless, it is valid when \(\mathbf{L}\) is an orthogonal projection kernel!

Here are the reasons why

  1. First keep in mind that the ultimate goal is to draw a subset \(S=\{ s_{1}, \dots, s_{k} \} \sim k\!\operatorname{-DPP}(\mathbf{L})\) with probability (5)

(22)\[\mathbb{P}[\mathcal{X}=S] = \frac{1}{e_k(\mathbf{L})} \det \mathbf{L}_S 1_{|S|=k}.\]
  2. Now, if we were to use the chain rule (21), this would correspond to sampling the items \(s_1, \dots, s_{k}\) sequentially, so that the resulting vector \((s_{1}, \dots, s_{k})\) has probability

(23)\[\begin{split}\mathbb{Q}[(s_{1}, \dots, s_{k})] &= \dfrac{\mathbf{L}_{s_1,s_1}}{Z_1} \prod_{i=2}^{k} \dfrac{ \mathbf{L}_{s_i, s_i} - \mathbf{L}_{s_i, S_{i-1}} {\mathbf{L}_{S_{i-1}}}^{-1} \mathbf{L}_{S_{i-1}, s_i} }{Z_i(s_{1}, \dots, s_{i-1})}\\ &= \frac{1}{Z(s_{1}, \dots, s_{k})} \det \mathbf{L}_S.\end{split}\]

Contrary to \(Z_1=\operatorname{trace}(\mathbf{L})\), the normalizations \(Z_i(s_{1}, \dots, s_{i-1})\) of the successive conditionals depend, a priori, on the order in which \(s_{1}, \dots, s_{k}\) were selected. For this reason, we denote the global normalization constant by \(Z(s_{1}, \dots, s_{k})\).

Warning

Equation (23) suggests that the sequential feature of the chain rule matters, a priori; the distribution of \(\left(s_{1}, \dots, s_{k} \right)\) is a priori not exchangeable, i.e., not invariant to permutations of its coordinates. This fact would only come from the normalization \(Z(s_{1}, \dots, s_{k})\), since \(\det\mathbf{L}_S\) is invariant by permutation.

Note

To see this, let’s compute the normalization constant \(Z_i(s_1, \dots, s_{i-1})\) in (23) for a generic \(\mathbf{L}\succeq 0_N\) factored as \(\mathbf{L} = VV^{\dagger}\), with no specific assumption on \(V\).

(24)\[\begin{split}Z_i(s_1, \dots, s_{i-1}) &= \sum_{i=1}^N \mathbf{L}_{ii} - \mathbf{L}_{i,S_{i-1}} \mathbf{L}_{S_{i-1}}^{-1} \mathbf{L}_{S_{i-1},i}\\ &= \operatorname{trace}( \mathbf{L} - \mathbf{L}_{:, S_{i-1}} \left[\mathbf{L}_{S_{i-1}}\right]^{-1} \mathbf{L}_{S_{i-1}, :} )\\ &= \operatorname{trace}\left( \mathbf{L} - V {V^{\dagger}}_{:,S_{i-1}} \left[V_{S_{i-1},:} {V^{\dagger}}_{:,S_{i-1}}\right]^{-1} V_{S_{i-1},:} V^{\dagger} \right)\\ &= \operatorname{trace} \big( \mathbf{L} - \underbrace{{V_{S_{i-1}, :}}^{\dagger} \left[V_{S_{i-1}, :} {V_{S_{i-1}, :}}^{\dagger}\right]^{-1} V_{S_{i-1}, :}}_{\Pi_{V_{S_{i-1}, :}}} V^{\dagger}V \big)\\ &= \operatorname{trace}(\mathbf{L}) - \operatorname{trace}(\Pi_{V_{S_{i-1}, :}}V^{\dagger}V),\end{split}\]

where \(\Pi_{V_{S_{i-1}, :}}\) denotes the orthogonal projection onto \(\operatorname{Span}\{V_{s_1,:}, \dots, V_{s_{i-1}, :}\}\), the subspace spanned by the feature vectors associated with \(s_{1}, \dots, s_{i-1}\).

Then, summing (23) over the \(k!\) permutations of \(1, \dots, k\), yields the probability of drawing the subset \(S=\left\{ s_{1}, \dots, s_{k} \right\}\), namely

(25)\[\mathbb{Q}[\{ s_{1}, \dots, s_{k} \}] = \sum_{\sigma \in \mathfrak{S}_k} \mathbb{Q}[(s_{\sigma(1)}, \dots, s_{\sigma(k)})] = \det\mathbf{L}_S \underbrace{ \sum_{\sigma \in \mathfrak{S}_k} \frac{1}{Z(s_{\sigma(1)}, \dots, s_{\sigma(k)})} }_{ 1/Z_S }.\]
  3. For the chain rule (23) to be a valid procedure for sampling \(k\!\operatorname{-DPP}(\mathbf{L})\), we must be able to identify (22) and (25), i.e., \(\mathbb{Q}[S] = \mathbb{P}[S]\) for all \(|S|=k\), or equivalently \(Z_S = e_k(L)\) for all \(|S|=k\).

Important

A sufficient condition (very likely to be necessary) is that the joint distribution of \((s_{1}, \dots, s_{k})\) generated by the chain rule mechanism (23) is exchangeable (invariant to permutations of the coordinates). In that case, the normalization in (23) would be a constant, \(Z(s_{1}, \dots, s_{k})=Z\), so that \(Z_S\) would in fact play the role of the normalization constant of (25): it would be constant as well, equal to \(Z_S = \frac{Z}{k!}\). Finally, \(Z_S = e_k(\mathbf{L})\) by identification of (22) and (25).

This is what we can prove in the particular case where \(\mathbf{L}\) is an orthogonal projection matrix.

To do this, denote \(r=\operatorname{rank}(\mathbf{L})\) and recall that in this case \(\mathbf{L}\) satisfies \(\mathbf{L}^2=\mathbf{L}\) and \(\mathbf{L}^{\dagger}=\mathbf{L}\), so that it can be factored as \(\mathbf{L}=\Pi_{\mathbf{L}}=\mathbf{L}^{\dagger}\mathbf{L}=\mathbf{L}\mathbf{L}^{\dagger}\).

Finally, we can plug \(V=\mathbf{L}\) in (24) to obtain

\[\begin{split}Z_i(s_1, \dots, s_{i-1}) &= \operatorname{trace}(\mathbf{L}) - \operatorname{trace}(\Pi_{\mathbf{L}_{S_{i-1}, :}}\mathbf{L}^{\dagger}\mathbf{L})\\ &= \operatorname{trace}(\Pi_{\mathbf{L}}) - \operatorname{trace}(\Pi_{\mathbf{L}_{S_{i-1}, :}}\Pi_{\mathbf{L}})\\ &= \operatorname{trace}(\Pi_{\mathbf{L}}) - \operatorname{trace}(\Pi_{\mathbf{L}_{S_{i-1}, :}})\\ &= \operatorname{rank}(\Pi_{\mathbf{L}}) - \operatorname{rank}(\Pi_{\mathbf{L}_{S_{i-1}, :}})\\ &= r - (i - 1) := Z_i.\end{split}\]

Thus, the normalization \(Z(s_1, \dots, s_k)\) in (23) is constant as well, equal to

\[Z(s_1, \dots, s_k) = \prod_{i=1}^{k} Z_i = \prod_{i=1}^{k} r - (i - 1) = \frac{r!}{(r-k)!} = k! {r \choose k} = k! e_k(\mathbf{L}) := Z,\]

where the last equality is a simple computation of the elementary symmetric polynomial

\[\begin{split}e_k(\mathbf{L}) = e_k(\gamma_{1:r}=1, \gamma_{r+1:N}=0) = \sum_{\substack{S \subset [N]\\|S|=k}} \prod_{s\in S} \gamma_{s} = {r \choose k}\end{split}\]

Important

This shows that, when \(\mathbf{L}\) is an orthogonal projection matrix, the order in which the items \(s_1, \dots, s_k\) were selected by the chain rule (23) can be forgotten, so that \(\{s_1, \dots, s_k\}\) can be considered as a valid sample of \(k\!\operatorname{-DPP}(\mathbf{L})\).

# For our toy example, this sub-optimized implementation is enough
# to illustrate that the chain rule applied to sample k-DPP(L)
# draws s_1, ..., s_k sequentially, with joint probability
# P[(s_1, ..., s_k)] = det L_S / Z(s_1, ..., s_k)
#
# 1. is exchangeable when L is an orthogonal projection matrix
#    P[(s1, s2)] = P[(s_2, s_1)]
# 2. is a priori NOT exchangeable for a generic L >= 0
#    P[(s1, s2)] != P[(s_2, s_1)]

import numpy as np
import scipy.linalg as LA
from itertools import combinations, permutations

k, N = 2, 4
potential_samples = list(combinations(range(N), k))

rank_L = 3

rng = np.random.RandomState(1)

eig_vecs, _ = LA.qr(rng.randn(N, rank_L), mode='economic')

for projection in [True, False]:

    eig_vals = 1.0 + (0.0 if projection else 2 * rng.rand(rank_L))
    L = (eig_vecs * eig_vals).dot(eig_vecs.T)

    proba = np.zeros((N, N))
    Z_1 = np.trace(L)

    for S in potential_samples:

        for s in permutations(S):

            proba[s] = LA.det(L[np.ix_(s, s)])

            Z_2_s0 = np.trace(L - L[:, s[:1]].dot(LA.inv(L[np.ix_(s[:1], s[:1])])).dot(L[s[:1], :]))

            proba[s] /= Z_1 * Z_2_s0

    print('L is {}projection'.format('' if projection else 'NOT '))

    print('P[s0, s1]', proba, sep='\n')
    print('P[s0]', proba.sum(axis=0), sep='\n')
    print('P[s1]', proba.sum(axis=1), sep='\n')

    print(proba.sum(), '\n' if projection else '')
L is projection
P[s0, s1]
[[0.         0.09085976 0.01298634 0.10338529]
 [0.09085976 0.         0.06328138 0.15368033]
 [0.01298634 0.06328138 0.         0.07580691]
 [0.10338529 0.15368033 0.07580691 0.        ]]
P[s0]
[0.20723139 0.30782147 0.15207463 0.33287252]
P[s1]
[0.20723139 0.30782147 0.15207463 0.33287252]
1.0000000000000002

L is NOT projection
P[s0, s1]
[[0.         0.09986722 0.01463696 0.08942385]
 [0.11660371 0.         0.08062998 0.20535251]
 [0.01222959 0.05769901 0.         0.04170435]
 [0.07995922 0.15726273 0.04463087 0.        ]]
P[s0]
[0.20879253 0.31482896 0.13989781 0.33648071]
P[s1]
[0.20392803 0.4025862  0.11163295 0.28185282]
1.0

MCMC sampling

Add/exchange/delete

[AGR16], [LJS16c] and [LJS16d] derived variants of a Metropolis sampler with stationary distribution \(\operatorname{DPP}(\mathbf{L})\) (2). The proposal mechanism works as follows.

At state \(S\subset [N]\), propose \(S'\) different from \(S\) by at most 2 elements by picking

\[s \sim \mathcal{U}_{S} \quad \text{and} \quad t \sim \mathcal{U}_{[N]\setminus S}.\]

Then perform

Exchange

Pure exchange moves

\[S' \leftrightarrow S \setminus s \cup t.\]
Add-Delete

Pure addition/deletion moves

  • Add \(S' \leftrightarrow S \cup t\)

  • Delete \(S' \leftrightarrow S \setminus s\)

Add-Exchange-Delete

Mix of exchange and add-delete moves

  • Delete \(S' \leftrightarrow S \setminus s\)

  • Exchange \(S' \leftrightarrow S \setminus s \cup t\)

  • Add \(S' \leftrightarrow S \cup t\)

Hint

Because moves are only allowed between subsets that differ by at most 2 elements, transitions are very local and induce correlation between successive states; however, fast mixing was proved.

import numpy as np
from dppy.finite_dpps import FiniteDPP

rng = np.random.RandomState(413121)

r, N = 4, 10

# Random feature vectors
Phi = rng.randn(r, N)
L = Phi.T.dot(Phi)
DPP = FiniteDPP('likelihood', **{'L': L})

DPP.sample_mcmc('AED', random_state=rng)  # AED, AD, E
print(DPP.list_of_samples)  # list of trajectories, here there's only one
[[[0, 2, 3, 6], [0, 2, 3, 6], [0, 2, 3, 6], [0, 2, 3, 6], [0, 2, 3, 6], [0, 2, 3, 6], [0, 2, 6, 9], [0, 2, 6, 9], [2, 6, 9], [2, 6, 9]]]
Zonotope

[GBV17] target a projection \(\operatorname{DPP}(\mathbf{K})\) with

\[\mathbf{K} = \Phi^{\top} [\Phi \Phi^{\top}]^{-1} \Phi,\]

where \(\Phi\) is the underlying \(r\times N\) feature matrix satisfying \(\operatorname{rank}(\Phi)=\operatorname{rank}(\mathbf{K})=r\).

In this setting the Number of points is almost surely equal to \(r\) and we have

(26)\[\mathbb{P}[\mathcal{X}=S] = \det \mathbf{K}_S 1_{|S|=r} = \frac{\det^2\Phi_{:S}}{\det\Phi \Phi^{\top}} 1_{|S|=r} = \frac{\operatorname{Vol}^2 \{\phi_s\}_{s\in S}} {\det\Phi \Phi^{\top}} 1_{|S|=r}.\]

The original finite ground set is embedded into a continuous domain called a zonotope. The hit-and-run procedure is used to move across this polytope and visit the different tiles. To recover the finite DPP samples one needs to identify the tile in which the successive points lie, this is done by solving linear programs (LPs).

Hint

Sampling from a projection DPP boils down to solving randomized linear programs (LPs).

Important

To solve these LPs, DPPy relies on the cvxopt library, which is not installed by default when installing DPPy. Please refer to the installation instructions on GitHub for more details on how to install the necessary dependencies.

from numpy.random import RandomState
from dppy.finite_dpps import FiniteDPP

rng = RandomState(413121)

r, N = 4, 10
A = rng.randn(r, N)

DPP = FiniteDPP('correlation', projection=True, **{'A_zono': A})

DPP.sample_mcmc('zonotope', random_state=rng)
print(DPP.list_of_samples)  # list of trajectories, here there's only one
[[[2, 4, 5, 7], [2, 4, 5, 7], [2, 4, 5, 7], [1, 4, 5, 7], [1, 4, 5, 7], [1, 4, 5, 7], [0, 4, 7, 8], [0, 2, 7, 9], [0, 2, 7, 9], [2, 4, 5, 7]]]

Note

On the one hand, the Zonotope perspective on sampling projection DPPs yields a better exploration of the state space: using hit-and-run, moving to any other state is possible, but at the cost of solving LPs at each step. On the other hand, the Add/exchange/delete view allows cheap but local moves.

k-DPPs

To preserve the size \(k\) of the samples of \(k\!\operatorname{-DPP}(\mathbf{L})\), only Exchange moves can be performed.

Caution

\(k\) must satisfy \(k \leq \operatorname{rank}(L)\)
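
To give an idea of a single exchange move, here is a simplified Metropolis step whose stationary distribution is proportional to \(\det \mathbf{L}_S\) over subsets of size \(k\) (an illustrative sketch only; DPPy's sample_mcmc_k_dpp below is the actual, optimized sampler):

import numpy as np

def exchange_move(L, S, rng=np.random):
    # Swap one item of S with one item outside S and accept with
    # probability min(1, det L_{S'} / det L_S)
    N = L.shape[0]
    S = list(S)
    s = S[rng.randint(len(S))]              # uniform item to remove
    out = [i for i in range(N) if i not in S]
    t = out[rng.randint(len(out))]          # uniform item to add
    S_new = [i for i in S if i != s] + [t]

    ratio = (np.linalg.det(L[np.ix_(S_new, S_new)])
             / np.linalg.det(L[np.ix_(S, S)]))
    return S_new if rng.rand() < ratio else S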

from numpy.random import RandomState
from dppy.finite_dpps import FiniteDPP

rng = RandomState(123)

r, N = 5, 10

# Random feature vectors
Phi = rng.randn(r, N)
L = Phi.T.dot(Phi)
DPP = FiniteDPP('likelihood', **{'L': L})

k = 3
DPP.sample_mcmc_k_dpp(size=k, random_state=rng)
print(DPP.list_of_samples)  # list of trajectories, here there's only one
[[[7, 2, 5], [7, 2, 5], [7, 2, 9], [7, 8, 9], [7, 8, 9], [7, 8, 2], [7, 8, 2], [6, 8, 2], [1, 8, 2], [1, 8, 2]]]

See also

Approximate sampling

[LJS16b]

Todo

In the near future, this section will include the approximation of the kernel through random projections.

API

Implementation of the FiniteDPP object, which has 6 main methods (documented below):

class dppy.finite_dpps.FiniteDPP(kernel_type, projection=False, **params)[source]

Bases: object

Finite DPP object parametrized by

Parameters
  • kernel_type (string) –

    • 'correlation' \(\mathbf{K}\) kernel

    • 'likelihood' \(\mathbf{L}\) kernel

  • projection (bool, default False) – Indicate whether the provided kernel is of projection type. This may be useful when the FiniteDPP object is defined through its correlation kernel \(\mathbf{K}\).

  • params (dict) –

    Dictionary containing the parametrization of the underlying

    • correlation kernel

      • {'K': K}, with \(0 \preceq \mathbf{K} \preceq I\)

      • {'K_eig_dec': (eig_vals, eig_vecs)}, with \(0 \leq eigvals \leq 1\)

      • {'A_zono': A}, with \(A (d \times N)\) and \(\operatorname{rank}(A)=d\)

    • likelihood kernel

      • {'L': L}, with \(\mathbf{L}\succeq 0\)

      • {'L_eig_dec': (eig_vals, eig_vecs)}, with \(eigvals \geq 0\)

      • {'L_gram_factor': Phi}, with \(\mathbf{L} = \Phi^{ \top} \Phi\)

      • {'L_eval_X_data': (eval_L, X_data)}, with \(\mathbf{X}_{data}\) an \(N \times d\) array and eval_L a likelihood function such that \(\mathbf{L} = eval\_L(\mathbf{X}_{data}, \mathbf{X}_{data})\). For a full description of the requirements imposed on eval_L’s interface, see the documentation of dppy.vfx_sampling.vfx_sampling_precompute_constants(). For an example, see the implementation of any of the kernels provided by scikit-learn (e.g. sklearn.gaussian_process.kernels.PairwiseKernel).

Caution

For now we only consider real valued matrices \(\mathbf{K}, \mathbf{L}, A, \Phi, \mathbf{X}_{data}\).
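
As an illustration, here is a minimal sketch of two of the parametrizations listed above: an eigendecomposition of the correlation kernel \(\mathbf{K}\) and a Gram factor of the likelihood kernel \(\mathbf{L}\).

import numpy as np
from dppy.finite_dpps import FiniteDPP

rng = np.random.RandomState(0)

r, N = 4, 10

# Correlation kernel K given through its eigendecomposition,
# with eigenvalues in [0, 1] and orthonormal eigenvectors
eig_vecs, _ = np.linalg.qr(rng.randn(N, r))
eig_vals = rng.uniform(0, 1, size=r)
DPP_K = FiniteDPP('correlation', **{'K_eig_dec': (eig_vals, eig_vecs)})

# Likelihood kernel L = Phi.T Phi given through its Gram factor Phi (r x N)
Phi = rng.randn(r, N)
DPP_L = FiniteDPP('likelihood', **{'L_gram_factor': Phi})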

Todo

add .kernel_rank attribute

compute_K(msg=False)[source]

Compute the correlation kernel \(\mathbf{K}\) from the original parametrization of the FiniteDPP object.

The kernel is stored in the K attribute.

compute_L(msg=False)[source]

Compute the likelihood kernel \(\mathbf{L}\) from the original parametrization of the FiniteDPP object.

The kernel is stored in the L attribute.

flush_samples()[source]

Empty the list_of_samples attribute.

info()[source]

Display information about the FiniteDPP object

plot_kernel(title='')[source]

Display a heatmap of the kernel used to define the FiniteDPP object (correlation kernel \(\mathbf{K}\) or likelihood kernel \(\mathbf{L}\))

Parameters

title (string) – Plot title

sample_exact(mode='GS', **params)[source]

Sample exactly from the corresponding FiniteDPP object.

Parameters
  • mode (string, default 'GS') –

    • projection=True:
      • 'GS' (default): Gram-Schmidt on the rows of \(\mathbf{K}\).

      • 'Chol' [Pou19] Algorithm 3

      • 'Schur': when DPP defined from correlation kernel K, use Schur complement to compute conditionals

    • projection=False:
      • 'GS' (default): Gram-Schmidt on the rows of the eigenvectors of \(\mathbf{K}\) selected in Phase 1.

      • 'GS_bis': Slight modification of 'GS'

      • 'Chol' [Pou19] Algorithm 1

      • 'KuTa12': Algorithm 1 in [KT12]

      • 'vfx': the dpp-vfx rejection sampler in [DerezinskiCV19]

  • params (dict) –

    Dictionary containing the parameters for exact samplers with keys

    • 'random_state' (default None)

    • If mode='vfx'

      See dpp_vfx_sampler() for a full list of all parameters accepted by ‘vfx’ sampling. We report here the most impactful

      • 'rls_oversample_dppvfx' (default 4.0) Oversampling parameter used to construct dppvfx’s internal Nystrom approximation. This makes each rejection round slower and more memory intensive, but reduces variance and the number of rounds of rejections.

      • 'rls_oversample_bless' (default 4.0) Oversampling parameter used during bless’s internal Nystrom approximation. This makes the one-time pre-processing slower and more memory intensive, but reduces variance and the number of rounds of rejections

      Empirically, a small factor [2,10] seems to work for both parameters. It is suggested to start with a small number and increase if the algorithm fails to terminate.

Returns

Returns a sample from the corresponding FiniteDPP object. In any case, the sample is appended to the list_of_samples attribute as a list.

Return type

list

Note

Each time you call this method, the sample is appended to the list_of_samples attribute as a list.

The list_of_samples attribute can be emptied using flush_samples()

Caution

The underlying kernel \(\mathbf{K}\), resp. \(\mathbf{L}\) must be real valued for now.
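
For illustration, a minimal usage sketch (with a likelihood kernel built from random feature vectors, as in the MCMC examples above); repeated calls stack the samples in the list_of_samples attribute.

import numpy as np
from dppy.finite_dpps import FiniteDPP

rng = np.random.RandomState(1)

r, N = 4, 10
Phi = rng.randn(r, N)
DPP = FiniteDPP('likelihood', **{'L': Phi.T.dot(Phi)})

for _ in range(3):
    DPP.sample_exact(mode='GS', random_state=rng)  # default mode

print(DPP.list_of_samples)  # 3 samples, each a list of indices
DPP.flush_samples()         # empty the list of samples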

sample_exact_k_dpp(size, mode='GS', **params)[source]

Sample exactly from \(\operatorname{k-DPP}\). A priori, the FiniteDPP object was instantiated via its likelihood kernel \(\mathbf{L}\), so that

\[\mathbb{P}_{\operatorname{k-DPP}}(\mathcal{X} = S) \propto \det \mathbf{L}_S ~ 1_{|S|=k}\]
Parameters
  • size (int) – size \(k\) of the \(\operatorname{k-DPP}\)

  • mode (string, default 'GS') –

    • projection=True:
      • 'GS' (default): Gram-Schmidt on the rows of \(\mathbf{K}\).

      • 'Schur': Use Schur complement to compute conditionals.

    • projection=False:
      • 'GS' (default): Gram-Schmidt on the rows of the eigenvectors of \(\mathbf{K}\) selected in Phase 1.

      • 'GS_bis': Slight modification of 'GS'

      • 'KuTa12': Algorithm 1 in [KT12]

      • 'vfx': the dpp-vfx rejection sampler in [DerezinskiCV19]

  • params (dict) –

    Dictionary containing the parameters for exact samplers with keys

    'random_state' (default None)

    • If mode='vfx'

      See k_dpp_vfx_sampler() for a full list of all parameters accepted by ‘vfx’ sampling. We report here the most impactful

      • 'rls_oversample_dppvfx' (default 4.0) Oversampling parameter used to construct dppvfx’s internal Nystrom approximation. This makes each rejection round slower and more memory intensive, but reduces variance and the number of rounds of rejections.

      • 'rls_oversample_bless' (default 4.0) Oversampling parameter used during bless’s internal Nystrom approximation. This makes the one-time pre-processing slower and more memory intensive, but reduces variance and the number of rounds of rejections

      Empirically, a small factor [2,10] seems to work for both parameters. It is suggested to start with a small number and increase if the algorithm fails to terminate.

Returns

A sample from the corresponding \(\operatorname{k-DPP}\).

In any case, the sample is appended to the list_of_samples attribute as a list.

Return type

list

Note

Each time you call this method, the sample is appended to the list_of_samples attribute as a list.

The list_of_samples attribute can be emptied using flush_samples()

Caution

The underlying kernel \(\mathbf{K}\), resp. \(\mathbf{L}\) must be real valued for now.
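
A corresponding usage sketch, again with a likelihood kernel built from random feature vectors.

import numpy as np
from dppy.finite_dpps import FiniteDPP

rng = np.random.RandomState(2)

r, N = 5, 10
Phi = rng.randn(r, N)
DPP = FiniteDPP('likelihood', **{'L': Phi.T.dot(Phi)})

k = 3  # must satisfy k <= rank(L) = 5 here
DPP.sample_exact_k_dpp(size=k, random_state=rng)
print(DPP.list_of_samples)  # one sample of k indices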

sample_mcmc(mode, **params)[source]

Run a MCMC with stationary distribution the corresponding FiniteDPP object.

Parameters
  • mode (string) –

    • 'AED' Add-Exchange-Delete

    • 'AD' Add-Delete

    • 'E' Exchange

    • 'zonotope' Zonotope sampling

  • params (dict) –

    Dictionary containing the parameters for MCMC samplers with keys

    'random_state' (default None)

    • If mode='AED','AD','E'

      • 's_init' (default None) Starting state of the Markov chain

      • 'nb_iter' (default 10) Number of iterations of the chain

      • 'T_max' (default None) Time horizon

      • 'size' (default None) Size of the initial sample for mode='AD'/'E'

        • \(\operatorname{rank}(\mathbf{K})=\operatorname{trace}(\mathbf{K})\) for projection \(\mathbf{K}\) (correlation) kernel and mode='E'

    • If mode='zonotope':

      • 'lin_obj' linear objective in main optimization problem (default np.random.randn(N))

      • 'x_0' initial point in zonotope (default A*u, u~U[0,1]^n)

      • 'nb_iter' (default 10) Number of iterations of the chain

      • 'T_max' (default None) Time horizon

Returns

The last sample of the trajectory of Markov chain.

In any case, the full trajectory of the Markov chain, made of params['nb_iter'] samples, is appended to the list_of_samples attribute as a list of lists.

Return type

list

Note

Each time you call this method, the full trajectory of the Markov chain, made of params['nb_iter'] samples, is appended to the list_of_samples attribute as a list of lists.

The list_of_samples attribute can be emptied using flush_samples()

sample_mcmc_k_dpp(size, mode='E', **params)[source]

Calls sample_mcmc() with mode='E' and params['size'] = size

Continuous DPPs

Definition

Point Process

Let \(\mathbb{X} = \mathbb{R}^d, \mathbb{C}^d \text{ or } \mathbb{S}^{d-1}\) be the ambient space; we endow it with the corresponding Borel \(\sigma\)-algebra \(\mathcal{B}(\mathbb{X})\) together with a reference measure \(\mu\).

For our purpose, we consider point processes as locally finite random subsets \(\mathcal{X} \subset \mathbb{X}\) i.e.

\[\forall C \subset \mathbb{X} \text{ compact}, \quad \#(\mathcal{X} \cap C) < \infty.\]

Hint

A point process is a random subset of points \(\mathcal{X} \triangleq\{X_1, \dots, X_N\} \subset \mathbb{X}\) with \(N\) being random.

See also

More formal definitions can be found in [MollerW04] Section 2 and [Joh06] Section 2 and bibliography therein.

To understand the interaction between the points of a point process, one focuses on the interaction of each cloud of \(k\) points (for all \(k\)). The corresponding \(k\)-correlation functions characterize the underlying point process.

Correlation functions

For \(k\geq 0\), the \(k\)-correlation function \(\rho_k\) is defined by:

\(\forall f : \mathbb{X}^k \to \mathbb{C}\) bounded measurable

\[\begin{split}\mathbb{E} \left[ \sum_{ \substack{ (X_1,\dots,X_k) \\ X_1 \neq \dots \neq X_k \in \mathcal{X}} } f(X_1,\dots,X_k) \right] = \int_{\mathbb{X}^k} f(x_1,\dots,x_k) \rho_k(x_1,\dots,x_k) \prod_{i=1}^k \mu(dx_i).\end{split}\]

Hint

The \(k\)-correlation functions do not always exist, but when they do, they have a meaningful interpretation:

\[\begin{split}" \rho_k(x_1,\dots,x_k) \mu(dx_{1}) \cdots \mu(dx_{k}) = \mathbb{P} \left[ \begin{array}{c} \text{there is 1 point in each}\\ B(x_1, d x_1), \dots, B(x_k, d x_k) \end{array} \right] ",\end{split}\]

where \(B(x, dx)\) denotes the ball centered at \(x\) with radius \(dx\).

A Determinantal Point Process (DPP) is a point process on \((\mathbb{X}, \mathcal{B}(\mathbb{X}), \mu)\) parametrized by a kernel \(K\) associated to the reference measure \(\mu\). The \(k\)-correlation functions read

\[\forall k\geq 1, \quad \rho_k(x_1,\dots,x_k) = \det [K(x_i, x_j)]_{i,j=1}^k.\]
Existence

One can view \(K\) as an integral operator on \(L^2(\mu)\)

\[\forall x \in \mathbb{X}, Kf(x) = \int_{\mathbb{X}} K(x,y) f(y) \mu(dy).\]

To access spectral properties of the kernel, it is common practice to assume \(K\)

  1. Hilbert Schmidt

    \[\iint_{\mathbb{X}\times \mathbb{X}} |K(x,y)|^2 \mu(dx) \mu(dy) < \infty,\]
  2. Self-adjoint, i.e., Hermitian

    \[K(x,y) = \overline{K(y,x)},\]
  3. Locally trace class

    \[\forall B\subset \mathbb{X} \text{ compact}, \quad \int_B K(x,x) \mu(dx) < \infty.\]

Hint

    1. implies that \(K\) is a continuous compact operator.

    2. together with 1. allows one to apply the spectral theorem, providing

      \[K(x,y) = \sum_{n=0}^{\infty} \lambda_n \phi_{n}(x)\phi_{n}(y), \quad \text{where } K\phi_{n} = \lambda_n \phi_{n}.\]
    3. makes sure there is no accumulation of points: \(\mathbb{E}[\#(\mathcal{X}\cap B)] = \int_B K(x,x) \mu(dx) < \infty\), see also Number of points

Warning

These are only sufficient conditions; there indeed exist DPPs with non-symmetric kernels, see, e.g., the Carries process.

Important

Under assumptions 1, 2, and 3

\[\operatorname{DPP}(K) \text{ exists} \Longleftrightarrow 0\leq \lambda_n \leq 1, \quad \forall n \in \mathbb{N}\]

See also

Projection DPPs

\(\operatorname{DPP}(K)\) is said to be a projection DPP with reference measure \(\mu\) when \(K:\mathbb{X}\times \mathbb{X}\to \mathbb{C}\) is an orthogonal projection kernel, that is

\[K(x,y)=\overline{K(y,x)} \quad\text{and}\quad \int_{\mathbb{X}} K(x, z) K(z, y) \mu(d z) = K(x, y)\]
Construction

A canonical way to construct DPPs generating configurations of at most \(N\) points is the following.

Consider \(N\) orthonormal functions \(\phi_{0},...,\phi_{N−1} \in L^2(\mu)\)

\[\int \phi_{k}(x)\phi_{l}(x)\mu(dx) = \delta_{kl},\]

and attach \([0,1]\)-valued coefficients \(\lambda_n\) such that

\[K(x, y) = \sum_{n=0}^{N-1} \lambda_n \phi_{n}(x)\phi_{n}(y).\]

The special case where \(\lambda_0=\cdots=\lambda_{N-1}=1\) corresponds to the construction of a projection DPP with \(N\) points.

See also

Properties

Generic DPPs as mixtures of projection DPPs

Projection DPPs are the building blocks of the model in the sense that generic DPPs are mixtures of projection DPPs.

Consider \(\mathcal{X} \sim \operatorname{DPP}(K)\) and write the spectral decomposition of the corresponding kernel as

\[K(x,y) = \sum_{n=1}^{\infty} \lambda_n \phi_n(x) \overline{\phi_n(y)}.\]

Then, denote \(\mathcal{X}^B \sim \operatorname{DPP}(K^B)\) with

\[K^B(x,y) = \sum_{n=1}^{\infty} B_n \phi_n(x) \overline{\phi_n(y)}, \quad \text{where} \quad B_n \sim \mathcal{B}er(\lambda_n) \text{ are independent},\]

\(\mathcal{X}^B\) is obtained by first sampling \(B_1, B_2, \dots\) independently and then sampling conditionally from \(\operatorname{DPP}(K^B)\), the DPP with orthogonal projection kernel \(K^B\).

Finally, we have \(\mathcal{X} \overset{d}{=} \mathcal{X}^B\).

Linear statistics
Expectation
\[\mathbb{E}\left[ \sum_{X \in \mathcal{X}} f(X) \right] = \int f(x) K(x,x) \mu(dx) = \operatorname{trace}(Kf) = \operatorname{trace}(fK).\]
Variance
\[\begin{split}\operatorname{\mathbb{V}ar}\left[ \sum_{X \in \mathcal{X}} f(X) \right] &= \mathbb{E}\left[ \sum_{X \neq Y \in \mathcal{X}} f(X) f(Y) + \sum_{X \in \mathcal{X}} f(X)^2 \right] - \mathbb{E}\left[ \sum_{X \in \mathcal{X}} f(X) \right]^2\\ &= \iint f(x)f(y) [K(x,x)K(y,y)-K(x,y)K(y,x)] \mu(dx) \mu(dy)\\ &\quad + \int f(x)^2 K(x,x) \mu(dx) - \left[\int f(x) K(x,x) \mu(dx)\right]^2 \\ &= \int f(x)^2 K(x,x) \mu(dx) - \iint f(x)f(y) K(x,y)K(y,x) \mu(dx) \mu(dy)\\ &= \operatorname{trace}(f^2K) - \operatorname{trace}(fKfK).\end{split}\]
  1. Hermitian kernel i.e. \(K(x,y)=\overline{K(y,x)}\)

    \[\operatorname{\mathbb{V}ar}\left[ \sum_{X \in \mathcal{X}} f(X) \right] = \int f(x)^2 K(x,x) \mu(dx) - \iint f(x)f(y) |K(x,y)|^2 \mu(dx) \mu(dy).\]
  2. Orthogonal projection case i.e. \(K^2 = K = K^*\)

    Using \(K(x,x) = \int K(x,y) K(y,x) \mu(dy) = \int |K(x,y)|^2 \mu(dy)\),

    \[\operatorname{\mathbb{V}ar}\left[ \sum_{X \in \mathcal{X}} f(X) \right] = \frac12 \iint [f(x) - f(y)]^2 |K(x,y)|^2 \mu(dy) \mu(dx).\]
Number of points

For projection DPPs, i.e., when \(K\) is the kernel associated to an orthogonal projector, one can show that \(|\mathcal{X}|=\operatorname{rank}(K)=\operatorname{Trace}(K)\) almost surely (see, e.g., [HKPVirag06] Lemma 17).

In the general case, based on the fact that generic DPPs are mixtures of projection DPPs, we have

\[|\mathcal{X}| = \sum_{i=1}^{\infty} \operatorname{\mathcal{B}er}(\lambda_i).\]

Note

  • For any Borel set \(B\), instantiating \(f=1_{B}\) yields nice expressions for the expectation and variance of the number of points falling in \(B\).

See also

Number of points in the finite case

Thinning

Important

The class of DPPs is closed under independent thinning.

Let \(\lambda > 1\). The configuration of points \(\mathcal{X}^{\lambda}\) obtained after subsampling the points of a configuration \(\mathcal{X}\sim \operatorname{DPP}(K)\) with i.i.d. \(\operatorname{\mathcal{B}er}\left(\frac{1}{\lambda}\right)\) is still a DPP with kernel \(\frac{1}{\lambda} K\). To see this, let’s compute the correlation functions of the thinned process

\[\begin{split}\mathbb{E}\left[ \sum_{\substack{(x_1,\dots,x_k) \\ x_i \neq x_j \in \mathcal{X}^{\lambda}} } f(x_1,\dots,x_k) \right] &= \mathbb{E}\left[ \mathbb{E}\left[ \sum_{\substack{(x_1,\dots,x_k) \\ x_i \neq x_j \in \mathcal{X} } } f(x_1,\dots,x_k) \prod_{i=1}^k 1_{\{x_i \in \mathcal{X}^{\lambda} \}} \Bigg| \mathcal{X}\right] \right]\\ &= \mathbb{E}\left[ \sum_{\substack{(x_1,\dots,x_k) \\ x_i \neq x_j \in \mathcal{X} } } f(x_1,\dots,x_k) \mathbb{E}\left[ \prod_{i=1}^k B_i \Bigg| \mathcal{X} \right] \right]\\ &= \mathbb{E}\left[ \sum_{\substack{(x_1,\dots,x_k) \\ x_i \neq x_j \in \mathcal{X} } } f(x_1,\dots,x_k) \frac{1}{\lambda^k} \right]\\ &= \int f(x_1,\dots,x_k) \det \left[ \frac{1}{\lambda} K(x_i,x_j) \right]_{1\leq i,j\leq k} \mu^{\otimes k}(dx).\end{split}\]

Sampling

In contrast to the finite case, where the ML community has put effort into improving the efficiency and tractability of the sampling routine, much less has been done in the continuous setting.

Exact sampling

In this section, we describe the main techniques for exact sampling of continuous DPPs.

As for finite DPPs, the most prominent one relies on the fact that generic DPPs are mixtures of projection DPPs.

Projection DPPs: the chain rule

Let’s focus on sampling from projection \(\operatorname{DPP}(K)\) with a real-valued orthogonal projection kernel \(K:\mathbb{X}\times \mathbb{X}\to \mathbb{R}\) and reference measure \(\mu\), that is

\[K(x,y)=K(y,x) \quad\text{and}\quad \int_{\mathbb{X}} K(x, z) K(z, y) \mu(d z) = K(x, y)\]

In this setting, recall that the number of points is \(\mu\)-almost surely equal to \(r=\operatorname{rank}(K)\).

To generate a valid sample \(X=\{x_{1}, \dots, x_{r}\} \sim \operatorname{DPP}(K)\), [HKPVirag06] Proposition 19 showed that it is sufficient to apply the chain rule to sample \((x_1, \dots, x_r)\) with joint distribution

()\[\mathbb{P}[(x_1, \dots, x_r)] = \frac{1}{r!} \det [K(x_p, x_q)]_{p,q=1}^r \mu^{\otimes r}(d x_{1:r})\]

and then forget the order in which the points were selected.

The original projection DPP sampler of [HKPVirag06], Algorithm 18, was given in an abstract form, which can be implemented using the following strategy. Write the determinant in (27) as a telescopic product of ratios of determinants and use Schur complements to get

()\[\begin{split}\mathbb{P}[(x_1, \dots, x_r)] &= \dfrac{K(x_1,x_1)}{r} \mu(d x_{1}) \prod_{i=2}^{r} \dfrac{1}{r-(i-1)} \frac{\det \mathbf{K}_{i}} {\det \mathbf{K}_{i-1}} \mu(d x_{i})\\ &= \dfrac{K(x_1,x_1)}{r} \mu(d x_{1}) \prod_{i=2}^{r} \dfrac{ K(x_i, x_i) - \mathbf{K}_{i-1}(x_i)^{\top} \mathbf{K}_{i-1}^{-1} \mathbf{K}_{i-1}(x_i) } {r-(i-1)} \mu(d x_{i}),\end{split}\]

where \(\mathbf{K}_{i-1} = [K(x_p,x_q)]_{p,q=1}^{i-1}\) and \(\mathbf{K}_{i-1}(x) = (K(x,x_1), \dots, K(x,x_{i-1}))^{\top}\).

Important

  1. The expression (27) indeed defines a probability distribution, with normalization constant \(r!\). In particular this distribution is exchangeable, i.e., invariant by permutation of the coordinates.

  2. The successive ratios that appear in (28) are the normalized conditional densities (w.r.t. \(\mu\)) that drive the chain rule. The associated normalizing constants \(r-(i-1)\) are independent of the previous points.

  3. Sampling projection DPPs does not require the eigendecomposition of the kernel!

Hint

MLers will recognize (28) as the incremental posterior variance of a noise-free Gaussian Process (GP) model with kernel \(K\), see [RW06] Equation 2.26.

Caution

The connection between the chain rule (28) and Gaussian Processes is valid in the case where the GP kernel is an orthogonal projection kernel, see also Caution.

Geometrical interpretation

When the eigendecomposition of the kernel is available, the chain rule can be interpreted and implemented from a geometrical perspective, see, e.g., [LMollerR12] Algorithm 1.

Write the Gram formulation of the kernel as

\[K(x,y) = \sum_{i=1}^{r} \phi_i(x) \phi_i(y) = \Phi(x)^{\top} \Phi(y),\]

where \(\Phi(x) \triangleq (\phi_{1}(x), \dots, \phi_{r}(x))\) denotes the feature vector associated to \(x\in \mathbb{X}\).

The joint distribution (27) reads

()\[\begin{split}\mathbb{P}[(x_1, \dots, x_r)] &= \frac{1}{r!} \det [\Phi(x_p)^{\top} \Phi(x_q)]_{p,q=1}^r \mu^{\otimes r}(d x_{1:r})\\ &= \frac{1}{r!} \operatorname{Volume}^2 \{\Phi(x_1), \dots, \Phi(x_r)\} \mu^{\otimes r}(d x_{1:r}),\end{split}\]

Hint

The joint distribution (29) characterizes the fact that a projection \(\operatorname{DPP}(K)\) favors sets of \(r=\operatorname{rank}(K)\) points \(\left(x_{1}, \dots, x_{r} \right)\) whose feature vectors \(\Phi(x_1), \dots, \Phi(x_r)\) span a large volume. This is another way of understanding the repulsive/diversity feature of DPPs.

Then, the previous telescopic product of ratios of determinants in (28) can be understood as the base \(\times\) height formula applied to compute \(\operatorname{Volume}^2 \{\Phi(x_1), \dots, \Phi(x_r)\}\), so that

()\[\begin{split}\mathbb{P}[(x_1, \dots, x_r)] &= \dfrac{\left\langle \Phi(x_1)^{\top} \Phi(x_1) \right\rangle}{r} \mu(d x_{1}) \prod_{i=2}^{r} \dfrac{1}{r-(i-1)} \frac{\det \mathbf{K}_{i}} {\det \mathbf{K}_{i-1}} \mu(d x_{i})\\ &= \dfrac{\left\| \Phi(x_1) \right\|^2}{r} \mu(d x_{1}) \prod_{i=2}^{r} \dfrac{ \operatorname{distance}^2 \left(\Phi(x_i), \operatorname{Span} \{ \Phi(x_1), \dots, \Phi(x_{i-1}) \}\right) } {r-(i-1)} \mu(d x_{i}),\end{split}\]

where \(\mathbf{K}_{i-1} = \left[\left\langle \Phi(x_p)^{\top} \Phi(x_q) \right\rangle\right]_{p,q=1}^{i-1}\).

Hint

The overall procedure is akin to a sequential Gram-Schmidt orthogonalization of \(\Phi(x_{1}), \dots, \Phi(x_{r})\).

Attention

In contrast to the finite case where the conditionals are simply probability vectors, the chain rule formulations (28) and (30) require sampling from a continuous distribution. This can be done using a rejection sampling mechanism, but finding a good proposal density with tight rejection bounds is a challenging problem, see [LMollerR12] Section 2.4. It is nevertheless achievable in some specific cases, see, e.g., the Multivariate Jacobi Ensemble.

See also

Generic DPPs: the spectral method

The procedure stems from the fact that generic DPPs are mixtures of projection DPPs, suggesting the following two-step algorithm. Given the spectral decomposition of the kernel

()\[ K(x,y)=\sum_{i=1}^{\infty} \lambda_i \phi_i(x) \overline{\phi_i(y)},\]

Step 1. Draw \(B_i\sim\operatorname{\mathcal{B}er}(\lambda_i)\) independently and denote \(\{i_1,\dots,i_{N}\} = \{i~;~B_i=1\}\),

Step 2. Sample from the projection DPP with kernel \(\tilde{K}(x,y) = \sum_{n=1}^{N}\phi_{i_n}(x) \overline{\phi_{i_n}(y)}\).

Important

  • Step 1. selects a component of the mixture, see [LMollerR12] Section 2.4.1

  • Step 2. generates a sample from the projection \(\operatorname{DPP}(\tilde{K})\), cf. previous section.

Attention

Contrary to projection DPPs, the general case requires the eigendecomposition of the kernel (31).

See also

Spectral method for sampling finite DPPs.

Perfect sampling

[DFL13] uses Coupling From The Past (CFTP).

Approximate sampling

See also

  • Approximation of \(K(x,y)=K(x-y)\) by Fourier series [LMollerR12] Section 4

\(\beta\)-Ensembles

Definition

Let \(\beta>0\). The joint distribution of the \(\beta\)-Ensemble associated to the reference measure \(\mu\) reads

()\[(x_1,\dots,x_N) \sim \frac{1}{Z_{N,\beta}} \left|\Delta(x_1,\dots,x_N)\right|^{\beta} \prod_{i= 1}^N \mu(d x_i).\]

Hint

  • \(|\Delta(x_1,\dots,x_N)| = \prod_{i<j} |x_i - x_j|\) is the absolute value of the determinant of the Vandermonde matrix,

    ()\[\begin{split}\Delta(x_1,\dots,x_N) = \det \begin{bmatrix} 1 & \dots & 1 \\ x_1 & \dots & x_N \\ \vdots & & \vdots \\ x_1^{N-1} & &x_N^{N-1} \end{bmatrix},\end{split}\]

    encoding the repulsive interaction: the closer the points, the lower the density (see the small numerical check after this list).

  • \(\beta\) is the inverse temperature parameter quantifying the strength of the repulsion between the points.
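
As a quick standalone check of this definition, the product of pairwise gaps indeed coincides with the absolute determinant of the Vandermonde matrix:

import numpy as np

x = np.array([0.1, 0.5, 0.9])

# prod_{i<j} |x_i - x_j|
i, j = np.triu_indices(len(x), k=1)
abs_delta = np.prod(np.abs(x[i] - x[j]))

# |det| of the Vandermonde matrix [[1,...,1], [x_1,...,x_N], ...]
vandermonde = np.vander(x, increasing=True).T
assert np.isclose(abs_delta, np.abs(np.linalg.det(vandermonde)))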

Important

For Gaussian, Gamma and Beta reference measures, the \(\beta=1,2\) and \(4\) cases received very special attention in the random matrix literature, e.g., [DE02].

The associated ensembles actually correspond to the eigenvalues of random matrices whose distribution is invariant to the action of the orthogonal (\(\beta=1\)), unitary (\(\beta=2\)) and symplectic (\(\beta=4\)) group respectively.

  • \(\mu = \mathcal{N}\): Hermite ensemble, support \(\mathbb{R}\)

  • \(\mu = \Gamma\): Laguerre ensemble, support \(\mathbb{R}^+\)

  • \(\mu = \operatorname{Beta}\): Jacobi ensemble, support \([0,1]\)

Note

The study of the distribution of the eigenvalues of random orthogonal, unitary and symplectic matrices lying on the unit circle is also very thorough [KN04].

Orthogonal Polynomial Ensembles

The case \(\beta=2\) corresponds to a specific type of projection DPPs, also called Orthogonal Polynomial Ensembles (OPEs) [Konig04], with associated kernel

\[K_N(x, y) = \sum_{n=0}^{N-1} P_n(x) P_n(y),\]

where \((P_n)\) are the orthonormal polynomials w.r.t. \(\mu\) i.e. \(\operatorname{deg}(P_n)=n\) and \(\langle P_k, P_l \rangle_{L^2(\mu)}=\delta_{kl}\).

Note

OPEs (with \(N\) points) correspond to projection DPPs onto \(\operatorname{Span}\{P_n\}_{n=0}^{N-1} = \mathbb{R}_{N-1}[X]\)

Hint

First, taking linear combinations of the rows of the Vandermonde matrix in (33) brings out the orthonormal polynomials \((P_n)\), so that

\[\begin{split}|\Delta(x_1,\dots,x_N)| \propto \begin{vmatrix} P_0(x_1) & \dots & P_0(x_N) \\ P_1(x_1) & \dots & P_1(x_N) \\ \vdots & & \vdots \\ P_{N-1}(x_1) & & P_{N-1}(x_N) \end{vmatrix}.\end{split}\]

Then,

\[|\Delta|^2 = | \Delta^{\top} \Delta | \propto \det \left[ K_N(x_i, x_j)\right]_{i,j=1}^N.\]

Finally, the joint distribution of \((x_1, \dots, x_N)\) reads

()\[(x_1,\dots,x_N) \sim \frac{1}{N!} \det \left[ K_N(x_i, x_j)\right]_{i,j=1}^N \prod_{i= 1}^N \mu(d x_i).\]

See also

[Konig04], [Joh06]

Sampling
Full matrix models

For specific reference measures the \(\beta = 1, 2, 4\) cases are very singular in the sense that the corresponding ensembles coincide with the eigenvalues of random matrices.

This is a highway for sampling exactly such ensembles in \(\mathcal{O}(N^3)\)!

Hermite Ensemble

Take for reference measure \(\mu=\mathcal{N}(0, 2)\), the pdf of the corresponding \(\beta\)-Ensemble reads

\[(x_1,\dots,x_N) \sim \left|\Delta(x_1,\dots,x_N)\right|^{\beta} \prod_{i= 1}^N e^{- \frac{1}{2}\frac{x_i^2}{2}} \, d x_i,\]

where from the definition in (33) we have \(\left|\Delta(x_1,\dots,x_N)\right| = \prod_{i<j} |x_i - x_j|\).

Hint

The Hermite ensemble (whose name comes from the fact that Hermite polynomials are orthogonal w.r.t the Gaussian distribution) refers to the eigenvalue distribution of random matrices formed by i.i.d. Gaussian vectors.

  • \(\beta=1\)

\[X \sim \mathcal{N}_{N, N}(0,1) \qquad A = \frac{X+X^{\top}}{\sqrt{2}}.\]
  • \(\beta=2\)

\[X \sim \mathcal{N}_{N, N}(0,1) + i~ \mathcal{N}_{N, N}(0,1) \qquad A = \frac{X+X^{\dagger}}{\sqrt{2}}.\]
  • \(\beta=4\)

\[\begin{split} \begin{cases} X \sim \mathcal{N}_{N, M}(0,1) + i~ \mathcal{N}_{N, M}(0,1)\\ Y \sim \mathcal{N}_{N, M}(0,1) + i~ \mathcal{N}_{N, M}(0,1) \end{cases} \qquad A = \begin{bmatrix} X & Y \\ -Y^* & X^* \end{bmatrix} \quad A = \frac{X+X^{\dagger}}{\sqrt{2}}.\end{split}\]

Normalization \(\sqrt{\beta N}\) to concentrate as the semi-circle law.

\[\frac{\sqrt{4-x^2}}{2\pi} 1_{[-2,2]} dx.\]
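
Before calling DPPy, here is a minimal raw NumPy sketch of the \(\beta=2\) construction above (Hermitize a complex Gaussian matrix, diagonalize, and normalize by \(\sqrt{\beta N}\)); the HermiteEnsemble object below wraps this kind of construction.

import numpy as np

rng = np.random.RandomState(0)
N = 500

# beta = 2: complex Gaussian matrix, then Hermitized
X = rng.randn(N, N) + 1j * rng.randn(N, N)
A = (X + X.conj().T) / np.sqrt(2)

eig_vals = np.linalg.eigvalsh(A)   # real eigenvalues of the Hermitian matrix A
eig_vals /= np.sqrt(2 * N)         # normalization sqrt(beta * N) with beta = 2
# A histogram of eig_vals should be close to the semi-circle density on [-2, 2]

The corresponding DPPy call, via sample_full_model, follows.
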
from dppy.beta_ensembles import HermiteEnsemble


hermite = HermiteEnsemble(beta=4)  # beta in {0,1,2,4}, default beta=2
hermite.sample_full_model(size_N=500)
# hermite.plot(normalization=True)
hermite.hist(normalization=True)

# To compare with the sampling speed of the tridiagonal model simply use
# hermite.sample_banded_model(size_N=500)

Full matrix model for the Hermite ensemble

See also

Laguerre Ensemble

Take for reference measure \(\mu=\Gamma\left(\frac{\beta}{2}(M-N+1), 2\right)=\chi_{\beta(M-N+1)}^2\), the pdf of the corresponding \(\beta\)-Ensemble reads

\[(x_1,\dots,x_N) \sim \left|\Delta(x_1,\dots,x_N)\right|^{\beta} \prod_{i= 1}^N x_i^{\frac{\beta}{2}(M-N+1)-1} e^{- \frac{1}{2} x_i} \, d x_i,\]

where from the definition in (33) we have \(\left|\Delta(x_1,\dots,x_N)\right| = \prod_{i<j} |x_i - x_j|\).

Hint

The Laguerre ensemble (whose name comes from the fact that Laguerre polynomials are orthogonal w.r.t the Gamma distribution) refers to the eigenvalue distribution of empirical covariance matrices of i.i.d. Gaussian vectors.

  • \(\beta=1\)

\[X \sim \mathcal{N}_{N, M}(0,1) \qquad A = XX^{\top}.\]
  • \(\beta=2\)

\[X \sim \mathcal{N}_{N, M}(0,1) + i~ \mathcal{N}_{N, M}(0,1) \qquad A = XX^{\dagger}.\]
  • \(\beta=4\)

\[\begin{split} \begin{cases} X \sim \mathcal{N}_{N, M}(0,1) + i~ \mathcal{N}_{N, M}(0,1)\\ Y \sim \mathcal{N}_{N, M}(0,1) + i~ \mathcal{N}_{N, M}(0,1) \end{cases} \qquad A = \begin{bmatrix} X & Y \\ -Y^* & X^* \end{bmatrix} \quad A = A A^{\dagger}.\end{split}\]

Normalization \(\beta M\) to concentrate as Marcenko-Pastur law.

\[\frac{1}{2\pi} \frac{\sqrt{(\lambda_+-x)(x-\lambda_-)}}{cx} 1_{[\lambda_-,\lambda_+]} dx,\]

where

\[c = \frac{M}{N} \quad \text{and} \quad \lambda_\pm = (1\pm\sqrt{c})^2.\]
from dppy.beta_ensembles import LaguerreEnsemble


laguerre = LaguerreEnsemble(beta=1)  # beta in {0,1,2,4}, default beta=2
laguerre.sample_full_model(size_N=500, size_M=800)  # M >= N
# laguerre.plot(normalization=True)
laguerre.hist(normalization=True)

# To compare with the sampling speed of the tridiagonal model simply use
# laguerre.sample_banded_model(size_N=500, size_M=800)

Full matrix model for the Laguerre ensemble

See also

Jacobi Ensemble

Take for reference measure \(\mu=\operatorname{Beta}\left(\frac{\beta}{2}(M_1-N+1), \frac{\beta}{2}(M_2-N+1)\right)\), the pdf of the corresponding \(\beta\)-Ensemble reads

\[(x_1,\dots,x_N) \sim \left|\Delta(x_1,\dots,x_N)\right|^{\beta} \prod_{i= 1}^N x_i^{\frac{\beta}{2}(M_1-N+1)-1} (1-x_i)^{\frac{\beta}{2}(M_2-N+1)-1} \, d x_i,\]

where from the definition in (33) we have \(\left|\Delta(x_1,\dots,x_N)\right| = \prod_{i<j} |x_i - x_j|\).

Hint

The Jacobi ensemble (whose name comes from the fact that Jacobi polynomials are orthogonal w.r.t the Beta distribution) is associated with the multivariate analysis of variance (MANOVA) model.

  • \(\beta=1\)

\[\begin{split}\begin{cases} X \sim \mathcal{N}_{N, M_1}(0,1)\\ Y \sim \mathcal{N}_{N, M_2}(0,1)\\ \end{cases} \qquad A = XX^{\top}\left(XX^{\top} + YY^{\top}\right)^{-1}.\end{split}\]
  • \(\beta=2\)

\[\begin{split}\begin{cases} X \sim \mathcal{N}_{N, M_1}(0,1) + i~ \mathcal{N}_{N, M_1}(0,1)\\ Y \sim \mathcal{N}_{N, M_2}(0,1) + i~ \mathcal{N}_{N, M_2}(0,1)\\ \end{cases} \qquad A = XX^{\dagger}\left(XX^{\dagger} + YY^{\dagger}\right)^{-1}.\end{split}\]
  • \(\beta=4\)

\[\begin{split} \begin{cases} X_1 \sim \mathcal{N}_{N, M_1}(0,1) + i~ \mathcal{N}_{N, M_1}(0,1)\\ X_2 \sim \mathcal{N}_{N, M_1}(0,1) + i~ \mathcal{N}_{N, M_1}(0,1)\\ Y_1 \sim \mathcal{N}_{N, M_2}(0,1) + i~ \mathcal{N}_{N, M_2}(0,1)\\ Y_2 \sim \mathcal{N}_{N, M_2}(0,1) + i~ \mathcal{N}_{N, M_2}(0,1) \end{cases} \qquad \begin{cases} X = \begin{bmatrix} X_1 & X_2 \\ -X_2^* & X_1^* \end{bmatrix}\\ Y = \begin{bmatrix} Y_1 & Y_2 \\ -Y_2^* & Y_1^* \end{bmatrix} \end{cases} \qquad A = XX^{\dagger}\left(XX^{\dagger} + YY^{\dagger}\right)^{-1}.\end{split}\]

Concentrates as Wachter law

\[\frac{(a+b)\sqrt{(\sigma_+-x)(x-\sigma_-)}}{2\pi x(1-x)}dx,\]

where

\[a = \frac{M_1}{N}, b = \frac{M_2}{N} \quad\text{and}\quad \sigma_{\pm} = \left(\frac{\sqrt{a(a+b-1)} \pm \sqrt{b}}{a+b}\right)^2,\]

itself tending to the arcsine law in the limit.

from dppy.beta_ensembles import JacobiEnsemble


jacobi = JacobiEnsemble(beta=2)  # beta must be in {0,1,2,4}, default beta=2
jacobi.sample_full_model(size_N=400, size_M1=500, size_M2=600)  # M_1, M_2 >= N
# jacobi.plot(normalization=True)
jacobi.hist(normalization=True)

# To compare with the sampling speed of the tridiagonal model simply use
# jacobi.sample_banded_model(size_N=400, size_M1=500, size_M2=600)

Full matrix model for the Jacobi ensemble

Circular Ensemble
\[\left|\Delta(e^{i \theta_1},\dots, e^{i \theta_N})\right|^{\beta} \prod_{j = 1}^N \frac{1}{2\pi} \mathbf{1}_{[0,2\pi]}(\theta_j) \, d\theta_j,\]

where from the definition in (33) we have \(\left|\Delta(x_1,\dots,x_N)\right| = \prod_{i<j} |x_i - x_j|\).

Hint

Eigenvalues of orthogonal (resp. unitary and self-dual unitary) matrices drawn uniformly i.e. Haar measure on the respective groups. The eigenvalues lie on the unit circle i.e. \(\lambda_n = e^{i \theta_n}\). The distribution of the angles \(\theta_n\) converges to the uniform measure on \([0, 2\pi[\) as \(N\) grows.

  • \(\beta=1\)

    Uniform measure i.e. Haar measure on orthogonal matrices \(\mathbb{O}_N\): \(U^{\top}U = I_N\)

    1. Via QR algorithm, see [Mez06] Section 5

    import numpy as np
    from numpy.random import randn
    import scipy.linalg as la

    N = 30  # matrix size
    A = randn(N, N)                       # real Gaussian matrix
    Q, R = la.qr(A)                       # QR decomposition
    d = np.diagonal(R)
    U = np.multiply(Q, d / np.abs(d), Q)  # fix the signs so that U is Haar distributed
    eig_vals = la.eigvals(U)              # eigenvalues lie on the unit circle
    
    2. The Hermite way

    \[\begin{split} X \sim \mathcal{N}_{N, N}(0,1)\\ A = X+X^{\top} = U^{\top}\Lambda U\\ eigvals(U).\end{split}\]
  • \(\beta=2\)

    Uniform measure i.e. Haar measure on unitary matrices \(\mathbb{U}_N\): \(U^{\dagger}U = I_N\)

    1. Via QR algorithm, see [Mez06] Section 5

    import numpy as np
    from numpy.random import randn
    import scipy.linalg as la

    N = 30  # matrix size
    A = randn(N, N) + 1j*randn(N, N)      # complex Gaussian matrix
    Q, R = la.qr(A)                       # QR decomposition
    d = np.diagonal(R)
    U = np.multiply(Q, d / np.abs(d), Q)  # fix the phases so that U is Haar distributed
    eig_vals = la.eigvals(U)              # eigenvalues lie on the unit circle
    
    from dppy.beta_ensembles import CircularEnsemble
    
    circular = CircularEnsemble(beta=2)  # beta in {0,1,2,4}, default beta=2
    
    # 1. Plot the eigenvalues, they lie on the unit circle
    circular.sample_full_model(size_N=30, haar_mode='QR')
    circular.plot()
    
    # 2. Histogram of the angle of more points, should look uniform on [0,2pi]
    circular.flush_samples()  # Flush previous sample
    
    circular.sample_full_model(size_N=1000, haar_mode='QR')
    circular.hist()
    

    Full matrix model for the Circular ensemble from QR on a random Gaussian matrix

    2. The Hermite way

    \[\begin{split} X \sim \mathcal{N}_{N, N}(0,1) + i~ \mathcal{N}_{N, N}(0,1)\\ A = X+X^{\dagger} = U^{\dagger}\Lambda U\\ eigvals(U).\end{split}\]
    from dppy.beta_ensembles import CircularEnsemble
    
    circular = CircularEnsemble(beta=2)  # beta in {0,1,2,4}, default beta=2
    
    # 1. Plot the eigenvalues, they lie on the unit circle
    circular.sample_full_model(size_N=30, haar_mode='Hermite')
    circular.plot()
    
    # 2. Histogram of the angle of more points, should look uniform on [0,2pi]
    circular.flush_samples()  # Flush previous sample
    
    circular.sample_full_model(size_N=1000, haar_mode='Hermite')
    circular.hist()
    

    Full matrix model for the Circular ensemble from a Hermite matrix

  • \(\beta=4\)

    Uniform measure i.e. Haar measure on self-dual unitary matrices \(\mathbb{U}\operatorname{Sp}_{2N}\): \(U^{\dagger}U = I_{2N}\)

    \[\begin{split} \begin{cases} X \sim \mathcal{N}_{N, M}(0,1) + i~ \mathcal{N}_{N, M}(0,1)\\ Y \sim \mathcal{N}_{N, M}(0,1) + i~ \mathcal{N}_{N, M}(0,1) \end{cases}\\ A = \begin{bmatrix} X & Y \\ -Y^* & X^* \end{bmatrix} \quad A = X+X^{\dagger} = U^{\dagger} \Lambda U\\ eigvals(U).\end{split}\]

See also

Ginibre Ensemble
\[\left|\Delta(z_1,\dots,z_N)\right|^{2} \prod_{i = 1}^N e^{ - \frac{1}{2}|z_i|^2 } d z_i,\]

where from the definition in (33) we have \(\left|\Delta(x_1,\dots,x_N)\right| = \prod_{i<j} |x_i - x_j|\).

\[A \sim \frac{1}{\sqrt{2}} \left( \mathcal{N}_{N,N}(0,1) + i~ \mathcal{N}_{N, N}(0,1) \right).\]

Normalization by \(\sqrt{N}\) to concentrate in the unit disk.

from dppy.beta_ensembles import GinibreEnsemble


ginibre = GinibreEnsemble()  # beta must be 2 (default)

ginibre.sample_full_model(size_N=40)
ginibre.plot(normalization=True)

ginibre.sample_full_model(size_N=1000)
ginibre.hist(normalization=True)

Full matrix model for the Ginibre ensemble

See also

Banded matrix models

Computing the eigenvalues of a full \(N\times N\) random matrix is \(\mathcal{O}(N^3)\), and can thus become prohibitive for large \(N\). A way to circumvent the problem is to adopt the equivalent banded models i.e. diagonalize banded matrices.

The first tridiagonal models for the Hermite Ensemble and Laguerre Ensemble were revealed by [DE02], who left the Jacobi Ensemble as an open question, addressed by [KN04]. Such tridiagonal formulations made sampling possible at cost \(\mathcal{O}(N^2)\) but also unlocked sampling for generic \(\beta>0\)!

Note that [KN04] also derived a quindiagonal model for the Circular Ensemble.

Hermite Ensemble

Take for reference measure \(\mu=\mathcal{N}(\mu, \sigma^2)\)

\[(x_1,\dots,x_N) \sim \left|\Delta(x_1,\dots,x_N)\right|^{\beta} \prod_{i= 1}^N e^{- \frac{(x_i-\mu)^2}{2\sigma^2}} \, d x_i.\]

Note

Recall that from the definition in (33)

\[\left|\Delta(x_1,\dots,x_N)\right| = \prod_{i<j} |x_i - x_j|.\]

The equivalent tridiagonal model reads

\[\begin{split}\begin{bmatrix} \alpha_1 & \sqrt{\beta_2}& 0 & 0 & 0 \\ \sqrt{\beta_2} & \alpha_2 & \sqrt{\beta_3}& 0 & 0 \\ 0 & \ddots & \ddots & \ddots & 0 \\ 0 & 0 & \sqrt{\beta_{N-1}} & \alpha_{N- 1} & \sqrt{\beta_{N}} \\ 0 & 0 & 0 & \sqrt{\beta_N} & \alpha_{N} \end{bmatrix},\end{split}\]

with

\[\alpha_{i} \sim \mathcal{N}(\mu, \sigma^2) \quad \text{and} \quad \beta_{i+1} \sim \Gamma\left(\frac{\beta}{2}(N - i), \sigma^2\right).\]

To recover the full matrix model for Hermite Ensemble, recall that \(\Gamma(\frac{k}{2}, 2)\equiv \chi_k^2\) and take

\[\mu = 0 \quad \text{and} \quad \sigma^2 = 2.\]

That is to say,

\[\alpha_{i} \sim \mathcal{N}(0, 2) \quad \text{and} \quad \beta_{i+1} \sim \chi_{\beta(N - i)}^2.\]
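
The tridiagonal construction above is cheap to reproduce directly; here is a minimal NumPy/SciPy sketch (with the general \(\mathcal{N}(\mu, \sigma^2)\) reference measure), independent of DPPy.

import numpy as np
from scipy.linalg import eigh_tridiagonal

rng = np.random.RandomState(0)
beta, N = 5.43, 500        # any beta > 0 is allowed by the tridiagonal model
mu, sigma = 0.0, 1.0       # reference measure N(mu, sigma^2)

# Diagonal entries: alpha_i ~ N(mu, sigma^2)
alpha = rng.normal(loc=mu, scale=sigma, size=N)

# Squared off-diagonal entries: beta_{i+1} ~ Gamma(beta/2 * (N - i), sigma^2), i = 1, ..., N-1
off_diag_sq = rng.gamma(shape=0.5 * beta * np.arange(N - 1, 0, -1), scale=sigma**2)

# Eigenvalues of the symmetric tridiagonal matrix
eig_vals = eigh_tridiagonal(alpha, np.sqrt(off_diag_sq), eigvals_only=True)

The DPPy call, via sample_banded_model, follows.
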
from dppy.beta_ensembles import HermiteEnsemble


hermite = HermiteEnsemble(beta=5.43)  # beta can be >=0, default beta=2
# Reference measure is N(mu, sigma^2)
hermite.sample_banded_model(loc=0.0, scale=1.0, size_N=500)
# hermite.plot(normalization=True)
hermite.hist(normalization=True)

Tridiagonal matrix model for the Hermite ensemble

See also

Laguerre Ensemble

Take for reference measure \(\mu=\Gamma(k,\theta)\)

\[(x_1,\dots,x_N) \sim \left|\Delta(x_1,\dots,x_N)\right|^{\beta} \prod_{i= 1}^N x_i^{k-1} e^{- \frac{x_i}{\theta}} \, d x_i.\]

Note

Recall that from the definition in (33)

\[\left|\Delta(x_1,\dots,x_N)\right| = \prod_{i<j} |x_i - x_j|.\]

The equivalent tridiagonal model reads

\[\begin{split}\begin{bmatrix} \alpha_1 & \sqrt{\beta_2}& 0 & 0 & 0 \\ \sqrt{\beta_2} & \alpha_2 & \sqrt{\beta_3}& 0 & 0 \\ 0 & \ddots & \ddots & \ddots & 0 \\ 0 & 0 & \sqrt{\beta_{N-1}} & \alpha_{N- 1} & \sqrt{\beta_{N}} \\ 0 & 0 & 0 & \sqrt{\beta_N} & \alpha_{N} \end{bmatrix} = \begin{bmatrix} \sqrt{\xi_1} & & & \\ \sqrt{\xi_2} & \sqrt{\xi_3} & & \\ & \ddots & \ddots & \\ & & \sqrt{\xi_{2N-2}} & \sqrt{\xi_{2N-1}} \end{bmatrix} \begin{bmatrix} \sqrt{\xi_1} & \sqrt{\xi_2} & & \\ & \sqrt{\xi_3} & \ddots & \\ & & \ddots & \sqrt{\xi_{2N-2}} \\ & & & \sqrt{\xi_{2N-1}} \end{bmatrix},\end{split}\]

with

\[\xi_{2i-1} \sim \Gamma\left(\frac{\beta}{2}(N - i) + k, \theta \right) \quad \text{and} \quad \xi_{2i} \sim \Gamma\left(\frac{\beta}{2}(N - i), \theta \right).\]

To recover the full matrix model for Laguerre Ensemble, recall that \(\Gamma(\frac{k}{2}, 2)\equiv \chi_k^2\) and take

\[k = \frac{\beta}{2}(M-N+1) \quad \text{and} \quad \theta = 2.\]

That is to say,

\[\xi_{2i-1} \sim \chi_{\beta(M - i + 1)}^2 \quad \text{and} \quad \xi_{2i} \sim \chi_{\beta(N - i)}^2.\]
from dppy.beta_ensembles import LaguerreEnsemble


laguerre = LaguerreEnsemble(beta=2.98)  # beta can be >=0, default beta=2
# Reference measure is Gamma(k, theta)
laguerre.sample_banded_model(shape=600, scale=2.0, size_N=400)
# laguerre.plot(normalization=True)
laguerre.hist(normalization=True)

Tridiagonal matrix model for the Laguerre ensemble

See also

Jacobi Ensemble

Take for reference measure \(\mu=\operatorname{Beta}(a,b)\)

\[(x_1,\dots,x_N) \sim \left|\Delta(x_1,\dots,x_N)\right|^{\beta} \prod_{i= 1}^N x_i^{a-1} (1-x_i)^{b-1} \, d x_i.\]

Note

Recall that from the definition in (33)

\[\left|\Delta(x_1,\dots,x_N)\right| = \prod_{i<j} |x_i - x_j|.\]

The equivalent tridiagonal model reads

\[\begin{split}\begin{bmatrix} \alpha_1 & \sqrt{\beta_2}& 0 & 0 & 0 \\ \sqrt{\beta_2} & \alpha_2 & \sqrt{\beta_3}& 0 & 0 \\ 0 & \ddots & \ddots & \ddots & 0 \\ 0 & 0 & \sqrt{\beta_{N-1}} & \alpha_{N- 1} & \sqrt{\beta_{N}} \\ 0 & 0 & 0 & \sqrt{\beta_N} & \alpha_{N} \end{bmatrix}.\end{split}\]
\[\begin{split}\alpha_1 &= \xi_1, \qquad \alpha_k = \xi_{2k-2}+\xi_{2k-1}, \qquad \beta_{k+1} = \xi_{2k-1}\xi_{2k},\\ \xi_1 &= c_1, \qquad \xi_k = (1-c_{k-1})c_k, \qquad \gamma_1 = 1-c_1, \qquad \gamma_k = c_{k-1}(1-c_k),\end{split}\]

with

\[c_{2i-1} \sim \operatorname{Beta} \left( \frac{\beta}{2}(N-i) + a, \frac{\beta}{2}(N-i) + b \right) \quad \text{and} \quad c_{2i} \sim \operatorname{Beta} \left( \frac{\beta}{2} (N-i), \frac{\beta}{2} (N-i-1) + a + b \right).\]

To recover the full matrix model for the Jacobi Ensemble, take

\[a = \frac{\beta}{2}(M_1-N+1) \quad \text{and} \quad b = \frac{\beta}{2}(M_2-N+1).\]

That is to say,

\[c_{2i-1} \sim \operatorname{Beta} \left( \frac{\beta}{2}(M_1-i+1), \frac{\beta}{2}(M_2-i+1) \right) \quad \text{and} \quad c_{2i} \sim \operatorname{Beta} \left( \frac{\beta}{2} (N-i), \frac{\beta}{2} (M_1+M_2-N-i+1) \right).\]
from dppy.beta_ensembles import JacobiEnsemble


jacobi = JacobiEnsemble(beta=3.14)  # beta can be >=0, default beta=2
# Reference measure is Beta(a,b)
jacobi.sample_banded_model(a=500, b=300, size_N=400)
# jacobi.plot(normalization=True)
jacobi.hist(normalization=True)

Tridiagonal matrix model for the Jacobi ensemble

See also

Circular Ensemble
\[\left|\Delta(e^{i \theta_1},\dots, e^{i \theta_N})\right|^{\beta} \prod_{j = 1}^N \frac{1}{2\pi} \mathbf{1}_{[0,2\pi]} (\theta_j) d\theta_j.\]

Note

Recall that from the definition in (33)

\[\left|\Delta(x_1,\dots,x_N)\right| = \prod_{i<j} |x_i - x_j|.\]

Important

Consider the distribution \(\Theta_{\nu}\) that for integers \(\nu\geq2\) is defined as follows:

Draw \(v\) uniformly at random from the unit sphere \(\mathbb{S}^{\nu} \subset \mathbb{R}^{\nu+1}\); then \(v_1 + iv_2\sim \Theta_{\nu}\).

Now, given \(\beta\in \mathbb{N}^*\), let

  • \(\alpha_k\sim \Theta_{\beta(N-k-1)+1}\) independent variables

  • for \(0\leq k\leq N-1\) set \(\rho_k = \sqrt{1-|\alpha_k|^2}\).

Then, the equivalent quindiagonal model corresponds to the eigenvalues of either \(LM\) or \(ML\) with

\[L = \operatorname{diag}[\Xi_0,\Xi_2,\dots] \quad \text{and} \quad M = \operatorname{diag}[\Xi_{-1},\Xi_1,\Xi_3\dots],\]

and where

\[\begin{split}\Xi_k = \begin{bmatrix} \overline{\alpha}_k & \rho_k\\ \rho_k & -\alpha_k \end{bmatrix} , \quad 0\leq k\leq N-2 , \quad \text{with} \quad \Xi_{-1} = [1] \quad \text{and} \quad \Xi_{N-1} = [\overline{\alpha}_{N-1}].\end{split}\]
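
As a side note, the distribution \(\Theta_{\nu}\) defined above can be sampled by normalizing a standard Gaussian vector; a minimal sketch:

import numpy as np

def sample_theta_nu(nu, rng=np.random):
    """Draw v uniformly on the unit sphere S^nu in R^{nu+1}, return v_1 + i v_2."""
    v = rng.randn(nu + 1)
    v /= np.linalg.norm(v)
    return v[0] + 1j * v[1]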

Hint

The effect of increasing the \(\beta\) parameter can be nicely visualized on this Circular Ensemble. Viewing \(\beta\) as the inverse temperature, the configuration of the eigenvalues crystallizes with \(\beta\), see the figure below.

from dppy.beta_ensembles import CircularEnsemble

circular = CircularEnsemble(beta=2)  # beta must be >=0 integer, default beta=2

# See the crystallization of the configuration as beta increases
for b in [0, 1, 5, 10]:

    circular.beta = b
    circular.sample_banded_model(size_N=30)
    circular.plot()

circular.beta = 2
circular.sample_banded_model(size_N=1000)
circular.hist()

Quindiagonal matrix model for the Circular ensemble

See also

Multivariate Jacobi ensemble

Important

For the details please refer to:

  1. the extensive documentation of MultivariateJacobiOPE below

  2. the associated Jupyter notebook which showcases MultivariateJacobiOPE

  3. our NeurIPS‘19 paper [GBV19] On two ways to use determinantal point processes for Monte Carlo integration

  4. our ICML‘19 workshop paper

The figures below display a sample of a \(d=2\) dimensional Jacobi ensemble MultivariateJacobiOPE with \(N=200\) points. The red and green dashed curves correspond to the normalized base densities proportional to \((1-x)^{a_1} (1+x)^{b_1}\) and \((1-y)^{a_2} (1+y)^{b_2}\), respectively.

import numpy as np
import matplotlib.pyplot as plt
from dppy.multivariate_jacobi_ope import MultivariateJacobiOPE

# The .plot() method outputs something only in dimension d=1 or 2

# Number of points / dimension
N, d = 200, 2
# Jacobi parameters in [-0.5, 0.5]^{d x 2}
jac_params = np.array([[0.5, 0.5],
                       [-0.3, 0.4]])

dpp = MultivariateJacobiOPE(N, jac_params)

# Get an exact sample
sampl = dpp.sample()

# Display
# the cloud of points
# the base probability densities
# the marginal empirical histograms
dpp.plot(sample=sampl, weighted=False)
plt.tight_layout()

# Attach a weight 1/K(x,x) to each of the points
dpp.plot(sample=sampl, weighted=True)
plt.tight_layout()


  • In the first plot, observe that the empirical marginal density converges to the arcsine density \(\frac{1}{\pi\sqrt{1-x^2}}\), displayed in orange.

  • In the second plot, we take the same sample and attach a weight \(\frac{1}{K(x,x)}\) to each of the points. This illustrates the choice of the weights defining the estimator of [BH16] as a proxy for the reference measure.


Implementation of the class MultivariateJacobiOPE used in [GBV19] for Monte Carlo with Determinantal Point Processes

It has 3 main methods:

  • sample() to generate samples

  • K() to evaluate the corresponding projection kernel

  • plot() to display 1D or 2D samples

class dppy.multivariate_jacobi_ope.MultivariateJacobiOPE(N, jacobi_params)[source]

Bases: object

Multivariate Jacobi Orthogonal Polynomial Ensemble used in [GBV19] for Monte Carlo with Determinantal Point Processes

This corresponds to a continuous multivariate projection DPP with state space \([-1, 1]^d\) with respect to

  • reference measure \(\mu(dx) = w(x) dx\) (see also eval_w()), where

    \[w(x) = \prod_{i=1}^{d} (1-x_i)^{a_i} (1+x_i)^{b_i}\]
  • kernel \(K\) (see also K())

    \[K(x, y) = \sum_{\mathfrak{b}(k)=0}^{N-1} P_{k}(x) P_{k}(y) = \Phi(x)^{\top} \Phi(y)\]

    where

    • \(k \in \mathbb{N}^d\) is a multi-index ordered according to the ordering \(\mathfrak{b}\) (see compute_ordering())

    • \(P_{k}(x) = \prod_{i=1}^d P_{k_i}^{(a_i, b_i)}(x_i)\) is the product of orthonormal Jacobi polynomials

      \[\int_{-1}^{1} P_{k}^{(a_i,b_i)}(u) P_{\ell}^{(a_i,b_i)}(u) (1-u)^{a_i} (1+u)^{b_i} d u = \delta_{k\ell}\]

      so that \((P_{k})\) are orthonormal w.r.t \(\mu(dx)\)

    • \(\Phi(x) = \left(P_{\mathfrak{b}^{-1}(0)}(x), \dots, P_{\mathfrak{b}^{-1}(N-1)}(x) \right)^{\top}\)

Parameters
  • N (int) – Number of points \(N \geq 1\)

  • jacobi_params (array_like) – Jacobi parameters \([(a_i, b_i)]_{i=1}^d\). The number of rows \(d\) prescribes the ambient dimension of the points, i.e., \(x_{1}, \dots, x_{N} \in [-1, 1]^d\):

    • when \(d=1\), \(a_1, b_1 > -1\),

    • when \(d \geq 2\), \(|a_i|, |b_i| \leq \frac{1}{2}\).

See also

  • Multivariate Jacobi ensemble

  • when \(d=1\), the univariate Jacobi ensemble is sampled by computing the eigenvalues of a properly randomized tridiagonal matrix of [KN04]

  • [BH16] initiated the use of the multivariate Jacobi ensemble for Monte Carlo integration. In particular, they proved a CLT with variance decay of order \(N^{-(1+1/d)}\), which is faster than the \(N^{-1}\) rate of vanilla Monte Carlo where the points are drawn i.i.d. from the base measure.

K(X, Y=None, eval_pointwise=False)[source]

Evaluate \(\left(K(x, y)\right)_{x\in X, y\in Y}\) if eval_pointwise=False or \(\left(K(x, y)\right)_{(x, y)\in (X, Y)}\) otherwise

\[K(x, y) = \sum_{\mathfrak{b}(k)=0}^{N-1} P_{k}(x) P_{k}(y) = \phi(x)^{\top} \phi(y)\]

where

  • \(k \in \mathbb{N}^d\) is a multi-index ordered according to the ordering \(\mathfrak{b}\), compute_ordering()

  • \(P_{k}(x) = \prod_{i=1}^d P_{k_i}^{(a_i, b_i)}(x_i)\) is the product of orthonormal Jacobi polynomials

    \[\int_{-1}^{1} P_{k}^{(a_i,b_i)}(u) P_{\ell}^{(a_i,b_i)}(u) (1-u)^{a_i} (1+u)^{b_i} d u = \delta_{k\ell}\]

    so that \((P_{k})\) are orthonormal w.r.t \(\mu(dx)\)

  • \(\Phi(x) = \left(P_{\mathfrak{b}^{-1}(0)}(x), \dots, P_{\mathfrak{b}^{-1}(N-1)}(x) \right)\), see eval_multiD_polynomials()

Parameters
  • X (array_like) – \(M\times d\) array of \(M\) points \(\in [-1, 1]^d\)

  • Y (array_like (default None)) – \(M'\times d\) array of \(M'\) points \(\in [-1, 1]^d\)

  • eval_pointwise (bool (default False)) – sets pointwise evaluation of the kernel, if True, \(X\) and \(Y\) must have the same shape, see Returns

Returns

If eval_pointwise=False (default), evaluate the kernel matrix

\[\left(K(x, y)\right)_{x\in X, y\in Y}\]

If eval_pointwise=True, pointwise evaluation of \(K\), as depicted in the following pseudo-code output:

  • if Y is None

    • \(\left(K(x, y)\right)_{x\in X, y\in X}\) if eval_pointwise=False

    • \(\left(K(x, x)\right)_{x\in X}\) if eval_pointwise=True

  • otherwise

    • \(\left(K(x, y)\right)_{x\in X, y\in Y}\) if eval_pointwise=False

    • \(\left(K(x, y)\right)_{(x, y)\in (X, Y)}\) if eval_pointwise=True (in this case X and Y should have the same shape)

Return type

array_like
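
A short usage sketch of K() (with hypothetical parameter values), illustrating the two evaluation modes:

import numpy as np
from dppy.multivariate_jacobi_ope import MultivariateJacobiOPE

N, d = 20, 2
dpp = MultivariateJacobiOPE(N, np.zeros((d, 2)))  # a_i = b_i = 0

rng = np.random.RandomState(0)
X = 2 * rng.rand(5, d) - 1                # 5 points in [-1, 1]^2

K_XX = dpp.K(X)                           # 5 x 5 kernel matrix (K(x, y))_{x, y in X}
K_diag = dpp.K(X, eval_pointwise=True)    # length-5 vector (K(x, x))_{x in X}
assert np.allclose(K_diag, np.diag(K_XX))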

eval_multiD_polynomials(X)[source]

Evaluate

\[\begin{split}\mathbf{\Phi}(X) := \begin{pmatrix} \Phi(x_1)^{\top}\\ \vdots\\ \Phi(x_M)^{\top} \end{pmatrix}\end{split}\]

where \(\Phi(x) = \left(P_{\mathfrak{b}^{-1}(0)}(x), \dots, P_{\mathfrak{b}^{-1}(N-1)}(x) \right)^{\top}\) such that \(K(x, y) = \Phi(x)^{\top} \Phi(y)\). Recall that \(\mathfrak{b}\) denotes the ordering chosen to order multi-indices \(k\in \mathbb{N}^d\).

This is done by evaluating, via the dedicated SciPy routine scipy.special.eval_jacobi(), the three-term recurrence relations satisfied by the respective univariate Jacobi polynomials \(P_{k_i}^{(a_i, b_i)}(x_i)\). Then we use the slicing feature of the Python language to compute \(\Phi(x)=\left(P_{k}(x) = \prod_{i=1}^d P_{k_i}^{(a_i, b_i)}(x_i)\right)_{k=\mathfrak{b}^{-1}(0), \dots, \mathfrak{b}^{-1}(N-1)}^{\top}\)

Parameters

X (array_like) – \(M\times d\) array of \(M\) points \(\in [-1, 1]^d\)

Returns

\(\mathbf{\Phi}(X)\) - \(M\times N\) array

Return type

array_like

See also

  • evaluation of the kernel K()

eval_w(X)[source]

Evaluate \(w(x) = \prod_{i=1}^{d} (1-x_i)^{a_i} (1+x_i)^{b_i}\) which corresponds to the density of the base measure \(\mu(dx) = w(x) dx\)

Parameters

X (array_like) – \(M\times d\) array of \(M\) points \(\in [-1, 1]^d\)

Returns

\(w(x) = \prod_{i=1}^{d} (1-x_i)^{a_i} (1+x_i)^{b_i}\)

Return type

array_like

sample(nb_trials_max=10000, random_state=None, tridiag_1D=True)[source]

Use the chain rule [HKPVirag06] (Algorithm 18) to sample \(\left(x_{1}, \dots, x_{N} \right)\) with density

\[\begin{split}& \frac{1}{N!} \det\left[K(x_n,x_p)\right]_{n,p=1}^{N} \prod_{n=1}^{N} w(x_n)\\ &= \frac{1}{N} K(x_1,x_1) w(x_1) \prod_{n=2}^{N} \frac{ K(x_n,x_n) - K(x_n,x_{1:n-1}) \left[\left(K(x_k,x_l)\right)_{k,l=1}^{n-1}\right]^{-1} K(x_{1:n-1},x_n) }{N-(n-1)} w(x_n)\\ &= \frac{\| \Phi(x_1) \|^2}{N} w(x_1) \prod_{n=2}^{N} \frac{\operatorname{distance}^2(\Phi(x_n), \operatorname{span}\{\Phi(x_p)\}_{p=1}^{n-1})} {N-(n-1)} w(x_n)\end{split}\]

The order in which the points were sampled can be forgotten to obtain a valid sample of the corresponding DPP

  • \(x_1 \sim \frac{1}{N} K(x,x) w(x)\) using sample_chain_rule_proposal()

  • \(x_n | Y = \left\{ x_{1}, \dots, x_{n-1} \right\}\) is sampled using rejection sampling with proposal density \(\frac{1}{N} K(x,x) w(x)\) and rejection bound \(\frac{N}{N-(n-1)}\)

    \[\frac{1}{N-(n-1)} [K(x,x) - K(x, Y) K_Y^{-1} K(Y, x)] w(x) \leq \frac{N}{N-(n-1)} \frac{1}{N} K(x,x) w(x)\]

Note

Using the gram structure \(K(x, y) = \Phi(x)^{\top} \Phi(y)\) the numerator of the successive conditionals reads

\[\begin{split}K(x, x) - K(x, Y) K(Y, Y)^{-1} K(Y, x) &= \operatorname{distance}^2(\Phi(x), \operatorname{span}\{\Phi(x_p)\}_{p=1}^{n-1})\\ &= \left\| \left(I - \Pi_{\operatorname{span}\{\Phi(x_p)\}_{p=1}^{n-1}}\right) \Phi(x)\right\|^2\end{split}\]

which can be computed simply in a vectorized way. The overall procedure is akin to a sequential Gram-Schmidt orthogonalization of \(\Phi(x_{1}), \dots, \Phi(x_{N})\).

sample_chain_rule_proposal(nb_trials_max=10000, random_state=None)[source]

Use a rejection sampling mechanism to sample

\[\frac{1}{N} K(x, x) w(x) dx = \frac{1}{N} \sum_{\mathfrak{b}(k)=0}^{N-1} \left( \frac{P_k(x)}{\left\| P_k \right\|} \right)^2 w(x)\]

with proposal distribution

\[w_{eq}(x) d x = \prod_{i=1}^{d} \frac{1}{\pi\sqrt{1-(x_i)^2}} d x_i\]

Since the target density is a mixture, we can sample from it by

  1. Select a multi-index \(k\) uniformly at random in \(\left\{ \mathfrak{b}^{-1}(0), \dots, \mathfrak{b}^{-1}(N-1) \right\}\)

  2. Sample from \(\left( \frac{P_k(x)}{\left\| P_k \right\|} \right)^2 w(x) dx\) with proposal \(w_{eq}(x) d x\).

    The acceptance ratio writes

    \[\frac{ \left( \frac{P_k(x)}{\left\| P_k \right\|} \right)^2 w(x)} {w_{eq}(x)} = \prod_{i=1}^{d} \pi \left( \frac{P_{k_i}^{(a_i, b_i)}(x)} {\left\| P_{k_i}^{(a_i, b_i)} \right\|} \right)^2 (1-x_i)^{a_i+\frac{1}{2}} (1+x_i)^{b_i+\frac{1}{2}} \leq C_{k}\]

    which can be bounded using the result of [Gau09] on Jacobi polynomials.

    Note

    Each of the rejection constants \(C_{k}\) is computed at initialization of the MultivariateJacobiOPE object using compute_rejection_bounds()

Returns

A sample \(x\in[-1,1]^d\) with probability distribution \(\frac{1}{N} K(x,x) w(x)\)

Return type

array_like

dppy.multivariate_jacobi_ope.compute_degrees_1D_polynomials(max_degrees)[source]

deg[i, j] = i if i <= max_degrees[j] else 0

dppy.multivariate_jacobi_ope.compute_norms_1D_polynomials(jacobi_params, deg_max)[source]

Compute the square norms \(\|P_{k}^{(a_i,b_i)}\|^2\) of each (univariate) orthogoanl Jacobi polynomial for \(k=0\) to deg_max and \(a_i, b_i =\) jacobi_params[i, :] Recall that the Jacobi polynomials \(\left( P_{k}^{(a_i,b_i)} \right)\) are orthogonal w.r.t. \((1-u)^{a_i} (1+u)^{b_i} du\).

\[\begin{split}\|P_{k}^{(a_i,b_i)}\|^2 &= \int_{-1}^{1} \left( P_{k}^{(a_i,b_i)}(u) \right)^2 (1-u)^{a_i} (1+u)^{b_i} d u\\ &= \frac{2^{a_i+b_i+1}} {2k+a_i+b_i+1} \frac{\Gamma(k+a_i+1)\Gamma(k+b_i+1)} {\Gamma(k+a_i+b_i+1) k!}\end{split}\]
Parameters
  • jacobi_params (array_like) – Jacobi parameters \([(a_i, b_i)]_{i=1}^d \in [-\frac{1}{2}, \frac{1}{2}]^{d \times 2}\) The number of rows \(d\) prescribes the ambient dimension of the points i.e. \(x_{1}, \dots, x_{N} \in [-1, 1]^d\)

  • deg_max (int) – Maximal degree of 1D Jacobi polynomials

Returns

Array of size (deg_max + 1) \(\times d\) with entry \((k, i)\) given by \(\|P_{k}^{(a_i,b_i)}\|^2\)

Return type

array_like
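
For reference, the squared norm above can be evaluated with log-gamma functions for numerical stability; the following sketch treats a single pair \((a, b)\) and is not the library code itself.

import numpy as np
from scipy.special import gammaln

def jacobi_squared_norm(k, a, b):
    """||P_k^{(a,b)}||^2 = 2^{a+b+1}/(2k+a+b+1) * Gamma(k+a+1) Gamma(k+b+1) / (Gamma(k+a+b+1) k!)."""
    log_norm = ((a + b + 1) * np.log(2)
                - np.log(2 * k + a + b + 1)
                + gammaln(k + a + 1) + gammaln(k + b + 1)
                - gammaln(k + a + b + 1) - gammaln(k + 1))
    return np.exp(log_norm)

# Sanity check: for a = b = 0 (Legendre polynomials), ||P_k||^2 = 2 / (2k + 1)
assert np.isclose(jacobi_squared_norm(3, 0.0, 0.0), 2 / 7)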

dppy.multivariate_jacobi_ope.compute_ordering(N, d)[source]

Compute the ordering of the multi-indices \(\in\mathbb{N}^d\) defining the order between the multivariate monomials as described in Section 2.1.3 of [BH16].

Parameters
  • N (int) – Number of polynomials \((P_k)\) considered to build the kernel K() (number of points of the corresponding MultivariateJacobiOPE)

  • d (int) – Size of the multi-indices \(k\in \mathbb{N}^d\) characterizing the _degree_ of \(P_k\) (ambient dimension of the points \(x_{1}, \dots, x_{N} \in [-1, 1]^d\))

Returns

Array of size \(N\times d\) containing the first \(N\) multi-indices \(\in\mathbb{N}^d\) in the order prescribed by the ordering \(\mathfrak{b}\) [BH16] Section 2.1.3

Return type

array_like

For instance, for \(N=12, d=2\)

[(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (1, 2), (2, 0), (2, 1), (2, 2), (0, 3), (1, 3), (2, 3)]
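
This ordering can be reproduced by enumerating multi-indices layer by layer, where layer \(m\) gathers the indices with maximum coordinate equal to \(m\). The following sketch reproduces the example above; the library's implementation may differ in its details.

from itertools import product

def compute_ordering(N, d):
    ordering, m = [], 0
    while len(ordering) < N:
        # layer m: multi-indices in {0, ..., m}^d whose maximum coordinate equals m
        ordering.extend(k for k in product(range(m + 1), repeat=d) if max(k) == m)
        m += 1
    return ordering[:N]

print(compute_ordering(12, 2))
# [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (1, 2), (2, 0), (2, 1), (2, 2), (0, 3), (1, 3), (2, 3)]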

See also

dppy.multivariate_jacobi_ope.compute_rejection_bounds(jacobi_params, ordering, log_scale=True)[source]

Compute the rejection constants for the acceptance/rejection mechanism used in sample_chain_rule_proposal() to sample

\[\frac{1}{N} K(x, x) w(x) dx = \frac{1}{N} \sum_{\mathfrak{b}(k)=0}^{N-1} \left( \frac{P_k(x)}{\left\| P_k \right\|} \right)^2 w(x)\]

with proposal distribution

\[w_{eq}(x) d x = \prod_{i=1}^{d} \frac{1}{\pi\sqrt{1-(x_i)^2}} d x_i\]

To get a sample:

  1. Draw a multi-index \(k\) uniformly at random in \(\left\{ \mathfrak{b}^{-1}(0), \dots, \mathfrak{b}^{-1}(N-1) \right\}\)

  2. Sample from \(P_k(x)^2 w(x) dx\) with proposal \(w_{eq}(x) d x\).

    The acceptance ratio reads

    \[\frac{\left( \frac{P_k(x)}{\left\| P_k \right\|} \right)^2 w(x)} {w_{eq}(x)} = \prod_{i=1}^{d} \pi \left( \frac{P_{k_i}^{(a_i, b_i)}(x)} {\left\| P_{k_i}^{(a_i, b_i)} \right\|} \right)^2 (1-x_i)^{a_i+\frac{1}{2}} (1+x_i)^{b_i+\frac{1}{2}} \leq C_k\]
    • For \(k_i>0\), we use a bound on Jacobi polynomials given, e.g., by [Gau09], valid for \(|a|,|b| \leq \frac{1}{2}\):

    \[\begin{split}& \pi (1-u)^{a+\frac{1}{2}} (1+u)^{b+\frac{1}{2}} \left( \frac{P_{n}^{(a, b)}(u)} {\left\| P_{n}^{(a, b)} \right\|} \right)^2\\ &\leq \frac{2} {n!(n+(a+b+1) / 2)^{2 \max(a,b)}} \frac{\Gamma(n+a+b+1) \Gamma(n+\max(a,b)+1)} {\Gamma(n+\min(a,b)+1)}\end{split}\]
    • For \(k_i=0\), we use less involved properties of the Jacobi polynomials:

      • \(P_{0}^{(a, b)} = 1\)

      • \(\|P_{0}^{(a, b)}\|^2 = 2^{a+b+1} \operatorname{B}(a+1,b+1)\)

      • \(m = \frac{b-a}{a+b+1}\) is the mode of \((1-u)^{a+\frac{1}{2}} (1+u)^{b+\frac{1}{2}}\) (valid since \(a+\frac{1}{2}, b+\frac{1}{2} > 0\))

      So that,

      \[\begin{split} \pi (1-u)^{a+\frac{1}{2}} (1+u)^{b+\frac{1}{2}} \left(\frac{P_{0}^{(a, b)}(u)} {\|P_{0}^{(a, b)}\|}\right)^{2} &= \frac {\pi (1-u)^{a+\frac{1}{2}} (1+u)^{b+\frac{1}{2}}} {\|P_{0}^{(a, b)}\|^2} \\ &\leq \frac {\pi (1-m)^{a+\frac{1}{2}} (1+m)^{b+\frac{1}{2}}} {2^{a+b+1} \operatorname{B}(a+1,b+1)}\end{split}\]
Parameters
  • jacobi_params (array_like) –

    Jacobi parameters \([(a_i, b_i)]_{i=1}^d \in [-\frac{1}{2}, \frac{1}{2}]^{d \times 2}\).

    The number of rows \(d\) prescribes the ambient dimension of the points i.e. \(x_{1}, \dots, x_{N} \in [-1, 1]^d\)

  • ordering (array_like) –

    Ordering of the multi-indices \(\in\mathbb{N}^d\) defining the order between the multivariate monomials (see also compute_ordering())

    • the number of rows corresponds to the number \(N\) of monomials considered.

    • the number of columns \(=d\)

  • log_scale (bool) – If True, the rejection bound is computed using the logarithmic versions betaln, gammaln of beta and gamma functions to avoid overflows

Returns

The rejection bounds \(C_{k}\) for \(k = \mathfrak{b}^{-1}(0), \dots, \mathfrak{b}^{-1}(N-1)\)

Return type

array_like

See also

  • [Gau09] for the domination when \(k_i > 0\)

  • compute_norms_1D_polynomials()
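
To make the \(k_i = 0\) branch concrete, here is a sketch of the corresponding bound, following the mode-based inequality above; it assumes \(a+\frac{1}{2}, b+\frac{1}{2} > 0\) and mirrors the log_scale option by returning a log-bound computed with betaln.

import numpy as np
from scipy.special import betaln

def rejection_bound_k0(a, b, log_scale=True):
    m = (b - a) / (a + b + 1)  # mode of (1-u)^{a+1/2} (1+u)^{b+1/2}
    log_bound = (np.log(np.pi)
                 + (a + 0.5) * np.log(1 - m) + (b + 0.5) * np.log(1 + m)
                 - (a + b + 1) * np.log(2)
                 - betaln(a + 1, b + 1))
    return log_bound if log_scale else np.exp(log_bound)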

API

Implementation of the abstract class BetaEnsemble (see \(\beta\)-Ensembles) and its children CircularEnsemble, GinibreEnsemble, HermiteEnsemble, JacobiEnsemble and LaguerreEnsemble:

Such objects have 4 main methods:

  • sample_full_model()

  • sample_banded_model()

  • plot() to display a scatter plot of the last sample and, optionally, the limiting distribution (after normalization)

  • hist() to display a histogram of the last sample and, optionally, the limiting distribution (after normalization)
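
For instance, the four methods can be combined as in the following minimal sketch, here with the HermiteEnsemble documented below.

from dppy.beta_ensembles import HermiteEnsemble

hermite = HermiteEnsemble(beta=2)

hermite.sample_full_model(size_N=500)    # full matrix model
hermite.hist(normalization=True)         # histogram vs the semi-circle distribution

hermite.sample_banded_model(size_N=500)  # tridiagonal matrix model
hermite.plot(normalization=True)

hermite.flush_samples()                  # empty the list_of_samples attribute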

class dppy.beta_ensembles.BetaEnsemble(beta=2)[source]

Bases: object

\(\beta\)-Ensemble object parametrized by

Parameters

beta (int, float, default \(2\)) –

\(\beta \geq 0\) inverse temperature parameter.

The default beta\(=2\) corresponds to the DPP case, see Orthogonal Polynomial Ensembles

See also

flush_samples()[source]

Empty the list_of_samples attribute.

abstract hist()[source]

Display histogram of the last realization of the underlying \(\beta\)-Ensemble. For some \(\beta\)-Ensembles, a normalization argument is available to display the limiting (or equilibrium) distribution and scale the points accordingly.

abstract normalize_points()[source]

Normalize the points of the last realization of the underlying \(\beta\)-Ensemble so that they can be compared with the limiting (or equilibrium) distribution.

abstract plot()[source]

Display last realization of the underlying \(\beta\)-Ensemble. For some \(\beta\)-Ensembles, a normalization argument is available to display the limiting (or equilibrium) distribution and scale the points accordingly.

abstract sample_banded_model()[source]

Sample from underlying \(\beta\)-Ensemble using the corresponding banded matrix model. Arguments are the associated reference measure’s parameters, or the matrix dimensions used in sample_full_model()

abstract sample_full_model()[source]

Sample from underlying \(\beta\)-Ensemble using the corresponding full matrix model. Arguments are the associated matrix dimensions

class dppy.beta_ensembles.CircularEnsemble(beta=2)[source]

Bases: dppy.beta_ensembles.BetaEnsemble

Circular Ensemble object

See also

flush_samples()

Empty the list_of_samples attribute.

hist(normalization=True)[source]

Display the histogram of the angles \(\theta_{1}, \dots, \theta_{N}\) associated to the last realization \(\left\{ e^{i \theta_{1}}, \dots, e^{i \theta_{N}} \right\}\) of the CircularEnsemble object.

Parameters

normalization (bool, default True) – When True, the limiting distribution of the angles, i.e., the uniform distribution in \([0, 2\pi]\) is displayed

See also

normalize_points(points)[source]

No need to renormalize the points

plot(normalization=True)[source]

Display the last realization of the CircularEnsemble object.

Parameters

normalization (bool, default True) – When True, the unit circle is displayed

See also

sample_banded_model(size_N=10, random_state=None)[source]

Sample from the quindiagonal matrix model associated to the Circular Ensemble. Available for beta \(\in\mathbb{N}^*\), and the degenerate case beta \(=0\) corresponding to i.i.d. uniform points on the unit circle

Parameters

size_N (int, default \(10\)) – Number \(N\) of points, i.e., size of the matrix to be diagonalized

Note

To compare sample_banded_model() with sample_full_model() simply use the size_N parameter.

See also

sample_full_model(size_N=10, haar_mode='Hermite', random_state=None)[source]

Sample from the full matrix model associated to the Circular Ensemble. Only available for beta \(\in\{1, 2, 4\}\) and the degenerate case beta \(=0\) corresponding to i.i.d. uniform points on the unit circle

Parameters
  • size_N (int, default \(10\)) – Number \(N\) of points, i.e., size of the matrix to be diagonalized

  • haar_mode (str, default 'Hermite') – Sample from the Haar measure, i.e., uniformly on the orthogonal/unitary/symplectic group, using either 'QR' or 'Hermite'

See also
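
A minimal usage sketch of the CircularEnsemble object, combining the methods documented above:

from dppy.beta_ensembles import CircularEnsemble

circular = CircularEnsemble(beta=2)

circular.sample_banded_model(size_N=30)   # quindiagonal matrix model
circular.plot(normalization=True)         # points on the unit circle

circular.sample_full_model(size_N=30, haar_mode='Hermite')
circular.hist(normalization=True)         # angles vs the uniform distribution on [0, 2*pi]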

class dppy.beta_ensembles.GinibreEnsemble(beta=2)[source]

Bases: dppy.beta_ensembles.BetaEnsemble

Ginibre Ensemble object

See also

flush_samples()

Empty the list_of_samples attribute.

hist(normalization=True)[source]

Display the histogram of the radii of the points of the last realization of the GinibreEnsemble object

Parameters

normalization (bool, default True) – When True, the points are first normalized so as to concentrate in the unit disk (see normalize_points()) and the limiting density \(2r 1_{[0, 1]}(r)\) of the radii is displayed

See also

normalize_points(points)[source]

Normalize points to concentrate in the unit disk.

\[x \mapsto \frac{x}{\sqrt{N}}\]

See also

plot(normalization=True)[source]

Display the last realization of the GinibreEnsemble object

Parameters

normalization (bool, default True) – When True, the points are first normalized so as to concentrate in the unit disk (see normalize_points()) and the unit circle is displayed

See also

sample_banded_model(*args, **kwargs)[source]

No banded matrix model is known for the Ginibre ensemble; use sample_full_model() instead.

sample_full_model(size_N=10, random_state=None)[source]

Sample from full matrix model associated to the Ginibre ensemble. Only available for beta \(=2\)

Parameters

size_N (int, default \(10\)) – Number \(N\) of points, i.e., size of the matrix to be diagonalized

See also
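
A minimal usage sketch of the GinibreEnsemble object, combining the methods documented above:

from dppy.beta_ensembles import GinibreEnsemble

ginibre = GinibreEnsemble(beta=2)   # full matrix model only available for beta = 2

ginibre.sample_full_model(size_N=40)
ginibre.plot(normalization=True)    # points rescaled by 1/sqrt(N), unit circle displayed
ginibre.hist(normalization=True)    # histogram of the radii vs the limiting density 2r on [0, 1]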

class dppy.beta_ensembles.HermiteEnsemble(beta=2)[source]

Bases: dppy.beta_ensembles.BetaEnsemble

Hermite Ensemble object

See also

flush_samples()

Empty the list_of_samples attribute.

hist(normalization=True)[source]

Display the histogram of the last realization of the HermiteEnsemble object.

Parameters

normalization (bool, default True) –

When True, the points are first normalized (see normalize_points()) so that they concentrate as

  • If beta \(=0\), the \(\mathcal{N}(0, 2)\) reference measure associated to the full matrix model

  • If beta \(>0\), the limiting distribution, i.e., the semi-circle distribution

In both cases, the corresponding p.d.f. is displayed

See also

normalize_points(points)[source]

Normalize points obtained after sampling to fit the limiting distribution, i.e., the semi-circle

\[f(x) = \frac{1}{2\pi} \sqrt{4-x^2}\]
Parameters

points (array_like) – A sample from Hermite ensemble, accessible through the list_of_samples attribute

  • If sampled using sample_banded_model() with reference measure \(\mathcal{N}(\mu,\sigma^2)\)

    1. Normalize the points to fit the p.d.f. of \(\mathcal{N}(0,2)\) reference measure of the full matrix model

      \[x \mapsto \sqrt{2}\frac{x-\mu}{\sigma}\]
    2. If beta \(>0\), normalize the points to fit the semi-circle distribution

      \[x \mapsto \frac{x}{\sqrt{\beta N}}\]

      Otherwise if beta \(=0\) do nothing more

  • If sampled using sample_full_model(), apply 2. above

Note

This method is called in plot() and hist() when normalization=True

plot(normalization=True)[source]

Display the last realization of the HermiteEnsemble object

Parameters

normalization (bool, default True) –

When True, the points are first normalized (see normalize_points()) so that they concentrate as

  • If beta \(=0\), the \(\mathcal{N}(0, 2)\) reference measure associated to the full matrix model

  • If beta \(>0\), the limiting distribution, i.e., the semi-circle distribution

In both cases, the corresponding p.d.f. is displayed

See also

sample_banded_model(loc=0.0, scale=1.4142135623730951, size_N=10, random_state=None)[source]

Sample from tridiagonal matrix model associated to the Hermite Ensemble. Available for beta \(>0\) and the degenerate case beta \(=0\) corresponding to i.i.d. points from the Gaussian \(\mathcal{N}(\mu,\sigma^2)\) reference measure

Parameters
  • loc (float, default \(0\)) – Mean \(\mu\) of the Gaussian \(\mathcal{N}(\mu, \sigma^2)\)

  • scale (float, default \(\sqrt{2}\)) – Standard deviation \(\sigma\) of the Gaussian \(\mathcal{N}(\mu, \sigma^2)\)

  • size_N (int, default \(10\)) – Number \(N\) of points, i.e., size of the matrix to be diagonalized

Note

The reference measure associated with the full matrix model is \(\mathcal{N}(0,2)\). For this reason, in the sampling_params attribute, the default values are set to loc\(=0\) and scale\(=\sqrt{2}\).

To compare sample_banded_model() with sample_full_model() simply use the size_N parameter.

See also

sample_full_model(size_N=10, random_state=None)[source]

Sample from full matrix model associated to the Hermite ensemble. Only available for beta \(\in\{1, 2, 4\}\) and the degenerate case beta \(=0\) corresponding to i.i.d. points from the Gaussian \(\mathcal{N}(\mu,\sigma^2)\) reference measure

Parameters

size_N (int, default \(10\)) – Number \(N\) of points, i.e., size of the matrix to be diagonalized

Note

The reference measure associated with the full matrix model is \(\mathcal{N}(0,2)\). For this reason, in the sampling_params attribute, the values of the parameters are set to loc\(=0\) and scale\(=\sqrt{2}\).

To compare sample_banded_model() with sample_full_model() simply use the size_N parameter.

See also

class dppy.beta_ensembles.JacobiEnsemble(beta=2)[source]

Bases: dppy.beta_ensembles.BetaEnsemble

Jacobi Ensemble object

See also

flush_samples()

Empty the list_of_samples attribute.

hist(normalization=True)[source]

Display the histogram of the last realization of the JacobiEnsemble object.

Parameters

normalization (bool, default True) –

When True

  • If beta \(=0\), display the p.d.f. of the \(\operatorname{Beta}(a, b)\)

  • If beta \(>0\), display the limiting distribution, i.e., the Wachter distribution

See also

normalize_points(points)[source]

No need to renormalize the points

plot(normalization=True)[source]

Display the last realization of the JacobiEnsemble object

Parameters

normalization (bool, default True) –

When True

  • If beta \(=0\), display the p.d.f. of the \(\operatorname{Beta}(a, b)\)

  • If beta \(>0\), display the limiting distribution, i.e., the Wachter distribution

See also

sample_banded_model(a=1.0, b=2.0, size_N=10, size_M1=None, size_M2=None, random_state=None)[source]

Sample from tridiagonal matrix model associated to the Jacobi ensemble. Available for beta \(>0\) and the degenerate case beta \(=0\) corresponding to i.i.d. points from the \(\operatorname{Beta}(a,b)\) reference measure

Parameters
  • a (float, default \(1\)) – First parameter \(a\) of the \(\operatorname{Beta}(a, b)\) reference measure

  • b (float, default \(2\)) – Second parameter \(b\) of the \(\operatorname{Beta}(a, b)\) reference measure

  • size_N (int, default \(10\)) – Number \(N\) of points, i.e., size of the matrix to be diagonalized. Equivalent to the first dimension \(N\) of the matrices used in the full matrix model.

  • size_M1 (int) – Equivalent to the second dimension \(M_1\) of the first matrix used in the full matrix model.

  • size_M2 (int) – Equivalent to the second dimension \(M_2\) of the second matrix used in the full matrix model.

Note

The reference measure associated with the full matrix model is :

\[\operatorname{Beta}\left(\frac{\beta}{2}(M_1-N+1), \frac{\beta}{2}(M_2-N+1)\right)\]

For this reason, in the sampling_params attribute, the values of the parameters are set to a\(=\frac{\beta}{2}(M_1-N+1)\) and b\(=\frac{\beta}{2}(M_2-N+1)\).

To compare sample_banded_model() with sample_full_model() simply use the size_N, size_M1 and size_M2 parameters.

  • If size_M1 and size_M2 are not provided:

    In the sampling_params attribute, size_M1,2 are set to size_M1\(= \frac{2a}{\beta} + N - 1\) and size_M2\(= \frac{2b}{\beta} + N - 1\), to give an idea of the corresponding second dimensions \(M_{1,2}\).

  • If size_M1 and size_M2 are provided:

    In the sampling_params attribute, a and b are set to: a\(=\frac{\beta}{2}(M_1-N+1)\) and b\(=\frac{\beta}{2}(M_2-N+1)\).

See also

sample_full_model(size_N=100, size_M1=150, size_M2=200, random_state=None)[source]

Sample from full matrix model associated to the Jacobi ensemble. Only available for beta \(\in\{1, 2, 4\}\) and the degenerate case beta \(=0\) corresponding to i.i.d. points from the \(\operatorname{Beta}(a,b)\) reference measure

Parameters
  • size_N (int, default \(100\)) – Number \(N\) of points, i.e., size of the matrix to be diagonalized. First dimension of the matrix used to form the covariance matrix to be diagonalized, see full matrix model.

  • size_M1 (int, default \(150\)) – Second dimension \(M_1\) of the first matrix used to form the matrix to be diagonalized, see full matrix model.

  • size_M2 (int, default \(200\)) – Second dimension \(M_2\) of the second matrix used to form the matrix to be diagonalized, see full matrix model.

Note

The reference measure associated with the full matrix model is

\[\operatorname{Beta}\left(\frac{\beta}{2}(M_1-N+1), \frac{\beta}{2}(M_2-N+1)\right)\]

For this reason, in the sampling_params attribute, the values of the parameters are set to a\(=\frac{\beta}{2}(M_1-N+1)\) and b\(=\frac{\beta}{2}(M_2-N+1)\).

To compare sample_banded_model() with sample_full_model() simply use the size_N, size_M1 and size_M2 parameters.

See also
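
A minimal usage sketch of the JacobiEnsemble object, combining the methods documented above:

from dppy.beta_ensembles import JacobiEnsemble

jacobi = JacobiEnsemble(beta=2)

jacobi.sample_full_model(size_N=100, size_M1=150, size_M2=200)
jacobi.hist(normalization=True)   # histogram vs the Wachter distribution

jacobi.sample_banded_model(a=1.0, b=2.0, size_N=100)
jacobi.hist(normalization=True)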

class dppy.beta_ensembles.LaguerreEnsemble(beta=2)[source]

Bases: dppy.beta_ensembles.BetaEnsemble

Laguerre Ensemble object

See also

flush_samples()

Empty the list_of_samples attribute.

hist(normalization=True)[source]

Display the histogram of the last realization of the LaguerreEnsemble object.

Parameters

normalization (bool, default True) –

When True, the points are first normalized (see normalize_points()) so that they concentrate as

  • If beta \(=0\), the \(\Gamma(k, 2)\) reference measure associated to the full matrix model

  • If beta \(>0\), the limiting distribution, i.e., the Marchenko-Pastur distribution

In both cases, the corresponding p.d.f. is displayed

See also

normalize_points(points)[source]

Normalize points obtained after sampling to fit the limiting distribution, i.e., the Marchenko-Pastur distribution

\[\frac{1}{2\pi} \frac{\sqrt{(\lambda_+-x)(x-\lambda_-)}}{cx} 1_{[\lambda_-,\lambda_+]} dx\]

where \(c = \frac{M}{N}\) and \(\lambda_\pm = (1\pm\sqrt{c})^2\)

Parameters

points (array_like) – A sample from Laguerre ensemble, accessible through the list_of_samples attribute

  • If sampled using sample_banded_model() with reference measure \(\Gamma(k,\theta)\)

    \[x \mapsto \frac{2x}{\theta} \quad \text{and} \quad x \mapsto \frac{x}{\beta M}\]
  • If sampled using sample_full_model()

    \[x \mapsto \frac{x}{\beta M}\]

Note

This method is called in plot() and hist() when normalization=True.

plot(normalization=True)[source]

Display the last realization of the LaguerreEnsemble object

Parameters

normalization (bool, default True) –

When True, the points are first normalized (see normalize_points()) so that they concentrate as

  • If beta \(=0\), the \(\Gamma(k, 2)\) reference measure associated to the full matrix model

  • If beta \(>0\), the limiting distribution, i.e., the Marchenko-Pastur distribution

In both cases, the corresponding p.d.f. is displayed

See also

sample_banded_model(shape=1.0, scale=2.0, size_N=10, size_M=None, random_state=None)[source]

Sample from tridiagonal matrix model associated to the Laguerre ensemble. Available for beta \(>0\) and the degenerate case beta \(=0\) corresponding to i.i.d. points from the \(\Gamma(k,\theta)\) reference measure

Parameters
  • shape (float, default \(1\)) – Shape parameter \(k\) of \(\Gamma(k, \theta)\) reference measure

  • scale (float, default \(2.0\)) – Scale parameter \(\theta\) of \(\Gamma(k, \theta)\) reference measure

  • size_N (int, default \(10\)) – Number \(N\) of points, i.e., size of the matrix to be diagonalized. Equivalent to the first dimension \(N\) of the matrix used to form the covariance matrix in the full matrix model.

  • size_M (int, default None) – Equivalent to the second dimension \(M\) of the matrix used to form the covariance matrix in the full matrix model.

  • If size_M is not provided:

    In the sampling_params, size_M is set to size_M\(= \frac{2k}{\beta} + N - 1\), to give an idea of the corresponding second dimension \(M\).

  • If size_M is provided:

    In the sampling_params, shape and scale are set to: shape\(=\frac{1}{2} \beta (M-N+1)\) and scale\(=2\)

Note

The reference measure associated with the full matrix model is \(\Gamma\left(\frac{\beta}{2}(M-N+1), 2\right)\). This explains the role of the size_M parameter.

To compare sample_banded_model() with sample_full_model() simply use the size_N and size_M parameters

See also

sample_full_model(size_N=10, size_M=100, random_state=None)[source]

Sample from full matrix model associated to the Laguerre ensemble. Only available for beta \(\in\{1, 2, 4\}\) and the degenerate case beta \(=0\) corresponding to i.i.d. points from the \(\Gamma(k,\theta)\) reference measure

Parameters
  • size_N (int, default \(10\)) – Number \(N\) of points, i.e., size of the matrix to be diagonalized. First dimension of the matrix used to form the covariance matrix to be diagonalized, see full matrix model.

  • size_M (int, default \(100\)) – Second dimension \(M\) of the matrix used to form the covariance matrix to be diagonalized, see full matrix model.

Note

The reference measure associated with the full matrix model is \(\Gamma\left(\frac{\beta}{2}(M-N+1), 2\right)\). For this reason, in the sampling_params, the values of the parameters are set to shape\(=\frac{\beta}{2}(M-N+1)\) and scale\(=2\).

To compare sample_banded_model() with sample_full_model() simply use the size_N and size_M parameters.

See also
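
A minimal usage sketch of the LaguerreEnsemble object, combining the methods documented above:

from dppy.beta_ensembles import LaguerreEnsemble

laguerre = LaguerreEnsemble(beta=2)

laguerre.sample_full_model(size_N=100, size_M=300)
laguerre.hist(normalization=True)   # histogram vs the Marchenko-Pastur distribution

laguerre.sample_banded_model(shape=1.0, scale=2.0, size_N=100)
laguerre.plot(normalization=True)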

Exotic DPPs

Uniform Spanning Trees

The Uniform measure on Spanning Trees (UST) of a connected undirected graph corresponds to a projection DPP with kernel the transfer current matrix of the graph. The latter is the orthogonal projection matrix onto the row span of the vertex-edge incidence matrix. In fact, one can discard any row of the vertex-edge incidence matrix - denote by \(A\) the resulting matrix - and compute \(\mathbf{K}=A^{\top}[AA^{\top}]^{-1}A\).
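
To make this construction concrete, here is a sketch of the kernel computation from the oriented vertex-edge incidence matrix provided by networkx (an arbitrary orientation of the edges is used); compute_kernel() in the API below performs an equivalent computation.

import numpy as np
import networkx as nx

g = nx.Graph([(0, 2), (0, 3), (1, 2), (1, 4), (2, 3), (2, 4), (3, 4)])

inc = nx.incidence_matrix(g, oriented=True).toarray()  # |V| x |E| vertex-edge incidence matrix
A = inc[:-1, :]                                        # discard one (arbitrary) row
K = A.T @ np.linalg.inv(A @ A.T) @ A                   # orthogonal projection kernel of the UST DPP

assert np.isclose(np.trace(K), g.number_of_nodes() - 1)  # projection of rank |V| - 1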

See also

Important

DPPy uses the networkx library to handle DPPs related to trees and graphs, but networkx is not installed by default when installing DPPy. Please refer to the installation instructions on GitHub for more details on how to install the necessary dependencies.

histogram showing uniformity of the spanning trees generated by the different procedures
from networkx import Graph
from dppy.exotic_dpps import UST


# Build graph
g = Graph()
edges = [(0, 2), (0, 3), (1, 2), (1, 4), (2, 3), (2, 4), (3, 4)]
g.add_edges_from(edges)

# Initialize UST object
ust = UST(g)


# Display the underlying graph
ust.plot_graph()



# Display underlying kernel i.e. transfer current matrix
ust.plot_kernel()



# Display some samples
for md in ('Wilson', 'Aldous-Broder', 'GS'):
    ust.sample(md)
    ust.plot()


Stationary 1-dependent process

A point process \(\mathcal{X}\) on \(\mathbb{Z}\) (resp. \(\mathbb{N}\)) is called 1-dependent if for any \(A,B\subset \mathbb{Z}\) (resp. \(\mathbb{N}\)) such that the distance between \(A\) and \(B\) is greater than 1,

\[\mathbb{P}(A\cup B\subset \mathcal{X}) =\mathbb{P}(A\subset \mathcal{X}) \mathbb{P}(B\subset \mathcal{X}).\]

If \(\mathcal{X}\) is stationary and 1-dependent then \(\mathcal{X}\) forms a DPP.

The following three examples are stationary 1-dependent processes.

Carries process

The sequence of carries appearing when computing the cumulative sum (in base \(b\)) of a sequence of i.i.d. digits forms a DPP on \(\mathbb{N}\) with non symmetric kernel.

from dppy.exotic_dpps import CarriesProcess


base = 10  # base
cp = CarriesProcess(base)

size = 100
cp.sample(size)

cp.plot(vs_bernoullis=True)


Carries process

Descent process

The descent process obtained from a uniformly chosen permutation of \(\{1,2,\dots,n\}\) forms a DPP on \(\{1,2,\dots,n-1\}\) with non symmetric kernel. It can be seen as the limit of the carries process as the base goes to infinity.

from dppy.exotic_dpps import DescentProcess


dp = DescentProcess()

size = 100
dp.sample(size)

dp.plot(vs_bernoullis=True)


Descent process

Limiting Descent process for virtual permutations

For non-uniform permutations, the descent process is not necessarily determinantal. However, in the particular case of virtual permutations whose law is stable under conjugation by the symmetric group, the limiting descent process is a mixture of determinantal point processes.

from dppy.exotic_dpps import VirtualDescentProcess


vdp = VirtualDescentProcess(x_0=0.5)

size = 100
vdp.sample(size)

vdp.plot(vs_bernoullis=True)


Virtual descent process

Poissonized Plancherel measure

The Poissonized Plancherel measure is a measure on partitions \(\lambda=(\lambda_1 \geq \lambda_2 \geq \cdots \geq 0)\in \mathbb{N}^{\mathbb{N}^*}\). Samples from this measure can be obtained as follows:

  • Sample \(N \sim \mathcal{P}(\theta)\)

  • Sample a uniform permutation \(\sigma\in \mathfrak{S}_N\)

  • Compute the sorting tableau \(P\) associated to the RSK (Robinson-Schensted-Knuth correspondence) applied to \(\sigma\)

  • Consider only the shape \(\lambda\) of \(P\).

Finally, the point process formed by \(\{\lambda_i - i + \frac12\}_{i\geq 1}\) is a DPP on \(\mathbb{Z}+\frac12\).

from dppy.exotic_dpps import PoissonizedPlancherel


theta = 500  # Poisson parameter
pp = PoissonizedPlancherel(theta=theta)
pp.sample()
pp.plot_diagram(normalization=True)


Poissonized Plancherel measure

See also

API

Implementation of exotic DPP objects:

class dppy.exotic_dpps.CarriesProcess(base=10)[source]

Bases: dppy.exotic_dpps.Descent

DPP on \(\{1, \dots, N-1\}\) (with a non symmetric kernel) derived from the cumulative sum of \(N\) i.i.d. digits in \(\{0, \dots, b-1\}\).

Parameters

base (int, default 10) – Base/radix

flush_samples()

Empty the list_of_samples attribute.

plot(vs_bernoullis=True, random_state=None)

Display the last realization of the process. If vs_bernoullis=True compare it to a sequence of i.i.d. Bernoullis with parameter _bernoulli_param

See also

sample(size=100, random_state=None)[source]

Compute the cumulative sum (in base \(b\)) of a sequence of i.i.d. digits and record the position of carries.

Parameters

size (int) – size of the sequence of i.i.d. digits in \(\{0, \dots, b-1\}\)

class dppy.exotic_dpps.DescentProcess[source]

Bases: dppy.exotic_dpps.Descent

DPP on \(\{1, \dots, N-1\}\) associated to the descent process on the symmetric group \(\mathfrak{S}_N\).

flush_samples()

Empty the list_of_samples attribute.

plot(vs_bernoullis=True, random_state=None)

Display the last realization of the process. If vs_bernoullis=True compare it to a sequence of i.i.d. Bernoullis with parameter _bernoulli_param

See also

sample(size=100, random_state=None)[source]

Draw a permutation \(\sigma \in \mathfrak{S}_N\) uniformly at random and record the descents i.e. \(\{ i ~;~ \sigma_i > \sigma_{i+1} \}\).

Parameters

size (int) – size of the permutation i.e. degree \(N\) of \(\mathfrak{S}_N\).

class dppy.exotic_dpps.PoissonizedPlancherel(theta=10)[source]

Bases: object

DPP on partitions associated to the Poissonized Plancherel measure

Parameters

theta (int, default 10) – Poisson parameter i.e. expected length of permutation

plot(title='')[source]

Display the process on the real line

Parameters

title (string) – Plot title

See also

plot_diagram(normalization=False)[source]

Display the Young diagram (Russian convention), the associated sample and, optionally, rescale both to visualize the limit-shape theorem [Ker96]. The sample corresponds to the projection onto the real line of the descending surface edges.

Parameters

normalization (bool, default False) – If normalization=True, the Young diagram and the corresponding sample are scaled by a factor \(\sqrt{\theta}\) and the limiting shape [Ker96] is displayed.

sample(random_state=None)[source]

Sample from the Poissonized Plancherel measure.

Parameters

random_state (None, np.random, int, np.random.RandomState) –

class dppy.exotic_dpps.UST(graph)[source]

Bases: object

DPP on edges of a connected graph \(G\) with correlation kernel the projection kernel onto the span of the rows of the incidence matrix \(\text{Inc}\) of \(G\).

This DPP corresponds to the uniform measure on spanning trees (UST) of \(G\).

Parameters

graph (networkx graph) – Connected undirected graph

compute_kernel()[source]

Compute the orthogonal projection kernel \(\mathbf{K} = \text{Inc}^+ \text{Inc}\) i.e. onto the span of the rows of the vertex-edge incidence matrix \(\text{Inc}\) of size \(|V| \times |E|\).

In fact, for a connected graph, \(\text{Inc}\) has rank \(|V|-1\) and any row can be discarded to get a basis of the row space. If we denote by \(A\) the resulting matrix, then \(\text{Inc}^+ = A^{\top}[AA^{\top}]^{-1}\).

In practice, we orthogonalize the rows of \(A\) to get the eigenvectors \(U\) of \(\mathbf{K}=UU^{\top}\).

See also

compute_kernel_eig_vecs()[source]

See the explanation in compute_kernel().

flush_samples()[source]

Empty the list_of_samples attribute.

plot(title='')[source]

Display the last realization (spanning tree) of the corresponding UST object.

Parameters

title (string) – Plot title

See also

plot_graph(title='')[source]

Display the original graph defining the UST object

Parameters

title (string) – Plot title

See also

plot_kernel(title='')[source]

Display a heatmap of the underlying orthogonal projection kernel \(\mathbf{K}\) associated to the DPP underlying the UST object

Parameters

title (string) – Plot title

See also

sample(mode='Wilson', root=None, random_state=None)[source]

Sample a spanning tree of the underlying graph uniformly at random. The sample is generated as a networkx graph object.

Parameters
  • mode (string, default 'Wilson') –

    Markov-chain-based samplers:

    • 'Wilson', 'Aldous-Broder'

    Chain-rule-based samplers:

    • 'GS', 'GS_bis', 'KuTa12' from eigenvectors

    • 'Schur', 'Chol', from \(\mathbf{K}\) correlation kernel

  • root (int) – Starting node of the random walk when using Markov-chain-based samplers

  • random_state (None, np.random, int, np.random.RandomState) –

See also

class dppy.exotic_dpps.VirtualDescentProcess(x_0=0.5)[source]

Bases: dppy.exotic_dpps.Descent

This is a DPP on \(\{1, \dots, N-1\}\) with a non symmetric kernel appearing in (or as a limit of) the descent process on the symmetric group \(\mathfrak{S}_N\).

flush_samples()

Empty the list_of_samples attribute.

plot(vs_bernoullis=True, random_state=None)

Display the last realization of the process. If vs_bernoullis=True compare it to a sequence of i.i.d. Bernoullis with parameter _bernoulli_param

See also

sample(size=100, random_state=None)[source]

Draw a permutation uniformly at random and record the descents i.e. indices where \(\sigma(i+1) < \sigma(i)\) and something else…

Parameters

size (int) – size of the permutation i.e. degree \(N\) of \(\mathfrak{S}_N\).

See also

Todo

ask @kammmoun to complete the docstring and the corresponding section in See also

Bibliography

AKFT13

Raja Hafiz Affandi, Alex Kulesza, Emily B Fox, and Ben Taskar. Nyström Approximation for Large-Scale Determinantal Processes. In International Conference on Artificial Intelligence and Statistics (AISTATS), volume 31, 85–98. 2013. URL: http://proceedings.mlr.press/v31/affandi13a.

AM15

Ahmed El Alaoui and Michael W. Mahoney. Fast randomized kernel ridge regression with statistical guarantees. In Proceedings of the 28th International Conference on Neural Information Processing Systems, 775–783. Montreal, Canada, December 2015.

Ald90

David J Aldous. The Random Walk Construction of Uniform Spanning Trees and Uniform Labelled Trees. SIAM Journal on Discrete Mathematics, 3(4):450–465, nov 1990. URL: http://epubs.siam.org/doi/10.1137/0403039, doi:10.1137/0403039.

AGR16

Nima Anari, Shayan Oveis Gharan, and Alireza Rezaei. Monte Carlo Markov Chain Algorithms for Sampling Strongly Rayleigh Distributions and Determinantal Point Processes. In Conference on Learning Theory (COLT), 103–115. New York, USA, 2016. PMLR. URL: http://proceedings.mlr.press/v49/anari16, arXiv:1602.05242.

AGaudilliere13

Luca Avena and Alexandre Gaudillière. On some random forests with determinantal roots. e-prints, 2013. URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.740.6173&rep=rep1&type=pdf.

BT05

Adrian Baddeley and Rolf Turner. spatstat : An R Package for Analyzing Spatial Point Patterns. Journal of Statistical Software, 12(6):1–42, jan 2005. URL: http://www.jstatsoft.org/v12/i06/, doi:10.18637/jss.v012.i06.

BH16

Rémi Bardenet and Adrien Hardy. Monte Carlo with Determinantal Point Processes. ArXiv e-prints, 2016. URL: http://arxiv.org/abs/1605.00361, arXiv:1605.00361.

Bor09

Alexei Borodin. Determinantal point processes. ArXiv e-prints, 2009. URL: http://arxiv.org/abs/0911.1153, arXiv:0911.1153.

BDF10

Alexei Borodin, Persi Diaconis, and Jason Fulman. On adding a list of numbers (and other one-dependent determinantal processes). Bulletin of the American Mathematical Society, 47(4):639–670, 2010. URL: http://www.ams.org/journals/bull/2010-47-04/S0273-0979-2010-01306-9/S0273-0979-2010-01306-9.pdf, arXiv:0904.3740.

BRW19

David Burt, Carl Edward Rasmussen, and Mark Van Der Wilk. Rates of Convergence for Sparse Variational Gaussian Process Regression. In International Conference on Machine Learning (ICML), 862–871. may 2019. URL: http://proceedings.mlr.press/v97/burt19a.html, arXiv:1903.03571.

CLV17

Daniele Calandriello, Alessandro Lazaric, and Michal Valko. Distributed adaptive sampling for kernel matrix approximation. In Artificial Intelligence and Statistics, 1421–1429. 2017.

DVJ03

Daryl J. Daley and David Vere-Jones. An Introduction to the Theory of Point Processes. Volume I: Elementary Theory and Methods. Probability and its Applications. Springer-Verlag New York, New York, USA, 2 edition, 2003. ISBN 0-387-95541-0. URL: http://link.springer.com/10.1007/b97277, doi:10.1007/b97277.

DFL13

Laurent Decreusefond, Ian Flint, and Kah Choon Low. Perfect Simulation of Determinantal Point Processes. ArXiv e-prints, 2013. URL: http://arxiv.org/abs/1311.1027, arXiv:1311.1027.

DWH18

Michal Derezinski, Manfred K Warmuth, and Daniel J Hsu. Leveraged volume sampling for linear regression. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 2505–2514. Curran Associates, Inc., 2018. URL: http://papers.nips.cc/paper/7517-leveraged-volume-sampling-for-linear-regression.pdf.

Derezinski19

Michał Dereziński. Fast determinantal point processes via distortion-free intermediate sampling. In Alina Beygelzimer and Daniel Hsu, editors, Proceedings of the Thirty-Second Conference on Learning Theory, volume 99 of Proceedings of Machine Learning Research, 1029–1049. Phoenix, USA, 25–28 Jun 2019. PMLR. URL: http://proceedings.mlr.press/v99/derezinski19a.html.

DerezinskiCV19

Michal Dereziński, Daniele Calandriello, and Michal Valko. Exact sampling of determinantal point processes with sublinear time preprocessing. In Workshop on Negative Dependence in Machine Learning, International Conference on Machine Learning (ICML). 2019. URL: https://negative-dependence-in-ml-workshop.lids.mit.edu/wp-content/uploads/sites/29/2019/06/icml.pdf, arXiv:1905.13476.

DE15

Alexander Dubbs and Alan Edelman. Infinite Random Matrix Theory, Tridiagonal Bordered Toeplitz Matrices, and the Moment Problem. Linear Algebra and its Applications, 467:188–201, 2015. arXiv:1502.04931, doi:10.1016/j.laa.2014.11.006.

DE02

Ioana Dumitriu and Alan Edelman. Matrix Models for Beta Ensembles. Journal of Mathematical Physics, 43(11):5830–5847, 2002. URL: https://sites.math.washington.edu/~dumitriu/JMathPhys_43_5830.pdf, arXiv:0206043, doi:10.1063/1.1507823.

DB18

Christophe Dupuy and Francis Bach. Learning Determinantal Point Processes in Sublinear Time. In International Conference on Artificial Intelligence and Statistics (AISTATS), volume 84, 244–257. Lanzarote, Spain, 2018. PMLR. URL: http://proceedings.mlr.press/v84/dupuy18a, arXiv:1610.05925.

GBDK19

Mike Gartrell, Victor-Emmanuel Brunel, Elvis Dohmatob, and Syrine Krichene. Learning Nonsymmetric Determinantal Point Processes. ArXiv e-prints, may 2019. URL: http://arxiv.org/abs/1905.12962, arXiv:1905.12962.

GPK16

Mike Gartrell, Ulrich Paquet, and Noam Koenigstein. Low-Rank Factorization of Determinantal Point Processes for Recommendation. In AAAI Conference on Artificial Intelligence, 1912–1918. 2016. URL: https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14657/14354, arXiv:1602.05436.

GBV17

Guillaume Gautier, Rémi Bardenet, and Michal Valko. Zonotope hit-and-run for efficient sampling from projection DPPs. International Conference on Machine Learning (ICML), pages 1223–1232, may 2017. URL: http://proceedings.mlr.press/v70/gautier17a, arXiv:1705.10498.

GBV19

Guillaume Gautier, Rémi Bardenet, and Michal Valko. On two ways to use determinantal point processes for Monte Carlo integration. In Neural Information Processing Systems (NeurIPS). 2019.

GPBV19

Guillaume Gautier, Guillermo Polito, Rémi Bardenet, and Michal Valko. DPPy: DPP Sampling with Python. Journal of Machine Learning Research - Machine Learning Open Source Software (JMLR-MLOSS), in press, 2019.

Gau09

Walter Gautschi. How sharp is Bernstein’s Inequality for Jacobi polynomials? Electronic Transactions on Numerical Analysis, 36:1–8, 2009. URL: http://emis.ams.org/journals/ETNA/vol.36.2009-2010/pp1-8.dir/pp1-8.pdf.

Gil14

Jennifer Gillenwater. Approximate inference for determinantal point processes. PhD thesis, University of Pennsylvania, 2014. URL: https://repository.upenn.edu/edissertations/1285.

HKPVirag06

J. Ben Hough, Manjunath Krishnapur, Yuval Peres, and Bálint Virág. Determinantal Processes and Independence. In Probability Surveys, volume 3, 206–229. The Institute of Mathematical Statistics and the Bernoulli Society, 2006. URL: http://arxiv.org/abs/math/0503110, arXiv:0503110, doi:10.1214/154957806000000078.

Joh06

Kurt Johansson. Random matrices and determinantal processes. Les Houches Summer School Proceedings, 83(C):1–56, 2006. arXiv:0510038, doi:10.1016/S0924-8099(06)80038-7.

Kam18

Mohamed Slim Kammoun. Monotonous subsequences and the descent process of invariant random permutations. Electronic Journal of Probability, 2018. URL: https://projecteuclid.org/euclid.ejp/1543287754, arXiv:1805.05253, doi:10.1214/18-EJP244.

KDK16

Tarun Kathuria, Amit Deshpande, and Pushmeet Kohli. Batched Gaussian Process Bandit Optimization via Determinantal Point Processes. In Neural Information Processing Systems (NIPS), 4206–4214. 2016. URL: http://papers.nips.cc/paper/6452-batched-gaussian-process-bandit-optimization-via-determinantal-point-processes, arXiv:1611.04088.

Ker96

Sergei Kerov. A Differential Model Of Growth Of Young Diagrams. Proceedings of St.Petersburg Mathematical Society, 1996. URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.7744.

KN04

Rowan Killip and Irina Nenciu. Matrix models for circular ensembles. International Mathematics Research Notices, 2004(50):2665, 2004. URL: https://academic.oup.com/imrn/article-lookup/doi/10.1155/S1073792804141597, arXiv:0410034, doi:10.1155/S1073792804141597.

KT12

Alex Kulesza and Ben Taskar. Determinantal Point Processes for Machine Learning. Foundations and Trends in Machine Learning, 5(2-3):123–286, 2012. URL: http://arxiv.org/abs/1207.6083, arXiv:1207.6083, doi:10.1561/2200000044.

Konig04

Wolfgang König. Orthogonal polynomial ensembles in probability theory. Probab. Surveys, 2:385–447, 2004. URL: http://arxiv.org/abs/math/0403090, arXiv:0403090, doi:10.1214/154957805100000177.

LGD18

Claire Launay, Bruno Galerne, and Agnès Desolneux. Exact Sampling of Determinantal Point Processes without Eigendecomposition. ArXiv e-prints, feb 2018. URL: http://arxiv.org/abs/1802.08429, arXiv:1802.08429.

LMollerR12

Frédéric Lavancier, Jesper Møller, and Ege Rubak. Determinantal point process models and statistical inference : Extended version. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 77(4):853–877, may 2012. URL: https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12096, arXiv:1205.4818, doi:10.1111/rssb.12096.

LJS16a

Chengtao Li, Stefanie Jegelka, and Suvrit Sra. Efficient Sampling for k-Determinantal Point Processes. In International Conference on Artificial Intelligence and Statistics (AISTATS), 1328–1337. Cadiz, Spain, 2016. URL: http://proceedings.mlr.press/v51/li16f, arXiv:1509.01618.

LJS16b

Chengtao Li, Stefanie Jegelka, and Suvrit Sra. Fast DPP Sampling for Nyström with Application to Kernel Methods. In International Conference on Machine Learning (ICML), 2061–2070. New York, USA, 2016. URL: http://proceedings.mlr.press/v48/lih16, arXiv:1603.06052.

LJS16c

Chengtao Li, Stefanie Jegelka, and Suvrit Sra. Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling. In Neural Information Processing Systems (NIPS), 4188–4196. Barcelona, Spain, 2016. URL: https://papers.nips.cc/paper/6182-fast-mixing-markov-chains-for-strongly-rayleigh-measures-dpps-and-constrained-sampling, arXiv:1608.01008.

LJS16d

Chengtao Li, Stefanie Jegelka, and Suvrit Sra. Fast Sampling for Strongly Rayleigh Measures with Application to Determinantal Point Processes. ArXiv e-prints, 2016. URL: http://arxiv.org/abs/1607.03559, arXiv:1607.03559.

Lyo02

Russell Lyons. Determinantal probability measures. Publications mathématiques de l’IHÉS, 98(1):167–212, apr 2002. URL: http://link.springer.com/10.1007/s10240-003-0016-0, arXiv:0204325, doi:10.1007/s10240-003-0016-0.

Mac75

Odile Macchi. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(01):83–122, 1975. URL: https://www.cambridge.org/core/product/identifier/S0001867800040313/type/journal_article, doi:10.2307/1425855.

MCA19

Adrien Mazoyer, Jean-François Coeurjolly, and Pierre-Olivier Amblard. Projections of determinantal point processes. ArXiv e-prints, 2019. URL: https://arxiv.org/pdf/1901.02099.pdf, arXiv:1901.02099v3.

Mez06

Francesco Mezzadri. How to generate random matrices from the classical compact groups. Notices of the American Mathematical Society, 54:592–604, sep 2006. URL: http://arxiv.org/abs/math-ph/0609050, arXiv:0609050.

MollerW04

Jesper Møller and Rasmus Plenge Waagepetersen. Statistical inference and simulation for spatial point processes. Volume 23. Chapman & Hall/CRC, 2004. ISBN 1584882654. URL: https://www.crcpress.com/Statistical-Inference-and-Simulation-for-Spatial-Point-Processes/Moller-Waagepetersen/p/book/9781584882657, doi:10.1201/9780203496930.

PB11

Raj K. Pathria and Paul D. Beale. Statistical Mechanics. Academic Press, 2011. ISBN 0123821894. URL: http://linkinghub.elsevier.com/retrieve/pii/B9780123821881000207, doi:10.1016/B978-0-12-382188-1.00020-7.

Pou19

Jack Poulson. High-performance sampling of generic Determinantal Point Processes. ArXiv e-prints, apr 2019. URL: http://arxiv.org/abs/1905.00165, arXiv:1905.00165.

PW98

James Gary Propp and David Bruce Wilson. How to Get a Perfectly Random Sample from a Generic Markov Chain and Generate a Random Spanning Tree of a Directed Graph. Journal of Algorithms, 27(2):170–217, may 1998. URL: https://www.sciencedirect.com/science/article/pii/S0196677497909172, doi:10.1006/JAGM.1997.0917.

RW06

Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian processes for machine learning. MIT Press, 2006. ISBN 026218253X. URL: http://www.gaussianprocess.org/gpml/.

RCCR18

Alessandro Rudi, Daniele Calandriello, Luigi Carratino, and Lorenzo Rosasco. On fast leverage score sampling and optimal learning. In Advances in Neural Information Processing Systems 31, pages 5672–5682. 2018.

Sos00

Alexander Soshnikov. Determinantal random point fields. Russian Mathematical Surveys, 55(5):923–975, feb 2000. URL: http://dx.doi.org/10.1070/RM2000v055n05ABEH000321, arXiv:0002099, doi:10.1070/RM2000v055n05ABEH000321.

TAB17

Nicolas Tremblay, Pierre-Olivier Amblard, and Simon Barthelme. Graph sampling with determinantal processes. In European Signal Processing Conference (EUSIPCO), 1674–1678. IEEE, aug 2017. URL: http://ieeexplore.ieee.org/document/8081494/, arXiv:1703.01594, doi:10.23919/EUSIPCO.2017.8081494.

TBA18

Nicolas Tremblay, Simon Barthelme, and Pierre-Olivier Amblard. Optimized Algorithms to Sample Determinantal Point Processes. ArXiv e-prints, feb 2018. URL: http://arxiv.org/abs/1802.08471, arXiv:1802.08471.

Wig67

Eugene P. Wigner. Random Matrices in Physics. SIAM Review, 9(1):1–23, 1967. doi:10.1137/1009001.