class: center, middle, inverse, title-slide #
Moments of Uniformly Random Multigraphs with Fixed Degree Sequences
### Phil Chodrow (MIT) --- <!-- This file was retrieved from file was retrieved from Danielle J. Navarro's repository "Robust Tools for Psychological Science" (https://github.com/djnavarro/robust-tools) */ --> <!-- layout: true --> <!-- <div class="my-footer"> --> <!-- <span> --> <!-- <a href="https://djnavarro.link/robust-tools" target="_blank">djnavarro.link/robust-tools</a> --> <!-- </span> --> <!-- </div> --> <!-- --- --> --- class: middle # .large[Two Classics] --- class: top # Modularity <br> `$$Q(\mathbf{G}) = \frac{1}{2m}\sum_{ij}\left[W_{ij} - \color{#C49377}{\frac{d_id_j}{2m}}\right]\delta(g_i, g_j)$$` Maximize this to find communities in networks (terms and conditions apply). **Question:** where does the .orange[second term] come from? .footnote[Newman + Girvan, "Finding and Evaluating Community Structure in Networks." *PRE*, (2003)] --- class: top # Spreading in Scale-Free Networks <br> *"The probability that a link points to a node with `\(s\)` links is proportional to * `\(\color{#C49377}{sP(s)}\)`." `\(\color{#C49377}{P(s)}\)` is the proportion of nodes of degree `\(s\)`. **Question:** how do we derive this probability `\(\color{#C49377}{P(s)}\)`? .footnote[Pastor-Satorras + Vespignani, "Epidemic Spreading in Scale-Free Networks." *PRL*, (2001)] --- class: middle .large[.orange[Standard argument]: "These claims are approximately true in a **random graph** with specified degree sequence **d**."] --- --- # A Folk Theorem .funding[ The expected number of edges `\(w_{ij}\)` between two nodes in **"random"** graph **"with degree sequence d"** is `$$w_{ij} = \frac{d_i d_j}{\sum_\ell d_\ell} + \text{small terms}\;.$$` ] -- This is right if you choose your random graph model .orange[conveniently]. - Chung-Lu model (**d** fixed in expectation) - Stub-matching (biased) --- class: top # Which random graph?... <br> We .orange[say]: "(entropy-maximizing) random graph." We .orange[mean]: "random graph that makes calculations convenient." -- **These are not the same.** .footnote[Fosdick et al., "Configuring Random Graph Models with Fixed Degree Sequences", *SIAM Review* (2018)] -- What if we *actually* used the entropy-maximizing random graph for calculations? --- class: middle # .large[The Uniform Model] --- # Which Random Graph? <br> Let `\(\mathcal{G}_\mathbf{d}\)` be the set of all **multigraphs** with fixed degree sequence `\(\mathbf{d}\)`. The **uniform random graph model** `\(\eta_\mathbf{d}\)` is a probability distribution that assigns equal weight to each element of `\(\mathcal{G}_\mathbf{d}\)`. `\(\eta_\mathbf{d}\)` is the entropy-maximizing random graph with fixed degree sequence `\(\mathbf{d}\)`. --- # Edge-Swap Monte Carlo <br> ![](img/swaps.png) **Idea**: sample from `\(\eta_\mathbf{d}\)` by crawling through `\(\mathcal{G}_\mathbf{d}\)` .footnote[Image from Fosdick et al., "Configuring Random Graph Models with Fixed Degree Sequences", *SIAM Review* (2018)] --- # Edge-Swap Monte Carlo ![](img/swaps.png) 1. **Sample edges** `\((i,j)\)` and `\((k,\ell)\)` with random orientations. 2. **Propose swap** `\((i,j), (k,\ell) \mapsto (i,\ell), (k,j)\)`. 3. **Accept swap** with probability `\(w_{ij}^{-1}w_{k\ell}^{-1}\)`. **Theorem:** This works.<sup>1</sup> .footnote[<sup>1</sup>Fosdick et al., "Configuring Random Graph Models with Fixed Degree Sequences", *SIAM Review* (2018)] --- class: top # Analysis Approach <br> Edge-Swap Monte Carlo is a stochastic process with known equilibrium distribution `\(\eta_\mathbf{d}\)`. At equilibrium, all **expectations** are fixed. `$$\mathbb{E}_\eta[f(G_{t+1}) - f(G_t)] = 0\;.$$` If we choose `\(f\)` carefully, we can get analytical formulae for expressions we care about. .footnote[PSC, "Moments of Uniformly Random Multigraphs with Fixed Degree Sequences, `arxiv:1909.09037` (2020)] --- # Expected Edge Indicators <br> Let `\(\beta_i = \mathbb{E}_\eta[\text{# of distinct neighbors of } i]\)`. .alert[ **Theorem**: `$$\chi_{ij}\triangleq \mathbb{E}_\eta[\mathbb{I}(W_{ij} > 0)] = \frac{\beta_i\beta_j}{\sum_\ell \beta_\ell} + \text{small terms}$$` ] .footnote[PSC, `arxiv:1909.09037` (2020)] --- # A Familiar Structure .funding[ **Folk Theorem**: `$$\omega_{ij}\triangleq \mathbb{E}_\eta[W_{ij}] = \frac{d_id_j}{\sum_\ell d_\ell} + \text{small terms}$$` ] .alert[ **Theorem**: `$$\chi_{ij}\triangleq \mathbb{E}_\eta[\mathbb{I}(W_{ij} > 0)] = \frac{\beta_i\beta_j}{\sum_\ell \beta_\ell} + \text{small terms}$$` ] --- # Expected Edge Densities <br> .alert[ **Theorem**: `$$\omega_{ij}\triangleq \mathbb{E}_\eta[W_{ij}] = \frac{\chi_{ij}}{1 - \chi_{ij}} + \text{small terms}$$` ] Can also get higher-order moments: variances, etc. .footnote[PSC, `arxiv:1909.09037` (2020)] --- # Determining `\(\beta\)` The vector `\(\beta\)` approximately solves `$$\sum_{j}\frac{\beta_i\beta_j}{\sum_\ell\beta_\ell - \beta_i\beta_j} = d_i \quad \forall i\;.$$` .alert[ **Theorem**: Solution (if it exists) is unique on the set `$$\mathcal{B} = \left\{\beta: \beta \geq \mathbf{0},\; \beta_i^2 < \sum_\ell \beta_\ell \quad \forall i\right\}\;.$$` ] .footnote[PSC, `arxiv:1909.09037` (2020)] --- class: middle # .large[Experiments] --- # High School Contact Data Collected by the SocioPatterns project. .centered-image[![](img/sociopatterns.jpg)] .footnote[Mastrandrea et al. (2015), Benson et al. (2018)] --- background-image: url(img/eval_figs.png) background-size: contain ## Evaluating Approximations .footnote[PSC, `arxiv:1909.09037` (2020)] --- background-image: url(img/error_matrices.png) background-size: contain ## Estimating `\(\mathbb{E}[W]\)` .footnote[PSC, `arxiv:1909.09037` (2020)] --- background-image: url(img/modularity_high_school.png) background-size: contain ## Downstream Consequences: Modularity Maximization .footnote[PSC, `arxiv:1909.09037` (2020)] --- # Summing Up **Rigor** means - Carefully articulating our models - Checking that our heuristics work. Entropy-maximizing random graphs with fixed degree sequences break our standard heuristics. Using a dynamical approach, we get **rigorous approximations** with **excellent performance** on real data. **#MethodsMatter** for downstream analysis. --- # Thanks! .pull-left-narrow[.thanks[ .centered-image[![:scale 100%](img/bio-photo.jpg)] Phil <br> Chodrow .small[ MIT `\(\rightarrow\)` UCLA [philchodrow.com](https://www.philchodrow.com) [@philchodrow](https://twitter.com/PhilChodrow) ] ]] .pull-right-wide[.centered-image[.alert[ ![](img/arxiv.png) `arxiv:1909.09037` ] .funding[ ![:scale 120px](img/nsf_logo.png) ![:scale 220px](img/mit_logo.png)] ]]