Moments of Uniformly Random Multigraphs with Fixed Degree Sequences

# Moments of Uniformly Random Multigraphs with Fixed Degree Sequences
### Phil Chodrow (MIT)

---

---

# .large[Two Classics]

---

# Modularity

`$$Q(\mathbf{G}) = \frac{1}{2m}\sum_{ij}\left[W_{ij} - \color{#C49377}{\frac{d_id_j}{2m}}\right]\delta(g_i, g_j)$$`

Maximize this to find communities in networks (terms and conditions apply).

**Question:** where does the .orange[second term] come from?

---

# Spreading in Scale-Free Networks

*"The probability that a link points to a node with `$s$` links is proportional to * `$\color{#C49377}{sP(s)}$`."

`$\color{#C49377}{P(s)}$` is the proportion of nodes of degree `$s$`.

**Question:** how do we derive this probability `$\color{#C49377}{P(s)}$`?

---

.large[.orange[Standard argument]: "These claims are approximately true in a **random graph** with specified degree sequence **d**."]

---
---

# A Folk Theorem

.funding[
The expected number of edges `$w_{ij}$` between two nodes in **"random"** graph **"with degree sequence d"** is

`$$w_{ij} = \frac{d_i d_j}{\sum_\ell d_\ell} + \text{small terms}\;.$$`
]

This is right if you choose your random graph model .orange[conveniently].

- Chung-Lu model (**d** fixed in expectation)
- Stub-matching (biased)

---

# Which random graph?...

We .orange[say]: "(entropy-maximizing) random graph."

We .orange[mean]: "random graph that makes calculations convenient."

**These are not the same.**

.footnote[Fosdick et al., "Configuring Random Graph Models with Fixed Degree Sequences", *SIAM Review* (2018)]

What if we *actually* used the entropy-maximizing random graph for calculations?

---
class: middle

# .large[The Uniform Model]

---

# Which Random Graph?

Let `$\mathcal{G}_\mathbf{d}$` be the set of all **multigraphs** with fixed degree sequence `$\mathbf{d}$`.

The **uniform random graph model** `$\eta_\mathbf{d}$` is a probability distribution that assigns equal weight to each element of `$\mathcal{G}_\mathbf{d}$`.

`$\eta_\mathbf{d}$` is the entropy-maximizing random graph with fixed degree sequence `$\mathbf{d}$`.

---

# Edge-Swap Monte Carlo

![](img/swaps.png)

**Idea**: sample from `$\eta_\mathbf{d}$` by crawling through `$\mathcal{G}_\mathbf{d}$`

.footnote[Image from Fosdick et al., "Configuring Random Graph Models with Fixed Degree Sequences", *SIAM Review* (2018)]

---

# Edge-Swap Monte Carlo

![](img/swaps.png)

1. **Sample edges** `$(i,j)$` and `$(k,\ell)$` with random orientations. 
2. **Propose swap** `$(i,j), (k,\ell) \mapsto (i,\ell), (k,j)$`.  
3. **Accept swap** with probability `$w_{ij}^{-1}w_{k\ell}^{-1}$`.

**Theorem:** This works.1

.footnote[1Fosdick et al., "Configuring Random Graph Models with Fixed Degree Sequences", *SIAM Review* (2018)]

---

# Analysis Approach

Edge-Swap Monte Carlo is a stochastic process with known equilibrium distribution `$\eta_\mathbf{d}$`.

At equilibrium, all **expectations** are fixed.

`$$\mathbb{E}_\eta[f(G_{t+1}) - f(G_t)] = 0\;.$$`

If we choose `$f$` carefully, we can get analytical formulae for expressions we care about.

.footnote[PSC, "Moments of Uniformly Random Multigraphs with Fixed Degree Sequences, `arxiv:1909.09037` (2020)]
---

# Expected Edge Indicators

Let `$\beta_i = \mathbb{E}_\eta[\text{# of distinct neighbors of } i]$`.

.alert[
**Theorem**: `$$\chi_{ij}\triangleq \mathbb{E}_\eta[\mathbb{I}(W_{ij} > 0)] = \frac{\beta_i\beta_j}{\sum_\ell \beta_\ell} + \text{small terms}$$`
]

---

# A Familiar Structure

.funding[
**Folk Theorem**: `$$\omega_{ij}\triangleq \mathbb{E}_\eta[W_{ij}] = \frac{d_id_j}{\sum_\ell d_\ell} + \text{small terms}$$`
]

.alert[
**Theorem**: `$$\chi_{ij}\triangleq \mathbb{E}_\eta[\mathbb{I}(W_{ij} > 0)] = \frac{\beta_i\beta_j}{\sum_\ell \beta_\ell} + \text{small terms}$$`
]

---

# Expected Edge Densities

.alert[
**Theorem**: `$$\omega_{ij}\triangleq \mathbb{E}_\eta[W_{ij}] = \frac{\chi_{ij}}{1 - \chi_{ij}} + \text{small terms}$$` 
]

Can also get higher-order moments: variances, etc.

---

# Determining `$\beta$`

The vector `$\beta$` approximately solves

`$$\sum_{j}\frac{\beta_i\beta_j}{\sum_\ell\beta_\ell - \beta_i\beta_j} = d_i \quad \forall i\;.$$`

.alert[
**Theorem**: Solution (if it exists) is unique on the set 
`$$\mathcal{B} = \left\{\beta: \beta \geq \mathbf{0},\; \beta_i^2 < \sum_\ell \beta_\ell \quad \forall i\right\}\;.$$`
]

# .large[Experiments]

---

# High School Contact Data

Collected by the SocioPatterns project.

---

## Evaluating Approximations

---

## Estimating `$\mathbb{E}[W]$`

---

## Downstream Consequences: Modularity Maximization

---

# Summing Up

**Rigor** means 
- Carefully articulating our models 
- Checking that our heuristics work.

Entropy-maximizing random graphs with fixed degree sequences break our standard heuristics.

Using a dynamical approach, we get **rigorous approximations** with **excellent performance** on real data.

**#MethodsMatter** for downstream analysis.

---

# Thanks!

.pull-left-narrow[.thanks[
.centered-image[![:scale 100%](img/bio-photo.jpg)]
Phil Chodrow
.small[
 
 MIT `$\rightarrow$` UCLA
 
 [philchodrow.com](https://www.philchodrow.com)

[@philchodrow](https://twitter.com/PhilChodrow)
]

]]
.pull-right-wide[.centered-image[.alert[
![](img/arxiv.png)
`arxiv:1909.09037`
]
.funding[
![:scale 120px](img/nsf_logo.png)
![:scale 220px](img/mit_logo.png)]
]]