Random Graphs Basics
Random graphs study what a graph typically looks like when edges are chosen by chance. Instead of analyzing one fixed graph, we analyze a probability distribution over many graphs. This shift makes it possible to prove existence results, estimate thresholds for properties, and model large networks whose exact structure is not known in advance.
The basic model is the Erdős-Rényi graph $G(n,p)$: start with $n$ labelled vertices and include each possible edge independently with probability $p$. As $p$ increases from $0$ to $1$, the graph evolves from isolated vertices to a dense graph, and properties such as containing a triangle, being connected, or having a giant component appear around predictable thresholds.
Definitions
The random graph $G(n,p)$ has vertex set $\{1, 2, \dots, n\}$. Each of the $\binom{n}{2}$ possible edges is included independently with probability $p$.
The related model $G(n,m)$ chooses uniformly from all labelled graphs on $n$ vertices with exactly $m$ edges.
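The difference between the two models can be sketched in a few lines of standard-library Python; the function name `gnm` is just for this illustration:

```python
import random
from itertools import combinations

def gnm(n, m, seed=None):
    """Sample the G(n, m) model: m edges chosen uniformly from all C(n,2) pairs."""
    rng = random.Random(seed)
    return rng.sample(list(combinations(range(n), 2)), m)

edges = gnm(10, 15, seed=0)
print(len(edges))  # always exactly 15 edges, unlike a G(n, p) sample
```

A $G(n,p)$ sample has a random number of edges; a $G(n,m)$ sample always has exactly $m$.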
For a graph property $Q$, a function $p^*(n)$ is a threshold if the probability that $G(n,p)$ has $Q$ changes from near $0$ to near $1$ around that scale.
A graph property is monotone increasing if adding edges cannot destroy it. Connectivity, containing a triangle, and having a perfect matching are increasing properties.
The expected value of a random variable $X$ is denoted $\mathbb{E}[X]$. Indicator variables are the main counting tool: if $X = \sum_i \mathbf{1}_{A_i}$, then $\mathbb{E}[X] = \sum_i \Pr[A_i]$.
Key results
Expected number of edges. In $G(n,p)$,
$$\mathbb{E}[\text{number of edges}] = \binom{n}{2} p.$$
Each possible edge contributes $1$ with probability $p$.
Expected degree. For any fixed vertex $v$,
$$\mathbb{E}[\deg(v)] = (n-1)p.$$
The degree has binomial distribution $\mathrm{Bin}(n-1, p)$.
Expected triangles. The expected number of triangles is
$$\binom{n}{3} p^3.$$
Each triple of vertices forms a triangle exactly when its three edges are present.
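The three expectation formulas above are easy to evaluate numerically; the parameters here are illustrative, not from the text:

```python
from math import comb

n, p = 20, 0.1  # illustrative parameters

exp_edges = comb(n, 2) * p         # C(n,2) p = 190 * 0.1 = 19
exp_degree = (n - 1) * p           # (n-1) p = 1.9
exp_triangles = comb(n, 3) * p**3  # C(n,3) p^3 = 1140 * 0.001 = 1.14
print(exp_edges, exp_degree, exp_triangles)
```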
Connectivity threshold. The threshold for connectivity in $G(n,p)$ is around
$$p = \frac{\ln n}{n}.$$
Below this scale isolated vertices are likely; above it the graph is likely connected.
Giant component threshold. Around $p = 1/n$, a component of linear size emerges.
Linearity of expectation does not require independence. Independence is needed for many probability calculations, but not for expected counts formed by indicators. To count expected triangles, define one indicator for each vertex triple. Even though triangle events overlap and are not independent, the expected total is still the sum of the individual probabilities.
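As a tiny sanity check of this point, summing the per-triple probability $p^3$ over all triples reproduces the closed form, with no independence assumption needed (the parameters are illustrative):

```python
from math import comb
from itertools import combinations

n, p = 6, 0.5
# Linearity of expectation: one p^3 term per vertex triple, overlaps and all.
total = sum(p ** 3 for _ in combinations(range(n), 3))
print(total, comb(n, 3) * p ** 3)  # both equal 2.5
```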
First moment method. If $X$ counts bad objects and $\mathbb{E}[X] \to 0$, then the probability that any bad object exists tends to $0$. This follows from Markov's inequality:
$$\Pr[X \ge 1] \le \mathbb{E}[X].$$
This method is often used to show that a sparse random graph likely has no copy of a fixed subgraph.
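A quick numerical sketch of the first moment bound for triangles, taking $p = n^{-1.2}$, which lies below the triangle scale $1/n$ (the exponent is chosen only for illustration):

```python
from math import comb

# Markov: P[at least one triangle] <= E[#triangles] = C(n,3) p^3.
bounds = []
for n in (10**2, 10**3, 10**4):
    p = n ** -1.2
    bounds.append(comb(n, 3) * p ** 3)
print(bounds)  # decreasing toward 0 as n grows
```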
Probabilistic method. If a random graph has positive probability of satisfying a property, then at least one graph with that property exists. This is a nonconstructive existence proof. Ramsey lower bounds, high-girth high-chromatic graphs, and extremal examples are often approached this way.
Almost surely. A property holds asymptotically almost surely if its probability tends to $1$ as $n \to \infty$. Random graph thresholds are asymptotic statements, so finite examples may deviate from the threshold behavior.
Variance and concentration. Expected value gives a center of mass, but it does not by itself say that most samples are close to that value. To prove concentration, one often studies variance or uses inequalities such as Chernoff bounds. For example, the number of edges in $G(n,p)$ is binomial with parameters $\binom{n}{2}$ and $p$, so it is sharply concentrated around $\binom{n}{2} p$ when that expectation is large.
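For the edge count this is concrete: the mean and standard deviation of a $\mathrm{Bin}\left(\binom{n}{2}, p\right)$ variable show the relative spread shrinking as $n$ grows (parameters illustrative):

```python
from math import comb, sqrt

p = 0.1
for n in (10, 100, 1000):
    m = comb(n, 2)                 # number of possible edges
    mean = m * p                   # E[edges]
    sd = sqrt(m * p * (1 - p))     # binomial standard deviation
    print(n, mean, sd / mean)      # relative spread shrinks roughly like 1/n
```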
Subgraph thresholds. For a fixed graph $H$ with $v(H)$ vertices and $e(H)$ edges, the expected number of labelled copies of $H$ in $G(n,p)$ is roughly
$$n^{v(H)} p^{e(H)},$$
up to constants depending on automorphisms. This heuristic predicts the scale at which copies of $H$ begin to appear. For triangles, setting $p$ near $n^{-1}$ gives an expected count of constant order.
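The heuristic can be checked numerically for triangles ($v(H) = e(H) = 3$): at $p = c/n$ the predicted count $n^3 p^3 = c^3$ does not depend on $n$ (the constant $c = 2$ is arbitrary):

```python
c = 2.0
for n in (10**2, 10**4, 10**6):
    p = c / n
    print(n, n**3 * p**3)  # always c^3 = 8, independent of n
```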
Random graphs as examples. Random graphs often show that a proposed deterministic statement is false. If a random graph has a property with positive probability, then some concrete graph has that property, even if the proof does not display it. This is the probabilistic method in its most basic form.
Visual
This is one possible outcome of $G(n,p)$. The model is not this graph; it is the probability distribution that could have produced it.
| Scale of $p$ | Typical behavior in $G(n,p)$ |
|---|---|
| $p \ll 1/n$ | mostly small tree components |
| $p \approx 1/n$ | giant component begins to appear |
| $p \approx \ln n / n$ | isolated vertices disappear; connectivity emerges |
| $p$ constant | dense graph with many small subgraphs |
| $p = 1$ | complete graph |
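The table can be explored empirically. This sketch samples $G(n,p)$ at three of the scales and reports the largest component size (the helper name and parameters are illustrative; exact sizes vary with the seed):

```python
import random
import math
from collections import deque

def sample_components(n, p, rng):
    # One sample of G(n, p), then component sizes via breadth-first search.
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    seen = [False] * n
    sizes = []
    for s in range(n):
        if not seen[s]:
            seen[s] = True
            q, size = deque([s]), 1
            while q:
                u = q.popleft()
                for v in adj[u]:
                    if not seen[v]:
                        seen[v] = True
                        q.append(v)
                        size += 1
            sizes.append(size)
    return sizes

rng = random.Random(0)
n = 500
for p in (0.2 / n, 3.0 / n, 2 * math.log(n) / n):
    largest = max(sample_components(n, p, rng))
    print(f"p = {p:.5f}, largest component = {largest}")
```

Typical runs show small components at the first scale, a linear-size component at the second, and a single component at the third.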
Worked example 1: Expected edges and triangles
Problem. In $G(n,p)$, compute the expected number of edges and expected number of triangles.
Method.
There are
$$\binom{n}{2} = \frac{n(n-1)}{2}$$
possible edges. Each appears with probability $p$, so
$$\mathbb{E}[\text{number of edges}] = \binom{n}{2} p.$$
For triangles, there are
$$\binom{n}{3}$$
vertex triples. A fixed triple forms a triangle if all three internal edges appear, which has probability
$$p^3.$$
Therefore
$$\mathbb{E}[\text{number of triangles}] = \binom{n}{3} p^3.$$
Checked answer. The expected number of edges is $\binom{n}{2} p$, and the expected number of triangles is $\binom{n}{3} p^3$.
The expectation does not mean the graph must contain a triangle. It means that over many independent samples, the average triangle count approaches $\binom{n}{3} p^3$. When that value is small, some samples have no triangles, some have one, and occasionally a sample has several overlapping triangles.
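A short Monte Carlo check of the triangle formula, with illustrative parameters $n = 10$ and $p = 0.2$, for which $\binom{10}{3} \cdot 0.2^3 = 0.96$:

```python
import random
from math import comb
from itertools import combinations

def triangle_count(n, p, rng):
    # Sample all edges once, then count triples whose three edges are present.
    edge = {pair: rng.random() < p for pair in combinations(range(n), 2)}
    return sum(
        edge[(a, b)] and edge[(b, c)] and edge[(a, c)]
        for a, b, c in combinations(range(n), 3)
    )

rng = random.Random(42)
n, p, trials = 10, 0.2, 2000
avg = sum(triangle_count(n, p, rng) for _ in range(trials)) / trials
print(avg, comb(n, 3) * p ** 3)  # empirical average vs 0.96
```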
Worked example 2: Probability a vertex is isolated
Problem. In $G(n,p)$, find the probability that vertex $1$ is isolated. Then evaluate it for $n = 20$ and $p = 0.1$.
Method.
- Vertex $1$ has $n-1$ possible incident edges.
- A particular incident edge is absent with probability $1-p$.
- The choices are independent.
- Therefore
$$\Pr[\text{vertex } 1 \text{ is isolated}] = (1-p)^{n-1}.$$
For $n = 20$ and $p = 0.1$,
$$\Pr[\text{isolated}] = (0.9)^{19}.$$
Compute approximately:
$$(0.9)^{19} \approx 0.135.$$
So vertex $1$ has about a $13.5\%$ chance of being isolated.
Check. The expected degree is $(n-1)p = 19 \times 0.1 = 1.9$, so isolation is possible but not dominant. The probability $0.135$ is plausible.
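A simulation sketch of the isolation probability, using $n = 20$ and $p = 0.1$ as illustrative values. Only the $n-1$ edges at vertex $1$ matter, so each trial samples just those:

```python
import random

rng = random.Random(1)
n, p, trials = 20, 0.1, 100_000
# Vertex 1 is isolated iff all n-1 incident edges are absent.
hits = sum(all(rng.random() >= p for _ in range(n - 1)) for _ in range(trials))
phat = hits / trials
print(phat, (1 - p) ** (n - 1))  # both near 0.135
```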
Code
```python
import random
from collections import deque

def random_graph(n, p, seed=None):
    """Sample G(n, p) as an adjacency-set dictionary."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def component_sizes(adj):
    """Return component sizes, largest first, via breadth-first search."""
    unseen = set(adj)
    sizes = []
    while unseen:
        start = unseen.pop()
        q = deque([start])
        size = 1
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v in unseen:
                    unseen.remove(v)
                    q.append(v)
                    size += 1
        sizes.append(size)
    return sorted(sizes, reverse=True)

G = random_graph(20, 0.1, seed=7)
print(sum(len(nbrs) for nbrs in G.values()) // 2)  # edge count
print(component_sizes(G))
```
Changing the seed changes the sampled graph but not the model. For experiments, it is better to run many samples and average statistics such as edge count, largest component size, number of isolated vertices, or triangle count. One sample can illustrate a definition; many samples reveal the distribution.
A useful simulation habit is to compare empirical averages with the formulas above. For example, repeated samples of $G(n,p)$ should have average edge count close to $\binom{n}{2} p$. If they do not, the most likely error is checking each unordered pair twice or using the wrong probability.
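For instance, this sketch averages edge counts over repeated samples and compares with $\binom{n}{2} p$ (parameters illustrative):

```python
import random

def edge_count(n, p, rng):
    # Test each unordered pair exactly once.
    return sum(1 for i in range(n) for j in range(i + 1, n) if rng.random() < p)

rng = random.Random(0)
n, p, trials = 30, 0.1, 5000
avg = sum(edge_count(n, p, rng) for _ in range(trials)) / trials
print(avg, n * (n - 1) / 2 * p)  # empirical average vs 43.5
```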
When using random graphs for proofs, state whether the claim is exact, high probability, or positive probability. These are different levels of conclusion. Positive probability proves existence; high probability describes typical behavior as grows.
That wording keeps asymptotic intuition separate from finite numerical claims.
It also makes simulations easier to interpret responsibly.
Random graph calculations should specify the random variable before computing its expectation or probability. "The expected number of triangles" means a sum of indicators over vertex triples; "the probability of a triangle" means at least one triangle exists. These are related but not the same, and confusing them is a common source of incorrect threshold reasoning.
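To see the distinction concretely, compare the empirical average of a count with the empirical frequency of the corresponding event. The sketch below uses isolated vertices; Markov's inequality $\Pr[X \ge 1] \le \mathbb{E}[X]$ holds for the empirical estimates as well (function name and parameters are illustrative):

```python
import random

def isolated_count(n, p, rng):
    # Number of degree-zero vertices in one sample of G(n, p).
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                deg[i] += 1
                deg[j] += 1
    return sum(d == 0 for d in deg)

rng = random.Random(5)
n, p, trials = 25, 0.1, 3000
counts = [isolated_count(n, p, rng) for _ in range(trials)]
mean_hat = sum(counts) / trials                 # estimates E[#isolated] = n(1-p)^(n-1)
prob_hat = sum(c >= 1 for c in counts) / trials  # estimates P[at least one isolated]
print(mean_hat, n * (1 - p) ** (n - 1))  # expected count, about 1.99
print(prob_hat)  # event frequency; one sample can hold several isolated vertices
```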
Common pitfalls
- Treating $G(n,p)$ as one graph rather than a probability distribution over graphs.
- Forgetting independence when computing probabilities. Edge events are independent in $G(n,p)$, but many graph properties are not independent.
- Confusing $G(n,p)$ with $G(n,m)$. One fixes the edge probability; the other fixes the exact edge count.
- Assuming expected value means typical value in every regime. Concentration needs separate proof.
- Using the connectivity threshold as an exact finite-$n$ cutoff. It is an asymptotic statement.
- Ignoring isolated vertices when reasoning about connectivity near $p = \ln n / n$.