Mathematical Preliminaries
Theory of computation is written in the language of discrete mathematics. The machines are finite descriptions, the inputs are strings, the behaviors are languages, and the proofs reason about sets, functions, relations, graphs, and logical statements. A student who can manipulate these objects cleanly will find automata theory much less mysterious because most machine definitions are just tuples of sets and functions.
The aim of this page is not to repeat a full discrete-math course. It is to collect the notation and habits that appear constantly in automata, computability, and complexity. The most important shift is to treat syntax as data. Strings are mathematical objects; encodings of graphs and programs are strings too; and a decision problem is represented by the set of strings whose answer is yes.
Definitions
A set is an unordered collection of distinct objects. Write when is an element of , when every element of is also in , for union, for intersection, and for complement relative to an understood universe. The power set is the set of all subsets of .
A sequence is an ordered list. A finite sequence is often called a tuple. Order and repetition matter in sequences, so and are different, and is different from . The Cartesian product is the set of ordered pairs with and .
A function assigns each element of exactly one element of . The set is the domain and is the codomain. A relation from to is any subset of . Relations may be one-to-many or partial; functions are total and single-valued unless explicitly stated otherwise.
A graph is a set of vertices together with edges. In an undirected graph an edge is an unordered pair of vertices; in a directed graph it is an ordered pair. Computability and complexity use graphs both as examples of mathematical structures and as encoded inputs to decision problems such as reachability, clique, Hamiltonian path, and generalized geography.
An alphabet is a finite nonempty set of symbols. A string over is a finite sequence of symbols from . The empty string is , and denotes all finite strings over . A language over is any subset of . Concatenation joins strings, and denotes the reverse of .
Boolean logic uses connectives such as , , , , and . A truth assignment gives each variable a value. A Boolean formula is satisfiable if at least one assignment makes it true and a tautology if every assignment makes it true.
Key results
Many definitions in the course are finite tuples. A DFA is a 5-tuple , a CFG is a 4-tuple , and a Turing machine is a tuple containing states, alphabets, a transition function, and halting states. The tuple format is not decoration. It says exactly what data determine the object and which parts must be checked when proving a construction valid.
Set operations on languages are central. If , then , , and are languages. Concatenation is , and star is . Closure theorems ask whether a family of languages remains inside the family after these operations.
Encodings connect abstract objects to strings. A graph, automaton, grammar, formula, or Turing machine can be written as a string over a fixed alphabet. Once an object is encoded, a decision problem becomes a language. For example, the graph-reachability problem can be written as has a path from to . The angle brackets indicate a standard effective encoding, not a special alphabet symbol.
Boolean formulas bridge computation and complexity. The satisfiability problem asks whether a formula has a satisfying assignment. Its importance comes from representing the local choices of computations. In the Cook-Levin theorem, a tableau of a nondeterministic polynomial-time computation is encoded by Boolean variables and constraints.
The distinction between object language and metalanguage is worth keeping explicit. When we write a string such as 0011, the characters are part of the object being studied. When we write a sentence such as "the string has two zeros," we are speaking in the metalanguage about that object. Encodings connect the two levels: a graph or machine is a mathematical object, but its code is a string that another machine can read. Many mistakes in computability proofs come from blurring these levels, especially when a machine is given its own description as input.
Tuples also carry arity information that sets do not. The transition function of a DFA takes a pair , where the first coordinate must be a state and the second coordinate must be an input symbol. Swapping the coordinates is not meaningful even if both are ultimately encoded as strings. Similarly, an encoded triple must be parsed as a graph plus two vertices. A decider for a language of triples may first reject malformed encodings; after that, it reasons about the decoded object.
Closure operations on languages are not merely notation. They correspond to machine constructions. Union corresponds to "accept if either submachine accepts." Intersection corresponds to "accept if both accept." Complement corresponds to "swap accepting and rejecting outcomes," which is safe for deciders and complete DFAs but dangerous for recognizers that may loop. Concatenation and star correspond to splitting an input into pieces; nondeterminism is often the cleanest way to guess those split points.
Boolean logic should be read both syntactically and semantically. Syntactically, a formula is a finite string or tree built from variables and connectives. Semantically, it defines a truth value under each assignment. SAT asks whether at least one assignment works; TQBF asks whether a quantified formula is true under alternating existential and universal choices. This movement from syntax to semantics mirrors the rest of the course: the input is a finite string, but the question is about the object that the string denotes.
Visual
| Object | Notation | Role in theory of computation |
|---|---|---|
| Alphabet | finite set of symbols used for inputs | |
| String | one finite input instance | |
| Language | yes-instances of a decision problem | |
| Function | transition rule, reduction, encoding | |
| Graph | common input type for P, NP, PSPACE, NL | |
| Formula | satisfiability and computation encoding |
Worked example 1: Turning a graph question into a language
Problem. Let PATH be the problem "given a directed graph and vertices , is there a directed path from to ?" Write it as a language and test one instance.
Method. Choose an encoding and identify the yes-instances.
- Represent the input triple by . The exact binary encoding can list the vertices, then the directed edges, then the two distinguished vertices.
- Define
- Consider and with and .
- The path exists, so .
- If and , no listed edge leaves , so .
Checked answer. The language contains exactly the encodings whose graph has the requested reachability property.
Worked example 2: Computing language operations
Problem. Let and . Compute and .
Method. Use the definitions directly.
- Concatenation forms every with and .
- Pair with to get .
- Pair with to get .
- Pair with to get .
- Pair with to get .
- Therefore .
- Now . The four products are , , , and .
Checked answer. and . Notice that acts as an identity under concatenation, but it is present only when the language explicitly contains it.
Code
from itertools import product
def concat_language(A, B):
return {x + y for x, y in product(A, B)}
def star_up_to(A, max_pieces):
result = {""}
current = {""}
for _ in range(max_pieces):
current = concat_language(current, A)
result |= current
return result
A = {"0", "11"}
B = {"", "1"}
print(concat_language(A, B))
print(star_up_to(A, 3))
Common pitfalls
- Confusing a symbol with a string of length one. In formal notation they are related but not identical.
- Forgetting that is a string and is a language with no strings.
- Treating as infinite but vague. It is the precise set of all finite strings over , including .
- Using set braces when order matters. A transition such as is an ordered pair, not a two-element set.
- Assuming encodings are magical. A proof involving must rely on an effective, finite representation.
Connections
- Proof styles are developed in proof methods and countability.
- Formal automata begin in finite automata and DFAs.
- Encoded machines become central in decidability and the halting problem.
- Boolean formulas return in NP-completeness and classic reductions.