Reflections on Equality

Posted on October 30, 2020

When shopping for a dependent type theory, many factors should be taken into consideration: how inductive data is represented (inductive schemas vs W-types), how inductive data computes (eliminators vs case trees), how types of types are represented (universes à la Tarski vs à la Russell). However, the most important is their treatment of equality.

Conor McBride, a prominent figure in type theory research, noted in a reddit comment that you should never trust a type theorist who has not changed their mind about equality (I’m paraphrasing). Recently, I’ve embarked on a journey to improve my credibility (or, at least, get out of the “instantly discarded opinion” pool): I’ve changed my mind about equality.

What’s the fuss about?

Equality is very prevalent when using dependently typed languages, whether as frameworks for writing mathematical proofs or for writing verified computer programs. Most properties of mathematical operators are expressed as equalities which they should respect. For example, a semigroup is a set SS with an associative ×\times operator. The property of associativity just means the operator respects the equality a×(b×c)(a×b)×ca \times (b \times c) \equiv (a \times b) \times c!

However, the equality relation which can be expressed with proofs is not the only equality which a dependently-typed system needs to consider. There’s also the judgemental equality, that is, which terms are always identified independently of their semantics. For example, a+ba + b and b+ab + a are propositionally equal, which can be shown by writing an inductive proof of this fact, but they’re judgmentally different, because they have different term structure.

There are two ‘main’ traditions of Martin-Löf dependent type theory: the intensional type theories are as in the paragraph above, but the extensional type theories make me a liar. Extensional type theory is obtained by adding a rule of equality reflection, which collapses the distinction between propositional and judgemental equality: whenever there exists a proof of an equality between terms xx and yy, they’re considered judgmentally equal.

Adding equality reflection makes a type system more expressive with respect to equality: for example, in extensional type theories, one can derive the rule of function extensionality, which says two functions are equal when they are equal pointwise. However, equality reflection also makes a type system less expressive with respect to equality: there is only one way for two things to be equal!

Moreover, equality reflection complicates type checking to an unacceptable degree. Rather than being able to check the validity of a proof by comparing a term against a known type, an entire typing derivation is necessary as input to the type checker. To see why, consider the following derivation:

(+):NNNno: "foo":Strno: no:no: absurd(no):StrNno: Str=Nno: "foo":N2:Nno:(2+"foo"):Nλno(2+"foo"):N \cfrac{ \cfrac{ \cfrac{}{\mathop{\mathop{\vdash}} (+) \mathop{:} \mathbb{N} \to \mathbb{N} \to \mathbb{N}} \quad \cfrac{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \mathtt{"foo"} \mathop{:} \text{Str} \quad \cfrac{ \cfrac{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \text{no} \mathop{:} \bot }{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \mathrm{absurd}(\text{no}) \mathop{:} \text{Str} \equiv \mathbb{N} } }{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \text{Str} = \mathbb{N} } }{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \mathtt{"foo"} \mathop{:} \mathbb{N} } \quad \cfrac{}{\mathop{\mathop{\vdash}} 2 \mathop{:} \mathbb{N}} }{ \text{no} \mathop{:} \bot \mathop{\mathop{\vdash}} (2 + \mathtt{"foo"}) \mathop{:} \mathbb{N} } }{ \mathop{\mathop{\vdash}} \lambda \text{no} \to (2 + \mathtt{"foo"}) : \bot \to \mathbb{N} }

If the context contains an element of the empty type, every term needs to be accepted

Here, the context contains an element of the empty type, written \bot. It’s comparatively easy for the type checker to see that in this case, the context we’re working under is absurd, and any equality should be accepted. However, there are many such empty types: Fin(0)\mathrm{Fin}(0), for example. Consider the type family SAT\mathrm{SAT}, indexed by a Boolean clause, such that SAT(c)\mathrm{SAT}(c) reduces to \top when cc is satisfiable and \bot otherwise. How would you type-check the program λx2+"foo"\lambda x \to 2 + \mathtt{"foo"} at type SAT(c)N\mathrm{SAT}(c) \to \mathbb{N}, where cc is some adversarially-chosen, arbitrarily complex expression? How would you type check it at type Halts(m)N\mathrm{Halts}(m) \to \mathbb{N}?

In contrast, Intensional Type Theory, or ITT for short, treats equality as if it were any other type. In ITT, equality is inductively generated by the constructor reflx\mathrm{refl}_x for any element xx of a type AA, which leads to an induction principle saying that, if

we can deduce, for all x,y:Ax, y : A and p:xAyp : x \equiv_{A} y, that C(x,y,p)C(x, y, p) holds.

Given this operator, generally called axiom J (since “J” is the letter after Identity, another word for “equality”), we can derive many of the properties of equality: transitivity (given xyx \equiv y and yzy \equiv z, get xzx \equiv z) and symmetry (given xyx \equiv y, get yxy \equiv x) make \equiv an equivalence relation, and substitutivity (assuming P(x)P(x) and xyx \equiv y, get P(y)P(y)) justifies calling it “equality”.

However, axiom J is both weaker and stronger than equality reflection: for one, it doesn’t let us prove that functions are equal when they are pointwise equal, which leads to several complications. However, it also doesn’t let us prove that all equalities are equal to refl\mathrm{refl}, which lets us strengthen equality by postulating, for example,

The univalence axiom of Voevodsky

The “tagline” for the univalence axiom, which lies at the center of Homotopy Type Theory (HoTT), is that “equality is equivalent to equivalence”. More specifically, given a function f:ABf : A \to B, together with a proof that ff has left and right inverses, univalence gives us an equality ua(f)\mathrm{ua}(f) such that transporting “along” this path is the same as applying ff.

For example, we could define the “usual” unary naturals N\mathbb{N} (which are easy to use for proving but terrible computationally) and binary naturals N2\mathbb{N}_2 (which have more efficient computational behaviour at the cost of more complicated structure), demonstrate an equivalence peano2Binary:NN2\mathrm{peano2Binary} : \mathbb{N} \cong \mathbb{N}_2, then transport proofs about N\mathbb{N} to proofs about N2\mathbb{N}_2!

A recent paper by Tabareau et al explores the consequences of strengthening a type theory with univalence together with parametricity, unlocking efficient and automated (given the equivalence) transport between algebraic structures of types.

A note on terminology The HoTT interpretation of “types as topological spaces” leads us to interpret the type of equalities as the type of paths in a space. From this, we get some terminology: instead of saying “cast xx with the equality ee”, we can equivalently say “transport xx along the path pp”.

Using the “path” terminology in a context with unicity of equality proofs is a bit misleading, but the terminology does not break down (a set-truncated system is just one where all paths are loops). Because of this, I’ll use “path” and “equality” interchangeably. Sorry if this causes any confusion.

The undecidability of type checking ETT and the comparative weakness of ITT has led many researchers to consider the following question:

Can we have:

  • Decidable type checking
  • A “more extensional” equality
  • Good computational behaviour

All at the same time?

Turns out, the answer is yes!

Observational Equality

One early positive answer is that of Observational Type Theory, presented in a 2007 paper by Altenkirch, McBride, and Swierstra. The basic idea is that, instead of defining the type aAba \equiv_{A} b as an inductive family of types, we define it so that equality computes on the structure of AA, aa and bb to reduce to tractable forms. Observational type theory (OTT, from now on) has two universes (types whose elements are types), Prop\mathrm{Prop} and Set\mathrm{Set}, such that Prop:Set\mathrm{Prop} : \mathrm{Set} and Set:Set\mathrm{Set} : \mathrm{Set}1

The elements of Prop\mathrm{Prop} are taken to be propositions, not in the sense of propositions-as-types, but in the sense of HoTT. Propositions are the types TT for which, given x,y:Tx, y : T, one has xyx \equiv y: they’re types with at most one element. Some propositions are \top, the trivially true proposition, and \bot, the empty proposition.

Given A:uA : u, and some B:vB : v with one variable x:Ax : A free (where uu and vv are possibly distinct universes), we can form the type x:AB\prod_{x : A} B of dependent products from AA to BB. If vv is in Prop\mathrm{Prop}, then the dependent product also lives in Prop\mathrm{Prop}. Otherwise, it lives in Set\mathrm{Set}. Moreover, we can also form the type x:AB\sum_{x : A} B of dependent sums of AA and BB, which always lives in Set\mathrm{Set}.

Given a type AA and two elements a,b:Aa, b : A, we can form the proposition aAba \equiv_{A} b. Note the emphasis on the word proposition here! Since equivalence is a proposition, we have uniqueness of equality proofs by definition: there’s at most one way for things to be equal, conflicting with univalence. So we get some extensionality, namely of functions, but not for arbitrary types. Given types AA and BB, a proof p:ABp : A \equiv B and x:Ax : A, we have the term coe(A,B,P,x):B\mathrm{coe}(A, B, P, x) : B, which represents the coercion of xx along the path pp.

Here is where my presentation of observational equality starts to differentiate from the paper’s: McBride et al use heterogeneous equality, i.e. a 4-place relation (x:A)(y:B)(x : A) \equiv (y : B), where AA and BB are potentially distinct types. But! Their system only allows you to use an equality when ABA \equiv B. The main motivation for heterogeneous equality is to “bunch up” as many equalities as possible to be eliminated all in one go, since coercion in their system does not compute. However, if coercion computes normally, then we don’t need to do this “bunching”: one can just use coercion normally.

The key idea of OTT is to identify as “equal” objects which support the same observations: for functions, observation is application; for pairs, it’s projection, etc. This is achieved by making the definition of equality acts as a “pattern matching function” on the structure of terms and types. For example, there is a rule which says an equality between functions is a function that returns equalities:

f(x:A)B(x)g(x:A)(f x)B(x)(g x) \cfrac{}{f \equiv_{(x : A) \to B(x)} g \longrightarrow (x : A) \to (f\ x) \equiv_{B(x)} (g\ x)}

Equality of functions is extensional by definition

So, not only do we have a term funext of type ((x : A) → f x == g x) → f == g but one with a stronger type, namely ((x : A) → f x == g x) == (f == g), and that term is.. refl!

OTT is appropriate, in my opinion, for doing set-level mathematics: where types have no “interesting” equality structure. However, it breaks down at higher h-levels, where there is interesting structure to be found in the equalities between elements of types. This is because OTT, by placing the \equiv type in its universe of propositions, validates the principle of uniqueness of identity proofs, which says any two proofs of the same equality are themselves equal. UIP conflicts with the univalence axiom of Homotopy Type Theory, by (to use the HoTT terminology) saying that all types are sets.

Theorem. Suppose there exists one universe, U\mathscr{U}, which contains the type of Booleans. Assuming univalence, the type aUba \equiv_{\mathscr{U}} b is not a proposition.

Proof. The function not:22\mathrm{not} : 2 \to 2, which maps tt\mathtt{tt} to ff\mathtt{ff} and vice-versa, is an equivalence, being its own left and right inverses. Thus, by univalence, we have a path ua(not):2U2\mathrm{ua}(\mathrm{not}) : 2 \equiv_{\mathscr{U}} 2.

To see that this path is different from refl2\mathrm{refl}_2, consider its behaviour with respect to transport: transp(refl2,tt)tt\mathrm{transp}(\mathrm{refl}_2, \mathtt{tt}) \equiv tt but transp(ua(not),tt)ff\mathrm{transp}(\mathrm{ua}(\mathrm{not}), \mathtt{tt}) \equiv \mathtt{ff}. Since tt\mathtt{tt} is different from ff\mathtt{ff}, it follows that ua(not)\mathrm{ua}(\mathrm{not}) is different from refl2\mathrm{refl}_2. \blacksquare

So, if having univalence is desirable, OTT is off the table. However, the previous example of transporting proofs between equivalent types of natural numbers might not have convinced you that HoTT is indeed an interesting field of study, and univalence might seem mostly like a novelty, a shiny thing to pursue for its own sake (it certainly did to me, at first). So why is HoTT interesting?


Between 2012 and 2013, a special year of research took place in the Institute for Advanced Studies to develop a type theory that can be used as a foundation for mathematics “at large”. Their result: the book Homotopy Type Theory: Univalent Foundations for Mathematics. The IAS Special Year on Univalent Foundations would have been interesting even if it hadn’t started a new branch of type theory, just from the participants: Thierry Coquand, Thorsten Altenkirch, Andrej Bauer, Per Martin-Löf and, of course, the late Vladimir Voevodsky.

HoTT’s main contribution to the field, in my (lay) opinion, is the interpretation of types as spaces: by interpreting types as homotopy theoretical spaces, a semantics for types with more interesting “equality structure”, so to speak, arises. In the “classical” intensional type theory of Agda and friends, and indeed the theory of OTT commented on above, equality is a proposition. Agda (without the --without-K option) accepts the following proof:

uip : (A : Set) (x y : A) (p q : x ≡ y)  p ≡ q
uip A x .x refl refl = refl

Apart from ruling out univalence, UIP rules out another very interesting class of types which can be found in HoTT: the higher inductive types, which contain constructors for equalities as well as for values.

The simplest higher inductive type is the interval, which has two “endpoints” (i0 and i1) and a path between them:

data I : Type where
  i0 i1 : I
  seg   : i0 ≡ i1

The names i0i_0, i1i_1 and seg\mathrm{seg} were chosen to remind the reader of a line segment between a pair of points. Therefore, we may represent the type I\mathbb{I} as a diagram, with discrete points representing i0i_0 and i1i_1, and a line, labelled seg\mathrm{seg}, connecting them.

A diagrammatic representation of the type I

The real power of the I type comes from its induction principle, which says that, given a proposition P:ITypeP : \mathbb{I} \to \mathrm{Type}, if:

PP holds for every element of the interval. However, that second constraint, which relates the two proofs, needs a bit of explaining. What does it mean for two elements to be equal with respect to a path? Well, the “obvious” interpretation would be that we require a proof pseg:pi0pi1p\mathrm{seg} : pi_0 \equiv pi_1. However, this isn’t well-typed! The type of pi0pi_0 is P(i0)P(i_0) and the type of pi1pi_1 is P(i1)P(i_1), so we need a way to make them equal.

This is where the path seg\mathrm{seg} comes in to save the day. Since it states i0i_0 and i1i_1 are equal, we can transport the proof pi0pi_0 along the path seg\mathrm{seg} to get an inhabitant of the type P(i1)P(i_1), which makes our desired equality pi0pi_0 between pi1pi_1 with respect to seg\mathrm{seg} come out as transp(P,seg,pi0)pi1\mathrm{transp}(P, \mathrm{seg}, pi_0) \equiv pi_1, which is well-typed.

That’s fine, I hear you say, but there is a question: how is the interval type useful? It certainly looks as though if it were useless, considering it’s just got one element pretending to be two! However, using the interval higher inductive type, we can actually prove functional extensionality, the principle that says two functions are equal when they’re equal pointwise, everywhere. The proof below can be found, in a more presentable manner, in the HoTT book, section 6.3.

Theorem. If ff and gg are two functions between AA and BB such that f(x)g(x)f(x) \equiv g(x) for all elements x:Ax : A, then fgf \equiv g.

Proof. Call the proof we were given p:(x:A)f(x)g(x)p : \prod_{(x : A)} f(x) \equiv g(x) We define, for all x:Ax : A, the function px:IBp^\prime_{x} : \mathbb{I} \to B by induction on II. Let px(i0)=f(x)p^\prime_{x}(i_0) = f(x), px(i1)=g(x)p^\prime_{x}(i_1) = g(x). The equality between these terms is given by px(seg)=p(x)p^\prime_{x}(\mathrm{seg}) = p(x).

Now, define q:IABq : \mathbb{I} \to A \to B by q(i)=λx.px(i)q(i) = \lambda x. p^\prime_x(i). We have that q(i0)q(i_0) is the function λx.px(i0)\lambda x. p^\prime_{x}(i_0), which is defined to be λx.f(x)\lambda x. f(x), which is η\eta-equal to ff. Similarly, q(i1)q(i_1) is equal to gg, and thus, q(seg):fgq(\mathrm{seg}) : f \equiv g. \blacksquare

Isn’t that cool? By augmenting our inductive types with the ability to additionally specify equalities between elements, we get a proof of function extensionality! I think that’s pretty cool. Of course, if HITs were limited to structures like the interval, spheres, and other abstract mathematical things, they wouldn’t be very interesting for programmers. However, the ability to endow types with additional equalities is also useful when doing down-to-earth programming! A 2017 paper by Basold et al explores three applications of HITs to programming, in addition to containing an accessible introduction to HoTT in general.

Another very general higher inductive type, one that might be more obviously useful, is the general type of quotients. Whenever AA is a type and RR is a binary, propositional relation between members of AA, we can form the quotient type A/RA/R, which is given by the following constructors2:

The induction principle for quotients, which is far too complicated to include here (but can be derived mechanically from the specification given above and the knowledge of “equal with respect to some path” from the section on the interval), says roughly that we can pattern-match on A/RA/R if and only the function we’re defining does not distinguish between elements related by RR.

This type is very general! For example, given a type of naturals N\mathbb{N} and a two-place relation mod2\mathrm{mod2} which holds for numbers congruent modulo 2, we can form the quotient type N/mod2\mathbb{N}/\mathrm{mod2} of naturals mod 2. Functions (say, ff) defined on this type must then respect the relation that, whenever xymod2x \equiv y \mod 2, f(x)f(y)f(x) \equiv f(y).

All of this talk about HoTT, and in fact the book itself, though, neglected to mention one thing. What is the computational content of the univalence axiom? What are the reduction rules for matching on higher inductive types? How do we take a proof, written in the language of HoTT, and run it? The Book does not address this, and in fact, it’s still a half-open problem, and has been since 2013. Computing in the presence of all these paths, paths between paths, ad infinitum is mighty complicated.

Cubical TT

The challenge of making a computational HoTT did not stop Cohen et al, who in a 2016 paper presented Cubical Type Theory, which, after three years of active research, provides a way to compute in the presence of univalence. There’s just one problem, though: cubical type theory is hella complicated.

The core idea is simple enough: we extend type theory with a set of names I\mathbb{I}, with points 00 and 11 and operations \vee, \wedge, and 1r1 - r, which behave like a de Morgan algebra. To represent equalities, the type Path A t u\mathrm{Path}\ A\ t\ u is introduced, together with a “name abstraction” operation i t\langle i \rangle\ t and “path application” t rt\ r, where rr is an element of the interval.

Γ,i:It:AΓi t:Path A t[0/i] t[1/i] \frac{\Gamma, i : \mathbb{I} \vdash t : A}{\Gamma \vdash \langle i \rangle\ t : \mathrm{Path}\ A\ t[0/i]\ t[1/i]}

Path formation

Γ,t:Path A a bΓr:IΓt r:A \frac{\Gamma, t : \mathrm{Path}\ A\ a\ b\quad\Gamma \vdash r : \mathbb{I}}{\Gamma \vdash t\ r : A}

Path elimination

The intuition, the authors say, is that a term with nn variables of I\mathbb{I}-type free corresponds to an nn-dimensional cube.

A:Type\cdot \vdash A : \mathrm{Type} 0 dimensional cube
i:IA:Typei : \mathbb{I} \vdash A : \mathrm{Type} 1 dimensional cube
i,j:IA:Typei, j : \mathbb{I} \vdash A : \mathrm{Type} 2 dimensional cube

This is about where anyone’s “intuition” for Cubical Type Theory, especially my own, flies out the window. Specifically, using abstraction and the de Morgan algebra on names, we can define operations such as reflexivity (ia:Path A a a\langle i \rangle a : \mathrm{Path}\ A\ a\ a), symmetry (λp.ip (1i):Path A a bPath A b a\lambda p. \langle i \rangle p\ (1 - i) : \mathrm{Path}\ A\ a\ b \to \mathrm{Path}\ A\ b\ a), congruence, and even function extensionality, which has a delightfully simple proof: λp.i λx.p x i\lambda p. \langle i \rangle\ \lambda x. p\ x\ i.

However, to transport along these paths, the paper defines a “face lattice”, which consists of constraints on elements of the interval, uses that to define “systems”, which are arbitrary restrictions of cubes; From systems, one can define “composition”, which compute the lid of an open box (yeah, I don’t get it either), “Kan filling”, and finally, transport. Since the authors give a semantics of Cubical Type Theory in a previously well-established model of cubical sets, I’ll just.. take their word on this.

The Cubical Agda documentation has a section explaining a generalised transport operation transp and the regular transport operation transport. I recommend that you go check that out, since explaining each is beyond my powers. However, this transport operation does let us prove that the J induction principle for equalities also holds for these cubical paths, and from that we can define all of the other nice operations!

Cubical Agda, a proof assistant based on Cubical Type Theory, exports a library which provides all of HoTT’s primitive notions (The identity type, transport, the J rule, function extensionality, univalence), that compute properly. Furthermore, it supports higher inductive types! However, as I understand it, these can not be compiled into an executable program yet. This is because the transp operation, fundamental to computation in the presence of path types, is defined on the structure of the type we’re transporting over, and type structure is not preserved when compiling.

So What?

Even after all this explanation of fancy equality relations, you might still be unconvinced. I certainly was, for a while. But I’d argue that, if you care about typed programming enough to be interested in dependent types at all, you should be interested in, at least, the quotient inductive types of OTT, if not the more general higher inductive types of HoTT.

The reason for this is simple: inductive types let you restrict how elements of a type are formed. Quotient inductive types let you restrict, in a principled way, how elements of a type are used. Whereas in languages without quotients, like Haskell or even non-cubical Agda, one is forced to use the module system to hide the inductive structure of a type if they wish to prevent unauthorized fiddling with structure, in a language with quotients, we can have the type checker enforce, internally, that these invariants are maintained.

Just like quantitative type theory gives programmers a way to reason about resource usage, and efficiently implement mutable structures in a referentially-transparent way, I strongly believe that quotient types, and even univalent parametricity (when we figure out how to compile that) are the “next step forward” in writing reliable software using types as a verification method.

However, dependent types are not the only verification method! Indeed, there are a number of usability problems to solve with dependent types for them to be adopted by the mainstream. A grim fact that everyone who wants reliable software has to face every day is that most programmers are out there using JavaScript, and that the most popular typed language released recently is Go.3

So I leave you, dear reader, with this question: what can we do, as researchers and implementors, to make dependently typed languages more user-friendly? If you are not a researcher, and you have tried a dependently typed language, what are pain points you have encountered? And if you haven’t used a dependently typed language, why not?

  1. This is a simplified presentation that uses a single, inconsistent universe (Girard’s paradox applies). The actual presentation uses a stratified, predicative hierarchy of Seti\mathrm{Set}_i universes to avoid this paradox.↩︎

  2. The presentation of quotients in the HoTT book also contains a 0-truncation constructor, which has to do with limiting quotients to work on sets only. The details are, IMO, out of scope for this post; So check out section 6.10 of the book to get all of them.↩︎

  3. Please read the italics on this paragraph as derision.↩︎