Published on

Type Unification

During my time tutoring a university course on functional programming languages with Haskell, I noticed that many students found the concept of type unification challenging. To help bridge this gap I wrote this summary, complete with visualized examples to make type unification more approachable and easier to grasp.

Motivation

Type unification lets us determine the types of an expression and helps us check if the types of an expression are consistent.

the map function in Haskell has the following type:

map ::(ab)1. argumentFunction to be appliedto each element[a]2. argumentList of elements[b]return valueProcessed list \textsf{map ::} \underbrace{\textsf{(a} \rightarrow \textsf{b)}}_{ \colorbox{slategray}{$ \displaystyle \color{white}{ \begin{array}{c} \textrm{1. argument} \\ \\ %\hline \textrm{\footnotesize Function to be applied} \\ \textrm{\footnotesize to each element} \end{array} } $} } \rightarrow \underbrace{\textsf{[a]}}_{ \colorbox{slategray}{$ \displaystyle \color{white}{ \begin{array}{c} \textrm{2. argument} \\ \\ \textrm{\footnotesize List of elements} \end{array} } $} } \rightarrow \underbrace{\textsf{[b]}}_{ \colorbox{slategray}{$ \displaystyle \color{white}{ \begin{array}{c} \textrm{return value} \\ \\ \textrm{\footnotesize Processed list} %\textrm{\footnotesize List of elements where the} \\ %\textrm{\footnotesize function in argument 1 has ben applied} \end{array} } $} }

Its 1st argument is a function that will be applied to each element of the list in the 2nd argument (e.g. map abs [-1,-2,3,4] returns [1,2,3,4])

Imagine we define a function:

mapAbs = map abs -- which is equivalent to: mapAbs xs = map abs xs

Where map and abs have these types:

map :: (a -> b) -> [a] -> [b]
abs :: Num a => a -> a

This function expects a list and will apply the abs funciton on each element. Since abs gives us the absolute value of a number the most general type of mapAbs will be stricter than map's type. As you can see, the arguments of mapAbs have to be derived from the class Num (e.g. Int, Float) and cannot be of type Char for example. Running :type map abs in GHCi returns the type of mapAbs:

mapAbs :: Num b => [b] -> [b]

How can we determine the types of such expressions ourselves though? This is where type unification comes into play. First we will look at the typing rules, particularly the application rule and then we will look at some examples, where we will calculate the types of different expressions.

Typing Rules

There are several typing rules [1,2,3,4,5]. For type unification we are interested in the application rule.

Application Rule

s::στ;t::ρandγ(σ)=γ(ρ)(s  t)::γ(τ)%\begin{equation} \dfrac{ {\color{OrangeRed}s} :: {\color{Orange}σ} \rightarrow {\color{Plum}τ}; \quad {\color{YellowGreen}t} :: {\color{Cyan}ρ} \quad \text{and} \quad γ({\color{Orange}σ}) = γ({\color{Cyan}ρ}) }{ ({\color{OrangeRed}s} \; {\color{YellowGreen}t}) :: γ({\color{Plum}τ}) } %\end{equation}

Application Rule

s{\color{OrangeRed}s} and t{\color{YellowGreen}t} are two functions where t{\color{YellowGreen}t} is passed as the first argument to s{\color{OrangeRed}s} and γγ is the most general unifier. σ{\color{Orange}σ} is the type of the argument passed to the function s{\color{OrangeRed}s} which is the function t{\color{YellowGreen}t} . τ{\color{Plum}τ} is the type of the return value of s{\color{OrangeRed}s} and ρ{\color{Cyan}ρ} is the type of the argument passed to t{\color{YellowGreen}t} .

In γ(σ)=γ(ρ)γ({\color{Orange}σ}) = γ({\color{Cyan}ρ}) we match the type of the argument of s{\color{OrangeRed}s} with the type signature of t{\color{YellowGreen}t} . Lastly, in (s  t)::γ(τ)({\color{OrangeRed}s} \; {\color{YellowGreen}t}) :: γ({\color{Plum}τ}) we define the type of the application (s  t)({\color{OrangeRed}s} \; {\color{YellowGreen}t}) as γ(τ)γ({\color{Plum}τ}) . Notice that γ(τ)γ({\color{Plum}τ}) is exactly what we want to determine with the type unification.

Multiple Arguments

s::σ1σ2 σnτ;t1::ρ1,  ,tn::ρnandγ(σ1)=γ(ρ1),  ,γ(σn)=γ(ρn)(s  t1  tn)::γ(τ)%\begin{equation} % https://latexeditor.lagrida.com/ % https://hslpicker.com/#98CC70 % https://color.adobe.com/de/create/color-wheel % https://paletton.com/#uid=10o0u0koPuKf7I6klzytTrovplp % Orange #F58137 (max saturation = #ff7d2d) (ffa46b ff7d2d ff6200) (#f9b58b Orange #ff7d2d) % YellowGreen #98CC70 (#75b842) (#a0d07b #75b842) % cyan #00FFFF (#00c7c7) (#7affff #00c7c7) % OrangeRed #ED135A \dfrac{ {\color{OrangeRed}s} :: {\color{#f9b58b}σ_1} \rightarrow {\color{Orange}σ_2} \ \cdots \rightarrow {\color{#ff7d2d}σ_n} \rightarrow {\color{Plum}τ}; \quad {\color{#a0d07b}t_1} :: {\color{#7affff}ρ_1}, \ \ldots \ , {\color{#75b842}t_n} :: {\color{#00c7c7}ρ_n} \quad \text{and} \quad γ({\color{#f9b58b}σ_1}) = γ({\color{#7affff}ρ_1}), \ \ldots \ , γ({\color{#ff7d2d}σ_n}) = γ({\color{#00c7c7}ρ_n}) }{ ( {\color{OrangeRed}s} \; {\color{#a0d07b}t_1} \ \ldots \ {\color{#75b842}t_n} ) :: γ({\color{Plum}τ}) } %\end{equation}

Application Rule for multiple arguments

Similar to the application rule for one argument s{\color{OrangeRed}s} and t1n{\color{YellowGreen}t_{1 \ldots n}} are functions where the functions t1n{\color{YellowGreen}t_{1 \ldots n}} are passed to s{\color{OrangeRed}s} . γγ is the most general unifier again . σ1n{\color{Orange}σ_{1 \ldots n}} are the types of the arguments passed to the function s1n{\color{OrangeRed}s_{1 \ldots n}} . τ{\color{Plum}τ} is the type of the return value of s{\color{OrangeRed}s} and ρ1n{\color{Cyan}ρ_{1 \ldots n}} are the types of the arguments passed to t1n{\color{YellowGreen}t_{1 \ldots n}} .

In γ(σ1n)=γ(ρ1n)γ({\color{Orange}σ_{1 \ldots n}}) = γ({\color{Cyan}ρ_{1 \ldots n}}) we match the types of the arguments of s{\color{OrangeRed}s} with the type signatures of t1n{\color{YellowGreen}t_{1 \ldots n}} . Lastly, in (s  t1  tn)::γ(τ)({\color{OrangeRed}s} \; {\color{#a0d07b}t_1} \ \ldots \ {\color{#75b842}t_n}) :: γ({\color{Plum}τ}) we define the type of the application (s  t1  tn)({\color{OrangeRed}s} \; {\color{#a0d07b}t_1} \ \ldots \ {\color{#75b842}t_n}) as γ(τ)γ({\color{Plum}τ}) .

Example

Given two functions ff and gg we want to determine the type of the application (f g)(f \ g) . Calculate the type of the application (f g).

TIP

It might help to imagine defining a function h which applies f to g:

h = (f g)

The types of the functions f and g are defined as follows:

f :: (a -> a -> b) -> a -> b
f g xs = g xs xs

g :: c -> Int -> c
g x y = x
-- This is the const function, but with a constraint on the
-- type of the second argument which has to be of type Int here.

Application Rule

Applying the types defined above to the application rule gives us:

f::(aab)ab;g::cIntcandγ(aab)=γ(cIntc)(f  g)::γ(ab)\dfrac{ {\color{OrangeRed}f} :: % σ {\color{Orange}(a \rightarrow a \rightarrow b)} \rightarrow % τ {\color{Plum}a \rightarrow b} ; \quad {\color{YellowGreen}g} :: % ρ {\color{Cyan}c \rightarrow \text{Int} \rightarrow c} \quad \text{and} \quad γ( % σ {\color{Orange}a \rightarrow a \rightarrow b} ) = γ( % ρ {\color{Cyan}c \rightarrow \text{Int} \rightarrow c} ) }{ ({\color{OrangeRed}f} \; {\color{YellowGreen}g}) :: γ( % τ {\color{Plum}a \rightarrow b} ) }

CAUTION

The type variables in ρ{\color{Orange}\rho} and σ{\color{Cyan}\sigma} must be distinct. If they are not, we have to rename them.

Calculation of γγ

We find γ(aab)=γ(cIntc)γ({\color{Orange}a \rightarrow a \rightarrow b}) = γ({\color{Cyan}c \rightarrow \text{Int} \rightarrow c}) .

In the following we decompose the expression and use expressions that cannot be further decomposed to substitute all their occurrences. [1]

GEExplanation
\emptysetaabcIntc{\color{Orange}a \rightarrow a \rightarrow b} \doteq {\color{Cyan}c \rightarrow \text{Int} \rightarrow c}Decomposition: First we need to decompose the expression.
\emptysetacaIntbc{\color{Orange}a} \doteq {\color{Cyan}c} \\ {\color{Orange}a} \doteq {\color{Cyan}\text{Int}} \\ {\color{Orange}b} \doteq {\color{Cyan}c}Substitution: After the decomposition we see that ac{\color{Orange}a} \doteq {\color{Cyan}c} cannot be further decomposed. Therefore, we substitute all occurrences of aa with c_\rlap{\raisebox{1pt}{c}}{{\color{red}\bold\_}} and move ac{\color{Orange}a} \doteq {\color{Cyan}c} to the left side.
ac{\color{Orange}a} \nobreak \mapsto \nobreak {\color{Cyan}c}cIntbc{\color{red}\underline{\color{Orange}c}} \doteq {\color{Cyan}\text{Int}} \\ {\color{Orange}b} \doteq {\color{Cyan}c}Substitution: Now we substitute all occurrences of cc with IntInt\rlap{\text{Int}}{{\color{red}\underline{\hphantom{\text{Int}}}}}. Notice that we substitute on the left and the right side.
aIntcInt{\color{Orange}a} \nobreak \mapsto \nobreak {\color{red}\underline{\color{Cyan}\text{Int}}} \\ {\color{red}\underline{\color{Orange}c}} \mapsto {\color{Cyan}\text{Int}}bInt{\color{Orange}b} \doteq {\color{red}\underline{\color{Cyan}\text{Int}}}Substitution: Finally we need to substitute all occurrences of bb. But since there are no further occurences of bb there is nothing to substitute.
aIntcIntbInt{\color{Orange}a} \nobreak \mapsto \nobreak {\color{red}\underline{\color{Cyan}\text{Int}}} \\ {\color{red}\underline{\color{Orange}c}} \mapsto {\color{Cyan}\text{Int}} \\ {\color{Orange}b} \mapsto {\color{red}\underline{\color{Cyan}\text{Int}}}\emptysetWith the empty set on the right side we are done and can create the most general unifier.

Type Substitution

Thus, the most general unifier (MGU) [1,2,3] is:

γ={  aInt,cInt,bInt  }γ = \{\; {\color{Orange}a} \mapsto {\color{Cyan}\text{Int}}, \quad {\color{Orange}c} \mapsto {\color{Cyan}\text{Int}}, \quad {\color{Orange}b} \mapsto {\color{Cyan}\text{Int}} \;\}

Type of (f  g)({\color{OrangeRed}f} \; {\color{YellowGreen}g})

Now we can determine the type of (f  g)({\color{OrangeRed}f} \; {\color{YellowGreen}g}) by applying the type substitution to γ(ab)γ({\color{Plum}a \rightarrow b}) :

(f  g)::IntInt=γ(ab)  ({\color{OrangeRed}f} \; {\color{YellowGreen}g}) :: {\color{Cyan}\text{Int}} \rightarrow {\color{Cyan}\text{Int}} = γ({\color{Plum}a \rightarrow b}) %\qquad \phantom{}_{\square} \quad \quad \raisebox{-0.5em}{$\blacksquare$} \;

Therefore, (f g) expects an argument of type Int and returns an Int.

Validation

We can validate the type calculation using the Haskell interpreter. Therefore, open ghci in your terminal and enter the following commands. If you don't have GHCi installed you can open GHCi in the terminal online on tutorialspoint.com, or you can run Haskell locally in your browser using WASM on VaibhavSagar.com/WebVM [1].

-- This will always print the type after evaluation
:set +t

-- Multiline input can be entered in GHCi by surrounding it with :{ and :}
:{
f :: (a -> a -> b) -> a -> b
f g xs = g xs xs

g :: c -> Int -> c
g x y = x
:}

-- Now we can check the type of (f g)
:type (f g)

TIP

  • On hoogle.haskell.org you can search for functions by their type signature.
  • For a list of all available commands in GHCi type :help in the GHCi prompt.

Example (f length words)

How do we determine the type of the application (f length words)?

This is type of f:

f :: (a -> b) -> (c -> [a]) -> c -> [b]
f g1 g2 xs = map g1 $ g2 xs
-- :type (f length words)       -- output: (f length words) :: String -> [Int]
-- f length words "a bb cccc"   -- output: [1,2,4]

The builtin functions (from the Prelude library) length and words have the following types (Note that String = [Char] [1])

length :: Foldable t => t a -> Int -- ≅ [a] -> Int
words :: String -> [String] -- = [Char] -> [[Char]]

Next we apply the application rule:

f::(ab)(c[a])c[b]; length::[d]Int,words::[Char][[Char]] and γ(ab)=γ([d]Int),γ(c[a])=γ([Char][[Char]])(f  length  words)::γ(c[b])\dfrac{ {\color{OrangeRed}f} :: {\color{#f9b58b}(a \rightarrow b)} \rightarrow {\color{#ff7d2d}(c \rightarrow [a])} \rightarrow {\color{Plum}c \rightarrow [b]}; \ \begin{array}{c} {\color{#a0d07b}length} :: {\color{#7affff}[d] \rightarrow \text{Int}}, \\ {\color{#75b842}words} :: {\color{#00c7c7}\text{[Char]} \rightarrow [\text{[Char]}]} \end{array} \ \text{and} \ \begin{array}{c} γ({\color{#f9b58b}a \rightarrow b}) = γ({\color{#7affff}[d] \rightarrow \text{Int}}), \\ γ({\color{#ff7d2d}c \rightarrow [a]}) = γ({\color{#00c7c7}\text{[Char]} \rightarrow [\text{[Char]}]}) \end{array} }{ ( {\color{OrangeRed}f} \; {\color{#a0d07b}length} \; {\color{#75b842}words} ) :: γ({\color{Plum}c \rightarrow [b]}) }

Now that we have the constraints we can calculate the most general unifier

GEExplanation
\emptysetab[d]Intc[a][Char][[Char]]{\color{Orange}a \rightarrow b} \doteq {\color{Cyan}[d] \rightarrow \text{Int}} \\ {\color{Orange}c \rightarrow [a]} \doteq {\color{Cyan}\text{[Char]} \rightarrow [\text{[Char]}]}Decomposition: First we decompose the first expression.
\emptyseta[d]bIntc[a][Char][[Char]]{\color{Orange}a} \doteq {\color{Cyan}[d]} \\ {\color{Orange}b} \doteq {\color{Cyan}\text{Int}} \\ {\color{Orange}c \rightarrow [a]} \doteq {\color{Cyan}\text{[Char]} \rightarrow [\text{[Char]}]}Substitution: a[d]{\color{Orange}a} \doteq {\color{Cyan}[d]} cannot be further decomposed. Therefore, we substitute all occurrences of aa with [d][d]\rlap{{\color{red}\underline{\hphantom{[d]}}}}{[d]}.
a[d]{\color{Orange}a} \nobreak \mapsto \nobreak {\color{Cyan}[d]}bIntc[[d]][Char][[Char]]{\color{Orange}b} \doteq {\color{Cyan}\text{Int}} \\ {\color{Orange}c \rightarrow [{\color{red}\underline{{\color{Orange}[d]}}}]} \doteq {\color{Cyan}\text{[Char]} \rightarrow [\text{[Char]}]}Substitution of bb (No further bb's to substitute).
a[d]bInt{\color{Orange}a} \nobreak \mapsto \nobreak {\color{Cyan}[d]} \\ {\color{Orange}b} \mapsto {\color{Cyan}\text{Int}}c[[d]][Char][[Char]]{\color{Orange}c \rightarrow [{\color{red}\underline{{\color{Orange}[d]}}}]} \doteq {\color{Cyan}\text{[Char]} \rightarrow [\text{[Char]}]}Decomposition of the second expression.
a[d]bInt{\color{Orange}a} \nobreak \mapsto \nobreak {\color{Cyan}[d]} \\ {\color{Orange}b} \mapsto {\color{Cyan}\text{Int}}c[Char][[d]][[Char]]{\color{Orange}c} \doteq {\color{Cyan}\text{[Char]}} \\ {\color{Orange}[{\color{red}\underline{{\color{Orange}[d]}}}]} \doteq {\color{Cyan}[\text{[Char]}]}Substitution: of cc (No further cc's to substitute).
a[d]bIntc[Char]{\color{Orange}a} \nobreak \mapsto \nobreak {\color{Cyan}[d]} \\ {\color{Orange}b} \mapsto {\color{Cyan}\text{Int}} \\ {\color{Orange}c} \mapsto {\color{Cyan}\text{[Char]}}[[d]][[Char]]{\color{Orange}[{\color{red}\underline{{\color{Orange}[d]}}}]} \doteq {\color{Cyan}[\text{[Char]}]}We unpack the lists on both sides.
a[d]bIntc[Char]{\color{Orange}a} \nobreak \mapsto \nobreak {\color{Cyan}[d]} \\ {\color{Orange}b} \mapsto {\color{Cyan}\text{Int}} \\ {\color{Orange}c} \mapsto {\color{Cyan}\text{[Char]}}dChar{\color{Orange}{\color{red}\underline{{\color{Orange}d}}}} \doteq {\color{Cyan}\text{Char}}Substitution: dChar{\color{Orange}d} \doteq {\color{Cyan}\text{Char}} cannot be further decomposed. Therefore, we substitute all occurrences of dd with CharChar\rlap{{\color{red}\underline{\hphantom{\text{Char}}}}}{\text{Char}}.
a[Char]bIntc[Char]dChar{\color{Orange}a} \nobreak \mapsto \nobreak {\color{Cyan}[{\color{red}\underline{\color{Cyan}\text{Char}}}]} \\ {\color{Orange}b} \mapsto {\color{Cyan}\text{Int}} \\ {\color{Orange}c} \mapsto {\color{Cyan}\text{[Char]}} \\ {\color{Orange}{\color{red}\underline{{\color{Orange}d}}}} \mapsto {\color{Cyan}\text{Char}}\emptyset

The most general unifier (MGU) is:

γ={  a[Char],bInt,c[Char],dChar  }γ = \{\; {\color{Orange}a} \mapsto {\color{Cyan}[\text{Char}]}, \quad {\color{Orange}b} \mapsto {\color{Cyan}\text{Int}}, \quad {\color{Orange}c} \mapsto {\color{Cyan}\text{[Char]}}, \quad {\color{Orange}d} \mapsto {\color{Cyan}\text{Char}} \;\}

The type of (f length words) is:

(f  length  words)::[Char][Int]=γ(c[b])  ( {\color{OrangeRed}f} \; {\color{#a0d07b}length} \; {\color{#75b842}words} ) :: {\color{Cyan}[\text{Char}]} \rightarrow {\color{Cyan}[\text{Int}]} = γ({\color{Plum}c \rightarrow [b]}) %\qquad \phantom{}_{\square} \quad \quad \raisebox{-0.5em}{$\blacksquare$} \;

In other words (f length words) takes an argument of type String (String = [Char]) and returns a list of Int's. The returned list shows the length of the words in the passed string.

Do you want to try to find the type of (f2 length words) yourself?

f2 :: (a -> b) -> (c -> a) -> [c] -> [b]
f2 g1 g2 xs = map (g1 . g2) xs
-- :type (f2 length words)                   -- output: (f2 length words) :: [String] -> [Int]
-- f2 length words ["a", "b b", "c c c c"]   -- output: [1,2,4]

Example Error

Calculate the type of the application (all g).

all :: Foldable t => (a -> Bool) -> t a -> Bool -- ≅ (a -> Bool) -> [a] -> Bool

g :: (Ord a, Num a) => a -> [Bool] -- ≅ a -> [Bool]
g x = [x > 5, x > 50]
-- Another function with the same type signature would be:
g = zipWith (>) [5, 50] . replicate 2

Next we apply the application rule:

all::(aBool)[a]Bool;g::b[Bool]andγ(aBool)=γ(b[Bool])(all  g)::γ([a]Bool)\dfrac{ {\color{OrangeRed}all} :: % σ {\color{Orange}(a \rightarrow Bool)} \rightarrow % τ {\color{Plum}[a] \rightarrow Bool} ; \quad {\color{YellowGreen}g} :: % ρ {\color{Cyan}b \rightarrow [Bool]} \quad \text{and} \quad γ( % σ {\color{Orange}a \rightarrow Bool} ) = γ( % ρ {\color{Cyan}b \rightarrow [Bool]} ) }{ ({\color{OrangeRed}all} \; {\color{YellowGreen}g}) :: γ( % τ {\color{Plum}[a] \rightarrow Bool} ) }

Now that we have the constraints we can calculate the most general unifier

GEExplanation
\emptysetaBoolb[Bool]{\color{Orange}a \rightarrow Bool} \doteq {\color{Cyan}b \rightarrow [Bool]}Decomposition
\emptysetabBool[Bool]{\color{Orange}a} \doteq {\color{Cyan}b} \\ {\color{Orange}Bool} \doteq {\color{Cyan}[Bool]}Substitution of aa with bb.
ab{\color{Orange}a} \nobreak \mapsto \nobreak {\color{Cyan}b}Bool[Bool]{\color{Orange}Bool} \doteq {\color{Cyan}[Bool]}Error, because Bool{\color{Orange}Bool} is in the domain of [Bool]{\color{Cyan}[Bool]}.

The application (all g) is not typable because Bool{\color{Orange}Bool} is in the domain of [Bool]{\color{Cyan}[Bool]}. Therefore, we cannot unify the types. \quad \raisebox{-0.5em}{$\blacksquare$}

Other examples of untypable applications are:

foldl (:) []
-- foldl    :: (b -> a -> b) -> b -> [a] -> b
-- (:)      :: c -> [c] -> [c]
-- []       :: [d]

words ["a bb ccc", "dd ee"]
-- words                    :: [Char] -> [[Char]]
-- ["a bb ccc", "dd ee"]    :: [[Char]]