One Axiom to bring them all, and in the darkness bind them.
In Polish Notation where the Shadows lie.
DDfDycDDoDooDDoyDDfoDfo
Okay, let's back up a little. What is this shadowy Polish notation? It promises to be the one improvement in mathematical notation that will render parentheses unnecessary. But it doesn't seem to have been adopted that widely — probably because, for all its simplicity, it is not as easy to read and parse. By humans, that is, because robots and Lisp programs love nothing better.
So let's try to explain it with the use of a pet logical formula like $$ \left( (p\land (\neg q)) \rightarrow r\right) \rightarrow \left( p\rightarrow((\neg q)\rightarrow r)\right), \tag{0}$$ in which I inserted spurious parentheses on purpose, pretending that we are unsure as to the precedence rules for \(\neg\), \(\land\) and \(\rightarrow\). In fact, such rules are conventional anyway, and parentheses can't be avoided in some situations, like defining the precedence in the first place. And, let's be honest, this: $$ p\land \neg q \rightarrow r \rightarrow p\rightarrow \neg q \rightarrow r \tag{0*}$$ would not be easy to read — it wouldn't even be the same formula1.1.
The formula features both prefix (\(\neg x\)), and infix (\(x\land y\)) notation. It is easy to make the first step, and switch to the (prefix) functional notation. Exactly like \(f\) in \( f(x,y) = x + y \), where \(f\) replaces the \(+\) sign, we can write the logical connectives as functions. Say \(C(x,y) := x\rightarrow y\), \(K(x,y):= x\land y \) and \(N(x):=\neg x\). Then our formula becomes: $$ C(p\land N(q) \rightarrow r, p \rightarrow (N(q)\rightarrow r)) $$ $$ C(K(p,N(q))\rightarrow r, C(p,(N(q)\rightarrow r)))$$ $$ C(C(K(p,N(q)),r), C(p,C(N(q),r))). \tag{1}$$ Somehow the parentheses and commas are not making it any clearer, although no one would complain if this was encountered in a calculus textbook — just another composition of functions...
Now here's the main step: neither the commas1.2 nor the parentheses are necessary! Firstly, because we know exactly how many arguments each function/connective accepts. Secondly, because there are no multi-letter variables or functions — all logical constants and variables use a single character. We can thus write just: $$ CCKpNqrCpCNqr. \tag{2}$$ We could even keep the symbols: \(\rightarrow \rightarrow \land p\neg q r \rightarrow p \rightarrow \neg q r \), but somehow this isn't very aesthetically pleasing (for me at least); perhaps because of the uneven glyph sizes of the connectives — perhaps because some subexpressions mimic the usual infix notation.
So let's try and make some mistakes in parsing the Polish notation version. Could the last \( Nqr \) be a valid expression? No: \(qr\) is two variables (propositions), and they can't be concatenated on their own to form a single argument for the negation \(N\). But if \(N\) were to take \(q\) as its argument, then \(r\) would be left dangling: \(\neg q r\). Then there's the preceding \(C\), which needs two arguments. It seems those are \(N\) and \(q\), so... \(CNq\) means “negation implies q”, or \(\neg \rightarrow q \)... What‽ Ah, wait a minute, \(Nq\) on its own is a valid expression, and that as a whole must be the first argument of \(C\), while \(r\) is the second. We get \(CNqr = (\neg q)\rightarrow r\). Whew!
Let's check the previous \(Nqr\) — we have to go back to the conjunction \(K\), but its arguments can't be \(p\) and \(N\) — it must be like before: \(p\) is the first argument of \(K\), and the second argument is the whole \(Nq\). But that leaves the dangling \(r\), so we move back a bit further, and there's a \(C\) that could help. We already expect that \(K\) is not an argument of \(C\) (what would \(\land\rightarrow p\) mean anyway?) — rather, it must be the whole conjunction under \(K\), which we have just established to be \(KpNq\), so the dangling \(r\) is simply the second argument we need for \(C\). Step by step, we deciphered \(Nq\) to be just \(\neg q\), then \(KpNq\) to be \(p\land(\neg q)\), and finally the whole chunk \(CKpNqr\) to mean \((p\land \neg q)\rightarrow r\). Success!
Of course the above is not how the notation is parsed in practice. One of the easiest algorithmic methods is to proceed from the end, keeping the variables on a stack, and replacing them with a full expression whenever a connective is encountered1.25. It works because in a well formed formula there will always be enough variables to fill all slots, and each connective will immediately transform into a closed expression, without any dangling arguments. This still requires pre-defined knowledge of possible connectives and their arity. And unfortunately the commonly written \(p\land q\land r\land s\) has to be translated as \(KKKpqrs\) or perhaps \(KpKqKrs\), instead of the alluring \(Kpqrs\), because \(K\) always takes only 2 arguments.
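In fact, the whole method fits in a few lines of code. Here is a minimal sketch in Python (the arity table matches the connectives used in this article; everything else about it is my own naming):

```python
# A sketch of the right-to-left stack method described above.
# Arities are pre-defined; any other character counts as a propositional variable.
ARITY = {'C': 2, 'K': 2, 'A': 2, 'E': 2, 'D': 2, 'N': 1}

def parse(formula):
    """Parse a Polish-notation string into a nested tuple, or raise ValueError."""
    stack = []
    for ch in reversed(formula):
        if ch in ARITY:
            if len(stack) < ARITY[ch]:
                raise ValueError(f"{ch!r} is missing an argument")
            # the top of the stack holds the connective's first argument
            args = tuple(stack.pop() for _ in range(ARITY[ch]))
            stack.append((ch,) + args)
        else:
            stack.append(ch)
    if len(stack) != 1:
        raise ValueError("dangling expressions left over")
    return stack[0]

print(parse("CCKpNqrCpCNqr"))
# ('C', ('C', ('K', 'p', ('N', 'q')), 'r'), ('C', 'p', ('C', ('N', 'q'), 'r')))
```

Note how each well-formed subexpression collapses into a single stack entry — exactly the “no dangling arguments” property mentioned above.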
Perhaps, using a composition operator \(Q\) acting as \(Q2Kpqr=KKpqr\), \(Q5Kpqrstu=KKKKKpqrstu\) etc. would help. But the usual problem with a multidigit first argument comes back1.3. Using multi-level indices suggests itself: \(Q^{11}Kx_1\ldots x_{11}\) or just \(K^{17}x_1\ldots x_{17}\), and two-dimensional notation is in fact where we are going, although not to deal with digits per se.
When it comes to parsimony, (2) is shorter than the original (0) or even the ambiguous (0*), and much shorter than the functional version (1), while keeping one of its benefits: we immediately know what type of expression it is. Just as we would call \(2\cdot 4 + 3\cdot 5\) a sum, and \((2+4)\cdot(3+5)\) a product, (2) is obviously an implication — the main overarching operator is \(C\). By contrast, it takes a moment to find the main \(\rightarrow\) in (0).
Yet, the compact notation comes with an obvious cost: the immediate arguments of each operator are hard to find. There is just too much information crammed into the expression, without any helpful redundancy to facilitate parsing to ordinary language. Consider the first connective \(C\) — as we keep reading, we see that its first argument is another implication, whose first argument is a conjunction, whose first argument is \(p\)... We keep building (sub-)sub-expressions, so that by the time we get to the first \(r\), we have to go back just to make sure it's the second argument of the second \(C\), and then we keep going. And it isn't really any easier with the functional notation, because we essentially have the same task disguised as parentheses matching (how many are still open after the first \(r\)?). If only there were some way of indicating it...
My first thought was to draw braces above (or below) connecting parentheses of the functional notation, like so: $$ C\undergroup{(C\overgroup{(K\overgroup{(p,N(q))},r)}, C\overgroup{(p,C\overgroup{(N(q),r)})})},$$ but this obviously clutters the formula even more. It also duplicates what the parentheses are supposed to achieve. So let's get rid of them: $$ C\undergroup{C\overgroup{K\overgroup{pNq}r} C\overgroup{pC\overgroup{Nqr}}}.$$ Not bad; in fact, I'm guessing this extended Polish notation is what many of us write to actually work out a formula. The programming language Lisp mentioned earlier takes a different approach that also deals with the problem of longer names and unknown arities: use spaces instead of commas, and enclose the function together with its arguments in a parenthesis group (what is called an S-expression) like so:
(C (C (K p (N q)) r) (C p (C (N q) r))), and the same syntax holds for the whole language1.4.
But if we are ok with a bit of 2-dimensional math, let's just incorporate it into the flow of the expression itself,
depicting the levels of recursion as we traverse the argument tree. Each time a connective is encountered, we draw
branches for its expected arguments, and keep filling in the first argument — with its own sub-branches or complete
arguments — and when we are done with it, we start gathering the next argument. The branches are drawn as horizontal
lines on top of each other, so each successive sub-argument is written slightly higher, like an exponent. Each time an
argument is filled with a complete expression, we write it at the branch-line's end, and go down one step. This way,
the whole expression is still written linearly from left to right — only the level changes, as we enter and
exit arguments. Here's the picture of our formula, worth a paragraph of words:
CCKpNqrCpCNqr
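In case the drawing does not render, here is a rough ASCII approximation. It follows one plausible reading of the rule above — the level of a symbol is the number of connectives whose argument lists are still open when it is written — so it is a sketch of the idea, not the exact typography of the figures:

```python
# Reusing ARITY from the parser sketch above.
def levels(formula):
    """Pair each symbol with the number of connectives still awaiting arguments."""
    pending, out = [], []
    for ch in formula:
        out.append((ch, len(pending)))
        if ch in ARITY:
            pending.append(ARITY[ch])
        else:
            while pending:            # a variable closes one or more expressions
                pending[-1] -= 1
                if pending[-1]:
                    break
                pending.pop()
    return out

def draw(formula):
    syms = levels(formula)
    for row in range(max(d for _, d in syms), -1, -1):
        print(''.join(ch if d == row else ' ' for ch, d in syms))

draw("CCKpNqrCpCNqr")
#      q     q
#    pN     N r
#   K   r p C
#  C     C
# C
```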
Let's see some of the other standard axioms/tautologies. \(\alpha\rightarrow(\eta\rightarrow\alpha)\), or
\(C\alpha C\eta\alpha\), becomes
CaCea
\((\alpha\rightarrow(\eta\rightarrow\omega))\rightarrow ((\alpha\rightarrow\eta)\rightarrow(\alpha\rightarrow\omega))\), or \(CC\alpha C\eta\omega CC\alpha\eta C\alpha\omega\), becomes
CCaCeoCCaeCao
And here is one of Frege's axioms, contraposition \((p\rightarrow q)\rightarrow(\neg q\rightarrow\neg p)\), which we all know and love
CCpqCNqNp
Finally, let's draw the de Morgan law \(\neg(\alpha\land\omega)\leftrightarrow (\neg\alpha\lor\neg\omega)\),
just to introduce equivalence \(E\) and alternative \(A\). Its Polish form \(ENK\alpha\omega AN\alpha N\omega\) turns into
ENKaoANaNo
Notice that if the lines were to disappear, the remaining string would again be the plain Polish notation. The vertical shifts merely make the argument ownership immediately apparent. And more than that, they make it easy to change the order of
arguments for symmetric connectives, because whole branches are distinct. The above formula can at once be rewritten as
e.g.
EANaNoNKao
This brings us to the One Ring... I'm sorry, to the single Axiom. Łukasiewicz writes2.1 that it was Nicod who first found the single axiom sufficient to obtain classical propositional calculus (using substitution and modus ponens as derivation rules). In the spirit of Unity, the Single Axiom uses the single connective \(D\), so let's deal with it first. It signifies the Sheffer stroke \(|\) (also written \(\uparrow\)), equivalent to the NAND gate: $$(p\,|\,q) = p\uparrow q = \neg (p\land q),$$ which means “\(p\) and \(q\) are not both true” in natural language. Its crucial feature is that it is functionally complete: all the other connectives can be defined via \(D\) as $$ \left\{ \begin{align*} \neg x &:= (x|x)\\ x\land y &:= ((x|y)|(x|y))\\ x\rightarrow y &:= (x|(y|y))\\ x\lor y &:= ((x|x)|(y|y)) \end{align*} \right. \quad \mathrm{or} \quad \left\{\begin{align*} N x &:= Dxx\\ K x y &:= NDxy = DDxyDxy\\ C x y &:= DxNy = DxDyy\\ A x y &:= DNxNy = DDxxDyy \end{align*}\right. , $$ where I chose the parentheses/infix notation because it leads to another delightful detour: Quine's New Foundations2.15, which was actually where I first encountered two of our heroes: the single connective and the single axiom. The subject of the article is much larger than just notation or logic. It is after all a seminal work on which more recent set theories rest, but wanting to present a complete foundation, Quine has to include the logical part of his language as well, and one connective with one axiom does so without bloating the overall volume (he still manages a tight 11 pages!). I highly recommend it for the writing style, the important innovation, and the way it makes sense of the universal set. It's just one of those perfect classic pieces of art. Beware though, there are still typos: the axiom which we are trying to reach has one letter wrong — see below for the correct forms.
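Functional completeness is also the kind of claim a machine checks faster than we can squint at it. A small sketch, brute-forcing the truth tables of the four definitions above:

```python
from itertools import product

def nand(x, y):            # the Sheffer stroke D
    return not (x and y)

definitions = [
    ("Nx",  lambda x, y: not x,        lambda x, y: nand(x, x)),
    ("Kxy", lambda x, y: x and y,      lambda x, y: nand(nand(x, y), nand(x, y))),
    ("Cxy", lambda x, y: (not x) or y, lambda x, y: nand(x, nand(y, y))),
    ("Axy", lambda x, y: x or y,       lambda x, y: nand(nand(x, x), nand(y, y))),
]

for name, standard, via_d in definitions:
    assert all(standard(x, y) == via_d(x, y)
               for x, y in product([True, False], repeat=2)), name
print("all four definitions agree with their truth tables")
```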
Quine writes all formulae with non-optional parentheses — just like in the quantum brackets \(\langle\psi|\varphi\rangle\) — they are part of the connective notation, not just for grouping. But you could go one step further: since there is only one connective, it can be skipped too (he goes the other direction, and keeps \(\supset\) as implication for brevity's sake). The arguments are single-letter propositions or parenthesized expressions anyway, so it's safe to write \(((pp)((pp)(pp)))\) for the law of excluded middle. But we are getting dangerously close to the Iota programming language, so let's get back to the main topic.
Quine wants a new set theory built from scratch, so he also needs to include logic. But it is the other modified axioms (restricted comprehension, stratification) that are of central interest, so (I'm guessing) for brevity, he packages the whole of classical propositional calculus into just one axiom. Well, OK, some inference and substitution rules are also required; we'll get to that. For now, here is the correct axiom in his notation $$ ((\varphi|(\psi|\chi))|((\omega\supset \omega)|((\omega|\psi)\supset(\varphi|\omega)))),$$ but let's rewrite it in Polish notation, and expand the implications to use \(D\) only: $$ DD\varphi D\psi\chi DD\omega D\omega\omega DD\omega\psi DD\varphi\omega D\varphi\omega, $$ which is exactly what accompanies the opening poem. And by now, you recognize that \(D\omega D\omega\omega\) really is \(\omega\rightarrow\omega\), while \(DD\omega\psi DD\varphi\omega D\varphi\omega\) is \((\omega |\psi)\rightarrow(\varphi |\omega)\). It is both too complex to nicely translate into natural language, and at the same time contains some apparently trivial bits — couldn't we just write \(\mathsf{T}\)(rue) for \(C\omega\omega\)? The notation proposed above really does help in chunking — let's draw it again, so that you don't have to scroll. DDfDycDDoDooDDoyDDfoDfo
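And since the axiom is now a plain string over a two-place connective, we can at least confirm mechanically that it is a tautology — a sketch in the same spirit as the parser above, evaluating the \(D\)-only formula right to left:

```python
from itertools import product

def eval_polish(formula, env):
    """Evaluate a D-only Polish formula over a {variable: bool} assignment."""
    stack = []
    for ch in reversed(formula):
        if ch == 'D':
            x, y = stack.pop(), stack.pop()
            stack.append(not (x and y))     # the Sheffer stroke
        else:
            stack.append(env[ch])
    return stack[0]

AXIOM = "DDfDycDDoDooDDoyDDfoDfo"           # f, y, c, o for phi, psi, chi, omega
names = sorted(set(AXIOM) - {'D'})
assert all(eval_polish(AXIOM, dict(zip(names, values)))
           for values in product([True, False], repeat=len(names)))
print("the single axiom is a tautology")
```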
At least now we can make out the distinct arguments to the main connective (note how they differ in length), and where the trivial \(\omega\) block sits exactly. But there is still something unsatisfying here: \(D\) is symmetric, and in drawing the above we somehow have to distinguish between the arguments (although that is desired for general connectives). It is not so easy to see how to exchange the various branches. Couldn't we make it fully two-dimensional? Why, I'm glad you asked! Here's your reward for getting this far. A truly geometric depiction, which you can poke and arrange just the way you like it:
The hollow and filled circles represent the first and second argument, respectively (needed for other connectives). The root (main connective) is flashing, although it can only be the one to which no arrows lead. In this case, with just one symmetric connective, perhaps the graph could be somewhat simplified. And do we need so many separate nodes for the same propositional variables? But all this really deserves a separate article, which will happen eventually.
An axiom (schema) on its own is not worth much, because there must also be rules that produce new theorems. Substitution is obviously required, but we have no quantifiers here, so let's not go into the bureaucratic details. As for inference, Quine follows Nicod in taking a modified modus ponens: if \(\alpha\) and \(D\alpha D\varphi\omega\) are theorems, then so is \(\varphi\). In preparation for sequents, let's introduce \(\vdash\) as the meta-implication/derivation/theorem symbol, so the above rule can be stated as $$\alpha, D\alpha D\varphi\omega \vdash \varphi. \tag{MP} $$ It's tempting to try and write \(\vdash\) in Polish notation, but it has variable arity, and is not really a connective of the logic — it's a symbol of the meta-language (English).
That this really is modus ponens becomes clear when substituting \(\varphi\) for \(\omega\), so the second premise becomes \(D\alpha D\varphi\varphi=C\alpha\varphi\). In a sense then, Nicod's modus ponens is more general than the ordinary one. But not quite, since \(D\alpha D\varphi\omega = D\alpha NK\varphi\omega = C\alpha K\varphi\omega\), so the premise has to be stronger: \(\alpha\) must imply both \(\omega\) and \(\varphi\), which leads to \(C\alpha\varphi\). A stronger premise means it is harder to apply, but if one only has the ordinary \(C\alpha\varphi\) then also \(C\alpha K\varphi\varphi = D\alpha D\varphi\varphi\) must hold and (MP) is applicable. This sort of circular inter-dependence was also noticed by Łukasiewicz, but purely for formulae, not for (meta)rules. We will get back to this interesting phenomenon later. For now, let's add that \(D\) is symmetric so there should really be two consequents in the rule, both \(\varphi\) and \(\omega\). It does save some steps in proofs, but for some reason (conformity?) people use the above version.
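The identity used here, \(D\alpha D\varphi\omega = C\alpha K\varphi\omega\), is itself a three-variable truth-table check — a tiny sketch:

```python
from itertools import product

nand = lambda x, y: not (x and y)
for a, f, w in product([True, False], repeat=3):
    # D alpha D phi omega  versus  alpha -> (phi and omega)
    assert nand(a, nand(f, w)) == ((not a) or (f and w))
print("D alpha D phi omega is indeed alpha -> (phi and omega)")
```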
Either way, this (Hilbert-style) approach is quite cumbersome, and a painful leap from checking truth tables, which students (usually) learn first. Formally, these are different domains (derivations and semantics), but in the simple logic we are dealing with here, their equivalence is guaranteed by the soundness and completeness theorems. It also makes sense that just checking whether a formula is true should be simpler than deriving completely new theorems. But does it have to be so painful? One of the standard examples is proving that \(p\rightarrow p\) is not an independent axiom, but that it follows from the first two: $$\begin{aligned} [\mathrm{A}1]\quad &p\rightarrow (q\rightarrow p),\\ [\mathrm{A}2]\quad &(p\rightarrow (s\rightarrow r)) \rightarrow ((p\rightarrow s)\rightarrow(p\rightarrow r)). \end{aligned}$$
You can find the standard proof on Wikipedia, but let's try the new notation and see if it is a bit more readable. And since the only connective is \(C\), let's stick to the ordinary modus ponens. First, we write [A2] with the substitution \([r/p]\): CCpCspCCpsCpp The highlighted part is an instance of [A1], so modus ponens yields the rest as a theorem: CCpsCpp We now choose a convenient formula for \(s\), substituting \([s/Cqp]\) to get CCpCqpCpp and since the first argument is again [A1], we conclude that \(\vdash Cpp\).
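Both ingredients of the proof — substitution and detachment — are entirely mechanical, so the derivation above can be replayed in a few lines (subst and detach are my own hypothetical helpers, not any standard library):

```python
def subst(formula, sigma):
    """Simultaneous substitution of Polish strings for single-letter variables."""
    return ''.join(sigma.get(ch, ch) for ch in formula)

def detach(premise, implication):
    """Modus ponens: from alpha and C<alpha><beta>, conclude beta."""
    assert implication.startswith('C' + premise), "not an instance of modus ponens"
    return implication[len(premise) + 1:]

A1 = "CpCqp"
A2 = "CCpCsrCCpsCpr"
step1 = subst(A2, {'r': 'p'})                 # CCpCspCCpsCpp
step2 = detach(subst(A1, {'q': 's'}), step1)  # detach the instance of [A1]
step3 = subst(step2, {'s': 'Cqp'})            # CCpCqpCpp
print(detach(A1, step3))                      # Cpp
```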
What's so unsatisfying is the bloating of formulae in the middle of the proof — we usually have no choice but to substitute axioms into axioms, and then detach the conclusions through modus ponens (hence its other name: the rule of detachment), with occasional renaming of variables to simplify (specialize) the unwieldy expressions. You probably suspect by now where this is going... Imagine proceeding from the single axiom, which already has 23 characters in its compact form, and substituting it into itself!
And yes, it had to be done, to show that you can obtain Hilbert-style axioms from it. There is a short article by Scharle2.2, containing a complete derivation from the single axiom to the 3-axiom set of Łukasiewicz — the proof is only 2 pages, but it uses a shorthand, so only upon inspection do we realize some formulae (in pure Polish notation) are over 60 characters long. 32 steps/theorems are needed to get all 3 axioms of Łukasiewicz. The first 7 steps are really illuminating, for three reasons.
First, we notice that Scharle uses the rule \(p, DpDqr\vdash r\), discarding \(q\) as a possible conclusion. This is a different order to Quine's, and seemed wasteful as mentioned before. But the difference made me ask: do we actually assume \(D\) to be symmetric? And the answer is no, we don't have to. Scharle's 7th theorem is in fact \(DDstDDtsDts\), or \((s\uparrow t) \rightarrow (t\uparrow s)\), so symmetry follows from the axiom itself!
Second, even though the first step results in \(D\tau_1 D \tau_6 \tau_2\), where \(\tau_n\) is the \(n\)-th theorem, we cannot immediately jump to the 6th step, because symmetry hasn't been proven yet. And if we were to switch the order, the 7th theorem couldn't be obtained from the 6th, which yields \(D\tau_6D\tau_6\tau_7\). So even though it is tempting to use the stronger modus ponens, which gives two conclusions, it would mean explicitly assuming the symmetry, and there is something satisfying in the single axiom containing all information, and \(D\) being just a two-place connective.
Third, the “obvious” thesis \(DtDtt\) or \(t\rightarrow t\) is obtained in the 6th step, so again, it shouldn't be plugged in as \(\mathsf{T}\) into the axiom itself. Plus, a separate truth symbol is absent from the actual language, and implication \(C\) is defined as secondary to \(D\), so \(Ctt\) is not really obvious.
Just for fun, let's write the first step, but with our (Quine's) modus ponens. Start with the axiom DDpDqrDDsDssDDsqDDpsDps and make the substitutions \([p/DpDqr], [q/DsDss], [r/DDsqDDpsDps], [s/t]\) to get, in linear form, DDDpDqrDDsDssDDsqDDpsDpsDDtDttDDtDsDssDDDpDqrtDDpDqrt
What a glorious beast! And as the entire first “exponent” (argument) is the axiom itself, we can detach the simple conclusion of \(DtDtt\). Is that proof of \(t\rightarrow t\) shorter than before? You be the judge.
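For the skeptical, the beast can be reproduced mechanically, reusing subst from the earlier sketch (first_arg is another hypothetical helper of mine, scanning off one complete subformula):

```python
def first_arg(formula):
    """First argument of a Polish formula Dxy, as a substring."""
    need, j = 1, 1                    # skip the leading D, then scan
    while need:
        need += 1 if formula[j] == 'D' else -1
        j += 1
    return formula[1:j]

AX = "DDpDqrDDsDssDDsqDDpsDps"
big = subst(AX, {'p': 'DpDqr', 'q': 'DsDss', 'r': 'DDsqDDpsDps', 's': 't'})
print(len(big))                       # 53 characters
assert first_arg(big) == AX           # the first "exponent" is the axiom itself
print(first_arg(big[len(AX) + 1:]))   # DtDtt, i.e. t -> t, ready for detachment
```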
One more interesting detour, before we save ourselves by jumping to a more practical derivation system, is a curious phenomenon that Łukasiewicz called “generalizing deduction”. Going back to that curious \(D\omega D\omega\omega\) block, and repeated sub-expressions, several changes can be made. And indeed various logicians had their own variants of the axiom. Nicod's original axiom was DDpDqrDDtDttDDsqDDpsDps The version we have been using so far was obtained by Łukasiewicz by substituting \([t/s]\) in Nicod's original. Wajsberg gave another version DDfDycDDDocDDfoDfoDfDfy And you can find many others in the usual place.
What is distinctive about this last one is that it has no subexpressions which themselves are theorems. In particular, no \(C\omega\omega\). Łukasiewicz calls such an axiom “organic”, and neither his nor Nicod's version exhibits this property. Wajsberg's axiom really feels more fundamental. But of course each of them is sufficient to generate the whole logical system under question. Which means something unusual has happened. Did you notice it?
Łukasiewicz wrote that he simplified Nicod's axiom, because he reduced the number of variables from 5 to 4 by substitution. And not just renaming — the variable \(s\) was already present, and its truth value was independent of \(t\). Łukasiewicz's version is thus a special, less general case of Nicod's. But it still holds the same generative power! How can that be? Was there a mistake? No, Łukasiewicz goes on to show how Nicod's axiom can be derived from his own. In other words, he obtains a more general formula from its own less general form. He calls it “generalizing deduction”, and comments on its philosophical implications, contrasting it with deduction and induction. I'm not sure if it can be put to mathematical use — after all, the system is sound and complete, and if this method went beyond standard inference, it would produce theorems outside of what can be derived with modus ponens.
This circular procedure is still surprising, so let's see how it works on a simpler example, also supplied by Łukasiewicz. Start with \(CpCqCrCsr\) as the general formula. Then take its special case when \(p=q\), i.e. \(CqCqCrCsr\), and try to get back to the original... Fortunately, he also gives us the derivation, which proceeds as follows. Call the second formula \(\omega_1\) and substitute it into \(q\) to get \(\omega_2\): $$ \omega_1 [q/\omega_1] = C\omega_1\omega_2, \quad\text{where}\quad \omega_2 = C\omega_1\omega_3, \quad\text{and}\quad \omega_3 = CrCsr.$$ Since we assume \(\omega_1\), we get \(\omega_2\) by modus ponens, and in turn \(\omega_3\). The next substitution is $$ \omega_3 [r/\omega_3; s/q] = C\omega_3\omega_4,\quad\text{where}\quad \omega_4 = CqCrCsr. $$ And the last one: $$\omega_3[r/\omega_4; s/p] = C\omega_4\omega_0, \quad\text{where}\quad \omega_0 = CpCqCrCsr $$ is the original formula, from which we again get \(\omega_1 = \omega_0[p/q]\), and the whole fun can begin anew.
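The whole loop can again be replayed with the subst/detach sketch from before (writing w1 for \(\omega_1\) and so on):

```python
w3 = "CrCsr"
w1 = "CqCq" + w3                                 # the special case, omega_1
w2 = detach(w1, subst(w1, {'q': w1}))            # w1[q/w1] = C w1 w2
assert detach(w1, w2) == w3                      # w2 = C w1 w3, so w3 follows
w4 = detach(w3, subst(w3, {'r': w3, 's': 'q'}))  # w3[r/w3; s/q] = C w3 w4
w0 = detach(w4, subst(w3, {'r': w4, 's': 'p'}))  # w3[r/w4; s/p] = C w4 w0
print(w0)                                        # CpCqCrCsr, the general formula
assert w1 == subst(w0, {'p': 'q'})               # and the circle closes
```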
As you can imagine, the proof for the axiom itself is three times as long, with formulae four times as long (although our favourite \(DtDtt\) makes an appearance too), but the idea is the same. Perhaps I will someday draw it, if only to show off the 2D notation. But now it is time to introduce a more practical method of proving theorems in Polish notation.
Sequent calculus is a notation/proof system where, in essence, we have a (meta-)implication connecting a (meta-)conjunction to a (meta-)alternative like so $$ (x_1\land x_2\land\ldots\land x_n) \rightarrow (y_1\lor y_2\lor\ldots\lor y_m), \tag{Int} $$ denoted by $$x_1, x_2,\ldots, x_n \vdash y_1, y_2,\ldots, y_m.$$ This makes it clearer that \(x\)'s and \(y\)'s are the actual logical expressions, while the sequent (including commas) is another level of notation.3.1 What it says is that given all the propositions on the left, at least one of the propositions on the right follows, or that if at least one of the propositions on the left is false, then whatever follows (ex absurdo quodlibet). This might seem strange at first, but it will not matter in the practical application of the calculus below (there will be no actual checking of truth values). The alternative on the right-hand side was introduced by Gentzen as a means of making sequents applicable to classical logic, not just the intuitionistic one. A sequent is a tool for proofs, not an indication of conclusions following from premises, except when there is only one item on the right. Thus \(\vdash\alpha\) simply means that \(\alpha\) is a theorem.
If you need an intuitive analogy, the closest I have found is the proof by contradiction in the tableaux method. We assume the conclusion is false, and try to arrive at a contradiction, diligently keeping track of what has to be true or false. So, for example, if we want to check the formula \( (p\land q)\rightarrow( \neg q\rightarrow p) \), we ask: when is it false? Only when the antecedent \(p\land q\) is true, and the consequent \(\neg q\rightarrow p\) is false. Ah, but the former means that both \(p\) and \(q\) are true, while the latter has to be analysed again, yielding \(\neg q\) as true, and \(p\) as false. But then the truth values of the variables are in contradiction, so the initial formula must be true.
A sequent works similarly, for if we assume the intended interpretation (Int) to be a false formula, that is the same as saying that the conjunction on the left is true, or that all \(x_i\) are true, but that the alternative on the right is false, so all \(y_j\) are false. That amounts to writing all \(x_i\) on the \(\mathsf{T}\) side of the tableau, and all \(y_j\) on the \(\mathsf{F}\) side. A contradiction appears when at least one \(y_j\) turns out to be true, or at least one \(x_i\) turns out to be false, which is what we said at the outset: writing a sequent is like stating the initial negated formula. Or indeed, a relation between sets of formulae. Here's how the above short proof would look in sequent form: $$ \dfrac{\vdash (p\land q)\rightarrow (\neg q \rightarrow p)}{ \dfrac{p\land q \vdash \neg q \rightarrow p}{ \dfrac{p, q \vdash \neg q \rightarrow p}{ \dfrac{p, q, \neg q \vdash p}{\mathrm{\small AX}} } \; \mathsf{R}\rightarrow } \; \mathsf{L}\land } \; \mathsf{R}\rightarrow $$
It is essentially the same as the tableau, except there are no truth values: we move expressions between the left- and right-hand sides of the sequent sign. And we have to spell out the rules of how we change the sequent from line to line. In the tableau we mostly used the intuitively clear rule for the implication: it is false, when falsity follows from truth. For sequents, the rules define the allowed transformations, stating the connective and the side it appears on. The two rules for implication are: $$ \mathsf{R}\rightarrow\; \dfrac{\Gamma \vdash p\rightarrow q, \Delta}{\Gamma, p \vdash q, \Delta} \quad\mathrm{and}\quad \dfrac{\Gamma, p\rightarrow q \vdash \Delta}{ \Gamma \vdash p, \Delta \quad \Gamma, q \vdash \Delta}\;\mathsf{L}\rightarrow . $$ In addition to the implication itself, we allow arbitrary (perhaps empty) sets of formulae \(\Gamma\) and \(\Delta\), as we need to take care of implication in all contexts; and you can already see that as the proof proceeds, there are more and more comma-separated bits. Those are the \(x\)'s and \(y\)'s of (Int). It is customary to write the rules in such general form, but I will ignore those \(\Gamma\) and \(\Delta\) below, when they are just trivially repeated (as above). Note also that the second rule produces branching: we end up with two sequents, and each of them has to be proved in turn.
Clearly, we need two rules for each connective, but that is not enough. We also need to be able to stop, and for that we have the axioms, or axiom schemas, of the form: \(\Gamma, \alpha \vdash \Delta, \alpha\). A derivation is supposed to present a proof leading from an axiom (or axioms), through judicious application of the rules, to the formula in question. In that sense, the actual proof proceeds in the opposite direction to how it is discovered: it starts at the axioms and arrives at the initial formula. Which is why usually one writes the formula on the bottom, and constructs the proof above, to make it appear in the same order as a Hilbert-style proof. This is a fiction, obviously, since in practice we always start with a formula to prove, and need to check many possible rules and paths, and create many branches, before arriving at a satisfying proof. I abandoned this fiction here, also for a more important reason: there is what could be called a bi-directional sequent calculus, which I use below for the Sheffer stroke.
But before that, additional observations are in order. Notice that the axiom says that the same formula \(\alpha\) must appear on both sides as the last element. But we finished our proof with \(p\) being the first on the left. That's formally wrong. And we need several structural rules to remedy this, such as $$ \dfrac{p, q, \Gamma \vdash \Delta}{ q, p, \Gamma \vdash \Delta}\quad\text{or}\quad \dfrac{\Gamma \vdash p, \Delta}{ \Gamma \vdash p, p, \Delta}\quad\text{or}\quad \dfrac{\Gamma, p \vdash \Delta}{\Gamma\vdash \Delta}. $$ Annoying, aren't they? Yet, a computer system for sequents must be told all such rules too. They are called quite appropriately: permutation, contraction and weakening. I will assume the first two tacitly, because they follow from the intended meta-formula (Int). Note however, that weakening has a peculiar feature: it only works in one direction. You can eliminate \(p\), to weaken your axioms/theorems to get to your desired sequent, but not the other way around, as it would then be possible to prove any \(q \vdash p\), through \(q, p \vdash p\).
If we want to move freely up and down the sequent tree, we can only have rules that preserve the full content. Then, so long as there are no branchings, we are dealing with equivalent sequents or formulae — it is more of a transformation than a reduction. But since we are ultimately interested in a proof, we want to eliminate all connectives one by one, so that in the end only variables are left, and the tree ends in axioms or counter-examples. Here is a presentation about it, and if you want to know more about the standard sequent calculus, start with its Wikipedia entry (opposite proof direction), or check out the Open Logic textbook, as I just barely scratched the surface here.
Needless to say, when there is only one connective — Sheffer's stroke — the resulting sequent calculus must be one of the simplest possible, with effectively only two logical rules and one axiom. They are (omitting the arbitrary \(\Gamma\) and \(\Delta\) ): $$\mathsf{\small L}\; \dfrac{D\alpha\omega\vdash}{\vdash\alpha\quad \vdash\omega}, \quad\quad \mathsf{\small R}\; \dfrac{\vdash D\alpha\omega}{\alpha,\omega\vdash}, \quad\quad \text{\small AX:}\; \alpha \vdash\alpha. $$ It doesn't even seem necessary to supply the usual glosses (left-introduction-\(D\)), as the left rule is branching, the right rule is not, and there is only one connective. These rules can be applied in both directions, preserving all information, and leaving no real choices in the proof, as opposed to the usual natural deduction/sequent, where various paths have to be explored by trial and error (and intuition). What is so beautiful about this version is that it always reduces the number of connectives, and we must arrive at an axiom or a counter-example.
So let's look at some examples (and I apologize for not using the new 2D notation below — I have yet to work out a satisfying compromise between HTML and \(\LaTeX\)). The proof of \(DtDtt\), or \(t\rightarrow t\), is immediate: $$\dfrac{\vdash DtDtt}{ \dfrac{t,Dtt\vdash}{ \begin{matrix}t\vdash t\\\hline\end{matrix} \quad \begin{matrix}t\vdash t\\\hline\end{matrix} } } $$ It shows off what frequently happens in this version: \(Dtt\) gives duplicate branches, when moved from the left, so we will be skipping them. This will make the proof of the symmetry (\(s|t \rightarrow t|s\)) more concise: $$ \dfrac{\vdash DDstDDtsDts}{ \dfrac{Dst, DDtsDts\vdash}{ \dfrac{Dst \vdash Dts}{ \dfrac{t, s, Dst \vdash}{ \begin{matrix}t, s \vdash s\\\hline\end{matrix} \quad \begin{matrix}t, s \vdash t\\\hline\end{matrix} } } } } $$ And note that we could have really started from \(Dst\vdash Dts\), with entailment instead of implication. We used the expanded version of \(CDstDts\) instead, and this also shows (suggests) that the deduction theorem is built in. Let's check: $$ \dfrac{\Delta \vdash D\varphi D\chi\chi}{ \dfrac{\Delta, \varphi, D\chi\chi \vdash}{ {\Delta, \varphi \vdash \chi} } } $$ So yes, assuming that \(\chi\) follows from \(\varphi\) and premises \(\Delta\), we obtain that the implication \(C\varphi\chi\) follows from just the premises — the deduction (meta-)theorem holds. Note that we wanted to derive one sequent from another, and axioms were not needed.
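Rules this mechanical beg to be automated. Here is a sketch of the whole calculus: prove returns None when every branch closes with an axiom, and a falsifying assignment otherwise (the function names and the set-based bookkeeping are my own choices):

```python
def split(s, i):
    """Index just past the subformula of s starting at position i."""
    if s[i] == 'D':
        return split(s, split(s, i + 1))
    return i + 1

def args(f):
    """The two arguments of a Polish formula Dxy, as substrings."""
    j = split(f, 1)
    return f[1:j], f[j:]

def prove(left, right):
    """Reduce a sequent with the two D-rules. None means it holds;
    otherwise a falsifying {variable: truth value} is returned."""
    left, right = set(left), set(right)
    if left & right:
        return None                           # axiom: alpha |- alpha
    for f in left:
        if f[0] == 'D':                       # L-rule: branch on the two arguments
            for a in args(f):
                cex = prove(left - {f}, right | {a})
                if cex is not None:
                    return cex
            return None
    for f in right:
        if f[0] == 'D':                       # R-rule: both arguments move left
            return prove(left | set(args(f)), right - {f})
    # only variables remain: left side true, right side false falsifies it
    return {**{v: True for v in left}, **{v: False for v in right}}

print(prove([], ["DtDtt"]))        # None: t -> t holds
print(prove([], ["DDstDDtsDts"]))  # None: the symmetry theorem holds
print(prove(["Dst"], ["Dts"]))     # None: the entailment form works directly
```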
Finally, let's find the counter-example to the single axiom with Quine's typo (marked with an overbar), expanding his \(C\) to \(D\) in the second line: $$\dfrac{\vdash DD\varphi D\psi\chi DC\omega\omega CD\omega\psi D\varphi\bar\psi}{ \dfrac{D\varphi D\psi\chi,\; DD\omega D\omega\omega DD\omega\psi DD\varphi\bar\psi D\varphi\bar\psi \vdash}{ \dfrac{D\varphi D\psi\chi \vdash D\omega D\omega\omega}{ \dfrac{\omega, D\omega\omega, D\varphi D\psi\chi \vdash}{ \begin{matrix}\omega, D\varphi D\psi\chi \vdash \omega\\ \hline\end{matrix} } } \quad \dfrac{D\varphi D\psi\chi \vdash DD\omega\psi DD\varphi\bar\psi D\varphi\bar\psi}{ \dfrac{D\omega\psi, D\varphi D\psi\chi, DD\varphi\bar\psi D\varphi\bar\psi \vdash}{ \dfrac{D\omega\psi, D\varphi D\psi\chi \vdash D\varphi\bar\psi}{ \dfrac{D\omega\psi \vdash \varphi, D\varphi\bar\psi}{ \begin{matrix}\varphi, \bar\psi, D\omega\psi \vdash \varphi\\ \hline\end{matrix} } \quad \dfrac{D\omega\psi \vdash D\psi\chi, D\varphi\bar\psi}{ \dfrac{\psi, \chi, D\omega\psi \vdash D\varphi\bar\psi}{ \dfrac{\psi, \chi \vdash \psi, D\varphi\bar\psi}{} \quad \dfrac{\psi, \chi \vdash \omega, D\varphi\bar\psi}{ \varphi, \psi, \bar\psi, \chi \vdash \omega } } } } } } } }$$ and the last line contains only propositional variables, so it can't be brought to an axiom. Unless \(\bar\psi\) is \(\omega\), whereupon we get the correct axiom. If we didn't know where the mistake was in advance, the remaining sequent would be \(\varphi, \psi, \chi \vdash \omega\), and the analogy with tableaux tells us that if \(\omega\) were false, then making all the other variables true would give a counter-example, i.e. an assignment of truth values to all variables that is consistent with the negation of the initial formula. Even shorter, in terms of the interpretation (Int), we would have \(\mathsf{T}\rightarrow \mathsf{F}\).
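The prover sketched above hunts down the same counter-example on its own, and blesses the corrected form:

```python
CORRECT = "DDfDycDDoDooDDoyDDfoDfo"
TYPO    = "DDfDycDDoDooDDoyDDfyDfy"   # psi where the last omegas should be

print(prove([], [CORRECT]))  # None: a theorem
print(prove([], [TYPO]))     # {'f': True, 'y': True, 'c': True, 'o': False} (in some order)
```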
We seem to have lost something along the way, for “the axiom” is no longer an axiom, but just a theorem that we can prove. There is just the axiom (schema) of \(\alpha\vdash\alpha\), and the two rules. Nor is there modus ponens, although we can easily prove \(\alpha, D\alpha D\varphi\omega \vdash \omega\). It would still be perfectly alright to derive a theorem by reducing it only to another theorem proven earlier, and it would definitely save time. But each proof step would really mix the properties of \(D\) with the properties of the meta-implication in (Int). The analogy with tableaux makes it clear that we are closer to checking truth values than analysing structure; and the instant proof of modus ponens means that the distinction between inference and material implication is blurred. This is the price we pay for the ease of proof. Perhaps a more intricate structure is impossible in systems which are both sound and complete, like this one.
Finally, what can be said about Łukasiewicz's organic theorems in this context? At first glance, his proof could be adopted as is, with modus ponens written as $$ \frac{\vdash\omega_1 \quad \omega_1 \vdash \omega_2 }{\vdash\omega_2} \mathsf{\small Cut}, $$ for the first step, but this forces us to use the dreaded cut rule. There is a cut-elimination theorem (for which I haven't been able to find a satisfying reference), which says that every proof can be rewritten without this rule by inlining, like in code. That is, the “lemma” stated as \(\vdash \omega_j\) has its own proof after all, so just write that proof out explicitly inside the proof of the next \(\vdash \omega_{j+1}\). But we can't do that here, because Łukasiewicz wants to show the progression without relying on other axioms: \(\omega_1\) is the only assumption, and only substitution and modus ponens are used.
Let's try writing the whole thing as a sequent with its basic meaning: inferring the more general \(\omega_0=CpCq\omega_3\) from the particular \(\omega_1=CqCq\omega_3\) is just the sequent \(DqD\omega_4\omega_4\vdash DpD\omega_4\omega_4\), where, in keeping with the previous section, \(\omega_4=DqD\omega_3\omega_3\) and \(\omega_3 = CrCsr\). The derivation reads $$ \dfrac{DqD\omega_4\omega_4 \vdash DpD\omega_4\omega_4}{ \dfrac{p, D\omega_4\omega_4, DqD\omega_4\omega_4 \vdash }{ \dfrac{p, DqD\omega_4\omega_4 \vdash \omega_4}{ \dfrac{p\vdash q, DqD\omega_3\omega_3}{ \begin{matrix}p, q \vdash q, \omega_3\\ \hline \end{matrix} } \quad \dfrac{p\vdash\omega_4, D\omega_4\omega_4}{ \begin{matrix}p, \omega_4 \vdash \omega_4\\ \hline\end{matrix} } } } }\tag{\#} $$ So it holds, but... something doesn't feel right. Łukasiewicz doesn't use any other axioms in his proof, and we had to rely on the general \(q \vdash q\). Also, we didn't even have to expand \(\omega_3\) — any formula, even a false one, would have worked. But in his proof, it was crucial to derive \(\vdash\omega_3\) on its own, into which an appropriate substitution could be made to obtain \(\omega_0\). Part of his trick seems to be that \(\omega_3\) is one of the standard Hilbert axioms, which blends neatly with modus ponens. It just has to be dug up from the unnecessary dressing3.2. Here, however, we already have some axioms included in the rules for \(D\). Both \(\omega_0\) and \(\omega_1\) can be proven independently of each other.
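The prover from before agrees, and also confirms the uneasy observation that both formulae hold on their own (writing all the \(C\)'s out as \(D\)'s via \(Cxy = DxDyy\)):

```python
w3 = "DrDDsDrrDsDrr"         # C r C s r, with C x y := D x D y y
w4 = "DqD" + w3 + w3         # C q w3
w1 = "DqD" + w4 + w4         # omega_1, the special case
w0 = "DpD" + w4 + w4         # omega_0, the general case

print(prove([w1], [w0]))     # None: the sequent (#) holds
print(prove([], [w1]))       # None: but omega_1 is a theorem by itself...
print(prove([], [w0]))       # None: ...and so is omega_0
```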
Having tried several paths, the best I came up with is to rewrite the relevant formulas as separate sequents, and proceed from one to the other. Thus, our starting point is \(\vdash\omega_1\), or \(\vdash DqD\omega_4\omega_4\).
The only other (short) way to conduct the proof would be to say, upon reaching \(q\vdash\omega_3\), that we have a new axiom: \(\Gamma, q \vdash \omega_3, \Delta\) (and similarly for \(\vdash\omega_4\)). Then the second sequent tree would start from the axiom instance \(p,q \vdash \omega_3\), for which \(\Gamma=\{p\}\) and \(\Delta=\varnothing\). But this just hides the weakening (adding irrelevant assumptions) in a different step. Still, it would also make the usual sequent proof \((\#)\) rely on those axioms alone, and not on \(\alpha\vdash\alpha\). Perhaps it is the restriction on axioms that makes the system such that cuts cannot be eliminated, or weakening abandoned. I was unable to (dis)prove it. So let us end here.
1.1 | Even if we were to conventionally assume that \(\neg\) precedes \(\land\), which precedes \(\rightarrow\), we still have the problem of associativity: how to insert parentheses into \(x\rightarrow y\rightarrow z\). Is it \((x\rightarrow y)\rightarrow z\) or \(x\rightarrow (y\rightarrow z)\)? And we need to place them before deciding on the order of evaluation. This too is a matter of convention, not fixed in stone, and gives rise to countless inane memes like “The answer to 4÷2÷2 will shock you!” |
1.2 | Not to worry, though, tensor indices supply a counter-example when there are more than 10 dimensions, and \(T_{123}^{5}\) has to be distinguished from \(T_{12,3}^5\). Oh, wait, that could be confused with the shorthand for derivatives: \(T_{12,3} := \partial_3 T_{12}\), let's use the semicolon instead! Too late — in relativity you also need the covariant derivative: \(T_{7;12} := \nabla_{12}T_7\). The horror never ends. |
1.25 |
Desk calculator (or dc)
which is a reverse-Polish calculator with arbitrary-precision arithmetic, is, according to Wikipedia,
the oldest surviving Unix program, predating even C (the language, not the connective).
I was very surprised to find it installed by default on my system. And even
more surprised by how short the Chudnovsky algorithm for calculating the digits of \(\pi\) is in dc (one-line really). _640320[0ksslk3^16lkd12+sk*-lm*lhd1+sh3^/smlxlj*sxll545140134+dsllm*lxlnk/ls+dls!=P]sP3^sj7sn[6sk1ddshsxsm13591409dsllPx10005v426880*ls/K3-k1/pcln14+snlMx]dsMx And yes, dc is Turing-complete. |
1.3 | Remember that we care about pen and paper usage too. In code, this concatenation is untenable anyway and has to employ some sort of array type with explicitly separated elements. |
1.4 |
For example, here is a function definition with conditional statements
(defun inv (a b) (if (and (> b a) (> a 1)) (/ (- b 1) (- a 1)) nil)) which would ordinarily be written as $$ \mathsf{inv}(a,b):=\left\{ \begin{aligned} &\frac{b-1}{a-1} &&\mathrm{if}\quad b>a \land a > 1\\ &\mathsf{nil} &&\mathrm{otherwise} \end{aligned}\right., $$ which could be called with (inv 13 17), avoiding the confusion of just \(inv1371\). Or how about the Fibonacci sequence implemented as (defun フ (n &optional (l1 0) (l2 1)) (if (< n 2) (* n l2) (フ (- n 1) l2 (+ l1 l2)))) so that \(F_{15}\) is obtained through (フ 15). We also see why some people take Lisp to stand for Lots of Irritating Superfluous Parentheses. |
2.1 | Jan Łukasiewicz, "Uwagi o aksyomacie Nicod'a i o “dedukcyi uogólniającej”",
Księga pamiątkowa Polskiego Towarzystwa Filozoficznego we Lwowie, Lwów (1931)
sbc.org.pl/dlibra/publication/edition/18864 The English version "Comments on Nicod's Axiom and on “Generalizing Deduction”" can be found in Selected Works by Jan Łukasiewicz (ed. by L. Borkowski), Studies in logic and the foundations of mathematics, North-Holland 1970. |
2.15 | Willard Van Orman Quine, "New Foundations for Mathematical Logic", The American Mathematical Monthly, 44:2, 70-80, (1937) DOI: 10.1080/00029890.1937.11987928 |
2.2 | Thomas W. Scharle, "Axiomatization of propositional calculus with Sheffer functors", Notre Dame Journal of Formal Logic, Volume VI, Number 3 (1965)
3.1 | As always, some convention is needed to account for zero arguments. Because truth is the neutral element of conjunction, \(\mathsf{T}\land p = p\), any repetition can be extended via \(p_1\land p_2 = p_1\land p_2\land\mathsf{T}\). An argumentless conjunction will be just \(\mathsf{T}\). And likewise for the alternative, whose neutral element is falsity, or \(\mathsf{F}\). Another way to think about it is the analogy between multiplication and addition. Repeated addition is multiplication: \(n\) additions of \(x\) give \(nx\), which equals \(0\) for \(n=0\). Repeated multiplication is exponentiation, and \(x^0 = 1\). |
3.2 | Łukasiewicz also gives an intuitive account of building his formula from the ground up. The basic building block is the tautology \(r\rightarrow(s\rightarrow r)\), which is obvious (\(r\) follows from \(r\) even if we add a spurious condition \(s\)). But if the tautology is true unconditionally, it's true on condition \(q\), from which he builds both \(\omega_1\) and \(\omega_0\). This is not so much a proof as an additional illustration that these formulae really are true. It also hints at the problems we have with \(\omega_3\) being true on its own. |