A classical theorem is that $\sqrt{2}$ is irrational. https://en.wikipedia.org/wiki/Square_root_of_2#Proofs_of_irrationality This means it cannot be written as $\frac{p}{q}$ for integers $p$ and $q$.

This and the existence of an infinite number of primes are some simple prominent theorems and reasonable test cases for a system intended to support proofs.

There is an interesting article The Seventeen Provers of the World compiled by Freek Wiedijk which does the irrationality of two in many different systems.

I’ve been interested in trying to build infrastructure for a highly automated interactive proof system using off the shelf solvers and this is an test case.

First off, just asking the question in a naive-ish way chokes the solvers. It is not inconceivable that they could have baked this in.

from z3 import *
# from cvc5.pythonic import * # cvc5 is a pretty good dropin. It also fails here.
n,m = Ints("n m")
prove(Not(Exists([n,m], And(m != 0, n*n == 2 * m * m))))
failed to prove



---------------------------------------------------------------------------

Z3Exception                               Traceback (most recent call last)

File ~/.local/lib/python3.10/site-packages/z3/z3.py:7130, in Solver.model(self)
   7129 try:
-> 7130     return ModelRef(Z3_solver_get_model(self.ctx.ref(), self.solver), self.ctx)
   7131 except Z3Exception:


File ~/.local/lib/python3.10/site-packages/z3/z3core.py:4065, in Z3_solver_get_model(a0, a1, _elems)
   4064 r = _elems.f(a0, a1)
-> 4065 _elems.Check(a0)
   4066 return r


File ~/.local/lib/python3.10/site-packages/z3/z3core.py:1475, in Elementaries.Check(self, ctx)
   1474 if err != self.OK:
-> 1475     raise self.Exception(self.get_error_message(ctx, err))


Z3Exception: b'there is no current model'


During handling of the above exception, another exception occurred:


Z3Exception                               Traceback (most recent call last)

BLAH BLAH BLAH

Z3Exception: model is not available

Ok, well we may have to help it along.

I took a look in Baby Rudin to see what their proof looks like. This led me down a rabbithole that was seemingly leading no where.

I started proving lemmas about even and odd numbers. There were some basic facts that seemed difficult to push through.

even = Function("even", IntSort(), BoolSort())
even_def = ForAll(x, even(x) == Exists([y], x == 2*y))
odd = Function("odd", IntSort(), BoolSort())
odd_def = ForAll(x, odd(x) == Exists([y], x == 2*y+1))

even_or_odd = ForAll(x, even(x) != odd(x))
# Z3 can't prove this for some reason. I suppose this might be a complex diophantine equation or something.
# CVC5 seemingly can prove this.
prove(Implies(And(even_def, odd_def), even_or_odd))

div2_even = ForAll([x], even(x) == (x == 2 * (x/2)))
prove(Implies(even_def, div2_even))

div2_odd = ForAll([x], odd(x) == (x == 2 * (x/2) + 1))
prove(Implies(odd_def, div2_odd))

So instead I took a look at the Otter proof in the above paper. Otter is an automated theorem and the predecessor of Prover9. It offers some features with more operational flavor that the modern contingent of solvers (vampire and eprover) don’t.

This encoding of the problem seems pretty clever. d(x,y) is the y is divisible by x. m is multiplication. I think implicitly it is working over positive integers to avoid != 0 constraints.

Note that in CNF, ~ foo | bar should often be read as foo => bar which is classically equivalent. In that case, this reads as some equations and horn clauses / conditional equations.

%%file /tmp/sqrt2.in
% From the Wiedijk review. Wos, Beeson, McCune
set(auto).
%set(ur_res).
%assign(max_distinct_vars, 1).
formulas(sos).
    x = x.  %? Why did they write this?
    m(1,x) = x. % identity
    m(x,1) = x.
    m(x,m(y,z)) = m(m(x,y),z). % associativity
    m(x,y) = m(y,x). % commutativity
    m(x,y) != m(x,z) | y = z. % cancellation... ??? What about x = 0? Implicit positive integers only?
    -d(x,y) | m(x,f(x,y)) = y. % this and next line define divides
    m(x,z) != y | d(x,y).
    -d(2,m(x,y)) | d(2,x) | d(2,y). % 2 is prime
    m(a,a) = m(2,m(b,b)). % a/b = 2
    -d(x,a) | -d(x,b) | x = 1. % a/b is in lowest terms
    2 != 1. % I almost forgot this!
end_of_list.
Overwriting /tmp/sqrt2.in
! prover9 -f /tmp/sqrt2.in
============================== Prover9 ===============================
Prover9 (64) version 2017-11A (CIIRC), November 2017.
Process 370431 was started by philip on philip-ThinkPad-Z13-Gen-2,
Mon Apr 29 20:47:34 2024
The command was "prover9 -f /tmp/sqrt2.in".
============================== end of head ===========================

============================== INPUT =================================

% Reading from file /tmp/sqrt2.in

set(auto).
    % set(auto) -> set(auto_inference).
    % set(auto) -> set(auto_setup).
    % set(auto_setup) -> set(predicate_elim).
    % set(auto_setup) -> assign(eq_defs, unfold).
    % set(auto) -> set(auto_limits).
    % set(auto_limits) -> assign(max_weight, "100.000").
    % set(auto_limits) -> assign(sos_limit, 20000).
    % set(auto) -> set(auto_denials).
    % set(auto) -> set(auto_process).

formulas(sos).
x = x.
m(1,x) = x.
m(x,1) = x.
m(x,m(y,z)) = m(m(x,y),z).
m(x,y) = m(y,x).
m(x,y) != m(x,z) | y = z.
-d(x,y) | m(x,f(x,y)) = y.
m(x,z) != y | d(x,y).
-d(2,m(x,y)) | d(2,x) | d(2,y).
m(a,a) = m(2,m(b,b)).
-d(x,a) | -d(x,b) | x = 1.
2 != 1.
end_of_list.

============================== end of input ==========================

============================== PROCESS NON-CLAUSAL FORMULAS ==========

% Formulas that are not ordinary clauses:

============================== end of process non-clausal formulas ===

============================== PROCESS INITIAL CLAUSES ===============

% Clauses before input processing:

formulas(usable).
end_of_list.

formulas(sos).
x = x.  [assumption].
m(1,x) = x.  [assumption].
m(x,1) = x.  [assumption].
m(x,m(y,z)) = m(m(x,y),z).  [assumption].
m(x,y) = m(y,x).  [assumption].
m(x,y) != m(x,z) | y = z.  [assumption].
-d(x,y) | m(x,f(x,y)) = y.  [assumption].
m(x,y) != z | d(x,z).  [assumption].
-d(2,m(x,y)) | d(2,x) | d(2,y).  [assumption].
m(a,a) = m(2,m(b,b)).  [assumption].
-d(x,a) | -d(x,b) | x = 1.  [assumption].
2 != 1.  [assumption].
end_of_list.

formulas(demodulators).
end_of_list.

============================== PREDICATE ELIMINATION =================

No predicates eliminated.

============================== end predicate elimination =============

Auto_denials:  (non-Horn, no changes).

Term ordering decisions:
Predicate symbol precedence:  predicate_order([ =, d ]).
Function symbol precedence:  function_order([ 2, 1, a, b, m, f ]).
After inverse_order:  (no changes).
Unfolding symbols: (none).

Auto_inference settings:
  % set(paramodulation).  % (positive equality literals)
  % set(binary_resolution).  % (non-Horn)
  % set(neg_ur_resolution).  % (non-Horn, less than 100 clauses)

Auto_process settings:
  % set(factor).  % (non-Horn)
  % set(unit_deletion).  % (non-Horn)

kept:      1 m(1,x) = x.  [assumption].
kept:      2 m(x,1) = x.  [assumption].
           3 m(x,m(y,z)) = m(m(x,y),z).  [assumption].
kept:      4 m(m(x,y),z) = m(x,m(y,z)).  [copy(3),flip(a)].

% Operation m is commutative; C redundancy checks enabled.
kept:      5 m(x,y) = m(y,x).  [assumption].
kept:      6 m(x,y) != m(x,z) | y = z.  [assumption].
kept:      7 -d(x,y) | m(x,f(x,y)) = y.  [assumption].
kept:      8 m(x,y) != z | d(x,z).  [assumption].
kept:      9 -d(2,m(x,y)) | d(2,x) | d(2,y).  [assumption].
           10 m(a,a) = m(2,m(b,b)).  [assumption].
kept:      11 m(2,m(b,b)) = m(a,a).  [copy(10),flip(a)].
           12 -d(x,a) | -d(x,b) | x = 1.  [assumption].
kept:      13 -d(x,a) | -d(x,b) | 1 = x.  [copy(12),flip(c)].
           14 2 != 1.  [assumption].
kept:      15 1 != 2.  [copy(14),flip(a)].
kept:      16 -d(2,m(x,x)) | d(2,x).  [factor(9,b,c)].

============================== end of process initial clauses ========

============================== CLAUSES FOR SEARCH ====================

% Clauses after input processing:

formulas(usable).
end_of_list.

formulas(sos).
1 m(1,x) = x.  [assumption].
2 m(x,1) = x.  [assumption].
4 m(m(x,y),z) = m(x,m(y,z)).  [copy(3),flip(a)].
5 m(x,y) = m(y,x).  [assumption].
6 m(x,y) != m(x,z) | y = z.  [assumption].
7 -d(x,y) | m(x,f(x,y)) = y.  [assumption].
8 m(x,y) != z | d(x,z).  [assumption].
9 -d(2,m(x,y)) | d(2,x) | d(2,y).  [assumption].
11 m(2,m(b,b)) = m(a,a).  [copy(10),flip(a)].
13 -d(x,a) | -d(x,b) | 1 = x.  [copy(12),flip(c)].
15 1 != 2.  [copy(14),flip(a)].
16 -d(2,m(x,x)) | d(2,x).  [factor(9,b,c)].
end_of_list.

formulas(demodulators).
1 m(1,x) = x.  [assumption].
2 m(x,1) = x.  [assumption].
4 m(m(x,y),z) = m(x,m(y,z)).  [copy(3),flip(a)].
5 m(x,y) = m(y,x).  [assumption].
        % (lex-dep)
11 m(2,m(b,b)) = m(a,a).  [copy(10),flip(a)].
end_of_list.

============================== end of clauses for search =============

============================== SEARCH ================================

% Starting search at 0.00 seconds.

given #1 (I,wt=5): 1 m(1,x) = x.  [assumption].

given #2 (I,wt=5): 2 m(x,1) = x.  [assumption].

BLAH BLAH BLAH - PHIL

given #130 (A,wt=14): 40 m(x,m(y,z)) != m(y,u) | m(x,z) = u.  [para(17(a,1),6(a,1))].

given #131 (F,wt=9): 173 m(b,b) != m(x,m(2,y)).  [para(17(a,1),108(a,2))].

given #132 (F,wt=7): 639 m(b,b) != m(x,a).  [para(81(a,1),173(a,2,2))].
-------- Proof 1 -------- 

============================== PROOF =================================

% Proof 1 at 0.02 (+ 0.01) seconds.
% Length of proof is 27.
% Level of proof is 10.
% Maximum clause weight is 12.000.
% Given clauses 132 (13.200 givens/level).

3 m(x,m(y,z)) = m(m(x,y),z).  [assumption].
4 m(m(x,y),z) = m(x,m(y,z)).  [copy(3),flip(a)].
5 m(x,y) = m(y,x).  [assumption].
6 m(x,y) != m(x,z) | y = z.  [assumption].
7 -d(x,y) | m(x,f(x,y)) = y.  [assumption].
8 m(x,y) != z | d(x,z).  [assumption].
9 -d(2,m(x,y)) | d(2,x) | d(2,y).  [assumption].
10 m(a,a) = m(2,m(b,b)).  [assumption].
11 m(2,m(b,b)) = m(a,a).  [copy(10),flip(a)].
12 -d(x,a) | -d(x,b) | x = 1.  [assumption].
13 -d(x,a) | -d(x,b) | 1 = x.  [copy(12),flip(c)].
14 2 != 1.  [assumption].
15 1 != 2.  [copy(14),flip(a)].
16 -d(2,m(x,x)) | d(2,x).  [factor(9,b,c)].
17 m(x,m(y,z)) = m(y,m(x,z)).  [para(5(a,1),4(a,1,1)),rewrite([4(2)])].
33 d(2,m(a,a)).  [resolve(11,a,8,a)].
36 m(a,a) != m(2,x) | m(b,b) = x.  [para(11(a,1),6(a,1))].
79 d(2,a).  [resolve(33,a,16,a)].
81 m(2,f(2,a)) = a.  [resolve(79,a,7,a)].
82 -d(2,b).  [ur(13,a,79,a,c,15,a)].
101 -d(2,m(b,b)).  [ur(16,b,82,a)].
108 m(b,b) != m(2,x).  [ur(8,b,101,a),flip(a)].
173 m(b,b) != m(x,m(2,y)).  [para(17(a,1),108(a,2))].
186 m(2,m(f(2,a),x)) = m(a,x).  [para(81(a,1),4(a,1,1)),flip(a)].
639 m(b,b) != m(x,a).  [para(81(a,1),173(a,2,2))].
642 m(a,a) != m(2,m(x,a)).  [ur(36,b,639,a)].
643 $F.  [resolve(642,a,186,a(flip))].

============================== end of proof ==========================

============================== STATISTICS ============================

Given=132. Generated=2152. Kept=638. proofs=1.
Usable=127. Sos=485. Demods=56. Limbo=1, Disabled=36. Hints=0.
Kept_by_rule=0, Deleted_by_rule=0.
Forward_subsumed=1514. Back_subsumed=15.
Sos_limit_deleted=0. Sos_displaced=0. Sos_removed=0.
New_demodulators=58 (2 lex), Back_demodulated=9. Back_unit_deleted=0.
Demod_attempts=18123. Demod_rewrites=1480.
Res_instance_prunes=0. Para_instance_prunes=0. Basic_paramod_prunes=0.
Nonunit_fsub_feature_tests=1506. Nonunit_bsub_feature_tests=528.
Megabytes=0.81.
User_CPU=0.02, System_CPU=0.01, Wall_clock=0.

============================== end of statistics =====================

============================== end of search =========================

THEOREM PROVED

THEOREM PROVED

Exiting with 1 proof.

------ process 370431 exit (max_proofs) ------

Process 370431 exit (max_proofs) Mon Apr 29 20:47:34 2024

This problem is solved instantly. I thought the same problem given to vampire or eprover takes a while (10s for vampire, a couple minutes for eprover) but it turns out running them without their command line arguments significantly slows them down. I think this has tripped me up before. These seems like very bad defaults. I used the included ladr_to_tptp tool to extract the following tptp problem.

%%file /tmp/sqrt.p

cnf(sos,axiom,m(one,A) = A).
cnf(sos,axiom,m(A,one) = A).
cnf(sos,axiom,m(A,m(B,C)) = m(m(A,B),C)).
cnf(sos,axiom,m(A,B) = m(B,A)).
cnf(sos,axiom,m(A,B) != m(A,C) | B = C).
cnf(sos,axiom,~ d(A,B) | m(A,f(A,B)) = B).
cnf(sos,axiom, m(A,B) != C | d(A,C)).
cnf(sos,axiom,~ d(two,m(A,B)) | d(two,A) | d(two,B)).
cnf(sos,axiom,m(a,a) = m(two,m(b,b))).
cnf(sos,axiom,~ d(A,a) | ~ d(A,b) | A = one).
cnf(sos,axiom,~ two = one).

Both of these succeed basically immediately, but the output is quite long.

! vampire --mode casc /tmp/sqrt.p
% Running in auto input_syntax mode. Trying TPTP
% lrs+10_11_cond=on:drc=off:flr=on:fsr=off:gsp=on:gs=on:gsem=off:lma=on:msp=off:nm=4:nwc=1.5:nicw=on:sas=z3:sims=off:sp=scramble:stl=188_1169 on sqrt for (600ds/0Mi)
perf_event_open failed (instruction limiting will be disabled): Permission denied
(If you are seeing 'Permission denied' ask your admin to run 'sudo sysctl -w kernel.perf_event_paranoid=-1' for you.)
% Refutation found. Thanks to Tanya!
% SZS status Unsatisfiable for sqrt
% SZS output start Proof for sqrt
fof(f39584,plain,(
  $false),
  inference(avatar_smt_refutation,[],[f17,f22,f49,f84,f118,f123,f132,f137,f143,f186,f192,f212,f262,f268,f318,f322,f326,f332,f353,f363,f384,f393,f398,f400,f411,f419,f461,f704,f761,f766,f772,f981,f1016,f1022,f1284,f1409,f1414,f1416,f1442,f1447,f1452,f1516,f1569,f1807,f2104,f3775,f3966,f3971,f3977,f4241,f4400,f4406,f4415,f4437,f4442,f4666,f4671,f4677,f4683,f4950,f4963,f5012,f5016,f5020,f5062,f5069,f5217,f5222,f5227,f5242,f5247,f5254,f5259,f5320,f5539,f5544,f5549,f5554,f5559,f5564,f5569,f5574,f5579,f5586,f5591,f5596,f5689,f5695,f5960,f5967,f5976,f5983,f6011,f6026,f6035,f6042,f6048,f6054,f6060,f6066,f6072,f6078,f6084,f6090,f6096,f6102,f6108,f6206,f6211,f6216,f6222,f6229,f6234,f6235,f6245,f7019,f7058,f7063,f7068,f7165,f7166,f7171,f7172,f7177,f7182,f7187,f7194,f7199,f7200,f7208,f7660,f8108,f8114,f8120,f8124,f8130,f8135,f8141,f8146,f8165,f8170,f8175,f8177,f8191,f8196,f8219,f8222,f8234,f9517,f10565,f10647,f10753,f10759,f10819,f10826,f11051,f12292,f12299,f12305,f12428,f12436,f12560,f12750,f14032,f14316,f15133,f15139,f15145,f15151,f15157,f15163,f15164,f15169,f15174,f15183,f15190,f15205,f15210,f15216,f15227,f15233,f15242,f15291,f15297,f19429,f19489,f19498,f19591,f19599,f19668,f22729,f25452,f25458,f25464,f25538,f25718,f26169,f26173,f28151,f28165,f28811,f28822,f29812,f29824,f30299,f30816,f30951,f30975,f31082,f31241,f31255,f31260,f31294,f31482,f32339,f32354,f33397,f33412,f35130,f38938,f39203,f39210,f39215,f39221,f39239,f39245,f39254,f39262,f39265,f39270,f39278,f39290,f39297,f39302,f39310,f39311,f39312,f39313,f39315,f39355,f39365,f39377,f39398,f39410,f39415,f39436,f39449,f39462,f39467,f39472,f39477,f39512,f39525,f39530,f39544,f39550,f39570,f39575,f39578,f39583])).
fof(f39583,plain,(
  spl0_232 | ~spl0_118),
  inference(avatar_split_clause,[],[f39577,f7158,f39580])).
fof(f39580,plain,(
  spl0_232 <=> b = m(two,f(two,b))),
  introduced(avatar_definition,[new_symbols(naming,[spl0_232])])).

BLAH BLAH BLAH - PHIL

  inference(avatar_split_clause,[],[f39,f46,f42])).
fof(f39,plain,(
  ~d(b,a) | one = b),
  inference(resolution,[],[f10,f24])).
fof(f22,plain,(
  spl0_2),
  inference(avatar_split_clause,[],[f9,f19])).
fof(f9,axiom,(
  m(a,a) = m(two,m(b,b))),
  file('/tmp/sqrt.p',unknown)).
fof(f17,plain,(
  ~spl0_1),
  inference(avatar_split_clause,[],[f11,f14])).
fof(f11,axiom,(
  one != two),
  file('/tmp/sqrt.p',unknown)).
% SZS output end Proof for sqrt
% ------------------------------
% Version: Vampire 4.8 (commit 23d7e9c3b on 2024-03-22 08:41:08 +0000)
% Linked with Z3 4.12.3.0 79bbbf76d0c123481c8ca05cd3a98939270074d3 z3-4.8.4-7980-g79bbbf76d
% Termination reason: Refutation

% Memory used [KB]: 6366
% Time elapsed: 0.542 s
% ------------------------------
% ------------------------------
% Success in time 0.568 s
! eprover-ho --auto-schedule --proof-object /tmp/sqrt.p
# Preprocessing class: FSSSSMSSSSSNFFN.
# Scheduled 1 strats onto 1 cores with 300 seconds (300 total)
# Starting G-E--_302_C18_F1_URBAN_RG_S04BN with 300s (1) cores
# G-E--_302_C18_F1_URBAN_RG_S04BN with pid 415821 completed with status 0
# Result found by G-E--_302_C18_F1_URBAN_RG_S04BN
# No SInE strategy applied
# Search class: FGUSM-FFSF22-SFFFFFNN
# Scheduled 6 strats onto 1 cores with 300 seconds (300 total)
# Starting SAT001_MinMin_p005000_rr_RG with 109s (1) cores
# SAT001_MinMin_p005000_rr_RG with pid 415822 completed with status 0
# Result found by SAT001_MinMin_p005000_rr_RG
# Initializing proof state
# Scanning for AC axioms
# m is AC
# AC handling enabled
#
#cnf(i_0_22, plain, (one!=two)).
#
#cnf(i_0_13, plain, (m(X1,one)=X1)).
#
#cnf(i_0_12, plain, (m(one,X1)=X1)).
#
#cnf(i_0_16, plain, (X2=X3|m(X1,X2)!=m(X1,X3))).
#
#cnf(i_0_21, plain, (X1=one|~d(X1,a)|~d(X1,b))).
#
#cnf(i_0_15, plain, (m(X1,X2)=m(X2,X1))).
#
#cnf(i_0_20, plain, (m(two,m(b,b))=m(a,a))).
#
#cnf(i_0_18, plain, (d(X1,X3)|m(X1,X2)!=X3)).
#
#cnf(i_0_18, plain, (d(X1,m(X1,X2)))).

BLAH BLAH BLAH - PHIL

###########
#cnf(i_0_24911, plain, (m(a,m(b,b))!=one|m(a,m(b,b))!=two)).
######################
#cnf(i_0_27279, plain, (m(X2,m(a,X1))!=m(X1,X2)|m(b,b)!=a)).
#####
#cnf(i_0_695, plain, (d(two,m(X1,X2))|d(two,X3)|~d(two,m(X1,m(X3,X2))))).
###
#cnf(i_0_35819, plain, (a!=one|a!=two|~d(m(a,X2),X1)|~d(X1,X2))).
#########
#cnf(i_0_24720, plain, (d(X1,m(b,b))|~d(X1,a))).
######
#cnf(i_0_38677, plain, (d(two,b))).

# Proof found!
# SZS status Unsatisfiable
# SZS output start CNFRefutation
cnf(sos, axiom, (d(X1,X3)|m(X1,X2)!=X3), file('/tmp/sqrt.p', sos)).
cnf(sos, axiom, (m(X1,f(X1,X2))=X2|~d(X1,X2)), file('/tmp/sqrt.p', sos)).
cnf(sos, axiom, (m(X1,m(X2,X3))=m(m(X1,X2),X3)), file('/tmp/sqrt.p', sos)).
cnf(sos, axiom, (X2=X3|m(X1,X2)!=m(X1,X3)), file('/tmp/sqrt.p', sos)).
cnf(sos, axiom, (d(two,X1)|d(two,X2)|~d(two,m(X1,X2))), file('/tmp/sqrt.p', sos)).
cnf(sos, axiom, (m(a,a)=m(two,m(b,b))), file('/tmp/sqrt.p', sos)).
cnf(sos, axiom, (m(X1,X2)=m(X2,X1)), file('/tmp/sqrt.p', sos)).
cnf(sos, axiom, (X1=one|~d(X1,a)|~d(X1,b)), file('/tmp/sqrt.p', sos)).
cnf(sos, axiom, (two!=one), file('/tmp/sqrt.p', sos)).
cnf(c_0_9, plain, (d(X1,X3)|m(X1,X2)!=X3), inference(fof_simplification,[status(thm)],[sos])).
cnf(c_0_10, plain, (m(X1,f(X1,X2))=X2|~d(X1,X2)), inference(fof_simplification,[status(thm)],[sos])).
cnf(c_0_11, plain, (d(X1,X3)|m(X1,X2)!=X3), c_0_9).
cnf(c_0_12, plain, (m(X1,f(X1,X2))=X2|~d(X1,X2)), c_0_10).
cnf(c_0_13, axiom, (m(X1,m(X2,X3))=m(m(X1,X2),X3)), sos).
cnf(c_0_14, plain, (X2=X3|m(X1,X2)!=m(X1,X3)), inference(fof_simplification,[status(thm)],[sos])).
cnf(c_0_15, plain, (d(X1,m(X1,X2))), inference(er,[status(thm)],[c_0_11])).
cnf(c_0_16, plain, (m(X1,m(X2,f(m(X1,X2),X3)))=X3|~d(m(X1,X2),X3)), inference(spm,[status(thm)],[c_0_12, c_0_13])).
cnf(c_0_17, plain, (X2=X3|m(X1,X2)!=m(X1,X3)), c_0_14).
cnf(c_0_18, plain, (d(two,X1)|d(two,X2)|~d(two,m(X1,X2))), inference(fof_simplification,[status(thm)],[sos])).
cnf(c_0_19, axiom, (m(a,a)=m(two,m(b,b))), sos).
cnf(c_0_20, plain, (d(X1,X2)|~d(m(X1,X3),X2)), inference(spm,[status(thm)],[c_0_15, c_0_16])).
cnf(c_0_21, plain, (f(X1,m(X1,X2))=X2), inference(cn,[status(thm)],[inference(rw,[status(thm)],[inference(er,[status(thm)],[inference(spm,[status(thm)],[c_0_17, c_0_12])]), c_0_15])])).
cnf(c_0_22, plain, (m(X1,m(f(X1,X2),X3))=m(X2,X3)|~d(X1,X2)), inference(spm,[status(thm)],[c_0_13, c_0_12])).
cnf(c_0_23, plain, (d(two,X1)|d(two,X2)|~d(two,m(X1,X2))), c_0_18).
cnf(c_0_24, plain, (d(two,m(a,a))), inference(spm,[status(thm)],[c_0_15, c_0_19])).
cnf(c_0_25, plain, (d(X1,X2)|~d(X3,X2)|~d(X1,X3)), inference(spm,[status(thm)],[c_0_20, c_0_12])).
cnf(c_0_26, plain, (f(two,m(a,a))=m(b,b)), inference(spm,[status(thm)],[c_0_21, c_0_19])).
cnf(c_0_27, plain, (f(X1,m(X2,X3))=m(f(X1,X2),X3)|~d(X1,X2)), inference(spm,[status(thm)],[c_0_21, c_0_22])).
cnf(c_0_28, plain, (d(two,a)), inference(spm,[status(thm)],[c_0_23, c_0_24])).
cnf(c_0_29, axiom, (m(X1,X2)=m(X2,X1)), sos).
cnf(c_0_30, plain, (d(X1,m(X2,X3))|~d(X1,X2)), inference(spm,[status(thm)],[c_0_25, c_0_15])).
cnf(c_0_31, plain, (m(a,f(two,a))=m(b,b)), inference(rw,[status(thm)],[inference(cn,[status(thm)],[inference(rw,[status(thm)],[inference(spm,[status(thm)],[c_0_26, c_0_27]), c_0_28])]), c_0_29])).
cnf(c_0_32, plain, (X1=one|~d(X1,a)|~d(X1,b)), inference(fof_simplification,[status(thm)],[sos])).
cnf(c_0_33, plain, (d(X1,m(b,b))|~d(X1,a)), inference(spm,[status(thm)],[c_0_30, c_0_31])).
cnf(c_0_34, plain, (two!=one), inference(fof_simplification,[status(thm)],[sos])).
cnf(c_0_35, plain, (X1=one|~d(X1,a)|~d(X1,b)), c_0_32).
cnf(c_0_36, plain, (d(two,b)), inference(cn,[status(thm)],[inference(rw,[status(thm)],[inference(spm,[status(thm)],[c_0_23, c_0_33]), c_0_28])])).
cnf(c_0_37, plain, (two!=one), c_0_34).
cnf(c_0_38, plain, ($false), inference(sr,[status(thm)],[inference(cn,[status(thm)],[inference(rw,[status(thm)],[inference(spm,[status(thm)],[c_0_35, c_0_36]), c_0_28])]), c_0_37]), ['proof']).
# SZS output end CNFRefutation

Anyhow, with that guidance, we can attempt to write a proof like thing in python.

I’ve been tinkering with trying to enforce a proof discipline / kernel in python, but I’ve been leaning towards reducing it all to idioms until I’m more sure what I want.

The kernel is only enforcing rules that I think make sense anyway.

A small wrapper enables us the make stepwise proofs. We only give python names to things that we want to accept as axioms or things that come out of lemma. The justifications that go into by should only be previously named theorems.

This avoids all the rigamorole of keeping a separate Theorem datatype or theorem database, but is more susceptible to accidental error than the styles I’ve used in those other posts.

def lemma(thm, by=[]):
    prove(Implies(And(by), thm))
    return thm

The following let’s me see last assignment, prewtty printing the statement of the lemma in a nice form.

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "last_expr_or_assign"

The strategy is to make opaque functions that we can hide the definitions that may be confusing the solver. The approach semi fails.

Z = IntSort()
B = BoolSort()
x,y,z = Ints("x y z")
mul = Function("mul", Z, Z, Z)
mul_def = ForAll([x,y], mul(x, y) == x*y)
proved

∀x : mul(x, 1) = x

Z3 easily proves the basic algebraic properties of multiplication

mul_x_1 = lemma(ForAll([x], mul(x, 1) == x), by=[mul_def])
proved

∀x : mul(x, 1) = x

mul_1_x = lemma(ForAll([x], mul(1, x) == x), by=[mul_def])
proved

∀x : mul(1, x) = x

mul_assoc = lemma(ForAll([x,y,z], mul(x, mul(y,z)) == mul(mul(x,y), z)), by=[mul_def])
proved

∀x, y, z : mul(x, mul(y, z)) = mul(mul(x, y), z)

mul_comm = lemma(ForAll([x,y], mul(x,y) == mul(y,x)), by=[mul_def])
proved

∀x, y : mul(x, y) = mul(y, x)

mul_cancel = lemma(ForAll([x,y,z], Implies(And(x != 0, mul(x,y) == mul(x,z)), y == z)), by=[mul_def])
proved

∀x, y, z : x ≠ 0 ∧ mul(x, y) = mul(x, z) ⇒ y = z

We can define a divides relation. I define it in terms of x/y which is flooring division. Is there a built in divides relation in SMT?

divides = Function("divides", Z, Z, B)
divides_def = ForAll([x,y], divides(x,y) == (x == y*(x/y)))

∀x, y : divides(x, y) = (x = y·(x/y))

div_mul_cancel = lemma(ForAll([x,y], Implies(divides(x,y), mul(x/y, y) == x)), by=[divides_def, mul_def])
proved

∀x, y : divides(x, y) ⇒ mul(x/y, y) = x

Here we have two properties that I can’t get to pass. They are basic facts. We don’t get a counterexample, so perhaps that lends a mild amount of credence.

#mul_divides = lemma(ForAll([x,y,z], Implies(mul(x,y) == z, divides(z,x))), by=[mul_def, divides_def])
# can't get this one to prove
mul_divides = ForAll([x,y,z], Implies(mul(x,y) == z, divides(z,x)))

∀x, y, z : mul(x, y) = z ⇒ divides(z, x)

#two_prime = lemma(ForAll([x,y], Implies(divides(mul(x,y),2), Or(divides(x,2), divides(y,2)))), by=[divides_def, mul_def])
# can't get this one to pass
two_prime = ForAll([x,y], Implies(divides(mul(x,y),2), Or(divides(x,2), divides(y,2))))

∀x, y : divides(mul(x, y), 2) ⇒ divides(x, 2) ∨ divides(y, 2)

Part of the cleverness of the Otter proof is abusing this notion that every fraction can be expressed in lowest common terms. This is a nontrivial seeming fact.

Here I use an expanded form of solve because I was fiddling with the smt formula.

no_common_factor = ForAll([z], Implies(And(divides(x,z), divides(y,z)), z == 1))
prime_thm = Not(Exists([x,y], And(no_common_factor, x != 0, mul(x,x) == mul(2,mul(y,y)))))
lemmas = [mul_1_x, mul_x_1, mul_comm, mul_assoc, mul_divides, mul_cancel, div_mul_cancel, two_prime]
s = Solver()
s.add(lemmas)
s.add(Not(prime_thm))
with open("/tmp/prime.smt2", "w") as f:
    f.write(s.sexpr()) 
#lemma(Not(Exists([x,y], And(no_factor, mul(x,x) == mul(2,mul(y,y))))), by=[mul_1_x, mul_x_1, mul_assoc, two_prime, div_mul_cancel])
s.check()

unsat

Bits and Bobbles

I am not satisfied. There are big holes and there was some nontrivial workarounds. I’m not particularly sure my thrashing even helped that much.

Recent prover9 paper https://formalweb3.uibk.ac.at/docs/24/kajjck-lpar24.pdf

CVC5 debugging interfaces https://cvc5.github.io/blog/2024/04/15/interfaces-for-understanding-cvc5.html timeout cores

!vampire --mode tclausify /tmp/prime.smt2  # vampire is useful for converting to clauses or smt to tptp
% Running in auto input_syntax mode. Trying SMTLIB2
tff(func_def_0, type, mul: ($int * $int) > $int).
tff(func_def_12, type, sK3: $int).
tff(func_def_13, type, sK4: $int).
tff(pred_def_1, type, divides: ($int * $int) > $o).
tff(pred_def_5, type, lG2: ($int * $int) > $o).
tff(u63,axiom,
    (lG2(sK4,sK3) | ~sLP1)).

tff(u62,axiom,
    ((0 != sK3) | ~sLP1)).

tff(u61,axiom,
    ((mul(2,mul(sK4,sK4)) = mul(sK3,sK3)) | ~sLP1)).

tff(u64,axiom,
    (![X2 : $int, X0 : $int, X1 : $int] : (((1 = X2) | ~divides(X1,X2) | ~divides(X0,X2) | ~lG2(X0,X1))))).

tff(u35,axiom,
    (![X0 : $o] : (((($true = X0)) | (($false = X0)))))).

tff(u34,axiom,
    (($true != $false))).

tff(u33,axiom,
    (![X0 : $int, X1 : $int] : ((~$less(X0,X1) | ~$less(X1,$sum(X0,1)))))).

tff(u32,axiom,
    (![X2 : $int, X1 : $int] : (((0 = X2) | ~$less($sum($abs(X2),$uminus(1)),$remainder_e(X1,X2)))))).

tff(u31,axiom,
    (![X2 : $int, X1 : $int] : (((0 = X2) | ~$less($remainder_e(X1,X2),0))))).

tff(u30,axiom,
    (![X2 : $int, X1 : $int] : (((0 = X2) | ($sum($remainder_e(X1,X2),$product(X2,$quotient_e(X1,X2))) = X1))))).

tff(u29,axiom,
    (![X1 : $int] : ((~$less(X1,0) | ($uminus(X1) = $abs(X1)))))).

tff(u28,axiom,
    (![X1 : $int] : ((~$less(0,X1) | ($abs(X1) = X1))))).

tff(u74,axiom,
    (![X2 : $int, X0 : $int, X3 : $int] : (((0 = X0) | ($product(X0,X2) != $product(X0,X3)) | (X2 = X3))))).

tff(u26,axiom,
    (![X2 : $int, X0 : $int, X1 : $int] : (($product(X0,$sum(X1,X2)) = $sum($product(X0,X1),$product(X0,X2)))))).

tff(u25,axiom,
    (![X0 : $int] : ((0 = $product(X0,0))))).

tff(u24,axiom,
    (![X0 : $int] : (($product(X0,1) = X0)))).

tff(u23,axiom,
    (![X2 : $int, X0 : $int, X1 : $int] : (($product(X0,$product(X1,X2)) = $product($product(X0,X1),X2))))).

tff(u22,axiom,
    (![X0 : $int, X1 : $int] : (($product(X0,X1) = $product(X1,X0))))).

tff(u21,axiom,
    (![X0 : $int] : (($uminus($uminus(X0)) = X0)))).

tff(u20,axiom,
    (![X0 : $int, X1 : $int] : (($less(X0,X1) | $less(X1,$sum(X0,1)))))).

tff(u19,axiom,
    (![X2 : $int, X0 : $int, X1 : $int] : ((~$less(X0,X1) | $less($sum(X0,X2),$sum(X1,X2)))))).

tff(u18,axiom,
    (![X0 : $int, X1 : $int] : (($less(X0,X1) | $less(X1,X0) | (X0 = X1))))).

tff(u17,axiom,
    (![X2 : $int, X0 : $int, X1 : $int] : ((~$less(X0,X1) | ~$less(X1,X2) | $less(X0,X2))))).

tff(u16,axiom,
    (![X0 : $int] : (~$less(X0,X0)))).

tff(u15,axiom,
    (![X0 : $int] : ((0 = $sum(X0,$uminus(X0)))))).

tff(u14,axiom,
    (![X0 : $int, X1 : $int] : (($uminus($sum(X0,X1)) = $sum($uminus(X1),$uminus(X0)))))).

tff(u13,axiom,
    (![X0 : $int] : (($sum(X0,0) = X0)))).

tff(u12,axiom,
    (![X2 : $int, X0 : $int, X1 : $int] : (($sum(X0,$sum(X1,X2)) = $sum($sum(X0,X1),X2))))).

tff(u11,axiom,
    (![X0 : $int, X1 : $int] : (($sum(X0,X1) = $sum(X1,X0))))).

tff(u65,hypothesis,
    (![X0 : $int] : ((mul(1,X0) = X0)))).

tff(u66,hypothesis,
    (![X0 : $int] : ((mul(X0,1) = X0)))).

tff(u67,hypothesis,
    (![X0 : $int, X1 : $int] : ((mul(X1,X0) = mul(X0,X1))))).

tff(u68,hypothesis,
    (![X2 : $int, X0 : $int, X1 : $int] : ((mul(mul(X1,X0),X2) = mul(X1,mul(X0,X2)))))).

tff(u75,hypothesis,
    (![X0 : $int, X1 : $int] : (divides(mul(X1,X0),X1)))).

tff(u70,hypothesis,
    (![X2 : $int, X0 : $int, X1 : $int] : (((X0 = X2) | (0 = X1) | (mul(X1,X0) != mul(X1,X2)))))).

tff(u71,hypothesis,
    (![X0 : $int, X1 : $int] : (((mul($quotient_e(X1,X0),X0) = X1) | ~divides(X1,X0))))).

tff(u72,hypothesis,
    (![X0 : $int, X1 : $int] : ((divides(X1,2) | divides(X0,2) | ~divides(mul(X1,X0),2))))).

tff(u73,hypothesis,
    sLP1).

I think I have less interest in rigor than others who might be interested in such things.

I was once stumped on how to derive the irrationality of the square root of 2 in Z3. Here I’m going to recpliate it. I suspect ATP systems like vampire or eprover will not get stuck

It can be formulated as proving not exists m n, n^2 = 2 m^2 $\neg \exists m n, n^2 = 2 m^2$

Cody also suggested infinitude of primes.

This is all math shit. These are not my theorems. I think maybe they are the wrong ones to attack.

Facts about evens and odds.

#from cvc5.pythonic import *
from z3 import *
x,y,z = Ints("x y z")
p = Real("p")
Z = IntSort()
R = RealSort()
B = BoolSort()

The paper mentions that many systems don’t have the reals. Z3 and CVC5 do. So you can state something closer to what you meant.

In a certain sense, the SMT “reals” are not the reals at all. They can prove some under approximation of reasonable facts about the reals. They are based around the rationals, linear equations, linear inequalities, polynomials and algebraic numbers.

rational = Function("rational", R, B)
rat_def = ForAll([p], rational(p) == Exists([x,y], p == ToReal(x)/ToReal(y)))
rat_def

∀p : (∃x, y : p = ToReal(x)/ToReal(y)) = rational(p)

step1 = Implies(And(rational(p), p*p == 2), Exists([x,y], x*x == 2*y*y))
prove(Implies(rat_def, step1))

#prove(ForAll([p], Implies(rational(p), Not(p*p == 2))))
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

Cell In[13], line 1
----> 1 prove(ForAll([p], Implies(rational(p), Not(p*p == 2))))


NameError: name 'rational' is not defined

Lemmas about even and oddness

I have a suspicion that it is wise to define things in “open” and explicitly skolemized form.

evenp(x,y) is the “evidence” form of evenness, where you give explicitly the number that is half of x. This perhaps gives you the option of helping the solver find the needed existential by just giving it. even(x) = exists p, evenp(x,p)

Explicit skolem function help you refer to something that is guaranteed to exist.

Choice functions would make this easier probably.

evenp = Function("evenp", I, I, B)
evenp_def = ForAll([x,y], evenp(x,y) == (x == 2*y))


xsq_even = Implies(x*x == 2*y*y, evenp(x*x, y*y))
prove(Implies(evenp_def, xsq_even))


ForAll([x,y], Implies( evenp(x,y), even(x)))
#ForAll([x,y], Implies( evenp(x,y), even(y)))






rearrange = Implies(x*x == 2*y*y, y*y == (x*x)/2)
prove(rearrange)
proved
rearrange = Implies(x*x == 2*y*y, y*y == (x*x)/2)
prove(rearrange)
xsq_even = Implies(x*x == 2*y, even(x*x))
prove(Implies(even_def, xsq_even))
xsq_even
#from cvc5.pythonic import *
#from z3 
x,y,z = Ints("x y z")
even = Function("even", IntSort(), BoolSort())
even_def = ForAll(x, even(x) == Exists([y], x == 2*y))
odd = Function("odd", IntSort(), BoolSort())
odd_def = ForAll(x, odd(x) == Exists([y], x == 2*y+1))

even_not_odd = ForAll([x], Implies(even(x), Not(odd(x))))
prove(Implies(And(even_def, odd_def), even_not_odd))

even_or_odd = ForAll(x, even(x) != odd(x))
prove(Implies(And(even_def, odd_def), even_or_odd))


proved
proved
proved

Facts about even odd

even_not_odd = ForAll(x, Implies(even(x), Not(odd(x))))
prove(Implies(And(even_def, odd_def), even_not_odd))

odd_not_even = ForAll(x, Implies(odd(x), Not(even(x))))
prove(Implies(And(even_def, odd_def), odd_not_even))


not_even_odd = ForAll(x, Not(And(even(x),odd(x))))
prove(Implies(And(even_def, odd_def), not_even_odd))




def induct(P):
    return Implies(
            And(P(0), 
                ForAll([x], Implies(P(x),  And(P(x + 1), P(x - 1))))),
            ForAll([x], P(x)))

even_or_odd = ForAll(x, Or(even(x), odd(x)))
#prove(Implies(And(even_def, odd_def, induct(lambda n: Or(even(n), odd(n)))), even_or_odd))

evenodd = lambda n: Or(even(n), odd(n))
prove(Implies(And(even_def, odd_def), Or(even(0), odd(0))))
prove(Implies(And(even_def, odd_def), ForAll([x], Implies(evenodd(x), evenodd(x+1)))))
#prove(Implies(And(even_def, odd_def), ForAll([x], Implies(evenodd(x), evenodd(x-1)))))

prove(Implies(And(even_def, odd_def), ForAll([x], Implies(odd(x), Exists([y], x == 2*y-1)))))

#prove(Implies(And(even_def, odd_def), ForAll([x], odd(x) == Exists([y], x == 2*y-1))))

even_or_odd = ForAll(x, even(x) != odd(x))
# Z3 can't prove this for some reason. I suppose this might be a complex diophantine equation or something.
prove(Implies(And(even_def, odd_def), even_or_odd))


sqrt_thm = ForAll([x,y], Implies(x != 0, 2*x*x != y*y))
#prove(sqrt_thm)

#solve(2*x*x == y*y)

even_sq = ForAll([x,y], Implies(even(x), even(x*x)))
#prove(Implies(even_def, even_sq))



#def divides(x,y):
#    return x == y*(x/y)

divides = Function("divides", I, I, B)
divides_def = ForAll([x,y], divides(x,y) == (x == y*(x/y)))
prove(ForAll([x], divides(x, 0)))


---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

Cell In[3], line 4
      1 #def divides(x,y):
      2 #    return x == y*(x/y)
----> 4 divides = Function("divides", I, I, B)
      5 divides_def = ForAll([x,y], divides(x,y) == (x == y*(x/y)))
      6 prove(ForAll([x], divides(x, 0)))


NameError: name 'Function' is not defined
%%file /tmp/prime2.smt2
(declare-sort I 0)
(declare-fun mul (I I) I)
(declare-fun divsk (I I) I)
(declare-fun divides (I I) Bool)
(declare-const one I)
(declare-const two I)
(declare-const a I)
(declare-const b I)
(assert (distinct one two))
(assert (forall ((x I)) (= (mul one x) x)))
(assert (forall ((x I)) (= (mul x one) x)))
(assert (forall ((x I) (y I) (z I)) (= (mul x (mul y z)) (mul (mul x y) z))))
(assert (forall ((x I) (y I)) (= (mul x y) (mul y x))))
(assert (forall ((x I) (y I) (z I))
  (=> (= (mul x y) (mul x z)) (= y z))))
(assert (forall ((x I) (y I)) (=> (divides x y) (= (mul y (divsk x y)) x))))
(assert (forall ((x I) (y I))
  (=> (divides (mul x y) two) (or (divides x two) (divides y two)))))
(assert (= (mul a a) (mul two (mul b b)))) ; 
(assert (forall ((x I)) (=> (and (divides a x) (divides b x)) (= x one)))) ; lowest common terms
;(assert (let ((a!1 (exists ((x I) (y I))
;             (let ((a!1 (forall ((z I))
;                         (=> (and (divides x z) (divides y z)) (= z one)))))
;               (and a!1 (= (mul x x) (mul two (mul y y))))))))
;  (not (not a!1))))
Writing /tmp/prime2.smt2

Overwriting /tmp/sqrt2.in
! prover9 -f /tmp/sqrt2.in
============================== Prover9 ===============================
Prover9 (64) version 2017-11A (CIIRC), November 2017.
Process 259687 was started by philip on philip-ThinkPad-Z13-Gen-2,
Mon Apr 29 17:27:10 2024
The command was "prover9 -f /tmp/sqrt2.in".
============================== end of head ===========================

============================== INPUT =================================

% Reading from file /tmp/sqrt2.in

set(auto).
    % set(auto) -> set(auto_inference).
    % set(auto) -> set(auto_setup).
    % set(auto_setup) -> set(predicate_elim).
    % set(auto_setup) -> assign(eq_defs, unfold).
    % set(auto) -> set(auto_limits).
    % set(auto_limits) -> assign(max_weight, "100.000").
    % set(auto_limits) -> assign(sos_limit, 20000).
    % set(auto) -> set(auto_denials).
    % set(auto) -> set(auto_process).

formulas(sos).
x = x.
m(1,x) = x.
m(x,1) = x.
m(x,m(y,z)) = m(m(x,y),z).
m(x,y) = m(y,x).
m(x,y) != m(x,z) | y = z.
-d(x,y) | m(x,f(x,y)) = y.
m(x,z) != y | d(x,y).
-d(2,m(x,y)) | d(2,x) | d(2,y).
m(a,a) = m(2,m(b,b)).
-d(x,a) | -d(x,b) | x = 1.
2 != 1.
end_of_list.

============================== end of input ==========================

============================== PROCESS NON-CLAUSAL FORMULAS ==========

% Formulas that are not ordinary clauses:

============================== end of process non-clausal formulas ===

============================== PROCESS INITIAL CLAUSES ===============

% Clauses before input processing:

formulas(usable).
end_of_list.

formulas(sos).
x = x.  [assumption].
m(1,x) = x.  [assumption].
m(x,1) = x.  [assumption].
m(x,m(y,z)) = m(m(x,y),z).  [assumption].
m(x,y) = m(y,x).  [assumption].
m(x,y) != m(x,z) | y = z.  [assumption].
-d(x,y) | m(x,f(x,y)) = y.  [assumption].
m(x,y) != z | d(x,z).  [assumption].
-d(2,m(x,y)) | d(2,x) | d(2,y).  [assumption].
m(a,a) = m(2,m(b,b)).  [assumption].
-d(x,a) | -d(x,b) | x = 1.  [assumption].
2 != 1.  [assumption].
end_of_list.

formulas(demodulators).
end_of_list.

============================== PREDICATE ELIMINATION =================

No predicates eliminated.

============================== end predicate elimination =============

Auto_denials:  (non-Horn, no changes).

Term ordering decisions:
Predicate symbol precedence:  predicate_order([ =, d ]).
Function symbol precedence:  function_order([ 2, 1, a, b, m, f ]).
After inverse_order:  (no changes).
Unfolding symbols: (none).

Auto_inference settings:
  % set(paramodulation).  % (positive equality literals)
  % set(binary_resolution).  % (non-Horn)
  % set(neg_ur_resolution).  % (non-Horn, less than 100 clauses)

Auto_process settings:
  % set(factor).  % (non-Horn)
  % set(unit_deletion).  % (non-Horn)

kept:      1 m(1,x) = x.  [assumption].
kept:      2 m(x,1) = x.  [assumption].
           3 m(x,m(y,z)) = m(m(x,y),z).  [assumption].
kept:      4 m(m(x,y),z) = m(x,m(y,z)).  [copy(3),flip(a)].

% Operation m is commutative; C redundancy checks enabled.
kept:      5 m(x,y) = m(y,x).  [assumption].
kept:      6 m(x,y) != m(x,z) | y = z.  [assumption].
kept:      7 -d(x,y) | m(x,f(x,y)) = y.  [assumption].
kept:      8 m(x,y) != z | d(x,z).  [assumption].
kept:      9 -d(2,m(x,y)) | d(2,x) | d(2,y).  [assumption].
           10 m(a,a) = m(2,m(b,b)).  [assumption].
kept:      11 m(2,m(b,b)) = m(a,a).  [copy(10),flip(a)].
           12 -d(x,a) | -d(x,b) | x = 1.  [assumption].
kept:      13 -d(x,a) | -d(x,b) | 1 = x.  [copy(12),flip(c)].
           14 2 != 1.  [assumption].
kept:      15 1 != 2.  [copy(14),flip(a)].
kept:      16 -d(2,m(x,x)) | d(2,x).  [factor(9,b,c)].

============================== end of process initial clauses ========

============================== CLAUSES FOR SEARCH ====================

% Clauses after input processing:

formulas(usable).
end_of_list.

formulas(sos).
1 m(1,x) = x.  [assumption].
2 m(x,1) = x.  [assumption].
4 m(m(x,y),z) = m(x,m(y,z)).  [copy(3),flip(a)].
5 m(x,y) = m(y,x).  [assumption].
6 m(x,y) != m(x,z) | y = z.  [assumption].
7 -d(x,y) | m(x,f(x,y)) = y.  [assumption].
8 m(x,y) != z | d(x,z).  [assumption].
9 -d(2,m(x,y)) | d(2,x) | d(2,y).  [assumption].
11 m(2,m(b,b)) = m(a,a).  [copy(10),flip(a)].
13 -d(x,a) | -d(x,b) | 1 = x.  [copy(12),flip(c)].
15 1 != 2.  [copy(14),flip(a)].
16 -d(2,m(x,x)) | d(2,x).  [factor(9,b,c)].
end_of_list.

formulas(demodulators).
1 m(1,x) = x.  [assumption].
2 m(x,1) = x.  [assumption].
4 m(m(x,y),z) = m(x,m(y,z)).  [copy(3),flip(a)].
5 m(x,y) = m(y,x).  [assumption].
        % (lex-dep)
11 m(2,m(b,b)) = m(a,a).  [copy(10),flip(a)].
end_of_list.

============================== end of clauses for search =============

============================== SEARCH ================================

% Starting search at 0.00 seconds.

given #1 (I,wt=5): 1 m(1,x) = x.  [assumption].

given #2 (I,wt=5): 2 m(x,1) = x.  [assumption].

given #3 (I,wt=11): 4 m(m(x,y),z) = m(x,m(y,z)).  [copy(3),flip(a)].

given #4 (I,wt=7): 5 m(x,y) = m(y,x).  [assumption].

% Operation m is associative-commutative; CAC redundancy checks enabled.

given #5 (I,wt=10): 6 m(x,y) != m(x,z) | y = z.  [assumption].

given #6 (I,wt=10): 7 -d(x,y) | m(x,f(x,y)) = y.  [assumption].

given #7 (I,wt=8): 8 m(x,y) != z | d(x,z).  [assumption].

given #8 (I,wt=11): 9 -d(2,m(x,y)) | d(2,x) | d(2,y).  [assumption].

given #9 (I,wt=9): 11 m(2,m(b,b)) = m(a,a).  [copy(10),flip(a)].

given #10 (I,wt=9): 13 -d(x,a) | -d(x,b) | 1 = x.  [copy(12),flip(c)].

given #11 (I,wt=3): 15 1 != 2.  [copy(14),flip(a)].

given #12 (I,wt=8): 16 -d(2,m(x,x)) | d(2,x).  [factor(9,b,c)].

given #13 (A,wt=11): 17 m(x,m(y,z)) = m(y,m(x,z)).  [para(5(a,1),4(a,1,1)),rewrite([4(2)])].

given #14 (F,wt=5): 38 m(x,2) != x.  [ur(6,b,15,a),rewrite([2(2)]),flip(a)].

given #15 (F,wt=5): 45 m(2,x) != x.  [para(5(a,1),38(a,1))].

given #16 (F,wt=7): 47 m(b,b) != m(a,a).  [para(11(a,1),45(a,1)),flip(a)].

given #17 (F,wt=9): 44 m(x,m(y,2)) != m(x,y).  [ur(6,b,38,a)].

given #18 (T,wt=3): 24 d(x,x).  [resolve(8,a,2,a)].

given #19 (T,wt=3): 25 d(1,x).  [resolve(8,a,1,a)].

given #20 (T,wt=5): 22 d(x,m(y,x)).  [resolve(8,a,5,a)].

given #21 (T,wt=5): 27 d(x,m(x,y)).  [resolve(8,a,2,a(flip)),rewrite([5(3),1(3)])].

given #22 (A,wt=8): 18 m(x,y) != x | 1 = y.  [para(2(a,1),6(a,1)),flip(a)].

given #23 (F,wt=9): 46 m(x,m(2,y)) != m(x,y).  [ur(6,b,45,a)].

given #24 (F,wt=9): 50 m(x,m(2,y)) != m(y,x).  [para(5(a,1),44(a,1)),rewrite([4(3)])].

given #25 (F,wt=9): 51 m(x,m(y,2)) != m(y,x).  [para(5(a,1),44(a,2))].

given #26 (F,wt=9): 66 m(2,m(x,y)) != m(y,x).  [para(5(a,1),46(a,1)),rewrite([4(3)])].

given #27 (T,wt=5): 33 d(2,m(a,a)).  [resolve(11,a,8,a)].

given #28 (T,wt=3): 79 d(2,a).  [resolve(33,a,16,a)].

given #29 (T,wt=5): 55 f(1,x) = x.  [resolve(25,a,7,a),rewrite([1(4)])].

given #30 (T,wt=6): 28 x != y | d(x,y).  [para(2(a,1),8(a,1))].

given #31 (A,wt=14): 19 m(x,m(y,z)) != m(x,m(y,u)) | z = u.  [para(4(a,1),6(a,1)),rewrite([4(4)])].

given #32 (F,wt=3): 82 -d(2,b).  [ur(13,a,79,a,c,15,a)].

given #33 (F,wt=3): 100 b != 2.  [ur(28,b,82,a),flip(a)].

given #34 (F,wt=5): 101 -d(2,m(b,b)).  [ur(16,b,82,a)].

given #35 (F,wt=5): 102 m(2,x) != b.  [ur(8,b,82,a)].

given #36 (T,wt=6): 53 -d(b,a) | b = 1.  [resolve(24,a,13,b),flip(b)].

given #37 (T,wt=7): 39 d(x,m(y,m(x,z))).  [resolve(17,a,8,a)].

given #38 (T,wt=5): 118 d(b,m(a,a)).  [para(11(a,1),39(a,2))].

given #39 (T,wt=7): 54 m(x,f(x,x)) = x.  [resolve(24,a,7,a)].

given #40 (A,wt=10): 20 m(x,y) != m(z,x) | z = y.  [para(5(a,1),6(a,1)),flip(a)].

given #41 (F,wt=5): 105 m(b,b) != 2.  [ur(28,b,101,a),flip(a)].

given #42 (F,wt=5): 111 m(x,2) != b.  [para(5(a,1),102(a,1))].

given #43 (F,wt=5): 112 m(a,a) != b.  [para(11(a,1),102(a,1))].

given #44 (F,wt=7): 104 m(x,b) != m(x,2).  [ur(6,b,100,a)].

given #45 (T,wt=5): 120 f(x,x) = 1.  [resolve(54,a,18,a),flip(a)].

given #46 (T,wt=6): 131 x != y | y = x.  [para(1(a,1),20(a,1)),rewrite([2(2)])].

given #47 (T,wt=7): 57 d(x,m(y,m(z,x))).  [para(4(a,1),22(a,2))].

given #48 (T,wt=7): 58 d(m(b,b),m(a,a)).  [para(11(a,1),22(a,2))].

given #49 (A,wt=10): 21 m(x,y) != m(z,x) | y = z.  [para(5(a,1),6(a,2))].

given #50 (F,wt=7): 107 -d(2,m(b,m(b,b))).  [ur(9,b,82,a,c,101,a)].

given #51 (F,wt=7): 108 m(b,b) != m(2,x).  [ur(8,b,101,a),flip(a)].

given #52 (F,wt=7): 113 m(x,m(2,y)) != b.  [para(17(a,1),102(a,1))].

given #53 (F,wt=7): 122 m(b,x) != m(x,2).  [ur(20,b,100,a),flip(a)].

given #54 (T,wt=7): 81 m(2,f(2,a)) = a.  [resolve(79,a,7,a)].

given #55 (T,wt=5): 191 d(f(2,a),a).  [para(81(a,1),22(a,2))].

given #56 (T,wt=5): 197 d(2,m(x,a)).  [para(81(a,1),39(a,2,2))].

given #57 (T,wt=5): 206 d(2,m(a,x)).  [para(5(a,1),197(a,2))].

given #58 (A,wt=9): 23 d(m(x,y),m(x,m(y,z))).  [resolve(8,a,4,a)].

given #59 (F,wt=3): 196 b != a.  [para(81(a,1),102(a,1)),flip(a)].

given #60 (F,wt=5): 190 f(2,a) != a.  [para(81(a,1),45(a,1)),flip(a)].

given #61 (F,wt=5): 201 m(b,b) != a.  [para(81(a,1),108(a,2))].

given #62 (F,wt=5): 202 m(x,a) != b.  [para(81(a,1),113(a,1,2))].

given #63 (T,wt=6): 188 a != x | d(2,x).  [para(81(a,1),8(a,1))].

given #64 (T,wt=7): 84 d(m(x,y),m(y,x)).  [resolve(28,a,5,a)].

given #65 (T,wt=7): 200 d(f(2,a),m(x,a)).  [para(81(a,1),57(a,2,2))].

given #66 (T,wt=7): 205 d(2,m(x,m(y,a))).  [para(4(a,1),197(a,2))].

given #67 (A,wt=12): 29 m(x,m(y,z)) != u | d(m(x,y),u).  [para(4(a,1),8(a,1))].

given #68 (F,wt=5): 233 m(a,x) != b.  [para(5(a,1),202(a,1))].

given #69 (F,wt=3): 255 -d(a,b).  [ur(7,b,233,a)].

given #70 (F,wt=7): 128 m(2,x) != m(x,b).  [ur(20,b,100,a(flip)),flip(a)].

given #71 (F,wt=7): 147 m(x,m(y,2)) != b.  [para(4(a,1),111(a,1))].

given #72 (T,wt=7): 208 d(2,m(x,m(a,y))).  [para(17(a,1),206(a,2))].

given #73 (T,wt=7): 214 d(m(2,b),m(a,a)).  [para(11(a,1),23(a,2))].

given #74 (T,wt=7): 217 d(m(x,2),m(x,a)).  [para(81(a,1),23(a,2,2))].

given #75 (T,wt=7): 241 d(f(2,a),m(a,x)).  [para(5(a,1),200(a,2))].

given #76 (A,wt=8): 30 m(x,y) != z | d(y,z).  [para(5(a,1),8(a,1))].

given #77 (F,wt=7): 164 m(b,m(b,b)) != 2.  [ur(28,b,107,a),flip(a)].

given #78 (F,wt=7): 172 m(b,b) != m(x,2).  [para(5(a,1),108(a,2))].

given #79 (F,wt=7): 184 m(b,x) != m(2,x).  [para(5(a,1),122(a,2))].

given #80 (F,wt=7): 218 m(a,x) != m(x,b).  [ur(21,b,196,a),flip(a)].

given #81 (T,wt=7): 275 d(m(2,x),m(x,a)).  [para(5(a,1),217(a,1))].

given #82 (T,wt=7): 276 d(m(x,2),m(a,x)).  [para(5(a,1),217(a,2))].

given #83 (T,wt=7): 304 d(m(2,x),m(a,x)).  [para(5(a,1),275(a,2))].

given #84 (T,wt=8): 37 m(a,a) != x | d(2,x).  [para(11(a,1),8(a,1))].

given #85 (A,wt=15): 31 -d(2,m(x,m(y,z))) | d(2,m(x,y)) | d(2,z).  [para(4(a,1),9(a,2))].

given #86 (F,wt=7): 219 m(b,x) != m(x,a).  [ur(20,b,196,a),flip(a)].

given #87 (F,wt=7): 221 m(x,b) != m(x,a).  [ur(6,b,196,a)].

given #88 (F,wt=7): 232 m(x,m(y,a)) != b.  [para(4(a,1),202(a,1))].

given #89 (F,wt=7): 234 m(b,m(b,b)) != a.  [ur(188,b,107,a),flip(a)].

given #90 (T,wt=8): 62 m(x,y) != y | 1 = x.  [para(5(a,1),18(a,1))].

given #91 (T,wt=8): 192 a != 2 | f(2,a) = 1.  [para(81(a,1),18(a,1)),flip(b)].

given #92 (T,wt=8): 282 a != x | d(f(2,a),x).  [para(81(a,1),30(a,1))].

given #93 (T,wt=9): 59 d(m(x,y),m(x,m(z,y))).  [para(17(a,1),22(a,2))].

given #94 (A,wt=14): 32 -d(2,m(x,m(y,m(x,y)))) | d(2,m(x,y)).  [factor(31,b,c)].

given #95 (F,wt=7): 257 m(x,m(a,y)) != b.  [para(17(a,1),233(a,1))].

given #96 (F,wt=7): 263 m(b,f(2,a)) != a.  [para(81(a,1),128(a,1)),rewrite([5(6)]),flip(a)].

given #97 (F,wt=7): 301 m(b,x) != m(a,x).  [para(5(a,1),218(a,2)),flip(a)].

given #98 (F,wt=9): 106 -d(2,m(b,m(b,m(b,b)))).  [ur(16,b,101,a),rewrite([17(8),5(7)])].

given #99 (T,wt=9): 116 d(x,m(y,m(z,m(x,u)))).  [para(4(a,1),39(a,2))].

given #100 (T,wt=7): 368 d(b,m(x,m(a,a))).  [para(11(a,1),116(a,2,2))].

given #101 (T,wt=7): 371 d(b,m(a,m(a,x))).  [para(5(a,1),368(a,2)),rewrite([4(5)])].

given #102 (T,wt=7): 372 d(b,m(a,m(x,a))).  [para(17(a,1),368(a,2))].

given #103 (A,wt=13): 34 m(2,m(b,m(b,x))) = m(a,m(a,x)).  [para(11(a,1),4(a,1,1)),rewrite([4(4),4(9)]),flip(a)].

given #104 (F,wt=9): 110 m(x,m(2,y)) != m(x,b).  [ur(6,b,102,a)].

given #105 (F,wt=9): 121 m(2,m(x,y)) != m(y,b).  [ur(20,b,102,a),rewrite([4(5)]),flip(a)].

given #106 (F,wt=9): 127 m(b,x) != m(x,m(2,y)).  [ur(20,b,102,a(flip)),flip(a)].

given #107 (F,wt=9): 139 m(b,m(b,x)) != m(x,2).  [ur(20,b,105,a),rewrite([4(6)]),flip(a)].

given #108 (T,wt=9): 158 d(x,m(y,m(z,m(u,x)))).  [para(4(a,1),57(a,2,2))].

given #109 (T,wt=9): 159 d(m(b,b),m(x,m(a,a))).  [para(11(a,1),57(a,2,2))].

given #110 (T,wt=9): 211 d(m(x,y),m(y,m(x,z))).  [para(5(a,1),23(a,1))].

given #111 (T,wt=9): 212 d(m(x,y),m(y,m(z,x))).  [para(5(a,1),23(a,2)),rewrite([4(3)])].

given #112 (A,wt=13): 35 m(b,m(b,m(x,2))) = m(x,m(a,a)).  [para(11(a,1),4(a,2,2)),rewrite([17(6),5(5)])].

given #113 (F,wt=9): 141 m(x,m(b,b)) != m(x,2).  [ur(6,b,105,a)].

given #114 (F,wt=7): 513 m(a,a) != m(2,2).  [para(11(a,1),141(a,1))].

given #115 (F,wt=9): 142 m(2,x) != m(x,m(b,b)).  [ur(20,b,105,a(flip)),flip(a)].

given #116 (F,wt=9): 143 m(x,m(2,y)) != m(y,b).  [ur(20,b,111,a),rewrite([4(5)]),flip(a)].

given #117 (T,wt=9): 213 d(m(x,2),m(x,m(a,a))).  [para(11(a,1),23(a,2,2))].

given #118 (T,wt=9): 240 d(f(2,a),m(x,m(y,a))).  [para(4(a,1),200(a,2))].

given #119 (T,wt=9): 243 d(2,m(x,m(y,m(z,a)))).  [para(4(a,1),205(a,2,2))].

given #120 (T,wt=9): 270 d(2,m(x,m(y,m(a,z)))).  [para(4(a,1),208(a,2))].

given #121 (A,wt=12): 36 m(a,a) != m(2,x) | m(b,b) = x.  [para(11(a,1),6(a,1))].

given #122 (F,wt=7): 548 m(a,a) != m(2,a).  [ur(36,b,201,a)].

given #123 (F,wt=9): 145 m(x,m(y,2)) != m(x,b).  [ur(6,b,111,a)].

given #124 (F,wt=9): 146 m(b,x) != m(x,m(y,2)).  [ur(20,b,111,a(flip)),flip(a)].

given #125 (F,wt=9): 167 m(b,m(b,b)) != m(2,x).  [ur(8,b,107,a),flip(a)].

given #126 (T,wt=9): 278 d(f(2,a),m(x,m(a,y))).  [para(17(a,1),241(a,2))].

given #127 (T,wt=9): 305 d(m(a,a),m(a,m(b,b))).  [para(11(a,1),275(a,1)),rewrite([5(8)])].

given #128 (T,wt=9): 339 d(m(x,y),m(z,m(y,x))).  [para(5(a,1),59(a,2)),rewrite([4(3)])].

given #129 (T,wt=9): 342 d(m(x,f(2,a)),m(x,a)).  [para(81(a,1),59(a,2,2))].

given #130 (A,wt=14): 40 m(x,m(y,z)) != m(y,u) | m(x,z) = u.  [para(17(a,1),6(a,1))].

given #131 (F,wt=9): 173 m(b,b) != m(x,m(2,y)).  [para(17(a,1),108(a,2))].

given #132 (F,wt=7): 639 m(b,b) != m(x,a).  [para(81(a,1),173(a,2,2))].
-------- Proof 1 -------- 

============================== PROOF =================================

% Proof 1 at 0.02 (+ 0.06) seconds.
% Length of proof is 27.
% Level of proof is 10.
% Maximum clause weight is 12.000.
% Given clauses 132 (13.200 givens/level).

3 m(x,m(y,z)) = m(m(x,y),z).  [assumption].
4 m(m(x,y),z) = m(x,m(y,z)).  [copy(3),flip(a)].
5 m(x,y) = m(y,x).  [assumption].
6 m(x,y) != m(x,z) | y = z.  [assumption].
7 -d(x,y) | m(x,f(x,y)) = y.  [assumption].
8 m(x,y) != z | d(x,z).  [assumption].
9 -d(2,m(x,y)) | d(2,x) | d(2,y).  [assumption].
10 m(a,a) = m(2,m(b,b)).  [assumption].
11 m(2,m(b,b)) = m(a,a).  [copy(10),flip(a)].
12 -d(x,a) | -d(x,b) | x = 1.  [assumption].
13 -d(x,a) | -d(x,b) | 1 = x.  [copy(12),flip(c)].
14 2 != 1.  [assumption].
15 1 != 2.  [copy(14),flip(a)].
16 -d(2,m(x,x)) | d(2,x).  [factor(9,b,c)].
17 m(x,m(y,z)) = m(y,m(x,z)).  [para(5(a,1),4(a,1,1)),rewrite([4(2)])].
33 d(2,m(a,a)).  [resolve(11,a,8,a)].
36 m(a,a) != m(2,x) | m(b,b) = x.  [para(11(a,1),6(a,1))].
79 d(2,a).  [resolve(33,a,16,a)].
81 m(2,f(2,a)) = a.  [resolve(79,a,7,a)].
82 -d(2,b).  [ur(13,a,79,a,c,15,a)].
101 -d(2,m(b,b)).  [ur(16,b,82,a)].
108 m(b,b) != m(2,x).  [ur(8,b,101,a),flip(a)].
173 m(b,b) != m(x,m(2,y)).  [para(17(a,1),108(a,2))].
186 m(2,m(f(2,a),x)) = m(a,x).  [para(81(a,1),4(a,1,1)),flip(a)].
639 m(b,b) != m(x,a).  [para(81(a,1),173(a,2,2))].
642 m(a,a) != m(2,m(x,a)).  [ur(36,b,639,a)].
643 $F.  [resolve(642,a,186,a(flip))].

============================== end of proof ==========================

============================== STATISTICS ============================

Given=132. Generated=2152. Kept=638. proofs=1.
Usable=127. Sos=485. Demods=56. Limbo=1, Disabled=36. Hints=0.
Kept_by_rule=0, Deleted_by_rule=0.
Forward_subsumed=1514. Back_subsumed=15.
Sos_limit_deleted=0. Sos_displaced=0. Sos_removed=0.
New_demodulators=58 (2 lex), Back_demodulated=9. Back_unit_deleted=0.
Demod_attempts=18123. Demod_rewrites=1480.
Res_instance_prunes=0. Para_instance_prunes=0. Basic_paramod_prunes=0.
Nonunit_fsub_feature_tests=1506. Nonunit_bsub_feature_tests=528.
Megabytes=0.81.
User_CPU=0.02, System_CPU=0.06, Wall_clock=0.

============================== end of statistics =====================

============================== end of search =========================

THEOREM PROVED

THEOREM PROVED

Exiting with 1 proof.

------ process 259687 exit (max_proofs) ------

Process 259687 exit (max_proofs) Mon Apr 29 17:27:10 2024


Overwriting /tmp/sqrt.p
%%file /tmp/prime3.smt2
(declare-sort I 0)
(declare-fun m (I I) I)
(declare-fun f (I I) I)
(declare-fun d (I I) Bool)
(declare-const one I)
(declare-const two I)
(declare-const a I)
(declare-const b I)

(assert (forall ((x I)) (= (m one x) x)))
(assert (forall ((x I)) (= (m x one) x)))
(assert (forall ((x I) (y I) (z I)) (= (m x (m y z)) (m (m x y) z))))
(assert (forall ((x I) (y I)) (= (m x y) (m y x))))
(assert (forall ((x I) (y I) (z I))
  (=> (= (m x y) (m x z)) (= y z))))
(assert (forall ((x I) (y I)) (=> (d y x) (= (m y (f y x)) x))))
(assert (forall ((x I) (y I))
  (=> (d two (m x y)) (or (d two x) (d two y)))))
(assert (= (m a a) (m two (m b b)))) ; 
(assert (forall ((x I)) (=> (and (d x a) (d x b)) (= x one)))) ; lowest common terms
(assert (distinct one two))
(check-sat)
Overwriting /tmp/prime3.smt2
! time vampire /tmp/sqrt.p
% Running in auto input_syntax mode. Trying TPTP
% Refutation found. Thanks to Tanya!
% SZS status Unsatisfiable for sqrt
% SZS output start Proof for sqrt
3. m(X0,m(X1,X2)) = m(m(X0,X1),X2) [input]
4. m(X0,X1) = m(X1,X0) [input]
5. m(X0,X1) != m(X0,X2) | X1 = X2 [input]
6. ~d(X0,X1) | m(X0,f(X0,X1)) = X1 [input]
7. d(X0,X2) | m(X0,X1) != X2 [input]
8. ~d(two,m(X0,X1)) | d(two,X0) | d(two,X1) [input]
9. m(a,a) = m(two,m(b,b)) [input]
10. ~d(X0,b) | one = X0 | ~d(X0,a) [input]
11. one != two [input]
12. d(X0,m(X0,X1)) [equality resolution 7]
21. d(X0,m(X1,X0)) [superposition 12,4]
25. d(two,m(a,a)) [superposition 12,9]
46. m(a,a) != m(two,X0) | m(b,b) = X0 [superposition 5,9]
53. m(X0,X1) = m(X0,f(X0,m(X0,X1))) [resolution 6,12]
57. m(a,a) = m(two,f(two,m(a,a))) [resolution 6,25]
74. d(X2,m(X0,m(X1,X2))) [superposition 21,3]
88. d(two,a) | d(two,a) [resolution 8,25]
98. d(two,a) [duplicate literal removal 88]
99. a = m(two,f(two,a)) [resolution 98,6]
198. m(X0,X1) != m(X0,X2) | f(X0,m(X0,X1)) = X2 [superposition 5,53]
264. d(X1,m(X2,m(X1,X0))) [superposition 74,4]
486. m(a,X0) = m(two,m(f(two,a),X0)) [superposition 3,99]
684. d(two,m(X0,a)) [superposition 264,99]
1125. m(a,a) != m(a,a) | m(b,b) = f(two,m(a,a)) [superposition 46,57]
1160. m(b,b) = f(two,m(a,a)) [trivial inequality removal 1125]
7642. f(X0,m(X0,X1)) = X1 [equality resolution 198]
58952. m(f(two,a),X0) = f(two,m(a,X0)) [superposition 7642,486]
154801. m(b,b) = m(f(two,a),a) [superposition 58952,1160]
154861. d(two,m(b,b)) [superposition 684,154801]
163619. d(two,b) | d(two,b) [resolution 154861,8]
163621. d(two,b) [duplicate literal removal 163619]
163624. one = two | ~d(two,a) [resolution 163621,10]
163626. ~d(two,a) [subsumption resolution 163624,11]
163627. $false [subsumption resolution 163626,98]
% SZS output end Proof for sqrt
% ------------------------------
% Version: Vampire 4.8 (commit 23d7e9c3b on 2024-03-22 08:41:08 +0000)
% Linked with Z3 4.12.3.0 79bbbf76d0c123481c8ca05cd3a98939270074d3 z3-4.8.4-7980-g79bbbf76d
% Termination reason: Refutation

% Memory used [KB]: 34500
% Time elapsed: 10.284 s
perf_event_open failed (instruction limiting will be disabled): Permission denied
(If you are seeing 'Permission denied' ask your admin to run 'sudo sysctl -w kernel.perf_event_paranoid=-1' for you.)
% ------------------------------
% ------------------------------

real 0m10.291s
user 0m10.242s
sys 0m0.045s
%%file /tmp/sqrtcluase.p

cnf(sos,axiom,
    m(one,X0) = X0).

cnf(sos,axiom,
    m(X0,one) = X0).

cnf(sos,axiom,
    m(m(X1,X0),X2) = m(X1,m(X0,X2))).

cnf(sos,axiom,
    m(X0,X1) = m(X1,X0)).

cnf(sos,axiom,
    X0 = X2 | m(X1,X2) != m(X1,X0)).

cnf(sos,axiom,
    m(X0,f(X0,X1)) = X1 | ~d(X0,X1)).

cnf(sos,axiom,
    d(two,X1) | d(two,X0) | ~d(two,m(X1,X0))).

cnf(sos,axiom,
    m(two,m(b,b)) = m(a,a)).

cnf(sos,axiom,
    one = X0 | ~d(X0,a) | ~d(X0,b)).

cnf(sos,axiom,
    one != two).

%cnf(sos,axiom,~ d(A,B) | m(A,f(A,B)) = B).
cnf(sos,axiom, m(A,B) != C | d(A,C)).
%cnf(sos,axiom,~ d(two,m(A,B)) | d(two,A) | d(two,B)).
%cnf(sos,axiom,m(a,a) = m(two,m(b,b))).
%cnf(sos,axiom,~ d(A,a) | ~ d(A,b) | A = one).
%cnf(sos,axiom,~ two = one).
Overwriting /tmp/sqrtcluase.p
! time eprover-ho /tmp/sqrt.p
import z3

x = z3.Int("x")
e = x+2
e.sexpr()
x.decl().sexpr()
def collect_symbols(expr, symbols):
    """ Recursively collect constants and function symbols from an expression. """
    if expr.num_args() == 0:
        # It's a constant if it has no arguments (excluding bound variables in quantifiers)
        if expr.decl().kind() == z3.Z3_OP_UNINTERPRETED:
            symbols.add(expr)
    else:
        # If the node is a function application, add the function declaration
        if expr.decl().kind() == z3.Z3_OP_UNINTERPRETED or expr.decl().kind() == z3.Z3_OP_APPLY:
            symbols.add(expr.decl())
        # Go deeper into the expression for nested function applications
        for i in range(expr.num_args()):
            collect_symbols(expr.arg(i), symbols)
# naw this is ridiculous. I'll just collect up my constants manually.
#collect_symbols(e, set())
z3.ForAll([x],e > 2).sexpr()
Nat = z3.Datatype("Nat")
Nat.declare("zero")
Nat.declare("succ", ("prev", Nat))
Nat = Nat.create()
Nat.sexpr()
Nat.succ.sexpr()
'(declare-fun succ (Nat) Nat)'

PySMT

I thought maybe I should use pysmt. But I dunno. I’m so familiar with the z3 bindings.

from pysmt.shortcuts import Symbol, And, GE, LT, Plus, Equals, Int, get_model
from pysmt.typing import INT

hello = [Symbol(s, INT) for s in "hello"]
world = [Symbol(s, INT) for s in "world"]
letters = set(hello+world)
domains = And([And(GE(l, Int(1)),
                   LT(l, Int(10))) for l in letters])

sum_hello = Plus(hello) # n-ary operators can take lists
sum_world = Plus(world) # as arguments
problem = And(Equals(sum_hello, sum_world),
              Equals(sum_hello, Int(25)))
formula = And(domains, problem)

print("Serialization of the formula:")
print(formula)

model = get_model(formula)
if model:
  print(model)
else:
  print("No solution found")
Serialization of the formula:
((((1 <= h) & (h < 10)) & ((1 <= e) & (e < 10)) & ((1 <= l) & (l < 10)) & ((1 <= o) & (o < 10)) & ((1 <= w) & (w < 10)) & ((1 <= r) & (r < 10)) & ((1 <= d) & (d < 10))) & (((h + e + l + l + o) = (w + o + r + l + d)) & ((h + e + l + l + o) = 25)))
d := 9
o := 5
r := 1
w := 1
h := 1
l := 9
e := 1
from pysmt.logics import QF_UFLRA, QF_UFIDL, QF_LRA, QF_IDL, QF_LIA
from pysmt.shortcuts import get_env, GT, Solver, Symbol
logics = [QF_UFLRA, QF_UFIDL, QF_LIA],  # Some of the supported logics

env = get_env()

# Add the solver to the environment
env.factory.add_generic_solver("vampire","/usr/local/bin/vampire" , logics)
with Solver(name="vampire", logic=QF_LIA) as s:
    s.add_assertion(formula)
    if s.solve():
        print(s.get_model())
    else:
        print("No solution found")
---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Cell In[2], line 1
----> 1 with Solver(name="vampire", logic=QF_LIA) as s:
      2     s.add_assertion(formula)
      3     if s.solve():


File ~/.local/lib/python3.10/site-packages/pysmt/shortcuts.py:908, in Solver(name, logic, **kwargs)
    901 def Solver(name=None, logic=None, **kwargs):
    902     """Returns a solver.
    903 
    904     :param name: Specify the name of the solver
    905     :param logic: Specify the logic that is going to be used.
    906     :rtype: Solver
    907     """
--> 908     return get_env().factory.Solver(name=name,
    909                                     logic=logic,
    910                                     **kwargs)


File ~/.local/lib/python3.10/site-packages/pysmt/factory.py:437, in Factory.Solver(self, name, logic, **options)
    436 def Solver(self, name=None, logic=None, **options):
--> 437     return self.get_solver(name=name,
    438                            logic=logic,
    439                            **options)


File ~/.local/lib/python3.10/site-packages/pysmt/factory.py:91, in Factory.get_solver(self, name, logic, **options)
     89 def get_solver(self, name=None, logic=None, **options):
     90     SolverClass, closer_logic = \
---> 91        self._get_solver_class(solver_list=self._all_solvers,
     92                               solver_type="Solver",
     93                               preference_list=self.solver_preference_list,
     94                               default_logic=self.default_logic,
     95                               name=name,
     96                               logic=logic)
     98     return SolverClass(environment=self.environment,
     99                        logic=closer_logic,
    100                        **options)


File ~/.local/lib/python3.10/site-packages/pysmt/factory.py:155, in Factory._get_solver_class(self, solver_list, solver_type, preference_list, default_logic, name, logic)
    150 if name not in solver_list:
    151     raise NoSolverAvailableError("%s '%s' is not available" % \
    152                                  (solver_type, name))
    154 if logic is not None and \
--> 155    name not in self._filter_solvers(solver_list, logic=logic):
    156     raise NoSolverAvailableError("%s '%s' does not support logic %s"%
    157                                  (solver_type, name, logic))
    159 SolverClass = solver_list[name]


File ~/.local/lib/python3.10/site-packages/pysmt/factory.py:361, in Factory._filter_solvers(self, solver_list, logic)
    359 for s, v in solver_list.items():
    360     for l in v.LOGICS:
--> 361         if logic <= l:
    362             res[s] = v
    363             break


File ~/.local/lib/python3.10/site-packages/pysmt/logics.py:273, in Logic.__le__(self, other)
    272 def __le__(self, other):
--> 273     return (self.theory <= other.theory and
    274             self.quantifier_free >= other.quantifier_free)


AttributeError: 'list' object has no attribute 'theory'

A Manual Proof Discipline

Here were some old experiments using raw smtlib idioms

Separation between module signature and proofs. Seperate files. Proof datatype to separate from bool definitions

;re
(declare-datatypes () (
  (Proof (Proof (getformula Bool)))
))
(define-fun proved ((p Proof)) Bool 
  (match p (
    ((Proof p) (not p))
  ))
)
(define-fun instan-lemma1 ((x Int)) Bool  (or (> x 0) (< x 0) (= x 0)))
(define-const lemma1 Proof (Proof (forall ((x Int)) (instan-lemma1 x))))

(push)
;------------
(assert (proved lemma1))
(check-sat)
(pop)

Other possibility for Proof type Forall (Expr) Exists (Expr) Eh. I don’t really want to reflect everything.

A shallower HOAS embedding

;re
(declare-datatype Formula (
  (Bool (bool1 Bool))
  (Forall (forall1 (Array Int Formula)))
  (Exists (exists1 (Array Int Formula)))
))


;re
(declare-datatype Exists 
  (
    (Pair (x Int) (pred (Array Int Bool)))
  ))

Universals can be proven by having (define-const x)

; other hypothesis do not mention x
;-----------------
(instan-lemma1 x)

or by ; other hypothesis can’t possibly mention x ;———– (assert (proved lemma1))

Universal can be used by using lemma1 or by instan-lemma1 on anything

Existentials can be proved

----------
(exists x (relx x))

or by

whatever
---------
(relx 3)

They can be used via

define-const x
(relx x)
; no other use of x
;----------------
conclusion may refer to x

ExistsE (ExistsE Bool) ForallE (ForallE (forallE Bool)) ; (apply?) Forall (ForallI (forallE Bool))

(Exists Int Bool) ; can I pack the witness?

define-const lemma Forall (forall x (forallE (instan-lemma x))) define-fun instan-lemma ((x Int)) ForallE

The method is related to lambda lifting. We need to give names to different free parameter things.

;re
(declare-fun even-fun (Int) Int)
(define-fun even ((x Int)) Bool (= (* 2 (even-fun x)) x))

(declare-fun odd-fun (Int) Int)
(define-fun odd ((x Int)) Bool (= (+ 1 (* 2 (odd-fun x))) x))

;(define-fun even ((x Int)) Bool (exists ((y Int)) (= (* 2 y) x)))
(define-const even-axiom Bool (forall ((x Int)) (=> (exists ((y Int)) (= (* 2 y) x)) (= (* 2 (even-fun x)) x))))
;(define-fun even ((x Int)) Bool (exists ((y Int)) (and (= y (even-fun x)) (= (* 2 y) x))))
(define-fun not-even ((x Int)) Bool (forall ((y Int)) (distinct x (* 2 y))))
;(define-fun instan-not-even ((x Int) (y Int)) (distinct x (2 * y)))

(declare-const m Int)
(declare-const n Int)
(assert (odd m))
(assert (odd n))
(assert (not (even m)))
(assert (not (even n)))
;(assert (instan-not-even m (even-fun )))
(assert-not (even (+ n m)))
;(assert (not-even (+ n m)))
(check-sat)
(get-model)

;re
(define-fun even ((x Int) (y Int)) Bool (= (* 2 y) x))
(define-fun odd ((x Int) (y Int)) Bool (= (+ 1 (* 2 y)) x))


(declare-const m Int)
(declare-const n Int)
(declare-const mo Int)
(declare-const no Int)
(assert (odd m mo))
(assert (odd n no))
(assert-not (exists ((x Int)) (even (+ m n) x)))
(check-sat)

;re
(define-fun even ((x Int)) Bool (exists ((y Int)) (= (* 2 y) x)))
(define-fun odd ((x Int)) Bool (exists ((y Int)) (= (+ 1 (* 2 y)) x)))

(declare-const m Int)
(declare-const n Int)
(assert (odd m))
(assert (odd n))
;--------------------------
(assert-not (even (+ m n)))
(check-sat)

```z3
;re
(define-fun even ((x Int)) Bool (exists ((y Int)) (= (* 2 y) x)))
(define-fun div4 ((x Int)) Bool (exists ((y Int)) (= (* 4 y) x)))

(declare-const m Int)
(assert (not (even m)))
;---------------------------
(assert-not (not (div4 m)))
(check-sat)

Positively asserting existence isn’t so bad. It is just asserting on a fresh variable.

Only allow mention variable once on existential elim in hyps

I’m a little worried about circulaity if I pack everything into one proof thing.

existential lemma can be used in hyps. Instantiated version in a conc (give explicitly). Or just let it crank.

(declare-const e) appears only once in hyp for existential elim

(assert-not
  (=> (and (existse e) (instan-lemma1 ))
      (and lemma1 )
  )

)

;re
(define-fun instan-theorem ((m Int) (n Int)) Bool (not (= (* m m) (* 2 (* n n)))))
(define-const theorem Bool (forall ((m Int) (n Int)) (not (= (* m m) (* 2 (* n n))))))


Python Hilbert Proof

An older database style attempt

Hilbert style proofs where the only inference rule is Z3_check. Register known theorems to central repository. Refer to them later. There are some basic theorems with quantifiers I couldn’t get Z3 to do even the most basic step of, so this idea feels sunk. I guess I could also admit vampire.

from z3 import *
class Kernel():
    def __init__(self):
        self._formula = {}
        self._schema = {}
        self._explanations = {}
    def axiom(self, name, formula):
        assert name not in self._formula
        self._formula[name] = (formula, None)
        return name
    def add_schema(self, name, schema):
        assert name not in self._schema
        self._schema[name] = schema
        return name
    def instan_schema(self, schema, name, *args):
        #name = self.fresh(name)
        self._formula[name] = (self._schema[schema](*args), ("schema", schema))
        return name
    def theorem(self, name, formula, *reasons, timeout=1):
        assert name not in self._formula
        s = Solver()
        #s.set("timeout", timeout)

        s.add([ self._formula[reason][0] for reason in reasons])
        # snity check here. s.check(). should return unknown. If returns unsat, we have an incosisitency
        # Even if we have inconsistency but z3 isn't finding it, that is a small comfort. 
        s.add(Not(formula))
        status = s.check()
        if status == unsat:
            self._formula[name] = (formula, reasons)
            print(f"Accepted {name}")
        elif status == sat:
            print(f"Counterexample : {s.model()}")
        elif status == unknown:
            print("Failed: reason unknown")
        else:
            print(f"unexpected status {status}")
    #def calc(self):
    #    pass
    #def definition: # is definition distinct from axiom?

t = Kernel()
p = Real("p")
t.axiom("p_def", p*p == 2)
m,n = [ToReal(x) for x in Ints("m n")]

#t.theorem("p irrational", p != m / n)
#(m / n == p)
def even(x):
    y = FreshInt()
    return Exists([y], 2 * y == x)
def odd(x):
    y = FreshInt()
    return Exists([y], 2 * y + 1 == x)
r,w = Ints("r w")
t.theorem("even odd", ForAll([r], even(r) != odd(r)))
def rational(p):
    m = FreshInt("m")
    n = FreshInt("n")
    return Exists([m,n], And(n != 0, p == ToReal(m) / ToReal(n) ))
rational(p)

#t.theorem("mndef", Implies(rational(p), p == m / n) )

#t.theorem( Implies( p == m / n, Exists([r,w], And(p == r / w,  Or(And(even(r), odd(w)), And(odd(r), even(w)) ) ))))
# how do you use this exists?

#t.theorem(   )
def evenodd(n,m):
    return even(n) != even(m)

t.theorem(0, Implies(  And(p == m / n, n != 0) , p * n == m  ))
t.theorem(1, Implies(  And(p == m / n, n != 0, evenodd(m,n)) , p * n == m  ))
#t.theorem(2, Implies(  And(p == m / n, n != 0, evenodd(m,n)) , 2 * n * n == m * m , "p_def"))
t.theorem(3, even(r) != odd(r))
#t.theorem(4, even(r * r) != odd(r * r))
t.theorem(5, Implies(r == 2 * w, even(r*r) ))
t.theorem(6, Implies(r == 2 * w, r*r == 4 * w * w ))

#t.theorem(7, Implies(r == 2 * w, even(r) ))
#t.theorem(8, Implies(even(r), even(r*r)), 5, 6, 7)
t.theorem("even or odd", Or(even(r), odd(r)))
#t.theorem("even or odd r*r", Or(even(r*r), odd(r*r)))
t.theorem(8, ForAll([r], even(r) != odd(r)))
t.theorem(9, even(r*r) != odd(r*r), 8)
# I dunno. This idea is sunk. I can't even get this to work?
# I could try not using the built in stuff I guess and wwork in FOL.


#t.theorem(5, odd(r) == odd(r * r), 3, 4)
#t.theorem(3, Implies(even(r * r), even(r) ))
#t.theorem(3, Implies(  And(2 * n * n == m * m), even(m)))




#t.theorem(0, Implies(  p == m / n , m * m == 2 * n * n   ), "p_def" )
even(r * r) != odd(r * r)
t = Hilbert()

Nat = Datatype('Nat')
Nat.declare('Zero')
Nat.declare('Succ', ('pred', Nat))
# Create the datatype
Nat = Nat.create()
Zero = Nat.Zero
Succ = Nat.Succ

n,m = Consts("n m", Nat)
plus = Function("+", Nat,Nat,Nat)
DatatypeRef.__add__ = lambda self, rhs : plus(self, rhs)
#type(n)
plus_zero = t.axiom("plus_zero", ForAll([m], Zero + m == m))
plus_succ = t.axiom("plus_succ", ForAll([n,m], Succ(n) + m == Succ(n + m)))
plus_def = [plus_zero, plus_succ]
def induction_schema(P):
    return  Implies(
                And(P(Zero), 
                    ForAll([n], Implies(P(n), P(Succ(n))))) ,
                ForAll([n], P(n))
    )
induction = t.add_schema("induction", induction_schema)

t.theorem( "plus_zero'",  ForAll([m], Zero + m == m , patterns=[Zero + m]), plus_zero) # guardedc version of the axiom
mylemma = t.instan(induction, "mylemma", lambda m : Zero + m == m + Zero )

t.theorem( "n_plus_zero", ForAll([m], Zero + m == m + Zero) ,  plus_zero, plus_succ,  mylemma )
t.theorem("test guarding", Zero + Zero == Succ(Zero), "plus_zero")

lte = Function("<=", Nat, Nat, BoolSort())
DatatypeRef.__le__ = lambda self, rhs : lte(self, rhs)

t.axiom("zero lte", ForAll([n], Zero <= n))
t.axiom("succ lte", ForAll([n,m], (Succ(n) <= Succ(m)) == (n <= m) ))
lte_def = ["zero lte", "succ lte"]


t.theorem("example",  (Zero + Zero) <= Zero,  *plus_def, *lte_def  )


reflect = Function("reflect", Nat  , IntSort())
x = Int("x")
t.axiom( "reflect succ", ForAll([x, n], (reflect(Succ(n)) == 1 + x) == (reflect(n) == x) ))
t.axiom( "reflect zero", reflect(Zero) == 0) 

t.formula