Skip to content

Commit

Permalink
Merge pull request #101 from lean-dojo/dev
Browse files Browse the repository at this point in the history
Support Lean's new directory structures
  • Loading branch information
Kaiyu Yang authored Nov 29, 2023
2 parents 7ab7db7 + c4cb4ab commit bec479c
Show file tree
Hide file tree
Showing 19 changed files with 298 additions and 255 deletions.
45 changes: 22 additions & 23 deletions docs/source/getting-started.rst
Original file line number Diff line number Diff line change
Expand Up @@ -229,28 +229,27 @@ We use LeanDojo to trace the repo in Python by specifying its URL and a commit h
from lean_dojo import LeanGitRepo, trace
repo = LeanGitRepo("https://github.com/yangky11/lean4-example", "a61b40b90afba0ee5a3357665a86f7d0bb57461d")
repo = LeanGitRepo("https://github.com/yangky11/lean4-example", "fd14c4c8b29cc74a082e5ae6f64c2fb25b28e15e")
trace(repo, dst_dir="traced_lean4-example")
After a few minutes or one hour (depending on #CPUs), it generates a :file:`traced_lean4-example` directory with the subdirectories below.
The directory structure is different from that of Lean 3, as Lean 4 uses a different build system.
The directory structure is different from that of Lean 3, as Lean 4 uses a different build system.
Please check out :ref:`troubleshooting` if you encounter any issue.

::

traced_lean4-example
└─lean4-example
├─lake-packages
│ ├─lean4
│ └─...
├─build
│ ├─ir
│ │ ├─Lean4Example.dep_paths
│ │ ├─Lean4Example.ast.json
│ │ └─Lean4Example.trace.xml
│ ├─lib
│ │ └─Lean4Example.olean
│ └─bin
├─.lake
│ ├─packages
│ │ └─lean4
│ └─build
│ ├─ir
│ │ ├─Lean4Example.dep_paths
│ │ ├─Lean4Example.ast.json
│ │ └─Lean4Example.trace.xml
│ └─lib
│ └─Lean4Example.olean
├─Lean4Example.lean
└─...

Expand All @@ -264,7 +263,7 @@ and :file:`lean-example` is the traced example repo. We call them "traced" becau
* :file:`*.trace.xml`: Syntactic and semantic information extracted from Lean.

The most important one is :file:`*.trace.xml`. Its format is different from Lean 3.
For example, below is :file:`traced_lean4-example/lean4-example/build/ir/Lean4Example.trace.xml`:
For example, below is :file:`traced_lean4-example/lean4-example/.lake/build/ir/Lean4Example.trace.xml`:

.. code-block::
:caption: Lean4Example.trace.xml
Expand All @@ -281,8 +280,8 @@ For example, below is :file:`traced_lean4-example/lean4-example/build/ir/Lean4Ex
<IdentNode4 start="(1, 6)" end="(1, 9)" leading="" trailing=" " raw_val="Nat" val="Nat"/>
<AtomNode4 start="(1, 10)" end="(1, 11)" leading="" trailing="" val="("/>
<NullNode4 start="(1, 11)" end="(1, 29)">
<IdentNode4 start="(1, 11)" end="(1, 20)" leading="" trailing=" " raw_val="add_assoc" val="add_assoc"/>
<IdentNode4 start="(1, 21)" end="(1, 29)" leading="" trailing="" raw_val="add_comm" val="add_comm"/>
<IdentNode4 start="(1, 11)" end="(1, 20)" leading="" trailing=" " raw_val="add_assoc" val="add_assoc" full_name="Nat.add_assoc" mod_name="Init.Data.Nat.Basic" def_path=".lake/packages/lean4/src/lean/Init/Data/Nat/Basic.lean" def_start="(138, 19)" def_end="(138, 28)"/>
<IdentNode4 start="(1, 21)" end="(1, 29)" leading="" trailing="" raw_val="add_comm" val="add_comm" full_name="Nat.add_comm" mod_name="Init.Data.Nat.Basic" def_path=".lake/packages/lean4/src/lean/Init/Data/Nat/Basic.lean" def_start="(131, 19)" def_end="(131, 27)"/>
</NullNode4>
<AtomNode4 start="(1, 29)" end="(1, 30)" leading="" trailing="&#10;&#10;" val=")"/>
</CommandOpenonlyNode4>
Expand Down Expand Up @@ -313,7 +312,7 @@ For example, below is :file:`traced_lean4-example/lean4-example/build/ir/Lean4Ex
</NullNode4>
<NullNode4 start="(3, 28)" end="(3, 33)">
<AtomNode4 start="(3, 28)" end="(3, 29)" leading="" trailing=" " val=":"/>
<IdentNode4 start="(3, 30)" end="(3, 33)" leading="" trailing="" raw_val="Nat" val="Nat" full_name="Nat" mod_name="Init.Prelude" def_start="(1038, 11)" def_end="(1038, 14)"/>
<IdentNode4 start="(3, 30)" end="(3, 33)" leading="" trailing="" raw_val="Nat" val="Nat" full_name="Nat" mod_name="Init.Prelude" def_path=".lake/packages/lean4/src/lean/Init/Prelude.lean" def_start="(1044, 11)" def_end="(1044, 14)"/>
</NullNode4>
<NullNode4/>
<AtomNode4 start="(3, 33)" end="(3, 34)" leading="" trailing="&#10; " val=")"/>
Expand Down Expand Up @@ -359,13 +358,13 @@ For example, below is :file:`traced_lean4-example/lean4-example/build/ir/Lean4Ex
<NullNode4 start="(5, 7)" end="(5, 40)">
<OtherNode4 start="(5, 7)" end="(5, 16)" kind="Lean.Parser.Tactic.rwRule">
<NullNode4/>
<IdentNode4 start="(5, 7)" end="(5, 16)" leading="" trailing="" raw_val="add_assoc" val="add_assoc" full_name="Nat.add_assoc" mod_name="Init.Data.Nat.Basic" def_start="(138, 19)" def_end="(138, 28)"/>
<IdentNode4 start="(5, 7)" end="(5, 16)" leading="" trailing="" raw_val="add_assoc" val="add_assoc" full_name="Nat.add_assoc" mod_name="Init.Data.Nat.Basic" def_path=".lake/packages/lean4/src/lean/Init/Data/Nat/Basic.lean" def_start="(138, 19)" def_end="(138, 28)"/>
</OtherNode4>
<AtomNode4 start="(5, 16)" end="(5, 17)" leading="" trailing=" " val=","/>
<OtherNode4 start="(5, 18)" end="(5, 28)" kind="Lean.Parser.Tactic.rwRule">
<NullNode4/>
<OtherNode4 start="(5, 18)" end="(5, 28)" kind="Lean.Parser.Term.app">
<IdentNode4 start="(5, 18)" end="(5, 26)" leading="" trailing=" " raw_val="add_comm" val="add_comm" full_name="Nat.add_comm" mod_name="Init.Data.Nat.Basic" def_start="(131, 19)" def_end="(131, 27)"/>
<IdentNode4 start="(5, 18)" end="(5, 26)" leading="" trailing=" " raw_val="add_comm" val="add_comm" full_name="Nat.add_comm" mod_name="Init.Data.Nat.Basic" def_path=".lake/packages/lean4/src/lean/Init/Data/Nat/Basic.lean" def_start="(131, 19)" def_end="(131, 27)"/>
<NullNode4 start="(5, 27)" end="(5, 28)">
<IdentNode4 start="(5, 27)" end="(5, 28)" leading="" trailing="" raw_val="b" val="b"/>
</NullNode4>
Expand All @@ -380,7 +379,7 @@ For example, below is :file:`traced_lean4-example/lean4-example/build/ir/Lean4Ex
</OtherNode4>
</OtherNode4>
</NullNode4>
<IdentNode4 start="(5, 31)" end="(5, 40)" leading="" trailing="" raw_val="add_assoc" val="add_assoc" full_name="Nat.add_assoc" mod_name="Init.Data.Nat.Basic" def_start="(138, 19)" def_end="(138, 28)"/>
<IdentNode4 start="(5, 31)" end="(5, 40)" leading="" trailing="" raw_val="add_assoc" val="add_assoc" full_name="Nat.add_assoc" mod_name="Init.Data.Nat.Basic" def_path=".lake/packages/lean4/src/lean/Init/Data/Nat/Basic.lean" def_start="(138, 19)" def_end="(138, 28)"/>
</OtherNode4>
</NullNode4>
<AtomNode4 start="(5, 40)" end="(5, 41)" leading="" trailing="&#10;&#10;" val="]"/>
Expand Down Expand Up @@ -421,7 +420,7 @@ For example, below is :file:`traced_lean4-example/lean4-example/build/ir/Lean4Ex
</NullNode4>
<NullNode4 start="(7, 16)" end="(7, 21)">
<AtomNode4 start="(7, 16)" end="(7, 17)" leading="" trailing=" " val=":"/>
<IdentNode4 start="(7, 18)" end="(7, 21)" leading="" trailing="" raw_val="Nat" val="Nat" full_name="Nat" mod_name="Init.Prelude" def_start="(1038, 11)" def_end="(1038, 14)"/>
<IdentNode4 start="(7, 18)" end="(7, 21)" leading="" trailing="" raw_val="Nat" val="Nat" full_name="Nat" mod_name="Init.Prelude" def_path=".lake/packages/lean4/src/lean/Init/Prelude.lean" def_start="(1044, 11)" def_end="(1044, 14)"/>
</NullNode4>
<NullNode4/>
<AtomNode4 start="(7, 21)" end="(7, 22)" leading="" trailing=" " val=")"/>
Expand All @@ -439,7 +438,7 @@ For example, below is :file:`traced_lean4-example/lean4-example/build/ir/Lean4Ex
</OtherNode4>
<AtomNode4 start="(7, 31)" end="(7, 32)" leading="" trailing=" " val="="/>
<OtherNode4 start="(7, 33)" end="(7, 43)" kind="Lean.Parser.Term.app">
<IdentNode4 start="(7, 33)" end="(7, 41)" leading="" trailing=" " raw_val="Nat.succ" val="Nat.succ" full_name="Nat.succ" mod_name="Init.Prelude" def_start="(1044, 5)" def_end="(1044, 9)"/>
<IdentNode4 start="(7, 33)" end="(7, 41)" leading="" trailing=" " raw_val="Nat.succ" val="Nat.succ" full_name="Nat.succ" mod_name="Init.Prelude" def_path=".lake/packages/lean4/src/lean/Init/Prelude.lean" def_start="(1050, 5)" def_end="(1050, 9)"/>
<NullNode4 start="(7, 42)" end="(7, 43)">
<IdentNode4 start="(7, 42)" end="(7, 43)" leading="" trailing=" " raw_val="a" val="a"/>
</NullNode4>
Expand Down Expand Up @@ -486,7 +485,7 @@ can be downloaded from `our AWS S3 <https://lean-dojo.s3.amazonaws.com>`_ (see :
from lean_dojo import *
repo = LeanGitRepo("https://github.com/yangky11/lean4-example", "a61b40b90afba0ee5a3357665a86f7d0bb57461d")
repo = LeanGitRepo("https://github.com/yangky11/lean4-example", "fd14c4c8b29cc74a082e5ae6f64c2fb25b28e15e")
theorem = Theorem(repo, "Lean4Example.lean", "hello_world")
with Dojo(theorem) as (dojo, init_state):
Expand Down
1 change: 1 addition & 0 deletions docs/source/limitations.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ Limitations
LeanDojo has the following limitations. Addressing them won't be our priority in the near future, but we welcome contributions:

* LeanDojo cannot extract data from the `lean4 <https://github.com/leanprover/lean4>`_ repo itself nor interact with theorems in it.
* Currently, LeanDojo cannot process Lean repos that use FFI, e.g., [LeanCopilot](https://github.com/lean-dojo/LeanCopilot).
* LeanDojo does not support term-based proofs or proofs that mixes tactics and terms.
* Entering all tactic-style proofs in mathlib 3 to LeanDojo, we found ~1.4% of them are misjudged as incorrect. The errors fall into a few categories documented in `test_unexpected_errors.py <https://github.com/lean-dojo/LeanDojo/blob/main/tests/interaction/test_unexpected_errors.py>`_. We didn't perform this analysis on Lean 4.
* Theorems extracted by LeanDojo are "syntactic theorems", i.e., they are Lean constants defined using keywords :code:`theorem` or :code:`lemma`. First, they are not guaranteed to be real theorems (Lean constants of type :code:`Prop`). Second, theorems defined in other ways (e.g., using :code:`def`) are not extracted.
Expand Down
2 changes: 1 addition & 1 deletion docs/source/user-guide.rst
Original file line number Diff line number Diff line change
Expand Up @@ -236,7 +236,7 @@ LeanDojo's behavior can be configured through the following environment variable
* :code:`TACTIC_CPU_LIMIT`: Number of CPUs for executing tactics (see the `--cpus` flag of `docker run <https://docs.docker.com/engine/reference/commandline/run/#memory>`_) when interacting with Lean (only applicable when :code:`CONTAINER=docker`). Default to 1.
* :code:`TACTIC_MEMORY_LIMIT`: Maximum memory when interacting with Lean (see the `--memory` flag of `docker run <https://docs.docker.com/engine/reference/commandline/run/#memory>`_) (only applicable when :code:`CONTAINER=docker`). Default to 16 GB.
* :code:`GITHUB_ACCESS_TOKEN`: GitHub `personal access token <https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-personal-access-token-classic>`_ for using the GitHub API. They are optional. If provided, they can increase the `API rate limit <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting>`_.
* :code:`LOAD_USED_DEPS_ONLY`: Setting it to any value will cause LeanDojo to load only the dependency files that are actually used by the target repo. Otherwise, for Lean 4, it will load all files in the dependency repos. Not set by default.
* :code:`LOAD_USED_PACKAGES_ONLY`: Setting it to any value will cause LeanDojo to load only the dependency files that are actually used by the target repo. Otherwise, for Lean 4, it will load all files in the dependency repos. Not set by default.
* :code:`VERBOSE` or :code:`DEBUG`: Setting either of them to any value will cause LeanDojo to print debug information. Not set by default.

LeanDojo supports `python-dotenv <https://pypi.org/project/python-dotenv/>`_. You can use it to manage environment variables in a :file:`.env` file.
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ exclude = [

[project]
name = "lean-dojo"
version = "1.4.0"
version = "1.4.1"
authors = [
{ name="Kaiyu Yang", email="kaiyuy@caltech.edu" },
]
Expand Down
4 changes: 2 additions & 2 deletions scripts/generate-benchmark-lean3.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@
"\n",
"import lean_dojo\n",
"from lean_dojo import *\n",
"from lean_dojo.constants import LEAN3_DEPS_DIR\n",
"from lean_dojo.constants import LEAN3_PACKAGES_DIR\n",
"\n",
"random.seed(3407) # https://arxiv.org/abs/2109.08203\n",
"\n",
Expand Down Expand Up @@ -220,7 +220,7 @@
" return str(thm.theorem.file_path)\n",
" else:\n",
" # The theorem belongs to one of the dependencies.\n",
" return f\"{LEAN3_DEPS_DIR}/{thm.repo.name}/{thm.theorem.file_path}\"\n",
" return f\"{LEAN3_PACKAGES_DIR}/{thm.repo.name}/{thm.theorem.file_path}\"\n",
"\n",
"\n",
"def export_proofs(\n",
Expand Down
4 changes: 2 additions & 2 deletions scripts/generate-benchmark-lean4.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@
"\n",
"import lean_dojo\n",
"from lean_dojo import *\n",
"from lean_dojo.constants import LEAN4_DEPS_DIR\n",
"from lean_dojo.constants import LEAN4_PACKAGES_DIR\n",
"\n",
"random.seed(3407) # https://arxiv.org/abs/2109.08203\n",
"\n",
Expand Down Expand Up @@ -224,7 +224,7 @@
" # The theorem belongs to one of the dependencies.\n",
" for name, dep in traced_repo.dependencies.items():\n",
" if dep == thm.repo:\n",
" return f\"{LEAN4_DEPS_DIR}/{name}/{thm.theorem.file_path}\"\n",
" return f\"{LEAN4_PACKAGES_DIR}/{name}/{thm.theorem.file_path}\"\n",
" raise ValueError(f\"Unable to find the dependency {thm.repo}\")\n",
"\n",
"\n",
Expand Down
1 change: 0 additions & 1 deletion src/lean_dojo/__init__.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
import os
from .data_extraction.trace import (
trace,
trace_local,
get_traced_repo_path,
is_available_in_cache,
)
Expand Down
17 changes: 12 additions & 5 deletions src/lean_dojo/constants.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@

load_dotenv()

__version__ = "1.4.0"
__version__ = "1.4.1"

logger.remove()
if "VERBOSE" in os.environ or "DEBUG" in os.environ:
Expand Down Expand Up @@ -53,7 +53,7 @@
LEAN3_URL = "https://github.com/leanprover-community/lean"
"""The URL of the Lean 3 repo."""

LEAN3_DEPS_DIR = Path("_target/deps")
LEAN3_PACKAGES_DIR = Path("_target/deps")
"""The directory where Lean 3 dependencies are stored."""

LEAN4_URL = "https://github.com/leanprover/lean4"
Expand All @@ -80,12 +80,19 @@
LEAN4_NIGHTLY_REPO = GITHUB.get_repo("leanprover/lean4-nightly")
"""The GitHub Repo for Lean 4 nightly releases."""

LEAN4_DEPS_DIR = Path("lake-packages")
"""The directory where Lean 4 dependencies are stored."""
LEAN4_PACKAGES_DIR_OLD = Path("lake-packages")
"""The directory where Lean 4 dependencies are stored (before v4.3.0-rc2)."""

LOAD_USED_DEPS_ONLY = "LOAD_USED_DEPS_ONLY" in os.environ
LEAN4_PACKAGES_DIR = Path(".lake/packages")
"""The directory where Lean 4 dependencies are stored (since v4.3.0-rc2)."""

LOAD_USED_PACKAGES_ONLY = "LOAD_USED_PACKAGES_ONLY" in os.environ
"""Only load depdendency files that are actually used by the target repo."""

LEAN4_BUILD_DIR = Path(".lake/build")

LEAN_BUILD_DIR_OLD = Path("build")

TACTIC_TIMEOUT = int(os.getenv("TACTIC_TIMEOUT", 5000))
"""Maximum time (in milliseconds) before interrupting a tactic when interacting with Lean (only for Lean 3).
"""
Expand Down
Loading

0 comments on commit bec479c

Please sign in to comment.