Skip to content

Commit

Permalink
Merge pull request #175 from lean-dojo/kaiyu
Browse files Browse the repository at this point in the history
Reduce the Initialization Time When Interacting with Lean
  • Loading branch information
Peiyang-Song authored Jul 3, 2024
2 parents 6369427 + e8273e2 commit d5e6547
Show file tree
Hide file tree
Showing 23 changed files with 475 additions and 1,067 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ pip install .

* [LeanDojo Website](https://leandojo.org/): The official website of LeanDojo.
* [LeanDojo Benchmark](https://doi.org/10.5281/zenodo.8016385) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.8016385.svg)](https://doi.org/10.5281/zenodo.8016385): The dataset used in our paper, consisting of theorems and proofs extracted from [mathlib](https://github.com/leanprover-community/mathlib/commits/19c869efa56bbb8b500f2724c0b77261edbfa28c) by [generate-benchmark-lean3.ipynb](./scripts/generate-benchmark-lean3.ipynb).
* [LeanDojo Benchmark 4](https://doi.org/10.5281/zenodo.8040109) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.8040109.svg)](https://doi.org/10.5281/zenodo.8040109): The Lean 4 version of LeanDojo Benchmark, consisting of theorems and proofs extracted from [mathlib4](https://github.com/leanprover-community/mathlib4/commit/fe4454af900584467d21f4fd4fe951d29d9332a7) by [generate-benchmark-lean4.ipynb](./scripts/generate-benchmark-lean4.ipynb).
* [LeanDojo Benchmark 4](https://doi.org/10.5281/zenodo.8040109) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.8040109.svg)](https://doi.org/10.5281/zenodo.8040109): The Lean 4 version of LeanDojo Benchmark, consisting of theorems and proofs extracted from [mathlib4](https://github.com/leanprover-community/mathlib4/commit/29dcec074de168ac2bf835a77ef68bbe069194c5) by [generate-benchmark-lean4.ipynb](./scripts/generate-benchmark-lean4.ipynb).
* [ReProver](https://github.com/lean-dojo/ReProver): The ReProver (Retrieval-Augmented Prover) model in our paper.
* [LeanDojo ChatGPT Plugin](https://github.com/lean-dojo/LeanDojoChatGPT)
* [Lean Copilot: Running language models as copilots for theorem proving in Lean](https://github.com/lean-dojo/LeanCopilot)
Expand Down
14 changes: 0 additions & 14 deletions docker/Dockerfile

This file was deleted.

4 changes: 0 additions & 4 deletions docker/build_docker_image.sh

This file was deleted.

2 changes: 1 addition & 1 deletion docs/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
project = "LeanDojo"
copyright = "2023, LeanDojo Team"
author = "Kaiyu Yang"
release = "1.9.1"
release = "2.0.0"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
Expand Down
2 changes: 1 addition & 1 deletion docs/source/getting-started.rst
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ We use LeanDojo to trace the repo in Python by specifying its URL and a commit h
from lean_dojo import LeanGitRepo, trace
repo = LeanGitRepo("https://github.com/yangky11/lean4-example", "04e29174a45eefaccb49b835a372aa762321194e")
repo = LeanGitRepo("https://github.com/yangky11/lean4-example", "7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f")
trace(repo, dst_dir="traced_lean4-example")
After a few minutes, it generates a :file:`traced_lean4-example` directory with the subdirectories below.
Expand Down
2 changes: 1 addition & 1 deletion docs/source/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ Related Links

* `LeanDojo Website <https://leandojo.org/>`_: The official website of LeanDojo.
* `LeanDojo Benchmark <https://zenodo.org/doi/10.5281/zenodo.8016385>`_: The Lean 3 dataset used in `our paper <https://arxiv.org/abs/2306.15626>`_, consisting of theorems and proofs extracted from `mathlib <https://github.com/leanprover-community/mathlib/commits/19c869efa56bbb8b500f2724c0b77261edbfa28c>`_ by `generate-benchmark-lean3.ipynb <https://github.com/lean-dojo/LeanDojo/blob/main/scripts/generate-benchmark-lean3.ipynb>`_.
* `LeanDojo Benchmark 4 <https://zenodo.org/doi/10.5281/zenodo.8040109>`_: The Lean 4 version of LeanDojo Benchmark, consisting of theorems and proofs extracted from `mathlib4 <https://github.com/leanprover-community/mathlib4/commit/fe4454af900584467d21f4fd4fe951d29d9332a7>`_ by `generate-benchmark-lean4.ipynb <https://github.com/lean-dojo/LeanDojo/blob/main/scripts/generate-benchmark-lean4.ipynb>`_.
* `LeanDojo Benchmark 4 <https://zenodo.org/doi/10.5281/zenodo.8040109>`_: The Lean 4 version of LeanDojo Benchmark, consisting of theorems and proofs extracted from `mathlib4 <https://github.com/leanprover-community/mathlib4/commit/29dcec074de168ac2bf835a77ef68bbe069194c5>`_ by `generate-benchmark-lean4.ipynb <https://github.com/lean-dojo/LeanDojo/blob/main/scripts/generate-benchmark-lean4.ipynb>`_.
* `ReProver <https://github.com/lean-dojo/ReProver>`_: The ReProver (Retrieval-Augmented Prover) model in our paper.
* `LeanInfer <https://github.com/lean-dojo/LeanInfer>`_: Native neural network inference for running ReProver directly in Lean 4.

Expand Down
17 changes: 1 addition & 16 deletions docs/source/limitations.rst
Original file line number Diff line number Diff line change
@@ -1,26 +1,11 @@
.. _limitations:

Limitations and Caveats
=======================


Limitations
***********
===========

LeanDojo has the following limitations. Addressing them won't be our priority in the near future, but we welcome contributions:

* LeanDojo cannot extract data from the `lean4 <https://github.com/leanprover/lean4>`_ repo itself nor interact with theorems in it.
* Currently, LeanDojo cannot process Lean repos that use FFI, e.g., [LeanCopilot](https://github.com/lean-dojo/LeanCopilot).
* LeanDojo does not support term-based proofs or proofs that mixes tactics and terms.
* Entering all tactic-style proofs in mathlib 3 to LeanDojo, we found ~1.4% of them are misjudged as incorrect. The errors fall into a few categories documented in `test_unexpected_errors.py <https://github.com/lean-dojo/LeanDojo/blob/main/tests/interaction/test_unexpected_errors.py>`_. We didn't perform this analysis on Lean 4.
* Theorems extracted by LeanDojo are "syntactic theorems", i.e., they are Lean constants defined using keywords :code:`theorem` or :code:`lemma`. First, they are not guaranteed to be real theorems (Lean constants of type :code:`Prop`). Second, theorems defined in other ways (e.g., using :code:`def`) are not extracted.
* Tracing mathlib 3 produces buggy :code:`IdentNode`. Their names start with :code:`user__`. So you can search for :code:`name="user__` in the generated :file:`*.trace.xml` files. We haven't figured out the exact reason underlying this bug, but it's likely to be related to Lean's parser and `mathlib's alias command <https://leanprover-community.github.io/mathlib_docs/tactic/alias.html>`_.
* We require Lean 3 >= v3.42.1. Otherwise, the `modification patch <https://github.com/lean-dojo/LeanDojo/blob/main/src/lean_dojo/data_extraction/0001-Modify-Lean-for-proof-recording.patch>`_ cannot be applied.
* We have problems tracing a few premises related to :code:`quot` in Lean 3. See the warnings in `this Jupyter notebook <https://github.com/lean-dojo/LeanDojo/blob/main/scripts/generate-lean3-benchmark.ipynb>`_.
* Lean 3 ASTs generated by :code:`lean --ast --tsast --tspp` have a small number of errors.


Caveats
*******

* When LeanDojo is killed, it may leave temporary files in the system's temporary directory (or the directory specified by the :code:`TMP_DIR` environment variable). It may be necessary to manually clean up these files occasionally.
10 changes: 1 addition & 9 deletions docs/source/troubleshooting.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,12 +20,4 @@ Tracing Repos

The most likely reason is your machine doesn't have enough memory. The amount of
memory required depends on the repo you're tracing. For large repos, such as mathlib, you need at least 32 GB memory. If getting more memory is not an option,
you can try a smaller repo. If your machine has enough memory but the process still gets killed, please check
whether your Docker has access to all resources of host machine (see `here <https://docs.docker.com/desktop/settings/mac/#resources>`_).

Interacting with Lean
*********************

* :code:`docker: Error response from daemon: invalid mount config for type "bind": bind source path does not exist`

Make sure Docker has access to the :code:`/tmp` directory. If you're using Docker Desktop, go to Settings -> Resources -> File sharing.
you can try a smaller repo.
28 changes: 2 additions & 26 deletions docs/source/user-guide.rst
Original file line number Diff line number Diff line change
Expand Up @@ -168,7 +168,6 @@ In LeanDojo, tracing is done by running `build_lean3_repo.py <https://github.com
with our `modified Lean 3 <https://github.com/lean-dojo/LeanDojo/blob/main/src/lean_dojo/data_extraction/0001-Modify-Lean-for-proof-recording.patch>`_ or
`build_lean4_repo.py <https://github.com/lean-dojo/LeanDojo/blob/main/src/lean_dojo/data_extraction/build_lean4_repo.py>`_ with
`ExtractData.lean <https://github.com/lean-dojo/LeanDojo/blob/main/src/lean_dojo/data_extraction/ExtractData.lean>`_
By default, we perform tracing in a `Docker container <https://github.com/lean-dojo/LeanDojo/blob/main/docker/Dockerfile>`_ (configurable via the :code:`CONTAINER` environment variable).
Traced files are implemented by the :class:`lean_dojo.data_extraction.traced_data.TracedFile` class.


Expand Down Expand Up @@ -232,38 +231,15 @@ LeanDojo's behavior can be configured through the following environment variable
* :code:`TMP_DIR`: Temporary directory used by LeanDojo for storing intermediate files. Default to the systems' global temporary directory.
* :code:`NUM_PROCS`: Number of parallel processes for data extraction. Default to 32 or the number of CPUs (whichever is smaller).
* :code:`TACTIC_TIMEOUT`: Maximum time (in milliseconds) before interrupting a tactic when interacting with Lean (only applicable to Lean 3). Default to 5000.
* :code:`CONTAINER`: The container used for running LeanDojo. Default to :code:`native`, i.e., running without any container, but also supports :code:`docker`. See :ref:`running-within-docker`.
* :code:`TACTIC_CPU_LIMIT`: Number of CPUs for executing tactics (see the `--cpus` flag of `docker run <https://docs.docker.com/engine/reference/commandline/run/#memory>`_) when interacting with Lean (only applicable when :code:`CONTAINER=docker`). Default to 1.
* :code:`TACTIC_MEMORY_LIMIT`: Maximum memory when interacting with Lean (see the `--memory` flag of `docker run <https://docs.docker.com/engine/reference/commandline/run/#memory>`_) (only applicable when :code:`CONTAINER=docker`). Default to 16 GB.
* :code:`TACTIC_CPU_LIMIT`: Number of CPUs for executing tactics when interacting with Lean. Default to 1.
* :code:`TACTIC_MEMORY_LIMIT`: Maximum memory when interacting with Lean. Default to 16 GB.
* :code:`GITHUB_ACCESS_TOKEN`: GitHub `personal access token <https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-personal-access-token-classic>`_ for using the GitHub API. They are optional. If provided, they can increase the `API rate limit <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting>`_.
* :code:`LOAD_USED_PACKAGES_ONLY`: Setting it to any value will cause LeanDojo to load only the dependency files that are actually used by the target repo. Otherwise, for Lean 4, it will load all files in the dependency repos. Not set by default.
* :code:`VERBOSE` or :code:`DEBUG`: Setting either of them to any value will cause LeanDojo to print debug information. Not set by default.

LeanDojo supports `python-dotenv <https://pypi.org/project/python-dotenv/>`_. You can use it to manage environment variables in a :file:`.env` file.


.. _running-within-docker:

Running within Docker (Important for Lean 3)
********************************************

By default, LeanDojo performs data extraction and interaction by running Lean directly (:code:`CONTAINER=native`).
However, this may be problematic for Lean 3, as LeanDojo builds a modified version of Lean 3 from its C++ code, which requires certain dependencies that can be hard to get right.
Therefore, if you use LeanDojo with Lean 3, we strongly recommend setting the environment variable :code:`CONTAINER` to :code:`docker`, which instructs LeanDojo to run relevant parts in a Docker container.
You will need Docker installed on your machine, but you do not need to build or launch Docker containers manually. LeanDojo will do it for you.

That said, there are scenarios where running within Docker is not an option, e.g.,
when you are on a remote server that does not have Docker installed, or
when your server requires wrapping the entire job as a Docker container and you want to
avoid the troubles caused by `running Docker within Docker <https://stackoverflow.com/questions/27879713/is-it-ok-to-run-docker-from-inside-docker>`_.

If you want to use LeanDojo with Lean 3 but cannot use Docker, follow these steps to run it without Docker:

* See if you can follow the CMake instructions to build `Lean 3 <https://github.com/leanprover-community/lean/blob/master/doc/make/index.md#generic-build-instructions>`_ from source. Resolve any dependency issues.
* Make sure you can build the Lean 3 repo you want by :code:`leanpkg build`.



Questions and Bugs
******************

Expand Down
5 changes: 2 additions & 3 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -5,17 +5,16 @@ build-backend = "hatchling.build"
[tool.hatch.build.targets.sdist]
exclude = [
"/.github",
"/docker",
"/docs",
"/images",
"/scripts",
]

[project]
name = "lean-dojo"
version = "1.9.1"
version = "2.0.0"
authors = [
{ name="Kaiyu Yang", email="kaiyuy@caltech.edu" },
{ name="Kaiyu Yang", email="kaiyuy@meta.com" },
]
description = "LeanDojo: Machine Learning for Theorem Proving in Lean"
keywords = ["theorem proving", "machine learning", "Lean"]
Expand Down
Loading

0 comments on commit d5e6547

Please sign in to comment.