Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pydoclint] Add support for Sphinx-style docstrings in the pydoclint rules. #13286

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

augustelalande
Copy link
Contributor

@augustelalande augustelalande commented Sep 8, 2024

Summary

Add support for Sphinx-style docstrings in the pydoclint rules. This also defines (out of necessity) the set of pydocstyle rules enforced when the sphinx style is selected. For lack of a good reference I just copied the exclusions from the pep257 selection, minus Rule::SectionNotOverIndented.

Part of #12434.

Resolves #12520.

Test Plan

Test cases added.

Copy link
Contributor

github-actions bot commented Sep 8, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+0 -36 violations, +0 -0 fixes in 2 projects; 52 projects unchanged)

bokeh/bokeh (+0 -2 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

- src/bokeh/document/document.py:767:9: DOC201 `return` is not documented in docstring
- src/bokeh/document/models.py:160:9: DOC201 `return` is not documented in docstring

pandas-dev/pandas (+0 -34 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- doc/source/user_guide/style.ipynb:cell 113:31:89: E501 Line too long (90 > 88)
- doc/source/user_guide/style.ipynb:cell 113:33:89: E501 Line too long (98 > 88)
- doc/source/user_guide/style.ipynb:cell 123:5:56: E741 Ambiguous variable name: `l`
- doc/source/user_guide/style.ipynb:cell 125:2:13: C408 Unnecessary `dict` call (rewrite as a literal)
- doc/source/user_guide/style.ipynb:cell 125:4:13: C408 Unnecessary `dict` call (rewrite as a literal)
- doc/source/user_guide/style.ipynb:cell 125:6:13: C408 Unnecessary `dict` call (rewrite as a literal)
- doc/source/user_guide/style.ipynb:cell 125:8:13: C408 Unnecessary `dict` call (rewrite as a literal)
- doc/source/user_guide/style.ipynb:cell 126:1:1: NPY002 Replace legacy `np.random.seed` call with `np.random.Generator`
- doc/source/user_guide/style.ipynb:cell 126:2:1: PLW0127 Self-assignment of variable `cmap`
- doc/source/user_guide/style.ipynb:cell 126:2:8: PLW0128 Redeclared variable `cmap` in assignment
- doc/source/user_guide/style.ipynb:cell 126:3:22: NPY002 Replace legacy `np.random.randn` call with `np.random.Generator`
- doc/source/user_guide/style.ipynb:cell 128:1:22: NPY002 Replace legacy `np.random.randn` call with `np.random.Generator`
- doc/source/user_guide/style.ipynb:cell 134:1:89: E501 Line too long (93 > 88)
- doc/source/user_guide/style.ipynb:cell 141:1:89: E501 Line too long (95 > 88)
- doc/source/user_guide/style.ipynb:cell 157:1:38: F811 Redefinition of unused `f` from cell 123, line 5
- doc/source/user_guide/style.ipynb:cell 15:2:89: E501 Line too long (103 > 88)
- doc/source/user_guide/style.ipynb:cell 15:3:89: E501 Line too long (156 > 88)
- doc/source/user_guide/style.ipynb:cell 16:48:89: E501 Line too long (103 > 88)
... 12 additional changes omitted for rule E501
- doc/source/user_guide/style.ipynb:cell 37:1:1: NPY002 Replace legacy `np.random.seed` call with `np.random.Generator`
- doc/source/user_guide/style.ipynb:cell 37:2:20: NPY002 Replace legacy `np.random.randn` call with `np.random.Generator`
- doc/source/user_guide/style.ipynb:cell 60:1:20: NPY002 Replace legacy `np.random.randn` call with `np.random.Generator`
- doc/source/user_guide/style.ipynb:cell 6:1:27: NPY002 Replace legacy `np.random.rand` call with `np.random.Generator`
... 2 additional changes omitted for rule NPY002

Changes by rule (8 rules affected)

code total + violation - violation + fix - fix
E501 18 0 18 0 0
NPY002 8 0 8 0 0
C408 4 0 4 0 0
DOC201 2 0 2 0 0
E741 1 0 1 0 0
PLW0127 1 0 1 0 0
PLW0128 1 0 1 0 0
F811 1 0 1 0 0

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser
Copy link
Member

MichaReiser commented Sep 10, 2024

Wow thanks. This is great.

There are a lot of new ecosystem reports. Some of the superset violations look somewhat suspicious:

Like why are all the names empty?

@MichaReiser
Copy link
Member

It might just be that the docstyle is a bit too aggressive right now. Should this be considered a sphinx docstring?

Before I try to figure it out from the code. Did you gate the new code style behind preview mode? I think we have to because this change greatly increases the scope of the rule and it's probably good to get some feedback before rolling it out to everyone

@augustelalande
Copy link
Contributor Author

All the pydoclint rules are in preview

@MichaReiser
Copy link
Member

Looking at the airflow result I'm unsure if the parsing is correct. Could you take a look at them?

Why is param over indented?

Why is param considered a section?

The other thing I'm unsure about is whether the blank line rule between parameters should be disabled by default. All the examples I found don't require blank lines between parameters. What do you think?

@augustelalande
Copy link
Contributor Author

Why is param over indented?

It is over indented relative to the docstring as a whole, since it is indented within the .. seealso:: section

Why is param considered a section?

Sphinx doesn't have sections per se, so to make the sphinxs docstrings fit into the mold in use for google and numpy docstrings, I consider each :param title: Text to be a section.

Note: It's not possible to combine all the param entries into a single param sections, since as far as I know there is no obligation that params must follow each other, although in practice they usually would.

@MichaReiser
Copy link
Member

Sphinx doesn't have sections per se, so to make the sphinxs docstrings fit into the mold in use for google and numpy docstrings, I consider each :param title: Text to be a section.

That makes sense but it seems confusing to users because I wouldn't consider those sections and it seems to cause many false positives in other rules. I would have to think this through more carefully but would we emit different diagnostics if we designed this from scratch?

@tmke8
Copy link
Contributor

tmke8 commented Sep 11, 2024

Everything sphinx can parse successfully with default settings should be fine, no? And sphinx definitely doesn't require blank lines between :param ...: entries.

Here is a docstring that uses a lot of sphinx features and which is parsed correctly by sphinx.

class SAM(Optimizer):
    """Sharpness Aware Minimization

    Implements the 'Sharpness Aware Minimization' (SAM) algorithm introducued in
    `Sharpness Aware Minimization`_) and the adaptive variant of it introduced in `ASAM`_.

    We can use inline math: :math:`\\alpha`. You don't need the double backslash if
    the python docstring is a raw string (r-string).

    Block math:

    .. math::

       (a + b)^2 = a^2 + 2ab + b^2

    .. note::
        Implementation based on: https://github.com/cybertronai/pytorch-lamb
    .. seealso:: :class:`LAMB`

    .. _Sharpness Aware Minimization:
        https://arxiv.org/abs/2010.01412
    .. _ASAM:
        https://arxiv.org/abs/2102.11600

    :param base_optimizer: Base optimizer for SAM. (If the explanation goes
        over two lines, the second line needs to be indented.
    :param rho: Neighborhood size.

    :raises ValueError: if ``rho`` is negative.

    :example:
        Everything associated with the example section needs to be indented, so that Sphinx
        knows what belongs to the example.

        .. code-block:: python

           # Use AdamW as the base optimizer.
           base_optimizer = AdamW(model.parameters())
           # Wrap the base optimizer in SAM.
           optimizer = SAM(base_optimizer)

           # Closure required for recomputing the loss after computing epsilon(w).
           def _closure():
               return loss_function(logits=model(input), targets=targets)

           loss = _closure()
           loss.backward()

           optimizer.step(closure=_closure)
           optimizer.zero_grad()

        You can also write it in this syntax:

        >>> import template
        >>> a = template.MainClass1()
        >>> a.function1(1,1,1)
        2
    """

@augustelalande
Copy link
Contributor Author

Sphinx doesn't have sections per se, so to make the sphinxs docstrings fit into the mold in use for google and numpy docstrings, I consider each :param title: Text to be a section.

That makes sense but it seems confusing to users because I wouldn't consider those sections and it seems to cause many false positives in other rules. I would have to think this through more carefully but would we emit different diagnostics if we designed this from scratch?

We probably would not emit D410 among others for sphinx style docstrings. And in fact if the user specifies sphinx as their docstring convention then those rules get disabled (each convention runs a different subset of D). The problem is when the convention is unspecified we still run it.

One solution would be to disable innapropriate rules if sphinx is detected. There is some precendence for that, and I have already done it for some of them.

I don't know what a "proper" implementation of "D410" would look like for sphinx, since there are no "sections" per se

@augustelalande
Copy link
Contributor Author

One of the problems is that pydocstyle doesn't support sphinx so there is nothing to reference. We could choose to not run pydocstyle rules if sphinx is detected or specified.

@MichaReiser MichaReiser added rule Implementing or modifying a lint rule needs-decision Awaiting a decision from a maintainer labels Sep 12, 2024
@tmke8
Copy link
Contributor

tmke8 commented Sep 19, 2024

So, is the problem here that pydocstyle and pydoclint are incompatible? I see a --convention=pep257 flag in pydocstyle – this sounds like the convention that should be compatible with "sphinx-style"?

@MichaReiser MichaReiser self-assigned this Sep 20, 2024
@charliermarsh
Copy link
Member

One thing I'm not following from the conversation is why there are so many D rule changes in stable. I know the pydoclint rules are in preview, but I think this Sphinx detection should be too, right? In other words, we shouldn't see any non-preview changes in this PR for the D rules.

Another thing we could do is: never infer Sphinx. Require that users set it via a convention. That would help a lot with backwards compatibility while still giving us these benefits. On top of that, I'd err on the side of enabling a small set of rules for Sphinx, those we are confident in.

Finally... I think pydocstyle did add Sphinx support, at least for D417? Can we make sure that whatever we do here is consistent with the pydocstyle implementation? See: PyCQA/pydocstyle#595

@augustelalande
Copy link
Contributor Author

Per above, I have improved the gating so sphinx will only be active in preview mode.

I have also implemented the change to never infer sphinx, which should prevent a lot of the issues. But I am concerned that this prevents the ecosystem check to work properly, since they don't specify sphinx in their configs. Would love to hear some insights on this.

@augustelalande
Copy link
Contributor Author

augustelalande commented Sep 28, 2024

Ok so D417 is undocumented-param, which I think I can address in #13280 if this gets merged first (otherwise I'll add it here). Note, we already don't have full parity with D417 since the ruff version only supports google style.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-decision Awaiting a decision from a maintainer rule Implementing or modifying a lint rule
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC501 throwing errors when using Sphinx style docstrings
4 participants