Escaping the repl argument to re.sub(), re.subn() #128138

Open
finite-state-machine opened this issue Dec 20, 2024 · 0 comments
Labels
docs (Documentation in the Doc dir), topic-regex

Comments

Contributor

finite-state-machine commented Dec 20, 2024

Documentation

It's not immediately obvious how to escape the repl (replacement) argument to re.sub() and re.subn() when repl is chosen by a potentially hostile actor and has to be inserted literally. re.escape() isn't the answer: it is meant for patterns and escapes far too much, so a character such as '.' would come out with a stray backslash in front of it.
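To make the problem concrete, here is a minimal sketch of what happens when a replacement string is passed through unmodified or through re.escape(). The sample strings are invented for illustration and are not taken from any real report.

```python
import re

text = 'BEFORE:TARGET:AFTER'

# Hypothetical attacker-controlled replacement text.
raw_repl = r'\g<0>\g<0>'

# Passed unescaped, the group references are interpreted,
# so the match is duplicated instead of being replaced literally.
unescaped = re.sub('TARGET', raw_repl, text)

# An unknown escape of an ASCII letter in repl raises re.error.
try:
    re.sub('TARGET', r'\q', text)
    bad_escape_raised = False
except re.error:
    bad_escape_raised = True

# re.escape() puts a backslash before '.', and in a replacement
# string that backslash is left in place and ends up in the output.
over_escaped = re.sub('TARGET', re.escape('a.b'), text)

print(unescaped)
print(bad_escape_raised)
print(over_escaped)
```

So an unescaped repl can either change the result or make the call fail, and re.escape() corrupts ordinary text.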

The right answer seems to be escaped_repl = raw_repl.replace(bslash, bslash*2) where bslash = '\\', since backslash escapes appear to be the only thing that gets processed in a string repl. It might be worth adding this to the documentation.
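A short sketch of that escape in use, next to one possible alternative: passing a function as repl, whose return value is not run through backslash processing. The helper name escape_repl and the sample strings are assumptions made for this illustration only.

```python
import re

def escape_repl(raw: str) -> str:
    """Hypothetical helper: double every backslash so repl is taken literally."""
    return raw.replace('\\', '\\\\')

text = 'BEFORE:TARGET:AFTER'

# Made-up hostile input containing group references and escapes.
hostile = r'\g<0> \1 \n \\'

# Option 1: escape the backslashes, then pass the string as repl.
via_escape = re.sub('TARGET', escape_repl(hostile), text)

# Option 2: pass a function; its return value is used as-is.
via_function = re.sub('TARGET', lambda m: hostile, text)

print(via_escape == via_function)
```

Both calls should insert the hostile string verbatim. The function form avoids the escaping step entirely, at the cost of one call per match.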

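Here's the code I used to empirically validate the "right answer" given above (checked on Python 3.8 and 3.12). It tries every Unicode code point as the replacement, both on its own and preceded by a backslash: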

from __future__ import annotations
import re, sys

def escape_re_sub_repl(repl: str) -> str:
    """Escape repl so that re.sub()/re.subn() insert it literally."""
    return repl.replace('\\', '\\\\')

def test_escape_re_sub_repl() -> None:
    """Check that every test replacement survives re.subn() unchanged."""
    backslash = '\\'
    assert len(backslash) == 1

    # A pattern with no regex metacharacters, embedded once in the input.
    base_regex = 'TARGET'
    assert base_regex == re.escape(base_regex)
    base_prefix = 'BEFORE:'
    base_suffix = ':AFTER'
    base_input = f'{base_prefix}{base_regex}{base_suffix}'

    # Every single character, plus every backslash + character pair.
    base_chars = tuple(chr(p) for p in range(sys.maxunicode + 1))
    escaped_chars = tuple(f'{backslash}{c}' for c in base_chars)
    test_cases = base_chars + escaped_chars
    assert {len(f) for f in test_cases} == {1, 2}

    for raw in test_cases:
        repl = escape_re_sub_repl(raw)
        got, change_count = re.subn(base_regex, repl, base_input)
        assert change_count == 1
        # The raw text must appear verbatim in place of the match.
        assert got == f'{base_prefix}{raw}{base_suffix}'

finite-state-machine added the docs (Documentation in the Doc dir) label Dec 20, 2024