Parser for Python's `difflib` output.
Built on top of https://github.com/yebrahim/difflibparser/blob/master/difflibparser.py
Key changes from the above library:
- Using the generator pattern instead of the iterator pattern when iterating over diffs
- Using `@dataclass` over generic dictionaries to enforce strict typing
- Using type annotations for strict typing
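To illustrate the first two points, here is a minimal sketch contrasting the two iteration styles. `Record`, `RecordIterator`, and `iter_records` are hypothetical names made up for this example; they are not part of this library or of the library it builds on.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Record:
    """Hypothetical typed record, standing in for a dictionary like {"line": ...}."""
    line: str


class RecordIterator:
    """Iterator pattern: the position is tracked by hand in __next__."""

    def __init__(self, lines: list[str]) -> None:
        self._lines = lines
        self._index = 0

    def __iter__(self) -> "RecordIterator":
        return self

    def __next__(self) -> Record:
        if self._index >= len(self._lines):
            raise StopIteration
        record = Record(self._lines[self._index])
        self._index += 1
        return record


def iter_records(lines: list[str]) -> Iterator[Record]:
    """Generator pattern: the function body itself keeps the position."""
    for line in lines:
        yield Record(line)
```

Both forms produce the same records; the generator version simply needs less bookkeeping.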
pip install difflib-parser
from difflib_parser import difflib_parser
parser = difflib_parser.DiffParser(["hello world"], ["hello world!"])
for diff in parser.iter_diffs():
    print(diff)
from dataclasses import dataclass
from enum import Enum
from typing import List


class DiffCode(Enum):
    SAME = 0
    RIGHT_ONLY = 1
    LEFT_ONLY = 2
    CHANGED = 3


@dataclass
class Diff:
    code: DiffCode
    line: str
    left_changes: List[int] | None = None
    right_changes: List[int] | None = None
    newline: str | None = None
A `difflib` output might look something like this:
>>> import difflib
>>> print("\n".join(list(difflib.ndiff(["hello world"], ["hola world"]))))
- hello world
?  ^ ^^
+ hola world
?  ^ ^
The specifics of diff interpretation can be found in the `difflib` documentation.
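As a quick illustration of that format, each `ndiff` line starts with a one-character code followed by a space. The sketch below separates the two; `split_ndiff_line` is a hypothetical helper written for this example, not a function of this library.

```python
import difflib


def split_ndiff_line(line: str) -> tuple[str, str]:
    """Split one ndiff line into its code character and its content."""
    # The first character is the code (' ', '-', '+' or '?'),
    # the second is a separating space, and the rest is the content.
    return line[0], line[2:]


lines = list(difflib.ndiff(["hello world"], ["hola world"]))
codes = [split_ndiff_line(line)[0] for line in lines]
print(codes)  # ['-', '?', '+', '?']
```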
There are concretely four types of changes we are interested in:
- No change
- A new line is added
- An existing line is removed
- An existing line is edited
Given that the last two cases operate on existing lines, they will always be preceded by `-`. As such, we need to handle them delicately.
If an existing line is removed, it will not have any follow-up lines.
If an existing line is edited, it will have several follow-up lines that provide details on the values that have been changed.
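One way to tell the two cases apart is to look ahead at the codes that follow a `-` line. The sketch below is an assumption about how such a check could be written, not the parser's actual code; `is_edited` is a hypothetical name.

```python
import difflib


def is_edited(codes: list[str], i: int) -> bool:
    """Return True if the '-' line at index i is followed by edit details."""
    following = codes[i + 1 : i + 3]
    # '-', '?', ...  : the old line carries its own guide line
    # '-', '+', '?'  : only the new line carries a guide line
    return following[:1] == ["?"] or following == ["+", "?"]


removed = [line[0] for line in difflib.ndiff(["hello world"], [])]
edited = [line[0] for line in difflib.ndiff(["hello world"], ["hello world!"])]
print(is_edited(removed, 0), is_edited(edited, 0))  # False True
```

Note that a `-` line followed by a `+` line with no `?` line is treated here as a removal followed by a separate addition.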
From these follow-up lines, we can further classify the changes made to a line:
- Only additions made (e.g. "Hello world" -> "Hello world!")
- Only removals made (e.g. "Hello world" -> "Hllo world")
- Both additions and removals made (e.g. "Hello world" -> "Hola world!")
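The `?` follow-up lines mark the changed columns with `+`, `-`, or `^`. As a rough sketch, those markers can be turned into column indices like this. `changed_columns` is a hypothetical helper, and whether the parser fills `left_changes` and `right_changes` in exactly this way is an assumption.

```python
import difflib


def changed_columns(guide_line: str) -> list[int]:
    """Return the 0-based content columns marked in a '?' guide line."""
    # Drop the leading "? " so indices line up with the content of the
    # preceding '-' or '+' line, and ignore the trailing newline.
    markers = guide_line[2:].rstrip("\n")
    return [i for i, ch in enumerate(markers) if ch in "+-^"]


lines = list(difflib.ndiff(["hello world"], ["hello world!"]))
print(changed_columns(lines[2]))  # [11]
```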
Each of them has its own sequence of follow-up lines:
`-`, `+`, `?` (only additions made):
>>> print("\n".join(list(difflib.ndiff(["hello world"], ["hello world!"]))))
- hello world
+ hello world!
?            +
`-`, `?`, `+` (only removals made):
>>> print("\n".join(list(difflib.ndiff(["hello world"], ["hllo world"]))))
- hello world
?  -
+ hllo world
`-`, `?`, `+`, `?` (both additions and removals made):
>>> print("\n".join(list(difflib.ndiff(["hello world"], ["helo world!"]))))
- hello world
?    -
+ helo world!
?           +
As such, we have included them as separate patterns to process.
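A minimal sketch of matching these patterns is shown below. It groups raw `ndiff` lines into one chunk per logical change and is only an illustration under stated assumptions: `EDIT_PATTERNS` and `group_ndiff` are hypothetical names, and the parser's real implementation may differ.

```python
import difflib
from typing import Iterator

# Longest pattern first, so '-', '?', '+', '?' is not cut short as '-', '?', '+'.
EDIT_PATTERNS = [
    ("-", "?", "+", "?"),  # both additions and removals
    ("-", "+", "?"),       # only additions
    ("-", "?", "+"),       # only removals
]


def group_ndiff(lines: list[str]) -> Iterator[list[str]]:
    """Yield groups of ndiff lines, one group per logical change."""
    i = 0
    while i < len(lines):
        codes = tuple(line[0] for line in lines[i : i + 4])
        for pattern in EDIT_PATTERNS:
            if codes[: len(pattern)] == pattern:
                yield lines[i : i + len(pattern)]
                i += len(pattern)
                break
        else:
            # Unchanged, added, or removed line: it forms a group by itself.
            yield lines[i : i + 1]
            i += 1


for group in group_ndiff(list(difflib.ndiff(["hello world"], ["helo world!"]))):
    print([line[0] for line in group])
```

Each yielded group could then be converted into a single typed record, with the `?` lines supplying the changed positions.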