Add changelog parsing code #11

Merged on Jan 17, 2025 (36 commits)


@ericphanson (Collaborator) commented Dec 27, 2024

based on #10

closes #8

This adds code to parse changelogs into a simple in-memory representation, which can then be queried for changes.

Some choices I made, which may or may not be optimal:

  • parse into fixed concrete structs, whose properties are public API
    • wanted to keep it simple and avoid a nest of getters
  • not aiming to have roundtrippable changelogs
    • i.e., we don't preserve formatting, and anything we don't parse out is simply dropped
    • this way we have a very simple output, rather than needing to store that info
  • main struct named SimpleLog
    • in my code, it was named Changelog, but that clashes with the module name
    • I think this helps capture that it isn't a full representation, but a simplified one
  • if a version contains sections ("Added", "Breaking", etc.), then any unsectioned notes are placed under "General"
    • alternatively, we could always have both sectioned and unsectioned changes (i.e., two separate fields, sectioned_changes::OrderedDict{String,Vector{String}} and unsectioned_changes::Vector{String})
    • or, alternatively, we could have only changes::OrderedDict{String,Vector{String}} and when there are no sections, use "General"
    • not really sure which is best
  • I use CommonMark for parsing, not Markdown stdlib, since I remember a lot of weird edge cases with the stdlib
    • I based the code on MarkdownAST, so the particular reader should be easily swappable
  • I build my own tree representation off of MarkdownAST's tree, and parse that tree
    • I found it hard to work off the raw MarkdownAST tree, because one needs to keep track of which section they are "within", but being "in" a section isn't represented in the MarkdownAST child/parent tree relationship
    • this adds another abstraction layer (string -> CommonMark AST -> MarkdownAST -> MarkdownHeadingTree), and it is somewhat leaky (we drop to MarkdownAST's nodes frequently)
  • I try to support a range of heading and date formats rather than being strict
    • I would like to use this at the ecosystem level, and I think being permissive about inputs is good here
    • We could probably support quite a few more formats by extending the header regex and the dateformats
  • I checked in some large Markdown changelogs from JuMP and Documenter. The tests still run quickly, and I'd like to have some real in-the-wild test cases.
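
To illustrate the heading-tree point above: a Markdown document is a flat sequence of blocks, so "which section am I in" has to be reconstructed from heading levels. The following is only a minimal sketch of that idea, not the PR's MarkdownHeadingTree. The `Heading`, `Section`, and `build_heading_tree` names are hypothetical, and plain strings stand in for MarkdownAST nodes.

```julia
# Hypothetical stand-in for a heading node in the flat block sequence.
struct Heading
    level::Int
    text::String
end

# A heading together with everything that falls "within" it.
struct Section
    heading::Heading
    content::Vector{String}    # non-heading blocks directly under this heading
    children::Vector{Section}  # nested subsections
end
Section(h::Heading) = Section(h, String[], Section[])

# Group a flat sequence of headings and content blocks into nested sections.
function build_heading_tree(blocks)
    root = Section(Heading(0, "<root>"))  # level 0 sits above every real heading
    stack = [root]                        # path from the root to the open section
    for block in blocks
        if block isa Heading
            # A new heading closes every open section at the same or a deeper level.
            while stack[end].heading.level >= block.level
                pop!(stack)
            end
            section = Section(block)
            push!(stack[end].children, section)
            push!(stack, section)
        else
            # Anything else belongs to the innermost open section.
            push!(stack[end].content, block)
        end
    end
    return root
end

blocks = [
    Heading(1, "Changelog"),
    Heading(2, "v1.1.0"),
    "Unsectioned note",
    Heading(3, "Added"),
    "New feature",
    Heading(2, "v1.0.0"),
    "Initial release",
]
tree = build_heading_tree(blocks)
```

Here `tree.children[1]` is the "Changelog" section, and its two children are the version sections, each holding its own notes and subsections.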

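The permissive heading and date handling described above could look roughly like the sketch below. The regex, the list of date formats, and the `parse_version_heading` name are all assumptions for illustration. The patterns actually used in the PR may differ.

```julia
using Dates

# Hypothetical list of accepted date formats; extend it to accept more styles.
const DATE_FORMATS = [dateformat"yyyy-mm-dd", dateformat"U d, yyyy", dateformat"d U yyyy"]

# Try to read a version (and optionally a date) out of a heading's text.
# Returns `nothing` if the heading does not contain a version number.
function parse_version_heading(text::AbstractString)
    m = match(r"v?(\d+\.\d+\.\d+)", text)
    m === nothing && return nothing
    version = VersionNumber(m[1])
    # Whatever follows the version, minus brackets, dashes, and spaces, may be a date.
    after = text[(m.offset + ncodeunits(m.match)):end]
    rest = strip(after, ['[', ']', '(', ')', '-', ' '])
    date = nothing
    for format in DATE_FORMATS
        date = tryparse(Date, rest, format)
        date === nothing || break  # stop at the first format that matches
    end
    return (version = version, date = date)
end

bracketed = parse_version_heading("[1.2.0] - 2025-01-17")
worded = parse_version_heading("Version 1.1.0 (January 17, 2025)")
bare = parse_version_heading("v1.0.0")
other = parse_version_heading("Unreleased")
```

Keeping the formats in a list means that supporting another style is a one-line change, which fits the goal of being permissive about inputs.
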
@ericphanson (Collaborator, Author) commented

> if a version contains sections ("Added", "Breaking", etc), then any unsectioned-notes are placed under "General"
> alternatively, we could have sectioned-changes and unsectioned-changes always (i.e. two separate fields sectioned_changes::OrderedDict{String,Vector{String}} and unsectioned_changes::Vector{String}).
> or, alternatively, we could have only changes::OrderedDict{String,Vector{String}} and when there are no sections, use "General"
> not really sure which is best

after some thought, I think it's better to have two fields, toplevel_changes and sectioned_changes. Introducing an artificial General category may be confusing to users and adds complexity to the implementation.
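
As a rough sketch of the two-field layout (not the merged implementation), a per-version entry might look like this. Only the field names `toplevel_changes` and `sectioned_changes` come from the comment above. The `VersionEntry` type and the `all_changes` helper are hypothetical, and a vector of pairs stands in for an `OrderedDict` so the example needs no packages.

```julia
# Hypothetical shape for one parsed version entry.
struct VersionEntry
    version::String
    toplevel_changes::Vector{String}
    # Section name => changes, in document order (stand-in for an OrderedDict).
    sectioned_changes::Vector{Pair{String,Vector{String}}}
end

# Collect every change in a version, whether or not it sits under a section.
function all_changes(entry::VersionEntry)
    changes = copy(entry.toplevel_changes)
    for (_, section_changes) in entry.sectioned_changes
        append!(changes, section_changes)
    end
    return changes
end

mixed = VersionEntry(
    "1.1.0",
    ["Unsectioned note"],
    ["Added" => ["New feature"], "Breaking" => ["Removed old API"]],
)
flat = VersionEntry("1.0.0", ["Initial release"], Pair{String,Vector{String}}[])
```

With two fields, an entry without sections simply has an empty `sectioned_changes`, and no artificial "General" key appears in the output.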

I've also renamed SimpleLog -> SimpleChangelog. I just don't like how SimpleLog looks/sounds: it sounds less like a changelog than like some other logging thing, and I want "Changelog" in the name, since that's the name of the package. (Though it's too late to have Changelogs.jl and a Changelog struct, unfortunately.)

@ericphanson ericphanson merged commit 791e67a into JuliaDocs:master Jan 17, 2025
3 checks passed
@ericphanson ericphanson deleted the eph/parser branch January 17, 2025 00:00
Development

Successfully merging this pull request may close these issues.

scope question: add changelog parsing code?