Skip to content

Commit

Permalink
wip
Browse files Browse the repository at this point in the history
  • Loading branch information
biojppm committed Mar 28, 2024
1 parent 9e75661 commit f511768
Show file tree
Hide file tree
Showing 10 changed files with 86 additions and 70 deletions.
10 changes: 8 additions & 2 deletions .github/workflows/coverage.yml
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ env:
jobs:
#----------------------------------------------------------------------------
coverage:
name: coverage/c++${{matrix.std}}
name: coverage/c++${{matrix.std}}${{matrix.cmk}}
if: |
(!contains(github.event.head_commit.message, 'skip all')) ||
(!contains(github.event.head_commit.message, 'skip coverage')) ||
Expand All @@ -39,7 +39,13 @@ jobs:
include:
- {std: 11, cxx: g++-9, bt: Coverage, os: ubuntu-20.04}
- {std: 17, cxx: g++-9, bt: Coverage, os: ubuntu-20.04}
env: {STD: "${{matrix.std}}", CXX_: "${{matrix.cxx}}", BT: "${{matrix.bt}}", BITLINKS: "${{matrix.bitlinks}}", VG: "${{matrix.vg}}", SAN: "${{matrix.san}}", LINT: "${{matrix.lint}}", OS: "${{matrix.os}}",
# test also with the debug code enabled
- {std: 11, cxx: g++-9, bt: Coverage, os: ubuntu-20.04, cmk: "-DRYML_DBG=ON"}
- {std: 17, cxx: g++-9, bt: Coverage, os: ubuntu-20.04, cmk: "-DRYML_DBG=ON"}
env: {STD: "${{matrix.std}}", CXX_: "${{matrix.cxx}}", BT: "${{matrix.bt}}",
BITLINKS: "${{matrix.bitlinks}}", VG: "${{matrix.vg}}", SAN: "${{matrix.san}}",
LINT: "${{matrix.lint}}", OS: "${{matrix.os}}",
CMAKE_FLAGS: "${{matrix.cmk}}",
CODECOV_TOKEN: "${{secrets.CODECOV_TOKEN}}",
COVERALLS_REPO_TOKEN: "${{secrets.COVERALLS_REPO_TOKEN}}",
# coveralls disabled: https://github.com/lemurheavy/coveralls-public/issues/1665
Expand Down
9 changes: 7 additions & 2 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -71,8 +71,6 @@ c4_add_library(ryml
INCORPORATE c4core
)

set_property(TARGET ryml PROPERTY CXX_STANDARD 17) # TO BE REMOVED!!!!!!!

if(RYML_WITH_TAB_TOKENS)
target_compile_definitions(ryml PUBLIC RYML_WITH_TAB_TOKENS)
endif()
Expand All @@ -85,6 +83,13 @@ if(RYML_DBG)
target_compile_definitions(ryml PRIVATE RYML_DBG)
endif()

if(CMAKE_COMPILER_IS_GNUCXX)
option(RYML_FANALYZER "Compile with -fanalyzer https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Static-Analyzer-Options.html" OFF)
if(RYML_FANALYZER)
target_compile_options(ryml PUBLIC -fanalyzer)
endif()
endif()


#-------------------------------------------------------

Expand Down
37 changes: 26 additions & 11 deletions changelog/current.md
Original file line number Diff line number Diff line change
@@ -1,27 +1,42 @@
### Parser refactor

The parser was completely refactored. This was a large and hard job carried out over several months, but the result is:
The parser was completely refactored ([#PR414](https://github.com/biojppm/rapidyaml/pull/414)). This was a large and hard job carried out over several months, and the result is:

- A new event-based parser engine was added, enabling the improvements described below. This engine uses a templated event handler. The parsing code was fully rewritten, and is now much more simple, and much easier to work with and fix.
- YAML standard-conformance was improved significantly. Along with many other smaller fixes (too many to list here), the main changes are the following:
- A new event-based parser engine is now in place, enabling the improvements described below. This engine uses a templated event handler. The parsing code was fully rewritten, and is now much more simple (albeit longer), and much easier to work with and fix.
- YAML standard-conformance was improved significantly. Along with many smaller fixes and additions, (too many to list here), the main changes are the following:
- The parser engine can now successfully parse container keys, **but** as before, the ryml tree cannot accomodate these.
- Anchor keys can now be terminated with colon (eg, `&anchor: key: val`).
- The parser engine can be used to create native trees in other programming languages, or in cases where the user *must* have container keys.
- The parser engine can now be used to create native trees in other programming languages, or in cases where the user *must* have container keys.
- A strict JSON parser was added. Use the `parse_json_...()` family of functions to parse json in stricter mode than flow-style YAML.
- The YAML style information is now fully preserved through parsing/emitting round trips. This was made possible because the event model of the parsing engine now incorporates style varieties. So, for example:
- a scalar parsed from a plain/single-quoted/double-quoted/block-literal/block-folded scalar will be emitted always using its original style in the YAML source
- a container parsed in block-style will always be emitted in block-style
- a container parsed in flow-style will always be emitted in flow-style
Because of this, the style of YAML emitted by ryml will now change, if that YAML was parsed through ryml.
Because of this, the style of YAML emitted by ryml changes with from before, if that YAML was parsed through ryml.
- Scalar filtering was improved and is now done in place, except where the scalar expands and does not fit its initial range, in which case the scalar is filtered out of place to the tree's arena.
- The parser now offers methods to filter scalars in place or out of place. Filtering can be disabled while parsing, to ensure a fully-readonly parse, but this feature is still experimental and untested, given the scope of the rewrite work.
- The parser now offers methods to filter scalars in place or out of place. Filtering can be disabled while parsing, to ensure a fully-readonly parse (but this feature is still experimental and untested, given the scope of the rewrite work).
- Parsing performance improved (benchmark results incoming) from reduced parser branching.
- Emitting performance improved (benchmark results incoming), as the emitting code no longer has to read the full scalars to decide on an appropriate emit style.
- Emitting helper predicates were added:
```
/** choose an emitting style based on the scalar's contents */
type_bits scalar_style_choose(csubstr scalar) noexcept;
/** query whether a scalar can be encoded using single quotes. This is
* possible, notably when there is leading whitespace after a newline. */
bool scalar_style_query_squo(csubstr s) noexcept;
/** query whether a scalar can be encoded using plain style (no
* quotes, not a literal/folded block scalar). */
bool scalar_style_query_plain(csubstr s) noexcept;
```


As a result of the refactor, there are several changes with potential impact in client code:
- The existing parser methods were all removed. Use the corresponding `c4::yml::parse_...(Parser*, ...)` function from the header `c4/yml/parse.hpp`.
- `NodeType` declaration went to a separate header file `c4/yml/node_type.hpp` (previously it was in `c4/yml/tree.hpp`).
- Some of the node type flags were removed, and some other flags were added. Caveat emptor.
- Notably `KEYQUO` and `VALQUO` are now masks of the several quoted style flags, and saw their meaning adjusted accordingly.
As a result of the refactor, there are some limited changes with impact in client code. Even though this was a large refactor, effort was directed at keeping maximal backwards compatibility, and the changes are not wide. But they still exist:
- The existing `parse_...()` methods in the `Parser` class were all removed. Use the corresponding `parse_...(Parser*, ...)` function from the header [`c4/yml/parse.hpp`](https://github.com/biojppm/master/src/c4/yml/parse.hpp).
- When instantiated by the user, the parser now needs to receive a `EventHandlerTree` object, which is responsible for building the tree. Although fully functional and tested, the structure of this class is still somewhat experimental and is still likely to change. There is an alternative event handler implementation responsible for producing the events for the YAML test suite in `test/test_suite/test_suite_event_handler.hpp`.
- The declaration and definition of `NodeType` was moved to a separate header file `c4/yml/node_type.hpp` (previously it was in `c4/yml/tree.hpp`).
- Some of the node type flags were removed, and several flags (and combination flags) were added.
- Most of the existing flags are kept, as well as their meaning.
- `KEYQUO` and `VALQUO` are now masks of the several style flags for quoted scalars. In general, however, client code using these flags and `.is_val_quoted()` or `.is_key_quoted()` is not likely to require any changes.


### Fixes
Expand Down
2 changes: 1 addition & 1 deletion ext/c4core
1 change: 1 addition & 0 deletions src/c4/yml/node_type.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,7 @@ RYML_EXPORT type_bits scalar_style_choose(csubstr scalar) noexcept;
/** query whether a scalar can be encoded using single quotes. This is
* possible, notably when there is leading whitespace after a newline. */
RYML_EXPORT bool scalar_style_query_squo(csubstr s) noexcept;

/** query whether a scalar can be encoded using plain style (no
* quotes, not a literal/folded block scalar). */
RYML_EXPORT bool scalar_style_query_plain(csubstr s) noexcept;
Expand Down
46 changes: 10 additions & 36 deletions src/c4/yml/parse_engine.def.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -2874,42 +2874,24 @@ void ParseEngine<EventHandler>::_filter_block_folded_newlines(FilterProcessor &C
_RYML_CB_ASSERT(this->callbacks(), proc.curr() == '\n');
size_t num_newl = 0;
size_t wpos_at_first_newl = npos;
#if 1
static uint8_t trace[64];
static uint32_t tracewpos[C4_COUNTOF(trace)];
uint32_t tracepos = 0;
#define _trace(id) do { RYML_ASSERT(tracepos < sizeof(trace)); trace[tracepos] = id; tracewpos[tracepos] = (uint32_t)proc.wpos; ++tracepos; } while(0)
#define _traceshow() do { for(uint32_t i = 0; i < tracepos; ++i) printf("trace[%u/%u]=%u @%u\n", (uint32_t)i, (uint32_t)tracepos, (uint32_t)trace[i], (uint32_t)tracewpos[i]); } while(0)
#else
#define _trace(id)
#define _traceshow()
#endif
_trace(0);
while(proc.has_more_chars(len))
{
_trace(1);
const char curr = proc.curr();
//_c4dbgfbf("'{}' sofar=[{}]~~~{}~~~", _c4prc(curr), proc.wpos, proc.sofar());
_c4dbgfbf("'{}' sofar=[{}]~~~{}~~~", _c4prc(curr), proc.wpos, proc.sofar());
switch(curr)
{
case '\n':
_trace(2);
//_c4dbgfbf("newline. sofar={}", num_newl);
_c4dbgfbf("newline. sofar={}", num_newl);
switch(++num_newl)
{
case 1u:
_trace(21);
//_c4dbgfbf("... this is the first newline. turn into space. wpos={}", proc.wpos);
_c4dbgfbf("... this is the first newline. turn into space. wpos={}", proc.wpos);
wpos_at_first_newl = proc.wpos;
_trace(210);
proc.skip();
_trace(211);
proc.set(' ');
_trace(212);
break;
case 2u:
_trace(22);
//_c4dbgfbf("... this is the second newline. prev space (at wpos={}) must be newline", wpos_at_first_newl);
_c4dbgfbf("... this is the second newline. prev space (at wpos={}) must be newline", wpos_at_first_newl);
_RYML_CB_ASSERT(this->callbacks(), wpos_at_first_newl != npos);
_RYML_CB_ASSERT(this->callbacks(), proc.sofar()[wpos_at_first_newl] == ' ');
_RYML_CB_ASSERT(this->callbacks(), wpos_at_first_newl + 1u == proc.wpos);
Expand All @@ -2918,31 +2900,28 @@ _trace(22);
_RYML_CB_ASSERT(this->callbacks(), proc.sofar()[wpos_at_first_newl] == '\n');
break;
default:
_trace(23);
//_c4dbgfbf("... subsequent newline (num_newl={}). copy", num_newl);
_c4dbgfbf("... subsequent newline (num_newl={}). copy", num_newl);
proc.copy();
break;
}
_trace(24);
_filter_block_indentation(proc, indentation);
break;
case ' ':
case '\t':
{
_trace(3);
size_t first = proc.rem().first_not_of(" \t");
//_c4dbgfbf("space. first={}", first);
_c4dbgfbf("space. first={}", first);
if(first == npos)
first = proc.rem().len;
//_c4dbgfbf("... indentation increased to {}", first);
_c4dbgfbf("... indentation increased to {}", first);
if(num_newl)
{
//_c4dbgfbf("... prev space (at wpos={}) must be newline", wpos_at_first_newl);
_c4dbgfbf("... prev space (at wpos={}) must be newline", wpos_at_first_newl);
proc.set_at(wpos_at_first_newl, '\n');
}
if(num_newl > 1u)
{
//_c4dbgfbf("... add missing newline", wpos_at_first_newl);
_c4dbgfbf("... add missing newline", wpos_at_first_newl);
proc.set('\n');
}
_filter_block_folded_indented_block(proc, indentation, len, first);
Expand All @@ -2951,18 +2930,13 @@ _trace(3);
break;
}
case '\r':
_trace(4);
proc.skip();
break;
default:
_trace(5);
_traceshow();
//_c4dbgfbf("not space, not newline. stop.", 0);
_c4dbgfbf("not space, not newline. stop.", 0);
return;
}
_trace(6);
}
_traceshow();
}


Expand Down
27 changes: 18 additions & 9 deletions src/c4/yml/tree.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -402,16 +402,22 @@ size_t Tree::_claim()
C4_SUPPRESS_WARNING_GCC_PUSH
C4_SUPPRESS_WARNING_CLANG_PUSH
C4_SUPPRESS_WARNING_CLANG("-Wnull-dereference")
#if defined(__GNUC__) && (__GNUC__ >= 6)
#if defined(__GNUC__)
#if (__GNUC__ >= 6)
C4_SUPPRESS_WARNING_GCC("-Wnull-dereference")
#endif
#if (__GNUC__ > 9)
C4_SUPPRESS_WARNING_GCC("-Wanalyzer-fd-leak")
#endif
#endif

void Tree::_set_hierarchy(size_t ichild, size_t iparent, size_t iprev_sibling)
{
_RYML_CB_ASSERT(m_callbacks, ichild >= 0 && ichild < m_cap);
_RYML_CB_ASSERT(m_callbacks, iparent == NONE || (iparent >= 0 && iparent < m_cap));
_RYML_CB_ASSERT(m_callbacks, iprev_sibling == NONE || (iprev_sibling >= 0 && iprev_sibling < m_cap));

NodeData *C4_RESTRICT child = get(ichild);
NodeData *C4_RESTRICT child = _p(ichild);

child->m_parent = iparent;
child->m_prev_sibling = NONE;
Expand Down Expand Up @@ -501,13 +507,6 @@ void Tree::_rem_hierarchy(size_t i)
}
}

//-----------------------------------------------------------------------------
void Tree::reorder()
{
size_t r = root_id();
_do_reorder(&r, 0);
}

//-----------------------------------------------------------------------------
size_t Tree::_do_reorder(size_t *node, size_t count)
{
Expand All @@ -529,6 +528,13 @@ size_t Tree::_do_reorder(size_t *node, size_t count)
return count;
}

void Tree::reorder()

Check warning on line 531 in src/c4/yml/tree.cpp

View check run for this annotation

Codecov / codecov/patch

src/c4/yml/tree.cpp#L531

Added line #L531 was not covered by tests
{
size_t r = root_id();
_do_reorder(&r, 0);

Check warning on line 534 in src/c4/yml/tree.cpp

View check run for this annotation

Codecov / codecov/patch

src/c4/yml/tree.cpp#L533-L534

Added lines #L533 - L534 were not covered by tests
}


//-----------------------------------------------------------------------------
void Tree::_swap(size_t n_, size_t m_)
{
Expand Down Expand Up @@ -1151,6 +1157,9 @@ size_t Tree::child_pos(size_t node, size_t ch) const
# if __GNUC__ >= 6
# pragma GCC diagnostic ignored "-Wnull-dereference"
# endif
# if __GNUC__ > 9
# pragma GCC diagnostic ignored "-Wanalyzer-null-dereference"
# endif
#endif

size_t Tree::find_child(size_t node, csubstr const& name) const
Expand Down
5 changes: 5 additions & 0 deletions test/test_lib/test_case_node.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -229,10 +229,15 @@ struct TestCaseNode

void _set_parent()
{
C4_SUPPRESS_WARNING_GCC_PUSH
#if defined(__GNUC__) && __GNUC__ > 9
C4_SUPPRESS_WARNING_GCC("-Wanalyzer-possible-null-dereference")
#endif
for(auto &ch : children)
{
ch.parent = this;
}
C4_SUPPRESS_WARNING_GCC_POP
}

NodeType_e _guess() const;
Expand Down
15 changes: 8 additions & 7 deletions test/test_suite/test_suite_event_handler.hpp
Original file line number Diff line number Diff line change
@@ -1,27 +1,28 @@
#ifndef _C4_YML_EVENT_HANDLER_YAMLSTD_HPP_
#define _C4_YML_EVENT_HANDLER_YAMLSTD_HPP_

#ifdef RYML_SINGLE_HEADER
#include <ryml_all.hpp>
#else
#ifndef _C4_YML_DETAIL_STACK_HPP_
#include "c4/yml/detail/stack.hpp"
#endif

#ifndef _C4_YML_DETAIL_PARSER_DBG_HPP_
#include "c4/yml/detail/parser_dbg.hpp"
#endif

#ifndef _C4_YML_PARSER_STATE_HPP_
#include "c4/yml/parser_state.hpp"
#endif

#ifndef _C4_YML_TREE_HPP_
#include "c4/yml/tree.hpp"
#endif


#include <vector>
#include <string>
#ifndef _C4_YML_STD_STRING_HPP_
#include "c4/yml/std/string.hpp"
#endif
#include "c4/yml/detail/print.hpp"
#endif
#include <vector>


namespace c4 {
namespace yml {
Expand Down
4 changes: 2 additions & 2 deletions test/test_suite/test_suite_events_emitter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -258,7 +258,7 @@ void EventsEmitter::emit_events(size_t node)
}
else if(m_tree->is_map(node))
{
pr((m_tree->type(node) & _WIP_CONTAINER_STYLE_FLOW) ? "+MAP {}" : "+MAP");
pr((m_tree->type(node) & _WIP_CONTAINER_STYLE_FLOW) ? csubstr("+MAP {}") : csubstr("+MAP"));
emit_val_anchor_tag(node);
pr('\n');
for(size_t child = m_tree->first_child(node); child != NONE; child = m_tree->next_sibling(child))
Expand All @@ -267,7 +267,7 @@ void EventsEmitter::emit_events(size_t node)
}
else if(m_tree->is_seq(node))
{
pr((m_tree->type(node) & _WIP_CONTAINER_STYLE_FLOW) ? "+SEQ []" : "+SEQ");
pr((m_tree->type(node) & _WIP_CONTAINER_STYLE_FLOW) ? csubstr("+SEQ []") : csubstr("+SEQ"));
emit_val_anchor_tag(node);
pr('\n');
for(size_t child = m_tree->first_child(node); child != NONE; child = m_tree->next_sibling(child))
Expand Down

0 comments on commit f511768

Please sign in to comment.