-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Expose non-io_buf API for fb_parser #4683
feat: Expose non-io_buf API for fb_parser #4683
Conversation
* also enables using the fb_parser_test prototype types in downstream projects
* Fix logic issues with detecting presence/absence of feature names and hashes in example data, leading to an access violation crash * Fix issues with properly parsing ExampleCollection buffers, around reporting doneness and advancing when parsing inner MultiExample objects. * Fix tests to include coverage for incoming flatbuffers with/without names/hashes * Fixes validation logic to avoid crashing when feature names/hashes are missing * Add read_span_flatbuffer tests
Parser logic needs to use VW::finish_example() rather than simple deletion to get rid of examples because the incoming example_factory may be grabbing examples from the example pool.
@@ -420,7 +420,7 @@ if(VW_FEAT_CSV) | |||
endif() | |||
|
|||
if(VW_FEAT_FLATBUFFERS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time to turn this on by default?
if (_active_multi_ex) { RETURN_IF_FAIL(parse_multi_example(all, examples[0], _multi_example_object, status)); } | ||
else if (_active_collection) { RETURN_IF_FAIL(process_collection_item(all, examples, status)); } | ||
#define RETURN_SUCCESS_FINISHED() \ | ||
return buffer_pointer ? VW::experimental::error_code::nothing_to_parse : VW::experimental::error_code::success; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time to remove experimental? It's been long enough.
bool features_have_names(const Namespace& ns) | ||
{ | ||
return flatbuffers::IsFieldPresent(&ns, Namespace::VT_FEATURE_NAMES) && (ns.feature_names()->size() != 0); | ||
// TODO: It is not clear what the right answer is when feature_values->size is 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the name collection is present at all and greater than 0 in length, it should match the value collection lenght for things to be rational
@@ -342,23 +445,13 @@ int parser::parse_namespaces(VW::workspace* all, example* ae, const Namespace* n | |||
// assuming the usecase that if feature names would exist, they would exist for all features in the namespace | |||
if (ns->feature_names()->size() != ns->feature_values()->size()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implicit assumption is that there are either hashes or names and that that must match the values collection in size. An explicit comment here would be helpful for future us :).
|
||
auto feature_value_iter = (ns->feature_values())->begin(); | ||
const auto feature_value_iter_end = (ns->feature_values())->end(); | ||
|
||
bool has_hashes = features_have_hashes(*ns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is much easier to read and reason about. Thanks!
int parser::parse_namespaces(VW::workspace* all, example* ae, const Namespace* ns, VW::experimental::api_status* status) | ||
{ | ||
#define RETURN_NS_PARSER_ERROR(status, error_code) \ | ||
if (_active_collection && _active_multi_ex) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand what is happening in this state. A comment would be helpful.
// thus context.size() = sizeof(length) + length | ||
io_buf unused; | ||
|
||
// TODO: How do we report errors out of here? (This is a general API problem with the parsers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. This is not great. Does this introduce exceptions to clientlib during parsing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless given that we should get this feature out, probably ok to live with this and improve later if needed.
namespace parsers | ||
{ | ||
namespace flatbuffer | ||
{ | ||
int flatbuffer_to_examples(VW::workspace* all, io_buf& buf, VW::multi_ex& examples); | ||
bool read_span_flatbuffer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the new interface for clientlib?
Looks good to me. Added some suggestions for comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. (Left some suggestions for comments.)
This API will be used by consumers like RLClientLib to enable inputting FlatBuffer input.