diff --git a/designs/7.0/parser-api.md b/designs/7.0/parser-api.md
index fc0c1740..c273ab8e 100644
--- a/designs/7.0/parser-api.md
+++ b/designs/7.0/parser-api.md
@@ -172,6 +172,8 @@ Parser objects also implement `IServiceProvider`, to allow them to expose arbitr
 
 > **Warning** The parser API is not exception-safe. If the `IParser.Run` method throws an exception, the `ParserState` held by the `ParserInputReader` will be left in an undefined state. The caller must ensure that the state is not used again afterwards.
 
+> **Note** For compatibility with the streaming input parsing API described below, a parser should consume as much input as it can at each invocation, and invoking the parser with an input reader whose `IsFinalBlock` property is set to `true` should complete the operation by setting a result on the `ParserCompletionState` parameter. If a parser supports scenarios such as parsing one token at a time, that support should be gated behind an option that is disabled by default.
+
 ### Parsing streaming input
 
 The `IParser` interfaces themselves support parsing streaming input but the responsibility to manage the input buffers falls to the user. To make this easier, Farkle provides the `ParserStateContext` classes that greatly simplify parsing streaming input.
diff --git a/designs/7.0/tokenizer-api.md b/designs/7.0/tokenizer-api.md
index 6b3e07a7..99dba0c7 100644
--- a/designs/7.0/tokenizer-api.md
+++ b/designs/7.0/tokenizer-api.md
@@ -105,9 +105,11 @@ public static class TokenizerExtensions
 }
 ```
 
-The `SuspendTokenizer` extension methods implement the tokenizer suspension mechanism described above. A tokenizer that needs more input can call `inputReader.SuspendTokenizer(this)` before returning, and when parsing resumes the chain will continue from that tokenizer. The exact tokenizer instance passed to `SuspendTokenizer` does not matter; Farkle keeps track of the index of the running tokenizer in the chain.
+There are two use cases for suspension.
+A tokenizer that needs more input can call `SuspendTokenizer` in `TryGetNextToken` and return `false`; when parsing resumes, the chain will continue from that tokenizer.
 
-Besides a tokenizer, we can resume the tokenization process to an object of type `ITokenizerResumptionPoint`. This interface provides a very similar API to `Tokenizer`, but also accepts an argument of type `TArg`, giving some more flexibility to tokenizer authors. A tokenizer can implement this interface many times with different types for `TArg` to support different resumption points. Here's an example:
+Alternatively, a tokenizer that has found a token but wants to keep looking for potentially another token can call `SuspendTokenizer` in `TryGetNextToken` and return `true`. Either way, the tokenizer chain does not continue past a tokenizer that finds a token, but when the chain is invoked again it will not start over.
+
+The arguments to the `SuspendTokenizer` methods determine where the chain will continue from. Besides a `Tokenizer`, we can resume the tokenization process to an object of type `ITokenizerResumptionPoint`. This interface provides a very similar API to `Tokenizer`, but also accepts an argument of type `TArg`, giving some more flexibility to tokenizer authors. A tokenizer can implement this interface many times with different types for `TArg` to support different resumption points. Here's an example:
 
 ```csharp
 public class MyTokenizer : Tokenizer, ITokenizerResumptionPoint,
@@ -153,9 +155,11 @@ public class MyTokenizer : Tokenizer, ITokenizerResumptionPoint
.WithTokenizer` is provided a `Tokenizer` that is not a chained one, it will automatically be wrapped in one. This will introduce one extra layer of indirection but will ensure that suspension always works. We could introduce an API to allow tokenizers to declare that they will never suspend and thus don't have to be wrapped (suspending them will have no effect).
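
> *Editor's note:* to make the two suspension use cases concrete, here is an illustrative sketch of a tokenizer that uses both. It is written against the API shapes named in this document (`Tokenizer`, `TryGetNextToken`, `SuspendTokenizer`, `ITokenizerResumptionPoint`); the exact signatures, the `TokenizerResult` type, and the `int` resumption argument are assumptions, so treat this as pseudocode rather than compilable Farkle code:

```csharp
// Hypothetical sketch; signatures are assumed from the names in this design
// document and will not necessarily match the final Farkle 7 API.
public class MyStringTokenizer : Tokenizer, ITokenizerResumptionPoint<int>
{
    public override bool TryGetNextToken(ref ParserInputReader input, out TokenizerResult result)
    {
        if (/* the buffered input ends in the middle of a token */)
        {
            // Use case 1: more input is needed. Suspend and return false;
            // when more input arrives, the chain resumes here instead of
            // starting over, and the int argument (characters consumed so
            // far) is handed to the ITokenizerResumptionPoint<int> method.
            input.SuspendTokenizer(this, charactersConsumed);
            result = default;
            return false;
        }

        // Use case 2: a token was found but another one might follow.
        // Suspend and return true; the next invocation of the chain will
        // start from this tokenizer instead of from the top.
        input.SuspendTokenizer(this);
        result = /* the token that was found */;
        return true;
    }

    // Called when the chain resumes at this resumption point.
    bool ITokenizerResumptionPoint<int>.TryGetNextToken(
        ref ParserInputReader input, int charactersConsumed, out TokenizerResult result)
    {
        /* continue scanning from where we stopped */
    }
}
```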
@@ -180,4 +184,4 @@ The advantage of this approach is that we can compose tokenizers in arbitrary wa
 
 ### More complex chaining
 
-The "flat" chaining model described above is quite primitive. There were thoughts to support more complex chains, with components that act as "filters" where they can inspect and potentially change the result of a part of the chain. One use case would be to handle tokenizer failures, but the whole feature needs quite some thought and was postponed for a version after 7.0.
+The "flat" chaining model described above is quite primitive. There were thoughts to support more complex chains, with components that act as "filters" and can inspect and potentially change the result of a part of the chain. One use case would be to handle tokenizer failures, but the whole feature needs more thought and was postponed for a version after 7.0.
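
> *Editor's note:* to make the postponed "filter" idea slightly more concrete, one purely hypothetical shape for such a component might be the following. Nothing here exists in Farkle; all names and signatures are invented for illustration:

```csharp
// Purely hypothetical; invented to illustrate the postponed feature.
public interface ITokenizerFilter
{
    // Runs the wrapped part of the chain through `next`, then may inspect
    // and potentially replace its result (e.g. to recover from a failure).
    bool TryGetNextToken(ref ParserInputReader input, Tokenizer next, out TokenizerResult result);
}
```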