
Manifest placeholder escape sequence support #282

Open · wants to merge 8 commits into main
Conversation

@John-Leitch commented Feb 5, 2025

This PR adds proper placeholder escape sequence support to the manifest processing logic, allowing curly braces to be escaped by doubling, e.g. {{ or }}. The lack of support in the previous implementation was especially noticeable with .NET format string placeholders such as {0}, which triggered unhandled exceptions during generation.

While the Microsoft documentation makes no mention of this form of escaping, this sequence is supported by Microsoft in the Azure Developer CLI (azd).

Care was taken to ensure backwards compatibility with the unresolved placeholder behaviors of the previous implementation, while also bringing the implementation closer to parity with Microsoft's. Several unit tests were added to cover all known edge cases in reference tokenization and unescaping.
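For readers unfamiliar with the convention, the doubling escape mirrors .NET composite formatting, where doubled braces render as literal braces. A minimal standalone illustration (not code from this PR):

```csharp
// Doubled braces are emitted literally; single braces delimit placeholders.
var formatted = string.Format("escaped: {{0}}, resolved: {0}", 42);
// formatted == "escaped: {0}, resolved: 42"
```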

@John-Leitch John-Leitch marked this pull request as ready for review February 6, 2025 06:15
qodo-merge-pro bot commented Feb 6, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Error Handling

The tokenizer and unescape logic may not handle all edge cases properly. For example, nested placeholders or malformed escape sequences could cause unexpected behavior.

public static List<JsonInterpolationToken> Tokenize(ReadOnlySpan<char> input)
{
    var state = TokenizerState.InText;
    var currentTokenIndex = 0;

    // Estimate the number of tokens.
    var tokens = new List<JsonInterpolationToken>(input.Length / 2);

    void TryAddToken(
        ref ReadOnlySpan<char> input,
        JsonInterpolationTokenType tokenType,
        int length,
        int nextTokenIndex)
    {
        if (length != 0 || tokenType != JsonInterpolationTokenType.Text)
        {
            tokens.Add(
                new JsonInterpolationToken(
                    tokenType,
                    input.Slice(currentTokenIndex, length).ToString()));
        }

        currentTokenIndex = nextTokenIndex;
    }

    for (var i = 0; i < input.Length; i++)
    {
        var currentChar = input[i];

        switch (state)
        {
            case TokenizerState.InText:

                if (currentChar == '{')
                {
                    // We are in a potential placeholder token start.
                    state = TokenizerState.InPlaceholderStart;
                }

                break;

            case TokenizerState.InPlaceholderStart:

                if (currentChar == '{')
                {
                    // This curly brace is escaped, we're still in text.
                    state = TokenizerState.InText;
                }
                else
                {
                    // We are in a placeholder token, slice the previous text.
                    TryAddToken(
                        ref input,
                        JsonInterpolationTokenType.Text,
                        i - currentTokenIndex - 1,
                        i);

                    // Advance to the in-placeholder token state.
                    state = TokenizerState.InPlaceholder;
                }

                break;

            case TokenizerState.InPlaceholder:

                // We are going for parity with the regular expression used in the
                // previous implementation: ([\w\.-]+)
                if (currentChar is
                    (>= 'a' and <= 'z') or
                    (>= 'A' and <= 'Z') or
                    (>= '0' and <= '9') or
                    '.' or
                    '-')
                {
                    // We are still in the placeholder token.
                    continue;
                }
                else if (currentChar == '}')
                {
                    // Our placeholder token is complete.
                    TryAddToken(
                        ref input,
                        JsonInterpolationTokenType.Placeholder,
                        i - currentTokenIndex,
                        i + 1);

                    // The next token is probably text.
                    state = TokenizerState.InText;
                }
                else
                {
                    // Our placeholder token has unsupported characters. As per
                    // the original implementation, we are going to treat the
                    // current lexeme as text. Step back one character so the
                    // opening brace is included in the text token.
                    currentTokenIndex--;
                    state = TokenizerState.InText;
                }

                break;

            default:
                throw new NotSupportedException();
        }
    }

    if (currentTokenIndex != input.Length)
    {
        // There is a dangling token. To ensure parity with the original
        // implementation, we're going to treat it as text.

        if (state == TokenizerState.InPlaceholder)
        {
            // We were actually in a placeholder token, so we need to move
            // back a single character to ensure we catch the opening brace.
            currentTokenIndex--;
        }

        TryAddToken(
            ref input,
            JsonInterpolationTokenType.Text,
            input.Length - currentTokenIndex,
            0);
    }

    return tokens;
}
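To make the token stream concrete, here is a hypothetical call to the method above (expected lexemes shown as comments; these are not actual tests from the PR):

```csharp
var tokens = JsonInterpolation.Tokenize("Hello {name}, welcome");
// Expected tokens:
//   Text        "Hello "
//   Placeholder "name"
//   Text        ", welcome"

// Doubled braces remain in the text token and are handled later by unescaping:
var escaped = JsonInterpolation.Tokenize("{{literal}}");
// Expected: a single Text token with lexeme "{{literal}}"
```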
Performance

The string builder allocation and multiple string operations in the placeholder replacement logic could be optimized for better performance with large JSON documents.

var tokens = JsonInterpolation.Tokenize(input);
var transformedInput = new StringBuilder();

foreach (var token in tokens)
{
    if (token.IsText())
    {
        transformedInput.Append(token.Lexeme);
    }
    else
    {
        var jsonPath = token.Lexeme;
        var pathParts = jsonPath.Split('.');

        void AppendPlaceholderTokenAsText()
        {
            transformedInput.Append('{');
            transformedInput.Append(jsonPath);
            transformedInput.Append('}');
        }

        if (pathParts.Length == 1)
        {
            var resolvedNode = rootNode[pathParts[0]];

            if (resolvedNode != null)
            {
                transformedInput.Append(resolvedNode.ToString());
            }
            else
            {
                AppendPlaceholderTokenAsText();
            }

            continue;
        }
        else if (pathParts is [_, Literals.Bindings, ..])
        {
            transformedInput.Append(bindingProcessor.ParseBinding(pathParts, rootNode));
            continue;
        }

        var selectionPath = pathParts.AsJsonPath();
        var path = JsonPath.Parse(selectionPath);
        var result = path.Evaluate(rootNode);

        if (result.Matches.Count == 0)
        {
            AppendPlaceholderTokenAsText();
            continue;
        }

        var value = result.Matches.FirstOrDefault()?.Value;

        if (value is null)
        {
            AppendPlaceholderTokenAsText();
            continue;
        }

        transformedInput.Append(value.ToString());
    }
}
qodo-merge-pro bot commented Feb 6, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Add null parameter validation

Add null checks for input parameters in the Tokenize and Unescape methods to
prevent potential NullReferenceExceptions.

src/Aspirate.Processors/Transformation/Json/JsonInterpolation.cs [54-56]

-public static List<JsonInterpolationToken> Tokenize(string input) => Tokenize(input.AsSpan());
+public static List<JsonInterpolationToken> Tokenize(string input) => 
+    input is null ? new List<JsonInterpolationToken>() : Tokenize(input.AsSpan());
 
-public static string Unescape(string value)
+public static string Unescape(string value) => 
+    value is null ? string.Empty : UnescapeImpl(value);
Suggestion importance[1-10]: 7


Why: Adding null checks for input parameters is important for preventing NullReferenceExceptions and making the code more robust. This is a defensive programming practice that improves code reliability.

Impact: Medium
Add thread safety protection

Add thread safety protection when processing JsonArray and JsonObject elements
since ToArray() creates a copy but subsequent modifications could cause race
conditions.

src/Aspirate.Processors/Transformation/Json/JsonInterpolationUnescapeProcessor.cs [38-41]

-foreach (var kvp in jsonObject.ToArray())
+lock(jsonObject)
 {
-    UnescapeJsonExpression(kvp.Value);
+    foreach (var kvp in jsonObject.ToArray())
+    {
+        UnescapeJsonExpression(kvp.Value);
+    }
 }
Suggestion importance[1-10]: 3


Why: While thread safety is generally good practice, the context doesn't suggest this code needs to be thread-safe. The lock would add unnecessary overhead since JsonNode operations are typically not designed for concurrent modifications.

Impact: Low

@John-Leitch (Author)

Hey @prom3theu5, sorry to bother, but this one and #277 are blocking a project I'm on. Is there anything I can do to help move these forward?
