Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feedback] JSQLParser Version : 4.5, CCJSqlParserUtil.parseStatement(parser) method execution blocks Vert.x event loop #1981

Closed
singla-11 opened this issue Mar 27, 2024 · 20 comments

Comments

@singla-11
Copy link

singla-11 commented Mar 27, 2024

Recently we came across a bug where CCJSqlParserUtil.parse(query) returned null causing NullPointerException. Upon debugging, it was observed that query in question was a complex query and had nesting level > 10. We resolved by using below API instead.

  static Statement parseQuery(String query) throws JSQLParserException {
    CCJSqlParser parser =
        CCJSqlParserUtil.newParser(query).withAllowComplexParsing(true).withTimeOut(1000);
    return CCJSqlParserUtil.parseStatement(parser);
  }

I decided to contribute by throwing JSQLParserException with detail error message instead of returning null but ends up finding out that it's already taken care of in the latest version. During this process, it was revealed that parsing happens on separate thread created by ExecutorService.

    public static Statement parseStatement(CCJSqlParser parser) throws JSQLParserException {
        Statement statement = null;
        try {
            ExecutorService executorService = Executors.newSingleThreadExecutor();
            Future<Statement> future = executorService.submit(new Callable<Statement>() {
                @Override
                public Statement call() throws Exception {
                    return parser.Statement();
                }
            });
            executorService.shutdown();

            statement = future.get( parser.getConfiguration().getAsInteger(Feature.timeOut), TimeUnit.MILLISECONDS);
        } catch (TimeoutException ex) {
            parser.interrupted = true;
            throw new JSQLParserException("Time out occurred.", ex);
        } catch (Exception ex) {
            throw new JSQLParserException(ex);
        }
        return statement;
    }

Potential Bug

We use this library in our Vert.x project where CCJSqlParserUtil.parseStatement(parser) method is invoked on event loop. My question is why do we use ExecutorService to parse a query? Why can't the execution takes place on caller thread? Caller thread is anyhow blocked by Future.get() and there is an overhead involved of maintaining life cycle of ExecutorService and new spawned thread.

Even if there is a valid reason to use ExecutorService, why does API encapsulates this vital information instead of revealing it by returning Future<Statement> instead of Statement and client can choose to handle this in a way suited for application framework. For Vert.x projects it could cause huge performance downgrade as event loop will be blocked until Future execution completes.

Software Information:

  • JSqlParser version: 4.5
@singla-11 singla-11 changed the title [BUG] JSQLParser Version : 4.5, CCJSqlParserUtil.parseStatement(parser) method execution blocks Vert.x event loop [Feedback] JSQLParser Version : 4.5, CCJSqlParserUtil.parseStatement(parser) method execution blocks Vert.x event loop Mar 27, 2024
@singla-11 singla-11 changed the title [Feedback] JSQLParser Version : 4.5, CCJSqlParserUtil.parseStatement(parser) method execution blocks Vert.x event loop [Bug] JSQLParser Version : 4.5, CCJSqlParserUtil.parseStatement(parser) method execution blocks Vert.x event loop Mar 27, 2024
@singla-11 singla-11 changed the title [Bug] JSQLParser Version : 4.5, CCJSqlParserUtil.parseStatement(parser) method execution blocks Vert.x event loop [Feedback] JSQLParser Version : 4.5, CCJSqlParserUtil.parseStatement(parser) method execution blocks Vert.x event loop Mar 27, 2024
@manticore-projects
Copy link
Contributor

Greetings!

There are calls to parse and parseStatements which take an existing executor service as a parameter. Please use those.

You can also access the Parser directly and call parse or parseStatements without involving any executor service. But then you are without the hanging detection and some very complex queries can stall forever.

Please see: https://manticore-projects.com/JSQLParser/javadoc_snapshot.html#ccjsqlparserutil

@singla-11
Copy link
Author

singla-11 commented Mar 27, 2024

Thanks for the prompt and actionable response!

Hence, ExecutorService is used for hanging detection.

Caller thread is blocked even if I pass ExecutorService as param. is there any other benefit of using ExecutorService, if not then I will directly use CCJSqlParser.Statement() and implement hanging detection without blocking Vert.x event loop.

If possible, could you share some examples of very complex queries which stall forever so that I can test my implementation against those. Thanks.

@manticore-projects
Copy link
Contributor

Good Morning.

We use the Executor solely for detecting hanging queries, killing the Parser thread and retrying after in Complex mode. This is rather brute but also effective as there is no other way to kill the parser once its stuck.

Of course you can implement this differently. I would be most interested having a look on your alternative approach too:

public static Statement parseStatement(CCJSqlParser parser, ExecutorService executorService)
            throws JSQLParserException {
        Statement statement = null;
        Future<Statement> future = executorService.submit(new Callable<Statement>() {
            @Override
            public Statement call() throws ParseException {
                return parser.Statement();
            }
        });
        try {
            statement = future.get(parser.getConfiguration().getAsLong(Feature.timeOut),
                    TimeUnit.MILLISECONDS);
        } catch (TimeoutException ex) {
            parser.interrupted = true;
            future.cancel(true);
            throw new JSQLParserException("Time out occurred.", ex);
        } catch (Exception ex) {
            throw new JSQLParserException(ex);
        }
        return statement;
    }

@manticore-projects
Copy link
Contributor

manticore-projects commented Mar 27, 2024

If possible, could you share some examples of very complex queries which stall forever so that I can test my implementation against those. Thanks.

Please check out net.sf.jsqlparser.statement.select.NestedBracketsPerformanceTest which hosts a series of tests and samples related to this problem.

    // maxDepth = 10 collides with the Parser Timeout = 6 seconds

    @Test
    public void testRecursiveBracketExpressionIssue1019_2() throws JSQLParserException {
        doIncreaseOfParseTimeTesting("IF(1=1, $1, 2)", "1", 10);
    }
   
    // time 2993 for SELECT IF(1=1, IF(1=1, IF(1=1, IF(1=1, IF(1=1, IF(1=1, IF(1=1, IF(1=1, IF(1=1, 1, 2), 2), 2), 2), 2), 2), 2), 2), 2) FROM mytbl

@manticore-projects
Copy link
Contributor

I also feel the obligation to explain my thought process here:

a) we do know that 'hanging queries' are a problem with the Grammar but the nature of SQL itself and our approach of being RDBMS agnostic made it impossible to identify and fix this issue (especially when for 99% of the real life problems, it just works good enough)

b) we are working especially for the "edit the query in the editor" use-case where 1) the editor needs to wait for the parser to finish and 2) must not freeze.

The current implementation is robust in working around this particular use-case.
For a web application with shared, limited resources there may be better approaches and I will love to see and discuss your ideas.

@singla-11
Copy link
Author

Thanks! Really appreciate it.

For a web application with shared, limited resources there may be better approaches and I will love to see and discuss your ideas.

Sure, Will get back once I am done with implementation and would love to discuss and get your feedback on it.

@singla-11
Copy link
Author

singla-11 commented Mar 30, 2024

Greetings!

We use Vert.x with RxJava2 in our project. Below is one alternate way to execute blocking code in Vert.x so that caller thread (event loop) does not get blocked. Timeout on the operation is powered by RxJava2 API.

    Maybe<Statement> result =
        Vertx.currentContext()
            .owner()
            .rxExecuteBlocking(
                (Promise<Statement> promise) -> {
                  try {
                    CCJSqlParser parser = CCJSqlParserUtil.newParser(query).withAllowComplexParsing(true);
                    Statement statement = parser.Statement();
                    promise.complete(statement);
                  } catch (Exception ex) {
                    promise.fail(ex);
                  }
                })
            .timeout(1000, TimeUnit.MILLISECONDS);

Below are the advantages of below implementation in Vert.x:

  • Event loop is not blocked as per the Golden Rule
  • No need to manage lifecycle of an ExecutorService

Observations

In both implementations (ExecutorService and Vert.x+RxJava2), it is observed that TimeoutException is thrown once the time limit is breached but the thread executing the task is still Running and cannot be interrupted. Hence, if a query parsing takes 20 seconds, then the thread would not be able to return to pool before 20 seconds even if the response is no longer required due to occurrence of TimeoutException.

Could you throw some light on the some very complex queries can stall forever bit you mentioned. Does it mean that for some very complex queries, thread might never return back to the pool and is stuck forever? if yes, then maybe we can look into handing Thread Interrupts so that the thread can return back to the pool.

@manticore-projects
Copy link
Contributor

manticore-projects commented Mar 30, 2024

Good to hear from you. I am bit under stress right now, so please have a look at this example:

void testComplexParsingOnly() throws Exception {

It illustrates what happens when a query is parsed forever.
We kill the parser effectively by setting a static variable interrupted which is checked during any problematic loop:

| LOOKAHEAD( Function(), { !interrupted}) retval=Function() [ LOOKAHEAD(2) retval = AnalyticExpression( (Function) retval ) ]

@manticore-projects
Copy link
Contributor

Please have a look at fresh #1983 also.

@singla-11
Copy link
Author

singla-11 commented Mar 31, 2024

Greeting!

parser.interrupted = true worked and the parsing thread returned to the pool after throwing ParseException

  static Single<Statement> parseQuery(String query) {
    CCJSqlParser parser = CCJSqlParserUtil.newParser(query).withAllowComplexParsing(true);
    return Vertx.currentContext()
        .owner()
        .rxExecuteBlocking(
            (Promise<Statement> promise) -> {
              try {
                Statement statement = parser.Statement();
                promise.complete(statement);
              } catch (Exception ex) {
                promise.fail(ex);
              }
            })
        .toSingle()
        .timeout(1000, TimeUnit.MILLISECONDS)
        .onErrorResumeNext(
            ex -> {
              if (ex instanceof TimeoutException) {
                // interrupt parsing thread, results in ParseException
                parser.interrupted = true;
                return Single.error(new JSQLParserException("Query Parsing Timed Out", ex));
              } else {
                return Single.error(ex);
              }
            });
  }

Thanks for helping out promptly and clearing all the doubts regarding the implementation. The library is beneficial for a diversified set of use cases.

@manticore-projects
Copy link
Contributor

manticore-projects commented Mar 31, 2024

Glad to have been of assistance.

A few more tips:
a) some queries parse only with ComplexParsing=false, which is also the much faster mode because it assumes no deep nesting -- but some queries need ComplexParsing=true

b) so what we do is: 1st try to parse with ComplexParsing=false and a short time out and only when this fails retry with ComplexParsing=true and a long time out

c) timeout should be dynamic in relation to the actual length of the query. But I have not run good tests on that thought yet.

d) the main problem are the Semantic Lookaheads in the Grammar especially around Function and Cast Expression

I am aware that all of this sounds pretty rough but it is the most robust solution with a pinch of brute force I have been able to find so far. In practice it solves maybe 99.7% of my SQL needs and when it fails (like #1983) then the Query should be rewritten.

In fact we should advertise it as a Query Q/A tool :-D

@manticore-projects
Copy link
Contributor

  static Single<Statement> parseQuery(String query) {
    CCJSqlParser parser = CCJSqlParserUtil.newParser(query).withAllowComplexParsing(true);
    return Vertx.currentContext()
        .owner()
        .rxExecuteBlocking(
            (Promise<Statement> promise) -> {
              try {
                Statement statement = parser.Statement();
                promise.complete(statement);
              } catch (Exception ex) {
                promise.fail(ex);
              }
            })
        .toSingle()
        .timeout(1000, TimeUnit.MILLISECONDS)
        .onErrorResumeNext(
            ex -> {
              if (ex instanceof TimeoutException) {
                // interrupt parsing thread, results in ParseException
                parser.interrupted = true;
                return Single.error(new JSQLParserException("Query Parsing Timed Out", ex));
              } else {
                return Single.error(ex);
              }
            });
  }

@singla-11, sorry to ask: would you like to craft a PR implementing this with JDK board tools only, excluding any VERTx dependencies? I would love to offer such a set of functions for the Application Server Use Case.

@singla-11
Copy link
Author

singla-11 commented Mar 31, 2024

Where can I find some examples of queries that will only work with ComplexParsing=false? I did observe that ComplexParsing=true added a bit of latency (almost 2x) but I thought it's better than parsing twice, if there are queries which will only work with ComplexParsing=false then I would need to look into it.

In fact we should advertise it as a Query Q/A tool :-D

Totally agree. In fact we primarily use it for below use cases

  • Input query validation (Athena, Redshift, Spark SQL)
  • Extracting column names

@singla-11
Copy link
Author

singla-11 commented Mar 31, 2024

@singla-11, sorry to ask: would you like to craft a PR implementing this with JDK board tools only, excluding any VERTx dependencies? I would love to offer such a set of functions for the Application Server Use Case.

Sure, it would be my pleasure. Will craft a PR in couple of weeks.

@manticore-projects
Copy link
Contributor

Where can I find some examples of queries that will only work with ComplexParsing=false?

I am really sorry, I have not saved/documented such cases and do not recall.
I believe it was about the NestedBracketsPerformance tests, where you have many deep nested brackets around 1 single expression.

@singla-11
Copy link
Author

Thanks! Found below example

public void testIssue766() throws JSQLParserException {

Have modified code to incorporate this

  static Single<Statement> parseQuery(String query, boolean allowComplexParsing, long timeout) {
    CCJSqlParser parser =
        CCJSqlParserUtil.newParser(query).withAllowComplexParsing(allowComplexParsing);
    return Vertx.currentContext()
        .owner()
        .rxExecuteBlocking(
            (Promise<Statement> promise) -> {
              try {
                Statement statement = parser.Statement();
                promise.complete(statement);
              } catch (Exception ex) {
                promise.fail(ex);
              }
            })
        .toSingle()
        .timeout(timeout, TimeUnit.MILLISECONDS)
        .onErrorResumeNext(
            ex -> {
              if (ex instanceof TimeoutException) {
                // interrupt parsing thread, results in ParseException
                parser.interrupted = true;
                return Single.error(new JSQLParserException("Query Parsing Timed Out", ex));
              } else {
                return Single.error(ex);
              }
            });
  }

  static Single<Statement> parseQuery(String query) {
    return parseQuery(query, false, 500)
        .onErrorResumeNext(
            ex -> {
              log.error("Error parsing query. Retrying with allowComplexParsing: true", ex);
              return parseQuery(query, true, 1000);
            });
  }

@manticore-projects
Copy link
Contributor

This looks very reasonable, although I would still aim to a) make timeout a function of the length of the query and b) increase the multiplier between Simple and Complex.

If Simple fails, it fails very fast while Complex still can succeed after more than 1 minute (unfortunately).

@singla-11
Copy link
Author

singla-11 commented Mar 31, 2024

Agreed.

However, these timeouts are specific to our use case. Timeout has never been a concern for us (at least till now) as the input queries are not that complex. Also, total time cannot go beyond a certain threshold as this is used to serve an HTTP API and client will break connection if total time goes beyond a certain threshold (usually 2-3 seconds).

Our main concern was to unblock caller thread (Vert.x event loop) and parsing on Vert.x provided thread pool instead of maintaining lifecycle of an ExecutorService on every invocation and also making sure that parsing thread is not stuck indefinitely and returns back to the pool.

@manticore-projects
Copy link
Contributor

All good, sorry for over engineering!
Cheers and I look forward seeing more finding from you in the future.

@singla-11
Copy link
Author

singla-11 commented Mar 31, 2024

Cheers! Will get back with PR. We can mark this issue as resolved. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants