parser: Complete rewrite to LALR (#202)
This replaces the existing Earley parser (O(n^3) in the worst case) with an LALR
parser generated by Yacc, which parses in linear O(n) time. Even for very short
SQL statements, the existing parser was _really_ slow, so I had to build a query
cache as a band-aid, but that has also been removed now.
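To illustrate why an LALR parser runs in linear time, here is a minimal Python sketch of the table-driven shift/reduce loop that Yacc-generated parsers use. The toy grammar (`E -> E + n | n`), the hand-built `ACTION` and `GOTO_E` tables, and the `parse` helper are hypothetical examples, not vsql's grammar or the tables vyac generates. For a grammar this small, the SLR and LALR tables are identical.

```python
# Toy grammar (not vsql's):  rule 1: E -> E + n    rule 2: E -> n
RULE_LENGTHS = {1: 3, 2: 1}

# ACTION[state][token] = ("shift", next_state) | ("reduce", rule) | ("accept", None)
ACTION = {
    0: {"n": ("shift", 2)},
    1: {"+": ("shift", 3), "$": ("accept", None)},
    2: {"+": ("reduce", 2), "$": ("reduce", 2)},
    3: {"n": ("shift", 4)},
    4: {"+": ("reduce", 1), "$": ("reduce", 1)},
}

# GOTO for the only nonterminal E: a reduction always exposes state 0 here.
GOTO_E = {0: 1}


def parse(tokens):
    """Return (accepted, steps), where each step is one ACTION table lookup."""
    tokens = list(tokens) + ["$"]  # "$" marks the end of input
    stack = [0]
    pos = 0
    steps = 0
    while True:
        entry = ACTION[stack[-1]].get(tokens[pos])
        if entry is None:
            return False, steps  # syntax error: no table entry for this token
        steps += 1
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            # Pop one state per symbol on the rule's right-hand side,
            # then follow the GOTO edge for E.
            del stack[-RULE_LENGTHS[arg]:]
            stack.append(GOTO_E[stack[-1]])
        else:
            return True, steps
```

Each `n` costs exactly three lookups (a shift for `n`, one reduction, and a shift for the following `+` or the final accept), so `parse(["n", "+", "n"])` returns `(True, 6)`. Unlike Earley, the loop keeps no chart of partial parses and never backtracks.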

This refactoring was made possible by adapting yacc from a Go implementation
here: https://github.com/elliotchance/vyac. However, in keeping with the promise
of this repo being completely written in V, the source has been copied to this
repo.

Other notable and breaking changes:

1. Not sure how this worked before, but a query can no longer specify a catalog
in an identifier chain (for example, `catalog.schema.table`). The catalog must
be set using `SET CATALOG`.
2. Syntax error messages will be slightly different, but should be a little more
helpful.
3. There are some ambiguities in the SQL grammar, such as deciding what
`x IS NOT TRUE` means or differentiating between `COUNT(expr)` and `COUNT(*)`,
due to lookahead limitations. Special tokens for some combinations of operators
and keywords have been added for known edge cases, but many conflicts remain. I
suspect these conflicts don't matter, since the ambiguous paths should still
yield valid results, so the conflict warnings are ignored for now.
4. Fixes a very minor bug where string literals in VALUES might be treated as
`VARCHAR` instead of `CHARACTER` in some cases.
5. Renamed the "std_" files to include their section number in the standard.
This groups related sections together and makes lookups easier.
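Point 3 mentions adding special tokens so that a parser with one token of lookahead can tell apart constructs such as `x IS NOT TRUE`. One common way to do this is a pass between the lexer and the parser that folds known keyword sequences into a single token. The Python sketch below assumes that approach; the `COMBINED` table, its token names (`IS_NOT_TRUE`, `COUNT_ALL`, and so on), and the `merge_tokens` helper are hypothetical and are not taken from vsql's lexer.

```python
# Hypothetical table of keyword/operator sequences folded into one token
# before parsing (names are illustrative, not vsql's).
COMBINED = {
    ("IS", "NOT", "TRUE"): "IS_NOT_TRUE",
    ("IS", "NOT", "FALSE"): "IS_NOT_FALSE",
    ("IS", "TRUE"): "IS_TRUE",
    ("IS", "FALSE"): "IS_FALSE",
    ("COUNT", "(", "*", ")"): "COUNT_ALL",
}
MAX_LEN = max(len(key) for key in COMBINED)


def merge_tokens(tokens):
    """Replace the longest known sequence at each position with one token."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so longer sequences take priority.
        for n in range(MAX_LEN, 1, -1):
            window = tuple(t.upper() for t in tokens[i:i + n])
            if window in COMBINED:
                out.append(COMBINED[window])
                i += len(window)
                break
        else:
            # No combined sequence starts here; pass the token through unchanged.
            out.append(tokens[i])
            i += 1
    return out
```

With this pass, `["x", "IS", "NOT", "TRUE"]` reaches the parser as `["x", "IS_NOT_TRUE"]`, while `x IS NOT NULL` is left as separate tokens because it is not in the table.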
elliotchance authored Dec 27, 2024
1 parent 4727b8b commit c1def77
Showing 177 changed files with 8,519 additions and 5,467 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -7,3 +7,6 @@ vsql-server
grammar.bnf
vsql/grammar.v
scripts/generate-v-client-library-docs
y.output
vsql/y.v
vsql/y.y
23 changes: 11 additions & 12 deletions Makefile
@@ -23,16 +23,16 @@ ready: grammar fmt snippets

# Binaries

bin/vsql: vsql/grammar.v
bin/vsql: vsql/y.v
mkdir -p bin
v $(BUILD_OPTIONS) $(PROD) cmd/vsql -o bin/vsql

bin/vsql.exe: vsql/grammar.v
bin/vsql.exe: vsql/y.v
mkdir -p bin
v -os windows $(BUILD_OPTIONS) $(PROD) cmd/vsql
mv cmd/vsql/vsql.exe bin/vsql.exe

oldv: vsql/grammar.v
oldv: vsql/y.v
ifdef OLDV
@mkdir -p /tmp/oldv/
@# VJOBS and VFLAGS needs to be provided for macOS. I'm not sure if they also
@@ -54,19 +54,18 @@ docs: snippets
clean-docs:
cd docs && make clean

# Grammar (BNF)
# Grammar

grammar.bnf:
grep "//~" -r vsql | cut -d~ -f2 > grammar.bnf
vsql/y.y:
python3 scripts/generate_grammar.py

vsql/grammar.v: grammar.bnf
python3 generate-grammar.py
v fmt -w vsql/grammar.v
vsql/y.v: vsql/y.y
v run scripts/vyacc.v -o vsql/y.v vsql/y.y

clean-grammar:
rm -f grammar.bnf vsql/grammar.v
rm -f vsql/y.v vsql/y.y

grammar: clean-grammar vsql/grammar.v
grammar: clean-grammar vsql/y.v

# Formatting

@@ -104,7 +103,7 @@ examples:
echo $$f; v run $$f || exit 1; \
done

examples/%: vsql/grammar.v
examples/%: vsql/y.v
v run examples/$*.v

# Benchmarking
4 changes: 2 additions & 2 deletions cmd/tests/catalogs.sh
@@ -12,8 +12,8 @@ echo 'INSERT INTO foo (bar) VALUES (123);' | $VSQL cli $VSQL1_FILE
echo 'CREATE TABLE foo (baz INT);' | $VSQL cli $VSQL2_FILE
echo 'INSERT INTO foo (baz) VALUES (456);' | $VSQL cli $VSQL2_FILE

echo 'SELECT * FROM "file1".public.foo;' | $VSQL cli $VSQL1_FILE $VSQL2_FILE > $TXT_FILE
echo 'SELECT * FROM "file2".public.foo;' | $VSQL cli $VSQL1_FILE $VSQL2_FILE >> $TXT_FILE
echo "SET CATALOG 'file1'; SELECT * FROM public.foo;" | $VSQL cli $VSQL1_FILE $VSQL2_FILE > $TXT_FILE
echo "SET CATALOG 'file2'; SELECT * FROM public.foo;" | $VSQL cli $VSQL1_FILE $VSQL2_FILE >> $TXT_FILE

grep -R "BAR: 123" $TXT_FILE
grep -R "BAZ: 456" $TXT_FILE
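The test change above reflects breaking change 1: `"file1".public.foo` becomes `SET CATALOG 'file1'` followed by a two-part name. Here is a minimal Python sketch of how such a name could be qualified against the session defaults. The `Connection` docs further down describe `current_schema` (initially `'PUBLIC'`) and `current_catalog`. The `resolve_table_name` helper is hypothetical, and it ignores quoting and case folding, so a quoted identifier that contains a dot would be split incorrectly.

```python
def resolve_table_name(chain, current_catalog, current_schema="PUBLIC"):
    """Resolve a dotted identifier chain to (catalog, schema, table).

    Hypothetical helper: one- and two-part names are qualified with the
    session defaults; three-part names are rejected because the catalog
    must be chosen with SET CATALOG.
    """
    parts = chain.split(".")
    if len(parts) == 1:
        return current_catalog, current_schema, parts[0]
    if len(parts) == 2:
        return current_catalog, parts[0], parts[1]
    raise ValueError(f"catalog cannot be specified in {chain!r}; use SET CATALOG")
```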
55 changes: 33 additions & 22 deletions cmd/vsql/cli.v
@@ -32,39 +32,50 @@ fn cli_command(cmd cli.Command) ! {
print('vsql> ')
os.flush()

query := os.get_line()
raw_query := os.get_line()

// When running on Docker, ctrl+C doesn't always get passed through. Also,
// this provides another text based way to break out of the shell.
if query.trim_space() == 'exit' {
if raw_query.trim_space() == 'exit' {
break
}

if query != '' {
start := time.ticks()
db.clear_warnings()
result := db.query(query) or {
print_error('Error', err)
continue
}
if raw_query != '' {
// TODO: This is a very poor way to handle multiple queries.
for i, query in raw_query.split(';') {
if query.trim_space() == '' {
continue
}

for warning in db.warnings {
print_error('Warning', warning)
}
start := time.ticks()
db.clear_warnings()
result := db.query(query) or {
print_error('Error', err)
continue
}

mut total_rows := 0
for row in result {
for column in result.columns {
print('${column.name.sub_entity_name}: ${row.get_string(column.name.sub_entity_name)!} ')
for warning in db.warnings {
print_error('Warning', warning)
}
total_rows++
}

if total_rows > 0 {
println('')
}
mut total_rows := 0
for row in result {
for column in result.columns {
print('${column.name.sub_entity_name}: ${row.get_string(column.name.sub_entity_name)!} ')
}
total_rows++
}

if total_rows > 0 {
println('')
}

println('${total_rows} ${vsql.pluralize(total_rows, 'row')} (${time.ticks() - start} ms)')
println('${total_rows} ${vsql.pluralize(total_rows, 'row')} (${time.ticks() - start} ms)')

if i > 0 {
println('')
}
}
} else {
// This means there is no more input and should only occur when the
// commands are being few through a pipe like:
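The new CLI loop splits each input line with `raw_query.split(';')`, and its own TODO calls this a poor approach: a semicolon inside a string literal such as `'a;b'` would cut the statement in two. Here is a minimal quote-aware alternative, sketched in Python. `split_statements` is hypothetical and does not handle SQL comments.

```python
def split_statements(sql):
    """Split on ';' that appear outside single- or double-quoted text.

    SQL escapes a quote by doubling it ('it''s'), which the toggle below
    handles naturally: the two quotes close and immediately reopen the literal.
    """
    statements = []
    current = []
    quote = None  # the quote character currently open, or None
    for ch in sql:
        if quote is None and ch in ("'", '"'):
            quote = ch
        elif ch == quote:
            quote = None
        if ch == ";" and quote is None:
            statements.append("".join(current))
            current = []
        else:
            current.append(ch)
    statements.append("".join(current))
    # Drop empty pieces, like the CLI loop's trim_space() check.
    return [s.strip() for s in statements if s.strip()]
```

For example, `split_statements("SELECT 'a;b'; SELECT 1")` yields two statements rather than three.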
24 changes: 3 additions & 21 deletions docs/contributing.rst
@@ -52,34 +52,16 @@ rebuild the entire docs you can use:
Parser & SQL Grammar
--------------------

To make changes to the SQL grammar you will need to modify the ``grammar.bnf``
file. These rules are partial or complete BNF rules from the
To make changes to the SQL grammar you will need to modify the ``*.y`` files.
These rules are partial or complete BNF rules from the
`2016 SQL standard <https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html>`_.

Within ``grammar.bnf`` you will see that some of the rules have a parser
function which is a name after ``->``. The actual parser function will have
``parse_`` prefix added. You can find all the existing parse functions in the
``parse.v`` file.

If a rule does not have a parse function (no ``->``) then the value will be
passed up the chain which is the desired behavior in most cases. However, be
careful if there are multiple terms, you will need to provide a parse function
to return the correct term.

Each of the rules can have an optional type described in ``/* */`` before
``::=``. Rules that do not have a type will be ignored as parameters for parse
functions. Otherwise, these types are used in the generated code to make sure
the correct types are passed into the parse functions.

After making changes to ``grammar.bnf`` you will need to run:
After making changes to grammar file(s) you will need to run:

.. code-block:: sh
make grammar
Now, when running `v test .` you may receive errors for missing ``parse_``
functions, you should implement those now.

Testing
-------

38 changes: 2 additions & 36 deletions docs/v-client-library-docs.rst
@@ -141,16 +141,6 @@ new_numeric_value expects a value to be valid and the size and scale are determi



fn new_query_cache
------------------


.. code-block:: v
pub fn new_query_cache() &QueryCache
Create a new query cache.

fn new_real_value
-----------------

@@ -393,17 +383,15 @@ struct Connection
catalogs map[string]&CatalogConnection
// funcs only needs to be initialized once on open()
funcs []Func
// query_cache is maintained over file reopens.
query_cache &QueryCache
// cast_rules are use for CAST() (see cast.v)
cast_rules map[string]CastFunc
// unary_operators and binary_operators are for operators (see operators.v)
unary_operators map[string]UnaryOperatorFunc
binary_operators map[string]BinaryOperatorFunc
// current_schema is where to search for unquailified table names. It will
// current_schema is where to search for unqualified table names. It will
// have an initial value of 'PUBLIC'.
current_schema string
// current_catalog (also known as the database). It will have an inital value
// current_catalog (also known as the database). It will have an initial value
// derived from the first database file loaded.
current_catalog string
pub mut:
@@ -431,14 +419,6 @@ struct ConnectionOptions
pub struct ConnectionOptions {
pub mut:
// query_cache contains the precompiled prepared statements that can be
// reused. This makes execution much faster as parsing the SQL is extremely
// expensive.
//
// By default each connection will be given its own query cache. However,
// you can safely share a single cache over multiple connections and you are
// encouraged to do so.
query_cache &QueryCache = unsafe { nil }
// Warning: This only works for :memory: databases. Configuring it for
// file-based databases will either be ignored or causes crashes.
page_size int
@@ -548,20 +528,6 @@ struct PreparedStmt
A prepared statement is compiled and validated, but not executed. It can then be executed with a set of host parameters to be substituted into the statement. Each invocation requires all host parameters to be passed in.

struct QueryCache
-----------------


.. code-block:: v
@[heap]
pub struct QueryCache {
mut:
stmts map[string]Stmt
}
A QueryCache improves the performance of parsing by caching previously cached statements. By default, a new QueryCache is created for each Connection. However, you can share a single QueryCache safely amung multiple connections for even better performance. See ConnectionOptions.

struct Result
-------------
