Replies: 2 comments 2 replies
-
Thanks! Despite the 1700 pages that describe just part of the base SQL language, it's amazing how much is left to the "up to the database implementation". Still, it is a strong foundation I'll be able to follow for a while yet.
What does a request mean in this case? It could be a single query or a single roundtrip that might contain multiple (read or write) queries? Would this be affected by
Isn't this the motivation for
This is a more interesting point of discussion in my opinion. Certainly one extreme is to fsync everything and always. However, even the file system and hardware itself might not give these guarantees as there as several layers between the software and physical storage that may cache/write later that could be influenced by a poorly timed power failure. Without getting into the weeds too much, I think there is a reasonable balance where we should trust the fsync (if it says it's been written) and treat all queries as atomic and requiring an fsync at the end always. That is, a modification (such as a Since there is no shared memory, hard flushing on basically all operations is required. Even an incomplete transaction would probably not be able to keep dirty pages beyond a single query for risk of page conflicts that might happen when they get written later. That being said, this allows the file system to be the database tuning. Modern file systems can treat an fsync like a memory update and as long as the machine can guarantee that page gets written eventually and all other processes see the same virtual page in memory, it can avoid writing to the medium until later. I suspect this is why the file system benchmarks are shown as being as fast (or even faster!) than the memory-only runs: https://github.com/elliotchance/vsql/blob/main/docs/benchmark.rst#results |
Beta Was this translation helpful? Give feedback.
-
A request is a collection of commands determined by the user as "one request". In its simpliest form (and accounting for the absolute vast majority) it's one transaction. But the user can determine that e.g. two consecutive commands can be treated as best effort and they should be understood as a special type of transaction wrapping the inner commands.
Yeah, in a sense it's a bit similar but still it's quite different 😉. Generally what I mean by best effort vs guaranteed is just about delay and trading it for fairness. Suppose I have 4 incoming transactions (each representing a separate request) on the server in an incoming buffer, all of them Now suppose the server will start executing first (best effort) transaction and during this execution finds out that the next transaction (guaranteed) would get delayed if it proceeded with the current (best effort) transaction. In that case it'd drop the best effort transaction (rolling it back) and continued only with guaranteed transactions while dropping & cancelling (i.e. notifying the client about aborted transaction) all best effort transactions on the way until it can again keep up with guaranteed transactions already accepted into the incoming buffer. Of course, if there are only best effort transactions in the buffer, then we'll proceed with processing the oldest ones and just drop one or two most recent ones from the buffer to make space for a newly incoming one (this artificially made space ensures guaranteed ones are at least partially accepted even under high load). Note this algorithm couldn't be implemented exactly this way to work efficiently. It's just to showcase the idea 😉. In practice you'd probably want to use two separate buffers (one for best effort and one for guaranteed requests) and then in a work-stealing fashion choose which one to process, etc.
Yep, these are all interesting problems. There are some tricks to force a sooner writing to disk. But this doesn't change anything on the fact, that it's system-specific how to determine whether Btw. the benchmarks you did are actually something more or less to expect - I'm pretty certain it's because you don't wait for |
Beta Was this translation helpful? Give feedback.
-
First I'd like to say I like the decision to strictly conform to a (modern) SQL standard. I think this might be one of the selling points - "it was tested against vsql" would have a huge meaning in the bright future I envision 😉.
Here I'd like to discuss probably the biggest PITA any DB has to clearly decide and define. It's durability guarantees, i.e. a "fault model". This also determines the maximum achievable performance of the DB due to physical limitations.
So I always envisioned the following:
to have a DB API requiring one to specify on per-request basis whether it shall be treated as best-effort (i.e. no guarantee but presumably faster and with lower delays) or as guaranteed operation. See e.g. my rant ACID & strict serializability versus best effort on a per request basis redis/redis#6200 (comment) .
Note that best effort might to lead incorrect data compared to the surrounding global context but it must not lead to internally incorrect data within a transaction - e.g. if in one transaction I fetch two different rows from two different tables it still must guarantee that the data (i.e. both rows) returned are exactly the data from the point in time when the transaction began (i.e. mutually correct).
Guarantee durability in the very maximum technically possible way POSIX allows us to and the knowledge of HW-originated errors allows us to. Namely the same fault model as TigerBeetle has. SQLite3 allows almost for what TigerBeetle does (e.g. commit both the parent directory structure as well as the DB file itself and then fsync) but there are still some rough corners.
The first step would be to focus on calling
fsync
and waiting for it to finish (see below) after each transaction which has the potential to modify data.Note also that Linux
fsync
is synchronous (and will stay so in the upcoming years) but on many other systems it's asynchronous with system-specific calls to wait for its return.What do you think about implementing both points (1) and (2)?
Beta Was this translation helpful? Give feedback.
All reactions