Transaction
- A transaction is a way for an application to group several reads and writes together into a logical unit.
- all the reads and writes in a transaction are executed as one operation: either the entire transaction succeeds (commit) or it fails (abort, rollback).
-
Atomicity
- describes what happens if a client wants to make several writes, but a fault occurs after some of the writes have been processed (gives all-or-nothing guarantee)
- if the writes are grouped together into an atomic transaction, and the transaction cannot be completed (committed) due to a fault, then the transaction is aborted and the database must discard or undo any writes it has made so far in that transaction.
- Without atomicity, if an error occurs, it’s difficult to know which changes have taken effect and which haven’t.
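A minimal sketch of this all-or-nothing behaviour using Python's sqlite3 (the accounts table and the transfer function are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 100)")
conn.commit()

def transfer(amount, fail_in_middle=False):
    try:
        # Both writes belong to one transaction: either both take effect or neither.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'alice'", (amount,))
        if fail_in_middle:
            raise RuntimeError("simulated fault after the first write")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'bob'", (amount,))
        conn.commit()    # commit: both writes become visible together
    except Exception:
        conn.rollback()  # abort: the first write is undone as well

transfer(30, fail_in_middle=True)
print(conn.execute("SELECT * FROM accounts").fetchall())
# [('alice', 100), ('bob', 100)] -- no half-finished transfer is left behind
```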
-
Consistency
- the idea is that you have certain statements about your data (invariants) that must always be true
- However, this idea of consistency depends on the application’s notion of invariants (This is not something that the database can guarantee)
-
Isolation
- Most databases are accessed by several clients at the same time; isolation handles the concurrency problems (race conditions) that arise when they read and write the same data
- it means that concurrently executing transactions are isolated from each other: they cannot step on each other’s toes
- In theory, textbooks formalize isolation as serializability, which means transactions behave as if they had been executed one at a time (serially)
- In practice, serializable isolation is rarely used, because it carries a performance penalty
-
Durability
- ensures that once a transaction has committed successfully, any data it has written will not be forgotten (regardless of failures and crashes).
- In a single-node database, durability typically means that the data has been written to nonvolatile storage such as a hard drive or SSD.
- It usually has a write-ahead log (or similar) for crash recovery (a toy sketch of the idea follows this list)
- In a replicated database, durability may mean that the data has been successfully copied to some number of nodes.
- a database must wait until these writes or replications are complete before reporting a transaction as successfully committed
- Durability caveats (no single technique gives an absolute guarantee):
- if you write to disk and the machine dies, the data is inaccessible until the machine is fixed or the disk is moved, whereas a replicated system can remain available
- a correlated fault (e.g., a bug that crashes every node on a particular input) can knock out all replicas at once, losing any data held only in memory
- In an asynchronously replicated system, recent writes may be lost when the leader becomes unavailable
- disks themselves are not perfectly reliable (SSDs can lose data after a power cut, firmware bugs, etc.)
- files may be corrupted after a crash due to software bugs (in the filesystem, storage engine, etc.)
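A toy illustration of the write-ahead-log idea mentioned above (file name and record format are invented): the intended change is appended and fsync'd to nonvolatile storage before the write is acknowledged, so it can be replayed after a crash.

```python
import json, os

LOG_PATH = "wal.log"  # hypothetical log file on nonvolatile storage

def durable_write(key, value, data):
    # 1. Append the intended change to the log and force it to disk;
    #    only after fsync may the write be reported as committed.
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())
    # 2. Apply the change to the actual data structures.
    data[key] = value

def recover(data):
    # After a crash, replay the log to restore every acknowledged write.
    if os.path.exists(LOG_PATH):
        with open(LOG_PATH) as log:
            for line in log:
                record = json.loads(line)
                data[record["key"]] = record["value"]
```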
-
Read Committed:
- Two guarantees
- No dirty reads: When reading from the database, you will only see data that has been committed.
- dirty read: reading uncommitted data from another transaction
- No dirty writes: When writing to the database, you will only overwrite data that has been committed.
- dirty writes: later write overwrites an (earlier) uncommitted value
- Implementation
- Row level locks (to prevent dirty write):
- acquire a lock on the object (when updating), and hold the lock until the transaction is committed or aborted.
- Only one transaction can hold a lock for any given object, others must wait
- To prevent dirty read:
- other transactions only see the old value (not the new value that is being written but not yet committed)
- Only when the new value is committed do transactions switch over to reading the new value.
- Typically read committed uses a separate snapshot for each query
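A toy in-memory sketch of the two mechanisms above: a per-object write lock prevents dirty writes, and readers simply keep seeing the old committed value until the writer commits (class and field names are invented; one write per object per transaction for simplicity):

```python
import threading

class ReadCommittedStore:
    def __init__(self):
        self.committed = {}    # object -> last committed value
        self.pending = {}      # object -> (txid, uncommitted value)
        self.row_locks = {}    # object -> row-level lock preventing dirty writes

    def write(self, txid, obj, value):
        lock = self.row_locks.setdefault(obj, threading.Lock())
        lock.acquire()                    # held until commit/abort: no dirty writes
        self.pending[obj] = (txid, value)

    def read(self, txid, obj):
        # Readers never block; they see the old committed value (no dirty reads).
        writer, value = self.pending.get(obj, (None, None))
        if writer == txid:
            return value                  # a transaction sees its own writes
        return self.committed.get(obj)

    def commit(self, txid, objs):
        for obj in objs:
            _, value = self.pending.pop(obj)
            self.committed[obj] = value   # readers now switch to the new value
            self.row_locks[obj].release()
```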
-
read skew (nonrepeatable read)
- while one transaction's writes are being committed, another transaction may read the database at different points in time and see it in an inconsistent state; once both transactions are complete, rereading would return consistent values
- Read skew is considered acceptable under read committed isolation
- this is mainly a problem for database backups, analytic queries, and integrity checks (long-running reads)
-
Snapshot isolation is the most common solution to read skew
- Transaction sees all the data that was committed in the database at the start of transaction
- each transaction sees only the old data from that particular point in time
- Implementation: multiversion concurrency control (MVCC)
- From a performance point of view, a key principle of snapshot isolation is readers never block writers, and writers never block readers
- database must potentially keep several different committed versions of an object
- Typically snapshot isolation uses the same snapshot for an entire transaction
-
MVCC implementation (PostgreSQL-style)
- When a transaction is started, it is given a unique, always-increasing transaction ID (txid).
- Whenever a transaction writes anything to the database, the data it writes is tagged with the transaction ID of the writer.
- Each row in a table has a created_by field, containing the ID of the transaction that inserted this row into the table
- each row has a deleted_by field (initially empty)
- If a transaction deletes a row, the row is soft-deleted by setting the deleted_by field to the ID of the transaction that requested the deletion (the row is not actually removed from the database)
- when it is certain that no transaction can any longer access the deleted data, a garbage collection process in the database removes the soft deleted rows
- An update is internally translated into a delete and a create.
- transaction IDs are used to decide which objects a transaction can see and which are invisible to its reads; an object is visible only if both of the following hold:
- At the time when the reader’s transaction started, the transaction that created the object had already committed
- The object is not marked for deletion, or if it is, the transaction that requested deletion had not yet committed at the time when the reader’s transaction started.
- By never updating values in place but instead creating a new version every time a value is changed, the database can provide a consistent snapshot while incurring only a small overhead.
- database indexes point to all versions of an object; queries filter out the versions that are not visible
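A small sketch of the visibility rules above; the RowVersion and Snapshot structures are invented stand-ins for the created_by / deleted_by bookkeeping:

```python
from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class RowVersion:
    data: dict
    created_by: int                    # txid that inserted this version
    deleted_by: Optional[int] = None   # txid that soft-deleted it, if any

@dataclass
class Snapshot:
    committed: Set[int]                # txids already committed when the reader started

def visible(row: RowVersion, snap: Snapshot) -> bool:
    # Rule 1: the creating transaction had already committed when the reader started.
    if row.created_by not in snap.committed:
        return False
    # Rule 2: the row is not deleted, or the deleting transaction had not yet
    # committed when the reader started.
    if row.deleted_by is not None and row.deleted_by in snap.committed:
        return False
    return True

# Usage: from the reader's point of view, tx 7 is still in flight.
snap = Snapshot(committed={3, 5})
print(visible(RowVersion({"balance": 500}, created_by=3), snap))                 # True
print(visible(RowVersion({"balance": 400}, created_by=7), snap))                 # False
print(visible(RowVersion({"balance": 500}, created_by=3, deleted_by=7), snap))   # True
```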
-
Preventing lost updates (in a read-modify-write cycle): when two transactions do this cycle concurrently, one of the modifications can be lost (the later write clobbers the earlier one)
- Atomic update operations: taking an exclusive lock on the object when it is read so that no other transaction can read it until the update has been applied (cursor stability).
- Explicit locking: explicitly lock objects that are going to be updated
- Automatically detecting lost updates: allow transactions to execute in parallel; if a lost update is detected, abort the transaction and force it to retry
- Compare-and-set: allow an update to happen only if the value has not changed since you last read it; if it has changed, force a retry (see the sketch after this list)
- For replicated databases: allow concurrent writes to create several conflicting versions of a value (also known as siblings), and to use application code or special data structures to resolve and merge these versions after the fact.
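A compare-and-set sketch using sqlite3 (the pages table and content column are made up): the UPDATE only takes effect if the value is still what was read, and the application retries otherwise.

```python
import sqlite3

def edit_page(conn, page_id, suffix):
    while True:  # retry loop; back-off/give-up strategy omitted
        (old_content,) = conn.execute(
            "SELECT content FROM pages WHERE id = ?", (page_id,)).fetchone()
        new_content = old_content + suffix
        cur = conn.execute(
            "UPDATE pages SET content = ? WHERE id = ? AND content = ?",
            (new_content, page_id, old_content))
        conn.commit()
        if cur.rowcount == 1:
            return new_content   # the value had not changed since we read it
        # rowcount == 0: a concurrent transaction changed the page -> retry
```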
-
Write Skew: two transactions read the same objects and then update some of those objects (each transaction may update different objects), leaving an invariant between them violated; because the transactions update different objects, it is neither a dirty write nor a lost update
- Solution: explicitly lock the rows that the transaction depends on
- Lock the row for update
- Unique constraint for create (when the transaction inserts new rows, there is nothing to lock yet, so a constraint can enforce the invariant)
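A hedged sketch of the "lock the rows the transaction depends on" approach using SELECT ... FOR UPDATE via psycopg2; the connection string, doctors table, and on-call invariant are assumptions for illustration:

```python
import psycopg2

conn = psycopg2.connect("dbname=hospital")  # hypothetical connection

def request_leave(doctor_name, shift_id):
    # The with-block commits on success and rolls back on exception.
    with conn, conn.cursor() as cur:
        # Lock every row this decision depends on, so a concurrent transaction
        # that read the same rows must wait instead of both proceeding.
        cur.execute(
            "SELECT name FROM doctors WHERE shift_id = %s AND on_call FOR UPDATE",
            (shift_id,))
        on_call = cur.rowcount
        if on_call >= 2:   # invariant: at least one doctor must stay on call
            cur.execute(
                "UPDATE doctors SET on_call = false WHERE name = %s AND shift_id = %s",
                (doctor_name, shift_id))
```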
-
phantom: a write in one transaction changes the result of a search query in another transaction (i.e. objects that do not yet exist in the database, but which might be added in the future)
- A serializable isolation level is much preferable in most cases
- materializing conflicts (a last resort if no alternative is possible): it takes a phantom and turns it into a lock conflict on a concrete set of rows that exist in the database (i.e., pre-create rows for all the relevant combinations; transactions then lock and check these existing rows instead of inserting new ones)
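A sketch of materializing conflicts for a hypothetical room-booking schema (table and column names invented, psycopg2 again): rows for every (room, time slot) combination are pre-created, and a booking transaction locks the relevant slot row instead of relying on phantom detection.

```python
def book_room(conn, room_id, slot_id, user_id):
    with conn, conn.cursor() as cur:
        # Lock the pre-created (room, slot) row; any concurrent booking attempt
        # for the same slot now conflicts on this concrete row.
        cur.execute(
            "SELECT * FROM room_time_slots WHERE room_id = %s AND slot_id = %s FOR UPDATE",
            (room_id, slot_id))
        # With the slot row locked, check for an existing booking and insert.
        cur.execute(
            "SELECT count(*) FROM bookings WHERE room_id = %s AND slot_id = %s",
            (room_id, slot_id))
        (existing,) = cur.fetchone()
        if existing == 0:
            cur.execute(
                "INSERT INTO bookings (room_id, slot_id, user_id) VALUES (%s, %s, %s)",
                (room_id, slot_id, user_id))
```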
-
Literally executing transactions in a serial order
- single-threaded execution
- stored procedure: the application submits the entire transaction code to the database ahead of time, and the database executes each transaction in its entirety on a single thread (with the data kept in memory)
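A toy single-threaded executor in the spirit described above: whole transactions are submitted as functions and executed one at a time against in-memory data (all names are invented):

```python
import queue
import threading

class SerialExecutor:
    """Executes submitted transactions one by one on a single thread."""
    def __init__(self):
        self.data = {}              # in-memory database
        self.inbox = queue.Queue()  # pending (transaction, result-holder) pairs
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            txn, done = self.inbox.get()
            try:
                done["result"] = txn(self.data)  # no concurrency -> no race conditions
            except Exception as exc:
                done["error"] = exc              # the transaction is aborted
            done["event"].set()

    def execute(self, txn):
        done = {"event": threading.Event()}
        self.inbox.put((txn, done))
        done["event"].wait()
        if "error" in done:
            raise done["error"]
        return done["result"]

# Usage: the whole read-modify-write cycle is submitted as one unit.
executor = SerialExecutor()

def increment(db):
    db["counter"] = db.get("counter", 0) + 1
    return db["counter"]

print(executor.execute(increment))  # 1
print(executor.execute(increment))  # 2
```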
-
Two-phase locking (2PL)
-
pessimistic concurrency control mechanism: if anything might go wrong, it is better to wait until the situation is safe again before doing anything (similar to mutual exclusion)
-
Concurrent transactions are allowed to read the same object when nobody is writing to it.
-
When a write happens (update/delete):
- the second transaction (whether it wants to read or write) must wait until the first transaction (which has read or written the object) commits or aborts
- Reading an old version is forbidden
- writers don’t just block other writers; they also block readers and vice versa
-
Implementation: lock (in shared mode or exclusive mode) on each object in the database
- Read: acquire the lock in shared mode (several transactions may hold it simultaneously); if another transaction already holds the lock in exclusive mode, the reader must wait
- Write: acquire the lock in exclusive mode (only 1 transaction can hold it, others must wait)
- First read then write: upgrade its shared lock to an exclusive lock
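A simplified lock-manager sketch of the shared/exclusive behaviour described above (deadlock detection omitted; all names invented):

```python
import threading

class TwoPhaseLock:
    """A per-object lock with shared and exclusive modes."""
    def __init__(self):
        self.cond = threading.Condition()
        self.readers = set()   # txids holding the lock in shared mode
        self.writer = None     # txid holding the lock in exclusive mode

    def acquire_shared(self, txid):
        with self.cond:
            # Readers must wait while another transaction holds the exclusive lock.
            while self.writer is not None and self.writer != txid:
                self.cond.wait()
            self.readers.add(txid)

    def acquire_exclusive(self, txid):
        with self.cond:
            # Writers must wait for all other readers and writers; a transaction
            # upgrading its own shared lock is allowed through.
            while (self.writer is not None and self.writer != txid) or \
                  (self.readers - {txid}):
                self.cond.wait()
            self.writer = txid

    def release_all(self, txid):
        # Phase two: locks are only released when the transaction ends.
        with self.cond:
            self.readers.discard(txid)
            if self.writer == txid:
                self.writer = None
            self.cond.notify_all()
```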
-
First phase (while the transaction executes): acquire locks; second phase (at the end of the transaction): release all the locks.
-
Performance is poor (transactions often wait for each other), and deadlocks can happen (the database detects them and aborts one of the transactions)
-
Predicate lock: a lock that belongs to all objects matching some search condition, including objects that do not yet exist in the database
- Read for some condition: acquire a shared-mode predicate lock on the conditions of the query; if another transaction has an exclusive lock on any object matching those conditions, the reader must wait
- Write: first check whether either the old or the new value matches any existing predicate lock; if there is a match, the current transaction must wait
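A toy predicate-lock check in the spirit of the rules above; predicates are plain functions over row values, and everything here is invented for illustration:

```python
class PredicateLockTable:
    def __init__(self):
        self.locks = []  # (txid, predicate) pairs held in shared mode

    def lock_predicate(self, txid, predicate):
        # A reading transaction registers the search condition it depends on.
        self.locks.append((txid, predicate))

    def write_conflicts(self, txid, old_row, new_row):
        # A write conflicts if the old or the new value matches a predicate
        # held by another transaction; the writer would have to wait for it.
        return [
            holder
            for holder, pred in self.locks
            if holder != txid and (
                (old_row is not None and pred(old_row)) or
                (new_row is not None and pred(new_row)))
        ]

# Usage: tx 1 has queried for bookings of room 123 starting at 12:00.
table = PredicateLockTable()
table.lock_predicate(1, lambda row: row["room"] == 123 and row["start"] == "12:00")
# Tx 2 wants to insert a matching booking (a phantom) -> conflict with tx 1.
print(table.write_conflicts(2, None, {"room": 123, "start": "12:00"}))  # [1]
```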
-
Index-range lock (next-key locking):
- Similar to predicate locks, but the search condition is approximated by attaching the lock to a range of index entries for the rows involved
- Better performance but may lock a bigger range of objects
- If there is no suitable index where a range lock can be attached, the database can fall back to a shared lock on the entire table
-
Optimistic concurrency control techniques such as Serializable Snapshot Isolation (SSI)
- optimistic concurrency control mechanism: instead of blocking if something may go wrong, transactions continue anyway
- all reads within a transaction are made from a consistent snapshot of the database
- when a transaction wants to commit, database checks whether isolation was violated (a query result might have changed), and if so the transaction is aborted and has to be retried
- By avoiding unnecessary aborts, SSI preserves snapshot isolation’s support for long-running reads from a consistent snapshot.
- How database checks whether isolation was violated
- Detecting stale MVCC reads (the read ignored a write that was still uncommitted at the time, due to the visibility rules)
- When the transaction wants to commit, the database checks whether any of the ignored writes have now been committed. If so, the transaction must be aborted.
- Detecting writes that affect prior reads (the write occurs after the read)
- Similar to index-range locks: the database records which transactions have read which index entries; when a transaction writes, it checks the indexes and notifies transactions that recently read the affected data that their reads may be outdated, so they can be aborted when they try to commit
- Performance:
- one trade-off is the granularity at which transactions’ reads and writes are tracked
- Compared to two-phase locking, the big advantage is that one transaction doesn’t need to block waiting for locks held by another transaction.
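A minimal optimistic-commit sketch in the spirit of the above (a simplified validation check, not full SSI; all structures are invented): each transaction records the versions it read, and at commit the store verifies that none of them were overwritten by a committed writer, aborting so the caller can retry.

```python
class OptimisticStore:
    def __init__(self):
        self.values = {}  # key -> (committed version number, value)

    def begin(self):
        # Reads come from the committed state and never block on writers.
        return {"read_set": {}, "write_set": {}}

    def read(self, txn, key):
        version, value = self.values.get(key, (0, None))
        txn["read_set"][key] = version   # remember which version we saw
        return value

    def write(self, txn, key, value):
        txn["write_set"][key] = value    # buffered until commit

    def commit(self, txn):
        # Validation: if anything we read has since been overwritten by a
        # committed transaction, the premise of our writes may be outdated.
        for key, seen in txn["read_set"].items():
            current, _ = self.values.get(key, (0, None))
            if current != seen:
                raise RuntimeError("serialization failure: retry the transaction")
        for key, value in txn["write_set"].items():
            version, _ = self.values.get(key, (0, None))
            self.values[key] = (version + 1, value)
```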