Table of Contents
This book provides a thorough introduction and discussion on transactions as used with Berkeley DB (DB). Both the base API as well as the Direct Persistence Layer API is used in this manual. It begins by offering a general overview to transactions, the guarantees they provide, and the general application infrastructure required to obtain full transactional protection for your data.
This book also provides detailed examples on how to write a transactional application. Both single threaded and multi-threaded (as well as multi-process applications) are discussed. A detailed description of various backup and recovery strategies is included in this manual, as is a discussion on performance considerations for your transactional application.
You should understand the concepts from the Getting Started with Berkeley DB guide before reading this book.
Transactions offer your application's data protection from application or system failures. That is, DB transactions offer your application full ACID support:
Atomicity
Multiple database operations are treated as a single unit of work. Once committed, all write operations performed under the protection of the transaction are saved to your databases. Further, in the event that you abort a transaction, all write operations performed during the transaction are discarded. In this event, your database is left in the state it was in before the transaction began, regardless of the number or type of write operations you may have performed during the course of the transaction.
Note that DB transactions can span one or more database handles.
Consistency
Your databases will never see a partially completed transaction. This is true even if your application fails while there are in-progress transactions. If the application or system fails, then either all of the database changes appear when the application next runs, or none of them appear.
In other words, whatever consistency requirements your application has will never be violated by DB. If, for example, your application requires every record to include an employee ID, and your code faithfully adds that ID to its database records, then DB will never violate that consistency requirement. The ID will remain in the database records until such a time as your application chooses to delete it.
Isolation
While a transaction is in progress, your databases will appear to the transaction as if there are no other operations occurring outside of the transaction. That is, operations wrapped inside a transaction will always have a clean and consistent view of your databases. They never have to see updates currently in progress under the protection of another transaction. Note, however, that isolation guarantees can be relaxed from the default setting. See Isolation for more information.
Durability
Once committed to your databases, your modifications will persist even in the event of an application or system failure. Note that like isolation, your durability guarantee can be relaxed. See Non-Durable Transactions for more information.
From time to time this manual mentions that transactions protect your data against 'system or application failure.' This is true up to a certain extent. However, not all failures are created equal and no data protection mechanism can protect you against every conceivable way a computing system can find to die.
Generally, when this book talks about protection against failures, it means that transactions offer protection against the likeliest culprits for system and application crashes. So long as your data modifications have been committed to disk, those modifications should persist even if your application or OS subsequently fails. And, even if the application or OS fails in the middle of a transaction commit (or abort), the data on disk should be either in a consistent state, or there should be enough data available to bring your databases into a consistent state (via a recovery procedure, for example). You may, however, lose whatever data you were committing at the time of the failure, but your databases will be otherwise unaffected.
Be aware that many disks have a disk write cache and on some systems it is enabled by default. This means that a transaction can have committed, and to your application the data may appear to reside on disk, but the data may in fact reside only in the write cache at that time. This means that if the disk write cache is enabled and there is no battery backup for it, data can be lost after an OS crash even when maximum durability mode is in use. For maximum durability, disable the disk write cache or use a disk write cache with a battery backup.
Of course, if your disk fails, then the transactional benefits described in this book are only as good as the backups you have taken. By spreading your data and log files across separate disks, you can minimize the risk of data loss due to a disk failure, but even in this case it is possible to conjure a scenario where even this protection is insufficient (a fire in the machine room, for example) and you must go to your backups for protection.
Finally, by following the programming examples shown in this book, you can write your code so as to protect your data in the event that your code crashes. However, no programming API can protect you against logic failures in your own code; transactions cannot protect you from simply writing the wrong thing to your databases.
In order to use transactions, your application has certain requirements beyond what is required of non-transactional protected applications. They are:
Environments.
Environments are optional for non-transactional applications, but they are required for transactional applications.
Environments are optional for non-transactional applications that use the base API, but they are required for transactional applications. (Of course, applications that use the DPL always require the DPL.)
Environment usage is described in detail in Transaction Basics.
Transaction subsystem.
In order to use transactions, you must explicitly enable the transactional subsystem for your application, and this must be done at the time that your environment is first created.
Logging subsystem.
The logging subsystem is required for recovery purposes, but its usage also means your application may require a little more administrative effort than it does when logging is not in use. See Managing DB Files for more information.
Transaction handles.
In order to obtain the atomicity guarantee offered by the transactional subsystem (that is, combine multiple operations in a single unit of work), your application must use transaction handles. These handles are obtained from your Environment objects. They should normally be short-lived, and their usage is reasonably simple. To complete a transaction and save the work it performed, you call its commit() method. To complete a transaction and discard its work, you call its abort() method.
In addition, it is possible to use auto commit if you want to transactional protect a single write operation. Auto commit allows a transaction to be used without obtaining an explicit transaction handle. See Auto Commit for information on how to use auto commit.
Entity Store
If you are using the DPL, then you must configure your entity stores for transactional support before opening them (that is, before obtaining a primary key from them for the first time).
Database open requirements.
In addition to using environments and initializing the correct subsystems, your application must transaction protect the database opens, and any secondary index associations, if subsequent operations on the databases are to be transaction protected. The database open and secondary index association are commonly transaction protected using auto commit.
Note that if you are using the DPL, you do not have to explicitly do anything to the underlying databases unless you want to modify their default behavior — such as the isolation level that they use, for example.
Deadlock detection.
Typically transactional applications use multiple threads of control when accessing the database. Any time multiple threads are used on a single resource, the potential for lock contention arises. In turn, lock contention can lead to deadlocks. See Locks, Blocks, and Deadlocks for more information.
Therefore, transactional applications must frequently include code for detecting and responding to deadlocks. Note that this requirement is not specific to transactions – you can certainly write concurrent non-transactional DB applications. Further, not every transactional application uses concurrency and so not every transactional application must manage deadlocks. Still, deadlock management is so frequently a characteristic of transactional applications that we discuss it in this book. See Concurrency for more information.
DB is designed to support multi-threaded and multi-process applications, but their usage means you must pay careful attention to issues of concurrency. Transactions help your application's concurrency by providing various levels of isolation for your threads of control. In addition, DB provides mechanisms that allow you to detect and respond to deadlocks (but strictly speaking, this is not limited to just transactional applications).
Isolation means that database modifications made by one transaction will not normally be seen by readers from another transaction until the first commits its changes. Different threads use different transaction handles, so this mechanism is normally used to provide isolation between database operations performed by different threads.
Note that DB supports different isolation levels. For example, you can configure your application to see uncommitted reads, which means that one transaction can see data that has been modified but not yet committed by another transaction. Doing this might mean your transaction reads data "dirtied" by another transaction, but which subsequently might change before that other transaction commits its changes. On the other hand, lowering your isolation requirements means that your application can experience improved throughput due to reduced lock contention.
For more information on concurrency, on managing isolation levels, and on deadlock detection, see Concurrency.