In an Amazon Aurora database (DB) cluster, failover is a mechanism by which Aurora automatically repairs the DB cluster status when a primary DB instance becomes unavailable. It does this by promoting an Aurora Reader to become the new primary DB instance, so that the DB cluster can restore access to a primary read-write DB instance as quickly as possible. The AWS Advanced Python Driver uses the Failover Plugin to coordinate with this behavior and minimize downtime in the event of a DB instance failure.
The figure above provides a simplified overview of how the AWS Advanced Python Driver handles an Aurora failover. Starting at the top of the diagram, an application uses the AWS Advanced Python Driver to get a logical connection to an Aurora database.
In this example, the application requests a connection using the Aurora DB cluster endpoint and receives a logical connection that is physically connected to the primary DB instance in the DB cluster, DB instance C. By design, the details of which DB instance the physical connection targets are abstracted away from the application.
Over the course of the application's lifetime, it executes various statements against the logical connection. If DB instance C is stable and active, these statements succeed and the application continues as normal. If DB instance C experiences a failure, Aurora will initiate failover to promote a new primary DB instance. At the same time, the AWS Advanced Python Driver will intercept the related communication exception and kick off its own internal failover process.
If the primary DB instance has failed, the AWS Advanced Python Driver uses its internal topology cache to temporarily connect to an active Aurora Reader. This Aurora Reader is periodically queried for the DB cluster topology until the new primary DB instance is identified (DB instance A or B in this case). If the driver is unable to connect to an active Aurora Reader, or the cluster is still being reconfigured, the driver retries until it succeeds, within the limit set by failover_timeout_sec.
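The reader-polling step above can be modeled with a short loop. This is a simplified, hypothetical sketch, not the driver's internal implementation: `wait_for_new_writer`, the `query_topology` callback, and the `(host, is_writer)` topology tuples are illustrative names, and the default interval and deadline are borrowed from the `failover_cluster_topology_refresh_rate_sec` (2) and `failover_timeout_sec` (300) parameters described in the parameter table below.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical topology shape: (host, is_writer) pairs reported by a reader.
Topology = List[Tuple[str, bool]]


def wait_for_new_writer(
    query_topology: Callable[[], Optional[Topology]],
    failed_writer: str,
    refresh_rate_sec: float = 2.0,
    timeout_sec: float = 300.0,
) -> str:
    """Poll cluster topology through a reader until a new writer is reported.

    query_topology returns None (or raises ConnectionError) when no reader
    is reachable or the cluster is still being reconfigured.
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        try:
            topology = query_topology()
        except ConnectionError:
            topology = None  # reader unreachable: try again on the next round
        for host, is_writer in topology or []:
            # Skip the failed writer in case the reported topology is still stale.
            if is_writer and host != failed_writer:
                return host
        time.sleep(refresh_rate_sec)
    raise TimeoutError("no new writer was discovered before the failover timeout")
```

Passing `failed_writer` lets the sketch ignore a stale topology that still lists the old primary (DB instance C in the example) as the writer.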
At this point, the AWS Advanced Python Driver connects to the new primary DB instance and returns control to the application by raising a FailoverSuccessError (or a TransactionResolutionUnknownError if the failure occurred inside a transaction), so that you can reconfigure the session state as needed. Although the DNS endpoint for the DB cluster might not yet resolve to the new primary DB instance, the driver has already discovered this instance during its failover process and will be connected to it directly when the application continues executing statements. In this way, the AWS Advanced Python Driver reconnects to a newly promoted DB instance faster, increasing the availability of the DB cluster.
The failover plugin is loaded by default if the `plugins` parameter is not specified. It can also be loaded explicitly by adding the plugin code `failover` to the `plugins` parameter. Once the plugin is loaded, the failover feature is enabled by default and the `enable_failover` parameter is set to `True`.
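As a hedged illustration, the helper below assembles the keyword arguments for loading the failover plugin explicitly. `build_failover_kwargs` is a hypothetical helper. The commented-out call assumes the driver's `AwsWrapperConnection.connect` entry point in the `aws_advanced_python_wrapper` package, psycopg as the underlying driver, and a placeholder cluster endpoint and credentials, so check it against the driver version you use.

```python
from typing import Any, Dict


def build_failover_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Return connection kwargs that load the failover plugin explicitly.

    Omitting "plugins" altogether would also load the failover plugin,
    because the driver loads it by default when plugins is not specified.
    """
    kwargs: Dict[str, Any] = {
        "plugins": "failover",    # explicit plugin code
        "enable_failover": True,  # already the default once the plugin is loaded
    }
    kwargs.update(overrides)      # caller-supplied values take precedence
    return kwargs


# Usage (not executed here; requires psycopg and the AWS Advanced Python Driver):
# import psycopg
# from aws_advanced_python_wrapper import AwsWrapperConnection
# conn = AwsWrapperConnection.connect(
#     psycopg.Connection.connect,
#     "host=my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com dbname=postgres user=john password=pwd",
#     **build_failover_kwargs(),
# )
```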
Please refer to the failover configuration guide for tips to keep in mind when using the failover plugin.
In addition to the parameters that you can configure for the underlying driver, you can pass the following connection parameters to the AWS Advanced Python Driver to specify additional failover behavior.
Parameter | Value | Required | Description | Default Value |
---|---|---|---|---|
failover_mode | String | No | Defines the mode for the failover process. The failover process may prioritize hosts with different roles and connect to them. Possible values:<br>- `strict_writer`: The failover process follows the writer host and connects to the new writer when it changes.<br>- `reader_or_writer`: During failover, the driver tries to connect to any available/accessible reader host. If no reader is available, the driver connects to a writer host. This logic mimics the logic of the Aurora read-only cluster endpoint.<br>- `strict_reader`: During failover, the driver tries to connect to any available reader host. If no reader is available, the driver raises an error. Reader failover to a writer host is only allowed for single-host clusters. This logic mimics the logic of the Aurora read-only cluster endpoint. | Depends on the connection URL: `reader_or_writer` for an Aurora read-only cluster endpoint, otherwise `strict_writer`. |
cluster_instance_host_pattern | String | If connecting using an IP address or custom domain URL: Yes<br>Otherwise: No | This parameter is not required unless connecting to an Amazon RDS cluster via an IP address or custom domain URL. In those cases, it specifies the cluster instance DNS pattern used to build a complete instance endpoint. A `?` character in this pattern is a placeholder for the DB instance identifiers of the instances in the cluster. See here for more information.<br><br>Example: `?.my-domain.com`, `any-subdomain.?.my-domain.com:9999`<br><br>Use case example: If your cluster instance endpoints follow the pattern `instanceIdentifier1.customHost`, `instanceIdentifier2.customHost`, etc., and you want your initial connection to be to `customHost:1234`, then your connection parameters should look like this: `host=customHost:1234 cluster_instance_host_pattern=?.customHost` | If the provided connection string is not an IP address or custom domain, the AWS Advanced Python Driver automatically acquires the cluster instance host pattern from the customer-provided connection string. |
enable_failover | Boolean | No | Set to `True` to enable the fast failover behavior offered by the AWS Advanced Python Driver. Set to `False` for simple database connections that do not require fast failover functionality. | `True` |
failover_cluster_topology_refresh_rate_sec | Integer | No | Cluster topology refresh rate in seconds during a writer failover process. During writer failover, cluster topology may be refreshed at a faster pace than normal to speed up discovery of the newly promoted writer. | 2 |
failover_reader_connect_timeout_sec | Integer | No | Maximum allowed time in seconds to attempt to connect to a reader instance during a reader failover process. | 30 |
failover_timeout_sec | Integer | No | Maximum allowed time in seconds to attempt reconnecting to a new writer or reader instance after a cluster failover is initiated. | 300 |
failover_writer_reconnect_interval_sec | Integer | No | Interval in seconds to wait between attempts to reconnect to a failed writer during a writer failover process. | 2 |
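The defaults in the parameter table above can be summarized in code. The sketch below is hypothetical: `resolve_failover_settings` and `default_failover_mode` are not part of the driver, and detecting a read-only cluster endpoint by the `.cluster-ro-` label in the host name is an assumption based on the standard Aurora endpoint format.

```python
from typing import Any, Dict

# Default values copied from the parameter table.
FAILOVER_DEFAULTS: Dict[str, Any] = {
    "enable_failover": True,
    "failover_cluster_topology_refresh_rate_sec": 2,
    "failover_reader_connect_timeout_sec": 30,
    "failover_timeout_sec": 300,
    "failover_writer_reconnect_interval_sec": 2,
}


def default_failover_mode(host: str) -> str:
    """Mirror the documented rule for an omitted failover_mode."""
    hostname = host.split(":", 1)[0].lower()  # drop an optional :port suffix
    if ".cluster-ro-" in hostname:  # standard Aurora read-only cluster endpoint
        return "reader_or_writer"
    return "strict_writer"


def resolve_failover_settings(host: str, **overrides: Any) -> Dict[str, Any]:
    """Combine table defaults, the host-based failover_mode, and overrides."""
    settings = dict(FAILOVER_DEFAULTS)
    settings["failover_mode"] = default_failover_mode(host)
    settings.update(overrides)  # explicitly passed parameters win
    return settings
```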
When connecting to Aurora clusters, the `cluster_instance_host_pattern` parameter is required if the connection string does not provide enough information about the database cluster domain name. If the Aurora cluster endpoint is used directly, the AWS Advanced Python Driver recognizes the standard Aurora domain name and can rebuild a proper Aurora instance name when needed. If the connection string uses an IP address, a custom domain name, or localhost, the driver cannot build a proper domain name for a database instance endpoint. For example, if a custom domain is used and the cluster instance endpoints follow the pattern `instanceIdentifier1.customHost`, `instanceIdentifier2.customHost`, etc., the driver needs to know how to construct the instance endpoints using the specified custom domain. Because the custom domain alone does not provide enough information to create the instance endpoints, set `cluster_instance_host_pattern` to `?.customHost` so that the connection parameters include `host=customHost cluster_instance_host_pattern=?.customHost`. Refer to this diagram for details and examples of the AWS Advanced Python Driver's failover behavior with different connection URLs.
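The placeholder substitution behind `cluster_instance_host_pattern` can be sketched as plain string replacement. `instance_endpoint` is a hypothetical helper, and requiring exactly one `?` is this sketch's own validation rule, not a documented driver constraint.

```python
def instance_endpoint(pattern: str, instance_id: str) -> str:
    """Build an instance endpoint by filling the '?' placeholder in pattern."""
    if pattern.count("?") != 1:
        # Validation rule of this sketch only.
        raise ValueError("pattern must contain exactly one '?' placeholder")
    return pattern.replace("?", instance_id)


# With host=customHost and cluster_instance_host_pattern=?.customHost:
for instance_id in ("instanceIdentifier1", "instanceIdentifier2"):
    print(instance_endpoint("?.customHost", instance_id))
# instanceIdentifier1.customHost
# instanceIdentifier2.customHost
```

A pattern with a port, such as `any-subdomain.?.my-domain.com:9999`, keeps the port unchanged because only the `?` is replaced.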
Errors | Is the connection valid? | Can the connection be reused? | Has the session state changed? | Does the session need to be reconfigured? | Does the last query need to be re-executed? | Does the transaction need to be restarted? |
---|---|---|---|---|---|---|
FailoverFailedError | No | No | N/A | N/A | Yes | Yes |
FailoverSuccessError | Yes | Yes | Yes | Yes | Yes | N/A |
TransactionResolutionUnknownError | Yes | Yes | Yes | Yes | Yes | Yes |
When the AWS Advanced Python Driver raises a FailoverFailedError, the original connection has failed, and the driver tried to fail over to a new instance but was unable to. This can happen for various reasons: no hosts were available, a network failure occurred, and so on. In this scenario, wait until the server is up or the other problems are resolved, then establish a new connection.
When the AWS Advanced Python Driver raises a FailoverSuccessError, the original connection has failed while outside a transaction, and the driver successfully failed over to another available instance in the cluster. However, any session state configuration of the initial connection is now lost. In this scenario, you should:
- Reuse and reconfigure the original connection (e.g., reconfigure the session state to match the original connection).
- Recreate the `Cursor` object.
- Repeat the query that was executed when the connection failed, and continue work as desired.
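The steps above can be sketched as follows. The import path `aws_advanced_python_wrapper.errors` is an assumption about how the driver exposes its error classes (a stand-in class is defined when the driver is not installed, so the sketch runs on its own), `run_query` and `configure_session` are hypothetical names, and the connection and cursor are assumed to follow the Python DB-API (PEP 249).

```python
from typing import Any, Callable, List, Optional, Sequence

try:
    from aws_advanced_python_wrapper.errors import FailoverSuccessError
except ImportError:
    class FailoverSuccessError(Exception):  # stand-in when the driver is absent
        """Placeholder for the driver's FailoverSuccessError."""


def run_query(
    conn: Any,
    configure_session: Callable[[Any], None],
    sql: str,
    params: Optional[Sequence[Any]] = None,
) -> List[Any]:
    """Run one row-returning query outside a transaction, retrying after failover."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
    except FailoverSuccessError:
        # conn now points at the new instance, but its session state is gone.
        configure_session(conn)      # 1. reuse and reconfigure the same connection
        cursor = conn.cursor()       # 2. recreate the cursor
        cursor.execute(sql, params)  # 3. repeat the query that failed
    return cursor.fetchall()
```

The same `conn` object is kept throughout; only the cursor is replaced, matching the checklist above.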
When the AWS Advanced Python Driver raises a TransactionResolutionUnknownError, the original connection has failed within a transaction. In this scenario, the driver first attempts to roll back the transaction and then fails over to another available instance in the cluster. Note that the rollback might be unsuccessful, because the initial connection may already be broken by the time the driver recognizes the problem. Note also that any session state configuration of the initial connection is now lost. In this scenario, you should:
- Reuse and reconfigure the original connection (e.g., reconfigure the session state to match the original connection).
- Recreate the `Cursor` object.
- Restart the transaction and repeat all queries that were executed during the transaction before the connection failed.
- Repeat the query that was executed when the connection failed, and continue work as desired.
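A transaction-level version of these steps replays every statement when failover interrupts a transaction. This is a sketch under stated assumptions: the import path `aws_advanced_python_wrapper.errors` is assumed (a stand-in class is defined when the driver is not installed), `run_transaction` and `configure_session` are hypothetical names, the connection is assumed to be DB-API compliant with autocommit disabled, and the statements are assumed to be safe to replay.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

try:
    from aws_advanced_python_wrapper.errors import TransactionResolutionUnknownError
except ImportError:
    class TransactionResolutionUnknownError(Exception):  # stand-in when the driver is absent
        """Placeholder for the driver's TransactionResolutionUnknownError."""

# A statement is (sql, params); params may be None.
Statement = Tuple[str, Optional[Sequence[Any]]]


def run_transaction(
    conn: Any,
    configure_session: Callable[[Any], None],
    statements: Sequence[Statement],
    max_attempts: int = 2,
) -> None:
    """Run statements as one transaction, replaying it after a failover."""
    for attempt in range(1, max_attempts + 1):
        cursor = conn.cursor()  # fresh cursor for every attempt
        try:
            for sql, params in statements:
                cursor.execute(sql, params)
            conn.commit()
            return
        except TransactionResolutionUnknownError:
            if attempt == max_attempts:
                raise
            # Session state was lost: reconfigure the same connection, then
            # restart the transaction from its first statement.
            configure_session(conn)
```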
PostgreSQL Failover Sample Code
MySQL Failover Sample Code
Warnings About Proper Usage of the AWS Advanced Python Driver
- A common practice when using Python drivers is to wrap invocations against a Connection object in a try-except block and dispose of the Connection object if an exception is raised. If this practice is left unchanged, the application will lose the fast-failover functionality offered by the AWS Advanced Python Driver. When failover occurs, the driver internally establishes a ready-to-use connection inside the original Connection object before raising an exception to the user. If this Connection object is disposed of, the newly established connection is thrown away. The correct practice is to check the exception type for failover errors and reuse the Connection object if the error type indicates successful failover. The PostgreSQL Failover Sample Code demonstrates this practice. See the section about Failover Errors for more details.
- We highly recommend that you use the cluster and read-only cluster endpoints instead of the direct instance endpoints of your Aurora cluster, unless you are confident in your application's use of instance endpoints. Although the AWS Advanced Python Driver will correctly fail over to the new writer instance when using instance endpoints, use of these endpoints is discouraged because individual instances can spontaneously change reader/writer status when failover occurs. The AWS Advanced Python Driver always connects directly to the specified instance if an instance endpoint is provided, so a write-safe connection cannot be assumed if the application uses instance endpoints.
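The exception-type check recommended in the first warning above can be sketched as a small helper. `handle_db_error` and its return values are hypothetical, the import path `aws_advanced_python_wrapper.errors` is an assumption (stand-ins are defined when the driver is not installed), and closing the connection for every non-failover error simply models the common practice the warning describes.

```python
try:
    from aws_advanced_python_wrapper.errors import (
        FailoverFailedError,
        FailoverSuccessError,
        TransactionResolutionUnknownError,
    )
except ImportError:
    # Stand-ins so the sketch runs without the driver installed.
    class FailoverFailedError(Exception):
        pass

    class FailoverSuccessError(Exception):
        pass

    class TransactionResolutionUnknownError(Exception):
        pass


def handle_db_error(conn, error: BaseException) -> str:
    """Decide what to do with conn after error; returns the action taken."""
    if isinstance(error, (FailoverSuccessError, TransactionResolutionUnknownError)):
        # The driver already reconnected inside this Connection object;
        # closing it would throw away the new connection.
        return "reuse"
    conn.close()
    if isinstance(error, FailoverFailedError):
        # Failover was attempted and failed: wait before reconnecting.
        return "reconnect_later"
    return "closed"  # any other error: the usual dispose-on-error practice
```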