[automatic failover] Implement max number of failover attempts #4293
Merged
atakavci merged 14 commits into redis:feature/automatic-failover-2 on Sep 26, 2025
- add tests
Pull Request Overview
This PR implements a maximum number of failover attempts feature for the MultiCluster Redis client. When no healthy clusters are available, the system now tracks failover attempts and throws different exceptions based on whether the maximum attempt limit has been reached.
- Adds configurable maximum failover attempts and delay settings to MultiClusterClientConfig
- Introduces new exception types for temporary vs permanent unavailability scenarios
- Updates the failover logic to track attempt counts and implement delay-based retry windows
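The attempt tracking summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Jedis implementation: `FailoverAttemptTracker`, `onNoHealthyCluster`, `outcome`, and both exception classes are hypothetical stand-ins, and only the setting names `maxNumFailoverAttempts` and `delayInBetweenFailoverAttempts` come from the PR description. The sketch assumes an attempt is counted at most once per delay window and that the permanent exception starts once the count exceeds the maximum; whether that boundary is inclusive is a guess. The clock is passed in explicitly so the behavior can be checked without sleeping.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class FailoverAttemptSketch {

  /** Hypothetical stand-in for a "retry later" failure. */
  static class TemporarilyUnavailableException extends RuntimeException {}

  /** Hypothetical stand-in for a "give up on this provider" failure. */
  static class PermanentlyUnavailableException extends RuntimeException {}

  /** Hypothetical tracker; names mirror the settings mentioned in the PR description. */
  static class FailoverAttemptTracker {
    private final int maxNumFailoverAttempts;
    private final long delayInBetweenFailoverAttemptsMillis;
    private final AtomicInteger attempts = new AtomicInteger();
    // End of the current delay window; MIN_VALUE means no window has started yet.
    private final AtomicLong windowEndsAt = new AtomicLong(Long.MIN_VALUE);

    FailoverAttemptTracker(int maxNumFailoverAttempts, long delayMillis) {
      this.maxNumFailoverAttempts = maxNumFailoverAttempts;
      this.delayInBetweenFailoverAttemptsMillis = delayMillis;
    }

    /** Call when a failover found no healthy cluster; this method always throws. */
    void onNoHealthyCluster(long nowMillis) {
      long until = windowEndsAt.get();
      // Count a new attempt only if the previous delay window has elapsed
      // and this caller wins the race to open the next window.
      boolean newWindow = nowMillis >= until
          && windowEndsAt.compareAndSet(until, nowMillis + delayInBetweenFailoverAttemptsMillis);
      int count = newWindow ? attempts.incrementAndGet() : attempts.get();
      if (count > maxNumFailoverAttempts) { // assumed boundary: strictly greater than max
        throw new PermanentlyUnavailableException();
      }
      throw new TemporarilyUnavailableException();
    }
  }

  /** Helper that reports which exception type a call produced. */
  static String outcome(FailoverAttemptTracker tracker, long nowMillis) {
    try {
      tracker.onNoHealthyCluster(nowMillis);
      return "none"; // unreachable: the tracker always throws
    } catch (TemporarilyUnavailableException e) {
      return "temporary";
    } catch (PermanentlyUnavailableException e) {
      return "permanent";
    }
  }

  public static void main(String[] args) {
    FailoverAttemptTracker tracker = new FailoverAttemptTracker(2, 100L);
    for (long now : new long[] {0L, 50L, 100L, 200L, 210L}) {
      System.out.println("t=" + now + "ms -> " + outcome(tracker, now));
    }
  }
}
```

In this sketch, calls that arrive inside an open delay window do not consume an attempt, so a burst of failing commands does not exhaust the limit immediately.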
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| MultiClusterClientConfig.java | Adds configuration fields for max failover attempts and delay settings with defaults |
| MultiClusterPooledConnectionProvider.java | Implements failover attempt tracking with exception handling and delay logic |
| JedisFailoverException.java | Creates new exception hierarchy for temporary vs permanent failover failures |
| CircuitBreakerFailoverBase.java | Updates circuit breaker failover to use new provider method signature |
| MultiClusterFailoverAttemptsConfigTest.java | Comprehensive test coverage for the new failover attempt configuration |
| Various test files | Updates existing tests to use new method signature for iterateActiveCluster |
src/main/java/redis/clients/jedis/mcf/MultiClusterPooledConnectionProvider.java
src/test/java/redis/clients/jedis/mcf/MultiClusterFailoverAttemptsConfigTest.java
ggivo reviewed on Sep 25, 2025
src/main/java/redis/clients/jedis/mcf/MultiClusterPooledConnectionProvider.java
src/main/java/redis/clients/jedis/mcf/MultiClusterPooledConnectionProvider.java
Outdated
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java
Outdated
src/main/java/redis/clients/jedis/mcf/CircuitBreakerFailoverBase.java
Outdated
src/test/java/redis/clients/jedis/mcf/MultiClusterPooledConnectionProviderTest.java
ggivo (Collaborator) reviewed on Sep 26, 2025 and left a comment:
LGTM.
need to fix the test
src/main/java/redis/clients/jedis/mcf/MultiClusterPooledConnectionProvider.java
Outdated
src/test/java/redis/clients/jedis/mcf/MultiClusterPooledConnectionProviderTest.java
Outdated
ggivo approved these changes on Sep 26, 2025
Merged commit 8f2f195 into redis:feature/automatic-failover-2 (10 checks passed)
ggivo added a commit that referenced this pull request on Sep 30, 2025
…#4297) * [automatic failover] Remove the check for 'GenericObjectPool.getNumWaiters()' in 'TrackingConnectionPool' (#4270) - remove the check for number of waitiers in TrackingConnectionPool * [automatic failover] Configure max total connections for EchoStrategy (#4268) - set maxtotal connections for echoStrategy * [automatic failover] Replace 'CircuitBreaker' with 'Cluster' for 'CircuitBreakerFailoverBase.clusterFailover' (#4275) * - replace CircuitBreaker with Cluster for CircuitBreakerFailoverBase.clusterFailover - improve thread safety with provider initialization * - formatting * [automatic failover] Minor optimizations on fast failover (#4277) * - minor optimizations on fail fast * - volatile failfast * [automatic failover] Implement health check retries (#4273) * - replace minConsecutiveSuccessCount with numberOfRetries - add retries into healtCheckImpl - apply changes to strategy implementations config classes - fix unit tests * - fix typo * - fix failing tests * - add tests for retry logic * - formatting * - format * - revisit numRetries for healthCheck ,replace with numProbes and implement built in policies - new types probecontext, ProbePolicy, HealthProbeContext - add delayer executor pool to healthcheckımpl - adjustments on worker pool of healthCheckImpl for shared use of workers * - format * - expand comment with example case * - drop pooled executor for delays * - polish * - fix tests * - formatting * - checking failing tests * - fix test * - fix flaky tests * - fix flaky test * - add tests for builtin probing policies * - fix flaky test * [automatic failover] Move failover provider to mcf (#4294) * - move failover provider to mcf * - make iterateActiveCluster package private * [automatic failover] Add SSL configuration support to LagAwareStrategy (#4291) * User-provided ssl config for lag-aware health check * ssl scenario test for lag-aware healthcheck * format * format * address review comments - use getters instead of fields * [automatic 
failover] Implement max number of failover attempts (#4293) * - implement max failover attempt - add tests * - fix user receive the intended exception * -clean+format * - java doc for exceptions * format * - more tests on excaption types in max failover attempts mechanism * format * fix failing timing in test * disable health checks * rename to switchToHealthyCluster * format --------- Co-authored-by: Ivo Gaydazhiev <ivo.gaydazhiev@redis.com>
atakavci added a commit that referenced this pull request on Oct 3, 2025
…ure rate) capabililty to circuit breaker (#4295) * [automatic failover] Remove the check for 'GenericObjectPool.getNumWaiters()' in 'TrackingConnectionPool' (#4270) - remove the check for number of waitiers in TrackingConnectionPool * [automatic failover] Configure max total connections for EchoStrategy (#4268) - set maxtotal connections for echoStrategy * [automatic failover] Replace 'CircuitBreaker' with 'Cluster' for 'CircuitBreakerFailoverBase.clusterFailover' (#4275) * - replace CircuitBreaker with Cluster for CircuitBreakerFailoverBase.clusterFailover - improve thread safety with provider initialization * - formatting * [automatic failover] Minor optimizations on fast failover (#4277) * - minor optimizations on fail fast * - volatile failfast * [automatic failover] Implement health check retries (#4273) * - replace minConsecutiveSuccessCount with numberOfRetries - add retries into healtCheckImpl - apply changes to strategy implementations config classes - fix unit tests * - fix typo * - fix failing tests * - add tests for retry logic * - formatting * - format * - revisit numRetries for healthCheck ,replace with numProbes and implement built in policies - new types probecontext, ProbePolicy, HealthProbeContext - add delayer executor pool to healthcheckımpl - adjustments on worker pool of healthCheckImpl for shared use of workers * - format * - expand comment with example case * - drop pooled executor for delays * - polish * - fix tests * - formatting * - checking failing tests * - fix test * - fix flaky tests * - fix flaky test * - add tests for builtin probing policies * - fix flaky test * [automatic failover] Move failover provider to mcf (#4294) * - move failover provider to mcf * - make iterateActiveCluster package private * [automatic failover] Add SSL configuration support to LagAwareStrategy (#4291) * User-provided ssl config for lag-aware health check * ssl scenario test for lag-aware healthcheck * format * format * address review comments - use 
getters instead of fields * [automatic failover] Implement max number of failover attempts (#4293) * - implement max failover attempt - add tests * - fix user receive the intended exception * -clean+format * - java doc for exceptions * format * - more tests on excaption types in max failover attempts mechanism * format * fix failing timing in test * disable health checks * rename to switchToHealthyCluster * format * - Add dual-threshold (min failures + failure rate) failover to circuit breaker executor - Map config to resilience4j via CircuitBreakerThresholdsAdapter - clean up/simplfy config: drop slow-call and window type - Add thresholdMinNumOfFailures; update some of the defaults - Update provider to use thresholds adapter - Update docs; align examples with new defaults - Add tests for 0% rate, edge thresholds * polish * Update src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * - fix typo * - fix min total calls calculation * format * - merge issues fixed * fix javadoc ref * - move threshold evaluations to failoverbase - simplfy executer and cbfailoverconnprovider - adjust config getters - fix failing tests due to COUNT_BASED -> TIME_BASED - new tests for thresholds calculations and impact on circuit state transitions * - avoid facilitating actual CBConfig type in tests * Update src/test/java/redis/clients/jedis/failover/FailoverIntegrationTest.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Trigger workflows * - evaluate only in failure recorded and failover immediately - add more test on threshold calculations - enable command line arg for overwriting surefire.excludedGroups * format * check pom * - fix error prone test * [automatic failover] Set and test default values for failover config&components (#4298) * - set & test default values * - format * - fix tests failing due to changing defaults * - fix flaky test * - remove unnecessary 
checks for failover attempt * - clean and trim adapter class - add docs and more explanantion * fix javadoc issue * - switch to all_succes to fix flaky timing * - fix issue in CircuitBreakerFailoverConnectionProvider * introduce ReflectionTestUtil --------- Co-authored-by: Ivo Gaydazhiev <ivo.gaydazhiev@redis.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
atakavci added a commit that referenced this pull request on Oct 6, 2025
…4306) * [automatic failover] Set and test default values for failover config&components (#4298) * - set & test default values * - format * - fix tests failing due to changing defaults * [automatic failover] Add dual thresholds (min num of failures + failure rate) capabililty to circuit breaker (#4295) * [automatic failover] Remove the check for 'GenericObjectPool.getNumWaiters()' in 'TrackingConnectionPool' (#4270) - remove the check for number of waitiers in TrackingConnectionPool * [automatic failover] Configure max total connections for EchoStrategy (#4268) - set maxtotal connections for echoStrategy * [automatic failover] Replace 'CircuitBreaker' with 'Cluster' for 'CircuitBreakerFailoverBase.clusterFailover' (#4275) * - replace CircuitBreaker with Cluster for CircuitBreakerFailoverBase.clusterFailover - improve thread safety with provider initialization * - formatting * [automatic failover] Minor optimizations on fast failover (#4277) * - minor optimizations on fail fast * - volatile failfast * [automatic failover] Implement health check retries (#4273) * - replace minConsecutiveSuccessCount with numberOfRetries - add retries into healtCheckImpl - apply changes to strategy implementations config classes - fix unit tests * - fix typo * - fix failing tests * - add tests for retry logic * - formatting * - format * - revisit numRetries for healthCheck ,replace with numProbes and implement built in policies - new types probecontext, ProbePolicy, HealthProbeContext - add delayer executor pool to healthcheckımpl - adjustments on worker pool of healthCheckImpl for shared use of workers * - format * - expand comment with example case * - drop pooled executor for delays * - polish * - fix tests * - formatting * - checking failing tests * - fix test * - fix flaky tests * - fix flaky test * - add tests for builtin probing policies * - fix flaky test * [automatic failover] Move failover provider to mcf (#4294) * - move failover provider to mcf * - make 
iterateActiveCluster package private * [automatic failover] Add SSL configuration support to LagAwareStrategy (#4291) * User-provided ssl config for lag-aware health check * ssl scenario test for lag-aware healthcheck * format * format * address review comments - use getters instead of fields * [automatic failover] Implement max number of failover attempts (#4293) * - implement max failover attempt - add tests * - fix user receive the intended exception * -clean+format * - java doc for exceptions * format * - more tests on excaption types in max failover attempts mechanism * format * fix failing timing in test * disable health checks * rename to switchToHealthyCluster * format * - Add dual-threshold (min failures + failure rate) failover to circuit breaker executor - Map config to resilience4j via CircuitBreakerThresholdsAdapter - clean up/simplfy config: drop slow-call and window type - Add thresholdMinNumOfFailures; update some of the defaults - Update provider to use thresholds adapter - Update docs; align examples with new defaults - Add tests for 0% rate, edge thresholds * polish * Update src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * - fix typo * - fix min total calls calculation * format * - merge issues fixed * fix javadoc ref * - move threshold evaluations to failoverbase - simplfy executer and cbfailoverconnprovider - adjust config getters - fix failing tests due to COUNT_BASED -> TIME_BASED - new tests for thresholds calculations and impact on circuit state transitions * - avoid facilitating actual CBConfig type in tests * Update src/test/java/redis/clients/jedis/failover/FailoverIntegrationTest.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Trigger workflows * - evaluate only in failure recorded and failover immediately - add more test on threshold calculations - enable command line arg for overwriting surefire.excludedGroups * 
format * check pom * - fix error prone test * [automatic failover] Set and test default values for failover config&components (#4298) * - set & test default values * - format * - fix tests failing due to changing defaults * - fix flaky test * - remove unnecessary checks for failover attempt * - clean and trim adapter class - add docs and more explanantion * fix javadoc issue * - switch to all_succes to fix flaky timing * - fix issue in CircuitBreakerFailoverConnectionProvider * introduce ReflectionTestUtil --------- Co-authored-by: Ivo Gaydazhiev <ivo.gaydazhiev@redis.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [automatic failover] feat: Add MultiDbClient with multi-endpoint failover and circuit breaker support (#4300) * feat: introduce ResilientRedisClient with multi-endpoint failover support Add ResilientRedisClient extending UnifiedJedis with automatic failover capabilities across multiple weighted Redis endpoints. Includes circuit breaker pattern, health monitoring, and configurable retry logic for high-availability Redis deployments. * format * mark ResilientRedisClientTest as integration one * fix test - make sure endpoint is healthy before activating it * Rename ResilientClient to align with design - ResilientClient -> MultiDbClient (builder, tests, etc) * Rename setActiveEndpoint to setActiveDatabaseEndpoint * Rename clusterSwitchListener to databaseSwitchListener * Rename multiClusterConfig to multiDbConfig * fix api doc's error * fix compilation error after rebase * format * fix example in javadoc * Update ActiveActiveFailoverTest scenariou test to use builder's # Conflicts: # src/test/java/redis/clients/jedis/scenario/ActiveActiveFailoverTest.java * rename setActiveDatabaseEndpoint -. 
setActiveDatabase * is healthy throw exception if cluster does not exists * format * [automatic failover]Use Endpoint interface instead HostAndPort in multi db (#4302) [clean up] Use Endpoint interface where possible * - fix variable name type * fix typo in variable name * - fix flaky test --------- Co-authored-by: Ivo Gaydazhiev <ivo.gaydazhiev@redis.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
ggivo added a commit that referenced this pull request on Oct 6, 2025
With this PR, the provider attempts a failover and, if no healthy cluster is available, throws a JedisTemporarilyNotAvailableException, waiting the configured delay (delayInBetweenFailoverAttempts) between attempts, until it reaches the maximum number of attempts (maxNumFailoverAttempts). Once the maximum is reached, the exception type changes and every subsequent failover attempt throws a JedisPermanentlyNotAvailableException. This lets the user choose to fall back to another resource or proceed with a brand-new client/provider instance.
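A sketch of how calling code might react to the two exception types, assuming they can be caught separately. Everything here is a hypothetical illustration rather than Jedis API: the two exception classes stand in for the ones named above, and `callWithFallback` with its `Supplier`-based primary/fallback wiring is an invented helper. The retry count and delay on the caller side are also assumptions and are independent of the provider's own settings.

```java
import java.util.function.Supplier;

public class FailoverCallerSketch {

  /** Hypothetical stand-in for the temporary "no healthy cluster" failure. */
  static class TemporarilyUnavailableException extends RuntimeException {}

  /** Hypothetical stand-in for the permanent "max attempts reached" failure. */
  static class PermanentlyUnavailableException extends RuntimeException {}

  /**
   * Runs the primary call, retrying while it reports temporary unavailability.
   * Switches to the fallback when the failure becomes permanent, when the
   * caller-side retries run out, or when the waiting thread is interrupted.
   */
  static <T> T callWithFallback(Supplier<T> primary, Supplier<T> fallback,
      int maxRetries, long retryDelayMillis) {
    for (int i = 0; i <= maxRetries; i++) {
      try {
        return primary.get();
      } catch (TemporarilyUnavailableException e) {
        // The provider may still recover: wait, then try again.
        try {
          Thread.sleep(retryDelayMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt(); // preserve the interrupt flag
          break;
        }
      } catch (PermanentlyUnavailableException e) {
        // The provider has given up: stop retrying it.
        break;
      }
    }
    return fallback.get();
  }

  public static void main(String[] args) {
    int[] calls = {0};
    String value = callWithFallback(() -> {
      calls[0]++;
      if (calls[0] < 3) {
        throw new TemporarilyUnavailableException();
      }
      throw new PermanentlyUnavailableException();
    }, () -> "value from fallback resource", 5, 10L);
    System.out.println(value + " after " + calls[0] + " primary calls");
  }
}
```

The fallback supplier could just as well build a brand-new client or provider instance, which is the other option the description mentions.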