Proper Retry in JavaScript

There are times when retriable errors might occur when your code is running. In this post I'm going to show how to handle these situations in JavaScript.

Jarmo Pertman — 2020-11-19 (7 minute read)

It happens pretty often that you want to retry an API call, for example because of possible network issues. In a programming language where asynchronous code is not the norm (at least from a code-writing point of view), retry logic is pretty easy to write. Here's a really simplistic example written in the Ruby programming language:


def get(url)
  http_client.get(url)
rescue
  sleep 1
  retry
end

It's pretty simple - it just tries to perform a GET request and, if any error is raised, retries the request after a short back-off time. Of course, production code should also log the errors and limit the maximum number of retries, with a back-off strategy between them, to avoid performing a DoS attack against the API endpoint.

A JavaScript Way

Doing something similar in JavaScript gets more complicated due to the asynchronous nature of the language. We stumbled upon an attempt in one of our clients' projects, which looked something like this:


const legacyRetry = async (promise, retryCount, timeout, increaseTimeoutBy) => {
    try {
        return await new Promise(async (resolve, reject) => {
            setTimeout(async () => {
                reject('Timeout is reached!')
            }, timeout)
            try {
                resolve(await promise)
            } catch (e) {
                reject(e)
            }
        })
    } catch (err) {
        if (retryCount < 1) {
            throw err
        }
        const newTimeout = (timeout += increaseTimeoutBy)
        console.log('Retrying with timeout: ', newTimeout)
        return await legacyRetry(promise, retryCount - 1, newTimeout, increaseTimeoutBy)
    }
}

This is how the legacyRetry function was used:


const result = await legacyRetry(httpClient.get(url), 3, 360000, 80000);
console.info(result);

At first glance it seems that a lot of thought has gone into this code, and it looks complex enough that it might actually work as you'd expect. However, looking closely we can see that it doesn't work at all - it never retries, not even once! Can you spot the problem?

The problem is already in the legacyRetry function signature - it expects a promise as one of its arguments. A promise represents an operation that has already been started, and it can be resolved or rejected only once in its lifetime. After the first rejection it stays rejected forever, so awaiting it again just yields the same error and no new request is ever performed.
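To make the "settles only once" behaviour concrete, here is a minimal sketch. The flakyRequest function and its call counter are hypothetical stand-ins for a real API call, not part of the client's code:

```javascript
// Hypothetical stand-in for a flaky API call; counts how often it really runs.
let calls = 0
const flakyRequest = () => {
    calls += 1
    return calls < 3
        ? Promise.reject(new Error(`failure #${calls}`))
        : Promise.resolve("success response")
}

const demo = async () => {
    // The request is started exactly once, here, before any "retry" can happen.
    const promise = flakyRequest()
    const outcomes = []
    for (let i = 0; i < 3; i++) {
        try {
            // Awaiting the same promise again never starts a new request
            outcomes.push(await promise)
        } catch (e) {
            outcomes.push(e.message)
        }
    }
    return { outcomes, calls }
}

const settledOnce = demo()
settledOnce.then(({ outcomes, calls }) => console.log(outcomes, "calls:", calls))
```

All three outcomes are "failure #1" and the call counter stays at 1, even though a third real call to flakyRequest would have succeeded.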

Let's write some simple code to verify that assumption. First we create some helper functions for testing.


const request = (failuresCount) => {
    return () => {
        return new Promise((resolve, reject) => {
            setTimeout(() => {
                if (failuresCount-- > 0) {
                    reject("error response")
                } else {
                    resolve("success response")
                }
            })
        })
    }
}

const fail = () => {
    throw new Error("Expected to fail")
}

First there is a request function, which takes failuresCount as its argument and returns a new function that succeeds only after it has failed failuresCount times, that is, on call number failuresCount + 1. Using setTimeout is only necessary to introduce an asynchronous element to the code, to make it similar to a real-life example. The second function, fail, is just there to make sure that an error is thrown even if a call unexpectedly succeeds where we expected it to fail. Let's see that our request function works as expected:


const testFailingRequest = async () => {
    const failingRequest = request(2)

    console.log("First try")
    try {
        await failingRequest()
        fail()
    } catch (e) {
        console.log("First failure:", e)
    }

    console.log("Second try")
    try {
        await failingRequest()
        fail()
    } catch (e) {
        console.log("Second failure:", e)
    }

    console.log("Third try")
    let result = await failingRequest()
    console.log("Third try result:", result)
}

testFailingRequest()

As expected, the first two executions of failingRequest fail and the third one succeeds:


First try
First failure: error response
Second try
Second failure: error response
Third try
Third try result: success response

Now, let's see how the supposedly broken legacyRetry function behaves:


const testLegacyRetry = async () => {
    try {
        await legacyRetry(request(3)(), 2, 1, 1)
        fail()
    } catch (e) {
        console.log("Legacy retry result:", e)
    }

    try {
        await legacyRetry(request(2)(), 3, 1, 1)
        fail()
    } catch (e) {
        console.log("Legacy retry result:", e)
    }

    const response = await legacyRetry(request(0)(), 2, 1, 1)
    console.log("Legacy retry result:", response)
}

testLegacyRetry()

Here's the output from the code above:


Retrying with timeout:  2
Retrying with timeout:  3
Legacy retry result: error response
Retrying with timeout:  2
Retrying with timeout:  3
Retrying with timeout:  4
Legacy retry result: error response
Legacy retry result: success response

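Looking at the output above, it is clear that our assumption was correct - the legacyRetry function managed to get a successful result only when there were no failures at all, meaning that the retry functionality does not work! Even request(2)(), which would have succeeded on its third call, still failed after three so-called retries.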

Conclusion About the legacyRetry Function

As shown above, the legacyRetry function does not work and has never worked in production. There are two main reasons why that happened - the developer did not understand how asynchronous code using promises works, and never actually tested the function, neither manually nor with automated unit tests (there were none). Its timeout logic would be questionable even if the promise handling had worked - only the timeout for failures (for example a connect timeout) was increased, but not the wait time between the actual requests.
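One way to keep those two concerns apart is to treat the per-request timeout as its own wrapper, independent of whatever decides how long to wait between attempts. The following is only a sketch under that assumption. withTimeout, slow and fast are hypothetical names and were not part of the client's code:

```javascript
// Hypothetical helper: wraps fn so that each call rejects if it has not
// settled within ms milliseconds. It knows nothing about retries.
const withTimeout = (fn, ms) => () => {
    let timer
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms)
    })
    // Whichever settles first wins; the timer is always cleaned up afterwards
    return Promise.race([Promise.resolve().then(fn), timeout])
        .finally(() => clearTimeout(timer))
}

// Hypothetical requests used only for this demonstration
const slow = () => new Promise((resolve) => setTimeout(() => resolve("slow response"), 50))
const fast = () => Promise.resolve("fast response")

const timeoutDemo = Promise.allSettled([withTimeout(slow, 10)(), withTimeout(fast, 10)()])
timeoutDemo.then((results) => console.log(results.map((r) => r.status)))
```

The slow call is rejected by the 10 ms timeout while the fast one is fulfilled. Because withTimeout returns a function rather than a promise, every call starts a fresh request with a fresh timer.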

There's an App for it?

Retry logic is quite abstract and generally usable in any software project, so there's probably an npm package out there or a good example available on the web. Right? After some searching I did find multiple attempts ([1], [2], [3], [4] and [5]) at implementing retry logic. All of these seem to be either too complex or not abstract enough - retriable code should not know anything about the fact that it might be retried. It should also be possible to execute the retried code directly, without any retry code in it - it's the simple separation of concerns principle. Maybe there's a perfect solution out there, but seeing how many have gotten it wrong we decided to create our own.

A Proper JavaScript Way

We tried to fix the code above, but ended up creating our own version since it was much easier in the end. Here's what we came up with:


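// Calls fn and, if the returned promise rejects, calls it again after a back-off delay.
// Note: maxAttempts counts the retries after the initial call, so fn is called
// at most maxAttempts + 1 times.
const retry = async (fn, maxAttempts) => {
    const execute = async (attempt) => {
        try {
            return await fn()
        } catch (err) {
            if (attempt <= maxAttempts) {
                const nextAttempt = attempt + 1
                // Exponential back-off with random jitter, clamped to 1..600 seconds
                const delayInSeconds = Math.max(Math.min(Math.pow(2, nextAttempt)
                    + randInt(-nextAttempt, nextAttempt), 600), 1)
                console.error(`Retrying after ${delayInSeconds} seconds due to:`, err)
                return delay(() => execute(nextAttempt), delayInSeconds * 1000)
            } else {
                throw err
            }
        }
    }
    return execute(1)
}

// Resolves with the result of fn() after waiting ms milliseconds
const delay = (fn, ms) => new Promise((resolve) => setTimeout(() => resolve(fn()), ms))

// Random integer between min and max, both inclusive
const randInt = (min, max) => Math.floor(Math.random() * (max - min + 1) + min)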

It does not look similar to the legacyRetry function at all. One of the biggest changes is that instead of expecting a promise as its argument, it expects a function that returns a promise. This small change makes it possible to actually implement retry logic, since after a promise gets rejected the retry function can call the function again to create a new pending promise and try again. There's also some gradual back-off logic built into this function: it waits longer after each failure before trying again (roughly doubling the delay each time, with some random jitter, capped at 600 seconds) to leave some breathing room for the other side (an API endpoint, for example).
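To get a feeling for the back-off, the delay formula can be pulled out and evaluated at the two extremes of the random jitter. This is just a sketch for illustration. backoffSeconds and backoffRange are hypothetical helper names that do not exist in the retry function itself:

```javascript
// Same formula as in retry above, with the random jitter passed in explicitly
const backoffSeconds = (nextAttempt, jitter) =>
    Math.max(Math.min(Math.pow(2, nextAttempt) + jitter, 600), 1)

// Smallest and largest possible delay before the given attempt,
// since the jitter is drawn from -nextAttempt..nextAttempt
const backoffRange = (nextAttempt) => [
    backoffSeconds(nextAttempt, -nextAttempt),
    backoffSeconds(nextAttempt, nextAttempt),
]

for (let nextAttempt = 2; nextAttempt <= 10; nextAttempt++) {
    const [min, max] = backoffRange(nextAttempt)
    console.log(`before attempt ${nextAttempt}: ${min}-${max} seconds`)
}
```

The first retry waits 2-6 seconds, the second 5-11 seconds, the third 12-20 seconds, and so on. From the tenth attempt onwards the 600 second cap takes over.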

Let's see if and how it actually works:


const testRetry = async () => {
    try {
        await retry(request(3), 2)
        fail()
    } catch (e) {
        console.log("Failing retry result:", e)
    }

    const response = await retry(request(5), 10);
    console.log("Successful retry result:", response)
}

testRetry()

Here's the output of the code above:


Retrying after 3 seconds due to: error response
Retrying after 5 seconds due to: error response
Failing retry result: error response
Retrying after 6 seconds due to: error response
Retrying after 7 seconds due to: error response
Retrying after 17 seconds due to: error response
Retrying after 36 seconds due to: error response
Retrying after 64 seconds due to: error response
Successful retry result: success response

As seen from the output above, everything works exactly as expected! When you run the same code yourself, the output will probably not be exactly the same because there is a random element in the back-off delay.

Conclusion

In this post we showed how a non-working retry function can end up in JavaScript code when the asynchronous part of the language is not understood and there are no automated unit tests. Writing some unit tests (or even trying it out manually twice - once for a success case and once for a failure case) would have shown immediately that the implementation was faulty and worked only when there was no need to perform any retries. Creating a properly working function is possible, but it is more complex than in programming languages where code is written synchronously. Feel free to use our tested and working retry function, and let us know if you have any ideas for improvement.
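As an illustration of what such a unit test could look like, here is a minimal sketch without any test framework. The countingRequest helper and the temporary replacement of setTimeout (used to skip the real back-off) are assumptions made for this example. The retry, delay and randInt functions are repeated from above so that the snippet runs on its own:

```javascript
// retry, delay and randInt as defined earlier in this post
const retry = async (fn, maxAttempts) => {
    const execute = async (attempt) => {
        try {
            return await fn()
        } catch (err) {
            if (attempt <= maxAttempts) {
                const nextAttempt = attempt + 1
                const delayInSeconds = Math.max(Math.min(Math.pow(2, nextAttempt)
                    + randInt(-nextAttempt, nextAttempt), 600), 1)
                console.error(`Retrying after ${delayInSeconds} seconds due to:`, err)
                return delay(() => execute(nextAttempt), delayInSeconds * 1000)
            } else {
                throw err
            }
        }
    }
    return execute(1)
}
const delay = (fn, ms) => new Promise((resolve) => setTimeout(() => resolve(fn()), ms))
const randInt = (min, max) => Math.floor(Math.random() * (max - min + 1) + min)

// Hypothetical test helper: counts calls and fails the first `failures` times
const countingRequest = (failures) => {
    const stats = { calls: 0 }
    const fn = async () => {
        stats.calls += 1
        if (stats.calls <= failures) throw new Error("error response")
        return "success response"
    }
    return { fn, stats }
}

const testRetryCallsAgain = async () => {
    const realSetTimeout = globalThis.setTimeout
    // Assumption for this sketch: skip the real back-off so the test finishes quickly
    globalThis.setTimeout = (callback) => realSetTimeout(callback, 0)
    try {
        const { fn, stats } = countingRequest(2)
        const result = await retry(fn, 3)
        if (result !== "success response") throw new Error(`Unexpected result: ${result}`)
        if (stats.calls !== 3) throw new Error(`Expected 3 calls, got ${stats.calls}`)
        return stats.calls
    } finally {
        globalThis.setTimeout = realSetTimeout
    }
}

const retryTestRun = testRetryCallsAgain()
retryTestRun.then((calls) => console.log("retry called the function", calls, "times"))
```

The important assertion is the call count: an implementation that never calls the function again cannot reach three calls, no matter how many "Retrying" lines it logs.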

