How Laravel fails and retries jobs

This post walks through some examples of how Laravel fails and retries queued jobs.

All examples in this post use queue:work to run the queue worker. For reference, the relevant default arguments for queue:work are --tries=1, --backoff=0 and --timeout=60. If you define tries or backoff in a job class, they take precedence over the arguments of your worker.

The arguments you pass to the queue worker don't matter that much. The only important thing is that you don't set your tries to 0. That gets your failing jobs stuck in infinite loops, and you won't have a good time (this was actually the default up until Laravel 6).

Key takeaways from this post:

  • If you define a Queue::failing() callback, every failed queued job calls that exactly once
  • If a queued job fails and has a failed method, it will call that method exactly once
  • Jobs that fail always end up in the failed_jobs table
  • The backoff delay is only used when a job fails due to an uncaught exception
  • You have to manually stop your code after calling release or fail in a job
  • Always make sure your retry_after is higher than your highest timeout

Events for failed jobs

For all examples below, I've registered this event listener in my AppServiceProvider:

use Illuminate\Queue\Events\JobFailed;
use Illuminate\Support\Facades\Queue;
use Illuminate\Support\ServiceProvider;

class AppServiceProvider extends ServiceProvider
{
    public function boot()
    {
        Queue::failing(function (JobFailed $event) {
            info('Queue::failing()');
        });
    }
}

Releasing jobs

You can use the release method to tell Laravel a job should be retried. Calling this method releases the job back onto the queue, which just means that it will be attempted again.

class TestJob extends BaseJob implements ShouldQueue
{
    public $backoff = 999; // this value isn't used because we use `release()`

    public $tries = 3;

    public function handle()
    {
        info('handle(): '.$this->attempts().' / '.$this->tries);

        $this->release(delay: now()->addSeconds(3));

        info('After release');
    }

    public function failed(Throwable $exception): void
    {
        info('failed(): '.get_class($exception).': '.$exception->getMessage());
    }
}

This job logs the following:

[2023-04-16 13:19:48] local.INFO: handle(): 1 / 3
[2023-04-16 13:19:48] local.INFO: After release
[2023-04-16 13:19:51] local.INFO: handle(): 2 / 3
[2023-04-16 13:19:51] local.INFO: After release
[2023-04-16 13:19:54] local.INFO: handle(): 3 / 3
[2023-04-16 13:19:54] local.INFO: After release
[2023-04-16 13:19:57] local.INFO: failed(): Illuminate\Queue\MaxAttemptsExceededException: App\Jobs\TestJob has been attempted too many times or run too long. The job may have previously timed out.
[2023-04-16 13:19:57] local.INFO: Queue::failing()
[2023-04-16 13:19:57] local.ERROR: App\Jobs\TestJob has been attempted too many times or run too long. The job may have previously timed out. {"exception":"[object] (Illuminate\\Queue\\MaxAttemptsExceededException(code: 0): App\\Jobs\\TestJob has been attempted too many times or run too long. The job may have previously timed out. at /Users/sjorso/code/watchtower/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:746)
[stacktrace]
    (long stacktrace)

The first thing to notice is that the release() method does not stop the job; you have to stop the code manually. Even if the code reaches the end of the handle method, the job will still be retried if you called release() somewhere.

Another thing to note is that backoff does not apply. The backoff delay is only used when your job fails due to an uncaught exception. When you retry a job by releasing it, it always uses the delay passed into the release method. If you call release() without an argument, the job will be retried instantly.
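Stopping the code manually comes down to returning right after the release() call. Here is a minimal sketch of that pattern. The class name and the shouldWait() check are hypothetical, and the job uses Laravel's standard queue traits instead of the BaseJob from the example above:

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;

// Hypothetical job, only meant to illustrate the early return.
class ReleaseAndReturnJob implements ShouldQueue
{
    use InteractsWithQueue, Queueable;

    public $tries = 3;

    public function handle()
    {
        if ($this->shouldWait()) {
            // Retry in 3 seconds. The backoff property is not used here.
            $this->release(3);

            // release() does not stop execution, so return explicitly.
            return;
        }

        info('Doing the actual work');
    }

    // Hypothetical condition, replace it with your own check.
    private function shouldWait(): bool
    {
        return true;
    }
}
```

With the return in place, nothing after the release() call runs on that attempt.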

Uncaught exceptions

Let's take a look at uncaught exceptions:

class TestJob extends BaseJob implements ShouldQueue
{
    public $backoff = 5;

    public $tries = 3;

    public function handle()
    {
        info('handle(): '.$this->attempts().' / '.$this->tries);

        throw new RuntimeException($this->attempts().' / '.$this->tries);
    }

    public function failed(Throwable $exception): void
    {
        info('failed(): '.get_class($exception).': '.$exception->getMessage());
    }
}

This job logs the following:

[2023-04-16 13:09:33] local.INFO: handle(): 1 / 3
[2023-04-16 13:09:33] local.ERROR: 1 / 3 {"exception":"[object] (RuntimeException(code: 0): 1 / 3 at /Users/sjorso/code/watchtower/app/Jobs/TestJob.php:24)
[stacktrace]
    (long stacktrace)
[2023-04-16 13:09:38] local.INFO: handle(): 2 / 3
[2023-04-16 13:09:38] local.ERROR: 2 / 3 {"exception":"[object] (RuntimeException(code: 0): 2 / 3 at /Users/sjorso/code/watchtower/app/Jobs/TestJob.php:24)
[stacktrace]
    (long stacktrace)
[2023-04-16 13:09:43] local.INFO: handle(): 3 / 3
[2023-04-16 13:09:43] local.INFO: failed(): RuntimeException: 3 / 3
[2023-04-16 13:09:43] local.INFO: Queue::failing()
[2023-04-16 13:09:43] local.ERROR: 3 / 3 {"exception":"[object] (RuntimeException(code: 0): 3 / 3 at /Users/sjorso/code/watchtower/app/Jobs/TestJob.php:24)
[stacktrace]
    (long stacktrace)

Each attempt logs the exception, and then waits backoff seconds to retry. The failed method is only called after the last failure.

Manually failing your job

You can also manually fail a job by calling the fail method:

class TestJob extends BaseJob implements ShouldQueue
{
    public $backoff = 5;

    public $tries = 3;

    public function handle()
    {
        info('handle(): '.$this->attempts().' / '.$this->tries);

        $this->fail('The message');

        info('After fail');
    }

    public function failed(Throwable $exception): void
    {
        info('failed(): '.get_class($exception).': '.$exception->getMessage());
    }
}

This job logs the following:

[2023-04-16 13:10:45] local.INFO: handle(): 1 / 3
[2023-04-16 13:10:45] local.INFO: failed(): Illuminate\Queue\ManuallyFailedException: The message
[2023-04-16 13:10:45] local.INFO: Queue::failing()
[2023-04-16 13:10:45] local.INFO: After fail

Calling fail tells Laravel that this job has failed and that it shouldn't be retried. Laravel doesn't write anything to the log. Just like with the release method, the fail method doesn't throw an exception, so you have to manually return and stop your code.
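The same early return applies to fail(). This is a rough sketch, assuming a hypothetical job with a hypothetical inputIsValid() check, again using Laravel's standard queue traits rather than the BaseJob from the example above:

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;

// Hypothetical job, only meant to illustrate returning after fail().
class FailAndReturnJob implements ShouldQueue
{
    use InteractsWithQueue, Queueable;

    public $tries = 3;

    public function handle()
    {
        if (! $this->inputIsValid()) {
            // Marks the job as failed, it will not be retried.
            $this->fail('The input is invalid');

            // fail() does not throw, so return explicitly.
            return;
        }

        info('Doing the actual work');
    }

    // Hypothetical validation, replace it with your own check.
    private function inputIsValid(): bool
    {
        return false;
    }
}
```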

Job timeouts

Last up, timeouts:

class TestJob extends BaseJob implements ShouldQueue
{
    public $timeout = 3;

    public $backoff = 999;

    public $tries = 3;

    public function handle()
    {
        info('handle(): '.$this->attempts().' / '.$this->tries);

        sleep($this->timeout + 1);

        info('After sleep');
    }

    public function failed(Throwable $exception): void
    {
        info('failed(): '.get_class($exception).': '.$exception->getMessage());
    }
}

Timeouts are interesting. This is what gets logged:

[2023-04-16 13:12:23] local.INFO: handle(): 1 / 3
    (the queue:work process gets killed here)
    (supervisor restarts the queue:work process)
    (nothing happens for 90 seconds, this is the "retry_after" value in "config/queue.php")
[2023-04-16 13:13:53] local.INFO: handle(): 2 / 3
    (same as above, process killed and restarted, waiting to retry)
[2023-04-16 13:15:23] local.INFO: handle(): 3 / 3
    (same as above, process killed and restarted, no waiting this time)
[2023-04-16 13:15:26] local.INFO: failed(): Illuminate\Queue\MaxAttemptsExceededException: App\Jobs\TestJob has been attempted too many times or run too long. The job may have previously timed out.
[2023-04-16 13:15:26] local.INFO: Queue::failing()

Laravel doesn't write anything to the log when a job fails due to a timeout. The only way you'd notice that this job has failed is by either monitoring your failed_jobs table, or by logging something manually.
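Logging something manually could look like the sketch below, which writes an error from the job's failed method. The class name, log message and context keys are made up for illustration:

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\Log;
use Throwable;

// Hypothetical job, only meant to illustrate logging from failed().
class LoggedTimeoutJob implements ShouldQueue
{
    use InteractsWithQueue, Queueable;

    public $timeout = 3;

    public $tries = 3;

    public function handle()
    {
        // The actual work goes here.
    }

    public function failed(Throwable $exception): void
    {
        // Laravel calls this once when the job has failed for good,
        // including when it ran out of attempts because of timeouts.
        Log::error('Job failed', [
            'job' => static::class,
            'exception' => get_class($exception),
            'message' => $exception->getMessage(),
        ]);
    }
}
```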

It is important to understand how the retry_after delay works. When a job exceeds its timeout, the queue worker process is killed in the middle of the job. The job is never released or deleted, so it ends up stuck in limbo. Laravel solves this by making a job available again once it has been reserved for retry_after seconds. It is very important that your retry_after value is higher than your highest timeout, otherwise long-running jobs will get retried while they are still running.
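The retry_after value is set per connection in config/queue.php. The excerpt below is a sketch that assumes the database driver (the driver used for the examples in this post isn't stated) and leaves out all other keys:

```php
<?php

// config/queue.php (excerpt, other keys omitted)
return [
    'connections' => [
        'database' => [
            'driver' => 'database',
            'table' => 'jobs',
            'queue' => 'default',

            // Keep this higher than the longest timeout of any job
            // that runs on this connection.
            'retry_after' => 90,
        ],
    ],
];
```

With a retry_after of 90, no job on this connection should have a timeout of 90 seconds or more.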

Don't retry a job on timeout

If you don't want to retry a job on timeout, you can define a failOnTimeout property on your job. If you take the same job as above, but add a public $failOnTimeout = true; property, it logs the following:

[2023-04-16 13:21:57] local.INFO: handle(): 1 / 3
[2023-04-16 13:22:00] local.INFO: failed(): Illuminate\Queue\MaxAttemptsExceededException: App\Jobs\TestJob has been attempted too many times or run too long. The job may have previously timed out.
[2023-04-16 13:22:00] local.INFO: Queue::failing()