Mar 7, 2024

How to manage long-running LLM tasks using background jobs with RQ (Redis Queue)

Like every new piece of tech that you add to your application, LLMs pose some interesting integration challenges. In one of our older blogs, we talked about how to handle LLM generation latencies via streaming. While streaming takes care of delivering LLM responses in a user-friendly manner, there’s another side to this problem: how do you manage long-running LLM generation processes that take minutes to complete on the backend?

What are background jobs?

API requests normally take anywhere from a few hundred milliseconds to a few seconds to respond. But how do you process a task that takes minutes or hours to complete? This is where background jobs (also called async tasks, or just jobs) come into the picture. The idea is simple - you offload the work to a separate worker process that runs the long-running task asynchronously, freeing up your main application to respond to requests in real time.

Why not just handle long-running tasks in the webserver?

Handling long-running tasks in the main web application can severely impact its performance and responsiveness:

Long-running tasks tie up the server's resources, causing requests from other users to be blocked or delayed. This leads to a poor user experience, with slow response times or even timeouts.
Web servers typically have limited resources, such as CPU, memory, and network bandwidth. Running long tasks in the main application can consume these resources, potentially causing other requests to be queued or rejected.
As the number of users increases, so does the demand on the server. Long-running tasks can exacerbate scalability issues, making it difficult to handle a large number of concurrent users.
Users expect web applications to respond quickly and provide immediate feedback. Long-running tasks can disrupt this expectation, leading to frustration and dissatisfaction among users.

This blog is by Sourabh, our CTO who spends most of his time building an AI agent and gives the best restaurant recommendations. If you like this post, try KushoAI today, and start shipping bug-free code faster!

Background jobs using RQ

There are many different ways to implement background jobs - threading, multiprocessing, and async task queues. In this blog post, we’re going to talk about how to implement background jobs using RQ which is an async task queue.
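
For contrast, the in-process threading option mentioned above can be sketched with the standard library alone. This is only an illustration: `slow_task` is a hypothetical stand-in for an LLM call, and `handle_request` is a made-up name for a request handler. The pool lives inside the web process, so any queued work is lost if that process restarts, which is one reason to consider a Redis-backed queue instead.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


# Hypothetical stand-in for a slow LLM call.
def slow_task(seconds: float) -> str:
    time.sleep(seconds)
    return f"done after {seconds}s"


# A small pool of background threads inside the same process.
executor = ThreadPoolExecutor(max_workers=2)


def handle_request(seconds: float) -> Future:
    """Hand the slow work to the pool and return right away."""
    return executor.submit(slow_task, seconds)


if __name__ == "__main__":
    started = time.monotonic()
    future = handle_request(0.2)
    print(f"request handled in {time.monotonic() - started:.3f}s")
    print(future.result())  # blocks until the background task finishes
```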

RQ (short for Redis Queue) is an async task queue for queueing jobs and processing them in the background with workers. You can think of it as a simpler alternative to Celery. RQ, as the name suggests, uses Redis as a message broker and has a very simple interface and setup process. This also happens to be the main reason we picked RQ - it’s very easy to get started with and deploy in production.

Deploying RQ is very straightforward. You need a Redis instance where the job queue will be maintained and some way to deploy RQ workers that do the actual processing of tasks.

NOTE: We had some problems with running the latest version of RQ (1.15.1 at the time of writing) with Redis 7. So we used Redis 6 for our deployment.

The way RQ works is:

From the main application, enqueue a task (which is just a function in your codebase that you want to execute).
This task data is written to Redis.
A free RQ worker picks up the task and starts executing it.
After the execution is completed, the return value is written back to Redis and kept there for a configurable amount of time.
If your job fails, RQ provides ways to retry the task.
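
The enqueue and result-storage steps above can be sketched roughly as follows. The helpers take an RQ `Queue` (such as the `q` created later in this post) and the job it returns. `job_timeout`, `result_ttl` and `failure_ttl` are keyword arguments accepted by `Queue.enqueue`; the specific values and the helper names `enqueue_with_limits` and `wait_for_result` are assumptions made for illustration. `job.return_value()` assumes a recent RQ release such as the 1.15.x mentioned above.

```python
import time


def enqueue_with_limits(queue, func, *args):
    """Enqueue func and control how long RQ keeps its outcome in Redis."""
    return queue.enqueue(
        func,
        *args,
        job_timeout=600,    # illustrative: stop the job if it runs over 10 minutes
        result_ttl=3600,    # illustrative: keep the return value for 1 hour
        failure_ttl=86400,  # illustrative: keep failed-job details for 1 day
    )


def wait_for_result(job, poll_seconds=1.0, max_wait_seconds=900.0):
    """Poll until the job finishes; raise if it fails or takes too long."""
    deadline = time.monotonic() + max_wait_seconds
    while time.monotonic() < deadline:
        if job.is_finished:
            return job.return_value()
        if job.is_failed:
            raise RuntimeError(f"job {job.id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job.id} still running after {max_wait_seconds}s")
```

Polling like this suits scripts and tests. In a web application you would more likely return the job id to the client and check on the job in a later request.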

Now let’s look at some code.

Enqueuing tasks from application

Install RQ:

pip install rq

Let’s say this is the long-running task that you want to run using RQ workers:

# filename - rq_task.py
 
import time

def sleep_for_a_while(sleep_time_seconds: int = 30):
    time.sleep(sleep_time_seconds)
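
The same pattern applies to an actual LLM call. The sketch below is hypothetical: `fake_llm_generate` stands in for whatever LLM client your application uses, and `generate_summary` and the file name are illustrative. Since RQ pickles job arguments and return values by default, keeping them to plain types such as strings and dicts tends to avoid surprises.

```python
# filename - llm_task.py (hypothetical)

import time


def fake_llm_generate(prompt: str, delay_seconds: float = 0.0) -> str:
    """Placeholder for a slow LLM client call; swap in your own."""
    time.sleep(delay_seconds)
    return f"summary of: {prompt[:40]}"


def generate_summary(document_id: str, text: str, delay_seconds: float = 0.0) -> dict:
    """Task a worker could run: plain inputs in, plain dict out."""
    started = time.monotonic()
    summary = fake_llm_generate(f"Summarize: {text}", delay_seconds)
    return {
        "document_id": document_id,
        "summary": summary,
        "elapsed_seconds": round(time.monotonic() - started, 3),
    }
```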

Create an instance for the queue you’ll use for enqueuing tasks:

from redis import Redis
from rq import Queue

redis_instance = Redis(
    host="hostname of your redis server",  # default is localhost
    ssl=False,  # set this to True if you use TLS
    # decode_responses is left at its default (False): RQ stores pickled,
    # compressed job data in Redis, which cannot be decoded as text
)
q = Queue(connection=redis_instance)

Please note that this will create an instance for the “default” queue. If you want to enqueue tasks to a different queue, you can do it like this:

q = Queue(connection=redis_instance, name="queue_name")

There are two ways to enqueue a task. You can either pass a reference to the function that the worker should execute, or pass the function’s full import path as a string (useful when you can’t import the module due to circular imports).

Enqueue by passing a function reference:

from rq_task import sleep_for_a_while
job = q.enqueue(sleep_for_a_while, 60)  # returns a Job handle, not the task's return value

Enqueue by passing the full function path:

job = q.enqueue("rq_task.sleep_for_a_while", 60)  # returns a Job handle, not the task's return value
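
In a web application, the usual shape is to enqueue in one request, hand the job id back to the client, and look the job up in a later request. Here is a minimal, framework-agnostic sketch, assuming `queue` is the RQ `Queue` created above. The function names and the response dictionaries are illustrative and not part of RQ; `Queue.fetch_job` and the `is_finished` / `is_failed` properties are RQ's own.

```python
def start_sleep_job(queue, seconds: int = 60) -> dict:
    """Enqueue the task and return immediately with an id the client can poll."""
    job = queue.enqueue("rq_task.sleep_for_a_while", seconds)
    return {"job_id": job.id}


def get_job_state(queue, job_id: str) -> dict:
    """Report where a previously enqueued job currently stands."""
    job = queue.fetch_job(job_id)
    if job is None:
        # Unknown id, data already expired from Redis, or a job from another queue.
        return {"job_id": job_id, "state": "not_found"}
    if job.is_finished:
        return {"job_id": job_id, "state": "finished"}
    if job.is_failed:
        return {"job_id": job_id, "state": "failed"}
    # Queued, started, or otherwise not done yet.
    return {"job_id": job_id, "state": "pending"}
```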

RQ workers for processing tasks

This is how you start an RQ worker, which will start consuming enqueued tasks:

rq worker --url redis://<hostname>:<port>

In case you’re using TLS, the Redis connection string will start with “rediss”:

rq worker --url rediss://<hostname>:<port>

Note that this will start consuming tasks from the default queue. If you want to consume from a different queue, you can do it like this:

rq worker <queue_name> --url redis://<hostname>:<port>

And that's it for the basic setup! For deploying it in production, we would suggest running workers in a container on k8s or ECS and using RQ dashboard for monitoring.

In our next blog post, we’ll talk about more advanced topics like handling worker output, retry/timeouts, managing worker crashes, scheduling cron jobs, monitoring RQ workers, etc. Stay tuned!

At KushoAI, we're building an AI agent that tests your APIs for you. Bring in API information in any format and watch KushoAI turn it into fully functional and exhaustive test suites in minutes.