At Storytoon, we’ve designed a robust and scalable architecture to power our creative tools. Our goal is to handle a variety of asynchronous tasks—from generating images and videos to processing audio—without compromising the user experience.
System Architecture Diagram
The following diagram illustrates the flow of information and tasks within our system:
graph TD
    subgraph "Clients"
        direction LR
        A[Website]
        B[Mobile Apps]
        C[Telegram Bot]
    end
    subgraph "API Gateway"
        D[Astro Backend]
    end
    subgraph "Data & Messaging"
        E[MongoDB]
        F[ZeroMQ]
    end
    subgraph "Async Workers"
        G[Worker Pool]
    end
    subgraph "Services & Tools"
        direction LR
        H[fal.ai]
        I[Google Gemini]
        J[Self-hosted Models]
        K[FFmpeg]
    end
    A --> D
    B --> D
    C --> D
    D -- "Creates Task" --> E[(Task Queue)]
    D -- "Publishes Job" --> F
    F -- "Subscribes to Job" --> G
    G -- "Processes Task & Updates Status" --> E
    G -- "Uses" --> H
    G -- "Uses" --> I
    G -- "Uses" --> J
    G -- "Uses" --> K
Core Components
Our architecture is built around a few key components that work together to provide a seamless experience.
API Gateway (Astro)
The heart of our application is a monolithic Astro project that serves both our website frontend and the backend API. This unified approach simplifies development and deployment. The API is the single entry point for all clients, including our web interface, iOS/Android mobile applications, and a Telegram bot.
When a request for a long-running job (like generating a video) arrives, the backend’s primary role is to be fast and responsive. It creates a task in our database and immediately returns a task ID to the client, preventing timeouts and improving user experience.
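The create-and-return pattern described above can be sketched in TypeScript roughly as follows. This is an illustration under stated assumptions, not Storytoon's actual handler: the names `InMemoryTaskStore`, `nextTaskId`, and `handleCreateJob` are hypothetical, the HTTP 202 status is an assumed convention, and an in-memory map stands in for the database so the example runs on its own. A real handler would make asynchronous database calls.

```typescript
// Hypothetical task shape for illustration; not the actual schema.
interface Task {
  id: string;
  type: string;
  input: Record<string, unknown>;
  status: "pending";
  createdAt: number;
}

// In-memory stand-in for the database (assumption, for runnability).
class InMemoryTaskStore {
  readonly tasks = new Map<string, Task>();

  insert(task: Task): void {
    this.tasks.set(task.id, task);
  }
}

let counter = 0;

// Simple unique ID for the sketch; a real system would more likely use
// a database-generated ID or a UUID.
function nextTaskId(): string {
  counter += 1;
  return `task-${counter}`;
}

interface AcceptedResponse {
  status: 202; // assumed: "accepted, still processing"
  body: { taskId: string; status: "pending" };
}

// Records the task and responds at once. No generation work happens here;
// that is left to the workers.
function handleCreateJob(
  store: InMemoryTaskStore,
  type: string,
  input: Record<string, unknown>
): AcceptedResponse {
  const task: Task = {
    id: nextTaskId(),
    type,
    input,
    status: "pending",
    createdAt: Date.now(),
  };
  store.insert(task);
  return { status: 202, body: { taskId: task.id, status: task.status } };
}
```

The handler's cost is one insert, so response time does not depend on how long the video or image generation itself takes.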
Task Queue (MongoDB)
We use MongoDB as our database and task queue. Each time a new job is requested, a document is created in a tasks collection with an initial status (e.g., pending). This document stores all the necessary information for a worker to execute the task, such as the prompt, source image, or video clips. As the task progresses, its status is updated directly in MongoDB.
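As a rough illustration of what such a task document and its status updates might look like, here is a TypeScript sketch. Only the pending status is named in the text; the other statuses, the field names, and the `transition` helper are assumptions made for the example.

```typescript
// Only "pending" appears in the text; the other statuses are assumed.
type TaskStatus = "pending" | "processing" | "completed" | "failed";

// Hypothetical document shape; field names are illustrative.
interface TaskDocument {
  _id: string;
  status: TaskStatus;
  input: { prompt?: string; sourceImage?: string; videoClips?: string[] };
  resultUrl?: string;
  error?: string;
  updatedAt: Date;
}

// Which status changes the sketch permits. Terminal states allow none.
const allowedTransitions: Record<TaskStatus, TaskStatus[]> = {
  pending: ["processing", "failed"],
  processing: ["completed", "failed"],
  completed: [],
  failed: [],
};

// Returns an updated copy of the task, or throws if the change is not allowed.
function transition(
  task: TaskDocument,
  next: TaskStatus,
  patch: { resultUrl?: string; error?: string } = {}
): TaskDocument {
  if (!allowedTransitions[task.status].includes(next)) {
    throw new Error(`Invalid transition: ${task.status} -> ${next}`);
  }
  return { ...task, ...patch, status: next, updatedAt: new Date() };
}
```

Restricting transitions this way keeps a finished or failed task from being moved back into an active state by a late or duplicate update.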
Messaging (ZeroMQ)
To decouple our API from the workers, we use ZeroMQ as a lightweight messaging bus. When the API creates a new task, it publishes a message containing the task ID to a ZeroMQ topic. This lets us notify our worker pool with minimal latency and without the overhead of running a heavier message broker.
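In ZeroMQ publish/subscribe, a subscriber filters messages by prefix match against the start of the message, which for a multipart message is the first frame. The sketch below shows one way a job notification could be framed and filtered; it opens no sockets. The `jobs.<type>` topic naming, the JSON payload, and all function names are assumptions for illustration.

```typescript
// A two-frame message: [topic, payload].
type Frames = [string, string];

// Builds the frames the API might publish for a new task.
// The "jobs.<type>" topic scheme is an assumption.
function encodeJob(jobType: string, taskId: string): Frames {
  return [`jobs.${jobType}`, JSON.stringify({ taskId })];
}

// Parses the frames on the worker side and validates the payload.
function decodeJob(frames: Frames): { topic: string; taskId: string } {
  const [topic, payload] = frames;
  const parsed = JSON.parse(payload) as { taskId?: unknown };
  if (typeof parsed.taskId !== "string") {
    throw new Error("Message has no taskId");
  }
  return { topic, taskId: parsed.taskId };
}

// Mirrors ZeroMQ SUB filtering: a subscription matches when it is a
// prefix of the first frame. An empty subscription matches everything.
function matchesSubscription(subscription: string, frames: Frames): boolean {
  return frames[0].startsWith(subscription);
}
```

Sending only the task ID keeps messages small; the worker reads everything else from the task document.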
Asynchronous Workers
Our Worker Pool consists of one or more processes that subscribe to jobs published via ZeroMQ. When a worker receives a notification, it fetches the full task details from MongoDB using the provided task ID.
The worker then executes the required job, which can involve one or more of the following:
- Calling third-party APIs like fal.ai or Google Gemini for generative AI tasks.
- Interacting with our self-hosted models (e.g., a w2l model on AWS) for specialized processing.
- Performing media manipulation, such as concatenating video clips using FFmpeg.
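For the FFmpeg case, one common way to join clips is FFmpeg's concat demuxer, which reads a text file listing the inputs. The text does not say which method Storytoon uses, so the sketch below is an assumption; the helper names are hypothetical, and the functions only build the list file content and the argument list without running FFmpeg.

```typescript
// Builds the list file content for FFmpeg's concat demuxer: one
// "file '<path>'" line per clip. A single quote inside a path is written
// as '\'' (close the quote, escaped quote, reopen the quote).
function buildConcatList(clips: string[]): string {
  if (clips.length === 0) {
    throw new Error("At least one clip is required");
  }
  return (
    clips
      .map((path) => `file '${path.replace(/'/g, "'\\''")}'`)
      .join("\n") + "\n"
  );
}

// Arguments for: ffmpeg -f concat -safe 0 -i <list> -c copy <output>
// "-safe 0" permits absolute paths in the list file; "-c copy" joins
// without re-encoding, which requires the clips to share codec settings.
function buildConcatArgs(listPath: string, outputPath: string): string[] {
  return ["-f", "concat", "-safe", "0", "-i", listPath, "-c", "copy", outputPath];
}
```

A worker would write the list content to a temporary file and pass the arguments to an `ffmpeg` child process. If the clips differ in codec or resolution, stream copy is not suitable and re-encoding would be needed instead.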
Once the task is complete (or if it fails), the worker updates the task’s status and stores the result (e.g., a URL to the generated file) in MongoDB. The client can then poll the API endpoint with the task ID to check the status and retrieve the final result.
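The worker side of this flow, together with the status lookup a polling client would hit, might look like the following sketch. A `Map` stands in for MongoDB, handlers are synchronous for brevity, and every name here is hypothetical. Because every ZeroMQ subscriber with a matching subscription receives a published message, the sketch skips tasks that are no longer pending; how duplicate processing is actually prevented is not stated in the text, and a real database would need an atomic conditional update for this check.

```typescript
type TaskStatus = "pending" | "processing" | "completed" | "failed";

interface Task {
  id: string;
  type: string;
  status: TaskStatus;
  input: Record<string, unknown>;
  resultUrl?: string;
  error?: string;
}

// A handler performs the job and returns a URL to the generated file.
// Real handlers would call fal.ai, Google Gemini, FFmpeg, and so on.
type Handler = (input: Record<string, unknown>) => string;

// Called when a worker receives a task ID from the message bus.
function processTask(
  tasks: Map<string, Task>,
  handlers: Record<string, Handler>,
  taskId: string
): void {
  const task = tasks.get(taskId);
  // Skip unknown tasks and tasks that have already been picked up.
  if (!task || task.status !== "pending") return;
  task.status = "processing";
  try {
    const handler = handlers[task.type];
    if (!handler) throw new Error(`No handler for task type "${task.type}"`);
    task.resultUrl = handler(task.input);
    task.status = "completed";
  } catch (err) {
    // Failures are recorded on the task so the client can see them.
    task.error = err instanceof Error ? err.message : String(err);
    task.status = "failed";
  }
}

// What a status endpoint might return to a polling client.
function getTaskStatus(tasks: Map<string, Task>, taskId: string) {
  const task = tasks.get(taskId);
  if (!task) return { found: false as const };
  return {
    found: true as const,
    status: task.status,
    resultUrl: task.resultUrl,
    error: task.error,
  };
}
```

A client would call the status endpoint at intervals until the status is completed or failed, then read the result URL or the error message.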
This asynchronous, decoupled architecture allows us to build a scalable and resilient system capable of handling a diverse range of creative tasks.