This is a story about software architecture, about a personal itch, and about scalability. And like any good tech story, it begins with a shaky architecture.
At Panorays, we help large enterprises measure the security posture of their suppliers. But I’m not going to get into the whole third-party security management extravaganza with you. We came to talk about our architecture and process.
In the beginning, there was Bash. And scripts to manage VMs. A lot of scripts.
Problems:
The Transporter is a dynamic workflow engine, built to create workflows and execute them as Kubernetes Jobs.
A container-based architecture makes The Transporter both flexible enough to configure jobs separately and efficient enough to scale.
It favors parallelism when possible, according to the workflow dependencies, and provides a REST API for a fully automated pipeline.
Standing on the Shoulders of Giants
The Transporter’s API is triggered automatically whenever a new company is added to the platform. The Transporter then deploys the jobs to Kubernetes in parallel or sequentially, according to a predefined workflow.
Overview
As with the original transporter, The Transporter follows a few simple rules:
The Rules
The Transporter deal is simple: you define the workflow, and The Transporter makes it happen. But in order to define a workflow, we first need to define a job.
In our case, a job is the equivalent of running a Docker container. A group of these jobs is a phase, and a phase can be sequential or parallel. A workflow is a sequence of phases.
Now we can enjoy parallelism while still following some rules.
Workflow Example
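To make the job, phase, and workflow vocabulary concrete, here is a minimal sketch of how such a model might look in Python. The class names, field names, job names, and image names are all assumptions made for illustration, not The Transporter's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """One unit of work: a single Docker container run."""
    name: str
    image: str


@dataclass(frozen=True)
class Phase:
    """A group of jobs that run either one after another or all at once."""
    jobs: tuple
    parallel: bool = False


@dataclass(frozen=True)
class Workflow:
    """An ordered sequence of phases; each phase starts after the previous one ends."""
    name: str
    phases: tuple


def execution_plan(workflow):
    """Flatten a workflow into steps; each step lists the jobs that may run together."""
    steps = []
    for phase in workflow.phases:
        names = [job.name for job in phase.jobs]
        if phase.parallel:
            steps.append(names)               # one step, all jobs at once
        else:
            steps.extend([n] for n in names)  # one step per job, in order
    return steps


# Hypothetical workflow: one job, then two jobs in parallel, then a final job.
workflow = Workflow(
    name="new-company",
    phases=(
        Phase(jobs=(Job("discover", "example/discover:1.0"),)),
        Phase(
            jobs=(
                Job("scan-web", "example/scan-web:1.0"),
                Job("scan-dns", "example/scan-dns:1.0"),
            ),
            parallel=True,
        ),
        Phase(jobs=(Job("report", "example/report:1.0"),)),
    ),
)
print(execution_plan(workflow))
# [['discover'], ['scan-web', 'scan-dns'], ['report']]
```

Keeping the model this small means a workflow can be built from a REST request body at runtime, which is the kind of dynamic definition we were after.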
The Transporter leverages a distributed task queue architecture. In this architecture, tasks get transported to queues, and workers consume the tasks from the queues and perform them. This architecture makes it possible to retry a failed task, set a timeout, set a priority, and schedule tasks for later.
We send notifications to alert on workflow start, success, and failure.
Distributed Task Queue Architecture
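The queue-and-worker idea can be sketched in a few lines. Note that this is an in-process stand-in built on the Python standard library, not a distributed queue and not Celery's API; it only illustrates priorities and retries (timeouts and scheduling are left out). The function names and the flaky task are made up for the example.

```python
import itertools
import queue
import threading

_order = itertools.count()  # tie-breaker so equal priorities stay first-in, first-out


def submit(tasks, func, *args, priority=5, max_retries=2):
    """Put a task on the queue; lower priority numbers are consumed first."""
    tasks.put((priority, next(_order), func, args, max_retries))


def worker(tasks, results):
    """Consume tasks until a None sentinel arrives, retrying failed ones."""
    while True:
        priority, _, func, args, retries_left = tasks.get()
        if func is None:
            tasks.task_done()
            return
        try:
            results.append(func(*args))
        except Exception:
            if retries_left > 0:
                # Re-queue the failed task with one fewer retry available.
                tasks.put((priority, next(_order), func, args, retries_left - 1))
            # Otherwise give up silently; a real system would report the failure.
        finally:
            tasks.task_done()


tasks = queue.PriorityQueue()
results = []
attempts = {"count": 0}


def flaky():
    """Hypothetical task that fails on its first attempt only."""
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise RuntimeError("transient failure")
    return "flaky ok"


submit(tasks, flaky, priority=1)
submit(tasks, lambda: "low priority ok", priority=9)

thread = threading.Thread(target=worker, args=(tasks, results))
thread.start()
tasks.join()                                          # wait for all tasks and retries
tasks.put((float("inf"), next(_order), None, (), 0))  # sentinel: stop the worker
thread.join()
print(results)
# ['flaky ok', 'low priority ok']
```

The high-priority task fails once, goes back on the queue, and still finishes before the low-priority one, which is the behavior we get from the task queue without writing any of it ourselves.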
Now we are ready to tie it all together. The Transporter provides endpoints to manipulate a workflow. Behind the scenes, the workflow gets translated to Celery tasks. We use Celery chains and Celery groups to set dependencies. These tasks get transported to queues based on the dependencies. On the other side, Celery workers consume tasks from the queues and deploy the corresponding Kubernetes Job. The result: a workflow accomplished according to job dependencies.
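The translation from phases to chains and groups might look roughly like the sketch below. It is an illustration under assumptions, not our actual code: the phase dictionaries and the `build_canvas` name are invented, and the chain and group constructors are passed in as parameters so the example runs without Celery or a broker.

```python
def build_canvas(phases, make_signature, chain_of, group_of):
    """Translate phases into nested chain/group steps.

    phases: list of dicts like {"jobs": ["a", "b"], "parallel": True}
    make_signature: turns a job name into a task signature
    chain_of(*steps): runs steps one after another (for example celery.chain)
    group_of(steps): runs steps concurrently (for example celery.group)
    """
    steps = []
    for phase in phases:
        signatures = [make_signature(job) for job in phase["jobs"]]
        if phase.get("parallel", False):
            steps.append(group_of(signatures))   # no ordering inside the phase
        else:
            steps.append(chain_of(*signatures))  # strict ordering inside the phase
    return chain_of(*steps)                      # phases always run in sequence


# Stand-ins that only record the structure, so the sketch runs without a broker.
def fake_chain(*steps):
    return ("chain", list(steps))


def fake_group(steps):
    return ("group", list(steps))


phases = [
    {"jobs": ["discover"]},
    {"jobs": ["scan-web", "scan-dns"], "parallel": True},
    {"jobs": ["report"]},
]
canvas = build_canvas(phases, lambda job: job, fake_chain, fake_group)
print(canvas)
# ('chain', [('chain', ['discover']), ('group', ['scan-web', 'scan-dns']), ('chain', ['report'])])
```

With Celery installed, `celery.chain` and `celery.group` can be passed in place of the stand-ins, and `make_signature` could build an immutable signature with `.si(...)` on a task that deploys one job. A group followed by another step relies on Celery's chord mechanism, which needs a result backend.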
We also added endpoints to control workers, for convenience. The number of running workers sets the limit for how many jobs can run concurrently.
Our new deployment process includes security researchers building and pushing Docker images to Google Container Registry. The Transporter transports the corresponding jobs according to the workflow and a ConfigMap, which defines the version of each job. Kubernetes is the engine actually executing the underlying Docker containers.
Updating Jobs
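As an illustration of how a per-job version from the ConfigMap could end up in the Job that Kubernetes runs, here is a small sketch that builds a `batch/v1` Job manifest as a plain dictionary. The function name, the registry path, and the shape of the `versions` mapping (job name to image tag) are assumptions for the example.

```python
def job_manifest(job_name, k8s_name, versions, registry="gcr.io/example-project"):
    """Build a batch/v1 Job manifest whose image tag comes from `versions`.

    versions: mapping of job name to image tag, e.g. the ConfigMap's data.
    registry: hypothetical registry path, used only for illustration.
    """
    tag = versions[job_name]  # raises KeyError if no version is defined for the job
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": k8s_name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # Jobs require Never or OnFailure
                    "containers": [
                        {"name": job_name, "image": f"{registry}/{job_name}:{tag}"}
                    ],
                }
            }
        },
    }


manifest = job_manifest("scan-web", "a1b2c3d4", {"scan-web": "1.4.2"})
print(manifest["spec"]["template"]["spec"]["containers"][0]["image"])
# gcr.io/example-project/scan-web:1.4.2
```

With this split, updating a job only means pushing a new image and changing one entry in the ConfigMap; the workflow definition itself stays untouched.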
At first, we set the Kubernetes Job name to the original job name with a unique identifier appended at the end.
How Not to Name a Job
And this is how we discovered some Kubernetes naming limitations:
The first is the regex for validating a job name, which basically allows lowercase alphanumeric characters separated by dashes or dots (underscores are not allowed). We discovered the second limitation, a maximum number of characters, thanks to our security researchers, who appreciate long, overly detailed names.
Tip #1: unique identifiers for names. But we still wanted to know the original job name and the company name, which leads me to Tip #2: labels. Labels everywhere.
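Both tips can be sketched together: use a short unique identifier as the Job name, and keep the human-readable details in labels. The helper names and the label keys below are hypothetical. The 63-character limit and the allowed label characters follow the Kubernetes rules for label values.

```python
import re
import uuid

MAX_LABEL_LENGTH = 63  # Kubernetes limit for a label value


def unique_job_name():
    """Tip #1: a short, always-valid name, e.g. 'job-3f9a1c2b'."""
    return f"job-{uuid.uuid4().hex[:8]}"


def label_value(text):
    """Make arbitrary text safe to use as a Kubernetes label value."""
    # Label values may contain letters, digits, '-', '_' and '.'.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]+", "-", text)[:MAX_LABEL_LENGTH]
    # They must also begin and end with an alphanumeric character.
    return cleaned.strip("-_.")


def job_metadata(original_job_name, company):
    """Tip #2: keep the readable details in labels (hypothetical label keys)."""
    return {
        "name": unique_job_name(),
        "labels": {
            "original-job-name": label_value(original_job_name),
            "company": label_value(company),
        },
    }


metadata = job_metadata("scan_web assets (very detailed name)", "Acme Corp.")
print(metadata["labels"])
# {'original-job-name': 'scan_web-assets-very-detailed-name', 'company': 'Acme-Corp'}
```

Labels also make it easy to list every Job that belongs to one company or one original job with a label selector, instead of parsing names.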
Before implementing our own workflow engine, we checked some existing solutions.
Airflow: Airflow is great. It works by rendering Python files into DAGs, each of which represents a workflow. If you have a static workflow that is determined before runtime, like an ETL flow, I recommend trying to work out a solution with Airflow. Airflow’s problem lies in dynamic workflows: check out this proper way to create dynamic workflows in Airflow on Stack Overflow. The reason we decided not to use it was our need to generate dynamic workflows, which change based on our REST API requests.
UI: We want to add a UI to make it easy to monitor and troubleshoot active and finished workflows.
Generify: If we make The Transporter a bit more generic, maybe we could release it as open source.
If you have a use case that involves running batch jobs according to certain dependencies (e.g., data acquisition or a web crawling system) and you are interested in scaling with The Transporter, please comment or reach out to let me know.
If you enjoyed this post, feel free to hold down the clap button 👏🏽 and if you’re interested in posts to come, make sure to follow me on
Medium: https://medium.com/@talperetz24
Twitter: https://twitter.com/talperetz24
LinkedIn: https://www.linkedin.com/in/tal-per/