Workflow Development
A workflow defines how multiple python functions are connected and executed together. Each workflow has exactly one main process and any number of dependent steps. Steps may consume outputs from the main process or from other steps.
Workflows are acyclic and are executed in topological order based on declared dependencies.
From an OGC API – Processes (Part 3 draft) perspective:
- A workflow behaves like a single process by exposing itself as a process.
- Internally, it is a directed acyclic graph (DAG) of sub-processes
- Each workflow step is itself a valid process
- Outputs from one step may be consumed as inputs by another step
Workflows allow complex execution graphs to be expressed declaratively while remaining compliant with the OGC process model.
Conceptual Model (OGC Alignment)
| Concept | Meaning |
|---|---|
| Workflow | A directed, acyclic graph of processes with a single main entry point and a single output, where all data dependencies are explicitly declared. |
| Main step | The primary entrypoint declared using registry.main() |
| Step | A sub-process within the workflow, defined using your_process.step() where your_process is the python function decorated with registry.main() |
| Dependency | A data relationship between process inputs and outputs |
| Execution order | A topologically sorted process graph |
A workflow must:
- Define exactly one main entrypoint
- Define exactly one output step
- Contain no cycles
- Explicitly declare all inter-process dependencies
WorkflowRegistry — Conceptual Overview
Motivation
OGC API – Processes Part 3 (draft) introduces the concept of workflows, where a process execution request may reference other processes as inputs (“nested processes”). This enables clients to define ad-hoc workflows dynamically at execution time.
This framework (procodile) takes a complementary approach:
Instead of defining workflows dynamically in JSON at execution time, workflows
are authored declaratively in Python, registered once, and then exposed as
standard OGC processes.
Internally, workflows are represented as structured Python objects (Workflow) that
capture:
- step definitions
- input bindings (
FromMain,FromStep) - execution order
- execution logic (
workflow.run)
| Aspect | OGC Nested Processes | Procodile Workflow Framework |
|---|---|---|
| Definition order | Leaf-first | Root-first |
| Representation | Nested JSON | Explicit DAG |
| Primary use | Ad-hoc execution | Deployment & reuse |
| Identity | Execution-scoped | Stable process ID |
| Execution model | Tree evaluation | DAG execution |
Both models describe the same execution semantics
The framework is the inverse authoring model of nested processes
Future support for nested execution requests can be added without changing the internal model
Why Workflows are exposed as Processes
OGC API – Processes is fundamentally process-centric:
- Clients discover
/processes - Clients execute
/processes/{id}/execution
There is no separate “workflow” resource in Part 1 or Part 3.
Therefore:
A workflow must be represented as a process in order to be OGC-compliant.
The ProcessRegistry provides this abstraction by projecting workflows into
processes.
ProcessRegistry is a mapping-like registry that:
- stores internal
Workflowobjects - exposes them externally as
Processobjects
Future Compatibility with Nested Execution Requests
Although workflows are currently authored as Python objects, the internal representation already contains everything needed to support OGC Part 3 nested execution requests in the future.
A future extension may:
- accept nested execution JSON
- translate it into an internal workflow DAG
- execute it using the same runtime
- or deploy it as a persistent workflow
In that scenario:
- nested execution becomes an alternative front door
- the internal execution model remains unchanged
This ensures forward compatibility with the OGC API – Processes Part 3 draft.
Creating a Workflow
Workflows are created through a WorkflowRegistry.
from procodile import ProcessRegistry
registry = ProcessRegistry()
Each workflow is uniquely identified by its id.
Defining the Main Step
The main step represents the workflow’s external interface. It is the only step that receives user-supplied inputs.
Every workflow must define exactly one main step.
For e.g.,
@registry.main(
id="main_step",
inputs={
"id": Field(title="Main input"),
},
outputs={
"a": Field(title="Main output"),
},
)
def main_step(id: str) -> str:
return id.upper()
Defining Workflow Steps (Sub-Processes)
Workflow steps are defined using the @main_step.step decorator.
Each step is a process that may depend on:
- Outputs from the main step
- Outputs from other workflow steps
@main_step.step(id="second_step")
def second_step(id: str) -> str:
return id[::-1]
Declaring Dependencies
Dependencies are either declared
- using
typing.Annotatedin type annotations of your function argument declarations or - using
pydantic.Fieldin the decorator'sinputsandoutputsvalues.
From the Main Step
Use FromMain to reference outputs of the main step.
from typing import Annotated
from procodile import FromMain
@main_step.step(id="use_main")
def use_main(
id: Annotated[str, FromMain(output="a")]
) -> str:
return id
or
from typing import Annotated
from procodile import FromMain
from pydantic import Field
@main_step.step(
id="use_main",
inputs={
"id": Field(title="main input")
},
)
def use_main(
id: str
) -> str:
return id
outputrefers to a named output of the main step."return_value"may be used when the main step has no explicit outputs defined.
From Another Step
Use FromStep to reference outputs from another workflow step.
from procodile import FromStep
@main_step.step(id="use_step")
def use_step(
value: Annotated[str, FromStep(step_id="second_step", output="return_value")]
) -> str:
return value
or
from procodile import FromStep
@main_step.step(id="use_step",
inputs={
FromStep(step_id="second_step", output="return_value")
}
)
def use_step(
value: str
) -> str:
return value
step_idmust refer to an existing workflow step.outputmust match one of the step’s outputs or"return_value".
Mixing Dependencies and Inputs
A step may mix dependencies and normal inputs.
from procodile import FromStep, FromMain
@main_step.step(id="mixed_step")
def mixed_step(
a: Annotated[str, FromMain(output="a")],
b: Annotated[str, FromStep(step_id="second_step", output="return_value")],
) -> str:
return f"{a}:{b}"
or
from procodile import FromStep, FromMain
@main_step.step(
id="mixed_step",
inputs={
"a": FromMain(output="a"),
"b": FromStep(step_id="second_step", output="return_value")
}
)
def mixed_step(
a: str,
b: str
) -> str:
return f"{a}:{b}"
Declaring Outputs for Steps
Steps may declare explicit outputs.
@main_step.step(
id="final_step",
outputs={
"result": Field(title="Final result"),
},
)
def final_step(
value: Annotated[str, FromStep(step_id="mixed_step", output="return_value")]
) -> str:
return value
If no outputs are declared, the return value is exposed as "return_value".
Output Resolution Rules
Workflow execution follows strict output normalization rules. These rules apply uniformly to main and step processes.
1. No output specification
{"return_value": result}
2. Tuple return value
- Values are mapped positionally to output names
- Output names must be defined in the output specification
3. Dictionary return value
- Keys are mapped directly to output names
- All keys must exist in the output specification
4. Single scalar value
- Treated as a single output
- Exposed as "return_value" unless outputs are explicitly defined
Examples
# No outputs declared
return 42
# → {"return_value": 42}
# No outputs declared
return (1, 2)
# → {"return_value": (1, 2)}
# Declared outputs: ("a", "b")
return (1, 2)
# → {"a": 1, "b": 2}
# Declared outputs: ("a", "b")
return {"a": 1, "b": 2}
# → {"a": 1, "b": 2}
# Declared outputs: ("c")
return {"a": 1, "b": 2}
# → {"c":{"a": 1, "b": 2}}
Dependency Validation
Dependencies are validated at workflow construction time:
- Exactly one main step
- Exactly one output step (leaf step) -> this returns the final output
- Referenced steps must exist
- Referenced outputs must exist
- Cyclic dependencies are rejected
Errors are raised immediately if the workflow is invalid.
Execution Order
Execution order is automatically derived using topological sorting.
To ensure seamless integration with orchestrating engines
(such as Apache Airflow or local runners), the system automatically
appends a final step (procodile.workflow.FINAL_STEP_ID) to every user-created
workflow.
The final step serves as a standardized exit point for data. By maintaining a consistent final node, orchestrators can reliably extract the workflow's output without needing to parse unique or variable step names defined by the user.
Important: The final step node is a passthrough entity. It does not perform any data transformations, computations, or logic. Its sole responsibility is to receive the output from the preceding step and expose it via a standardized key.
Visualizing the Workflow
Workflows can be visualized as a directed graph.
dot = workflow.visualize_workflow()
This returns a Graphviz DOT representation similar to this:
digraph pipeline {
rankdir=LR;
"main_step";
"second_step";
"main_step" -> "second_step";
}