Skip to content

Workflow Development

A workflow defines how multiple python functions are connected and executed together. Each workflow has exactly one main process and any number of dependent steps. Steps may consume outputs from the main process or from other steps.

Workflows are acyclic and are executed in topological order based on declared dependencies.

From an OGC API – Processes (Part 3 draft) perspective:

  • A workflow behaves like a single process by exposing itself as a process.
  • Internally, it is a directed acyclic graph (DAG) of sub-processes
  • Each workflow step is itself a valid process
  • Outputs from one step may be consumed as inputs by another step

Workflows allow complex execution graphs to be expressed declaratively while remaining compliant with the OGC process model.

Conceptual Model (OGC Alignment)

Concept Meaning
Workflow A directed, acyclic graph of processes with a single main entry point and a single output, where all data dependencies are explicitly declared.
Main step The primary entrypoint declared using registry.main()
Step A sub-process within the workflow, defined using your_process.step() where your_process is the python function decorated with registry.main()
Dependency A data relationship between process inputs and outputs
Execution order A topologically sorted process graph

A workflow must:

  • Define exactly one main entrypoint
  • Define exactly one output step
  • Contain no cycles
  • Explicitly declare all inter-process dependencies

WorkflowRegistry — Conceptual Overview

Motivation

OGC API – Processes Part 3 (draft) introduces the concept of workflows, where a process execution request may reference other processes as inputs (“nested processes”). This enables clients to define ad-hoc workflows dynamically at execution time.

This framework (procodile) takes a complementary approach:

Instead of defining workflows dynamically in JSON at execution time, workflows 
are authored declaratively in Python, registered once, and then exposed as 
standard OGC processes.

Internally, workflows are represented as structured Python objects (Workflow) that capture:

  • step definitions
  • input bindings (FromMain, FromStep)
  • execution order
  • execution logic (workflow.run)
Aspect OGC Nested Processes Procodile Workflow Framework
Definition order Leaf-first Root-first
Representation Nested JSON Explicit DAG
Primary use Ad-hoc execution Deployment & reuse
Identity Execution-scoped Stable process ID
Execution model Tree evaluation DAG execution

Both models describe the same execution semantics

The framework is the inverse authoring model of nested processes

Future support for nested execution requests can be added without changing the internal model

Why Workflows are exposed as Processes

OGC API – Processes is fundamentally process-centric:

  • Clients discover /processes
  • Clients execute /processes/{id}/execution

There is no separate “workflow” resource in Part 1 or Part 3.

Therefore:

A workflow must be represented as a process in order to be OGC-compliant.

The ProcessRegistry provides this abstraction by projecting workflows into processes.

ProcessRegistry is a mapping-like registry that:

  • stores internal Workflow objects
  • exposes them externally as Process objects

Future Compatibility with Nested Execution Requests

Although workflows are currently authored as Python objects, the internal representation already contains everything needed to support OGC Part 3 nested execution requests in the future.

A future extension may:

  • accept nested execution JSON
  • translate it into an internal workflow DAG
  • execute it using the same runtime
  • or deploy it as a persistent workflow

In that scenario:

  • nested execution becomes an alternative front door
  • the internal execution model remains unchanged

This ensures forward compatibility with the OGC API – Processes Part 3 draft.

Creating a Workflow

Workflows are created through a WorkflowRegistry.

from procodile import ProcessRegistry

registry = ProcessRegistry()

Each workflow is uniquely identified by its id.

Defining the Main Step

The main step represents the workflow’s external interface. It is the only step that receives user-supplied inputs.

Every workflow must define exactly one main step.

For e.g.,

@registry.main(
    id="main_step",
    inputs={
        "id": Field(title="Main input"),
    },
    outputs={
        "a": Field(title="Main output"),
    },
)
def main_step(id: str) -> str:
    return id.upper()

Defining Workflow Steps (Sub-Processes)

Workflow steps are defined using the @main_step.step decorator.

Each step is a process that may depend on:

  • Outputs from the main step
  • Outputs from other workflow steps
@main_step.step(id="second_step")
def second_step(id: str) -> str:
    return id[::-1]

Declaring Dependencies

Dependencies are either declared

  • using typing.Annotated in type annotations of your function argument declarations or
  • using pydantic.Field in the decorator's inputs and outputs values.

From the Main Step

Use FromMain to reference outputs of the main step.

from typing import Annotated
from procodile import FromMain

@main_step.step(id="use_main")
def use_main(
    id: Annotated[str, FromMain(output="a")]
) -> str:
    return id

or

from typing import Annotated
from procodile import FromMain
from pydantic import Field

@main_step.step(
    id="use_main",
    inputs={
        "id": Field(title="main input")
    },
)
def use_main(
    id: str
) -> str:
    return id
  • output refers to a named output of the main step.
  • "return_value" may be used when the main step has no explicit outputs defined.

From Another Step

Use FromStep to reference outputs from another workflow step.

from procodile import FromStep

@main_step.step(id="use_step")
def use_step(
    value: Annotated[str, FromStep(step_id="second_step", output="return_value")]
) -> str:
    return value

or

from procodile import FromStep

@main_step.step(id="use_step", 
               inputs={
                   FromStep(step_id="second_step", output="return_value")
               }
)
def use_step(
    value: str
) -> str:
    return value
  • step_id must refer to an existing workflow step.
  • output must match one of the step’s outputs or "return_value".

Mixing Dependencies and Inputs

A step may mix dependencies and normal inputs.

from procodile import FromStep, FromMain

@main_step.step(id="mixed_step")
def mixed_step(
    a: Annotated[str, FromMain(output="a")],
    b: Annotated[str, FromStep(step_id="second_step", output="return_value")],
) -> str:
    return f"{a}:{b}"

or

from procodile import FromStep, FromMain

@main_step.step(
    id="mixed_step",
    inputs={
        "a": FromMain(output="a"),
        "b": FromStep(step_id="second_step", output="return_value")
    }
)
def mixed_step(
    a: str,
    b: str
) -> str:
    return f"{a}:{b}"

Declaring Outputs for Steps

Steps may declare explicit outputs.

@main_step.step(
    id="final_step",
    outputs={
        "result": Field(title="Final result"),
    },
)
def final_step(
    value: Annotated[str, FromStep(step_id="mixed_step", output="return_value")]
) -> str:
    return value

If no outputs are declared, the return value is exposed as "return_value".

Output Resolution Rules

Workflow execution follows strict output normalization rules. These rules apply uniformly to main and step processes.

1. No output specification

{"return_value": result}

2. Tuple return value

  • Values are mapped positionally to output names
  • Output names must be defined in the output specification

3. Dictionary return value

  • Keys are mapped directly to output names
  • All keys must exist in the output specification

4. Single scalar value

  • Treated as a single output
  • Exposed as "return_value" unless outputs are explicitly defined

Examples

# No outputs declared
return 42
# → {"return_value": 42}

# No outputs declared
return (1, 2)
# → {"return_value": (1, 2)}

# Declared outputs: ("a", "b")
return (1, 2)
# → {"a": 1, "b": 2}

# Declared outputs: ("a", "b")
return {"a": 1, "b": 2}
# → {"a": 1, "b": 2}

# Declared outputs: ("c")
return {"a": 1, "b": 2}
# → {"c":{"a": 1, "b": 2}}

Dependency Validation

Dependencies are validated at workflow construction time:

  • Exactly one main step
  • Exactly one output step (leaf step) -> this returns the final output
  • Referenced steps must exist
  • Referenced outputs must exist
  • Cyclic dependencies are rejected

Errors are raised immediately if the workflow is invalid.

Execution Order

Execution order is automatically derived using topological sorting.

To ensure seamless integration with orchestrating engines (such as Apache Airflow or local runners), the system automatically appends a final step (procodile.workflow.FINAL_STEP_ID) to every user-created workflow.

The final step serves as a standardized exit point for data. By maintaining a consistent final node, orchestrators can reliably extract the workflow's output without needing to parse unique or variable step names defined by the user.

Important: The final step node is a passthrough entity. It does not perform any data transformations, computations, or logic. Its sole responsibility is to receive the output from the preceding step and expose it via a standardized key.

Visualizing the Workflow

Workflows can be visualized as a directed graph.

dot = workflow.visualize_workflow()

This returns a Graphviz DOT representation similar to this:

digraph pipeline {
    rankdir=LR;
    "main_step";
    "second_step";
    "main_step" -> "second_step";
}