Execution¶
Aqueducts supports both local and remote pipeline execution. Use the CLI for local execution or submit jobs to remote executors running in your infrastructure.
Local Execution¶
Execute pipelines directly on your local machine using the Aqueducts CLI.
Basic Usage¶
# Run a pipeline
aqueducts run --file pipeline.yml
# Run with a parameter
aqueducts run --file pipeline.yml --param key1=value1
# Pass multiple parameters by repeating the flag
aqueducts run --file pipeline.yml --param key1=value1 --param key2=value2
Supported Formats¶
The CLI supports multiple pipeline definition formats:
Format | Support
---|---
YAML | Supported by default
JSON | Requires feature flag
TOML | Requires feature flag
JSON and TOML support require appropriate feature flags during CLI installation.
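A hedged sketch of enabling the additional formats at install time; the feature names `json` and `toml` are assumptions, so check the crate's feature list before installing:

```shell
# Hypothetical: enable JSON and TOML pipeline definitions at install time.
# The feature names "json" and "toml" are assumptions, not confirmed flags.
cargo install aqueducts-cli --features json,toml
```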
Local Execution Benefits¶
- Direct access to local files and databases
- No network overhead for file-based operations
- Immediate feedback and debugging
- Full control over execution environment
Remote Execution¶
Execute pipelines on remote executors deployed within your infrastructure, closer to your data sources.
Architecture Overview¶
Remote execution follows this flow:
- CLI Client submits pipeline to Remote Executor
- Remote Executor processes data from Data Sources
- Remote Executor writes results to Destinations
- Remote Executor sends status updates back to CLI Client
Setting Up Remote Execution¶
1. Deploy an Executor within your infrastructure.
2. Submit the pipeline from the CLI.
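A minimal sketch of the two steps, assuming a locally installed executor binary; the `--executor` and `--api-key` flags on `aqueducts run` are assumptions inferred from the cancel example shown below:

```shell
# Step 1: start an executor on a host inside your infrastructure
# (see the "Aqueducts Executor" section for Docker-based deployment)
aqueducts-executor --api-key your_secret_key --port 3031 --max-memory 4

# Step 2: submit a pipeline from your workstation to that executor
aqueducts run --file pipeline.yml \
  --executor executor-host:3031 \
  --api-key your_secret_key
```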
Remote Execution Benefits¶
- Minimize network transfer by processing data close to sources
- Scale processing power independently of client machines
- Secure execution within your infrastructure boundaries
- Centralized resource management with memory limits
Managing Remote Executions¶
Monitor and control remote pipeline executions:
# Check executor status
curl http://executor-host:3031/api/health
# Cancel a running pipeline
aqueducts cancel --executor executor-host:3031 \
  --api-key your_secret_key \
  --execution-id abc-123
Aqueducts Executor¶
The Aqueducts Executor is a deployable application for running pipelines within your infrastructure.
Key Features¶
- Remote Execution: Run data pipelines securely within your infrastructure
- Memory Management: Configure maximum memory usage with DataFusion's memory pool
- Real-time Feedback: WebSocket communication for live progress updates
- Cloud Storage Support: Native S3, GCS, and Azure Blob Storage integration
- Database Connectivity: ODBC support for various database systems
- Exclusive Execution: Single-pipeline execution for optimal resource utilization
Docker Deployment (Recommended)¶
The Docker image includes ODBC support with PostgreSQL drivers pre-installed:
# Pull from GitHub Container Registry
docker pull ghcr.io/vigimite/aqueducts/aqueducts-executor:latest
# Run with command line arguments
docker run -d \
  --name aqueducts-executor \
  -p 3031:3031 \
  ghcr.io/vigimite/aqueducts/aqueducts-executor:latest \
  --api-key your_secret_key --max-memory 4
Environment Variables¶
Configure the executor using environment variables:
docker run -d \
  --name aqueducts-executor \
  -p 3031:3031 \
  -e AQUEDUCTS_API_KEY=your_secret_key \
  -e AQUEDUCTS_HOST=0.0.0.0 \
  -e AQUEDUCTS_PORT=3031 \
  -e AQUEDUCTS_MAX_MEMORY=4 \
  -e AQUEDUCTS_LOG_LEVEL=info \
  ghcr.io/vigimite/aqueducts/aqueducts-executor:latest
Configuration Options¶
Option | Description | Default | Environment Variable
---|---|---|---
--api-key | API key for authentication | - | AQUEDUCTS_API_KEY
--host | Host address to bind to | 0.0.0.0 | AQUEDUCTS_HOST
--port | Port to listen on | 8080 | AQUEDUCTS_PORT
--max-memory | Maximum memory usage in GB (0 for unlimited) | 0 | AQUEDUCTS_MAX_MEMORY
--server-url | URL of Aqueducts server for registration (optional) | - | AQUEDUCTS_SERVER_URL
--executor-id | Unique identifier for this executor | auto-generated | AQUEDUCTS_EXECUTOR_ID
--log-level | Logging level (info, debug, trace) | info | AQUEDUCTS_LOG_LEVEL
Docker Compose Setup¶
For local development and testing:
# Start database only (default)
docker-compose up
# Start database + executor
docker-compose --profile executor up
# Build and start from source
docker-compose --profile executor up --build
The executor will be available at:
- API: http://localhost:3031
- Health check: http://localhost:3031/api/health
- WebSocket: ws://localhost:3031/ws/connect
Manual Installation¶
Install using Cargo for custom deployments:
# Standard installation with cloud storage features
cargo install aqueducts-executor
# Installation with ODBC support
cargo install aqueducts-executor --features odbc
API Endpoints¶
Endpoint | Method | Auth | Description
---|---|---|---
/api/health | GET | No | Basic health check
/ws/connect | GET | Yes | WebSocket endpoint for bidirectional communication
ODBC Configuration¶
For database connectivity, ODBC support requires the odbc feature flag during installation and proper system configuration.
Installation Requirements¶
First, install Aqueducts with ODBC support:
# CLI with ODBC support
cargo install aqueducts-cli --features odbc
# Executor with ODBC support
cargo install aqueducts-executor --features odbc
System Dependencies¶
# Debian/Ubuntu: install UnixODBC development libraries
sudo apt-get update
sudo apt-get install unixodbc-dev
# PostgreSQL driver
sudo apt-get install odbc-postgresql
# MySQL driver
sudo apt-get install libmyodbc
# SQL Server driver (optional)
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list
sudo apt-get update
sudo apt-get install msodbcsql17
# Fedora/RHEL: install UnixODBC development libraries
sudo dnf install unixODBC-devel
# PostgreSQL driver
sudo dnf install postgresql-odbc
# MySQL driver
sudo dnf install mysql-connector-odbc
# SQL Server driver (optional)
sudo curl -o /etc/yum.repos.d/msprod.repo https://packages.microsoft.com/config/rhel/8/prod.repo
sudo dnf install msodbcsql17
Driver Configuration¶
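Drivers are registered in /etc/odbcinst.ini. A typical entry, assuming the Debian/Ubuntu driver paths (adjust the Driver path to the actual .so location on your system):

```ini
[PostgreSQL]
Description = PostgreSQL ODBC driver
Driver      = /usr/lib/x86_64-linux-gnu/odbc/psqlodbcw.so

[MySQL]
Description = MySQL ODBC driver
Driver      = /usr/lib/x86_64-linux-gnu/odbc/libmyodbc8w.so
```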
Data Source Configuration¶
Configure database connections in /etc/odbc.ini (system-wide) or ~/.odbc.ini (per-user).
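A sketch of an ~/.odbc.ini defining the PostgreSQL-Local and MySQL-Local DSNs used in the connection string examples; server, port, and database values are placeholders:

```ini
[PostgreSQL-Local]
Driver   = PostgreSQL
Server   = localhost
Port     = 5432
Database = mydb

[MySQL-Local]
Driver   = MySQL
Server   = localhost
Port     = 3306
Database = mydb
```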
Connection String Examples¶
# PostgreSQL via DSN
sources:
  - type: odbc
    name: postgres_data
    connection_string: "DSN=PostgreSQL-Local"
    load_query: "SELECT * FROM users WHERE created_at > '2024-01-01'"

# PostgreSQL via direct connection string
sources:
  - type: odbc
    name: postgres_data
    connection_string: "Driver={PostgreSQL};Server=localhost;Database=mydb;UID=user;PWD=pass;"
    load_query: "SELECT * FROM users LIMIT 1000"

# MySQL via DSN
sources:
  - type: odbc
    name: mysql_data
    connection_string: "DSN=MySQL-Local"
    load_query: "SELECT * FROM products WHERE price > 100"

# MySQL via direct connection string
sources:
  - type: odbc
    name: mysql_data
    connection_string: "Driver={MySQL};Server=localhost;Database=mydb;User=user;Password=pass;"
    load_query: "SELECT * FROM orders WHERE date >= '2024-01-01'"
Testing Your Setup¶
1. Test ODBC Installation¶
# List registered ODBC drivers
odbcinst -q -d
# Show ODBC configuration file locations
odbcinst -j
2. Test Database Connection¶
# Test with isql (interactive SQL)
isql -v PostgreSQL-Local username password
# Test MySQL connection
isql -v MySQL-Local username password
3. Test with Aqueducts¶
Create a minimal test pipeline:
# yaml-language-server: $schema=https://raw.githubusercontent.com/vigimite/aqueducts/main/json_schema/aqueducts.schema.json
version: "v2"

sources:
  - type: odbc
    name: test_connection
    connection_string: "DSN=PostgreSQL-Local"
    load_query: "SELECT 1 as test_column"

stages:
  - - name: verify
      query: "SELECT * FROM test_connection"
      show: 1
Run the test (assuming the pipeline above was saved as test_pipeline.yml):
aqueducts run --file test_pipeline.yml
Common Driver Paths¶
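Driver library locations vary by distribution; the paths below are typical defaults and may differ on your system (verify with odbcinst -q -d):

```
# Debian/Ubuntu
/usr/lib/x86_64-linux-gnu/odbc/psqlodbcw.so    # PostgreSQL
/usr/lib/x86_64-linux-gnu/odbc/libmyodbc8w.so  # MySQL

# Fedora/RHEL
/usr/lib64/psqlodbcw.so      # PostgreSQL
/usr/lib64/libmyodbc8w.so    # MySQL
```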
Performance Considerations¶
Optimization Tips
- Limit query results: Use LIMIT clauses to avoid memory issues
- Filter early: Apply WHERE conditions in your load_query
- Use indexes: Ensure your database queries use appropriate indexes
- Memory management: Set executor --max-memory limits appropriately
ODBC Troubleshooting¶
Driver Loading Issues¶
# Check if drivers are registered
odbcinst -q -d
# Test driver loading
ldd /path/to/driver.so # Linux
otool -L /path/to/driver.so # macOS
Connection Issues¶
# Enable ODBC tracing for debugging
export ODBCSYSINI=/tmp
export ODBCINSTINI=/etc/odbcinst.ini
export ODBCINI=/etc/odbc.ini
# Test with verbose output
isql -v DSN_NAME username password
Common Error Solutions¶
- Driver not found: Verify driver paths in odbcinst.ini
- DSN not found: Check odbc.ini configuration
- Permission denied: Ensure ODBC files are readable
- Library loading: Install missing system dependencies
Troubleshooting¶
Common Issues¶
Local Execution:
- Pipeline validation errors: Check YAML syntax and schema compliance
- Missing features: Ensure CLI was compiled with required feature flags
- File not found: Verify file paths and permissions
Remote Execution:
- Connection timeouts: Check network connectivity and firewall rules
- Authentication failures: Verify API key configuration
- Executor busy: Only one pipeline runs at a time per executor
- Memory errors: Increase --max-memory or optimize pipeline queries
ODBC Issues:
- Driver not found: Install database-specific ODBC drivers
- Connection failures: Verify DSN configuration in odbc.ini
Performance Optimization¶
Memory Management
- Set appropriate --max-memory limits for executors
- Break large queries into smaller stages
- Add filtering early in the pipeline
- Use partitioning for large datasets
Network Optimization
- Deploy executors close to data sources
- Use cloud storage in the same region as executors
- Minimize data movement between stages