Aqueduct

This is a generated JSONSchema reference for the Aqueducts configuration.

Title: Aqueduct

Type object
Additional properties Any type: allowed

Description: Definition for an Aqueduct data pipeline

Property Pattern Type Title/Description
+ sources No array Definition of the data sources for this pipeline
+ stages No array of array A sequential list of transformations to execute within the context of this pipeline. Nested stages are executed in parallel
- destination No Combination Destination for the final step of the `Aqueduct`; takes the last stage as input for the write operation

Required Field: sources

Type array

Description: Definition of the data sources for this pipeline

Each item of this array must be Description
Source A data source that can be either a delta table ('delta'), a 'file', a 'directory' or an 'odbc' connection

Field: Source

Type combining
Additional properties Any type: allowed
Defined in #/definitions/Source

Description: A data source that can be either a delta table (delta), a file, a directory or an odbc connection

One of(Option)
item 0
item 1
item 2
item 3
item 4

Field: item 0

Type object
Additional properties Any type: allowed

Description: An in-memory source

Property Pattern Type Title/Description
+ type No enum (of string) -
+ name No string Name of the in-memory table, existence will be checked at runtime

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "InMemory"


Field: name
Type string

Description: Name of the in-memory table, existence will be checked at runtime
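For illustration, a minimal YAML sketch of an in-memory source as it could appear under `sources`; the table name is hypothetical and must already exist in the runtime context:

```yaml
sources:
  # Existence of this table is checked at runtime
  - type: InMemory
    name: raw_events   # hypothetical table name
```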

Field: item 1

Type object
Additional properties Any type: allowed

Description: A delta table source

Property Pattern Type Title/Description
+ type No enum (of string) -
+ name No string Name of the delta source, will be the registered table name in the SQL context
+ location No string A URL or Path to the location of the delta table. Supports relative local paths
- version_ts No string or null An RFC3339-compliant timestamp to load the delta table state at a specific point in time. Used for delta's time travel feature
- storage_options No object Storage options for the delta table. Please reference the delta-rs GitHub repo for more information on available keys (e.g. https://github.com/delta-io/delta-rs/blob/main/crates/aws/src/storage.rs) and additionally the `object_store` docs (e.g. https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html)

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Delta"


Field: name
Type string

Description: Name of the delta source, will be the registered table name in the SQL context


Field: location
Type string
Format uri

Description: A URL or Path to the location of the delta table. Supports relative local paths


Field: version_ts
Type string or null
Format date-time

Description: An RFC3339-compliant timestamp to load the delta table state at a specific point in time. Used for delta's time travel feature


Field: storage_options
Type object
Additional properties Should-conform
Default {}

Description: Storage options for the delta table. Please reference the delta-rs GitHub repo for more information on available keys (e.g. https://github.com/delta-io/delta-rs/blob/main/crates/aws/src/storage.rs) and additionally the object_store docs (e.g. https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html)

Property Pattern Type Title/Description
- No string -

Field: additionalProperties
Type string

Description: No description...
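Putting the fields above together, a hedged YAML sketch of a delta source; the location, timestamp and storage key are illustrative (see the delta-rs and object_store docs for valid keys):

```yaml
sources:
  - type: Delta
    name: orders                                # registered table name in the SQL context
    location: s3://my-bucket/tables/orders      # illustrative URL; relative local paths also work
    version_ts: "2024-01-31T00:00:00Z"          # optional RFC3339 timestamp for time travel
    storage_options:
      AWS_REGION: eu-central-1                  # key names per delta-rs/object_store docs
```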

Field: item 2

Type object
Additional properties Any type: allowed

Description: A file source

Property Pattern Type Title/Description
+ type No enum (of string) -
+ name No string Name of the file source, will be the registered table name in the SQL context
+ file_type No object File type of the file to be ingested. Supports `Parquet` for parquet files, `Csv` for CSV files and `Json` for JSON files
+ location No string A URL or Path to the location of the file. Supports relative local paths
- storage_options No object Storage options for the object store. Please reference the delta-rs GitHub repo for more information on available keys (e.g. https://github.com/delta-io/delta-rs/blob/main/crates/aws/src/storage.rs) and additionally the `object_store` docs (e.g. https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html)

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "File"


Field: name
Type string

Description: Name of the file source, will be the registered table name in the SQL context


Field: file_type
Type combining
Additional properties Any type: allowed
Defined in

Description: File type of the file to be ingested. Supports Parquet for parquet files, Csv for CSV files and Json for JSON files

One of(Option)
item 0
item 1
item 2
Field: item 0
Type object
Additional properties Any type: allowed

Description: Parquet source options

Property Pattern Type Title/Description
+ type No enum (of string) -
+ options No object -

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Parquet"


Field: options
Type object
Additional properties Any type: allowed
Defined in #/definitions/ParquetSourceOptions

Description: No description...

Field: item 1
Type object
Additional properties Any type: allowed

Description: Csv source options

Property Pattern Type Title/Description
+ type No enum (of string) -
+ options No object -

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Csv"


Field: options
Type object
Additional properties Any type: allowed
Defined in #/definitions/CsvSourceOptions

Description: No description...

Property Pattern Type Title/Description
- has_header No boolean or null Set to `true` to treat the first row of the CSV as the header. Column names will be inferred from the header; if there is no header, the column names are `column_1, column_2, ... column_x`
- delimiter No string or null Set a delimiter character to read this CSV with

Field: has_header
Type boolean or null

Description: Set to true to treat the first row of the CSV as the header. Column names will be inferred from the header; if there is no header, the column names are column_1, column_2, ... column_x


Field: delimiter
Type string or null

Description: Set a delimiter character to read this CSV with

Restrictions
Min length 1
Max length 1
Field: item 2
Type object
Additional properties Any type: allowed

Description: Json source options

Property Pattern Type Title/Description
+ type No enum (of string) -
+ options No object -

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Json"


Field: options
Type object
Additional properties Any type: allowed
Defined in #/definitions/JsonSourceOptions

Description: No description...


Field: location
Type string
Format uri

Description: A URL or Path to the location of the file. Supports relative local paths


Field: storage_options
Type object
Additional properties Should-conform
Default {}

Description: Storage options for the object store. Please reference the delta-rs GitHub repo for more information on available keys (e.g. https://github.com/delta-io/delta-rs/blob/main/crates/aws/src/storage.rs) and additionally the object_store docs (e.g. https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html)

Property Pattern Type Title/Description
- No string -

Field: additionalProperties
Type string

Description: No description...
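A minimal YAML sketch of a CSV file source combining the fields above; the name, path and options are illustrative:

```yaml
sources:
  - type: File
    name: temp_readings
    file_type:
      type: Csv
      options:
        has_header: true     # first row is treated as the header
        delimiter: ","       # single character
    location: ./data/temp_readings.csv   # relative local path
```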

Field: item 3

Type object
Additional properties Any type: allowed

Description: A directory source

Property Pattern Type Title/Description
+ type No enum (of string) -
+ name No string Name of the directory source, will be the registered table name in the SQL context
+ file_type No object File type of the files to be ingested. Supports `Parquet` for parquet files, `Csv` for CSV files and `Json` for JSON files
+ location No string A URL or Path to the location of the directory. Supports relative local paths
- storage_options No object Storage options for the object store. Please reference the delta-rs GitHub repo for more information on available keys (e.g. https://github.com/delta-io/delta-rs/blob/main/crates/aws/src/storage.rs) and additionally the `object_store` docs (e.g. https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html)

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Directory"


Field: name
Type string

Description: Name of the directory source, will be the registered table name in the SQL context


Field: file_type
Type combining
Additional properties Any type: allowed
Defined in

Description: File type of the files to be ingested. Supports Parquet for parquet files, Csv for CSV files and Json for JSON files

One of(Option)
item 0
item 1
item 2
Field: item 0
Type object
Additional properties Any type: allowed

Description: Parquet source options

Property Pattern Type Title/Description
+ type No enum (of string) -
+ options No object -

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Parquet"


Field: options
Type object
Additional properties Any type: allowed
Defined in #/definitions/ParquetSourceOptions

Description: No description...

Field: item 1
Type object
Additional properties Any type: allowed

Description: Csv source options

Property Pattern Type Title/Description
+ type No enum (of string) -
+ options No object -

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Csv"


Field: options
Type object
Additional properties Any type: allowed
Defined in #/definitions/CsvSourceOptions

Description: No description...

Property Pattern Type Title/Description
- has_header No boolean or null Set to `true` to treat the first row of the CSV as the header. Column names will be inferred from the header; if there is no header, the column names are `column_1, column_2, ... column_x`
- delimiter No string or null Set a delimiter character to read this CSV with

Field: has_header
Type boolean or null

Description: Set to true to treat the first row of the CSV as the header. Column names will be inferred from the header; if there is no header, the column names are column_1, column_2, ... column_x


Field: delimiter
Type string or null

Description: Set a delimiter character to read this CSV with

Restrictions
Min length 1
Max length 1
Field: item 2
Type object
Additional properties Any type: allowed

Description: Json source options

Property Pattern Type Title/Description
+ type No enum (of string) -
+ options No object -

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Json"


Field: options
Type object
Additional properties Any type: allowed
Defined in #/definitions/JsonSourceOptions

Description: No description...


Field: location
Type string
Format uri

Description: A URL or Path to the location of the directory. Supports relative local paths


Field: storage_options
Type object
Additional properties Should-conform
Default {}

Description: Storage options for the object store. Please reference the delta-rs GitHub repo for more information on available keys (e.g. https://github.com/delta-io/delta-rs/blob/main/crates/aws/src/storage.rs) and additionally the object_store docs (e.g. https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html)

Property Pattern Type Title/Description
- No string -

Field: additionalProperties
Type string

Description: No description...
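A directory source differs from a file source only in pointing at a prefix rather than a single file; a hedged sketch with illustrative values:

```yaml
sources:
  - type: Directory
    name: daily_exports
    file_type:
      type: Parquet
      options: {}                        # ParquetSourceOptions documents no keys here
    location: s3://my-bucket/exports/    # illustrative prefix
    storage_options: {}
```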

Field: item 4

Type object
Additional properties Any type: allowed

Description: An ODBC source

Property Pattern Type Title/Description
+ type No enum (of string) -
+ name No string Name of the ODBC source, will be the registered table name in the SQL context
+ query No string Query to execute when fetching data from the ODBC connection. This query will execute eagerly before the data is processed by the pipeline. Size of data returned from the query cannot exceed work memory
+ connection_string No string ODBC connection string. Please reference the respective database connection string syntax (e.g. https://www.connectionstrings.com/postgresql-odbc-driver-psqlodbc/)

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Odbc"


Field: name
Type string

Description: Name of the ODBC source, will be the registered table name in the SQL context


Field: query
Type string

Description: Query to execute when fetching data from the ODBC connection. This query will execute eagerly before the data is processed by the pipeline. Size of data returned from the query cannot exceed work memory


Field: connection_string
Type string

Description: ODBC connection string. Please reference the respective database connection string syntax (e.g. https://www.connectionstrings.com/postgresql-odbc-driver-psqlodbc/)
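An illustrative YAML sketch of an ODBC source; the query and connection string are placeholders (note the query runs eagerly, so keep its result within work memory):

```yaml
sources:
  - type: Odbc
    name: customers
    query: SELECT id, name FROM customers   # executed eagerly before the pipeline runs
    connection_string: "Driver={PostgreSQL};Server=localhost;Port=5432;Database=db;Uid=user;Pwd=pass"
```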


Required Field: stages

Type array of array

Description: A sequential list of transformations to execute within the context of this pipeline. Nested stages are executed in parallel

Each item of this array must be Description
stages items -

Field: stages items

Type array

Description: No description...

Each item of this array must be Description
Stage Definition for a processing stage in an Aqueduct Pipeline

Field: Stage

Type object
Additional properties Any type: allowed
Defined in #/definitions/Stage

Description: Definition for a processing stage in an Aqueduct Pipeline

Property Pattern Type Title/Description
+ name No string Name of the stage, used as the table name for the result of this stage
+ query No string SQL query that is executed against a datafusion context. Check the datafusion SQL reference for more information: https://datafusion.apache.org/user-guide/sql/index.html
- show No integer or null When set to a value of up to `usize`, will print the result of this stage to stdout, limited to that number of rows. Set the value to 0 to print without a limit
- explain No boolean When set to 'true' the stage will output the query execution plan
- explain_analyze No boolean When set to 'true' the stage will output the query execution plan with added execution metrics
- print_schema No boolean When set to 'true' the stage will pretty print the output schema of the executed query

Field: name
Type string

Description: Name of the stage, used as the table name for the result of this stage


Field: query
Type string

Description: SQL query that is executed against a datafusion context. Check the datafusion SQL reference for more information: https://datafusion.apache.org/user-guide/sql/index.html


Field: show
Type integer or null
Format uint

Description: When set to a value of up to usize, will print the result of this stage to stdout, limited to that number of rows. Set the value to 0 to print without a limit

Restrictions
Minimum N/A

Field: explain
Type boolean
Default false

Description: When set to 'true' the stage will output the query execution plan


Field: explain_analyze
Type boolean
Default false

Description: When set to 'true' the stage will output the query execution plan with added execution metrics


Field: print_schema
Type boolean
Default false

Description: When set to 'true' the stage will pretty print the output schema of the executed query
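Because `stages` is an array of arrays, each outer entry is a sequential step and the stages inside one entry run in parallel. A hedged sketch with illustrative names and queries:

```yaml
stages:
  # Step 1: these two stages execute in parallel
  - - name: cleaned
      query: SELECT * FROM temp_readings WHERE temperature IS NOT NULL
    - name: locations
      query: SELECT DISTINCT location_id FROM temp_readings
  # Step 2: runs after step 1; can reference earlier stage names as tables
  - - name: aggregated
      query: SELECT location_id, avg(temperature) AS avg_temp FROM cleaned GROUP BY location_id
      show: 10            # print up to 10 rows to stdout
      print_schema: true  # pretty print the output schema
```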


Optional Field: destination

Type combining
Additional properties Any type: allowed

Description: Destination for the final step of the Aqueduct; takes the last stage as input for the write operation

Any of(Option)
Destination
item 1

Field: Destination

Type combining
Additional properties Any type: allowed
Defined in #/definitions/Destination

Description: Target output for the Aqueduct table

One of(Option)
item 0
item 1
item 2
item 3

Field: item 0

Type object
Additional properties Any type: allowed

Description: An in-memory destination

Property Pattern Type Title/Description
+ type No enum (of string) -
+ name No string Name to register the table with in the provided `SessionContext`

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "InMemory"


Field: name
Type string

Description: Name to register the table with in the provided SessionContext
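A minimal sketch of an in-memory destination; the name is hypothetical:

```yaml
destination:
  type: InMemory
  name: results   # registered in the provided SessionContext
```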

Field: item 1

Type object
Additional properties Any type: allowed

Description: A delta table destination

Property Pattern Type Title/Description
+ type No enum (of string) -
+ name No string Name of the table
+ location No string Location of the table as a URL, e.g. file:///tmp/delta_table/, s3://bucket_name/delta_table
- storage_options No object DeltaTable storage options
+ table_properties No object DeltaTable table properties: https://docs.delta.io/latest/table-properties.html
+ write_mode No object Write mode used when writing to the delta table. For `Upsert`, the params are the columns used to determine uniqueness during the merge operation; supported types: all primitive types and lists of primitive types
+ partition_cols No array of string Columns to partition table by

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Delta"


Field: name
Type string

Description: Name of the table


Field: location
Type string
Format uri

Description: Location of the table as a URL, e.g. file:///tmp/delta_table/, s3://bucket_name/delta_table


Field: storage_options
Type object
Additional properties Should-conform
Default {}

Description: DeltaTable storage options

Property Pattern Type Title/Description
- No string -

Field: additionalProperties
Type string

Description: No description...


Field: table_properties
Type object
Additional properties Should-conform

Description: DeltaTable table properties: https://docs.delta.io/latest/table-properties.html

Property Pattern Type Title/Description
- No string or null -

Field: additionalProperties
Type string or null

Description: No description...


Field: write_mode
Type combining
Additional properties Any type: allowed
Defined in

Description: Write mode used when writing to the delta table. For Upsert, the params are the columns used to determine uniqueness during the merge operation; supported types: all primitive types and lists of primitive types

One of(Option)
item 0
item 1
item 2
Field: item 0
Type object
Additional properties Any type: allowed

Description: Append: appends data to the Destination

Property Pattern Type Title/Description
+ operation No enum (of string) -

Field: operation
Type enum (of string)

Description: No description...

Must be one of: * "Append"

Field: item 1
Type object
Additional properties Any type: allowed

Description: Upsert: upserts data to the Destination using the specified merge columns

Property Pattern Type Title/Description
+ operation No enum (of string) -
+ params No array of string -

Field: operation
Type enum (of string)

Description: No description...

Must be one of: * "Upsert"


Field: params
Type array of string

Description: No description...

Each item of this array must be Description
params items -
Field: params items
Type string

Description: No description...

Field: item 2
Type object
Additional properties Any type: allowed

Description: Replace: replaces data in the Destination using the specified ReplaceConditions

Property Pattern Type Title/Description
+ operation No enum (of string) -
+ params No array -

Field: operation
Type enum (of string)

Description: No description...

Must be one of: * "Replace"


Field: params
Type array

Description: No description...

Each item of this array must be Description
ReplaceCondition Condition used to build a predicate by which data should be replaced in a 'Destination'. The expression built checks that the field named by 'column' equals the given 'value'
Field: ReplaceCondition
Type object
Additional properties Any type: allowed
Defined in #/definitions/ReplaceCondition

Description: Condition used to build a predicate by which data should be replaced in a Destination. The expression built checks that the field named by column equals the given value

Property Pattern Type Title/Description
+ column No string -
+ value No string -

Field: column
Type string

Description: No description...


Field: value
Type string

Description: No description...


Field: partition_cols
Type array of string

Description: Columns to partition table by

Each item of this array must be Description
partition_cols items -
Field: partition_cols items
Type string

Description: No description...
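Combining the fields above, a hedged YAML sketch of a delta destination using the `Upsert` write mode; all names, paths and properties are illustrative:

```yaml
destination:
  type: Delta
  name: aggregated_readings
  location: s3://my-bucket/tables/aggregated_readings   # illustrative URL
  storage_options: {}
  table_properties:
    delta.appendOnly: "false"                           # illustrative table property
  write_mode:
    operation: Upsert
    params: [location_id, date]                         # merge columns determining uniqueness
  partition_cols: [date]
```

For `Append` no params are needed (`write_mode: {operation: Append}`); for `Replace` the params are a list of `{column, value}` replace conditions.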

Field: item 2

Type object
Additional properties Any type: allowed

Description: A file output destination

Property Pattern Type Title/Description
+ type No enum (of string) -
+ name No string Name of the file to write
+ location No string Location of the file as a URL, e.g. file:///tmp/output.csv, s3://bucket_name/prefix/output.parquet, s3://bucket_name/prefix
+ file_type No object File type, supported types are Parquet and CSV
- single_file No boolean Describes whether to write a single file (can be used to overwrite destination file)
- partition_cols No array of string Columns to partition table by
- storage_options No object Object store storage options

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "File"


Field: name
Type string

Description: Name of the file to write


Field: location
Type string
Format uri

Description: Location of the file as a URL, e.g. file:///tmp/output.csv, s3://bucket_name/prefix/output.parquet, s3://bucket_name/prefix


Field: file_type
Type combining
Additional properties Any type: allowed
Defined in

Description: File type, supported types are Parquet and CSV

One of(Option)
item 0
item 1
item 2
Field: item 0
Type object
Additional properties Any type: allowed

Description: Parquet options map, please refer to https://docs.rs/datafusion-common/latest/datafusion_common/config/struct.TableParquetOptions.html for possible options

Property Pattern Type Title/Description
+ type No enum (of string) -
+ options No object -

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Parquet"


Field: options
Type object
Additional properties Should-conform

Description: No description...

Property Pattern Type Title/Description
- No string -

Field: additionalProperties
Type string

Description: No description...

Field: item 1
Type object
Additional properties Any type: allowed

Description: CSV options

Property Pattern Type Title/Description
+ type No enum (of string) -
+ options No object Csv options

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Csv"


Field: options
Type object
Additional properties Any type: allowed
Defined in #/definitions/CsvDestinationOptions

Description: Csv options

Property Pattern Type Title/Description
- has_header No boolean or null Defaults to true, sets a header for the CSV file
- delimiter No string or null Defaults to `,`, sets the delimiter char for the CSV file

Field: has_header
Type boolean or null

Description: Defaults to true, sets a header for the CSV file


Field: delimiter
Type string or null

Description: Defaults to `,`; sets the delimiter char for the CSV file

Restrictions
Min length 1
Max length 1
Field: item 2
Type object
Additional properties Any type: allowed

Description: Json destination, no supported options

Property Pattern Type Title/Description
+ type No enum (of string) -

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Json"


Field: single_file
Type boolean
Default false

Description: Describes whether to write a single file (can be used to overwrite destination file)


Field: partition_cols
Type array of string
Default []

Description: Columns to partition table by

Each item of this array must be Description
partition_cols items -
Field: partition_cols items
Type string

Description: No description...


Field: storage_options
Type object
Additional properties Should-conform
Default {}

Description: Object store storage options

Property Pattern Type Title/Description
- No string -

Field: additionalProperties
Type string

Description: No description...
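A hedged sketch of a CSV file destination; the location and options are illustrative:

```yaml
destination:
  type: File
  name: output
  location: file:///tmp/output.csv   # URL as in the example above
  file_type:
    type: Csv
    options:
      has_header: true
      delimiter: ","
  single_file: true                  # write one file (can overwrite the destination file)
  partition_cols: []
  storage_options: {}
```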

Field: item 3

Type object
Additional properties Any type: allowed

Description: An ODBC insert query to write to a DB table

Property Pattern Type Title/Description
+ type No enum (of string) -
+ name No string Name of the ODBC destination
+ connection_string No string ODBC connection string. Please reference the respective database connection string syntax (e.g. https://www.connectionstrings.com/postgresql-odbc-driver-psqlodbc/)
+ write_mode No object Strategy for performing ODBC write operation
+ batch_size No integer Batch size (rows) to use when inserting data

Field: type
Type enum (of string)

Description: No description...

Must be one of: * "Odbc"


Field: name
Type string

Description: Name of the ODBC destination


Field: connection_string
Type string

Description: ODBC connection string. Please reference the respective database connection string syntax (e.g. https://www.connectionstrings.com/postgresql-odbc-driver-psqlodbc/)


Field: write_mode
Type combining
Additional properties Any type: allowed
Defined in

Description: Strategy for performing ODBC write operation

One of(Option)
item 0
item 1
Field: item 0
Type object
Additional properties Any type: allowed

Description: Append: appends data to the Destination

Property Pattern Type Title/Description
+ operation No enum (of string) -

Field: operation
Type enum (of string)

Description: No description...

Must be one of: * "Append"

Field: item 1
Type object
Additional properties Any type: allowed

Description: Custom: Inserts data with a prepared statement. Option to perform any number of (non-insert) preliminary statements

Property Pattern Type Title/Description
+ operation No enum (of string) -
+ transaction No object SQL statements for `Custom` write mode.

Field: operation
Type enum (of string)

Description: No description...

Must be one of: * "Custom"


Field: transaction
Type object
Additional properties Any type: allowed
Defined in #/definitions/CustomStatements

Description: SQL statements for Custom write mode.

Property Pattern Type Title/Description
- pre_insert No string or null Optional (non-insert) preliminary statement
+ insert No string Insert prepared statement

Field: pre_insert
Type string or null

Description: Optional (non-insert) preliminary statement


Field: insert
Type string

Description: Insert prepared statement


Field: batch_size
Type integer
Format uint

Description: Batch size (rows) to use when inserting data

Restrictions
Minimum N/A
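An illustrative sketch of an ODBC destination with the `Custom` write mode; the connection string and statements are placeholders, and the `?` markers assume the usual ODBC prepared-statement parameter syntax:

```yaml
destination:
  type: Odbc
  name: insert_results
  connection_string: "Driver={PostgreSQL};Server=localhost;Database=db;Uid=user;Pwd=pass"
  write_mode:
    operation: Custom
    transaction:
      pre_insert: "DELETE FROM results WHERE run_date = '2024-01-31'"   # optional preliminary statement
      insert: "INSERT INTO results (id, value) VALUES (?, ?)"           # prepared insert statement
  batch_size: 1000   # rows per insert batch
```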

Field: item 1

Type null

Description: No description...
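Tying the three top-level keys together, a minimal end-to-end pipeline sketch; all names, paths and the query are illustrative, and `destination` may also be null per item 1 above:

```yaml
# Minimal pipeline: one source, one stage, one destination
sources:
  - type: File
    name: readings
    file_type:
      type: Csv
      options:
        has_header: true
    location: ./data/readings.csv
stages:
  - - name: summary
      query: SELECT location_id, avg(temperature) AS avg_temp FROM readings GROUP BY location_id
destination:
  type: File
  name: summary_out
  location: file:///tmp/summary.parquet
  file_type:
    type: Parquet
    options: {}
```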


Generated using json-schema-for-humans