API v1 - Metadata
    • Dark
      Light

    API v1 - Metadata

    • Dark
      Light

    Article Summary

    Note
    • This endpoint can only be accessed on Matillion ETL version 1.54.7 and onwards.
    • This endpoint is an Enterprise Mode feature.
    • This endpoint is currently only available for Matillion ETL for Snowflake and Matillion ETL for Redshift.
    • Credentials on COPY SQL statements will be redacted before being logged.

    Overview

    The Matillion ETL Metadata API lets you determine where data in Matillion ETL jobs has originated from. SQL queries that update data (INSERT, UPDATE, DELETE) or that alter database structures (for example, when creating a table) are logged in a single database table, along with metadata pertinent to the orchestration or transformation job, as well as task information for the component.

    This endpoint allows you to retrieve the SQL queries executed by a specific project against a specific environment. SQL queries that execute DDL or DML operations as part of a job run will be logged, as well as SELECT queries run on source systems as part of data ingestion.

    This information can be used to determine the lineage of data staged by Matillion ETL and how this is transformed into new target tables in your cloud data warehouse.

    Note

    Entries older than seven days are removed daily from the table.


    Record field explanations

    Below is an example JSON block returned from the server when the endpoint is called. Beneath the JSON block is an explanation of each field.

    {
    "taskType": "Run",
    "loggedTimestamp": 1619017102402,
    "startTimestamp": 1619017086243,
    "endTimestamp": 1619017102383,
    "urlPath": "/#test/test/default/lin_orch",
    "componentName": "Rewrite Table",
    "componentCanvasName": "Rewrite Table 0",
    "source": [
    {
    "url": "matillion.com-1",
    "fqn" : "db_test.schema.text.fct_table_test"
    }
    ],
    "target": [
    {
    "url": "matillion.com-1",
    "fqn" : "db_test.schema.text.fct_table_test"
    }
    ],
    "sourceConnectionUrl": "Not Applicable",
    "sourceUsername": "Not Applicable",
    "environment": {
    "name": "test",
    "url": "matillion",
    "schema": "PUBLIC",
    "database": "PRODUCT",
    "user": "username",
    "connectionOptions": {}
    },
    "job": {
    "name": "test",
    "type": "TRANSFORMATION"
    },
    "sql": [
    {
    "query": "CREATE TABLE \"FROSTY_BORG\".\"PUBLIC\".\"TEST_NEW_2\" AS SELECT \n  * \nFROM (SELECT \n  \"a\" \nFROM \"FROSTY_BORG\".\"PUBLIC\".\"test\") \nORDER BY \"a\" DESC",
    "dialect": "Snowflake"
    }
    ]
    },
    
    Field nameData typeDescription
    taskTypeStringDescribes how the component was run, using the display name in the DatabaseTaskBatchType enumeration. For example, "Run", "Schedule", "API".
    loggedTimestampNumberThe time in milliseconds (ms) denoting when the line was logged.
    startTimestampNumberValue in milliseconds (ms) denoting when the component started a run.
    endTimestampNumberValue in milliseconds (ms) denoting when the component completed a run.
    urlPathStringThe path appended to the host. Used for accessing the orchestration or transformation job containing this component.
    componentNameStringThe name of the underlying Matillion ETL component. For example, "Truncate Table".
    componentCanvasNameStringThe name of the component as defined by the user in the component's Name property.
    SourceArrayIncludes the source URL (url) and fully qualified name (fqn) of the data source, to provide lineage information about Matillion ETL orchestration and transformation jobs.
    TargetArrayIncludes the target URL (url) and fully qualified name (fqn) of the data target, to provide lineage information about Matillion ETL orchestration and transformation jobs.
    sourceConnectionUrlStringThe connection URL configured on certain components. This parameter is restricted to a select few components, and will have a value of "Not Applicable" elsewhere.
    sourceUsernameStringThe username for the data connection configured on certain components. This parameter is restricted to a select few components, and will have a value of "Not Applicable" elsewhere.
    environmentObjectDetails of the Matillion ETL environment used to execute the job, such as which Cloud Data Warehouse (CDW) database and schema are the defaults.
    jobObjectA block of JSON data containing information about the Matillion ETL job, including name and type ("ORCHESTRATION" or "TRANSFORMATION").
    SQLArrayAll of the SQL related to the running of the component, presented in tuples and including the SQL dialect.

    Lineage log API endpoints

    API base URI

    http(s)://<host>/rest/v1/group/name/<projectGroupName>/project/name/<projectName>/environment/name/<environmentName>
    

    API endpoint and function

    The Lineage API endpoint is available on standard RESTful APIs that use HTTP or HTTPS requests to GET data. The Lineage API service is accessed via the Uniform Resource Identifier (URI). All reference below will assume the API Base URI has been specified.

    The available API endpoint is listed below.

    MethodPathURIFunction
    GET/lineage/lineage?jobType=<type>&startTimestamp=<value>&endTimestamp=<value>Get a lineage log of each component in a Matillion ETL job.

    Full endpoint example:

    https://host.com/rest/v1/group/name/ProjectGroupName/project/name/ProjectName/environment/name/EnvironmentName/lineage?jobType=orchestration&startTimestamp=1619017086243&endTimestamp=1619017102383
    

    API parameters and description

    Below you will find the endpoint parameters and a description of each.

    ParameterDescriptionCondition
    jobTypeFilters the type of job to be returned in the response. jobType="orchestration" returns only orchestration jobs; jobType="transformation" returns only transformation jobs.Optional
    startTimestampThe earliest possible logged time for records to be returned. Takes the format of UTC time in milliseconds.Required
    endTimestampThe latest possible logged time for records to be returned. Takes the format of UTC time in milliseconds.Optional

    When run, the endpoint returns a list, in ascending order, of records logged after (and including) the startTimestamp parameter, and before (and including) the endTimestamp parameter. If no endTimestamp parameter is included, all records from the startTimestamp parameter to the current time are included. The list will be filtered by job type (orchestration or transformation) if the optional jobType parameter is included.

    By default, no more than 30,000 records can be returned per call. However, this limit can be modified if required. Please contact support if you need to modify this limit.