Releases: MarquezProject/marquez

Marquez 0.31.0

16 Feb 22:19

Added

  • UI: add facet view enhancements #2336 @tito12
    Creates a dynamic component offering the ability to navigate and search the JSON, expand sections and click on links.
  • UI: highlight selected path on graph and display status of jobs and datasets based on last 14 runs or latest quality facets #2384 @tito12
    Adds highlighting of the visual graph based on upstream and downstream dependencies of selected nodes; makes the displayed status reflect the last 14 runs in the case of jobs and the latest quality facets in the case of datasets.
  • UI: enable auto-accessibility feature on graph nodes #2388 @merobi-hub
    Adds attributes to the FontAwesomeIcons to enable a built-in accessibility feature.

Fixed

  • API: add index to jobs_fqn table using namespace_name and job_fqn columns #2357 @collado-mike
    Optimizes read queries by adding an index to this table.
  • API: add missing indices to column_lineage, dataset_facets, job_facets tables #2419 @pawel-big-lebowski
    Creates missing indices on reference columns in a number of database tables.
  • Spec: make data version and dataset types the same #2400 @phixme
    Makes the fields property the same for datasets and dataset versions, allowing type-generating systems to treat them the same way.
  • UI: show location button only when link to code exists #2409 @tito12
    Makes the button visible only if the link is not empty.
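
The index added in #2357 can be illustrated with a small sketch. It uses SQLite from the Python standard library as a stand-in for the database Marquez actually runs on, and the table is reduced to three columns. The index name and the `uuid` column are assumptions made for illustration; only the `jobs_fqn` table and the `namespace_name` and `job_fqn` columns come from the entry above.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jobs_fqn (
        uuid           TEXT PRIMARY KEY,  -- assumed column
        namespace_name TEXT NOT NULL,
        job_fqn        TEXT NOT NULL
    );
    -- Hypothetical index name; the columns match the release note.
    CREATE INDEX jobs_fqn_namespace_name_job_fqn_idx
        ON jobs_fqn (namespace_name, job_fqn);
    """
)

# Ask the planner how it would run a lookup by namespace and job name.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT uuid FROM jobs_fqn WHERE namespace_name = ? AND job_fqn = ?",
    ("my-namespace", "my-job"),
).fetchall()

# The last column of each plan row holds the human-readable detail.
plan = " ".join(str(row[-1]) for row in plan_rows)
print(plan)
```

The printed plan should name the index, which shows that a read filtering on both columns no longer needs a full table scan.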

Marquez 0.30.0

31 Jan 22:47

Added

  • Proposals: add proposal for OL facet tables #2076 @wslulciuc
    Adds the proposal Optimize query performance for OpenLineage facets.
  • UI: display column lineage of a dataset #2293 @pawel-big-lebowski @tito12
    Adds a JSON preview of column-level lineage of a selected dataset to the UI.
  • UI: add soft delete option to UI #2343 @tito12
    Adds an option to soft delete a data record, with a dialog component and double confirmation.
  • API: split lineage_events table to dataset_facets, run_facets, and job_facets tables #2350 #2355 #2359 @wslulciuc @pawel-big-lebowski
    Improves the performance of storing and querying facets. The migration procedure requires manual steps if the database has more than 100K lineage events. We highly encourage users to review our migration plan.
  • Docker: add new script for stopping Docker #2380 @rossturk
    Provides a clean way to stop a deployment via docker-compose down.
  • Docker: seed data for column lineage #2381 @rossturk
    Adds some ColumnLineageDatasetFacet JSON snippets to docker/metadata.json to seed data for column-level lineage facets.
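
As a rough illustration of the kind of snippet seeded in #2381, the sketch below builds a column lineage facet in the shape defined by the OpenLineage `ColumnLineageDatasetFacet` (a `fields` map whose entries list `inputFields`). The helper `column_lineage_facet` and the namespace, dataset, and column names are hypothetical and are not taken from docker/metadata.json. The `_producer` and `_schemaLocation` fields that OpenLineage facets carry are omitted for brevity.

```python
import json
from typing import Dict, List, Tuple

# (namespace, dataset name, column name) of one upstream column.
InputColumn = Tuple[str, str, str]


def column_lineage_facet(mapping: Dict[str, List[InputColumn]]) -> dict:
    """Build a columnLineage facet from output column -> input columns.

    Hypothetical helper for illustration only.
    """
    fields = {}
    for output_column, inputs in mapping.items():
        fields[output_column] = {
            "inputFields": [
                {"namespace": namespace, "name": dataset, "field": column}
                for (namespace, dataset, column) in inputs
            ]
        }
    return {"columnLineage": {"fields": fields}}


# Made-up example: "total" is derived from two columns of one dataset.
facet = column_lineage_facet(
    {
        "total": [
            ("example_namespace", "public.orders", "price"),
            ("example_namespace", "public.orders", "quantity"),
        ]
    }
)
snippet = json.dumps(facet, indent=2)
print(snippet)
```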

Fixed

  • API: validate RunLink and JobLink #2342 @pawel-big-lebowski
    Fixes validation of the ParentRunFacet to avoid NullPointerExceptions in the case of empty run sections.
  • Docker: use docker-compose.web.yml as base compose file #2360 @wslulciuc
    Fixes the Marquez HTTP server set in docker/up.sh so the script uses docker-compose.web.yml with overrides for dev set via docker-compose.web-dev.yml.
  • Docs: update copyright headers #2353 @merobi-hub
    Updates the headers with the current year.
  • Chart: fix Helm chart #2374 @perttus
    Fixes minor issues with the Helm chart.
  • Spec: update dataset version API spec #2389 @phixme
    Adds limit and offset to the openapi.yml spec file as query parameters.
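
The `limit` and `offset` query parameters added to the spec in #2389 follow the usual pagination pattern. The sketch below shows that pattern on an in-memory list. The `paginate` helper and its default values are assumptions for illustration and are not taken from openapi.yml or from the Marquez server code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], limit: int = 100, offset: int = 0) -> List[T]:
    """Return at most `limit` items, skipping the first `offset` items.

    Hypothetical helper; the defaults are assumptions.
    """
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    return list(items[offset:offset + limit])


# Ten made-up dataset versions, fetched three at a time.
versions = [f"version-{i}" for i in range(10)]
first_page = paginate(versions, limit=3, offset=0)
second_page = paginate(versions, limit=3, offset=3)
last_page = paginate(versions, limit=3, offset=9)
```

A client would typically keep increasing `offset` by `limit` until a page comes back with fewer than `limit` items.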

Marquez 0.29.0

19 Dec 19:22

Added

Fixed

Marquez 0.28.0

21 Nov 21:21

Added

Fixed

Marquez 0.27.0

24 Oct 20:24

Added

Changed

Fixed

Marquez 0.26.0

15 Sep 19:07

Added

  • Update FlywayFactory to support an argument to customize the schema programmatically #2055 @collado-mike
    Note: this change does not aim to support custom schemas from configuration.
  • Add steps on proposing changes to Marquez #2065 @wslulciuc
    Adds steps on how to submit a proposal for review along with a design doc template.
  • Add --metadata option to seed backend with OpenLineage events #2082 @wslulciuc
    Updates the seed command to load metadata from a file containing an array of OpenLineage events via the --metadata option. (Metadata used in the command was not being defined using the OpenLineage standard.)
  • Improve documentation on nodeId in the spec #2084 @howardyoo
    Adds complete examples of nodeId to the spec.
  • Add metadata cmd #2091 @wslulciuc
    Adds cmd metadata to generate OpenLineage events; generated events will be saved to a file called metadata.json that can be used to seed Marquez via the seed cmd. (We lacked a way to performance test the data model of Marquez with significantly large OL events.)
  • Add possibility to soft-delete datasets and jobs #2032 #2099 #2101 @mobuchowski
    Adds the ability to "hide" inactive datasets and jobs through the UI. (This PR does not include the UI part.) The feature works by adding an is_hidden flag to both datasets and jobs tables. Then, it changes jobs_view and adds datasets_view, which hides rows where the is_hidden flag is set to True. This makes writing proper queries easier since there is no need to do this filtering manually. The soft-delete is reversed if the job or dataset is updated again because the new version reverts the flag.
  • Add raw OpenLineage events API #2070 @mobuchowski
    Adds an API that returns raw OpenLineage events sorted by time and optionally filtered by namespace. Filtering by namespace takes into account both job and dataset namespaces.
  • Create column lineage endpoint proposal #2077 @julienledem @pawel-big-lebowski
    Adds a proposal to implement a column-level lineage endpoint in Marquez to leverage the column-level lineage facet in OpenLineage.
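
The soft-delete mechanism described above (#2032, #2099, #2101) can be sketched as follows: an `is_hidden` flag on the table, a view that filters hidden rows, and a write path that resets the flag. This is a minimal sketch using SQLite from the Python standard library with a one-column `datasets` table. The helper functions and the sample dataset names are assumptions; the real tables and views in Marquez have more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE datasets (
        name      TEXT PRIMARY KEY,
        is_hidden INTEGER NOT NULL DEFAULT 0
    );
    -- The view hides soft-deleted rows so readers need no extra filter.
    CREATE VIEW datasets_view AS
        SELECT name FROM datasets WHERE is_hidden = 0;
    INSERT INTO datasets (name) VALUES ('orders'), ('customers');
    """
)


def visible_datasets() -> list:
    """List dataset names as a reader of the view would see them."""
    rows = conn.execute("SELECT name FROM datasets_view ORDER BY name")
    return [name for (name,) in rows]


def soft_delete(name: str) -> None:
    """Hide a dataset without removing its row."""
    conn.execute("UPDATE datasets SET is_hidden = 1 WHERE name = ?", (name,))


def upsert(name: str) -> None:
    """Write a dataset again; this resets the flag and reverses the soft delete."""
    conn.execute("INSERT OR IGNORE INTO datasets (name) VALUES (?)", (name,))
    conn.execute("UPDATE datasets SET is_hidden = 0 WHERE name = ?", (name,))


before = visible_datasets()
soft_delete("orders")
hidden = visible_datasets()
upsert("orders")
restored = visible_datasets()
```

Because readers query the view rather than the table, no query has to repeat the `is_hidden` filter by hand.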

Changed

  • Update lineage query to only look at jobs with inputs or outputs #2068 @collado-mike
    Changes the lineage query to query the job_versions_io_mapping table and INNER join with the jobs_view so that only jobs that have inputs or outputs are present in the jobs_io CTE. Hence, the table becomes very small and the recursive join in the lineage CTE very fast. (In many environments, a large number of jobs reporting events have no inputs or outputs - e.g., PythonOperators in an Airflow deployment. If a Marquez installation has many of these, the lineage query spends much of its time searching for overlaps with jobs that have no inputs or outputs.)
  • Persist OpenLineage event before updating Marquez model #2069 @fm100
    Switches the order of the code in order to persist the OpenLineage event first and then update the Marquez model. (When the RunTransitionListener was invoked, the OpenLineage event was not persisted to the database. Because the OpenLineage event is the source of truth for all Marquez run transitions, it should be available from RunTransitionListener.)
  • Drop requirement to provide marquez.yml for seed cmd #2094 @wslulciuc
    Uses io.dropwizard.cli.Command instead of io.dropwizard.cli.ConfiguredCommand to no longer require passing marquez.yml as an argument to the seed cmd. (The marquez.yml argument is not used in the seed cmd.)
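
The effect of the lineage query change in #2068 can be sketched with a much simplified schema. The example below uses SQLite from the Python standard library; the column names (`uuid`, `name`, `job_uuid`, `dataset_uuid`, `io_type`), the sample rows, and the downstream-only traversal are assumptions made for brevity and do not reproduce the actual Marquez query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jobs_view (uuid TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE job_versions_io_mapping (
        job_uuid     TEXT NOT NULL,
        dataset_uuid TEXT NOT NULL,
        io_type      TEXT NOT NULL  -- 'INPUT' or 'OUTPUT'
    );
    -- Jobs a -> b -> c are linked by datasets; p1 and p2 have no I/O.
    INSERT INTO jobs_view VALUES
        ('1', 'a'), ('2', 'b'), ('3', 'c'), ('4', 'p1'), ('5', 'p2');
    INSERT INTO job_versions_io_mapping VALUES
        ('1', 'd1', 'OUTPUT'),
        ('2', 'd1', 'INPUT'), ('2', 'd2', 'OUTPUT'),
        ('3', 'd2', 'INPUT');
    """
)

# Starting from the mapping table and INNER joining the jobs keeps only
# jobs that have at least one input or output.
jobs_with_io = conn.execute(
    "SELECT COUNT(DISTINCT j.uuid) FROM job_versions_io_mapping io "
    "INNER JOIN jobs_view j ON j.uuid = io.job_uuid"
).fetchone()[0]

LINEAGE_SQL = """
WITH RECURSIVE
jobs_io AS (
    SELECT j.name AS job_name, io.dataset_uuid, io.io_type
    FROM job_versions_io_mapping io
    INNER JOIN jobs_view j ON j.uuid = io.job_uuid
),
lineage(job_name) AS (
    SELECT ?
    UNION
    SELECT downstream.job_name
    FROM lineage l
    JOIN jobs_io upstream
      ON upstream.job_name = l.job_name AND upstream.io_type = 'OUTPUT'
    JOIN jobs_io downstream
      ON downstream.dataset_uuid = upstream.dataset_uuid
     AND downstream.io_type = 'INPUT'
)
SELECT job_name FROM lineage ORDER BY job_name
"""

downstream_of_a = [name for (name,) in conn.execute(LINEAGE_SQL, ("a",))]
```

In this toy data set only three of the five jobs reach the `jobs_io` CTE, so the recursive step never has to consider the two jobs without inputs or outputs.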

Fixed

  • Fix/rewrite jobs fqn locks #2067 @collado-mike
    Updates the function to only update the table if the job is a new record or if the symlink_target_uuid is distinct from the previous value. (The rewrite_jobs_fqn_table function was inadvertently updating jobs even when no metadata about the job had changed. Under load, this caused significant locking issues, as the jobs_fqn table must be locked for every job update.)
  • Fix enum string types in the OpenAPI spec #2086 @studiosciences
    Changes the type to string. (type: enum was not valid in OpenAPI spec.)
  • Fix incorrect PostgreSQL version #2089 @jabbera
    Corrects the tag for PostgreSQL.
  • Update OpenLineageDao to handle Airflow run UUID conflicts #2097 @collado-mike
    Alleviates the problem for Airflow installations that will continue to publish events with the older OpenLineage library. This checks the namespace of the parent run and verifies that it matches the namespace in the ParentRunFacet. If not, it generates a new parent run ID that will be written with the correct namespace. (The Airflow integration was generating conflicting UUIDs based on the DAG name and the DagRun ID without accounting for different namespaces. In Marquez installations that have multiple Airflow deployments with duplicated DAG names, we generated jobs whose parents have the wrong namespace.)
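
The namespace check described for #2097 can be sketched as below. This is only an illustration of the idea: the in-memory `stored_run_namespaces` map, the `resolve_parent_run_id` function, and the use of `uuid.uuid5` to derive the replacement ID are all assumptions, and the release note does not say how OpenLineageDao actually generates the new parent run ID.

```python
import uuid
from typing import Dict

# Hypothetical stand-in for runs already stored: run ID -> namespace.
stored_run_namespaces: Dict[str, str] = {}


def resolve_parent_run_id(facet_run_id: str, facet_namespace: str) -> str:
    """Return the parent run ID to write for a ParentRunFacet.

    If a run with this ID is already stored under a different namespace,
    derive a new ID so the parent is written with the facet's namespace.
    """
    existing = stored_run_namespaces.get(facet_run_id)
    if existing is None or existing == facet_namespace:
        stored_run_namespaces.setdefault(facet_run_id, facet_namespace)
        return facet_run_id
    # Conflict: same run ID, different namespace. Deriving the new ID with
    # uuid5 is an assumption chosen so repeated events map to the same ID.
    new_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{facet_namespace}/{facet_run_id}"))
    stored_run_namespaces.setdefault(new_id, facet_namespace)
    return new_id


# Two Airflow deployments with the same DAG name produce the same run ID.
shared_id = "3f1c8a52-6f0e-4c6b-9a57-2d1f0e8b7c11"  # made-up example value
first = resolve_parent_run_id(shared_id, "airflow-a")
second = resolve_parent_run_id(shared_id, "airflow-b")
second_again = resolve_parent_run_id(shared_id, "airflow-b")
```

With this approach the first deployment keeps the original ID, while the second deployment consistently gets a distinct parent run in its own namespace.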

Marquez 0.25.0

08 Aug 20:29

Fixed

Marquez 0.24.0

02 Aug 18:45

Added

  • Add copyright lines to all source files #1996 @merobi-hub
  • Add copyright and license guidelines in CONTRIBUTING.md @wslulciuc
  • Add @FlywayTarget annotation to migration tests to control flyway upgrades #2035 @collado-mike

Changed

Fixed

Marquez 0.23.0

16 Jun 20:31

Added

Changed

  • Set default limit for listing datasets and jobs in UI from 2000 to 25 #2018 @wslulciuc

Fixed

Marquez 0.22.0

16 May 23:22

Added

  • Add support for LifecycleStateChangeFacet with an ability to softly delete datasets #1847 @pawel-big-lebowski
  • Enable pod specific annotations in Marquez Helm Chart via marquez.podAnnotations #1945 @wslulciuc
  • Add support for job renaming/redirection via symlink #1947 @collado-mike
  • Add Created by view for dataset versions along with SQL syntax highlighting in web UI #1929 @phixMe
  • Add operationId to openapi spec #1978 @phixMe

Changed

Fixed