Add unified model docs
agnessnowplow committed Nov 16, 2023
1 parent 4d19bcc commit b33d647
Showing 42 changed files with 372 additions and 319 deletions.
@@ -29,18 +29,18 @@ import {versions} from '@site/src/componentVersions';
| snowplow-web version | dbt versions | BigQuery | Databricks | Redshift | Snowflake | Postgres |
| -------------------------- | ------------------- | :------: | :--------: | :------: | :-------: | :------: |
| ${versions.dbtSnowplowWeb} | >=1.5.0 to <2.0.0 ||||||
| 0.15.2 | >=1.4.0 to <2.0.0 ||||||
| 0.13.3* | >=1.3.0 to <2.0.0 ||||||
| 0.15.2 | >=1.4.0 to <2.0.0 |||||* |
| 0.13.3** | >=1.3.0 to <2.0.0 ||||||
| 0.11.0 | >=1.0.0 to <1.3.0 ||||||
| 0.5.1 | >=0.20.0 to <1.0.0 ||||||
| 0.4.1 | >=0.18.0 to <0.20.0 ||||||
`} remarkPlugins={[remarkGfm]} />

<span style={{'font-size':'80%'}}>

^ Since version 0.15.0 of `snowplow_web` at least version 15.0 of Postgres is required, otherwise you will need to [overwrite](/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-operation/macros-and-keys/index.md#overriding-macros) the `default_channel_group` macro to not use the `regexp_like` function.
\* Since version 0.15.0 of `snowplow_web` at least version 15.0 of Postgres is required, otherwise you will need to [overwrite](/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-operation/macros-and-keys/index.md#overriding-macros) the `default_channel_group` macro to not use the `regexp_like` function.

\* From version v0.13.0 onwards we use the `load_tstamp` field so you must be using [RDB Loader](/docs/pipeline-components-and-applications/loaders-storage-targets/snowplow-rdb-loader/index.md) v4.0.0 and above, or [BigQuery Loader](/docs/pipeline-components-and-applications/loaders-storage-targets/snowplow-rdb-loader/index.md) v1.0.0 and above. If you do not have this field because you are not using these versions, or you are using the Postgres loader, you will need to set `snowplow__enable_load_tstamp` to `false` in your `dbt_project.yml` and will not be able to use the consent models.
** From version v0.13.0 onwards we use the `load_tstamp` field so you must be using [RDB Loader](/docs/pipeline-components-and-applications/loaders-storage-targets/snowplow-rdb-loader/index.md) v4.0.0 and above, or [BigQuery Loader](/docs/pipeline-components-and-applications/loaders-storage-targets/snowplow-rdb-loader/index.md) v1.0.0 and above. If you do not have this field because you are not using these versions, or you are using the Postgres loader, you will need to set `snowplow__enable_load_tstamp` to `false` in your `dbt_project.yml` and will not be able to use the consent models.
</span>

</TabItem>
@@ -92,7 +92,7 @@ In all states the `upper_limit` is limited by the `snowplow__backfill_limit_days

If there are no enabled models already in the manifest then we process from the start date up to the backfill limit or now, whichever is earlier:

**`lower_limit`**: `snowplow__start_date`
**`lower_limit`**: `snowplow__start_date`
**`upper_limit`**: `least(current_tstamp, snowplow__start_date + snowplow__backfill_limit_days)`
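As a quick sketch of this first-run window (Python for illustration only, not the package's actual SQL/Jinja; the example dates are hypothetical stand-ins for the dbt variables of the same names):

```python
from datetime import datetime, timedelta

# Hypothetical values standing in for the dbt variables of the same names
snowplow__start_date = datetime(2023, 1, 1)
snowplow__backfill_limit_days = 30
current_tstamp = datetime(2023, 3, 1)

# First run: no enabled models in the manifest yet, so we start from the
# configured start date and cap the window at the backfill limit (or now).
lower_limit = snowplow__start_date
upper_limit = min(current_tstamp,
                  snowplow__start_date + timedelta(days=snowplow__backfill_limit_days))
```

With these values the upper limit lands 30 days after the start date, since that is earlier than the current timestamp.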

```mermaid
@@ -118,7 +118,7 @@ gantt

If there are enabled models that aren't in the manifest table then a new model tagged with `snowplow_<package>_incremental` has been added since the last run; this can happen with a new custom model, or you have enabled some previously disabled custom modules. In this case the package will replay all previously processed events in order to back-fill the new model.

**`lower_limit`**: `snowplow__start_date`
**`lower_limit`**: `snowplow__start_date`
**`upper_limit`**: `least(max_last_success, snowplow__start_date + snowplow__backfill_limit_days)`

```mermaid
@@ -145,7 +145,7 @@ gantt

If the `min_last_success` is less than the `max_last_success` it means the tagged models are out of sync, for example due to a particular model failing to execute successfully during the previous run or as part of catching up on a new model. The package will attempt to sync all models as far as your backfill limit will allow.

**`lower_limit`**: `min_last_success - snowplow__lookback_window_hours`
**`lower_limit`**: `min_last_success - snowplow__lookback_window_hours`
**`upper_limit`**: `least(max_last_success, min_last_success + snowplow__backfill_limit_days)`

```mermaid
@@ -173,7 +173,7 @@ gantt

If none of the above criteria are met, then we consider it a 'standard run' where all models are in sync and we carry on from the last processed event.

**`lower_limit`**: `max_last_success - snowplow__lookback_window_hours`
**`lower_limit`**: `max_last_success - snowplow__lookback_window_hours`
**`upper_limit`**: `least(current_tstamp, max_last_success + snowplow__backfill_limit_days)`
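Putting the four states together, the window selection can be sketched as follows (a Python simplification of the package's actual Jinja/SQL macros; parameter names mirror the dbt variables, and the state checks are simplified):

```python
from datetime import datetime, timedelta

def run_limits(start_date, backfill_days, lookback_hours, now,
               models_in_manifest, enabled_models,
               min_last_success=None, max_last_success=None):
    """Return (lower_limit, upper_limit) for the four run states."""
    backfill = timedelta(days=backfill_days)
    if not models_in_manifest:                          # 1. first run
        return start_date, min(now, start_date + backfill)
    if set(enabled_models) - set(models_in_manifest):   # 2. new model added
        return start_date, min(max_last_success, start_date + backfill)
    if min_last_success < max_last_success:             # 3. models out of sync
        return (min_last_success - timedelta(hours=lookback_hours),
                min(max_last_success, min_last_success + backfill))
    # 4. standard run
    return (max_last_success - timedelta(hours=lookback_hours),
            min(now, max_last_success + backfill))

# Standard run: all models in sync, carry on from the last processed event
lo, hi = run_limits(datetime(2023, 1, 1), 30, 6, datetime(2023, 6, 1),
                    ["base"], ["base"],
                    min_last_success=datetime(2023, 5, 1),
                    max_last_success=datetime(2023, 5, 1))
```

In every state the upper limit is capped by `snowplow__backfill_limit_days`, and the lookback window re-processes a few hours of late-arriving data on incremental runs.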


@@ -6,6 +6,7 @@ sidebar_position: 30
```mdx-code-block
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import ThemedImage from '@theme/ThemedImage';
```

:::tip
@@ -17,7 +18,7 @@ On this page, `<package>` can be one of: `web`, `mobile`, `unified`

Stitching users together is not an easy task. Depending on the typical user journey, the complexity ranges from individually identified (logged-in) users, who need no extra modelling, to never-identified users sharing a common public device (e.g. in a school or library), where user stitching is technically impossible. Stitching is also an iterative process: it needs to be updated after each incremental run, ideally over a large range of data, so compute cost, extra expenses, and time constraints may limit and dictate the best course of action.

**Session stitching**
#### Session stitching

For the out-of-the-box user stitching we opted for a sweet-spot method: logic that the majority of our users will benefit from, while avoiding compute-heavy calculations and still reaping most of the benefits.

@@ -27,6 +28,49 @@ The `domain_userid`/`device_user_id` is cookie/device based and therefore expire

This mapping is applied to the sessions table by a post-hook which updates the `stitched_user_id` column with the latest mapping. If no mapping is present, the default value for `stitched_user_id` is the `domain_userid`/`device_user_id`. This process is known as session stitching, and effectively allows you to attribute logged-in and non-logged-in sessions back to a single user.
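A toy Python illustration of this mapping logic (the real implementation is SQL run as a dbt post-hook; the event values here are made up): the latest non-null `user_id` per `domain_userid` becomes the mapping, and unmapped sessions fall back to their `domain_userid`.

```python
# Hypothetical events: du1 starts anonymous, then logs in; du2 never logs in
events = [
    {"domain_userid": "du1", "user_id": None,    "tstamp": 1},
    {"domain_userid": "du1", "user_id": "alice", "tstamp": 2},  # logs in
    {"domain_userid": "du2", "user_id": None,    "tstamp": 3},
]

# Build the user mapping: latest non-null user_id per domain_userid
mapping = {}
for e in sorted(events, key=lambda e: e["tstamp"]):
    if e["user_id"] is not None:
        mapping[e["domain_userid"]] = e["user_id"]

# Apply the mapping to the sessions table; the default is the domain_userid
sessions = [{"domain_userid": "du1"}, {"domain_userid": "du2"}]
for s in sessions:
    s["stitched_user_id"] = mapping.get(s["domain_userid"], s["domain_userid"])
```

After the update, du1's anonymous session is attributed to `alice`, while du2's session keeps its cookie-based identifier.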



<Tabs groupId="dbt-packages" queryString>

<TabItem value="unified" label="Snowplow Unified" default>
<p align="center">
<ThemedImage
alt='Session stitching in the unified package'
sources={{
light: require('./images/session_stitching_light_unified.drawio.png').default,
dark: require('./images/session_stitching_dark_unified.drawio.png').default
}}
/>
</p>
</TabItem>

<TabItem value="web" label="Snowplow Web">
<p align="center">
<ThemedImage
alt='Session stitching in the web package'
sources={{
light: require('./images/session_stitching_light_web.drawio.png').default,
dark: require('./images/session_stitching_dark_web.drawio.png').default
}}
/>
</p>
</TabItem>

<TabItem value="mobile" label="Snowplow Mobile">
<p align="center">
<ThemedImage
alt='Session stitching in the mobile package'
sources={{
light: require('./images/session_stitching_light_mobile.drawio.png').default,
dark: require('./images/session_stitching_dark_mobile.drawio.png').default
}}
/>
</p>
</TabItem>

</Tabs>


If required, this update operation can be disabled with the following setting in your `dbt_project.yml` file (selecting one of web/mobile, or both, as appropriate):

```yml title="dbt_project.yml"
@@ -37,10 +81,30 @@ vars:
In the unified package, and in the web package since version 0.16.0, it is also possible to stitch onto the page views table by setting `snowplow__page_view_stitching` to `true`. To keep costs down, it may be enough to apply this less frequently than session stitching, by enabling it at runtime (on the command line) for only some of the runs.

**Cross platform stitching**
#### Cross platform stitching

Since the arrival of the `snowplow_unified` package, all user data is modelled in one place. This makes it easy to perform cross-platform stitching: as soon as a user identifies themselves by logging in as the same user on separate platforms, all of their data can be found within one package, making it really convenient to perform further analysis.

**Custom solutions**
#### Custom solutions

User mapping is typically not a 'one size fits all' exercise. Depending on your tracking implementation, business needs and desired level of sophistication, you may want to write bespoke logic. Please refer to this [blog post](https://snowplow.io/blog/developing-a-single-customer-view-with-snowplow/) for ideas. In addition, the web and unified packages offer the possibility to change which field is used as your stitched user id, so instead of `user_id` you can use any field you wish (note that it will still be called `user_id` in your mapping table), and by taking advantage of the [custom sessionization and users](/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-models/dbt-web-data-model/custom-sessionization-and-users/index.md) you can also change the field used as the `domain_user_id` (web model) or `user_identifier` (unified model). We plan to add support for these features to the mobile package in the future.

#### Overview

<p align="center">
<ThemedImage
alt='Overview of stitching scenarios'
sources={{
light: require('./images/stitching_scenarios.drawio.png').default,
dark: require('./images/stitching_scenarios.drawio.png').default
}}
/>
</p>

(1) It is most convenient to use the unified package, so that all of these events are modelled into the same derived tables regardless of platform.

(2) If it is the same mobile/web device and the user identifies by logging in at a later stage, while still retaining the same `domain_userid`/`device_user_id`, the model will update the `stitched_user_id` during session stitching.

(3) If it is the same mobile/web device and the user identifies by logging in, while still retaining the same `domain_userid`/`device_user_id`, the model will update the `stitched_user_id` during session stitching.

(4) If it is the same mobile device, cross-navigation tracking and stitching can be applied (coming soon!).
@@ -10,7 +10,7 @@ import TabItem from '@theme/TabItem';

## Package Configuration Variables

This package utilizes a set of variables that are configured to recommended values for optimal performance of the models. Depending on your use case, you might want to override these values by adding to your `dbt_project.yml` file.
This package utilizes a set of variables that are configured to recommended values for optimal performance of the models. Depending on your use case, you might want to override these values by adding to your `dbt_project.yml` file. We have provided a [tool](#config-generator) below to help you with that.

:::caution

@@ -204,7 +204,11 @@ export const printYamlVariables = (data) => {
export const Template = ObjectFieldTemplateGroupsGenerator(GROUPS);
```

## Config Generator
You can use the below inputs to generate the code that you need to place into your `dbt_project.yml` file to configure the package as you require. Any values not specified will use their default values from the package.
## Config Generator
```mdx-code-block
import ConfigGenerator from "@site/docs/reusable/data-modeling/config-generator/_index.md"
<ConfigGenerator/>
```

<JsonApp schema={dbtSnowplowEcommerceConfigSchema} output={printYamlVariables} template={Template}/>
@@ -11,7 +11,7 @@ import TabItem from '@theme/TabItem';

## Package Configuration Variables

This package utilizes a set of variables that are configured to recommended values for optimal performance of the models. Depending on your use case, you might want to override these values by adding to your `dbt_project.yml` file.
This package utilizes a set of variables that are configured to recommended values for optimal performance of the models. Depending on your use case, you might want to override these values by adding to your `dbt_project.yml` file. We have provided a [tool](#config-generator) below to help you with that.

:::caution

@@ -124,6 +124,12 @@ export const Template = ObjectFieldTemplateGroupsGenerator(GROUPS);
```

## Config Generator
You can use the below inputs to generate the code that you need to place into your `dbt_project.yml` file to configure the package as you require. Any values not specified will use their default values from the package.

```mdx-code-block
import ConfigGenerator from "@site/docs/reusable/data-modeling/config-generator/_index.md"
<ConfigGenerator/>
```


<JsonApp schema={dbtSnowplowFractributionConfigSchema} output={printYamlVariables} template={Template}/>
@@ -10,7 +10,7 @@ import TabItem from '@theme/TabItem';

## Package Configuration Variables

This package utilizes a set of variables that are configured to recommended values for optimal performance of the models. Depending on your use case, you might want to override these values by adding to your `dbt_project.yml` file.
This package utilizes a set of variables that are configured to recommended values for optimal performance of the models. Depending on your use case, you might want to override these values by adding to your `dbt_project.yml` file. We have provided a [tool](#config-generator) below to help you with that.

:::caution

@@ -170,6 +170,11 @@ export const Template = ObjectFieldTemplateGroupsGenerator(GROUPS);
```

## Config Generator
You can use the below inputs to generate the code that you need to place into your `dbt_project.yml` file to configure the package as you require. Any values not specified will use their default values from the package.
```mdx-code-block
import ConfigGenerator from "@site/docs/reusable/data-modeling/config-generator/_index.md"
<ConfigGenerator/>
```


<JsonApp schema={dbtSnowplowMediaPlayerConfigSchema} output={printYamlVariables} template={Template}/>
@@ -10,7 +10,7 @@ import TabItem from '@theme/TabItem';

## Package Configuration Variables

This package utilizes a set of variables that are configured to recommended values for optimal performance of the models. Depending on your use case, you might want to override these values by adding to your `dbt_project.yml` file.
This package utilizes a set of variables that are configured to recommended values for optimal performance of the models. Depending on your use case, you might want to override these values by adding to your `dbt_project.yml` file. We have provided a [tool](#config-generator) below to help you with that.

:::caution

@@ -194,6 +194,10 @@ export const Template = ObjectFieldTemplateGroupsGenerator(GROUPS);
```

## Config Generator
You can use the below inputs to generate the code that you need to place into your `dbt_project.yml` file to configure the package as you require. Any values not specified will use their default values from the package.
```mdx-code-block
import ConfigGenerator from "@site/docs/reusable/data-modeling/config-generator/_index.md"
<ConfigGenerator/>
```

<JsonApp schema={dbtSnowplowMobileConfigSchema} output={printYamlVariables} template={Template}/>
@@ -10,7 +10,7 @@ import TabItem from '@theme/TabItem';

## Package Configuration Variables

This package utilizes a set of variables that are configured to recommended values for optimal performance of the models. Depending on your use case, you might want to override these values by adding to your `dbt_project.yml` file.
This package utilizes a set of variables that are configured to recommended values for optimal performance of the models. Depending on your use case, you might want to override these values by adding to your `dbt_project.yml` file. We have provided a [tool](#config-generator) below to help you with that.

:::caution

@@ -134,6 +134,11 @@ export const Template = ObjectFieldTemplateGroupsGenerator(GROUPS);
```

## Config Generator
You can use the below inputs to generate the code that you need to place into your `dbt_project.yml` file to configure the package as you require. Any values not specified will use their default values from the package.
```mdx-code-block
import ConfigGenerator from "@site/docs/reusable/data-modeling/config-generator/_index.md"

<ConfigGenerator/>
```
<JsonApp schema={dbtSnowplowWebConfigSchema} output={printYamlVariables} template={Template}/>