Forward-merge branch-25.02 into branch-25.06 (#2132)
Forward-merge triggered by push to branch-25.02 that creates a PR to
keep branch-25.06 up-to-date. If this PR is unable to be immediately
merged due to conflicts, it will remain open for the team to manually
merge. See [forward-merger
docs](https://docs.rapids.ai/maintainers/forward-merger/) for more info.
dagardner-nv authored Jan 28, 2025
2 parents def2b63 + 8cc9ab7 commit ef393ae
Showing 12 changed files with 66 additions and 50 deletions.
3 changes: 1 addition & 2 deletions docker/build_container.sh
@@ -24,7 +24,6 @@ popd &> /dev/null
DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME:?"Must set \$DOCKER_IMAGE_NAME to build. Use the dev/release scripts to set these automatically"}
DOCKER_IMAGE_TAG=${DOCKER_IMAGE_TAG:?"Must set \$DOCKER_IMAGE_TAG to build. Use the dev/release scripts to set these automatically"}
DOCKER_TARGET=${DOCKER_TARGET:-"runtime"}
DOCKER_TARGET_ARCH=${DOCKER_TARGET_ARCH:-$(dpkg --print-architecture)}

if [ "${DOCKER_TARGET_ARCH}" == "amd64" ]; then
REAL_ARCH="x86_64"
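The `amd64` → `x86_64` branch above is one arm of a small name-translation table between Debian/dpkg architecture names and the uname-style names the build expects. A runnable sketch of that mapping (hypothetical helper name; `arm64` → `aarch64` is assumed as the other arm, matching the AArch64 issues noted elsewhere in this commit):

```python
def to_real_arch(dpkg_arch: str) -> str:
    """Map a dpkg architecture name (as printed by `dpkg --print-architecture`)
    to the uname-style name. Hypothetical helper mirroring the script's if/else."""
    mapping = {"amd64": "x86_64", "arm64": "aarch64"}
    try:
        return mapping[dpkg_arch]
    except KeyError:
        raise ValueError(f"Unsupported DOCKER_TARGET_ARCH: {dpkg_arch}")
```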
@@ -58,7 +57,7 @@ PYTHON_VER=${PYTHON_VER:-3.10}
MORPHEUS_ROOT_HOST=${MORPHEUS_ROOT_HOST:-"$(realpath --relative-to=${PWD} ${MORPHEUS_ROOT})"}

# Build the docker arguments
DOCKER_ARGS="-t ${DOCKER_IMAGE_NAME}:${DOCKER_IMAGE_TAG}-${DOCKER_TARGET_ARCH}"
DOCKER_ARGS="-t ${DOCKER_IMAGE_NAME}:${DOCKER_IMAGE_TAG}"
DOCKER_ARGS="${DOCKER_ARGS} --target ${DOCKER_TARGET}"
DOCKER_ARGS="${DOCKER_ARGS} --build-arg CUDA_MAJOR_VER=${CUDA_MAJOR_VER}"
DOCKER_ARGS="${DOCKER_ARGS} --build-arg CUDA_MINOR_VER=${CUDA_MINOR_VER}"
3 changes: 2 additions & 1 deletion docker/build_container_dev.sh
@@ -18,7 +18,8 @@
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"

export DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME:-"morpheus"}
export DOCKER_IMAGE_TAG=${DOCKER_IMAGE_TAG:-"dev-$(date +'%y%m%d')"}
export DOCKER_TARGET_ARCH=${DOCKER_TARGET_ARCH:-$(dpkg --print-architecture)}
export DOCKER_IMAGE_TAG=${DOCKER_IMAGE_TAG:-"dev-$(date +'%y%m%d')-${DOCKER_TARGET_ARCH}"}
export DOCKER_TARGET=${DOCKER_TARGET:-"development"}

# Call the general build script
3 changes: 2 additions & 1 deletion docker/build_container_release.sh
@@ -21,7 +21,8 @@ SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
pushd ${SCRIPT_DIR} &> /dev/null

export DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME:-"nvcr.io/nvidia/morpheus/morpheus"}
export DOCKER_IMAGE_TAG=${DOCKER_IMAGE_TAG:-"$(git describe --tags --abbrev=0)-runtime"}
export DOCKER_TARGET_ARCH=${DOCKER_TARGET_ARCH:-$(dpkg --print-architecture)}
export DOCKER_IMAGE_TAG=${DOCKER_IMAGE_TAG:-"$(git describe --tags --abbrev=0)-runtime-${DOCKER_TARGET_ARCH}"}
export DOCKER_TARGET=${DOCKER_TARGET:-"runtime"}

popd &> /dev/null
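The effect of moving `DOCKER_TARGET_ARCH` into the dev and release scripts is that the architecture suffix becomes part of the default image tag. A sketch of the two resulting tag schemes (hypothetical helper names; formats taken from the defaults above):

```python
from datetime import date

def dev_tag(target_arch: str, today: date) -> str:
    # Default dev tag: "dev-YYMMDD-<arch>", as in build_container_dev.sh
    return f"dev-{today.strftime('%y%m%d')}-{target_arch}"

def release_tag(target_arch: str, git_tag: str) -> str:
    # Default release tag: "<git tag>-runtime-<arch>", as in build_container_release.sh
    return f"{git_tag}-runtime-{target_arch}"
```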
5 changes: 5 additions & 0 deletions docs/source/extra_info/known_issues.md
@@ -19,5 +19,10 @@ limitations under the License.

- `vdb_upload` example pipeline triggers an internal error in Triton ([#1649](https://github.com/nv-morpheus/Morpheus/issues/1649))
- `ransomware_detection` example pipeline occasionally logs a `distributed.comm.core.CommClosedError` error during shutdown ([#2026](https://github.com/nv-morpheus/Morpheus/issues/2026))
- `abp_pcap_detection` pipeline running slowly on AArch64 ([#2120](https://github.com/nv-morpheus/Morpheus/issues/2120))
- LLM `vdb_upload` and `rag` pipelines not supported on AArch64 ([#2122](https://github.com/nv-morpheus/Morpheus/issues/2122))
- `gnn_fraud_detection_pipeline` not working on AArch64 ([#2123](https://github.com/nv-morpheus/Morpheus/issues/2123))
- `ransomware_detection` pipeline running slowly on AArch64 ([#2124](https://github.com/nv-morpheus/Morpheus/issues/2124))
- DFP visualization fails to install on AArch64 ([#2125](https://github.com/nv-morpheus/Morpheus/issues/2125))

Refer to [open issues in the Morpheus project](https://github.com/nv-morpheus/Morpheus/issues)
23 changes: 13 additions & 10 deletions examples/abp_pcap_detection/abp_pcap_preprocessing.py
@@ -81,19 +81,22 @@ def supports_cpp_node(self):
def pre_process_batch(msg: ControlMessage, fea_len: int, fea_cols: typing.List[str],
req_cols: typing.List[str]) -> ControlMessage:
meta = msg.payload()
assert meta is not None, "Payload is None"
orig_df = meta.get_data()

# Converts the int flags field into a binary string
flags_bin_series = meta.get_data("flags").to_pandas().apply(lambda x: format(int(x), "05b"))
flags = orig_df["flags"].astype("int8")
flags_bin_series = (flags // 16 % 2).astype('O') + (flags // 8 % 2).astype('O') + (
flags // 4 % 2).astype('O') + (flags // 2 % 2).astype('O') + (flags % 2).astype('O')

# Expand binary string into an array
df = cudf.DataFrame(np.vstack(flags_bin_series.str.findall("[0-1]")).astype("int8"),
index=meta.get_data().index)
flag_array = flags_bin_series.str.findall("[0-1]").list.astype("int8")

# adding [ack, psh, rst, syn, fin] details from the binary flag
# Expand binary string into an array adding [ack, psh, rst, syn, fin] details from the binary flag
rename_cols_dct = {0: "ack", 1: "psh", 2: "rst", 3: "syn", 4: "fin"}
df = df.rename(columns=rename_cols_dct)
df = cudf.DataFrame({rename_cols_dct[i]: flag_array.list.get(i) for i in range(5)}, index=orig_df.index)
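The arithmetic above replaces a pandas round-trip (`to_pandas().apply(format(...))`) with integer division and modulo that can stay on the GPU. The bit extraction itself can be sketched in plain Python (bit weights assumed from the column order in the diff: ack=16, psh=8, rst=4, syn=2, fin=1):

```python
def expand_tcp_flags(flags: int) -> dict:
    """Decompose an integer flags field into its [ack, psh, rst, syn, fin]
    bits using // and %, mirroring the vectorized cuDF expression in the diff."""
    return {
        "ack": flags // 16 % 2,
        "psh": flags // 8 % 2,
        "rst": flags // 4 % 2,
        "syn": flags // 2 % 2,
        "fin": flags % 2,
    }
```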

df["flags_bin"] = flags_bin_series
df["timestamp"] = meta.get_data("timestamp").astype("int64")
df["timestamp"] = orig_df["timestamp"].astype("int64")

def round_time_kernel(timestamp, rollup_time, secs):
for i, time in enumerate(timestamp):
@@ -112,8 +115,8 @@ def round_time_kernel(timestamp, rollup_time, secs):
df["rollup_time"] = cudf.to_datetime(df["rollup_time"], unit="us").dt.strftime("%Y-%m-%d %H:%M")
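The rollup step buckets epoch timestamps into fixed-width windows, then formats them to the minute. A standalone sketch of that bucketing (assumes microsecond timestamps and UTC; the real `round_time_kernel` runs per-row on the GPU):

```python
from datetime import datetime, timezone

def rollup_time(ts_us: int, secs: int) -> str:
    # Floor the epoch-microseconds timestamp to a secs-wide window, then
    # format like the pipeline's rollup_time column ("%Y-%m-%d %H:%M").
    floored = ts_us - (ts_us % (secs * 1_000_000))
    return datetime.fromtimestamp(floored / 1_000_000, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")
```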

# creating flow_id "src_ip:src_port=dst_ip:dst_port"
df["flow_id"] = (meta.get_data("src_ip") + ":" + meta.get_data("src_port").astype("str") + "=" +
meta.get_data("dest_ip") + ":" + meta.get_data("dest_port").astype("str"))
df["flow_id"] = (orig_df["src_ip"] + ":" + orig_df["src_port"].astype("str") + "=" + orig_df["dest_ip"] + ":" +
orig_df["dest_port"].astype("str"))
agg_dict = {
"ack": "sum",
"psh": "sum",
@@ -124,7 +127,7 @@ def round_time_kernel(timestamp, rollup_time, secs):
"flow_id": "count",
}

df["data_len"] = meta.get_data("data_len").astype("int16")
df["data_len"] = orig_df["data_len"].astype("int16")

# group by operation
grouped_df = df.groupby(["rollup_time", "flow_id"]).agg(agg_dict)
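The groupby/agg collapses per-packet rows into per-`(rollup_time, flow_id)` features: flag bits are summed and `"flow_id": "count"` yields the packet count. A pure-Python sketch of that aggregation for a subset of the columns (simplified; the pipeline does this vectorized in cuDF):

```python
from collections import defaultdict

def aggregate_flows(rows):
    """rows: iterable of dicts with rollup_time, flow_id, ack, syn, data_len.
    Returns {(rollup_time, flow_id): feature dict}, mirroring part of agg_dict."""
    out = defaultdict(lambda: {"ack": 0, "syn": 0, "data_len": 0, "count": 0})
    for r in rows:
        agg = out[(r["rollup_time"], r["flow_id"])]
        agg["ack"] += r["ack"]            # "ack": "sum"
        agg["syn"] += r["syn"]            # "syn": "sum"
        agg["data_len"] += r["data_len"]  # "data_len": "sum"
        agg["count"] += 1                 # "flow_id": "count"
    return dict(out)
```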
16 changes: 1 addition & 15 deletions examples/digital_fingerprinting/production/grafana/README.md
@@ -63,20 +63,6 @@ configure_logging(loki_handler, log_level=log_level)

More information about Loki Python logging can be found [here](https://pypi.org/project/python-logging-loki/).

## Build the Morpheus container:
From the root of the Morpheus repo:
```bash
./docker/build_container_release.sh
```

Build `docker compose` services:

```
cd examples/digital_fingerprinting/production
export MORPHEUS_CONTAINER_VERSION="$(git describe --tags --abbrev=0)-runtime"
docker compose build
```

## Start Grafana and Loki services:

To start Grafana and Loki, run the following command on host in `examples/digital_fingerprinting/production`:
@@ -86,7 +72,7 @@ docker compose up grafana

## Run Azure DFP Training

Create `bash` shell in `morpheus_pipeline` container:
Start a `bash` shell in the `morpheus_pipeline` container by running the following command on the host in `examples/digital_fingerprinting/production`:

```bash
docker compose run --rm morpheus_pipeline bash
14 changes: 7 additions & 7 deletions examples/digital_fingerprinting/visualization/README.md
@@ -18,6 +18,13 @@

We show here how to set up and run the Production DFP pipeline on Azure and Duo log data to generate input files for the DFP visualization UI. You can find more information about the Production DFP pipeline in this [README](../production/README.md) and the [DFP Developer Guide](../../../docs/source/developer_guide/guides/5_digital_fingerprinting.md).

## Supported Architectures
| Architecture | Supported | Issue |
|--------------|-----------|-------|
| x86_64 | ✅ | |
| aarch64 | ❌ | [#2125](https://github.com/nv-morpheus/Morpheus/issues/2125) |


## Prerequisites

To run the demo you will need the following:
@@ -30,17 +37,10 @@ To run the demo you will need the following:
git submodule update --init --recursive
```

## Build the Morpheus container
This is necessary to get the latest changes needed for DFP. From the root of the Morpheus repo:
```bash
./docker/build_container_release.sh
```

## Building Services via `docker compose`

```bash
cd examples/digital_fingerprinting/production
export MORPHEUS_CONTAINER_VERSION="$(git describe --tags --abbrev=0)-runtime"
docker compose build
```

6 changes: 6 additions & 0 deletions examples/gnn_fraud_detection_pipeline/README.md
@@ -25,6 +25,12 @@ All environments require additional Conda packages which can be installed with e
| Morpheus Release Container | ✅ | |
| Dev Container | ✅ | |

### Supported Architectures
| Architecture | Supported | Issue |
|--------------|-----------|-------|
| x86_64 | ✅ | |
| aarch64 | ❌ | [#2123](https://github.com/nv-morpheus/Morpheus/issues/2123) |

## Requirements

Prior to running the GNN fraud detection pipeline, additional requirements must be installed in to your Conda environment.
6 changes: 6 additions & 0 deletions examples/llm/rag/README.md
@@ -26,6 +26,12 @@ All environments require additional Conda packages which can be installed with e
| Morpheus Release Container | ✅ | Requires launching Milvus on the host |
| Dev Container | ✅ | |

### Supported Architectures
| Architecture | Supported | Issue |
|--------------|-----------|-------|
| x86_64 | ✅ | |
| aarch64 | ❌ | [#2122](https://github.com/nv-morpheus/Morpheus/issues/2122) |

## Table of Contents

## Background Information
5 changes: 5 additions & 0 deletions examples/llm/vdb_upload/README.md
@@ -42,6 +42,11 @@ All environments require additional Conda packages which can be installed with e
| Morpheus Release Container | ✅ | Requires launching Triton and Milvus on the host |
| Dev Container | ✅ | |

### Supported Architectures
| Architecture | Supported | Issue |
|--------------|-----------|-------|
| x86_64 | ✅ | |
| aarch64 | ❌ | [#2122](https://github.com/nv-morpheus/Morpheus/issues/2122) |

## Background Information

6 changes: 4 additions & 2 deletions examples/llm/vdb_upload/vdb_utils.py
@@ -15,12 +15,10 @@
import logging
import typing

import pymilvus
import yaml

from morpheus.config import Config
from morpheus.config import PipelineModes
from morpheus_llm.service.vdb.milvus_client import DATA_TYPE_MAP

logger = logging.getLogger(__name__)

@@ -147,6 +145,10 @@


def build_milvus_config(resource_schema_config: dict):
import pymilvus

from morpheus_llm.service.vdb.milvus_client import DATA_TYPE_MAP

schema_fields = []
for field_data in resource_schema_config["schema_conf"]["schema_fields"]:
field_data["dtype"] = DATA_TYPE_MAP.get(field_data["dtype"])
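Moving `import pymilvus` inside `build_milvus_config` is the deferred-import pattern: the module can be imported, and other pipelines can run, without pymilvus installed; the dependency is only resolved when a Milvus config is actually built. A runnable sketch of the pattern, with stdlib `json` standing in for the heavy optional dependency:

```python
def build_config(schema_fields):
    # Function-local import: resolved on first call, not at module import
    # time, so importing this module never requires the optional package.
    import json
    return json.dumps({"schema_fields": schema_fields})
```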
26 changes: 14 additions & 12 deletions examples/ransomware_detection/README.md
@@ -88,22 +88,24 @@ Usage: run.py [OPTIONS]
Options:
--debug BOOLEAN
--num_threads INTEGER RANGE Number of internal pipeline threads to use
--num_threads INTEGER RANGE Number of internal pipeline threads to use.
[x>=1]
--n_dask_workers INTEGER RANGE Number of dask workers [x>=2]
--n_dask_workers INTEGER RANGE Number of dask workers. [x>=1]
--threads_per_dask_worker INTEGER RANGE
Number of threads per each dask worker
[x>=2]
Number of threads per each dask worker.
[x>=1]
--model_max_batch_size INTEGER RANGE
Max batch size to use for the model [x>=1]
--model_fea_length INTEGER RANGE
Features length to use for the model [x>=1]
--features_file TEXT File path for ransomware detection features
Max batch size to use for the model. [x>=1]
--pipeline_batch_size INTEGER RANGE
Internal batch size for the pipeline. Can be
much larger than the model batch size.
[x>=1]
--conf_file TEXT Ransomware detection configuration filepath.
--model_name TEXT The name of the model that is deployed on
Tritonserver
--server_url TEXT Tritonserver url [required]
Tritonserver.
--server_url TEXT Tritonserver url. [required]
--sliding_window INTEGER RANGE Sliding window to be used for model input
request [x>=1]
request. [x>=3]
--input_glob TEXT Input glob pattern to match files to read.
For example,
'./input_dir/*/snapshot-*/*.json' would read
@@ -120,6 +122,6 @@ Options:
--output_file TEXT The path to the file where the inference
output will be saved.
--help Show this message and exit.
```
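The range constraints in the help text (`[x>=1]`, `[x>=3]`) come from click-style integer ranges. An equivalent validation can be sketched with stdlib `argparse` (a hypothetical subset of the options above, not the pipeline's actual CLI):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Minimal argparse sketch of a few of the run.py options."""
    def at_least(n):
        # Factory for an INTEGER RANGE-style type check ([x>=n]).
        def check(value):
            ivalue = int(value)
            if ivalue < n:
                raise argparse.ArgumentTypeError(f"must be >= {n}")
            return ivalue
        return check

    p = argparse.ArgumentParser()
    p.add_argument("--num_threads", type=at_least(1), default=1)
    p.add_argument("--sliding_window", type=at_least(3), default=3)
    p.add_argument("--server_url", required=True)
    return p
```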

> **Note**: There is a known race condition in `dask.distributed` which occasionally causes `tornado.iostream.StreamClosedError` to be raised during shutdown, but does not affect the output of the pipeline. If you see this exception during shutdown, it is typically safe to ignore unless it corresponds to other undesirable behavior. For more information see ([#2026](https://github.com/nv-morpheus/Morpheus/issues/2026)).
