Clone and Build
Building the HPCC Platform from source code, deploying to a test environment and submitting pull requests.
sudo apt-get install cmake bison flex build-essential binutils-dev libldap2-dev libcppunit-dev libicu-dev libxslt1-dev \
zlib1g-dev libboost-regex-dev libarchive-dev python-dev libv8-dev default-jdk libapr1-dev libaprutil1-dev libiberty-dev \
libhiredis-dev libtbb-dev libxalan-c-dev libnuma-dev nodejs libevent-dev libatlas-base-dev libblas-dev python3-dev \
default-libmysqlclient-dev libsqlite3-dev r-base-dev r-cran-rcpp r-cran-rinside r-cran-inline libmemcached-dev \
libcurl4-openssl-dev pkg-config libtool autotools-dev automake libssl-dev
sudo apt-get install cmake bison flex build-essential binutils-dev libldap2-dev libcppunit-dev libicu-dev libxslt1-dev \
zlib1g-dev libboost-regex-dev libssl-dev libarchive-dev python-dev libv8-dev default-jdk libapr1-dev libaprutil1-dev \
libiberty-dev libhiredis-dev libtbb-dev libxalan-c-dev libnuma-dev libevent-dev libatlas-base-dev libblas-dev \
libatlas-dev python3-dev libcurl4-openssl-dev libtool autotools-dev automake
Get openssl-1.0.2o.tar.gz from https://www.openssl.org/source/, then unpack, build, and install it:
./config -fPIC shared
make
make install

This installs to /usr/local/ssl by default.
Build the platform with the following additional CMake options:
-DOPENSSL_LIBRARIES=/usr/local/ssl/lib/libssl.so -DOPENSSL_SSL_LIBRARY=/usr/local/ssl/lib/libssl.so
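For example, a complete configure invocation pointing CMake at the locally built OpenSSL might look like the following sketch. The -DOPENSSL_INCLUDE_DIR variable is the standard CMake FindOpenSSL variable and is an assumption here, and ../src is the example source layout used later on this page:
cmake -DOPENSSL_INCLUDE_DIR=/usr/local/ssl/include \
      -DOPENSSL_LIBRARIES=/usr/local/ssl/lib/libssl.so \
      -DOPENSSL_SSL_LIBRARY=/usr/local/ssl/lib/libssl.so ../src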
Regardless of which version of CentOS you are building on, it is suggested that you enable the EPEL repository:
sudo yum install -y epel-release
sudo yum install gcc-c++ gcc make bison flex binutils-devel openldap-devel libicu-devel libxslt-devel libarchive-devel \
boost-devel openssl-devel apr-devel apr-util-devel hiredis-devel numactl-devel mariadb-devel libevent-devel tbb-devel \
atlas-devel python34 libmemcached-devel sqlite-devel v8-devel python-devel python34-devel java-1.8.0-openjdk-devel \
R-core-devel R-Rcpp-devel R-inline R-RInside-devel nodejs cmake3 rpm-build libcurl-devel
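If the cmake3 package from EPEL was installed, the binary is named cmake3 rather than cmake, so the configure step described later would be run as, for example (an illustrative sketch using the example source layout below):
cmake3 ../src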
sudo yum install gcc-c++ gcc make bison flex binutils-devel openldap-devel libicu-devel libxslt-devel libarchive-devel \
boost-devel openssl-devel apr-devel apr-util-devel hiredis-devel numactl-devel libmysqlclient-dev libevent-devel \
tbb-devel atlas-devel python34 R-core-devel R-Rcpp-devel R-inline R-RInside-devel nodejs libcurl-devel
sudo yum install gcc-c++ gcc make fedora-packager cmake bison flex binutils-devel openldap-devel libicu-devel \
xerces-c-devel xalan-c-devel libarchive-devel boost-devel openssl-devel apr-devel apr-util-devel
sudo dnf install gcc-c++ gcc make fedora-packager cmake bison flex binutils-devel openldap-devel libicu-devel \
xerces-c-devel xalan-c-devel libarchive-devel boost-devel openssl-devel apr-devel apr-util-devel numactl-devel \
tbb-devel libxslt-devel nodejs
sudo port install bison flex binutils openldap icu xalanc zlib boost openssl libarchive
You can disable some functionality in order to reduce the list of required components, if necessary. Optional components, such as the plugins for interfacing to external languages, will be disabled automatically if the required libraries and headers are not found at build time.
Optional dependencies:
To build with support for all the plugins for third party embedded code, additional dependencies may be required. If not found, the system will default to skipping those components.
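For example, if LDAP support is not wanted, the corresponding feature could be switched off at configure time. This is only a sketch: USE_OPENLDAP appears elsewhere on this page as a CMake option, but treating it as switchable in this way is an assumption:
cmake -DUSE_OPENLDAP=OFF ../src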
From 7.4 the build system has been changed to make it easier to build on Windows. It depends on two other projects, Chocolatey and vcpkg, for installing the dependencies:
Install chocolatey:
https://chocolatey.org/install
Use that to install bison/flex:
choco install winflexbison3
Install vcpkg:
git clone https://github.com/Microsoft/vcpkg
cd vcpkg
bootstrap-vcpkg
Use vcpkg to install various packages:
vcpkg install zlib
vcpkg install boost
vcpkg install icu
vcpkg install libxslt
vcpkg install tbb
vcpkg install cppunit
vcpkg install libarchive
vcpkg install apr
vcpkg install apr-util
You may need to force vcpkg to build the correct version, e.g. for 64bit:
vcpkg install zlib zlib:x64-windows
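When configuring with CMake, libraries installed by vcpkg are typically made visible via the vcpkg toolchain file. The following is only a sketch of that generic vcpkg/CMake integration; the placeholder paths are assumptions and the HPCC build may also locate vcpkg differently:
cmake -DCMAKE_TOOLCHAIN_FILE=<vcpkg root>/scripts/buildsystems/vcpkg.cmake <path to HPCC-Platform source>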
NodeJS (version 8.x.x LTS recommended) is used to package ECL Watch and related web pages.
To install Node.js on Linux-based systems, try:
curl -sL https://deb.nodesource.com/setup_8.x | sudo bash -
sudo apt-get install -y nodejs
If these instructions do not work on your system, refer to the detailed instructions available here
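A quick way to confirm the toolchain is present (a sketch; any 8.x LTS output matches the recommendation above):
node --version
npm --version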
First ensure that the R language is installed on your system. For Ubuntu use sudo apt-get install r-base-dev. For CentOS distributions use sudo yum install -y R-core-devel.
To install the prerequisites for building R support, use the following for all distros:
wget https://cran.r-project.org/src/contrib/00Archive/Rcpp/Rcpp_0.12.1.tar.gz
wget https://cran.r-project.org/src/contrib/00Archive/RInside/RInside_0.2.12.tar.gz
wget http://cran.r-project.org/src/contrib/inline_0.3.14.tar.gz
sudo R CMD INSTALL Rcpp_0.12.1.tar.gz RInside_0.2.12.tar.gz inline_0.3.14.tar.gz
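To verify that the three packages installed correctly, you can load them from the command line (a sketch using the standard R front end):
R -e 'library(Rcpp); library(RInside); library(inline)'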
Visit Git-step-by-step for full instructions.
To get started quickly, simply:
git clone [-b <branch name>] --recurse-submodules https://github.com/hpcc-systems/HPCC-Platform.git
Where [ ] denotes an optional argument.
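For example, to clone into the ~/hpcc/src layout used later on this page (the branch name master is purely illustrative):
git clone -b master --recurse-submodules https://github.com/hpcc-systems/HPCC-Platform.git ~/hpcc/src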
The minimum version of CMake required to build the HPCC Platform is 3.3.2 on Linux. You may need to download a more recent version from cmake.org.
Now you need to run CMake to populate your build directory with Makefiles and special configuration to build HPCC, packages, tests, etc.
A separate directory is required for the build files. In the examples below, the source directory is ~/hpcc/src and the build directory is ~/hpcc/build:
mkdir ~/hpcc/build
All cmake commands would normally be executed within the build directory:
cd ~/hpcc/build
For release builds, do:
cmake ../src
To enable a specific plugin in the build:
cmake -D<Plugin Name>=ON ../src
make -j6 package
These are the currently supported plugins:
If testing during development, you may want to include plugins (except R) in the package: cmake -DTEST_PLUGINS=ON ../src
To produce a debug build: cmake -DCMAKE_BUILD_TYPE:STRING=Debug ../src
To build the client tools only: cmake -DCLIENTTOOLS_ONLY=1 ../src
To enable signing of the ECL standard library, ensure you have a gpg private key loaded into your gpg keychain and do:
# Add -DSIGN_MODULES_KEYID and -DSIGN_MODULES_PASSPHRASE if applicable
cmake -DSIGN_MODULES=ON ../src
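For example, you might first list the available signing keys and then pass the chosen key id and passphrase to the options named above (a sketch; the placeholders are illustrative):
gpg --list-secret-keys
cmake -DSIGN_MODULES=ON -DSIGN_MODULES_KEYID=<key id> -DSIGN_MODULES_PASSPHRASE=<passphrase> ../src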
In some cases, users have found that when packaging for Ubuntu, the dpkg-shlibdeps portion of the packaging adds an exceptional amount of time to the build process. To turn this off (and to create a package without dynamic dependency generation) do: cmake -DUSE_SHLIBDEPS=OFF ../src
CMake will check for necessary dependencies like binutils, boost regex, cppunit, pthreads, etc. If everything is correct, it'll create the necessary Makefiles and you're ready to build HPCC.
We default to using libxslt in place of Xalan for xslt support. Should you prefer to use libxalan, you can specify -DUSE_LIBXALAN on the cmake command line.
You may build by either using make:
# Using -j option here to specify 6 compilation threads (suitable for quad core cpu)
make -j6
Alternatively, you can call a build-system-agnostic variant (works with make, ninja, Xcode, Visual Studio, etc.):
cmake --build .
This will make all binaries, libraries and scripts necessary to create the package.
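With either method you can request a parallel build. With the CMake driver, arguments after -- are passed to the underlying build tool; a sketch mirroring the make -j6 example above:
cmake --build . -- -j6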
The recommended method to install HPCC Systems on your machine (even for testing) is to use distro packages. CMake has already detected your system, so it knows whether to generate TGZ files, DEB or RPM packages.
Just type:
make package
Alternatively you can use the build system agnostic variant:
cmake --build . --target package
and it will create the appropriate package for you to install. The package file will be created inside the build directory.
Install the package:
sudo dpkg -i hpccsystems-platform-community_6.0.0-trunk0trusty_amd64.deb
(note that the name of the package you have just built will depend on the branch you checked out, the distro, and other options).
Hint: missing dependencies may be fixed with:
sudo apt-get -f install
(see here for Ubuntu based installation).
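To confirm that the platform package installed successfully, you can query the package manager (a sketch for Debian/Ubuntu systems; the exact package name will match whatever you built):
dpkg -l 'hpccsystems*'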
Note: These instructions may not be up to date
git submodule update --init --recursive
brew install icu4c
brew install boost
brew install libarchive
brew install bison
brew install openldap
Also make sure that the Homebrew bison is ahead of the system bison on your path:
bison --version
(The result should be > 2.4.1.)
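One way to do that, assuming Homebrew's default /usr/local prefix (an assumption; Apple-silicon installs use /opt/homebrew instead):
export PATH="/usr/local/opt/bison/bin:$PATH"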
OS X has LDAP installed, but when compiling against it (/System/Library/Frameworks/LDAP.framework/Headers/ldap.h) you will get a "#include nested too deeply" error, which is why you should install openldap.
export CC=/usr/bin/clang
export CXX=/usr/bin/clang++
cmake ../ -DICU_LIBRARIES=/usr/local/opt/icu4c/lib/libicuuc.dylib -DICU_INCLUDE_DIR=/usr/local/opt/icu4c/include \
  -DLIBARCHIVE_INCLUDE_DIR=/usr/local/opt/libarchive/include \
  -DLIBARCHIVE_LIBRARIES=/usr/local/opt/libarchive/lib/libarchive.dylib \
  -DBOOST_REGEX_LIBRARIES=/usr/local/opt/boost/lib -DBOOST_REGEX_INCLUDE_DIR=/usr/local/opt/boost/include \
  -DCLIENTTOOLS_ONLY=true \
  -DUSE_OPENLDAP=true -DOPENLDAP_INCLUDE_DIR=/usr/local/opt/openldap/include \
  -DOPENLDAP_LIBRARIES=/usr/local/opt/openldap/lib/libldap_r.dylib
HPCC Systems offers an enterprise-ready, open source supercomputing platform to solve big data problems. Compared to Hadoop, the platform offers analysis of big data using less code and fewer nodes for greater efficiency, and offers a single programming language, a single platform and a single architecture for efficient processing. HPCC Systems is a technology division of LexisNexis Risk Solutions.
In general, a new version of the HPCC Platform is released every 3 months. These releases can be either Major (with breaking changes) or Minor (with new features). Maintenance and security releases (point releases) are typically made weekly, and may occasionally include technical previews.
Maintenance releases are supported for the current and previous release, while security releases are supported for the current and previous two releases:
---
displayMode: compact
---
gantt
    title Release Schedule
    axisFormat %Y-Q%q
    tickInterval 3month
    dateFormat YYYY-MM-DD
    section v8.12.x
    Active: active, 2023-02-07, 5M
    Critical: 3M
    Security: 6M
    section v9.0.x
    Active: active, 2023-04-03, 6M
    Critical: 3M
    Security: 6M
    section v9.2.x
    Active: active, 2023-07-04, 9M
    Critical: 3M
    Security: 3M
    section v9.4.x
    Active: active, 2023-10-04, 9M
    Critical: 3M
    Security: 3M
    section v9.6.x
    Active: active, 2024-04-04, 6M
    Critical: 3M
    Security: 3M
    section v9.8.x
    Active: active, 2024-07-02, 6M
    Critical: 3M
    Security: 3M
    section v9.10.x
    Active: active, 2024-10-01, 6M
    Critical: 3M
    Security: 3M
The HPCC Systems architecture incorporates the Thor and Roxie clusters as well as common middleware components, an external communications layer, client interfaces which provide both end-user services and system management tools, and auxiliary components to support monitoring and to facilitate loading and storing of filesystem data from external sources. An HPCC environment can include only Thor clusters, or both Thor and Roxie clusters. Each of these cluster types is described in more detail in the following sections below the architecture diagram.
Thor (the Data Refinery Cluster) is responsible for consuming vast amounts of data, transforming, linking and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. A cluster can scale from a single node to thousands of nodes.
Roxie (the Query Cluster) provides separate high-performance online query processing and data warehouse capabilities. Roxie (Rapid Online XML Inquiry Engine) is the data delivery engine used in HPCC to serve data quickly and can support many thousands of requests per node per second.
ECL (Enterprise Control Language) is the powerful programming language that is ideally suited for the manipulation of Big Data.
ECL IDE is a modern IDE used to code, debug and monitor ECL programs.
ESP (Enterprise Services Platform) provides an easy to use interface to access ECL queries using XML, HTTP, SOAP and REST.
The following links describe the structure of the system and detail some of the key components:
cd /opt/HPCCSystems/testing/regress
./ecl-test query --target thor nlppp.ecl
";else if(Array.isArray(r)){for(m=0,b=r.length;m"),a.appendChild(o)):(a=null,s+=o);a=null}else s+=r;s+=n[h]}if(a=e(s),++w>0){for(u=new Array(w),c=document.createTreeWalker(a,NodeFilter.SHOW_COMMENT,null,!1);c.nextNode();)o=c.currentNode,/^o:/.test(o.nodeValue)&&(u[+o.nodeValue.slice(2)]=o);for(h=0;h - / : - CMakeLists.txt - Root CMake file - version.cmake - common cmake file where version variables are set - build-config.h.cmake - cmake generation template for build-config.h Use FOREACH: USE add_custom_command only when 100% needed. All directories in a cmake project should have a CMakeLists.txt file and be called from the upper level project with an add_subdirectory or HPCC_ADD_SUBDIRECTORY When you have a property that will be shared between cmake projects use define_property to set it in the top level cache. All of our Find scripts use the following format: Will define when done: (more can be defined, but must be at min the previous unless looking for only a binary) For an example, see FindAPR.cmake - / : - CMakeLists.txt - Root CMake file - version.cmake - common cmake file where version variables are set - build-config.h.cmake - cmake generation template for build-config.h Use FOREACH: USE add_custom_command only when 100% needed. All directories in a cmake project should have a CMakeLists.txt file and be called from the upper level project with an add_subdirectory or HPCC_ADD_SUBDIRECTORY When you have a property that will be shared between cmake projects use define_property to set it in the top level cache. All of our Find scripts use the following format: Will define when done: (more can be defined, but must be at min the previous unless looking for only a binary) For an example, see FindAPR.cmake The primary purpose of the code generator is to take an ECL query and convert it into a work unit that is suitable for running by one of the engines. The code generator has to do its job accurately. If the code generator does not correctly map the ECL to the workunit it can lead to corrupt data and invalid results. Problems like that can often be very hard and frustrating for the ECL users to track down. There is also a strong emphasis on generating output that is as good as possible. Eclcc contains many different optimization stages, and is extensible to allow others to be easily added. Eclcc needs to be able to cope with reasonably large jobs. Queries that contain several megabytes of ECL, and generate tens of thousands of activities, and 10s of Mb of C++ are routine. These queries need to be processed relatively quickly. Nearly all the processing of ECL is done using an expression graph. The representation of the expression graph has some particular characteristics: The ECL language is a declarative language, and in general is assumed to be pure - i.e. there are no side-effects, expressions can be evaluated lazily and re-evaluating an expression causes no problems. This allows eclcc to transform the graph in lots of interesting ways. (Life is never that simple so there are mechanisms for handling the exceptions.) One of the main challenges with eclcc is converting the declarative ECL code into imperative C++ code. One key problem is it needs to try to ensure that code is only evaluated when it is required, but that it is also only evaluated once. It isn't always possible to satisfy both constraints - for example a global dataset expression used within a child query. 
Should it be evaluated once before the activity containing the child query is called, or each time the child query is called? If it is called on demand then it may not be evaluated as efficiently... This issue complicates many of the optimizations and transformations that are done to the queries. Long term the plan is to allow the engines to support more delayed lazy-evaluation, so that whether something is evaluated is more dynamic rather than static. The idealised view of the processing within eclcc follows the following stages: In practice the progression is not so clear cut. There tends to be some overlap between the different stages, and some of them may occur in slightly different orders. However the order broadly holds. Before any change is accepted for the code generator it is always run against several regression suites to ensure that it doesn't introduce any problems, and that the change has the desired effect. There are several different regression suites: The ecl/regress directory contains a script 'regress.sh' that is used for running the regression tests. It should be executed in the directory containing the ecl files. The script generates the c++ code (and workunits) for each of the source files to a target directory, and then executes a comparison program to compare the new results with a previous "golden" reference set. Before making any changes to the compiler, a reference set should be created by running the regression script and copying the generated files to the reference directory. Here is a sample command line (A version of this command resides in a shell script in each of my regression suite directories, with the -t and -c options adapted for each suite.) For a full list of options execute the script with no parameters, or take a look at the script itself. A couple of useful options are: We strongly recommend using a comparison program which allows rules to be defined to ignore certain differences (e.g., beyond compare). It is much quicker to run eclcc directly from the build directory, rather than deploying a system and running eclcc from there. To do this you need to configure some options that eclcc requires, e.g. where the include files are found. The options can be set by either setting environment variables or by specifiying options in an eclcc.ini file. The following are the names of the different options: Environment flag Ini file option CL_PATH compilerPath ECLCC_LIBRARY_PATH libraryPath ECLCC_INCLUDE_PATH includePath ECLCC_PLUGIN_PATH plugins HPCC_FILEHOOKS_PATH filehooks ECLCC_TPL_PATH templatePath ECLCC_ECLLIBRARY_PATH eclLibrariesPath The eclcc.ini can either be a file in the local directory, or specified on the eclcc command line with -specs. Including the settings in a local eclcc.ini file also it easy to debug eclcc directly from the build directory within the eclipse environment. Logging There is an option for eclcc to output a logging file, and another to specify the level of detail in that logging file. If the detail level is above 500 then the expresssion tree for the query is output to the logging file after each of the code transformations. The tracing is very useful for tracking down at which stage inconsistencies are introduced in the expression graph, and also for learning how each transformation affects the query. The output format defaults to ECL - which is regenerated from the expression tree. (This ECL cannot generally be compiled without editing - partly because it contains extra annoations.) 
Use either of the following: -ftraceIR There is a debug option (-ftraceIR) that generates an intermediate representation of the expression graph rather than regenerating ECL. The output tends to be less compact and harder to read quickly, but has the advantage of being better structured, and contains more details of the internal representation. ecl/hql/hqlir.cpp contains more details of the format. Adding extra logging into the source code If you want to add tracing of expressions at any point in the code generation then adding either of the following calls will include the expression details in the log file: Logging while debugging If you are debugging inside gdb it is often useful to be able to dump out details of an expression. Calling EclIR:dump_ir(expr); will generate the IR to stdout. The function can also be used with multiple parameters. Each expression will be dumped out, but common child nodes will only be generated once. This can be very useful when trying to determine the difference between two expressions. The quickest way is to call Expression sequence ids. Sometimes it can be hard to determine where a particular IHqlExpression node was created. If that is the case, then defining The key data structure within eclcc is the graph representation. The design has some key elements. Once a node is created it is never modified. Some derived information (e.g., sort order, number of records, unique hash, ...) might be calculated and stored in the class after it has been created, but that doesn't change what the node represents in any way. Some nodes are created in stages - e.g., records, modules. These nodes are marked as fully completed when closeExpr() is called, after which they cannot be modified. Nodes are always commoned up. If the same operator has the same arguments and type then there will be a unique IHqlExpression to represent it. This helps ensure that graphs stay as graphs and don't get converted to trees. It also helps with optimizations, and allows code duplicated in two different contexts to be brought together. The nodes are link counted. Link counts are used to control the lifetime of the expression objects. Whenever a reference to an expression node is held, its link count is increased, and decreased when no longer required. The node is freed when there are no more references. (This generally works well, but does give us problems with forward references.) The access to the graph is through interfaces. The main interfaces are IHqlExpression, IHqlDataset and IHqlScope. They are all defined in hqlexpr.hpp. The aim of the interfaces is to hide the implementation of the expression nodes so they can be restructured and changed without affecting any other code. The expression classes use interfaces and a type field rather than polymorphism. This could be argued to be bad object design...but. There are more than 500 different possible operators. If a class was created for each of them the system would quickly become unwieldy. Instead there are several different classes which model the different types of expression (dataset/expression/scope). The interfaces contain everything needed to create and interrogate an expression tree, but they do not contain functionality for directly processing the graph. To avoid some of the shortcomings of type fields there are various mechanisms for accessing derived attributes which avoid interrogating the type field. Memory consumption is critical. It is not unusual to have 10M or even 100M nodes in memory as a query is being processed. 
At that scale the memory consumption of each node matters - so great care should be taken when considering increasing the size of the objects. The node classes contain a class hierarchy which is there purely to reduce the memory consumption - not to reflect the functionality. With no memory constraints they wouldn't be there, but removing a single pointer per node can save 1Gb of memory usage for very complex queries. This is the interface that is used to walk and interrogate the expression graph once it has been created. Some of the main functions are: getOperator() What does this node represent? It returns a member of the node_operator enumerated type. numChildren() How many arguments does node have? queryChild(unsigned n) What is the nth child? If the argument is out of range it returns NULL. queryType() The type of this node. queryBody() Used to skip annotations (see below) queryProperty() Does this node have a child which is an attribute that matches a given name. (see below for more about attributes). queryValue() For a no_constant return the value of the constant. It returns NULL otherwise. The nodes in the expression graph are created through factory functions. Some of the expression types have specialised functions - e.g., createDataset, createRow, createDictionary, but scalar expressions and actions are normally created with createValue(). Note: Generally ownership of the arguments to the createX() functions are assumed to be taken over by the newly created node. The values of the enumeration constants in node_operator are used to calculate "crcs" which are used to check if the ECL for a query matches, and if disk and index record formats match. It contains quite a few legacy entries no_unusedXXX which can be used for new operators (otherwise new operators must be added to the end). This interface is implemented by records, and is used to map names to the fields within the records. If a record contains IFBLOCKs then each of the fields in the ifblock is defined in the IHqlSimpleScope for the containing record. Normally obtained by calling IHqlExpression::queryScope(). It is primarily used in the parser to resolve fields from within modules. The ECL is parsed on demand so as the symbol is looked up it may cause a cascade of ECL to be compiled. The lookup context (HqlLookupContext ) is passed to IHqlScope::lookupSymbol() for several reasons: The interface IHqlScope currently has some members that are used for creation; this should be refactored and placed in a different interface. This is normally obtained by calling IHqlExpression::queryDataset(). It has shrunk in size over time, and could quite possibly be folded into IHqlExpression with little pain. There is a distinction in the code generator between "tables" and "datasets". A table (IHqlDataset::queryTable()) is a dataset operation that defines a new output record. Any operation that has a transform or record that defines an output record (e.g., PROJECT,TABLE) is a table, whilst those that don't (e.g., a filter, dedup) are not. There are a few apparent exceptions -e.g., IF (This is controlled by definesColumnList() which returns true the operator is a table.) There are two related by slightly different concepts. An attribute refers to the explicit flags that are added to operators (e.g., , LOCAL, KEEP(n) etc. specified in the ECL or some internal attributes added by the code generator). There are a couple of different functions for creating attributes. createExtraAttribute() should be used by default. 
createAttribute() is reserved for an attribute that never has any arguments, or in unusual situations where it is important that the arguments are never transformed. They are tested using queryAttribute()/hasAttribute() and represented by nodes of kind no_attr/no_expr_attr. The term "property" refers to computed information (e.g., record counts) that can be derived from the operator, its arguments and attributes. They are covered in more detail below. Fields can be selected from active rows of a dataset in three main ways: Some operators define LEFT/RIGHT to represent an input or processed dataset. Fields from these active rows are referenced with LEFT.<field-name>. Here LEFT or RIGHT is the "selector". Other operators use the input dataset as the selector. E.g., myFile(myFile.id != 0). Here the input dataset is the "selector". Often when the input dataset is used as the selector it can be omitted. E.g., myFile(id != 0). This is implicitly expanded by the PARSER to the second form. A reference to a field is always represented in the expression graph as a node of kind no_select (with createSelectExpr). The first child is the selector, and the second is the field. Needless to say there are some complications... LEFT/RIGHT. The problem is that the different uses of LEFT/RIGHT need to be disambiguated since there may be several different uses of LEFT in a query. This is especially true when operations are executed in child queries. LEFT is represented by a node no_left(record, selSeq). Often the record is sufficient to disambiguate the uses, but there are situations where it isn't enough. So in addition no_left has a child which is a selSeq (selector sequence) which is added as a child attribute of the PROJECT or other operator. At parse time it is a function of the input dataset that is later normalized to a unique id to reduce the transformation work. Active datasets. It is slightly more complicated - because the dataset used as the selector can be any upstream dataset up to the nearest table. So the following ECL code is legal: Here the reference to x.id in the definition of z is referring to a field in the input dataset. Because of these semantics the selector in a normalized tree is actually inputDataset->queryNormalizedSelector() rather than inputDatset. This function currently returns the table expression node (but it may change in the future see below). In some situations ECL allows child datasets to be treated as a dataset without an explicit NORMALIZE. E.g., EXISTS(myDataset.ChildDataset); This is primarily to enable efficient aggregates on disk files to be generated, but it adds some complications with an expression of the form dataset.childdataset.grandchild. E.g.,: Or: In the first example dataset.childdataset within the dataset.childdataset.grandchild is a reference to a dataset that doesn't have an active cursor and needs to be iterated), whilst in the second it refers to an active cursor. To differentiate between the two, all references to fields within datasets/rows that don't have active selectors have an additional attribute("new") as a child of the select. So a no_select with a "new" attribute requires the dataset to be created, one without is a member of an active dataset cursor. If you have a nested row, the new attribute is added to the selection from the dataset, rather than the selection from the nested row. The functions queryDatasetCursor() and querySelectorDataset()) are used to help interpret the meaning. 
(An alternative would be to use a different node from no_select - possibly this should be considered - it would be more space efficient.) The expression graph generated by the ECL parser doesn't contain any new attributes. These are added as one of the first stages of normalizing the expression graph. Any code that works on normalized expressions needs to take care to interpret no_selects correctly. When an expression graph is transformed and none of the records are changed, the representation of LEFT/RIGHT remains the same. This means any no_select nodes in the expression tree will also stay the same. However, if the transform modifies a table (highly likely) it means that the selector for the second form of field selector will also change. Unfortunately this means that transforms often cannot be short-circuited. It could significantly reduce the extent of the graph that needs traversing, and the number of nodes replaced in a transformed graph if this could be avoided. One possibility is to use a different value for dataset->queryNormalizedSelector() using a unique id associated with the table. I think it would be a good long term change, but it would require unique ids (similar to the selSeq) to be added to all table expressions, and correctly preserved by any optimization. Sometimes it is useful to add information into the expression graph (e.g., symbol names, position information) that doesn't change the meaning, but should be preserved. Annotations allow information to be added in this way. An annotation's implementation of IHqlExpression generally delegates the majority of the methods through to the annotated expression. This means that most code that interrogates the expression graph can ignore their presence, which simplifies the caller significantly. However transforms need to be careful (see below). Information about the annotation can be obtained by calling IHqlExpression:: getAnnotationKind() and IHqlExpression:: queryAnnotation(). In legacy ECL you will see code like the following:: The assumption is that whenever a(x) is evaluated the value of Y will be output. However that doesn't particularly fit in with a declarative expression graph. The code generator creates a special node (no_compound) with child(0) as the output action, and child(1) as the value to be evaluated (g(Y)). If the expression ends up being included in the final query then the action will also be included (via the no_compound). At a later stage the action is migrated to a position in the graph where actions are normally evaluated. There are many pieces of information that it is useful to know about a node in the expression graph - many of which would be expensive to recomputed each time there were required. Eclcc has several mechanisms for caching derived information so it is available efficiently. Boolean flags - getInfoFlags()/getInfoFlags2(). There are many Boolean attributes of an expression that are useful to know - e.g., is it constant, does it have side-effects, does it reference any fields from a dataset etc. etc. The bulk of these are calculated and stored in a couple of members of the expression class. They are normally retrieved via accessor functions e.g., containsAssertKeyed(IHqlExpression*). Active datasets - gatherTablesUsed(). It is very common to want to know which datasets an expression references. This information is calculated and cached on demand and accessed via the IHqlExpression::gatherTablesUsed() functions. 
There are a couple of other functions IHqlExpression::isIndependentOfScope() and IHqlExpression::usesSelector() which provide efficient functions for common uses. Information stored in the type. Currently datasets contain information about sort order, distribution and grouping as part of the expression type. This information should be accessed through the accessor functions applied to the expression (e.g., isGrouped(expr)). At some point in the future it is planned to move this information as a general derived property (see next). Other derived property. There is a mechanism (in hqlattr) for calculating and caching an arbitrary derived property of an expression. It is currently used for number of rows, location-independent representation, maximum record size etc. . There are typically accessor functions to access the cached information (rather than calling the underlying IHqlExpression::queryAttribute() function). Helper functions. Some information doesn't need to be cached because it isn't expensive to calculate, but rather than duplicating the code, a helper function is provided. E.g., queryOriginalRecord() and hasUnknownTransform(). They are not part of the interface because the number would make the interface unwieldy and they can be completely calculated from the public functions. However, it can be very hard to find the function you are looking for, and they would greatly benefit from being grouped e.g., into namespaces. One of the key processes in eclcc is walking and transforming the expression graphs. Both of these are covered by the term transformations. One of the key things to bear in mind is that you need to walk the expression graph as a graph, not as a tree. If you have already examined a node once you shouldn't repeat the work - otherwise the execution time may be exponential with node depth. Other things to bear in mind It is essential that an expression that is used in different contexts with different annotations (e.g., two different named symbols) is consistently transformed. Otherwise it is possible for a graph to be converted into a tree. E.g.,: must not be converted to: For this reason most transformers will check if expr->queryBody() matches expr, and if not will transform the body (the unannotated expression), and then clone any annotations. Some examples of the work done by transformations are: Some more details on the individual transforms are given below.. The first job of eclcc is to parse the ECL into an expression graph. The source for the ECL can come from various different sources (archive, source files, remote repository). The details are hidden behind the IEclSource/IEclSourceCollection interfaces. The createRepository() function is then used to resolve and parse the various source files on demand. Several things occur while the ECL is being parsed: Function definitions are expanded inline. A slightly unusual behaviour. It means that the expression tree is a fully expanded expression -which is better suited to processing and optimizing. Some limited constant folding occurs. When a function is expanded, often it means that some of the test conditions are always true/false. To reduce the transformations the condition may be folded early on. When a symbol is referenced from another module this will recursively cause the ECL for that module (or definition within that module) to be parsed. Currently the semantic checking is done as the ECL is parsed. 
The primary purpose of the code generator is to take an ECL query and convert it into a work unit that is suitable for running by one of the engines. The code generator has to do its job accurately. If the code generator does not correctly map the ECL to the workunit it can lead to corrupt data and invalid results. Problems like that can often be very hard and frustrating for the ECL users to track down. There is also a strong emphasis on generating output that is as good as possible. Eclcc contains many different optimization stages, and is extensible to allow others to be easily added. Eclcc needs to be able to cope with reasonably large jobs. Queries that contain several megabytes of ECL, and generate tens of thousands of activities, and 10s of Mb of C++ are routine. These queries need to be processed relatively quickly. Nearly all the processing of ECL is done using an expression graph. The representation of the expression graph has some particular characteristics: The ECL language is a declarative language, and in general is assumed to be pure - i.e. there are no side-effects, expressions can be evaluated lazily and re-evaluating an expression causes no problems. This allows eclcc to transform the graph in lots of interesting ways. (Life is never that simple so there are mechanisms for handling the exceptions.) One of the main challenges with eclcc is converting the declarative ECL code into imperative C++ code. One key problem is it needs to try to ensure that code is only evaluated when it is required, but that it is also only evaluated once. It isn't always possible to satisfy both constraints - for example a global dataset expression used within a child query. Should it be evaluated once before the activity containing the child query is called, or each time the child query is called? If it is called on demand then it may not be evaluated as efficiently... This issue complicates many of the optimizations and transformations that are done to the queries. Long term the plan is to allow the engines to support more delayed lazy-evaluation, so that whether something is evaluated is more dynamic rather than static. The idealised view of the processing within eclcc follows the following stages: In practice the progression is not so clear cut. There tends to be some overlap between the different stages, and some of them may occur in slightly different orders. However the order broadly holds.
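To make the "evaluated only when required, but only once" tension above concrete, here is a small, self-contained C++ sketch. It is purely a toy illustration of the idea (it is not eclcc code, and the class name is invented for this example):

```cpp
#include <functional>
#include <optional>

// A value that is computed only if somebody asks for it, and at most once.
template <class T>
class LazyOnce
{
public:
    explicit LazyOnce(std::function<T()> fn) : compute(std::move(fn)) {}

    const T & get()
    {
        if (!value)              // evaluated only when it is actually required...
            value = compute();   // ...and the result is reused for every later use
        return *value;
    }

private:
    std::function<T()> compute;
    std::optional<T> value;
};
```

A global dataset used inside a child query faces exactly this choice: materialise it eagerly before the containing graph starts (it may never be needed), or evaluate it on first use, which may waste less work but is harder to schedule. That is the tension that complicates many of the optimizations and transformations described later.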
Before any change is accepted for the code generator it is always run against several regression suites to ensure that it doesn't introduce any problems, and that the change has the desired effect. There are several different regression suites: The ecl/regress directory contains a script 'regress.sh' that is used for running the regression tests. It should be executed in the directory containing the ecl files. The script generates the C++ code (and workunits) for each of the source files to a target directory, and then executes a comparison program to compare the new results with a previous "golden" reference set. Before making any changes to the compiler, a reference set should be created by running the regression script and copying the generated files to the reference directory. Here is a sample command line (a version of this command resides in a shell script in each of my regression suite directories, with the -t and -c options adapted for each suite.) For a full list of options execute the script with no parameters, or take a look at the script itself. A couple of useful options are: We strongly recommend using a comparison program which allows rules to be defined to ignore certain differences (e.g., Beyond Compare).

It is much quicker to run eclcc directly from the build directory, rather than deploying a system and running eclcc from there. To do this you need to configure some options that eclcc requires, e.g. where the include files are found. The options can be set either by setting environment variables or by specifying options in an eclcc.ini file. The following are the names of the different options:

| Environment flag | Ini file option |
| --- | --- |
| CL_PATH | compilerPath |
| ECLCC_LIBRARY_PATH | libraryPath |
| ECLCC_INCLUDE_PATH | includePath |
| ECLCC_PLUGIN_PATH | plugins |
| HPCC_FILEHOOKS_PATH | filehooks |
| ECLCC_TPL_PATH | templatePath |
| ECLCC_ECLLIBRARY_PATH | eclLibrariesPath |

The eclcc.ini can either be a file in the local directory, or specified on the eclcc command line with -specs. Including the settings in a local eclcc.ini file also makes it easy to debug eclcc directly from the build directory within the eclipse environment.

Logging: there is an option for eclcc to output a logging file, and another to specify the level of detail in that logging file. If the detail level is above 500 then the expression tree for the query is output to the logging file after each of the code transformations. The tracing is very useful for tracking down at which stage inconsistencies are introduced in the expression graph, and also for learning how each transformation affects the query. The output format defaults to ECL - which is regenerated from the expression tree. (This ECL cannot generally be compiled without editing - partly because it contains extra annotations.) Use either of the following: There is a debug option (-ftraceIR) that generates an intermediate representation of the expression graph rather than regenerating ECL. The output tends to be less compact and harder to read quickly, but has the advantage of being better structured, and contains more details of the internal representation. ecl/hql/hqlir.cpp contains more details of the format.

Adding extra logging into the source code: if you want to add tracing of expressions at any point in the code generation then adding either of the following calls will include the expression details in the log file:

Logging while debugging: if you are debugging inside gdb it is often useful to be able to dump out details of an expression.
Calling EclIR::dump_ir(expr); will generate the IR to stdout. The function can also be used with multiple parameters. Each expression will be dumped out, but common child nodes will only be generated once. This can be very useful when trying to determine the difference between two expressions. The quickest way is to call Expression sequence ids. Sometimes it can be hard to determine where a particular IHqlExpression node was created. If that is the case, then defining

The key data structure within eclcc is the graph representation. The design has some key elements. Once a node is created it is never modified. Some derived information (e.g., sort order, number of records, unique hash, ...) might be calculated and stored in the class after it has been created, but that doesn't change what the node represents in any way. Some nodes are created in stages - e.g., records, modules. These nodes are marked as fully completed when closeExpr() is called, after which they cannot be modified. Nodes are always commoned up. If the same operator has the same arguments and type then there will be a unique IHqlExpression to represent it. This helps ensure that graphs stay as graphs and don't get converted to trees. It also helps with optimizations, and allows code duplicated in two different contexts to be brought together. The nodes are link counted. Link counts are used to control the lifetime of the expression objects. Whenever a reference to an expression node is held, its link count is increased, and decreased when no longer required. The node is freed when there are no more references. (This generally works well, but does give us problems with forward references.) The access to the graph is through interfaces. The main interfaces are IHqlExpression, IHqlDataset and IHqlScope. They are all defined in hqlexpr.hpp. The aim of the interfaces is to hide the implementation of the expression nodes so they can be restructured and changed without affecting any other code. The expression classes use interfaces and a type field rather than polymorphism. This could be argued to be bad object design... but there are more than 500 different possible operators. If a class was created for each of them the system would quickly become unwieldy. Instead there are several different classes which model the different types of expression (dataset/expression/scope). The interfaces contain everything needed to create and interrogate an expression tree, but they do not contain functionality for directly processing the graph. To avoid some of the shortcomings of type fields there are various mechanisms for accessing derived attributes which avoid interrogating the type field. Memory consumption is critical. It is not unusual to have 10M or even 100M nodes in memory as a query is being processed. At that scale the memory consumption of each node matters - so great care should be taken when considering increasing the size of the objects. The node classes contain a class hierarchy which is there purely to reduce the memory consumption - not to reflect the functionality. With no memory constraints they wouldn't be there, but removing a single pointer per node can save 1Gb of memory usage for very complex queries.

IHqlExpression is the interface that is used to walk and interrogate the expression graph once it has been created. Some of the main functions are:

- getOperator(): what does this node represent? It returns a member of the node_operator enumerated type.
- numChildren(): how many arguments does the node have?
- queryChild(unsigned n): what is the nth child? If the argument is out of range it returns NULL.
- queryType(): the type of this node.
- queryBody(): used to skip annotations (see below).
- queryProperty(): does this node have a child which is an attribute that matches a given name? (See below for more about attributes.)
- queryValue(): for a no_constant, return the value of the constant. It returns NULL otherwise.
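As a brief illustration of using these methods, the sketch below counts how many distinct nodes with a given operator appear in a graph. It is a simplified example rather than code from eclcc (the header location and return types are assumed from the descriptions above), but it shows the one rule that always matters when walking the graph: track the nodes you have already visited, otherwise shared sub-graphs are walked repeatedly and the cost can grow exponentially with graph depth.

```cpp
#include <unordered_set>
#include "hqlexpr.hpp"     // IHqlExpression, node_operator (assumed header, as described above)

static void countMatches(IHqlExpression * expr, node_operator searchOp,
                         std::unordered_set<IHqlExpression *> & visited, unsigned & matches)
{
    IHqlExpression * body = expr->queryBody();          // skip any annotations
    if (!visited.insert(body).second)
        return;                                         // already processed this shared node
    if (body->getOperator() == searchOp)
        matches++;
    for (unsigned i = 0; i < body->numChildren(); i++)  // visit each argument in turn
        countMatches(body->queryChild(i), searchOp, visited, matches);
}

unsigned countOccurrences(IHqlExpression * root, node_operator searchOp)
{
    // No links are taken on the nodes because the caller keeps the graph alive
    // for the duration of the walk.
    std::unordered_set<IHqlExpression *> visited;
    unsigned matches = 0;
    countMatches(root, searchOp, visited, matches);
    return matches;
}
```

A real transform would use the transformer framework described later rather than an ad hoc set, but the shape of the traversal is the same.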
The nodes in the expression graph are created through factory functions. Some of the expression types have specialised functions - e.g., createDataset, createRow, createDictionary, but scalar expressions and actions are normally created with createValue(). Note: generally ownership of the arguments to the createX() functions is assumed to be taken over by the newly created node. The values of the enumeration constants in node_operator are used to calculate "crcs" which are used to check if the ECL for a query matches, and if disk and index record formats match. It contains quite a few legacy entries no_unusedXXX which can be used for new operators (otherwise new operators must be added to the end).

IHqlSimpleScope is implemented by records, and is used to map names to the fields within the records. If a record contains IFBLOCKs then each of the fields in the ifblock is defined in the IHqlSimpleScope for the containing record.

IHqlScope is normally obtained by calling IHqlExpression::queryScope(). It is primarily used in the parser to resolve fields from within modules. The ECL is parsed on demand so as the symbol is looked up it may cause a cascade of ECL to be compiled. The lookup context (HqlLookupContext) is passed to IHqlScope::lookupSymbol() for several reasons: The interface IHqlScope currently has some members that are used for creation; this should be refactored and placed in a different interface.

IHqlDataset is normally obtained by calling IHqlExpression::queryDataset(). It has shrunk in size over time, and could quite possibly be folded into IHqlExpression with little pain. There is a distinction in the code generator between "tables" and "datasets". A table (IHqlDataset::queryTable()) is a dataset operation that defines a new output record. Any operation that has a transform or record that defines an output record (e.g., PROJECT, TABLE) is a table, whilst those that don't (e.g., a filter, dedup) are not. There are a few apparent exceptions - e.g., IF. (This is controlled by definesColumnList(), which returns true if the operator is a table.)

Attributes and properties: there are two related but slightly different concepts. An attribute refers to the explicit flags that are added to operators (e.g., LOCAL, KEEP(n), etc. specified in the ECL, or some internal attributes added by the code generator). There are a couple of different functions for creating attributes. createExtraAttribute() should be used by default. createAttribute() is reserved for an attribute that never has any arguments, or in unusual situations where it is important that the arguments are never transformed. They are tested using queryAttribute()/hasAttribute() and represented by nodes of kind no_attr/no_expr_attr. The term "property" refers to computed information (e.g., record counts) that can be derived from the operator, its arguments and attributes. They are covered in more detail below.

Fields can be selected from active rows of a dataset in three main ways. First, some operators define LEFT/RIGHT to represent an input or processed dataset. Fields from these active rows are referenced with LEFT.<field-name>. Here LEFT or RIGHT is the "selector".
Second, other operators use the input dataset as the selector. E.g., myFile(myFile.id != 0). Here the input dataset is the "selector". Third, when the input dataset is used as the selector it can often be omitted. E.g., myFile(id != 0). This is implicitly expanded by the parser to the second form.

A reference to a field is always represented in the expression graph as a node of kind no_select (created with createSelectExpr). The first child is the selector, and the second is the field. Needless to say there are some complications...

LEFT/RIGHT: the problem is that the different uses of LEFT/RIGHT need to be disambiguated, since there may be several different uses of LEFT in a query. This is especially true when operations are executed in child queries. LEFT is represented by a node no_left(record, selSeq). Often the record is sufficient to disambiguate the uses, but there are situations where it isn't enough. So in addition no_left has a child which is a selSeq (selector sequence), which is added as a child attribute of the PROJECT or other operator. At parse time it is a function of the input dataset that is later normalized to a unique id to reduce the transformation work.

Active datasets: this is slightly more complicated, because the dataset used as the selector can be any upstream dataset up to the nearest table. So the following ECL code is legal: Here the reference to x.id in the definition of z is referring to a field in the input dataset. Because of these semantics the selector in a normalized tree is actually inputDataset->queryNormalizedSelector() rather than inputDataset. This function currently returns the table expression node (but it may change in the future - see below).

In some situations ECL allows child datasets to be treated as a dataset without an explicit NORMALIZE. E.g., EXISTS(myDataset.ChildDataset); This is primarily to enable efficient aggregates on disk files to be generated, but it adds some complications with an expression of the form dataset.childdataset.grandchild. E.g.,: Or: In the first example dataset.childdataset within dataset.childdataset.grandchild is a reference to a dataset that doesn't have an active cursor and needs to be iterated, whilst in the second it refers to an active cursor. To differentiate between the two, all references to fields within datasets/rows that don't have active selectors have an additional attribute ("new") as a child of the select. So a no_select with a "new" attribute requires the dataset to be created; one without is a member of an active dataset cursor. If you have a nested row, the new attribute is added to the selection from the dataset, rather than the selection from the nested row. The functions queryDatasetCursor() and querySelectorDataset() are used to help interpret the meaning. (An alternative would be to use a different node from no_select - possibly this should be considered - it would be more space efficient.) The expression graph generated by the ECL parser doesn't contain any new attributes. These are added as one of the first stages of normalizing the expression graph. Any code that works on normalized expressions needs to take care to interpret no_selects correctly.
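The following fragment sketches how code might distinguish the two kinds of no_select described above. It is illustrative only: the attribute test assumes the "new" attribute is identified by an atom called newAtom (an assumption made for this example), and real code would normally go through helpers such as queryDatasetCursor()/querySelectorDataset() instead:

```cpp
#include "hqlexpr.hpp"   // IHqlExpression, no_select (assumed header, as above)

// Sketch only: interpret a normalized no_select node.
static bool selectRequiresDataset(IHqlExpression * select)
{
    if (select->getOperator() != no_select)
        return false;
    // A "new" attribute means the selector is not an active cursor, so the dataset
    // (child 0) has to be created before the field (child 1) can be extracted.
    return select->hasAttribute(newAtom);   // newAtom: assumed name for the "new" attribute
}
```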
When an expression graph is transformed and none of the records are changed, the representation of LEFT/RIGHT remains the same. This means any no_select nodes in the expression tree will also stay the same. However, if the transform modifies a table (highly likely), the selector for the second form of field selector will also change. Unfortunately this means that transforms often cannot be short-circuited. It could significantly reduce the extent of the graph that needs traversing, and the number of nodes replaced in a transformed graph, if this could be avoided. One possibility is to use a different value for dataset->queryNormalizedSelector(), using a unique id associated with the table. I think it would be a good long term change, but it would require unique ids (similar to the selSeq) to be added to all table expressions, and correctly preserved by any optimization.

Sometimes it is useful to add information into the expression graph (e.g., symbol names, position information) that doesn't change the meaning, but should be preserved. Annotations allow information to be added in this way. An annotation's implementation of IHqlExpression generally delegates the majority of the methods through to the annotated expression. This means that most code that interrogates the expression graph can ignore their presence, which simplifies the caller significantly. However transforms need to be careful (see below). Information about the annotation can be obtained by calling IHqlExpression::getAnnotationKind() and IHqlExpression::queryAnnotation().

In legacy ECL you will see code like the following: The assumption is that whenever a(x) is evaluated the value of Y will be output. However that doesn't particularly fit in with a declarative expression graph. The code generator creates a special node (no_compound) with child(0) as the output action, and child(1) as the value to be evaluated (g(Y)). If the expression ends up being included in the final query then the action will also be included (via the no_compound). At a later stage the action is migrated to a position in the graph where actions are normally evaluated.

There are many pieces of information that it is useful to know about a node in the expression graph - many of which would be expensive to recompute each time they were required. Eclcc has several mechanisms for caching derived information so it is available efficiently. Boolean flags - getInfoFlags()/getInfoFlags2(). There are many Boolean attributes of an expression that are useful to know - e.g., is it constant, does it have side-effects, does it reference any fields from a dataset, etc. The bulk of these are calculated and stored in a couple of members of the expression class. They are normally retrieved via accessor functions, e.g., containsAssertKeyed(IHqlExpression*). Active datasets - gatherTablesUsed(). It is very common to want to know which datasets an expression references. This information is calculated and cached on demand and accessed via the IHqlExpression::gatherTablesUsed() functions. There are a couple of other functions, IHqlExpression::isIndependentOfScope() and IHqlExpression::usesSelector(), which provide efficient implementations for common uses. Information stored in the type. Currently datasets contain information about sort order, distribution and grouping as part of the expression type. This information should be accessed through the accessor functions applied to the expression (e.g., isGrouped(expr)). At some point in the future it is planned to move this information to a general derived property (see next). Other derived properties. There is a mechanism (in hqlattr) for calculating and caching an arbitrary derived property of an expression. It is currently used for number of rows, location-independent representation, maximum record size, etc.
There are typically accessor functions to access the cached information (rather than calling the underlying IHqlExpression::queryAttribute() function). Helper functions. Some information doesn't need to be cached because it isn't expensive to calculate, but rather than duplicating the code, a helper function is provided. E.g., queryOriginalRecord() and hasUnknownTransform(). They are not part of the interface because the number would make the interface unwieldy and they can be completely calculated from the public functions. However, it can be very hard to find the function you are looking for, and they would greatly benefit from being grouped e.g., into namespaces. One of the key processes in eclcc is walking and transforming the expression graphs. Both of these are covered by the term transformations. One of the key things to bear in mind is that you need to walk the expression graph as a graph, not as a tree. If you have already examined a node once you shouldn't repeat the work - otherwise the execution time may be exponential with node depth. Other things to bear in mind It is essential that an expression that is used in different contexts with different annotations (e.g., two different named symbols) is consistently transformed. Otherwise it is possible for a graph to be converted into a tree. E.g.,: must not be converted to: For this reason most transformers will check if expr->queryBody() matches expr, and if not will transform the body (the unannotated expression), and then clone any annotations. Some examples of the work done by transformations are: Some more details on the individual transforms are given below.. The first job of eclcc is to parse the ECL into an expression graph. The source for the ECL can come from various different sources (archive, source files, remote repository). The details are hidden behind the IEclSource/IEclSourceCollection interfaces. The createRepository() function is then used to resolve and parse the various source files on demand. Several things occur while the ECL is being parsed: Function definitions are expanded inline. A slightly unusual behaviour. It means that the expression tree is a fully expanded expression -which is better suited to processing and optimizing. Some limited constant folding occurs. When a function is expanded, often it means that some of the test conditions are always true/false. To reduce the transformations the condition may be folded early on. When a symbol is referenced from another module this will recursively cause the ECL for that module (or definition within that module) to be parsed. Currently the semantic checking is done as the ECL is parsed. If we are going to fully support template functions and delayed expansion of functions this will probably have to change so that a syntax tree is built first, and then the semantic checking is done later. There are various problems with the expression graph that comes out of the parser: Records can have values as children (e.g., { myField := infield.value} ), but it causes chaos if record definitions can change while other transformations are going on. So the normalization removes values from fields. Some activities use records to define the values that output records should contain (e.g., TABLE). These are now converted to another form (e.g., no_newusertable). Sometimes expressions have multiple definition names. Symbols and annotations are rationalized and commoned up to aid commoning up other expressions. Some PATTERN definitions are recursive by name. 
They are resolved to a form that works if all symbols are removed. The CASE/MAP representation for a dataset/action is awkward for the transforms to process. They are converted to nested Ifs. (At some point a different representation might be a good idea.) EVALUATE is a weird syntax. Instances are replaced with equivalent code which is much easier to subsequently process. The datasets used in index definitions are primarily there to provide details of the fields. The dataset itself may be very complex and may not actually be used. The dataset input to an index is replaced with a dummy "null" dataset to avoid unnecessary graph transforming, and avoid introducing any additional incorrect dependencies. Generally if you use LEFT/RIGHT then the input rows are going to be available wherever they are used. However if they are passed into a function, and that function uses them inside a definition marked as global then that is invalid (since by definition global expressions don't have any context). Similarly if you use syntax <dataset>.<field>, its validity and meaning depends on whether <dataset> is active. The scope transformer ensures that all references to fields are legal, and adds a "new" attribute to any no_selects where it is necessary. This transform simplifies the expression tree. Its aim is to simplify scalar expressions, and dataset expressions that are valid whether or not the nodes are shared. Some examples are: Most of the optimizations are fairly standard, but a few have been added to cover more esoteric examples which have occurred in queries over the years. This transform also supports the option to percolate constants through the graph. E.g., if a project assigns the value 3 to a field, it can substitute the value 3 wherever that field is used in subsequent activities. This can often lead to further opportunities for constant folding (and removing fields in the implicit project). This transformer is used to simplify, combine and reorder dataset expressions. The transformer takes care to count the number of times each expression is used to ensure that none of the transformations cause duplication. E.g., swapping a filter with a sort is a good idea, but if there are two filters of the same sort and they are both swapped you will now be duplicating the sort. Some examples of the optimizations include: ECL tends to be written as general purpose definitions which can then be combined. This can lead to potential inefficiencies - e.g., one definition may summarise some data in 20 different ways, this is then used by another definition which only uses a subset of those results. The implicit project transformer tracks the data flow at each point through the expression graph, and removes any fields that are not required. This often works in combination with the other optimizations. For instance the constant percolation can remove the need for fields, and removing fields can sometimes allow a left outer join to be converted to a project. The code generator ultimately creates workunits. A workunit completely describes a generated query. It consists of two parts. There is an xml component - this contains the workflow information, the various execution graphs, and information about options. It also describes which inputs can be supplied to the query and what results are generated. The other part is the generated shared object compiled from the generated C++. This contains functions and classes that are used by the engines to execute the queries. 
Often the xml is compressed and stored as a resource within the shared object - so the shared object contains a complete workunit. The actions in a workunit are divided up into individual workflow items. Details of when each workflow item is executed, and what its dependencies are, are stored in the <Workflow> section of the xml. The generated code also contains a class definition, with a method perform() which is used to execute the actions associated with a particular workflow item. (The class instances are created by calling the exported createProcess() factory function.) The generated code for an individual workflow item will typically call back into the engine at some point to execute a graph. The activity graphs are stored in the xml. The graph contains details of which activities are required, how those activities link together, and what dependencies there are between the activities. For each activity it might contain the following information: Each activity in a graph also has a corresponding helper class instance in the generated code. (The name of the class is cAc followed by the activity number, and the exported factory method is fAc followed by the activity number.) These classes implement the interfaces defined in eclhelper.hpp. The engine uses the information from the xml to produce a graph of activities that need to be executed. It has a general purpose implementation of each activity kind, and it uses the class instance to tailor that general activity to the specific use, e.g. what is the filter condition, what fields are set up, what is the sort order? The workunit xml contains details of what inputs can be supplied when that workunit is run. These correspond to STORED definitions in the ECL. The result xml also contains the schema for the results that the workunit will generate. Once an instance of the workunit has been run, the values of the results may be written back into dali's copy of the workunit so they can be retrieved and displayed.

Aims for the generated C++ code:

- Minimal include dependencies. Compile time is an issue - especially for small on-demand queries. To help reduce compile times (and dependencies with the rest of the system) the number of header files included by the generated code is kept to a minimum. In particular references to jlib, boost and icu are kept within the implementation of the runtime functions, and are not included in the public dependencies.
- Thread-safe. It should be possible to use the members of an activity helper from multiple threads without issue. The helpers may contain some context dependent state, so different instances of the helpers are needed for concurrent use from different contexts (e.g., expansions of a graph).
- Concise. The code should be broadly readable, but the variable names etc. are chosen to generate compact code.
- Functional. Generally the generated code assigns to a variable once, and doesn't modify it afterwards. Some assignments may be conditional, but once the variable is evaluated it isn't updated. (There are of course a few exceptions - e.g., dataset iterators.)

First, a few pointers to help understand the code within eclcc:

- It makes extensive use of link counting. You need to understand that concept to get very far (a short sketch of the conventions follows this list).
- If something is done more than once then that is generally split into a helper function. The helper functions aren't generally added to the corresponding interface (e.g., IHqlExpression) because the interface would become bloated. Instead they are added as global functions. The big disadvantage of this approach is that they can be hard to find. Even better would be for them to be rationalised and organised into namespaces.
- The code is generally thread-safe unless there would be a significant performance implication. In general all the code used by the parser for creating expressions is thread safe. Expression graph transforms are thread-safe, and can execute in parallel if a constant (NUM_PARALLEL_TRANSFORMS) is increased. The data structures used to represent the generated code are NOT thread-safe.
- Much of the code generation is structured fairly procedurally, with classes used to process the stages within it. There is a giant "God" class HqlCppTranslator - which could really do with refactoring.
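The link counting mentioned in the first of these points follows a consistent naming convention throughout the code base. The sketch below illustrates it; the wrapper and macro names come from jlib/hqlexpr as described in this document, but treat the exact header and signatures as assumptions rather than a reference:

```cpp
#include "hqlexpr.hpp"   // IHqlExpression, OwnedHqlExpr, LINK() (assumed header, as above)

void linkCountingConventions(IHqlExpression * expr)
{
    // queryXxx() functions return a pointer without giving the caller a link,
    // so there is nothing to release:
    IHqlExpression * body = expr->queryBody();

    // LINK() adds a reference; storing the result in an OwnedHqlExpr releases it
    // automatically when it goes out of scope. This is how most of the compiler
    // holds on to expressions it wants to keep.
    OwnedHqlExpr kept = LINK(body);

    // Ownership of the arguments passed to the createXxx() factory functions is
    // taken over by the newly created node, which is why calls are normally
    // written with LINK() around any argument the caller wants to continue using.
}
```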
The eclcc parser uses the standard tools bison and flex to process the ECL and convert it to an expression graph. There are a couple of idiosyncrasies with the way it is implemented. Macros with fully qualified scope: slightly unusually, macros are defined in the same way that other definitions are - in particular you can have references to macros in other modules. This means that there are references to macros within the grammar file (instead of being purely handled by a pre-processor). It also means the lexer keeps an active stack of macros being processed. Attributes on operators: many of the operators have optional attributes (e.g., KEEP, INNER, LOCAL, ...). If these were all reserved words it would remove a significant number of keywords from use as symbols, and could also mean that when a new attribute was added it broke existing code. To avoid this the lexer looks ahead in the parser tables (by following the potential reductions) to see if the token really could come next. If it can't then it isn't reserved as a symbol.

As the workunit is created the code generator builds up the generated code and the xml for the workunit. Most of the xml generation is encapsulated within the IWorkUnit interface. The xml for the graphs is created in an IPropertyTree, and added to the workunit as a block. The C++ generation is ultimately controlled by some template files (thortpl.cpp). The templates are plain text and contain references to allow named sections of code to be expanded at particular points. The code generator builds up some structures in memory for each of those named sections. Once the generation is complete some peephole optimization is applied to the code. This structure is walked to expand each named section of code as required.

The BuildCtx class provides a cursor into that generated C++. It will either be created for a given named section, or more typically from another BuildCtx. It has methods for adding the different types of statements. Some are simple (e.g., addExpr()), whilst some create a compound statement (e.g., addFilter). The compound statements change the active selector so any new statements are added within that compound statement. As well as building up a tree of expressions, this data structure also maintains a tree of associations. For instance when a value is evaluated and assigned to a temporary variable, the logical value is associated with that temporary. If the same expression is required later, the association is matched, and the temporary value is used instead of recalculating it. The associations are also used to track the active datasets, classes generated for row-meta information, activity classes, etc. Each activity in an expression graph will have an associated class generated in the C++.
Each different activity kind expects a helper that implements a particular IHThorArg interface. E.g., a sort activity of kind TAKsort requires a helper that implements IHThorSortArg. The associated factory function is used to create instances of the helper class. The generated class might take one of two forms: This is a class that is used by the engines to encapsulate all the information about a single row -e.g., the format that each activity generates. It is an implementation of the IOutputMeta interface. It includes functions to The same expression nodes are used for representing expressions in the generated C++ as the original ECL expression graph. It is important to keep track of whether an expression represents untranslated ECL, or the "translated" C++. For instance ECL has 1 based indexes, while C++ is zero based. If you processed the expression x[1] it might get translated to x[0] in C++. Translating it again would incorrectly refer to x[-1]. There are two key classes used while building the C++ for an ECL expression: CHqlBoundExpr. This represents a value that has been converted to C++. Depending on the type, one or more of the fields will be filled in. CHqlBoundTarget. This represents the target of an assignment -C++ variable(s) that are going to be assigned the result of evaluating an expression. It is almost always passed as a const parameter to a function because the target is well-defined and the function needs to update that target. A C++ expression is sometimes converted back to an ECL pseudo-expression by calling getTranslatedExpr(). This creates an expression node of kind no_translated to indicate the child expression has already been converted. The generation code for expressions has a hierarchy of calls. Each function is there to allow optimal code to be generated - e.g., not creating a temporary variable if none are required. A typical flow might be: buildExpr(ctx, expr, bound). Evaluate the ecl expression "expr" and save the C++ representation in the class bound. This might then call through to... buildTempExpr(ctx, expr, bound); Create a temporary variable, and evaluate expr and assign it to that temporary variable.... Which then calls. buildExprAssign(ctx, target, expr); evaluate the expression, and ensure it is assigned to the C++ target "target". The default implementation might be to call buildExpr.... An operator must either be implemented in buildExpr() (calling a function called doBuildExprXXX) or in buildExprAssign() (calling a function called doBuildAssignXXX). Some operators are implemented in both places if there are different implementations that would be more efficient in each context. Similarly there are several different assignment functions: The different varieties are there depending on whether the source value or targets have already been translated. (The names could be rationalised!) Most dataset operations are only implemented as activities (e.g., PARSE, DEDUP). If these are used within a transform/filter then eclcc will generate a call to a child query. An activity helper for the appropriate operation will then be generated. However a subset of the dataset operations can also be evaluated inline without calling a child query. Some examples are filters, projects, and simple aggregation. It removes the overhead of the child query call in the simple cases, and often generates more concise code. 
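The split between buildExpr() and buildExprAssign() is easier to see with a toy model. The sketch below is not eclcc source (the types and operators are invented stand-ins); it simply shows the layering described above: each operator is implemented on one of the two paths, and the generic entry points fall back to the other path via a temporary when needed.

```cpp
#include <string>

enum ToyOp { op_add, op_concat };                  // stand-ins for node_operator values
struct ToyExpr { ToyOp op; };
struct ToyBound { std::string cppText; };          // cf. CHqlBoundExpr: a translated C++ value
struct ToyTarget { std::string cppName; };         // cf. CHqlBoundTarget: a C++ variable to assign
struct ToyCtx { int tempCount = 0; };              // cf. BuildCtx: where statements are emitted

static void buildExprAssign(ToyCtx & ctx, const ToyTarget & target, const ToyExpr & expr);

static void buildExpr(ToyCtx & ctx, const ToyExpr & expr, ToyBound & bound)
{
    switch (expr.op)
    {
    case op_add:                                   // cf. doBuildExprXXX: has a natural expression form
        bound.cppText = "(left + right)";
        return;
    default:                                       // no expression form: evaluate via a temporary,
    {                                              // i.e. the buildTempExpr() step described above
        ToyTarget temp{"temp" + std::to_string(++ctx.tempCount)};
        buildExprAssign(ctx, temp, expr);
        bound.cppText = temp.cppName;
        return;
    }
    }
}

static void buildExprAssign(ToyCtx & ctx, const ToyTarget & target, const ToyExpr & expr)
{
    switch (expr.op)
    {
    case op_concat:                                // cf. doBuildAssignXXX: easier to emit as statements
        // emit statements that append each operand to target.cppName
        return;
    default:                                       // no assignment form: evaluate the expression and
    {                                              // assign the bound result to the target
        ToyBound bound;
        buildExpr(ctx, expr, bound);
        // emit: target.cppName = bound.cppText;
        return;
    }
    }
}
```

The same layering is mirrored when datasets are evaluated inline, as described next.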
When datasets are evaluated inline there is a similar hierarchy of function calls: buildDatasetAssign(ctx, target, expr); evaluate the dataset expression, and assign it to the target (a builder interface). This may then call... buildIterate(ctx, expr); iterate through each of the rows in the dataset expression in turn. Which may then call... buildDataset(ctx, expr, target, format); build the entire dataset, and return it as a single value. Some of the operations (e.g., aggregating a filtered dataset) can be done more efficiently by summing and filtering an iterator, than forcing the filtered dataset to be evaluated first. The interface IHqlCppDatasetCursor allows the code generator to iterate through a dataset, or select a particular element from a dataset. It is used to hide the different representation of datasets, e.g., Generally rows that are serialized (e.g., on disk) are in blocked format, and they are stored as link counted rows in memory. The IReferenceSelector interface and the classes in hqltcppc[2] provide an interface for getting and setting values within a row of a dataset. They hide the details of the layout - e.g., csv/xml/raw data, and the details of exactly how each type is represented in the row.

The current implementation of keys in HPCC uses a format with a separate 8-byte integer field that was historically used to store the file position in the original file. Other complications are that the integer fields are stored big-endian, and signed integer values are biased. This introduces some complication in the way indexes are handled. You will often find that the logical index definition is replaced with a physical index definition, followed by a project to convert it to the logical view. A similar process occurs for disk files to support VIRTUAL(FILEPOSITION) etc.

The following are the main directories used by the ecl compiler:

| Directory | Contents |
| --- | --- |
| rtl/eclrtpl | Template text files used to generate the C++ code |
| rtl/include | Headers that declare interfaces implemented by the generated code |
| common/deftype | Interfaces and classes for scalar types and values |
| common/workunit | Code for managing the representation of a work unit |
| ecl/hql | Classes and interfaces for parsing and representing an ECL expression graph |
| ecl/hqlcpp | Classes for converting an expression graph to a work unit (and C++) |

As mentioned at the start of this document, one of the main challenges with eclcc is converting the declarative ECL code into imperative C++ code. The direction we are heading in is to allow the engines to support more lazy-evaluation, so possibly in this instance to evaluate it the first time it is used (although that may potentially be much less efficient). This will allow the code generator to relax some of its current assumptions. There are several example queries which are already producing pathological behaviour from eclcc, causing it to generate C++ functions which are many thousands of lines long. Currently the grammar for the parser is too specialised. In particular the separate productions for expressions, datasets and actions cause problems - e.g., it is impossible to properly allow sets of datasets to be treated in the same way as other sets. The semantic checking (and probably semantic interpretation) is done too early. Really the parser should build up a syntax tree, and then disambiguate it and perform the semantic checks on the syntax tree. The function calls should probably be expanded later than they are. I have tried in the past and hit problems, but I can't remember all the details. Some are related to the semantic checking.
The Code Submissions document is aimed at developers that are submitting PRs. This document describes some of the goals and expectations for code reviewers. Code reviews have a few different goals: It is NOT a goal to change the submission until it matches how the reviewer would have coded it. Some general comments on code reviews: Not all code reviews need to be equally strict. The "strictness" of the review should reflect the importance and location of the change. Some examples: What are some examples of checks to bear in mind when reviewing code? General: Content: When reading comments in a review it can sometimes be hard to know why the reviewer made a comment, or what response is expected. If there is any doubt the contributor should ask. However, to make it clearer, we are aiming to always add a tag to the front of each review comment. The tag will give an indication of why the comment is being made, its severity and what kind of response is expected. Here is a provisional table of tags: The comments should always be constructive. The reviewer should have a reason for each of them, and be able to articulate the reason in the comment or when asked. "I wouldn't have done it like that" is not a good enough reason on its own! Similarly there is a difference in opinion within the team on some style issues - e.g. standard libraries or jlib, inline or out of line functions, nested or non-nested classes. Reviews should try to avoid commenting on these unless there is a clear reason why they are significant (functionality, efficiency, compile time), and if so spell it out. Code reviewers should discuss any style issues that they consider should be universally adopted that are not in the style guide.
We welcome submissions to the platform, especially in the form of pull requests into the HPCC-Systems github repository. The following describes some of the processes for merging PRs. There are a few things that should be considered when creating a PR to increase the likelihood that it can be accepted quickly. All pull requests should be reviewed by someone who is not the author before merging. Complex changes, changes that require input from multiple experts, or changes that have implications throughout the system should be reviewed by multiple reviewers. This should include someone who is responsible for merging changes for that part of the system. (Unless it is a simple change written by someone with review rights.) Contributors should use the github reviewers section on the PR to request reviews. After a contributor has pushed a set of changes in response to a review, they should refresh the github review status, so the users are notified it is ready for re-review. When the review is complete, a person with responsibility for merging changes to that part of the system should be added as a reviewer (or refreshed), with a comment that it is ready to merge. Reviewers should check for PRs that are ready for their review via github's webpage (filter "review-requested:<reviewer-id>") or via the github CLI (e.g. gh pr status). Contributors should similarly ensure they stay up to date with any comments or requests for change on their submissions. The Version support document contains details of the different versions that are supported, and which version should be targeted for different kinds of changes. Occasionally earlier branches will be chosen (e.g. security fixes to even older versions), but they should always be carefully discussed (and documented). Changes will always be upmerged into the next point release for all the more recent major and minor versions (and master).
Some basic guidelines to ensure your documentation works well with VitePress.

Documents can be located anywhere in the repository folder structure. If it makes sense to have documentation "close" to specific components, then it can be located in the same folder as the component. For example, any developer documentation for specific plugins can be located in those folders. If this isn't appropriate then the documentation can be located in the

WARNING There is an exclusion list in the

Documentation is written in Markdown. This is a simple format that is easy to read and write. It is also easy to convert to other formats, such as HTML, PDF, and Word. Markdown is supported by many editors, including Visual Studio Code, and is supported by VitePress.

TIP VitePress extends Markdown with some additional features, such as custom containers; it is recommended that you refer to the VitePress documentation for more details.

To assist with the writing of documentation, VitePress can be used to render the documentation locally. This allows you to see how the documentation will look when it is published. To start the local development server you need to type the appropriate commands in the root HPCC-Platform folder (see the example below). This will start a local development server and display the URL that you can use to view the documentation. The default URL is http://localhost:5173/HPCC-Platform, but it may be different on your machine. The server will automatically reload when you make changes to the documentation.

WARNING The first time you start the VitePress server it will take a while to complete. This is because it is locating all the markdown files in the repository and creating the HTML pages. Once it has completed this step once, it will be much faster to start the server again.

To add a new document, you need to add a new markdown file to the repository. The file should be named appropriately and have the

To add a new document to the sidebar, you need to add an entry to the

TIP You can find more information on the config.js file in the VitePress documentation.

The content of the main landing page is located in
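A minimal sketch of the local preview workflow described above, assuming the repository uses the standard VitePress npm scripts; the docs:dev script name is an assumption, so check package.json if it differs:

```bash
# From the root of the HPCC-Platform clone
npm install          # install VitePress and the other documentation dependencies
npm run docs:dev     # start the local dev server (assumed script name; check package.json)
```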
The most up to date details of building the system are found on the HPCC Wiki.

The HPCC Platform sources are hosted on GitHub. You can download a snapshot of any branch using the download button there, or you can set up a git clone of the repository. If you are planning to contribute changes to the system, see the CONTRIBUTORS document for information about how to set up a GitHub fork of the project through which pull-requests can be made.

The HPCC platform requires a number of third party tools and libraries in order to build. The HPCC Wiki contains the details of the dependencies that are required for different distributions. For building any documentation, the following are also required: NOTE: Installing the above via alternative methods (i.e. from source) may place installations outside of searched paths.

The HPCC system is built using the cross-platform build tool cmake, which is available for Windows, virtually all flavors of Linux, FreeBSD, and other platforms. You should install cmake version 2.8.3 or later before building the sources. On some distros you will need to build cmake from source if the version of cmake in the standard repositories for that distro is not modern enough.

It is good practice in cmake to separate the build directory, where objects and executables are made, from the source directory, and the HPCC cmake scripts will enforce this. To build the sources, create a directory where the built files should be located, and from that directory run cmake against the source directory (see the example below). Depending on your operating system and the compilers installed on it, this will create a makefile, Visual Studio .sln file, or other build script for building the system. If cmake was configured to create a makefile, then you can build simply by typing make. If a Visual Studio solution file was created, you can load it simply by typing its name; this will load the solution in Visual Studio where you can build in the usual way.

To make an installation package on a supported Linux system, use the make package command. This will first do a make to ensure everything is up to date, then will create the appropriate package for your operating system. Currently supported package formats are rpm (for RedHat/CentOS) and .deb (for Debian and Ubuntu). If the operating system is not one of the above, or is not recognized, make package will create a tarball.
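A hedged sketch of the out-of-source build and packaging flow described above; the directory names are illustrative:

```bash
# Out-of-source build: keep objects and executables separate from the sources
git clone https://github.com/hpcc-systems/HPCC-Platform.git
mkdir hpcc-build && cd hpcc-build
cmake ../HPCC-Platform      # generates a makefile (or a .sln file on Windows)
make -j"$(nproc)"           # build the system
make package                # produce an .rpm/.deb (or tarball) for this distro
```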
The package installation does not start the service on the machine, so if you want to give it a go or test it (see below), make sure to start the service manually and wait until all services are up (mainly wait for EclWatch to come up on port 8010).

After compiling, installing the package and starting the services, you can test the HPCC platform on a single-node setup.

Some components have their own unit tests. Once you have compiled (there is no need to start the services), you can already run them from the build directory, supposing you built a Debug version. You can also run the Dali regression self-tests.

MORE Completely out of date - needs rewriting.

The ECLCC compiler tests rely on two distinct runs: a known good one and your test build. For normal development, you can safely assume that the OSS/master branch in GitHub is good. For overnight testing, golden directories need to be maintained according to the test infrastructure. There are Bash (Linux) and Batch (Windows) scripts to run the regressions. The basic idea behind these tests is to compare the output files (logs and XML files) between runs. The log files should change slightly (the comparison should be good enough to filter out most irrelevant differences), but the XML files should be identical if nothing has changed. You should only see differences in the XML where you have changed the code, or where new tests were added as part of your development.

On Linux, there are three steps:

Step 1: Check out OSS/master, compile, and run the regressions to populate the 'golden' directory. This will run the regressions in parallel, using as many CPUs as you have, and using your just-compiled ECLCC, assuming you compiled a Debug version.

Step 2: Make your changes (or check out your branch), compile, and run again, this time outputting to a new directory and comparing against the 'golden' repo. This will run the regressions in the same way, output to the 'my_branch' dir, and compare it to the golden version, highlighting the differences. NOTE: If you changed headers that the compiled binaries will use, you must re-install the package (or provide the -i option to the script to point at the new headers).

Step 3: Step 2 only listed the differences; now you need to see what they are. For that, re-run the regression script omitting the compiler, since the only thing we'll do is compare verbosely. This will show you all the differences, using the same ignore filters as before, between your two branches. Once you're happy with the differences, commit and issue a pull-request.

TODO: Describe compiler tests on Windows.

On Linux systems, the makefile generated by cmake will build a specific version (debug or release) of the system depending on the options selected when cmake is first run in that directory. The default is to build a release system. In order to build a debug system instead, pass the appropriate build-type option to cmake when configuring (see the example below). You can then run make or make package in the usual way to build the system. On a Windows system, cmake always generates a solution file with both debug and release target platforms in it, so you can select which one to build within Visual Studio.
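For a debug build, the usual cmake build-type convention applies; a sketch, assuming the same out-of-source layout as above:

```bash
# Configure a Debug build in a separate build directory
mkdir hpcc-build-debug && cd hpcc-build-debug
cmake -DCMAKE_BUILD_TYPE=Debug ../HPCC-Platform
make -j"$(nproc)"
```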
Version 8.4 of the HPCC platform allows package files to define dependencies between git repositories, and also allows you to compile directly from a git repository. There are no further requirements if the repositories are public, but private repositories have the additional complication of supplying authentication information. Git provides various methods for providing the credentials. The following are the recommended approaches for configuring the credentials on a local development system interacting with GitHub.

ssh keys: in this scenario, the ssh key associated with the local developer machine is registered with the GitHub account. For more details see https://docs.github.com/en/authentication/connecting-to-github-with-ssh/about-ssh. This is used when the GitHub reference is of the form ssh://github.com. The ssh key can be protected with a passcode, and there are various options to avoid having to enter the passcode each time. It is preferable to use the https:// protocol instead of ssh:// for links in package-lock.json files; if ssh:// is used, it requires any machine that processes the dependency to have access to a registered ssh key.

GitHub CLI: download the GitHub command line tool (https://github.com/cli/cli). You can then use it to authenticate all git access. This is probably the simplest option if you are using GitHub. More details are found at https://cli.github.com/manual/gh_auth_login.

Personal access tokens: these are similar to a password, but with additional restrictions on their lifetime and the resources that can be accessed. Details on how to create them are found at https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token. These can then be used with the various git credential caching options; see https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage and the example below.
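Two hedged examples of the approaches above: authenticating through the GitHub CLI, and caching a personal access token with a git credential helper (the helper and timeout choices are illustrative):

```bash
# Option 1: authenticate git access through the GitHub CLI
gh auth login

# Option 2: cache a personal access token with a git credential helper
git config --global credential.helper 'cache --timeout=3600'
# The token is entered once as the password on the next https:// clone/fetch
git clone https://github.com/hpcc-systems/HPCC-Platform.git
```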
All of the options above are likely to involve some user interaction - passphrases for ssh keys, web interaction with GitHub authentication, and initial entry for cached access tokens. This is problematic for eclccserver, which cannot support user interaction, and it is preferable not to pass credentials around. The solution is to use a personal access token securely stored as a secret. (This would generally be associated with a special service account.) This avoids the need to pass credentials and allows the keys to be rotated. The following describes the support in the different versions.

In Kubernetes you need to take the following steps:

a) add the gitUsername property to the eclccserver component in the values.yaml file;

b) add a secret to the values.yaml file, with a key that matches the username (note: this cannot currently use a vault - that probably needs rethinking, possibly by extracting the value from the secret and supplying it as an optional environment variable to be picked up by the bash script);

c) add a secret to Kubernetes containing the personal access token.

When a query is submitted to eclccserver, any git repositories are accessed using that user name and password.

Bare-metal systems require some similar configuration steps:

a) define the environment variable HPCC_GIT_USERNAME;

b) store the access token in /opt/HPCCSystems/secrets/git/$HPCC_GIT_USERNAME/password (see the example below).
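A sketch of the two configurations described above. The secret name and user name are illustrative assumptions, and the values.yaml entries that reference them are not reproduced here, so check the helm chart documentation for the exact wiring:

```bash
# Kubernetes: store the personal access token in a secret (names are illustrative)
kubectl create secret generic my-git-secret --from-literal=password=<personal-access-token>

# Bare-metal: expose the user name and store the token where eclccserver expects it
export HPCC_GIT_USERNAME=hpcc-git-user
sudo mkdir -p /opt/HPCCSystems/secrets/git/$HPCC_GIT_USERNAME
echo -n '<personal-access-token>' | sudo tee /opt/HPCCSystems/secrets/git/$HPCC_GIT_USERNAME/password
```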
This document covers the main steps taken by the LDAP Security Manager during initialization. It is important to note that the LDAP Security Manager uses the LDAP protocol to access an Active Directory (AD). The AD is the store for users, groups, permissions, resources, and more. The term LDAP is generally overused to refer to both.

Each service and/or component using the LDAP security manager gets its own instance of the security manager. This includes a unique connection pool (see below). All operations described apply to each LDAP instance. The following sections cover the main steps taken during initialization.

The following items are loaded from the configuration:

The LDAP Security Manager supports using multiple ADs. The FQDN or IP address of each AD host is read from configuration data and stored internally. The source is a comma separated list stored in the ldapAddress config value. Each entry is added to a pool of ADs. Note that all ADs are expected to use the same credentials and have the same configuration.

AD credentials consist of a username and password. The LDAP security manager uses these to perform all operations on behalf of users and components in the cluster. There are three potential sources for credentials. As stated above, when multiple ADs are in use, the configuration of each must be the same; this includes credentials.

During initialization, the security manager iterates through the set of defined ADs until it is able to connect and retrieve information from an AD. Once retrieved, the information is used for all ADs (see the statement above about all ADs being the same). The accessed AD is marked as the current AD and no other ADs are accessed during initialization. The retrieved information is used to verify the AD type so the security manager can adjust for variations between types. Additionally, defined DNs may be adjusted to match AD type requirements.

The manager handles connections to an AD in order to perform required operations. It is possible that values such as permissions and resources may be cached to improve performance. The LDAP security manager maintains a pool of LDAP connections. The pool is limited in size to maxConnections from the configuration. The connection pool starts empty. As connections are created, each is added to the pool until the max allowed is reached.
The following process is used when an LDAP connection is needed. First, the connection pool is searched for a free connection. If one is found and valid, the connection is returned. A connection is considered free if no one is using it, and valid if the AD can be accessed. If no valid free connections are found, a new uninitialized connection is created. For a new connection, an attempt is made to connect to each AD, starting with the current one. See Handling AD Hosts below for how ADs are cycled when a connection fails. For each AD, as it cycles through, connection attempts are retried with a short delay between each. If unable to connect, the AD host is marked rejected and the next is attempted.

Once a new connection has been established, if the max number of connections has not yet been reached, the connection is added to the pool. It is important to note that if the pool has reached its max size, new connections will continue to be made, but are not saved in the pool. This allows the pool to maintain a steady working state, while still allowing for higher demand. Connections not saved to the pool are deleted once they are no longer in use.

The manager keeps a list of AD hosts and the index of the current host. The current host is used for all AD operations until there is a failure. At that time the manager marks the host as "rejected" and moves to the next host using a round-robin scheme.
This memory manager started life as the memory manager which was only used for the Roxie engine. It had several original design goals: (Note that efficient usage of memory does not appear on that list - the expectation when the memory manager was first designed was that Roxie queries would use minimal amounts of memory and speed was more important. Some subsequent changes, e.g. packed heaps and configurable bucket sizes, help mitigate that.)

The basic design is to reserve (but not commit) a single large block of memory in the virtual address space. This memory is subdivided into "pages". (These are not the same as the OS virtual memory pages. The memory manager pages are currently defined as 1MB in size.) The system uses a bitmap to indicate whether each page from the global memory has been allocated. All active IRowManager instances allocate pages from the same global memory space. To help reduce fragmentation, allocations for single pages are fulfilled from one end of the address space, while allocations for multiple pages are fulfilled from the other.

IRowManager provides the primary interface for allocating memory. The size of a requested allocation is rounded up to the next "bucket" size, and the allocation is then satisfied by the heap associated with that bucket size. Different engines can specify different bucket sizes - an optional list is provided to setTotalMemoryLimit. Roxie tends to use fewer buckets to help reduce the number of active heaps. Thor uses larger numbers since it is more important to minimize the memory wasted. Roxie uses a separate instance of IRowManager for each query; this provides the mechanism for limiting how much memory a query uses. Thor uses a single instance of an IRowManager for each slave/master.

Memory is allocated from a set of "heaps", where each heap allocates blocks of memory of a single size.
The heap exclusively owns a set of heaplets (each one page in size), which are held in a doubly linked list, and sub-allocates memory from those heaplets. Information about each heaplet is stored in the base of the page (using a class with virtual functions), and the address of an allocated row is masked to determine which heap object it belongs to, and how it should be linked/released etc. Any pointer not in the allocated virtual address range (e.g., constant data) can be linked/released with no effect.

Each heaplet contains a high water mark of the address within the page that has already been allocated (freeBase), and a lockless singly-linked list of rows which have been released (r_block). Releasing a row is non-blocking and does not involve any spin locks or critical sections. However, this means that empty pages need to be returned to the global memory pool at another time. (This is done in releaseEmptyPages().)

When the last row in a page is released, a flag (possibleEmptyPages) is set in its associated heap.
* This is checked before trying to free pages from a particular heap, avoiding waiting on a lock and traversing a candidate list.

Any page which might contain some spare memory is added to a lockless spare memory linked list.
* Items are popped from this list when a heap fails to allocate memory from the current heaplet. Each item is checked in turn to see if it has space before a new heaplet is allocated.
* The list is also walked when checking to see which pages can be returned to the global memory.

The doubly linked heaplet list allows efficient freeing.

Each allocation has a link count and an allocator id associated with it. The allocator id represents the type of the row, and is used to determine what destructor needs to be called when the row is destroyed. (The count for a row also contains a flag in the top bit to indicate if it is fully constructed, and therefore valid for the destructor to be called.)

A specialized heap is used to manage all allocations that require more than one page of memory. These allocations are not held on a free list when they are released; each is returned directly to the global memory pool. Allocations in the huge heap can be expanded and shrunk using the resizeRow() functions - see below.

By default a fixed size heap rounds the requested allocation size up to the next bucket size. A packed heap changes this behaviour: the size is rounded up to the next 4 byte boundary instead. This reduces the amount of memory wasted for each row, but potentially increases the number of distinct heaps.

By default all fixed size heaps of the same size are shared. This reduces the memory consumption, but if the heap is used by multiple threads it can cause significant contention. If a unique heap is specified then it will not be shared with any other requests. Unique heaps store information about the type of each row in the heaplet header, rather than per row, which reduces the allocation overhead for each row. (Note to self: Is there ever any advantage having a heap that is unique but not packed?)

Blocked is an option on createFixedRowHeap() to allocate multiple rows from the heaplet, and then return the additional rows on subsequent calls. It is likely to reduce the average number of atomic operations required for each row being allocated, but the heap that is returned can only be used from a single thread because it is not thread safe.

By default the heaplets use a lock free singly linked list to keep track of rows that have been freed.
This requires an atomic operation for each allocation and for each free. The scanning allocator uses an alternative approach. When a row is freed the row header is marked, and a row is allocated by scanning through the heaplet for rows that have been marked as free. Scanning uses atomic store and get, rather than more expensive synchronized atomic operations, so it is generally faster than the linked list - provided a free row is found fairly quickly. The scanning heaps have better best-case performance, but worse worst-case performance (if large numbers of rows need to be scanned before a free row is found). The best case tends to apply if only one thread/activity is accessing a particular heap, and the worst case if multiple activities are accessing a heap, particularly if the rows are being buffered. It is the default for Thor, which tends to have few active allocators, but not for Roxie, which tends to have large numbers of allocators.

Another variation on the scanning allocator further reduces the number of atomic operations. Usually when a row is allocated the link count on the heaplet is increased, and when it is freed the link count is decremented. This option delays decrementing the link count when the row is released, by marking the row with a different free flag. If it is subsequently reallocated there is no need to increment the link count. The downside is that it is more expensive to check whether a heaplet is completely empty (since you can no longer rely on the heaplet link count alone).

Thor has additional requirements beyond Roxie. In Roxie, if a query exceeds its memory requirements then it is terminated. Thor needs to be able to spill rows and other memory to disk and continue. This is achieved by allowing any process that stores buffered rows to register a callback with the memory manager. When more memory is required these callbacks are called to free up memory, and allow the job to continue. Each callback can specify a priority - lower priority callbacks are called first since they are assumed to have a lower cost associated with spilling. When more memory is required the callbacks are called in priority order until one of them succeeds. They can also be passed a flag to indicate it is critical, to force them to free up as much memory as possible.

There are several different complications involved with the memory spilling: Some rules to follow when implementing callbacks:

A callback cannot allocate any memory from the memory manager; if it does it is likely to deadlock. You cannot allocate memory while holding a lock if that lock is also required by a callback; again this will cause deadlock. If that proves impossible, you can use a try-lock primitive in the callback, but it means you won't be able to spill those rows. If the heaps are fragmented it may be more efficient to repack the heaps than spill to disk. If you're resizing a potentially big block of memory, use the resize function with the callback.

Some of the memory allocations cover more than one "page" - e.g., arrays used to store blocks of rows. (These are called huge pages internally, not to be confused with operating system support for huge pages...) When one of these memory blocks needs to be expanded you need to be careful:

Occasionally you have processes which read a large number of rows and then filter them so only a few are still held in memory. Rows tend to be allocated in sequence through the heap pages, which can mean those few remaining rows are scattered over many pages.
If they could all be moved to a single page it would free up a significant amount of memory. The memory manager contains a function to pack a set of rows into a smaller number of pages: IRowManager->compactRows(). This works by iterating through each of the rows in a list. If a row belongs to a heap that could be compacted, and isn't part of a full heaplet, then the row is moved. Since subsequent rows tend to be allocated from the same heaplet, this has the effect of compacting the rows.

Much of the time Thor doesn't use the full memory available to it. If you are running multiple Thor processes on the same machine you may want to configure the system so that each Thor has a private block of memory, but there is also a shared block of memory which can be used by whichever process needs it. The ILargeMemCallback provides a mechanism to dynamically allocate more memory to a process as it requires it. This could potentially be done in stages rather than all or nothing. (Currently unused as far as I know... the main problem is that borrowing memory needs to be coordinated.)

When OS processes use a large amount of memory, mapping virtual addresses to physical addresses can begin to take a significant proportion of the execution time. This is especially true once the TLB is not large enough to store all the mappings. Huge pages can significantly help with this problem by reducing the number of TLB entries needed to cover the virtual address space. The memory manager supports huge pages in two different ways (an illustrative shell example follows at the end of this section):

Huge pages can be preallocated (e.g., with hugeadm) for exclusive use as huge pages. If huge pages are enabled for a particular engine, and sufficient huge pages are available to supply the memory for the memory manager, then they will be used.

Linux kernels from 2.6.38 onward have support for transparent huge pages. These do not need to be preallocated; instead the operating system tries to use them behind the scenes. HPCC version 5.2 and later takes advantage of this feature to significantly speed up memory access when large amounts of memory are used by each process.

Preallocated huge pages tend to be more efficient, but they have the disadvantage that the operating system currently does not reuse unused huge pages for other purposes, e.g., disk cache. There is also a memory manager option to not return the memory to the operating system when it is no longer required. This has the advantage of not clearing the memory whenever it is required again, but has the same disadvantage as preallocated huge pages: the unused memory cannot be used for disk cache. We recommend this option is selected when preallocated huge pages are in use - until the kernel allows them to be reused.

Changes in 6.x allow Thor to run multiple channels within the same process. This allows data that is constant for all channels to be shared between all slave channels - a prime example is the rhs of a lookup join. For the queries to run efficiently the memory manager needs to ensure that each slave channel has the same amount of memory - especially when memory is being used that is shared between them. createGlobalRowManager() allows a single global row manager to be created which also provides slave row managers for the different channels via the querySlaveRowManager(unsigned slave) method.
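The huge-page mechanisms mentioned above can be inspected and configured from the shell. A hedged sketch, not specific to HPCC configuration; the page counts are arbitrary examples and should follow your deployment's sizing:

```bash
# Transparent huge pages: check the current kernel policy
cat /sys/kernel/mm/transparent_hugepage/enabled

# Preallocated huge pages: reserve a pool with hugeadm (size:count is an arbitrary example)
sudo hugeadm --pool-pages-min 2MB:1024

# ...or reserve pages directly via sysctl
sudo sysctl -w vm.nr_hugepages=1024
```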
This document describes the design of a metrics framework that allows HPCC Systems components to implement a metric collection strategy. Metrics provide the following functionality:

* Alerts and monitoring: An important DevOps function is to monitor the cluster and provide alerts when problems are detected. Aggregated metric values from multiple sources provide the necessary data to build a complete picture of cluster health that drives monitoring and alerts.
* Scaling: As described above, aggregated metric data is also used to dynamically respond to changing cluster demands and load. Metrics provide the monitoring capability to react and take action.
* Fault diagnosis and resource monitoring: Metrics provide historical data useful in diagnosing problems by profiling how demand and usage patterns may change prior to a fault. Predictive analysis can also be applied.
* Analysis of jobs/workunits and profiling: With proper instrumentation, a robust dynamic metric strategy can track workunit processing. Internal problems with queries should be diagnosed from deep drill-down logging.

The document consists of several sections in order to provide requirements as well as the design of framework components. Some definitions are useful:

* Metric: A measurement defined by a component that represents an internal state that is useful in a system reliability engineering function. In the context of the framework, a metric is an object representing the above.
* Metric Value: The current value of a metric.
* Metric Updating: The component task of updating metric state.
* Collection: A framework process of selecting relevant metrics based on configuration and then retrieving their values.
* Reporting: A framework process of converting values obtained during a collection into a format suitable for ingestion by a collection system.
* Trigger: What causes the collection of metric values.
* Collection System: The store for metric values generated during the reporting framework process.

This section describes how components expect to use the framework. It is not a complete list of all requirements, but rather a sample.

Roxie desires to keep a count of many different internal values. Some examples are:

* Disk-type operations such as seeks and reads
* Execution totals - items such as total numbers of successes and failures, as well as breaking some counts down into individual reasons (for example, failures may need to be categorized), or even by priority (high, low, sla, etc.)
* Current operational levels, such as the length of internal queues
* The latency of operations such as queue results, agent responses, and gateway responses

Roxie also has the need to track internal memory usage beyond the pod/system level capabilities. Tracking the state of its large fixed memory pool is necessary. The Roxie buddy system also must track how often and who is completing requests. The "I Beat You To It" set of metrics must be collected and exposed in order to detect pending node failure. While specific action on these counts is not known up front, it appears clear that these values are useful and should be collected. There does not appear to be a need for creating and destroying metrics dynamically. The set of metrics is most likely to be created at startup and remain active through the life of the Roxie. If, however, stats collection seeps into the metrics framework, dynamic creation and destruction of stats metrics is a likely requirement.

There are some interesting decisions with respect to ESP and collection of metrics. Different applications within ESP present different use cases for collection. Ownership of a given task drives some of these use cases. Take workunit queues: if ownership of the task, with respect to metrics, is WsWorkunits, then the use cases are centric to that component. However, if agents listening on the queue are to report metrics, then a different set of use cases emerges. It is clear that additional work is needed to establish clear ownership of metrics gathered by ESP and/or the tasks it performs.

ESP needs to report the activeTransactions value from the TxSummary class(es). This gives an indication of how busy the ESP is in terms of client requests. Direct measurement of response time in requests may not be useful, since the type of request causes different execution paths within ESP that are expected to take widely varying amounts of time. Creation of metrics for each method is not recommended. However, two possible solutions are to a) create a metric for request types, or b) use a histogram to measure response time ranges. Another option mentioned redefines the meaning of a bucket in a histogram: instead of a numeric distribution, each bucket represents a unique subtask within an overall "metric" representing a measured operation. This should be explored, whether for operational or developmental purposes.

For tracking specific queries and their health, the feeling is that logging can accomplish this better than metrics, since the list of queries to monitor will vary between clusters. Additionally, operational metrics solving the cases mentioned above will give a view into the overall health of ESP, which will affect the execution of queries. Depending on actions taken by these metrics, scaling may solve overload conditions to keep cluster responsiveness acceptable.
For Roxie, a workunit operates as a service. Measuring service performance using a histogram to capture response times as a distribution may be appropriate. Extracting the 95th percentile of response time may be useful as well. There are currently no use cases requiring consistency between values of different metrics.

At this time the only concrete metric identified is the number of requests received. As the framework design progresses and ESP is instrumented, the list will grow. From information gathered, Dali plans to keep counts and rates for many of the items it manages.

This section covers the design and architecture of the framework. It discusses the main areas of the design, the interactions between each area, and an overall process model of how the framework operates. The framework consists of three major areas: metrics, sinks, and the glue logic. These areas work together with the platform and the component to provide a reusable metrics collection function.

Metrics represent the quantifiable component state measurements used to track and assess the status of the component. Metrics are typically scalar values that are easily aggregated by a collection system. Aggregated values provide the necessary input to take component and cluster actions such as scaling up and down. The component is responsible for creating metrics and instrumenting the code. The framework provides the support for collecting and reporting the values. In addition, the framework provides the support for retrieving values so that the component does not participate in metric reporting. The component simply creates the metrics it needs, then instruments the component to update the metric whenever its state changes. For example, the component may create a metric that counts the total number of requests received. Then, wherever the component receives a request, a corresponding update to the count is added. Nowhere in the component is any code added to retrieve the count; that is handled by the framework.

Sinks provide a pluggable interface to hide the specifics of collection systems so that the metrics framework is independent of those dependencies.

The third area of the framework is the glue logic, referred to as the MetricsManager. It manages the metrics system for the component.

The framework is designed to be instantiated into a component as part of its process and address space. All objects instantiated as part of the framework are owned by the component and are not shareable with any other component, whether local or remote. Any coordination or consistency requirements that may arise in the implementation of a sink shall be the sole responsibility of the sink. The framework is implemented within jlib. The following sections describe each area of the framework.

Components use metrics to measure their internal state. Metrics can represent everything from the number of requests received to the average length of time some value remains cached. Components are responsible for creating and updating metrics for each measured state. The framework shall provide a set of metrics designed to cover the majority of component measurement requirements. All metrics share a common interface to allow the framework to manage them in a common way. To meet the requirement to manage metrics independent of the underlying metric state, all metrics implement a common interface. All metrics then add their specific methods to update and retrieve internal state.
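A minimal sketch of what such a common interface might look like follows. IMetric is the interface name used later in this document; the methods, the MetricMetaData type, and the example counter are assumptions for illustration rather than the actual jlib declarations.

```cpp
#include <atomic>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hedged sketch of the common metric interface; the real jlib definitions differ.
using MetricMetaData = std::vector<std::pair<std::string, std::string>>;

struct IMetric
{
    virtual ~IMetric() = default;
    virtual const std::string & queryName() const = 0;         // base name, e.g. "requests.received"
    virtual const MetricMetaData & queryMetaData() const = 0;  // key/value qualifiers
    virtual uint64_t queryValue() const = 0;                    // read by the framework during collection
};

// Each metric type layers its own update methods on top of the common interface.
class ExampleCounter : public IMetric
{
public:
    explicit ExampleCounter(std::string _name) : name(std::move(_name)) {}
    void inc(uint64_t delta = 1) { value += delta; }            // component-side update
    const std::string & queryName() const override { return name; }
    const MetricMetaData & queryMetaData() const override { return metaData; }
    uint64_t queryValue() const override { return value.load(); }   // framework-side retrieval
private:
    std::string name;
    MetricMetaData metaData;
    std::atomic<uint64_t> value{0};
};
```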
Generally the component uses the update method(s) to update state and the framework uses retrieval methods to get current state when reporting. The metric ensures synchronized access.

For components that already have an implementation that tracks a metric, the framework provides a way to instantiate a custom metric. The custom metric allows the component to leverage the existing implementation and give the framework access to the metric value for collection and reporting. Note that custom metrics only support simple scalar metrics such as a counter or a gauge.

The framework defines a sink interface to support the different requirements of collection systems. Examples of collection systems are Prometheus, Datadog, and Elasticsearch. Each has different requirements for how and when measurements are ingested. Sinks are responsible for two main functions: initiating a collection and reporting measurements to the collection system. The Metrics Reporter provides the support to complete these functions. The sink encapsulates all of the collection system requirements, providing a pluggable architecture that isolates components from these differences. The framework supports multiple sinks concurrently, each operating independently. Instrumented components are not aware of the sink or sinks in use. Sinks can be changed without requiring changes to a component. Therefore, components are independent of the collection system(s) in use.

The metrics reporter class provides all of the common functions to bind together the component, the metrics it creates, and the sinks to which measurements are reported.

The sections that follow discuss metric implementations.

A counter metric is a monotonically increasing value that "counts" the total occurrences of some event. Examples include the number of requests received, or the number of cache misses. Once created, the component instruments the code with updates to the count whenever appropriate.

A gauge metric is a continuously updated value representing the current state of an interesting value in the component - for example, the amount of memory used in an internal buffer, or the number of requests waiting on a queue. A gauge metric may increase or decrease in value as needed. Reading the value of a gauge is a stateless operation in that there are no dependencies on the previous reading; the value returned shall always be the current state. Once created, the component shall update the gauge anytime the state of what is measured changes. The metric shall provide methods to increase and decrease the value. The sink reads the value during collection and reporting.

A custom metric is a class that allows a component to leverage existing metrics. The component creates an instance of a custom metric (a templated class) and passes a reference to the underlying metric value. When collection is performed, the custom metric simply reads the value of the metric using the reference provided during construction. The component maintains full responsibility for updating the metric value, as the custom metric class provides no update methods. The component is also responsible for ensuring atomic access to the value if necessary.
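A hedged sketch of the gauge and custom metric patterns just described; the class names are illustrative stand-ins, not the actual jlib metric classes.

```cpp
#include <atomic>
#include <cstdint>

// Hedged sketch; class names are illustrative, not the actual jlib metric classes.
class GaugeSketch
{
public:
    void adjust(int64_t delta) { value += delta; }       // may increase or decrease
    int64_t queryValue() const { return value.load(); }  // stateless read of the current state
private:
    std::atomic<int64_t> value{0};
};

// A custom metric keeps only a reference to state the component already maintains.
// It has no update methods: the component updates the underlying value itself and is
// responsible for any atomicity requirements.
template <typename T>
class CustomMetricSketch
{
public:
    explicit CustomMetricSketch(const T & existing) : valueRef(existing) {}
    uint64_t queryValue() const { return static_cast<uint64_t>(valueRef); }   // read at collection time
private:
    const T & valueRef;
};

// Usage idea: wrap a value the component already tracks, e.g. a queue length.
// unsigned queueLength = 0;                          // pre-existing component state
// GaugeSketch waitingRequests;                       // updated via adjust(+1)/adjust(-1)
// CustomMetricSketch<unsigned> queueMetric(queueLength);
```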
A histogram metric records counts of measurements according to defined bucket limits. When created, the caller defines a set of bucket limits. During event recording, the component records measurements. The metric separates each recorded measurement into its bucket by testing the measurement value against each bucket limit using a less-than-or-equal test. Each bucket contains a count of the measurements meeting that criterion. Additionally, the metric maintains a default bucket for measurements outside of the maximum bucket limit; this is sometimes known as the "inf" bucket. Some storage systems, such as Prometheus, require each bucket to accumulate its measurements with the previous bucket(s). It is the responsibility of the sink to accumulate values as needed.

A scaled histogram is a histogram metric that allows the bucket limit units to be set in one domain while measurements are taken in another domain. For example, the bucket limits may represent millisecond durations, yet it is more efficient to use execution cycles to take the measurements. A scaled histogram converts from the measurement domain (cycles) to the limit units domain using a scale factor provided at initialization. All conversions are encapsulated in the scaled histogram class such that no external scaling is required by any consumer such as a sink.

This section discusses configuration. Helm charts are capable of combining configuration data at a global level into a component's specific configuration; the combined configuration is described below. Note that as the design progresses it is expected that there will be additions.

Where (based on being a child of the current component):

metrics : Metrics configuration for the component
metrics.sinks : List of sinks defined for the component (may have been combined with global config)
metrics.sinks[].type : The type for the sink. The type is substituted into the following pattern to determine the lib to load: libhpccmetrics<type><shared_object_extension>
metrics.sinks[].name : A name for the sink.
metrics.sinks[].settings : A set of key/value pairs passed to the sink when initialized. It should contain information necessary for the operation of the sink. Nested YML is supported. Example settings are the prometheus server name, or the collection period for a periodic sink.

Metric names shall follow a convention as outlined in this section. Because different collection systems have different requirements for how metric value reports are generated, naming is split into two parts. First, each metric is given a base name that describes what the underlying value is. Second, meta data is assigned to each metric to further qualify the value. For example, a set of metrics may count the number of requests a component has received. Each metric would have the same base name, but meta data would separate types of request (GET vs POST), or disposition such as pass or fail.

The following convention defines how metric names are formed:

- Names consist of parts separated by a period (.)
- Each part shall use snake case (allows for compound names in each part)
- Each name shall begin with a prefix representing the scope of the metric

Names for metric types shall be formed as follows (with examples):

- Gauges: <scope>.<plural-noun>.<state> (e.g. esp.requests.waiting, esp.status_requests.waiting)
- Counters: <scope>.<plural-noun>.<past-tense-verb> (e.g. thor.requests.failed, esp.gateway_requests.queued)
- Time: <scope>.<singular-noun>.<state or active-verb>.time (e.g. dali.request.blocked.time, dali.request.process.time)

Meta data further qualifies a metric value. This allows metrics to have the same name, but different scopes or categories.
Generally, meta data is only used to further qualify metrics that would have the same base name but need further distinction. An example best describes a use case for meta data. Consider a component that accepts HTTP requests, but needs to track GET and POST requests separately. Instead of defining metrics with names post_requests.received and get_requests.received, the component creates two metrics with the base name requests.received and attaches meta data describing the request type of POST to one and GET to the other. Use of meta data allows aggregating both types of requests into a single combined count of received requests while allowing a breakdown by type.

Meta data is represented as a key/value pair and is attached to the metric by the component during metric creation. The sink is responsible for converting meta data into useful information for the collection system during reporting. The Component Instrumentation section covers how meta data is added to a metric.

In order to instrument a component for metrics using the framework, a component must include the metrics header from jlib (jmetrics.hpp) and add jlib as a dependent lib (if not already doing so). The general steps for instrumentation are described below.

The metrics reporter is a singleton created using the platform defined singleton pattern template. The component must obtain a reference to the reporter (see the sketch at the end of this section). Metrics are wrapped by a standard C++ shared pointer. The component is responsible for maintaining a reference to each shared pointer during the lifetime of the metric; the framework keeps a weak pointer to each metric and thus does not maintain a reference.

The sketch below also shows creating a counter metric and adding it to the reporter. The using namespace eliminates the need to prefix all metrics types with hpccMetrics; its use is assumed for all code examples that follow. Note that the metric type appears both for the shared pointer variable and in the make_shared template that creates the metric and returns a shared pointer. Simply substitute other metric types and handle any differences in the constructor arguments as needed.

Once created, add updates to the metric state throughout the component code where required. Using the above example, a single line of code increments the counter metric by 1. Note that only a single line of code is required to update the metric. That's it! There are no component requirements related to collection or reporting of metric values. That is handled by the framework and loaded sinks.

For convenience, there are function templates that handle creating the reporter, creating a metric, and adding the metric to the reporter. For example, the three lines of code that created the reporter, created a metric, and added it can be replaced by a single call. A similar function template exists for creating custom metrics. For a custom metric the framework must know the metric type and have a reference to the underlying state variable. The template function creates the custom metric and adds it to the reporter (creating the reporter if needed as well). Its arguments are:

metricType : A metric type as defined by the MetricType enum.
value : A reference to the underlying event state, which must be a scalar value convertible to a 64-bit unsigned integer (__uint64).

Both convenience templates appear in the sketch below.
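The reconstruction below is hedged: the general shape (reporter singleton, shared_ptr metric created with make_shared, a one-line update, and the convenience templates) follows the description above, but the exact function and class names are assumptions rather than the verbatim jmetrics.hpp API.

```cpp
#include "jmetrics.hpp"   // jlib metrics header mentioned above
#include <memory>

using namespace hpccMetrics;   // avoids prefixing metric types, as noted above

// Hedged sketch; function and class names are assumptions, not necessarily the exact jlib API.
void instrumentExample()
{
    // Obtain a reference to the reporter singleton.
    MetricsReporter &reporter = queryMetricsReporter();

    // Create a counter metric and add it to the reporter. The metric type appears both
    // as the shared pointer type and in the make_shared template.
    std::shared_ptr<CounterMetric> pRequestsReceived =
        std::make_shared<CounterMetric>("requests.received", "Total requests received");
    reporter.addMetric(pRequestsReceived);

    // Wherever the component receives a request, a single line updates the metric.
    pRequestsReceived->inc(1);

    // Convenience template: creates the reporter if needed, creates the metric, and adds it.
    auto pRequestsFailed =
        createMetricAndAddToReporter<CounterMetric>("requests.failed", "Total failed requests");

    // Similar convenience template for a custom metric wrapping existing component state.
    // The metric type argument is a MetricType enum value, and the state variable must be a
    // scalar convertible to __uint64.
    unsigned activeTransactions = 0;   // pre-existing component state, illustrative only
    auto pActive = createCustomMetricAndAddToReporter("transactions.active",
                                                      "Currently active transactions",
                                                      METRICS_COUNTER, activeTransactions);
}
```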
A component, depending on requirements, may attach meta data to further qualify created metrics. Meta data takes the form of key/value pairs. The base metric class MetricBase constructor defines a parameter for a vector of meta data. Metric subclasses also define meta data as a constructor parameter, however an empty vector is the default. The IMetric interface defines a method for retrieving the meta data. Meta data is order dependent.

Metric units are treated separately from the base name and meta data. The reason is to allow the sink to translate based on collection system requirements. The base framework provides a convenience method for converting units into a string. However, the sink is free to do any conversions, both actual units and the string representation, as needed. Metric units are defined using a subset of the StatisticMeasure enumeration values defined in jstatscodes.h.

Below are two examples of constructing a metric with meta data: one creates the vector and passes it as a parameter, the other constructs the vector in place.
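A hedged sketch of those two styles follows; the constructor shapes and the MetricMetaData type are assumptions based on this document's descriptions, not the exact jlib definitions.

```cpp
#include "jmetrics.hpp"
#include <memory>

using namespace hpccMetrics;

void metaDataExamples()
{
    // Style 1: build the meta data vector first and pass it to the constructor.
    MetricMetaData metaGet{{"reqType", "GET"}};
    auto pGetRequests =
        std::make_shared<CounterMetric>("requests.received", "GET requests received", metaGet);

    // Style 2: construct the meta data vector in place.
    auto pPostRequests =
        std::make_shared<CounterMetric>("requests.received", "POST requests received",
                                        MetricMetaData{{"reqType", "POST"}});

    // Both metrics share the base name "requests.received"; the reqType meta data lets the
    // collection system aggregate them or break them down by request type.
}
```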
Documentation about the new file work.

YAML files. The following are the YAML definitions which are used to serialize file information from dali/external store to the engines and, if necessary, to the worker nodes. This is already covered in the deployed helm charts. It has been extended and rationalized slightly.

```yaml
storage:
  hostGroups:
  - name: <required>
    hosts: [ .... ]
  - name: <required>
    hostGroup: <name>
    count: <unsigned:#hosts>   # how many hosts within the host group are used? (default is number of hosts)
    offset: <unsigned:0>       # index of first host included in the derived group
    delta: <unsigned:0>        # first host within the range [offset..offset+count-1] in the derived group
```

Changes:
* The replication information has been removed from the storage plane. It will now be specified on the thor instance, indicating where (if anywhere) files are replicated.
* The hash character (#) in a prefix or a secret name will be substituted with the device number. This replaces the old includeDeviceInPath property. This allows more flexible device substitution for both local mounts and storage accounts. The number of hashes provides the default padding for the device number. (Existing Helm charts will need to be updated to follow these new rules.)
* Neither thor nor roxie replication is special cased. They are represented as multiple locations that the file lives on (see examples below). Existing bare-metal environments would be mapped to this new representation with implicit replication planes. (It is worth checking the mapping to roxie is fine.)

```yaml
file:
- name: <logical-file-name>
  format: <type>                 # e.g. flat, csv, xml, key, parquet
  meta: <binary>                 # (opt) format of the file (serialized hpcc field information)
  metaCrc: <unsigned>            # hash of the meta
  numParts:                      # how many file parts
  singlePartNoSuffix: <boolean>  # does a single part file include .part_1_of_1?
  numRows:                       # total number of rows in the file (if known)
  rawSize:                       # total uncompressed size
  diskSize:                      # is this useful? when binary copying?
  planes: []                     # list of storage planes that the file is stored on
  tlk:                           # ??? should the tlk be stored in the meta and returned?
  splitType: <split-format>      # are there associated split points, and if so what format? (and if so what variant?)
  # options relating to the format of the input file:
  grouped: <boolean>             # is the file grouped?
  compressed: <boolean>
  blockCompressed: <boolean>
  formatOptions:                 # any options that relate to the file format e.g. csvTerminator; nested because they can be completely free format
  recordSize:                    # if a fixed size record. Not really sure it is useful
  # extra fields that are used to return information from the file lookup service
  missing: <boolean>             # true if the file could not be found
  external: <boolean>            # filename of the form external:: or plane:
```

If the information needs to be signed to be passed to dafilesrv, for example, the entire structure of (storage, files) is serialized and compressed, and that is then signed.

Logically executed on the engine, and retrieved from dali or, in future versions, from an esp service (even if for remote reads):

GetFileInformation(<logical-filename>, <options>)

The logical-filename can be any logical name - including a super file, or an implicit superfile. options include:
* Are compressed sizes needed?
* Are signatures required?
* Is virtual fileposition (non-local) required?
* name of the user

This returns a structure that provides information about a list of files:

```yaml
meta:
  hostGroups:
  storage:
  files:
  secrets:   # the secret names are known; how do we know which keys are required for those secrets?
```

Some key questions:
* Should the TLK be in the dali meta information? [Possibly, but not in first phase.]
* Should the split points be in the dali meta information? [Probably not, but the meta should indicate whether they exist, and if so what format they are.]
* Super files (implicit or explicit) can contain the same file information more than once. Should it be duplicated, or have a flag to indicate a repeat? [I suspect this is fairly uncommon, so duplication would be fine for the first version.]
* What storage plane information is serialized back? [All is simplest. Can optimize later.]

NOTE: This doesn't address the question of writing to a disk file...

Local class for interpreting the results. Logically executed on the manager, and may gather extra information that will be serialized to all workers. The aim is that the same class implementations are used by all the engines (and fileview in esp).

MasterFileCollection :
RemoteFileCollection :
FileCollection(eclReadOptions, eclFormatOptions, wuid, user, expectedMeta, projectedMeta);
MasterFileCollection   // Master has access to dali
RemoteFileCollection : has access to remote esp   // think some more

FileCollection::GatherFileInformation(<logical-filename>, gatherOptions);
- potentially called once per query
- the class is responsible for optimizing the case where it matches the previous call (e.g. in a child query)
- possibly responsible for retrieving the split points

The following options are used to control whether split points are retrieved when file information is gathered:
* number of channels reading the data?
* number of strands reading each channel?
* preserve order?

gatherOptions:
* is it a temporary file?

This class serializes all information to every worker, where it is used to recreate a copy of the master file collection. This will contain information derived from dali, and locally, e.g. options specified in the activity helper. Each worker has a complete copy of the file information. (This is similar to dafilesrv with security tokens.)

The files that are actually required by a worker are calculated by calling the following function (note the derived information is not serialized):

FilePartition FileCollection::calculatePartition(numChannels, partitionOptions)

partitionOptions:
* number of channels reading the data?
* number of strands reading each channel?
* which channel?
* preserve order?
* myIP

A file partition contains a list of file slices:

```cpp
class FileSlice (not serialized)
{
    IMagicRowStream * createRowStream(filter, ...);   // MORE!
    File * logicalFile;
    offset_t startOffset;
    offset_t endOffset;
};
```

Things to bear in mind:
- Optimize the same file reused in a child query (the filter is likely to change)
- Optimize the same format reused in a child query (the filename may be dynamic)
- Integrating third party file formats and distributed file systems may require extra information
- Optimize reusing the format options
- Ideally, fail over to a backup copy midstream and retry a failed read, e.g. if there is a network fault
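As a rough, self-contained illustration of the partition and slice relationship described above: the names FileCollection::calculatePartition and FileSlice come from these notes, while the simplified types and the round-robin assignment below are assumptions purely for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Self-contained sketch only; not the real FileCollection/FileSlice classes.
using offset_t = uint64_t;

struct SliceSketch
{
    std::string logicalFile;   // stands in for File * logicalFile
    offset_t startOffset = 0;
    offset_t endOffset = 0;
};

struct PartitionSketch
{
    std::vector<SliceSketch> slices;   // the slices one worker/channel must read
};

// In the spirit of FileCollection::calculatePartition(numChannels, partitionOptions):
// every worker holds the full file information and derives only the slices it owns.
PartitionSketch calculatePartitionSketch(const std::vector<SliceSketch> & allParts,
                                         unsigned numChannels, unsigned whichChannel)
{
    PartitionSketch result;
    for (size_t part = 0; part < allParts.size(); part++)
    {
        if (part % numChannels == whichChannel)    // simple round-robin assignment (assumption)
            result.slices.push_back(allParts[part]);
    }
    return result;
}
```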
Example definition for a thor400, and two thor200s on the same nodes:

```yaml
hostGroup:
- name: thor400Group
  host: [node400_01,node400_02,node400_03,...node400_400]

storage:
  planes:
  # Simple 400 way thor
  - name: thor400
    prefix: /var/lib/HPCCSystems/thor400
    hosts: thor400Group
  # The storage plane used for replicating files on thor.
  - name: thor400_R1
    prefix: /var/lib/HPCCSystems/thor400
    hosts: thor400Group
    offset: 1
  # A 200 way thor using the first 200 nodes as the thor 400
  - name: thor200A
    prefix: /var/lib/HPCCSystems/thor400
    hosts: thor400Group
    size: 200
  # A 200 way thor using the second 200 nodes as the thor 400
  - name: thor200B
    prefix: /var/lib/HPCCSystems/thor400
    hosts: thor400Group
    size: 200
    start: 200
  # The replication plane for a 200 way thor using the second 200 nodes as the thor 400
  - name: thor200B_R1
    prefix: /var/lib/HPCCSystems/thor400
    hosts: thor400Group
    size: 200
    start: 200
    offset: 1
  # A roxie storage where 50way files are stored on a 100 way roxie
  - name: roxie100
    prefix: /var/lib/HPCCSystems/roxie100
    hosts: thor400Group
    size: 50
  # The replica of the roxie storage where 50way files are stored on a 100 way roxie
  - name: roxie100_R1
    prefix: /var/lib/HPCCSystems/thor400
    hosts: thor400Group
    start: 50
    size: 50
```

device = start + (part + offset) % size
size <= numDevices
offset < numDevices
device <= numDevices

There is no special casing of roxie replication, and each file exists on multiple storage planes. All of these should be considered when determining which is the best copy to read from a particular engine node.

Creating storage planes from an existing system [implemented]:
a) Create bare-metal storage planes [done]
b) [a] Start simplifying information in dali meta (e.g. partmask, remove full path name)
c) [a] Switch reading code to use storage planes rather than dali paths and environment paths - in ALL disk reading and writing code - and change numDevices so it matches the container
d) [c] Convert dali information from using copies to multiple groups/planes
e) [a] Reimplement the current code to create an IPropertyTree from dali file information (in a form that can be reused in dali)
*f) [e] Refactor existing PR to use data in an IPropertyTree and cleanly separate the interfaces.
g) Switch hthor over to using the new classes by default and work through all issues
h) Refactor stream reading code. Look at the spark interfaces for inspiration/compatibility
i) Refactor disk writing code into common class?
j) [e] Create esp service for accessing meta information
k) [h] Refactor and review azure blob code
l) [k] Re-implement S3 reading and writing code.
m) Switch fileview over to using the new classes. (Great test that they can be used in another context + fixes a longstanding bug.)
) Implications for index reading? Will they end up being treated as a normal file? Don't implement for 8.0, although the interface may support it.
*) My primary focus for initial work.

Buffer sizes:
- the storage plane specifies an optimal reading minimum
- compression may have a requirement
- the use for the data may impose a requirement, e.g. a subset of the data, or only fetching a single record

Look at lambda functions to create split points for a file. Can we use the java classes to implement it on binary files (and csv/xml)?

Reading classes and meta information

meta comes from a combination of the information in dfs and the helper. The main meta information uses the same structure that is returned by the function that returns file information from dali.
The format-specific options are contained in a nested attribute so they can be completely arbitrary. The helper class also generates a meta structure. Some options fill in root elements - e.g. compressed. Some fill in a new section (hints: @x=y). The format options are generated from the parameters to the dataset format. Note that normally there is only a single file (or very few files), so merging isn't too painful.

queryMeta() queryOptions() rename meta to format? ???

Where does DFUserver fit in a container system? DFU has the following main functionality in a bare metal system:
a) Spray a file from a 1 way landing zone to an N-way thor
b) Convert file format when spraying. I suspect utf-16->utf8 is the only option actually used.
c) Spray multiple files from a landing zone to a single logical file on an N-way thor
d) Copy a logical file from a remote environment
e) Despray a logical file to an external landing zone.
f) Replicate an existing logical file on a given group.
g) Copy logical files between groups
h) File monitoring
i) Logical file operations
j) Superfile operations

ECL has the ability to read a logical file directly from a landing zone using the 'FILE::<ip>' file syntax, but I don't think it is used very frequently.

How does this map to a containerized system? I think the same basic operations are likely to be useful.
a) In most scenarios landing zones are likely to be replaced with (blob) storage accounts. But for security reasons these are likely to remain distinct from the main location used by HPCC to store datasets. (The customer will only have access keys to copy files to and from those storage accounts.) The containerized system has a way for ECL to directly read from a blob storage account ('PLANE::<plane>'), but I imagine users will still want to copy the files in many situations to control the lifetime of the copies etc.
b) We still need a way to convert from utf16 to utf8, or extend the platform to allow utf16 to be read directly.
c) This is still equally useful, allowing a set of files to be stored as a single file in a form that is easy for ECL to process.
d) Important for copying data from an existing bare metal system to the cloud, and from a cloud system back to a bare metal system.
e) Useful for exporting results to customers
f+g) Essentially the same thing in the cloud world. It might still be useful to have.
h) I suspect we will need to map this to cloud-specific APIs.
i+j) Just as applicable in the container world.

Broadly, landing zones in bare metal map to special storage planes in containerized systems, and groups also map to more general storage planes. There are a couple of complications connected with the implementation.

Suggestions:

=> Milestones
a) Move ftslave code to dafilesrv (partition, pull, push) [Should be included in the 7.12.x stream to allow remote read compatibility?]
b) Create a dafilesrv component for the helm charts, with internal and external services.
c) Use storage planes to determine how files are sprayed etc. (bare-metal, #devices). Adapt dfu/fileservices calls to take (storageplane,number) instead of cluster. There should already be a 1:1 mapping from existing clusters to storage planes in a bare-metal system, so this may not involve much work. [May also need a flag to indicate if ._1_of_1 is appended?]
d) Select the correct dafilesrv for bare-metal storage planes, or a load balanced service for others. (May need to think through how remote files are represented.)
=> Can import from a bare metal system or a containerized system using the command line??
NOTE: Bare-metal to containerized will likely need push operations on the bare-metal system (and therefore serialized security information). This may still cause issues since it is unlikely containerized will be able to pull from bare-metal. Pushing, but not creating a logical file entry on the containerized system, should be easier since it can use a local storage plane definition.

e) Switch over to using the esp based meta information, so that it can include details of storage planes and secrets. [Note this would also need to be in 7.12.x to allow remote export to containerized; that may well be a step too far.]
f) Add an option to configure the number of file parts for spray/copy/despray
g) Ensure that eclwatch picks up the list of storage planes (and the default number of file parts), and has the ability to specify #parts.

Later:
h) Plan how cloud services can be used for some of the copies
i) Investigate using serverless functions to calculate split points.
j) Use the refactored disk read/write interfaces to clean up the read and copy code.
k) We may not want to expose access keys to allow remote reads/writes - in which case they would need to be pushed from a bare-metal dafilesrv to a containerized dafilesrv.

Other dependencies:
* Refactored file meta information. If this is switching to being plane based, then the meta information should also be plane based. The main difference is not including the path in the meta information (it can just be ignored).
* esp service for getting file information. When reading remotely it needs to go via this now...
planes: [] # list of storage planes that the file is stored on. tlk: # ???Should the tlk be stored in the meta and returned? splitType: <split-format> # Are there associated split points, and if so what format? (And if so what variant?) #options relating to the format of the input file: : grouped: <boolean> # is the file grouped? compressed: <boolean> blockCompressed: <boolean> formatOptions: # Any options that relate to the file format e.g. csvTerminator. These are nested because they can be completely free format recordSize: # if a fixed size record. Not really sure it is useful # extra fields that are used to return information from the file lookup service missing: <boolean> # true if the file could not be found external: <boolean> # filename of the form external:: or plane: If the information needs to be signed to be passed to dafilesrv for example, the entire structure of (storage, files) is serialized, and compressed, and that then signed. Logically executed on the engine, and retrived from dali or in future versions from an esp service (even if for remote reads). GetFileInfomation(<logical-filename>, <options>) The logical-filename can be any logical name - including a super file, or an implicit superfile. options include: * Are compressed sizes needed? * Are signatures required? * Is virtual fileposition (non-local) required? * name of the user This returns a structure that provides information about a list of files meta: : hostGroups: storage: files: secrets: #The secret names are known, how do we know which keys are required for those secrets? Some key questions: * Should the TLK be in the dali meta information? [Possibly, but not in first phase. ] * Should the split points be in the dali meta information? [Probably not, but the meta should indicate whether they exist, and if so what format they are. ] * Super files (implicit or explicit) can contain the same file information more than once. Should it be duplicated, or have a flag to indicate a repeat. [I suspect this is fairly uncommon, so duplication would be fine for the first version.] * What storage plane information is serialized back? [ all is simplest. Can optimize later. ] NOTE: This doesn't address the question of writing to a disk file... Local class for interpreting the results. Logically executed on the manager, and may gather extra information that will be serialized to all workers. The aim is that the same class implementations are used by all the engines (and fileview in esp). MasterFileCollection : RemoteFileCollection : FileCollection(eclReadOptions, eclFormatOptions, wuid, user, expectedMeta, projectedMeta); MasterFileCollection //Master has access to dali RemoteFileCollection : has access to remote esp // think some more FileCollection::GatherFileInformation(<logical-filename>, gatherOptions); - potentially called once per query. - class is responsible for optimizing case where it matches the previous call (e.g. in a child query). - possibly responsible for retrieving the split points () Following options are used to control whether split points are retrieved when file information is gathered * number of channels reading the data? * number of strands reading each channel? * preserve order? gatherOptions: * is it a temporary file? This class serializes all information to every worker, where it is used to recereate a copy of the master filecollection. This will contain information derived from dali, and locally e.g. options specified in the activity helper. 
Each worker has a complete copy of the file information. (This is similar to dafilesrv with security tokens.) The files that are actually required by a worker are calculated by calling the following function. (Note the derived information is not serialized.) FilePartition FileCollection::calculatePartition(numChannels, partitionOptions) partitionOptions: * number of channels reading the data? * number of strands reading each channel? * which channel? * preserve order? * myIP A file partition contains a list of file slices: class FileSlice (not serialized) { IMagicRowStream * createRowStream(filter, ...); // MORE! File * logicalFile; offset_t startOffset; offset_t endOffset; }; Things to bear in mind: - Optimize same file reused in a child query (filter likely to change) - Optimize same format reused in a child query (filename may be dynamic) - Intergrating third party file formats and distributed file systems may require extra information. - optimize reusing the format options. - ideally fail over to a backup copy midstream.. and retry in failed read e.g. if network fault Example definition for a thor400, and two thor200s on the same nodes: hostGroup: - name: thor400Group host: [node400_01,node400_02,node400_03,...node400_400] storage: : planes: #Simple 400 way thor - name: thor400 prefix: /var/lib/HPCCSystems/thor400 hosts: thor400Group #The storage plane used for replicating files on thor. - name: thor400_R1 prefix: /var/lib/HPCCSystems/thor400 hosts: thor400Group offset: 1 # A 200 way thor using the first 200 nodes as the thor 400 - name: thor200A prefix: /var/lib/HPCCSystems/thor400 hosts: thor400Group size: 200 # A 200 way thor using the second 200 nodes as the thor 400 - name: thor200B prefix: /var/lib/HPCCSystems/thor400 hosts: thor400Group size: 200 start: 200 # The replication plane for a 200 way thor using the second 200 nodes as the thor 400 - name: thor200B_R1 prefix: /var/lib/HPCCSystems/thor400 hosts: thor400Group size: 200 start: 200 offset: 1 # A roxie storage where 50way files are stored on a 100 way roxie - name: roxie100 prefix: /var/lib/HPCCSystems/roxie100 hosts: thor400Group size: 50 # The replica of the roxie storage where 50way files are stored on a 100 way roxie - name: roxie100_R1 prefix: /var/lib/HPCCSystems/thor400 hosts: thor400Group start: 50 size: 50 device = (start + (part + offset) % size; size <= numDevices offset < numDevices device <= numDevices; There is no special casing of roxie replication, and each file exists on multiple storage planes. All of these should be considered when determining which is the best copy to read from a particular engine node. Creating storage planes from an existing systems [implemented] a) Create baremetal storage planes [done] b) [a] Start simplifying information in dali meta (e.g. partmask, remove full path name) c) [a] Switch reading code to use storageplane away from using dali path and environment paths - in ALL disk reading and writing code - change numDevices so it matches the container d) [c] Convert dali information from using copies to multiple groups/planese) [a] Reimplement the current code to create an IPropertyTree from dali file information (in a form that can be reused in dali) *f) [e] Refactor existing PR to use data in an IPropertyTree and cleanly separate the interfaces. g) Switch hthor over to using the new classes by default and work through all issues h) Refactor stream reading code. Look at the spark interfaces for inspiration/compatibility i) Refactor disk writing code into common class? 
j) [e] Create esp service for accessing meta information
k) [h] Refactor and review azure blob code
l) [k] Re-implement S3 reading and writing code.
m) Switch fileview over to using the new classes. (Great test they can be used in another context + fixes a longstanding bug.)

) Implications for index reading? Will they end up being treated as a normal file? Don't implement for 8.0, although the interface may support it.

*) My primary focus for initial work.

Buffer sizes:

- the storage plane specifies an optimal reading minimum
- compression may have a requirement
- the use for the data may impose a requirement, e.g. a subset of the data, or only fetching a single record

Look at lambda functions to create split points for a file. Can we use the java classes to implement it on binary files (and csv/xml)?

****************** Reading classes and meta information ******************

The meta comes from a combination of the information in dfs and the helper. The main meta information uses the same structure that is returned by the function that returns file information from dali. The format specific options are contained in a nested attribute so they can be completely arbitrary.

The helper class also generates a meta structure. Some options fill in root elements - e.g. compressed. Some fill in a new section (hints: @x=y). The format options are generated from the parameters to the dataset format. Note: normally there is only a single (or very few) files, so merging isn't too painful.

queryMeta() queryOptions() rename meta to format? ???

Where does DFUserver fit in a container system?

DFU has the following main functionality in a bare metal system:

a) Spray a file from a 1 way landing zone to an N-way thor
b) Convert file format when spraying. I suspect utf-16->utf8 is the only option actually used.
c) Spray multiple files from a landing zone to a single logical file on an N-way thor
d) Copy a logical file from a remote environment
e) Despray a logical file to an external landing zone.
f) Replicate an existing logical file on a given group.
g) Copy logical files between groups
h) File monitoring
i) logical file operations
j) superfile operations

ECL has the ability to read a logical file directly from a landing zone using the 'FILE::<ip>' file syntax, but I don't think it is used very frequently.

How does this map to a containerized system? I think the same basic operations are likely to be useful.

a) In most scenarios landing zones are likely to be replaced with (blob) storage accounts. But for security reasons these are likely to remain distinct from the main location used by HPCC to store datasets. (The customer will only have access keys to copy files to and from those storage accounts.) The containerized system has a way for ECL to directly read from a blob storage account ('PLANE::<plane>'), but I imagine users will still want to copy the files in many situations to control the lifetime of the copies etc.
b) We still need a way to convert from utf16 to utf8, or extend the platform to allow utf16 to be read directly.
c) This is still equally useful, allowing a set of files to be stored as a single file in a form that is easy for ECL to process.
d) Important for copying data from an existing bare metal system to the cloud, and from a cloud system back to a bare metal system.
e) Useful for exporting results to customers
f+g) Essentially the same thing in the cloud world. It might still be useful to have
h) I suspect we will need to map this to cloud-specific apis.
i+j) Just as applicable in the container world.
Broadly, landing zones in bare metal map to special storage planes in containerized, and groups also map to more general storage planes. There are a couple of complications connected with the implementation:

Suggestions:

=> Milestones

a) Move ftslave code to dafilesrv (partition, pull, push) [Should be included in 7.12.x stream to allow remote read compatibility?]
b) Create a dafilesrv component to the helm charts, with internal and external services.
c) Use storage planes to determine how files are sprayed etc. (bare-metal, #devices). Adapt dfu/fileservices calls to take (storageplane,number) instead of cluster. There should already be a 1:1 mapping from existing cluster to storage planes in a bare-metal system, so this may not involve much work. [May also need a flag to indicate if ._1_of_1 is appended?]
d) Select the correct dafilesrv for bare-metal storage planes, or a load balanced service for others. (May need to think through how remote files are represented.)
   => Can import from a bare metal system or a containerized system using the command line??
   NOTE: Bare-metal to containerized will likely need push operations on the bare-metal system. (And therefore serialized security information.) This may still cause issues since it is unlikely containerized will be able to pull from bare-metal. Pushing, but not creating a logical file entry on the containerized system, should be easier since it can use a local storage plane definition.
e) Switch over to using the esp based meta information, so that it can include details of storage planes and secrets. [Note this would also need to be in 7.12.x to allow remote export to containerized; that may well be a step too far.]
f) Add an option to configure the number of file parts for spray/copy/despray
g) Ensure that eclwatch picks up the list of storage planes (and the default number of file parts), and has the ability to specify #parts.

Later:

h) Plan how cloud-services can be used for some of the copies
i) Investigate using serverless functions to calculate split points.
j) Use refactored disk read/write interfaces to clean up read and copy code.
k) We may not want to expose access keys to allow remote reads/writes - in which case they would need to be pushed from a bare-metal dafilesrv to a containerized dafilesrv.

Other dependencies:

* Refactored file meta information. If this is switching to being plane based, then the meta information should also be plane based. The main difference is not including the path in the meta information (it can just be ignored).
* esp service for getting file information. When reading remotely it needs to go via this now...

This directory contains the documentation specifically targeted at developers of the HPCC system.

TIP These documents are generated from Markdown by VitePress. See VitePress Markdown for more details.

The ECL language is documented in the ecl language reference manual (generated as ECLLanguageReference-<version>.pdf).

This document covers security configuration values and meanings. It does not serve as the source for how to configure security, but rather what the different values mean. These are not covered in the docs nor does any reasonable help information exist in the config manager or yaml files.
Security is configured either through an LDAP server or a plugin. Additionally, these are supported in both legacy deployments that use environment.xml and containerized deployments using Kubernetes and Helm charts. While these methods differ, the configuration values remain the same. Focus is placed on the different values and not the deployment target. Differences based on deployment can be found in the relevant platform documents.

Security is implemented via a security manager interface. Managers are loaded and used by components within the system to check authorization and authentication. LDAP is an exception to the loadable manager model. It is not a compliant loadable module like other security plugins. For that reason, the configuration for each is separated into two sections below: LDAP and Plugin Security Managers.

LDAP is a protocol that connects to an Active Directory server (AD). The term LDAP is used interchangeably with AD. Below are the configuration values for an LDAP connection. These are valid for both legacy (environment.xml) and containerized deployments. For legacy deployments the configuration manager is the primary vehicle for setting these values. However, some values are not available through the tool and must be set manually in the environment.xml if needed for a legacy deployment. In containerized environments, an LDAP configuration block is required for each component. Currently, this results in a verbose configuration where much of the information is repeated. LDAP is capable of handling user authentication and feature access authorization (such as filescopes).

Notes:

Plugin security managers are separate shared objects loaded and initialized by the system. The manager interface is passed to components in order to provide necessary security functions. Each plugin has its own configuration. HPCC components can be configured to use a plugin as needed. See each plugin's documentation for the settings and how to enable it.

To be added.

To be added
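As a rough illustration of the loadable security manager model described above, the sketch below loads a shared object and asks a factory entry point for a manager interface. The entry-point name, the ISecurityManager interface and the authenticateUser signature are hypothetical placeholders, not the actual seclib/secmgr API.

```cpp
// Hedged sketch only: the general loadable-plugin pattern described above.
#include <dlfcn.h>
#include <stdexcept>
#include <string>

class ISecurityManager            // hypothetical minimal interface
{
public:
    virtual bool authenticateUser(const std::string & user,
                                  const std::string & password) = 0;
};

using CreateManagerFn = ISecurityManager * (*)();

ISecurityManager * loadSecurityPlugin(const std::string & sharedObject,
                                      const std::string & entryPoint)
{
    // Load the plugin shared object...
    void * handle = dlopen(sharedObject.c_str(), RTLD_NOW);
    if (!handle)
        throw std::runtime_error(dlerror());

    // ...locate its (hypothetical) factory entry point...
    auto factory = reinterpret_cast<CreateManagerFn>(dlsym(handle, entryPoint.c_str()));
    if (!factory)
        throw std::runtime_error("entry point not found: " + entryPoint);

    // ...and hand the resulting manager interface to the component.
    return factory();
}
```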
This document covers user authentication, the process of verifying the identity of the user. Authorization is a separate topic covering whether a user should be allowed to perform a specific operation or access a specific resource. Each supported security manager is covered.

Generally, when authentication is needed, the security manager client should call the ISecManager authenticateUser method. The method also allows the caller to detect if the user being authenticated is a superuser. Use of that feature is beyond the scope of this document. In practice, this method is rarely if ever called. User authentication is generally performed as part of authorization. This is covered in more detail below.

This section covers how each supported security manager handles user authentication. As stated above, the method authenticateUser is defined for this purpose. However, other methods also perform user authentication. The sections that follow describe in general how each security manager performs user authentication, whether from directly calling the authenticateUser method, or as an ancillary action taken when another method is called.

The LDAP security manager uses the configured Active Directory to authenticate users. Once authenticated, the user is added to the permissions cache, if enabled, to prevent repeated trips to the AD whenever an authentication check is required. If caching is enabled, a lookup is done to see if the user is already cached. If so, the cached user authentication status is returned. Note that the cached status remains until either the cache time to live expires or it is cleared, either manually or through some other programmatic action. If caching is not enabled, a request is sent to the AD to validate the user credentials. In either case, if digital signatures are configured, the user is also digitally signed using the username. Digitally signing the user allows for quick authentication by validating the signature against the username. During initial authentication, if the digital signature exists, it is verified to provide a fast way to authenticate the user. If the signature is not verified, the user is marked as not authenticated. Authentication status is stored in the security user object so that further checks are not necessary when the same user object is used in multiple calls to the security manager.

Authentication in the htpasswd manager does not support singularly authenticating the user without also authorizing resource access. See the special case for authentication with authorization below. Regardless, the htpasswd manager authenticates users using the .htpasswd file that is installed on the cluster.
It does so by finding the user in the file and verifying that the input hashed password matches the stored hashed password in the file.

The single user security manager allows the definition of a single username with a password. The values are set in the environment configuration and are read during the initialization of the manager. All authentication requests validate against the configured username and password. The process is a simple comparison. Note that the password stored in the environment is hashed.

Since resource access authorization requires an authenticated user, the authorization process also authenticates the user before checking authorization. There are a couple of advantages to this. The authenticate method, or any of its overloads or derivatives, accepts a resource or resource list and a user. These methods authenticate the user first before checking access to the specified resource.

ECL Watch uses user authentication during authorization as part of its log in process. Instead of first authenticating the user, it calls an authenticate method passing both the user and the necessary resources for which the user must have access in order to log into ECL Watch.
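To summarise the LDAP flow described above (cache lookup, optional digital signature verification, fall back to the AD), here is a hedged sketch. The class, methods and cache shape are illustrative placeholders rather than the platform's actual LDAP security manager.

```cpp
// Illustrative sketch of the authentication flow described above.
#include <string>
#include <unordered_map>

struct CachedUser { bool authenticated; /* expiry handling not shown */ };

class LdapAuthSketch
{
public:
    bool authenticate(const std::string & user, const std::string & password,
                      const std::string & signature)
    {
        if (cacheEnabled)
        {
            auto it = cache.find(user);
            if (it != cache.end())
                return it->second.authenticated;       // reuse cached status
        }
        bool ok;
        if (!signature.empty())
            ok = verifySignature(user, signature);     // fast path: signed username
        else
            ok = checkCredentialsAgainstAD(user, password); // otherwise ask the AD
        if (cacheEnabled)
            cache[user] = { ok };                      // remember until the TTL expires
        return ok;                                     // status also kept on the user object
    }

private:
    bool cacheEnabled = true;
    std::unordered_map<std::string, CachedUser> cache;
    // placeholders standing in for the real signature/AD checks
    bool verifySignature(const std::string &, const std::string &) { return true; }
    bool checkCredentialsAgainstAD(const std::string &, const std::string &) { return true; }
};
```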
Everyone has their own ideas of what the best code formatting style is, but most would agree that code in a mixture of styles is the worst of all worlds. A consistent coding style makes unfamiliar code easier to understand and navigate. In an ideal world, the HPCC sources would adhere to the coding standards described perfectly. In reality, there are many places that do not. These are being cleaned up as and when we find time.

Unlike most software projects around, HPCC has some very specific constraints that make most basic design decisions difficult, and often the results are odd to developers getting acquainted with its code base. For example, when HPCC was initially developed, most common-place libraries we have today (like STL and Boost) weren't available or stable enough at the time. Also, at the beginning, both C++ and Java were being considered as the language of choice, but development started with C++. So a C++ library that copied most behaviour of the Java standard library (at the time, Java 1.4) was created (see jlib below) to make the transition, if ever taken, easier. The transition never happened, but the decisions were taken and the whole platform is designed on those terms.

Most importantly, the performance constraints in HPCC can make no-brainer decisions look impossible in HPCC. One example is the use of traditional smart pointer implementations (such as boost::shared_ptr or C++'s auto_ptr), which can lead to up to a 20% performance hit if used instead of our internal shared pointer implementation.

The last important point to consider is that some libraries/systems were designed to replace older ones but haven't been replaced yet. There is a slow movement to deprecate old systems in favour of consolidating a few ones as the elected official ways to use HPCC (Thor, Roxie), but old systems could still be used for years in tests or legacy sub-systems.
In a nutshell, expect re-implementation of well-known containers and algorithms, expect duplicated functionality of sub-systems, and expect to be required to use less-friendly libraries for the sake of performance, stability and longevity.

For the most part our coding style conventions match those described at http://geosoft.no/development/cppstyle.html, with a few exceptions or extensions as noted below.

We use the extension .cpp for C++ source files, and .h or .hpp for header files. Header files with the .hpp extension should be used for headers that are internal to a single library, while header files with the .h extension should be used for the interface that the library exposes. There will typically be one .h file per library, and one .hpp file per cpp file. Source file names within a single shared library should share a common prefix to aid in identifying where they belong. Header files with extension .ipp (i for internal) and .tpp (t for template) will be phased out in favour of the scheme described above.

We adopted a Java-like inheritance model, with macro substitution for the basic Java keywords. This changes nothing in the code, but makes it clearer to the reader what the recipient of the inheritance is doing with its base. There is no semantic check, which makes it difficult to enforce such a scheme, and this has led to code not using it intermixed with code using it. You should use it when possible, most importantly on code that already uses it. We also tend to write methods inline, which matches well with C++ template requirements. We do not, however, enforce the one-class-per-file rule. See the Interfaces section for more information on our implementation of interfaces.

Class and interface names are in CamelCase with a leading capital letter. Interface names should be prefixed with a capital I followed by another capital. Class names may be prefixed with a C if there is a corresponding I-prefixed interface name, e.g. when the interface is primarily used to create an opaque type, but need not be otherwise. Variables, function and method names, and parameters use camelCase starting with a lower case letter. Parameters may be prefixed with an underscore when the parameter is used to initialize a member variable of the same name. Common cases are constructors and setter methods. Example: see the illustrative sketch below.

Use real pointers when you can, and smart pointers when you have to. Take extra care to understand the needs of your pointers and their scope. Most programs can afford a few dangling pointers, but a high-performance clustering platform cannot. Most importantly, use common sense and a lot of thought. Here are a few guidelines:

Warning: Direct manipulation of the ownership might cause

Refer to "Reference counted objects" for more information on our smart pointer implementation. Methods that return pointers to link counted objects, or that use them, should use a common naming standard:

We use 4 spaces to indent each level. TAB characters should not be used. The { that starts a new scope and the corresponding } to close it are placed on a new line by themselves, and are not indented. This is sometimes known as the Allman or ANSI style.

We generally believe in the philosophy that well written code is self-documenting. Comments are also encouraged to describe why something is done, rather than how - which should be clear from the code itself. javadoc-formatted comments for classes and interfaces are being added.
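The following short example (illustrative only, not taken from the platform sources) pulls the conventions above together: an I-prefixed interface, a C-prefixed implementation, camelCase members, underscore-prefixed constructor parameters, 4-space indentation and Allman-style braces. The Java-like keyword macros are assumed to expand roughly as shown here.

```cpp
// Assumed expansion of the platform's Java-like keyword macros, repeated so
// this sketch stands alone.
#define interface struct
#define implements public

interface IRectangle
{
    virtual double area() const = 0;    // no virtual destructor, by convention
};

class CRectangle : implements IRectangle
{
public:
    // underscore-prefixed parameters initialize members of the same name
    CRectangle(double _width, double _height) : width(_width), height(_height)
    {
    }
    virtual double area() const override
    {
        return width * height;
    }

private:
    double width;
    double height;
};
```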
The virtual keyword should be included on the declaration of all virtual functions - including those in derived classes, and the override keyword should be used on all virtual functions in derived classes. MORE: Update!!!

We do not use namespaces. We probably should, following the Google style guide's guidelines - see http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Namespaces

We often pretend we are coding in Java and write all our class members inline.

The ECL style guide is published separately.

We use the commonly accepted conventions for formatting these files.

Consistent use of design patterns helps make the code easy to understand.

While C++ does not have explicit support for interfaces (in the Java sense), an abstract class with no data members and all functions pure virtual can be used in the same way. Interfaces are pure virtual classes. They are similar concepts to Java's interfaces and should be used on public APIs. If you need common code, use policies (see below). An interface's name must start with an 'I' and the base class for its concrete implementations should start with a 'C' and have the same name, e.g.:

When an interface has multiple implementations, try to stay as close as possible to this rule. E.g.:

Or, for partial implementation, use something like this:

Extend current interfaces only on an 'is-a' approach, not to aggregate functionality. Avoid pollution of public interfaces by having only the public methods on the most-base interface in the header, and internal implementation in the source file. Prefer the pImpl idiom (pointer-to-implementation) for functionality-only requirements and policy based design for interface requirements.

Example 1: You want to decouple part of the implementation from your class, and this part does not implement the interface your contract requires:

Example 2: You want to implement the common part of one (or more) interface(s) in a range of sub-classes:

NOTE: Interfaces deliberately do not contain virtual destructors. This is to help ensure that they are never destroyed by calling delete directly. This means that, to use

This interface controls how you Link() and Release() the pointer. This is necessary because in some inner parts of HPCC, the use of a "really smart" smart pointer would add too many links and releases (on temporaries, local variables, members, etc) that could add up to a significant performance hit.

The CInterface implementation also includes a virtual function beforeDispose() which is called before the object is deleted. This allows resources to be cleanly freed up, with the full class hierarchy (including virtual functions) available even when freeing items in base classes. It is often used for caches that do not cause the objects to be retained.

MORE: This needs documenting MORE!

Requiring more work:

* namespaces
* STL
* c++11
* Review all documentation
* Better examples for shared
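To tie the interface and link-counting conventions above together, here is a hedged, self-contained sketch of the pattern. It is a simplified stand-in for jlib's IInterface/CInterface, written only to show why interfaces expose Link() and Release(), why they omit public virtual destructors, and where beforeDispose() fits in; it is not the platform's actual implementation.

```cpp
// Illustrative link-counting sketch, not the real jlib classes.
#include <atomic>

struct ILinkable                       // no virtual destructor, by design
{
    virtual void Link() const = 0;     // add a reference
    virtual bool Release() const = 0;  // remove a reference; true if destroyed
};

class CLinkableSketch : public ILinkable
{
public:
    virtual void Link() const override { ++refCount; }
    virtual bool Release() const override
    {
        if (--refCount == 0)
        {
            // full class hierarchy is still intact at this point
            const_cast<CLinkableSketch *>(this)->beforeDispose();
            delete this;
            return true;
        }
        return false;
    }

protected:
    virtual ~CLinkableSketch() = default;   // protected: destroy only via Release()
    virtual void beforeDispose() {}         // override to free resources cleanly

private:
    mutable std::atomic<unsigned> refCount { 1 };
};
```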
The modern tool used for generating all our official assets is the Github Actions build-asset workflow on the hpcc-systems/HPCC-Platform repository, located here. Developers and contributors can utilize this same workflow on their own forked repository. This allows developers to quickly create assets for testing changes and test for errors before the peer review process.

Build assets will generate every available project under the HPCC-Platform namespace. There currently is not an option to control which packages in the build matrix get generated. But most packages get built in parallel, and released after the individual matrix job is completed, so there is no waiting on packages you don't need. Exceptions to this are packages that require other builds to complete, such as the ECLIDE. Upon completion of each step and matrix job in the workflow, the assets will be output to the repository's tags tab. An example for the

The build assets workflow requires several repository secrets to be available on a developer's machine in order to run properly. You can access these secrets and variables by going to the Create a secret by clicking the green

To generate the self signed certificate for windows packages, you will need to do the following steps. You will be asked to "enter export password"; this will be what goes in the variable SIGNING_CERTIFICATE_PASSPHRASE in Github Actions. On linux: On MacOS: From here you can

For linux builds we're going to generate a private key using GnuPG (gpg). Start the process by entering a terminal and run the command; You will be given several options in this process. For type of key, select For keysize, enter For expiration date, select Input your real name. Input your company email address. For comment, input something like Then it will ask you to enter a passphrase for the key, and confirm the passphrase. Do not leave this blank. A key should be output and entered into your gpg keychain. Now we need to export the key for use in the github actions secret. To extract your key run Now open private.pgp, copy all, and go to github actions secrets. Paste the output into the secret "SIGNING_SECRET"

The build-asset workflow is kicked off by a tag being pushed to the developer's HPCC-Platform repository. Before we push the tag to our HPCC-Platform repository, we will want to have other tags in place if we want LN and ECLIDE builds to function correctly. Suggested tag patterns are

If you choose not to tag the LN and ECLIDE builds, the community builds will generate but errors will be thrown for any build utilizing the LN repository. ECLIDE will not even attempt a build unless you are also successfully building LN due to the dependency scheme we use.
The 'Baremetal' builds are designed to generate our clienttools targets for windows-2022 and macos-12 distributions. These jobs contain both the COMMUNITY and LN builds. If the LN build is not tagged, the COMMUNITY section of the job will run, and the assets will be uploaded, but the job will fail when it tries to build LN. If you choose to precede your Jira number with

Once the LN and ECLIDE repository tags have been created and pushed with the same base branch that your work is based on for the HPCC-Platform, then you are free to push the HPCC-Platform tag, which will initiate the build process. The summary of the build-asset workflow can then be viewed for progress, and individual jobs can be selected to check build outputs.

Assets from the workflow will be released into the corresponding tag location, either in the HPCC-Platform repository for all community based builds, or the LN repository for any builds containing proprietary plugins. Simply browse to the releases or tag tab of your repository and select the tag name you just built. The assets will show up there as the build completes. An example of this on the hpcc-systems repository is hpcc-systems/HPCC-Platform/releases.
We release a new version of the platform every 3 months. If there are major changes in functionality, or significant backward compatibility issues, then it will be tagged as a new major version, otherwise as a new minor version. We normally maintain 4 versions of the system, which means that each new release will typically be supported for a year. Once a new major or minor version has been tagged gold it should not have any changes that change the behavior of queries.

Which versions should changes be applied to? The following gives some examples of the types of changes and which version they would be most relevant to target.

"master":
"<current>":
"<previous>":
"<critical>" fixes only:
"<security>" fixes only:

Occasionally earlier branches will be chosen (e.g. security fixes to even older versions), but they should always be carefully discussed (and documented).

We aim to produce new point releases once a week. The point releases will contain

a) Any changes to the code base for that branch.
b) Any security fixes for libraries that are project dependencies. We will upgrade to the latest point release for the library that fixes the security issue.
c) For the cloud, any security fixes in the base image or the packages installed in that image.

If there are no changes in any of those areas for a particular version then a new point release will not be created.

If you are deploying a system to the cloud you have one of two options:

a) Use the images that are automatically built and published as part of the build pipeline. This image is currently based on ubuntu 22.04 and contains the majority of packages users will require.
b) Use your own hardened base image, and install the containerized package that we publish into that image.
We currently generate the following versions of the package and images:

It is recommended that you deploy the "release with symbols" version to all bare-metal and non-production cloud deployments. The extra symbols allow the system to generate stack backtraces, which make it much easier to diagnose problems if they occur. The "release without symbols" version is recommended for Kubernetes production deployments. Deploying a system without symbols reduces the size of the images. This reduces the time it takes Kubernetes to copy the image before provisioning a new node.

A workunit contains all the information that the system requires about a query - including the parameters it takes, how to execute it, and how to format the results. Understanding the contents of a workunit is a key step to understanding how the HPCC system fits together. This document begins with an overview of the different elements in a workunit. That is then followed by a walk-through of executing a simple query, with a more detailed description of some of the workunit components to show how they all tie together.

Before looking at the contents of a workunit it is important to understand one of the design goals behind the HPCC system.
The HPCC system logically splits the code that is needed to execute a query in two. On the one hand there are the algorithms that are used to perform different dataset operations (e.g. sorting, deduping). The same algorithms are used by all the queries that execute on the system. On the other hand there is the meta-data that describes the columns present in the datasets, which columns you need to sort by, and the order of operations required by the query. These are typically different for each query. This "meta-data" includes generated compare functions, classes that describe the record formats, serialized data and graphs. A workunit only contains data and code relating to the meta data for the query, i.e. "what to do", while the different engines (hthor, roxie, and thor) implement the algorithms - "how to do it". If you look at a workunit for an ECL query that sorts a dataset you will not find code to perform the sort itself in the workunit - you will not even find a call to a sort library function - that logic is contained in the engine that executes the query. One consequence of this split, which can be initially confusing, is that execution continually passes back and forth between the generated code and the engines. By the end of this document you should have a better understanding of how the generated code is structured and the part it plays in executing a query. Note the term "Query" is used as a generic term to cover read-only queries (typically run in roxie) and ETL (Extract, Transform, and Load) style queries that create lots of persistent datafiles (typically run in Thor). Also, the term "workunit" is used ambiguously. The dll created from a query is called a workunit (which is static), but "workunit" is also used to describe a single execution of a query (which includes the parameters and results). It should be clear from the context which of these is meant. Throughout this document "dll" is a generic term used to refer to a dynamically loaded library. These correspond to shared objects in Linux (extension '.so'), dynamic libraries in Max OS X ('.dylib'), and dynamic link libraries in windows ('.dll'). A workunit is generated by the ecl compiler, and consists of a single dll. That dll contains several different elements: A workunit dll contains everything that the engines need to execute the query. When a workunit is executed, key elements of the xml information are cloned from the workunit dll and copied into a database. This is then augmented with other information as the query is executed - e.g. input parameters, results, statistics, etc.. The contents of the workunit are accessed through an "IWorkUnit" interface (defined in common/workunit/workunit.hpp) that hides the implementation details. (Workunit information is currently stored in the Dali database - one of the components within the HPCC platform. Work is in-progress to allow the bulk of this workunit data to be stored in Cassandra or another third-party database instead.) The workunit information is used by most of the components in the system. The following is a quick outline: eclcc Creates a workunit dll from an ecl query. eclccserver Executes eclcc to create a workunit dll, and then clones some of the information into dali to create an active instance, ready to execute. esp Uses information in the workunit dll to publish workunits. This includes details of the parameters that the query takes, how they should be formatted, and the results it returns. 
eclscheduler Monitors workunits that are waiting for events, and updates them when those events occur. eclagent/Roxie Process the different workflow actions, and workflow code. hThor/Roxie/Thor Execute graphs within the workflow items. Dali This database is used to store the state of the workunit state. The following ECL will be used as an example for the rest of the discussion. It is a very simple search that takes a string parameter 'searchName', which is the name of the person to search for, and returns the matching records from an index. It also outputs the word 'Done!' as a separate result. Extracts from the XML and C++ that are generated from this example will be included in the following discussion. This section outlines the different sections in a workunit. This is followed by a walk-through of the stages that take place when a workunit is executed, together with a more detailed explanation of the workunit contents. The workflow is the highest level of control within a workunit. It is used for two related purposes: Each piece of independent ECL is given a unique workflow id (wfid). Often workflow items need to be executed in a particular order, e.g. ensuring a persist exists before using it, which is managed with dependencies between different workflow items. Our example above generates the following XML entry in the workunit: This contains two workflow items. The first workflow item (wfid=1) ensures that the stored value has a default value if it has not been supplied. The second item (with wfid=2) is the main code for the query. This has a dependency on the first workflow item because the stored variable needs to be intialised before it is executed. The generated code contains a class instance that is used for executing the code associated with the workflow items. It is generated at the end of the main C++ module. E.g.: The main element is a switch statement inside the perform() function that allows the workflow engines to execute the code associated with a particular workflow item. There is also an associated factory function that is exported from the dll, and is used by the engines to create instances of the class: Most of the work executing a query involves processing dataset operations, which are implemented as a graph of activities. Each graph is represented in the workunit as an xml graph structure (currently it uses the xgmml format). The graph xml includes details of which types of activities are required to be executed, how they are linked together, and any other dependencies. The graph in our example is particularly simple: This graph contains a single subgraph (node id=2) that contains two activities - an index read activity and an output result activity. These activities are linked by a single edge (id "2_0"). The details of the contents are covered in the section on executing graphs below. Each activity has a corresponding class instance in the generated code, and a factory function for creating instances of that class: The helper class for an activity implements the interface that is required for that particular kind. (The interfaces are defined in rtl/include/eclhelper.hpp - further details below.) The are several other items, detailed below, that are logically associated with a workunit. The information may be stored in the workunit dll or in various other location e.g. Dali, Sasha or Cassandra. It is all accessed through the IWorkUnit interface in common/workunit/workunit.hpp that hides the implementation details. 
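As a rough illustration of the helper-class-and-factory pattern mentioned a few sentences above (each activity has a generated helper class plus an exported factory function), the sketch below uses a hypothetical stand-in interface; the real activity interfaces live in rtl/include/eclhelper.hpp and generated helpers normally derive from the base implementations in eclhelper_base.hpp.

```cpp
// Illustrative only: the shape of a generated activity helper and its factory.
class IExampleActivityArg             // stand-in for an IHThorArg-style interface
{
public:
    virtual const char * queryFileName() = 0;
};

class cAcExample : public IExampleActivityArg   // generated helper for one activity
{
public:
    virtual const char * queryFileName() override
    {
        return "~example::search::index";      // illustrative value only
    }
};

// exported factory function the engine calls to create the helper instance
extern "C" IExampleActivityArg * fAcExample()
{
    return new cAcExample;
}
```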
For instance information generated at runtime cannot by definition be included in the workunit dll. Options that are supplied to eclcc via the -f command line option, or the #option statement are included in the <Debug> section of the workunit xml: Note, the names of workunit options are case insensitive, and converted to lower case. Many queries contain input parameters that modify their behaviour. These correspond to STORED definitions in the ECL. Our example contains a single string "searchName", so the workunit contains a single input parameter: The implementation details of the schema information is encapsulated by the IConstWUResult interface in workunit.hpp. The workunit xml also contains details of each result that the query generates, including a serialized description of the output record format: in our example there are two - the dataset of results and the text string "Done!". The values of the results for a query are associated with the workunit. (They are currently saved in dali, but this may change in version 6.0.) Any timings generated when compiling the query are included in the workunit dll: Other statistics and timings created when running the query are stored in the runtime copy of the workunit. (Statistics for graph elements are stored in a different format from global statistics, but the IWorkUnit interface ensures the implementation details are hidden.) It is possible to include other user-defined resources in the workunit dll - e.g. web pages, or dashboard layouts. I have to confess I do not understand them... ??Tony please provide some more information....! Once a workunit has been compiled to a dll it is ready to be executed. Execution can be triggered in different ways, E.g.: Most queries create persistent workunits in dali and then update those workunits with results as they are calculated, however for some roxie queries (e.g. in a production system) the execution workunits are transient. The following walk-through details the main stages executing a query, and the effect each of the query elements has. The system uses several inter-process queues to communicate between the different components in the system. These queues are implemented by dali. Components can subscribe to one or more queues, and receive notifications when entries are avaialable. Some example queues are: When a workunit is ready to be run, the workflow controls the flow of execution. The workflow engine (defined in common/workunit/workflow.cpp) is responsible for determining which workflow item should be executed next. The workflow for Thor and hThor jobs is coordinated by eclagent, while roxie includes the workflow engine in its process. The eclscheduler also uses the workflow engine to process events and mark workflow items ready for execution. eclagent, or roxie calls the createProcess() function from the workunit dll to create an instance of the generated workflow helper class, and passes it to the workflow engine. The workflow engine walks the workflow items to find any items that are ready to be executed (have the state "reqd" - i.e. required). If a required workflow item has dependencies on other child workflow items then those children are executed first. Once all dependencies have executed successfully the parent workflow item is executed. The example has the following workflow entries: Item 2 has a state of "reqd", so it should be evaluated now. Item 2 has a dependency on item 1, so that must be evaluated first. 
The example has the following workflow entries: Item 2 has a state of "reqd", so it should be evaluated now. Item 2 has a dependency on item 1, so that must be evaluated first. This is achieved by calling MyEclProcess::perform() on the object that was previously created from the workunit dll, passing in wfid = 1. That will execute the following code: This checks if a value has been provided for the input parameter, and if not assigns a default value of "Smith". The function returns control to the workflow engine.

With the dependencies for wfid 2 satisfied, the generated code for that workflow id is now executed: Most of the work for this workflow item involves executing graph1 (by calling back into eclagent/roxie). However, the code also directly sets another result. This is fairly typical - the code inside MyEclProcess::perform() often combines evaluating scalar results, executing graphs, and calling functions that cannot (currently) be called inside a graph (e.g. those involving superfile transactions).

Once all of the required workflow items are executed, the workunit is marked as completed. Alternatively, if there are workflow items that are waiting to be triggered by an event, the workunit will be passed to the scheduler, which will keep monitoring for events. Note that most items in the xml workflow begin in the state WFStateNull. This means that it is valid to execute them, but they haven't been executed yet. Typically, only a few items begin with the state WFStateReqd. There are various specialised types of workflow items - e.g. sequential, wait, independent - but they all follow the same basic approach of executing dependencies and then executing that particular item.
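Putting the two workflow items from the walk-through together, the generated code does roughly the following. This is a hand-written approximation: the context interface, method names, result names and sequence numbers below are simplified assumptions, not the exact ICodeContext calls that eclcc emits:

// Stand-in for the engine-provided context used by the generated code.
struct IWorkflowCtx
{
    virtual bool isResult(const char * name, int sequence) = 0;
    virtual void setResultString(const char * name, int sequence, unsigned len, const char * value) = 0;
    virtual void executeGraph(const char * graphName) = 0;
    virtual ~IWorkflowCtx() {}
};

void performItem(IWorkflowCtx & ctx, unsigned wfid)
{
    const int storedSeq = -1;                  // assumption: sequence used for stored values
    switch (wfid)
    {
    case 1:
        // Workflow item 1: if no value was supplied for the stored input
        // parameter, give it the default value "Smith".
        if (!ctx.isResult("searchname", storedSeq))
            ctx.setResultString("searchname", storedSeq, 5, "Smith");
        break;
    case 2:
        // Workflow item 2: execute the graph of dataset operations, then
        // directly set the scalar 'Done!' result.
        ctx.executeGraph("graph1");
        ctx.setResultString("Result 2", 1, 5, "Done!");   // hypothetical result name/sequence
        break;
    }
}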
Most of the interesting work in an ECL query is done within a graph. The call to ctx->executeGraph() will either execute the graph locally (in the case of hthor and roxie), or add the workunit onto a queue (for Thor). Whichever happens, control will pass to that engine.

Each item mode/type can affect the dependency structure of the workflow:

Sequential/Ordered The workflow structure for sequential and ordered is the same. An item is made to contain all of the actions in the statement. This is achieved by making each action a dependency of this item. The dependencies, and consequently the actions, must be executed in order. An extra item is always inserted before each dependency. This means that if other statements reference the same dependency, it will only be performed once.

Persist When the persist workflow service is used, two items are created. One item contains the graphs that perform the expression defined in ECL. It also stores the wfid for the second item. The second item is used to determine whether the persist is up to date.

Condition (IF) The IF function has either 2 or 3 arguments: the expression, the trueresult, and sometimes the falseresult. For each argument, a workflow item is created. These items are stored as dependencies to the condition, in the order stated above.

Contingency (SUCCESS/FAILURE) When a contingency clause is defined for an attribute, the attribute item stores the wfid of the contingency. If both success and failure are used, then the item will store the wfid of each contingency. The contingency is composed of items, just like the larger query.

Recovery When a workflow item fails, if it has a recovery clause, the item will be re-executed. The clause contains actions defined by the programmer to remedy the problem. This clause is stored differently to SUCCESS/FAILURE, in that the recovery clause is a dependency of the failed item. In order to stop the recovery clause from being executed like the other dependencies, it is marked with WFStateSkip.

Independent This specifier is used when a programmer wants to common up (share) code for the query. It prevents the same piece of code from being executed twice in different contexts. To achieve this, an extra item is inserted between the expression and whichever items depend on it. This means that although the attribute can be referenced several times, it will only be executed once.

All the engines (roxie, hThor, Thor) execute graphs in a very similar way. The main differences are that hThor and Thor execute a sub graph at a time, while roxie executes a complete graph as one. Roxie is also optimized to minimize the overhead of executing a query - since the same query tends to be run multiple times. This means that roxie creates a graph of factory objects and those are then used to create the activities. The core details are the same for each of them though. First, a recap of the structure of the graph together with the full xml for the graph definition in our example: Each graph (e.g. graph1) consists of 1 or more subgraphs (in the example above, node id=1). Each of those subgraphs contains 1 or more activities (node id=2, node id=3). The xml for each activity might contain the following information:

Graphs also contain edges that can take one of 3 forms: Edges within graphs : These are used to indicate how the activities are connected. The source activity is used as the input to the target activity. These edges have the following format: Edges between graphs : These are used to indicate direct dependencies between activities. For instance there will be an edge connecting the activity that writes a spill file to the activity that reads it. These edges have the following format: Other dependencies : These are similar to the edges between graphs, but they are used for values that are used within an activity. For instance one part of the graph may calculate the maximum value of a column, and another activity may filter out all records that do not match that maximum value. The format is the same as the edges between graphs except that the edge contains the following attribute:

Each activity in a graph also has a corresponding helper class instance in the generated code. (The name of the class is "cAc" followed by the activity number, and the exported factory method is "fAc" followed by the activity number.) Each helper class implements a specialised interface (derived from IHThorArg) - the particular interface is determined by the value of the "_kind" attribute for the activity. The contents of the file rtl/include/eclhelper.hpp are key to understanding how the generated code relates to the activities. Each kind of activity requires a helper class that implements a specific interface. The helpers allow the engine to tailor the generalised activity implementation to the particular instance - e.g. for a filter activity whether a row should be included or excluded. The appendix at the end of this document contains some further information about this file. The classes in the generated workunits are normally derived from base implementations of those interfaces (which are implemented in rtl/include/eclhelper_base.hpp). This reduces the size of the generated code by providing default implementations for various functions. For instance the helper for the index read (activity 2) is defined as:
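As a hand-written approximation of what that helper looks like (to keep the sketch self-contained, the interfaces below are heavily simplified stand-ins for the real IHThorIndexReadArg and its base class in rtl/include/eclhelper.hpp and eclhelper_base.hpp; the index filename and record size are hypothetical):

#include <cstring>

typedef unsigned size32_t;                     // matches the platform typedef
struct ICodeContext;                           // engine callback interface (opaque here)
struct IIndexReadContext;                      // receives the keyed filters (opaque here)
struct ARowBuilder { virtual void * ensureCapacity(size32_t size, const char * field) = 0; };

struct IHThorIndexReadArg                      // simplified stand-in for the real interface
{
    virtual void onCreate(ICodeContext * ctx) = 0;
    virtual const char * getFileName() = 0;
    virtual void createSegmentMonitors(IIndexReadContext * irc) = 0;
    virtual size32_t transform(ARowBuilder & rowBuilder, const void * left) = 0;
    virtual ~IHThorIndexReadArg() {}
};

class cAc2 : public IHThorIndexReadArg         // generated helper for activity 2
{
    char searchName[25];
public:
    virtual void onCreate(ICodeContext * ctx)
    {
        // Cache information that does not change for the life of the activity -
        // here the name being searched for (the real code reads the stored value).
        strcpy(searchName, "Smith");
    }
    virtual const char * getFileName()
    {
        return "~example::person::nameindex";  // hypothetical index name
    }
    virtual void createSegmentMonitors(IIndexReadContext * irc)
    {
        // Add a keyed filter on the name column, matching searchName.
    }
    virtual size32_t transform(ARowBuilder & rowBuilder, const void * left)
    {
        // Include every column from the index row in the output row.
        const size32_t size = 29;              // hypothetical fixed record size
        memcpy(rowBuilder.ensureCapacity(size, 0), left, size);
        return size;
    }
};

// Exported factory used by the engines to create the helper for activity 2.
extern "C" IHThorIndexReadArg * fAc2() { return new cAc2; }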
Some of the methods to highlight are: a) onCreate() - common to all activities. It is called by the engine when the helper is first created, and allows the helper to cache information that does not change - in this case the name that is being searched for. b) getFileName() - determines the name of the index being read. c) createSegmentMonitors() - defines which columns to filter, and which values to match against. d) transform() - creates the record to return from the activity. It controls which columns should be included from the index row in the output. (In this case all.)

To execute a graph, the engine walks the activities in the graph xml and creates, in memory, a graph of implementation activities. For each activity, the name of the helper factory is calculated from the activity number (e.g. fAc2 for activity 2). That function is imported from the loaded dll, and then called to create an instance of the generated helper class - in this case cAc2. The engine then creates an instance of the class for implementing the activity, and passes the previously created helper object to the constructor. The engine uses the _kind attribute in the graph to determine which activity class should be used. E.g. in the example above activity 2 has a _kind of 77, which corresponds to TAKindexread. For an index-read activity roxie will create an instance of CRoxieServerIndexReadActivity. (The generated helper that is passed to the activity instance will implement IHThorIndexReadArg). The activity implementations may also extract other information from the xml for the activity - e.g. hints. Once all the activities are created the edge information is used to link input activities to output activities and add other dependencies. Note: Any subgraph that is marked with

Executing a graph involves executing all the root subgraphs that it contains. All dependencies of the activities within the subgraph must be executed before a subgraph is executed. To execute a subgraph, the engine executes each of the sink activities on separate threads, and then waits for each of those threads to complete. Each sink activity lazily pulls input rows on demand from activities further up the graph, processes them and returns when complete. (If you examine the code you will find that this is a simplification. The implementation for processing dependencies is more fine grained to ensure IF datasets, OUTPUT(,UPDATE) and other ECL constructs are executed correctly.) In our example the execution flows as follows: The execution generally switches back and forth between the code in the engines, and the members of the generated helper classes.
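As a purely illustrative sketch of that pull model - the interfaces here are hypothetical stand-ins, not the engines' real activity classes - a subgraph could be executed like this:

#include <thread>
#include <vector>

// Hypothetical minimal interfaces standing in for the real activity classes.
struct IRowStream
{
    virtual const void * nextRow() = 0;             // returns NULL at end of stream
    virtual ~IRowStream() {}
};

struct ISinkActivity
{
    virtual IRowStream & queryInput() = 0;          // the upstream activity
    virtual void processRow(const void * row) = 0;  // e.g. append to a result
    virtual ~ISinkActivity() {}
};

// Each sink activity runs on its own thread and lazily pulls rows from its
// input; the subgraph is complete when every sink thread has finished.
void executeSubgraph(std::vector<ISinkActivity *> & sinks)
{
    std::vector<std::thread> threads;
    for (ISinkActivity * sink : sinks)
        threads.emplace_back([sink]() {
            while (const void * row = sink->queryInput().nextRow())
                sink->processRow(row);
        });
    for (std::thread & t : threads)
        t.join();
}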
There are some other details of query execution that are worth highlighting:

Child Queries : Some activities perform complicated operations on child datasets of the input rows. E.g. remove all duplicate people who are marked as living at this address. This will create a "child query" in the graph - i.e. a nested graph within a subgraph, which may be executed each time a new input row is processed by the containing activity. (The graph of activities for each child query is created at the same time as the parent activity. The activity instances are reinitialised and re-executed for each input row processed by the parent activity to minimise the create-time overhead.)

Other helper functions : The generated code contains other functions that are used to describe the meta information for the rows processed within the graph. E.g. the following class describes the output from the index read activity:

Inline dataset operations : The rule mentioned at the start - that the generated code does not contain any knowledge of how to perform a particular dataset operation - does have one notable exception. Some operations on child datasets are very simple to implement, and more efficient if they are implemented using inline C++ code. (The generated code is smaller, and it avoids the overhead of setting up a child graph.) Examples include filtering and aggregating column values from a child dataset.

The full code in the different engines is more complicated than the simplified process outlined above, especially when it comes to executing dependencies, but the broad outline is the same. More information on the work done in the code generator to create the workunit can be found in ecl/eclcc/DOCUMENTATION.rst.

The C++ code can be generated as a single C++ file or multiple files. The system defaults to multiple C++ files, so that they can be compiled in parallel (and to avoid problems some compilers have with very large files). When multiple C++ files are generated the metadata classes and workflow classes are generated in the main module, and the activities are generated in the files suffixed with a number. It may be easier to understand the generated code if it is in one place, in which case compile your query with the option -fspanMultipleCpp=0. Use -fsaveCppTempFiles to ensure the C++ files are not deleted (the C++ files will appear as helpers in the workunit details).

IEclProcess : The interface that is used by the workflow engine to execute the different workflow items in the generated code.
ThorActivityKind : This enumeration contains one entry for each activity supported by the engines.
ICodeContext : This interface is implemented by the engine, and provides a mechanism for the generated code to call back into the engine. For example resolveChildQuery() is used to obtain a reference to a child query that can then be executed later.
IOutputMetaData : This interface is used to describe any meta data associated with the data being processed by the queries.
IHThorArg : The base interface for defining information about an activity. Each activity has an associated interface that is derived from this interface. E.g. each instance of the sort activity will have a helper class implementing IHThorSortArg in the generated query. There is normally a corresponding base class for each interface in eclhelper_base.hpp that is used by the generated code, e.g. CThorSortArg.
ARowBuilder : This abstract base class is used by the transform functions to reserve memory for the rows that are created.
IEngineRowAllocator : Used by the generated code to allocate rows and rowsets. Can also be used to release rows (or call the global function rtlReleaseRow()).
IGlobalCodeContext : Provides access to functions that cannot be called inside a graph - i.e. can only be called from the global workflow code. Most functions are related to the internal implementation of particular workflow item types (e.g. persists).
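To tie a couple of these interfaces together, this is roughly the role the generated meta-information classes play. The class below is a hypothetical, heavily simplified stand-in (the real IOutputMetaData in eclhelper.hpp has many more methods, and the 29-byte record size is an assumption for the example index):

typedef unsigned size32_t;                   // matches the platform typedef

// Simplified stand-in for IOutputMetaData.
struct IOutputMetaDataStandIn
{
    virtual size32_t getRecordSize(const void * row) = 0;
    virtual size32_t getMinRecordSize() = 0;
    virtual ~IOutputMetaDataStandIn() {}
};

// Hypothetical generated meta class describing the rows produced by the index
// read; the engines use this information when allocating, serializing and
// spilling the rows that flow along the edges of the graph.
class mi2 : public IOutputMetaDataStandIn
{
public:
    virtual size32_t getRecordSize(const void *) { return 29; }   // fixed-size record
    virtual size32_t getMinRecordSize() { return 29; }
};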
activity : An activity is the basic unit of dataset processing implemented by the engines. Each activity corresponds to a node in the thor execution graph. Instances of the activities are connected together to create the graph.
dll : A dynamically loaded library. These correspond to shared objects in Linux (extension '.so'), dynamic libraries in Mac OS X ('.dylib'), and dynamic link libraries in Windows ('.dll').
superfile : A composite file which allows a collection of files to be treated as a single compound file.
?What else should go here?

The XML for a workunit can be viewed on the XML tab in eclwatch, or generated by compiling the ECL using the -wu option with eclcc. Alternatively eclcc -b -S can be used to generate the XML and the C++ at the same time (the output filenames are derived from the input name).
This document is intended for anyone who wants to contribute documentation to our project. The first audience is platform developers, so we can streamline the process of documenting new features. However, these guidelines apply to anyone who wants to contribute to any of our documentation (Language Reference, Programmer’s Guide, etc.). This set of guidelines should help you understand the information needed to create sufficient documentation for any new feature you add. The good news is that we are all here to help you and support your writing efforts. We will help by advising you along the way, and by reviewing and editing your submissions.

When you create a new feature or function, clear documentation is crucial for both internal teams and external users. You worked hard on the feature, so it deserves proper notice and usage. Contributions to the platform are always welcome, and we strongly encourage developers and users to contribute documentation. You can contribute on many levels:
Developer Notes
End user “Readmes” in the form of MD files in the GitHub repository
Blogs
Formal documentation

Regardless of the form you are planning to deliver, here are the required and optional components to include in a document. Tip: VS Code is very good at editing MD files. There is a built-in Preview panel for viewing the rendered form. In addition, GitHub Copilot is MD-aware and can help you write and format. For example, you can ask Copilot, “How can I align the content within the cells of my Markdown table?” GitHub Copilot will show you the alignment options.

Overview
What it is: Briefly describe the feature's purpose and the problem it solves.
Why it matters: Explain the value proposition for users and the overall impact on the software.
Target audience: Specify who this feature is designed for (for example, all users or specific user roles).
Use Cases: Provide concrete examples of how a user might leverage this feature in a real-world scenario.
Installation and Configuration: Details on how to install and basic setup for use, if needed. This must include any system requirements or dependencies.
User Guide / Functionality
How it works: Provide a task-oriented, step-by-step guide for using the feature. If possible, include screenshots for visual learners.
Tips, Tricks, and Techniques: Explain any shortcuts or clever uses of the feature that may be non-obvious.
Inputs and Outputs: Detail the information users need to provide to the feature and the format of the results.
Error Handling: Explain what happens if users encounter errors and how to troubleshoot common issues.
Limitations and Considerations:
Advanced Usage
API Reference (for technical audiences)
FAQs
Additional Resources
Links to related documentation: Include links to relevant documentation for features that interact with this new addition.
Tutorials: Consider creating tutorials for a more interactive learning experience.
Videos: Consider requesting that a video be made to provide a more interactive visual learning experience. You should provide a simple script or outline of what should be shown in the video.

Target your audience: Tailor the level of detail and technical jargon you use based on whether the documentation is for developers or end-users.
Clarity and Conciseness: Use clear, concise language and maintain a consistent structure for easy navigation. Always use present tense, active voice. Remember, you’re writing for users and programmers, not academics, so keep it simple and straightforward. See the HPCC Style Guide for additional guidance.
Visual Aids: Screenshots, diagrams, and flowcharts can significantly enhance understanding. A picture can communicate instantly what a thousand words cannot.
Maintain and Update: Regularly review and update documentation as the feature evolves or based on user feedback.

By following these guidelines and including the required and optional components, you can create comprehensive documentation that empowers users and streamlines the adoption of your new software feature.

The boundary between a developer's responsibilities and the documentation team’s responsibility is not cast in stone. However, there are some guidelines that can help you decide what your responsibility is. Here are some examples: This typically needs a simple one or two word change in the area of the documentation where that setting is documented. However, the change could impact existing deployments or existing code and therefore it might also require a short write-up for the Red Book and/or Release Announcement. If the setting is used by both bare-metal and containerized deployments, you should provide information about how the new setting is used in each of those deployments. This needs some changes to existing documentation, so the best way to provide the information is in a documentation Jira issue. If it is a new keyword, function, or action, a brief overview should be included. For a Standard Library function, the developer should update the Javadoc comment in the appropriate ECL file. For a command line tool change, the developer should update the Usage section of the code. This is a candidate for either an MD file, a blog, or both. Since there should have been some sort of design specification document, that could easily be repurposed as a good start for this. Since this is information that is probably only of interest to other developers, a write-up in the form of an MD file in the repo is the best approach. If it affects end-users or operations, then a more formal document or a blog might be a good idea. If it affects existing deployments or existing code, then a Red Book notice might also be needed. New tests are frequently added and the regression suite readme files should be updated at the same time.
If the tests are noteworthy, we could add a mention in the Platform Release Notes.

In general, it makes sense to keep simple documentation near the code. For example, a document about ECL Agent should go in the ECLAgent folder. However, there are times when that is either not possible or a document may cover more than one component. In those cases, there are a few options as shown below.
devdoc: This is a general folder for any developer document.
devdoc/docs: This is a folder for documents about documentation.
devdoc/userdoc: This is a collection of docs targeted toward the end-user rather than developers. This is primarily for informal documents aimed at end-users. This info can and should be incorporated into the formal docs.
devdoc/userdoc/troubleshoot: Information related to troubleshooting particular components
devdoc/userdoc/azure: Useful information about the Azure Cloud portal and cli
devdoc/userdoc/roxie: Useful information for running roxie
devdoc/userdoc/thor: Useful information for running thor
devdoc/userdoc/blogs: COMING SOON: Location and storage of original text for blogs. It also has docs with guidelines and instructions on writing blogs.

You can include your documentation with your code in a Pull Request or create a separate Jira and Pull Request for the documentation. This depends on the size of the code and doc. For a large project or change, a separate Pull Request for the documentation is better. This might allow the code change to be merged faster. For minor code changes, for example the addition of a parameter to an existing ECL keyword, you can request a documentation change in a Jira issue. You should provide sufficient details in the Jira. For example, if you add an optional parameter named Foo, you should provide details about what values can be passed in through the Foo parameter and what those values mean. You should also provide the default value used if the parameter is omitted.
Use this template to create new documentation for a feature, function, or process. Not every feature will require all six sections. Delete the ones that are not applicable.

[In this section, provide a brief introduction to the new feature, highlighting its purpose and benefits.]
[If applicable, explain the steps required to set up and configure the new feature, including any dependencies or prerequisites.]
[Provide detailed instructions on how to use the new feature, including its functionality, options, and any relevant examples.]
[If applicable, document the references for the new feature, including any classes, methods, or parameters.]
[Offer a step-by-step tutorial on how to implement or utilize the new feature, with code snippets or screenshots if appropriate.]
[Address common issues or errors that users may encounter while using the new feature, along with possible solutions or workarounds.]
This section covers the best practice information for writing and contributing to the HPCC Systems® Platform documentation. We strive to maintain a consistent voice. These guidelines can help your writing match that voice.

Use present tense, active voice. Documentation should be you speaking directly to the reader. Simply tell them what to do. This example sentence: "The user selects the file menu." does not address the reader directly. You wouldn’t say it that way in a conversation. You should use direct, active wording, such as: "Select the file menu". Similarly, instructions like these are active voice: "Press the button" or "Submit the file". Documentation is you instructing the user. Just tell them what to do.

Be Brief and keep it simple. Be efficient with your words. Keep sentences short and concise. Keep paragraphs short as well. Use just a few sentences per paragraph. Use simple words wherever possible and try to avoid lengthy explanations.

Consistency: Be consistent. Use the same voice across all documents. Use the same term, use the same spelling, punctuation, etc. Follow the conventions in this guide.

When writing formal documentation that is more than a new feature announcement, do not refer to a new feature, or coming features, as such; this does not hold up well over time. Do not refer to how things were done in the past. For example, "in the past we had to do X-Y-Z steps and now we no longer have to. This ‘new feature’ can do it in one step, Z". This only adds potential confusion. Just instruct on exactly what needs to be done now, using words as efficiently as possible, so for this example just say “perform step Z”.

There are many terms specific to HPCC Systems® and the HPCC Systems platform. Use the following style guide for word usage and capitalization guidelines to use when referring to system components or other HPCC-specific entities. Maintain consistent usage throughout all docs. Officially and legally the organization's name is HPCC Systems® and it is a registered trademark. You should always refer to the platform as the HPCC Systems® platform and the registration mark ® should appear in the first and most prominent mention of the name. While it is acceptable to use the ® anywhere in a document, it is required to be used in the first and most prominent mention - so the average reader will be aware. Any usage after that first and most prominent is optional.
Components and Tools:

- HPCC Systems Platform
- Dali
- Sasha
- Thor
- hThor
- ROXIE
- DFU Server (Distributed File Utility Server)
- ESP Server (Enterprise Server Platform)
- ESP Services
- WsECL
- ECL Watch
- ECL Server
- ECLCC Server
- ECL Agent
- ECL IDE
- ECL Plug-in for Eclipse
- ECL Playground
- LDAP
- dafilesrv
- VS Code (no hyphen)
- ECL Language Extension for VS Code
- Configuration Manager (not ConfigMgr or ConfigManager). Note: when referring to the startup command, use configmgr (always lowercase)
- HPCCSystems.com or http://HPCCSystems.com (do not include the www portion)
- ECL (Enterprise Control Language)
- ECL command-line interface. Note: when referring to the ECL command-line tool command, ecl is lowercase
- DFU Workunits
- ECL Workunits
- Workunit
- WUID
- HPCC Systems®
- multi-node
- Superfiles, subfiles
- package map

The username is the (usually unique) thing you type in with your password, for example: bobsmith66. The user name is the name of the user, the user's real-life name, for example: Bob Smith.

Use the following conventions for these commonly used terms:

- right-click
- double-click
- drag-and-drop
- click-and-drag
- plug-in
- drop list
- bare metal (no hyphen)
- blue/green (not blue-green)
- Common Vulnerabilities and Exposures (CVEs)

You click a link. You select a tab. You press a button. You check a (check)box.

Write-up is hyphenated when used as a noun; there is no hyphen when it is used as a verb phrase. Examples: Did you read the write-up? Would you write up the steps to reproduce?

To “assure” a person of something is to make him or her confident of it. According to the Associated Press style, to “ensure” that something happens is to make certain that it does, and to “insure” is to issue an insurance policy. Other authorities, however, consider “ensure” and “insure” interchangeable. We prefer ensure when it is not talking about insurance.

If you have any questions, feel free to contact us at docfeedback@hpccsystems.com.
This series of blog posts started life as a series of walk-throughs and brainstorming sessions at a team offsite. This series will look at adding a new activity to the system. The idea is to give a walk-through of the work involved, to highlight the different areas that need changing, and hopefully encourage others to add their own activities. In parallel with the description in this blog there is a series of commits to the github repository that correspond to the different stages in adding the activity. Once the blog is completed, the text will also be checked into the source control tree for future reference.

The new activity is going to be a QUANTILE activity, which can be used to find the records that split a dataset into equal-sized blocks.
Two common uses are to find the median of a set of data (split into 2) or percentiles (split into 100). It can also be used to split a dataset for distribution across the nodes in a system. One hope is that the classes used to implement quantile in Thor can also be used to improve the performance of the global sort operation.

It may seem fatuous, but the first task in adding any activity to the system is to work out what that activity is going to do! You can approach this in an iterative manner - starting with a minimal set of functionality and adding options as you think of them - or start with a more complete initial design. We have used both approaches in the past to add capabilities to the HPCC system, but on this occasion we will be starting from a more complete design - the conclusion of our initial design discussion: "What are the inputs, options and capabilities that might be useful in a QUANTILE activity?" The discussion produced the following items:

- Which dataset is being processed? This is always required and should be the first argument to the activity.
- How many parts to split the dataset into? This is always required, so it should be the next argument to the activity.
- Which fields are being used to order (and split) the dataset? Again this is always required, so the list of fields should follow the number of partitions.
- Which fields are returned? Normally the input row, but often it would be useful for the output to include details of which quantile a row corresponds to. To allow this an optional transform could be passed the input row as LEFT and the quantile number as COUNTER.
- How about first and last rows in the dataset? Sometimes it is also useful to know the first and last rows. Add flags to allow them to be optionally returned.
- How do you cope with too few input rows (including an empty input)? After some discussion we decided that QUANTILE should always return the number of parts requested. If there were fewer items in the input they would be duplicated as appropriate. We should provide a DEDUP flag for the situations when that is not desired. If there is an empty dataset as input then the default (blank) row will be created.
- Should all rows have the same weighting? Generally you want the same weighting for each row. However, if you are using QUANTILE to split your dataset, and the cost of the next operation depends on some feature of the row (e.g., the frequency of the firstname) then you may want to weight the rows differently.
- What if we are only interested in the 5th and 95th centiles? We could optionally allow a set of values to be selected from the results.

There were also some implementation details concluded from the discussions:

How accurate should the results be? The simplest implementation of QUANTILE (sort and then select the correct rows) will always produce accurate results. However, there may be some implementations that can produce an approximate answer more quickly. Therefore we could add a SKEW attribute to allow early termination.

Does the implementation need to be stable? In other words, if there are rows with identical values for the ordering fields, but other fields not part of the ordering with different values, does it matter which of those rows are returned? Does the relative order within those matching rows matter? The general principle in the HPCC system is that sort operations should be stable, and that where possible activities return consistent, reproducible results.
However, that often has a cost - either in performance or memory consumption. The design discussion highlighted the fact that if all the fields from the row are included in the sort order then the relative order does not matter because the duplicate rows will be indistinguishable. (This is also true for sorts, and following the discussion an optimization was added to 5.2 to take advantage of this.) For the QUANTILE activity we will add an ECL flag, but the code generator should also aim to spot this automatically.

Returning counts of the numbers in each quantile might be interesting. This has little value when the results are exact, but may be more useful when a SKEW is specified to allow an approximate answer, or if a dataset might have a vast number of duplicates. It is possibly something to add to a future version of the activity. For an approximate answer, calculating the counts is likely to add an additional cost to the implementation, so the target engine should be informed if this is required.

Is the output always sorted by the partition fields? If this naturally falls out of the implementations then it would be worth including it in the specification. Initially we will assume not, but will revisit after it has been implemented.

After all the discussions we arrived at the following syntax: We also summarised a few implementation details: Finally, deciding on the name of the activity took almost as long as designing it! The end result of this process was summarised in a JIRA issue: https://track.hpccsystems.com/browse/HPCC-12267, which contains details of the desired syntax and semantics. It also contains some details of the next blog topic - test cases.

Incidentally, a question that arose from the design discussion was "What ECL can we use if we want to annotate a dataset with partition points?". Ideally the user needs a join activity which walks through a table of rows, and matches against the first row that contains key values less than or equal to the values in the search row. There are other situations where that operation would also be useful. Our conclusion was that the system does not have a simple way to achieve that, and that it was a deficiency in the current system, so another JIRA was created (see https://track.hpccsystems.com/browse/HPCC-13016). This is often how the design discussions proceed, with discussions in one area leading to new ideas in another. Similarly we concluded it would be useful to distribute rows in a dataset based on a partition (see https://track.hpccsystems.com/browse/HPCC-13260).

When adding new features to the system, or changing the code generator, the first step is often to write some ECL test cases. They have proved very useful for several reasons: As part of the design discussion we also started to create a list of useful test cases (they follow below in the order they were discussed). The tests perform varying functions. Some of the tests are checking that the core functionality works correctly, while others check unusual situations and that strange boundary cases are covered. The tests are not exhaustive, but they are a good starting point and new tests can be added as the implementation progresses. The following is the list of tests that should be created as part of implementing this activity: Ideally any test cases for features should be included in the runtime regression suite, which is found in the testing/regress directory in the github repository.
Tests that check invalid syntax should go in the compiler regression suite (ecl/regress). Commit https://github.com/ghalliday/HPCC-Platform/commit/d75e6b40e3503f851265670a27889d8adc73f645 contains the test cases so far. Note, the test examples in that commit do not yet cover all the cases above. Before the final pull request for the feature is merged the list above should be revisited and the test suite extended to include any missing tests. In practice it may be easier to write the test cases in parallel with implementing the parser - since that allows you to check their syntax. Some of the examples in the commit were created before work was started on the parser, others during, and some while implementing the feature itself.

The first stage in implementing QUANTILE will be to add it to the parser. This can sometimes highlight issues with the syntax and cause revisions to the design. In this case there were two technical issues integrating the syntax into the grammar. (If you are not interested in shift/reduce conflicts you may want to skip a few paragraphs and jump to the walkthrough of the changes.)

Originally, the optional transform was specified inside an attribute, e.g., something like OUTPUT(transform). However, this was not very consistent with the way that other transforms were implemented, so the syntax was updated so it became an optional transform following the partition field list. When the syntax was added to the grammar we hit another problem: Currently, a single production (sortList) in the grammar is used for matching sort orders. As well as accepting fields from a dataset, the sort order production has been extended to accept any named attribute that can follow a sort order (e.g., LOCAL). This is because (with one token lookahead) it is ambiguous where the sort order finishes and the list of attributes begins. Trying to include transforms in those productions revealed other problems: In order to make some progress I elected to choose the last option and require the sort order to be included in curly braces. There are already a couple of activities - subsort and a form of atmost - that similarly require them (and if redesigning ECL from scratch I would be tempted to require them everywhere). The final syntax is something that will need further discussion as part of the review of the pull request though, and may need to be revisited.

Having decided how to solve the ambiguities in the grammar, the following is a walkthrough of the changes that were made as part of commit https://github.com/ghalliday/HPCC-Platform/commit/3d623d1c6cd151a0a5608aa20ae4739a008f6e44:

no_quantile in hqlexpr.hpp: The ECL query is represented by a graph of "expression" nodes - each has a "kind" that comes from the enumeration _node_operator. The first requirement is to add a new enumeration value to represent the new activity - in this case we elected to reuse an unused placeholder. (These placeholders correspond to some old operators that are no longer supported. They have not been removed because the other elements in the enumeration need to keep the same values, since they are used for calculating derived persistent values, e.g., the hashes for persists.)

New attribute names in hqlatoms: The quantile activity introduces some new attribute names that have not been used before. All names are represented in an atom table, so the code in hqlatoms.hpp/cpp is updated to define the new atoms.
Properties of no_quantile: There are various places that need to be updated to allow the system to know about the properties of the new operator:

hqlattr: This contains code to calculate derived attributes. The first entry in the case statement is currently unused (the function should be removed). The second, inside calcRowInformation(), is used to predict how many rows are generated by this activity. This information is percolated through the graph and is used for optimizations, and input counts can be used to select the best implementation for a particular activity.

hqlexpr: Most changes are relatively simple, including the text for the operator, whether it is constant, and the number of dataset arguments it has. One key function is getChildDatasetType(), which indicates the kind of dataset arguments the operator has, which in turn controls how LEFT/RIGHT are interpreted. In this case some of the activity arguments (e.g., the number of quantiles) implicitly use fields within the parent dataset, and the transform uses LEFT, so the operator returns childdataset_datasetleft.

hqlir: This entry is used for generating an intermediate representation of the graph. This can be useful for debugging issues. (Running eclcc with the logging options "--logfile xxx" and "--logdetail 999" will include details of the expression tree at each point in the code generation process in the log file. Also defining -ftraceIR will output the graphs in the IR format.)

hqlfold: This is the constant folder. At the moment the only change is to ensure that fields that are assigned constants within the transform are processed correctly. Future work could add code to optimize quantile applied to an empty dataset, or selecting 1 division.

hqlmeta: Similar to the functions in hqlattr that calculate derived attributes, these functions are used to calculate how the rows coming out of an activity are sorted, grouped and distributed. It is vital to only preserve information that is guaranteed to be true - otherwise invalid optimizations might be performed on the rest of the expression tree.

reservedwords.cpp: A new entry indicating which category the keyword belongs to.

Finally we have the changes to the parser to recognise the new syntax:

hqllex.l: This file contains the lexer that breaks the ECL file into tokens. There are two new tokens - QUANTILE and SCORE.

hqlgram.y: This file contains the grammar that matches the language. There are two productions - one that matches the version of QUANTILE with a transform and one without. (Two productions are used instead of an optional transform to avoid shift/reduce errors.)

hqlgram2.cpp: This contains the bulk of the code that is executed by the productions in the grammar. Changes here include new entries added to a case statement to get the text for the new tokens, and a new entry in the simplify() call. This helps reduce the number of valid tokens that could follow when reporting a syntax error.

Looking back over those changes, one reflection is that there are lots of different places that need to be changed. How does a programmer know which functions need to change, and what happens if some are missed? In this example, the locations were found by searching for an activity with a similar syntax, e.g., no_soapcall_ds or no_normalize. It is too easy to miss something, especially for somebody new to the code - although if you do then you will trigger a runtime internal error. It would be much better if the code was refactored so that the bulk of the changes were in one place.
(See JIRA https://track.hpccsystems.com/browse/HPCC-13434 that has been added to track improvement of the situation.) With these changes implemented the examples from the previous pull request now syntax check. The next stage in the process involves thinking through the details of how the activity will be implemented.

The next stage in adding a new activity to the system is to define the interface between the generated code and the engines. The important file for this stage is rtl/include/eclhelper.hpp, which contains the interfaces between the engines and the generated code. These interfaces define the information required by the engines to customize each of the different activities. The changes that define the interface for quantile are found in commit https://github.com/ghalliday/HPCC-Platform/commit/06534d8e9962637fe9a5188d1cc4ab32c3925010. Adding a quantile activity involves the following changes:

ThorActivityKind - TAKquantile: Each activity that the engines support has an entry in this enumeration. This value is stored in the graph as the _kind attribute of the node.

ActivityInterfaceEnum - TAIquantilearg_1: This enumeration, in combination with the selectInterface() member of IHThorArg, provides a mechanism for helper interfaces to be extended while preserving backwards compatibility with older workunits. The mechanism is rarely used (but valuable when it is), and adding a new activity only requires a single new entry.

IHThorArg: This is the base interface that all activity interfaces are derived from. This interface does not need to change, but it is worth noting because each activity defines a specialized version of it. The names of the specialised interfaces follow a pattern; in this case the new interface is IHThorQuantileArg.

IHThorQuantileArg: The following is an outline of the new member functions, with comments on their use:

- getFlags() - Many of the interfaces have a getFlags() function. It provides a concise way of returning several Boolean options in a single call - provided those options do not change during the execution of the activity. The flags are normally defined with explicit values in an enumeration before the interface. The labels often follow the pattern T<First-letter-of-activity>F<lowercase-name>, i.e. TQFxxx ~= Thor-Quantile-Flag-XXX.
- getNumDivisions() - Returns how many parts to split the dataset into.
- getSkew() - Corresponds to the SKEW() attribute.
- queryCompare() - Returns an implementation of the interface used to compare two rows.
- createDefault(rowBuilder) - A function used to create a default row - used if there are no input rows.
- transform(rowBuilder, _left, _counter) - The function to create the output record from the input record and the partition number (passed as counter).
- getScore(_left) - What weighting should be given to this row?
- getRange(isAll, tlen, tgt) - Corresponds to the RANGE attribute.

Note that the different engines all use the same specialised interface - it contains a superset of the functions required by the different targets. Occasionally some of the engines do not need to use some of the functions (e.g., to serialize information between nodes) so the code generator may output empty implementations.

For each interface defined in eclhelper.hpp there is a base implementation class defined in eclhelper_base.hpp. The classes generated for each activity in a query by the code generator are derived from one of these base classes. Therefore we need to create a corresponding new class CThorQuantileArg.
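To make the shape of that helper concrete, here is a heavily simplified, standalone sketch of a quantile-style helper interface, a base class with defaults, and the kind of class a code generator might emit for it. It is purely illustrative: every name, type, signature and flag value below is a stand-in, and none of it is taken from eclhelper.hpp or eclhelper_base.hpp.

```cpp
// Standalone illustration only - simplified stand-ins, not the real HPCC helper classes.
#include <cstddef>
#include <cstdint>
#include <iostream>

struct ARowBuilder;      // stand-in for the engine's row builder
struct ICompareRows      // stand-in for the row comparison interface returned by queryCompare()
{
    virtual ~ICompareRows() = default;
    virtual int docompare(const void * left, const void * right) const = 0;
};

// Flag bits following the T<Activity>F<name> naming pattern described above (values are invented).
enum QuantileFlagsSketch : unsigned
{
    TQFfirst    = 0x0001,   // also return the first row of the dataset
    TQFlast     = 0x0002,   // also return the last row of the dataset
    TQFdedup    = 0x0004,   // do not duplicate rows when there are too few
    TQFhasscore = 0x0008,   // a SCORE (weighting) expression was supplied
    TQFhasrange = 0x0010    // only a subset of the quantiles is requested (RANGE)
};

// Rough shape of the specialised helper interface: one member per piece of
// information the engines need, mirroring the functions outlined above.
struct IQuantileArgSketch
{
    virtual ~IQuantileArgSketch() = default;
    virtual unsigned getFlags() = 0;                                   // OR of the flag bits above
    virtual std::uint64_t getNumDivisions() = 0;                       // how many parts to split into
    virtual double getSkew() = 0;                                      // SKEW() attribute
    virtual ICompareRows * queryCompare() = 0;                         // ordering of the rows
    virtual std::size_t createDefault(ARowBuilder & builder) = 0;      // default row if input is empty
    virtual std::size_t transform(ARowBuilder & builder,
                                  const void * left, unsigned counter) = 0;   // build output row; 0 => skip
    virtual double getScore(const void * left) = 0;                    // weighting of this row
    virtual void getRange(bool & isAll, std::size_t & len, void * & data) = 0; // RANGE attribute
};

// The base implementation class plays the role of eclhelper_base.hpp: it supplies
// defaults so generated helpers only override what the ECL actually uses.
struct QuantileArgBaseSketch : public IQuantileArgSketch
{
    unsigned getFlags() override { return 0; }
    double getSkew() override { return 0.0; }
    std::size_t transform(ARowBuilder &, const void *, unsigned) override { return 0; }
    double getScore(const void *) override { return 1.0; }             // every row weighted equally
    void getRange(bool & isAll, std::size_t & len, void * & data) override
    {
        isAll = true; len = 0; data = nullptr;                          // no RANGE => return all quantiles
    }
};

// Example of the kind of class a code generator might emit for a QUANTILE split into quartiles:
// only the members that this particular use of the activity needs are overridden.
struct GeneratedQuantileHelperExample : public QuantileArgBaseSketch
{
    struct CompareInts : ICompareRows
    {
        int docompare(const void * l, const void * r) const override
        {
            return *static_cast<const int *>(l) - *static_cast<const int *>(r);
        }
    } compare;

    std::uint64_t getNumDivisions() override { return 4; }              // split into 4 parts
    unsigned getFlags() override { return TQFfirst | TQFlast; }         // FIRST and LAST requested
    ICompareRows * queryCompare() override { return &compare; }
    std::size_t createDefault(ARowBuilder &) override { return 0; }     // default row elided here
};

int main()
{
    GeneratedQuantileHelperExample helper;
    std::cout << helper.getNumDivisions() << '\n';                      // prints 4
}
```

The real interfaces use the platform's own row builder and compare classes, but the division of responsibilities is the same: the generated helper only answers questions about this particular use of the activity, while the engine owns the algorithm.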
That base class often provides default implementations for some of the helper functions to help reduce the size of the generated code (e.g., getScore returning 1). Often the process of designing the helper interface is dynamic. As the implementation is created, new options or possibilities for optimizations appear. These require extensions and changes to the helper interface in order to be implemented by the engines. Once the initial interface has been agreed, work on the code generator and the engines can proceed in parallel. (It is equally possible to design this interface before any work on the parser begins, allowing more work to overlap.) There are some more details on the contents of thorhelper.hpp in the documentation ecl/eclcc/WORKUNIT.rst within the HPCC repository.

Adding a new activity to the code generator is (surprisingly!) a relatively simple operation. The process is more complicated if the activity also requires an implementation that generates inline C++, but that only applies to a small subset of very simple activities, e.g., filter, aggregate. Changes to the code generator also tend to be more substantial if you add a new type, but that is also not the case for the quantile activity. For quantile, the only change required is to add a function that generates an implementation of the helper class. The code for all the different activities follows a very similar pattern - generate input activities, generate the helper for this activity, and link the input activities to this new activity. It is often easiest to copy the boiler-plate code from a similar activity (e.g., sort) and then adapt it. (Yes, some of this code could also be refactored... any volunteers?) There are a large number of helper functions available to help generate transforms and other member functions, which also simplifies the process. The new code is found in commit https://github.com/ghalliday/HPCC-Platform/commit/47f850d827f1655fd6a78fb9c07f1e911b708175.
Most of the code in that commit should be self-explanatory, but one item is worth highlighting. The code generator builds up a structure in memory that represents the C++ code that is being generated. The BuildCtx class is used to represent a location within that generated code where new code can be inserted. The instance variable contains several BuildCtx members that are used to represent locations to generate code within the helper class (classctx, nestedctx, createctx and startctx). They are used for different purposes:

- classctx - Used to generate any member functions that can be called as soon as the helper object has been created, e.g., getFlags().
- nestedctx - Used to generate nested member classes and objects - e.g., comparison classes.
- startctx - Any function that may return a value that depends on the context/parent activity. For example, if QUANTILE is used inside the TRANSFORM of a PROJECT, the number of partition points may depend on a field in the LEFT row of the PROJECT. Therefore the getNumDivisions() member function needs to be generated inside instance->startctx. These functions can only be called by the engine after onCreate() and onStart() have been called to set up the current context.
- createctx - Really, this is a historical artefact from many years ago. It was originally used for functions that could be dependent on a global expression, but not a parent row. Almost all such restrictions have since been removed, and those that remain should probably be replaced with either classctx or startctx.

The only other change is to extend the switch statement in common/thorcommon/thorcommon.cpp to add a text description of the activity.

With the code generator outputting all the information we need, we can now implement the activity in one of the engines. (As I mentioned previously, in practice this is often done in parallel with adding it to the code generator.) Roxie and hThor are the best engines to start with because most of their activities run on a single node - so the implementations tend to be less complicated. It is also relatively easy to debug them, by compiling to create a stand-alone executable, and then running that executable inside a debugger. The following description walks through the Roxie changes.

The changes have been split into two commits to make the code changes easier to follow. The first commit (https://github.com/ghalliday/HPCC-Platform/commit/30da006df9ae01c9aa784e91129457883e9bb8f3) adds the simplest implementation of the activity: Code is added to ccdquery to process the new TAKquantile activity kind, and create a factory object of the correct type. The implementation of the factory class is relatively simple - it primarily contains a method for creating an instance of the activity class. Some factories create instances of the helper and cache any information that never changes (in this case the value returned by getFlags(), which is a very marginal optimization). The classes that implement the existing sort algorithms are extended to return the sorted array in a single call. This allows the quicksort variants to be implemented more efficiently.

The class CRoxieServerQuantileActivity contains the code for implementing the quantile activity. It has the following methods:

- Constructor - Extracts any information from the helper that does not vary, and initializes all member variables.
- start() - This function is called before the graph is executed. It evaluates any helper methods that might vary from execution to execution (e.g., getRange(), numDivisions()), but which do not depend on the current row.
- reset() - Called when a graph has finished executing - after an activity has finished processing all its records. It is used to clean up any variables, and restore the activity ready for processing again (e.g., if it is inside a child query).
- needsAllocator() - Returns true if this activity creates new rows.
- nextInGroup() - The main function in the activity. This function is called by whichever activity is next in the graph to request a new row from the quantile activity. The functions should be designed so they return the next row as quickly as possible, and delay any processing until it is needed. In this case the input is not read and sorted until the first row is requested. Note that the call to helper.transform() returns the size of the resulting row, and returns zero if the row should be skipped. The call to finaliseRowClear() after a successful row creation is there to indicate that the row can no longer be modified, and ensures that any child rows will be correctly freed when the row is freed. The function also contains extra logic to ensure that groups are implemented correctly. The end of a group is marked by returning a single NULL row, the end of the dataset by two contiguous NULL rows. It is important to ensure that a group that has all its output rows skipped doesn't return two NULLs in a row - hence the checks for anyThisGroup.
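As a toy model of that NULL-row bookkeeping (purely illustrative - the row type, input and transform below are invented stand-ins, not Roxie classes):

```cpp
// Toy model of the group bookkeeping described above - not Roxie code.
#include <cstddef>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// A "row" is just an int here; std::nullopt stands in for the NULL row used to mark group boundaries.
using ToyRow = std::optional<int>;

struct ToyGroupedOutput
{
    std::vector<ToyRow> input;      // upstream rows, with std::nullopt marking the end of each group
    std::size_t pos = 0;
    bool anyThisGroup = false;
    bool eof = false;

    // Stand-in for helper.transform(): returns the size of the generated row, or 0 to skip it.
    // Here we arbitrarily skip odd values.
    static std::size_t transform(int & out, int in)
    {
        out = in;
        return (in % 2 == 0) ? sizeof(int) : 0;
    }

    // nextInGroup() analogue: one std::nullopt marks the end of a group; once eof is set,
    // every further call returns std::nullopt (the "two contiguous NULLs" end-of-dataset condition).
    ToyRow nextInGroup()
    {
        while (!eof)
        {
            if (pos == input.size())
            {
                eof = true;                   // no more input
                break;
            }
            ToyRow in = input[pos++];
            if (!in)                          // upstream end-of-group marker
            {
                if (anyThisGroup)
                {
                    anyThisGroup = false;
                    return std::nullopt;      // propagate the group boundary
                }
                continue;                     // whole group was skipped: avoid two boundaries in a row
            }
            int out;
            if (transform(out, *in) == 0)
                continue;                     // helper asked for this row to be skipped
            anyThisGroup = true;              // the real code would call finaliseRowClear() here
            return out;
        }
        return std::nullopt;
    }
};

int main()
{
    // Two groups: {1,2,3} and {5,7}. Every row of the second group is skipped by transform(),
    // so no boundary is emitted for it - exactly the anyThisGroup check described above.
    ToyGroupedOutput activity{{1, 2, 3, std::nullopt, 5, 7, std::nullopt}};
    for (ToyRow r = activity.nextInGroup(); !activity.eof; r = activity.nextInGroup())
        std::cout << (r ? std::to_string(*r) : "<end of group>") << '\n';
}
```

Running it prints the surviving row of the first group followed by a single group boundary; the second group, whose rows are all skipped, produces no boundary of its own, so two NULLs in a row still reliably mean end of dataset.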
With those changes in place, the second commit https://github.com/ghalliday/HPCC-Platform/commit/aeaa209092ea1af9660c6908062c1b0b9acff36b adds support for the RANGE, FIRST, and LAST attributes. It also optimizes the cases where the input is already sorted, and the version of QUANTILE which does not include a transform. (If you are looking at the change in github then it is useful to ignore whitespace changes by appending ?w=1 to the URL). The main changes are TBD...

- hthor - trivial, sharing code and deprecated.
- Discussion of possible improvements: Hoare's algorithm. Ln2(n) < 4k? SKEW and Hoare's. Ordered RANGE. Calc offsets from the quantile (see testing/regress/ecl/xxxxx?). SCORE TBD.
- Basic activity structure. Locally sorting and allowing the inputs to spill. The partitioning approach. Classes. Skew. Optimizations.
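While those notes are still to be written up, here is a rough, self-contained illustration of the baseline "sort and then select the correct rows" approach from the design discussion. Everything is deliberately simplified and hypothetical: rows are plain integers, and FIRST/LAST, RANGE, DEDUP, SCORE weighting and the real activity's exact rounding rules are ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// Pick the rows that divide a sorted input into `parts` roughly equal blocks.
std::vector<int> quantileRows(std::vector<int> rows, std::size_t parts)
{
    std::vector<int> result;
    if (parts < 2)
        return result;
    if (rows.empty())
        rows.push_back(0);                    // empty input: fall back to a default (blank) row

    std::sort(rows.begin(), rows.end());      // the simplest implementation: sort first...

    // ...then select the boundary row for each internal split point. When there are
    // fewer rows than parts, the same row is naturally selected more than once
    // (duplicated), matching the behaviour chosen in the design discussion.
    for (std::size_t i = 1; i < parts; i++)
    {
        std::size_t idx = (i * rows.size()) / parts;
        result.push_back(rows[std::min(idx, rows.size() - 1)]);
    }
    return result;
}

int main()
{
    std::vector<int> data{9, 1, 8, 2, 7, 3, 6, 4, 5};
    for (int v : quantileRows(data, 2)) std::cout << v << ' ';   // median: 5
    std::cout << '\n';
    for (int v : quantileRows(data, 4)) std::cout << v << ' ';   // rough quartiles: 3 5 7
    std::cout << '\n';
}
```

Splitting into 2 picks the median and splitting into 100 picks the percentiles; when there are fewer rows than parts, the same row is simply selected more than once, which is the duplication behaviour chosen in the design discussion.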
Because I could. Many of the pieces needed for Roxie were already created for use in other systems – ECL language, code generator, index creation in Thor, etc. Indexes could be used by Moxie, but that relied on monolithic single-part indexes, was single-threaded (forked a process per query) and had limited ability to do any queries beyond simple index lookups. ECL had already proved itself as a way to express more complex queries concisely, and the concept of doing the processing next to the data had been proved in hOle and Thor, so Roxie – using the same concept for online queries using indexes – was a natural extension of that, reusing the existing index creation and code generation, but adding a new run-time engine geared towards pre-deployed queries and sending index lookup requests to the node holding the index data.

The code generator creates a graph (DAG) representing the query, with one node per activity and links representing the inputs and dependencies. There is also a helper class for each activity. Roxie loads this graph for all published queries, creating a factory for each activity and recording how they are linked.
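As a toy, self-contained sketch of that arrangement - a published graph of per-activity factories that is instantiated and wired together when a query executes - with every class name below invented for illustration (Roxie's real activity and factory classes are far richer):

```cpp
#include <cstddef>
#include <iostream>
#include <memory>
#include <optional>
#include <vector>

// One instance of an activity exists per query execution; rows are pulled from it.
struct IToyActivity
{
    virtual ~IToyActivity() = default;
    virtual void setInput(IToyActivity * in) = 0;
    virtual std::optional<int> next() = 0;            // pull model: downstream asks for rows
};

// One factory per activity in the published query graph.
struct IToyActivityFactory
{
    virtual ~IToyActivityFactory() = default;
    virtual std::unique_ptr<IToyActivity> create() const = 0;
};

struct ToySource : IToyActivity                       // produces a fixed set of rows
{
    std::vector<int> rows{1, 2, 3, 4, 5};
    std::size_t pos = 0;
    void setInput(IToyActivity *) override {}
    std::optional<int> next() override
    {
        return pos < rows.size() ? std::optional<int>(rows[pos++]) : std::nullopt;
    }
};

struct ToyFilter : IToyActivity                       // keeps even rows only
{
    IToyActivity * input = nullptr;
    void setInput(IToyActivity * in) override { input = in; }
    std::optional<int> next() override
    {
        while (auto row = input->next())              // lazily pull from the input as needed
            if (*row % 2 == 0)
                return row;
        return std::nullopt;
    }
};

template <class T>
struct ToyFactory : IToyActivityFactory
{
    std::unique_ptr<IToyActivity> create() const override { return std::make_unique<T>(); }
};

// The "published query": factories plus links recording which activity feeds which.
struct ToyQueryGraph
{
    std::vector<std::unique_ptr<IToyActivityFactory>> factories;
    std::vector<int> inputOf;                         // index of the upstream activity, -1 = none

    std::vector<int> execute() const                  // create instances, link them, pull from the sink
    {
        std::vector<std::unique_ptr<IToyActivity>> instances;
        for (const auto & f : factories)
            instances.push_back(f->create());
        for (std::size_t i = 0; i < instances.size(); i++)
            if (inputOf[i] >= 0)
                instances[i]->setInput(instances[inputOf[i]].get());
        std::vector<int> results;
        while (auto row = instances.back()->next())   // the last activity acts as the sink
            results.push_back(*row);
        return results;
    }
};

int main()
{
    ToyQueryGraph g;
    g.factories.push_back(std::make_unique<ToyFactory<ToySource>>());
    g.factories.push_back(std::make_unique<ToyFactory<ToyFilter>>());
    g.inputOf = {-1, 0};                              // the filter reads from the source
    for (int v : g.execute())
        std::cout << v << ' ';                        // prints: 2 4
    std::cout << '\n';
}
```

Note how data is pulled lazily: the filter only asks its input for a row when its own consumer asks it for one, which is the evaluation model described next.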
Because I could. Many of the pieces needed for Roxie were already created for use in other systems – ECL language, code generator, index creation in Thor, etc. Indexes could be used by Moxie, but that relied on monolithic single-part indexes, was single-threaded (forked a process per query) and had limited ability to do any queries beyond simple index lookups. ECL had already proved itself as a way to express more complex queries concisely, and the concept of doing the processing next to the data had been proved in hOle and Thor, so Roxie – using the same concept for online queries using indexes – was a natural extension of that, reusing the existing index creation and code generation, but adding a new run-time engine geared towards pre-deployed queries and sending index lookup requests to the node holding the index data.
The code generator creates a graph (DAG) representing the query, with one node per activity and links representing the inputs and dependencies. There is also a helper class for each activity. Roxie loads this graph for all published queries, creating a factory for each activity and recording how they are linked. When a query is executed, the factories create the activity instances and link them together. All activities without output activities (known as ‘sinks’) are then executed (often on parallel threads), and will typically result in a value being written to a workunit, to the socket on which the query was received, or to a global “context” area where subsequent parts of the query might read it.
Data is pulled through the activity graph: any activity that wants a row requests it from its input. Evaluation is therefore lazy, with data only calculated as needed. However, to reduce latency, in some cases activities will prepare results ahead of when they are requested – for example, an index read activity will send the request to the agent(s) as soon as it is started rather than waiting for the data to be requested by its downstream activity. This may result in wasted work, and in some cases in data coming back from an agent after the requesting query has completed, having discovered that it did not need the data after all – this results in the dreaded “NO msg collator found – using default” tracing (not an error, but it may be indicative of a query that could use some tuning). Before requesting rows from an input, it should be started, and when no more rows are required it should be stopped. It should be reset before destruction or reuse (for example, for the next row in a child query). Balancing the desire to reduce latency with the desire to avoid wasted work can be tricky. Conditional activities (IF etc.) will not start their unused inputs, so that queries can be written that do different index reads depending on the input. There is also the concept of a “delayed start” activity – I would need to look at the code to remind myself of how those are used.
Splitter activities are a bit painful – they may result in arbitrary buffering of the data consumed by one output until another output is ready to request a row. It’s particularly complex when some of the outputs don’t start at all – the splitter needs to keep track of how many of the inputs have been started and stopped (an input that is not going to be used must be stopped, so that splitters know not to keep data for them). Tracking these start/stop/reset calls accurately is very important; otherwise you can end up with weird bugs, including potential crashes when activities are destroyed. Therefore we report errors if the counts don’t tally properly at the end of a query – but working out where a call was missed is often not trivial. Usually it’s because of an exception thrown from an unexpected place, e.g. midway through starting. Note that row requests from the activities above a splitter may execute on whichever thread downstream from the splitter happens to need that particular row first. The splitter code for tracking whether any of the downstream activities still need a row is a bit hairy/inefficient, IIRC. There may be scope to optimize (but I would recommend adding some good unit test cases first!)
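The start/stop/reset contract and the NULL-row conventions described above can be summarised from the consumer's side with a small sketch. The interface below is hypothetical (it is not the actual Roxie input interface); it simply shows the calling discipline an activity is expected to follow.

```cpp
// Hypothetical consumer-side illustration of the pull protocol described above.
struct IExampleInput
{
    virtual void start() = 0;              // must be called before the first row is pulled
    virtual const void * nextRow() = 0;    // nullptr = end of group; two in a row = end of data
    virtual void stop() = 0;               // called when no more rows are required
    virtual void reset() = 0;              // called before reuse (e.g. the next child-query row)
    virtual ~IExampleInput() {}
};

inline void drainInput(IExampleInput & in)
{
    in.start();
    bool prevWasNull = false;
    for (;;)
    {
        const void * row = in.nextRow();
        if (!row)
        {
            if (prevWasNull)
                break;                     // two consecutive nulls: the dataset is exhausted
            prevWasNull = true;            // a single null just marks the end of a group
            continue;
        }
        prevWasNull = false;
        /* ... process the row ... */
    }
    in.stop();                             // every started input must also be stopped
    in.reset();                            // and reset before it can be reused or destroyed
}
```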
When there are multiple agents fulfilling data on a channel, work is shared among them via a hash of the packet header, which is used to determine which agent should work on that packet. However, if that agent doesn’t start working on it within a short period (either because the node is down, or because it is too busy with other in-flight requests), then another node may take over. The IBYTI messages are used to indicate that a node has started to work on a packet and therefore there is no need for a secondary to take over. The priority of agents as determined by the packet hash is also used to determine how to proceed if an IBYTI is received after starting to work on a request: if the IBYTI is from a lower priority buddy (sub-channel) it is ignored; if it’s from a higher priority one then the processing will be abandoned. When multicast is enabled, the IBYTI is sent on the same multicast channel as the original packet (and care is needed to ignore ones sent by yourself). Otherwise it is sent to all buddy IPs. Nodes keep track of how often they have had to step in for a supposedly higher priority node, and reduce their wait time before stepping in each time this happens, so if a node has crashed then the buddy nodes will end up taking over without every packet being delayed. (QUESTION – does this result in the first node after the failed node getting double the load?) Newer code for cloud systems (where the topology may change dynamically) sends the information about the buddy nodes in the packet header rather than assuming all nodes already have a consistent version of that information. This ensures that all agents are using the same assumptions about buddy nodes and their ordering.
An index is basically a big sorted table of the keyed fields, divided into pages, with an index of the last row from each page used to be able to locate pages quickly. The bottom level pages (‘leaves’) may also contain payload fields that do not form part of the lookup but can be returned with it. Typical usage within LN Risk tends to lean towards one of two cases: There may be some other cases of note too though – e.g. an error code lookup file which is heavily used, or Boolean search logic keys using smart-stepping to implement boolean search conditions. It is necessary to store the index pages on disk compressed – they are very compressible – but decompression can be expensive. For this reason we have traditionally maintained a cache of decompressed pages in addition to the cache of compressed pages that can be found in the Linux page cache. However, it would be much preferred if we could avoid decompressing as much as possible, ideally to the point where no significant cache of decompressed pages was needed. Presently we need to decompress to search, so we’ve been looking at options to compress the pages in such a way that searching can be done using the compressed form. The current design being played with here uses a form of DFA to perform searching/matching on the keyed fields – the DFA data is a compact representation of the data in the keyed fields but is also efficient to use for searching. For the payload part, we are looking at several options (potentially using more than one of them depending on the exact data) including: A fast (to decompress) compression algorithm that handles small blocks of data efficiently is needed; Zstd may be one possible candidate. Preliminary work to enable the above changes involved some code restructuring to make it possible to plug in different compression formats more easily, and to vary the compression format per page.
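To make the index layout just described concrete, here is a simplified sketch (not the real jhtree structures): recording the last key of each leaf page lets a reader locate the single leaf that could contain a search key with a binary search over those boundary keys.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Simplified illustration only: a leaf holds sorted keys plus optional payload fields,
// and the "index of the last row from each page" is a parallel array of boundary keys.
struct ExampleLeafPage
{
    std::vector<std::string> keys;       // sorted keyed fields
    std::vector<std::string> payloads;   // optional payload fields, parallel to keys
};

struct ExampleIndex
{
    std::vector<std::string> lastKeyOfPage;   // one entry per leaf page
    std::vector<ExampleLeafPage> leaves;

    // Returns the index of the first leaf whose last key is >= the search key, i.e. the
    // only leaf that can contain it (a result of leaves.size() means the key is past the end).
    size_t findLeaf(const std::string & key) const
    {
        auto it = std::lower_bound(lastKeyOfPage.begin(), lastKeyOfPage.end(), key);
        return (size_t)(it - lastKeyOfPage.begin());
    }
};
```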
The topology server is used in the cloud to ensure that all nodes can know the IP addresses of all agents currently processing requests for a given channel. These addresses can change over time due to pod restarts or scaling events. Nodes report to the topology server periodically, and it responds to them with the current topology state. There may be multiple topology servers running (for redundancy purposes). If so, all reports should go to all of them, and it should not matter which one’s answer is used. (QUESTION – how is the send to all done?)
All IFileIO objects used to read files from Roxie are instantiated as IRoxieLazyFileIO objects, which means:
The underlying file handles can be closed in the background, in order to handle the case where file handles are a limited resource.
The maximum (and minimum) number of open files can be configured separately for local versus remote files (sometimes remote connections are a scarcer resource than local, if there are limits at the remote end).
The actual file connected to can be switched out in the background, to handle the case where a file read from a remote location becomes unavailable, and to switch to reading from a local location after a background file copy operation completes.
The original IBYTI implementation allocated a thread (from the pool) to each incoming query packet, but some will block for a period to allow an IBYTI to arrive, to avoid unnecessary work. It was done this way for historical reasons - mainly that the addition of the delay came after the initial IBYTI implementation, so in the very earliest versions there was no priority given to any particular subchannel and all would start processing at the same time if they had the capacity to do so. This implementation does not seem particularly smart - in particular, it ties up worker threads even though they are not actually working, and may reduce the throughput of the Roxie agent. For that reason an alternative implementation (controlled by the NEW_IBYTI flag) was created during the cloud transition, which tracks which incoming packets are waiting for IBYTI expiry via a separate queue; they are only allocated to a worker thread once the IBYTI delay times out. So far the NEW_IBYTI flag has only been set on containerized systems (simply to avoid rocking the boat on the bare-metal systems), but we may turn it on in bare metal too going forward (and if so, the old version of the code can be removed sooner or later).
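The following is a rough sketch of the queueing idea behind the NEW_IBYTI approach, using hypothetical types rather than the actual Roxie implementation (handling of packets whose work is claimed by a buddy via IBYTI before the delay expires is omitted): packets sit in a queue ordered by the expiry of their subchannel delay, and only compete for a worker thread after that.

```cpp
#include <chrono>
#include <queue>
#include <vector>

// Illustrative only: queue incoming packets until their IBYTI delay expires instead
// of parking a worker thread for the duration of the delay.
using Clock = std::chrono::steady_clock;

struct PendingPacket
{
    Clock::time_point expires;   // when this node should start work if no IBYTI has been seen
    unsigned packetId;
    bool operator>(const PendingPacket & other) const { return expires > other.expires; }
};

class ExampleIbytiQueue
{
    std::priority_queue<PendingPacket, std::vector<PendingPacket>, std::greater<PendingPacket>> pending;
public:
    void add(unsigned packetId, Clock::duration subchannelDelay)
    {
        pending.push({Clock::now() + subchannelDelay, packetId});
    }
    // Called from the dispatch loop: hand out any packet whose delay has expired.
    bool popReady(unsigned & packetId)
    {
        if (pending.empty() || pending.top().expires > Clock::now())
            return false;
        packetId = pending.top().packetId;
        pending.pop();
        return true;
    }
};
```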
Sometimes when developing/debugging Roxie features, it's simplest to run a standalone executable. Using server mode may be useful if you want to debug server/agent traffic messaging. For example, to test IBYTI behaviour on a single node, having first compiled a suitable bit of ECL into a.out, I have found a snippet like this quite handy:
Roxie (optionally) maintains a list of the most recently accessed file pages (in a circular buffer), and flushes this information periodically to text files that persist from one run of Roxie to the next. On startup, these files are processed and the relevant pages are preloaded into the Linux page cache, to ensure that the "hot" pages are already available and maximum performance is available immediately once the Roxie is brought online, rather than requiring a replay of a "typical" query set to heat the cache as used to be done. In particular this should allow a node to be "warm" before being added to the cluster when autoscaling. There are some questions outstanding about how this operates that may require empirical testing to answer.
Firstly, how does this interact with volumes mounted via k8s PVCs, and in particular with cloud billing systems that charge per read? Will the reads that are done to warm the cache be done in large chunks, or will they happen one Linux page at a time? The code at the Roxie level operates by memory-mapping the file and then touching a byte within each Linux page that we want to be "warm", but does the Linux paging subsystem fetch larger blocks? Do huge pages play a part here?
Secondly, the prewarm is actually done by a child process (ccdcache), but the parent process is blocked while it happens. It would probably make sense to allow at least some of the other startup operations of the parent process to proceed in parallel. There are two reasons why the cache prewarm is done using a child process. The first is to allow a standalone way to prewarm prior to launching a Roxie, which might be useful for automation on some bare-metal systems. The second is that, because there is a possibility of segfaults resulting from the prewarm if the file has changed size since the cache information was recorded, it is easier to contain, capture, and recover from such faults in a child process than it would be inside Roxie. However, it would probably be possible to avoid these segfaults (by checking more carefully against the file size before trying to warm a page, for example) and then link the code into Roxie while still keeping the code common with the standalone executable version.
Thirdly, we need to check that the prewarm is complete before adding a new agent to the topology. This is especially relevant if we make any change to do the prewarm asynchronously.
Fourthly, there are potential race conditions when reading/writing the file containing cache information, since this file may be written by any agent operating on the same channel, at any time.
Fifthly, how is the amount of information tracked decided? It should be at least related to the amount of memory available to the Linux page cache, but that's not a completely trivial thing to calculate. Should we restrict to the most recent N when outputting, where N is calculated from, for example, /proc/meminfo's Active(file) value? Unfortunately on containerized systems that reflects the host, but perhaps /sys/fs/cgroup/memory.stat can be used instead? When deciding how much to track, we can pick an upper limit from the pod's memory limit. This could be read from /sys/fs/cgroup/memory.max, though we currently read it from the config file instead. We should probably (a) subtract the roxiemem size from that and (b) think about a value that will work on bare-metal and fusion too. However, because we don't dedup the entries in the circular buffer used for tracking hot pages until the info is flushed, the appropriate size is not really the same as the memory size. We track all reads by page, and before writing we also add all pages in the jhtree cache, with info about the node type. Note that a hit in the jhtree page cache won't be noted as a read other than via this last-minute add.
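A minimal sketch of the page-touching technique described above, assuming a plain POSIX environment; this is not the actual ccdcache code (which warms only the recorded hot pages rather than a whole file), just an illustration of how mapping a file and reading one byte per page pulls those pages into the Linux page cache.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Illustration only: fault every page of a file into the page cache by touching one
// byte per page of a read-only mapping.  Error handling is kept to a minimum.
static void warmWholeFile(const char * path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0)
    {
        size_t len = (size_t)st.st_size;
        void * mapped = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
        if (mapped != MAP_FAILED)
        {
            const volatile char * base = (const volatile char *)mapped;
            size_t pageSize = (size_t)sysconf(_SC_PAGESIZE);
            volatile char sink = 0;
            for (size_t offset = 0; offset < len; offset += pageSize)
                sink += base[offset];          // touching the byte faults the page in
            (void)sink;
            munmap(mapped, len);
        }
    }
    close(fd);
}
```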
There are a number of potential questions and issues with this code: Should the scope of the blacklist be different? Possible scopes are: Options 2 and 4 above would allow all aspects of the blacklisting behaviour to be specified by options on the SOAPCALL. We could control whether or not the blacklister is to be used at all via a SOAPCALL option with any of the above... The HPCC Platform includes a rudimentary performance tracing feature using periodic stack capture to generate flame graphs. Roxie supports this in 3 ways: The perf trace operates as follows: The basic info captured at step 1 (or maybe 2) could also be analysed to give other insights, such as: Unfortunately some info present in the original stack text files is lost in the folded summary - in particular related to the TID that the stack is on. Can we spot lifetimes of threads and/or should we treat "stacks" on different threads as different? Thread pools might render this difficult though. There is an option in stack-collapse-elfutils.pl to include the TID when considering whether stacks match, so perhaps we should just (optionally) use that. In localAgent mode, the global queueManager object (normally a RoxieUdpSocketQueueManager) is replaced by a RoxieLocalQueueManager. Outbound packets are added directly to target queue, inbound are packed into DataBuffers. There is also "local optimizations" mode where any index operation reading a one-part file (does the same apply to one-part disk files?) just reads it directly on the server (regardless of localAgent setting). Typically still injected into receiver code though as otherwise handling exception cases, limits etc would all be duplicated/messy. Rows created in localOptimization mode are created directly in the caller's row manager, and are injected in serialized format. Why are inbound not created directly in the desired destination's allocator and then marked as serialized? Some lifespan issues... are they insurmountable? We do pack into dataBuffers rather than MemoryBuffers, which avoids a need to copy the data before the receiver can use it. Large rows get split and will require copying again, but we could set dataBufferSize to be bigger in localAgent mode to mitigate this somewhat. What is the lifespan issue? In-flight queries may be abandoned when a server-side query fails, times out, or no longer needs the data. Using DataBuffer does not have this issue as they are attached to the query's memory manager/allocation once read. Or we could bypass the agent queue altogether, but rather more refactoring needed for that (might almost be easier to extent the "local optimization" mode to use multiple threads at that point) abortPending, replyPending, and abortPendingData methods are unimplemented, which may lead to some inefficiencies? Requests from server to agents are send via UDP (and have a size limit of 64k as a result). Historically they were sent using multicast to go to all agents on a channel at the same time, but since most cloud providers do not support multicast, there has long been an option to avoid multicast and send explicitly to the agent IPs. In bare metal systems these IPs are known via the topology file, and do not change. In cloud systems the topology server provides the IPs of all agents for a channel. 
The HPCC Platform includes a rudimentary performance tracing feature using periodic stack capture to generate flame graphs. Roxie supports this in 3 ways: The perf trace operates as follows: The basic info captured at step 1 (or maybe 2) could also be analysed to give other insights, such as: Unfortunately some info present in the original stack text files is lost in the folded summary - in particular related to the TID that the stack is on. Can we spot lifetimes of threads, and/or should we treat "stacks" on different threads as different? Thread pools might render this difficult, though. There is an option in stack-collapse-elfutils.pl to include the TID when considering whether stacks match, so perhaps we should just (optionally) use that.
In localAgent mode, the global queueManager object (normally a RoxieUdpSocketQueueManager) is replaced by a RoxieLocalQueueManager. Outbound packets are added directly to the target queue; inbound packets are packed into DataBuffers. There is also a "local optimizations" mode where any index operation reading a one-part file (does the same apply to one-part disk files?) just reads it directly on the server (regardless of the localAgent setting). The result is typically still injected into the receiver code, though, as otherwise handling exception cases, limits etc. would all be duplicated/messy. Rows created in localOptimization mode are created directly in the caller's row manager, and are injected in serialized format. Why are inbound rows not created directly in the desired destination's allocator and then marked as serialized? Some lifespan issues... are they insurmountable? We do pack into DataBuffers rather than MemoryBuffers, which avoids a need to copy the data before the receiver can use it. Large rows get split and will require copying again, but we could set dataBufferSize to be bigger in localAgent mode to mitigate this somewhat. What is the lifespan issue? In-flight queries may be abandoned when a server-side query fails, times out, or no longer needs the data. Using DataBuffers does not have this issue as they are attached to the query's memory manager/allocation once read. Or we could bypass the agent queue altogether, but rather more refactoring would be needed for that (it might almost be easier to extend the "local optimization" mode to use multiple threads at that point). The abortPending, replyPending, and abortPendingData methods are unimplemented, which may lead to some inefficiencies?
Requests from server to agents are sent via UDP (and have a size limit of 64k as a result). Historically they were sent using multicast to go to all agents on a channel at the same time, but since most cloud providers do not support multicast, there has long been an option to avoid multicast and send explicitly to the agent IPs. In bare metal systems these IPs are known via the topology file, and do not change. In cloud systems the topology server provides the IPs of all agents for a channel. In cloud systems, the list of IPs that a message was sent to is included in the message header, so that the IBYTI messages can be sent without requiring that all agents/servers have the same topology information at any given moment (they will stay in sync because of the topology server, but may be temporarily out of sync when nodes are added/removed, until the next time topology info is retrieved). This is controlled by the SUBCHANNELS_IN_HEADER define.
Packets back from agents to the server go via the udplib message-passing code. This can best be described by looking at the sending and receiving sides separately. When sending, results are split into individual packets (DataBuffers), each designed to be under 1 MTU in size. Traditionally this meant they were 1k, but they can be set larger (8k is good). They do have to be a power of 2 because of how they are allocated from the roxiemem heap. The sender maintains a set of UdpReceiverEntry objects, one for each server that it is conversing with. Each UdpReceiverEntry maintains multiple queues of data packets waiting to be sent, one queue for each priority. The UdpReceiverEntry maintains a count of how many packets are contained across all its queues in packetsQueued, so that it knows whether there is data to send. The priority levels are:
0: Out Of Band
1: Fast lane
2: Standard
This is designed to allow control information to be sent without getting blocked by data, and high priority queries to avoid being blocked by data going to lower priority ones. The mechanism for deciding what packet to send next is a little odd, though - rather than sending all higher-priority packets before any lower-priority ones, it round-robins across the queues, sending up to N^2 from queue 0, then up to N from queue 1, then 1 from queue 2, where N is set by the UdpOutQsPriority option, or 1 if not set. This may be a mistake - probably anything from queue 0 should be sent first, before round-robining the other queues in this fashion. UdpReceiverEntry objects are also responsible for maintaining a list of packets that have been sent but that the receiver has not yet indicated have arrived. If an agent has data ready for a given receiver, it will send a requestToSend to that receiver, and wait for a permitToSend response. Sequence numbers are used to handle situations where these messages get lost. A permitToSend that does not contain the expected sequence number is ignored.
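To illustrate the send-order oddity described above, here is a small sketch (illustrative only, not the udplib code) of the round-robin quota scheme: up to N^2 packets from the out-of-band queue, then up to N from the fast lane, then one from the standard queue, where n below stands in for the UdpOutQsPriority setting.

```cpp
#include <cstdio>
#include <deque>

// Illustrative only: one pass of the per-priority quota scheme described above.
struct ExampleSender
{
    std::deque<int> queues[3];   // 0 = out of band, 1 = fast lane, 2 = standard
    unsigned n = 1;              // stand-in for the UdpOutQsPriority option (default 1)

    void sendPacket(int packet) { std::printf("sending packet %d\n", packet); }

    void sendRound()
    {
        const unsigned quota[3] = { n * n, n, 1 };   // N^2, N, 1
        for (unsigned q = 0; q < 3; q++)
        {
            for (unsigned i = 0; i < quota[q] && !queues[q].empty(); i++)
            {
                sendPacket(queues[q].front());
                queues[q].pop_front();
            }
        }
    }
};
```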
Documentation in this directory is targeted to end-users of the HPCC Systems Platform. This is less-formal documentation intended to be produced and released more quickly than the published HPCC documentation. See HPCC documentation if you would like to contribute to our official docs.
Unlock the full potential of GitHub Copilot with these carefully curated prompts designed to streamline your workflow and enhance productivity. Whether you're summarizing complex texts, brainstorming innovative ideas, or creating detailed guides, these prompts will help you get the most out of Copilot's capabilities. Dive in and discover how these simple yet powerful prompts can save you time and effort in your daily tasks. Here are a few simple prompts that can save time:
Provide a brief summary of the text below. Show the key points in a bullet list.
List [N (number)] ways to [accomplish a specific goal or solve a problem]? Include a short description for each approach in a bullet list.
What are the key differences/similarities between [concept A] and [concept B]? Present the information in a table format.
Explain [complex topic] in simple terms. Use analogies or examples to help make it easier to understand.
Brainstorm [N (number)] ideas for [a specific project or topic]? Include a short description for each idea in a bullet list.
Create a template for [a specific type of document, such as a business email, proposal, etc.]. Include the key elements to include in a bullet list.
Write a step-by-step guide on how to [specific task or procedure]. Number the steps to improve clarity.
These prompts are more focused and are meant to accomplish a specific purpose. They are included here to spark your imagination. If you think of others that could be included here to share with the world, please send your ideas to docfeedback@hpccsystems.com.
Write comments for this ECL code in javadoc format
Create an ECL File Definition, record structure, and inline dataset with [N (number)] of records with the following fields [list of fields]
Write ECL code to classify records into [categories].
AI hallucinations refer to instances where artificial intelligence systems generate information or responses that are incorrect, misleading, or entirely fabricated. Hallucinations can occur due to various reasons, such as limitations in the training data, inherent biases, or the AI's attempt to provide an answer even when it lacks sufficient context or knowledge. Understanding and mitigating AI hallucinations is crucial for ensuring the reliability and accuracy of AI-driven applications. Creating effective prompts is essential to minimize AI hallucinations. Here are some tips to help you craft prompts that lead to accurate and reliable responses:
Be Specific and Clear: Ambiguous prompts can lead to incorrect or irrelevant answers. Clearly define the task and provide specific instructions.
Provide Context: Give the AI enough background information to understand the task. This helps in generating more accurate responses.
Use Constraints: Limit the scope of the response by specifying constraints such as word count, format, or specific details to include.
Ask for Evidence or Sources: Encourage the AI to provide evidence or cite sources for the information it generates.
Iterative Refinement: Start with a broad prompt and refine it based on the initial responses to get more accurate results.
By following these guidelines, you can reduce the likelihood of AI hallucinations and ensure that the responses generated are accurate and useful.
Some users may experience an issue when trying to install the ECL IDE / client tools version 8.10 and later.
The ecl-bundle executable (normally executed from the ecl executable by specifying 'ecl bundle XXX') is designed to manipulate ECL bundle files. The metadata for ECL bundles is described using an exported module called Bundle within the bundle's source tree - typically, this means that a file called Bundle.ecl will be added to the highest level of the bundle's directory tree. In order to extract the information from the Bundle module, eclcc is run in 'evaluate' mode (using the -Me option), which will parse the bundle module and output the required fields to stdout.
ecl-bundle also executes eclcc (using the --showpaths option) to determine where bundle files are to be located. In order to make versioning easier, bundle files are not copied directly into the bundles directory. A bundle called "MyBundle" that announces itself as version "x.y.z" will be installed to the directory $BUNDLEDIR/_versions/MyBundle/x.y.z. A "redirect" file called MyBundle.ecl is then created in $BUNDLEDIR, which redirects any IMPORT MyBundle statement to actually import the currently active version of the bundle in _versions/MyBundle/x.y.z. By rewriting this redirect file, it is possible to switch to using a different version of a bundle without having to uninstall and reinstall. In a future release, we hope to make it possible to specify that bundle A requires version X of bundle B, while bundle C requires version Y of bundle B. That will require the redirect files to be 'local' to a bundle (and will require that bundle B uses a redirect file to ensure it picks up the local copy of B when making internal calls). An IBundleInfo represents a specific copy of a bundle, and is created by explicitly parsing a snippet of ECL that imports it, with the ECL include path set to include only the specified bundle. An IBundleInfoSet represents all the installed versions of a particular named bundle. An IBundleCollection represents all the bundles on the system. Every individual subcommand is represented by a class derived (directly or indirectly) from EclCmdCommon. These classes are responsible for command-line parsing, usage text output, and (most importantly) execution of the desired outcomes.
The ECL code in the standard library should follow the following style guidelines: For example: Some additional rules for attributes in the library:
A C++11 single-file header-only cross platform HTTP/HTTPS library. It's extremely easy to set up. Just include the httplib.h file in your code! The followings are built-in mappings: NOTE: These static file server methods are not thread safe. Without content length: As default, the server sends Please see Server example and Client example. You can change the thread count by setting NOTE: Constructor with scheme-host-port string is now supported! or or NOTE: OpenSSL is required for Digest Authentication. NOTE: OpenSSL is required for Digest Authentication. NOTE: This feature is not available on Windows, yet. SSL support is available with NOTE: cpp-httplib currently supports only version 1.1.1. The server can apply compression to the following MIME type contents: 'gzip' compression is available with Brotli compression is available with g++ 4.8 and below cannot build this library since Include Note: Cygwin on Windows is not supported. MIT license (© 2020 Yuji Hirose) These folks made great contributions to polish this library to totally another level from a simple toy!
This is a high level description of the data obfuscation framework defined in
A domain is a representation of the obfuscation requirements applicable to a set of data. Consider that the data used to represent individuals likely differs between countries. With different data, requirements for obfuscation may reasonably be expected to vary. Assuming that requirements do change between countries, each country's requirements could logically constitute a separate domain. The capacity to define multiple domains does not create a requirement to do so. Requirements can change over time. To support this, a domain can be seen as a collection of requirement snapshots where each snapshot defines the complete set of requirements for the domain at a point in time.
Snapshots are referenced by unique version numbers, which should be sequential starting at 1. Obfuscation is always applied based on a single snapshot of a domain's requirements. Domains are represented in the framework interface as text identifiers. Each distinct domain is identified by at least one unique identifier. A masker is a provider of obfuscation for a single snapshot of a domain's requirements. There are three masking operations defined in this framework. Each instance decides which of the three it will support, and how it will support them. The three operations are: A masker may be either stateless or stateful. With a stateless masker, identical input will produce identical output for every requested operation. A stateful masker, however, enables its user to affect operation outputs (i.e., identical input may not produce identical output for each operation). Maskers are represented in the framework interface using A profile is a stateless masker. Each instance defines the requirements of one or more snapshots of a single domain. Snapshots are versioned. Each profile declares a minimum, maximum, and default version. Masker operations apply to the default version. Other declared versions may be accessed using a stateful context, which the profile can create on demand. Each instance must support at least one version of a domain's requirements. Whether an instance supports more than one version depends on the implementation and on user preference. A domain can be viewed as a collection of one or more profiles where each profile defines a unique set of requirement snapshots applicable to the same underlying data. Refer to A context is a stateful masker. Instantiated by and tightly coupled to a profile, it provides some user control over how masking operations are completed. Refer to Each snapshot defines one or more value types. A value type is a representation of the requirements pertaining to a particular concept of a domain datum. Requirements include: A Social Security Number, or SSN, is a U.S.-centric datum that requires obfuscation and for which a value type may be defined. Element names associated with an SSN may include, but are not limited to, SSN and SOCS; the value type is expected to identify all such names used within the domain. SSN occurrences are frequently partially masked, with common formats being to mask only the first four or the last five digits of the nine digit number; the value type defines which formats are available besides the default of masking all characters. Value types are represented in the framework using A mask style describes how obfuscation is applied to a value. It is always defined in the context of a value type, and a value type may define multiple. A value type is not required to define any mask styles. If none are defined, all value characters are obfuscated. If the requested mask style is not defined, the default obfuscation occurs; the value type will not attempt to guess which of the defined styles is appropriate. Mask styles are represented in the framework using A rule contains the information necessary to locate at least one occurrence of a value type datum to be obfuscated. It is always defined in the context of a value type, and a value type may define multiple. maskValue requests do not use rules. maskContent requests rely on rules for locating affected values, with the relationship between a profile and its rules an implementation detail. 
maskMarkupValue requests, when implemented, may also use rules or may take an entirely different approach. Rules are represented in the framework as an abstract concept that cannot be inspected individually. Inspection may be used to establish the presence of rules, but not to examine individual instances. The combination of a shared library and entry point function describes a plugin. The input to a plugin is a property tree describing one or more profiles. The output of an entry point function is an iterator of profiles. The profiles created by the function may all be associated with the same domain, but this is not required. Plugin results are represented in the framework using An engine is the platform's interface to obfuscation. It loads domains by loading one or more plugins. Plugins yield profiles, from which domains are inferred. Once configured with at least one domain, a caller can obtain obfuscation in multiple ways: Engines are represented in the framework using This section assumes a context is in use. Absent a context, only the default state of the default version of a profile can be used. The framework reserves multiple custom property names, which are described in subsequent subsections. It also allows implementations to define additional properties using any non-reserved name. Suppose an implementation defines a property to override the default obfuscation pattern. Let's call this property All custom properties, whether defined in the framework or in third party libraries, are managed using a generic context interface. This interface includes: The abstraction includes the concept of sets of related value types. All value types should be assigned membership in at least one set. One set should be selected by default. The custom context property valuetype-set is used to select a different set. The rationale for this is an expectation that certain data always requires obfuscation. A password, for example, always requires obfuscation. Other data may only require obfuscation in certain situations, such as when required by individual customer agreements. Callers should not be required to complete additional steps to act on data that must always be obfuscated, and should not need to know which data falls in which category. The set name "*" is reserved to select all value types regardless of their defined set membership. This mechanism is intended to assist with compatibility checks, and should be used with care in other situations. Use The property For improved compatibility checks, implementations are encouraged to reserve additional property names that enable checks for individual set names. For each set name, foo, the included implementations report property The included implementations allow membership in either a default, unnamed set or in any number of named sets. Absent a contextual request for a named set, the unnamed set is selected by default. With a contextual set request, the members of the named set are selected in addition to members of the unnamed set. Members of the unnamed set are always selected. Unnamed set membership cannot be defined explicitly. Because this set is always selected, assigning a value type to both a named and the unnamed set is redundant: the type will be selected whether the named set is requested or not.
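As a rough illustration of the selection rules just described (unnamed-set members always selected, a requested named set adding its members, and "*" selecting everything), here is a small stand-alone sketch. The struct and function names are invented for the example and are not the framework's interfaces.

```cpp
#include <iostream>
#include <set>
#include <string>

// Toy model of value type set membership: an empty membership list means the
// value type belongs only to the default, unnamed set.
struct ValueTypeInfo
{
    std::string name;
    std::set<std::string> namedSets;
};

// Selection rules described above: unnamed-set members are always selected,
// "*" selects everything, and a requested named set adds its members.
static bool isSelected(const ValueTypeInfo& vt, const std::string& requestedSet)
{
    if (requestedSet == "*")
        return true;                                   // reserved name: select all value types
    if (vt.namedSets.empty())
        return true;                                   // unnamed set is always selected
    if (requestedSet.empty())
        return false;                                  // no named set requested
    return vt.namedSets.count(requestedSet) != 0;      // member of the requested named set?
}

int main()
{
    ValueTypeInfo password{ "password", {} };                   // unnamed set only
    ValueTypeInfo contractual{ "contract-id", { "customer-a" } };

    std::cout << isSelected(password, "") << "\n";              // 1: always selected
    std::cout << isSelected(contractual, "") << "\n";           // 0: named set not requested
    std::cout << isSelected(contractual, "customer-a") << "\n"; // 1: requested set selects it
    std::cout << isSelected(password, "customer-a") << "\n";    // 1: unnamed members still selected
}
```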
Example 1a: default value type Value type Example 1b: sample set memberships Extending the previous example, five types and four sets are in use. Property This table shows which types are selected for each requested set: Example 1c: sample accepted sets Continuing the previous example, the profile has declared acceptance of two additional sets using properties The abstraction includes the concept of sets of related rules. All rules should be assigned membership in at least one set. One set should be selected by default. The custom context property rule-set is used to select a different set. The rationale for this is similar to, yet different from, that of value type sets. Instead of one set of rules that are always selected, backward compatibility with a proprietary legacy implementation requires that the default set be replaced by an alternate collection. The set name "*" is reserved to select all rules regardless of their defined set membership. This does not override constraints imposed by the current value type set. This mechanism is intended to assist with compatibility checks, and should be used with care in other situations. Use The property For improved compatibility checks, implementations are encouraged to reserve additional property names that enable checks for individual set names. For each set name, foo, the included implementations report property The included implementations allow membership in any number of named sets as well as a default, unnamed set. Absent a contextual request for a named set, the unnamed set is selected by default. With a contextual set request, only the members of the requested set are selected. Unlike value type sets, the unnamed set is not always selected. Because of this difference, a rule may be assigned explicit membership in the unnamed set. This is optional for rules that belong only to the unnamed set and is required for rules meant to be selected as part of the unnamed set and one or more named sets. Example 2: rule set memberships The rules described here are intentionally incomplete, showing only what is necessary for this example. Four rules are defined: property Given a single domain datum, obfuscate the content "as needed". The framework anticipates two interpretations of "as needed", either conditional or unconditional: Each plugin will define its own interpretation of "as needed". The API distinction between conditional and unconditional is a hint intended to guide implementations capable of both, and should be ignored by implementations that are not. The The The The The The name "*" is reserved by the abstraction for compatibility checks. To confirm the availability of a specific mask style first requires confirmation of the value type to which the mask style is related. Use The included implementations are inherently conditional. Obfuscation depends on matching Unconditional obfuscation is enabled in profiles that include a value type instance named "*". If an instance with this name is selected by the currently selected value type set, all values for undefined value types will be obfuscated. Values for defined but unselected value types will not be obfuscated. Example 3a: conditional obfuscation This snippet declares two value types, with two total sets. The table shows which values will be conditionally obfuscated based on the combination of "typeN" in the table represents any named type, excluding the reserved "*", not explicitly defined in the profile. Example 3b: unconditional obfuscation Extending the previous example with a third value type, values of unknown type are now obfuscated. Values of known but unselected type remain unobfuscated.
The name reserved by the abstraction to detect support for unconditional obfuscation is the same name reserved by the implementation to define that support. "typeN" in the table represents any named type, excluding the reserved "*", not explicitly defined in the profile. The The There is no equivalent to the unconditional mode offered by The The It is not possible to discern any information about the selected rules, such as their associated value types or the cues used to apply them in a buffer. The groundwork exists for two types of rules, serial and parallel. Serial rules, as the name implies, are evaluated sequentially. Parallel rules, on the other hand, are evaluated concurrently. Concurrent evaluation is expected to be more efficient than sequential, but no concurrent implementation is provided. The included sequential implementation should be viewed as a starting point. Each rule defines a start token and an optional end token. For each occurrence of the start token in the blob, a corresponding search for an end token (newline if omitted) is performed. When both parts are found, the content between is obfuscated using the associated value type's default mask style. Traversing a potentially large blob of text once per rule is inefficient. A concurrent implementation is in development to improve performance and capabilities. A domain originated using the original implementation and migrated to the new implementation illustrates one domain implemented by multiple plugins and, by extension, multiple profiles. Where Given a value and a relative location within a structured document, an implementation must decide whether obfuscation is required, may be required, or is not required. If required, it can obfuscate immediately. If not required, it can return immediately. If it might be required, the implementation must request the context it needs from the caller in order to make a final determination. A request to mask the content of an element named There is no equivalent to the unconditional mode offered by The The Assume a value type representing passwords is defined. If given an element value containing a URL with an embedded password, obfuscation is required, but most likely not for the entire value. In addition to identifying the value type associated with the value, the offset and length of the embedded substring requiring obfuscation are supplied; use of A TBD Use of runtime compatibility checks may be unavoidable in some circumstances, but reliance on this in every script is inefficient. It may also devalue trace output by omitting messaging required to debug an issue, a problem that might only be found when said messaging is needed. The This can test which values will or won't be affected by Returning to the earlier SSN example, a caller might require that an SSN value type will be obfuscated. It might also want to use a particular mask style, but may be prepared for its absence. A caller may prefer to mask the first four digits but may accept masking the last five instead, or may omit certain trace messages. The The The To improve compatibility check capabilities, a snapshot may synthesize properties that, if set, will have no effect. Specifically, a caller may be more interested to know if a particular set name (for either value type or rule set membership) is used than to know that an unidentified set name is used.
The The The In all elements described as accepting
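Before moving on to the included plugin implementation, a toy sketch of the conditional versus unconditional masking behavior described above may help. The names used here are invented for the example and are not the framework's actual interfaces; set selection is ignored (see the earlier sketch) and every character is replaced with '*' for simplicity.

```cpp
#include <iostream>
#include <set>
#include <string>

// Illustrative, stand-alone re-statement of the conditional vs. unconditional
// value masking rules described above.
struct ToyProfile
{
    std::set<std::string> valueTypes; // value types defined (and selected) in the profile

    // Conditional: mask only values whose type is defined.
    // Unconditional: a defined value type named "*" causes unknown types to be masked too.
    std::string maskValue(const std::string& valueType, const std::string& value) const
    {
        bool known = valueTypes.count(valueType) != 0;
        bool maskUnknown = valueTypes.count("*") != 0;
        if (known || maskUnknown)
            return std::string(value.size(), '*');
        return value; // not a recognized value type and no "*" entry: leave unchanged
    }
};

int main()
{
    ToyProfile conditional{ { "ssn", "password" } };
    ToyProfile unconditional{ { "ssn", "password", "*" } };

    std::cout << conditional.maskValue("ssn", "123456789") << "\n";  // *********
    std::cout << conditional.maskValue("nickname", "bob") << "\n";   // bob (type not defined)
    std::cout << unconditional.maskValue("nickname", "bob") << "\n"; // *** (unknown types masked)
}
```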
This implements a plugin library producing instances of Standard implementation of Depends on Implementation of An extension of CMaskStyle, this is intended to partially mask values, such as account numbers or telephone numbers, without assuming knowledge of the values. Configuration is generally expressed as: Where: All four values may be omitted when only the inherited options are needed. If any of the four values are given, count is required and the other three are optional. A value substring containing at most @count instances of the character class denoted by @characters is identified. Character classes are ASCII numeric characters (numbers), ASCII alphabetic characters (letters), ASCII alphabetic and numeric characters (alphanumeric), or any characters (all). The value substring is at the start of the value when @location is first, and at the end of the value when @location is last. The value substring is masked when @action is mask. All of the value except the substring is masked when @action is keep.
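The following stand-alone sketch re-implements the partial masking behavior just described (@count, @characters, @location, @action) so the semantics are easy to see. The mask character '*', the enum names, and the function name are assumptions for the example and are not the plugin's API.

```cpp
#include <cctype>
#include <iostream>
#include <string>

enum class CharClass { Numbers, Letters, Alphanumeric, All };
enum class Location { First, Last };
enum class Action { Mask, Keep };

static bool inClass(char c, CharClass cls)
{
    switch (cls)
    {
    case CharClass::Numbers:      return std::isdigit((unsigned char)c) != 0;
    case CharClass::Letters:      return std::isalpha((unsigned char)c) != 0;
    case CharClass::Alphanumeric: return std::isalnum((unsigned char)c) != 0;
    default:                      return true;
    }
}

// Identify a substring containing at most `count` characters of the class,
// anchored at the start (First) or end (Last) of the value, then either mask
// that substring (Mask) or mask everything outside it (Keep).
static std::string partialMask(std::string value, unsigned count, CharClass cls, Location loc, Action action)
{
    size_t begin = 0, end = 0; // [begin, end) is the identified substring
    if (loc == Location::First)
    {
        unsigned seen = 0;
        while (end < value.size() && seen < count)
            if (inClass(value[end++], cls))
                ++seen;
    }
    else
    {
        begin = end = value.size();
        unsigned seen = 0;
        while (begin > 0 && seen < count)
            if (inClass(value[--begin], cls))
                ++seen;
    }
    for (size_t i = 0; i < value.size(); ++i)
    {
        bool inside = (i >= begin && i < end);
        if (inside == (action == Action::Mask))
            value[i] = '*';
    }
    return value;
}

int main()
{
    // Mask the first four digits of an SSN-like value; keep only the last four digits of an account number.
    std::cout << partialMask("123-45-6789", 4, CharClass::Numbers, Location::First, Action::Mask) << "\n";  // *****5-6789
    std::cout << partialMask("ACCT 0012345678", 4, CharClass::Numbers, Location::Last, Action::Keep) << "\n"; // ***********5678
}
```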
A base class for rule implementations, this defines standard rule properties without providing any content knowledge for use with Configuration includes: rule instances will frequently be specific to a content format. For example, a rule applied to an XML markup value may include dependencies on XML syntax which are not applicable when masking JSON content. Rules specific to a markup format can be associated with that format using @contentType. Requests to mask content of a type will apply all rules without a @contentType value or with a matching value, and will exclude all rules with a non-matching value. An extension of CRule, this identifies content substrings to be masked based on matching start and end tokens in the content buffer. For each occurrence of a configured start token that is balanced by a corresponding configured end token, the characters between the tokens are masked. This class may be used with TSerialProfile. Configuration includes: No content type knowledge is implied by this class. An instance with @contentType of xml does not inherently know how to find values in XML markup. The defined tokens must include characters such as An implementation of Configuration includes: A concrete extension of Template parameters are: Configuration includes: For profiles supporting a single version, value type names must be unique. For profiles supporting multiple versions, value type names may be repeated but must be unique for each version. To illustrate, consider this snippet: In the example, foo and bar are each defined twice. The redefinition of foo is acceptable because each instance applies to a different version. The redefinition of bar is invalid because both instances claim to apply to version 2. The value type name * is reserved by this class. A profile that includes a value type named * supports unconditional value masking. The requirement of a type definition instead of a simpler flag is to enable the definition of mask styles. It has the side effect of enabling the definition of rules. In theory, an entire profile could be defined using a single value type. This may make sense in some cases, but not in all. For example, a partial mask style intended for use with U.S. Social Security numbers could be inappropriately applied to a password. Use care when configuring this type. An extension of TProfile that adds support for Template parameters are unchanged from Configuration options are unchanged from An implementation of Template parameters are: Configuration includes: For value types supporting a single version, mask style names must be unique. For types supporting multiple versions, style names may be repeated but must be unique for each version. To illustrate, consider this snippet: In the example, foo and bar are each defined twice. The redefinition of foo is acceptable because each instance applies to a different version. The redefinition of bar is invalid because both instances claim to apply to version 2. This section describes the entry point functions exported by the shared library. The library must export at least one function and may export several. The description of each entry point will identify which of the previously described classes is used to represent the returned collection of profiles. If a templated class is identified, the template parameters are also listed. Refer to the class descriptions for additional information. Returns a (possibly empty) collection of profiles supporting
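To make the start/end token rule concrete, here is a stand-alone sketch of the scan it describes: for each occurrence of a start token, find the matching end token (a newline if no end token is configured) and mask the characters between them. It is illustrative only, not the CRule/CContentRule/TSerialProfile code; the '*' mask character is an assumption, whereas the real classes apply the associated value type's default mask style.

```cpp
#include <iostream>
#include <string>

// Mask the content between each balanced start/end token pair in a buffer.
static void applyTokenRule(std::string& content, const std::string& startToken, std::string endToken)
{
    if (endToken.empty())
        endToken = "\n"; // end token defaults to a newline when omitted

    size_t pos = 0;
    while ((pos = content.find(startToken, pos)) != std::string::npos)
    {
        size_t valueBegin = pos + startToken.size();
        size_t valueEnd = content.find(endToken, valueBegin);
        if (valueEnd == std::string::npos)
            break; // unbalanced start token: nothing to mask
        for (size_t i = valueBegin; i < valueEnd; ++i)
            content[i] = '*';
        pos = valueEnd + endToken.size();
    }
}

int main()
{
    // A rule whose tokens include the markup characters, as the text above requires.
    std::string xml = "<User><Name>jane</Name><Password>s3cret</Password></User>";
    applyTokenRule(xml, "<Password>", "</Password>");
    std::cout << xml << "\n"; // <User><Name>jane</Name><Password>******</Password></User>
}
```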
The purpose of this plugin is to provide authentication and authorization capabilities for HPCC Systems users, with the credentials passed via valid JWT tokens. The intention is to adhere as closely as possible to the OpenID Connect (OIC) specification, which is a simple identity layer on top of the OAuth 2.0 protocol, while maintaining compatibility with the way HPCC Systems performs authentication and authorization today. More information about the OpenID Connect specification can be found at https://openid.net/specs/openid-connect-core-1_0.html. One of the big advantages of OAuth 2.0 and OIC is that the service (in this case, HPCC Systems) never interacts with the user directly.
Instead, authentication is performed by a trusted third party and the (successful) results are passed to the service in the form of a verifiable encoded token. Unfortunately, HPCC Systems does not support the concept of third-party verification. It assumes that users -- really, any client application that operates as a user, including things like IDEs -- will submit username/password credentials for authentication. Until that is changed, HPCC Systems won't be able to fully adhere to the OIC specification. We can, however, implement most of the specification. That is what this plugin does. NOTE: This plugin is not available in a Windows build. Doxygen (https://www.doxygen.nl/index.html) can be used to create nice HTML documentation for the code. Call/caller graphs are also generated for functions if you have dot (https://www.graphviz.org/download/) installed and available on your path. Assuming The documentation can then be accessed via The plugin is called by the HPCC Systems If the session token is not present, the plugin will call a That service authenticates the username/password credentials. If everything is good, the service constructs an OIC-compatible token that includes authorization information for that user and returns it to the plugin. The token is validated according to the OIC specification, including signature verification. Note that token signature verification requires an additional piece of information. Tokens can be signed with a hash-based algorithm or with a public key-based algorithm (the actual algorithm used is determined by the JWT service). To verify either kind of algorithm, the plugin will need either the secret hash key or the public key that matches what the JWT service used. That key is read by the plugin from a file, and the file is determined by a configuration setting (see below). It is possible to change the contents of that file without restarting the esp process. Note, though, that the plugin may not notice that the file's contents have changed for several seconds (changes do not immediately take effect). HPCC Systems uses a well-defined authorization scheme, originally designed around an LDAP implementation. That scheme is represented within the token as JWT claims. This plugin will unpack those claims and map to the authorization checks already in place within the HPCC Systems platform. OIC includes the concept of refresh tokens. Refresh tokens enable a service to re-authorize an existing token without user intervention. Re-authorization typically happens due to a token expiring. Tokens should have a relatively short lifetime -- e.g. 15-30 minutes -- to promote good security and also give administrators the ability to modify a user's authorization while the user is logged in. This plugin fully supports refresh tokens by validating token lifetime at every authorization check and calling a The most obvious outcome of this implementation is that a custom service/endpoint needs to be available. Or rather two services: One to handle the initial user login and one to handle token refreshes. Neither service precisely handles requests and replies in an OIC-compatible way, but the tokens themselves are OIC-compatible, which is good. That allows you to use third-party JWT libraries to construct and validate those tokens. Several items must be defined in the platform's configuration. Within configmgr, the Only the first three items have no default values and must be supplied. 
Once the If you intend to implement file scope permissions then you will also need to provide Dali information about the JWT plugin. In configmgr, within the This plugin supports all authorizations documented in the HPCC Systems® Administrator's Guide with the exception of "View Permissions". Loosely speaking, the permissions are divided into three groups: Feature, Workunit Scope, and File Scope. Feature permissions are supported exactly as documented. A specific permission would exist as a JWT claim, by name, with the associated value being the name of the permission. For example, to grant read-only access to ECL Watch, use this claim: File and workunit scope permissions are handled the same way as each other, but differently from feature permissions. The claim is one of the Claim constants in the tables below, and the associated value is a matching pattern. A pattern can be a simple string or it can use wildcards (specifically, Linux's file globbing wildcards). Wildcards are not typically needed. Multiple patterns can be set for each claim.
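As a rough illustration of how glob-style patterns carried in claim values can be matched against a workunit or file scope, the sketch below uses POSIX fnmatch(), which implements the Linux file-globbing semantics mentioned above. The claim layout and scope strings are assumptions for the example, not the plugin's actual schema.

```cpp
#include <fnmatch.h>
#include <iostream>
#include <string>
#include <vector>

// Return true if the scope matches any of the allow patterns from the token's claims.
static bool scopeAllowed(const std::string& scope, const std::vector<std::string>& allowPatterns)
{
    for (const auto& pattern : allowPatterns)
        if (fnmatch(pattern.c_str(), scope.c_str(), 0) == 0) // Linux file-globbing semantics
            return true;
    return false;
}

int main()
{
    // Hypothetical patterns extracted from a user's claims.
    std::vector<std::string> allowRead = { "thor::mydata::*", "thor::shared::lookup" };

    std::cout << scopeAllowed("thor::mydata::customers", allowRead) << "\n"; // 1 (matches the wildcard pattern)
    std::cout << scopeAllowed("thor::private::salaries", allowRead) << "\n"; // 0 (no pattern matches)
}
```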
The cleanup parameter has been introduced to allow the user to automatically delete the workunits created by executing the Regression Suite on their local system. It is an optional argument of the run and query sub-commands. A custom logging system also creates log files for each execution of the run and query sub-commands that contain information about the workunit deletion.
./ecl-test run --cleanup [mode]
./ecl-test query --cleanup [mode]
Modes allowed are 'workunits' and 'passed'. Default is 'none'.
workunits - all passed and failed workunits are deleted.
passed - only the passed workunits of the queries executed are deleted.
none - no workunits created during the current run command are deleted.
./ecl-test query ECL_query action1.ecl action2.ecl action4.ecl action5.ecl -t hthor --cleanup workunits
[Action] Suite: hthor
[Action] Automatic Cleanup Routine
[Pass] 1. Workunit Wuid=W20240526-094322 deleted successfully.
The Parquet plugin test suite is a subset of tests in the HPCC Systems regression suite. To run the tests: Change directory to HPCC Platform/testing/regress. To run the entire Parquet test suite: Note: Some Parquet tests require initialization of Parquet files. Use the These commands can be run on any cluster, including hthor or Roxie, like the example below. To run a single test file: On the roxie cluster: This is what you should see when you run the command above: This project focuses on the development of a comprehensive test suite for the recently integrated Parquet plugin within the HPCC Systems platform. The objective is to thoroughly evaluate the plugin's functionality, performance, and robustness across different scenarios and configurations. The key deliverables include defining and implementing various test cases, fixing any identified bugs, and providing extensive documentation. The test suite will evaluate all data types supported by ECL and Arrow, as well as file operations, various compression formats, and schema handling. Additionally, the test suite will measure the plugin's performance across different HPCC components and hardware configurations, conduct stress tests to identify potential bottlenecks and bugs, and compare Parquet to other file formats used in the ecosystem, such as JSON, XML, and CSV. The test suite consists of 10 main test files that cover different Parquet functionality and operations:
parquet_types.ecl: Tests various ECL and Arrow data types
parquet_schema.ecl: Evaluates Parquet's handling of different schemas
parquet_compress.ecl: Tests different compression algorithms
parquet_write.ecl: Validates Parquet write operations
parquet_empty.ecl: Tests behavior with empty Parquet files
parquet_corrupt.ecl: Checks handling of corrupt Parquet data
parquet_size.ecl: Compares file sizes across formats
parquet_partition.ecl: Tests partitioning in Parquet files
parquet_overwrite.ecl: Validates overwrite operations
parquet-string.ecl: Focuses on string-related operations
Covers 42 data types including ECL and Arrow types. Examples: BOOLEAN, INTEGER, STRING, UNICODE, various numeric types, sets, and Arrow-specific types. The Parquet plugin test suite shows that the plugin supports all ECL types.
Tests all available Arrow compression types: Snappy, GZip, Brotli, LZ4, ZSTD, Uncompressed.
Compares performance and file sizes for different compression options.
Tests ParquetIO.Read for creating ECL datasets from Parquet files.
Tests ParquetIO.Write for writing ECL datasets to Parquet files.
Read and Write Speeds
Comparison with Other File Types
Schema Handling and Compatibility
Behavior with Corrupt Data and Empty Parquet Files
The test suite generally uses key files located in HPCC-Platform/testing/regress/ecl/key, with a ".xml" extension, to evaluate test outcomes. These files store the expected results for comparison. However, some Parquet tests do not rely on key files, and alternative evaluation methods are used in those cases.
The esdl utility tool aids with creating and managing ESDL-based and Dynamic ESDL services on an HPCC cluster. It consists of several different commands. To generate output from an ESDL definition: To manage ESDL and DESDL services: The sections below cover the commands in more detail. The A manifest file is an XML-formatted template combining elements in and outside of the manifest's The result of running the manifest tool on a manifest file is an XML artifact suitable for use with the ESDL ESP. Supported output includes: A simplified example showing the format of a The tool is permissive and flexible, copying through most markup to the output. Recognized elements in the manifest namespace may be treated differently. They are only required in order to take advantage of the automated processing and simplified format of the manifest file. This example highlights the recommended usage of manifest elements to use the tool's capabilities. Although you could replace some of the elements below with verbatim XML, element ordering is significant to proper ESDL script, XSLT, and ESXDL content processing. The platform's These elements may create artifact content, change the tool's behavior, or both. When used as intended, none of these elements will appear in the generated output: Required root element of all manifest files that create output: Recommended child of While possible to omit the There are three categories of attributes that can be defined on the First are the standard attributes used to set up and define the binding: Note that Second are binding-specific or service-specific attributes. This is an open-ended category where most future attributes will belong. Any attribute not in the other two categories is included here and output on the Finally are auxiliary attributes. These should be thought of as read-only or for reference, and it is not recommended that you set these in the manifest. They are set by the system when publishing to dali using Recommended child of The element is not required since it is possible to embed a complete Optional element that imports the contents of another file into the output in place of itself. The outcome of the import depends on the context in which this element is used. See EsdlDefinition, Scripts, and Transform for more information. Any XSLTs or ESDL Scripts written inline in a manifest file will have XML escaping applied where required to generate valid XML.
If an XSLT contains any text content or markup that needs to be preserved as-is (no XML escaping applied) then be sure to use an For details on using XSLT to generate unescaped output, see this section of the specification: https://www.w3.org/TR/1999/REC-xslt-19991116#disable-output-escaping Optional repeatable element appearing within an Optional repeatable element appearing within an The esdl An example output of each type: WsFoobar-request-prep.xml, WsFoobar-logging-prep.xml, FoobarSearch-scripts.xml, log-prep.xslt, WsFoobar.ecm. The bundle is suitable for configuring a service on an ESP launched in esdl application mode. The binding can be used to configure a service for an ESP using a dali.
If an XSLT contains any text content or markup that needs to be preserved as-is (no XML escaping applied) then be sure to use an For details on using XSLT to generate unescaped output, see this section of the specification: https://www.w3.org/TR/1999/REC-xslt-19991116#disable-output-escaping Optional repeatable element appearing within an Optional repeatable element appearing within an The esdl An example output of each type - WsFoobar-request-prep.xml WsFoobar-logging-prep.xml FoobarSearch-scripts.xml log-prep.xslt WsFoobar.ecm The bundle is suitable to configure a service on an ESP launched in esdl application mode. The binding can be used to configure a service for an ESP using a dali. This tool is designed to interact with HPCC ESP services, providing four commands: ESDL Directory Location: The tool gathers the directory of the ESDL files from an environment configuration variable; gets the install path from jutil library and appends /componentfiles/esdl_files/. Server and Port Defaults: When using the Custom ESDL Directory Argument: An additional argument could be introduced to allow users to specify the directory of the ESDL files directly in the command. Template Request Generation: A feature could be added to generate template XML or JSON requests. This would simplify the process of filling out requests by providing a pre-structured template. Credential Prompts: The tool could be expanded to prompt for the username and password upon a 401 Unauthorized response. Selective Response Extraction: Another potential feature is to allow extraction of specific tags from the response using XPath expressions. This would make it easier to parse and analyze responses. This tool is designed to interact with HPCC ESP services, providing four commands: ESDL Directory Location: The tool gathers the directory of the ESDL files from an environment configuration variable; gets the install path from jutil library and appends /componentfiles/esdl_files/. Server and Port Defaults: When using the Custom ESDL Directory Argument: An additional argument could be introduced to allow users to specify the directory of the ESDL files directly in the command. Template Request Generation: A feature could be added to generate template XML or JSON requests. This would simplify the process of filling out requests by providing a pre-structured template. Credential Prompts: The tool could be expanded to prompt for the username and password upon a 401 Unauthorized response. Selective Response Extraction: Another potential feature is to allow extraction of specific tags from the response using XPath expressions. This would make it easier to parse and analyze responses. The file tools/git/aliases.sh contains various git aliases which are useful when using git, and may be used by the merge scripts. The file env.sh.example contains some example environment variable settings. Copy that locally to env.sh and modify it to match your local setup. Before running any of the other scripts, process the contents of that file as a source file to initialize the common environment variables. The following tools are required: The following repositories should be checked out in a directory reserved for merging and tagging (default for scripts is ~/git): The following are required for builds prior to 8.12.x The files git-fixversion and git-unupmerge can copied so they are on your default path, and then they will be available as git commands. The following process should be followed when tagging a new set of versions. 
You can set the For example: Go gold with each of the explicit versions If you have merged changes onto a point-release branch you would normally create a new rc before going gold. If the change was trivial (e.g. removing an unwanted file) then you can use the --ignore option to skip that step. This normally happens after cherry-picking a late fix for a particular version, which has already been merged into the .x candidate branch. A new minor branch is created from the current master... The file tools/git/aliases.sh contains various git aliases which are useful when using git, and may be used by the merge scripts. The file env.sh.example contains some example environment variable settings. Copy that locally to env.sh and modify it to match your local setup. Before running any of the other scripts, process the contents of that file as a source file to initialize the common environment variables. The following tools are required: The following repositories should be checked out in a directory reserved for merging and tagging (default for scripts is ~/git): The following are required for builds prior to 8.12.x The files git-fixversion and git-unupmerge can copied so they are on your default path, and then they will be available as git commands. The following process should be followed when tagging a new set of versions. You can set the For example: Go gold with each of the explicit versions If you have merged changes onto a point-release branch you would normally create a new rc before going gold. If the change was trivial (e.g. removing an unwanted file) then you can use the --ignore option to skip that step. This normally happens after cherry-picking a late fix for a particular version, which has already been merged into the .x candidate branch. A new minor branch is created from the current master... Developers Hub Notes and documentation for developers of the HPCC-Platform=9?this.regexp_groupSpecifier(e):e.current()===63&&e.raise("Invalid group"),this.regexp_disjunction(e),e.eat(41))return e.numCapturingParens+=1,!0;e.raise("Unterminated group")}return!1};A.regexp_eatExtendedAtom=function(e){return e.eat(46)||this.regexp_eatReverseSolidusAtomEscape(e)||this.regexp_eatCharacterClass(e)||this.regexp_eatUncapturingGroup(e)||this.regexp_eatCapturingGroup(e)||this.regexp_eatInvalidBracedQuantifier(e)||this.regexp_eatExtendedPatternCharacter(e)};A.regexp_eatInvalidBracedQuantifier=function(e){return this.regexp_eatBracedQuantifier(e,!0)&&e.raise("Nothing to repeat"),!1};A.regexp_eatSyntaxCharacter=function(e){var t=e.current();return vr(t)?(e.lastIntValue=t,e.advance(),!0):!1};function vr(e){return e===36||e>=40&&e<=43||e===46||e===63||e>=91&&e<=94||e>=123&&e<=125}A.regexp_eatPatternCharacters=function(e){for(var t=e.pos,n=0;(n=e.current())!==-1&&!vr(n);)e.advance();return e.pos!==t};A.regexp_eatExtendedPatternCharacter=function(e){var t=e.current();return t!==-1&&t!==36&&!(t>=40&&t<=43)&&t!==46&&t!==63&&t!==91&&t!==94&&t!==124?(e.advance(),!0):!1};A.regexp_groupSpecifier=function(e){if(e.eat(63)){this.regexp_eatGroupName(e)||e.raise("Invalid group");var t=this.options.ecmaVersion>=16,n=e.groupNames[e.lastStringValue];if(n)if(t)for(var s=0,i=n;sCMake files structure and usage
Directory structure of CMake files
\\- cmake\\_modules/ - Directory storing modules and configurations for CMake
+
+: - FindXXXXX.cmake - CMake find files used to locate libraries,
+ headers, and binaries
+ - commonSetup.cmake - common configuration settings for the
+ entire project (contains configure time options)
+ - docMacros.cmake - common documentation macros used for
+ generating fop and pdf files
+ - optionDefaults.cmake - contains common variables for the
+ platform build
+ - distrocheck.sh - script that determines if the OS uses DEB
+ or RPM
+ - getpackagerevisionarch.sh - script that returns OS version
+ and arch in format used for packaging
+
+ \\- dependencies/ - Directory storing dependency files used for package dependencies
+
+ : - \\<OS\\>.cmake - File containing either DEB or RPM
+ dependencies for the given OS
+
+\\- build-utils/ - Directory for build related utilities
+
+: - cleanDeb.sh - script that unpacks a deb file and rebuilds
+ with fakeroot to clean up lintian errors/warnings
+
Common Macros
Documentation Macros
Initfiles macro
Some standard techniques used in CMake project files
Common looping
FOREACH( oITEMS
+ item1
+ item2
+ item3
+ item4
+ item5
+)
+ Actions on each item here.
+ENDFOREACH ( oITEMS )
+
Common installs over just install
Common settings for generated source files
Using custom commands between multiple cmake files
FindXXXXX.cmake format
NOT XXXXX_FOUND
+ Externals set
+ define needed vars for finding external based libraries/headers
+
+ Use Native set
+ use FIND_PATH to locate headers
+ use FIND_LIBRARY to find libs
+
+Include Cmake macros file for package handling
+define package handling args for find return (This will set XXXXX_FOUND)
+
+XXXXX_FOUND
+ perform any modifications you feel is needed for the find
+
+Mark defined variables used in package handling args as advanced for return
+
XXXXX_FOUND
+XXXXX_INCLUDE_DIR
+XXXXX_LIBRARIES
+
Introduction
Purpose
Aims
Key ideas
From declarative to imperative
Flow of processing
Working on the code generator
The regression suite
~/dev/hpcc/ecl/regress/regress.sh -t /regress/hpcc -e /home/<user>/buildr/Release/bin/eclcc -I /home/<user>/dev/hpcc/ecl/regress/modules -I /home/<user>/dev/hpcc/plugins/javaembed -I /home/<user>/dev/hpcc/plugins/v8embed -c /regress/hpcc.master -d bcompare
Running directly from the build directory
ECLCC_ECLBUNDLE_PATH eclBundlesPath
Hints and tips
eclcc myfile.ecl --logfile myfile.log --logdetail 999
regress.sh -q myfile.ecl -l myfile.log
dbglogExpr(expr); // regenerate the ecl for an expression. See other functions in ecl/hql/hqlthql.hpp
EclIR::dbglogIR(expr); // regenerate the IR for an expression. See other functions in ecl/hql/hqlir.hpp
p EclIR::dump_ir(expr)
EclIR::dump_ir(expr1, expr2)
The first difference between the expressions will be the expression that follows the first "return".
DEBUG_TRACK_INSTANCEID (in ecl/hql/hqlexpr.ipp) will add a unique sequence number to each IHqlExpression that is created. There is also a function checkSeqId() at the start of ecl/hql/hqlexpr.cpp which is called whenever an expression is created, linked, released etc. Setting a breakpoint in that function can allow you to trace back exactly when and why a particular node was created.
Expressions
Expression Graph representation
IHqlExpression
IHqlSimpleScope
IHqlScope
IHqlDataset
Properties and attributes
Field references
x := DATASET(...)
+y := x(x.id != 0);
+z := y(x.id != 100);
+
Attribute "new"
EXISTS(dataset(EXISTS(dataset.childdataset.grandchild))
+
EXISTS(dataset.childdataset(EXISTS(dataset.childdataset.grandchild))
+
Transforming selects
Annotations
Associated side-effects
EXPORT a(x) := FUNCTION
+ Y := F(x);
+ OUTPUT(Y);
+ RETURN G(Y);
+END;
+
Derived properties
Transformations
A := x; B := x; C = A + B;
+
A' := x'; B' := X''; C' := A' + B';
+
Key Stages
Parsing
Normalizing
Scope checking
Constant folding: foldHqlExpression
Expression optimizer: optimizeHqlExpression
Implicit project: insertImplicitProjects
Workunits
is this the correct term? Should it be a query? This should really be independent of this document...)
Workflow
Graph
Inputs and Results
Generated code
Implementation details
Parser
Generated code
C++ Output structures
Activity Helper
Meta helper
Building expressions
Scalar expressions
Datasets
Dataset cursors
Field access classes
Key filepos weirdness
Source code
ecl/eclcc The executable which ties everything together.
Challenges
From declarative to imperative
The parser
Code Review Guidelines
Review Goals
These should have been caught earlier, but better later than never...
This includes following the project coding standards (see Style Guide).
For example, providing information about how the current system works, functionality that is already available, or suggestions of other approaches the developer may not have thought of.
General comments
This should include what change is expected if not obvious. Don’t assume the contributor has same understanding/view as reviewer.
...rather than wasting time trying to second-guess the reviewer.
The reviewer can either agree, or provide reasons why they consider it to be an issue.
If the change could be extended, or only partially solves the issue, a new JIRA should be created for the extra work. If the change will introduce regressions, or fundamentally fails to solve the problem then this does not apply!
Sometimes a significant design problem means the rest of the code has not been reviewed in detail. Other times an initial review has picked up a set of issues, but the reviewer needs to go back and check other aspects in detail. If this is the case it should be explicitly noted.
The reviewer is free to comment on every instance of a repeated issue, but a simple annotation should alert the contributor to address them appropriately, e.g. [Please address all instances of this issue]
The contributor should respond to a comment if it isn't obvious where/how they have been addressed (but no need to acknowledge typo/indentation/etc)
Both reviewers and contributors should respond in a timely manner - don't leave it for days. It destroys the flow of thought and conversation.
If they have not been addressed you are guaranteed another review/submit cycle. In particular watch out for collapsed conversations. If there are large numbers of comments GitHub will collapse them, which can make comments easy to miss.
If there is a large number of comments (more than 100 or so), it can be hard to track them all and GitHub can become unresponsive. It may be better to close the PR and open a new one.
Making use of the "viewed" button can make it easier to track what has changed - or quickly remove trivial changes from view. Ignoring whitespace can often simplify comparisons - especially when code has been refactored or extra conditions or try/catch blocks have been introduced.
Strictness
Checklist
Could this possibly cause problems if data produced with this change is used in earlier/later versions? Could there be problems if it was used in a mixed-version environment?
Comment tags
| Tag | What | Why | Expected response |
| --- | --- | --- | --- |
| design: | An architectural or design issue | The reviewer considers the PR has a significant problem which will affect its functionality or future extensibility | Reviewer/developer redesign expected before any further changes |
| scope: | The scope of the PR does not match the Jira | If the scope of the fix is too large it can be hard to review, and take much longer to resolve all the issues before the PR is accepted | Discussion. Split the PR into multiple simpler PRs. |
| function: | Incorrect/unexpected functionality implemented | The function doesn't match the description in the jira, or doesn't solve the original problem | Developer expected to address issue (or discuss) |
| security: | Something in the code introduces a security problem | The reviewer has spotted potential security issues, e.g. injection attacks | Developer expected to discuss the issue (and then address) |
| bug: | A coding issue that will cause incorrect behaviour | Likely to cause confusion, invalid results or crashes | Developer expected to address issue |
| efficiency: | The code works, but may have scaling or other efficiency issues | Inefficiency can cause problems in some key functions and areas | Developer addressing the problem (or discuss) |
| discuss: | Reviewer has thought of a potential problem, but not sure if it applies | Reviewer has a concern it may be an issue, and wants to check the developer has thought about and addressed the issue | Discussion - either in the PR or offline |
| style: | Reviewer points out non-conforming code style | Makes the code hard to read | Developer to fix |
| indent: | A fairly obvious indentation issue | Makes the code hard to read | Developer to fix |
| format: | Any other unusual formatting | Makes the code hard to read | Developer to fix |
| typo: | Minor typing error | Makes something (code/message/comment) harder to read | Developer to fix |
| minor: | A minor issue that could be improved | Education (the suggestion is better for a particular reason), or something simple to clean up at the same time as other changes | Developer recommended to fix, but unlikely to stop a merge |
| picky: | A very minor issue that could be improved, but is barely worth commenting on | Education, or something to clean up at the same time as other changes | Developer discretion to fix, wouldn't stop a merge |
| future: | An additional feature or functionality that fits in but should be done as a separate PR | Ensure that missing functionality is tracked, but PRs are not held up by additional requirements | Contributor to create Jira (unless trivial) and number noted in response |
| question: | Reviewer has a question that they are not sure of the answer to | Reviewer would like clarification to help understand the code or design. The answer may lead to further comments | An answer to the question |
| note: | Reviewer wants to pass on some information to the contributor which they may not know | Passing on knowledge/background | Contributor should consider the note, but no change expected/required |
| personal: | Reviewer has an observation based on personal experience | Reviewer has comments that would improve the code, but not part of the style guide or required. E.g. patterns for guard conditions | Reflect on the suggestion, but no change expected |
| documentation: | This change may have an impact on documentation | Make sure changes can be used | Contributor to create Jira describing the impact created, and number noted in response |

Code Submission Guidelines
Pull requests
The format should be HPCC-XXXXX (where XXXXX is the bug number) followed by a description of the issue. The text should make sense in a change log by itself - without reference to the jira or the contents of the PR. We should aim to increase the information that is included as part of the commit message - not rely on the jira.
The code reviewer only has the JIRA and the PR to go on. The JIRA (or associated documentation) should contain enough details to review the PR - e.g. the purpose, main aim, why the change was made etc.. If the scope of the jira has changed then the jira should be updated to reflect that.
If the submission requires changes to the documentation then the JIRA should contain all the details needed to document it, and the PR should either contain the documentation changes, or a documentation JIRA should be created.
The check boxes are there to remind you to consider different aspects of the PR. Not all of them apply to every submission, but if you tick a box and have not really thought about the item then prepare to be embarrassed!
It isn't always possible, but several smaller PRs are much easier to review than one large change. If your submission includes semi-automatic/mechanical changes (e.g. renaming large numbers of function calls, or adding an extra parameter) please keep it as a separate commit. This makes it much easier to review the PR - since the reviewer will be looking for different errors in the different types of changes.
Review your own code in GitHub after creating the PR to check for silly mistakes. It doesn't take long, and often catches trivial issues. It may avoid the need for a cycle of code-review/fixes. It may be helpful to add some notes to specific changes, e.g. "this change is mainly or solely refactoring method A into method B and C". Some common examples of trivial issues to look for include:
Reviewers
Target branch
Working with developer documentation
Documentation location
devdoc
or subfolders of devdoc
.devdoc/.vitepress/config.js
file that prevents certain folders from being included in the documentation. If you add a new document to a folder that is excluded, then it will not be included in the documentation. If you need to add a new document to an excluded folder, then you will need to update the exclusion list in the devdoc/.vitepress/config.js
file.
Documentation format
Rendering documentation locally with VitePress
npm install
+npm run docs-dev
Adding a new document
.md
file extension. Once the file exists, you can view it by navigating to the appropriate URL. For example, if you add a new file called MyNewDocument.md
to the devdoc
folder, then you can view it by navigating to http://localhost:5173/HPCC-Platform/devdoc/MyNewDocument.html.
Adding a new document to the sidebar
devdoc/.vitepress/config.js
file. The entry should be added to the sidebar
section. For example, to add a new document called MyNewDocument.md
to the devdoc
folder, you would add the following entry to the sidebar
section:sidebar: [
+ ...
+ {
+ text: 'My New Document',
+ link: '/devdoc/MyNewDocument'
+ }
+ ...
Editing the main landing page
index.md
in the root folder. Its structure uses the VitePress "home" layout.
Development Guide
HPCC Source
Getting the sources
Building the system from sources
Requirements
sudo apt-get install docbook
+sudo apt-get install xsltproc
+sudo apt-get install fop
Building the system
cmake <source directory>
make
hpccsystems-platform.sln
Packaging
make package
Testing the system
Unit Tests
./Debug/bin/roxie -selftest
./Debug/bin/eclagent -selftest
./Debug/bin/daregress localhost
Regression Tests
Compiler Tests
./regress.sh -t golden -e buildDir/Debug/bin/eclcc
./regress.sh -t my_branch -c golden -e buildDir/Debug/bin/eclcc
./regress.sh -t my_branch -c golden
Debugging the system
cmake -DCMAKE_BUILD_TYPE=Debug <source directory>
HPCC git support
ecl run hthor --main demo.main@ghalliday/gch-ecldemo-d#version1 --server=...
Credentials for local development
gh auth login
Configuring eclccserver
Kubernetes
eclccserver:
+- name: myeclccserver
+ gitUsername: ghalliday
secrets:
+ git:
+ ghalliday: my-git-secret
apiVersion: v1
+kind: Secret
+metadata:
+ name: my-git-secret
+type: Opaque
+stringData:
+ password: ghp_eZLHeuoHxxxxxxxxxxxxxxxxxxxxol3986sS=
kubectl apply -f ~/dev/hpcc/helm/secrets/my-git-secret
Bare-metal
export HPCC_GIT_USERNAME=ghalliday
$cat /opt/HPCCSystems/secrets/git/ghalliday/password
+ghp_eZLHeuoHxxxxxxxxxxxxxxxxxxxxol3986sS=
LDAP Security Manager Init
LDAP Instances
Initialization Steps
Load Configuration
AD Hosts
AD Credentials
Retrieve Server Information from the AD
Connections
Connection Pool
Handling AD Hosts
Introduction
Main Structure
The page bitmap
IRowManager
Heaps
Huge Heap
Specialised Heaps:
Packed
Unique
Blocked
Scanning
Delay Release
Dynamic Spilling
Complications
Callback Rules
Resizing Large memory blocks
Compacting heaps
Shared Memory
Huge pages
Global memory and channels
Metrics Framework Design
Introduction
Definitions
Use Scenarios
Roxie
ESP
Dali Use Cases
Framework Design
Framework Implementation
Metrics
Sinks
Metrics Reporter
Metrics Implementations
Counter Metric
Gauge Metric
Custom Metric
Histogram Metric
Scaled Histogram Metric
Configuration
component:
+ metrics:
+ sinks:
+ - type: <sink_type>
+ name: <sink name>
+ settings:
+ sink_setting1: sink_setting_value1
+ sink_setting2: sink_setting_value2
Metric Naming
Base Name
Meta Data
Component Instrumentation
using namespace hpccMetrics;
+MetricsManager &metricsManager = queryMetricsManager();
std::shared_ptr<CounterMetric> pCounter = std::make_shared<CounterMetric>("metricName", "description");
+metricsManager.add(pCounter);
pCounter->inc(1);
auto pCount = createMetricAndAddToManager<CounterMetric>("metricName", "description");
+
auto pCustomMetric = createCustomMetricAndAddToManager("customName", "description", metricType, value);
+
Adding Metric Meta Data
MetricMetaData metaData1{{"key1", "value1"}};
+std::shared_ptr<CounterMetric> pCounter1 =
+ std::make_shared<CounterMetric>("requests.completed", "description", SMeasureCount, metaData1);
+
+std::shared_ptr<CounterMetric> pCounter2 =
+ std::make_shared<CounterMetric>("requests.completed", "description", SMeasureCount, MetricMetaData{{"key1", "value2"}});
Metric Units
Storage planes
planes:
+
+: name: \\<required\\> prefix: \\<path\\> \\# Root directory for
+ accessing the plane (if pvc defined), or url to access plane.
+ numDevices: 1 \\# number of devices that are part of the plane
+ hostGroup: \\<name\\> \\# Name of the host group for bare metal
+ hosts: \\[ host-names \\] \\# A list of host names for bare metal
+ secret: \\<secret-id\\> \\# what secret is required to access the
+ files. options: \\# not sure if it is needed
+
Files
part: \\# optional information about each of the file parts (Cannot
+implement virtual file position without this) - numRows: \\<count\\>
+\\# number of rows in the file part rawSize: \\<size\\> \\# uncompressed
+size of the file part diskSize: \\<size\\> \\# size of the part on disk
+
Functions
Examples
Milestones:
File reading refactoring
DFU server
Developer Documentation
General documentation
Implementation details for different parts of the system
Other documentation
Security Configuration
Supported Configurations
Security Managers
LDAP
| Value | Example | Meaning |
| --- | --- | --- |
| adminGroupName | HPCCAdmins | Group name containing admin users for the AD |
| cacheTimeout | 60 | Timeout in minutes to keep cached security data |
| ldapCipherSuite | N/A | Used when AD is not up to date with latest SSL libs; AD admin must provide |
| ldapPort | 389 (default) | Insecure port |
| ldapSecurePort | 636 (default) | Secure port over TLS |
| ldapProtocol | ldap | ldap for insecure (default), using ldapPort; ldaps for secure, using ldapSecurePort |
| ldapTimeoutSec | 60 (default 5 for debug, 60 otherwise) | Connection timeout to an AD before rolling to the next AD |
| serverType | ActiveDirectory | Identifies the type of AD server. (2) |
| filesBasedn | ou=files,ou=ecl_kr,DC=z0lpf,DC=onmicrosoft,DC=com | DN where filescopes are stored |
| groupsBasedn | ou=groups,ou=ecl_kr,DC=z0lpf,DC=onmicrosoft,DC=com | DN where groups are stored |
| modulesBaseDn | ou=modules,ou=ecl_kr,DC=z0lpf,DC=onmicrosoft,DC=com | DN where permissions for resources are stored (1) |
| systemBasedn | OU=AADDC Users,DC=z0lpf,DC=onmicrosoft,DC=com | DN where the system user is stored |
| usersBasedn | OU=AADDC Users,DC=z0lpf,DC=onmicrosoft,DC=com | DN where users are stored (3) |
| systemUser | hpccAdmin | Appears to only be used for IPlanet type ADs, but may still be required |
| systemCommonName | hpccAdmin | AD username of user to proxy all AD operations |
| systemPassword | System user password | AD user password |
| ldapAdminSecretKey | none | Key for Kubernetes secrets (4) (5) |
| ldapAdminVaultId | none | Vault ID used to load system username and password (5) |
| ldapDomain | none | Appears to be a comma separated version of the AD domain name components (5) |
| ldapAddress | 192.168.10.42 | IP address to the AD |
| commonBasedn | DC=z0lpf,DC=onmicrosoft,DC=com | Overrides the domain retrieved from the AD for the system user (5) |
| templateName | none | Template used when adding resources (5) |
| authMethod | none | Not sure yet |

Plugin Security Managers
httpasswd Security Manager
Single User Security Manager
JWT Security Manager
User Authentication
Security Manager User Authentication
LDAP
HTPasswd
Single User
User Authentication During Authorization
Coding conventions
Why coding conventions?
C++ coding conventions
Source files
Java-style
Identifiers
class MySQLSuperClass
+{
+ bool haslocalcopy = false;
+ void mySQLFunctionIsCool(int _haslocalcopy, bool enablewrite)
+ {
+ if (enablewrite)
+ haslocalcopy = _haslocalcopy;
+ }
+};
Pointers
Shared pointers for member variables - unless there is a strong guarantee the object has a longer lifetime. Shared<X> with either:
- Owned<X>: if your new pointer will take ownership of the pointer
- Linked<X>: if you are sharing the ownership (shared)
Shared<> pointers to lose the pointers, so subsequent calls to it (like o2->doIt() after o3 gets ownership) will cause segmentation faults.
Shared<>.
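The failure mode referred to above can be illustrated with a short, purely hypothetical sketch (CFoo and doIt() are stand-ins, not platform classes; Owned<> and CInterface come from jlib and are described in the Reference counted objects section below):

```cpp
// Hypothetical illustration of "losing" the last link to a shared object.
class CFoo : public CInterface
{
public:
    void doIt() {}
};

void example()
{
    Owned<CFoo> o1 = new CFoo;   // o1 holds the only link
    CFoo *o2 = o1;               // raw alias: holds no link of its own
    Owned<CFoo> o3;
    o3.setown(o1.getClear());    // o3 takes ownership; o1 becomes NULL
    o3.clear();                  // last link released, object destroyed
    o2->doIt();                  // dangling pointer - likely a segmentation fault
}
```

Holding an extra link (for example Linked<CFoo> o2 = o1;) instead of a raw alias avoids the problem.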
Indentation
Comments
Classes
Namespaces
Other
C++11
Other coding conventions
ECL code
Javascript, XML, XSL etc
Design Patterns
Why Design Patterns?
Interfaces
CFoo : implements IFoo { };
CFooCool : implements IFoo { };
+CFooWarm : implements IFoo { };
+CFooALot : implements IFoo { };
CFoo : implements IFoo { };
+CFooCool : public CFoo { };
+CFooWarm : public CFoo { };
interface IFoo
+{
+ virtual void foo()=0;
+};
+
+// Following is implemented in a separate private file...
+class CFoo : implements IFoo
+{
+ MyImpl *pImpl;
+public:
+ virtual void foo() override { pImpl->doSomething(); }
+};
interface ICommon
+{
+ virtual void common()=0;
+};
+interface IFoo : extends ICommon
+{
+ virtual void foo()=0;
+};
+interface IBar : extends ICommon
+{
+ virtual void bar()=0;
+};
+
+template <class IFACE>
+class Base : implements IFACE
+{
+ virtual void common() override { ... };
+}; // Still virtual
+
+class CFoo : public Base<IFoo>
+{
+ void foo() override { 1+1; };
+};
+class CBar : public Base<IBar>
+{
+ void bar() override { 2+2; };
+};
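A hypothetical usage sketch of the pattern above - both concrete classes reuse the shared common() implementation from Base<>, while interface-specific behaviour is reached through IFoo/IBar (demo() is illustrative, not platform code):

```cpp
void demo()
{
    CFoo foo;
    CBar bar;

    ICommon *items[] = { &foo, &bar };   // both classes are ICommon via Base<>
    for (ICommon *item : items)
        item->common();                  // dispatches to the shared Base<> implementation

    IFoo *pFoo = &foo;                   // interface-specific behaviour
    pFoo->foo();
    IBar *pBar = &bar;
    pBar->bar();
}
```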
Reference counted objects
Shared<>
is an in-house intrusive smart pointer implementation. It is close to boost's intrusive_ptr. It has two derived implementations: Linked
and Owned
, which are used to control whether the pointer is linked when a shared pointer is created from a real pointer or not, respectively. Ex:
Owned<Foo> myFoo = new Foo; // Take ownership of the pointer
+Linked<Foo> anotherFoo = myFoo; // Shared ownership
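To be managed this way a class must provide Link() and Release() - see the note on CInterface/IMPLEMENT_IINTERFACE below. A minimal sketch with illustrative names (IGreeter/CGreeter are not platform classes), assuming jlib's jiface.hpp macros:

```cpp
// interface/extends/implements are jlib macros; IMPLEMENT_IINTERFACE supplies
// Link()/Release() using the reference count provided by CInterface.
interface IGreeter : extends IInterface
{
    virtual void greet() = 0;
};

class CGreeter : public CInterface, implements IGreeter
{
public:
    IMPLEMENT_IINTERFACE;
    virtual void greet() override {}
};

void example()
{
    Owned<IGreeter> greeter = new CGreeter;  // owns the only link
    Linked<IGreeter> alias = greeter;        // adds a second link
    greeter->greet();
}   // both smart pointers release their links here and the object is destroyed
```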
Shared<> is thread-safe and uses an atomic reference count handled by each object (rather than by the smart pointer itself, like boost's shared_ptr). To use Shared<>, your class must implement the Link() and Release() methods - most commonly by extending the CInterfaceOf<> class, or the CInterface class (and using the IMPLEMENT_IINTERFACE macro in the public section of your class declaration).
STL
Structure of the HPCC source tree
Build Assets for individual developer
Build Assets
hpcc-systems
user repository is hpcc-systems/HPCC-Platform/tags.Dependent variables
settings
tab in your forked repository, and then clicking on the Secrets and Variables - Actions
drop down under Security
on the lefthand side of the settings screen.New Repository Secret
button. The following secrets are needed;Generating the windows signing certificate
openssl req -x509 -sha256 -days 365 -nodes -newkey rsa:2048 -subj "/CN=example.com/C=US/L=Boca Raton" -keyout rootCA.key -out rootCA.crt
openssl genrsa -out server.key 2048
cat > csr.conf <<EOF
+[ req ]
+default_bits = 2048
+prompt = no
+default_md = sha256
+req_extensions = req_ext
+distinguished_name = dn
+
+[ dn ]
+C = US
+ST = Florida
+L = Boca Raton
+O = LexisNexis Risk
+OU = HPCCSystems Development
+CN = example.com
+
+[ req_ext ]
+subjectAltName = @alt_names
+
+[ alt_names ]
+DNS.1 = example.com
+IP.1 = 127.0.0.1
+
+EOF
openssl req -new -key server.key -out server.csr -config csr.conf
cat > cert.conf <<EOF
+
+authorityKeyIdentifier=keyid,issuer
+basicConstraints=CA:FALSE
+keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
+subjectAltName = @alt_names
+
+[ alt_names ]
+DNS.1 = example.com
+
+EOF
openssl x509 -req -in server.csr -CA rootCA.crt -CAkey rootCA.key -CAcreateserial -out server.crt -days 365 -sha256 -extfile cert.conf
openssl pkcs12 -inkey server.key -in server.crt -export -name "hpcc_sign_cert" -out hpcc_sign_cert.pfx
base64 hpcc_sign_cert.pfx > hpcc_sign_cert.base64
base64 -i hpcc_sign_cert.pfx -o hpcc_sign_cert.base64
cat
the output of hpcc_sign_cert.base64 and copy the output into the variable SIGNING_CERTIFICATE in Github Actions.Generating a signing key for linux builds
gpg --full-generate-key
RSA and RSA default
.4096
.0 = key does not expire
.Github actions key for signing linux builds
.gpg --output private.pgp --armor --export-secret-key <email-address-used>
.Starting a build
community_HPCC-12345-rc1
or HPCC-12345-rc1
.community_
then you must tag LN with internal_
and ECLIDE with eclide_
. Otherwise just use the Jira tag in all three repositories.Asset output
Current versions
name      version
current   9.8.x
previous  9.6.x
critical  9.4.x
security  9.2.x
Supported versions
Patches and images
Package versions.
Understanding workunits
Introduction
Contents of a workunit
How is the workunit used?
Example
STRING searchName := 'Smith' : STORED('searchName');
+nameIndex := INDEX({ STRING40 name, STRING80 address }, 'names');
+results := nameIndex(KEYED(name = searchName));
+OUTPUT(results);
+OUTPUT('Done!');
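To reproduce the example locally, the ECL above can be compiled into a standalone executable with eclcc; the generated C++ listed later in this document is what eclcc produces along the way. A minimal sketch, assuming the source is saved as workuniteg1.ecl, and noting that the query reads an index called 'names', so it only runs end to end where that file exists:
# Compile the example ECL into a standalone executable and run it locally
eclcc workuniteg1.ecl
./a.out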
Workunit Main Elements
Workflow
<Workflow>
+ <Item .... wfid="1"/>
+ <Item .... wfid="2">
+ <Dependency wfid="1"/>
+ <Schedule/>
+ </Item>
+</Workflow>
MyProcess
struct MyEclProcess : public EclProcess {
+ virtual int perform(IGlobalCodeContext * gctx, unsigned wfid) {
+ ....
+ switch (wfid) {
+ case 1U:
+ ... code for workflow item 1 ...
+ break;
+ case 2U:
+ ... code for workflow item 2 ...
+ break;
+ }
+ return 2U;
+ }
+};
extern "C" ECL_API IEclProcess* createProcess()
+{
+ return new MyEclProcess;
+}
Graph
<Graph name="graph1" type="activities">
+ <xgmml>
+ <graph wfid="2">
+ <node id="1">
+ <att>
+ <graph>
+ <att name="rootGraph" value="1"/>
+ <edge id="2_0" source="2" target="3"/>
+ <node id="2" label="Index Read 'names'">
+ ... attributes for activity 2 ...
+ </node>
+ <node id="3" label="Output Result #1">
+ ... attributes for activity 3 ...
+ </node>
+ </graph>
+ </att>
+ </node>
+ </graph>
+ </xgmml>
+</Graph>
Generated Activity Helpers
struct cAc2 : public CThorIndexReadArg {
+ ... Implementation of the helper for activity #2 ...
+};
+extern "C" ECL_API IHThorArg * fAc2() { return new cAc2; }
+
+struct cAc3 : public CThorWorkUnitWriteArg {
+ ... Implementation of the helper for activity #3 ...
+};
+extern "C" ECL_API IHThorArg * fAc3() { return new cAc3; }
Other
Options
<Debug>
+ <addtimingtoworkunit>0</addtimingtoworkunit>
+ <noterecordsizeingraph>1</noterecordsizeingraph>
+ <showmetaingraph>1</showmetaingraph>
+ <showrecordcountingraph>1</showrecordcountingraph>
+ <spanmultiplecpp>0</spanmultiplecpp>
+ <targetclustertype>hthor</targetclustertype>
+</Debug>
Input Parameters
<Variables>
+ <Variable name="searchname">
+ <SchemaRaw xsi:type="SOAP-ENC:base64">
+ searchnameñÿÿÿasciiascii
+ </SchemaRaw>
+ </Variable>
+</Variables>
Results
<Results>
+ <Result isScalar="0"
+ name="Result 1"
+ recordSizeEntry="mf1"
+ rowLimit="-1"
+ sequence="0">
+ <SchemaRaw xsi:type="SOAP-ENC:base64">
+ name(asciiasciiaddressPasciiascii%{ string40 name, string80 address }; </SchemaRaw>
+ </Result>
+ <Result name="Result 2" sequence="1">
+ <SchemaRaw xsi:type="SOAP-ENC:base64">
+ Result_2ñÿÿÿasciiascii </SchemaRaw>
+ </Result>
+</Results>
Timings and Statistics
<Statistics>
+ <Statistic c="eclcc"
+ count="1"
+ creator="eclcc"
+ kind="SizePeakMemory"
+ s="compile"
+ scope=">compile"
+ ts="1428933081084000"
+ unit="sz"
+ value="27885568"/>
+</Statistics>
Manifests
Stages of Execution
Queues
Workflow
<Workflow>
+ <Item mode="normal"
+ state="null"
+ type="normal"
+ wfid="1"/>
+ <Item mode="normal"
+ state="reqd"
+ type="normal"
+ wfid="2">
+ <Dependency wfid="1"/>
+ <Schedule/>
+ </Item>
+</Workflow>
switch (wfid) {
+ case 1U:
+ if (!gctx->isResult("searchname",4294967295U)) {
+ ctx->setResultString("searchname",4294967295U,5U,"Smith");
+ }
+ break;
+}
switch (wfid) {
+ case 2U: {
+ ctx->executeGraph("graph1",false,0,NULL);
+ ctx->setResultString(0,1U,5U,"Done!");
+ }
+ break;
+}
Specialised Workflow Items
Graph Execution
Details of the graph structure
<Graph name="graph1" type="activities">
+ <xgmml>
+ <graph wfid="2">
+ <node id="1">
+ <att>
+ <graph>
+ <att name="rootGraph" value="1"/>
+ <edge id="2_0" source="2" target="3"/>
+ <node id="2" label="Index Read 'names'">
+ <att name="definition" value="workuniteg1.ecl(3,1)"/>
+ <att name="name" value="results"/>
+ <att name="_kind" value="77"/>
+ <att name="ecl" value="INDEX({ string40 name, string80 address }, 'names', fileposition(false)); FILTER(KEYED(name = STORED('searchname'))); "/>
+ <att name="recordSize" value="120"/>
+ <att name="predictedCount" value="0..?[disk]"/>
+ <att name="_fileName" value="names"/>
+ </node>
+ <node id="3" label="Output Result #1">
+ <att name="definition" value="workuniteg1.ecl(4,1)"/>
+ <att name="_kind" value="16"/>
+ <att name="ecl" value="OUTPUT(..., workunit); "/>
+ <att name="recordSize" value="120"/>
+ </node>
+ </graph>
+ </att>
+ </node>
+ </graph>
+ </xgmml>
+</Graph>
<edge id="<source-activity-id>_<output-count>" source="<source-activity-id>" target="<target-activity-id">
+
+There is only one edge in our example workunit: \\<edge id="2\\_0"
+source="2" target="3"/\\>.
+
<edge id="<source-activity-id>_<target-activity-id>" source="<source-subgraph-id>" target="<target-subgraph-id>"
+ <att name="_sourceActivity" value="<source-activity-id>"/>
+ <att name="_targetActivity" value="<target-activity-id>"/>
+ </edge>
+
+Roxie often optimizes spilled datasets and treats these edges as
+equivalent to the edges between activities.
+
<att name="_dependsOn" value="1"/>
+
struct cAc2 : public CThorIndexReadArg {
+ virtual unsigned getFormatCrc() {
+ return 470622073U;
+ }
+ virtual bool getIndexLayout(size32_t & __lenResult, void * & __result) { getLayout5(__lenResult, __result, ctx); return true; }
+ virtual IOutputMetaData * queryDiskRecordSize() { return &mx1; }
+ virtual IOutputMetaData * queryOutputMeta() { return &mx1; }
+ virtual void onCreate(ICodeContext * _ctx, IHThorArg *, MemoryBuffer * in) {
+ ctx = _ctx;
+ ctx->getResultString(v2,v1.refstr(),"searchname",4294967295U);
+ }
+ rtlDataAttr v1;
+ unsigned v2;
+ virtual const char * getFileName() {
+ return "names";
+ }
+ virtual void createSegmentMonitors(IIndexReadContext *irc) {
+ Owned<IStringSet> set3;
+ set3.setown(createRtlStringSet(40));
+ char v4[40];
+ rtlStrToStr(40U,v4,v2,v1.getstr());
+ if (rtlCompareStrStr(v2,v1.getstr(),40U,v4) == 0) {
+ set3->addRange(v4,v4);
+ }
+ irc->append(createKeySegmentMonitor(false, set3.getClear(), 0, 40));
+ irc->append(createWildKeySegmentMonitor(40, 80));
+ }
+ virtual size32_t transform(ARowBuilder & crSelf, const void * _left) {
+ crSelf.getSelf();
+ unsigned char * left = (unsigned char *)_left;
+ memcpy(crSelf.row() + 0U,left + 0U,120U);
+ return 120U;
+ }
+};
Executing the graph
<att name="rootGraph" value="1"/>
is a root subgraph. An activity within a subgraph that has no outputs is called a 'sink' (and an activity without any inputs is called a 'source').
struct mi1 : public CFixedOutputMetaData {
+ inline mi1() : CFixedOutputMetaData(120) {}
+ virtual const RtlTypeInfo * queryTypeInfo() const { return &ty1; }
+} mx1;
This represents a fixed size row that occupies 120 bytes. The object
+returned by the queryTypeInfo() function provides information about
+the types of the fields:
+
const RtlStringTypeInfo ty2(0x4,40);
+const RtlFieldStrInfo rf1("name",NULL,&ty2);
+const RtlStringTypeInfo ty3(0x4,80);
+const RtlFieldStrInfo rf2("address",NULL,&ty3);
+const RtlFieldInfo * const tl4[] = { &rf1,&rf2, 0 };
+const RtlRecordTypeInfo ty1(0xd,120,tl4);
I.e. a string column of length 40 called "name", followed by a
+string column of length 80 called "address". The interface
+IOutputMetaData in eclhelper.hpp is key to understanding how the
+rows are processed.
+
Appendix
Key types and interfaces from eclhelper.hpp
Glossary
Full text of the workunit XML
<W_LOCAL buildVersion="internal_5.3.0-closedown0"
+ cloneable="1"
+ codeVersion="157"
+ eclVersion="5.3.0"
+ hash="2344844820"
+ state="completed"
+ xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance">
+ <Debug>
+ <addtimingtoworkunit>0</addtimingtoworkunit>
+ <debugnlp>1</debugnlp>
+ <expandpersistinputdependencies>1</expandpersistinputdependencies>
+ <forcegenerate>1</forcegenerate>
+ <noterecordsizeingraph>1</noterecordsizeingraph>
+ <regressiontest>1</regressiontest>
+ <showmetaingraph>1</showmetaingraph>
+ <showrecordcountingraph>1</showrecordcountingraph>
+ <spanmultiplecpp>0</spanmultiplecpp>
+ <targetclustertype>hthor</targetclustertype>
+ </Debug>
+ <Graphs>
+ <Graph name="graph1" type="activities">
+ <xgmml>
+ <graph wfid="2">
+ <node id="1">
+ <att>
+ <graph>
+ <att name="rootGraph" value="1" />
+ <edge id="2_0" source="2" target="3" />
+ <node id="2" label="Index Read 'names'">
+ <att name="definition" value="workuniteg1.ecl(3,1)" />
+ <att name="name" value="results" />
+ <att name="_kind" value="77" />
+ <att name="ecl"
+ value="INDEX({ string40 name, string80 address }, 'names', fileposition(false)); FILTER(KEYED(name = STORED('searchname'))); " />
+ <att name="recordSize" value="120" />
+ <att name="predictedCount" value="0..?[disk]" />
+ <att name="_fileName" value="names" />
+ </node>
+ <node id="3" label="Output Result #1">
+ <att name="definition" value="workuniteg1.ecl(4,1)" />
+ <att name="_kind" value="16" />
+ <att name="ecl" value="OUTPUT(..., workunit); " />
+ <att name="recordSize" value="120" />
+ </node>
+ </graph>
+ </att>
+ </node>
+ </graph>
+ </xgmml>
+ </Graph>
+ </Graphs>
+ <Query fetchEntire="1">
+ <Associated>
+ <File desc="workuniteg1.ecl.cpp"
+ filename="c:\\regressout\\workuniteg1.ecl.cpp"
+ ip="10.121.159.73"
+ type="cpp" />
+ </Associated>
+ </Query>
+ <Results>
+ <Result isScalar="0"
+ name="Result 1"
+ recordSizeEntry="mf1"
+ rowLimit="-1"
+ sequence="0">
+ <SchemaRaw xsi:type="SOAP-ENC:base64">
+ name(asciiasciiaddressPasciiascii%{
+ string40 name, string80 address }; </SchemaRaw>
+ </Result>
+ <Result name="Result 2" sequence="1">
+ <SchemaRaw xsi:type="SOAP-ENC:base64">
+ Result_2ñÿÿÿasciiascii
+ </SchemaRaw>
+ </Result>
+ </Results>
+ <Statistics>
+ <Statistic c="eclcc"
+ count="1"
+ creator="eclcc"
+ kind="SizePeakMemory"
+ s="compile"
+ scope=">compile"
+ ts="1428933081084000"
+ unit="sz"
+ value="27885568" />
+ </Statistics>
+ <Variables>
+ <Variable name="searchname">
+ <SchemaRaw xsi:type="SOAP-ENC:base64">
+ searchnameñÿÿÿasciiascii
+ </SchemaRaw>
+ </Variable>
+ </Variables>
+ <Workflow>
+ <Item mode="normal"
+ state="null"
+ type="normal"
+ wfid="1" />
+ <Item mode="normal"
+ state="reqd"
+ type="normal"
+ wfid="2">
+ <Dependency wfid="1" />
+ <Schedule />
+ </Item>
+ </Workflow>
+</W_LOCAL>
Full contents of the generated C++ (as a single file)
/* Template for generating thor/hthor/roxie output */
+#include "eclinclude4.hpp"
+#include "eclrtl.hpp"
+#include "rtlkey.hpp"
+
+extern RTL_API void rtlStrToStr(size32_t lenTgt,void * tgt,size32_t lenSrc,const void * src);
+extern RTL_API int rtlCompareStrStr(size32_t lenL,const char * l,size32_t lenR,const char * r);
+
+
+const RtlStringTypeInfo ty2(0x4,40);
+const RtlFieldStrInfo rf1("name",NULL,&ty2);
+const RtlStringTypeInfo ty3(0x4,80);
+const RtlFieldStrInfo rf2("address",NULL,&ty3);
+const RtlFieldInfo * const tl4[] = { &rf1,&rf2, 0 };
+const RtlRecordTypeInfo ty1(0xd,120,tl4);
+void getLayout5(size32_t & __lenResult, void * & __result, IResourceContext * ctx) {
+ rtlStrToDataX(__lenResult,__result,87U,"\\000R\\000\\000\\000\\001x\\000\\000\\000\\002\\000\\000\\000\\003\\004\\000\\000\\000name\\004(\\000\\000\\000\\001ascii\\000\\001ascii\\000\\000\\000\\000\\000\\000\\003\\007\\000\\000\\000address\\004P\\000\\000\\000\\001ascii\\000\\001ascii\\000\\000\\000\\000\\000\\000\\002\\000\\000\\000");
+}
+struct mi1 : public CFixedOutputMetaData {
+ inline mi1() : CFixedOutputMetaData(120) {}
+ virtual const RtlTypeInfo * queryTypeInfo() const { return &ty1; }
+} mx1;
+extern "C" ECL_API IOutputMetaData * mf1() { mx1.Link(); return &mx1; }
+
+struct cAc2 : public CThorIndexReadArg {
+ virtual unsigned getFormatCrc() {
+ return 470622073U;
+ }
+ virtual bool getIndexLayout(size32_t & __lenResult, void * & __result) { getLayout5(__lenResult, __result, ctx); return true; }
+ virtual IOutputMetaData * queryDiskRecordSize() { return &mx1; }
+ virtual IOutputMetaData * queryOutputMeta() { return &mx1; }
+ virtual void onCreate(ICodeContext * _ctx, IHThorArg *, MemoryBuffer * in) {
+ ctx = _ctx;
+ ctx->getResultString(v2,v1.refstr(),"searchname",4294967295U);
+ }
+ rtlDataAttr v1;
+ unsigned v2;
+ virtual const char * getFileName() {
+ return "names";
+ }
+ virtual void createSegmentMonitors(IIndexReadContext *irc) {
+ Owned<IStringSet> set3;
+ set3.setown(createRtlStringSet(40));
+ char v4[40];
+ rtlStrToStr(40U,v4,v2,v1.getstr());
+ if (rtlCompareStrStr(v2,v1.getstr(),40U,v4) == 0) {
+ set3->addRange(v4,v4);
+ }
+ irc->append(createKeySegmentMonitor(false, set3.getClear(), 0, 40));
+ irc->append(createWildKeySegmentMonitor(40, 80));
+ }
+ virtual size32_t transform(ARowBuilder & crSelf, const void * _left) {
+ crSelf.getSelf();
+ unsigned char * left = (unsigned char *)_left;
+ memcpy(crSelf.row() + 0U,left + 0U,120U);
+ return 120U;
+ }
+};
+extern "C" ECL_API IHThorArg * fAc2() { return new cAc2; }
+struct cAc3 : public CThorWorkUnitWriteArg {
+ virtual int getSequence() { return 0; }
+ virtual IOutputMetaData * queryOutputMeta() { return &mx1; }
+ virtual void serializeXml(const byte * self, IXmlWriter & out) {
+ mx1.toXML(self, out);
+ }
+};
+extern "C" ECL_API IHThorArg * fAc3() { return new cAc3; }
+
+
+struct MyEclProcess : public EclProcess {
+ virtual unsigned getActivityVersion() const { return ACTIVITY_INTERFACE_VERSION; }
+ virtual int perform(IGlobalCodeContext * gctx, unsigned wfid) {
+ ICodeContext * ctx;
+ ctx = gctx->queryCodeContext();
+ switch (wfid) {
+ case 1U:
+ if (!gctx->isResult("searchname",4294967295U)) {
+ ctx->setResultString("searchname",4294967295U,5U,"Smith");
+ }
+ break;
+ case 2U: {
+ ctx->executeGraph("graph1",false,0,NULL);
+ ctx->setResultString(0,1U,5U,"Done!");
+ }
+ break;
+ }
+ return 2U;
+ }
+};
+
+
+extern "C" ECL_API IEclProcess* createProcess()
+{
+
+ return new MyEclProcess;
+}
Contributing Documentation to the HPCC Systems Platform Project
Documenting a New Software Feature--Required and Optional Components
Required Components:
Optional Components:
General Tips
Who should write it?
Changing the default value of a configuration setting
Adding or modifying a Language keyword, Standard Library function, or command line tool action
Adding a new feature that requires an overview.
A feature/function that is only used internally to the system
Extending the tests in the regression suite
Placement
Other Folders
Pull Requests
Documentation Jira Issues
Feature Documentation Template
Overview
Setup & Configuration
User Guide
API/CLI/Parameter Reference
Tutorial
Troubleshooting
HPCC Writing Style Guide
General
Terms
HPCC Systems®
Other Terms
Common Documentation terms
Usage Instructions
Word Choices
Write up vs Write-up
Assure vs ensure vs insure
Quantile 1 - What is it?
QUANTILE(<dataset>, <number-of-ranges>, { sort-order } [, <transform>(LEFT, COUNTER)]
+ [,FIRST][,LAST][,SKEW(<n>)][,UNSTABLE][,SCORE(<score>)][,RANGE(set)][,DEDUP][,LOCAL])
+
+FIRST - Match the first row in the input dataset (as quantile 0)
+LAST - Match the last row in the input dataset (as quantile <n>)
+SKEW - The maximum deviation from the correct results allowed. Defaults to 0.
+UNSTABLE - Is the order of the original input values unimportant?
+SCORE - What weighting should be applied for each row. Defaults to 1.
+RANGE - Which quantiles should actually be returned. (Defaults to ALL).
+DEDUP - Avoid returning a match for an input row more than once.
+
Quantile 2 - Test cases
Quantile 3 - The parser
Quantile 4 - The engine interface.
Quantile 5 - The code generator
Quantile 6 - Roxie
Quantile 7 - Possible roxie improvements
Quantile 8 - Thor
Everything you ever wanted to know about Roxie
Why did I create it?
How do activities link together?
Where are the Dragons?
How does “I beat you to it” work?
All about index compression
What is the topology server for?
Lazy File IO
New IBYTI mode
Testing Roxie code
./a.out --server --port=9999 --traceLevel=1 --logFullQueries=1 --expert.addDummyNode --roxieMulticastEnabled=0 --traceRoxiePackets=1
+
rtl := SERVICE
+ unsigned4 sleep(unsigned4 _delay) : eclrtl,action,library='eclrtl',entrypoint='rtlSleep';
+END;
+
+d := dataset([{rtl.sleep(5000)}], {unsigned a});
+allnodes(d)+d;
+
Cache prewarm
Blacklisting sockets
Owned<ISocketConnectWait> scw = nonBlockingConnect(ep, timeoutMS == WAIT_FOREVER ? 60000 : timeoutMS*(retries+1));
I am not sure that is correct (a single attempt to connect with a long timeout doesn't feel like it is the same as multiple attempts with shorter timeouts, for example if there is a load balancer in the mix).
perftrace options
Some notes on LocalAgent mode
Some notes on UDP packet sending mechanism
User Documentation
Directory structure under devdoc
General documentation
HPCC Website documentation
Azure Portal FAQs
1. Go to the resource group service page (https://portal.azure.com/#view/HubsExtension/BrowseResourceGroups)
+2. Click "Add filter"
+3. Filter on "Admin"
+4. Set the Value to "ALL" or select the names you are interested in (the names in the list are the only ones that have the Admin tag set). This works for all other fields that you can filter on (a CLI alternative is sketched below).
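If you prefer the command line, roughly the same filter can be expressed with the Azure CLI. This is only a sketch; the exact form of the --tag filter is an assumption to check against your az version:
# List resource groups carrying an Admin tag (adjust the tag name/value as needed)
az group list --tag Admin --output table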
Log in to the Azure portal.
+
+Click on the \`hamburger icon\`, located at the top left corner of the page.
+
+Click on \`Dashboard\` to go to your dashboards.
+
+Click on \`Create\`, located at the top left corner.
+
+Click on the \`Custom\` tile.
+
+Edit the input box to name your dashboard.
+
+Click on \`Resource groups\` in the tile gallery.
+
+Click \`Add\`.
+
+Click and drag the \`lower right corner\` of the tile to resize it to your liking.
+
+Click \`Save\` to save your settings.
+
+You should now be taken to your new dashboard.
+
+Click on your new dashboard tile.
+
+Click on \`Add filter\`, located at the top center of the page.
+
+Click on the \`Filter\` input box to reveal the tags.
+
+Select the \`Admin\` tag.
+
+Click on the \`Value\` input box.
+
+Click on \`Select all\` to unselect all.
+
+Select your name.
+
+Click on \`Apply\`.
+
+Next, click on \`Manage view\`, located at the top left of the page.
+
+Select \`Save view\`.
+
+Enter a name for the view in the input box.
+
+Click \`Save\`
+
+
+
+
+
+Click on \`Manage View\`, located at the top left of the page
Copilot Prompt Tips
Generic Prompts
Specific Prompts
How to Avoid AI Hallucinations with Good Prompts
ROXIE FAQs
Same way as bare metal. Command line, or with the IDE, or from ECL Watch. Just point to the HPCC Systems instance to compile.
+For Example:
+ecl deploy <target> <file>
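A concrete invocation might look like the following sketch; the target name, file name and --server option are assumptions based on typical ecl command line usage, and authentication options may also be needed:
# Compile and deploy my_query.ecl to the roxie target of a cloud instance (sketch)
ecl deploy roxie my_query.ecl --server=<eclwatch-hostname>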
The copy query command – use the Azure host name or IP address for the target.
+For example:
+ecl queries copy <source_query_path> <target_queryset>
Use the "kubectl get svc" command. Use the external IP address listed for ECL Watch.
+kubectl get svc
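To script against it, the external IP can also be pulled out directly; a sketch, assuming the ECL Watch service is named eclwatch in your deployment:
# Print the external IP of the ECL Watch service (the service name is an assumption)
kubectl get svc eclwatch -o jsonpath='{.status.loadBalancer.ingress[0].ip}'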
If you can reach ECL Watch with the DNS Name then it should also work for the command line.
If you did not set up the containerized instance, then you need to ask your Systems Administrator or whoever set it up.
Same way as bare metal.
+To add a new package file: ecl packagemap add
+To copy an existing package file: ecl packagemap copy
kubectl logs <podname>
+in addition you can use the -f (follow) option to tail the logs. Optionally you can also pass the <namespace> parameter.
+For example:
+kubectl logs roxie-agent-1-3b12a587b --namespace MyNameSpace
+Optionally, you may have implemented a log-processing solution such as the Elastic Stack (elastic4hpcclogs).
Use the copy query command and copy or add the Packagemap.
+You will see the data copy start in the logs; data is copied from the remote location specified if it doesn't exist on the local system.
+The remote location is the remote Dali (use the --daliip=<daliIP> parameter to specify the remote Dali)
+You can also use ECL Watch.
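Putting that together, a copy from a remote (bare metal) Dali into a containerized target might look like this sketch; the query path, target queryset and Dali IP are placeholders:
# Copy a published query, pulling data from the remote Dali if it is missing locally (sketch)
ecl queries copy <source_query_path> <target_queryset> --daliip=<daliIP>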
You can use Docker Desktop, or Azure, or any other cloud provider, and install the HPCC Systems Cloud native helm
+charts.
You can use WUListQueries.
+For example:
+https://[eclwatch]:18010/WsWorkunits/WUListQueries.json?ver_=1.86&ClusterName=roxie&CheckAllNodes=0
One possible reason may be that not all of the required storage directories are present. The directories
+~/hpccdata/dalistorage, hpcc-data, debug, queries, sasha, and dropzone are all required to exist or your cluster may not start.
Yes. There is a new method available, ServiceQuery.
+https://[eclwatch]:18010/WsResources/ServiceQuery?ver_=1.01&
+For example Roxie Queries:
+https://[eclwatch]:18010/WsResources/ServiceQuery?ver_=1.01&Type=roxie
+or WsECL (eclqueries)
+https://[eclwatch]:18010/WsResources/ServiceQuery?ver_=1.01&Type=eclqueries
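These endpoints can also be exercised from the command line; a sketch using curl, where the .json suffix and the -k/-u options are assumptions that depend on how your ECL Watch is configured:
# Query the list of Roxie services via WsResources (adjust host, port and credentials)
curl -k -u <user>:<password> "https://<eclwatch>:18010/WsResources/ServiceQuery.json?ver_=1.01&Type=roxie"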
How to resolve issues installing the ECL IDE / Client tools
Problem
If you get an error saying something like, Windows Defender SmartScreen prevented access and there is no option other than "Don't Run":
Ecl-bundle source documentation
Introduction
Purpose
Design
Directory structure
Key classes
ECL standard library style guide
my_record := RECORD
+ INTEGER4 id;
+ STRING firstname{MAXLENGTH(40)};
+ STRING lastname{MAXLENGTH(50)};
+END;
+
+/**
+ * Returns a dataset of people to treat with caution matching a particular lastname. The
+ * names are maintained in a global database of undesirables.
+ *
+ * @param search_lastname A lastname used as a filter
+ * @return The list of people
+ * @see NoFlyList
+ * @see MorePeopleToAvoid
+ */
+
+EXPORT DodgyCharacters(STRING search_lastname) := FUNCTION
+ raw_ds := DATASET(my_record, 'undesirables', THOR);
+ RETURN raw_ds(lastname = search_lastname);
+END;
var(--vp-c-divider);--vp-code-copy-code-hover-bg: var(--vp-c-bg);--vp-code-copy-code-active-text: var(--vp-c-text-2);--vp-code-copy-copied-text-content: "Copied";--vp-code-tab-divider: var(--vp-code-block-divider-color);--vp-code-tab-text-color: var(--vp-c-text-2);--vp-code-tab-bg: var(--vp-code-block-bg);--vp-code-tab-hover-text-color: var(--vp-c-text-1);--vp-code-tab-active-text-color: var(--vp-c-text-1);--vp-code-tab-active-bar-color: var(--vp-c-brand-1)}:root{--vp-button-brand-border: transparent;--vp-button-brand-text: var(--vp-c-white);--vp-button-brand-bg: var(--vp-c-brand-3);--vp-button-brand-hover-border: transparent;--vp-button-brand-hover-text: var(--vp-c-white);--vp-button-brand-hover-bg: var(--vp-c-brand-2);--vp-button-brand-active-border: transparent;--vp-button-brand-active-text: var(--vp-c-white);--vp-button-brand-active-bg: var(--vp-c-brand-1);--vp-button-alt-border: transparent;--vp-button-alt-text: var(--vp-c-text-1);--vp-button-alt-bg: var(--vp-c-default-3);--vp-button-alt-hover-border: transparent;--vp-button-alt-hover-text: var(--vp-c-text-1);--vp-button-alt-hover-bg: var(--vp-c-default-2);--vp-button-alt-active-border: transparent;--vp-button-alt-active-text: var(--vp-c-text-1);--vp-button-alt-active-bg: var(--vp-c-default-1);--vp-button-sponsor-border: var(--vp-c-text-2);--vp-button-sponsor-text: var(--vp-c-text-2);--vp-button-sponsor-bg: transparent;--vp-button-sponsor-hover-border: var(--vp-c-sponsor);--vp-button-sponsor-hover-text: var(--vp-c-sponsor);--vp-button-sponsor-hover-bg: transparent;--vp-button-sponsor-active-border: var(--vp-c-sponsor);--vp-button-sponsor-active-text: var(--vp-c-sponsor);--vp-button-sponsor-active-bg: transparent}:root{--vp-custom-block-font-size: 14px;--vp-custom-block-code-font-size: 13px;--vp-custom-block-info-border: transparent;--vp-custom-block-info-text: var(--vp-c-text-1);--vp-custom-block-info-bg: var(--vp-c-default-soft);--vp-custom-block-info-code-bg: var(--vp-c-default-soft);--vp-custom-block-note-border: transparent;--vp-custom-block-note-text: var(--vp-c-text-1);--vp-custom-block-note-bg: var(--vp-c-default-soft);--vp-custom-block-note-code-bg: var(--vp-c-default-soft);--vp-custom-block-tip-border: transparent;--vp-custom-block-tip-text: var(--vp-c-text-1);--vp-custom-block-tip-bg: var(--vp-c-tip-soft);--vp-custom-block-tip-code-bg: var(--vp-c-tip-soft);--vp-custom-block-important-border: transparent;--vp-custom-block-important-text: var(--vp-c-text-1);--vp-custom-block-important-bg: var(--vp-c-important-soft);--vp-custom-block-important-code-bg: var(--vp-c-important-soft);--vp-custom-block-warning-border: transparent;--vp-custom-block-warning-text: var(--vp-c-text-1);--vp-custom-block-warning-bg: var(--vp-c-warning-soft);--vp-custom-block-warning-code-bg: var(--vp-c-warning-soft);--vp-custom-block-danger-border: transparent;--vp-custom-block-danger-text: var(--vp-c-text-1);--vp-custom-block-danger-bg: var(--vp-c-danger-soft);--vp-custom-block-danger-code-bg: var(--vp-c-danger-soft);--vp-custom-block-caution-border: transparent;--vp-custom-block-caution-text: var(--vp-c-text-1);--vp-custom-block-caution-bg: var(--vp-c-caution-soft);--vp-custom-block-caution-code-bg: var(--vp-c-caution-soft);--vp-custom-block-details-border: var(--vp-custom-block-info-border);--vp-custom-block-details-text: var(--vp-custom-block-info-text);--vp-custom-block-details-bg: var(--vp-custom-block-info-bg);--vp-custom-block-details-code-bg: var(--vp-custom-block-info-code-bg)}:root{--vp-input-border-color: var(--vp-c-border);--vp-input-bg-color: 
var(--vp-c-bg-alt);--vp-input-switch-bg-color: var(--vp-c-default-soft)}:root{--vp-nav-height: 64px;--vp-nav-bg-color: var(--vp-c-bg);--vp-nav-screen-bg-color: var(--vp-c-bg);--vp-nav-logo-height: 24px}.hide-nav{--vp-nav-height: 0px}.hide-nav .VPSidebar{--vp-nav-height: 22px}:root{--vp-local-nav-bg-color: var(--vp-c-bg)}:root{--vp-sidebar-width: 272px;--vp-sidebar-bg-color: var(--vp-c-bg-alt)}:root{--vp-backdrop-bg-color: rgba(0, 0, 0, .6)}:root{--vp-home-hero-name-color: var(--vp-c-brand-1);--vp-home-hero-name-background: transparent;--vp-home-hero-image-background-image: none;--vp-home-hero-image-filter: none}:root{--vp-badge-info-border: transparent;--vp-badge-info-text: var(--vp-c-text-2);--vp-badge-info-bg: var(--vp-c-default-soft);--vp-badge-tip-border: transparent;--vp-badge-tip-text: var(--vp-c-tip-1);--vp-badge-tip-bg: var(--vp-c-tip-soft);--vp-badge-warning-border: transparent;--vp-badge-warning-text: var(--vp-c-warning-1);--vp-badge-warning-bg: var(--vp-c-warning-soft);--vp-badge-danger-border: transparent;--vp-badge-danger-text: var(--vp-c-danger-1);--vp-badge-danger-bg: var(--vp-c-danger-soft)}:root{--vp-carbon-ads-text-color: var(--vp-c-text-1);--vp-carbon-ads-poweredby-color: var(--vp-c-text-2);--vp-carbon-ads-bg-color: var(--vp-c-bg-soft);--vp-carbon-ads-hover-text-color: var(--vp-c-brand-1);--vp-carbon-ads-hover-poweredby-color: var(--vp-c-text-1)}:root{--vp-local-search-bg: var(--vp-c-bg);--vp-local-search-result-bg: var(--vp-c-bg);--vp-local-search-result-border: var(--vp-c-divider);--vp-local-search-result-selected-bg: var(--vp-c-bg);--vp-local-search-result-selected-border: var(--vp-c-brand-1);--vp-local-search-highlight-bg: var(--vp-c-brand-1);--vp-local-search-highlight-text: var(--vp-c-neutral-inverse)}@media (prefers-reduced-motion: reduce){*,:before,:after{animation-delay:-1ms!important;animation-duration:1ms!important;animation-iteration-count:1!important;background-attachment:initial!important;scroll-behavior:auto!important;transition-duration:0s!important;transition-delay:0s!important}}*,:before,:after{box-sizing:border-box}html{line-height:1.4;font-size:16px;-webkit-text-size-adjust:100%}html.dark{color-scheme:dark}body{margin:0;width:100%;min-width:320px;min-height:100vh;line-height:24px;font-family:var(--vp-font-family-base);font-size:16px;font-weight:400;color:var(--vp-c-text-1);background-color:var(--vp-c-bg);font-synthesis:style;text-rendering:optimizeLegibility;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}main{display:block}h1,h2,h3,h4,h5,h6{margin:0;line-height:24px;font-size:16px;font-weight:400}p{margin:0}strong,b{font-weight:600}a,area,button,[role=button],input,label,select,summary,textarea{touch-action:manipulation}a{color:inherit;text-decoration:inherit}ol,ul{list-style:none;margin:0;padding:0}blockquote{margin:0}pre,code,kbd,samp{font-family:var(--vp-font-family-mono)}img,svg,video,canvas,audio,iframe,embed,object{display:block}figure{margin:0}img,video{max-width:100%;height:auto}button,input,optgroup,select,textarea{border:0;padding:0;line-height:inherit;color:inherit}button{padding:0;font-family:inherit;background-color:transparent;background-image:none}button:enabled,[role=button]:enabled{cursor:pointer}button:focus,button:focus-visible{outline:1px dotted;outline:4px auto 
-webkit-focus-ring-color}button:focus:not(:focus-visible){outline:none!important}input:focus,textarea:focus,select:focus{outline:none}table{border-collapse:collapse}input{background-color:transparent}input:-ms-input-placeholder,textarea:-ms-input-placeholder{color:var(--vp-c-text-3)}input::-ms-input-placeholder,textarea::-ms-input-placeholder{color:var(--vp-c-text-3)}input::placeholder,textarea::placeholder{color:var(--vp-c-text-3)}input::-webkit-outer-spin-button,input::-webkit-inner-spin-button{-webkit-appearance:none;margin:0}input[type=number]{-moz-appearance:textfield}textarea{resize:vertical}select{-webkit-appearance:none}fieldset{margin:0;padding:0}h1,h2,h3,h4,h5,h6,li,p{overflow-wrap:break-word}vite-error-overlay{z-index:9999}mjx-container{overflow-x:auto}mjx-container>svg{display:inline-block;margin:auto}[class^=vpi-],[class*=" vpi-"],.vp-icon{width:1em;height:1em}[class^=vpi-].bg,[class*=" vpi-"].bg,.vp-icon.bg{background-size:100% 100%;background-color:transparent}[class^=vpi-]:not(.bg),[class*=" vpi-"]:not(.bg),.vp-icon:not(.bg){-webkit-mask:var(--icon) no-repeat;mask:var(--icon) no-repeat;-webkit-mask-size:100% 100%;mask-size:100% 100%;background-color:currentColor;color:inherit}.vpi-align-left{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Cpath d='M21 6H3M15 12H3M17 18H3'/%3E%3C/svg%3E")}.vpi-arrow-right,.vpi-arrow-down,.vpi-arrow-left,.vpi-arrow-up{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Cpath d='M5 12h14M12 5l7 7-7 7'/%3E%3C/svg%3E")}.vpi-chevron-right,.vpi-chevron-down,.vpi-chevron-left,.vpi-chevron-up{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Cpath d='m9 18 6-6-6-6'/%3E%3C/svg%3E")}.vpi-chevron-down,.vpi-arrow-down{transform:rotate(90deg)}.vpi-chevron-left,.vpi-arrow-left{transform:rotate(180deg)}.vpi-chevron-up,.vpi-arrow-up{transform:rotate(-90deg)}.vpi-square-pen{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Cpath d='M12 3H5a2 2 0 0 0-2 2v14a2 2 0 0 0 2 2h14a2 2 0 0 0 2-2v-7'/%3E%3Cpath d='M18.375 2.625a2.121 2.121 0 1 1 3 3L12 15l-4 1 1-4Z'/%3E%3C/svg%3E")}.vpi-plus{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Cpath d='M5 12h14M12 5v14'/%3E%3C/svg%3E")}.vpi-sun{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Ccircle cx='12' cy='12' r='4'/%3E%3Cpath d='M12 2v2M12 20v2M4.93 4.93l1.41 1.41M17.66 17.66l1.41 1.41M2 12h2M20 12h2M6.34 17.66l-1.41 1.41M19.07 4.93l-1.41 1.41'/%3E%3C/svg%3E")}.vpi-moon{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Cpath d='M12 3a6 6 0 0 0 9 9 9 9 0 1 1-9-9Z'/%3E%3C/svg%3E")}.vpi-more-horizontal{--icon: 
url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Ccircle cx='12' cy='12' r='1'/%3E%3Ccircle cx='19' cy='12' r='1'/%3E%3Ccircle cx='5' cy='12' r='1'/%3E%3C/svg%3E")}.vpi-languages{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Cpath d='m5 8 6 6M4 14l6-6 2-3M2 5h12M7 2h1M22 22l-5-10-5 10M14 18h6'/%3E%3C/svg%3E")}.vpi-heart{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Cpath d='M19 14c1.49-1.46 3-3.21 3-5.5A5.5 5.5 0 0 0 16.5 3c-1.76 0-3 .5-4.5 2-1.5-1.5-2.74-2-4.5-2A5.5 5.5 0 0 0 2 8.5c0 2.3 1.5 4.05 3 5.5l7 7Z'/%3E%3C/svg%3E")}.vpi-search{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Ccircle cx='11' cy='11' r='8'/%3E%3Cpath d='m21 21-4.3-4.3'/%3E%3C/svg%3E")}.vpi-layout-list{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Crect width='7' height='7' x='3' y='3' rx='1'/%3E%3Crect width='7' height='7' x='3' y='14' rx='1'/%3E%3Cpath d='M14 4h7M14 9h7M14 15h7M14 20h7'/%3E%3C/svg%3E")}.vpi-delete{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Cpath d='M20 5H9l-7 7 7 7h11a2 2 0 0 0 2-2V7a2 2 0 0 0-2-2ZM18 9l-6 6M12 9l6 6'/%3E%3C/svg%3E")}.vpi-corner-down-left{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='currentColor' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Cpath d='m9 10-5 5 5 5'/%3E%3Cpath d='M20 4v7a4 4 0 0 1-4 4H4'/%3E%3C/svg%3E")}:root{--vp-icon-copy: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='rgba(128,128,128,1)' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Crect width='8' height='4' x='8' y='2' rx='1' ry='1'/%3E%3Cpath d='M16 4h2a2 2 0 0 1 2 2v14a2 2 0 0 1-2 2H6a2 2 0 0 1-2-2V6a2 2 0 0 1 2-2h2'/%3E%3C/svg%3E");--vp-icon-copied: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='none' stroke='rgba(128,128,128,1)' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' viewBox='0 0 24 24'%3E%3Crect width='8' height='4' x='8' y='2' rx='1' ry='1'/%3E%3Cpath d='M16 4h2a2 2 0 0 1 2 2v14a2 2 0 0 1-2 2H6a2 2 0 0 1-2-2V6a2 2 0 0 1 2-2h2'/%3E%3Cpath d='m9 14 2 2 4-4'/%3E%3C/svg%3E")}.vpi-social-discord{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24'%3E%3Cpath d='M20.317 4.37a19.791 19.791 0 0 0-4.885-1.515.074.074 0 0 0-.079.037c-.21.375-.444.864-.608 1.25a18.27 18.27 0 0 0-5.487 0 12.64 12.64 0 0 0-.617-1.25.077.077 0 0 0-.079-.037A19.736 19.736 0 0 0 3.677 4.37a.07.07 0 0 0-.032.027C.533 9.046-.32 13.58.099 18.057a.082.082 0 0 0 .031.057 19.9 19.9 0 0 0 5.993 3.03.078.078 0 0 0 .084-.028c.462-.63.874-1.295 1.226-1.994a.076.076 0 0 0-.041-.106 13.107 13.107 0 0 
1-1.872-.892.077.077 0 0 1-.008-.128 10.2 10.2 0 0 0 .372-.292.074.074 0 0 1 .077-.01c3.928 1.793 8.18 1.793 12.062 0a.074.074 0 0 1 .078.01c.12.098.246.198.373.292a.077.077 0 0 1-.006.127 12.299 12.299 0 0 1-1.873.892.077.077 0 0 0-.041.107c.36.698.772 1.362 1.225 1.993a.076.076 0 0 0 .084.028 19.839 19.839 0 0 0 6.002-3.03.077.077 0 0 0 .032-.054c.5-5.177-.838-9.674-3.549-13.66a.061.061 0 0 0-.031-.03zM8.02 15.33c-1.183 0-2.157-1.085-2.157-2.419 0-1.333.956-2.419 2.157-2.419 1.21 0 2.176 1.096 2.157 2.42 0 1.333-.956 2.418-2.157 2.418zm7.975 0c-1.183 0-2.157-1.085-2.157-2.419 0-1.333.955-2.419 2.157-2.419 1.21 0 2.176 1.096 2.157 2.42 0 1.333-.946 2.418-2.157 2.418Z'/%3E%3C/svg%3E")}.vpi-social-facebook{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24'%3E%3Cpath d='M9.101 23.691v-7.98H6.627v-3.667h2.474v-1.58c0-4.085 1.848-5.978 5.858-5.978.401 0 .955.042 1.468.103a8.68 8.68 0 0 1 1.141.195v3.325a8.623 8.623 0 0 0-.653-.036 26.805 26.805 0 0 0-.733-.009c-.707 0-1.259.096-1.675.309a1.686 1.686 0 0 0-.679.622c-.258.42-.374.995-.374 1.752v1.297h3.919l-.386 2.103-.287 1.564h-3.246v8.245C19.396 23.238 24 18.179 24 12.044c0-6.627-5.373-12-12-12s-12 5.373-12 12c0 5.628 3.874 10.35 9.101 11.647Z'/%3E%3C/svg%3E")}.vpi-social-github{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24'%3E%3Cpath d='M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12'/%3E%3C/svg%3E")}.vpi-social-instagram{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24'%3E%3Cpath d='M7.03.084c-1.277.06-2.149.264-2.91.563a5.874 5.874 0 0 0-2.124 1.388 5.878 5.878 0 0 0-1.38 2.127C.321 4.926.12 5.8.064 7.076.008 8.354-.005 8.764.001 12.023c.007 3.259.021 3.667.083 4.947.061 1.277.264 2.149.563 2.911.308.789.72 1.457 1.388 2.123a5.872 5.872 0 0 0 2.129 1.38c.763.295 1.636.496 2.913.552 1.278.056 1.689.069 4.947.063 3.257-.007 3.668-.021 4.947-.082 1.28-.06 2.147-.265 2.91-.563a5.881 5.881 0 0 0 2.123-1.388 5.881 5.881 0 0 0 1.38-2.129c.295-.763.496-1.636.551-2.912.056-1.28.07-1.69.063-4.948-.006-3.258-.02-3.667-.081-4.947-.06-1.28-.264-2.148-.564-2.911a5.892 5.892 0 0 0-1.387-2.123 5.857 5.857 0 0 0-2.128-1.38C19.074.322 18.202.12 16.924.066 15.647.009 15.236-.006 11.977 0 8.718.008 8.31.021 7.03.084m.14 21.693c-1.17-.05-1.805-.245-2.228-.408a3.736 3.736 0 0 1-1.382-.895 3.695 3.695 0 0 1-.9-1.378c-.165-.423-.363-1.058-.417-2.228-.06-1.264-.072-1.644-.08-4.848-.006-3.204.006-3.583.061-4.848.05-1.169.246-1.805.408-2.228.216-.561.477-.96.895-1.382a3.705 3.705 0 0 1 1.379-.9c.423-.165 1.057-.361 2.227-.417 1.265-.06 1.644-.072 4.848-.08 3.203-.006 3.583.006 4.85.062 1.168.05 1.804.244 2.227.408.56.216.96.475 1.382.895.421.42.681.817.9 1.378.165.422.362 1.056.417 2.227.06 1.265.074 1.645.08 4.848.005 3.203-.006 3.583-.061 4.848-.051 1.17-.245 1.805-.408 
2.23-.216.56-.477.96-.896 1.38a3.705 3.705 0 0 1-1.378.9c-.422.165-1.058.362-2.226.418-1.266.06-1.645.072-4.85.079-3.204.007-3.582-.006-4.848-.06m9.783-16.192a1.44 1.44 0 1 0 1.437-1.442 1.44 1.44 0 0 0-1.437 1.442M5.839 12.012a6.161 6.161 0 1 0 12.323-.024 6.162 6.162 0 0 0-12.323.024M8 12.008A4 4 0 1 1 12.008 16 4 4 0 0 1 8 12.008'/%3E%3C/svg%3E")}.vpi-social-linkedin{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24'%3E%3Cpath d='M20.447 20.452h-3.554v-5.569c0-1.328-.027-3.037-1.852-3.037-1.853 0-2.136 1.445-2.136 2.939v5.667H9.351V9h3.414v1.561h.046c.477-.9 1.637-1.85 3.37-1.85 3.601 0 4.267 2.37 4.267 5.455v6.286zM5.337 7.433a2.062 2.062 0 0 1-2.063-2.065 2.064 2.064 0 1 1 2.063 2.065zm1.782 13.019H3.555V9h3.564v11.452zM22.225 0H1.771C.792 0 0 .774 0 1.729v20.542C0 23.227.792 24 1.771 24h20.451C23.2 24 24 23.227 24 22.271V1.729C24 .774 23.2 0 22.222 0h.003z'/%3E%3C/svg%3E")}.vpi-social-mastodon{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24'%3E%3Cpath d='M23.268 5.313c-.35-2.578-2.617-4.61-5.304-5.004C17.51.242 15.792 0 11.813 0h-.03c-3.98 0-4.835.242-5.288.309C3.882.692 1.496 2.518.917 5.127.64 6.412.61 7.837.661 9.143c.074 1.874.088 3.745.26 5.611.118 1.24.325 2.47.62 3.68.55 2.237 2.777 4.098 4.96 4.857 2.336.792 4.849.923 7.256.38.265-.061.527-.132.786-.213.585-.184 1.27-.39 1.774-.753a.057.057 0 0 0 .023-.043v-1.809a.052.052 0 0 0-.02-.041.053.053 0 0 0-.046-.01 20.282 20.282 0 0 1-4.709.545c-2.73 0-3.463-1.284-3.674-1.818a5.593 5.593 0 0 1-.319-1.433.053.053 0 0 1 .066-.054c1.517.363 3.072.546 4.632.546.376 0 .75 0 1.125-.01 1.57-.044 3.224-.124 4.768-.422.038-.008.077-.015.11-.024 2.435-.464 4.753-1.92 4.989-5.604.008-.145.03-1.52.03-1.67.002-.512.167-3.63-.024-5.545zm-3.748 9.195h-2.561V8.29c0-1.309-.55-1.976-1.67-1.976-1.23 0-1.846.79-1.846 2.35v3.403h-2.546V8.663c0-1.56-.617-2.35-1.848-2.35-1.112 0-1.668.668-1.67 1.977v6.218H4.822V8.102c0-1.31.337-2.35 1.011-3.12.696-.77 1.608-1.164 2.74-1.164 1.311 0 2.302.5 2.962 1.498l.638 1.06.638-1.06c.66-.999 1.65-1.498 2.96-1.498 1.13 0 2.043.395 2.74 1.164.675.77 1.012 1.81 1.012 3.12z'/%3E%3C/svg%3E")}.vpi-social-npm{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24'%3E%3Cpath d='M1.763 0C.786 0 0 .786 0 1.763v20.474C0 23.214.786 24 1.763 24h20.474c.977 0 1.763-.786 1.763-1.763V1.763C24 .786 23.214 0 22.237 0zM5.13 5.323l13.837.019-.009 13.836h-3.464l.01-10.382h-3.456L12.04 19.17H5.113z'/%3E%3C/svg%3E")}.vpi-social-slack{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24'%3E%3Cpath d='M5.042 15.165a2.528 2.528 0 0 1-2.52 2.523A2.528 2.528 0 0 1 0 15.165a2.527 2.527 0 0 1 2.522-2.52h2.52v2.52zm1.271 0a2.527 2.527 0 0 1 2.521-2.52 2.527 2.527 0 0 1 2.521 2.52v6.313A2.528 2.528 0 0 1 8.834 24a2.528 2.528 0 0 1-2.521-2.522v-6.313zM8.834 5.042a2.528 2.528 0 0 1-2.521-2.52A2.528 2.528 0 0 1 8.834 0a2.528 2.528 0 0 1 2.521 2.522v2.52H8.834zm0 1.271a2.528 2.528 0 0 1 2.521 2.521 2.528 2.528 0 0 1-2.521 2.521H2.522A2.528 2.528 0 0 1 0 8.834a2.528 2.528 0 0 1 2.522-2.521h6.312zm10.122 2.521a2.528 2.528 0 0 1 2.522-2.521A2.528 2.528 0 0 1 24 8.834a2.528 2.528 0 0 1-2.522 2.521h-2.522V8.834zm-1.268 0a2.528 2.528 0 0 1-2.523 2.521 2.527 2.527 0 0 1-2.52-2.521V2.522A2.527 2.527 0 0 1 15.165 0a2.528 2.528 0 0 1 2.523 2.522v6.312zm-2.523 10.122a2.528 2.528 0 0 1 2.523 2.522A2.528 2.528 0 0 1 15.165 24a2.527 2.527 0 0 1-2.52-2.522v-2.522h2.52zm0-1.268a2.527 2.527 0 0 
1-2.52-2.523 2.526 2.526 0 0 1 2.52-2.52h6.313A2.527 2.527 0 0 1 24 15.165a2.528 2.528 0 0 1-2.522 2.523h-6.313z'/%3E%3C/svg%3E")}.vpi-social-twitter,.vpi-social-x{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24'%3E%3Cpath d='M18.901 1.153h3.68l-8.04 9.19L24 22.846h-7.406l-5.8-7.584-6.638 7.584H.474l8.6-9.83L0 1.154h7.594l5.243 6.932ZM17.61 20.644h2.039L6.486 3.24H4.298Z'/%3E%3C/svg%3E")}.vpi-social-youtube{--icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24'%3E%3Cpath d='M23.498 6.186a3.016 3.016 0 0 0-2.122-2.136C19.505 3.545 12 3.545 12 3.545s-7.505 0-9.377.505A3.017 3.017 0 0 0 .502 6.186C0 8.07 0 12 0 12s0 3.93.502 5.814a3.016 3.016 0 0 0 2.122 2.136c1.871.505 9.376.505 9.376.505s7.505 0 9.377-.505a3.015 3.015 0 0 0 2.122-2.136C24 15.93 24 12 24 12s0-3.93-.502-5.814zM9.545 15.568V8.432L15.818 12l-6.273 3.568z'/%3E%3C/svg%3E")}.visually-hidden{position:absolute;width:1px;height:1px;white-space:nowrap;clip:rect(0 0 0 0);clip-path:inset(50%);overflow:hidden}.custom-block{border:1px solid transparent;border-radius:8px;padding:16px 16px 8px;line-height:24px;font-size:var(--vp-custom-block-font-size);color:var(--vp-c-text-2)}.custom-block.info{border-color:var(--vp-custom-block-info-border);color:var(--vp-custom-block-info-text);background-color:var(--vp-custom-block-info-bg)}.custom-block.info a,.custom-block.info code{color:var(--vp-c-brand-1)}.custom-block.info a:hover,.custom-block.info a:hover>code{color:var(--vp-c-brand-2)}.custom-block.info code{background-color:var(--vp-custom-block-info-code-bg)}.custom-block.note{border-color:var(--vp-custom-block-note-border);color:var(--vp-custom-block-note-text);background-color:var(--vp-custom-block-note-bg)}.custom-block.note a,.custom-block.note code{color:var(--vp-c-brand-1)}.custom-block.note a:hover,.custom-block.note a:hover>code{color:var(--vp-c-brand-2)}.custom-block.note code{background-color:var(--vp-custom-block-note-code-bg)}.custom-block.tip{border-color:var(--vp-custom-block-tip-border);color:var(--vp-custom-block-tip-text);background-color:var(--vp-custom-block-tip-bg)}.custom-block.tip a,.custom-block.tip code{color:var(--vp-c-tip-1)}.custom-block.tip a:hover,.custom-block.tip a:hover>code{color:var(--vp-c-tip-2)}.custom-block.tip code{background-color:var(--vp-custom-block-tip-code-bg)}.custom-block.important{border-color:var(--vp-custom-block-important-border);color:var(--vp-custom-block-important-text);background-color:var(--vp-custom-block-important-bg)}.custom-block.important a,.custom-block.important code{color:var(--vp-c-important-1)}.custom-block.important a:hover,.custom-block.important a:hover>code{color:var(--vp-c-important-2)}.custom-block.important code{background-color:var(--vp-custom-block-important-code-bg)}.custom-block.warning{border-color:var(--vp-custom-block-warning-border);color:var(--vp-custom-block-warning-text);background-color:var(--vp-custom-block-warning-bg)}.custom-block.warning a,.custom-block.warning code{color:var(--vp-c-warning-1)}.custom-block.warning a:hover,.custom-block.warning a:hover>code{color:var(--vp-c-warning-2)}.custom-block.warning code{background-color:var(--vp-custom-block-warning-code-bg)}.custom-block.danger{border-color:var(--vp-custom-block-danger-border);color:var(--vp-custom-block-danger-text);background-color:var(--vp-custom-block-danger-bg)}.custom-block.danger a,.custom-block.danger code{color:var(--vp-c-danger-1)}.custom-block.danger a:hover,.custom-block.danger 
a:hover>code{color:var(--vp-c-danger-2)}.custom-block.danger code{background-color:var(--vp-custom-block-danger-code-bg)}.custom-block.caution{border-color:var(--vp-custom-block-caution-border);color:var(--vp-custom-block-caution-text);background-color:var(--vp-custom-block-caution-bg)}.custom-block.caution a,.custom-block.caution code{color:var(--vp-c-caution-1)}.custom-block.caution a:hover,.custom-block.caution a:hover>code{color:var(--vp-c-caution-2)}.custom-block.caution code{background-color:var(--vp-custom-block-caution-code-bg)}.custom-block.details{border-color:var(--vp-custom-block-details-border);color:var(--vp-custom-block-details-text);background-color:var(--vp-custom-block-details-bg)}.custom-block.details a{color:var(--vp-c-brand-1)}.custom-block.details a:hover,.custom-block.details a:hover>code{color:var(--vp-c-brand-2)}.custom-block.details code{background-color:var(--vp-custom-block-details-code-bg)}.custom-block-title{font-weight:600}.custom-block p+p{margin:8px 0}.custom-block.details summary{margin:0 0 8px;font-weight:700;cursor:pointer;-webkit-user-select:none;user-select:none}.custom-block.details summary+p{margin:8px 0}.custom-block a{color:inherit;font-weight:600;text-decoration:underline;text-underline-offset:2px;transition:opacity .25s}.custom-block a:hover{opacity:.75}.custom-block code{font-size:var(--vp-custom-block-code-font-size)}.custom-block.custom-block th,.custom-block.custom-block blockquote>p{font-size:var(--vp-custom-block-font-size);color:inherit}.dark .vp-code span{color:var(--shiki-dark, inherit)}html:not(.dark) .vp-code span{color:var(--shiki-light, inherit)}.vp-code-group{margin-top:16px}.vp-code-group .tabs{position:relative;display:flex;margin-right:-24px;margin-left:-24px;padding:0 12px;background-color:var(--vp-code-tab-bg);overflow-x:auto;overflow-y:hidden;box-shadow:inset 0 -1px var(--vp-code-tab-divider)}@media (min-width: 640px){.vp-code-group .tabs{margin-right:0;margin-left:0;border-radius:8px 8px 0 0}}.vp-code-group .tabs input{position:fixed;opacity:0;pointer-events:none}.vp-code-group .tabs label{position:relative;display:inline-block;border-bottom:1px solid transparent;padding:0 12px;line-height:48px;font-size:14px;font-weight:500;color:var(--vp-code-tab-text-color);white-space:nowrap;cursor:pointer;transition:color .25s}.vp-code-group .tabs label:after{position:absolute;right:8px;bottom:-1px;left:8px;z-index:1;height:2px;border-radius:2px;content:"";background-color:transparent;transition:background-color .25s}.vp-code-group label:hover{color:var(--vp-code-tab-hover-text-color)}.vp-code-group input:checked+label{color:var(--vp-code-tab-active-text-color)}.vp-code-group input:checked+label:after{background-color:var(--vp-code-tab-active-bar-color)}.vp-code-group div[class*=language-],.vp-block{display:none;margin-top:0!important;border-top-left-radius:0!important;border-top-right-radius:0!important}.vp-code-group div[class*=language-].active,.vp-block.active{display:block}.vp-block{padding:20px 24px}.vp-doc h1,.vp-doc h2,.vp-doc h3,.vp-doc h4,.vp-doc h5,.vp-doc h6{position:relative;font-weight:600;outline:none}.vp-doc h1{letter-spacing:-.02em;line-height:40px;font-size:28px}.vp-doc h2{margin:48px 0 16px;border-top:1px solid var(--vp-c-divider);padding-top:24px;letter-spacing:-.02em;line-height:32px;font-size:24px}.vp-doc h3{margin:32px 0 0;letter-spacing:-.01em;line-height:28px;font-size:20px}.vp-doc h4{margin:24px 0 0;letter-spacing:-.01em;line-height:24px;font-size:18px}.vp-doc 
.header-anchor{position:absolute;top:0;left:0;margin-left:-.87em;font-weight:500;-webkit-user-select:none;user-select:none;opacity:0;text-decoration:none;transition:color .25s,opacity .25s}.vp-doc .header-anchor:before{content:var(--vp-header-anchor-symbol)}.vp-doc h1:hover .header-anchor,.vp-doc h1 .header-anchor:focus,.vp-doc h2:hover .header-anchor,.vp-doc h2 .header-anchor:focus,.vp-doc h3:hover .header-anchor,.vp-doc h3 .header-anchor:focus,.vp-doc h4:hover .header-anchor,.vp-doc h4 .header-anchor:focus,.vp-doc h5:hover .header-anchor,.vp-doc h5 .header-anchor:focus,.vp-doc h6:hover .header-anchor,.vp-doc h6 .header-anchor:focus{opacity:1}@media (min-width: 768px){.vp-doc h1{letter-spacing:-.02em;line-height:40px;font-size:32px}}.vp-doc h2 .header-anchor{top:24px}.vp-doc p,.vp-doc summary{margin:16px 0}.vp-doc p{line-height:28px}.vp-doc blockquote{margin:16px 0;border-left:2px solid var(--vp-c-divider);padding-left:16px;transition:border-color .5s;color:var(--vp-c-text-2)}.vp-doc blockquote>p{margin:0;font-size:16px;transition:color .5s}.vp-doc a{font-weight:500;color:var(--vp-c-brand-1);text-decoration:underline;text-underline-offset:2px;transition:color .25s,opacity .25s}.vp-doc a:hover{color:var(--vp-c-brand-2)}.vp-doc strong{font-weight:600}.vp-doc ul,.vp-doc ol{padding-left:1.25rem;margin:16px 0}.vp-doc ul{list-style:disc}.vp-doc ol{list-style:decimal}.vp-doc li+li{margin-top:8px}.vp-doc li>ol,.vp-doc li>ul{margin:8px 0 0}.vp-doc table{display:block;border-collapse:collapse;margin:20px 0;overflow-x:auto}.vp-doc tr{background-color:var(--vp-c-bg);border-top:1px solid var(--vp-c-divider);transition:background-color .5s}.vp-doc tr:nth-child(2n){background-color:var(--vp-c-bg-soft)}.vp-doc th,.vp-doc td{border:1px solid var(--vp-c-divider);padding:8px 16px}.vp-doc th{text-align:left;font-size:14px;font-weight:600;color:var(--vp-c-text-2);background-color:var(--vp-c-bg-soft)}.vp-doc td{font-size:14px}.vp-doc hr{margin:16px 0;border:none;border-top:1px solid var(--vp-c-divider)}.vp-doc .custom-block{margin:16px 0}.vp-doc .custom-block p{margin:8px 0;line-height:24px}.vp-doc .custom-block p:first-child{margin:0}.vp-doc .custom-block div[class*=language-]{margin:8px 0;border-radius:8px}.vp-doc .custom-block div[class*=language-] code{font-weight:400;background-color:transparent}.vp-doc .custom-block .vp-code-group .tabs{margin:0;border-radius:8px 8px 0 0}.vp-doc :not(pre,h1,h2,h3,h4,h5,h6)>code{font-size:var(--vp-code-font-size);color:var(--vp-code-color)}.vp-doc :not(pre)>code{border-radius:4px;padding:3px 6px;background-color:var(--vp-code-bg);transition:color .25s,background-color .5s}.vp-doc a>code{color:var(--vp-code-link-color)}.vp-doc a:hover>code{color:var(--vp-code-link-hover-color)}.vp-doc h1>code,.vp-doc h2>code,.vp-doc h3>code,.vp-doc h4>code{font-size:.9em}.vp-doc div[class*=language-],.vp-block{position:relative;margin:16px -24px;background-color:var(--vp-code-block-bg);overflow-x:auto;transition:background-color .5s}@media (min-width: 640px){.vp-doc div[class*=language-],.vp-block{border-radius:8px;margin:16px 0}}@media (max-width: 639px){.vp-doc li div[class*=language-]{border-radius:8px 0 0 8px}}.vp-doc div[class*=language-]+div[class*=language-],.vp-doc div[class$=-api]+div[class*=language-],.vp-doc div[class*=language-]+div[class$=-api]>div[class*=language-]{margin-top:-8px}.vp-doc [class*=language-] pre,.vp-doc [class*=language-] 
code{direction:ltr;text-align:left;white-space:pre;word-spacing:normal;word-break:normal;word-wrap:normal;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-hyphens:none;-moz-hyphens:none;-ms-hyphens:none;hyphens:none}.vp-doc [class*=language-] pre{position:relative;z-index:1;margin:0;padding:20px 0;background:transparent;overflow-x:auto}.vp-doc [class*=language-] code{display:block;padding:0 24px;width:fit-content;min-width:100%;line-height:var(--vp-code-line-height);font-size:var(--vp-code-font-size);color:var(--vp-code-block-color);transition:color .5s}.vp-doc [class*=language-] code .highlighted{background-color:var(--vp-code-line-highlight-color);transition:background-color .5s;margin:0 -24px;padding:0 24px;width:calc(100% + 48px);display:inline-block}.vp-doc [class*=language-] code .highlighted.error{background-color:var(--vp-code-line-error-color)}.vp-doc [class*=language-] code .highlighted.warning{background-color:var(--vp-code-line-warning-color)}.vp-doc [class*=language-] code .diff{transition:background-color .5s;margin:0 -24px;padding:0 24px;width:calc(100% + 48px);display:inline-block}.vp-doc [class*=language-] code .diff:before{position:absolute;left:10px}.vp-doc [class*=language-] .has-focused-lines .line:not(.has-focus){filter:blur(.095rem);opacity:.4;transition:filter .35s,opacity .35s}.vp-doc [class*=language-] .has-focused-lines .line:not(.has-focus){opacity:.7;transition:filter .35s,opacity .35s}.vp-doc [class*=language-]:hover .has-focused-lines .line:not(.has-focus){filter:blur(0);opacity:1}.vp-doc [class*=language-] code .diff.remove{background-color:var(--vp-code-line-diff-remove-color);opacity:.7}.vp-doc [class*=language-] code .diff.remove:before{content:"-";color:var(--vp-code-line-diff-remove-symbol-color)}.vp-doc [class*=language-] code .diff.add{background-color:var(--vp-code-line-diff-add-color)}.vp-doc [class*=language-] code .diff.add:before{content:"+";color:var(--vp-code-line-diff-add-symbol-color)}.vp-doc div[class*=language-].line-numbers-mode{padding-left:32px}.vp-doc .line-numbers-wrapper{position:absolute;top:0;bottom:0;left:0;z-index:3;border-right:1px solid var(--vp-code-block-divider-color);padding-top:20px;width:32px;text-align:center;font-family:var(--vp-font-family-mono);line-height:var(--vp-code-line-height);font-size:var(--vp-code-font-size);color:var(--vp-code-line-number-color);transition:border-color .5s,color .5s}.vp-doc [class*=language-]>button.copy{direction:ltr;position:absolute;top:12px;right:12px;z-index:3;border:1px solid var(--vp-code-copy-code-border-color);border-radius:4px;width:40px;height:40px;background-color:var(--vp-code-copy-code-bg);opacity:0;cursor:pointer;background-image:var(--vp-icon-copy);background-position:50%;background-size:20px;background-repeat:no-repeat;transition:border-color .25s,background-color .25s,opacity .25s}.vp-doc [class*=language-]:hover>button.copy,.vp-doc [class*=language-]>button.copy:focus{opacity:1}.vp-doc [class*=language-]>button.copy:hover,.vp-doc [class*=language-]>button.copy.copied{border-color:var(--vp-code-copy-code-hover-border-color);background-color:var(--vp-code-copy-code-hover-bg)}.vp-doc [class*=language-]>button.copy.copied,.vp-doc [class*=language-]>button.copy:hover.copied{border-radius:0 4px 4px 0;background-color:var(--vp-code-copy-code-hover-bg);background-image:var(--vp-icon-copied)}.vp-doc [class*=language-]>button.copy.copied:before,.vp-doc [class*=language-]>button.copy:hover.copied:before{position:relative;top:-1px;transform:translate(calc(-100% - 
1px));display:flex;justify-content:center;align-items:center;border:1px solid var(--vp-code-copy-code-hover-border-color);border-right:0;border-radius:4px 0 0 4px;padding:0 10px;width:fit-content;height:40px;text-align:center;font-size:12px;font-weight:500;color:var(--vp-code-copy-code-active-text);background-color:var(--vp-code-copy-code-hover-bg);white-space:nowrap;content:var(--vp-code-copy-copied-text-content)}.vp-doc [class*=language-]>span.lang{position:absolute;top:2px;right:8px;z-index:2;font-size:12px;font-weight:500;color:var(--vp-code-lang-color);transition:color .4s,opacity .4s}.vp-doc [class*=language-]:hover>button.copy+span.lang,.vp-doc [class*=language-]>button.copy:focus+span.lang{opacity:0}.vp-doc .VPTeamMembers{margin-top:24px}.vp-doc .VPTeamMembers.small.count-1 .container{margin:0!important;max-width:calc((100% - 24px)/2)!important}.vp-doc .VPTeamMembers.small.count-2 .container,.vp-doc .VPTeamMembers.small.count-3 .container{max-width:100%!important}.vp-doc .VPTeamMembers.medium.count-1 .container{margin:0!important;max-width:calc((100% - 24px)/2)!important}:is(.vp-external-link-icon,.vp-doc a[href*="://"],.vp-doc a[target=_blank]):not(.no-icon):after{display:inline-block;margin-top:-1px;margin-left:4px;width:11px;height:11px;background:currentColor;color:var(--vp-c-text-3);flex-shrink:0;--icon: url("data:image/svg+xml, %3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24' %3E%3Cpath d='M0 0h24v24H0V0z' fill='none' /%3E%3Cpath d='M9 5v2h6.59L4 18.59 5.41 20 17 8.41V15h2V5H9z' /%3E%3C/svg%3E");-webkit-mask-image:var(--icon);mask-image:var(--icon)}.vp-external-link-icon:after{content:""}.external-link-icon-enabled :is(.vp-doc a[href*="://"],.vp-doc a[target=_blank]):after{content:"";color:currentColor}.vp-sponsor{border-radius:16px;overflow:hidden}.vp-sponsor.aside{border-radius:12px}.vp-sponsor-section+.vp-sponsor-section{margin-top:4px}.vp-sponsor-tier{margin:0 0 4px!important;text-align:center;letter-spacing:1px!important;line-height:24px;width:100%;font-weight:600;color:var(--vp-c-text-2);background-color:var(--vp-c-bg-soft)}.vp-sponsor.normal .vp-sponsor-tier{padding:13px 0 11px;font-size:14px}.vp-sponsor.aside .vp-sponsor-tier{padding:9px 0 7px;font-size:12px}.vp-sponsor-grid+.vp-sponsor-tier{margin-top:4px}.vp-sponsor-grid{display:flex;flex-wrap:wrap;gap:4px}.vp-sponsor-grid.xmini .vp-sponsor-grid-link{height:64px}.vp-sponsor-grid.xmini .vp-sponsor-grid-image{max-width:64px;max-height:22px}.vp-sponsor-grid.mini .vp-sponsor-grid-link{height:72px}.vp-sponsor-grid.mini .vp-sponsor-grid-image{max-width:96px;max-height:24px}.vp-sponsor-grid.small .vp-sponsor-grid-link{height:96px}.vp-sponsor-grid.small .vp-sponsor-grid-image{max-width:96px;max-height:24px}.vp-sponsor-grid.medium .vp-sponsor-grid-link{height:112px}.vp-sponsor-grid.medium .vp-sponsor-grid-image{max-width:120px;max-height:36px}.vp-sponsor-grid.big .vp-sponsor-grid-link{height:184px}.vp-sponsor-grid.big .vp-sponsor-grid-image{max-width:192px;max-height:56px}.vp-sponsor-grid[data-vp-grid="2"] .vp-sponsor-grid-item{width:calc((100% - 4px)/2)}.vp-sponsor-grid[data-vp-grid="3"] .vp-sponsor-grid-item{width:calc((100% - 4px * 2) / 3)}.vp-sponsor-grid[data-vp-grid="4"] .vp-sponsor-grid-item{width:calc((100% - 12px)/4)}.vp-sponsor-grid[data-vp-grid="5"] .vp-sponsor-grid-item{width:calc((100% - 16px)/5)}.vp-sponsor-grid[data-vp-grid="6"] .vp-sponsor-grid-item{width:calc((100% - 4px * 5) / 6)}.vp-sponsor-grid-item{flex-shrink:0;width:100%;background-color:var(--vp-c-bg-soft);transition:background-color 
.25s}.vp-sponsor-grid-item:hover{background-color:var(--vp-c-default-soft)}.vp-sponsor-grid-item:hover .vp-sponsor-grid-image{filter:grayscale(0) invert(0)}.vp-sponsor-grid-item.empty:hover{background-color:var(--vp-c-bg-soft)}.dark .vp-sponsor-grid-item:hover{background-color:var(--vp-c-white)}.dark .vp-sponsor-grid-item.empty:hover{background-color:var(--vp-c-bg-soft)}.vp-sponsor-grid-link{display:flex}.vp-sponsor-grid-box{display:flex;justify-content:center;align-items:center;width:100%}.vp-sponsor-grid-image{max-width:100%;filter:grayscale(1);transition:filter .25s}.dark .vp-sponsor-grid-image{filter:grayscale(1) invert(1)}.VPBadge{display:inline-block;margin-left:2px;border:1px solid transparent;border-radius:12px;padding:0 10px;line-height:22px;font-size:12px;font-weight:500;transform:translateY(-2px)}.VPBadge.small{padding:0 6px;line-height:18px;font-size:10px;transform:translateY(-8px)}.VPDocFooter .VPBadge{display:none}.vp-doc h1>.VPBadge{margin-top:4px;vertical-align:top}.vp-doc h2>.VPBadge{margin-top:3px;padding:0 8px;vertical-align:top}.vp-doc h3>.VPBadge{vertical-align:middle}.vp-doc h4>.VPBadge,.vp-doc h5>.VPBadge,.vp-doc h6>.VPBadge{vertical-align:middle;line-height:18px}.VPBadge.info{border-color:var(--vp-badge-info-border);color:var(--vp-badge-info-text);background-color:var(--vp-badge-info-bg)}.VPBadge.tip{border-color:var(--vp-badge-tip-border);color:var(--vp-badge-tip-text);background-color:var(--vp-badge-tip-bg)}.VPBadge.warning{border-color:var(--vp-badge-warning-border);color:var(--vp-badge-warning-text);background-color:var(--vp-badge-warning-bg)}.VPBadge.danger{border-color:var(--vp-badge-danger-border);color:var(--vp-badge-danger-text);background-color:var(--vp-badge-danger-bg)}.VPBackdrop[data-v-54a304ca]{position:fixed;top:0;right:0;bottom:0;left:0;z-index:var(--vp-z-index-backdrop);background:var(--vp-backdrop-bg-color);transition:opacity .5s}.VPBackdrop.fade-enter-from[data-v-54a304ca],.VPBackdrop.fade-leave-to[data-v-54a304ca]{opacity:0}.VPBackdrop.fade-leave-active[data-v-54a304ca]{transition-duration:.25s}@media (min-width: 1280px){.VPBackdrop[data-v-54a304ca]{display:none}}.NotFound[data-v-6ff51ddd]{padding:64px 24px 96px;text-align:center}@media (min-width: 768px){.NotFound[data-v-6ff51ddd]{padding:96px 32px 168px}}.code[data-v-6ff51ddd]{line-height:64px;font-size:64px;font-weight:600}.title[data-v-6ff51ddd]{padding-top:12px;letter-spacing:2px;line-height:20px;font-size:20px;font-weight:700}.divider[data-v-6ff51ddd]{margin:24px auto 18px;width:64px;height:1px;background-color:var(--vp-c-divider)}.quote[data-v-6ff51ddd]{margin:0 auto;max-width:256px;font-size:14px;font-weight:500;color:var(--vp-c-text-2)}.action[data-v-6ff51ddd]{padding-top:20px}.link[data-v-6ff51ddd]{display:inline-block;border:1px solid var(--vp-c-brand-1);border-radius:16px;padding:3px 16px;font-size:14px;font-weight:500;color:var(--vp-c-brand-1);transition:border-color .25s,color .25s}.link[data-v-6ff51ddd]:hover{border-color:var(--vp-c-brand-2);color:var(--vp-c-brand-2)}.root[data-v-53c99d69]{position:relative;z-index:1}.nested[data-v-53c99d69]{padding-right:16px;padding-left:16px}.outline-link[data-v-53c99d69]{display:block;line-height:32px;font-size:14px;font-weight:400;color:var(--vp-c-text-2);white-space:nowrap;overflow:hidden;text-overflow:ellipsis;transition:color .5s}.outline-link[data-v-53c99d69]:hover,.outline-link.active[data-v-53c99d69]{color:var(--vp-c-text-1);transition:color 
.25s}.outline-link.nested[data-v-53c99d69]{padding-left:13px}.VPDocAsideOutline[data-v-f610f197]{display:none}.VPDocAsideOutline.has-outline[data-v-f610f197]{display:block}.content[data-v-f610f197]{position:relative;border-left:1px solid var(--vp-c-divider);padding-left:16px;font-size:13px;font-weight:500}.outline-marker[data-v-f610f197]{position:absolute;top:32px;left:-1px;z-index:0;opacity:0;width:2px;border-radius:2px;height:18px;background-color:var(--vp-c-brand-1);transition:top .25s cubic-bezier(0,1,.5,1),background-color .5s,opacity .25s}.outline-title[data-v-f610f197]{line-height:32px;font-size:14px;font-weight:600}.VPDocAside[data-v-cb998dce]{display:flex;flex-direction:column;flex-grow:1}.spacer[data-v-cb998dce]{flex-grow:1}.VPDocAside[data-v-cb998dce] .spacer+.VPDocAsideSponsors,.VPDocAside[data-v-cb998dce] .spacer+.VPDocAsideCarbonAds{margin-top:24px}.VPDocAside[data-v-cb998dce] .VPDocAsideSponsors+.VPDocAsideCarbonAds{margin-top:16px}.VPLastUpdated[data-v-1bb0c8a8]{line-height:24px;font-size:14px;font-weight:500;color:var(--vp-c-text-2)}@media (min-width: 640px){.VPLastUpdated[data-v-1bb0c8a8]{line-height:32px;font-size:14px;font-weight:500}}.VPDocFooter[data-v-1bcd8184]{margin-top:64px}.edit-info[data-v-1bcd8184]{padding-bottom:18px}@media (min-width: 640px){.edit-info[data-v-1bcd8184]{display:flex;justify-content:space-between;align-items:center;padding-bottom:14px}}.edit-link-button[data-v-1bcd8184]{display:flex;align-items:center;border:0;line-height:32px;font-size:14px;font-weight:500;color:var(--vp-c-brand-1);transition:color .25s}.edit-link-button[data-v-1bcd8184]:hover{color:var(--vp-c-brand-2)}.edit-link-icon[data-v-1bcd8184]{margin-right:8px}.prev-next[data-v-1bcd8184]{border-top:1px solid var(--vp-c-divider);padding-top:24px;display:grid;grid-row-gap:8px}@media (min-width: 640px){.prev-next[data-v-1bcd8184]{grid-template-columns:repeat(2,1fr);grid-column-gap:16px}}.pager-link[data-v-1bcd8184]{display:block;border:1px solid var(--vp-c-divider);border-radius:8px;padding:11px 16px 13px;width:100%;height:100%;transition:border-color .25s}.pager-link[data-v-1bcd8184]:hover{border-color:var(--vp-c-brand-1)}.pager-link.next[data-v-1bcd8184]{margin-left:auto;text-align:right}.desc[data-v-1bcd8184]{display:block;line-height:20px;font-size:12px;font-weight:500;color:var(--vp-c-text-2)}.title[data-v-1bcd8184]{display:block;line-height:20px;font-size:14px;font-weight:500;color:var(--vp-c-brand-1);transition:color .25s}.VPDoc[data-v-e6f2a212]{padding:32px 24px 96px;width:100%}@media (min-width: 768px){.VPDoc[data-v-e6f2a212]{padding:48px 32px 128px}}@media (min-width: 960px){.VPDoc[data-v-e6f2a212]{padding:48px 32px 0}.VPDoc:not(.has-sidebar) .container[data-v-e6f2a212]{display:flex;justify-content:center;max-width:992px}.VPDoc:not(.has-sidebar) .content[data-v-e6f2a212]{max-width:752px}}@media (min-width: 1280px){.VPDoc .container[data-v-e6f2a212]{display:flex;justify-content:center}.VPDoc .aside[data-v-e6f2a212]{display:block}}@media (min-width: 1440px){.VPDoc:not(.has-sidebar) .content[data-v-e6f2a212]{max-width:784px}.VPDoc:not(.has-sidebar) .container[data-v-e6f2a212]{max-width:1104px}}.container[data-v-e6f2a212]{margin:0 auto;width:100%}.aside[data-v-e6f2a212]{position:relative;display:none;order:2;flex-grow:1;padding-left:32px;width:100%;max-width:256px}.left-aside[data-v-e6f2a212]{order:1;padding-left:unset;padding-right:32px}.aside-container[data-v-e6f2a212]{position:fixed;top:0;padding-top:calc(var(--vp-nav-height) + var(--vp-layout-top-height, 0px) + 
summary:after{transform:rotate(0)}#observablehq-sidebar-toggle{position:fixed;-webkit-appearance:none;-moz-appearance:none;appearance:none;background:none;top:0;left:0;height:100%;width:2rem;display:flex;align-items:center;justify-content:center;cursor:e-resize;margin:0;color:var(--theme-foreground-muted);z-index:1}#observablehq-sidebar-close{position:absolute;top:1rem;right:0;width:2rem;height:var(--observablehq-header-height);display:flex;align-items:center;justify-content:center;color:var(--theme-foreground-muted);cursor:w-resize;z-index:2}#observablehq-sidebar-toggle:before,#observablehq-sidebar-close:before{content:"";width:1rem;height:1rem;background:currentColor;-webkit-mask:var(--theme-toggle);mask:var(--theme-toggle)}#observablehq-sidebar-close:before{transform:scaleX(-1)}#observablehq-sidebar summary,.observablehq-link a{display:flex;padding:.5rem 1rem .5rem 1.5rem;margin-left:-.5rem;align-items:center}#observablehq-sidebar summary a{flex-grow:1;color:inherit}#observablehq-sidebar summary.observablehq-link{padding:0;margin-left:0}#observablehq-sidebar details summary:hover,.observablehq-link-active a,.observablehq-link a:hover{background:var(--theme-background)}.observablehq-link a:hover{color:var(--theme-foreground-focus)}#observablehq-toc{display:none;position:fixed;color:var(--theme-foreground-muted);font:400 14px var(--sans-serif);z-index:1;top:0;right:calc(max(0rem,(100% - var(--observablehq-max-width)) / 2) + 1rem);bottom:0;overflow-y:auto}#observablehq-header~#observablehq-toc{top:calc(var(--observablehq-header-height) + 1.5rem)}#observablehq-toc nav{width:192px;margin:2rem 0;padding:0 1rem;box-sizing:border-box;border-left:solid 1px var(--theme-foreground-faintest)}#observablehq-toc div{font-weight:700;color:var(--theme-foreground);margin-bottom:.5rem}.observablehq-secondary-link a{display:block;padding:.25rem 0}.observablehq-link:not(.observablehq-link-active) a[href]:not(:hover),.observablehq-secondary-link:not(.observablehq-secondary-link-active) a[href]:not(:hover){color:inherit}.observablehq-link-active,.observablehq-secondary-link-active{position:relative}.observablehq-link-active:before,.observablehq-secondary-link-highlight{content:"";position:absolute;width:3px;background:var(--theme-foreground-focus)}.observablehq-link-active:before{top:0;bottom:0;left:-.5rem}.observablehq-secondary-link-highlight{left:1px;top:2rem;height:0;transition:top .15s ease,height .15s ease}#observablehq-sidebar{transition:visibility .15s 0ms,left .15s 0ms ease}#observablehq-sidebar:focus-within,#observablehq-sidebar-toggle:checked~#observablehq-sidebar{left:0;visibility:initial;box-shadow:0 0 8px 4px #0000001a;transition:visibility 0ms 0ms,left .15s 0ms ease}#observablehq-sidebar-backdrop{display:none;position:fixed;top:0;right:0;bottom:0;left:0;z-index:2}#observablehq-sidebar-backdrop:has(~#observablehq-sidebar:focus-within),#observablehq-sidebar-toggle:checked~#observablehq-sidebar-backdrop{display:initial}@media (prefers-color-scheme: dark){#observablehq-sidebar:focus-within,#observablehq-sidebar-toggle:checked~#observablehq-sidebar{box-shadow:0 0 8px 4px #00000080}}@media (min-width: calc(912px + 
6rem)){#observablehq-sidebar{transition:none!important}#observablehq-sidebar-toggle:checked~#observablehq-sidebar-backdrop{display:none}#observablehq-sidebar-toggle:checked~#observablehq-sidebar,#observablehq-sidebar-toggle:indeterminate~#observablehq-sidebar{left:0;visibility:initial;box-shadow:none}#observablehq-sidebar-toggle:checked~#observablehq-center,#observablehq-sidebar-toggle:indeterminate~#observablehq-center{--observablehq-inset-left: calc(272px + 1rem) ;--observablehq-inset-right: 1rem;padding-left:var(--observablehq-inset-left);padding-right:1rem}}@media (min-width: calc(832px + 5rem)){#observablehq-toc~#observablehq-main{padding-right:calc(192px + 1rem)}#observablehq-toc{display:block}}@media (min-width: calc(912px + 6rem)){#observablehq-sidebar-toggle:checked~#observablehq-center #observablehq-toc,#observablehq-sidebar-toggle:indeterminate~#observablehq-center #observablehq-toc{display:none}#observablehq-sidebar-toggle:checked~#observablehq-center #observablehq-toc~#observablehq-main,#observablehq-sidebar-toggle:indeterminate~#observablehq-center #observablehq-toc~#observablehq-main{padding-right:0}}@media (min-width: calc(1104px + 7rem)){#observablehq-sidebar-toggle:checked~#observablehq-center #observablehq-toc,#observablehq-sidebar-toggle:indeterminate~#observablehq-center #observablehq-toc,#observablehq-toc{display:block}#observablehq-sidebar-toggle:checked~#observablehq-center #observablehq-toc~#observablehq-main,#observablehq-sidebar-toggle:indeterminate~#observablehq-center #observablehq-toc~#observablehq-main{padding-right:calc(192px + 1rem)}}.observablehq-pre-container{position:relative;margin:1rem -1rem;max-width:960px}.observablehq-pre-container:after{position:absolute;top:0;right:0;height:21px;font:12px var(--sans-serif);color:var(--theme-foreground-muted);background:linear-gradient(to right,transparent,var(--theme-background-alt) 40%);padding:.5rem .5rem .5rem 1.5rem}.observablehq-pre-container[data-language]:after{content:attr(data-language)}.observablehq-pre-container pre{padding-right:4rem;margin:0;max-width:none}.observablehq-pre-copy{position:absolute;top:0;right:0;background:none;color:transparent;border:none;border-radius:4px;padding:0 8px;margin:4px;height:29px;cursor:pointer;z-index:1;display:flex;align-items:center}.observablehq-pre-copied:before{content:"Copied!";position:absolute;right:calc(100% + .25rem);background:linear-gradient(to right,transparent,var(--theme-background-alt) 10%);color:var(--theme-green);font:var(--font-small);padding:4px 8px 4px 16px;pointer-events:none;animation-name:observablehq-pre-copied;animation-duration:.25s;animation-direction:alternate;animation-iteration-count:2}@keyframes observablehq-pre-copied{0%{opacity:0;transform:translate(.5rem)}50%{opacity:1}to{transform:translate(0)}}.observablehq-pre-container[data-copy] .observablehq-pre-copy,.observablehq-pre-container:hover .observablehq-pre-copy,.observablehq-pre-container .observablehq-pre-copy:focus{background:var(--theme-background-alt);color:var(--theme-foreground-faint)}.observablehq-pre-container .observablehq-pre-copy:hover{color:var(--theme-foreground-muted)}.observablehq-pre-container .observablehq-pre-copy:active{color:var(--theme-foreground);background:var(--theme-foreground-faintest)}#observablehq-sidebar.observablehq-search-results>ol:not(:first-child),#observablehq-sidebar.observablehq-search-results>details,#observablehq-sidebar.observablehq-search-results>section{display:none}#observablehq-search{position:relative;padding:.5rem 0 
0;display:flex;align-items:center}#observablehq-search input{padding:6px 4px 6px 2.2em;width:100%;border:none;border-radius:4px;background-color:var(--theme-background);font-size:13.3px;height:28px}#observablehq-search input::placeholder{color:var(--theme-foreground-faint)}#observablehq-search:before{position:absolute;left:.5rem;content:"";width:1rem;height:1rem;background:currentColor;-webkit-mask:var(--theme-magnifier);mask:var(--theme-magnifier);pointer-events:none}#observablehq-search:after{position:absolute;right:6px;content:attr(data-shortcut);pointer-events:none}#observablehq-search:focus-within:after{content:""}#observablehq-search-results{--relevance-width: 32px;position:absolute;overflow-y:auto;top:6.5rem;left:var(--observablehq-sidebar-padding-left);right:.5rem;bottom:0}#observablehq-search-results a span{max-width:184px;white-space:nowrap;overflow:hidden;text-overflow:ellipsis}#observablehq-search-results div{text-align:right;font-size:10px;margin:.5em}#observablehq-search-results li{position:relative;display:flex;align-items:center}#observablehq-search-results a{flex-grow:1}#observablehq-search-results li:after,#observablehq-search-results a span:after{content:"";width:var(--relevance-width);height:4px;position:absolute;top:14px;right:.5em;border-radius:2px;background:var(--theme-foreground-muted)}#observablehq-search-results li.observablehq-link-active:after{background:var(--theme-foreground-focus)}#observablehq-search-results a span:after{background:var(--theme-foreground-faintest)}#observablehq-search-results li[data-score="0"]:after{width:calc(var(--relevance-width) * .125)}#observablehq-search-results li[data-score="1"]:after{width:calc(var(--relevance-width) * .25)}#observablehq-search-results li[data-score="2"]:after{width:calc(var(--relevance-width) * .4375)}#observablehq-search-results li[data-score="3"]:after{width:calc(var(--relevance-width) * .625)}#observablehq-search-results li[data-score="4"]:after{width:calc(var(--relevance-width) * .8125)}@media print{#observablehq-center{padding-left:1em!important}#observablehq-sidebar,#observablehq-footer{display:none!important}}#VPContent{container-type:inline-size}.VPDoc .grid{margin:1rem 0;display:grid;gap:1rem;grid-auto-rows:1fr}.VPDoc .grid svg{overflow:visible}.VPDoc .grid figure{margin:0}.VPDoc .grid>*>p:first-child{margin-top:0}.VPDoc .grid>*>p:last-child{margin-bottom:0}@container (min-width: 640px){.grid-cols-2,.grid-cols-4{grid-template-columns:repeat(2,minmax(0,1fr))}.grid-cols-2 .grid-colspan-2,.grid-cols-2 .grid-colspan-3,.grid-cols-2 .grid-colspan-4,.grid-cols-4 .grid-colspan-2,.grid-cols-4 .grid-colspan-3,.grid-cols-4 .grid-colspan-4{grid-column:span 2}}@container (min-width: 720px){.grid-cols-3{grid-template-columns:repeat(3,minmax(0,1fr))}.grid-cols-3 .grid-colspan-2{grid-column:span 2}.grid-cols-3 .grid-colspan-3{grid-column:span 3}}@container (min-width: 900px){.grid-cols-4{grid-template-columns:repeat(4,minmax(0,1fr))}.grid-cols-4 .grid-colspan-3{grid-column:span 3}.grid-cols-4 .grid-colspan-4{grid-column:span 4}}.grid-rowspan-2{grid-row:span 2}.grid-rowspan-3{grid-row:span 3}.grid-rowspan-4{grid-row:span 4}div.card{background:var(--theme-background-alt);border:solid 1px var(--theme-foreground-faintest);border-radius:.75rem;padding:1rem;margin:1rem 0;font:14px var(--sans-serif)}.grid>div.card{margin:0}div.card>:first-child,div.card>:first-child>:first-child{margin-top:0}div.card>:last-child,div.card>:last-child>:last-child{margin-bottom:0}div.card h2,div.card h3{font-size:inherit}div.card 
h2{font-weight:500;font-size:15px}div.card h3{font-weight:400;color:var(--theme-foreground-muted)}div.card h2~svg,div.card h3~svg,div.card h2~p,div.card h3~p{margin-top:1rem}.plot-d6a7b5{--plot-background: var(--theme-background)}p .plot-d6a7b5{display:inline-block}.VPLocalSearchBox[data-v-ce12919d]{position:fixed;z-index:100;top:0;right:0;bottom:0;left:0;display:flex}.backdrop[data-v-ce12919d]{position:absolute;top:0;right:0;bottom:0;left:0;background:var(--vp-backdrop-bg-color);transition:opacity .5s}.shell[data-v-ce12919d]{position:relative;padding:12px;margin:64px auto;display:flex;flex-direction:column;gap:16px;background:var(--vp-local-search-bg);width:min(100vw - 60px,900px);height:min-content;max-height:min(100vh - 128px,900px);border-radius:6px}@media (max-width: 767px){.shell[data-v-ce12919d]{margin:0;width:100vw;height:100vh;max-height:none;border-radius:0}}.search-bar[data-v-ce12919d]{border:1px solid var(--vp-c-divider);border-radius:4px;display:flex;align-items:center;padding:0 12px;cursor:text}@media (max-width: 767px){.search-bar[data-v-ce12919d]{padding:0 8px}}.search-bar[data-v-ce12919d]:focus-within{border-color:var(--vp-c-brand-1)}.local-search-icon[data-v-ce12919d]{display:block;font-size:18px}.navigate-icon[data-v-ce12919d]{display:block;font-size:14px}.search-icon[data-v-ce12919d]{margin:8px}@media (max-width: 767px){.search-icon[data-v-ce12919d]{display:none}}.search-input[data-v-ce12919d]{padding:6px 12px;font-size:inherit;width:100%}@media (max-width: 767px){.search-input[data-v-ce12919d]{padding:6px 4px}}.search-actions[data-v-ce12919d]{display:flex;gap:4px}@media (any-pointer: coarse){.search-actions[data-v-ce12919d]{gap:8px}}@media (min-width: 769px){.search-actions.before[data-v-ce12919d]{display:none}}.search-actions button[data-v-ce12919d]{padding:8px}.search-actions button[data-v-ce12919d]:not([disabled]):hover,.toggle-layout-button.detailed-list[data-v-ce12919d]{color:var(--vp-c-brand-1)}.search-actions button.clear-button[data-v-ce12919d]:disabled{opacity:.37}.search-keyboard-shortcuts[data-v-ce12919d]{font-size:.8rem;opacity:75%;display:flex;flex-wrap:wrap;gap:16px;line-height:14px}.search-keyboard-shortcuts span[data-v-ce12919d]{display:flex;align-items:center;gap:4px}@media (max-width: 767px){.search-keyboard-shortcuts[data-v-ce12919d]{display:none}}.search-keyboard-shortcuts kbd[data-v-ce12919d]{background:#8080801a;border-radius:4px;padding:3px 6px;min-width:24px;display:inline-block;text-align:center;vertical-align:middle;border:1px solid rgba(128,128,128,.15);box-shadow:0 2px 2px #0000001a}.results[data-v-ce12919d]{display:flex;flex-direction:column;gap:6px;overflow-x:hidden;overflow-y:auto;overscroll-behavior:contain}.result[data-v-ce12919d]{display:flex;align-items:center;gap:8px;border-radius:4px;transition:none;line-height:1rem;border:solid 2px var(--vp-local-search-result-border);outline:none}.result>div[data-v-ce12919d]{margin:12px;width:100%;overflow:hidden}@media (max-width: 767px){.result>div[data-v-ce12919d]{margin:8px}}.titles[data-v-ce12919d]{display:flex;flex-wrap:wrap;gap:4px;position:relative;z-index:1001;padding:2px 0}.title[data-v-ce12919d]{display:flex;align-items:center;gap:4px}.title.main[data-v-ce12919d]{font-weight:500}.title-icon[data-v-ce12919d]{opacity:.5;font-weight:500;color:var(--vp-c-brand-1)}.title svg[data-v-ce12919d]{opacity:.5}.result.selected[data-v-ce12919d]{--vp-local-search-result-bg: 
var(--vp-local-search-result-selected-bg);border-color:var(--vp-local-search-result-selected-border)}.excerpt-wrapper[data-v-ce12919d]{position:relative}.excerpt[data-v-ce12919d]{opacity:50%;pointer-events:none;max-height:140px;overflow:hidden;position:relative;margin-top:4px}.result.selected .excerpt[data-v-ce12919d]{opacity:1}.excerpt[data-v-ce12919d] *{font-size:.8rem!important;line-height:130%!important}.titles[data-v-ce12919d] mark,.excerpt[data-v-ce12919d] mark{background-color:var(--vp-local-search-highlight-bg);color:var(--vp-local-search-highlight-text);border-radius:2px;padding:0 2px}.excerpt[data-v-ce12919d] .vp-code-group .tabs{display:none}.excerpt[data-v-ce12919d] .vp-code-group div[class*=language-]{border-radius:8px!important}.excerpt-gradient-bottom[data-v-ce12919d]{position:absolute;bottom:-1px;left:0;width:100%;height:8px;background:linear-gradient(transparent,var(--vp-local-search-result-bg));z-index:1000}.excerpt-gradient-top[data-v-ce12919d]{position:absolute;top:-1px;left:0;width:100%;height:8px;background:linear-gradient(var(--vp-local-search-result-bg),transparent);z-index:1000}.result.selected .titles[data-v-ce12919d],.result.selected .title-icon[data-v-ce12919d]{color:var(--vp-c-brand-1)!important}.no-results[data-v-ce12919d]{font-size:.9rem;text-align:center;padding:12px}svg[data-v-ce12919d]{flex:none}
diff --git a/assets/system_httplib_README.md.WIkDaVQZ.js b/assets/system_httplib_README.md.WIkDaVQZ.js
new file mode 100644
index 00000000000..1fb5081d0e9
--- /dev/null
+++ b/assets/system_httplib_README.md.WIkDaVQZ.js
@@ -0,0 +1,287 @@
+import{_ as i,c as a,a3 as t,o as n}from"./chunks/framework.DkhCEVKm.js";const g=JSON.parse('{"title":"cpp-httplib","description":"","frontmatter":{},"headers":[],"relativePath":"system/httplib/README.md","filePath":"system/httplib/README.md","lastUpdated":1731340314000}'),h={name:"system/httplib/README.md"};function l(k,s,p,e,E,r){return n(),a("div",null,s[0]||(s[0]=[t(`cpp-httplib
Server Example
#include <httplib.h>
+
+int main(void)
+{
+ using namespace httplib;
+
+ Server svr;
+
+ svr.Get("/hi", [](const Request& req, Response& res) {
+ res.set_content("Hello World!", "text/plain");
+ });
+
+ svr.Get(R"(/numbers/(\\d+))", [&](const Request& req, Response& res) {
+ auto numbers = req.matches[1];
+ res.set_content(numbers, "text/plain");
+ });
+
+ svr.Get("/body-header-param", [](const Request& req, Response& res) {
+ if (req.has_header("Content-Length")) {
+ auto val = req.get_header_value("Content-Length");
+ }
+ if (req.has_param("key")) {
+ auto val = req.get_param_value("key");
+ }
+ res.set_content(req.body, "text/plain");
+ });
+
+ svr.Get("/stop", [&](const Request& req, Response& res) {
+ svr.stop();
+ });
+
+ svr.listen("localhost", 1234);
+}
Post, Put, Delete and Options methods are also supported.
Bind a socket to multiple interfaces and any available port
int port = svr.bind_to_any_port("0.0.0.0");
+svr.listen_after_bind();
Static File Server
// Mount / to ./www directory
+auto ret = svr.set_mount_point("/", "./www");
+if (!ret) {
+ // The specified base directory doesn't exist...
+}
+
+// Mount /public to ./www directory
+ret = svr.set_mount_point("/public", "./www");
+
+// Mount /public to ./www1 and ./www2 directories
+ret = svr.set_mount_point("/public", "./www1"); // 1st order to search
+ret = svr.set_mount_point("/public", "./www2"); // 2nd order to search
+
+// Remove mount /
+ret = svr.remove_mount_point("/");
+
+// Remove mount /public
+ret = svr.remove_mount_point("/public");
// User defined file extension and MIME type mappings
+svr.set_file_extension_and_mimetype_mapping("cc", "text/x-c");
+svr.set_file_extension_and_mimetype_mapping("cpp", "text/x-c");
+svr.set_file_extension_and_mimetype_mapping("hh", "text/x-h");
Extension    MIME Type
txt          text/plain
html, htm    text/html
css          text/css
jpeg, jpg    image/jpg
png          image/png
gif          image/gif
svg          image/svg+xml
ico          image/x-icon
json         application/json
pdf          application/pdf
js           application/javascript
wasm         application/wasm
xml          application/xml
xhtml        application/xhtml+xml
Logging
svr.set_logger([](const auto& req, const auto& res) {
+ your_logger(req, res);
+});
Error handler
svr.set_error_handler([](const auto& req, auto& res) {
+ auto fmt = "<p>Error Status: <span style='color:red;'>%d</span></p>";
+ char buf[BUFSIZ];
+ snprintf(buf, sizeof(buf), fmt, res.status);
+ res.set_content(buf, "text/html");
+});
'multipart/form-data' POST data
svr.Post("/multipart", [&](const auto& req, auto& res) {
+ auto size = req.files.size();
+ auto ret = req.has_file("name1");
+ const auto& file = req.get_file_value("name1");
+ // file.filename;
+ // file.content_type;
+ // file.content;
+});
Receive content with Content receiver
svr.Post("/content_receiver",
+ [&](const Request &req, Response &res, const ContentReader &content_reader) {
+ if (req.is_multipart_form_data()) {
+ MultipartFormDataItems files;
+ content_reader(
+ [&](const MultipartFormData &file) {
+ files.push_back(file);
+ return true;
+ },
+ [&](const char *data, size_t data_length) {
+ files.back().content.append(data, data_length);
+ return true;
+ });
+ } else {
+ std::string body;
+ content_reader([&](const char *data, size_t data_length) {
+ body.append(data, data_length);
+ return true;
+ });
+ res.set_content(body, "text/plain");
+ }
+ });
Send content with Content provider
const size_t DATA_CHUNK_SIZE = 4;
+
+svr.Get("/stream", [&](const Request &req, Response &res) {
+ auto data = new std::string("abcdefg");
+
+ res.set_content_provider(
+ data->size(), // Content length
+ "text/plain", // Content type
+ [data](size_t offset, size_t length, DataSink &sink) {
+ const auto &d = *data;
+ sink.write(&d[offset], std::min(length, DATA_CHUNK_SIZE));
+ return true; // return 'false' if you want to cancel the process.
+ },
+ [data] { delete data; });
+});
svr.Get("/stream", [&](const Request &req, Response &res) {
+ res.set_content_provider(
+ "text/plain", // Content type
+ [&](size_t offset, size_t length, DataSink &sink) {
+ if (/* there is still data */) {
+ std::vector<char> data;
+ // prepare data...
+ sink.write(data.data(), data.size());
+ } else {
+ done(); // No more data
+ }
+ return true; // return 'false' if you want to cancel the process.
+ });
+});
Chunked transfer encoding
svr.Get("/chunked", [&](const Request& req, Response& res) {
+ res.set_chunked_content_provider(
+ [](size_t offset, DataSink &sink) {
+ sink.write("123", 3);
+ sink.write("345", 3);
+ sink.write("789", 3);
+ sink.done(); // No more data
+ return true; // return 'false' if you want to cancel the process.
+ }
+ );
+});
'Expect: 100-continue' handler
The server sends a 100 Continue response for an Expect: 100-continue header.
// Send a '417 Expectation Failed' response.
+svr.set_expect_100_continue_handler([](const Request &req, Response &res) {
+ return 417;
+});
// Send a final status without reading the message body.
+svr.set_expect_100_continue_handler([](const Request &req, Response &res) {
+ return res.status = 401;
+});
Keep-Alive connection
svr.set_keep_alive_max_count(2); // Default is 5
Timeout
svr.set_read_timeout(5, 0); // 5 seconds
+svr.set_write_timeout(5, 0); // 5 seconds
+svr.set_idle_interval(0, 100000); // 100 milliseconds
Set maximum payload length for reading request body
svr.set_payload_max_length(1024 * 1024 * 512); // 512MB
Server-Sent Events
Default thread pool support
ThreadPool is used as the default task queue, and the default thread count is set to the value from std::thread::hardware_concurrency(). The count can be changed with CPPHTTPLIB_THREAD_POOL_COUNT.
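A minimal sketch (the count of 8 is arbitrary), assuming the macro takes effect when defined before the header is included:
#define CPPHTTPLIB_THREAD_POOL_COUNT 8 // use 8 worker threads for the default task queue
#include <httplib.h>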
Override the default thread pool with yours
class YourThreadPoolTaskQueue : public TaskQueue {
+public:
+ YourThreadPoolTaskQueue(size_t n) {
+ pool_.start_with_thread_count(n);
+ }
+
+ virtual void enqueue(std::function<void()> fn) override {
+ pool_.enqueue(fn);
+ }
+
+ virtual void shutdown() override {
+ pool_.shutdown_gracefully();
+ }
+
+private:
+ YourThreadPool pool_;
+};
+
+svr.new_task_queue = [] {
+ return new YourThreadPoolTaskQueue(12);
+};
Client Example
#include <httplib.h>
+#include <iostream>
+
+int main(void)
+{
+ httplib::Client cli("localhost", 1234);
+
+ if (auto res = cli.Get("/hi")) {
+ if (res->status == 200) {
+ std::cout << res->body << std::endl;
+ }
+ } else {
+ auto err = res.error();
+ ...
+ }
+}
httplib::Client cli("localhost");
+httplib::Client cli("localhost:8080");
+httplib::Client cli("http://localhost");
+httplib::Client cli("http://localhost:8080");
+httplib::Client cli("https://localhost");
GET with HTTP headers
httplib::Headers headers = {
+ { "Accept-Encoding", "gzip, deflate" }
+};
+auto res = cli.Get("/hi", headers);
cli.set_default_headers({
+ { "Accept-Encoding", "gzip, deflate" }
+});
+auto res = cli.Get("/hi");
POST
res = cli.Post("/post", "text", "text/plain");
+res = cli.Post("/person", "name=john1¬e=coder", "application/x-www-form-urlencoded");
POST with parameters
httplib::Params params;
+params.emplace("name", "john");
+params.emplace("note", "coder");
+
+auto res = cli.Post("/post", params);
httplib::Params params{
+ { "name", "john" },
+ { "note", "coder" }
+};
+
+auto res = cli.Post("/post", params);
POST with Multipart Form Data
httplib::MultipartFormDataItems items = {
+ { "text1", "text default", "", "" },
+ { "text2", "aωb", "", "" },
+ { "file1", "h\\ne\\n\\nl\\nl\\no\\n", "hello.txt", "text/plain" },
+ { "file2", "{\\n \\"world\\", true\\n}\\n", "world.json", "application/json" },
+ { "file3", "", "", "application/octet-stream" },
+};
+
+auto res = cli.Post("/multipart", items);
PUT
res = cli.Put("/resource/foo", "text", "text/plain");
DELETE
res = cli.Delete("/resource/foo");
OPTIONS
res = cli.Options("*");
+res = cli.Options("/resource/foo");
Timeout
cli.set_connection_timeout(0, 300000); // 300 milliseconds
+cli.set_read_timeout(5, 0); // 5 seconds
+cli.set_write_timeout(5, 0); // 5 seconds
Receive content with Content receiver
std::string body;
+
+auto res = cli.Get("/large-data",
+ [&](const char *data, size_t data_length) {
+ body.append(data, data_length);
+ return true;
+ });
std::string body;
+
+auto res = cli.Get(
+ "/stream", Headers(),
+ [&](const Response &response) {
+ EXPECT_EQ(200, response.status);
+ return true; // return 'false' if you want to cancel the request.
+ },
+ [&](const char *data, size_t data_length) {
+ body.append(data, data_length);
+ return true; // return 'false' if you want to cancel the request.
+ });
Send content with Content provider
std::string body = ...;
+
+auto res = cli_.Post(
+ "/stream", body.size(),
+ [](size_t offset, size_t length, DataSink &sink) {
+ sink.write(body.data() + offset, length);
+ return true; // return 'false' if you want to cancel the request.
+ },
+ "text/plain");
With Progress Callback
httplib::Client client(url, port);
+
+// prints: 0 / 000 bytes => 50% complete
+auto res = cli.Get("/", [](uint64_t len, uint64_t total) {
+ printf("%lld / %lld bytes => %d%% complete\\n",
+ len, total,
+ (int)(len*100/total));
+ return true; // return 'false' if you want to cancel the request.
+}
+);
Authentication
// Basic Authentication
+cli.set_basic_auth("user", "pass");
+
+// Digest Authentication
+cli.set_digest_auth("user", "pass");
+
+// Bearer Token Authentication
+cli.set_bearer_token_auth("token");
Proxy server support
cli.set_proxy("host", port);
+
+// Basic Authentication
+cli.set_proxy_basic_auth("user", "pass");
+
+// Digest Authentication
+cli.set_proxy_digest_auth("user", "pass");
+
+// Bearer Token Authentication
+cli.set_proxy_bearer_token_auth("pass");
Range
httplib::Client cli("httpbin.org");
+
+auto res = cli.Get("/range/32", {
+ httplib::make_range_header({{1, 10}}) // 'Range: bytes=1-10'
+});
+// res->status should be 206.
+// res->body should be "bcdefghijk".
httplib::make_range_header({{1, 10}, {20, -1}}) // 'Range: bytes=1-10, 20-'
+httplib::make_range_header({{100, 199}, {500, 599}}) // 'Range: bytes=100-199, 500-599'
+httplib::make_range_header({{0, 0}, {-1, 1}}) // 'Range: bytes=0-0, -1'
Keep-Alive connection
httplib::Client cli("localhost", 1234);
+
+cli.Get("/hello"); // with "Connection: close"
+
+cli.set_keep_alive(true);
+cli.Get("/world");
+
+cli.set_keep_alive(false);
+cli.Get("/last-request"); // with "Connection: close"
Redirect
httplib::Client cli("yahoo.com");
+
+auto res = cli.Get("/");
+res->status; // 301
+
+cli.set_follow_location(true);
+res = cli.Get("/");
+res->status; // 200
Use a specific network interface
cli.set_interface("eth0"); // Interface name, IP address or host name
OpenSSL Support
Enabled by defining CPPHTTPLIB_OPENSSL_SUPPORT; libssl and libcrypto should be linked.
#define CPPHTTPLIB_OPENSSL_SUPPORT
+
+SSLServer svr("./cert.pem", "./key.pem");
+
+SSLClient cli("localhost", 8080);
+cli.set_ca_cert_path("./ca-bundle.crt");
+cli.enable_server_certificate_verification(true);
Compression
Zlib Support
Enabled with CPPHTTPLIB_ZLIB_SUPPORT; libz should be linked.
Brotli Support
Enabled with CPPHTTPLIB_BROTLI_SUPPORT. Necessary libraries should be linked. Please see https://github.com/google/brotli for more detail.
Compress request body on client
cli.set_compress(true);
+res = cli.Post("/resource/foo", "...", "text/plain");
Compress response body on client
cli.set_decompress(false);
+res = cli.Get("/resource/foo", {{"Accept-Encoding", "gzip, deflate, br"}});
+res->body; // Compressed data
Split httplib.h into .h and .cc
> python3 split.py
+> ls out
+httplib.h httplib.cc
NOTE
g++ releases in which <regex> is broken cannot build this library.
Windows
Include httplib.h before Windows.h, or include Windows.h by defining WIN32_LEAN_AND_MEAN beforehand.
#include <httplib.h>
+#include <Windows.h>
#define WIN32_LEAN_AND_MEAN
+#include <Windows.h>
+#include <httplib.h>
License
Special Thanks To
Data Masking Framework
The framework is declared in system/masking/include. The framework includes a platform component, the engine, that exposes obfuscation logic supplied by dynamically loaded plugin libraries. The framework can be used to obfuscate sensitive data, such as passwords and PII. Possible use cases include preventing trace output from revealing sensitive values and partially masking values in situations where a user needs to see "enough" of a value to confirm its correctness without seeing all of the value (e.g., when a user is asked to confirm the last four digits of an account number).
File Contents
datamasking.h: Declares the core framework interfaces. Most are used by the engine and plugins.
datamaskingengine.hpp: Defines the engine component from the core engine interface.
datamaskingplugin.hpp: Defines a set of classes derived from the core plugin interfaces that can be used to provide rudimentary obfuscation. It is expected that some or all classes will be subclassed, if not replaced outright, in new plugins.
datamaskingshared.hpp: Defines utilities shared by engine and plugin implementations.
Glossary
Domain
Masker
A masker provides three operations.
maskValue obfuscates individual values based on snapshot-defined meanings. For example, a value identified as a password might require complete obfuscation anywhere it appears, or an account number may require complete obfuscation in some cases and partial obfuscation in others.
maskContent obfuscates a variable number of values based on context provided by surrounding text. For example, the value of an HTTP authentication header or the text between <Password> and </Password> might require obfuscation. This operation can apply to both structured and unstructured text content.
maskMarkupValue obfuscates individual values based on their location within an in-memory representation of a structured document, such as XML or JSON. For example, the value of element Password might require obfuscation unconditionally, while the value of element Value might require obfuscation only if a sibling element named Name exists with value password. This operation relies on the caller's ability to supply context parsed from structured content.
Maskers are exposed through IDataMasker, with IDataMaskerInspector providing access to less frequently used information about the domain.
Profile
See IDataMaskingProfile (extending IDataMasker) and IDataMaskingProfileInspector (extending IDataMaskerInspector) for additional information.
Context
valuetype-set is a pre-defined property used to select the group of value types that may be masked by any operation. rule-set is a pre-defined property used to select the group of rules that will be applied for maskContent requests. mask-pattern may be used to override the default obfuscation pattern to avoid replicating (and complicating) configurations just to change the appearance of obfuscated data. See IDataMaskingProfileContext (extending IDataMasker) and IDataMaskingProfileContextInspector (extending IDataMaskerInspector) for additional information.
Value Type
Value types are represented by IDataMaskingProfileValueType. They are accessed through the IDataMaskerInspector interface.
Mask Style
Mask styles are represented by IDataMaskingProfileMaskStyle. They are accessed through the IDataMaskingProfileValueType interface.
Rule
Plugin
Plugins provide profiles through IDataMaskingProfileIterator.
Engine
See IDataMaskingEngine (extending IDataMasker) and IDataMaskingEngineInspector for additional information.
Environment
An environment may expect a masker to honour additional properties, such as default-pattern. A caller could interrogate a masker to know if default-pattern is accepted. If accepted, setProperty("default-pattern", "#") can be used to register an override.
bool hasProperties() const
+bool hasProperty(const char* name) const
+const char* queryProperty(const char* name) const
+bool setProperty(const char* name, const char* value)
+bool removeProperty(const char* name)
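A minimal usage sketch of the override described above; the masker variable is hypothetical and the meaning of the boolean return is assumed rather than documented here.
// Ask the snapshot to use '#' as the obfuscation pattern for subsequent requests.
// Assumption: 'masker' exposes the property methods declared above, and a true
// return means the property was accepted and applied.
if (masker->setProperty("default-pattern", "#"))
{
    // later maskValue/maskContent calls are expected to emit '#' characters
}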
Value Type Sets
bool setProperty(const char* name, const char* value);
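A sketch of the probe-then-select pattern described under Runtime Compatibility below; ctx is a hypothetical context object, and the boolean signatures assumed for acceptsProperty and usesProperty are not shown in this document. The same pattern applies to the rule-set property covered in the next section.
// Check that the snapshot understands value type sets and actually references
// set1 before selecting it for later masking requests.
// Assumption: acceptsProperty/usesProperty take a property name and return bool.
if (ctx->acceptsProperty("valuetype-set") && ctx->usesProperty("valuetype-set:set1"))
    ctx->setProperty("valuetype-set", "set1"); // declared signature shown above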
Overview
Runtime Compatibility
Use acceptsProperty to determine whether a snapshot recognizes a custom property name. Use usesProperty to learn if the snapshot includes references to the custom property name. Acceptance implies awareness of the set (and what it represents) even if selecting it will not change the outcome of any masking request. Usage implies that selecting the set will change the outcome of at least one masking operation. Property valuetype-set is used to select a named set. A check of this property addresses whether the concept of a set is accepted or used by the snapshot. Individual sets can be checked with properties of the form valuetype-set:foo; a set referenced by the snapshot is reported as used.
Implementation
profile:
+ valueType:
+ - name: type1
type1 belongs to the default, unnamed value type set. Property valuetype-set is accepted, but not used; an attempt to use it will change no results.
profile:
+ valueType:
+ - name: type1
+ - name: type2
+ memberOf:
+ - name: set1
+ - name: set2
+ - name: type3
+ memberOf:
+ - name: set2
+ - name: set3
+ - name: type4
+ memberOf:
+ - name: set1
+ - name: type5
+ memberOf:
+ - name: set3
Property valuetype-set is both accepted and used, as it can now impact results. Properties valuetype-set:set1, valuetype-set:set2 and valuetype-set:set3 are also both accepted and used, but are intended only for use with compatibility checks.

Value Type  unnamed  set1  set2  set3  *
type1       Y        Y     Y     Y     Y
type2       N        Y     Y     N     Y
type3       N        N     Y     Y     Y
type4       N        Y     N     N     Y
type5       N        N     N     Y     Y

profile:
+ valueType:
+ - name: type1
+ - name: type2
+ memberOf:
+ - name: set1
+ - name: set2
+ - name: type3
+ memberOf:
+ - name: set2
+ - name: set3
+ - name: type4
+ memberOf:
+ - name: set1
+ - name: type5
+ memberOf:
+ - name: set3
+ property:
+ - name: 'valuetype-set:set4'
+ - name: 'valuetype-set:set5'
Properties valuetype-set:set4 and valuetype-set:set5 are accepted but not used. Selection of these sets will not change results, but a caller might treat acceptance of a set as sufficient to establish compatibility.
Rule Sets
bool setProperty(const char* name, const char* value);
Overview
Runtime Compatibility
Use acceptsProperty to determine whether a snapshot recognizes a custom property name. Use usesProperty to learn if the snapshot includes references to the custom property name. Acceptance implies awareness of the set (and what it represents) even if selecting it will select no rules. Usage implies that selecting the set will select at least one rule. Property rule-set is used to select a named set. A check of this property addresses whether the concept of a set is accepted or used by the snapshot. Individual sets can be checked with properties of the form rule-set:foo; a set referenced by the snapshot is reported as used.
Implementation
profile:
+ valueType:
+ name: type1
+ rule:
+ - name: foo
+ - memberOf:
+ - name: ''
+ - memberOf:
+ - name: ''
+ - name: set1
+ - memberOf:
+ - name: set1
+ property:
+ name: 'rule-set:set2'
Some rules are members of the default, unnamed rule set and some are members of set1. Property rule-set is both accepted and used. Properties rule-set: and rule-set:set1 are both accepted and used, intended for use with compatibility checks. Property rule-set:set2 is accepted.
Operations
maskValue
bool maskValue(const char* valueType, const char* maskStyle, char* buffer, size_t offset, size_t length, bool conditionally) const;
+bool maskValueConditionally(const char* valueType, const char* maskStyle, char* buffer, size_t offset, size_t length) const;
+bool maskValueUnconditionally(const char* valueType, const char* maskStyle, char* buffer, size_t offset, size_t length) const;
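Before the details below, a minimal usage sketch. It assumes datamasking.h declares the interfaces listed earlier, that these operations are exposed by IDataMasker, that an empty mask style selects the value type's default, and that the boolean return indicates whether masking occurred; the "password" value type name is illustrative only.
#include "datamasking.h" // core framework interfaces (per the File Contents table)
#include <string>

// Obfuscate a secret held in a writable buffer, in place.
void maskSecret(const IDataMasker& masker, std::string& secret)
{
    // Conditional form: if "password" is not a selected value type, nothing changes.
    bool masked = masker.maskValueConditionally("password", "", &secret[0], 0, secret.size());
    if (!masked)
    {
        // Unconditional form: obfuscate even when the value type is unrecognized.
        masker.maskValueUnconditionally("password", "", &secret[0], 0, secret.size());
    }
}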
Overview
The buffer, offset, and length parameters define what is assumed to be a single domain datum. That is, every character in the given character range is subject to obfuscation. The valueType parameter determines if obfuscation is required, with conditionally controlling whether an unrecognized valueType is ignored (true) or forced to obfuscate (false). The maskStyle parameter may be used to affect the nature of obfuscation to be applied. If the parameter names a defined mask style, that obfuscation format is applied. If the parameter does not name a defined mask style, the value type's default obfuscation format is applied.
Runtime Compatibility
The canMaskValue method of IDataMasker can be used to establish whether a snapshot supports this operation. A result of true indicates that the plugin is capable of performing obfuscation in response to a request. Whether a combination of input parameters exists that results in obfuscation depends on the definition of the snapshot. The hasValueType method of IDataMaskerInspector can be used to check if a given name is selected in the snapshot. The reserved name "*" may be used to detect unconditional obfuscation support when available in the snapshot. Names defined yet unselected in the snapshot are not directly detectable, but can be detected by comparing results of using multiple context configurations. Use queryValueType to obtain the defined instance and, from the result, call queryMaskStyle to confirm the existence of the mask style. For each of these, corresponding get... methods are defined to obtain new references to the objects. The methods are declared by IDataMaskerInspector.
Implementations
Implementations match valueType with a selected value type instance in the snapshot, where an instance is selected if it belongs to the currently selected value type set.
valueType:
+ - name: type1
+ - name: type2
+ memberOf:
+ - name: set1
Whether obfuscation occurs depends on the requested valueType and the currently selected value type set.

valueType  Selected Set  Obfuscated
type1      N/A           Yes
type1      set1          Yes
type2      N/A           No
type2      set1          Yes
typeN      N/A           No
typeN      set1          No

valueType:
+ - name: type1
+ - name: type2
+ memberOf:
+ - name: set1
+ - name: *
valueType  Selected Set  Obfuscated
type1      N/A           Yes
type1      set1          Yes
type2      N/A           No
type2      set1          Yes
typeN      N/A           Yes
typeN      set1          Yes

maskContent
bool maskContent(const char* contentType, char* buffer, size_t offset, size_t length) const;
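A usage sketch under the same assumptions as the maskValue example above; the "json" content type string is illustrative, and rules without an assigned content type apply regardless.
#include "datamasking.h"
#include <string>

// Scrub any recognizable sensitive values from a JSON payload before tracing it.
void scrubForTrace(const IDataMasker& masker, std::string& jsonPayload)
{
    // The whole payload is offered as one blob; the snapshot's rules locate
    // and obfuscate any occurrences in place.
    masker.maskContent("json", &jsonPayload[0], 0, jsonPayload.size());
}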
Overview
The buffer, offset, and length parameters define what is assumed to be a blob of content that may contain zero or more occurrences of domain data. The blob is expected to contain sufficient context allowing a snapshot's rules to locate any included occurrences. The context parameter optionally influences which rules are selected in a snapshot, while the contentType parameter offers a hint as to the blob's data format. Each snapshot rule may be assigned a format to which it applies. Operation requests may be optimized by limiting the number of selected rules applied by describing the format. Selected rules not assigned a format will be applied in all requests. Unlike maskValue, content obfuscation is always conditional on matching rules.
Runtime Compatibility
The canMaskContent method of IDataMasker can be used to establish whether a snapshot supports this operation. A result of true indicates that the plugin is capable of performing obfuscation in response to a request. Whether a combination of input parameters exists that results in obfuscation depends on the definition of the snapshot. The hasRule method of IDataMaskerInspector indicates if at least one rule is selected in the snapshot for a given content type. Rules not associated with any content type may be interrogated using an empty string.
Implementations
maskMarkupValue
bool maskMarkupValue(const char* element, const char* attribute, char* buffer, size_t offset, size_t length, IDataMaskingDependencyCallback& callback) const;
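A sketch of the intended call pattern; masker, callback, and the element value accessor are hypothetical stand-ins for whatever drives the document traversal, and the empty attribute name is assumed here to denote the element's own text value.
// While visiting a <Password> element in a parsed document, ask the masker to
// obfuscate its value in place. The callback lets the masker request related
// values (for example a sibling <Name>) when the decision is proximity-dependent.
std::string value = currentElementText(); // hypothetical accessor
masker.maskMarkupValue("Password", "", &value[0], 0, value.size(), callback);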
Overview
Where maskValue obfuscates individual values based on what the caller believes the value to be, and maskContent obfuscates any number of values it identifies based on cues found in surrounding text, maskMarkupValue obfuscates individual values based on cues not provided to it. For example, whether the value of element Value requires obfuscation might depend on the content of a sibling element named Name. The value of Value probably does not require obfuscation when the value of Name is city, but almost certainly does require obfuscation when the value of Name is password. When processing the request for Value, an implementation must ask for the value of a sibling named Name to decide whether or not to obfuscate the data. This differs from maskValue: markup value obfuscation presumes that the caller is traversing a structured document without awareness of which values require obfuscation. If the caller knows which values require obfuscation, it should use maskValue on those specific values.
Runtime Compatibility
virtual bool getMarkupValueInfo(const char* element, const char* attribute, const char* buffer, size_t offset, size_t length, IDataMaskingDependencyCallback& callback, DataMaskingMarkupValueInfo& info) const = 0;
canMaskMarkupValue
method of IDataMasker
can be used to establish whether a snapshot supports this operation. A result of true indicates that the plugin is capable of performing obfuscation in response to a request. Whether a combination of input parameters exists that results in obfuscation depends on the definition of the snapshot.getMarkupValueInfo
method of IDataMaskerInspector
determines if the value of an element or attribute requires obfuscation. If required, the info
parameter will describe on completion how to perform the obfuscation using maskValue
instead of maskMarkupValue
. This is done to optimize performance by eliminating the need to reevaluate rules to re-identify a match.maskValue
with these three values and no mask style will apply the default obfuscation only to the embedded password. Alternatively, if a mask style is supplied, use of maskValue
with this style and the original value offset and length should obfuscate only the embedded password.callback
interface is required for all calls to getMarkupValueInfo
. The interface is only used when additional document context is required to make a determination about the requested value. In these cases, the checks are proximity-dependent and cannot be performed as part of load-time compatibility checks, when no document is available.Implementations
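For illustration, a hedged sketch of how a caller walking a parsed document might invoke maskMarkupValue for a single element value. The header name and masker acquisition are assumptions as before, as is the convention of passing nullptr for the attribute when masking an element's own text; the IDataMaskingDependencyCallback implementation, which answers proximity questions such as the sibling Name lookup described above, is left to the caller's traversal code and is not shown.

```cpp
// Minimal sketch: mask the text of one element encountered during traversal.
// The callback lets the masker ask the traversal layer for related values
// (for example, the value of a sibling <Name> element).
#include <cstring>
#include "datamasking.h"

void maskElementText(const IDataMasker& masker,
                     const char* elementName,                 // e.g. "Value"
                     char* text,                              // element text, masked in place
                     IDataMaskingDependencyCallback& callback)
{
    // nullptr attribute => the request targets the element's own text value
    // (assumed convention; not spelled out in this section).
    masker.maskMarkupValue(elementName, nullptr, text, 0, strlen(text), callback);
}
```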
Load-time Compatibility
bool checkCompatibility(const IPTree& requirements) const;
Overview
requirements
parameter represents a standardized set of runtime compatibility checks to be applied. By default, each check must pass to establish compatibility. Explicitly optional checks may be included to detect conditions the caller is prepared to work around.maskValue
, and which obfuscation styles are available. It can test which rule sets will impact maskContent
. It specifically excludes maskMarkupValue
checks that depend on the proximity of one value to another.Implementation
compatibility:
+ - context:
+ - domain: optional text
+ version: optional number
+ property:
+ - name: required text
+ value: required text
+ accepts:
+ - name: required text
+ presence: one of "r", "o", or "p"
+ uses:
+ - name: required text
+ presence: one of "r", "o", or "p"
+ operation:
+ - name: one of "maskValue", "maskContent", or "maskMarkupValue"
+ presence: one of "r", "o", or "p"
+ valueType:
+ - name: required text
+ presence: one of "r", "o", or "p"
+ maskStyle:
+ - name: required text
+ presence: one of "r", "o", or "p"
+ Set:
+ - name: required text
+ presence: one of "r", "o", or "p"
+ rule:
+ - contentType: required text
+ presence: one of "r", "o", or "p"
requirements
parameter of checkCompatibility
must be either a compatibility
element or the parent of a collection of compatibility
elements. Each instance may contain at most one context
element, three operation
elements (one per operation), and a variable number of accepts
, uses
, valueType
, or rule
elements.context
element describes the target of an evaluation. It may include a domain identifier, or assume the default domain. It may include a version number, or assume the default snapshot of the domain. It may specify any number of custom context properties to select various profile elements in the snapshot.accepts
and uses
elements identify custom context property names either accepted or used within a snapshot. Acceptance indicates that some part of a snapshot has declared an understanding of the named property. Usage indicates that some part of a snapshot will react to the named value being set. Usage implies acceptance.Operations
element identifies which operations must be supported. Its purpose is to enforce availability of an operation when no more explicit requirements are given (for example, one may require maskContent
without also requiring the presence of rules for any given content type). When more explicit requirements are given, the attributes of this element are redundant. If all attributes are redundant, the element may be omitted.valueType
element describes all optional and mandatory requirements for using maskValue
with a single value type name and optional mask style name. Set membership requirements can also be evaluated, which is most useful when compatibility/context/property/@name
is "*".rule
element describes all optional and mandatory requirements for using maskContent
with a single content type value. Unlike evaluable value type set membership, rule set membership requirements cannot be evaluated.@presence
, the acceptable values are "r", "o", and "p", as listed in the schema above.
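To make the schema concrete, here is a hedged sketch of expressing requirements as XML and applying them. The exact serialized shape (name and presence as attributes, maskStyle nested under valueType) is inferred from the schema sketch above and may differ from the real format; IPTree is assumed to be the jlib property tree interface, and createPTreeFromXMLString is the usual jlib helper for parsing XML into one.

```cpp
// Hedged sketch: require the maskValue operation and value type "type1",
// treat mask style "partial" and xml-typed rules as optional, then ask the
// masker whether it can satisfy those requirements.
#include "jliball.hpp"      // jlib property trees (Owned, createPTreeFromXMLString)
#include "datamasking.h"

bool meetsRequirements(const IDataMasker& masker)
{
    const char* requirementsXml =
        "<compatibility>"
        " <operation name='maskValue' presence='r'/>"
        " <valueType name='type1' presence='r'>"
        "  <maskStyle name='partial' presence='o'/>"
        " </valueType>"
        " <rule contentType='xml' presence='o'/>"
        "</compatibility>";
    Owned<IPropertyTree> requirements(createPTreeFromXMLString(requirementsXml));
    return masker.checkCompatibility(*requirements.get());
}
```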
`,170)]))}const h=t(o,[["render",i]]);export{u as __pageData,h as default};
diff --git a/assets/system_masking_plugins_datamasker_readme.md.DKeIe1BA.js b/assets/system_masking_plugins_datamasker_readme.md.DKeIe1BA.js
new file mode 100644
index 00000000000..5a7408709c3
--- /dev/null
+++ b/assets/system_masking_plugins_datamasker_readme.md.DKeIe1BA.js
@@ -0,0 +1,24 @@
+import{_ as t,c as i,a3 as n,o as a}from"./chunks/framework.DkhCEVKm.js";const u=JSON.parse('{"title":"","description":"","frontmatter":{},"headers":[],"relativePath":"system/masking/plugins/datamasker/readme.md","filePath":"system/masking/plugins/datamasker/readme.md","lastUpdated":1731340314000}'),o={name:"system/masking/plugins/datamasker/readme.md"};function s(r,e,l,m,d,p){return a(),i("div",null,e[0]||(e[0]=[n(`libdatamasker.so
IDataMaskingProfileIterator
to be used by an IDataMaskingEngine
instance. The library is written in C++, and this section is organized according to the namespace and class names used.namespace DataMasking
CContext
IDataMaskingProfileContext
and IDataMaskingProfileContextInspector
tightly coupled to a specific version of a specific profile instance. It manages custom properties that are accepted by the associated profile, and rejects attempts to set those not accepted. The rejection is one way of letting the caller know that something it might deem essential is not handled by the profile.CProfile
, an abstract implementation of IDataMaskingProfile
and IDataMaskingProfileInspector
.CMaskStyle
IDataMaskingProfileMaskStyle
providing customized masked output of values. Configuration options include:CPartialMaskStyle
[ [ @action ] [ @location ] @count [ @characters ] ]
+
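To illustrate the kind of output a partial mask style is typically configured to produce (for example, revealing only the last four characters of an account number, the use case mentioned earlier in this document), here is a small self-contained sketch. It is a conceptual illustration written for this page only; it is not the plugin's implementation and does not interpret the @action, @location, @count, or @characters options listed above.

```cpp
// Conceptual illustration only: what a "keep the last four characters"
// partial mask might yield. The real behavior is driven by the
// CPartialMaskStyle configuration; this stand-alone helper is not plugin code.
#include <cstddef>

void partialMaskKeepLastFour(char* buffer, size_t length, char maskCharacter = '*')
{
    const size_t keep = 4;
    if (length <= keep)
        return;                         // too short to partially mask
    for (size_t i = 0; i < length - keep; ++i)
        buffer[i] = maskCharacter;      // "1234567890123456" -> "************3456"
}
```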
CRule
maskContent
.maskContent
requests that indicate a content type should apply all rules that match the type in addition to all rules that omit a type.CSerialTokenRule
<
and >
to match an element name and quotes to match attribute values.TPlugin
IDataMaskingProfileIterator
used for transforming a configuration property tree into a collection of profiles to be returned by a library entry point function. Created profiles are of the same class, identified by the template parameter profile_t.TProfile
CProfile
, this manages value types and rules, providing a default implementation of maskValue
but leaving other operations to subclasses.IDataMaskingProfileValueType
to be instantiated during configuration.IDataMaskingProfileContext
to be instantiated on demand.profile:
+ minimumVersion: 1
+ maximumVersion: 2
+ valueType:
+ - name: foo
+ maximumVersion: 1
+ - name: foo
+ minimumVersion: 2
+ - name: bar
+ - name: bar
+ minimumVersion: 2
+
maskValue
requests specifying unknown value type names can fall back to this special type and force masking for these values. Without this type, maskValue
masks what it thinks should be masked; with this type, it trusts the caller to request masking for only those values known to require it. Value types included in unselected value type sets are known to the profile and, as such, can be used to prevent specific typed values from ever being masked.TSerialProfile
maskContent
. It is assumed by maskContent
that each applicable rule must be applied serially, i.e., one after the other, using a bool applyRule(buffer, length, context)
interface.TProfile
.TProfile
.TValueType
IDataMaskingProfileValueType
that manages mask styles and creates rules for the profile.IDataMaskingProfileMaskStyle
to be instantiated during configuration.valueType:
+ minimumVersion: 1
+ maximumVersion: 2
+ maskStyle:
+ - name: foo
+ maximumVersion: 1
+ - name: foo
+ minimumVersion: 2
+ - name: bar
+ - name: bar
+ minimumVersion: 2
+
Entry Point Functions
newPartialMaskSerialToken
maskValue
and maskContent
operations.
`,63)]))}const h=t(o,[["render",s]]);export{u as __pageData,h as default};
diff --git a/assets/system_security_plugins_jwtSecurity_README.md.B1LGJUHj.js b/assets/system_security_plugins_jwtSecurity_README.md.B1LGJUHj.js
new file mode 100644
index 00000000000..8ad890ce01f
--- /dev/null
+++ b/assets/system_security_plugins_jwtSecurity_README.md.B1LGJUHj.js
@@ -0,0 +1,8 @@
+import{_ as t,c as o,a3 as i,o as s}from"./chunks/framework.DkhCEVKm.js";const u=JSON.parse('{"title":"","description":"","frontmatter":{},"headers":[],"relativePath":"system/security/plugins/jwtSecurity/README.md","filePath":"system/security/plugins/jwtSecurity/README.md","lastUpdated":1731340314000}'),a={name:"system/security/plugins/jwtSecurity/README.md"};function n(r,e,l,d,c,h){return s(),o("div",null,e[0]||(e[0]=[i(`JWT Authorization Security Manager Plugin
Code Documentation
doxygen
is on your path, you can build the documentation via:cd system/security/plugins/jwtSecurity
+doxygen Doxyfile
+
docs/html/index.html
.Theory of Operations
esp
process when a user needs to be authenticated. That call will contain the user's username and either a reference to a session token or a password. The session token is present only for already-authenticated users.JWT login service
(also known as a JWT login endpoint) with the username and password, plus a nonce value for additional security.JWT refresh service
(also known as a JWT refresh endpoint) when needed. This largely follows the OIC specification.Deviations From OIC Specification
esp
process will gather the username/password credentials instead of a third party, then send those credentials off to another service. In a true OIC configuration, the client process (the esp
process) never sees user credentials and relies on an external service to gather them from the user.JWT login service
is a POST HTTP or HTTPS call (depending on your configuration) containing four items in JSON format; example: {
+ "username": "my_username",
+ "password": "my_password",
+ "client_id": "https://myhpcccluster.com",
+ "nonce": "hf674DTRMd4Z1s"
+ }
JWT login service
should reply with an OIC-compatible JSON-formatted reply. See https://openid.net/specs/openid-connect-core-1_0.html#TokenResponse for an example of a successful authentication and https://openid.net/specs/openid-connect-core-1_0.html#TokenErrorResponse for an example of an error response. The token itself is comprised of several attributes, followed by HPCC Systems-specific claims; see https://openid.net/specs/openid-connect-core-1_0.html#IDToken (all required fields are indeed required, plus the nonce field). access_token
and expires_in
values are ignored by this plugin.JWT refresh service
to request a new token. This follows https://openid.net/specs/openid-connect-core-1_0.html#RefreshTokens except that the client_secret
and scope
fields in the request are omitted.Implications of Deviations
HPCC Systems Configuration Notes
jwtsecmgr
Security Manager plugin must be added as a component and then modified according to your environment:client_id
in token requestsJWT Login Endpoint
(should be HTTPS, but not required)JWT Refresh Endpoint
(should be HTTPS, but not required)jwtsecmgr
component is added, you have to tell other parts of the system to use the plugin. For user authentication and permissions affecting features and workunit scopes, you need to add the plugin to the esp
component. Instructions for doing so can be found in the HPCC Systems Administrator's Guide manual (though the manual uses the htpasswd plugin as an example, the process is the same).Dali Server
component, select the LDAP tab. Change the authMethod
entry to secmgrPlugin
and enter "jwtsecmgr" as the authPluginType
. Make sure checkScopeScans
is set to true.HPCC Systems Authorization and JWT Claims
{ "SmcAccess": "Read" }
Workunit Scope Permissions
| Meaning | Claim | Value |
| --- | --- | --- |
| User has view rights to workunit scope | AllowWorkunitScopeView | pattern |
| User has modify rights to workunit scope | AllowWorkunitScopeModify | pattern |
| User has delete rights to workunit scope | AllowWorkunitScopeDelete | pattern |
| User does not have view rights to workunit scope | DenyWorkunitScopeView | pattern |
| User does not have modify rights to workunit scope | DenyWorkunitScopeModify | pattern |
| User does not have delete rights to workunit scope | DenyWorkunitScopeDelete | pattern |

File Scope Permissions

| Meaning | Claim | Value |
| --- | --- | --- |
| User has view rights to file scope | AllowFileScopeView | pattern |
| User has modify rights to file scope | AllowFileScopeModify | pattern |
| User has delete rights to file scope | AllowFileScopeDelete | pattern |
| User does not have view rights to file scope | DenyFileScopeView | pattern |
| User does not have modify rights to file scope | DenyFileScopeModify | pattern |
| User does not have delete rights to file scope | DenyFileScopeDelete | pattern |
`,41)]))}const m=t(a,[["render",n]]);export{u as __pageData,m as default};
diff --git a/assets/testing_regress_cleanupReadme.md.BWXfHXn3.js b/assets/testing_regress_cleanupReadme.md.BWXfHXn3.js
new file mode 100644
index 00000000000..bceac01f3a4
--- /dev/null
+++ b/assets/testing_regress_cleanupReadme.md.BWXfHXn3.js
@@ -0,0 +1 @@
+import{_ as t,c as a,a3 as r,o as s}from"./chunks/framework.DkhCEVKm.js";const m=JSON.parse('{"title":"Cleanup Parameter of Regression Suite run and query sub-command","description":"","frontmatter":{},"headers":[],"relativePath":"testing/regress/cleanupReadme.md","filePath":"testing/regress/cleanupReadme.md","lastUpdated":1731340314000}'),o={name:"testing/regress/cleanupReadme.md"};function i(n,e,l,u,d,c){return s(),a("div",null,e[0]||(e[0]=[r('Cleanup Parameter of Regression Suite run and query sub-command
Command:
Result:
The sample terminal output for hthor target:
[Action] Queries: 4
[Action] 1. Test: action1.ecl
[Pass] 1. Pass action1.ecl - W20240526-094322 (2 sec)
[Pass] 1. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240526-094322
[Action] 2. Test: action2.ecl
[Pass] 2. Pass action2.ecl - W20240526-094324 (2 sec)
[Pass] 2. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240526-094324
[Action] 3. Test: action4.ecl
[Pass] 3. Pass action4.ecl - W20240526-094325 (2 sec)
[Pass] 3. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240526-094325
[Action] 4. Test: action5.ecl
[Pass] 4. Pass action5.ecl - W20240526-094327 (2 sec)
[Pass] 4. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240526-094327
[Action]
-------------------------------------------------
Result:
Passing: 4
Failure: 0
-------------------------------------------------
Log: /root/HPCCSystems-regression/log/hthor.24-05-26-09-43-22.log
-------------------------------------------------
Elapsed time: 11 sec (00:00:11)
-------------------------------------------------
[Pass] 2. Workunit Wuid=W20240526-094324 deleted successfully.
[Failure] 3. Failed to delete Wuid=W20240526-094325. URL: http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240526-094325
Failed cannot open workunit Wuid=W20240526-094325.. Response status code: 200
[Pass] 4. Workunit Wuid=W20240526-094327 deleted successfully.
Suite destructor.Test Suite for the Parquet Plugin
Running the Test Suite
./ecl-test setup
command to initialize these files before running the test suite../ecl-test query --runclass parquet parquet*.ecl
./ecl-test query --runclass parquet <test_file_name>.ecl
+
+example below:
+
+./ecl-test query --target hthor --runclass parquet parquet_schema.ecl
./ecl-test query --target roxie --runclass parquet parquet*.ecl
[Action] Suite: roxie
+[Action] Queries: 15
+[Action] 1. Test: parquet_compress.ecl ( version: compressionType='UNCOMPRESSED' )
+[Pass] 1. Pass parquet_compress.ecl ( version: compressionType='UNCOMPRESSED' ) - W20240815-111429 (4 sec)
+[Pass] 1. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111429
+[Action] 2. Test: parquet_compress.ecl ( version: compressionType='Snappy' )
+[Pass] 2. Pass parquet_compress.ecl ( version: compressionType='Snappy' ) - W20240815-111434 (3 sec)
+[Pass] 2. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111434
+[Action] 3. Test: parquet_compress.ecl ( version: compressionType='GZip' )
+[Pass] 3. Pass parquet_compress.ecl ( version: compressionType='GZip' ) - W20240815-111438 (4 sec)
+[Pass] 3. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111438
+[Action] 4. Test: parquet_compress.ecl ( version: compressionType='Brotli' )
+[Pass] 4. Pass parquet_compress.ecl ( version: compressionType='Brotli' ) - W20240815-111442 (4 sec)
+[Pass] 4. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111442
+[Action] 5. Test: parquet_compress.ecl ( version: compressionType='LZ4' )
+[Pass] 5. Pass parquet_compress.ecl ( version: compressionType='LZ4' ) - W20240815-111447 (3 sec)
+[Pass] 5. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111447
+[Action] 6. Test: parquet_compress.ecl ( version: compressionType='ZSTD' )
+[Pass] 6. Pass parquet_compress.ecl ( version: compressionType='ZSTD' ) - W20240815-111450 (2 sec)
+[Pass] 6. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111450
+[Action] 7. Test: parquet_corrupt.ecl
+[Pass] 7. Pass parquet_corrupt.ecl - W20240815-111453 (2 sec)
+[Pass] 7. Intentionally fails
+[Pass] 7. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111453
+[Action] 8. Test: parquet_empty.ecl
+[Pass] 8. Pass parquet_empty.ecl - W20240815-111455 (2 sec)
+[Pass] 8. Intentionally fails
+[Pass] 8. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111455
+[Action] 9. Test: parquet_overwrite.ecl
+[Pass] 9. Pass parquet_overwrite.ecl - W20240815-111457 (2 sec)
+[Pass] 9. Intentionally fails
+[Pass] 9. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111457
+[Action] 10. Test: parquet_partition.ecl
+[Pass] 10. Pass parquet_partition.ecl - W20240815-111459 (2 sec)
+[Pass] 10. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111459
+[Action] 11. Test: parquet_schema.ecl
+[Pass] 11. Pass parquet_schema.ecl - W20240815-111502 (1 sec)
+[Pass] 11. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111502
+[Action] 12. Test: parquet_size.ecl
+[Pass] 12. Pass parquet_size.ecl - W20240815-111504 (3 sec)
+[Pass] 12. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111504
+[Action] 13. Test: parquet_string.ecl
+[Pass] 13. Pass parquet_string.ecl - W20240815-111507 (1 sec)
+[Pass] 13. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111507
+[Action] 14. Test: parquet_types.ecl
+[Pass] 14. Pass parquet_types.ecl - W20240815-111509 (7 sec)
+[Pass] 14. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111509
+[Action] 15. Test: parquet_write.ecl
+[Pass] 15. Pass parquet_write.ecl - W20240815-111517 (2 sec)
+[Pass] 15. URL http://127.0.0.1:8010/?Widget=WUDetailsWidget&Wuid=W20240815-111517
+[Action]
+ -------------------------------------------------
+ Result:
+ Passing: 15
+ Failure: 0
+ -------------------------------------------------
+ Log: /home/user/HPCCSystems-regression/log/roxie.24-08-15-11-14-29.log
+ -------------------------------------------------
+ Elapsed time: 52 sec (00:00:52)
+ -------------------------------------------------
Project Description
Test Files
Test Suite Overview
Type Testing
Data Type Tests
Arrow Types Supported by the Parquet Plugin
Compression Testing
Parquet Read and Write Operations
Additional Tests
Test Evaluation
esdl Utility
xml Generate XML from ESDL definition.
+ecl Generate ECL from ESDL definition.
+xsd Generate XSD from ESDL definition.
+wsdl Generate WSDL from ESDL definition.
+java Generate Java code from ESDL definition.
+cpp Generate C++ code from ESDL definition.
+monitor Generate ECL code for result monitoring / differencing
+monitor-template Generate a template for use with 'monitor' command
+
publish Publish ESDL Definition for ESP use.
+list-definitions List all ESDL definitions.
+get-definition Get ESDL definition.
+delete Delete ESDL Definition.
+bind-service Configure ESDL based service on target ESP (with existing ESP Binding).
+list-bindings List all ESDL bindings.
+unbind-service Remove ESDL based service binding on target ESP.
+bind-method Configure method associated with existing ESDL binding.
+unbind-method Remove method associated with existing ESDL binding.
+get-binding Get ESDL binding.
+manifest Build a service binding or bundle from a manifest file.
+bind-log-transform Configure log transform associated with existing ESDL binding.
+unbind-log-transform Remove log transform associated with existing ESDL binding.
+
manifest
manifest
command creates an XML configuration file for an ESDL ESP from an input XML manifest file. The type of configuration output depends on the manifest file input and on command-line options.Manifest File
urn:hpcc:esdl:manifest
namespace. Recognized elements of this namespace control the tool while all other markup is copied to the output. The goal of using a manifest file with the tool is to make configuring and deploying services easier:binding
: The output is an ESDL binding that may be published to dali.bundle
: The output is an ESDL bundle file that may be used to launch an ESP in application mode.Example
bundle
manifest file:<em:Manifest xmlns:em="urn:hpcc:esdl:manifest">
+ <em:ServiceBinding esdlservice="WsFoobar" id="WsFoobar_desdl_binding" auth_feature="DEFERRED">
+ <Methods>
+ <em:Scripts>
+ <em:Include file="WsFoobar-request-prep.xml"/>
+ <em:Include file="WsFoobar-logging-prep.xml"/>
+ </em:Scripts>
+ <Method name="FoobarSearch" url="127.0.0.1:8888">
+ <em:Scripts>
+ <em:Include file="FoobarSearch-scripts.xml"/>
+ </em:Scripts>
+ </Method>
+ </Methods>
+ <LoggingManager>
+ <LogAgent transformSource="local" name="main-logging">
+ <LogDataXPath>
+ <LogInfo name="PreparedData" xsl="log-prep"/>
+ </LogDataXPath>
+ <XSL>
+ <em:Transform name="log-prep">
+ <em:Include file="log-prep.xslt"/>
+ </em:Transform>
+ </XSL>
+ </LogAgent>
+ </LoggingManager>
+ </em:ServiceBinding>
+ <em:EsdlDefinition>
+ <em:Include file="WsFoobar.ecm"/>
+ </em:EsdlDefinition>
+</em:Manifest>
bundle
or binding
output elements, we won't cover that usage here.<em:Manifest>
is the required root element. By default the tool outputs a bundle
, though you may explicitly override that on the command line or by providing an @outputType='binding'
attribute.<em:ServiceBinding>
is valid for both bundle
and binding
output. It is necessary to enable recognition of <em:Scripts>
, and <em:Transform>
elements.<em:EsdlDefinition>
is relevant only for bundle
output. It is necessary to enable element order preservation and recognition of <em:Include>
as a descendant element.<em:Include>
causes external file contents to be inserted in place of the element. The processing of included files is context dependent; the parent of the <em:Include>
element dictates how the file is handled. File inclusion facilitates code reuse in a configuration as code development environment.<em:Scripts>
and <em:Transform>
trigger preservation of element order for all descendent elements and enable <em:Include>
recognition.IPropertyTree
implementation, used when loading an artifact, does not preserve order. Output configuration files must embed order-sensitive content as text as opposed to XML markup. The tool allows configuration authors to create and maintain files as XML markup, which is easy to read. It then automates the conversion of the XML markup into the embedded text required by an ESP.Syntax
Manifest
bundle
output is created by default. It can be made explicit by setting either/or: --output-type
to bundle
@outputType
to bundle
.binding
output is created when either/or: --output-type
is binding
@outputType
is binding
.Attribute Required? Value Description @outputType
N string A hint informing the tool which type of output to generate. The command line option --output-type
may supersede this value to produce a different output.
A bundle
manifest is a superset of a binding
manifest and may logically be used to create either output type. A binding
manifest, as a subset of a bundle
, cannot be used to create a valid bundle
.@xmlns[:prefix]
Y string The manifest namespace urn:hpcc:esdl:manifest
must be declared. The default namespace prefix should not be used unless all other markup is fully qualified.ServiceBinding
A child of Manifest that creates ESDL binding content and applies ESDL binding-specific logic to descendent content:

- A Binding output element is always created, containing attributes of Manifest as required. See the sections below for details of the attributes referenced by the tool and how they're output.
- A child of Binding named Definition is created with attributes @id and @esdlservice.
- Recognition of <em:Scripts> elements is enabled.
- Recognition of <em:Transform> elements is enabled.

While you may omit the <em:ServiceBinding> element and instead embed a complete Binding tree in the manifest, it is discouraged because you lose the benefit of the processing described above.

The following attributes are recognized on the <em:ServiceBinding> element:

Standard Attributes
- @esdlservice (required, string): Name of the ESDL service to which the binding is bound. Output on the Binding/Definition element. Also used to generate a value for Definition/@id for bundle type output. Definition/@id is not output by the tool for binding output, as it would need to match the ESDLDefinitionId passed on the command line esdl bind-service call, which may differ between environments.

Service-Specific Attributes

These attributes are output on the Binding/Definition element:

- @auth_feature (optional, string): Used to declare authorization settings if they aren't present in the ESDL Definition, or to override them if they are. Additional documentation on this attribute is forthcoming.
- @returnSchemaLocationOnOK (optional, Boolean): When true, a successful SOAP response (non SOAP-Fault) will include the schema location property. False by default.
- @namespace (optional, string): String specifying the namespace for all methods in the binding. May contain variables that are replaced with their values by the ESP during runtime:
  - ${service}: lowercase service name
  - ${esdl-service}: service name, possibly mixed case
  - ${method}: lowercase method name
  - ${esdl-method}: method name, possibly mixed case
  - ${optionals}: comma-delimited list of all optional URL parameters included in the method request, enclosed in parentheses
  - ${version}: client version number

Auxiliary Attributes
These attributes are set when running esdl bind-service, and they are present on binding configurations retrieved from the dali. If you set these attributes in the manifest, the values will be overwritten when running esdl bind-service. Alternatively, if you run as an esdl application, these values aren't set by default and don't have a material effect on the binding, but they may appear in the trace log.

- @created (optional, string): Timestamp of binding creation.
- @espbinding (optional, string): Set to match the id. Otherwise the value is unset and unused.
- @espprocess (optional, string): Name of the ESP process this binding is running on.
- @id (optional, string): Runtime name of the binding. When publishing to dali the value is [ESP Process].[port].[ESDL Service]. When not present in the manifest a default value is generated of the form [@esdlservice]_desdl_binding.
- @port (optional, string): Port on the ESP process listening for connections to the binding.
- @publishedBy (optional, string): Userid of the person publishing the binding.

EsdlDefinition

A child of <em:Manifest> that enables ESDL definition-specific logic in the tool:

- A Definitions element is output. All content is enclosed in a CDATA section.
- The <em:Include> element is recognized to import ESDL definitions.

Attributes: N/A (content is defined by the Definitions element hierarchy in the manifest).

Include

Attributes:

- @file (required, file path): Full or partial path to an external file to be imported. If a partial path is outside of the tool's working directory, the tool's command line must specify the appropriate root directory using either -I or --include-path.

The parent element determines the effect of each <em:Include> operation. Included files are inserted into the output as-is, with the exception of encoding nested CDATA markup. If the included file will be inside a CDATA section on output, then any CDATA end markup in the file will be encoded as ]]]]><![CDATA[> to prevent nested CDATA sections or prematurely ending a CDATA section.

Scripts
A child of the <em:ServiceBinding> element that processes child elements and creates output expected for an ESDL binding:

- The <em:Scripts> element is replaced on output with <Scripts>. All content is then enclosed in a CDATA section after wrapping it with a new Scripts element. That new <Scripts> element contains namespaces declared by the input <em:Scripts> element. The input <em:Scripts foo="..." xmlns:bar="..."><!-- content --></em:Scripts> becomes <Scripts><![CDATA[<Scripts xmlns:bar="..."><!-- content --></Scripts>]]></Scripts>.
- The <em:Include> element is recognized to import scripts from external files. The entire file, minus leading and trailing whitespace, is imported. Refrain from including files that contain an XML declaration.

Transform

A child of the <em:ServiceBinding> element that processes child elements and creates output expected for an ESDL binding:

- <em:Transform> <!-- content --> </em:Transform> becomes <Transform><![CDATA[<!-- content -->]]></Transform>.
- The <em:Include> element is recognized to import transforms from external files. The entire file, minus leading and trailing whitespace, is imported. Refrain from including files that contain an XML declaration.

Usage
Usage:
+
+esdl manifest <manifest-file> [options]
+
+Options:
+ -I | --include-path <path>
+ Search path for external files included in the manifest.
+ Use once for each path.
+ --outfile <filename>
+ Path and name of the output file
+ --output-type <type>
+ When specified this option overrides the value supplied
+ in the manifest attribute Manifest/@outputType.
+ Allowed values are 'binding' or 'bundle'.
+ When not specified in either location the default is
+ 'bundle'
+ --help Display usage information for the given command
+ -v,--verbose Output additional tracing information
+ -tcat,--trace-category <flags>
+ Control which debug messages are output; a case-insensitive
+ comma-delimited combination of:
+ dev: all output for the developer audience
+ admin: all output for the operator audience
+ user: all output for the user audience
+ err: all error output
+ warn: all warning output
+ prog: all progress output
+ info: all info output
+ Errors and warnings are enabled by default if not verbose,
+ and all are enabled when verbose. Use an empty <flags> value
+ to disable all.
+
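For example, to generate both output types from the sample manifest above, an invocation might look like the following sketch. The options are the ones documented above; the manifest file name WsFoobar-manifest.xml and the ./esdl-includes directory are illustrative assumptions:

# default output type is 'bundle'
esdl manifest WsFoobar-manifest.xml --outfile wsfoobar-bundle.xml -I ./esdl-includes
# explicitly request a binding instead
esdl manifest WsFoobar-manifest.xml --output-type binding --outfile wsfoobar-binding.xml -I ./esdl-includes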
Output
The manifest command reads the manifest, processes statements in the urn:hpcc:esdl:manifest namespace, and generates an output XML file formatted to the requirements of the ESDL ESP. This includes wrapping included content in CDATA sections to ensure element order is maintained and replacing urn:hpcc:esdl:manifest elements as required.

Sample output of each type - bundle and binding - is shown below. The examples use the sample manifest above as input plus these included files:

WsFoobar-request-prep.xml:

<es:BackendRequest name="request-prep" target="soap:Body/{$query}" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:es="urn:hpcc:esdl:script">
+ <es:set-value target="RequestValue" value="'foobar'"/>
+</es:BackendRequest>
WsFoobar-logging-prep.xml:

<es:PreLogging name="logging-prep" target="soap:Body/{$query}" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:es="urn:hpcc:esdl:script">
+ <es:set-value target="LogValue" value="23"/>
+</es:PreLogging>
FoobarSearch-scripts.xml:

<Scripts>
+ <es:BackendRequest name="search-request-prep" target="soap:Body/{$query}" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:es="urn:hpcc:esdl:script">
+ <es:if test="RequestOption>1">
+ <es:set-value target="HiddenOption" value="true()"/>
+ </es:if>
+ </es:BackendRequest>
+
+ <es:PreLogging name="search-logging-prep" target="soap:Body/{$query}" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:es="urn:hpcc:esdl:script">
+ <es:if test="RequestOption=1">
+ <es:set-value target="ProductPrice" value="10"/>
+ </es:if>
+ </es:PreLogging>
+</Scripts>
log-prep.xslt:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
+ <xsl:output method="xml" omit-xml-declaration="yes"/>
+ <xsl:variable name="logContent" select="/UpdateLogRequest/LogContent"/>
+ <xsl:variable name="transactionId" select="$logContent/UserContext/Context/Row/Common/TransactionId"/>
+ <xsl:template match="/">
+ <Result>
+ <Dataset name='special-data'>
+ <Row>
+ <Records>
+ <Rec>
+ <transaction_id><xsl:value-of select="$transactionId"/></transaction_id>
+ <request_data>
+ <xsl:text disable-output-escaping="yes">&lt;![CDATA[COMPRESS('</xsl:text>
+ <xsl:copy-of select="$logContent/UserContent/Context"/>
+ <xsl:text disable-output-escaping="yes">')]]&gt;</xsl:text>
+ </request_data>
+ <request_format>SPECIAL</request_format>
+ <type>23</type>
+ </Rec>
+ </Records>
+ </Row>
+ </Dataset>
+ </Result>
+ </xsl:template>
+</xsl:stylesheet>
WsFoobar.ecm:

ESPrequest FoobarSearchRequest
+{
+ int RequestOption;
+ string RequestName;
+ [optional("hidden")] bool HiddenOption;
+};
+
+ESPresponse FoobarSearchResponse
+{
+ int FoundCount;
+ string FoundAddress;
+};
+
+ESPservice [
+ auth_feature("DEFERRED"),
+ version("1"),
+ default_client_version("1"),
+] WsFoobar
+{
+ ESPmethod FoobarSearch(FoobarSearchRequest, FoobarSearchResponse);
+};
+
Bundle
<EsdlBundle>
+ <Binding id="WsFoobar_desdl_binding">
+ <Definition esdlservice="WsFoobar" id="WsFoobar.1">
+ <Methods>
+ <Scripts>
+ <![CDATA[
+ <Scripts>
+ <es:BackendRequest name="request-prep" target="soap:Body/{$query}" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:es="urn:hpcc:esdl:script">
+ <es:set-value target="RequestValue" value="'foobar'"/>
+ </es:BackendRequest>
+ <es:PreLogging name="logging-prep" target="soap:Body/{$query}" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:es="urn:hpcc:esdl:script">
+ <es:set-value target="LogValue" value="23"/>
+ </es:PreLogging>
+ </Scripts>
+ ]]>
+ </Scripts>
+ <Method name="FoobarSearch" url="127.0.0.1:8888">
+ <Scripts>
+ <![CDATA[
+ <Scripts>
+ <Scripts>
+ <es:BackendRequest name="search-request-prep" target="soap:Body/{$query}" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:es="urn:hpcc:esdl:script">
+ <es:if test="RequestOption>1">
+ <es:set-value target="HiddenOption" value="true()"/>
+ </es:if>
+ </es:BackendRequest>
+
+ <es:PreLogging name="search-logging-prep" target="soap:Body/{$query}" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:es="urn:hpcc:esdl:script">
+ <es:if test="RequestOption=1">
+ <es:set-value target="ProductPrice" value="10"/>
+ </es:if>
+ </es:PreLogging>
+ </Scripts>
+ </Scripts>
+ ]]>
+ </Scripts>
+ </Method>
+ </Methods>
+ <LoggingManager>
+ <LogAgent transformSource="local" name="main-logging">
+ <LogDataXPath>
+ <LogInfo name="PreparedData" xsl="log-prep"/>
+ </LogDataXPath>
+ <XSL>
+ <Transform name="log-prep">
+ <![CDATA[
+ <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
+ <xsl:output method="xml" omit-xml-declaration="yes"/>
+ <xsl:variable name="logContent" select="/UpdateLogRequest/LogContent"/>
+ <xsl:variable name="transactionId" select="$logContent/UserContext/Context/Row/Common/TransactionId"/>
+ <xsl:template match="/">
+ <Result>
+ <Dataset name='special-data'>
+ <Row>
+ <Records>
+ <Rec>
+ <transaction_id><xsl:value-of select="$transactionId"/></transaction_id>
+ <request_data>
+ <xsl:text disable-output-escaping="yes">&lt;![CDATA[COMPRESS('</xsl:text>
+ <xsl:copy-of select="$logContent/UserContent/Context"/>
+ <xsl:text disable-output-escaping="yes">')]]&gt;</xsl:text>
+ </request_data>
+ <request_format>SPECIAL</request_format>
+ <type>23</type>
+ </Rec>
+ </Records>
+ </Row>
+ </Dataset>
+ </Result>
+ </xsl:template>
+ </xsl:stylesheet>
+ ]]>
+ </Transform>
+ </XSL>
+ </LogAgent>
+ </LoggingManager>
+ </Definition>
+ </Binding>
+ <Definitions>
+ <![CDATA[
+ <esxdl name="WsFoobar"><EsdlRequest name="FoobarSearchRequest"><EsdlElement type="int" name="RequestOption"/><EsdlElement type="string" name="RequestName"/><EsdlElement optional="hidden" type="bool" name="HiddenOption"/></EsdlRequest>
+ <EsdlResponse name="FoobarSearchResponse"><EsdlElement type="int" name="FoundCount"/><EsdlElement type="string" name="FoundAddress"/></EsdlResponse>
+ <EsdlRequest name="WsFoobarPingRequest"></EsdlRequest>
+ <EsdlResponse name="WsFoobarPingResponse"></EsdlResponse>
+ <EsdlService version="1" auth_feature="DEFERRED" name="WsFoobar" default_client_version="1"><EsdlMethod response_type="FoobarSearchResponse" request_type="FoobarSearchRequest" name="FoobarSearch"/><EsdlMethod response_type="WsFoobarPingResponse" auth_feature="none" request_type="WsFoobarPingRequest" name="Ping"/></EsdlService>
+ </esxdl>
+ ]]>
+ </Definitions>
+ </EsdlBundle>
Binding
<Binding id="WsFoobar_desdl_binding">
+ <Definition esdlservice="WsFoobar" id="WsFoobar.1">
+ <Methods>
+ <Scripts>
+ <![CDATA[
+ <Scripts>
+ <es:BackendRequest name="request-prep" target="soap:Body/{$query}" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:es="urn:hpcc:esdl:script">
+ <es:set-value target="RequestValue" value="'foobar'"/>
+ </es:BackendRequest>
+ <es:PreLogging name="logging-prep" target="soap:Body/{$query}" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:es="urn:hpcc:esdl:script">
+ <es:set-value target="LogValue" value="23"/>
+ </es:PreLogging>
+ </Scripts>
+ ]]>
+ </Scripts>
+ <Method name="FoobarSearch" url="127.0.0.1:8888">
+ <Scripts>
+ <![CDATA[
+ <Scripts>
+ <Scripts>
+ <es:BackendRequest name="search-request-prep" target="soap:Body/{$query}" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:es="urn:hpcc:esdl:script">
+ <es:if test="RequestOption>1">
+ <es:set-value target="HiddenOption" value="true()"/>
+ </es:if>
+ </es:BackendRequest>
+
+ <es:PreLogging name="search-logging-prep" target="soap:Body/{$query}" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:es="urn:hpcc:esdl:script">
+ <es:if test="RequestOption=1">
+ <es:set-value target="ProductPrice" value="10"/>
+ </es:if>
+ </es:PreLogging>
+ </Scripts>
+ </Scripts>
+ ]]>
+ </Scripts>
+ </Method>
+ </Methods>
+ <LoggingManager>
+ <LogAgent transformSource="local" name="main-logging">
+ <LogDataXPath>
+ <LogInfo name="PreparedData" xsl="log-prep"/>
+ </LogDataXPath>
+ <XSL>
+ <Transform name="log-prep">
+ <![CDATA[
+ <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
+ <xsl:output method="xml" omit-xml-declaration="yes"/>
+ <xsl:variable name="logContent" select="/UpdateLogRequest/LogContent"/>
+ <xsl:variable name="transactionId" select="$logContent/UserContext/Context/Row/Common/TransactionId"/>
+ <xsl:template match="/">
+ <Result>
+ <Dataset name='special-data'>
+ <Row>
+ <Records>
+ <Rec>
+ <transaction_id><xsl:value-of select="$transactionId"/></transaction_id>
+ <request_data>
+ <xsl:text disable-output-escaping="yes">&lt;![CDATA[COMPRESS('</xsl:text>
+ <xsl:copy-of select="$logContent/UserContent/Context"/>
+ <xsl:text disable-output-escaping="yes">')]]&gt;</xsl:text>
+ </request_data>
+ <request_format>SPECIAL</request_format>
+ <type>23</type>
+ </Rec>
+ </Records>
+ </Row>
+ </Dataset>
+ </Result>
+ </xsl:template>
+ </xsl:stylesheet>
+ ]]>
+ </Transform>
+ </XSL>
+ </LogAgent>
+ </LoggingManager>
+ </Definition>
+</Binding>
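A typical next step is to publish the ESDL definition and attach a binding file like the one above to a configured ESP. The commands below are a sketch only: the argument order and option names are assumptions, and myesp, 8003, the host, and the file names are placeholders; verify the exact syntax with esdl publish --help and esdl bind-service --help on your installation.

# publish the ESDL definition (assumed syntax; placeholders in angle brackets)
esdl publish WsFoobar.ecm --version 1 -s <esp-host> -p 8010
# attach the generated binding configuration to an existing ESP binding (assumed syntax)
esdl bind-service myesp 8003 WsFoobar.1 WsFoobar --config wsfoobar-binding.xml -s <esp-host> -p 8010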
Developer README for ESP API Command Line Tool

Overview

The tool provides four commands: describe, test, list-services, and list-methods.

- describe: Allows users to explore available services, methods, and their request-response structures.
- test: Enables sending requests in various formats (XML, JSON, or form query strings) to the ESP services.
- list-services: Invoked by the auto-complete script, this command provides a list of names of all ESP services.
- list-methods: Invoked by the auto-complete script, this command provides a list of names of all methods within an ESP service.

Usage Notes

For the test command, if the server and port are not specified, the tool defaults to interacting with an ESP instance located at http://127.0.0.1:8010.
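For a rough sense of what the test command sends, the equivalent raw request can be issued with curl. WsWorkunits/WUQuery is used here only as an example ESP method, and the JSON body follows the usual ESP request envelope; adjust the service, method, and fields to whatever you are testing:

curl -s 'http://127.0.0.1:8010/WsWorkunits/WUQuery.json' \
     -H 'Content-Type: application/json' \
     -d '{"WUQueryRequest": {"PageSize": 5}}'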
Ideas for Expansion
Tagging new versions
General

The scripts use the settings defined in env.sh:

. env.sh

Pre-requisites
git clone git@github.com:hpcc-systems/eclide.git
+git clone git@github.com:hpcc-systems/hpcc4j.git HPCC-JAPIs
+git clone git@github.com:hpcc-systems/Spark-HPCC.git
+git clone git@github.com:hpcc-systems/LN.git ln
+git clone git@github.com:hpcc-systems/HPCC-Platform.git hpcc
+git clone git@github.com:hpcc-systems/helm-chart.git
git clone git@github.com:hpcc-systems/nagios-monitoring.git
+git clone git@github.com:hpcc-systems/ganglia-monitoring.git
Tagging new versions

You can set the all environment variable to a subset of the projects (e.g. export all=hpcc) if there are no changes in the other repositories. The only effect for projects that are upmerged with no changes will be that they gain an empty merge transaction. If multiple people are merging PRs to different repositories it may be safer to upmerge all projects.

Upmerge each release branch into the next (a concrete example follows this command list):

./upmerge A.a.x candidate-A.b.x
+./upmerge A.b.x candidate-A.c.x
+./upmerge A.b.x candidate-B.0.x
+./upmerge B.0.x master
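For instance, restricting the work to the platform repository and using concrete branch names (the version numbers here are hypothetical; substitute the branches currently in service):

. env.sh
export all=hpcc
./upmerge 9.6.x candidate-9.8.x
./upmerge 9.8.x candidate-9.10.x
./upmerge 9.8.x candidate-10.0.x
./upmerge 10.0.x master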
Creating new release candidates:

./gorc.sh A.a.x
+./gorc.sh A.b.x
+./gorc.sh A.c.x
Taking a build gold:
./gogold.sh 7.8.76
+./gogold.sh 7.10.50
Creating a new rc for an existing point release:
./gorc.sh A.a.<n>
Create a new minor/major branch:
./gominor.sh
HPCC-Platform