The `dbsched` bundle is preconfigured with the Pado scheduler to periodically execute jobs that dump database tables to CSV files, from which it automatically extracts column information to generate the corresponding `VersionedPortable` classes. It then transforms the CSV records into objects using the generated classes before ingesting them into Hazelcast.
Install the bundle as follows.

install_bundle -download bundle-hazelcast-3n4n5-app-pado_dbsched-perf_test_dbsched-docker-mysql
In this use case, to control the system load, the IT department has restricted applications from accessing the database in real time, especially during business hours. Only select applications are allowed to read the database in batch mode. Writes are not permitted. Our task is to download the data from the database and feed it to Hazelcast as follows:
- Dump table contents to CSV files
- Create `VersionedPortable` classes based on CSV file contents
- Transform CSV records to objects
- Ingest the transformed data into Hazelcast
- Automate and schedule jobs to periodically repeat the above steps
The bundle includes the following components:

apps
├── pado_dbsched
└── perf_test_dbsched
docker
└── mysql
Let's create a new Hazelcast cluster. Make sure to name the cluster `dbsched`. The scheduler has been preconfigured with the `dbsched` cluster. See Step 3 in Configuring and Running Pado Scheduler.
# Create dbsched
create_cluster -product hazelcast -cluster dbsched
# Switch to the dbsched cluster so that we are in the Hazelcast cluster context
switch_cluster dbsched
This bundle includes the Pado scheduler to simplify the ETL process. The Pado scheduler offers the following:
- Schedule jobs that execute SQL queries and dump the results to CSV files.
- Automatically generate Hazelcast `VersionedPortable` classes based on CSV file contents.
- Automatically generate schema (metadata) files for ingesting `VersionedPortable` objects into Hazelcast maps.
- Transform CSV file contents to `VersionedPortable` objects.
- Ingest `VersionedPortable` objects into Hazelcast.
We could also use popular schedulers such as Spring Cloud Data Flow and Apache NiFi, but we would then need to perform the data transformation manually.
Open MySQL Workbench and create the `nw` schema. When you run the `test_group` script (see below), the `customers` and `orders` tables will be created automatically by Hibernate.
If you prefer to use Docker to start MySQL, start the MySQL container as follows.
cd_docker mysql
docker compose up -d
The MySQL root account is set up as follows:
Parameter | Value |
---|---|
Adminer URL | http://localhost:8080 |
MySQL User | root |
MySQL Password | rootpw |
MySQL Port | 3306 |
Create the `nw` database using Adminer. (A JDBC alternative is sketched after these steps.)

- Log in to MySQL from the Adminer URL
- Select "SQL command" in Adminer
- Execute the following:
create database nw;
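If you'd rather script this step than use Adminer, the following is a minimal JDBC sketch, not part of the bundle: the class name `CreateNwDatabase` is hypothetical, the credentials come from the table above, the URL parameters mirror the scheduler's `mysql.json` shown later, and MySQL Connector/J is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical helper (not part of the bundle): creates the nw database
// using the root account from the table above.
public class CreateNwDatabase {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/"
                + "?allowPublicKeyRetrieval=true&serverTimezone=America/New_York";
        try (Connection conn = DriverManager.getConnection(url, "root", "rootpw");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("create database if not exists nw");
        }
    }
}
```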
We need to download the MySQL binary files by building `perf_test_dbsched` as follows.
cd_app perf_test_dbsched/bin_sh
./build_app
The `perf_test_dbsched` app has been preconfigured to connect to MySQL on localhost with the user name `root` and the password `rootpw`. You can change the user name and password in `etc/hibernate.cfg-mysql.xml`.
We will be using the `perf_test` app to load data directly into the database tables. After the database has been loaded, we will execute the use case by first dumping the database tables to CSV files.
# Change database user name and password in the Hibernate config file.
cd_app perf_test_dbsched
vi etc/hibernate.cfg-mysql.xml
Let's load mock data into the `nw.customers` and `nw.orders` tables by executing the `test_group -db` command. The `-db` option loads data directly into the database instead of Hazelcast.
cd_app perf_test_dbsched/bin_sh
./test_group -db -run -prop ../etc/group-factory.properties
Next, build the `pado_dbsched` app as follows.

cd_app pado_dbsched/bin_sh
./build_app
- Encrypt the database password. Copy the encrypted password.
cd_app pado_dbsched
cd pado_<version>/bin_sh/tools
./encryptor
- Copy the included scheduler job to the scheduler's data directory.
cd_app pado_dbsched
cp -r scheduler pado_<version>/data/
- Edit the job file and enter the user name and encrypted password.
cd pado_<version>/data/scheduler/etc
vi mysql.json
The `mysql.json` file contents are shown below. It configures the Pado scheduler to import the `nw.customers` table once a day at midnight and the `nw.orders` table every hour.

Note that the `GridId` attribute must be set to the Hazelcast cluster name. In our case, it is set to `dbsched`.
{
    "Driver": "com.mysql.cj.jdbc.Driver",
    "Url": "jdbc:mysql://localhost:3306/nw?allowPublicKeyRetrieval=true&serverTimezone=America/New_York",
    "User": "root",
    "Password": "",
    "Delimiter": ",",
    "Null": "'\\N'",
    "GridId": "dbsched",
    "Paths": [
        {
            "Path": "nw/customers",
            "Columns": "customerId, address, city, companyName, contactName, contactTitle, country, fax, phone, postalCode, region",
            "Query": "select * from nw.customers",
            "Day": "Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday",
            "Time": "00:00:00"
        },
        {
            "Path": "nw/orders",
            "Columns": "orderId, customerId, employeeId, freight, orderDate, requiredDate, shipAddress, shipCity, shipCountry, shipName, shipPostalCode, shipRegion, shipVia, shippedDate",
            "Query": "select * from nw.orders",
            "Day": "Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday",
            "Time": "00:00:00, 01:00:00, 02:00:00, 03:00:00, 04:00:00, 05:00:00, 06:00:00, 07:00:00, 08:00:00, 09:00:00, 10:00:00, 11:00:00, 12:00:00, 13:00:00, 14:00:00, 15:00:00, 16:00:00, 17:00:00, 18:00:00, 19:00:00, 20:00:00, 21:00:00, 22:00:00, 23:00:00"
        }
    ]
}
❗️ Note that `serverTimezone` is set to `America/New_York` in the JDBC URL. Without it, you may see the following exception if MySQL uses a system time zone in which the JDBC driver cannot convert timestamps that fall in the daylight saving time gap.
com.mysql.cj.exceptions.WrongArgumentException: HOUR_OF_DAY: 2 -> 3
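For context, `HOUR_OF_DAY: 2 -> 3` is the driver adjusting a wall-clock time that does not exist in the target time zone, which happens during the daylight saving time "spring forward" gap. The following standalone Java sketch shows the same adjustment; the date and zone are illustrative, not from the bundle.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Illustrative only: 02:30 on 2024-03-10 does not exist in America/New_York
// because clocks jump from 02:00 directly to 03:00 on that date.
public class DstGapDemo {
    public static void main(String[] args) {
        LocalDateTime gap = LocalDateTime.of(2024, 3, 10, 2, 30);
        ZonedDateTime resolved = gap.atZone(ZoneId.of("America/New_York"));
        System.out.println(resolved); // 2024-03-10T03:30-04:00[America/New_York]
    }
}
```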
- We first need to execute the `mysql.json` job without scheduling it in order to generate the schema files. Run the following command to download data into CSV files in the default `data/scheduler/import` directory.
cd_app pado_dbsched
cd pado_<version>/bin_sh/hazelcast
./import_scheduler -now
- Generate schema files using the downloaded data files.
./generate_schema -schemaDir data/scheduler/schema -dataDir data/scheduler/import -package org.hazelcast.data.demo.nw
- Generate the corresponding `VersionedPortable` source code in the default `src/generated` directory. (A sketch of what a generated class might look like follows the command below.)
./generate_versioned_portable -schemaDir data/scheduler/schema -fid 20000 -cid 20000
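For reference, a generated class might look roughly like the following. This is a hand-written sketch, not actual generator output: the field list is abbreviated, and the class ID shown is an assumption based on the `-fid 20000 -cid 20000` options above.

```java
package org.hazelcast.data.demo.nw;

import java.io.IOException;

import com.hazelcast.nio.serialization.PortableReader;
import com.hazelcast.nio.serialization.PortableWriter;
import com.hazelcast.nio.serialization.VersionedPortable;

// Sketch of a generated class (abbreviated field list).
public class Customer implements VersionedPortable {
    private String customerId;
    private String companyName;
    private String country;

    @Override
    public int getFactoryId() { return 20000; } // from -fid 20000

    @Override
    public int getClassId() { return 20000; }   // assumed: IDs start at -cid 20000

    @Override
    public int getClassVersion() { return 1; }

    @Override
    public void writePortable(PortableWriter writer) throws IOException {
        writer.writeUTF("customerId", customerId);
        writer.writeUTF("companyName", companyName);
        writer.writeUTF("country", country);
    }

    @Override
    public void readPortable(PortableReader reader) throws IOException {
        customerId = reader.readUTF("customerId");
        companyName = reader.readUTF("companyName");
        country = reader.readUTF("country");
    }
}
```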
- Add the generated factory class information to the cluster's configuration file.
switch_cluster dbsched
vi etc/hazelcast.xml
Add the following toward the bottom of the `etc/hazelcast.xml` file. (A sketch of the referenced factory class follows the snippet.)
<serialization>
...
<portable-factory factory-id="20000">
org.hazelcast.data.demo.nw.PortableFactoryImpl
</portable-factory>
</serialization>
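For reference, `PortableFactoryImpl` is essentially a switch over class IDs that returns new instances of the generated classes. The following is a sketch, not the actual generated source; the class-ID-to-class mapping is an assumption.

```java
package org.hazelcast.data.demo.nw;

import com.hazelcast.nio.serialization.Portable;
import com.hazelcast.nio.serialization.PortableFactory;

// Sketch: Hazelcast calls create(classId) during deserialization to
// instantiate the matching generated VersionedPortable class.
public class PortableFactoryImpl implements PortableFactory {
    @Override
    public Portable create(int classId) {
        switch (classId) {
            case 20000: return new Customer(); // assumed class ID
            case 20001: return new Order();    // assumed class ID
            default:    return null;
        }
    }
}
```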
- Compile the generated code and deploy the generated jar file to the workspace `plugins` directory so that it will be included in the cluster class path.
cd_app pado_dbsched
cd pado_<version>/bin_sh/hazelcast
./compile_generated_code
cp ../../dropins/generated.jar $PADOGRID_WORKSPACE/plugins/
- Start Hazelcast cluster.
# Start cluster
start_cluster
- Import the downloaded data into the cluster.
cd_app pado_dbsched
cd pado_<version>/bin_sh/hazelcast
./import_scheduler -import
- Execute the `read_cache` command to verify the data in the Hazelcast cluster. (A programmatic alternative is sketched after the commands below.)
cd_app perf_test_dbsched/bin_sh
./read_cache nw/customers
./read_cache nw/orders
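If you prefer to verify programmatically, the following is a minimal Java client sketch. It assumes the Hazelcast 4/5 client API, the generated classes and `PortableFactoryImpl` on the classpath, and a member reachable on localhost; the class name `VerifyMaps` is hypothetical.

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

import org.hazelcast.data.demo.nw.PortableFactoryImpl;

public class VerifyMaps {
    public static void main(String[] args) {
        ClientConfig config = new ClientConfig();
        config.setClusterName("dbsched"); // Hazelcast 4/5 API
        // Register the generated portable factory so the client can
        // deserialize the ingested VersionedPortable values.
        config.getSerializationConfig()
              .addPortableFactory(20000, new PortableFactoryImpl());

        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);
        IMap<Object, Object> customers = client.getMap("nw/customers");
        IMap<Object, Object> orders = client.getMap("nw/orders");
        System.out.println("nw/customers size: " + customers.size());
        System.out.println("nw/orders size: " + orders.size());
        client.shutdown();
    }
}
```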
- Execute the following SQL statements in your database to verify the data.
select * from nw.customers;
select * from nw.orders;
- Once you are satisfied with the results, you can schedule the job by executing the following.
cd_app pado_dbsched
cd pado_<version>/bin_sh/hazelcast
./import_scheduler -sched
Output:
./import_scheduler started. Please see the following log files for status:
/Users/dpark/Padogrid/workspaces/rwe-bundles/bundle-hazelcast-3n4n5-app-pado_dbsched-perf_test_dbsched-docker-mysql/apps/pado_dbsched/pado_0.5.0-B1-SNAPSHOT/log/./import_scheduler.log
- Check the scheduler's log file to verify the scheduled jobs.
cat /Users/dpark/Padogrid/workspaces/rwe-bundles/bundle-hazelcast-3n4n5-app-pado_dbsched-perf_test_dbsched-docker-mysql/apps/pado_dbsched/pado_0.5.0-B1-SNAPSHOT/log/./import_scheduler.log
Output:
[info 2024/03/19 11:41:12.809 EDT <main main> tid=1] ImportScheduler started. All import results are logged in this file.
isSched=true
isNow=false
isImport=false
[config 2024/03/19 11:41:13.079 EDT <main main> tid=1] com.netcrest.pado.tools.db.DbManager.logScheduledList(DbManager.java:394): DB import tasks scheduled:
1. SUNDAY 00:00:00 dbsched//nw/customers [select * from nw.customers]
2. SUNDAY 00:00:00 dbsched//nw/orders [select * from nw.orders]
3. SUNDAY 01:00:00 dbsched//nw/orders [select * from nw.orders]
4. SUNDAY 02:00:00 dbsched//nw/orders [select * from nw.orders]
...
174. SATURDAY 22:00:00 dbsched//nw/orders [select * from nw.orders]
175. SATURDAY 23:00:00 dbsched//nw/orders [select * from nw.orders]
As you can see from the log output above, the `nw.customers` table is scheduled to be imported into the `nw/customers` map once a day at 00:00:00. The `nw.orders` table, on the other hand, is scheduled to be imported into the `nw/orders` map every hour.
- Try updating the database to see if the updates get imported into Hazelcast. Keep in mind that you may need to wait up to an hour before the updates make it into Hazelcast.
# Stop the Hazelcast cluster
stop_cluster
# Run jps to find and kill the 'HazelcastImportScheduler' process
kill -9 $(jps | grep HazelcastImportScheduler | awk '{print $1}')