Modify configuration management
sophie-cluml authored Feb 19, 2025
1 parent e76ce4b commit 1c3ecb6
Showing 17 changed files with 503 additions and 618 deletions.
49 changes: 27 additions & 22 deletions CHANGELOG.md
@@ -26,15 +26,22 @@ Versioning](https://semver.org/spec/v2.0.0.html).
- Changed `COMPATIBLE_VERSION_REQ` to ">=0.24.0-alpha.1,<0.25.0".
- Added migration function `migrate_0_23_0_to_0_24_0_op_log`. This function
performs a migration to change the key and value of `Oplog`.
- Added minimal mode to improve behavior in remote configuration mode.
- If local configuration is not used when running Giganto, it will run in
minimal mode, with only the GraphQL server running. Retain task, peer server
task, ingest task, publish task, and DB migration are not executed.
- In minimal mode, the APIs provided by the GraphQL server are limited. In
this mode, only the APIs provided by `graphql::status` are available.
- If the configuration is received successfully via the `setConfig` GraphQL
API, it will switch to normal mode, where all task and DB migrations run
as normal, just like when running Giganto with local settings.
- Several changes are made to configuration management via the GraphQL API:
- The `setConfig` GraphQL API has been renamed to `updateConfig` to better
reflect its functionality. This API not only accepts a new configuration but
also applies it by reloading the system. Upon success, the API returns the
new config. The fields that can be updated via `updateConfig` are the same
as those retrievable via the `config` GraphQL API.
- The `updateConfig` GraphQL API returns an error if the provided `new` config
is an empty string, if `new` is identical to the current configuration
(retrievable via the `config` GraphQL API), or if the `new` config content is
invalid. If an error occurs, the update request is not applied.
- The `config` GraphQL API no longer returns the `logDir`, `addrToPeers`, and
`peers` fields.
- The `retention` field in the `config` GraphQL API response now follows the
"{days}d" format to align with the request format used in `setConfig`
GraphQL API.
- The terms `timestamp` and `timestamps` are replaced with `time` and `times` in
event structs where the type is `DateTime<Utc>`. This change impacts GraphQL
APIs that return event data or accept filter parameters that previously used
`timestamp`.
@@ -57,19 +64,17 @@ Versioning](https://semver.org/spec/v2.0.0.html).
- `bootpRawEvents`
- `dceRpcRawEvents`
- `rdpRawEvents`
- Changed `config` GraphQL API to respond `retention` field in "{days}d" format
to align with the format of the configuration field in the API request.
- `log_dir` is not a required configuration item.
- Adjust logging behavior by revised logging policy.
- If `log_dir` is not present in the local/remote config, logs are written
to stdout/stderr.
- If `log_dir` in the local/remote config is writable, logs are written to
the specified `log_dir`.
- If `log_dir` in the local config is not writable, the program terminates.
- If `log_dir` in the remote config is not writable, the program enters idle
mode.
- Logging in idle mode: logs are written to stdout/stderr until the remote
config is retrieved.
- `log_dir` is no longer a configuration item. To specify the log directory, use
the optional command-line argument `--log-dir`.
- Logging behavior related to the command-line argument `--log-dir` is as follows:
- If `log-dir` is not provided, logs are written to stdout using the tracing
library.
- If `log-dir` is provided and writable, logs are written to the specified
directory using the tracing library.
- If `log-dir` is provided but not writable, Giganto will terminate.
- Any logs generated before the tracing functionality is initialized will be
written directly to stdout or stderr using `println`, `eprintln`, or
similar.
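
As a sketch, a call to the renamed mutation could look like the following
GraphQL document. The TOML payloads and the selected response fields are
illustrative placeholders, not a verbatim example from the project:

```graphql
mutation {
  updateConfig(old: "<current TOML config>", new: "<new TOML config>") {
    retention
    dataDir
    exportDir
  }
}
```

On success, the server applies and reloads the new configuration and returns
the resulting `Config`.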

### Removed

1 change: 1 addition & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions Cargo.toml
@@ -28,6 +28,7 @@ graphql_client = "0.14"
humantime = "2"
humantime-serde = "1"
libc = "0.2"
nix = { version = "0.29", features = ["user"] }
num_enum = "0.7"
num-traits = "0.2"
pcap = "2"
55 changes: 28 additions & 27 deletions README.md
@@ -23,40 +23,46 @@ processing and real-time analytics.

You can run Giganto by invoking the following command:

```sh
giganto --cert <CERT_PATH> --key <KEY_PATH> --ca-certs <CA_CERT_PATH> \
--ca-certs <CA_CERT_PATH>
```

If you want to run Giganto with a local configuration file:

```sh
giganto -c <CONFIG_PATH> --cert <CERT_PATH> --key <KEY_PATH> --ca-certs \
<CA_CERT_PATH> --ca-certs <CA_CERT_PATH>
<CA_CERT_PATH>[,<CA_CERT_PATH>...] [--log-dir <LOG_DIR>]
```

### Arguments

- `<CONFIG_PATH>`: Path to the TOML configuration file (optional when running in
remote mode).
- `<CERT_PATH>`: Path to the certificate file (required).
- `<KEY_PATH>`: Path to the private key file (required).
- `<CA_CERTS_PATH>`: Path to the CA certificates file (required).
<!-- markdownlint-disable -->

### Example
| Name | Description | Required |
| ---------------- | --------------------------------------------------------- | -------- |
| `<CONFIG_PATH>` | Path to the TOML configuration file. | Yes |
| `<CERT_PATH>` | Path to the certificate file. | Yes |
| `<KEY_PATH>` | Path to the private key file. | Yes |
| `<CA_CERT_PATH>` | Path to the CA certificates file. | Yes |
| `<LOG_DIR>` | Path to the directory where the log files will be stored. | No |

- Run Giganto with remote server configuration.
<!-- markdownlint-enable -->

```sh
giganto --cert /path/to/cert.pem --key /path/to/key.pem \
--ca-certs /path/to/ca_cert.pem
```
#### Notes on Arguments

- The `--ca-certs` argument accepts multiple values, separated by commas. You
can also repeat the argument to specify multiple CA certificates.
- Logging behavior based on the `--log-dir` argument is as follows:
- If `<LOG_DIR>` is not provided, logs are written to stdout using the tracing
library.
- If `<LOG_DIR>` is provided and writable, logs are written to the specified
directory using the tracing library.
- If `<LOG_DIR>` is provided but not writable, Giganto will terminate.
- Any logs generated before the tracing functionality is initialized will be
written directly to stdout or stderr using `println`, `eprintln`, or
similar.
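
The writability check described above can be sketched with only the standard
library. The function name and probe-file name are illustrative assumptions,
not Giganto's actual implementation:

```rust
use std::fs::OpenOptions;
use std::path::Path;

// Probes whether `dir` accepts a new file, mirroring the `--log-dir` policy:
// unset -> stdout logging, writable -> file logging, not writable -> terminate.
fn log_dir_writable(dir: &Path) -> bool {
    let probe = dir.join(".giganto_write_probe");
    match OpenOptions::new().create(true).write(true).open(&probe) {
        Ok(_) => {
            let _ = std::fs::remove_file(&probe); // clean up the probe file
            true
        }
        Err(_) => false,
    }
}

fn main() {
    let tmp = std::env::temp_dir();
    println!("temp dir writable: {}", log_dir_writable(&tmp));
}
```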

### Example

- Run Giganto with local configuration file and multiple CA certificates.

```sh
giganto -c path/to/config.toml --cert /path/to/cert.pem --key /path/to/key.pem \
--ca-certs /path/to/ca_cert1.pem --ca-certs /path/to/ca_cert2.pem
--ca-certs /path/to/ca_cert1.pem,/path/to/ca_cert2.pem
```

## Configuration
@@ -72,7 +78,6 @@ In the config file, you can specify the following options:
| `graphql_srv_addr` | Giganto's GraphQL address | No | [::]:8442 |
| `data_dir` | Path to directory to store data | Yes | - |
| `retention` | Retention period for data | No | 100d |
| `log_dir` | Path to Giganto's syslog file | No | - |
| `export_dir` | Path to Giganto's export file | Yes | - |
| `max_open_files` | Max open files for database | No | 8000 |
| `max_mb_of_level_base` | Max MB for RocksDB Level 1 | No | 512 |
@@ -92,7 +97,6 @@ publish_srv_addr = "0.0.0.0:38371"
graphql_srv_addr = "127.0.0.1:8442"
data_dir = "tests/data"
retention = "100d"
log_dir = "/opt/clumit/log"
export_dir = "/opt/clumit/var/giganto/export"
max_open_files = 8000
max_mb_of_level_base = 512
@@ -106,19 +110,16 @@ peers = [ { addr = "10.10.12.1:38383", hostname = "ai" } ]
For the `max_mb_of_level_base`, the last level has 100,000 times capacity,
and it is about 90% of total capacity. Therefore, about `db_total_mb / 111111` is
appropriate.
For example, `90`MB or less for 10TB Database, `900`MB or less for 100TB would
For example, 90 MB or less for 10 TB Database, 900 MB or less for 100 TB would
be appropriate.

These values assume you've used all the way up to level 6, so the actual values may
change if you want to grow your data further at the level base.
So if it's less than `512`MB, it's recommended to set default value of `512`MB.
So if the computed value is less than 512 MB, it's recommended to use the
default value of 512 MB.
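
The sizing rule above can be expressed as a small helper. This is a sketch:
the function name is an illustrative assumption, and the 512 MB floor reflects
the recommended default rather than enforced behavior:

```rust
// With levels used up to level 6, the last level holds roughly 100,000x the
// level base and about 90% of total capacity, so
// max_mb_of_level_base ~= db_total_mb / 111_111, floored at the 512 MB default.
fn suggested_level_base_mb(db_total_mb: u64) -> u64 {
    (db_total_mb / 111_111).max(512)
}

fn main() {
    // 10 TB ~= 10_000_000 MB -> raw value ~90 MB, below the 512 MB floor.
    println!("10 TB  -> {} MB", suggested_level_base_mb(10_000_000));
    // 100 TB ~= 100_000_000 MB -> ~900 MB.
    println!("100 TB -> {} MB", suggested_level_base_mb(100_000_000));
}
```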

If there is no `addr_to_peers` option in the configuration file, it runs in
standalone mode, and if there is, it runs in cluster mode for P2P.

If there is no `log_dir` option in the configuration file, logs will be written
to stdout instead of to a specific path's log file.

## Test

Run Giganto with the prepared configuration file. (Settings to use the
72 changes: 11 additions & 61 deletions src/graphql.rs
@@ -42,11 +42,11 @@ use tracing::error;
use crate::{
ingest::implement::EventFilter,
peer::Peers,
settings::Settings,
settings::{ConfigVisible, Settings},
storage::{
Database, Direction, FilteredIter, KeyExtractor, KeyValue, RawEventStore, StorageKey,
},
AckTransmissionCount, IngestSensors, PcapSensors,
IngestSensors, PcapSensors,
};

pub const TIMESTAMP_SIZE: usize = 8;
@@ -66,9 +66,6 @@ pub struct Query(
netflow::NetflowQuery,
);

#[derive(Default, MergedObject)]
pub struct MinimalQuery(status::StatusQuery);

#[derive(Default, MergedObject)]
pub struct Mutation(status::ConfigMutation);

@@ -145,14 +142,12 @@ pub trait ClusterSortKey {
}

type Schema = async_graphql::Schema<Query, Mutation, EmptySubscription>;
type MinimalSchema = async_graphql::Schema<MinimalQuery, Mutation, EmptySubscription>;
type ConnArgs<T> = (Vec<(Box<[u8]>, T)>, bool, bool);

pub struct NodeName(pub String);
pub struct RebootNotify(Arc<Notify>); // reboot
pub struct PowerOffNotify(Arc<Notify>); // shutdown
pub struct TerminateNotify(Arc<Notify>); // stop
pub struct TracingEnabled(bool);

#[allow(clippy::too_many_arguments)]
pub fn schema(
@@ -163,14 +158,11 @@ pub fn schema(
peers: Peers,
request_client_pool: reqwest::Client,
export_path: PathBuf,
reload_tx: Sender<String>,
reload_tx: Sender<ConfigVisible>,
notify_reboot: Arc<Notify>,
notify_power_off: Arc<Notify>,
notify_terminate: Arc<Notify>,
ack_transmission_cnt: AckTransmissionCount,
is_local_config: bool,
settings: Option<Settings>,
tracing_enabled: bool,
settings: Settings,
) -> Schema {
Schema::build(Query::default(), Mutation::default(), EmptySubscription)
.data(node_name)
@@ -181,40 +173,13 @@
.data(request_client_pool)
.data(export_path)
.data(reload_tx)
.data(ack_transmission_cnt)
.data(TerminateNotify(notify_terminate))
.data(RebootNotify(notify_reboot))
.data(PowerOffNotify(notify_power_off))
.data(is_local_config)
.data(settings)
.data(TracingEnabled(tracing_enabled))
.finish()
}

pub fn minimal_schema(
reload_tx: Sender<String>,
notify_reboot: Arc<Notify>,
notify_power_off: Arc<Notify>,
notify_terminate: Arc<Notify>,
is_local_config: bool,
settings: Option<Settings>,
tracing_enabled: bool,
) -> MinimalSchema {
MinimalSchema::build(
MinimalQuery::default(),
Mutation::default(),
EmptySubscription,
)
.data(reload_tx)
.data(TerminateNotify(notify_terminate))
.data(RebootNotify(notify_reboot))
.data(PowerOffNotify(notify_power_off))
.data(is_local_config)
.data(settings)
.data(TracingEnabled(tracing_enabled))
.finish()
}

/// The default page size for connections when neither `first` nor `last` is
/// provided. Maximum size: 100.
const MAXIMUM_PAGE_SIZE: usize = 100;
@@ -1776,12 +1741,12 @@ mod tests {
EmptySubscription, SimpleObject,
};
use chrono::{DateTime, Utc};
use tokio::sync::{Notify, RwLock};
use tokio::sync::Notify;

use super::{schema, sort_and_trunk_edges, NodeName};
use crate::graphql::{ClusterSortKey, Mutation, Query};
use crate::peer::{PeerInfo, Peers};
use crate::settings::Settings;
use crate::settings::{ConfigVisible, Settings};
use crate::storage::{Database, DbOptions};
use crate::{new_pcap_sensors, IngestSensors};

@@ -1797,13 +1762,13 @@ }
}

impl TestSchema {
fn setup(ingest_sensors: IngestSensors, peers: Peers, is_local_config: bool) -> Self {
fn setup(ingest_sensors: IngestSensors, peers: Peers) -> Self {
let db_dir = tempfile::tempdir().unwrap();
let db = Database::open(db_dir.path(), &DbOptions::default()).unwrap();
let pcap_sensors = new_pcap_sensors();
let request_client_pool = reqwest::Client::new();
let export_dir = tempfile::tempdir().unwrap();
let (reload_tx, _) = tokio::sync::mpsc::channel::<String>(1);
let (reload_tx, _) = tokio::sync::mpsc::channel::<ConfigVisible>(1);
let notify_reboot = Arc::new(Notify::new());
let notify_power_off = Arc::new(Notify::new());
let notify_terminate = Arc::new(Notify::new());
@@ -1820,10 +1785,7 @@
notify_reboot,
notify_power_off,
notify_terminate,
Arc::new(RwLock::new(1024)),
is_local_config,
Some(settings),
true,
settings,
);

Self {
@@ -1842,7 +1804,7 @@
));

let peers = Arc::new(tokio::sync::RwLock::new(HashMap::new()));
Self::setup(ingest_sensors, peers, true)
Self::setup(ingest_sensors, peers)
}

pub fn new_with_graphql_peer(port: u16) -> Self {
@@ -1865,19 +1827,7 @@
},
)])));

Self::setup(ingest_sensors, peers, true)
}

pub fn new_with_remote_config() -> Self {
let ingest_sensors = Arc::new(tokio::sync::RwLock::new(
CURRENT_GIGANTO_INGEST_SENSORS
.into_iter()
.map(str::to_string)
.collect::<HashSet<String>>(),
));

let peers = Arc::new(tokio::sync::RwLock::new(HashMap::new()));
Self::setup(ingest_sensors, peers, false)
Self::setup(ingest_sensors, peers)
}

pub async fn execute(&self, query: &str) -> async_graphql::Response {
20 changes: 11 additions & 9 deletions src/graphql/client/schema/schema.graphql
@@ -89,14 +89,11 @@ type Config {
graphqlSrvAddr: String!
retention: String!
dataDir: String!
logDir: String
exportDir: String!
maxOpenFiles: Int!
maxMbOfLevelBase: StringNumberU64!
numOfThread: Int!
maxSubCompactions: StringNumberU32!
addrToPeers: String
peers: [PeerIdentity!]
ackTransmission: Int!
}

@@ -1098,7 +1095,17 @@ }
}

type Mutation {
setConfig(draft: String!): Boolean!
# Updates the config with the given `new` config. This involves reloading the module with the
# new config.
#
# # Errors
#
# Returns an error if the `new` is empty. In addition, it returns an error if the `new` is
# invalid. The `new` config is invalid if it contains a negative value for `max_open_files` or
# `num_of_thread`, or if the `data_dir` or `export_dir` does not exist or is not a directory.
# It also returns an error if the `export_dir` is not writable. If the `new` is the same as
# the current config, it returns an error.
updateConfig(old: String!, new: String!): Config!
stop: Boolean!
reboot: Boolean!
shutdown: Boolean!
@@ -1464,11 +1471,6 @@ type Pcap {
parsedPcap: String!
}

type PeerIdentity {
addr: String!
hostname: String!
}

type PipeEventEvent {
time: DateTime!
agentName: String!