Automatic resource graph #106

Merged
merged 18 commits into from
Jan 22, 2025
26 changes: 26 additions & 0 deletions .vscode/launch.json

Large diffs are not rendered by default.

65 changes: 54 additions & 11 deletions README.md
@@ -6,47 +6,83 @@ We provide a builtin repo which contains various applications to deploy.
We refer to applications as "jarvis pkgs" which can be connected to form
"deployment pipelines".

# 0.1 Dependencies
# 1. Installation

## 0.1.1. Jarvis-Util
Get the GRC spack repo:
```bash
git clone https://github.com/grc-iit/grc-repo
spack repo add grc-repo
```

Install jarvis-cd:
```bash
spack external find python
spack install py-jarvis-cd
```

Spack packages must be loaded to use them.
You'll have to do this for each new terminal.
```bash
spack load py-jarvis-cd
```

# 2. Manual Installation

## 2.1. Jarvis-Util
Jarvis-CD depends on jarvis-util. jarvis-util contains functions to execute
binaries in python and collect their output.

```bash
git clone https://github.com/scs-lab/jarvis-util.git
git clone https://github.com/grc-iit/jarvis-util.git
cd jarvis-util
python3 -m pip install -r requirements.txt
python3 -m pip install -e .
```

## 0.1.2. Scspkg
## 2.2. Scspkg

Scspkg is a tool for building modulefiles using a CLI. It's not strictly
necessary for Jarvis to function, but many of the readmes use it to provide
structure to manual installations.

```bash
git clone https://github.com/scs-lab/scspkg.git
git clone https://github.com/grc-iit/scspkg.git
python3 -m pip install -r requirements.txt
python3 -m pip install -e .
echo "module use \`scspkg module dir\`" >> ~/.bashrc
```

The wiki for scspkg is [here](https://github.com/scs-lab/scspkg.git).
The wiki for scspkg is [here](https://github.com/grc-iit/scspkg.git).

# 0.2. Installation
# 2.3. Jarvis-CD

```bash
cd /path/to/jarvis-cd
python3 -m pip install -r requirements.txt
python3 -m pip install -e .
```

# 0.3. Configuring Jarvis
# 2.4. Net Test

Network test tool for identifying valid networks.
```bash
spack install chi-nettest
```

# 3. Configuring Jarvis

## 0.3.1. Bootstrapping from a specific machine
## 3.1. Bootstrapping for a single-node machine

Jarivs has been pre-configured on some machines. To bootstrap from
You may be trying to test things on just a single node.

In this case, run:
```bash
jarvis bootstrap from local
```

## 3.2. Bootstrapping from a specific machine

Jarvis has been pre-configured on some machines. To bootstrap from
one of them, run the following:

```bash
@@ -60,7 +96,7 @@ To check the set of available machines to bootstrap from, run:
jarvis bootstrap list
```

## 0.3.2. Creating a new configuration
## 3.3. Creating a new configuration

A configuration can be generated as follows:
```bash
@@ -78,3 +114,10 @@ require this, but on machines without a global filesystem (e.g., Chameleon Cloud
this parameter can be set later.

For a personal machine, these directories can be the same directory.

# 4. Building the Resource Graph

Python jarvis:
```bash
jarvis rg build
```
61 changes: 28 additions & 33 deletions bin/jarvis
@@ -50,8 +50,8 @@ class JarvisArgs(ArgParse):
'pos': True
}
])

# jarvis celan
# jarvis reset
self.add_cmd('reset',
msg='Clean all pipelines and configurations')

@@ -110,10 +110,6 @@ class JarvisArgs(ArgParse):
msg='Resources to build a resource graph for a machine',
aliases=['rg'])

# jarvis resource-graph init
self.add_cmd('resource-graph init',
msg='Create an empty resource graph')

# jarvis resource-graph show
self.add_cmd('resource-graph show',
msg='Show the resource graph')
@@ -127,23 +123,29 @@ class JarvisArgs(ArgParse):
msg='Introspect resource graph for this machine')
self.add_args([
{
'name': 'walkthrough',
'msg': 'A guide for building a resource graph',
'type': bool,
'default': False
'name': 'net_sleep',
'msg': 'How long to sleep in network tests',
'type': float,
'default': 5,
'pos': False,
'required': False
},
])

# jarvis resource-graph modify
self.add_cmd('resource-graph modify',
msg='Modify the resource graph to introspect new resources')
self.add_args([
{
'name': 'introspect',
'msg': 'Whether or not to do an introspect before building',
'type': bool,
'default': True
'name': 'net_sleep',
'msg': 'How long to sleep in network tests',
'type': float,
'default': 5,
'pos': False,
'required': False
},
])

# jarvis resource-graph build
self.add_cmd('resource-graph prune',
msg='An interactive CLI for modifying the resource graph')


# jarvis resource-graph add storage
self.add_cmd('resource-graph add storage',
msg='Add a storage device or PFS to track')
@@ -1122,30 +1124,23 @@ class JarvisArgs(ArgParse):

def hostfile_set(self):
self.jarvis.set_hostfile(self.kwargs['path'])
pipelines = self.jarvis.list_pipelines()
self.jarvis.save()

def resource_graph_init(self):
self.jarvis.resource_graph_init()
self.jarvis.save()

def resource_graph_show(self):
self.jarvis.resource_graph_show()
self.jarvis.load().resource_graph_show()
self.jarvis.save()

def resource_graph_path(self):
print(self.jarvis.resource_graph_path)
self.jarvis.save()

def resource_graph_build(self):
walkthrough = self.kwargs['walkthrough']
introspect = self.kwargs['introspect']
if walkthrough:
self.jarvis.resource_graph.walkthrough_build(
PsshExecInfo(hostfile=self.jarvis.hostfile),
introspect)
else:
self.jarvis.resource_graph_build()
net_sleep = self.kwargs['net_sleep']
self.jarvis.resource_graph_build(net_sleep)
self.jarvis.save()

def resource_graph_modify(self):
self.jarvis.resource_graph_modify()
self.jarvis.save()

def resource_graph_prune(self):
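In `bin/jarvis`, the `walkthrough` and `introspect` options are replaced by a single `net_sleep` option on `resource-graph build` and `resource-graph modify`. It is a float, optional and non-positional, defaults to 5, and is described as "How long to sleep in network tests". Only `resource_graph_build` forwards it, as `self.jarvis.resource_graph_build(net_sleep)`. The sketch below reproduces that argument spec with Python's standard `argparse` instead of jarvis-util's `ArgParse`. The `--net_sleep` flag spelling and the `build_resource_graph` stub are hypothetical, and the unit (seconds) is an assumption because the diff does not state it.

```python
import argparse


def make_parser():
    # Mirrors the 'net_sleep' spec added in this PR: float, default 5,
    # not positional, not required. The '--net_sleep' spelling is an
    # assumption; jarvis-util's ArgParse uses its own argument syntax.
    parser = argparse.ArgumentParser(prog='jarvis resource-graph build')
    parser.add_argument('--net_sleep', type=float, default=5.0,
                        help='How long to sleep in network tests')
    return parser


def build_resource_graph(net_sleep):
    # Hypothetical stand-in for Jarvis.resource_graph_build(net_sleep).
    return {'net_sleep': net_sleep}


def resource_graph_build(argv):
    # Same flow as JarvisArgs.resource_graph_build: read the value, forward it.
    args = make_parser().parse_args(argv)
    return build_resource_graph(args.net_sleep)


if __name__ == '__main__':
    print(resource_graph_build([]))                      # {'net_sleep': 5.0}
    print(resource_graph_build(['--net_sleep', '0.5']))  # {'net_sleep': 0.5}
```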
5 changes: 3 additions & 2 deletions builtin/builtin/hermes/pkg.py
@@ -86,11 +86,12 @@ def _configure_server(self):

if len(self.config['devices']) == 0:
# Get all the fastest storage device mount points on machine
dev_df = rg.find_storage()
dev_df = rg.find_storage(needs_root=False)
else:
# Get the storage devices for the user
dev_list = [rg.find_storage(dev_types=dev_type,
count_per_pkg=count)
count_per_pkg=count,
needs_root=False)
for dev_type, count in self.config['devices']]
dev_df = sdf.concat(dev_list)
if len(dev_df) == 0:
5 changes: 3 additions & 2 deletions builtin/builtin/hermes_run/pkg.py
@@ -336,11 +336,12 @@ def _configure(self, **kwargs):
# Get storage info
if len(self.config['devices']) == 0:
# Get all the fastest storage device mount points on machine
dev_df = rg.find_storage()
dev_df = rg.find_storage(needs_root=False)
else:
# Get the storage devices for the user
dev_list = [rg.find_storage(dev_types=dev_type,
count_per_pkg=count)
count_per_pkg=count,
needs_root=False)
for dev_type, count in self.config['devices']]
dev_df = sdf.concat(dev_list)
if len(dev_df) == 0:
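Every package changed in this PR now passes `needs_root=False` to `rg.find_storage`. The likely goal is that Hermes and OrangeFS only pick devices an unprivileged user can use. The filtering itself is implemented in the resource-graph library and does not appear in this diff. The sketch below is a hypothetical plain-Python model of that kind of keyword filter. The device records, their keys, and the convention that `None` means "no filter" are assumptions, not the library's documented behavior.

```python
def find_storage(devices, dev_types=None, shared=None, needs_root=None):
    """Hypothetical keyword filter over a list of device records.

    Each record is a dict with 'dev_type', 'mount', 'shared', and
    'needs_root' keys. A filter argument of None means "don't care".
    """
    result = []
    for dev in devices:
        if dev_types is not None and dev['dev_type'] not in dev_types:
            continue
        if shared is not None and dev['shared'] != shared:
            continue
        if needs_root is not None and dev['needs_root'] != needs_root:
            continue
        result.append(dev)
    return result


# Made-up inventory for illustration only.
DEVICES = [
    {'dev_type': 'nvme', 'mount': '/mnt/nvme0', 'shared': False, 'needs_root': True},
    {'dev_type': 'ssd', 'mount': '/mnt/ssd0', 'shared': False, 'needs_root': False},
    {'dev_type': 'hdd', 'mount': '/home', 'shared': True, 'needs_root': False},
]


if __name__ == '__main__':
    # Only devices usable without root survive the needs_root=False filter.
    for dev in find_storage(DEVICES, needs_root=False):
        print(dev['dev_type'], dev['mount'])
```

Because every filter in this model is optional, callers such as the OrangeFS packages can also combine `needs_root=False` with `shared=False`.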
6 changes: 4 additions & 2 deletions builtin/builtin/orangefs/fuse.py
@@ -103,12 +103,14 @@ def _configure(self, **kwargs):
dev_types = ['hdd', 'ssd', 'nvme', 'dimm']
for dev_type in dev_types:
dev_df = rg.find_storage(dev_types=[dev_type],
shared=False)
shared=False,
needs_root=False)
if len(dev_df) != 0:
break
else:
dev_df = rg.find_storage(dev_types=[self.config['dev_type']],
shared=False)
shared=False,
needs_root=False)
if len(dev_df) == 0:
raise Exception('Could not find any storage devices :(')

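Both OrangeFS files look for a local (non-shared) device by trying `'hdd'`, `'ssd'`, `'nvme'`, and `'dimm'` in that order. They stop at the first type that returns any devices and raise an exception if nothing was found. The sketch below extracts that first-match search into a standalone function. The `find` callback, the in-memory inventory, and the `NoStorageError` name are hypothetical stand-ins for `rg.find_storage(...)` and the plain `Exception` the packages raise.

```python
class NoStorageError(Exception):
    """Hypothetical error type; the packages raise a plain Exception."""


def first_matching_devices(find, dev_types):
    """Try each device type in order and return the first non-empty match.

    `find` is a hypothetical callback standing in for
    rg.find_storage(dev_types=[t], shared=False, needs_root=False).
    """
    for dev_type in dev_types:
        devices = find(dev_type)
        if len(devices) != 0:
            return dev_type, devices
    raise NoStorageError('Could not find any storage devices')


# Hypothetical node inventory: no local HDD or SSD, one NVMe, one DIMM.
INVENTORY = {'nvme': ['/mnt/nvme0'], 'dimm': ['/dev/shm']}


def find_local(dev_type):
    return INVENTORY.get(dev_type, [])


if __name__ == '__main__':
    order = ['hdd', 'ssd', 'nvme', 'dimm']
    print(first_matching_devices(find_local, order))
```

The list order sets the preference: the first device type in the list that has any devices on the node is the one selected.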
8 changes: 5 additions & 3 deletions builtin/builtin/orangefs/pkg.py
@@ -54,7 +54,7 @@ def _configure_menu(self):
{
'name': 'name',
'msg': 'The name of the orangefs installation',
'type': int,
'type': str,
'default': 'orangefs',
},
{
@@ -107,12 +107,14 @@ def _configure(self, **kwargs):
dev_types = ['hdd', 'ssd', 'nvme', 'dimm']
for dev_type in dev_types:
dev_df = rg.find_storage(dev_types=[dev_type],
shared=False)
shared=False,
needs_root=False)
if len(dev_df) != 0:
break
else:
dev_df = rg.find_storage(dev_types=[self.config['dev_type']],
shared=False)
shared=False,
needs_root=False)
if len(dev_df) == 0:
raise Exception('Could not find any storage devices :(')
storage_dir = os.path.expandvars(dev_df.rows[0]['mount'])
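The first hunk in `orangefs/pkg.py` changes the declared `type` of the `name` option from `int` to `str`, which matches its string default `'orangefs'`. The menu code is not part of this diff. If it converts values with the declared `type`, which is an assumption here, the old spec would fail on the first conversion because `int('orangefs')` raises `ValueError`. A minimal model, with the hypothetical helper `resolve_option`:

```python
# The 'name' option spec after this PR ('type' was int before).
NAME_OPTION = {
    'name': 'name',
    'msg': 'The name of the orangefs installation',
    'type': str,
    'default': 'orangefs',
}


def resolve_option(spec, raw=None):
    """Return the option value, converted with the spec's declared type.

    Assumption: the configure menu converts both user input and the
    default through spec['type']; this is a model, not Jarvis's code.
    """
    value = spec['default'] if raw is None else raw
    return spec['type'](value)


if __name__ == '__main__':
    print(resolve_option(NAME_OPTION))          # orangefs
    print(resolve_option(NAME_OPTION, 'ofs2'))  # ofs2
    old_spec = dict(NAME_OPTION, type=int)
    try:
        resolve_option(old_spec)
    except ValueError as err:
        print('old spec fails:', err)
```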