Manage failed/timeout jobs states with colored label #408

Merged
merged 11 commits into from
Nov 22, 2024
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -37,6 +37,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
     details page (#292).
   - Display hash near all jobs fields in job details page to generate link to
     highlight specific field (#251).
+  - Represent terminated jobs with a colored bullet in the job status badge,
+    using green for completed (i.e. successful) jobs, red for failed jobs and
+    dark orange for timeout jobs (#354).
 - conf:
   - Add `racksdb` > `infrastructure` parameter for the agent.
   - Add `metrics` > `enabled` parameter for the agent.
@@ -59,6 +62,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   - Mention metrics export and charts feature in overview page.
   - Mention possible Prometheus integration in architecture page.
   - Mention login service message feature in overview page.
+  - Mention jobs badges to visualize job status in overview page.
   - Add page to document _Service Messages_ configuration.
 - pkgs:
   - Introduce `gateway` Python extra package.
1 change: 1 addition & 0 deletions assets/screenshots/build.yaml
@@ -22,6 +22,7 @@ shadowed:
 - screenshot_clusters.png
 - screenshot_dashboard_tablet.png
 - screenshot_jobs_filters.png
+- screenshot_job_badges.png
 - screenshot_job_status.png
 - screenshot_login_service_message.png
 - screenshot_nodes_hovering.png
Binary file modified assets/screenshots/raw/screenshot_charts.png
Binary file added assets/screenshots/raw/screenshot_job_badges.png
Binary file modified assets/screenshots/shadowed/screenshot_clusters.png
4 changes: 2 additions & 2 deletions dev/crawl-tests-assets
@@ -273,7 +273,7 @@ def crawl_slurmrestd(socket: Path) -> None:
         if _job["job_id"] > max_job_id:
             max_job_id = _job["job_id"]

-    for state in ["RUNNING", "PENDING", "COMPLETED"]:
+    for state in ["RUNNING", "PENDING", "COMPLETED", "FAILED", "TIMEOUT"]:
         dump_job_state(state)

     dump_slurmrestd_query(
@@ -608,7 +608,7 @@ def crawl_gateway(cluster: str, infrastructure: str, dev_tmp_dir: Path) -> str:
     for _job in jobs:
         if _job["job_id"] < min_job_id:
             min_job_id = _job["job_id"]
-    for state in ["PENDING", "RUNNING", "COMPLETED"]:
+    for state in ["PENDING", "RUNNING", "COMPLETED", "FAILED", "TIMEOUT"]:
         dump_job_state(state)

     dump_component_query(
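
Both loops now also cover `FAILED` and `TIMEOUT`, presumably so that assets such as `job-failed.json` and `job-timeout.json` (imported by the frontend tests in this PR) can be generated. The body of `dump_job_state` is not part of this diff, so the following is only a sketch of what a per-state selection might look like. `pick_job_by_state` and `asset_name` are hypothetical helpers, and the asset naming scheme is inferred from the test imports rather than taken from the script.

```python
from typing import Optional

# States covered by the updated loops in this PR.
STATES = ["PENDING", "RUNNING", "COMPLETED", "FAILED", "TIMEOUT"]


def pick_job_by_state(jobs: list, state: str) -> Optional[dict]:
    """Return the first job whose job_state mentions `state`, else None.

    Hypothetical helper: `job_state` is assumed to be either a string or a
    list of flags; the `in` operator handles both.
    """
    for job in jobs:
        if state in job["job_state"]:
            return job
    return None


def asset_name(state: str) -> str:
    """Build an asset file name such as 'job-failed.json' (assumed naming)."""
    return f"job-{state.lower()}.json"


# Made-up sample data, for illustration only.
jobs = [
    {"job_id": 10, "job_state": ["RUNNING"]},
    {"job_id": 11, "job_state": ["FAILED"]},
    {"job_id": 12, "job_state": "TIMEOUT"},
]
selected = {asset_name(state): pick_job_by_state(jobs, state) for state in STATES}
```

A state with no matching job maps to `None` here, which a real crawler would probably want to report rather than silently skip.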
1 change: 1 addition & 0 deletions docs/modules/overview/images/screenshot_job_badges.png
16 changes: 16 additions & 0 deletions docs/modules/overview/pages/overview.adoc
@@ -21,6 +21,22 @@ compare their statuses:

 image::screenshot_cluster_change.png[width=600]

+== Jobs Status
+
+[.float-group]
+--
+image::screenshot_job_badges.png[float=left]
+
+{empty} +
+
+Easily visualize job status with colored badges and quickly spot possible
+failures.
+
+Slurm-web represents the status of Slurm jobs with a colored badge. This makes
+it easy to grasp the state of the job queue at a glance. Never miss errors when
+they occur!
+--
+
 == Jobs filters and sorting

 Jobs queue can be filtered by many criteria (job state, user, account, QOS,
4 changes: 4 additions & 0 deletions frontend/src/components/dashboard/ChartJobsHistogram.vue
@@ -19,6 +19,8 @@ const statesColors: Record<MetricJobState, string> = {
   pending: 'rgba(255, 204, 0, 0.7)', // yellow
   completing: 'rgba(204, 153, 0, 0.7)', // dark yellow
   completed: 'rgb(192, 191, 188, 0.7)', // grey
+  failed: 'rgb(199, 40, 43, 0.7)', // red
+  timeout: 'rgb(214, 93, 11, 0.7)', // dark orange
   cancelled: 'rgb(204, 0, 153, 0.7)', // purple
   unknown: 'rgb(30, 30, 30, 0.7)' // dark grey
 }
@@ -27,6 +29,8 @@ const liveChart = useDashboardLiveChart<MetricJobState>('metrics_jobs', chartCan
   'unknown',
   'cancelled',
   'completed',
+  'failed',
+  'timeout',
   'completing',
   'running',
   'pending'
15 changes: 15 additions & 0 deletions frontend/src/components/job/JobStatusBadge.vue
@@ -44,6 +44,21 @@ const statusColor = computed<JobLabelColors>(() => {
       span: 'bg-purple-100 text-purple-700',
       circle: 'fill-purple-500'
     }
+  else if (props.status.includes('COMPLETED'))
+    return {
+      span: 'bg-gray-100 text-gray-600',
+      circle: 'fill-green-500'
+    }
+  else if (props.status.includes('FAILED'))
+    return {
+      span: 'bg-gray-100 text-gray-600',
+      circle: 'fill-red-500'
+    }
+  else if (props.status.includes('TIMEOUT'))
+    return {
+      span: 'bg-gray-100 text-gray-600',
+      circle: 'fill-orange-600'
+    }
   else
     return {
       span: 'bg-gray-100 text-gray-600',
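
The three added branches share the grey badge background and differ only in the bullet color. A rough Python transliteration of that first-match logic, for illustration only since the component itself is TypeScript: `TERMINATED_BADGES` and `badge_classes` are hypothetical names, the earlier branches of the component (cut off from this hunk) are not covered, and the `fill-gray-400` fallback is an assumption based on the value the badge tests expected before this change.

```python
GREY_SPAN = "bg-gray-100 text-gray-600"

# (state flag, bullet class) pairs, checked in the order of the added branches.
TERMINATED_BADGES = [
    ("COMPLETED", "fill-green-500"),
    ("FAILED", "fill-red-500"),
    ("TIMEOUT", "fill-orange-600"),
]


def badge_classes(status) -> dict:
    """Return span and circle classes for a terminated job status.

    `status` may be a string or a list of flags; `in` mirrors the
    `includes()` call used by the component for both types.
    """
    for flag, circle in TERMINATED_BADGES:
        if flag in status:
            return {"span": GREY_SPAN, "circle": circle}
    # Assumed fallback for states not listed above.
    return {"span": GREY_SPAN, "circle": "fill-gray-400"}


failed_badge = badge_classes(["FAILED"])
```

Keeping the mapping in a table makes the precedence explicit: a status carrying several flags gets the color of the first entry that matches.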
2 changes: 2 additions & 0 deletions frontend/src/composables/GatewayAPI.ts
@@ -332,6 +332,8 @@ export type MetricJobState =
   | 'unknown'
   | 'cancelled'
   | 'completed'
+  | 'failed'
+  | 'timeout'
   | 'completing'
   | 'running'
   | 'pending'
56 changes: 56 additions & 0 deletions frontend/tests/components/job/JobProgress.spec.ts
@@ -4,6 +4,8 @@ import JobProgress from '@/components/job/JobProgress.vue'
 import jobPending from '../../assets/job-pending.json'
 import jobRunning from '../../assets/job-running.json'
 import jobCompleted from '../../assets/job-completed.json'
+import jobFailed from '../../assets/job-failed.json'
+import jobTimeout from '../../assets/job-timeout.json'

 describe('JobProgress.vue', () => {
   test('display job progress of pending job', () => {
@@ -100,6 +102,60 @@ describe('JobProgress.vue', () => {
     const completingSpans = wrapper.get('li#step-completing').findAll('span')
     expect(completingSpans[1].classes('bg-slurmweb')).toBe(true)

+    const terminatedSpans = wrapper.get('li#step-terminated').findAll('span')
+    expect(terminatedSpans[1].classes('bg-slurmweb')).toBe(true)
+  })
+  test('display job progress of failed job', () => {
+    const wrapper = mount(JobProgress, {
+      props: {
+        job: jobFailed
+      }
+    })
+    // Must have all 6 bullets in blue
+    expect(wrapper.findAll('span.bg-slurmweb').length).toBe(6)
+
+    const submittedSpans = wrapper.get('li#step-submitted').findAll('span')
+    expect(submittedSpans[1].classes('bg-slurmweb')).toBe(true)
+
+    const eligibleSpans = wrapper.get('li#step-eligible').findAll('span')
+    expect(eligibleSpans[1].classes('bg-slurmweb')).toBe(true)
+
+    const schedulingSpans = wrapper.get('li#step-scheduling').findAll('span')
+    expect(schedulingSpans[1].classes('bg-slurmweb')).toBe(true)
+
+    const runningSpans = wrapper.get('li#step-running').findAll('span')
+    expect(runningSpans[1].classes('bg-slurmweb')).toBe(true)
+
+    const completingSpans = wrapper.get('li#step-completing').findAll('span')
+    expect(completingSpans[1].classes('bg-slurmweb')).toBe(true)
+
+    const terminatedSpans = wrapper.get('li#step-terminated').findAll('span')
+    expect(terminatedSpans[1].classes('bg-slurmweb')).toBe(true)
+  })
+  test('display job progress of timeout job', () => {
+    const wrapper = mount(JobProgress, {
+      props: {
+        job: jobTimeout
+      }
+    })
+    // Must have all 6 bullets in blue
+    expect(wrapper.findAll('span.bg-slurmweb').length).toBe(6)
+
+    const submittedSpans = wrapper.get('li#step-submitted').findAll('span')
+    expect(submittedSpans[1].classes('bg-slurmweb')).toBe(true)
+
+    const eligibleSpans = wrapper.get('li#step-eligible').findAll('span')
+    expect(eligibleSpans[1].classes('bg-slurmweb')).toBe(true)
+
+    const schedulingSpans = wrapper.get('li#step-scheduling').findAll('span')
+    expect(schedulingSpans[1].classes('bg-slurmweb')).toBe(true)
+
+    const runningSpans = wrapper.get('li#step-running').findAll('span')
+    expect(runningSpans[1].classes('bg-slurmweb')).toBe(true)
+
+    const completingSpans = wrapper.get('li#step-completing').findAll('span')
+    expect(completingSpans[1].classes('bg-slurmweb')).toBe(true)
+
     const terminatedSpans = wrapper.get('li#step-terminated').findAll('span')
     expect(terminatedSpans[1].classes('bg-slurmweb')).toBe(true)
   })
28 changes: 26 additions & 2 deletions frontend/tests/components/job/JobStatusBadge.spec.ts
@@ -4,6 +4,8 @@ import JobStatusBadge from '@/components/job/JobStatusBadge.vue'
 import jobRunning from '../../assets/job-running.json'
 import jobPending from '../../assets/job-pending.json'
 import jobCompleted from '../../assets/job-completed.json'
+import jobFailed from '../../assets/job-failed.json'
+import jobTimeout from '../../assets/job-timeout.json'
 import jobArchived from '../../assets/job-archived.json'

 describe('JobStatusBadge.vue', () => {
@@ -37,9 +39,31 @@ describe('JobStatusBadge.vue', () => {
     })
     expect(wrapper.get('span').classes('bg-gray-100')).toBe(true)
     expect(wrapper.get('span').classes('text-gray-600')).toBe(true)
-    expect(wrapper.get('svg').classes('fill-gray-400')).toBe(true)
+    expect(wrapper.get('svg').classes('fill-green-500')).toBe(true)
     expect(wrapper.get('span').text()).toBe('COMPLETED')
   })
+  test('badge failed job', () => {
+    const wrapper = mount(JobStatusBadge, {
+      props: {
+        status: jobFailed.state.current
+      }
+    })
+    expect(wrapper.get('span').classes('bg-gray-100')).toBe(true)
+    expect(wrapper.get('span').classes('text-gray-600')).toBe(true)
+    expect(wrapper.get('svg').classes('fill-red-500')).toBe(true)
+    expect(wrapper.get('span').text()).toBe('FAILED')
+  })
+  test('badge timeout job', () => {
+    const wrapper = mount(JobStatusBadge, {
+      props: {
+        status: jobTimeout.state.current
+      }
+    })
+    expect(wrapper.get('span').classes('bg-gray-100')).toBe(true)
+    expect(wrapper.get('span').classes('text-gray-600')).toBe(true)
+    expect(wrapper.get('svg').classes('fill-orange-600')).toBe(true)
+    expect(wrapper.get('span').text()).toBe('TIMEOUT')
+  })
   test('badge archived job', () => {
     const wrapper = mount(JobStatusBadge, {
       props: {
@@ -48,7 +72,7 @@ describe('JobStatusBadge.vue', () => {
     })
     expect(wrapper.get('span').classes('bg-gray-100')).toBe(true)
     expect(wrapper.get('span').classes('text-gray-600')).toBe(true)
-    expect(wrapper.get('svg').classes('fill-gray-400')).toBe(true)
+    expect(wrapper.get('svg').classes('fill-green-500')).toBe(true)
     expect(wrapper.get('span').text()).toBe('COMPLETED')
   })
   test('job badge large', () => {
20 changes: 15 additions & 5 deletions slurmweb/slurmrestd/__init__.py
@@ -100,20 +100,26 @@ def jobs_by_node(self, node: str):
         """Select jobs not completed which are allocated the given node."""

         def on_node(job):
+            """Return True if job is allocated this node."""
             if job["nodes"] == "":
                 return False
             return node in ClusterShell.NodeSet.NodeSet(job["nodes"])

-        return [
-            job
-            for job in self.jobs()
-            if on_node(job) and "COMPLETED" not in job["job_state"]
-        ]
+        def terminated(job):
+            """Return True if job is terminated."""
+            for terminated_state in ["COMPLETED", "FAILED", "TIMEOUT"]:
+                if terminated_state in job["job_state"]:
+                    return True
+            return False
+
+        return [job for job in self.jobs() if on_node(job) and not terminated(job)]
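
The `terminated` helper checks each terminal state in turn, so failed and timed-out jobs are now excluded from a node's job list, where previously only `COMPLETED` jobs were. A minimal standalone sketch of the same check written with `any()`, assuming jobs are plain dictionaries whose `job_state` is a list of flags or a string. The `TERMINATED_STATES` constant and the `active_jobs` function are names made up for this illustration, and the node filter (`on_node`) is left out to avoid the ClusterShell dependency.

```python
# States treated as terminated by this PR.
TERMINATED_STATES = ("COMPLETED", "FAILED", "TIMEOUT")


def terminated(job: dict) -> bool:
    """Return True if any terminated state appears in the job state."""
    return any(state in job["job_state"] for state in TERMINATED_STATES)


def active_jobs(jobs: list) -> list:
    """Keep only jobs that are not terminated (hypothetical helper)."""
    return [job for job in jobs if not terminated(job)]


# Made-up sample data, for illustration only.
sample = [
    {"job_id": 1, "job_state": ["RUNNING"]},
    {"job_id": 2, "job_state": ["FAILED"]},
    {"job_id": 3, "job_state": "TIMEOUT"},
    {"job_id": 4, "job_state": ["COMPLETED"]},
]
remaining = [job["job_id"] for job in active_jobs(sample)]
```

Only the running job survives the filter in this sample. Hoisting the state list into a constant would also let other code reuse the same definition of "terminated", though the PR keeps it local to the method.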

     def jobs_states(self):
         jobs = {
             "running": 0,
             "completed": 0,
+            "failed": 0,
+            "timeout": 0,
             "completing": 0,
             "cancelled": 0,
             "pending": 0,
@@ -125,6 +131,10 @@ def jobs_states(self):
                 jobs["running"] += 1
             elif "COMPLETED" in job["job_state"]:
                 jobs["completed"] += 1
+            elif "FAILED" in job["job_state"]:
+                jobs["failed"] += 1
+            elif "TIMEOUT" in job["job_state"]:
+                jobs["timeout"] += 1
             elif "COMPLETING" in job["job_state"]:
                 jobs["completing"] += 1
             elif "CANCELLED" in job["job_state"]:
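
Because the counters are filled by an `if`/`elif` chain, each job is counted once, under the first state that matches. A sketch of the same first-match counting driven by an ordered table, assuming `job_state` is a list of flags or a string. `PRECEDENCE` and `count_job_states` are hypothetical names, and the `RUNNING` test, the `PENDING` branch and the `unknown` fallback are assumptions, since those parts of the method fall outside the hunk.

```python
# (flag, counter key) pairs in the order the if/elif chain tests them.
# The RUNNING condition and the PENDING entry are assumed, not shown in the diff.
PRECEDENCE = [
    ("RUNNING", "running"),
    ("COMPLETED", "completed"),
    ("FAILED", "failed"),
    ("TIMEOUT", "timeout"),
    ("COMPLETING", "completing"),
    ("CANCELLED", "cancelled"),
    ("PENDING", "pending"),
]


def count_job_states(jobs: list) -> dict:
    """Count jobs per state; the first matching flag wins."""
    counts = {key: 0 for _, key in PRECEDENCE}
    counts["unknown"] = 0  # assumed fallback bucket
    for job in jobs:
        for flag, key in PRECEDENCE:
            if flag in job["job_state"]:
                counts[key] += 1
                break
        else:
            # No known flag matched this job.
            counts["unknown"] += 1
    return counts


# Made-up sample data, for illustration only.
counts = count_job_states(
    [
        {"job_state": ["RUNNING"]},
        {"job_state": ["RUNNING", "COMPLETING"]},
        {"job_state": ["FAILED"]},
        {"job_state": ["TIMEOUT"]},
        {"job_state": ["SUSPENDED"]},
    ]
)
```

In this sample the job flagged both `RUNNING` and `COMPLETING` is counted as running, which illustrates why the order of the branches matters when a job carries more than one flag.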