Releases: volcano-sh/volcano

v1.5.0-Beta

31 Dec 09:53

What's New

Support Task Dependency

In mainstream computing frameworks such as MPI and TensorFlow, different pods play different roles, for example master and worker. Depending on how each framework works, either the master or the worker has to be started first. This feature provides the ability to launch tasks in the correct order. For more details, refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/task-launch-order-within-job.md. (#1920, #1833, @hwdef @shinytang6 @Thor-wl )
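
The ordering requirement can be viewed as a dependency graph between tasks. The sketch below is an illustration, not Volcano's implementation: it assumes a hypothetical mapping from each task name to the tasks it must wait for, and derives a launch order with Python's standard graphlib module (Python 3.9+). A dependency cycle raises CycleError, which a real controller would have to reject.

```python
from graphlib import CycleError, TopologicalSorter


def launch_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return task names ordered so that each task follows the tasks it depends on."""
    # TopologicalSorter expects a mapping of node -> predecessors and raises
    # CycleError if the dependencies contain a cycle.
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical MPI-style job: the master waits until the workers are up.
order = launch_order({"master": ["worker"], "worker": []})
print(order)  # ['worker', 'master']
```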

Support Reserve Resource for Queue

This feature provides the ability to reserve resources for specified queues, so that there are always guaranteed resources for urgent jobs instead of having to wait for resources to be released or for preemption. For more details, refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/queue-guarantee-resource-reservation-design.md (#1905, #1904, @qiankunli )
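
One way to picture a queue guarantee: capacity that another queue has reserved but not yet used is withheld from everyone else. The sketch below assumes a single scalar resource and a hypothetical Queue type with guarantee and allocated fields; the design document linked above works on full resource vectors and may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    guarantee: float        # hypothetical: CPU cores reserved for this queue
    allocated: float = 0.0  # CPU cores currently used by the queue's jobs


def available_for(queue: Queue, queues: list[Queue], total: float) -> float:
    """Capacity `queue` may still allocate without touching other queues' reservations."""
    used = sum(q.allocated for q in queues)
    # Unused guarantees of other queues are held back, so their urgent jobs
    # never have to wait for resources to be released.
    reserved_elsewhere = sum(
        max(q.guarantee - q.allocated, 0.0) for q in queues if q is not queue
    )
    return max(total - used - reserved_elsewhere, 0.0)


urgent = Queue("urgent", guarantee=30)
batch = Queue("batch", guarantee=0, allocated=60)
print(available_for(batch, [urgent, batch], total=100))   # 10.0
print(available_for(urgent, [urgent, batch], total=100))  # 40.0
```

Even though 40 cores are idle, the batch queue can only take 10 of them because 30 are held for the urgent queue.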

Support Specified Nodes for Volcano in Cluster

In some scenarios, such as clusters with multiple schedulers, Volcano should be responsible for only part of the nodes in the cluster. This feature enables users to configure which nodes Volcano is responsible for. For more details, refer to #1834 (#1821, @qiankunli )

Add TensorFlow Job Plugin

Volcano provides a unified object for job management, which allows users to run AI training workloads such as TensorFlow, PyTorch, MXNet and MPI as Volcano Jobs and benefit from enhanced lifecycle management. However, this is somewhat complex for some users. This feature adds a TensorFlow plugin based on the Volcano job plugin framework, which reduces the complexity of running TensorFlow with Volcano and makes it easier to use. For more details, refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/distributed-framework-plugins.md (#1874, @LuBingtan )
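
TensorFlow's distributed runtime discovers its peers through the TF_CONFIG environment variable, a JSON document with a "cluster" map from task type to addresses and a "task" entry naming the current replica. A plugin of this kind can generate that variable for each pod. The sketch below assumes pods are reachable as <job>-<task>-<index>.<job> through a headless service and listen on port 2222; both are assumptions for illustration, not confirmed details of the Volcano plugin.

```python
import json


def tf_config(job: str, replicas: dict[str, int], task_type: str, index: int,
              port: int = 2222) -> str:
    """Build a TF_CONFIG value for one replica (hypothetical naming scheme)."""
    cluster = {
        ttype: [f"{job}-{ttype}-{i}.{job}:{port}" for i in range(count)]
        for ttype, count in replicas.items()
    }
    return json.dumps({"cluster": cluster, "task": {"type": task_type, "index": index}})


cfg = json.loads(tf_config("mnist", {"ps": 1, "worker": 2}, "worker", 1))
print(cfg["cluster"]["worker"])  # ['mnist-worker-0.mnist:2222', 'mnist-worker-1.mnist:2222']
print(cfg["task"])               # {'type': 'worker', 'index': 1}
```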

Other Notable Changes

Bug Fixes

v1.4.0

18 Sep 10:14

Changes since v1.4.0-Beta

  • fix bug of queue label not being recorded in metrics(#1722, @lowang-bh )
  • fix: do not set taskInfo.NodeName to empty when nodeInfo.RemoveTask is called(#1716, @eggiter )
  • fix(underused): do not check overused when there is no UnderUsedResourceFn added(#1726, @eggiter )
  • pass kubeClient to admission service(#1730, @hack-qian)
  • upgrade k8s to v1.19.11 in response to a security notification(#1733, @Thor-wl )
  • optimize some logs in admission process(#1738, @huone1 )
  • change the Mutex to RWMutex in predicateCache(#1741, @william-wang )
  • fix vcjob not working when mounting volumes(#1742, @Thor-wl )
  • cancel skipping of e2e cases about pod affinity(#1743, @Thor-wl )
  • fix bug that vcjob is not completed when maxRetry is 1(#1746, @Thor-wl )
  • fix gen-admission-secret.sh(#1752, @yahaa )

v1.4.0-Beta

04 Sep 11:40
9aed970

What's New

1. Support multi-scheduler

In a Kubernetes cluster with multiple schedulers, different kinds of workloads sometimes need to be mapped to particular schedulers. For example, K8s native workloads such as Deployments in the kube-system namespace are mapped to default-scheduler, while AI and Big Data jobs are mapped to Volcano. This feature aims to do this mapping automatically. For more details, refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/multi-scheduler.md. (#1576, #1521, @huone1 @william-wang )
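
The automatic mapping can be thought of as an ordered rule table in which the first matching rule picks the scheduler name for a workload. The rules, workload kinds and fallback below are hypothetical; the multi-scheduler design document describes the mechanism Volcano actually uses.

```python
from typing import FrozenSet, NamedTuple, Optional


class Rule(NamedTuple):
    namespace: Optional[str]         # None matches any namespace
    kinds: Optional[FrozenSet[str]]  # None matches any workload kind
    scheduler: str


# Hypothetical rules: system workloads stay on the default scheduler,
# batch/AI job kinds go to Volcano. The first matching rule wins.
RULES = [
    Rule("kube-system", None, "default-scheduler"),
    Rule(None, frozenset({"batch.volcano.sh/Job", "MPIJob", "TFJob"}), "volcano"),
]


def pick_scheduler(namespace: str, kind: str, fallback: str = "default-scheduler") -> str:
    for rule in RULES:
        if rule.namespace is not None and rule.namespace != namespace:
            continue
        if rule.kinds is not None and kind not in rule.kinds:
            continue
        return rule.scheduler
    return fallback


print(pick_scheduler("kube-system", "Deployment"))  # default-scheduler
print(pick_scheduler("ml-team", "TFJob"))           # volcano
```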

2. Support proportion of resources for GPU node

To make full use of scarce resources such as GPUs, one solution is to bind them with other resources in fixed proportions. For example, it is common for many CPU-intensive workloads to be scheduled to GPU nodes; when GPU-intensive workloads arrive, they cannot be scheduled because the GPU nodes lack CPU or memory. If workloads that require GPU, CPU and memory within a certain range are scheduled to GPU nodes first, the GPUs can be fully used. For more details, refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/proportional.md. (#1527, @king-jingxiang )
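
A minimal sketch of such a proportion check, assuming a hypothetical per-GPU amount of CPU and memory that must stay free for every idle GPU: a task is admitted to a GPU node only if, after placing it, the remaining CPU and memory still cover the remaining idle GPUs at that ratio. Units and ratio values are illustrative, not the plugin's configuration format.

```python
def fits_gpu_node(idle: dict[str, float], request: dict[str, float],
                  per_gpu: dict[str, float]) -> bool:
    """Return True if `request` can be placed while keeping `per_gpu` resources
    free for every GPU that stays idle on the node."""
    # Plain capacity check first.
    if any(amount > idle.get(res, 0) for res, amount in request.items()):
        return False
    gpus_left = idle.get("gpu", 0) - request.get("gpu", 0)
    # Every remaining GPU must still have its share of CPU/memory available.
    return all(
        idle.get(res, 0) - request.get(res, 0) >= gpus_left * ratio
        for res, ratio in per_gpu.items()
    )


node = {"gpu": 2, "cpu": 10, "memory": 20}   # idle resources (hypothetical units)
ratio = {"cpu": 4, "memory": 8}             # reserved per idle GPU (hypothetical)
print(fits_gpu_node(node, {"cpu": 4, "memory": 4}, ratio))            # False
print(fits_gpu_node(node, {"gpu": 1, "cpu": 4, "memory": 4}, ratio))  # True
```

The CPU-only task is rejected because it would leave 6 CPUs for 2 idle GPUs, below the 8 the ratio requires; the GPU task consumes a GPU along with its CPU, so it fits.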

3. Support CPU NUMA-Aware scheduling

For CPU-intensive workloads, especially in the AI, Big Data and HPC fields, enabling CPU NUMA awareness can result in a significant performance improvement. For more details, refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/numa-aware.md. (#1493, @huone1 )
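
NUMA-aware placement usually tries to keep a workload's CPUs within a single NUMA node, avoiding slower cross-node memory access. The sketch below assumes a hypothetical map of free CPUs per NUMA node and uses a best-fit rule (the fitting node with the fewest free CPUs); it is only an illustration, not the policy defined in the design document.

```python
from typing import Optional


def pick_numa_node(free_cpus: dict[int, int], request: int) -> Optional[int]:
    """Best fit: the NUMA node with the fewest free CPUs that can still hold the request."""
    candidates = [(free, node) for node, free in free_cpus.items() if free >= request]
    # Returning None means the request would have to span NUMA nodes.
    return min(candidates)[1] if candidates else None


free = {0: 3, 1: 8}               # hypothetical free CPUs per NUMA node
print(pick_numa_node(free, 2))    # 0
print(pick_numa_node(free, 4))    # 1
print(pick_numa_node(free, 9))    # None
```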

4. Provide framework of stress test

In this release, a framework for Volcano stress testing is provided. (#1516, @rudeigerc )

Other Notable Changes

Bug Fixes

v1.3.0

27 May 08:18
44ec8eb

What's New

1. Support minAvailable at task level

Just like minAvailable at the job level, minAvailable at the task level treats the replicas of the same task as a group and decides whether to schedule the pods of that task. Only when minAvailable is met will the pods be scheduled together. For more details, refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/task-minavailable.md. (#1459, @shinytang6 )
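
The difference between job-level and task-level gang checks can be shown with a small predicate. The sketch assumes hypothetical inputs: how many pods of each task could be placed right now, each task's minAvailable, and the job's minAvailable.

```python
def can_dispatch(placeable: dict[str, int], task_min: dict[str, int], job_min: int) -> bool:
    """Gang check at both levels: the job as a whole and every task on its own."""
    if sum(placeable.values()) < job_min:
        return False
    return all(placeable.get(task, 0) >= need for task, need in task_min.items())


task_min = {"ps": 1, "worker": 2}  # hypothetical per-task minAvailable
print(can_dispatch({"ps": 1, "worker": 2}, task_min, job_min=3))  # True
# Passes the job-level check (4 >= 3) but no ps pod can be placed:
print(can_dispatch({"ps": 0, "worker": 4}, task_min, job_min=3))  # False
```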

2. Support minSuccess for Job

Support configuring the minimum number of pods of a job that must succeed (minSuccess). This makes it possible to mark the job status based on whether minSuccess has been reached, which accelerates job status judgement. (#1384, @zen-xu )
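
A sketch of how minSuccess could shortcut the job status decision, assuming hypothetical pod counters and a simplified three-state outcome; Volcano's real job state machine has more phases and also consults lifecycle policies.

```python
def job_phase(succeeded: int, failed: int, total: int, min_success: int) -> str:
    """Simplified status decision driven by minSuccess (hypothetical rules)."""
    if succeeded >= min_success:
        return "Completed"  # enough pods succeeded; no need to wait for the rest
    if succeeded + failed < total:
        return "Running"    # some pods are still running or pending
    return "Failed"         # every pod finished but minSuccess was never reached


print(job_phase(succeeded=3, failed=0, total=5, min_success=3))  # Completed
print(job_phase(succeeded=2, failed=1, total=5, min_success=3))  # Running
print(job_phase(succeeded=2, failed=3, total=5, min_success=3))  # Failed
```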

3. Support task-topology

In big data processing jobs such as TensorFlow and Spark, tasks transmit large amounts of data between each other, so transmission delay takes up a large proportion of job execution time. The task-topology plugin was therefore proposed to adjust the scheduling strategy according to the transmission topology inside a job, in order to reduce the amount of data transmitted between nodes, decrease the share of transmission delay in job execution time, and improve resource utilization. For more details, refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/task-topology-plugin.md. (#1353, @jiangkaihua )
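
The quantity such a plugin tries to reduce can be written down directly: the volume of data exchanged between tasks that end up on different nodes. The sketch assumes a hypothetical traffic matrix between task pairs and compares two placements; it does not reproduce the plugin's actual scoring.

```python
def cross_node_traffic(placement: dict[str, str],
                       traffic: dict[tuple[str, str], float]) -> float:
    """Sum the traffic of every task pair that ends up on different nodes."""
    return sum(v for (a, b), v in traffic.items() if placement[a] != placement[b])


# Hypothetical traffic volumes between tasks of one TensorFlow job.
traffic = {("ps-0", "worker-0"): 10.0, ("ps-0", "worker-1"): 10.0, ("worker-0", "worker-1"): 1.0}
spread = {"ps-0": "n1", "worker-0": "n2", "worker-1": "n3"}
packed = {"ps-0": "n1", "worker-0": "n1", "worker-1": "n2"}
print(cross_node_traffic(spread, traffic))  # 21.0
print(cross_node_traffic(packed, traffic))  # 11.0
```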

4. Create new repository volcano.sh/apis

Separate the APIs from volcano.sh/volcano. Downstream projects can import the CRD clientset/lister/informer built against whichever K8s version they need. (https://github.com/volcano-sh/apis, @Thor-wl )

Other Notable Changes

Bug Fixes

  • fix: preemptor lost when considering preemption between tasks within the same job (#1453, @lowang-bh )
  • scheduler needs the configmap role to enable the leader election function(#1443, @wpeng102 )
  • fix(scheduler): use nodeMap to fix anti-affinity problem(#1430, @shinytang6 )
  • fix: use task.Name to make podName in admission(#1412, @merryzhou )
  • add bindingTasks to judge whether to add a node to the snapshot.(#1388, @zen-xu )
  • fix nil pointer raised when reserving for a deleted targetJob(#1371, @zen-xu )
  • fix sla jobOrderFn when sla is not set(#1365, @merryzhou )
  • fix: OutOfCpu may occur when some pods include init containers(#1364, @huone1 )
  • fix wrong Pipeline in action allocate(#1360, @yzs981130 )
  • fix: prevent SelectBestNode func from panicking(#1344, @yahaa )
  • fix(scheduler): move JobInfo helper functions to method(#1343, @Thrimbda )

v1.2.0

27 Feb 07:11

What's New

1. Add TDM plugin

The TDM (Time Division Multiplexing) plugin provides a mechanism that allows nodes to be shared between K8S and other clusters (such as YARN) in separate time windows.(#1269, @yahaa )
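
At its core, time division multiplexing needs a check of whether the current time falls inside a node's sharing window, including windows that cross midnight. The window boundaries below are hypothetical; how the TDM plugin is actually configured is not covered by these notes.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` lies in [start, end), allowing windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


# Hypothetical window: the node is lent to Kubernetes from 22:00 to 06:00.
start, end = time(22, 0), time(6, 0)
print(in_window(time(23, 30), start, end))  # True
print(in_window(time(12, 0), start, end))   # False
print(in_window(time(5, 59), start, end))   # True
```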

2. Add SLA plugin

The SLA (Service Level Agreement) plugin supports the job resource reservation feature. Users can set an SLA for jobs to ensure that specified jobs are scheduled in time. It provides a better design and implementation for job resource reservation. (#1303, @jiangkaihua )
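
One way an SLA can influence scheduling is job ordering: jobs that have already waited longer than their SLA move to the front. The sketch assumes a hypothetical tuple of job name, creation time and SLA waiting time in seconds; it is not the plugin's actual ordering logic.

```python
from typing import Optional


def order_jobs(jobs: list[tuple[str, float, Optional[float]]], now: float) -> list[str]:
    """Jobs that exceeded their SLA waiting time first, then oldest first."""
    def key(job: tuple[str, float, Optional[float]]) -> tuple[bool, float]:
        _, created, sla = job
        overdue = sla is not None and now - created >= sla
        return (not overdue, created)  # False sorts before True
    return [name for name, _, _ in sorted(jobs, key=key)]


jobs = [("a", 100.0, None), ("b", 500.0, 300.0), ("c", 200.0, 900.0)]
print(order_jobs(jobs, now=1000.0))  # ['b', 'a', 'c']
```

Job "b" is the youngest, but it has waited 500 seconds against a 300-second SLA, so it is ordered ahead of the older jobs.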

Other Notable Changes

Bug Fixes

v1.1.2

23 Feb 06:30

Changes since v1.1.1

  • bug fix: use musl-gcc to build the image, because the default vc-scheduler image is based on Alpine, which only has musl libc(#1225, @zen-xu)

v1.1.1

31 Dec 07:40
acc6a56

What's New

1. support loading custom plugins into vc-scheduler

Separate the plugin implementation from the scheduler. Support implementing custom plugins and loading them into vc-scheduler dynamically.(#1218, @zen-xu)

2. add MaxRequeueNum as a controller-manager param

Support configuring MaxRequeueNum as a parameter of the controller manager; it defaults to 15 times.(#1087, @shinytang6)
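
The parameter bounds how many times a failed item is retried before the controller gives up on it. A sketch of that bookkeeping, assuming a hypothetical RequeueTracker class in place of a real rate-limited work queue:

```python
from collections import defaultdict


class RequeueTracker:
    """Counts retries per key and gives up after max_requeue_num (hypothetical helper)."""

    def __init__(self, max_requeue_num: int = 15) -> None:
        self.max_requeue_num = max_requeue_num
        self.retries: defaultdict[str, int] = defaultdict(int)

    def on_error(self, key: str) -> bool:
        """Return True if the key should be requeued, False if it is dropped."""
        if self.retries[key] < self.max_requeue_num:
            self.retries[key] += 1
            return True
        del self.retries[key]  # forget the key once the limit is reached
        return False

    def on_success(self, key: str) -> None:
        self.retries.pop(key, None)  # a success resets the retry count


tracker = RequeueTracker(max_requeue_num=2)
print([tracker.on_error("ns/job-a") for _ in range(3)])  # [True, True, False]
```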

3. add design documentation of CPU careful regulation

Provide the design of CPU careful regulation at the socket level.(#1051, @ProgramerGu)

Other Notable Changes

Bug Fixes

v1.1.0

30 Oct 15:35

What's New

1. Add monitor component

The monitor component adds support for displaying some basic metrics about Volcano.(#1066, @alcorj-mizar)

2. Support resource reservation for big job automatically

Reserve resources for the pending job that has the highest priority among pending jobs and has been waiting for a long time. The scheduler recognizes such big jobs automatically.(#1044, @Thor-wl)
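
The selection rule described above (highest priority first, then longest waiting) can be sketched as follows; the job tuple layout and the min_wait threshold that stands in for "a long time" are hypothetical.

```python
from typing import Optional


def pick_reservation_target(pending: list[tuple[str, int, float]], now: float,
                            min_wait: float) -> Optional[str]:
    """Highest-priority pending job that has waited at least `min_wait`;
    ties go to the job that has waited longest."""
    waiting = [job for job in pending if now - job[2] >= min_wait]
    if not waiting:
        return None
    name, _, _ = max(waiting, key=lambda job: (job[1], -job[2]))
    return name


# (name, priority, created_at) tuples; all values are hypothetical.
pending = [("a", 10, 50.0), ("b", 10, 20.0), ("c", 5, 0.0), ("d", 20, 990.0)]
print(pick_reservation_target(pending, now=1000.0, min_wait=300.0))  # b
```

Job "d" has the highest priority but has not waited long enough yet, so the older of the two priority-10 jobs is chosen.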

3. Support HDRF

Hierarchical Dominant Resource Fairness (HDRF) is configured with a weighted tree in which each node has a positive weight value.(#928, @ggaaooppeenngg)
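
A simplified sketch of weighted hierarchical fairness: starting at the root, descend into the child with the smallest dominant share divided by its weight until a leaf queue is reached. The tree, weights and capacities are hypothetical, and the full HDRF algorithm handles further cases (such as blocked subtrees) that this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class QueueNode:
    name: str
    weight: float
    usage: dict[str, float] = field(default_factory=dict)  # set on leaf queues
    children: list["QueueNode"] = field(default_factory=list)

    def total_usage(self) -> dict[str, float]:
        if not self.children:
            return self.usage
        total: dict[str, float] = {}
        for child in self.children:
            for res, amount in child.total_usage().items():
                total[res] = total.get(res, 0.0) + amount
        return total


def dominant_share(node: QueueNode, capacity: dict[str, float]) -> float:
    usage = node.total_usage()
    return max(usage.get(res, 0.0) / cap for res, cap in capacity.items())


def next_leaf(root: QueueNode, capacity: dict[str, float]) -> str:
    """Walk down the tree, always picking the child with the lowest weighted dominant share."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: dominant_share(c, capacity) / c.weight)
    return node.name


capacity = {"cpu": 100.0, "memory": 100.0}
root = QueueNode("root", 1.0, children=[
    QueueNode("a", 1.0, usage={"cpu": 20.0, "memory": 40.0}),
    QueueNode("b", 2.0, children=[
        QueueNode("b1", 1.0, usage={"cpu": 30.0, "memory": 10.0}),
        QueueNode("b2", 3.0, usage={"cpu": 10.0, "memory": 30.0}),
    ]),
])
print(next_leaf(root, capacity))  # b2
```

Queues "a" and "b" both have a dominant share of 0.4, but "b" has twice the weight, so the next allocation goes into its subtree, where "b2" wins for the same reason.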

Other Notable Changes

Bug Fixes

v0.4.2

04 Aug 12:58
89c2fa6
  • Fix queue capability validation failed when some running jobs finished or deleted (#959, @Thor-wl)

v1.0.1

30 Jul 13:50
68f40e2

Changelog since v1.0.0