[QST] RAPIDS doesn't work if I use more than one executor per server in YARN mode. What can I do? #5393
-
What is your question?

I have two servers; each server's software configuration is as follows: Spark 3.0 + rapids-0.2 + Hadoop 3.2.1. The command I run my job with, and some of the logs, are below:

(2) Part of the driver's log:

20/09/01 17:46:50 [dispatcher-BlockManagerMaster] INFO BlockManagerMasterEndpoint: Registering block manager yq01-sys-hic-v100-box-a223-0117.yq01.baidu.com:39805 with 36.9 GiB RAM, BlockManagerId(4, yq01-sys-hic-v100-box-a223-0117.yq01.baidu.com, 39805, None)
[2020-09-01 17:46:50.957] Container exited with a non-zero exit code 1. Error file: prelaunch.err.

(3) Part of the executor's log: (several error screenshots were attached here and are not preserved)

I found the driver's error message "Spark GPU Plugin only supports 1 gpu per executor" in the rapids source code.

When I run getGpusResources.sh, the output is:

{"name": "gpu", "addresses": ["0","1","2","3","4","5","6","7"]}

My question: if I have a server with many GPU cards, will that source code always log the error "Spark GPU Plugin only supports 1 gpu per executor"?
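For context on the error above: a Spark GPU discovery script must print a single JSON object listing the addresses an executor may use, and the script shown returns all eight GPUs, which trips the one-GPU-per-executor check. A minimal sketch of a wrapper that hands out only one address is below; the `python3` filtering is an illustration, not the official spark-rapids script, and a real wrapper would have to pick a *free* GPU rather than always the first one.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper sketch: take the full discovery output and keep
# only the first GPU address, so each executor sees exactly one GPU.
FULL='{"name": "gpu", "addresses": ["0","1","2","3","4","5","6","7"]}'
echo "$FULL" | python3 -c 'import sys, json
d = json.load(sys.stdin)
d["addresses"] = d["addresses"][:1]   # report a single address
print(json.dumps(d))'
```

Running this prints {"name": "gpu", "addresses": ["0"]}, the shape Spark expects from `spark.executor.resource.gpu.discoveryScript`.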
Replies: 4 comments
-
It depends on how you are running YARN. It looks like you are not running with isolation, so you need a different method of making sure a GPU is only assigned to a single executor. See our instructions here: https://nvidia.github.io/spark-rapids/docs/get-started/getting-started.html#yarn-without-isolation Or, if you can run YARN with Docker and cgroups enabled to get isolation, that would also work. |
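To make the answer above concrete, here is a hedged sketch of how such a launch might look on YARN without isolation. The jar variables, paths, and the discovery-script name are assumptions for illustration; the `spark.plugins`, `spark.executor.resource.gpu.*`, and `spark.task.resource.gpu.amount` settings are standard Spark/spark-rapids configuration, but check the linked getting-started guide for the exact values for your versions.

```shell
# Sketch of a spark-shell launch on YARN without GPU isolation.
# getGpusResources.sh here stands in for a script that reports a
# single free GPU to each executor (see the linked docs).
$SPARK_HOME/bin/spark-shell \
  --master yarn \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=1 \
  --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
  --files ./getGpusResources.sh \
  --jars ${SPARK_RAPIDS_PLUGIN_JAR},${SPARK_CUDF_JAR}
```

The key point is `spark.executor.resource.gpu.amount=1`: the plugin in this version supports exactly one GPU per executor, so without cgroup isolation the discovery script is what keeps two executors from claiming the same device.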
-
Hi @tgravescs, it works, but I found a new question: RAPIDS works only on the master node and doesn't work on the worker nodes. My command is as follows: $SPARK_HOME/bin/spark-shell |
-
Hi @tgravescs |
-
By your last comment I think you have figured it out, so I'm going to close this. If not, please reopen and let me know what issue you are seeing. Spark should run executors on all of the YARN NodeManager nodes. |