-
Notifications
You must be signed in to change notification settings - Fork 449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the issue that pyspark3 not supported by Livy 0.4 #421
Conversation
@@ -32,7 +32,7 @@ def override_all(obj): | |||
|
|||
# Helpers | |||
|
|||
def get_livy_kind(language): | |||
def get_session_kind(language): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please revert the change in name? Was it done for any particular reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes there is a reason for that change.
Basically Livy 0.4 onwards has removed "pyspark3" as session kind, which is a breaking change. For sparkmagic, we switch to use the pyspark setting "spark.pyspark.python" to specify python2/python3 instead of the old approach by using the Livy session kind.
As you can see from my change, we are sticking with pyspark/pyspark3 as the session kind definition for sparkmagic, and then translate that to the correct config settings which are sent to Livy when creating a session. I am renaming the method name because this method is now using the new session definition (for sparkmagic), as opposed to livy session definition (which does not have pyspark3 now).
Does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused. If Livy 0.4 has removed "pyspark3" as session kind why is it still there in the master?
https://github.com/cloudera/livy/blob/master/core/src/main/scala/com/cloudera/livy/sessions/Kind.scala#L47
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has anyone tested this on livy 0.5?
properties[constants.LIVY_CONF_PARAM] = python_properties | ||
elif kind == constants.SESSION_KIND_PYSPARK3: | ||
properties[constants.LIVY_KIND_PARAM] = constants.SESSION_KIND_PYSPARK | ||
python_properties = {constants.PYSPARK_PYTHON_PARAM: constants.LANG_PYTHON3} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The python3 path should be configurable, and not a constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this kind of replacement of properties breaks configuration JSON, e.g. below is before and after. Basically it lost 'spark.jars.packages' I had passed over in the conf.
before
properties: {'driverMemory': '1G', 'executorCores': 4, 'kind': 'pyspark3', 'conf': {'spark.pyspark.python': 'python3', 'spark.jars.packages': 'com.databricks:spark-avro_2.11:4.0.0'}, 'heartbeatTimeoutInSecond': 120}
after
properties: {'driverMemory': '1G', 'executorCores': 4, 'kind': 'pyspark', 'conf': {'spark.pyspark.python': 'python3'}, 'heartbeatTimeoutInSecond': 120}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation of _translate_to_livy_kind
overrides additional parameters passed in "conf", e.g. spark.jars.packages
.
@@ -32,7 +32,7 @@ def override_all(obj): | |||
|
|||
# Helpers | |||
|
|||
def get_livy_kind(language): | |||
def get_session_kind(language): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused. If Livy 0.4 has removed "pyspark3" as session kind why is it still there in the master?
https://github.com/cloudera/livy/blob/master/core/src/main/scala/com/cloudera/livy/sessions/Kind.scala#L47
properties[constants.LIVY_CONF_PARAM] = python_properties | ||
elif kind == constants.SESSION_KIND_PYSPARK3: | ||
properties[constants.LIVY_KIND_PARAM] = constants.SESSION_KIND_PYSPARK | ||
python_properties = {constants.PYSPARK_PYTHON_PARAM: constants.LANG_PYTHON3} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this kind of replacement of properties breaks configuration JSON, e.g. below is before and after. Basically it lost 'spark.jars.packages' I had passed over in the conf.
before
properties: {'driverMemory': '1G', 'executorCores': 4, 'kind': 'pyspark3', 'conf': {'spark.pyspark.python': 'python3', 'spark.jars.packages': 'com.databricks:spark-avro_2.11:4.0.0'}, 'heartbeatTimeoutInSecond': 120}
after
properties: {'driverMemory': '1G', 'executorCores': 4, 'kind': 'pyspark', 'conf': {'spark.pyspark.python': 'python3'}, 'heartbeatTimeoutInSecond': 120}
livy 0.4.0 pyspark3 compatibility, fixing jupyter-incubator#421
This reverts commit 0c1bb74.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on the bug that was fixed here? Cursory review suggests code has been refactored to make liysession.py concise and python names configurable.
I don't see a logical error from the previous version that got addressed here, but sorry if I missed it somehow. Thanks for your clarifications.
* livy 0.4.0 pyspark3 compatibility, fixing jupyter-incubator#421 (#1) * Update version to 0.12.5.1 - Keeping compat with public version of sparkmagic by staying at 0.12.5.n * Modified the livy session initial call for SparkR sessions (#2) * Add minor version change to reflect update vs. main repo's SparkMagic
Hi I have tried to to fix this compatibility issue in a different way in this PR: #507 |
@gm-spacagna I have a PR that's failing build #462. Any tips on how to resolve it? |
@gm-spacagna @aggFTW Can you please answer the question? Would like someone to look at that PRs and it's been almost a year, so want to get Travis CI working. |
Also tagging @apetresc in case he knows the answer to @ranjitiyer's question. Thanks! |
This is superseded by #540, which I hope to get merged soon. Please try it! |
Livy 0.4 onwards removed the "pyspark3" as the session kind, so sparkmagic receives a 400 response when we create a pyspark3 notebook on Jupyter. So pyspark3 kernel is not working due to this Livy breaking change.
In this PR, we are specifying "spark.yarn.appMasterEnv.PYSPARK_PYTHON" to "python" or "python3" to kick off different python versions, which is the new approach to differentiate pyspark/pyspark3 sessions. This approach works with the old livy version (e.g. 0.3) as well as 0.4 onwards. Also "spark.yarn.appMasterEnv.PYSPARK_PYTHON" is supported both spark 1.6 and spark 2.