Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the issue that pyspark3 not supported by Livy 0.4 #421

Closed
wants to merge 4 commits into from

Conversation

taoliseki
Copy link

Livy 0.4 onwards removed the "pyspark3" as the session kind, so sparkmagic receives a 400 response when we create a pyspark3 notebook on Jupyter. So pyspark3 kernel is not working due to this Livy breaking change.

In this PR, we are specifying "spark.yarn.appMasterEnv.PYSPARK_PYTHON" to "python" or "python3" to kick off different python versions, which is the new approach to differentiate pyspark/pyspark3 sessions. This approach works with the old livy version (e.g. 0.3) as well as 0.4 onwards. Also "spark.yarn.appMasterEnv.PYSPARK_PYTHON" is supported both spark 1.6 and spark 2.

@@ -32,7 +32,7 @@ def override_all(obj):

# Helpers

def get_livy_kind(language):
def get_session_kind(language):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please revert the change in name? Was it done for any particular reason?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there is a reason for that change.

Basically Livy 0.4 onwards has removed "pyspark3" as session kind, which is a breaking change. For sparkmagic, we switch to use the pyspark setting "spark.pyspark.python" to specify python2/python3 instead of the old approach by using the Livy session kind.

As you can see from my change, we are sticking with pyspark/pyspark3 as the session kind definition for sparkmagic, and then translate that to the correct config settings which are sent to Livy when creating a session. I am renaming the method name because this method is now using the new session definition (for sparkmagic), as opposed to livy session definition (which does not have pyspark3 now).

Does it make sense?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused. If Livy 0.4 has removed "pyspark3" as session kind why is it still there in the master?
https://github.com/cloudera/livy/blob/master/core/src/main/scala/com/cloudera/livy/sessions/Kind.scala#L47

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has anyone tested this on livy 0.5?

properties[constants.LIVY_CONF_PARAM] = python_properties
elif kind == constants.SESSION_KIND_PYSPARK3:
properties[constants.LIVY_KIND_PARAM] = constants.SESSION_KIND_PYSPARK
python_properties = {constants.PYSPARK_PYTHON_PARAM: constants.LANG_PYTHON3}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The python3 path should be configurable, and not a constant.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this kind of replacement of properties breaks configuration JSON, e.g. below is before and after. Basically it lost 'spark.jars.packages' I had passed over in the conf.

before

 properties: {'driverMemory': '1G', 'executorCores': 4, 'kind': 'pyspark3', 'conf': {'spark.pyspark.python': 'python3', 'spark.jars.packages': 'com.databricks:spark-avro_2.11:4.0.0'}, 'heartbeatTimeoutInSecond': 120}

after

 properties: {'driverMemory': '1G', 'executorCores': 4, 'kind': 'pyspark', 'conf': {'spark.pyspark.python': 'python3'}, 'heartbeatTimeoutInSecond': 120}

Copy link

@mpekalski mpekalski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation of _translate_to_livy_kind overrides additional parameters passed in "conf", e.g. spark.jars.packages.

@@ -32,7 +32,7 @@ def override_all(obj):

# Helpers

def get_livy_kind(language):
def get_session_kind(language):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused. If Livy 0.4 has removed "pyspark3" as session kind why is it still there in the master?
https://github.com/cloudera/livy/blob/master/core/src/main/scala/com/cloudera/livy/sessions/Kind.scala#L47

properties[constants.LIVY_CONF_PARAM] = python_properties
elif kind == constants.SESSION_KIND_PYSPARK3:
properties[constants.LIVY_KIND_PARAM] = constants.SESSION_KIND_PYSPARK
python_properties = {constants.PYSPARK_PYTHON_PARAM: constants.LANG_PYTHON3}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this kind of replacement of properties breaks configuration JSON, e.g. below is before and after. Basically it lost 'spark.jars.packages' I had passed over in the conf.

before

 properties: {'driverMemory': '1G', 'executorCores': 4, 'kind': 'pyspark3', 'conf': {'spark.pyspark.python': 'python3', 'spark.jars.packages': 'com.databricks:spark-avro_2.11:4.0.0'}, 'heartbeatTimeoutInSecond': 120}

after

 properties: {'driverMemory': '1G', 'executorCores': 4, 'kind': 'pyspark', 'conf': {'spark.pyspark.python': 'python3'}, 'heartbeatTimeoutInSecond': 120}

mpekalski pushed a commit to mpekalski/sparkmagic that referenced this pull request Feb 6, 2018
mpekalski pushed a commit to mpekalski/sparkmagic that referenced this pull request Feb 6, 2018
ckbhatt pushed a commit to ckbhatt/sparkmagic that referenced this pull request Apr 18, 2018
jimdowling pushed a commit to logicalclocks/sparkmagic that referenced this pull request May 19, 2018
livy 0.4.0 pyspark3 compatibility, fixing jupyter-incubator#421
jimdowling pushed a commit to logicalclocks/sparkmagic that referenced this pull request May 21, 2018
Copy link

@ranjitiyer ranjitiyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on the bug that was fixed here? Cursory review suggests code has been refactored to make liysession.py concise and python names configurable.

I don't see a logical error from the previous version that got addressed here, but sorry if I missed it somehow. Thanks for your clarifications.

kevcunnane pushed a commit to kevcunnane/sparkmagic that referenced this pull request Sep 17, 2018
kevcunnane added a commit to kevcunnane/sparkmagic that referenced this pull request Sep 17, 2018
subusc added a commit to subusc/sparkmagic that referenced this pull request Dec 20, 2018
* livy 0.4.0 pyspark3 compatibility, fixing jupyter-incubator#421 (#1)

* Update version to 0.12.5.1
- Keeping compat with public version of sparkmagic by staying at 0.12.5.n

* Modified the livy session initial call for SparkR sessions (#2)

* Add minor version change to reflect update vs. main repo's SparkMagic
@gm-spacagna
Copy link

Hi I have tried to to fix this compatibility issue in a different way in this PR: #507
Would you mind having a look?

@ranjitiyer
Copy link

@gm-spacagna I have a PR that's failing build #462. Any tips on how to resolve it?

@foscraig
Copy link

@gm-spacagna @aggFTW Can you please answer the question? Would like someone to look at that PRs and it's been almost a year, so want to get Travis CI working.

@foscraig
Copy link

Also tagging @apetresc in case he knows the answer to @ranjitiyer's question. Thanks!

@itamarst
Copy link
Contributor

This is superseded by #540, which I hope to get merged soon. Please try it!

@itamarst itamarst closed this Jun 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants