workaround for #7628
scottdraves committed Jul 3, 2018
1 parent c82384f commit 4bc5a2b
Showing 3 changed files with 20 additions and 9 deletions.
2 changes: 1 addition & 1 deletion StartHere.ipynb
@@ -86,7 +86,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.5"
+"version": "3.6.6"
 },
 "toc": {
 "base_numbering": 1,
17 changes: 11 additions & 6 deletions doc/scala/Flint.ipynb
@@ -13,7 +13,9 @@
 "Unlike `DataFrame` and `Dataset`, Flint's `TimeSeriesRDD`s can leverage the existing ordering properties of datasets at rest and the fact that almost all data manipulations and analysis over these datasets respect their temporal ordering properties.\n",
 "It differs from other time series efforts in Spark in its ability to efficiently compute across panel data or on large scale high frequency data.\n",
 "\n",
-"This example uses `prices.csv` file from [Kaggle](https://www.kaggle.com/dgawlik/nyse). For it to work you need to get it and put it in `/tmp/prices.csv`."
+"This example uses `prices.csv` file from [Kaggle](https://www.kaggle.com/dgawlik/nyse). For it to work you need to get it and put it in `/tmp/prices.csv`.\n",
+"\n",
+"The `io.netty` lines are a workaround for a temporary upstream problem, see [#7628](https://github.com/twosigma/beakerx/issues/7628)."
 ]
 },
 {
@@ -23,7 +25,10 @@
 "outputs": [],
 "source": [
 "%%classpath add mvn\n",
-"com.github.twosigma flint master-SNAPSHOT\n",
+"io.netty netty-all 4.1.25.Final\n",
+"io.netty netty-buffer 4.1.25.Final\n",
+"io.netty netty-common 4.1.25.Final\n",
+"com.github.twosigma flint master-b560b000bc-1\n",
 "org.apache.spark spark-sql_2.11 2.2.1\n",
 "org.apache.spark spark-mllib_2.11 2.2.1"
 ]
@@ -178,7 +183,7 @@
 "outputs": [],
 "source": [
 "// Calculate logarithm of a column\n",
-"val logVolumeRdd = pricesRdd.addColumns(\"logVolume\" -> DoubleType -> { row => Math.log(row.getAs[Double](\"volume\")) })\n",
+"val logVolumeRdd = pricesRdd.addColumns(\"logVolume\" -> DoubleType -> { row => scala.math.log(row.getAs[Double](\"volume\")) })\n",
 "preview(pricesRdd)"
 ]
 },
@@ -189,7 +194,7 @@
 "outputs": [],
 "source": [
 "// Raise a column to an exponent\n",
-"val squaredVolumeRdd = pricesRdd.addColumns(\"squaredVolume\" -> DoubleType -> { row => Math.pow(row.getAs[Double](\"volume\"), 2) })\n",
+"val squaredVolumeRdd = pricesRdd.addColumns(\"squaredVolume\" -> DoubleType -> { row => scala.math.pow(row.getAs[Double](\"volume\"), 2) })\n",
 "preview(squaredVolumeRdd)"
 ]
 },
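The two hunks above swap bare `Math` references for the idiomatic `scala.math` package. This is a purely cosmetic rewrite: `scala.math`'s `log` and `pow` forward directly to `java.lang.Math`, so the change cannot alter any computed column. A quick check, runnable in any Scala REPL (not part of the commit, just an illustration):

```scala
// scala.math's log and pow simply forward to java.lang.Math,
// so swapping one for the other (as this commit does) cannot change results.
object MathEquivalenceCheck {
  def main(args: Array[String]): Unit = {
    val volume = 123456.0
    assert(scala.math.log(volume) == java.lang.Math.log(volume))
    assert(scala.math.pow(volume, 2) == java.lang.Math.pow(volume, 2))
    println("scala.math and java.lang.Math agree")
  }
}
```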
@@ -339,8 +344,8 @@
 "// Compute the Z score across an interval\n",
 "val zScoreRdd = pricesRdd.addColumnsForCycle(\"volumeZScore\" -> DoubleType -> { rows: Seq[Row] =>\n",
 " val mean = rows.map(_.getAs[Double](\"volume\")).sum / rows.size\n",
-" val stddev = Math.sqrt(rows.map { row =>\n",
-" Math.pow(row.getAs[Double](\"close\") - mean, 2)\n",
+" val stddev = scala.math.sqrt(rows.map { row =>\n",
+" scala.math.pow(row.getAs[Double](\"close\") - mean, 2)\n",
 " }.sum ) / (rows.size - 1)\n",
 " rows.map { row =>\n",
 " row -> (row.getAs[Double](\"close\") - mean) / stddev\n",
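The Z-score cycle function above can be tried outside Spark. Below is a minimal plain-Scala sketch of the same statistic over a `Seq[Double]` (not from the commit; the `zScores` name is made up here). It uses the textbook sample standard deviation, with the division by `n - 1` inside the square root:

```scala
// Z-score of each value within its group, mirroring the cycle function above.
// stddev is the sample standard deviation:
// sqrt(sum of squared deviations / (n - 1)).
def zScores(xs: Seq[Double]): Seq[Double] = {
  val mean = xs.sum / xs.size
  val variance = xs.map(x => scala.math.pow(x - mean, 2)).sum / (xs.size - 1)
  val stddev = scala.math.sqrt(variance)
  xs.map(x => (x - mean) / stddev)
}

// Example: values 1, 2, 3 have mean 2 and sample stddev 1
println(zScores(Seq(1.0, 2.0, 3.0)))  // List(-1.0, 0.0, 1.0)
```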
10 changes: 8 additions & 2 deletions doc/scala/SparkUI.ipynb
@@ -8,7 +8,9 @@
 "\n",
 "BeakerX has a Spark magic that provides deeper integration with Spark. It provides a GUI dialog for connecting to a cluster, a progress meter that shows how your job is working and links to the regular Spark UI, and it forwards kernel interrupt messages onto the cluster so you can stop a job without leaving the notebook, and it automatically displays Datasets using an interactive widget. Finally, it automatically closes the Spark session when the notebook is closed.\n",
 "\n",
-"It is compatible with Spark version 2.x."
+"It is compatible with Spark version 2.x.\n",
+"\n",
+"The `io.netty` and flint lines are a workaround for a temporary upstream problem, see [#7628](https://github.com/twosigma/beakerx/issues/7628)."
 ]
 },
 {
@@ -20,7 +22,11 @@
 "outputs": [],
 "source": [
 "%%classpath add mvn\n",
-"org.apache.spark spark-sql_2.11 2.3.1"
+"io.netty netty-all 4.1.25.Final\n",
+"io.netty netty-buffer 4.1.25.Final\n",
+"io.netty netty-common 4.1.25.Final\n",
+"com.github.twosigma flint master-b560b000bc-1\n",
+"org.apache.spark spark-sql_2.11 2.2.1"
 ]
 },
 {
