Commit

Generate Pelican site
nevillelyh committed Aug 1, 2017
1 parent 785705c commit d987513
Showing 50 changed files with 1,693 additions and 380 deletions.
6 changes: 5 additions & 1 deletion archives.html
@@ -69,6 +69,10 @@
<section id="content">
<h1>Archives for Das Keyboard Shredder</h1>
<div id="archives">
<p>
<span class="categories-timestamp"><time datetime="2017-08-01T13:51:00-04:00">Tue 01 August 2017</time></span>
<a href="http://www.lyh.me/lambda-serialization.html">Lambda&nbsp;serialization</a>
</p>
<p>
<span class="categories-timestamp"><time datetime="2017-07-19T11:23:00-04:00">Wed 19 July 2017</time></span>
<a href="http://www.lyh.me/lawfulness-of-aggregatebykey.html">Lawfulness of&nbsp;aggregateByKey</a>
@@ -214,11 +218,11 @@ <h4><i class="fa fa-home fa-lg"></i><span class="icon-label">Social</span></h4>
<li class="list-group-item">
<h4><i class="fa fa-home fa-lg"></i><span class="icon-label">Recent Posts</span></h4>
<ul class="list-group" id="recentposts">
<li class="list-group-item"><a href="http://www.lyh.me/lambda-serialization.html">Lambda&nbsp;serialization</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/lawfulness-of-aggregatebykey.html">Lawfulness of&nbsp;aggregateByKey</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/canbuildfrom.html">CanBuildFrom</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/decompiling-scala-code.html">Decompiling Scala&nbsp;code</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/implicits.html">Implicits</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/scio-at-philly-ete.html">Scio at Philly <span class="caps">ETE</span></a></li>
</ul>
</li>
<!-- End Sidebar/Recent Posts -->
81 changes: 48 additions & 33 deletions author/neville-li.html
@@ -66,6 +66,53 @@
<div class="container">
<div class="row">
<div class="col-sm-9">
<article>
<h2><a href="http://www.lyh.me/lambda-serialization.html">Lambda&nbsp;serialization</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2017-08-01T13:51:00-04:00"> Tue 01 August 2017</time>
</span>



<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>


<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/scala.html">scala</a>
/
<a href="http://www.lyh.me/tag/scio.html">scio</a>
/
<a href="http://www.lyh.me/tag/data.html">data</a>

</footer><!-- /.post-info --> </div>
<div class="summary"><p>Lambda serialization is one of the more confusing issues in distributed data processing in Scala. No matter which framework you choose, whether it&#8217;s Scalding, Spark, Flink or Scio, sooner or later you&#8217;ll be hit by the dreaded <code>NotSerializableException</code>. In this post we&#8217;ll take a closer look at the common causes and solutions to this&nbsp;problem.</p>
<h2>Setup</h2>
<p>To demonstrate the problem, we first need a minimal setup that mimics the behavior of a distributed data processing system. We start with a utility method that round-trips an object through Java serialization. Anonymous functions, or lambdas, in such systems are serialized so that they can be distributed to workers for parallel&nbsp;processing.</p>
<div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">java.io.</span><span class="o">{</span><span class="nc">ByteArrayInputStream</span><span class="o">,</span> <span class="nc">ByteArrayOutputStream</span><span class="o">,</span> <span class="nc">ObjectInputStream</span><span class="o">,</span> <span class="nc">ObjectOutputStream</span><span class="o">}</span>

<span class="k">object</span> <span class="nc">SerDeUtil</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">serDe</span><span class="o">[</span><span class="kt">T</span><span class="o">](</span><span class="n">obj</span><span class="k">:</span> <span class="kt">T</span><span class="o">)</span><span class="k">:</span> <span class="kt">T</span> <span class="o">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">buffer</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">ByteArrayOutputStream</span><span class="o">()</span>
<span class="k">val</span> <span class="n">out</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">ObjectOutputStream</span><span class="o">(</span><span class="n">buffer</span><span class="o">)</span>
<span class="n">out</span><span class="o">.</span><span class="n">writeObject</span><span class="o">(</span><span class="n">obj</span><span class="o">)</span>
<span class="n">out</span><span class="o">.</span><span class="n">close</span><span class="o">()</span>

<span class="k">val</span> <span class="n">in</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">ObjectInputStream</span><span class="o">(</span><span class="k">new</span> <span class="nc">ByteArrayInputStream</span><span class="o">(</span><span class="n">buffer</span><span class="o">.</span><span class="n">toByteArray</span><span class="o">))</span>
<span class="n">in</span><span class="o">.</span><span class="n">readObject</span><span class="o">().</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>


<p>Next we create a bare-minimum <code>Collection[T]</code> type that mimics an abstract distributed data set, akin to <code>TypedPipe</code>, <code>RDD</code>, or <code>SCollection</code> in Scalding, Spark or Scio respectively. Our implementation is backed by a local in-memory <code>Seq[T]</code> but does pass the function <code>f</code> through serialization like …</p>
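<p>The excerpt is cut off before the type itself is shown, so the following is only a rough sketch of what such a type might look like, not the post&#8217;s actual implementation. The names <code>CollectionSketch</code>, <code>Collection</code>, <code>map</code>, <code>toSeq</code> and <code>Multiplier</code> are assumptions made for illustration, and the round-trip helper is repeated so that the block stands on its own.</p>

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, NotSerializableException, ObjectInputStream, ObjectOutputStream}

object CollectionSketch {
  // Same Java serialization round trip as the serDe utility in the excerpt,
  // repeated here so the sketch is self-contained.
  def serDe[T](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  // Hypothetical stand-in for the post's Collection[T]: backed by a local Seq[T],
  // but every function is round-tripped through serialization before it is applied,
  // as a distributed framework would do before shipping it to workers.
  class Collection[T](private val data: Seq[T]) {
    def map[U](f: T => U): Collection[U] = {
      val g = serDe(f) // throws if f, or anything it captures, is not serializable
      new Collection(data.map(g))
    }
    def toSeq: Seq[T] = data
  }

  // Hypothetical helper that deliberately does NOT extend Serializable.
  class Multiplier(val factor: Int)

  def main(args: Array[String]): Unit = {
    val xs = new Collection(Seq(1, 2, 3))

    // A lambda that captures nothing survives the round trip.
    println(xs.map(_ + 1).toSeq)

    // A lambda that captures a non-serializable object fails inside map.
    val m = new Multiplier(10)
    try {
      xs.map(_ * m.factor)
    } catch {
      case e: NotSerializableException => println(s"not serializable: ${e.getMessage}")
    }
  }
}
```

<p>In this sketch the second call fails because the closure holds a reference to <code>m</code>. Copying <code>m.factor</code> into a local <code>val</code> before building the lambda would leave only an <code>Int</code> in the closure, which serializes without trouble.</p>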
<a class="btn btn-default btn-xs" href="http://www.lyh.me/lambda-serialization.html">more ...</a>
</div>
</article>
<hr/>
<article>
<h2><a href="http://www.lyh.me/lawfulness-of-aggregatebykey.html">Lawfulness of&nbsp;aggregateByKey</a></h2>
<div class="well well-sm">
@@ -476,38 +523,6 @@ <h2><a href="http://www.lyh.me/semigroups.html">Semigroups</a></h2>
</div>
</article>
<hr/>
<article>
<h2><a href="http://www.lyh.me/scio-a-scala-api-for-google-cloud-dataflow.html">Scio, a Scala <span class="caps">API</span> for Google Cloud&nbsp;Dataflow</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2016-04-21T12:47:00-04:00"> Thu 21 April 2016</time>
</span>



<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>


<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/data.html">data</a>
/
<a href="http://www.lyh.me/tag/scala.html">scala</a>
/
<a href="http://www.lyh.me/tag/scio.html">scio</a>

</footer><!-- /.post-info --> </div>
<div class="summary"><p>We recently open sourced <a href="https://github.com/spotify/scio">Scio</a>, a Scala <span class="caps">API</span> for <a href="https://github.com/GoogleCloudPlatform/DataflowJavaSDK">Google Cloud Dataflow</a>. Here are the slides of our talk at <a href="https://cloudplatformonline.com/NEXT2016.html"><span class="caps">GCPNEXT16</span></a> a few weeks&nbsp;ago.</p>
<iframe src="//www.slideshare.net/slideshow/embed_code/key/rRPUB4cSAvgF8M" width="800" height="490" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe>

<p><div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/sinisalyh/from-stream-to-recommendation-using-apache-beam-with-cloud-pubsub-and-cloud-dataflow" title="From stream to recommendation using apache beam with cloud pubsub and cloud dataflow" target="_blank">From stream to recommendation using apache beam with cloud pubsub and cloud dataflow</a> </strong> from <strong><a href="//www.slideshare.net/sinisalyh" target="_blank">Neville Li</a></strong> </div></p>
<p>The first half of the talk covers our experiments with Dataflow and Pub/Sub for streaming applications, while the second half covers Scio and BigQuery for batch analysis and machine&nbsp;learning.</p>
<a class="btn btn-default btn-xs" href="http://www.lyh.me/scio-a-scala-api-for-google-cloud-dataflow.html">more ...</a>
</div>
</article>
<hr/>

<ul class="pagination">
<li class="prev disabled"><a href="#">&laquo;</a></li>
@@ -559,11 +574,11 @@ <h4><i class="fa fa-home fa-lg"></i><span class="icon-label">Social</span></h4>
<li class="list-group-item">
<h4><i class="fa fa-home fa-lg"></i><span class="icon-label">Recent Posts</span></h4>
<ul class="list-group" id="recentposts">
<li class="list-group-item"><a href="http://www.lyh.me/lambda-serialization.html">Lambda&nbsp;serialization</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/lawfulness-of-aggregatebykey.html">Lawfulness of&nbsp;aggregateByKey</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/canbuildfrom.html">CanBuildFrom</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/decompiling-scala-code.html">Decompiling Scala&nbsp;code</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/implicits.html">Implicits</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/scio-at-philly-ete.html">Scio at Philly <span class="caps">ETE</span></a></li>
</ul>
</li>
<!-- End Sidebar/Recent Posts -->
74 changes: 33 additions & 41 deletions author/neville-li2.html
@@ -66,6 +66,38 @@
<div class="container">
<div class="row">
<div class="col-sm-9">
<article>
<h2><a href="http://www.lyh.me/scio-a-scala-api-for-google-cloud-dataflow.html">Scio, a Scala <span class="caps">API</span> for Google Cloud&nbsp;Dataflow</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2016-04-21T12:47:00-04:00"> Thu 21 April 2016</time>
</span>



<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>


<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/data.html">data</a>
/
<a href="http://www.lyh.me/tag/scala.html">scala</a>
/
<a href="http://www.lyh.me/tag/scio.html">scio</a>

</footer><!-- /.post-info --> </div>
<div class="summary"><p>We recently open sourced <a href="https://github.com/spotify/scio">Scio</a>, a Scala <span class="caps">API</span> for <a href="https://github.com/GoogleCloudPlatform/DataflowJavaSDK">Google Cloud Dataflow</a>. Here are the slides of our talk at <a href="https://cloudplatformonline.com/NEXT2016.html"><span class="caps">GCPNEXT16</span></a> a few weeks&nbsp;ago.</p>
<iframe src="//www.slideshare.net/slideshow/embed_code/key/rRPUB4cSAvgF8M" width="800" height="490" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe>

<p><div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/sinisalyh/from-stream-to-recommendation-using-apache-beam-with-cloud-pubsub-and-cloud-dataflow" title="From stream to recommendation using apache beam with cloud pubsub and cloud dataflow" target="_blank">From stream to recommendation using apache beam with cloud pubsub and cloud dataflow</a> </strong> from <strong><a href="//www.slideshare.net/sinisalyh" target="_blank">Neville Li</a></strong> </div></p>
<p>The first half of the talk covers our experiments with Dataflow and Pub/Sub for streaming applications, while the second half covers Scio and BigQuery for batch analysis and machine&nbsp;learning.</p>
<a class="btn btn-default btn-xs" href="http://www.lyh.me/scio-a-scala-api-for-google-cloud-dataflow.html">more ...</a>
</div>
</article>
<hr/>
<article>
<h2><a href="http://www.lyh.me/scala-data-pipelines-spotify.html">Scala Data Pipelines @&nbsp;Spotify</a></h2>
<div class="well well-sm">
@@ -369,46 +401,6 @@ <h2><a href="http://www.lyh.me/dotfiles-update.html">dotfiles&nbsp;update</a></h
</div>
</article>
<hr/>
<article>
<h2><a href="http://www.lyh.me/on-being-a-polyglot.html">On being a&nbsp;polyglot</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2014-08-21T21:26:00-04:00"> Thu 21 August 2014</time>
</span>



<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>


<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/c.html">c</a>
/
<a href="http://www.lyh.me/tag/cpp.html">cpp</a>
/
<a href="http://www.lyh.me/tag/python.html">python</a>
/
<a href="http://www.lyh.me/tag/javascript.html">javascript</a>
/
<a href="http://www.lyh.me/tag/scala.html">scala</a>
/
<a href="http://www.lyh.me/tag/java.html">java</a>
/
<a href="http://www.lyh.me/tag/clojure.html">clojure</a>
/
<a href="http://www.lyh.me/tag/haskell.html">haskell</a>

</footer><!-- /.post-info --> </div>
<div class="summary"><p>I&#8217;m kind of known as a polyglot among coworkers. We would often argue that instead of hiring great Java/Python/C++ developers, we should rather strive to hire great engineers with strong <span class="caps">CS</span> fundamentals who can pick up any language easily. I came from scientific computing background, doing mostly C/C++/Python many years ago. Over the course of the last three years at my current job I coded seven languages professionally, some out of interest and some necessity. I enjoyed the experience learning all these different things and want to share my experience here, what I learned from each one of them and how it helps me becoming a better&nbsp;engineer.</p>
<h2>C</h2>
<p>The first language I used seriously, apart from <span class="caps">LOGO</span> <span class="amp">&amp;</span> <span class="caps">BASIC</span> when I was a kid of course. It&#8217;s probably the closest thing one can get to the operating system and bare metal without dropping down to assembly (which you still can in C). It&#8217;s a simple language whose syntax served as the basis of many successors like C++ <span class="amp">&amp;</span> Java. It doesn&#8217;t offer any fancy features like <span class="caps">OOP</span> or namespaces, but rather depends on the developer&#8217;s skill for organizing a large code base (think …</p>
<a class="btn btn-default btn-xs" href="http://www.lyh.me/on-being-a-polyglot.html">more ...</a>
</div>
</article>
<hr/>

<ul class="pagination">
<li class="prev"><a href="http://www.lyh.me/author/neville-li.html">&laquo;</a>
@@ -461,11 +453,11 @@ <h4><i class="fa fa-home fa-lg"></i><span class="icon-label">Social</span></h4>
<li class="list-group-item">
<h4><i class="fa fa-home fa-lg"></i><span class="icon-label">Recent Posts</span></h4>
<ul class="list-group" id="recentposts">
<li class="list-group-item"><a href="http://www.lyh.me/lambda-serialization.html">Lambda&nbsp;serialization</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/lawfulness-of-aggregatebykey.html">Lawfulness of&nbsp;aggregateByKey</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/canbuildfrom.html">CanBuildFrom</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/decompiling-scala-code.html">Decompiling Scala&nbsp;code</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/implicits.html">Implicits</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/scio-at-philly-ete.html">Scio at Philly <span class="caps">ETE</span></a></li>
</ul>
</li>
<!-- End Sidebar/Recent Posts -->