<!DOCTYPE html>
<html lang="en" prefix="og: http://ogp.me/ns# fb: https://www.facebook.com/2008/fbml">
<head>
<title>Das Keyboard Shredder</title>
<!-- Using the latest rendering mode for IE -->
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="canonical" href="http://www.lyh.me">
<meta name="author" content="Neville Li" />
<!-- Open Graph tags -->
<meta property="og:site_name" content="Das Keyboard Shredder" />
<meta property="og:type" content="website"/>
<meta property="og:title" content="Das Keyboard Shredder"/>
<meta property="og:url" content="http://www.lyh.me"/>
<meta property="og:description" content="Das Keyboard Shredder"/>
<!-- Bootstrap -->
<link rel="stylesheet" href="http://www.lyh.me/theme/css/bootstrap.min.css" type="text/css"/>
<link href="http://www.lyh.me/theme/css/font-awesome.min.css" rel="stylesheet">
<link href="http://www.lyh.me/theme/css/pygments/monokai.css" rel="stylesheet">
<link href="http://www.lyh.me/theme/css/typogrify.css" rel="stylesheet">
<link rel="stylesheet" href="http://www.lyh.me/theme/css/style.css" type="text/css"/>
<link href="http://www.lyh.me/feeds/all.atom.xml" type="application/atom+xml" rel="alternate"
title="Das Keyboard Shredder ATOM Feed"/>
</head>
<body>
<div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-ex1-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="http://www.lyh.me/" class="navbar-brand">
Das Keyboard Shredder </a>
</div>
<div class="collapse navbar-collapse navbar-ex1-collapse">
<ul class="nav navbar-nav">
<li><a href="http://www.lyh.me/pages/about-me.html">
About Me
</a></li>
</ul>
<ul class="nav navbar-nav navbar-right">
</ul>
</div>
<!-- /.navbar-collapse -->
</div>
</div> <!-- /.navbar -->
<!-- Banner -->
<!-- End Banner -->
<div class="container">
<div class="row">
<div class="col-sm-9">
<article>
<h2><a href="http://www.lyh.me/nescala-2015-talk.html">NEScala 2015 talk</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2015-02-04T11:02:00-05:00"> Wed 04 February 2015</time>
</span>
<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>
<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/avro.html">avro</a>
/
<a href="http://www.lyh.me/tag/macro.html">macro</a>
/
<a href="http://www.lyh.me/tag/parquet.html">parquet</a>
/
<a href="http://www.lyh.me/tag/scala.html">scala</a>
</footer><!-- /.post-info --> </div>
<div class="summary"><p>I gave a lightning talk at the <a href="http://www.nescala.org/">Northeast Scala Symposium</a> last week on macros in data pipelines, and here are the <a href="/slides/macros.html">slides</a>.</p>
<iframe src="/slides/macros.html" width="800" height="450"></iframe>
<a class="btn btn-default btn-xs" href="http://www.lyh.me/nescala-2015-talk.html">more ...</a>
</div>
</article>
<hr/>
<article>
<h2><a href="http://www.lyh.me/fun-with-macros-and-parquet-avro.html">Fun with macros and parquet-avro</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2015-01-08T14:01:00-05:00"> Thu 08 January 2015</time>
</span>
<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>
<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/avro.html">avro</a>
/
<a href="http://www.lyh.me/tag/macro.html">macro</a>
/
<a href="http://www.lyh.me/tag/parquet.html">parquet</a>
/
<a href="http://www.lyh.me/tag/scala.html">scala</a>
</footer><!-- /.post-info --> </div>
<div class="summary"><p>I recently had some fun building <a href="https://github.com/nevillelyh/parquet-avro-extra">parquet-avro-extra</a>, an add-on module for <a href="https://github.com/Parquet/parquet-mr/tree/master/parquet-avro">parquet-avro</a> using <a href="http://scalamacros.org/">Scala macros</a>. I did it mainly to learn Scala macros but also to make it easier to use <a href="http://parquet.incubator.apache.org/">Parquet</a> with <a href="http://avro.apache.org/">Avro</a> in a data pipeline.</p>
<h2>Parquet and Avro</h2>
<p>Parquet is a columnar storage format designed for <span class="caps">HDFS</span>. It offers some nice improvements over row-major systems, including better compression and less I/O through column projection and predicate pushdown. Avro is a data serialization system that enables type-safe access to structured data with complex schemas. The <code>parquet-avro</code> module makes it possible to store data in Parquet format on disk and process it as Avro objects inside a <span class="caps">JVM</span> data pipeline like <a href="https://github.com/twitter/scalding">Scalding</a> or <a href="http://spark.apache.org/">Spark</a>.</p>
<h2>Projection</h2>
<p>Parquet allows reading only a subset of columns via projection. Here’s a Scalding <a href="https://github.com/epishkin/scalding/tree/parquet_avro/scalding-parquet">example</a> from <a href="http://www.tapad.com/">Tapad</a>.</p>
<div class="highlight"><pre><span></span><span class="nc">Projection</span><span class="o">[</span><span class="kt">Signal</span><span class="o">](</span><span class="s">"field1"</span><span class="o">,</span> <span class="s">"field2.field2a"</span><span class="o">)</span>
</pre></div>
<p>Note that the field specifications are strings even though the <span class="caps">API</span> has access to the Avro type <code>Signal</code>, which has strongly typed getter methods.</p>
<p>This is slightly counter-intuitive since most Scala developers are used to transformations like <code>pipe.map(_.getField)</code>. It can, however, be easily solved with a macro since the syntax tree is accessible. A modified version has signature of …</p>
<a class="btn btn-default btn-xs" href="http://www.lyh.me/fun-with-macros-and-parquet-avro.html">more ...</a>
</div>
</article>
<hr/>
<article>
<h2><a href="http://www.lyh.me/three-reasons-a-data-engineer-should-learn-scala.html">Three Reasons a Data Engineer Should Learn Scala</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2014-11-17T23:15:00-05:00"> Mon 17 November 2014</time>
</span>
<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>
<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/data.html">data</a>
/
<a href="http://www.lyh.me/tag/scala.html">scala</a>
</footer><!-- /.post-info --> </div>
<div class="summary"><p><em>This article was written in collaboration with <a href="https://www.hakkalabs.co">Hakka Labs</a> (<a href="https://www.hakkalabs.co/articles/three-reasons-data-eng-learn-scala">original link</a>)</em></p>
<p>There has been a lot of debate over Scala lately, including criticisms like <a href="http://java.dzone.com/articles/i-dont-scala">this</a>, <a href="http://overwatering.org/blog/2013/12/scala-1-star-would-not-program-again/">this</a>, <a href="http://www.infoq.com/news/2011/11/yammer-scala">this</a>, and defenses like <a href="http://blog.gridgainsystems.com/in-defense-of-scala-response-to-i-dont-like-scala/">this</a> and <a href="http://blog.gridgainsystems.com/in-defense-of-scala-part-2/">this</a>. Most of the criticisms seem to focus on the language’s complexity, performance, and integration with existing tools and libraries, while some praise its elegant syntax, powerful type system, and good fit for domain-specific languages.</p>
<p>However, most of the discussions seem to be based on experiences building production backend or web systems, where there are a lot of other options already. There are mature, battle-tested options like Java, Erlang or even <span class="caps">PHP</span>, and there are Go, node.js, or Python for those who are more adventurous or prefer agility over performance.</p>
<p>Here I want to argue that there’s a best tool for every job, and Scala shines for data processing and machine learning, for the following reasons:</p>
<ul>
<li>good balance between productivity and performance</li>
<li>integration with big data ecosystem</li>
<li>functional paradigm</li>
</ul>
<h2>Productivity without sacrificing performance</h2>
<p>In the big data <span class="amp">&</span> machine learning world, where most developers come from a Python/R/Matlab background, Scala’s syntax, or the subset needed for the domain, is a lot less intimidating than that …</p>
<a class="btn btn-default btn-xs" href="http://www.lyh.me/three-reasons-a-data-engineer-should-learn-scala.html">more ...</a>
</div>
</article>
<hr/>
<article>
<h2><a href="http://www.lyh.me/scala-workshop.html">Scala Workshop</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2014-10-04T21:52:00-04:00"> Sat 04 October 2014</time>
</span>
<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>
<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/fp.html">fp</a>
/
<a href="http://www.lyh.me/tag/scala.html">scala</a>
/
<a href="http://www.lyh.me/tag/data.html">data</a>
</footer><!-- /.post-info --> </div>
<div class="summary"><p>While there are many Scala tutorials and books available, very few of them focus on big data. I did a couple of workshops at Spotify focusing on these areas and here are the <a href="/slides/workshop.html">slides</a>.</p>
<iframe src="/slides/workshop.html" width="800" height="450"></iframe>
<a class="btn btn-default btn-xs" href="http://www.lyh.me/scala-workshop.html">more ...</a>
</div>
</article>
<hr/>
<article>
<h2><a href="http://www.lyh.me/using-cql-with-legacy-column-families.html">Using <span class="caps">CQL</span> with legacy column families</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2014-09-13T20:47:00-04:00"> Sat 13 September 2014</time>
</span>
<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>
<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/cassandra.html">cassandra</a>
/
<a href="http://www.lyh.me/tag/cql.html">cql</a>
</footer><!-- /.post-info --> </div>
<div class="summary"><p>We use <a href="http://cassandra.apache.org/">Cassandra</a> extensively <a href="http://www.slideshare.net/JimmyMrdell/playlists-at-spotify-cassandra-summit-london-2013?related=1">at work</a>, and up till recently we’ve been using mostly Cassandra 1.2 with <a href="https://github.com/Netflix/astyanax">Astyanax</a> and the <a href="https://thrift.apache.org/">Thrift</a> protocol in Java applications. Very recently we started adopting Cassandra 2.0 with <span class="caps">CQL</span>, the <a href="https://github.com/datastax/java-driver">DataStax Java Driver</a> and the binary protocol.</p>
<p>While one should move to a <span class="caps">CQL</span> schema to take full advantage of the new protocol and storage engine, it’s still possible to use <span class="caps">CQL</span> and the new driver on existing clusters. Say we have a legacy column family with <code>UTF8Type</code> for row/column keys and <code>BytesType</code> for values. It would look like this in <code>cassandra-cli</code>:</p>
<div class="highlight"><pre><span></span><span class="k">create</span> <span class="k">column</span> <span class="n">family</span> <span class="k">data</span>
<span class="k">with</span> <span class="n">column_type</span> <span class="o">=</span> <span class="s1">'Standard'</span>
<span class="k">and</span> <span class="n">comparator</span> <span class="o">=</span> <span class="s1">'UTF8Type'</span>
<span class="k">and</span> <span class="n">default_validation_class</span> <span class="o">=</span> <span class="s1">'BytesType'</span>
<span class="k">and</span> <span class="n">key_validation_class</span> <span class="o">=</span> <span class="s1">'UTF8Type'</span><span class="p">;</span>
</pre></div>
<p>And this in <code>cqlsh</code> after setting <code>start_native_transport: true</code> in <code>cassandra.yaml</code>:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="k">data</span> <span class="p">(</span>
<span class="k">key</span> <span class="nb">text</span><span class="p">,</span>
<span class="n">column1</span> <span class="nb">text</span><span class="p">,</span>
<span class="n">value</span> <span class="nb">blob</span><span class="p">,</span>
<span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="p">(</span><span class="k">key</span><span class="p">,</span> <span class="n">column1</span><span class="p">)</span>
<span class="p">)</span> <span class="k">WITH</span> <span class="n">COMPACT</span> <span class="k">STORAGE</span><span class="p">;</span>
</pre></div>
<p>In this table, <code>key</code> and <code>column1</code> correspond to the row and column keys in the legacy column family, and <code>value</code> corresponds to the column value.</p>
<p>Queries to look up a column value, an entire row, and selected columns in a row would look like this:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="n">value</span> <span class="k">FROM</span> <span class="n">mykeyspace</span><span class="p">.</span><span class="k">data</span> <span class="k">WHERE</span> <span class="k">key</span> <span class="o">=</span> <span class="s1">'rowkey'</span> <span class="k">AND</span> <span class="n">column1</span> <span class="o">=</span> <span class="s1">'colkey'</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="n">column1</span><span class="p">,</span> <span class="n">value</span> <span class="k">FROM</span> <span class="n">mykeyspace …</span></pre></div>
<a class="btn btn-default btn-xs" href="http://www.lyh.me/using-cql-with-legacy-column-families.html">more ...</a>
</div>
</article>
<hr/>
<article>
<h2><a href="http://www.lyh.me/dotfiles-update.html">dotfiles update</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2014-08-21T22:14:00-04:00"> Thu 21 August 2014</time>
</span>
<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>
<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/dotfiles.html">dotfiles</a>
/
<a href="http://www.lyh.me/tag/tmux.html">tmux</a>
/
<a href="http://www.lyh.me/tag/vim.html">vim</a>
</footer><!-- /.post-info --> </div>
<div class="summary"><p>I’ve been using my current <a href="http://www.lyh.me/dotfiles.html">dotfiles</a> setup for a while and felt it was time to freshen it up. I focused on updating the look and feel of Vim and tmux this round.</p>
<p>First I switched to <a href="https://github.com/tomasr/molokai">molokai</a> color theme for Vim, <a href="http://macromates.com/">TextMate</a> (monokai) and <a href="http://www.jetbrains.com/idea/">IntelliJ <span class="caps">IDEA</span></a> (using <a href="https://github.com/y3sh/Intellij-Colors-Sublime-Monokai">this</a>). I guess I grew tired of the old trusted <a href="http://ethanschoonover.com/solarized">solarized</a>, plus with my new MacBook Pro 13” at highest resolution, it just doesn’t feel sharp enough.</p>
<p>The <a href="https://github.com/Lokaltog/vim-powerline">vim-powerline</a> plugin I was using is being deprecated and replaced by <a href="https://github.com/Lokaltog/powerline">powerline</a>, which supports vim, tmux, zsh, and many others. However, it requires Python and I had trouble using it with some really old Vim versions at work. So instead I switched to a pure VimL plugin, <a href="https://github.com/bling/vim-airline">vim-airline</a>. Not surprisingly there’s a companion plugin, <a href="https://github.com/edkolev/tmuxline.vim">tmuxline</a>, for tmux as well. Neither has extra dependencies, which is a big plus for me since I use the same dotfiles on Mac, my Ubuntu Trusty desktop at work, and many Debian Squeeze servers.</p>
<p>I also updated a couple of other Vim plugins in the process, replacing <a href="https://github.com/garbas/vim-snipmate">vim-snipmate</a> with <a href="https://github.com/SirVer/ultisnips">ultisnips</a>, <a href="https://github.com/bitc/vim-bad-whitespace">vim-bad-whitespace</a> with <a href="https://github.com/ntpeters/vim-better-whitespace">vim-better-whitespace</a> (no pun intended), and adding <a href="https://github.com/airblade/vim-gitgutter">vim-gitgutter</a>. The biggest discovery is <a href="https://github.com/Lokaltog/vim-easymotion">vim-easymotion</a> though, perfect …</p>
<a class="btn btn-default btn-xs" href="http://www.lyh.me/dotfiles-update.html">more ...</a>
</div>
</article>
<hr/>
<article>
<h2><a href="http://www.lyh.me/on-being-a-polyglot.html">On being a polyglot</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2014-08-21T21:26:00-04:00"> Thu 21 August 2014</time>
</span>
<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>
<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/c.html">c</a>
/
<a href="http://www.lyh.me/tag/cpp.html">cpp</a>
/
<a href="http://www.lyh.me/tag/python.html">python</a>
/
<a href="http://www.lyh.me/tag/javascript.html">javascript</a>
/
<a href="http://www.lyh.me/tag/scala.html">scala</a>
/
<a href="http://www.lyh.me/tag/java.html">java</a>
/
<a href="http://www.lyh.me/tag/clojure.html">clojure</a>
/
<a href="http://www.lyh.me/tag/haskell.html">haskell</a>
</footer><!-- /.post-info --> </div>
<div class="summary"><p>I’m kind of known as a polyglot among coworkers. We would often argue that instead of hiring great Java/Python/C++ developers, we should rather strive to hire great engineers with strong <span class="caps">CS</span> fundamentals who can pick up any language easily. I came from a scientific computing background, doing mostly C/C++/Python many years ago. Over the course of the last three years at my current job I coded in seven languages professionally, some out of interest and some out of necessity. I enjoyed learning all these different things and want to share my experience here: what I learned from each one of them and how it helped me become a better engineer.</p>
<h2>C</h2>
<p>The first language I used seriously, apart from <span class="caps">LOGO</span> <span class="amp">&</span> <span class="caps">BASIC</span> when I was a kid of course. It’s probably the closest thing one can get to the operating system and bare metal without dropping down to assembly (which you still can in C). It’s a simple language whose syntax served as the basis of many successors like C++ <span class="amp">&</span> Java. It doesn’t offer any fancy features like <span class="caps">OOP</span> or namespaces, but rather depends on the developer’s skill for organizing a large code base (think …</p>
<a class="btn btn-default btn-xs" href="http://www.lyh.me/on-being-a-polyglot.html">more ...</a>
</div>
</article>
<hr/>
<article>
<h2><a href="http://www.lyh.me/how-many-copies.html">How many copies</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2014-08-02T20:48:00-04:00"> Sat 02 August 2014</time>
</span>
<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>
<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/data.html">data</a>
/
<a href="http://www.lyh.me/tag/performance.html">performance</a>
/
<a href="http://www.lyh.me/tag/scala.html">scala</a>
</footer><!-- /.post-info --> </div>
<div class="summary"><p>One topic that comes up a lot when optimizing Scala data applications is the performance of standard collections, or the hidden cost of temporary copies. The collections <span class="caps">API</span> is easy to learn and maps well to many Python concepts that a lot of data engineers are familiar with. But the performance penalty can be pretty big when it’s repeated over millions of records in a <span class="caps">JVM</span> with limited heap.</p>
<h2>Mapping values</h2>
<p>Let’s take a look at the most naive example first, mapping the values of a <code>Map</code>.</p>
<div class="highlight"><pre><span></span><span class="k">val</span> <span class="n">m</span> <span class="k">=</span> <span class="nc">Map</span><span class="o">(</span><span class="s">"A"</span> <span class="o">-></span> <span class="mi">1</span><span class="o">,</span> <span class="s">"B"</span> <span class="o">-></span> <span class="mi">2</span><span class="o">,</span> <span class="s">"C"</span> <span class="o">-></span> <span class="mi">3</span><span class="o">)</span>
<span class="n">m</span><span class="o">.</span><span class="n">toList</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">t</span> <span class="k">=></span> <span class="o">(</span><span class="n">t</span><span class="o">.</span><span class="n">_1</span><span class="o">,</span> <span class="n">t</span><span class="o">.</span><span class="n">_2</span> <span class="o">+</span> <span class="mi">1</span><span class="o">)).</span><span class="n">toMap</span>
</pre></div>
<p>Looks simple enough but obviously not optimal. Two temporary <code>List[(String, Int)]</code> were created, one from <code>toList</code> and one from <code>map</code>. <code>map</code> also creates 3 copies of <code>(String, Int)</code>.</p>
<p>There are a few commonly seen variations. These don’t create temporary collections but still create key-value tuples.</p>
<div class="highlight"><pre><span></span><span class="k">for</span> <span class="o">((</span><span class="n">k</span><span class="o">,</span> <span class="n">v</span><span class="o">)</span> <span class="k"><-</span> <span class="n">m</span><span class="o">)</span> <span class="k">yield</span> <span class="n">k</span> <span class="o">-></span> <span class="o">(</span><span class="n">v</span> <span class="o">+</span> <span class="mi">1</span><span class="o">)</span>
<span class="n">m</span><span class="o">.</span><span class="n">map</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">k</span><span class="o">,</span> <span class="n">v</span><span class="o">)</span> <span class="k">=></span> <span class="n">k</span> <span class="o">-></span> <span class="o">(</span><span class="n">v</span> <span class="o">+</span> <span class="mi">1</span><span class="o">)</span> <span class="o">}</span>
</pre></div>
<p>If one reads the <a href="http://www.scala-lang.org/api/2.10.4/index.html#scala.collection.immutable.Map">ScalaDoc</a> closely, there’s a <code>mapValues</code> method already and it probably is the shortest and most performant.</p>
<div class="highlight"><pre><span></span><span class="n">m</span><span class="o">.</span><span class="n">mapValues</span><span class="o">(</span><span class="k">_</span> <span class="o">+</span> <span class="mi">1</span><span class="o">)</span>
</pre></div>
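<p>One caveat, shown here as a minimal sketch rather than a benchmark: in the Scala 2.10 collections linked above, <code>mapValues</code> returns a lazy view over the original map instead of building a new one, so the function is re-applied on every lookup. The object name <code>MapValuesDemo</code> and the call counters are illustrative assumptions, not part of the original example.</p>
<div class="highlight"><pre>object MapValuesDemo {
  val m = Map("A" -> 1, "B" -> 2, "C" -> 3)

  // Counts how often the mapping function runs for two lookups
  // of the same key on the result of mapValues.
  def callsOnView(): Int = {
    var calls = 0
    val view = m.mapValues { v => calls += 1; v + 1 }
    view("A")
    view("A")
    calls
  }

  // Same two lookups on a strictly built map. The function runs once
  // per entry while the map is built, and not again on lookup.
  def callsOnStrict(): Int = {
    var calls = 0
    val strict = m.map { case (k, v) => calls += 1; k -> (v + 1) }
    val afterBuild = calls
    strict("A")
    strict("A")
    calls - afterBuild
  }

  def main(args: Array[String]): Unit = {
    println(s"view: ${callsOnView()} calls for two lookups")
    println(s"strict: ${callsOnStrict()} calls for two lookups")
  }
}
</pre></div>
<p>If the mapped values are read many times, or the function is expensive, the strict version may be the better trade-off despite the extra tuples. Newer Scala versions deprecate <code>mapValues</code> on <code>Map</code> in favor of an explicit <code>m.view.mapValues(f)</code>.</p>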
<h2>Java conversion</h2>
<p>Similar problem exists …</p>
<a class="btn btn-default btn-xs" href="http://www.lyh.me/how-many-copies.html">more ...</a>
</div>
</article>
<hr/>
<article>
<h2><a href="http://www.lyh.me/why-functional-why-scala.html">Why Functional? Why Scala?</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2014-07-28T23:03:00-04:00"> Mon 28 July 2014</time>
</span>
<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>
<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/fp.html">fp</a>
/
<a href="http://www.lyh.me/tag/scala.html">scala</a>
/
<a href="http://www.lyh.me/tag/data.html">data</a>
</footer><!-- /.post-info --> </div>
<div class="summary"><p>I recently did an internal talk at Spotify on why every data engineer should know something about functional programming languages and Scala. And here are the <a href="/slides/pitch.html">slides</a>.</p>
<iframe src="/slides/pitch.html" width="800" height="450"></iframe>
<a class="btn btn-default btn-xs" href="http://www.lyh.me/why-functional-why-scala.html">more ...</a>
</div>
</article>
<hr/>
<article>
<h2><a href="http://www.lyh.me/light-table.html">Light Table</a></h2>
<div class="well well-sm">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2014-07-28T00:01:00-04:00"> Mon 28 July 2014</time>
</span>
<span class="label label-default">Category</span>
<a href="http://www.lyh.me/category/code.html">code</a>
<span class="label label-default">Tags</span>
<a href="http://www.lyh.me/tag/clojure.html">clojure</a>
/
<a href="http://www.lyh.me/tag/intellij-idea.html">intellij-idea</a>
/
<a href="http://www.lyh.me/tag/light-table.html">light-table</a>
</footer><!-- /.post-info --> </div>
<div class="summary"><p>I recently picked up <a href="http://www.lighttable.com/">Light Table</a> for <a href="http://clojure.org/">Clojure</a> development and liked it. Form evaluation works out of the box and indentation is better than that in the <a href="http://plugins.jetbrains.com/plugin/?id=4050">La Clojure</a> plugin for <a href="http://www.jetbrains.com/idea/">IntelliJ <span class="caps">IDEA</span></a>.</p>
<p>I particularly like the idea of the command bar, which allows you to search for Light Table commands by name and execute them quickly. I was already used to <span class="caps">IDEA</span>’s key map though (<code>Mac OS X 10.5+</code>, which is more natural to Mac users than the default <code>Mac OS X</code>), and wanted something similar. The setting files are in Clojure so it’s easy to customize. This is what I have so far for <code>user.keymap</code>:</p>
<div class="highlight"><pre><span></span><span class="p">{</span><span class="ss">:+</span> <span class="p">{</span><span class="ss">:app</span> <span class="p">{</span><span class="s">"alt-space"</span> <span class="p">[</span><span class="ss">:show-commandbar-transient</span><span class="p">]}</span>
<span class="ss">:editor</span> <span class="p">{</span><span class="s">"alt-w"</span> <span class="p">[</span><span class="ss">:editor.watch.watch-selection</span><span class="p">]</span>
<span class="s">"alt-shift-w"</span> <span class="p">[</span><span class="ss">:editor.watch.unwatch</span><span class="p">]</span>
<span class="s">"ctrl-alt-i"</span> <span class="p">[</span><span class="ss">:smart-indent-selection</span><span class="p">]</span>
<span class="s">"ctrl-alt-c"</span> <span class="p">[</span><span class="ss">:toggle-console</span><span class="p">]</span>
<span class="s">"ctrl-shift-j"</span> <span class="p">[</span><span class="ss">:editor.sublime.joinLines</span><span class="p">]</span>
<span class="s">"pmeta-d"</span> <span class="p">[</span><span class="ss">:editor.sublime.duplicateLine</span><span class="p">]</span>
<span class="s">"pmeta-shift-up"</span> <span class="p">[</span><span class="ss">:editor.sublime.swapLineUp</span><span class="p">]</span>
<span class="s">"pmeta-shift-down"</span> <span class="p">[</span><span class="ss">:editor.sublime.swapLineDown</span><span class="p">]</span>
<span class="s">"pmeta-/"</span> <span class="p">[</span><span class="ss">:toggle-comment-selection</span> <span class="ss">:editor.line-down</span><span class="p">]}}}</span>
</pre></div>
<p>Apart from these, I found myself using <code>"pmeta-enter" [:eval-editor-form]</code> and <code>"ctrl-d" [:editor.doc.toggle]</code> most when writing Clojure code. After all they are probably the most essential ones no matter what editor you use :)</p>
<a class="btn btn-default btn-xs" href="http://www.lyh.me/light-table.html">more ...</a>
</div>
</article>
<hr/>
<ul class="pagination">
<li class="prev"><a href="http://www.lyh.me/index.html">«</a>
</li>
<li class=""><a
href="http://www.lyh.me/index.html">1</a></li>
<li class="active"><a
href="http://www.lyh.me/index2.html">2</a></li>
<li class=""><a
href="http://www.lyh.me/index3.html">3</a></li>
<li class="next"><a
href="http://www.lyh.me/index3.html">»</a></li>
</ul>
</div>
<div class="col-sm-3" id="sidebar">
<aside>
<div id="aboutme">
<p>
<strong>About Neville Li</strong><br/>
Data infrastructure @<a href="https://twitter.com/Spotify">Spotify</a>, ex-@<a href="https://twitter.com/Yahoo">Yahoo</a> search, das keyboard shredder, author of <a href="https://github.com/spotify/scio">Scio</a>
</p>
</div><!-- Sidebar -->
<section class="well well-sm">
<ul class="list-group list-group-flush">
<!-- Sidebar/Social -->
<li class="list-group-item">
<h4><i class="fa fa-home fa-lg"></i><span class="icon-label">Social</span></h4>
<ul class="list-group" id="social">
<li class="list-group-item"><a href="http://open.spotify.com/user/sinisa_lyh"><i class="fa fa-spotify fa-lg"></i> Spotify</a></li>
<li class="list-group-item"><a href="https://github.com/nevillelyh"><i class="fa fa-github-square fa-lg"></i> GitHub</a></li>
<li class="list-group-item"><a href="https://twitter.com/sinisa_lyh"><i class="fa fa-twitter-square fa-lg"></i> Twitter</a></li>
<li class="list-group-item"><a href="https://www.linkedin.com/in/nevillelyh"><i class="fa fa-linkedin-square fa-lg"></i> LinkedIn</a></li>
<li class="list-group-item"><a href="http://www.slideshare.net/sinisalyh"><i class="fa fa-slideshare fa-lg"></i> SlideShare</a></li>
<li class="list-group-item"><a href="http://stackoverflow.com/users/3880836/neville-li"><i class="fa fa-stack-overflow fa-lg"></i> Stack Overflow</a></li>
<li class="list-group-item"><a href="https://www.facebook.com/neville.lyh"><i class="fa fa-facebook-square fa-lg"></i> Facebook</a></li>
<li class="list-group-item"><a href="https://plus.google.com/+NevilleLiYH"><i class="fa fa-google-plus-square fa-lg"></i> Google+</a></li>
<li class="list-group-item"><a href="https://www.youtube.com/user/sinisalyh/videos"><i class="fa fa-youtube-square fa-lg"></i> YouTube</a></li>
<li class="list-group-item"><a href="https://www.flickr.com/photos/sinisa_lyh"><i class="fa fa-flickr fa-lg"></i> Flickr</a></li>
</ul>
</li>
<!-- End Sidebar/Social -->
<!-- Sidebar/Recent Posts -->
<li class="list-group-item">
<h4><i class="fa fa-home fa-lg"></i><span class="icon-label">Recent Posts</span></h4>
<ul class="list-group" id="recentposts">
<li class="list-group-item"><a href="http://www.lyh.me/implicits.html">Implicits</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/scio-at-philly-ete.html">Scio at Philly <span class="caps">ETE</span></a></li>
<li class="list-group-item"><a href="http://www.lyh.me/joins.html">Joins</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/for-comprehensions.html">For comprehensions</a></li>
<li class="list-group-item"><a href="http://www.lyh.me/scio-at-scala-by-the-bay.html">Scio at Scala by the Bay</a></li>
</ul>
</li>
<!-- End Sidebar/Recent Posts -->
<!-- Sidebar/Categories -->
<li class="list-group-item">
<h4><i class="fa fa-home fa-lg"></i><span class="icon-label">Categories</span></h4>
<ul class="list-group" id="categories">
<li class="list-group-item">
<a href="http://www.lyh.me/category/code.html"><i class="fa fa-folder-open fa-lg"></i>code</a>
</li>
<li class="list-group-item">
<a href="http://www.lyh.me/category/misc.html"><i class="fa fa-folder-open fa-lg"></i>misc</a>
</li>
</ul>
</li>
<!-- End Sidebar/Categories -->
</ul>
</section>
<!-- End Sidebar --> </aside>
</div>
</div>
</div>
<footer>
<div class="container">
<hr>
<div class="row">
<div class="col-xs-10">© 2017 Neville Li
· Powered by <a href="https://github.com/getpelican/pelican-themes/tree/master/pelican-bootstrap3" target="_blank">pelican-bootstrap3</a>,
<a href="http://docs.getpelican.com/" target="_blank">Pelican</a>,
<a href="http://getbootstrap.com" target="_blank">Bootstrap</a> <p><small> <a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/deed.en"><img alt="Creative Commons License" style="border-width:0" src="//i.creativecommons.org/l/by-nc/4.0/80x15.png" /></a>
Content
licensed under a <a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/deed.en">Creative Commons Attribution-NonCommercial 4.0 International License</a>, except where indicated otherwise.
</small></p>
</div>
<div class="col-xs-2"><p class="pull-right"><i class="fa fa-arrow-up"></i> <a href="#">Back to top</a></p></div>
</div>
</div>
</footer>
<script src="http://www.lyh.me/theme/js/jquery.min.js"></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="http://www.lyh.me/theme/js/bootstrap.min.js"></script>
<!-- Enable responsive features in IE8 with Respond.js (https://github.com/scottjehl/Respond) -->
<script src="http://www.lyh.me/theme/js/respond.min.js"></script>
<!-- Disqus -->
<script type="text/javascript">
/* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
var disqus_shortname = 'lyh'; // required: replace example with your forum shortname
/* * * DON'T EDIT BELOW THIS LINE * * */
(function () {
var s = document.createElement('script');
s.async = true;
s.type = 'text/javascript';
s.src = '//' + disqus_shortname + '.disqus.com/count.js';
(document.getElementsByTagName('HEAD')[0] || document.getElementsByTagName('BODY')[0]).appendChild(s);
}());
</script>
<!-- End Disqus Code -->
<!-- Google Analytics -->
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-6988688-5']);
_gaq.push(['_trackPageview']);
(function () {
var ga = document.createElement('script');
ga.type = 'text/javascript';
ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(ga, s);
})();
</script>
<!-- End Google Analytics Code -->
</body>
</html>