<!DOCTYPE html>
<html class="theme-next muse use-motion" lang="zh-Hans">
<head>
<meta charset="UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=2"/>
<meta name="theme-color" content="#222">
<meta http-equiv="Cache-Control" content="no-transform" />
<meta http-equiv="Cache-Control" content="no-siteapp" />
<link href="/lib/font-awesome/css/font-awesome.min.css?v=4.6.2" rel="stylesheet" type="text/css" />
<link href="/css/main.css?v=6.4.0" rel="stylesheet" type="text/css" />
<link rel="apple-touch-icon" sizes="180x180" href="/images/apple-touch-icon-next.png?v=6.4.0">
<link rel="icon" type="image/png" sizes="32x32" href="/images/favicon-32x32-next.png?v=6.4.0">
<link rel="icon" type="image/png" sizes="16x16" href="/images/favicon-16x16-next.png?v=6.4.0">
<link rel="mask-icon" href="/images/logo.svg?v=6.4.0" color="#222">
<script type="text/javascript" id="hexo.configurations">
var NexT = window.NexT || {};
var CONFIG = {
root: '/',
scheme: 'Muse',
version: '6.4.0',
sidebar: {"position":"left","display":"post","offset":12,"b2t":false,"scrollpercent":false,"onmobile":false},
fancybox: false,
fastclick: false,
lazyload: false,
tabs: true,
motion: {"enable":true,"async":false,"transition":{"post_block":"fadeIn","post_header":"slideDownIn","post_body":"slideDownIn","coll_header":"slideLeftIn","sidebar":"slideUpIn"}},
algolia: {
applicationID: '',
apiKey: '',
indexName: '',
hits: {"per_page":10},
labels: {"input_placeholder":"Search for Posts","hits_empty":"We didn't find any results for the search: ${query}","hits_stats":"${hits} results found in ${time} ms"}
}
};
</script>
<meta property="og:type" content="website">
<meta property="og:title" content="代码块工作室">
<meta property="og:url" content="http://yoursite.com/index.html">
<meta property="og:site_name" content="代码块工作室">
<meta property="og:locale" content="zh-Hans">
<meta name="twitter:card" content="summary">
<meta name="twitter:title" content="代码块工作室">
<link rel="canonical" href="http://yoursite.com/"/>
<script type="text/javascript" id="page.configurations">
CONFIG.page = {
sidebar: "",
};
</script>
<title>代码块工作室</title>
<noscript>
<style type="text/css">
.use-motion .motion-element,
.use-motion .brand,
.use-motion .menu-item,
.sidebar-inner,
.use-motion .post-block,
.use-motion .pagination,
.use-motion .comments,
.use-motion .post-header,
.use-motion .post-body,
.use-motion .collection-title { opacity: initial; }
.use-motion .logo,
.use-motion .site-title,
.use-motion .site-subtitle {
opacity: initial;
top: initial;
}
  .use-motion .logo-line-before i { left: initial; }
  .use-motion .logo-line-after i { right: initial; }
</style>
</noscript>
</head>
<body itemscope itemtype="http://schema.org/WebPage" lang="zh-Hans">
<div class="container sidebar-position-left
page-home">
<div class="headband"></div>
<header id="header" class="header" itemscope itemtype="http://schema.org/WPHeader">
<div class="header-inner"><div class="site-brand-wrapper">
<div class="site-meta ">
<div class="custom-logo-site-title">
<a href="/" class="brand" rel="start">
<span class="logo-line-before"><i></i></span>
<span class="site-title">代码块工作室</span>
<span class="logo-line-after"><i></i></span>
</a>
</div>
</div>
<div class="site-nav-toggle">
<button aria-label="Toggle navigation bar">
<span class="btn-bar"></span>
<span class="btn-bar"></span>
<span class="btn-bar"></span>
</button>
</div>
</div>
<nav class="site-nav">
<ul id="menu" class="menu">
<li class="menu-item menu-item-home menu-item-active">
<a href="/" rel="section">
<i class="menu-item-icon fa fa-fw fa-home"></i> <br />Home</a>
</li>
<li class="menu-item menu-item-tags">
<a href="/tags/" rel="section">
<i class="menu-item-icon fa fa-fw fa-tags"></i> <br />Tags</a>
</li>
<li class="menu-item menu-item-categories">
<a href="/categories/" rel="section">
<i class="menu-item-icon fa fa-fw fa-th"></i> <br />Categories</a>
</li>
<li class="menu-item menu-item-archives">
<a href="/archives/" rel="section">
<i class="menu-item-icon fa fa-fw fa-archive"></i> <br />Archives</a>
</li>
</ul>
</nav>
</div>
</header>
<main id="main" class="main">
<div class="main-inner">
<div class="content-wrap">
<div id="content" class="content">
<section id="posts" class="posts-expand">
<article class="post post-type-normal" itemscope itemtype="http://schema.org/Article">
<div class="post-block">
<link itemprop="mainEntityOfPage" href="http://yoursite.com/2018/09/29/7.hive自定义一个jsonUDF/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="name" content="严亮">
<meta itemprop="description" content="">
<meta itemprop="image" content="/images/avatar.gif">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="代码块工作室">
</span>
<header class="post-header">
<h1 class="post-title" itemprop="name headline">
<a class="post-title-link" href="/2018/09/29/7.hive自定义一个jsonUDF/" itemprop="url">
7.hive自定义一个jsonUDF
</a>
</h1>
<div class="post-meta">
<span class="post-time">
<span class="post-meta-item-icon">
<i class="fa fa-calendar-o"></i>
</span>
<span class="post-meta-item-text">Posted on</span>
<time title="Created: 2018-09-29 16:56:47 / Modified: 17:06:55" itemprop="dateCreated datePublished" datetime="2018-09-29T16:56:47+08:00">2018-09-29</time>
</span>
<span class="post-category" >
<span class="post-meta-divider">|</span>
<span class="post-meta-item-icon">
<i class="fa fa-folder-o"></i>
</span>
<span class="post-meta-item-text">In</span>
<span itemprop="about" itemscope itemtype="http://schema.org/Thing"><a href="/categories/大数据/" itemprop="url" rel="index"><span itemprop="name">大数据</span></a></span>
</span>
</div>
</header>
<div class="post-body" itemprop="articleBody">
<h1 id="1-编辑解析jar"><a href="#1-编辑解析jar" class="headerlink" title="1.编辑解析jar"></a>1.编辑解析jar</h1><h2 id="1-1创建项目"><a href="#1-1创建项目" class="headerlink" title="1.1创建项目"></a>1.1创建项目</h2><p>创建一个maven项目,pom.xml如下:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br></pre></td><td class="code"><pre><span class="line"><project xmlns="http://maven.apache.org/POM/4.0.0"</span><br><span class="line"> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"</span><br><span class="line"> xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"></span><br><span class="line"> <modelVersion>4.0.0</modelVersion></span><br><span class="line"> <groupId>com.codingkuai</groupId></span><br><span class="line"> <artifactId>parseJsonHive</artifactId></span><br><span class="line"> <version>0.0.1-SNAPSHOT</version></span><br><span class="line"></span><br><span class="line"> <dependencies></span><br><span class="line"> <dependency></span><br><span class="line"> <groupId>org.apache.hive</groupId></span><br><span class="line"> <artifactId>hive-exec</artifactId></span><br><span class="line"> <version>1.2.1</version></span><br><span class="line"> </dependency></span><br><span class="line"> <dependency></span><br><span class="line"> <groupId>org.apache.hadoop</groupId></span><br><span class="line"> <artifactId>hadoop-common</artifactId></span><br><span class="line"> <version>2.6.0</version></span><br><span class="line"> </dependency></span><br><span class="line"> <dependency></span><br><span class="line"> <groupId>jdk.tools</groupId></span><br><span class="line"> <artifactId>jdk.tools</artifactId></span><br><span class="line"> <version>1.6</version></span><br><span class="line"> <scope>system</scope></span><br><span class="line"> <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath></span><br><span class="line"> </dependency></span><br><span class="line"> </dependencies></span><br><span 
class="line"> <build></span><br><span class="line"> <plugins></span><br><span class="line"> <plugin></span><br><span class="line"> <groupId>org.apache.maven.plugins</groupId></span><br><span class="line"> <artifactId>maven-shade-plugin</artifactId></span><br><span class="line"> <version>2.2</version></span><br><span class="line"> <executions></span><br><span class="line"> <execution></span><br><span class="line"> <phase>package</phase></span><br><span class="line"> <goals></span><br><span class="line"> <goal>shade</goal></span><br><span class="line"> </goals></span><br><span class="line"> <configuration></span><br><span class="line"> <filters></span><br><span class="line"> <filter></span><br><span class="line"> <artifact>*:*</artifact></span><br><span class="line"> <excludes></span><br><span class="line"> <exclude>META-INF/*.SF</exclude></span><br><span class="line"> <exclude>META-INF/*.DSA</exclude></span><br><span class="line"> <exclude>META-INF/*.RSA</exclude></span><br><span class="line"> </excludes></span><br><span class="line"> </filter></span><br><span class="line"> </filters></span><br><span class="line"> </configuration></span><br><span class="line"> </execution></span><br><span class="line"> </executions></span><br><span class="line"> </plugin></span><br><span class="line"> </plugins></span><br><span class="line"> </build></span><br><span class="line"></project></span><br></pre></td></tr></table></figure>
<h2 id="1-2编写业务逻辑"><a href="#1-2编写业务逻辑" class="headerlink" title="1.2编写业务逻辑"></a>1.2编写业务逻辑</h2><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> com.codingkuai;</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">MovieRate</span> </span>{</span><br><span class="line"> <span class="keyword">private</span> String movie;</span><br><span class="line"> <span class="keyword">private</span> String rate;</span><br><span class="line"> <span class="keyword">private</span> String timeStamp;</span><br><span class="line"> <span class="keyword">private</span> String uid;</span><br><span class="line"> <span class="function"><span class="keyword">public</span> String <span class="title">getMovie</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> movie;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">setMovie</span><span class="params">(String movie)</span> </span>{</span><br><span class="line"> <span class="keyword">this</span>.movie = movie;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">public</span> String <span class="title">getRate</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> rate;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">setRate</span><span class="params">(String rate)</span> </span>{</span><br><span class="line"> <span class="keyword">this</span>.rate = rate;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">public</span> String <span class="title">getTimeStamp</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> timeStamp;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span 
class="title">setTimeStamp</span><span class="params">(String timeStamp)</span> </span>{</span><br><span class="line"> <span class="keyword">this</span>.timeStamp = timeStamp;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">public</span> String <span class="title">getUid</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> uid;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">setUid</span><span class="params">(String uid)</span> </span>{</span><br><span class="line"> <span class="keyword">this</span>.uid = uid;</span><br><span class="line"> }</span><br><span class="line"> <span class="meta">@Override</span></span><br><span class="line"> <span class="function"><span class="keyword">public</span> String <span class="title">toString</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> movie + <span class="string">"\t"</span> + rate + <span class="string">"\t"</span> + timeStamp + <span class="string">"\t"</span> + uid;</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> com.codingkuai;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.hive.ql.exec.UDF;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> parquet.org.codehaus.jackson.map.ObjectMapper;</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">ParseJsonUDF</span> <span class="keyword">extends</span> <span class="title">UDF</span> </span>{</span><br><span class="line"> <span class="function"><span class="keyword">public</span> String <span class="title">evaluate</span><span class="params">(String jsonLine)</span> </span>{</span><br><span class="line"> ObjectMapper objectMapper = <span class="keyword">new</span> ObjectMapper();</span><br><span class="line"> <span class="keyword">try</span> {</span><br><span class="line"> MovieRate readValue = objectMapper.readValue(jsonLine, MovieRate.class);</span><br><span class="line"> <span class="keyword">return</span> readValue.toString();</span><br><span class="line"> } <span class="keyword">catch</span> (Exception e) {</span><br><span class="line"> e.printStackTrace();</span><br><span class="line"> } </span><br><span class="line"> <span class="keyword">return</span> <span class="string">""</span>;</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
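<p>打包之前,可以先在本地直接调用 evaluate 方法做个简单验证,确认这样一行 JSON 能被正确解析。下面的 ParseJsonUDFLocalTest 类和示例 JSON 都只是假设的示意代码(不在原工程里),字段名需要和 MovieRate 的属性保持一致:</p>
<figure class="highlight java"><table><tr><td class="code"><pre>
package com.codingkuai;

public class ParseJsonUDFLocalTest {
    public static void main(String[] args) {
        // 假设的一行评分 JSON,字段名与 MovieRate 的 movie/rate/timeStamp/uid 对应
        String jsonLine = "{\"movie\":\"1193\",\"rate\":\"5\",\"timeStamp\":\"978300760\",\"uid\":\"1\"}";
        // 期望输出形如:movie \t rate \t timeStamp \t uid
        System.out.println(new ParseJsonUDF().evaluate(jsonLine));
    }
}
</pre></td></tr></table></figure>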
<h2 id="1-3打包"><a href="#1-3打包" class="headerlink" title="1.3打包"></a>1.3打包</h2><h1 id="2-创建临时函数"><a href="#2-创建临时函数" class="headerlink" title="2.创建临时函数"></a>2.创建临时函数</h1><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="number">1</span>.把程序打包放到目标机器上去</span><br><span class="line"><span class="number">2</span>.进入hive客户端,添加jar包:hive>add jar /opt/parseJson.jar</span><br><span class="line"><span class="number">3</span>.创建临时函数:create temporary function parsejson as <span class="string">'com.codingkuai.ParseJsonUDF'</span>;</span><br></pre></td></tr></table></figure>
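<p>临时函数只在当前 hive 会话内有效,创建完可以先做个简单验证再去处理整表数据。下面的语句仅作示意,样例 JSON 是假设数据;如果所用 Hive 版本不支持不带 from 的 select,可以改成从任意小表查询:</p>
<figure class="highlight sql"><table><tr><td class="code"><pre>
describe function parsejson;
select parsejson('{"movie":"1193","rate":"5","timeStamp":"978300760","uid":"1"}');
</pre></td></tr></table></figure>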
<h1 id="3-测试"><a href="#3-测试" class="headerlink" title="3.测试"></a>3.测试</h1><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">创建存原始数据的表</span><br><span class="line"><span class="function">create table <span class="title">rat_json</span><span class="params">(line string)</span> row format delimited</span>;</span><br><span class="line"></span><br><span class="line">把json上传上去</span><br><span class="line">load data local inpath <span class="string">'/root/rating.json'</span> into table rat_json;</span><br><span class="line"></span><br><span class="line">创建存解析数据后的表</span><br><span class="line"><span class="function">create table <span class="title">t_rating</span><span class="params">(movieid string,rate <span class="keyword">int</span>,timestring string,uid string)</span></span></span><br><span class="line"><span class="function">row format delimited fields terminated by '\t'</span>;</span><br><span class="line"></span><br><span class="line">使用我们自定义的函数,把rat_json解析后插入到t_rating内</span><br><span class="line">insert overwrite table t_rating</span><br><span class="line"><span class="function">select <span class="title">split</span><span class="params">(parsejson(line)</span>,'\t')[0]as movieid,<span class="title">split</span><span class="params">(parsejson(line)</span>,'\t')[1] as rate,<span class="title">split</span><span class="params">(parsejson(line)</span>,'\t')[2] as timestring,<span class="title">split</span><span class="params">(parsejson(line)</span>,'\t')[3] as uid from rat_json</span>;</span><br></pre></td></tr></table></figure>
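<p>上面的 insert 语句对同一行调用了 4 次 parsejson(line),每次都要重新解析一遍 JSON。数据量大时,可以先在子查询里解析一次,再对结果做 split,下面的写法仅供参考:</p>
<figure class="highlight sql"><table><tr><td class="code"><pre>
insert overwrite table t_rating
select split(parsed,'\t')[0] as movieid,
       split(parsed,'\t')[1] as rate,
       split(parsed,'\t')[2] as timestring,
       split(parsed,'\t')[3] as uid
from (select parsejson(line) as parsed from rat_json) t;
</pre></td></tr></table></figure>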
</div>
<footer class="post-footer">
<div class="post-eof"></div>
</footer>
</div>
</article>
<article class="post post-type-normal" itemscope itemtype="http://schema.org/Article">
<div class="post-block">
<link itemprop="mainEntityOfPage" href="http://yoursite.com/2018/09/28/6.hive基本操作/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="name" content="严亮">
<meta itemprop="description" content="">
<meta itemprop="image" content="/images/avatar.gif">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="代码块工作室">
</span>
<header class="post-header">
<h1 class="post-title" itemprop="name headline">
<a class="post-title-link" href="/2018/09/28/6.hive基本操作/" itemprop="url">
6.hive基本操作
</a>
</h1>
<div class="post-meta">
<span class="post-time">
<span class="post-meta-item-icon">
<i class="fa fa-calendar-o"></i>
</span>
<span class="post-meta-item-text">Posted on</span>
<time title="Created: 2018-09-28 16:21:40 / Modified: 17:28:37" itemprop="dateCreated datePublished" datetime="2018-09-28T16:21:40+08:00">2018-09-28</time>
</span>
<span class="post-category" >
<span class="post-meta-divider">|</span>
<span class="post-meta-item-icon">
<i class="fa fa-folder-o"></i>
</span>
<span class="post-meta-item-text">In</span>
<span itemprop="about" itemscope itemtype="http://schema.org/Thing"><a href="/categories/大数据/" itemprop="url" rel="index"><span itemprop="name">大数据</span></a></span>
</span>
</div>
</header>
<div class="post-body" itemprop="articleBody">
<h1 id="hive创建表以及如何加载数据到hive表中"><a href="#hive创建表以及如何加载数据到hive表中" class="headerlink" title="hive创建表以及如何加载数据到hive表中"></a>hive创建表以及如何加载数据到hive表中</h1><h2 id="建表语法"><a href="#建表语法" class="headerlink" title="建表语法"></a>建表语法</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name </span><br><span class="line"> [(col_name data_type [COMMENT col_comment], ...)] </span><br><span class="line"> [COMMENT table_comment] </span><br><span class="line"> [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] </span><br><span class="line"> [CLUSTERED BY (col_name, col_name, ...) </span><br><span class="line"> [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] </span><br><span class="line"> [ROW FORMAT row_format] </span><br><span class="line"> [STORED AS file_format] </span><br><span class="line"> [LOCATION hdfs_path]</span><br><span class="line"></span><br><span class="line">说明:</span><br><span class="line">1、 CREATE TABLE 创建一个指定名字的表。如果相同名字的表已经存在,则抛出异常;用户可以用 IF NOT EXISTS 选项来忽略这个异常。</span><br><span class="line">2、 EXTERNAL关键字可以让用户创建一个外部表,在建表的同时指定一个指向实际数据的路径(LOCATION),Hive 创建内部表时,会将数据移动到数据仓库指向的路径;若创建外部表,仅记录数据所在的路径,不对数据的位置做任何改变。在删除表的时候,内部表的元数据和数据会被一起删除,而外部表只删除元数据,不删除数据。</span><br><span class="line">3、 LIKE 允许用户复制现有的表结构,但是不复制数据。</span><br><span class="line">4、 ROW FORMAT </span><br><span class="line">DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] </span><br><span class="line"> [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] </span><br><span class="line"> | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]</span><br><span class="line">用户在建表的时候可以自定义 SerDe 或者使用自带的 SerDe。如果没有指定 ROW FORMAT 或者 ROW FORMAT DELIMITED,将会使用自带的 SerDe。在建表的时候,用户还需要为表指定列,用户在指定表的列的同时也会指定自定义的 SerDe,Hive通过 SerDe 确定表的具体的列的数据。</span><br><span class="line">5、 STORED AS </span><br><span class="line">SEQUENCEFILE|TEXTFILE|RCFILE</span><br><span class="line">如果文件数据是纯文本,可以使用 STORED AS TEXTFILE。如果数据需要压缩,使用 STORED AS SEQUENCEFILE。</span><br><span class="line"></span><br><span class="line">6、CLUSTERED BY</span><br><span class="line">对于每一个表(table)或者分区, Hive可以进一步组织成桶,也就是说桶是更为细粒度的数据范围划分。Hive也是 针对某一列进行桶的组织。Hive采用对列值哈希,然后除以桶的个数求余的方式决定该条记录存放在哪个桶当中。 </span><br><span class="line">把表(或者分区)组织成桶(Bucket)有两个理由:</span><br><span class="line">(1)获得更高的查询处理效率。桶为表加上了额外的结构,Hive 在处理有些查询时能利用这个结构。具体而言,连接两个在(包含连接列的)相同列上划分了桶的表,可以使用 Map 端连接 (Map-side join)高效的实现。比如JOIN操作。对于JOIN操作两个表有一个相同的列,如果对这两个表都进行了桶操作。那么将保存相同列值的桶进行JOIN操作就可以,可以大大较少JOIN的数据量。</span><br><span 
class="line">(2)使取样(sampling)更高效。在处理大规模数据集时,在开发和修改查询的阶段,如果能在数据集的一小部分数据上试运行查询,会带来很多方便。</span><br></pre></td></tr></table></figure>
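<p>把上面几个常用子句组合起来,一个外部表 + 分区 + 分桶的建表语句大致如下(仅作示意,表名、字段和路径都是假设的):</p>
<figure class="highlight sql"><table><tr><td class="code"><pre>
create external table if not exists t_access_log(
  ip string,
  url string,
  duration int
)
comment 'web access log'
partitioned by (dt string)
clustered by (ip) sorted by (duration desc) into 4 buckets
row format delimited fields terminated by ','
stored as textfile
location '/data/access_log';
</pre></td></tr></table></figure>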
<h2 id="具体实施"><a href="#具体实施" class="headerlink" title="具体实施"></a>具体实施</h2><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">create</span> <span class="keyword">table</span> student(<span class="keyword">id</span> <span class="built_in">int</span>, <span class="keyword">name</span> <span class="keyword">string</span>, age <span class="built_in">int</span>) </span><br><span class="line"><span class="keyword">row</span> format <span class="keyword">delimited</span></span><br><span class="line"><span class="keyword">fields</span> <span class="keyword">terminated</span> <span class="keyword">by</span> <span class="string">','</span>;</span><br></pre></td></tr></table></figure>
<h2 id="数据导入"><a href="#数据导入" class="headerlink" title="数据导入"></a>数据导入</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">方式一:直接把文件上传到/user/hive/warehouse/数据库名.db/表名/</span><br><span class="line"> hadoop fs -put /opt/student.txt /user/hive/warehouse/db1.db/student</span><br><span class="line">方式二: 通过load</span><br><span class="line"> 本地文件load到表</span><br><span class="line"> load data local inpath '/opt/student.txt' into table student;</span><br><span class="line"> hdfs内load到表 hdfs原路径下的文件不存在。</span><br><span class="line"> load data inpath '/student.txt' into table student;</span><br></pre></td></tr></table></figure>
<h1 id="修改表"><a href="#修改表" class="headerlink" title="修改表"></a>修改表</h1><h2 id="增加-删除分区"><a href="#增加-删除分区" class="headerlink" title="增加/删除分区"></a>增加/删除分区</h2><p>必须是分区表才能进行增加/删除分区的操作。下面先创建一个分区表,并向指定分区导入数据:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">create table t_patition(ip string, duration int)</span><br><span class="line">partitioned by(country string)</span><br><span class="line">row format delimited</span><br><span class="line">fields terminated by ',';</span><br></pre></td></tr></table></figure>
<figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">load</span> <span class="keyword">data</span> <span class="keyword">local</span> inpath <span class="string">'/opt/partition.txt'</span> <span class="keyword">into</span> <span class="keyword">table</span> t_patition <span class="keyword">partition</span>(country=<span class="string">"china"</span>);</span><br></pre></td></tr></table></figure>
<figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">添加分区</span><br><span class="line"><span class="keyword">alter</span> <span class="keyword">table</span> t_patition <span class="keyword">add</span> <span class="keyword">partition</span>(country=<span class="string">"Japan"</span>);</span><br></pre></td></tr></table></figure>
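<p>标题里提到的删除分区,以及查看表里已有的分区,可以用下面的语句(示意):</p>
<figure class="highlight sql"><table><tr><td class="code"><pre>
show partitions t_patition;
alter table t_patition drop partition(country="Japan");
</pre></td></tr></table></figure>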
</div>
<footer class="post-footer">
<div class="post-eof"></div>
</footer>
</div>
</article>
<article class="post post-type-normal" itemscope itemtype="http://schema.org/Article">
<div class="post-block">
<link itemprop="mainEntityOfPage" href="http://yoursite.com/2018/09/28/4.mapreduce程序编写/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="name" content="严亮">
<meta itemprop="description" content="">
<meta itemprop="image" content="/images/avatar.gif">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="代码块工作室">
</span>
<header class="post-header">
<h1 class="post-title" itemprop="name headline">
<a class="post-title-link" href="/2018/09/28/4.mapreduce程序编写/" itemprop="url">
4.mapreduce程序编写
</a>
</h1>
<div class="post-meta">
<span class="post-time">
<span class="post-meta-item-icon">
<i class="fa fa-calendar-o"></i>
</span>
<span class="post-meta-item-text">Posted on</span>
<time title="Created: 2018-09-28 14:19:08" itemprop="dateCreated datePublished" datetime="2018-09-28T14:19:08+08:00">2018-09-28</time>
<span class="post-meta-divider">|</span>
<span class="post-meta-item-icon">
<i class="fa fa-calendar-check-o"></i>
</span>
<span class="post-meta-item-text">Edited on</span>
<time title="Modified: 2018-09-26 15:01:55" itemprop="dateModified" datetime="2018-09-26T15:01:55+08:00">2018-09-26</time>
</span>
<span class="post-category" >
<span class="post-meta-divider">|</span>
<span class="post-meta-item-icon">
<i class="fa fa-folder-o"></i>
</span>
<span class="post-meta-item-text">In</span>
<span itemprop="about" itemscope itemtype="http://schema.org/Thing"><a href="/categories/大数据/" itemprop="url" rel="index"><span itemprop="name">大数据</span></a></span>
</span>
</div>
</header>
<div class="post-body" itemprop="articleBody">
<h1 id="MAPREDUCE-原理"><a href="#MAPREDUCE-原理" class="headerlink" title="MAPREDUCE 原理"></a>MAPREDUCE 原理</h1><p>Mapreduce 是一个分布式运算程序的<strong>编程框架</strong>,是用户开发“基于hadoop的数据分析应用”的核心框架;</p>
<p>Mapreduce<strong>核心功能</strong>是将用户编写的业务逻辑代码和自带默认组件整合成一个完整的分布式运算程序,并发运行在一个hadoop集群上</p>
<h2 id="为什么要MAPREDUCE"><a href="#为什么要MAPREDUCE" class="headerlink" title="为什么要MAPREDUCE"></a>为什么要MAPREDUCE</h2><p>(1)海量数据在单机上处理时,受硬件资源限制,无法胜任<br>(2)而一旦将单机版程序扩展到集群来分布式运行,将极大增加程序的复杂度和开发难度<br>(3)引入mapreduce框架后,开发人员可以将绝大部分工作集中在业务逻辑的开发上,而将分布式计算中的复杂性交由框架来处理</p>
<h2 id="流程解析"><a href="#流程解析" class="headerlink" title="流程解析"></a>流程解析</h2><p>1、 一个mr程序启动的时候,最先启动的是MRAppMaster,MRAppMaster启动后根据本次job的描述信息,计算出需要的maptask实例数量,然后向集群申请机器启动相应数量的maptask进程</p>
<p>2、 maptask进程启动之后,根据给定的数据切片范围进行数据处理,主体流程为:<br>a) 利用客户指定的inputformat来获取RecordReader读取数据,形成输入KV对<br>b) 将输入KV对传递给客户定义的map()方法,做逻辑运算,并将map()方法输出的KV对收集到缓存<br>c) 将缓存中的KV对按照K分区排序后不断溢写到磁盘文件</p>
<p>3、 MRAppMaster监控到所有maptask进程任务完成之后,会根据客户指定的参数启动相应数量的reducetask进程,并告知reducetask进程要处理的数据范围(数据分区)</p>
<p>4、 Reducetask进程启动之后,根据MRAppMaster告知的待处理数据所在位置,从若干台maptask运行所在机器上获取到若干个maptask输出结果文件,并在本地进行重新归并排序,然后按照相同key的KV为一个组,调用客户定义的reduce()方法进行逻辑运算,并收集运算输出的结果KV,然后调用客户指定的outputformat将结果数据输出到外部存储</p>
<h1 id="MAPREDUCE-实践"><a href="#MAPREDUCE-实践" class="headerlink" title="MAPREDUCE 实践"></a>MAPREDUCE 实践</h1><h2 id="MAPREDUCE-示例编写及编程规范"><a href="#MAPREDUCE-示例编写及编程规范" class="headerlink" title="MAPREDUCE 示例编写及编程规范"></a>MAPREDUCE 示例编写及编程规范</h2><p>(1)用户编写的程序分成三个部分:Mapper,Reducer,Driver(提交运行mr程序的客户端)<br>(2)Mapper的输入数据是KV对的形式(KV的类型可自定义)<br>(3)Mapper的输出数据是KV对的形式(KV的类型可自定义)<br>(4)Mapper中的业务逻辑写在map()方法中<br>(5)map()方法(maptask进程)对每一个<K,V>调用一次<br>(6)Reducer的输入数据类型对应Mapper的输出数据类型,也是KV<br>(7)Reducer的业务逻辑写在reduce()方法中<br>(8)Reducetask进程对每一组相同k的<k,v>组调用一次reduce()方法<br>(9)用户自定义的Mapper和Reducer都要继承各自的父类<br>(10)整个程序需要一个Driver来进行提交,提交的是一个描述了各种必要信息的job对象</p>
<h2 id="单词统计实例"><a href="#单词统计实例" class="headerlink" title="单词统计实例"></a>单词统计实例</h2><h3 id="定义一个mapper类"><a href="#定义一个mapper类" class="headerlink" title="定义一个mapper类"></a>定义一个mapper类</h3><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> cn.codingkuai.mapreduce;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> java.io.IOException;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.IntWritable;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.LongWritable;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.Text;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Mapper;</span><br><span class="line"></span><br><span class="line"><span class="comment">/**</span></span><br><span class="line"><span class="comment"> * Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> 介紹</span></span><br><span class="line"><span class="comment"> * KEYIN: 是值框架读取到的数据的key类型</span></span><br><span class="line"><span class="comment"> * 在默认的读取数据组件InputFormat下,读取的key是一行文本的偏移量,所以key的流行是long类型的</span></span><br><span class="line"><span class="comment"> * VALUEIN:指框架读取到数据的value类型</span></span><br><span class="line"><span class="comment"> * 在默认的读取数据组件InputFormat下,读到的value就是一行文本的内容,所以value的类型是String的</span></span><br><span class="line"><span class="comment"> * KEYOUT:是指用户自定义逻辑方法返回的数据中的key的类型 这个是由用户业务逻辑决定的。</span></span><br><span class="line"><span class="comment"> * 在我们的单词统计当中,我们输出的是单词作为key,所以类型是String</span></span><br><span class="line"><span class="comment"> * VALUEOUT:是指用户自定义逻辑方法返回的数据中value的类型 这个是由用户业务逻辑决定的。</span></span><br><span class="line"><span class="comment"> * 在我们的单词统计当中,我们输出的是单词数量作为value,所以类型是Integer</span></span><br><span class="line"><span class="comment"> * 但是,String, Long都是jdk中自带的数据类型,在序列化的时候,效率比较低,hadoop为了提高序列化效率,他就 自定义了一套数据类型。</span></span><br><span class="line"><span class="comment"> * Long -> 
LongWritable</span></span><br><span class="line"><span class="comment"> * String -> Text</span></span><br><span class="line"><span class="comment"> * Integer -> IntWritable</span></span><br><span class="line"><span class="comment"> * null -> NullWritable</span></span><br><span class="line"><span class="comment"> * <span class="doctag">@author</span> yanliang</span></span><br><span class="line"><span class="comment"> *</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">WordCountMapper</span> <span class="keyword">extends</span> <span class="title">Mapper</span><<span class="title">LongWritable</span>, <span class="title">Text</span>, <span class="title">Text</span>, <span class="title">IntWritable</span>></span>{</span><br><span class="line"> </span><br><span class="line"> <span class="comment">/**</span></span><br><span class="line"><span class="comment"> * 这个map方法就是mapreduce程序中被主题程序MapTask所调用的用户业务逻辑方法</span></span><br><span class="line"><span class="comment"> * MapTask会驱动我们的读取数据组件InputFormat去读取数据(KEYIN,VALUE),每读取一个(K, V),他就会传入到这个用户写的map方法中去调用一次</span></span><br><span class="line"><span class="comment"> * 在默认的inputFormat冲突中,此处的key就是一行的起始偏移量,value就是一行的内容</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"> <span class="meta">@Override</span></span><br><span class="line"> <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">map</span><span class="params">(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)</span></span></span><br><span class="line"><span class="function"> <span class="keyword">throws</span> IOException, InterruptedException </span>{</span><br><span class="line"> String line = value.toString();</span><br><span class="line"> String[] split = line.split(<span class="string">" "</span>);</span><br><span class="line"> <span class="keyword">for</span> (String word : split) {</span><br><span class="line"> context.write(<span class="keyword">new</span> Text(word), <span class="keyword">new</span> IntWritable(<span class="number">1</span>));</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<h3 id="定义一个reducer类"><a href="#定义一个reducer类" class="headerlink" title="定义一个reducer类"></a>定义一个reducer类</h3><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> cn.codingkuai.mapreduce;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> java.io.IOException;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.IntWritable;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.Text;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Reducer;</span><br><span class="line"><span class="comment">/**</span></span><br><span class="line"><span class="comment"> * reducetask在调用我们的reduce方法</span></span><br><span class="line"><span class="comment"> * reducetask应该接受到map阶段中所有maptask输出的数据中的一部分</span></span><br><span class="line"><span class="comment"> * (key.hashcode % numReduceTask == 本ReduceTask编号)</span></span><br><span class="line"><span class="comment"> * </span></span><br><span class="line"><span class="comment"> * reducetask将接受到的kv数据拿来处理时,是这样调用我们的reduce方法的:</span></span><br><span class="line"><span class="comment"> * 先将自己接受到所有kv对按照K分组(根据K是否相同)</span></span><br><span class="line"><span class="comment"> * 然后将一组kv中的K传给我们的reduce方法的key变量,把这一组kv中所有v用一个迭代器传给reduce方法的变量values</span></span><br><span class="line"><span class="comment"> * <span class="doctag">@author</span> yanliang</span></span><br><span class="line"><span class="comment"> *</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">WordCountReduce</span> <span class="keyword">extends</span> <span class="title">Reducer</span><<span class="title">Text</span>, <span class="title">IntWritable</span>, <span class="title">Text</span>, <span class="title">IntWritable</span>> </span>{</span><br><span class="line"> </span><br><span class="line"> <span class="meta">@Override</span></span><br><span class="line"> <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">reduce</span><span class="params">(Text key, Iterable<IntWritable> values,</span></span></span><br><span class="line"><span class="function"><span class="params"> Reducer<Text, 
IntWritable, Text, IntWritable>.Context context)</span> <span class="keyword">throws</span> IOException, InterruptedException </span>{</span><br><span class="line"> <span class="keyword">int</span> count = <span class="number">0</span>;</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">for</span> (IntWritable v : values) {</span><br><span class="line"> <span class="keyword">int</span> i = v.get();</span><br><span class="line"> count += i;</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> context.write(key, <span class="keyword">new</span> IntWritable(count));</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<h3 id="定义一个主类,用来描述job并提交job"><a href="#定义一个主类,用来描述job并提交job" class="headerlink" title="定义一个主类,用来描述job并提交job"></a>定义一个主类,用来描述job并提交job</h3><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> cn.codingkuai.mapreduce;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.conf.Configuration;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.fs.Path;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.IntWritable;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.Text;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Job;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.lib.input.FileInputFormat;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.lib.input.TextInputFormat;</span><br><span class="line"><span class="keyword">import</span> 
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;</span><br><span class="line"></span><br><span class="line"><span class="comment">/**</span></span><br><span class="line"><span class="comment"> * 本类是客户端用来指定wordCount job程序运行时候所需要的很多参数 比如:指定哪一个类左右map阶段的业务逻辑类</span></span><br><span class="line"><span class="comment"> * 哪个类作为reduce阶段的业务逻辑类 指定用那个组件作为数据的读取组件 数据结果输出组件 指定这个wordcount jar包锁在的路径</span></span><br><span class="line"><span class="comment"> * 以及其它各种所需要的参数</span></span><br><span class="line"><span class="comment"> * </span></span><br><span class="line"><span class="comment"> * <span class="doctag">@author</span> yanliang</span></span><br><span class="line"><span class="comment"> *</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">WordCountDriver</span> </span>{</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> Exception </span>{</span><br><span class="line"> Configuration conf = <span class="keyword">new</span> Configuration();</span><br><span class="line"> System.setProperty(<span class="string">"HADOOP_USER_NAME"</span>, <span class="string">"root"</span>);</span><br><span class="line"> conf.set(<span class="string">"fs.defaultFS"</span>, <span class="string">"hdfs://min1:9000"</span>);</span><br><span class="line"><span class="comment">// conf.set("mapreduce.framework.name", "yarn");</span></span><br><span class="line"><span class="comment">// conf.set("yarn.resourcemanager.hostname", "min1");</span></span><br><span class="line"> Job job = Job.getInstance(conf);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// 告诉框架,我们的程序所有jar包的位置.linux上的路径</span></span><br><span class="line"><span class="comment">// job.setJar("/opt/temp/wordcount.jar");</span></span><br><span class="line"> job.setJarByClass(WordCountDriver.class);</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// 告诉框架,我们程序所用的mapper和reduce类是什么</span></span><br><span class="line"> job.setMapperClass(WordCountMapper.class);</span><br><span class="line"> job.setReducerClass(WordCountReduce.class);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// 告诉框架,我们程序输出的数据类型</span></span><br><span class="line"> job.setMapOutputKeyClass(Text.class); <span class="comment">// map阶段</span></span><br><span class="line"> job.setMapOutputValueClass(IntWritable.class);</span><br><span class="line"> job.setOutputKeyClass(Text.class); <span class="comment">// 最终结果</span></span><br><span class="line"> job.setOutputValueClass(IntWritable.class);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// 告诉框架,我们程序使用的数据读取组件 结果输出所用的组件</span></span><br><span class="line"> <span class="comment">// TextInputFormat是mapreduce程序中内置的一种读取数据组件 准确的说叫做读取文件的输入组件</span></span><br><span class="line"> job.setInputFormatClass(TextInputFormat.class);</span><br><span class="line"> job.setOutputFormatClass(TextOutputFormat.class);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// 
告诉框架,我们要处理的数据文件在哪个路径下</span></span><br><span class="line"> FileInputFormat.setInputPaths(job, <span class="keyword">new</span> Path(<span class="string">"/wordcount/input"</span>));</span><br><span class="line"> FileOutputFormat.setOutputPath(job, <span class="keyword">new</span> Path(<span class="string">"/wordcount/output"</span>));</span><br><span class="line"></span><br><span class="line"> <span class="keyword">boolean</span> res = job.waitForCompletion(<span class="keyword">true</span>);</span><br><span class="line"> System.exit(res ? <span class="number">0</span> : <span class="number">1</span>);</span><br><span class="line"> <span class="comment">// hadoop 上运行</span></span><br><span class="line"> <span class="comment">//hadoop jar wordcount.jar cn.codingkuai.mapreduce.WordCountDriver</span></span><br><span class="line"> <span class="comment">// 插件结果</span></span><br><span class="line"> <span class="comment">//hadoop fs -cat /wordcount/output/part-r-00000</span></span><br><span class="line"> <span class="comment">/*</span></span><br><span class="line"><span class="comment"> * guo 1</span></span><br><span class="line"><span class="comment"> hello 7</span></span><br><span class="line"><span class="comment"> nihao 2</span></span><br><span class="line"><span class="comment"> Re. 1</span></span><br><span class="line"><span class="comment"> shi 1</span></span><br><span class="line"><span class="comment"> tom 1</span></span><br><span class="line"><span class="comment"> tom1 1</span></span><br><span class="line"><span class="comment"> uu 1</span></span><br><span class="line"><span class="comment"> wo 1</span></span><br><span class="line"><span class="comment"> yanliang 1</span></span><br><span class="line"><span class="comment"> zhong 1</span></span><br><span class="line"><span class="comment"> zhongguo 1</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<h2 id="MAPREDUCE程序运行模式"><a href="#MAPREDUCE程序运行模式" class="headerlink" title="MAPREDUCE程序运行模式"></a>MAPREDUCE程序运行模式</h2><h3 id="本地运行模式"><a href="#本地运行模式" class="headerlink" title="本地运行模式"></a>本地运行模式</h3><p>(1)mapreduce程序是被提交给LocalJobRunner在本地以单进程的形式运行</p>
<p>(2)而处理的数据及输出结果可以在本地文件系统,也可以在hdfs上</p>
<p>(3)怎样实现本地运行?写一个程序,不要带集群的配置文件(本质是你的mr程序的conf中是否有mapreduce.framework.name=local以及yarn.resourcemanager.hostname参数)</p>
<p><em>(4) <strong>Local mode is very convenient for debugging business logic</strong>: just set breakpoints in Eclipse.</em></p>
<p><em>If you want to run local mode on Windows to test program logic, you need to configure environment variables on Windows:</em></p>
<p><em>%HADOOP_HOME% = d:/hadoop-2.6.1</em></p>
<p><em>%PATH% = %HADOOP_HOME%\bin</em></p>
<p><em>and replace the lib and bin directories under d:/hadoop-2.6.1 with versions compiled for the Windows platform.</em></p>
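<p>To make the difference concrete, a minimal local-mode driver is sketched below. This is not from the original post: the class name WordCountLocalDriver and the d:/wordcount paths are illustrative, and it reuses the WordCountMapper / WordCountReduce classes shown earlier. Because fs.defaultFS and mapreduce.framework.name are left at their defaults, the job runs in LocalJobRunner against the local filesystem.</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td>
<td class="code"><pre><span class="line">package cn.codingkuai.mapreduce;</span><br><span class="line"></span><br><span class="line">import org.apache.hadoop.conf.Configuration;</span><br><span class="line">import org.apache.hadoop.fs.Path;</span><br><span class="line">import org.apache.hadoop.io.IntWritable;</span><br><span class="line">import org.apache.hadoop.io.Text;</span><br><span class="line">import org.apache.hadoop.mapreduce.Job;</span><br><span class="line">import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;</span><br><span class="line">import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;</span><br><span class="line"></span><br><span class="line">// Illustrative local-mode driver: fs.defaultFS and mapreduce.framework.name are left</span><br><span class="line">// at their defaults, so the job runs in LocalJobRunner against the local filesystem.</span><br><span class="line">public class WordCountLocalDriver {</span><br><span class="line">    public static void main(String[] args) throws Exception {</span><br><span class="line">        Configuration conf = new Configuration();</span><br><span class="line">        Job job = Job.getInstance(conf);</span><br><span class="line">        job.setJarByClass(WordCountLocalDriver.class);</span><br><span class="line">        job.setMapperClass(WordCountMapper.class);</span><br><span class="line">        job.setReducerClass(WordCountReduce.class);</span><br><span class="line">        job.setMapOutputKeyClass(Text.class);</span><br><span class="line">        job.setMapOutputValueClass(IntWritable.class);</span><br><span class="line">        job.setOutputKeyClass(Text.class);</span><br><span class="line">        job.setOutputValueClass(IntWritable.class);</span><br><span class="line">        FileInputFormat.setInputPaths(job, new Path("d:/wordcount/input"));</span><br><span class="line">        FileOutputFormat.setOutputPath(job, new Path("d:/wordcount/output"));</span><br><span class="line">        System.exit(job.waitForCompletion(true) ? 0 : 1);</span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Run it directly from Eclipse; no hadoop jar submission is needed in local mode, which is what makes breakpoint debugging possible.</p>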
<h1 id="项目细节"><a href="#项目细节" class="headerlink" title="项目细节"></a>项目细节</h1><h2 id="自定义对象实现MR中的序列化接口"><a href="#自定义对象实现MR中的序列化接口" class="headerlink" title="自定义对象实现MR中的序列化接口"></a>自定义对象实现MR中的序列化接口</h2><p>如果需要将自定义的bean放在key中传输,则还需要实现comparable接口,因为mapreduce框中的shuffle过程一定会对key进行排序,此时,自定义的bean实现的接口应该是:</p>
<p>public class FlowBean implements WritableComparable<flowbean> </flowbean></p>
<p>需要自己实现的方法是:</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> cn.codingkuai;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> java.io.DataInput;</span><br><span class="line"><span class="keyword">import</span> java.io.DataOutput;</span><br><span class="line"><span class="keyword">import</span> java.io.IOException;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.WritableComparable;</span><br><span class="line"></span><br><span class="line"><span 
class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">FlowBean</span> <span class="keyword">implements</span> <span class="title">WritableComparable</span><<span class="title">FlowBean</span>></span>{</span><br><span class="line"></span><br><span class="line"> <span class="keyword">private</span> <span class="keyword">long</span> upFlow;</span><br><span class="line"> <span class="keyword">private</span> <span class="keyword">long</span> downFlow;</span><br><span class="line"> <span class="keyword">private</span> <span class="keyword">long</span> sumFlow;</span><br><span class="line"> </span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="title">FlowBean</span><span class="params">()</span> </span>{</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="title">FlowBean</span><span class="params">(<span class="keyword">long</span> upFlow, <span class="keyword">long</span> downFlow)</span> </span>{</span><br><span class="line"> <span class="keyword">super</span>();</span><br><span class="line"> <span class="keyword">this</span>.upFlow = upFlow;</span><br><span class="line"> <span class="keyword">this</span>.downFlow = downFlow;</span><br><span class="line"> <span class="keyword">this</span>.sumFlow = upFlow + downFlow;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="title">FlowBean</span><span class="params">(<span class="keyword">long</span> upFlow, <span class="keyword">long</span> downFlow, <span class="keyword">long</span> sumFlow)</span> </span>{</span><br><span class="line"> <span class="keyword">super</span>();</span><br><span class="line"> <span class="keyword">this</span>.upFlow = upFlow;</span><br><span class="line"> <span class="keyword">this</span>.downFlow = downFlow;</span><br><span class="line"> <span class="keyword">this</span>.sumFlow = sumFlow;</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">set</span><span class="params">(<span class="keyword">long</span> upFlow, <span class="keyword">long</span> downFlow)</span> </span>{</span><br><span class="line"> <span class="keyword">this</span>.upFlow = upFlow;</span><br><span class="line"> <span class="keyword">this</span>.downFlow = downFlow;</span><br><span class="line"> <span class="keyword">this</span>.sumFlow = (upFlow + downFlow);</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">long</span> <span class="title">getUpFlow</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> upFlow;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">setUpFlow</span><span class="params">(<span class="keyword">long</span> upFlow)</span> </span>{</span><br><span class="line"> <span class="keyword">this</span>.upFlow = upFlow;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span 
class="keyword">public</span> <span class="keyword">long</span> <span class="title">getDownFlow</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> downFlow;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">setDownFlow</span><span class="params">(<span class="keyword">long</span> downFlow)</span> </span>{</span><br><span class="line"> <span class="keyword">this</span>.downFlow = downFlow;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">long</span> <span class="title">getSumFlow</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> sumFlow;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">setSumFlow</span><span class="params">(<span class="keyword">long</span> sumFlow)</span> </span>{</span><br><span class="line"> <span class="keyword">this</span>.sumFlow = sumFlow;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">/**</span></span><br><span class="line"><span class="comment"> * 序列化</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">write</span><span class="params">(DataOutput out)</span> <span class="keyword">throws</span> IOException </span>{</span><br><span class="line"> out.writeLong(<span class="keyword">this</span>.upFlow);</span><br><span class="line"> out.writeLong(<span class="keyword">this</span>.downFlow);</span><br><span class="line"> out.writeLong(<span class="keyword">this</span>.sumFlow);</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="comment">/**</span></span><br><span class="line"><span class="comment"> * 反序列化,顺序和序列化顺序一样</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">readFields</span><span class="params">(DataInput in)</span> <span class="keyword">throws</span> IOException </span>{</span><br><span class="line"> <span class="keyword">this</span>.upFlow = in.readLong();</span><br><span class="line"> <span class="keyword">this</span>.downFlow = in.readLong();</span><br><span class="line"> <span class="keyword">this</span>.sumFlow = in.readLong();</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="comment">/**</span></span><br><span class="line"><span class="comment"> * TextOutputFormat组件输出结果时候调用的是toString方法</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"> <span class="function"><span class="keyword">public</span> String <span class="title">toString</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> <span class="keyword">this</span>.upFlow + <span class="string">"\t"</span> + <span class="keyword">this</span>.downFlow + <span 
class="string">"\t"</span> + <span class="keyword">this</span>.sumFlow;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">compareTo</span><span class="params">(FlowBean o)</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> (<span class="keyword">int</span>) (o.getSumFlow() - <span class="keyword">this</span>.getSumFlow());</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<h2 id="自定义partitioner"><a href="#自定义partitioner" class="headerlink" title="自定义partitioner"></a>自定义partitioner</h2><p>如果我们通过key不同去到不通文件中的需求,这时候我们需要去设置ReduceTask个数默认是一个,其次去编写分片分规则。</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> cn.codingkuai;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> java.util.HashMap;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.io.Text;</span><br><span class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Partitioner;</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">ProvincePartitioner</span> <span class="keyword">extends</span> <span class="title">Partitioner</span><<span class="title">Text</span>, <span class="title">FlowBean</span>></span>{</span><br><span class="line"> <span class="keyword">static</span> HashMap<String, Integer> provinceMap = <span class="keyword">new</span> HashMap<String, Integer>();</span><br><span class="line"></span><br><span class="line"> <span class="keyword">static</span> {</span><br><span class="line"></span><br><span class="line"> provinceMap.put(<span class="string">"135"</span>, <span class="number">0</span>);</span><br><span class="line"> provinceMap.put(<span class="string">"136"</span>, <span class="number">1</span>);</span><br><span class="line"> provinceMap.put(<span class="string">"137"</span>, <span class="number">2</span>);</span><br><span class="line"> provinceMap.put(<span class="string">"138"</span>, <span class="number">3</span>);</span><br><span class="line"> provinceMap.put(<span class="string">"139"</span>, <span class="number">4</span>);</span><br><span class="line"></span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="meta">@Override</span></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">getPartition</span><span class="params">(Text key, FlowBean value, <span class="keyword">int</span> numPartitions)</span> </span>{</span><br><span class="line"> Integer code = provinceMap.get(key.toString().substring(<span class="number">0</span>, <span class="number">3</span>));</span><br><span class="line"> <span class="keyword">return</span> code == <span class="keyword">null</span> ? <span class="number">5</span> : code;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">job.setPartitionerClass(ProvincePartitioner.class);</span><br><span class="line">job.setNumReduceTasks(<span class="number">6</span>);</span><br></pre></td></tr></table></figure>
<h2 id="MAPREDUCE中的Combiner"><a href="#MAPREDUCE中的Combiner" class="headerlink" title="MAPREDUCE中的Combiner"></a>MAPREDUCE中的Combiner</h2><p>(1)combiner是MR程序中Mapper和Reducer之外的一种组件</p>
<p>(2)combiner组件的父类就是Reducer</p>
<p>(3)combiner和reducer的区别在于运行的位置:</p>
<p>Combiner是在每一个maptask所在的节点运行</p>
<p>Reducer是接收全局所有Mapper的输出结果;</p>
<p>(4) combiner的意义就是对每一个maptask的输出进行局部汇总,以减小网络传输量</p>
<p>具体实现步骤:</p>
<p>1、 自定义一个combiner继承Reducer,重写reduce方法</p>
<p>2、 在job中设置: job.setCombinerClass(CustomCombiner.class)</p>
<p>(5) combiner能够应用的前提是不能影响最终的业务逻辑</p>
<p>而且,combiner的输出kv应该跟reducer的输入kv类型要对应起来</p>
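<p>As an illustration, a minimal word-count combiner might look like the sketch below. The class name WordCountCombiner is hypothetical and not part of the original post; it assumes the Text / IntWritable KV types of the WordCount example above.</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td>
<td class="code"><pre><span class="line">package cn.codingkuai.mapreduce;</span><br><span class="line"></span><br><span class="line">import java.io.IOException;</span><br><span class="line"></span><br><span class="line">import org.apache.hadoop.io.IntWritable;</span><br><span class="line">import org.apache.hadoop.io.Text;</span><br><span class="line">import org.apache.hadoop.mapreduce.Reducer;</span><br><span class="line"></span><br><span class="line">// Map-side local aggregation: sums the counts for each word emitted by one map task</span><br><span class="line">// before they are written out, so less data crosses the network.</span><br><span class="line">public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {</span><br><span class="line">    @Override</span><br><span class="line">    protected void reduce(Text key, Iterable<IntWritable> values, Context context)</span><br><span class="line">            throws IOException, InterruptedException {</span><br><span class="line">        int sum = 0;</span><br><span class="line">        for (IntWritable v : values) {</span><br><span class="line">            sum += v.get();</span><br><span class="line">        }</span><br><span class="line">        context.write(key, new IntWritable(sum));</span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>It is wired in with job.setCombinerClass(WordCountCombiner.class). Because summing word counts is associative, running this per map task does not change the final result, which is exactly the precondition described in (5).</p>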
</div>
<footer class="post-footer">
<div class="post-eof"></div>
</footer>
</div>
</article>
<article class="post post-type-normal" itemscope itemtype="http://schema.org/Article">
<div class="post-block">
<link itemprop="mainEntityOfPage" href="http://yoursite.com/2018/09/26/5.hive安装/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="name" content="严亮">
<meta itemprop="description" content="">
<meta itemprop="image" content="/images/avatar.gif">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="代码块工作室">
</span>
<header class="post-header">
<h1 class="post-title" itemprop="name headline">
<a class="post-title-link" href="/2018/09/26/5.hive安装/" itemprop="url">
            5. Hive Installation
</a>
</h1>
<div class="post-meta">
<span class="post-time">
<span class="post-meta-item-icon">
<i class="fa fa-calendar-o"></i>
</span>
<span class="post-meta-item-text">Posted on</span>
<time title="Created: 2018-09-26 14:40:32" itemprop="dateCreated datePublished" datetime="2018-09-26T14:40:32+08:00">2018-09-26</time>
<span class="post-meta-divider">|</span>
<span class="post-meta-item-icon">
<i class="fa fa-calendar-check-o"></i>
</span>
<span class="post-meta-item-text">Edited on</span>
<time title="Modified: 2018-09-28 15:38:48" itemprop="dateModified" datetime="2018-09-28T15:38:48+08:00">2018-09-28</time>
</span>
<span class="post-category" >
<span class="post-meta-divider">|</span>
<span class="post-meta-item-icon">
<i class="fa fa-folder-o"></i>
</span>
<span class="post-meta-item-text">In</span>
<span itemprop="about" itemscope itemtype="http://schema.org/Thing"><a href="/categories/大数据/" itemprop="url" rel="index"><span itemprop="name">大数据</span></a></span>
</span>
</div>
</header>
<div class="post-body" itemprop="articleBody">
<p>Hive only needs to be installed on a single node.</p>
<h1 id="安装mysql"><a href="#安装mysql" class="headerlink" title="安装mysql"></a>安装mysql</h1><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">参考:http://www.cnblogs.com/starof/p/4680083.html</span><br></pre></td></tr></table></figure>
<h1 id="配置hive"><a href="#配置hive" class="headerlink" title="配置hive"></a>配置hive</h1><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br></pre></td><td class="code"><pre><span class="line">(a)配置HIVE_HOME环境变量 vi conf/hive-env.sh 配置其中的$hadoop_home</span><br><span class="line"> HADOOP_HOME=/opt/hadoop-2.6.4</span><br><span class="line">(b)配置元数据库信息 vi hive-site.xml </span><br><span class="line"> 添加如下内容:</span><br><span class="line"><configuration></span><br><span class="line"><property></span><br><span class="line"><name>javax.jdo.option.ConnectionURL</name></span><br><span class="line"><value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value></span><br><span class="line"><description>JDBC connect string for a JDBC metastore</description></span><br><span class="line"></property></span><br><span class="line"></span><br><span class="line"><property></span><br><span class="line"><name>javax.jdo.option.ConnectionDriverName</name></span><br><span class="line"><value>com.mysql.jdbc.Driver</value></span><br><span class="line"><description>Driver class name for a JDBC metastore</description></span><br><span class="line"></property></span><br><span class="line"></span><br><span class="line"><property></span><br><span class="line"><name>javax.jdo.option.ConnectionUserName</name></span><br><span class="line"><value>root</value></span><br><span class="line"><description>username to use against metastore database</description></span><br><span class="line"></property></span><br><span 
class="line"></span><br><span class="line"><property></span><br><span class="line"><name>javax.jdo.option.ConnectionPassword</name></span><br><span class="line"><value>963852</value></span><br><span class="line"><description>password to use against metastore database</description></span><br><span class="line"></property></span><br><span class="line"></configuration></span><br><span class="line"></span><br><span class="line">(C)Jline包版本不一致的问题,需要拷贝hive的lib目录中jline.2.12.jar的jar包替换掉hadoop中的 </span><br><span class="line"> cp jline-2.12.jar /opt/hadoop-2.6.4/share/hadoop/yarn/lib/</span><br><span class="line"></span><br><span class="line">(d) 添加环境变量 vi /etc/profile</span><br><span class="line"> export JAVA_HOME=/opt/jdk1.8</span><br><span class="line"> export HADOOP_HOME=/opt/hadoop-2.6.4</span><br><span class="line"> export HIVE_HOME=/opt/hive</span><br><span class="line"> export PATH=${PATH}:${JAVA_HOME}/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin</span><br><span class="line"></span><br><span class="line">. /etc/profile </span><br><span class="line"></span><br><span class="line">(e) mysql驱动添加到lib目录下</span><br><span class="line">(f) 启动</span><br><span class="line"> hive</span><br><span class="line"> </span><br><span class="line"></span><br><span class="line">hive也可以启动为一个服务器,来对外提供</span><br><span class="line"></span><br><span class="line">启动方式,</span><br><span class="line">启动为前台:bin/hiveserver2</span><br><span class="line">启动为后台:nohup bin/hiveserver2 1>/var/log/hiveserver.log 2>/var/log/hiveserver.err &</span><br><span class="line"></span><br><span class="line">启动成功后,可以在别的节点上用beeline去连接</span><br><span class="line">方式(1)</span><br><span class="line">hive/bin/beeline 回车,进入beeline的命令界面</span><br><span class="line">输入命令连接hiveserver2</span><br><span class="line">beeline> !connect jdbc:hive2://min3:10000</span><br><span class="line">(itcast01是hiveserver2所启动的那台主机名,端口默认是10000)</span><br><span class="line"> 方式(2)</span><br><span class="line">或者启动就连接:</span><br><span class="line">bin/beeline -u jdbc:hive2://min3:10000 -n root</span><br><span class="line"></span><br><span class="line">接下来就可以做正常sql查询了</span><br></pre></td></tr></table></figure>
</div>
<footer class="post-footer">
<div class="post-eof"></div>
</footer>
</div>
</article>
<article class="post post-type-normal" itemscope itemtype="http://schema.org/Article">