A collection of utility projects for the Hadoop ecosystem.
Configure the Hadoop configuration file paths and Kerberos credentials in resources/application.properties:
# Hadoop configuration root path
# ${hadoop.conf.path}/hdfs-site.xml
# ${hadoop.conf.path}/core-site.xml
# ${hadoop.conf.path}/mapred-site.xml
# ${hadoop.conf.path}/yarn-site.xml
hadoop.conf.path=${hadoop.conf.path}
# support kerberos
hadoop.security.authentication=${hadoop.security.authentication}
username.client.kerberos.principal=${username.client.kerberos.principal}
username.client.keytab.file=${username.client.keytab.file}
java.security.krb5.conf=${java.security.krb5.conf}
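A minimal sketch of how these settings might be consumed, assuming a simple `java.util.Properties` loader (the project's actual loading code is not shown here; the class name and sample values below are illustrative assumptions). The one detail worth noting is that `java.security.krb5.conf` must be exported as a JVM system property for the Kerberos client to pick it up:

```java
import java.io.StringReader;
import java.util.Properties;

public class KerberosConfigLoader {

    // Parse the properties text and export java.security.krb5.conf as a
    // JVM system property, which the JDK Kerberos implementation reads.
    public static Properties apply(String propertiesText) throws Exception {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        String krb5 = props.getProperty("java.security.krb5.conf");
        if (krb5 != null) {
            System.setProperty("java.security.krb5.conf", krb5);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        // Sample values only; real values come from application.properties.
        String text = String.join("\n",
            "hadoop.security.authentication=kerberos",
            "username.client.kerberos.principal=alex@EXAMPLE.COM",
            "username.client.keytab.file=/etc/security/alex.keytab",
            "java.security.krb5.conf=/etc/krb5.conf");
        Properties props = apply(text);
        System.out.println("principal=" + props.getProperty("username.client.kerberos.principal"));
    }
}
```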
1. HDFS file and directory operations
com.alex.space.hadoop.utils.HdfsUtils
2. MapReduce examples
- 2.1 Sorting
  - 2.1.1 Simple sort
    com.alex.space.hadoop.example.simplesort
  - 2.1.2 Secondary sort
    com.alex.space.hadoop.example.secondsort
- 2.2 Comparators
  - 2.2.1 Sort by value
    com.alex.space.hadoop.example.comparator
- 2.3 Partitioning
  - 2.3.1 Custom partitioner
    com.alex.space.hadoop.example.partition
- 2.4 Joins
  - 2.4.1 Map-side join
    com.alex.space.hadoop.example.join.MapJoinApp
  - 2.4.2 Reduce-side join
    com.alex.space.hadoop.example.join.ReduceJoinApp
- 2.5 Database input/output
  - 2.5.1 Input
    com.alex.space.hadoop.example.database.DBInputApp
  - 2.5.2 Output
    com.alex.space.hadoop.example.database.DBOutputApp
- 2.6 Deduplication
  - 2.6.1 Data deduplication
    com.alex.space.hadoop.example.diff.DiffApp
- 2.7 Custom data types
  - 2.7.1 Custom data types
    com.alex.space.hadoop.example.entity
- 2.8 Indexing
  - 2.8.1 Inverted index
    com.alex.space.hadoop.example.index
- 2.9 Applications
  - 2.9.1 WordCount
    com.alex.space.hadoop.example.wordcount
  - 2.9.2 KPI analysis
    com.alex.space.hadoop.example.kpi
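The WordCount example (2.9.1) reduces to two steps: a map phase that emits a `(word, 1)` pair per token, and a reduce phase that sums the counts per key. Stripped of the MapReduce framework (class and method names below are illustrative, not the project's), the same logic is:

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map phase: split each line into words and emit (word, 1).
    // Reduce phase: sum the emitted counts per word (the merge call).
    public static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum); // reduce step
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello hadoop", "hello world"));
        // {hadoop=1, hello=2, world=1}
    }
}
```

In the real job, the map loop runs in `Mapper.map` and the summation in `Reducer.reduce`, with the framework grouping pairs by key in between.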
Configure the HBase connection info in resources/application.properties:
# HBase connection
hbase.zookeeper.quorum=${hbase.zookeeper.quorum}
hbase.zookeeper.property.clientPort=${hbase.zookeeper.property.clientPort}
# Hadoop and HBase configuration root path
# ${hadoop.conf.path}/hdfs-site.xml
# ${hadoop.conf.path}/core-site.xml
# ${hbase.conf.path}/hbase-site.xml
hadoop.conf.path=${hadoop.conf.path}
hbase.conf.path=${hbase.conf.path}
# support kerberos
hadoop.security.authentication=none
username.client.kerberos.principal=
username.client.keytab.file=
1. Manage HBase with HBaseAdmin
com.alex.space.hbase.utils.HBaseAdminUtils
2. Operate HBase through the HBase API
com.alex.space.hbase.utils.HBaseUtils
3. Create pre-split tables
com.alex.space.hbase.utils.HBaseTableUtils
Tables are pre-split on the first character of the rowkey: 0-9, A-Z, a-z, 62 regions in total.
If you do not need to scan a contiguous key range, prefix rowkeys with an md5 hash to spread the write load.
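The 62-region split above can be sketched as follows. How `HBaseTableUtils` actually builds its split keys is not shown in this README; the assumption here is the usual convention that `createTable` receives one split key per boundary after the first, so 61 split keys yield exactly 62 regions:

```java
public class RowkeySplits {

    // The 62 single-character region boundaries: 0-9, A-Z, a-z.
    public static char[] boundaryChars() {
        StringBuilder sb = new StringBuilder();
        for (char c = '0'; c <= '9'; c++) sb.append(c);
        for (char c = 'A'; c <= 'Z'; c++) sb.append(c);
        for (char c = 'a'; c <= 'z'; c++) sb.append(c);
        return sb.toString().toCharArray();
    }

    // Split keys for table creation: one per boundary except the first
    // ('0' is the implicit start), cutting the table into 62 regions.
    public static byte[][] splitKeys() {
        char[] chars = boundaryChars();
        byte[][] splits = new byte[chars.length - 1][];
        for (int i = 1; i < chars.length; i++) {
            splits[i - 1] = new byte[] { (byte) chars[i] };
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(boundaryChars().length + " chars, "
            + splitKeys().length + " split keys");
        // 62 chars, 61 split keys
    }
}
```

The resulting `byte[][]` is what `Admin.createTable(descriptor, splitKeys)` expects.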
4. HBase scan with MapReduce
com.alex.space.hbase.mapred.HBaseMapRedScan
Implements HBase scans as a MapReduce job; the default HBase Scan API is inefficient for full-table reads.
Configure the Elasticsearch connection info in resources/application.properties:
# elastic connection info
es.cluster.name=${es.cluster.name}
es.node.ip=${es.node.ip}
es.node.port=${es.node.port}
1. Operate Elasticsearch with TransportClient
com.alex.space.elastic.utils.ElasticUtils
Configure the ZooKeeper connection info in resources/application.properties:
# zookeeper connection info
zk.nodes=${zk.nodes}
1. Operate ZooKeeper with ZkClient
com.alex.space.zoo.client.ZooClient
2. Operate ZooKeeper with Curator
com.alex.space.zoo.curator.CuratorClientDemo
3. Curator advanced features (recipes)
com.alex.space.zoo.curator.recipes
Edit application.properties: set env=local to run locally, or env=prod to run on the production cluster.
Local-run settings (e.g. file paths) go in local.properties; cluster settings go in prod.properties.
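The `env` switch amounts to selecting which per-environment file gets loaded. A minimal sketch of that selection, assuming a plain `Properties` reader (class and method names are hypothetical, not the project's actual loader):

```java
import java.io.StringReader;
import java.util.Properties;

public class EnvConfig {

    // Read the env key from application.properties content and return the
    // name of the matching per-environment file to load next.
    public static String envFileFor(String applicationProperties) throws Exception {
        Properties app = new Properties();
        app.load(new StringReader(applicationProperties));
        String env = app.getProperty("env", "local"); // assumed default
        return env + ".properties";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(envFileFor("env=local")); // local.properties
        System.out.println(envFileFor("env=prod"));  // prod.properties
    }
}
```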
1. Spark job base class
com.alex.space.spark.mains.BaseWorker
Default configuration keys use the default prefix.
# application.conf
env=local
# local.conf
default.input=/home/hadoop/test
# com.alex.space.spark.mains.Worker
...
println("input:" + configString("input"))
...
A new job extends BaseWorker and overrides prefix to read the configuration keys under that prefix:
# xxx.conf
test.input=/home/hadoop/example
# XXXWorker
override val prefix: String = "test"
println("input:" + configString("input"))
# input:/home/hadoop/example
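The prefix lookup above can be sketched in plain Java (the real `BaseWorker` is Scala and backed by a config library; whether it falls back to `default.*` when a prefixed key is missing is an assumption here, inferred from "default configuration keys use the default prefix"):

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixConfig {

    private final Map<String, String> conf;
    private final String prefix;

    public PrefixConfig(Map<String, String> conf, String prefix) {
        this.conf = conf;
        this.prefix = prefix;
    }

    // configString("input") resolves "<prefix>.input", falling back to
    // "default.input" when the prefixed key is absent (assumed behavior).
    public String configString(String key) {
        String v = conf.get(prefix + "." + key);
        return v != null ? v : conf.get("default." + key);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("default.input", "/home/hadoop/test");
        conf.put("test.input", "/home/hadoop/example");
        System.out.println(new PrefixConfig(conf, "test").configString("input"));
        // /home/hadoop/example
    }
}
```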
2. Spark SQL and Hive interoperation
com.alex.space.spark.mains.DataFrameDemo
3. Submit Spark jobs through the Livy REST API
com.alex.space.spark.livy
Submission works the same way as spark-client/bin/wordCount.sh
1. Common Hive UDFs
com.alex.space.hive.udf
Configure the Kafka connection and client settings in resources/application.properties:
# broker list
# example:127.0.0.1:9092,127.0.0.2:9092
broker=127.0.0.1:9092
# topic name
topic=test
# kafka properties
# identical to the native Kafka configuration keys
request.required.acks=0
producer.type=async
serializer.class=kafka.serializer.StringEncoder
message.send.max.retries=3
batch.num.messages=10
send.buffer.bytes=102400
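The property names above (`request.required.acks`, `producer.type`, `serializer.class`) belong to the legacy Scala producer client, whose broker list key is `metadata.broker.list`. A sketch of assembling these into producer `Properties` (the class name is illustrative; actually sending requires a running broker and the kafka client jar on the classpath):

```java
import java.util.Properties;

public class KafkaProducerProps {

    // Build the legacy-producer Properties shown in the README; values
    // mirror the configuration block above.
    public static Properties build(String brokerList) {
        Properties props = new Properties();
        props.put("metadata.broker.list", brokerList);   // e.g. 127.0.0.1:9092
        props.put("request.required.acks", "0");          // fire-and-forget
        props.put("producer.type", "async");              // batch in background
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("message.send.max.retries", "3");
        props.put("batch.num.messages", "10");
        props.put("send.buffer.bytes", "102400");
        return props;
    }

    public static void main(String[] args) {
        Properties p = build("127.0.0.1:9092");
        System.out.println(p.getProperty("producer.type")); // async
    }
}
```

These Properties would then be wrapped in a ProducerConfig and passed to the producer constructor.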
1. Kafka producer
com.alex.space.JavaKafkaProducer
2. Kafka producer with a custom message partitioner
com.alex.space.JavaKafkaProducerPartitioner