A collection of clients for Hadoop, Spark, HBase, Elasticsearch, and more

BowenSun90/client-collection

Collection Utils

A collection of utility projects

Hadoop

Configure the Hadoop configuration file path and the Kerberos authentication settings in resources/application.properties

# Hadoop configuration root path
# ${hadoop.conf.path}/hdfs-site.xml
# ${hadoop.conf.path}/core-site.xml
# ${hadoop.conf.path}/mapred-site.xml
# ${hadoop.conf.path}/yarn-site.xml
hadoop.conf.path=${hadoop.conf.path}

# Kerberos support
hadoop.security.authentication=${hadoop.security.authentication}
username.client.kerberos.principal=${username.client.kerberos.principal}
username.client.keytab.file=${username.client.keytab.file}
java.security.krb5.conf=${java.security.krb5.conf}
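
As an illustration of how these keys might be consumed, here is a minimal Java sketch that parses the properties and decides whether a Kerberos login is needed. HadoopSettings, parse, and siteFile are hypothetical names, and the rule that only the value kerberos turns Kerberos on is an assumption made for this example; none of it is taken from the repository's code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical holder for the Hadoop keys listed above. */
final class HadoopSettings {
    final String confPath;
    final boolean kerberosEnabled;
    final String principal;
    final String keytab;

    private HadoopSettings(Properties p) {
        this.confPath = p.getProperty("hadoop.conf.path", "").trim();
        // Assumption: any value other than "kerberos" means no Kerberos login.
        this.kerberosEnabled = "kerberos".equalsIgnoreCase(
                p.getProperty("hadoop.security.authentication", "").trim());
        this.principal = p.getProperty("username.client.kerberos.principal", "").trim();
        this.keytab = p.getProperty("username.client.keytab.file", "").trim();
    }

    /** Parses properties text, e.g. the content of application.properties. */
    static HadoopSettings parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new HadoopSettings(p);
    }

    /** Resolves a file such as core-site.xml under hadoop.conf.path. */
    String siteFile(String name) {
        return confPath.endsWith("/") ? confPath + name : confPath + "/" + name;
    }
}
```

For example, HadoopSettings.parse(text).siteFile("hdfs-site.xml") gives the path that would then be added to a Hadoop Configuration.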

1. Work with HDFS files and directories
com.alex.space.hadoop.utils.HdfsUtils

2. MapReduce examples

HBase Client

Configure the HBase connection settings in resources/application.properties

# HBase connection
hbase.zookeeper.quorum=${hbase.zookeeper.quorum}
hbase.zookeeper.property.clientPort=${hbase.zookeeper.property.clientPort}

# Hadoop and HBase configuration root path
# ${hadoop.conf.path}/hdfs-site.xml
# ${hadoop.conf.path}/core-site.xml
# ${hbase.conf.path}/hbase-site.xml
hadoop.conf.path=${hadoop.conf.path}
hbase.conf.path=${hbase.conf.path}

# support kerberos
hadoop.security.authentication=none
username.client.kerberos.principal=
username.client.keytab.file=

1. Administer HBase with HBaseAdmin
com.alex.space.hbase.utils.HBaseAdminUtils

2. Read and write HBase with the HBase API
com.alex.space.hbase.utils.HBaseUtils
HBase connection settings are configured in resources/application.properties

3. Create a pre-split table
com.alex.space.hbase.utils.HBaseTableUtils
Regions are split on the first character of the rowkey: 0~9, A~Z, a~z, for 62 regions in total
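
A minimal Java sketch of how such split points could be computed. It assumes one region per leading character, which takes 61 split keys, because N split keys produce N + 1 regions. FirstCharSplits and its methods are hypothetical names; how HBaseTableUtils actually builds its splits is not shown here.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: split points for first-character pre-splitting. */
final class FirstCharSplits {
    // 0-9, A-Z, a-z in ascending byte order, the order HBase sorts rowkeys in.
    static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    /** Returns 61 one-byte split keys ("1" .. "z"), giving 62 regions. */
    static byte[][] splitKeys() {
        byte[][] keys = new byte[ALPHABET.length() - 1][];
        for (int i = 1; i < ALPHABET.length(); i++) {
            keys[i - 1] = new byte[] {(byte) ALPHABET.charAt(i)};
        }
        return keys;
    }

    /** Index (0..61) of the region a rowkey falls into, judged by its first byte. */
    static int regionIndex(String rowKey) {
        byte[] bytes = rowKey.getBytes(StandardCharsets.UTF_8);
        if (bytes.length == 0) {
            return 0; // the empty key sorts before every split point
        }
        int first = bytes[0] & 0xFF; // compare as unsigned, like HBase does
        int index = 0;
        for (int i = 1; i < ALPHABET.length(); i++) {
            if (first >= ALPHABET.charAt(i)) {
                index = i;
            }
        }
        return index;
    }
}
```

With the HBase client on the classpath, an array like this can be passed to Admin.createTable(descriptor, splitKeys). Leading bytes outside the 62 characters still land in a region: anything below "1" goes to the first region and anything at or above "z" goes to the last.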

If you do not need to scan a contiguous rowkey range, adding an MD5 prefix to the rowkey is recommended
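
One way to build such a prefixed rowkey, sketched in Java. The class name SaltedRowKey, the "_" separator, and the configurable prefix length are assumptions for illustration; the repository may construct its keys differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: prefixes a rowkey with part of its own MD5 digest. */
final class SaltedRowKey {

    /** Lowercase hex MD5 of the UTF-8 bytes of the input (32 characters). */
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Returns "<first prefixLen hex chars of md5(key)>_<key>". */
    static String withPrefix(String key, int prefixLen) {
        if (prefixLen < 1 || prefixLen > 32) {
            throw new IllegalArgumentException("prefixLen must be in 1..32");
        }
        return md5Hex(key).substring(0, prefixLen) + "_" + key;
    }
}
```

The prefix is derived from the key itself, so a point lookup can recompute it, while rows that were adjacent in the original key order end up scattered. A hex prefix only starts with 0-9 or a-f, so under first-character splitting it would reach 16 of the 62 regions; a table keyed this way may be better served by split points chosen over the hex range.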

4. HBase MapReduce Scan
com.alex.space.hbase.mapred.HBaseMapRedScan
Runs an HBase scan as a MapReduce job, because the default HBase Scan API is inefficient

Elasticsearch

Configure the Elasticsearch connection settings in resources/application.properties

# elastic connection info
es.cluster.name=${es.cluster.name}
es.node.ip=${es.node.ip}
es.node.port=${es.node.port}

1. Access Elasticsearch with TransportClient
com.alex.space.elastic.utils.ElasticUtils

ZooKeeper

Configure the ZooKeeper connection settings in resources/application.properties

# zookeeper connection info
zk.nodes=${zk.nodes}

1. Access ZooKeeper with ZKClient
com.alex.space.zoo.client.ZooClient

2. Access ZooKeeper with Curator
com.alex.space.zoo.curator.CuratorClientDemo

3. Advanced Curator features
com.alex.space.zoo.curator.recipes

Spark

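Edit application.properties: set env=local to run locally, or env=prod to run on the production cluster
Settings for local runs (such as file paths) go in local.properties
Settings for cluster runs (such as file paths) go in prod.properties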

1. Base class for Spark jobs
com.alex.space.spark.mains.BaseWorker
Default settings use default as their prefix

# application.conf
env=local
# local.conf
default.input=/home/hadoop/test
# com.alex.space.spark.mains.Worker
  ...
  println("input:" + configString("input"))
  ...

A new job extends BaseWorker and overrides prefix, so that it reads the settings under that prefix

# xxx.conf
test.input=/home/hadoop/example

# XXXWorker
override val prefix: String = "test"
println("input:" + configString("input"))

# input:/home/hadoop/example
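
The lookup rule above (a key is resolved as prefix + "." + key) can be sketched in a few lines. The sketch is in Java rather than Scala, and PrefixedConfig and TestWorkerConfig are hypothetical stand-ins for BaseWorker and a subclass. It assumes there is no fallback to the default prefix when a key is missing, which the source does not specify.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the prefix-based lookup in BaseWorker. */
class PrefixedConfig {
    private final Map<String, String> settings;

    PrefixedConfig(Map<String, String> settings) {
        this.settings = new HashMap<>(settings);
    }

    /** Subclasses override this, mirroring `override val prefix` above. */
    String prefix() {
        return "default";
    }

    /** Looks up "<prefix>.<key>"; assumption: a missing key is an error. */
    String configString(String key) {
        String fullKey = prefix() + "." + key;
        String value = settings.get(fullKey);
        if (value == null) {
            throw new IllegalArgumentException("missing setting: " + fullKey);
        }
        return value;
    }
}

/** A job-specific config that reads keys under the "test" prefix. */
class TestWorkerConfig extends PrefixedConfig {
    TestWorkerConfig(Map<String, String> settings) {
        super(settings);
    }

    @Override
    String prefix() {
        return "test";
    }
}
```

Given default.input=/home/hadoop/test and test.input=/home/hadoop/example, the base class resolves configString("input") to the first path and TestWorkerConfig to the second.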

2. Spark SQL and Hive interoperation
com.alex.space.spark.mains.DataFrameDemo

3. Submit Spark jobs through the Livy REST API
com.alex.space.spark.livy
Jobs are submitted the same way as with spark-client/bin/wordCount.sh

Hive

1. Commonly used Hive UDFs
com.alex.space.hive.udf

Kafka

Configure the Kafka connection and client settings in resources/application.properties

# broker list
# example:127.0.0.1:9092,127.0.0.2:9092
broker=127.0.0.1:9092
# topic name
topic=test

# kafka properties
# same as Kafka's native configuration
request.required.acks=0
producer.type=async
serializer.class=kafka.serializer.StringEncoder
message.send.max.retries=3
batch.num.messages=10
send.buffer.bytes=102400

1. Kafka producer
com.alex.space.JavaKafkaProducer

2. Kafka producer with message partitioning
com.alex.space.JavaKafkaProducerPartitioner
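
The core of a key-based partitioner is a function from a message key to a partition number. The Java sketch below shows one common approach, hashing the key; KeyHashPartitioner is a hypothetical name, and the logic inside JavaKafkaProducerPartitioner is not shown in this README and may differ.

```java
/** Hypothetical sketch: choose a partition from the hash of the message key. */
final class KeyHashPartitioner {

    /** Returns a partition in [0, numPartitions) that is stable for a given key. */
    static int partition(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative,
        // which a plain % or Math.abs would not guarantee.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

Messages with the same key always map to the same partition, which preserves their relative order within that partition.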

To be continued
