(Translated by https://www.hiragana.jp/)
GitHub - bluishglc/bdp: A prototype project of big data platform, the source codes of the book Big Data Platform Architecture and Prototype
Skip to content
/ bdp Public

A prototype project of big data platform, the source codes of the book Big Data Platform Architecture and Prototype

Notifications You must be signed in to change notification settings

bluishglc/bdp

Repository files navigation

大数たいすうすえ平台ひらだい工程こうてい原型げんけい(Big Data Platform Project Prototype)

2008ねんHadoopなり为Apacheてき顶级项目,以此为开はし大数たいすうすえわざ术迎らいりょう十多年的持续发展,其间ずいSparkてき异军突起とっきせい个大すうすえなま态圈また经历りょういちそう备升级”,变得さらかん善和よしわ强大きょうだいこんてん,很多くわだて业已经完成かんせいりょう早期そうき对大すうすえわざ术的尝试探索たんさく转而进入いた应用阶段,ただしとく说的大数たいすうすえ平台ひらだいてき构体けい非常ひじょう庞大,わざ术堆栈特别深,从事大数たいすうすえ开发てき同学どうがく对此应该ふか有体ありていかい

而在很多细分领域(れい如实时计さんさく业调)也没ゆうぞう样的工程こうていばん, 这一现状あずかJavaしゃ使用しようSpring Bootしんしゅひねらい搭建Web工程こうてい原型げんけい形成けいせいりょう鲜明りょう对比。这导致很团队ざい启动大数たいすうすえ平台ひらだいけん设时往往おうおうかんじいた无所さむらい从,也使とく希望きぼう深入ふかいりがく习大すうすえわざ术的开发しゃよし于缺しょう工程こうてい级的しめせれい参考さんこう而感いた迷茫。

该原がた项目就是以此为命题创けんてき,它基于过往项目的もくてきさいけい实践ひさげ炼而らい以帮じょ团队快速かいそく启动开发,上手じょうず就写业务だい码。

ほん项目どう时是《大数たいすうすえ平台ひらだい构与原型げんけい实现:かずすえ中台ちゅうたいけん设实战》一书的配套示例代码。该书やめよし知名ちめいIT图书ひんぱい电子こう出版しゅっぱんしゃ博文ひろぶみ视点出版しゅっぱん发行,在京ざいきょう东和とうとうゆう售。 关于如何いか部署ぶしょ运行该项,书中做了非常ひじょう细致てき讲解。

きょう东购书链せっhttps://item.jd.com/12677623.html

とうとう购书链接:http://product.dangdang.com/28974965.html

てんじゅう磅推荐:けん大数たいすうすえ平台ひらだいふとし难了!给我发个工程こうてい原型げんけい吧!》了解りょうかい图书详情,扫码进入京にゅうきょう东手つくえ购书页面!

大数据平台架构与原型实现:数据中台建设实战


部署ぶしょ原型げんけい项目使用しよういたてき脚本きゃくほん

大数たいすうすえ平台ひらだい构与原型げんけい实现:かずすえ中台ちゅうたいけん设实战いち书的だい《4.5 しょ原型げんけい项目》节详细介绍了该项目的もくてき部署ぶしょ启动方法ほうほう,为了めん手写しゅしゃだい码,便びん于大操作そうさわが们将使用しようてきしょう关脚ほん配置はいちねば出来でき

  • 4.5.1.1 ざい远程ふく务器じょう建立こんりゅう应用ほどじょ专有账号
# Run as 'root'
# add group if not exists
group=bdp

egrep "^$group\:" /etc/group >& /dev/null
if [ "$?" != "0" ]
then
    groupadd "$group"
    echo "Group: $group is added."
fi
    
users=(bdp-metric bdp-collect bdp-dwh bdp-master-server bdp-stream bdp-workflow)
password='Bdpp1234!'
for user in ${users[@]}
do
    # add user if not exists and set password
    egrep "^$user\:" /etc/passwd >& /dev/null
    if [ "$?" != "0" ]
    then
        useradd -g "$group" "$user"
        echo "User: $user is added."
        echo "$user:$password"|chpasswd
        echo "User: $user, password is reset."
    fi
done

# enable all users of bdp group can sudo as hdfs. 
echo '%bdp ALL = (hdfs) NOPASSWD: ALL'>/etc/sudoers.d/bdp
  • 4.5.1.2 ざいHDFSじょう为程じょ专有账号创建Home
# create home on hdfs for users need hdfs storage
su -l hdfs
users=(bdp-dwh bdp-stream bdp-workflow)
for user in ${users[@]}
do
    home=/user/$user
    hdfs dfs -test -d $home && hdfs dfs -rm -r -f $home
    hdfs dfs -mkdir -p $home
    hdfs dfs -chown -R $user:bdp $home
done
exit
  • 4.5.1.3 ざいHDFSじょう创建すうすえ仓库しょ需文けん
# create data zones
su -l hdfs
dirs=(/data/src /data/dwh /data/dmt /data/app /data/tmp /data/stg)

for dir in ${dirs[@]}
do
    hdfs dfs -test -d $dir && hdfs dfs -rm -r -f $dir
    hdfs dfs -mkdir -p $dir
    hdfs dfs -chown -R bdp-dwh:bdp $dir
done

hdfs dfs -chmod a+w /data/tmp

exit
  • 4.5.1.4 创建すうすえ仓库
drop database if exists src cascade;
create database if not exists src
location '/data/src';

drop database if exists dwh cascade;
create database if not exists dwh
location '/data/dwh';

drop database if exists dmt cascade;
create database if not exists dmt
location '/data/dmt';

drop database if exists app cascade;
create database if not exists app
location '/data/app';

drop database if exists tmp cascade;
create database if not exists tmp
location '/data/tmp';

drop database if exists stg cascade;
create database if not exists stg
location '/data/stg';
  • 4.5.1.5 创建bdp-streamてきこころざし
mkdir /var/log/bdp-stream
chown bdp-stream:bdp /var/log/bdp-stream
chmod a+w /var/log/bdp-stream
  • 4.5.1.9 创建bdp_metricbdp_masterかずすえ
-- 1. bdp_metric
DROP DATABASE IF EXISTS bdp_metric;
CREATE DATABASE IF NOT EXISTS bdp_metric DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

DROP USER IF EXISTS 'bdp_metric'@'%';
CREATE USER IF NOT EXISTS 'bdp_metric'@'%' IDENTIFIED BY 'Bdpp1234!';
GRANT ALL PRIVILEGES ON bdp_metric.* TO 'bdp_metric'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

-- 2. bdp_master

DROP DATABASE IF EXISTS bdp_master;
CREATE DATABASE IF NOT EXISTS bdp_master DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

DROP USER IF EXISTS 'bdp_master'@'%';
CREATE USER IF NOT EXISTS 'bdp_master'@'%' IDENTIFIED BY 'Bdpp1234!';
GRANT ALL PRIVILEGES ON bdp_master.* TO 'bdp_master'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
  • 4.5.1.10 创建kafka topic
kafka-topics \
    --zookeeper master1.cluster:2181,master1.cluster:2181,utility1.cluster:2181 \
    --create \
    --topic cpu.usage \
    --partitions 12 \
    --replication-factor 3

kafka-topics \
    --zookeeper master1.cluster:2181,master1.cluster:2181,utility1.cluster:2181 \
    --describe \
    --topic cpu.usage

kafka-topics \
    --zookeeper master1.cluster:2181,master1.cluster:2181,utility1.cluster:2181 \
    --create \
    --topic mem.used \
    --partitions 12 \
    --replication-factor 3

kafka-topics \
    --zookeeper master1.cluster:2181,master1.cluster:2181,utility1.cluster:2181 \
    --describe \
    --topic mem.used

kafka-topics \
    --zookeeper master1.cluster:2181,master1.cluster:2181,utility1.cluster:2181 \
    --create \
    --topic alert \
    --partitions 12 \
    --replication-factor 3

kafka-topics \
    --zookeeper master1.cluster:2181,master1.cluster:2181,utility1.cluster:2181 \
    --describe \
    --topic alert
  • 4.5.1.11 创建HBaseすうすえひょう
disable 'metric'
drop 'metric'
create 'metric', {NAME=>'f', VERSIONS=>1, COMPRESSION => 'SNAPPY', BLOCKCACHE => 'true'}

disable 'alert'
drop 'alert'
create 'alert', {NAME=>'f', VERSIONS=>1, COMPRESSION => 'SNAPPY', BLOCKCACHE => 'true'}

disable 'server_state'
drop 'server_state'
create 'server_state', {NAME=>'f', VERSIONS=>1, COMPRESSION => 'SNAPPY', BLOCKCACHE => 'true'}

あんそうしゅうぐん过程ちゅう使用しよういたてき脚本きゃくほん

大数たいすうすえ平台ひらだい构与原型げんけい实现:かずすえ中台ちゅうたいけん设实战》一书的第三章介绍了一个7节点CDHしゅうぐんてきあんそう过程,为了めん手写しゅしゃだい码,便びん于大操作そうさわが们将使用しようてきしょう关脚ほん配置はいちねば出来でき

  • 3.3.4.1 生成せいせいyum repoぶんけん
tee /etc/yum.repos.d/galera.repo <<EOF
[galera]
name = Galera
baseurl = https://releases.galeracluster.com/galera-3/centos/7/x86_64
gpgkey = https://releases.galeracluster.com/GPG-KEY-galeracluster.com
gpgcheck = 1

[mysql-wsrep]
name = MySQL-wsrep
baseurl =  http://releases.galeracluster.com/mysql-wsrep-5.7/centos/7/x86_64
gpgkey = http://releases.galeracluster.com/mysql-wsrep-5.7/GPG-KEY-galeracluster.com
gpgcheck = 1
EOF
  • 3.3.4.4 配置はいちGaleraしゅうぐん
[root@master1 ~]# tee /etc/my.cnf <<EOF
!includedir /etc/my.cnf.d/
[mysqld]
max_connections=1000
max_connect_errors=10000
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
binlog_format=ROW
bind-address=0.0.0.0
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=122M
character-set-server=utf8
collation-server=utf8_general_ci
character_set_server=utf8
collation_server=utf8_general_ci
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_provider_options="gcache.size=300M; gcache.page_size=300M"
wsrep_cluster_name="mysql_cluster"
wsrep_cluster_address="gcomm://master1.cluster,master2.cluster,utility1.cluster"
wsrep_node_name="master1.cluster"
wsrep_node_address="10.0.0.86"
wsrep_sst_method=rsync

[mysql_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
EOF
  • 3.3.5.1 载CDH Repository
nohup wget --recursive --no-parent --no-host-directories https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.15.2/ -P /opt/nginx/cloudera-repos &

nohup wget --recursive --no-parent --no-host-directories https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/RPM-GPG-KEY-cloudera -P /opt/nginx/cloudera-repos &

nohup wget --recursive --no-parent --no-host-directories https://archive.cloudera.com/cdh5/parcels/5.15.2/ -P /opt/nginx/cloudera-repos &
  • 3.3.5.2 あんそう配置はいちNginx
server {
    listen       80 default_server;
    listen       [::]:80 default_server;
    server_name  _;
    root         /opt/nginx;

    # Load configuration files for the default server block.
    include /etc/nginx/default.d/*.conf;

    location / {
        autoindex on;
    }
}
  • 3.3.5.3 生成せいせいRepository描述ぶんけん
tee /etc/yum.repos.d/cloudera-manager.repo <<EOF
[cloudera-manager]
name=Cloudera Manager 5.15.2
baseurl=http://utility1.cluster/cloudera-repos/cm5/redhat/7/x86_64/cm/5.15.2/
gpgkey=http://utility1.cluster/cloudera-repos/cm5/redhat/7/x86_64/cm/RPM-GPG-KEY-cloudera
gpgcheck=1
EOF
  • 3.3.6.3 创建CDHかくふく使用しようてきすうすえ
-- 1. scm
DROP DATABASE IF EXISTS scm;
CREATE DATABASE IF NOT EXISTS scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

DROP USER IF EXISTS 'scm'@'%';
CREATE USER IF NOT EXISTS 'scm'@'%' IDENTIFIED BY 'Bdpp1234!';
GRANT ALL PRIVILEGES ON scm.* TO 'scm'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

-- 2. amon

DROP DATABASE IF EXISTS amon;
CREATE DATABASE IF NOT EXISTS amon DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

DROP USER IF EXISTS 'amon'@'%';
CREATE USER IF NOT EXISTS 'amon'@'%' IDENTIFIED BY 'Bdpp1234!';
GRANT ALL PRIVILEGES ON amon.* TO 'amon'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

-- 3. rman

DROP DATABASE IF EXISTS rman;
CREATE DATABASE IF NOT EXISTS rman DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

DROP USER IF EXISTS 'rman'@'%';
CREATE USER IF NOT EXISTS 'rman'@'%' IDENTIFIED BY 'Bdpp1234!';
GRANT ALL PRIVILEGES ON rman.* TO 'rman'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

-- 4. hue

DROP DATABASE IF EXISTS hue;
CREATE DATABASE IF NOT EXISTS hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

DROP USER IF EXISTS 'hue'@'%';
CREATE USER IF NOT EXISTS 'hue'@'%' IDENTIFIED BY 'Bdpp1234!';
GRANT ALL PRIVILEGES ON hue.* TO 'hue'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

-- 5. hive

DROP DATABASE IF EXISTS hive;
CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

DROP USER IF EXISTS 'hive'@'%';
CREATE USER IF NOT EXISTS 'hive'@'%' IDENTIFIED BY 'Bdpp1234!';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

-- 6. sentry

DROP DATABASE IF EXISTS sentry;
CREATE DATABASE IF NOT EXISTS sentry DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

DROP USER IF EXISTS 'sentry'@'%';
CREATE USER IF NOT EXISTS 'sentry'@'%' IDENTIFIED BY 'Bdpp1234!';
GRANT ALL PRIVILEGES ON sentry.* TO 'sentry'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

-- 7. nav

DROP DATABASE IF EXISTS nav;
CREATE DATABASE IF NOT EXISTS nav DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

DROP USER IF EXISTS 'nav'@'%';
CREATE USER IF NOT EXISTS 'nav'@'%' IDENTIFIED BY 'Bdpp1234!';
GRANT ALL PRIVILEGES ON nav.* TO 'nav'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

-- 8. navms

DROP DATABASE IF EXISTS navms;
CREATE DATABASE IF NOT EXISTS navms DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

DROP USER IF EXISTS 'navms'@'%';
CREATE USER IF NOT EXISTS 'navms'@'%' IDENTIFIED BY 'Bdpp1234!';
GRANT ALL PRIVILEGES ON navms.* TO 'navms'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

-- 9. oozie

DROP DATABASE IF EXISTS oozie;
CREATE DATABASE IF NOT EXISTS oozie DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

DROP USER IF EXISTS 'oozie'@'%';
CREATE USER IF NOT EXISTS 'oozie'@'%' IDENTIFIED BY 'Bdpp1234!';
GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
  • 3.3.6.4. あんそうMySQL JDBC Driver
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.48.tar.gz
tar zxvf mysql-connector-java-5.1.48.tar.gz
mkdir -p /usr/share/java/
cp mysql-connector-java-5.1.48/mysql-connector-java-5.1.48-bin.jar /usr/share/java/mysql-connector-java.jar
  • 3.3.10 启用Spark SQL
cp hive-cli-1.2.1.spark2.jar /opt/cloudera/parcels/SPARK2/lib/spark2/jars/
cp spark-hive-thriftserver_2.11-2.3.0.jar /opt/cloudera/parcels/SPARK2/lib/spark2/jars/
cp stop-thriftserver.sh /opt/cloudera/parcels/SPARK2/lib/spark2/sbin/
cp start-thriftserver.sh /opt/cloudera/parcels/SPARK2/lib/spark2/sbin/
cp spark-sql /opt/cloudera/parcels/SPARK2/lib/spark2/bin/
alternatives --install /usr/bin/spark-shell spark-shell /opt/cloudera/parcels/SPARK2/bin/spark2-shell 1
alternatives --install /usr/bin/spark-sql spark-sql /opt/cloudera/parcels/SPARK2/bin/spark2-sql 1
alternatives --install /usr/bin/spark-submit spark-submit /opt/cloudera/parcels/SPARK2/bin/spark2-submit 1
alternatives --install /etc/spark/conf spark-conf /etc/spark2/conf.cloudera.spark2_on_yarn 1
alternatives --remove spark-shell /opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/bin/spark-shell
alternatives --remove spark-submit /opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/bin/spark-submit