大数据相关软件安装流程SOP整理
Hadoop(Apache)(数据存储及计算)
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html(基于QJM与Zookeeper的HDFS HA官方文档)
安装JDK
1.卸载现有JDK
(1)查询是否安装Java软件:
[atguigu@hadoop101 opt]$ rpm -qa | grep java
(2)如果安装的版本低于1.7,卸载该JDK:
[atguigu@hadoop101 opt]$ sudo rpm -e 软件包
(3)查看JDK安装路径:
[atguigu@hadoop101 ~]$ which java
2.导入解压
[atguigu@hadoop101 software]$ tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module/
3.配置环境
[atguigu@hadoop101 software]$ sudo vi /etc/profile
profile
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
4.检查
java -version
安装Hadoop
1.导入解压
tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
2.配置路径
sudo vi /etc/profile
profile
##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
source /etc/profile
3.检查
hadoop version
完全分布式环境配置
1.虚拟机准备
防火墙
sudo service iptables stop
sudo chkconfig iptables off
静态IP
vim /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
NAME="eth0"
IPADDR=192.168.1.101
PREFIX=24
GATEWAY=192.168.1.2
DNS1=192.168.1.2
vim /etc/udev/rules.d/70-persistent-net.rules
主机名
vim /etc/sysconfig/network
hosts目录
vim /etc/hosts
192.168.1.100 hadoop100
192.168.1.101 hadoop101
192.168.1.102 hadoop102
192.168.1.103 hadoop103
192.168.1.104 hadoop104
192.168.1.105 hadoop105
192.168.1.106 hadoop106
192.168.1.107 hadoop107
192.168.1.108 hadoop108
192.168.1.109 hadoop109
配置用户
useradd atguigu
passwd atguigu
vim /etc/sudoers
root ALL=(ALL) ALL
atguigu ALL=(ALL) NOPASSWD:ALL
建立文件夹
mkdir /opt/module /opt/software
chown atguigu:atguigu /opt/module /opt/software
配置分发脚本
cd ~
vi xsync
#!/bin/bash
#1 获取输入参数个数,如果没有参数,直接退出
pcount=$#
if ((pcount==0)); then
echo no args;
exit;
fi
#2 获取文件名称
p1=$1
fname=`basename $p1`
echo fname=$fname
#3 获取上级目录到绝对路径
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir
#4 获取当前用户名称
user=`whoami`
#5 循环
for((host=103; host<105; host++)); do
echo ------------------- hadoop$host --------------
rsync -av $pdir/$fname $user@hadoop$host:$pdir
done
chmod +x xsync
sudo cp xsync /bin
sudo xsync /bin/xsync
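脚本分发完成后的一个使用示意(假设目标主机与脚本中循环的主机一致,且/opt/module/jdk1.8.0_144已存在):
# 将JDK目录同步到其它节点
xsync /opt/module/jdk1.8.0_144
# 分发环境变量文件,分发后需在各机执行 source /etc/profile
sudo xsync /etc/profile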
2.配置SSH
ssh-keygen -t rsa
ssh-copy-id hadoop102
ssh-copy-id hadoop103
ssh-copy-id hadoop104
ssh hadoop103
exit
ssh hadoop104
exit
xsync /home/atguigu/.ssh
3.配置环境变量
环境
hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
mapred-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
core-site.xml
<name>fs.defaultFS</name>
<value>hdfs://hadoop102:9000</value>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.2/data/tmp</value>
hdfs-site.xml
<name>dfs.replication</name>
<value>3</value>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop104:50090</value>
yarn-site.xml
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop103</value>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
mapred-site.xml
配置
<name>mapreduce.framework.name</name>
<value>yarn</value>
<name>mapreduce.jobhistory.address</name>
<value>hadoop104:10020</value>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop104:19888</value>
启动
启动历史服务器:mr-jobhistory-daemon.sh start historyserver
slaves
hadoop102
hadoop103
hadoop104
4.群起并测试
分发
xsync /opt/module/hadoop-2.7.2/etc
格式化
hdfs namenode -format
启动
start-dfs.sh
start-yarn.sh
如果出问题
在所有节点的Hadoop安装目录下执行 rm -rf data/ logs/ 后重新格式化(可参考下面的清理脚本示意)
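一个在各节点批量清理的示意脚本(假设主机名为hadoop102~hadoop104、Hadoop安装在/opt/module/hadoop-2.7.2,清理后再重新格式化NameNode):
#!/bin/bash
# 依次登录各节点,删除HDFS数据目录与日志目录
for host in hadoop102 hadoop103 hadoop104; do
    echo "---------- 清理 $host ----------"
    ssh $host "rm -rf /opt/module/hadoop-2.7.2/data /opt/module/hadoop-2.7.2/logs"
done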
LZO压缩配置
下载并解压LZO,置入hadoop/share/hadoop/common中
分发同步到各机
增加core-site.xml配置并同步
<name>io.compression.codecs</name>
<value>
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec
</value>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
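同步配置并重启集群后,可用自带的wordcount示例验证LZO输出压缩是否可用(示意命令,假设HDFS上已有/input目录且/output不存在):
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount \
-Dmapreduce.output.fileoutputformat.compress=true \
-Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec \
/input /output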
Hadoop扩容
Linux硬盘扩容与挂载
创建并格式化新分区
fdisk /dev/sda
m #进入帮助引导模式
n #新增分区
p #指定新分区为基本分区
一路回车 #但要记住分区号
w #保存并执行刚才的分区操作
reboot #重启
fdisk -l
mkfs.xfs /dev/sdax,x为分区号
创建路径并挂载盘符
mkdir /newdisk
起名沉思,下回增加新盘符,可以叫Eden
临时挂载
mount /dev/sdax /newdisk
永久挂载
vim /etc/fstab
/dev/sdax /newdisk xfs defaults 0 0(文件系统类型需与格式化时一致,此处为xfs)
赋予权限
这步一定要赋予给使用hadoop的用户,否则没有权限对盘符进行操作
chown -R atguigu:atguigu /newdisk
hdfs的扩容
vim /opt/module/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
<name>dfs.datanode.data.dir</name>
<value>${hadoop.tmp.dir}/dfs/data, /newdisk</value>
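修改后重启DataNode并确认容量变化的示意命令:
# 重启当前节点的DataNode使新目录生效
sbin/hadoop-daemon.sh stop datanode
sbin/hadoop-daemon.sh start datanode
# 查看集群容量,确认新磁盘已计入
hdfs dfsadmin -report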
Hadoop(CDH)(数据存储及计算)
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/installation.html
虚拟机准备
安装MySQL
卸载MySQL
查看是否安装Mysql
rpm -qa | grep -i mysql
查看MySQL服务是否启动,关闭
sudo service mysql status
sudo service mysql stop
卸载MySQL安装的组件
sudo rpm -e MySQL-server-5.6.24-1.el6.x86_64
sudo rpm -e MySQL-client-5.6.24-1.el6.x86_64
查找并删除MySQL相关的文件
whereis mysql
sudo find / -name mysql
sudo rm -rf /var/lib/mysql
sudo rm -rf /usr/lib64/mysql
安装MySQL
安装启动服务端
rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm
cat /root/.mysql_secret
service mysql status
service mysql start
安装客户端
rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm
mysql -uroot -p
mysql>SET PASSWORD=PASSWORD('000000');
mysql>exit
配置User表
mysql -uroot -p
show databases;
use mysql;
show tables;
desc user;
update user set host='%' where host='localhost';
delete from user where Host='hadoop101';
delete from user where Host='127.0.0.1';
delete from user where Host='::1';
flush privileges;
配置分发脚本
cd ~
vi xsync
#!/bin/bash
#1 获取输入参数个数,如果没有参数,直接退出
pcount=$#
if ((pcount==0)); then
echo no args;
exit;
fi
#2 获取文件名称
p1=$1
fname=`basename $p1`
echo fname=$fname
#3 获取上级目录到绝对路径
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir
#4 获取当前用户名称
user=`whoami`
#5 循环
for((host=103; host<105; host++)); do
echo ------------------- hadoop$host --------------
rsync -av $pdir/$fname $user@hadoop$host:$pdir
done
chmod +x xsync
sudo cp xsync /bin
sudo xsync /bin/xsync
安装JDK
1.卸载现有JDK
(1)查询是否安装Java软件:
[atguigu@hadoop101 opt]$ rpm -qa | grep java
(2)如果安装的版本低于1.7,卸载该JDK:
[atguigu@hadoop101 opt]$ sudo rpm -e 软件包
(3)查看JDK安装路径:
[atguigu@hadoop101 ~]$ which java
2.导入解压
[atguigu@hadoop101 software]$ tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module/
3.配置环境
[atguigu@hadoop101 software]$ sudo vi /etc/profile
profile
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
4.检查
java -version
关闭SELinux
setenforce 0(临时关闭)
vim /etc/selinux/config
SELINUX=disabled
xsync /etc/selinux/config
下载第三方依赖
yum -y install chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse fuse-libs redhat-lsb
安装CM
tar -zxvf /opt/software/cloudera-manager-el6-cm5.12.1_x86_64.tar.gz -C /opt/module/cm/
在各机创建用户
useradd \
--system \
--home=/opt/module/cm/cm-5.12.1/run/cloudera-scm-server \
--no-create-home \
--shell=/bin/false \
--comment "Cloudera SCM User" cloudera-scm
修改Agent配置
vim /opt/module/cm/cm-5.12.1/etc/cloudera-scm-agent/config.ini
server_host=hadoop102
配置数据库
mkdir /usr/share/java/
tar -zxvf mysql-connector-java-5.1.27.tar.gz
cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /usr/share/java/
mv /usr/share/java/mysql-connector-java-5.1.27-bin.jar /usr/share/java/mysql-connector-java.jar
创建CM库并分发
/opt/module/cm/cm-5.12.1/share/cmf/schema/scm_prepare_database.sh mysql cm -hhadoop102 -uroot -p000000 --scm-host hadoop102 scm scm scm
xsync /opt/module/cm
创建Parcel-repo
mkdir -p /opt/cloudera/parcel-repo
chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
拷贝文件到/opt/cloudera/parcel-repo目录下
CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel
CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel.sha1
manifest.json
mv CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel.sha1 CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel.sha
创建/opt/cloudera/parcels
mkdir -p /opt/cloudera/parcels
修改权限组
chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
分发
xsync /opt/cloudera/
启动和关闭服务
启动服务
服务节点
/opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-server start
工作节点
/opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-agent start
访问网站
http://hadoop102:7180(用户名、密码:admin)
关闭服务
工作节点
/opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-agent stop
服务节点
/opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-server stop
CM集群部署
傻瓜式安装:按照Web向导的提示逐步选择主机、Parcel和服务即可
Hadoop(HDP)(数据存储及计算)
http://ambari.apache.org/index.html
虚拟机准备(三机均需要)
关闭防火墙
chkconfig iptables off
service iptables stop
chkconfig --list iptables
关闭SELINUX
vim /etc/sysconfig/selinux
SELINUX=disabled
安装JDK
SSH免密登陆(三机均需要)
ssh-keygen -t rsa
ssh-copy-id hadoop102
ssh-copy-id hadoop103
ssh-copy-id hadoop104
修改yum源
vim /etc/resolv.conf
nameserver 223.5.5.5
nameserver 223.6.6.6
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bk
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-6.repo
yum makecache
安装ntp
yum install -y ntp
chkconfig --list ntpd
chkconfig ntpd on
service ntpd start
关闭Linux的THP服务
vim /etc/grub.conf
transparent_hugepage=never
vim /etc/rc.local
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
exit 0
检查
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
always madvise [never]
配置UMASK
umask 0022
禁止离线更新
vim /etc/yum/pluginconf.d/refresh-packagekit.conf
enabled=0
安装Ambari集群
制作本地源
配置HTTPD服务
chkconfig httpd on
service httpd start
安装工具
yum install yum-utils createrepo yum-plugin-priorities -y
vim /etc/yum/pluginconf.d/priorities.conf
gpgcheck=0
下载并解压ambari-2.5.0.3/HDP-2.6.0.3/HDP-UTILS-1.1.0.21
tar -zxvf /opt/software/ambari-2.5.0.3-centos6.tar.gz -C /var/www/html/
mkdir /var/www/html/hdp
tar -zxvf /opt/software/HDP-2.6.0.3-centos6-rpm.tar.gz -C /var/www/html/hdp
tar -zxvf /opt/software/HDP-UTILS-1.1.0.21-centos6.tar.gz -C /var/www/html/hdp
创建本地源
cd /var/www/html/
createrepo ./
将Ambari存储库文件下载到安装主机上的目录中
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.6.1.5/ambari.repo -O /etc/yum.repos.d/ambari.repo
修改配置文件
vim /etc/yum.repos.d/ambari.repo
#VERSION_NUMBER=2.6.1.5-3
[ambari-2.6.1.5]
name=ambari Version - ambari-2.6.1.5
baseurl=http://hadoop102/ambari/centos6/
gpgcheck=0
gpgkey=http://hadoop102/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
vim /var/www/html/hdp/hdp-util.repo
[HDP-UTILS-1.1.0.21]
name=Hortonworks Data Platform Version - HDP-UTILS-1.1.0.21
baseurl=http://hadoop102/hdp/
gpgcheck=0
enabled=1
priority=1
vim /var/www/html/hdp/HDP/centos6/hdp.repo
#VERSION_NUMBER=2.6.0.3-8
[HDP-2.6.0.3]
name=HDP Version - HDP-2.6.0.3
baseurl=http://hadoop102/hdp/HDP/centos6/
gpgcheck=0
gpgkey=http://hadoop102/hdp/HDP/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
[HDP-UTILS-1.1.0.21]
name=HDP-UTILS Version - HDP-UTILS-1.1.0.21
baseurl=http://hadoop102/hdp/
gpgcheck=0
gpgkey=http://hadoop102/hdp/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
yum clean all
yum makecache
检查
http://hadoop102/ambari/centos6/
http://hadoop102/hdp/HDP/centos6/
http://hadoop102/hdp/
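除浏览器访问外,也可在各机用yum确认本地仓库已生效(示意):
# 应能看到ambari、HDP、HDP-UTILS三个仓库
yum repolist | grep -iE "ambari|hdp"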
安装MySQL
卸载MySQL
查看是否安装Mysql
rpm -qa | grep -i mysql
查看MySQL服务是否启动,关闭
sudo service mysql status
sudo service mysql stop
卸载MySQL安装的组件
sudo rpm -e MySQL-server-5.6.24-1.el6.x86_64
sudo rpm -e MySQL-client-5.6.24-1.el6.x86_64
rpm -e --nodeps mysql-libs-5.1.73-7.el6.x86_64
查找并删除MySQL相关的文件
whereis mysql
sudo find / -name mysql
sudo rm -rf /var/lib/mysql
sudo rm -rf /usr/lib64/mysql
安装MySQL
安装启动服务端
rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm
cat /root/.mysql_secret
service mysql status
service mysql start
安装客户端
rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm
mysql -uroot -p
mysql>SET PASSWORD=PASSWORD('000000');
mysql>exit
配置User表
mysql -uroot -p
show databases;
use mysql;
show tables;
desc user;
update user set host='%' where host='localhost';
delete from user where Host='hadoop101';
delete from user where Host='127.0.0.1';
delete from user where Host='::1';
flush privileges;
安装Ambari
安装ambari-server
yum install ambari-server
拷贝mysql驱动
mkdir /usr/share/java
cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /usr/share/java/mysql-connector-java.jar
cp /usr/share/java/mysql-connector-java.jar /var/lib/ambari-server/resources/mysql-jdbc-driver.jar
vim /etc/ambari-server/conf/ambari.properties
server.jdbc.driver.path=/usr/share/java/mysql-connector-java.jar
在MySQL中创建数据库
create database ambari;
use ambari;
source /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql;
grant all privileges on *.* to 'root'@'%' identified by '000000';
flush privileges;
配置Ambari
ambari-server setup
y
3
/opt/module/jdk1.8.0_144
y
3
y
启动Ambari
ambari-server start
ambari-server stop
HDP集群部署
集群搭建
http://hadoop102:8080/
Launch Install Wizard
本地库地址
http://hadoop102/hdp/HDP/centos6/
http://hadoop102/hdp/
id_rsa
Target hosts
hadoop102
hadoop103
hadoop104
id_rsa文件位于/root/.ssh下,隐藏文件
安装Hive
mkdir -p /path/to/mysql/
cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /path/to/mysql/mysql-connector-java.jar
ambari-server setup --jdbc-db=mysql --jdbc-driver=/path/to/mysql/mysql-connector-java.jar
安装Hive
创建Hive数据库
create database hive;
添加服务,按流程进行
配置HDFS-HA
添加服务
nameservice(自定义的集群逻辑名称)
Ranger
添加服务Ambari Infra
安装Ranger
在MySQL中创建用户并授权
mysql -uroot -p
create database ranger;
CREATE USER 'rangerdba'@'localhost' IDENTIFIED BY 'rangerdba';
GRANT ALL PRIVILEGES ON *.* TO 'rangerdba'@'localhost';
CREATE USER 'rangerdba'@'%' IDENTIFIED BY 'rangerdba';
GRANT ALL PRIVILEGES ON *.* TO 'rangerdba'@'%';
GRANT ALL PRIVILEGES ON *.* TO 'rangerdba'@'localhost' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'rangerdba'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
配置JDBC链接
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
添加服务,填写配置
Ranger DB host
Ranger DB username
Ranger DB password
Database Administrator (DBA) password
Ranger Audit
Audit to Solr
SolrCloud
Audit to HDFS
添加插件
Configs
Ranger Plugin
HDFS Ranger Plugin
YARN Ranger Plugin
操作Ranger
进入WebUI
账号密码均为admin
添加权限策略
验证是否生效
su - atguigu
hadoop fs -ls /user
hadoop fs -mkdir /user/test
Zookeeper(监控)
https://zookeeper.apache.org/doc/r3.5.5/zookeeperStarted.html
Zookeeper
解压分发
tar -zxvf zookeeper-3.4.10.tar.gz -C /opt/module/
xsync zookeeper-3.4.10/
配置服务编号
mkdir -p zkData
touch myid
vi myid
2
xsync myid(分发后在hadoop103、hadoop104上分别将myid改为3、4)
配置zoo.cfg(conf)
mv zoo_sample.cfg zoo.cfg
vim zoo.cfg
dataDir=/opt/module/zookeeper-3.4.10/zkData
#######################cluster##########################
server.2=hadoop102:2888:3888
server.3=hadoop103:2888:3888
server.4=hadoop104:2888:3888
xsync zoo.cfg
server.A=B:C:D
A是一个数字,表示这是第几号服务器。集群模式下需在dataDir目录下配置一个myid文件,文件里只有一个数据即A的值,Zookeeper启动时读取此文件,将其中的数据与zoo.cfg里的配置信息比较,从而判断自己是哪个server。
B是这个服务器的地址;
C是这个服务器Follower与集群中的Leader服务器交换信息的端口;
D是万一集群中的Leader服务器挂了,需要一个端口来重新进行选举,选出一个新的Leader,这个端口就是选举时服务器相互通信的端口。
启动
bin/zkServer.sh start
bin/zkServer.sh status
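为便于三台机器统一启停,可参考如下群起脚本示意(脚本名zk.sh为自拟,假设主机为hadoop102~hadoop104、安装目录为/opt/module/zookeeper-3.4.10):
#!/bin/bash
# 用法: zk.sh start|stop|status
for host in hadoop102 hadoop103 hadoop104; do
    echo "---------- zookeeper $host $1 ----------"
    ssh $host "/opt/module/zookeeper-3.4.10/bin/zkServer.sh $1"
done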
HDFS HA
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
创建并复制hadoop
mkdir /opt/ha
cp -r hadoop-2.7.2/ /opt/ha/
配置hadoop
环境
hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
core-site.xml
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
<name>hadoop.tmp.dir</name>
<value>/opt/ha/hadoop-2.7.2/data/tmp</value>
<name>ha.zookeeper.quorum</name>
<value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
hdfs-site.xml
<name>dfs.nameservices</name>
<value>mycluster</value>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>hadoop102:9000</value>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>hadoop103:9000</value>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>hadoop102:50070</value>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>hadoop103:50070</value>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop102:8485;hadoop103:8485;hadoop104:8485/mycluster</value>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/atguigu/.ssh/id_rsa</value>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/ha/hadoop-2.7.2/data/jn</value>
<name>dfs.permissions.enable</name>
<value>false</value>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
xsync /opt/ha
启动
sbin/hadoop-daemon.sh start journalnode
NN1格式化并启动
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
NN2设置同步并启动
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode
启动所有DataNode并将NN1切换为Active
sbin/hadoop-daemons.sh start datanode
bin/hdfs haadmin -transitionToActive nn1
bin/hdfs haadmin -getServiceState nn1
配置完自动故障转移后再启动
sbin/stop-dfs.sh
bin/zkServer.sh start
bin/hdfs zkfc -formatZK
sbin/start-dfs.sh
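启动后可按如下方式验证自动故障转移(示意):
# 两个NameNode应一个为active、一个为standby
bin/hdfs haadmin -getServiceState nn1
bin/hdfs haadmin -getServiceState nn2
# kill掉active的NameNode进程后,另一个应自动切换为active
# kill -9 <active NN的进程号>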
Yarn HA
http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
配置yarn-site.xml
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster-yarn1</value>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop102</value>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop103</value>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
启动hdfs
sbin/hadoop-daemon.sh start journalnode
初始化NN1并启动
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
NN2同步并启动
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode
启动所有DN
sbin/hadoop-daemons.sh start datanode
bin/hdfs haadmin -transitionToActive nn1
启动yarn
sbin/start-yarn.sh
sbin/yarn-daemon.sh start resourcemanager
bin/yarn rmadmin -getServiceState rm1
Hive(查询)
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
安装Hive及配置
tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /opt/module/
mv apache-hive-1.2.1-bin/ hive
mv hive-env.sh.template hive-env.sh
配置hive-env.sh
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export HIVE_CONF_DIR=/opt/module/hive/conf
Hadoop集群配置
必须启动hdfs和yarn
sbin/start-dfs.sh
sbin/start-yarn.sh
在HDFS上创建/tmp和/user/hive/warehouse并修改权限
bin/hadoop fs -mkdir /tmp
bin/hadoop fs -mkdir -p /user/hive/warehouse
bin/hadoop fs -chmod g+w /tmp
bin/hadoop fs -chmod g+w /user/hive/warehouse
安装MySQL
卸载MySQL
查看是否安装Mysql
rpm -qa | grep -i mysql
查看MySQL服务是否启动,关闭
sudo service mysql status
sudo service mysql stop
卸载MySQL安装的组件
sudo rpm -e MySQL-server-5.6.24-1.el6.x86_64
sudo rpm -e MySQL-client-5.6.24-1.el6.x86_64
查找并删除MySQL相关的文件
whereis mysql
sudo find / -name mysql
sudo rm -rf /var/lib/mysql
sudo rm -rf /usr/lib64/mysql
安装MySQL
安装启动服务端
rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm
cat /root/.mysql_secret
service mysql status
service mysql start
安装客户端
rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm
mysql -uroot -p
mysql>SET PASSWORD=PASSWORD('000000');
mysql>exit
配置User表
mysql -uroot -p
show databases;
use mysql;
show tables;
desc user;
update user set host='%' where host='localhost';
delete from user where Host='hadoop102';
delete from user where Host='127.0.0.1';
delete from user where Host='::1';
flush privileges;
Hive元数据配置到MySQL
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration
驱动拷贝
tar -zxvf mysql-connector-java-5.1.27.tar.gz
cp mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib/
配置Metastore到MySQL
touch /opt/module/hive/conf/hive-site.xml
vi hive-site.xml
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
<name>javax.jdo.option.ConnectionPassword</name>
<value>000000</value>
<description>password to use against metastore database</description>
数仓配置
Default数据仓库的最原始位置是在hdfs上的:/user/hive/warehouse路径下。
在仓库目录下,没有对默认的数据库default创建文件夹。如果某张表属于default数据库,直接在数据仓库目录下创建一个文件夹。
hive-site.xml
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
bin/hdfs dfs -chmod g+w /user/hive/warehouse
查询后信息显示配置
hive-site.xml
<name>hive.cli.print.header</name>
<value>true</value>
<name>hive.cli.print.current.db</name>
<value>true</value>
运行日志信息配置
Hive的log默认存放在/tmp/atguigu/hive.log目录下
修改hive的log存放日志到/opt/module/hive/logs
mv hive-log4j.properties.template hive-log4j.properties
vim hive-log4j.properties
hive.log.dir=/opt/module/hive/logs
Hive配置Tez引擎
hive-env.sh
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export TEZ_HOME=/opt/module/tez-0.9.1    # 你的tez解压目录
export TEZ_JARS=""
for jar in `ls $TEZ_HOME | grep jar`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done
export HIVE_AUX_JARS_PATH=/opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar$TEZ_JARS
hive-site.xml
<name>hive.execution.engine</name>
<value>tez</value>
配置Tez
/opt/module/hive/conf
tez-site.xml
<name>tez.lib.uris</name>
<value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
<name>tez.lib.uris.classpath</name>
<value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
上传Tez到集群
hadoop fs -mkdir /tez
hadoop fs -put /opt/module/tez-0.9.1/ /tez
针对Tez被Nodemanager杀死的情况
方案一
yarn-site.xml
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
方案二
mapred-site.xml
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
开启Map输出阶段压缩
进入Hive
set hive.exec.compress.intermediate=true;
set mapreduce.map.output.compress=true;
set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
开启Reduce输出阶段压缩
进入Hive
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress.type=BLOCK;
Flume(日志采集)
http://flume.apache.org/FlumeUserGuide.html
解压安装配置
tar -zxf apache-flume-1.7.0-bin.tar.gz -C /opt/module/
mv apache-flume-1.7.0-bin flume
mv flume-env.sh.template flume-env.sh
vi flume-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
若使用HDFS Sink将数据写入HDFS,需要将Hadoop相关jar包置入/opt/module/flume/lib中
Kafka(用于实时处理的消息队列)
http://kafka.apache.org/quickstart
安装Kafka
tar -zxvf kafka_2.11-0.11.0.0.tgz -C /opt/module/
mv kafka_2.11-0.11.0.0/ kafka
配置文件
mkdir logs
cd config/
vi server.properties
#broker的全局唯一编号,不能重复
broker.id=0
#删除topic功能使能
delete.topic.enable=true
#处理网络请求的线程数量
num.network.threads=3
#用来处理磁盘IO的线程数量
num.io.threads=8
#发送套接字的缓冲区大小
socket.send.buffer.bytes=102400
#接收套接字的缓冲区大小
socket.receive.buffer.bytes=102400
#请求套接字的缓冲区大小
socket.request.max.bytes=104857600
#kafka运行日志存放的路径
log.dirs=/opt/module/kafka/logs
#topic在当前broker上的分区个数
num.partitions=1
#用来恢复和清理data下数据的线程数量
num.recovery.threads.per.data.dir=1
#segment文件保留的最长时间,超时将被删除
log.retention.hours=168
#配置连接Zookeeper集群地址
zookeeper.connect=hadoop102:2181,hadoop103:2181,hadoop104:2181
配置环境变量
sudo vi /etc/profile
#KAFKA_HOME
export KAFKA_HOME=/opt/module/kafka
export PATH=$PATH:$KAFKA_HOME/bin
source /etc/profile
xsync kafka/
分发后分别修改hadoop103、hadoop104上/opt/module/kafka/config/server.properties中的broker.id=1、broker.id=2(broker.id不可重复)
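修改完broker.id后,可做一次简单的收发测试(示意,topic名first为假设):
# 各节点启动Kafka
bin/kafka-server-start.sh -daemon config/server.properties
# 创建topic
bin/kafka-topics.sh --zookeeper hadoop102:2181 --create --replication-factor 3 --partitions 1 --topic first
# 控制台生产者发送消息
bin/kafka-console-producer.sh --broker-list hadoop102:9092 --topic first
# 另开窗口用控制台消费者消费
bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --from-beginning --topic first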
Kafka Monitor
下载安装包并解压到集群的/opt/module/kafka-offset-console目录
创建启动脚本start.sh
#!/bin/bash
java -cp KafkaOffsetMonitor-assembly-0.4.6-SNAPSHOT.jar \
com.quantifind.kafka.offsetapp.OffsetGetterWeb \
--offsetStorage kafka \
--kafkaBrokers hadoop102:9092,hadoop103:9092,hadoop104:9092 \
--kafkaSecurityProtocol PLAINTEXT \
--zk hadoop102:2181,hadoop103:2181,hadoop104:2181 \
--port 8086 \
--refresh 10.seconds \
--retain 2.days \
--dbName offsetapp_kafka &
创建文件夹mobile-logs
启动zk和kf,然后启动kafka monitor
Kafka Manager
下载并解压
修改conf文件
application.conf
kafka-manager.zkhosts="hadoop102:2181,hadoop103:2181,hadoop104:2181"
配置JMX脚本
#! /bin/bash
case $1 in
"start"){
    for i in hadoop131 hadoop145 hadoop146
    do
        echo " --------启动 $i Kafka-------"
        # 用于KafkaManager监控
        ssh $i "export JMX_PORT=9988 && /opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties"
    done
};;
"stop"){
    for i in hadoop131 hadoop145 hadoop146
    do
        echo " --------停止 $i Kafka-------"
        ssh $i "/opt/module/kafka/bin/kafka-server-stop.sh stop"
    done
};;
esac
HBase(NoSQL数据库)
https://hbase.apache.org/book.html#quickstart
开启Zookeeper
开启HDFS
解压
tar -zxvf hbase-1.3.1-bin.tar.gz -C /opt/module
配置文件
hbase-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
export HBASE_MANAGES_ZK=false
hbase-site.xml
<name>hbase.rootdir</name>
<value>hdfs://hadoop102:9000/hbase</value>
<name>hbase.cluster.distributed</name>
<value>true</value>
<name>hbase.master.port</name>
<value>16000</value>
<name>hbase.zookeeper.quorum</name>
<value>hadoop102,hadoop103,hadoop104</value>
<name>hbase.zookeeper.property.dataDir</name>
<value>/opt/module/zookeeper-3.4.10/zkData</value>
regionservers
hadoop102
hadoop103
hadoop104
发送同步
xsync hbase/
启动
bin/start-hbase.sh
http://hadoop102:16010
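启动后可进入HBase Shell做一次简单读写验证(示意,表名student为假设):
bin/hbase shell
# 在shell中依次执行:
# create 'student','info'
# put 'student','1001','info:name','test'
# scan 'student'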
Sqoop(数据传递)
http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_introduction
解压
tar -zxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /opt/module/
修改配置文件
重命名
mv sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/module/hadoop-2.7.2
export HADOOP_MAPRED_HOME=/opt/module/hadoop-2.7.2
export HIVE_HOME=/opt/module/hive
export ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.10
export ZOOCFGDIR=/opt/module/zookeeper-3.4.10/conf
export HBASE_HOME=/opt/module/hbase
拷贝jdbc驱动
cp mysql-connector-java-5.1.27-bin.jar /opt/module/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib/
验证链接
bin/sqoop list-databases --connect jdbc:mysql://hadoop102:3306/ --username root --password 000000
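连接验证通过后,可用一条import命令测试MySQL到HDFS的导入(示意,库名test_db、表名test_table均为假设):
bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/test_db \
--username root \
--password 000000 \
--table test_table \
--target-dir /user/sqoop/test_table \
--num-mappers 1 \
--fields-terminated-by '\t'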
Oozie(任务调度)CDH版本
http://oozie.apache.org/docs/4.0.0/DG_QuickStart.html
安装部署hadoop(cdh版本)
解压hadoop-2.5.0-cdh5.3.6.tar.gz
tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz -C /opt/module/cdh
配置环境变量
环境
hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
mapred-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
core-site.xml
<name>fs.defaultFS</name>
<value>hdfs://hadoop102:8020</value>
<name>hadoop.tmp.dir</name>
<value>/opt/module/cdh/hadoop-2.5.0-cdh5.3.6/data/tmp</value>
<name>hadoop.proxyuser.atguigu.hosts</name>
<value>*</value>
<name>hadoop.proxyuser.atguigu.groups</name>
<value>*</value>
hdfs-site.xml
<name>dfs.replication</name>
<value>3</value>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop104:50090</value>
yarn-site.xml
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop103</value>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
<name>yarn.log.server.url</name>
<value>http://hadoop102:19888/jobhistory/logs/</value>
mapred-site.xml
配置
<name>mapreduce.framework.name</name>
<value>yarn</value>
<name>mapreduce.jobhistory.address</name>
<value>hadoop104:10020</value>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop104:19888</value>
启动
启动历史服务器:mr-jobhistory-daemon.sh start historyserver
slaves
hadoop102
hadoop103
hadoop104
同步
xsync cdh
格式化并启动集群
hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
部署Oozie
安装解压及拷贝
解压Oozie
tar -zxvf /opt/software/cdh/oozie-4.0.0-cdh5.3.6.tar.gz -C /opt/module
解压oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz
tar -zxvf oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz -C ../
-C ../ 表示解压到上一级目录;该命令必须在oozie根目录下执行,解压出的hadooplibs才会位于oozie根目录中
创建libext
mkdir libext/
拷贝依赖的jar包
都在oozie根目录执行
cp -ra hadooplibs/hadooplib-2.5.0-cdh5.3.6.oozie-4.0.0-cdh5.3.6/* libext/
cp -a /opt/software/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar ./libext/
拷贝ext-2.2
cp -a /opt/software/cdh/ext-2.2.zip libext/
修改Oozie配置文件
vi oozie-site.xml
属性:oozie.service.JPAService.jdbc.driver
属性值:com.mysql.jdbc.Driver
解释:JDBC的驱动
属性:oozie.service.JPAService.jdbc.url
属性值:jdbc:mysql://hadoop102:3306/oozie
解释:oozie所需的数据库地址
属性:oozie.service.JPAService.jdbc.username
属性值:root
解释:数据库用户名
属性:oozie.service.JPAService.jdbc.password
属性值:000000
解释:数据库密码
属性:oozie.service.HadoopAccessorService.hadoop.configurations
属性值:*=/opt/module/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop
解释:让Oozie引用Hadoop的配置文件
Mysql创建Oozie数据库
create database oozie;
初始化Oozie
上传Oozie目录下的yarn.tar.gz文件到HDFS
bin/oozie-setup.sh sharelib create -fs hdfs://hadoop102:8020 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz
创建oozie.sql文件
bin/ooziedb.sh create -sqlfile oozie.sql -run
打包项目,生成war包
bin/oozie-setup.sh prepare-war
启动
bin/oozied.sh start
网页
http://hadoop102:11000/oozie
Azkaban(任务调度)
https://azkaban.readthedocs.io/en/latest/getStarted.html
安装Azkaban
mkdir /opt/module/azkaban
解压server,executor,及sql-script文件到该目录下
重命名解压文件
mv azkaban-web-2.5.0/ server
mv azkaban-executor-2.5.0/ executor
脚本导入创建数据库
mysql -uroot -p000000
mysql> create database azkaban;
mysql> use azkaban;
mysql> source /opt/module/azkaban/azkaban-2.5.0/create-all-sql-2.5.0.sql
生成密钥对和证书
keytool -keystore keystore -alias jetty -genkey -keyalg RSA
mv keystore /opt/module/azkaban/server/
时间同步配置(若三机同步,可不做)
tzselect
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
sudo date -s '2018-10-18 16:39:30'
配置文件
Web服务器配置
vim /opt/module/azkaban/server/conf/azkaban.properties
#Azkaban Personalization Settings
#服务器UI名称,用于服务器上方显示的名字
azkaban.name=Test
#描述
azkaban.label=My Local Azkaban
#UI颜色
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
#默认web server存放web文件的目录
web.resource.dir=/opt/module/azkaban/server/web/
#默认时区,已改为亚洲/上海,默认为美国
default.timezone.id=Asia/Shanghai
#Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
#用户权限管理默认类(绝对路径)
user.manager.xml.file=/opt/module/azkaban/server/conf/azkaban-users.xml
#Loader for projects
#global配置文件所在位置(绝对路径)
executor.global.properties=/opt/module/azkaban/executor/conf/global.properties
azkaban.project.dir=projects
#数据库类型
database.type=mysql
#端口号
mysql.port=3306
#数据库连接IP
mysql.host=hadoop102
#数据库实例名
mysql.database=azkaban
#数据库用户名
mysql.user=root
#数据库密码
mysql.password=000000
#最大连接数
mysql.numconnections=100
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
# Jetty服务器属性
#最大线程数
jetty.maxThreads=25
#Jetty SSL端口
jetty.ssl.port=8443
#Jetty端口
jetty.port=8081
#SSL文件名(绝对路径)
jetty.keystore=/opt/module/azkaban/server/keystore
#SSL文件密码
jetty.password=000000
#Jetty主密码与keystore文件相同
jetty.keypassword=000000
#SSL文件名(绝对路径)
jetty.truststore=/opt/module/azkaban/server/keystore
#SSL文件密码
jetty.trustpassword=000000
# Azkaban Executor settings
executor.port=12321
# mail settings
mail.sender=
mail.host=
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
vim /opt/module/azkaban/server/conf/azkaban-users.xml
执行服务器配置
vim /opt/module/azkaban/executor/conf/azkaban.properties
#Azkaban
#时区
default.timezone.id=Asia/Shanghai
# Azkaban JobTypes Plugins
#jobtype 插件所在位置
azkaban.jobtype.plugin.dir=plugins/jobtypes
#Loader for projects
executor.global.properties=/opt/module/azkaban/executor/conf/global.properties
azkaban.project.dir=projects
database.type=mysql
mysql.port=3306
mysql.host=hadoop102
mysql.database=azkaban
mysql.user=root
mysql.password=000000
mysql.numconnections=100
# Azkaban Executor settings
#最大线程数
executor.maxThreads=50
#端口号(如修改,请与web服务中一致)
executor.port=12321
#线程数
executor.flow.threads=30
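配置完成后的启动顺序示意(先启动executor再启动web server;Azkaban 2.5.0的启动脚本需在对应安装目录下执行,脚本名以实际解压内容为准):
# 启动executor server
cd /opt/module/azkaban/executor && bin/azkaban-executor-start.sh
# 启动web server
cd /opt/module/azkaban/server && bin/azkaban-web-start.sh
# 浏览器访问 https://hadoop102:8443(对应jetty.ssl.port)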
Kettle(ETL工具)
http://community.pentaho.com/projects/data-integration/
绿色安装,解压即用
Kettle的集群配置
目的在于加快Kettle处理速度。
优点:
多服务器运行,加快处理速度,对于大数据量的操作更明显
防单点失败,一台服务器故障后其它服务器还可以运行
缺点:
采用主从结构,不具备自动切换主从的功能,所以一旦主节点宕机,整个系统不可用
对网络要求高,节点之间需要不断地传输数据
需要更多的服务器,而且主节点没有处理能力
适用场景:
需求Kettle能时刻保持正常运行的场景
大批量处理数据的场景
所以视需求进行配置。
启动hadoop
上传解压kettle安装包
配置文件
/opt/module/data-integration/pwd
vim carte-config-master-8080.xml
<name>master</name>
<hostname>hadoop102</hostname>
<port>8080</port>
<master>Y</master>
<username>cluster</username>
<password>cluster</password>
carte-config-8081.xml
<name>master</name>
<hostname>hadoop102</hostname>
<port>8080</port>
<username>cluster</username>
<password>cluster</password>
<master>Y</master>
<report_to_masters>Y</report_to_masters>
<name>slave1</name>
<hostname>hadoop103</hostname>
<port>8081</port>
<username>cluster</username>
<password>cluster</password>
<master>N</master>
carte-config-8082.xml
<name>master</name>
<hostname>hadoop102</hostname>
<port>8080</port>
<username>cluster</username>
<password>cluster</password>
<master>Y</master>
<report_to_masters>Y</report_to_masters>
<name>slave2</name>
<hostname>hadoop104</hostname>
<port>8082</port>
<username>cluster</username>
<password>cluster</password>
<master>N</master>
分发
xsync data-integration
启动
./carte.sh hadoop102 8080
./carte.sh hadoop103 8081
./carte.sh hadoop104 8082
访问web
http://hadoop102:8080
ClickHouse(列式数据库,在线处理)
https://clickhouse.yandex/ https://packagecloud.io/altinity/clickhouse
安装前准备
CentOS取消打开文件数限制
vim /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
vim /etc/security/limits.d/90-nproc.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
CentOS取消SELINUX
vim /etc/selinux/config
SELINUX=disabled
关闭防火墙
service iptables stop
安装依赖
yum install -y libtool
yum install -y *unixODBC*
yum install libicu.x86_64
重启
单机模式
注意,此处是root用户
上传文件到/opt/software/ck
这步若出现问题则赋予权限 chown atguigu:atguigu /opt/software/ck
安装4个文件
rpm -ivh clickhouse-common-static-19.7.3.9-1.el6.x86_64.rpm
rpm -ivh clickhouse-server-common-19.7.3.9-1.el6.x86_64.rpm
rpm -ivh clickhouse-server-19.7.3.9-1.el6.x86_64.rpm
rpm -ivh clickhouse-client-19.7.3.9-1.el6.x86_64.rpm
启动ClickServer及client连接
service clickhouse-server start
clickhouse-client
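客户端连上后可做一次简单验证(示意):
# 查看已有数据库,确认服务正常
clickhouse-client --query "show databases"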
分布式集群安装
注意,此处是root用户
同步CentOs配置
xsync /etc/security/limits.conf
xsync /etc/security/limits.d/90-nproc.conf
xsync /etc/selinux/config
三台机器分别修改config.xml(不可分发)
vim /etc/clickhouse-server/config.xml
<listen_host>::</listen_host>
三台机器新建metrika.xml
vim /etc/metrika.xml
其中host、replica等主机名相关配置需要根据各机实际情况自行修改
<internal_replication>true</internal_replication>
<host>hadoop131</host>
<port>9000</port>
<internal_replication>true</internal_replication>
<host>hadoop145</host>
<port>9000</port>
<internal_replication>true</internal_replication>
<host>hadoop146</host>
<port>9000</port>
<host>hadoop131</host>
<port>2181</port>
<host>hadoop145</host>
<port>2181</port>
<host>hadoop146</host>
<port>2181</port>
<replica>hadoop102</replica>
<ip>::/0</ip>
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio>
<method>lz4</method>
启动
先行启动zookeeper
启动服务
select * from system.clusters
如何卸载
rpm -qa | grep clickhouse
rpm -e 包名
rpm -e clickhouse-client-19.7.3.9-1.el6.x86_64
rpm -e clickhouse-server-19.7.3.9-1.el6.x86_64
rpm -e clickhouse-common-static-19.7.3.9-1.el6.x86_64
rpm -e clickhouse-server-common-19.7.3.9-1.el6.x86_64
DataX(异构数据源离线同步工具)
https://github.com/alibaba/DataX
前置环境
Linux
JDK1.8
Python2.6.x
解压即用
MongoDB(数据库)
上传压缩包到虚拟机并解压
重命名
mv mongodb-linux-x86_64-4.0.10/ mongodb
创建数据库目录
sudo mkdir -p /data/db
sudo chmod 777 -R /data/db/
启动服务
bin/mongod
bin/mongo
docker
安装
docker search mongo
docker pull mongo
docker images mongo
docker run -p 27017:27017 -v $PWD/db:/data/db -d mongo:latest
docker exec -it <容器ID> /bin/bash
mongod
mongo
Elasticsearch和Kibana
基于docker
Elasticsearch
https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
拉取并运行镜像
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.2.0
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.2.0
单机模式
docker ps
运行elastic
docker exec -it <容器ID> /bin/bash
测试是否运行
curl http://localhost:9200
http://localhost:9200
Kibana
拉取并运行镜像
docker pull docker.elastic.co/kibana/kibana:7.2.0
docker run --link YOUR_ELASTICSEARCH_CONTAINER_NAME_OR_ID:elasticsearch -p 5601:5601 docker.elastic.co/kibana/kibana:7.2.0
运行Kibana
bin/kibana
docker exec -it <容器ID> /bin/bash
测试是否运行
http://localhost:5601
Presto(即席查询)
https://prestodb.github.io/
Presto Server安装
下载并解压,修改名称为 presto
在presto目录下创建 data文件夹和etc文件夹
进入etc文件夹
vim jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
mkdir catalog
vim hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hadoop131:9083
分发
进入etc文件夹
vim node.properties
hadoop131
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/opt/module/presto/data
hadoop145
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-fffffffffffe
node.data-dir=/opt/module/presto/data
hadoop146
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-fffffffffffd
node.data-dir=/opt/module/presto/data
vim config.properties
hadoop131
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8881
query.max-memory=50GB
discovery-server.enabled=true
discovery.uri=http://hadoop131:8881
hadoop145
coordinator=false
http-server.http.port=8881
query.max-memory=50GB
discovery.uri=http://hadoop131:8881
hadoop146
coordinator=false
http-server.http.port=8881
query.max-memory=50GB
discovery.uri=http://hadoop131:8881
启动Hive Metastore
nohup bin/hive --service metastore >/dev/null 2>&1 &
后台启动 Presto Server
bin/launcher start
Presto命令行Client安装
将presto-cli的jar包上传到presto安装目录下,赋予执行权限,更名为prestocli
启动
./prestocli --server hadoop131:8881 --catalog hive --schema default
查询测试
select * from schema.table limit 100;
Presto可视化Client安装
下载并解压
进入conf目录
vim yanagishima.properties
jetty.port=7080
presto.datasources=atguigu-presto
presto.coordinator.server.atguigu-presto=http://hadoop131:8881
catalog.atguigu-presto=hive
schema.atguigu-presto=default
sql.query.engines=presto
后台启动
nohup bin/yanagishima-start.sh >y.log 2>&1 &
页面启动
http://hadoop131:7080
Druid(即席查询)
Kylin(即席查询)
http://kylin.apache.org/cn/
解压即可用
需要在/etc/profile中
##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
#HBASE_HOME
export HBASE_HOME=/opt/module/hbase-1.3.1
export PATH=$PATH:$HBASE_HOME/bin
#HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
但在使用之前需要开启
hadoop
historyserver
zookeeper
hbase
Spark
http://spark.apache.org/docs/2.1.1/
Local模式
下载解压即用
Standalone模式
下载解压
更改配置文件
cd conf
mv slaves.template slaves
vim slaves
hadoop131
hadoop145
hadoop146
mv spark-env.sh.template spark-env.sh
vim spark-env.sh
SPARK_MASTER_HOST=hadoop131
SPARK_MASTER_PORT=7077
JobHistoryServer设置
mv spark-defaults.conf.template spark-defaults.conf
vi spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop131:9000/directory
在HDFS上提前创建文件夹
hadoop fs -mkdir /directory
vi spark-env.sh
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080
-Dspark.history.retainedApplications=30
-Dspark.history.fs.logDirectory=hdfs://hadoop131:9000/directory"
HA配置
vi spark-env.sh
注释掉如下内容:
#SPARK_MASTER_HOST=hadoop131
#SPARK_MASTER_PORT=7077
添加上如下内容:
export SPARK_DAEMON_JAVA_OPTS="
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=hadoop131,hadoop145,hadoop146
-Dspark.deploy.zookeeper.dir=/spark"
分发
xsync spark/
启动
sbin/start-all.sh
hadoop131:8080
sbin/start-master.sh
如遇java_home not set异常
vim sbin/spark-config.sh
export JAVA_HOME=XXXX
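Standalone集群起来后,可提交自带的SparkPi验证(示意,以Spark 2.1.1发行包中的examples jar为例):
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop131:7077 \
--executor-memory 1G \
--total-executor-cores 2 \
./examples/jars/spark-examples_2.11-2.1.1.jar \
100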
Yarn模式
修改hadoop的yarn-site.xml
vi yarn-site.xml
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
修改spark-env.sh
vi spark-env.sh
YARN_CONF_DIR=/opt/module/hadoop-2.7.2/etc/hadoop
日志查看
vim spark-defaults.conf
spark.yarn.historyServer.address=hadoop131:18080
spark.history.ui.port=18080
分发
启动
sbin/start-history-server.sh
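Yarn模式下的提交示意(client模式,便于在本地看到输出):
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
./examples/jars/spark-examples_2.11-2.1.1.jar \
100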
Solr
http://lucene.apache.org/solr/
下载解压,更改名称
修改配置文件
vim solr/bin/solr.in.sh
#添加下列指令
ZK_HOST="hadoop102:2181,hadoop103:2181,hadoop104:2181"
SOLR_HOST="hadoop102"
# Sets the port Solr binds to, default is 8983
#可修改端口号
SOLR_PORT=8983
同步后进行微调
xsync solr
vim solr/bin/solr.in.sh
将各节点solr.in.sh中的SOLR_HOST改为该节点自己的主机名(如hadoop103、hadoop104)
启动
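启动命令示意(各节点分别执行,端口与ZK地址取自solr.in.sh;若以root用户执行需追加-force):
bin/solr start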
Atlas
http://atlas.apache.org
安装流程
下载解压改名
集成外部框架
集成Hbase
更改配置文件
vim atlas-application.properties
atlas.graph.storage.hostname=hadoop102:2181,hadoop103:2181,hadoop104:2181
添加Hbase集群配置
ln -s /opt/module/hbase/conf/ /opt/module/atlas/conf/hbase/
增加Hbase路径
vim atlas-env.sh
export HBASE_CONF_DIR=/opt/module/atlas/conf/hbase/conf
集成Solr
更改配置文件
将Atlas自带的Solr文件夹拷贝到外部Solr集群的各个节点
cp -r /opt/module/atlas/conf/solr /opt/module/solr/
修改拷贝文件名称
cd /opt/module/solr
mv solr atlas_conf
集成Kafka
集成Hive
其他设置
编译Atlas