Installing a Hadoop + HBase Distributed Cluster
Roles of the three machines in the cluster (namenode, secondary namenode, datanode, resourcemanager, nodemanager):
192.168.0.1 chubing/passwd ubuntu01 nn/snn/rm UbuntuServer12.04
192.168.0.2 chubing/passwd ubuntu02 dn/nm UbuntuServer12.04
192.168.0.3 chubing/passwd ubuntu03 dn/nm UbuntuServer12.04
Edit the configuration files under ~/hadoop-2.2.0/etc/hadoop (a minimal sketch of these files follows this list):
hadoop-env.sh
Environment variables: set JAVA_HOME to the JDK installation path
yarn-env.sh
Set JAVA_HOME here as well
slaves
List all slave nodes, one hostname per line
core-site.xml
fs.default.name (the deprecated name of fs.defaultFS): the NameNode's URI, i.e. its hostname/IP address and port
hadoop.tmp.dir: base directory for Hadoop's temporary files
hdfs-site.xml
dfs.namenode.name.dir: where the NameNode stores the namespace image and edit logs
dfs.datanode.data.dir: where each DataNode stores its data blocks
dfs.replication: number of replicas; a DataNode holds at most one replica of a given block, so this cannot usefully exceed the number of DataNodes; the default is 3
dfs.permissions: whether HDFS enforces permission checks; the default is true
mapred-site.xml
mapreduce.framework.name: set to yarn
mapred.system.dir, mapred.local.dir: legacy MRv1 settings, usually unnecessary when running on YARN
mapred.map.tasks: roughly 10x the number of slave nodes
mapred.reduce.tasks: roughly 2x the number of slave nodes
yarn-site.xml
Many settings; at a minimum, yarn.nodemanager.aux-services (set to mapreduce_shuffle) and the ResourceManager address (yarn.resourcemanager.hostname, or the individual yarn.resourcemanager.*.address properties)
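A minimal sketch of these files, using the hostnames from the cluster listing above; the NameNode port (9000), the local directories, and the replication factor of 2 are assumptions and should be adjusted to the actual environment:

core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ubuntu01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/chubing/hadoop_tmp</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/chubing/hdfs_name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/chubing/hdfs_data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ubuntu01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

slaves:
ubuntu02
ubuntu03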
Disable the firewall:
sudo ufw disable
Install SSH for passwordless login:
sudo apt-get install ssh
mkdir .ssh
Change the hostname:
sudo vi /etc/hostname
Configure a static IP address:
sudo vi /etc/network/interfaces
auto eth0
iface eth0 inet static
address 192.168.237.XXX
netmask 255.255.255.0
gateway 192.168.237.1
Restart networking so the new IP address takes effect:
sudo /etc/init.d/networking restart
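For the hostnames (ubuntu01, ubuntu02, ubuntu03) to resolve from every machine, each node also needs matching /etc/hosts entries. A sketch, assuming the 192.168.237.x addresses used for the web UIs further below (the .102/.103 addresses are assumptions; use whatever static IPs were actually configured above):
sudo vi /etc/hosts
192.168.237.101 ubuntu01
192.168.237.102 ubuntu02
192.168.237.103 ubuntu03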
Set up passwordless SSH login:
Generate a key pair:
ssh-keygen -t rsa -P ""
Append the public key locally and copy it to the other machines:
On ubuntu01:
cat .ssh/id_rsa.pub >>.ssh/authorized_keys
scp .ssh/id_rsa.pub chubing@ubuntu02:~/.ssh/id_rsa.pub_ubuntu01
scp .ssh/id_rsa.pub chubing@ubuntu03:~/.ssh/id_rsa.pub_ubuntu01
On ubuntu02:
cat .ssh/id_rsa.pub_ubuntu01 >>.ssh/authorized_keys
On ubuntu03:
cat .ssh/id_rsa.pub_ubuntu01 >>.ssh/authorized_keys
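If passwordless login still prompts for a password, the usual cause is file permissions (a standard OpenSSH requirement, not a step from the original procedure). On every node:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
Then, from ubuntu01, both of the following should log in without asking for a password:
ssh ubuntu02
ssh ubuntu03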
Verify the installation:
On the master (ubuntu01):
Format the NameNode:
bin/hdfs namenode -format
Start the NameNode and ResourceManager:
sbin/hadoop-daemon.sh start namenode
sbin/yarn-daemon.sh start resourcemanager
or
cd ~/hadoop-2.2.0/sbin/
./start-dfs.sh
./start-yarn.sh
or
./start-all.sh
On the slaves (ubuntu02 and ubuntu03):
Start the DataNode and NodeManager:
sbin/hadoop-daemon.sh start datanode
sbin/yarn-daemon.sh start nodemanager
Run jps on the master: if the NameNode and ResourceManager processes are both present, the master is set up correctly.
Run jps on each slave: if the DataNode and NodeManager processes are both present, that slave node is set up correctly.
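For reference, a rough sketch of what jps typically prints (the process IDs are placeholders and will differ; a SecondaryNameNode also appears on the master if the cluster was started via start-dfs.sh):
On the master:
2345 NameNode
2468 ResourceManager
2599 Jps
On a slave:
3345 DataNode
3468 NodeManager
3599 Jps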
Check the cluster status: ./bin/hdfs dfsadmin -report
Check files and their block composition: ./bin/hdfs fsck / -files -blocks
HDFS web UI: http://192.168.237.101:50070
ResourceManager web UI: http://192.168.237.101:8088
Run an example job:
cd ~/hadoop-2.2.0
bin/hdfs dfs -mkdir /input1
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter input1
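randomwriter treats its argument as the output directory, so listing that path afterwards confirms that MapReduce actually ran and wrote data (with a relative path as above, the output lands under the user's HDFS home directory, e.g. /user/chubing/input1):
bin/hdfs dfs -ls input1
The job's progress can also be watched in the ResourceManager web UI listed above.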
HBase
Unpack the tarball
Edit conf/hbase-site.xml (hbase.rootdir must point at the NameNode, using the same host:port as fs.default.name in core-site.xml):
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://ubuntu01:9090/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>ubuntu01,ubuntu02,ubuntu03</value>
</property>
<property>
  <!-- use an absolute path; "~" is not expanded inside XML configuration values -->
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/chubing/zookeeper_data</value>
</property>
<property>
  <name>hbase.master</name>
  <value>ubuntu01:60000</value>
</property>
Configure hbase-env.sh: set JAVA_HOME (and HBASE_MANAGES_ZK=true so HBase manages the ZooKeeper quorum itself)
Configure regionservers: list the region server hostnames, one per line (e.g. ubuntu02 and ubuntu03)
Create the ZooKeeper data directory on every quorum node:
mkdir ~/zookeeper_data
Copy HBase to the other nodes:
scp -r hbase-0.96.1.1-hadoop2/ chubing@ubuntu02:/home/chubing
scp -r hbase-0.96.1.1-hadoop2/ chubing@ubuntu03:/home/chubing
At startup you may see this warning:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/chubing/hbase-0.96.1.1-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/chubing/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Fix: copy one of the slf4j-log4j12 jars over the other so that only one binding version remains on the classpath (for example, overwrite the 1.6.4 jar under HBase's lib/ with Hadoop's 1.7.5 jar).
Another common problem is clock skew between nodes (HBase rejects region servers whose clocks are too far out of sync with the master). Fix: synchronize the time on every node:
sudo ntpdate 202.120.2.101
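Once the configuration is in place (and HDFS is running), a sketch of starting and checking HBase, assuming HBase manages ZooKeeper itself (HBASE_MANAGES_ZK=true) and that regionservers lists ubuntu02 and ubuntu03:
cd ~/hbase-0.96.1.1-hadoop2
bin/start-hbase.sh
jps should then show HMaster and HQuorumPeer on ubuntu01, and HRegionServer plus HQuorumPeer on ubuntu02 and ubuntu03.
bin/hbase shell
status
(status, run inside the shell, reports the number of live region servers.)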