This article uses a Kickstart post script to install and automatically configure a Hadoop cluster.

The resource allocation file describes how network resources are assigned within the Hadoop cluster. The post script looks up the local MAC address in this file, retrieves the machine's hostname, IP address, and other settings, and then applies them automatically. The MAC addresses therefore have to be known in advance: for physical servers the vendor normally supplies them, while for virtual machines you assign them yourself. Note that VMware's auto-generated MAC addresses should not be used here; an auto-generated address starting with 00:50:56 changes to one starting with 00:0C:29 as soon as the machine boots, so assign a MAC address starting with 00:0C:29 manually.
An example resource allocation file:
#Hostname MAC-address IP-address Netmask Gateway Processes Role Tag
Master 00:0C:29:11:00:00 192.168.60.20 255.255.255.0 192.168.60.2 NameNode,JobTracker master HADOOP
Slave1 00:0C:29:00:00:01 192.168.60.31 255.255.255.0 192.168.60.2 DataNode,TaskTracker slave HADOOP
Slave2 00:0C:29:00:00:02 192.168.60.32 255.255.255.0 192.168.60.2 DataNode,TaskTracker slave HADOOP
First, build the PXE Installer server that will drive the Hadoop cluster installation. This server provides the Hadoop software packages, the resource allocation file, and the Hadoop install scripts [4]. In this experiment, every machine in the cluster is also configured as an rsync server so that the Master's public key file can be synchronized.
The topology is shown below:

Topology of the Kickstart-based Hadoop deployment

Deployment flow:

Flow chart of automatic Hadoop deployment via PXE
Modify the PXE server configuration script: add an argument option and a function that sets up automatic Hadoop deployment.

# check arguments
[ $# -eq 0 ] && { PrintHelp; exit 1; }
[ $1 != "-d" ] && [ $1 != "-h" ] || [ $1 = "--help" ] && { PrintHelp; exit 1;}
if [ $1 = "-h" ]; then
[ $# -eq 1 ] && { echo -e "Please specify the resourcefile. examples can be found at conf/network.conf. \nExample: ./init_pxeserver.sh -h conf/network.conf"; exit 1;}
[ ! -f $2 ] && { echo -e "resourcefile error! please check resourcefile path\n"; exit 1;}
fi

Depending on the argument, the PXE Installer is configured as one of two types: -d is the default, an automatic CentOS installation server; -h sets up a Hadoop auto-deployment server. The main difference between them is the ks file: with -h, a ks file suited to automatic Hadoop deployment is generated.

[ "$1" = "-h" ] && HadoopKS 2>&1 | tee -a /tmp/init_pxeserver.log
Add a post script for deploying Hadoop to the default ks file. The script below is written into ks.cfg through a cat > ks.cfg << ... heredoc, which is why variables that must be evaluated at install time are escaped as \$VAR, while Installer-side variables such as $IPADDR are expanded when the ks file is generated.

Configuring network settings from the resource allocation file

The post script first obtains the local MAC address, then uses it as an index into the resource allocation file to look up this machine's IP address, netmask, gateway, and hostname, and configures them automatically. It also builds the hosts file on each cluster machine, adding an A record for every host in the cluster.
curl -o network.conf http://$IPADDR/$CONFDIR/network.conf &>/dev/null && echo -e "[SUCC]: resourcefile download ok! "
MAC=`ifconfig eth0 |grep HWaddr |awk '{print \$5}'`
IP=`grep "\$MAC" network.conf |grep HADOOP |awk '{print \$3}'`
HOST=`grep "\$MAC" network.conf |grep HADOOP |awk '{print \$1}'`
NETMASK=`grep "\$MAC" network.conf |grep HADOOP |awk '{print \$4}'`
GATEWAY=`grep "\$MAC" network.conf |grep HADOOP |awk '{print \$5}'`
#set static ip addr
IFCFG=/etc/sysconfig/network-scripts/ifcfg-eth0
sed -i "s/BOOTPROTO.*/BOOTPROTO=static/g" \$IFCFG
echo "IPADDR=\$IP" >>\$IFCFG
echo "NETMASK=\$NETMASK" >>\$IFCFG
echo "GATEWAY=\$GATEWAY" >>\$IFCFG
# add hostname to /etc/hosts
cat network.conf |grep HADOOP |awk '{print \$3" "\$1}' >>/etc/hosts
# reset hostname
sed -i "s/HOSTNAME.*/HOSTNAME=\$HOST/g" /etc/sysconfig/network
Installing Hadoop

This step is straightforward: download the software packages and the install script from the Installer, then run the install script.
mkdir soft && cd soft
curl -o $HADOOPFILE http://$IPADDR/$SOFTDIR/$HADOOPFILE &>/dev/null && echo -e "[SUCC]: hadoop download ok! "
curl -o $JDKFILE http://$IPADDR/$SOFTDIR/$JDKFILE &>/dev/null && echo -e "[SUCC]: jdk download ok! "
cd ../
curl -o hadoop_centos.sh http://$IPADDR/$CONFDIR/hadoop_centos.sh &>/dev/null && echo -e "[SUCC]: hadoop_centos.sh download ok! "
bash hadoop_centos.sh &>/dev/null && echo -e "[SUCC]: run hadoop_centos.sh ok! "
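hadoop_centos.sh itself lives on the Installer and is not listed here. Judging from the paths that appear later (/etc/hadoop, /usr/share/hadoop), the cluster uses the Hadoop 1.2.1 rpm together with the Oracle JDK rpm, so a minimal sketch of the script might look like the following; the package file names and the JAVA_HOME path are assumptions:

#!/bin/bash
# sketch: install the JDK and Hadoop rpms downloaded into ./soft
cd soft
rpm -ivh jdk-*.rpm       # the file referenced by $JDKFILE above
rpm -ivh hadoop-*.rpm    # the file referenced by $HADOOPFILE above
cd ..
# point Hadoop at the JDK (the Oracle JDK rpm installs to /usr/java)
echo "export JAVA_HOME=/usr/java/default" >> /etc/hadoop/hadoop-env.sh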
Configuring the rsync service

The slaves need the master's key. This can be done by running an rsync server with anonymous access [5] and having each slave pull the master's public key from it.
curl -o xinetd-2.3.14-39.el6_4.x86_64.rpm http://$IPADDR/$SOFTDIR/package/xinetd-2.3.14-39.el6_4.x86_64.rpm &>/dev/null && echo -e "[SUCC]: xinetd down ok! "
curl -o rsync-3.0.6-9.el6_4.1.x86_64.rpm http://$IPADDR/$SOFTDIR/package/rsync-3.0.6-9.el6_4.1.x86_64.rpm &>/dev/null && echo -e "[SUCC]: rsync down ok! "
rpm -ivh xinetd-2.3.14-39.el6_4.x86_64.rpm &>/dev/null && echo -e "[SUCC]: xinetd install ok! "
rpm -ivh rsync-3.0.6-9.el6_4.1.x86_64.rpm &>/dev/null && echo -e "[SUCC]: rsync install ok! "
sed -i "s/disable.*/disable = no/g" /etc/xinetd.d/rsync
cat > /etc/rsyncd.conf <<EOF
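# sketch of a minimal anonymous, read-only rsync configuration; the module name
# (pubkey) and path (/tmp) match the pull command and the key location used
# later in this script, but the settings actually used may differ
use chroot = no
read only = yes
[pubkey]
    path = /tmp
    comment = master public key
EOF
# restart xinetd so the rsync module is served (the install log quoted later in
# this article shows xinetd being restarted)
service xinetd restart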
Syncing the master's public key

Copy the master's public key file id_dsa.pub into the directory exported by the rsync server; each slave then pulls it from there. A retry-on-failure mechanism is needed: a slave's installation may run ahead of the master's, so when the slave reaches the key-pulling step the master may not have generated its key yet. The script therefore makes a slave wait 5 seconds and retry whenever the pull fails, up to 100 times. Try to make sure the master starts installing before the slaves, and keep the master powered on when adding new slaves to the cluster.

# the slave pulls id_dsa.pub from the master
MASTER=`cat network.conf |grep master |awk '{print \$1}'` # hostname of master
SLAVE=`cat network.conf |grep slave |awk '{print \$1}'` # hostname of slave
if [ "\$HOST" = "\$MASTER" ]; then
cp $HOMEDIR/.ssh/id_dsa.pub /tmp/id_dsa.pub
cd /tmp
chmod 777 id_dsa.pub
else
i=0
while [ \$i -lt 100 ]
do
#pull from master
rsync Master::pubkey/id_dsa.pub . &>/dev/null && r=0 || r=1
#check whether the rsync pull succeeded; if not, retry
if [ \$r -eq 0 ]; then
break;
else
sleep 5 #sleep 5 second then retry
((i+=1)) #retry 100 times
echo "retry \$i ..."
fi
done
cat $HOMEDIR/.ssh/authorized_keys | grep `cat id_dsa.pub` &>/dev/null && r=0 || r=1
[ \$r -eq 1 ] && cat id_dsa.pub >> $HOMEDIR/.ssh/authorized_keys
fi
# add id_dsa.pub of Controller
cat $HOMEDIR/.ssh/authorized_keys | grep "$KEY" &>/dev/null && r=0 || r=1
[ \$r -eq 1 ] && echo "$KEY" >> $HOMEDIR/.ssh/authorized_keys
Hadoop cluster configuration

This involves configuring core-site.xml, hdfs-site.xml, and mapred-site.xml, as well as the masters and slaves files [6]. Note that when adding a new slave to the cluster, simply append it to the end of the resource allocation file. Newly added nodes leave the masters, slaves, and hosts files out of sync across the cluster; cfengine or Puppet could solve this and is left for future work.

#configure the cluster
cd /etc/hadoop
#use the core-site.xml below
cat > core-site.xml <<EOF
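<?xml version="1.0"?>
<!-- NOTE: the settings below are a typical minimal Hadoop 1.x configuration,
     shown as a sketch for this cluster (Master runs the NameNode and
     JobTracker); the values actually used may differ. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Master:9000</value>
  </property>
</configuration>
EOF
#hdfs-site.xml: two DataNodes, so a replication factor of 2 is assumed
cat > hdfs-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF
#mapred-site.xml: JobTracker address
cat > mapred-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Master:9001</value>
  </property>
</configuration>
EOF
#masters and slaves generated from the resource allocation file
#(assuming it was downloaded to /tmp earlier in the post script)
grep HADOOP /tmp/network.conf | grep master | awk '{print \$1}' > masters
grep HADOOP /tmp/network.conf | grep slave | awk '{print \$1}' > slaves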
Cluster verification and testing

When starting Hadoop for the first time, HDFS must be formatted:
[root@Master ~]# hadoop namenode -format
14/05/12 01:07:43 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = Master/192.168.60.20
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:27:42 PDT 2013
STARTUP_MSG: java = 1.7.0_51
************************************************************/
Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) y
Format aborted in /tmp/hadoop-root/dfs/name
14/05/12 01:07:59 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master/192.168.60.20
************************************************************/
Then start Hadoop:

[root@Master ~]# start-all.sh
starting namenode, logging to /var/log/hadoop/root/hadoop-root-namenode-Master.out
Slave1: starting datanode, logging to /var/log/hadoop/root/hadoop-root-datanode-Slave1.out
Slave2: starting datanode, logging to /var/log/hadoop/root/hadoop-root-datanode-Slave2.out
Master: starting secondarynamenode, logging to /var/log/hadoop/root/hadoop-root-secondarynamenode-Master.out
starting jobtracker, logging to /var/log/hadoop/root/hadoop-root-jobtracker-Master.out
Slave1: starting tasktracker, logging to /var/log/hadoop/root/hadoop-root-tasktracker-Slave1.out
Slave2: starting tasktracker, logging to /var/log/hadoop/root/hadoop-root-tasktracker-Slave2.out
On the host machine, edit the hosts file and add the IP records of the Hadoop cluster hosts:

#Hadoop
192.168.60.10 ctrl
192.168.60.20 master
192.168.60.31 slave1
192.168.60.32 slave2
Then open http://master:50070 and http://master:50030 in a browser.

NameNode web UI

JobTracker web UI
To test a MapReduce job, create a few text files locally and upload them to HDFS:
[root@Master ~]# mkdir hadoop
[root@Master ~]# cd hadoop/
[root@Master hadoop]# mkdir input
[root@Master hadoop]# echo "hello hadoop" >>input/hadoop.txt
[root@Master hadoop]# echo "hello world" >>input/hello.txt
[root@Master hadoop]# echo "hi my name is hadoop" >>input/hi.txt
[root@Master hadoop]# hadoop fs -mkdir input
[root@Master hadoop]# hadoop fs -put input/* input
View the files on HDFS in a browser:

Browsing HDFS files

Run WordCount:
[root@Master hadoop]# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount input/ output
14/05/12 01:30:00 INFO input.FileInputFormat: Total input paths to process : 3
14/05/12 01:30:00 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/05/12 01:30:00 WARN snappy.LoadSnappy: Snappy native library not loaded
14/05/12 01:30:01 INFO mapred.JobClient: Running job: job_201405120108_0002
14/05/12 01:30:02 INFO mapred.JobClient: map 0% reduce 0%
14/05/12 01:30:20 INFO mapred.JobClient: map 33% reduce 0%
14/05/12 01:30:28 INFO mapred.JobClient: map 100% reduce 0%
14/05/12 01:30:35 INFO mapred.JobClient: map 100% reduce 100%
14/05/12 01:30:37 INFO mapred.JobClient: Job complete: job_201405120108_0002
14/05/12 01:30:37 INFO mapred.JobClient: Counters: 29
14/05/12 01:30:37 INFO mapred.JobClient: Job Counters
14/05/12 01:30:37 INFO mapred.JobClient: Launched reduce tasks=1
14/05/12 01:30:37 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=49355
14/05/12 01:30:37 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/12 01:30:37 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/05/12 01:30:37 INFO mapred.JobClient: Launched map tasks=3
14/05/12 01:30:37 INFO mapred.JobClient: Data-local map tasks=3
14/05/12 01:30:37 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=14236
14/05/12 01:30:37 INFO mapred.JobClient: File Output Format Counters
14/05/12 01:30:37 INFO mapred.JobClient: Bytes Written=47
14/05/12 01:30:37 INFO mapred.JobClient: FileSystemCounters
14/05/12 01:30:37 INFO mapred.JobClient: FILE_BYTES_READ=106
14/05/12 01:30:37 INFO mapred.JobClient: HDFS_BYTES_READ=371
14/05/12 01:30:37 INFO mapred.JobClient: FILE_BYTES_WRITTEN=218711
14/05/12 01:30:37 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=47
14/05/12 01:30:37 INFO mapred.JobClient: File Input Format Counters
14/05/12 01:30:37 INFO mapred.JobClient: Bytes Read=46
14/05/12 01:30:37 INFO mapred.JobClient: Map-Reduce Framework
14/05/12 01:30:37 INFO mapred.JobClient: Map output materialized bytes=118
14/05/12 01:30:37 INFO mapred.JobClient: Map input records=3
14/05/12 01:30:37 INFO mapred.JobClient: Reduce shuffle bytes=118
14/05/12 01:30:37 INFO mapred.JobClient: Spilled Records=18
14/05/12 01:30:37 INFO mapred.JobClient: Map output bytes=82
14/05/12 01:30:37 INFO mapred.JobClient: Total committed heap usage (bytes)=617562112
14/05/12 01:30:37 INFO mapred.JobClient: CPU time spent (ms)=6190
14/05/12 01:30:37 INFO mapred.JobClient: Combine input records=9
14/05/12 01:30:37 INFO mapred.JobClient: SPLIT_RAW_BYTES=325
14/05/12 01:30:37 INFO mapred.JobClient: Reduce input records=9
14/05/12 01:30:37 INFO mapred.JobClient: Reduce input groups=7
14/05/12 01:30:37 INFO mapred.JobClient: Combine output records=9
14/05/12 01:30:37 INFO mapred.JobClient: Physical memory (bytes) snapshot=553525248
14/05/12 01:30:37 INFO mapred.JobClient: Reduce output records=7
14/05/12 01:30:37 INFO mapred.JobClient: Virtual memory (bytes) snapshot=24249984
14/05/12 01:30:37 INFO mapred.JobClient: Map output records=9
View in the browser:

Running jobs

Completed jobs

View the results:
[root@Master hadoop]# hadoop fs -cat output/*
hadoop  2
hello   2
hi      1
is      1
my      1
name    1
world   1
cat: File does not exist: /user/root/output/_logs
View the results in the browser:

WordCount output
Use Notepad++ and save the files with UTF-8 (no BOM) encoding.
The key pull kept failing: rsync could not connect to the master. After adding debug output, I found that the network must be restarted after the post script sets the network configuration. Although the hosts file already lists the IP addresses of all cluster hosts, until the network is restarted each host still uses the address it obtained automatically from the PXE server. There is also an ordering issue: a slave has to reach the master, and the master can only be reached after it has finished configuring and restarting its own network, so the slaves must wait for the master. The script waits at most 100 × 5 s = 500 s.
[SUCC]: Stop iptables
[SUCC]: Chkconfig iptables off
/tmp
[SUCC]: resourcefile download ok!
[SUCC]: hadoop download ok!
[SUCC]: jdk download ok!
[SUCC]: hadoop_centos.sh download ok!
[SUCC]: xinetd down ok!
[SUCC]: rsync down ok!
[SUCC]: xinetd install ok!
[SUCC]: rsync install ok!
Stopping xinetd: [FAILED]
Starting xinetd: [ OK ]
rsync: failed to connect to Master: No route to host (113)
rsync error: error in socket IO (code 10) at clientserver.c(124) [receiver=3.0.6]
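The fix is therefore to restart the network (and apply the new hostname) at the end of the network-configuration step of the post script. A sketch of the lines added there, whose exact form and placement in the final script are not shown in this article:

# apply the static address from network.conf and the new hostname right away
service network restart
hostname \$HOST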
A firewall can also cause rsync to fail; the firewall can simply be disabled in the ks file. [7]

firewall (optional). This option corresponds to the Firewall Configuration screen in the installer:
firewall --enabled|--disabled [--trust=<device>] [--port=]
--enabled or --enable: reject incoming connections that are not in response to outbound requests, such as DNS replies or DHCP requests. If access to services running on this machine is needed, specific services can be allowed through the firewall.
--disabled or --disable: do not configure any iptables rules.
--trust=<device>: listing a device here, such as eth0, allows all traffic coming through that device to pass the firewall. To list more than one device, use --trust eth0 --trust eth1; do not use a comma-separated format such as --trust eth0, eth1.
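For this lab cluster the simplest option is to disable the firewall in the ks file, as suggested above; keeping it enabled and opening only the rsync port would also work. Either line below is an illustration rather than the exact line used here:

firewall --disabled
#firewall --enabled --port=873:tcp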
When you ssh into a machine for the first time (strictly speaking, whenever ~/.ssh/known_hosts contains no HostKey for that machine), ssh prompts and asks whether to add the machine's HostKey, expecting a yes/no answer. As long as that HostKey is not removed from ~/.ssh/known_hosts the prompt will not appear again, but it is a problem when writing automation scripts.

A look at man ssh_config shows the solution: create the file ~/.ssh/config and add one line:

StrictHostKeyChecking no

With this, ssh automatically adds HostKeys to ~/.ssh/known_hosts and no longer asks. The default for this option is ask, which is why the prompt appears. If it is set to yes, every HostKey must be added to ~/.ssh/known_hosts by hand, which is the strictest setting. [8]
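The same option can also be passed for a single command, which is sometimes more convenient inside a deployment script than editing ~/.ssh/config (the hostname below is just an example from this cluster):

ssh -o StrictHostKeyChecking=no root@Slave1 hostname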
[1]. ChinaUnix blog. http://blog.chinaunix.net/uid-17240700-id-2813881.html
[2]. ChinaUnix blog. http://blog.chinaunix.net/uid-17240700-id-2813881.html
[3]. ChinaUnix blog. http://blog.chinaunix.net/uid-17240700-id-2813881.html
[4]. 知行近思. http://www.annhe.net/article-2672.html
[5]. CSDN blog. http://blog.csdn.net/zombee/article/details/6793672
[6]. 陆嘉桓. Hadoop实战 (Hadoop in Action), 2nd edition.
[7]. ChinaUnix blog. http://blog.chinaunix.net/uid-17240700-id-2813881.html
[8]. 163 blog. http://blog.163.com/kartwall@126/blog/static/42370200831485241268/
The full script is available on GitHub: https://github.com/annProg/paper/blob/master/code/init_pxeserver.sh