Sunday, June 5, 2016

Building Spark cluster - Part 2 - Install Spark


Install Spark on the Hadoop cluster using the following steps:

    1) Download the latest version of Spark (1.6.1 at the time of writing, pre-built for Hadoop 2.6):

    $ wget http://mirrors.ocf.berkeley.edu/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz

    2) Extract the archive in the target directory (/opt/Apache) and rename the extracted folder:

    $ tar xzf spark-1.6.1-bin-hadoop2.6.tgz
    $ mv spark-1.6.1-bin-hadoop2.6 spark-1.6.1

    3) Spark ships its configuration files as ".template" files in the conf directory.

    To get started, copy the templates for the slaves, spark-defaults.conf and spark-env.sh files (from inside the conf directory), then edit the copies:

    $ cp slaves.template slaves
    $ cp spark-defaults.conf.template spark-defaults.conf
    $ cp spark-env.sh.template spark-env.sh

    4) slaves

    This file holds the hostnames or IP addresses of the Spark worker nodes, one per line. Add the two worker nodes we have in the cluster:

    192.168.1.20
    192.168.1.18
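
    The start scripts log in to every host listed in slaves over SSH, so it can help to confirm that each worker accepts a non-interactive login before going further. The following is a minimal sketch, not part of the Spark distribution: list_workers and check_workers are hypothetical helper names, and it assumes key-based SSH is already set up between the master and the workers.

```shell
# Print the worker hosts from a slaves file, skipping blank lines and comments.
list_workers() {
    # Drop everything after '#', trim whitespace, then skip empty lines.
    sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" | grep -v '^$'
}

# Try a non-interactive SSH login on every worker and report the result.
check_workers() {
    for host in $(list_workers "$1"); do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: SSH login failed"
        fi
    done
}

# Example, run from the Spark conf directory on the master:
#   check_workers slaves
```

    A worker reported as failed usually needs its SSH key set up before the start scripts can reach it.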

    5) spark-defaults.conf

    Defines the Spark master URL and other default properties, as shown below:

    spark.master                     spark://192.168.1.16:7077
    spark.serializer                 org.apache.spark.serializer.KryoSerializer

    spark.eventLog.enabled           true
    spark.history.fs.logDirectory    file:/opt/Apache/hadoop-2.6.1/logs/spark-events
    # Same directory as spark.history.fs.logDirectory, so the history server can find the event logs
    spark.eventLog.dir               file:/opt/Apache/hadoop-2.6.1/logs/spark-events
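
    Spark does not create the event log directories on its own, and both applications and the history server fail if the configured directory is missing. The sketch below reads the paths back from spark-defaults.conf and creates them. conf_value and ensure_local_dir are hypothetical helper names, and the sketch assumes whitespace-separated "key value" lines and file: URIs, as used in this post.

```shell
# Print the value of a key from a spark-defaults.conf style file
# (assumes whitespace-separated "key value" lines).
conf_value() {
    awk -v key="$1" '$1 == key { print $2 }' "$2"
}

# Create the local directory behind a file: URI if it does not exist yet.
ensure_local_dir() {
    # Strip the leading "file:" scheme to get a plain filesystem path.
    mkdir -p "${1#file:}"
}

# Example, run from the Spark conf directory on the master:
#   ensure_local_dir "$(conf_value spark.eventLog.dir spark-defaults.conf)"
#   ensure_local_dir "$(conf_value spark.history.fs.logDirectory spark-defaults.conf)"
```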


    6) spark-env.sh

    This script defines runtime options for Spark. We can either pass them to spark-submit or rely on the defaults set in this script.

    HADOOP_CONF_DIR=/opt/Apache/hadoop-2.6.1/etc/hadoop
    SPARK_MASTER_IP=192.168.1.16
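
    To illustrate the spark-submit side, here is a rough sketch that submits the bundled SparkPi example with explicit options once the cluster is running (steps 8 and 9). submit_pi is a hypothetical helper, the SPARK_SUBMIT override exists only to allow a dry run, the 512m executor memory is an arbitrary value, and the location of the examples jar under lib/ is an assumption about the 1.6.1 binary package.

```shell
# Submit the SparkPi example to a standalone master with explicit options.
# $1 = Spark install directory, $2 = master URL.
submit_pi() {
    # SPARK_SUBMIT is a hypothetical override, e.g. SPARK_SUBMIT=echo for a dry run.
    "${SPARK_SUBMIT:-$1/bin/spark-submit}" \
        --master "$2" \
        --class org.apache.spark.examples.SparkPi \
        --executor-memory 512m \
        "$1"/lib/spark-examples-*.jar 10
}

# Example, run on the master:
#   submit_pi /opt/Apache/spark-1.6.1 spark://192.168.1.16:7077
```

    Flags given on the spark-submit command line take precedence over the values in spark-defaults.conf.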

    7) We can reuse the Spark configuration changes made on the master node instead of repeating the steps above on each worker.
    rsync the spark-1.6.1 folder to both worker nodes (replace <user> with the login account on the workers):

    $ rsync -avxP /opt/Apache/spark-1.6.1 <user>@192.168.1.18:/opt/Apache
    $ rsync -avxP /opt/Apache/spark-1.6.1 <user>@192.168.1.20:/opt/Apache


    8) Start the Spark master and worker processes (run from the spark-1.6.1 directory on the master node):

    $ sbin/start-all.sh

    9) Start the history server:

    $ sbin/start-history-server.sh
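
    A quick way to confirm that both services came up is to probe their web UIs. The sketch below assumes the default ports (8080 for the master UI, 18080 for the history server), that the history server was started on the master node, and that curl is installed. check_ui and report_uis are hypothetical helper names.

```shell
# Return success if an HTTP server answers on the given host ($1) and port ($2).
check_ui() {
    curl --silent --fail --max-time 5 --output /dev/null "http://$1:$2/"
}

# Report whether the master and history server web UIs respond on host $1.
report_uis() {
    if check_ui "$1" 8080; then
        echo "master UI: up"
    else
        echo "master UI: not reachable"
    fi
    if check_ui "$1" 18080; then
        echo "history server UI: up"
    else
        echo "history server UI: not reachable"
    fi
}

# Example:
#   report_uis 192.168.1.16
```

    The master UI also lists the registered workers, so both worker nodes should appear there once they have started.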



