Cloud Computing for Big Data
Dates | Timing | Location |
---|---|---|
03 - 07 February 2019 | 07:00PM - 10:00PM | Dubai Knowledge Park |
07 - 11 April 2019 | 07:00PM - 10:00PM | Dubai Knowledge Park |
Course Description
To process massive datasets, you need to set up clusters for both data processing and data storage. Many aspiring data science professionals and engineers have little knowledge of, or experience with, doing this in a Linux environment. This course will enable you to master the skills required to set up cloud clusters, using Hadoop’s HDFS for data storage and Spark for data processing.
Not only will you install and configure Hadoop and Spark clusters, but you will also learn how to configure your development environment (Jupyter Notebook) to access these clusters to store and process massive datasets.
Course Outline
- Linux Administration Fundamentals
- Hadoop Cluster Installation and Configuration
  - Architecture of a Hadoop Cluster
  - DNS Configuration
  - Creating and Distributing SSH Keys
  - Downloading and Unpacking Hadoop Binaries
  - Setting up Environment Variables
  - Configuring the Master Node
  - Configuring the Slave Nodes
  - Configuring Memory Allocation
  - Formatting and Running HDFS
  - Configuring YARN as a Job Scheduler
  - Running and Monitoring HDFS
  - Running YARN
- Spark Cluster Installation and Configuration
  - Preparing Your System for Spark Installation
  - Installing Spark on the Master Node
  - Installing Spark on the Slave Nodes
  - Integrating Spark with YARN
  - Running the Spark Cluster
  - Configuring the Memory Allocation
  - Running a Spark Application on top of a YARN Cluster
  - Monitoring Your Spark Applications
- Processing Massive Datasets on Spark and Hadoop Clusters
  - Storing Massive Datasets on HDFS
  - Configuring Jupyter Notebook to Access Spark and Hadoop Clusters
This course is intended for IT professionals, data scientists, and Big Data engineers who want to set up Hadoop and Spark clusters in the cloud and process massive datasets on top of that infrastructure.
There are no prerequisites for this course.
Participants who successfully complete this course will be able to set up large-scale cloud infrastructure for Big Data in a Linux environment.