
Course Outline

Big Data Overview:

  • Defining Big Data
  • The driving forces behind Big Data's growing popularity
  • Case Studies in Big Data implementation
  • Key characteristics of Big Data
  • Solutions for managing Big Data environments

Hadoop and Its Core Components:

  • Understanding Hadoop and its constituent parts.
  • Hadoop architecture and its capabilities for handling and processing data.
  • A brief history of Hadoop, the organizations that use it, and their reasons for adopting it.
  • A detailed examination of the Hadoop framework and its components.
  • Explaining HDFS (Hadoop Distributed File System) and the mechanics of data reads and writes.
  • Instructions for setting up a Hadoop cluster in various modes: standalone, pseudo-distributed, and multi-node.

(This module covers setting up a Hadoop cluster in virtual environments such as VirtualBox, KVM, or VMware, including essential network configuration, starting the Hadoop daemons, and testing the cluster.)

  • Introduction to the MapReduce framework and its operational principles.
  • Executing MapReduce jobs on a Hadoop cluster.
  • Understanding replication, mirroring, and rack awareness in a Hadoop cluster.

Hadoop Cluster Planning:

  • Strategies for effectively planning a Hadoop cluster.
  • Aligning hardware and software requirements for cluster planning.
  • Analyzing workloads to design a cluster that prevents failures and ensures optimal performance.

Introduction to MapR and the Case for MapR:

  • An overview of MapR and its architectural design.
  • Understanding and utilizing MapR Control System, MapR Volumes, snapshots, and mirrors.
  • Strategic planning for clusters within the MapR ecosystem.
  • Comparative analysis of MapR against other distributions and Apache Hadoop.
  • Procedures for MapR installation and cluster deployment.

Cluster Setup and Administration:

  • Managing services, nodes, snapshots, mirrored volumes, and remote clusters.
  • Strategies for understanding and managing nodes.
  • Gaining insight into Hadoop components and installing them alongside MapR services.
  • Accessing data on the cluster, including through NFS, while managing services and nodes.
  • Managing data with volumes.
  • Administering users and groups.
  • Assigning roles to nodes, and commissioning and decommissioning nodes.
  • Administering the cluster and monitoring performance.
  • Configuring and analyzing metrics.
  • Implementing MapR security protocols.
  • Working with M7 native storage technology for MapR tables.
  • Configuring and tuning the cluster for peak performance.

Cluster Upgrades and Integration:

  • Upgrading MapR software versions and exploring different upgrade methods.
  • Configuring MapR clusters to interact with HDFS clusters.
  • Deploying MapR clusters on Amazon Elastic MapReduce.

All topics covered include demonstrations and practical exercises to provide learners with hands-on experience with the technology.

Requirements

  • Fundamental knowledge of the Linux File System
  • Basic proficiency in Java
  • Familiarity with Apache Hadoop (recommended)

Duration: 28 hours
