Course Outline
Big Data Overview:
- Defining Big Data
- The driving forces behind Big Data's growing popularity
- Case Studies in Big Data implementation
- Key characteristics of Big Data
- Solutions for managing Big Data environments
Hadoop and Its Core Components:
- Understanding Hadoop and its constituent parts.
- Hadoop architecture and its capabilities for handling and processing data.
- A brief history of Hadoop, along with an overview of organizations utilizing it and the motivations for adoption.
- A detailed examination of the Hadoop framework and its components.
- Explaining HDFS (Hadoop Distributed File System) and the mechanics of data reads and writes.
- Instructions for setting up a Hadoop cluster in various modes: standalone, pseudo-distributed, and multi-node.
(This module covers setting up a Hadoop cluster in virtual environments such as VirtualBox, KVM, or VMware, including the key network configuration, starting the Hadoop daemons, and testing the cluster.)
- Introduction to the MapReduce framework and its operational principles.
- Executing MapReduce jobs on a Hadoop cluster.
- Understanding replication, mirroring, and rack awareness in a Hadoop cluster.
Hadoop Cluster Planning:
- Strategies for effectively planning a Hadoop cluster.
- Aligning hardware and software requirements for cluster planning.
- Analyzing workloads to design a cluster that prevents failures and ensures optimal performance.
Introduction to MapR and the Case for MapR:
- An overview of MapR and its architectural design.
- Understanding and utilizing MapR Control System, MapR Volumes, snapshots, and mirrors.
- Strategic planning for clusters within the MapR ecosystem.
- Comparative analysis of MapR against other distributions and Apache Hadoop.
- Procedures for MapR installation and cluster deployment.
Cluster Setup and Administration:
- Managing services, nodes, snapshots, mirrored volumes, and remote clusters.
- Strategies for understanding and managing nodes.
- Gaining insight into Hadoop components and installing them alongside MapR services.
- Accessing data on the cluster, including through NFS, while managing services and nodes.
- Managing data with volumes.
- Administering users and groups.
- Assigning roles to nodes, and commissioning and decommissioning nodes.
- Administering the cluster and monitoring its performance.
- Configuring and analyzing metrics.
- Implementing MapR security.
- Working with M7 native storage technology for MapR tables.
- Configuring and tuning the cluster for peak performance.
Cluster Upgrades and Integration:
- Upgrading MapR software versions and exploring different upgrade methods.
- Configuring MapR clusters to interact with HDFS clusters.
- Deploying MapR clusters on Amazon Elastic MapReduce.
All topics include demonstrations and practical exercises to give learners hands-on experience with the technology.
Requirements
- Fundamental knowledge of the Linux File System
- Basic proficiency in Java
- Familiarity with Apache Hadoop (recommended)
28 Hours
Testimonials (1)
practical things of doing, also theory was served good by Ajay