Modular Programme – Data Analytics
Programme Summary
This modular course is intended for candidates who want to learn how to store, manage, process, and analyse massive amounts of unstructured data for competitive advantage; how to select and implement the appropriate Big Data stores; and how to apply sophisticated analytic techniques and tools to process and analyse Big Data. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data.
The course provides an overview of how to plan and implement a Big Data solution, and of the various technologies that comprise Big Data. Numerous examples and exercises involving Big Data systems are provided throughout the course. The programming examples are in Java, but the primary focus is on best practices that can be applied to any supported programming language.
Those completing this course are encouraged to take an accreditation test on Hadoop and Big Data, at a time of their convenience, to obtain a professional certification.
Topics Covered
This modular course covers the following topics:
- Introduction to Big Data
- Storing Big Data
- Processing Big Data
- Tools and Techniques to Analyse Big Data
- Developing and Implementing a Big Data Strategy
Duration
Full Time: 5 days
Course Objectives
At the end of the course, candidates will be able to:
- Give an overview of Big Data
- Explain what Hadoop is
- Describe the basic Hadoop architecture
- Describe the features of the Hadoop Distributed File System (HDFS)
- Explain the full MapReduce flow
- Write a MapReduce job in Java (a minimal sketch follows this list)
- Submit a MapReduce job to a Hadoop cluster
- Use the basic and advanced MapReduce APIs
- Optimize MapReduce jobs
- Apply common MapReduce algorithms
- Use the different Hadoop ecosystem tools
- Manage data using Sqoop and Flume
- Do basic programming with HBase (also sketched below)
- Write a basic Pig Latin script
- Use the different Hive commands
- Explain the importance of HCatalog
- Write Oozie workflows
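To make the MapReduce objectives concrete, below is a minimal word-count job written against the org.apache.hadoop.mapreduce API. It is an illustrative sketch rather than course material; the class names (WordCount, TokenizerMapper, IntSumReducer) and the use of the reducer as a combiner are our own choices.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word in its input split
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts gathered for each word during the shuffle
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local aggregation cuts shuffle traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```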
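Similarly, for the HBase objective, the sketch below writes and reads back a single cell using the HBase 1.x client API; the table name users, column family info, row key, and value are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseBasics {
  public static void main(String[] args) throws Exception {
    // Reads cluster settings from hbase-site.xml on the classpath
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("users"))) {
      // Write one cell: row "row1", column family "info", qualifier "name"
      Put put = new Put(Bytes.toBytes("row1"));
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
      table.put(put);

      // Read the same cell back
      Result result = table.get(new Get(Bytes.toBytes("row1")));
      byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
      System.out.println(Bytes.toString(value)); // prints "Alice"
    }
  }
}
```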
Candidates will work through the following lab exercises using the Hadoop Data Platform (illustrative sketches of several of these exercises follow the list):
- Manipulating data using the HDFS command-line interface (CLI)
- Writing and running a MapReduce job in Java
- Using Eclipse to speed up MapReduce development in Java
- Testing MapReduce jobs using MRUnit and the LocalJobRunner
- Practising common MapReduce algorithms
- Optimizing MapReduce jobs
- Importing data from MySQL into HDFS using Sqoop
- Splitting and joining datasets using Pig
- Manipulating data using Hive
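As a taste of the HDFS CLI, job-submission, and Sqoop exercises, the shell sketch below stages a local file in HDFS, runs the word-count jar from the earlier sketch, and imports a MySQL table. Every path, hostname, database name, and credential here is a hypothetical placeholder.

```bash
# Stage a local file in HDFS and inspect it (all paths are hypothetical)
hdfs dfs -mkdir -p /user/student/input
hdfs dfs -put access.log /user/student/input/
hdfs dfs -ls /user/student/input

# Submit the WordCount job from the earlier sketch, packaged as a jar
hadoop jar wordcount.jar WordCount /user/student/input /user/student/output
hdfs dfs -cat /user/student/output/part-r-00000

# Import a MySQL table into HDFS with Sqoop (connection details are placeholders)
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username student \
  --password-file /user/student/.dbpass \
  --table orders \
  --target-dir /user/student/orders \
  --num-mappers 4
```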
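For the MRUnit exercise, a unit test for the TokenizerMapper from the earlier sketch might look like the following; it assumes the MRUnit library (org.apache.hadoop.mrunit) and JUnit 4 are on the classpath.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class TokenizerMapperTest {
  private MapDriver<Object, Text, Text, IntWritable> mapDriver;

  @Before
  public void setUp() {
    // Drives the mapper in isolation, without a cluster
    mapDriver = MapDriver.newMapDriver(new WordCount.TokenizerMapper());
  }

  @Test
  public void emitsOneCountPerToken() throws Exception {
    // Expected outputs are listed in the order the mapper emits them
    mapDriver
        .withInput(new LongWritable(0), new Text("big data big"))
        .withOutput(new Text("big"), new IntWritable(1))
        .withOutput(new Text("data"), new IntWritable(1))
        .withOutput(new Text("big"), new IntWritable(1))
        .runTest();
  }
}
```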
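The Pig split-and-join exercise could resemble the Pig Latin sketch below; the file names, schemas, and the 'SG' country code are invented for illustration.

```
-- Load two hypothetical comma-separated datasets
users  = LOAD 'users.csv'  USING PigStorage(',') AS (id:int, name:chararray, country:chararray);
orders = LOAD 'orders.csv' USING PigStorage(',') AS (order_id:int, user_id:int, amount:double);

-- SPLIT routes each user record to exactly one relation
SPLIT users INTO sg IF country == 'SG', others OTHERWISE;

-- JOIN the Singapore users against their orders
sg_orders = JOIN sg BY id, orders BY user_id;
STORE sg_orders INTO 'sg_orders_out' USING PigStorage(',');
```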
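And for the Hive exercise, the HiveQL sketch below defines an external table over files already in HDFS and runs a typical aggregation; the schema and location are again placeholders.

```sql
-- External table over delimited files already sitting in HDFS
CREATE EXTERNAL TABLE orders (
  order_id INT,
  user_id  INT,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/student/orders';

-- Typical manipulation: top ten customers by total order value
SELECT user_id, SUM(amount) AS total_spent
FROM orders
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 10;
```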
Module Outline
Core Modules:
| Module | Description |
| --- | --- |
| Computing Environment | The current mix of computing resources and demands that motivates the use of a technology like Apache Hadoop |
| Hadoop Distributed File System | How files are stored and managed in HDFS, and the infrastructure that supports HDFS |
| MapReduce | The phases of execution and the framework for running a MapReduce job; the expected properties of job runs based on the number of mappers, the number of reducers, and the distribution of data |
| Hadoop API | The Java classes that make up the API for developers who wish to write Apache Hadoop MapReduce jobs |
| Hadoop Platform | The basic purpose, design, and operation of the tools that augment the Apache Hadoop core to form a comprehensive platform, including Hadoop Streaming, fuse-dfs, Apache Hive, Apache Pig, Apache Flume, Apache Sqoop, Apache HBase, Apache Oozie, and HUE |
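Since the platform module touches Apache Oozie and the objectives include writing Oozie workflows, a minimal workflow definition is sketched below. The workflow name, action name, and input/output directories are hypothetical, and the ${jobTracker}/${nameNode} values follow Oozie's standard parameterization via a job.properties file.

```xml
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="wordcount"/>
  <action name="wordcount">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- Job properties; the directories are hypothetical placeholders -->
        <property>
          <name>mapreduce.input.fileinputformat.inputdir</name>
          <value>/user/student/input</value>
        </property>
        <property>
          <name>mapreduce.output.fileoutputformat.outputdir</name>
          <value>/user/student/output</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>WordCount failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```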
Delivery Format
This modular course is delivered as follows:
- Classroom learning: 60% of the duration
- Hands-on tutorials: the remaining 40% of the duration
Applicable to Singaporeans/Singapore PRs ONLY
- Candidates who enrol under the Workforce Development Authority (WDA) funding programme with Lithan Academy must complete the assessments at the end of the module.
- Candidates who enrol under the WDA funding programme with Lithan Academy will be issued a Statement of Attainment (SOA) after successfully completing the module.