Diploma Programme
Course Summary
This course aims to develop students, both technically and academically, and to produce focused graduates of high academic and practical standards who match the needs of both the Singapore and international IT industries. It will also meet the need for skilled staff who can extract actionable insight from large amounts of raw data to enable better decision making within an organisation.
Course Objective
Candidates will acquire the knowledge and skills to:
- Maximize the potential effectiveness of Big Data exercises by developing detailed insight into underlying business processes and data structures related to the subject matter;
- Learn about software tools, in particular Hadoop, and their respective applicability, strengths and weaknesses;
- Research up-to-date procedures, standards and techniques applicable to data analysis and apply these to an on-going enterprise data analysis project;
- Facilitate the implementation of Big Data derived insights and business process changes through effective communications, training, advocacy and engaging senior management;
- Effectively manage an enterprise data-related initiative, including specialized staffing, custom facilities and third party consultants as required;
- Perform at an intermediate level as a data engineer/analyst and at a junior level as a data scientist;
- Solve simple statistical and mathematical problems related to very large data sets;
- Develop intermediate level programs in selected open source data analytic tools (Hadoop);
- Contribute meaningful input to staff discussions regarding statistical methods, database technology, knowledge representation, cost effectiveness of tool sets, selection bias and machine learning;
- Work effectively with all three types of data – structured, semi-structured and unstructured;
- Learn about the practical applications of Big Data with a focus on select industries.
Entry Criteria
- Candidates must have a Bachelor’s degree.
- Candidates must have an IELTS score of 6.5 or above. This applies to candidates with a Bachelor’s degree whose medium of instruction was not English.
- Candidates with a Diploma and a minimum of 5 years of working experience may apply.
- Candidates should not be barred by the Workforce Development Authority of Singapore (WDA) from receiving assistance under its funding schemes. This applies to Singaporeans and Singapore PRs only. (Note: the WDA funding programme is under Lithan Academy.)
Course Structure
Module Name |
0. Intro to Data Science |
1. Data Analytics / R Programming |
2. Hadoop Architecture (1 and 2) |
3. Intro to NoSQL with Programming |
4. Big Data Vertical Application |
5. Big Data Project |
Total |
Module Outline
Core Modules:
- Intro to Data Science
Business today is being transformed by data-driven discovery and prediction. The skills required for data analytics at massive scale – scalable data management on and off the cloud, parallel algorithms, statistical modeling, and proficiency with a complex ecosystem of tools and platforms – span a variety of disciplines and are not easy to obtain through conventional curricula. Students will tour the basic techniques of data science, including both SQL and NoSQL solutions for massive data management (e.g., MapReduce and contemporaries), algorithms for data mining (e.g., clustering and association rule mining), and basic statistical modeling (e.g., linear and non-linear regression).
This course will cover the foundational topics in data science, namely:
- Data Manipulation
- Data Analysis with Statistics and Machine Learning
- Data Communication with Information Visualization
- Data at Scale - Working with Big Data
The course will focus on breadth and present the topics briefly instead of focusing on a single topic in depth. This will give students the opportunity to sample and apply the basic techniques of data science.
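As a small illustration of the basic statistical modeling mentioned in this module, the sketch below fits a straight line by ordinary least squares using only the Python standard library. It is not taken from the course materials: the function name `fit_line` and the sample data are made up for this example.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y co-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on y = 2x + 1.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(hours, scores)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Real data sets are noisy, so the fitted line will only approximate the points; the same formulas still apply.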
- Data Analytics / R Programming
This course teaches the fundamental computing skills necessary for effective data analysis. Students will learn to program in R and to use R for reading data, writing functions, making informative graphs, and applying modern statistical methods. They will learn how to install and configure the software necessary for a statistical programming environment, and will discuss generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, creating informative data graphics, accessing R packages, creating R packages with documentation, writing R functions, debugging, and organizing and commenting R code. Topics such as statistical data analysis and optimization will serve as working examples.
- Hadoop Architecture
Big Data is a collection of large and complex data sets that are difficult to process using conventional database management tools or traditional data processing applications. Industries the world over experience difficulties in storing, retrieving and processing ever-increasing data volumes.
Hadoop is an open source software framework that supports data-intensive distributed applications. It is an Apache Software Foundation project released under the Apache License 2.0, and is generally known as Apache Hadoop. Hadoop is written in the Java programming language and is a top-level Apache project built and used by a global community of contributors. Hadoop is an emerging domain. Many multinational companies, including Yahoo and Facebook, use Hadoop and consider it an integral part of their operations.
This course will present a mix of lectures and instructor-led demonstrations to explain what Apache Hadoop is and why it’s becoming a standard for large-scale data storage and processing. To further enhance students’ understanding, practical exercises will also be part of the course curriculum. It will cover the following:
- What is Apache Hadoop?
- Fundamental Concepts
- HDFS: The Hadoop Distributed Filesystem
- MapReduce
- Using Apache Hadoop
- The Hadoop Ecosystem
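To give a feel for the MapReduce topic listed above, the sketch below imitates the three conceptual phases (map, shuffle, reduce) with a word count in plain Python. It runs in a single process and does not use Hadoop or any Hadoop API; the function names are chosen for this illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input; in a real cluster the lines would come from files in HDFS.
lines = ["big data big insight", "data at scale"]
counts = word_count(lines)
print(counts)
```

In a real Hadoop job the map and reduce steps run in parallel across many machines and the framework performs the shuffle; the single-process version above only shows the data flow.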
- Intro to NoSQL with Programming
In this introduction to NoSQL databases, students will identify the key features, benefits and use cases of a NoSQL database. They will be able to explain the concepts and terms related to NoSQL and identify the key considerations to keep in mind while designing a schema and an application for a NoSQL database. (Note: there are numerous NoSQL databases; this course will endeavour to introduce them, with a focus on the popular ones.)
Students will learn to access and manipulate the NoSQL database using its APIs. They are introduced to the APIs that are used to read, write, and delete data in the NoSQL database. Students will also learn to access the Admin Console and use it to configure the store. Learn to:
- Explain Big Data and NoSQL Database Concepts
- Identify and Use Java APIs to access the NoSQL Database
- Access the Admin Console and Configure a Store
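The read, write and delete operations described above can be pictured with a toy in-memory key-value store. This is only a conceptual sketch in Python: the class `ToyKeyValueStore` and its methods are hypothetical, and they do not correspond to the Java API of any particular NoSQL product taught in the course.

```python
from typing import Dict, Optional


class ToyKeyValueStore:
    """Hypothetical in-memory stand-in for a key-value NoSQL store."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        """Write: insert or overwrite the record stored under key."""
        self._records[key] = dict(value)

    def get(self, key: str) -> Optional[dict]:
        """Read: return a copy of the record, or None if the key is absent."""
        record = self._records.get(key)
        return dict(record) if record is not None else None

    def delete(self, key: str) -> bool:
        """Delete: remove the record and report whether it existed."""
        return self._records.pop(key, None) is not None


store = ToyKeyValueStore()
# Records need not share a schema: each value carries its own fields.
store.put("user:1", {"name": "Mei", "city": "Singapore"})
store.put("user:2", {"name": "Arun", "tags": ["analyst"]})

print(store.get("user:1"))
print(store.delete("user:2"))
print(store.get("user:2"))
```

A production NoSQL database adds persistence, replication and distribution across nodes on top of this basic access pattern.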
- Big Data Vertical Application
The vertical modules give participants a deep understanding of the analytic techniques required for specific industry sectors. We will cover industries such as eCommerce, social media and finance. The course makes students aware of the diverse vertical sectors to which Big Data can be applied and of its potential to offer solutions. Importantly, this course builds on the knowledge, concepts and skills gained from the essential modules. Case studies will cover the following applications:
- Techniques
- Analytics
- Database design
- Etc.
- Big Data Project
This is a hands-on project course in which students form teams to complete intensive programming and analytics projects based on a real-world application that comes with its own data and code bases. Experts from LithanHall and/or industry may be invited to help advise student projects, and students will present their final projects to an audience from industry and LithanHall. Project topics include building on existing infrastructure tools, building big data apps, and analyzing industrial data using analytics. Access to data will be provided.