COMP6130

Course Aims:

This course focuses on the processing and analysis of large datasets (Big Data) and on applying parallel machine learning techniques to handle them. Tools such as Apache Spark, Hadoop, MapReduce and NoSQL systems will be used to speed up computation.
The majority of the course material will be drawn from textbooks and research papers.

Learning Outcomes:

On successful completion of this course, students should be able to:

- Understand the concepts of Big Data
- Build predictive systems that rely on large datasets
- Analyze data streams using appropriate technology (e.g. Apache Spark)
- Process large datasets to extract valuable information
- Plan and implement a strategy for big data management in an organization
- Store and process unstructured data
- Apply the appropriate parallel machine learning algorithms to reduce computation
- Design graphical model solutions to problems

Syllabus:

Overview
- Introduction to Big Data
- Why Big Data?
- Characteristics of Big Data
Big Data Infrastructure (e.g. Apache Hadoop + MapReduce)
Stream Processing using appropriate technology (e.g. Apache Spark)
Machine Learning systems for Big Data
- Data Exploration
- Data Preparation
- Regression, Classification and Association Analysis
- Data Visualization
- Evaluation of Machine Learning Models

NoSQL Systems for Big Data
Graph Analytics for Big Data
Clustering Analysis for Big Data
Recommendation Systems using Big Data
Big Data Management

Course Assessment:

Coursework: 60%
- In-course test: 10%
- Projects (2): 40%
- Homework assignments (2): 10%

Final Written Examination (3 hrs): 40%

Students will be required to pass both the coursework and the final examination to pass the course.

The University of the West Indies, Mona

The Department of Computing