This course focuses on processing and analyzing large datasets (Big Data) using parallel machine learning techniques. Tools such as Apache Spark, Hadoop, MapReduce, and NoSQL systems will be used to speed up computation.
The majority of the course material will be drawn from textbooks and research papers.
On successful completion of this course, students should be able to:
Understand the concepts of Big Data
Build predictive systems that rely on large datasets
Analyze data streams using appropriate technology (e.g. Apache Spark)
Process large datasets to extract valuable information
Plan and implement a strategy for big data management in an organization
Store and process unstructured data
Apply appropriate parallel machine learning algorithms to reduce computation time
Design graphical model solutions to problems
Overview
Big Data Infrastructure (e.g. Apache Hadoop + MapReduce)
Stream Processing using appropriate technology (e.g. Apache Spark)
Machine Learning systems for Big Data
NoSQL Systems for Big Data
Graph Analytics for Big Data
Clustering Analysis for Big Data
Recommendation Systems using Big Data
Big Data Management
Coursework: 60%
Final Written Examination (3 hrs): 40%
Students will be required to pass both the coursework and the final examination to pass the course.