Course Description

This course covers fundamental topics in big data and natural language processing (NLP). Topics include Hadoop, MapReduce, data pre-processing techniques, n-grams, Naïve Bayes methods, and recurrent neural networks (RNNs).

Course Outline

Module 0: Introduction 
Module 1: Big Data Analytics
•    Module 1.1: Introduction to Big Data
•    Module 1.2: Big Data Storage and Processing
Module 2: Hadoop and MapReduce
•    Module 2.1: Hadoop Stack and Execution Environment
•    Module 2.2: Hands-on: Setting up a Hadoop Cluster
•    Module 2.3: Hands-on: Running MapReduce on Hadoop
Module 3: Data, Quality, and Text Pre-Processing
•    Module 3.1: Types of Data
•    Module 3.2: Data Quality and Cleaning
•    Module 3.3: Text Pre-Processing
•    Module 3.4: Hands-on: Clean and Pre-Process Structured Data
Module 4: NLP Models 
•    Module 4.2: N-Grams
•    Module 4.3: Naïve Bayes
•    Module 4.4: Recurrent Neural Networks (RNNs)
Module 5: NLP Applications (optional)
•    Module 5.1: Sentiment Analysis
•    Module 5.2: Entity Resolution

Learner Outcomes

•    Describe basic methods for distributed processing of big datasets
•    Describe the different layers of the Hadoop ecosystem
•    Implement and deploy simple MapReduce programs in Hadoop
•    Apply pre-processing techniques to NLP datasets
•    Demonstrate understanding of core NLP models such as RNNs, n-grams, and Naïve Bayes
•    Demonstrate knowledge of NLP algorithms for sentiment analysis and entity resolution