Teaching

IST 718: Big Data Analytics (formerly Advanced Information Analytics)

Fall 2016, Spring 2017, Fall 2017, Spring 2018, Maymester 2018, Fall 2018, Spring 2019

Goal
This course is a broad introduction to modern techniques in data science, including elastic net-regularized regression, random forests, gradient boosting, and deep learning. It emphasizes a statistical learning point of view and a careful examination of generalization error, model interpretability, feature engineering, and the bias-variance tradeoff.

Tools
The tool of choice is Apache Spark running on the Hadoop Distributed File System (HDFS). We use an environment based on Jupyter Notebook and Python, deployed with Kubernetes.
