IST 718: Big Data Analytics

This is an advanced course. Syracuse University’s catalog system does not appear to list any official prerequisites for this class. Most students have already taken IST 687 - Introduction to Data Science, which is a good introduction to the field. However, students are expected to know how to program in Python or R and to have some background in linear algebra, calculus, probability, and statistics. This means that even if you are able to register for the class, you might not have the background needed to take full advantage of what it has to offer.
If you are in doubt, take the following test, which you should be able to solve relatively easily:
Preliminary test

Goal

This course is a broad introduction to modern techniques in data science, including elastic net regularized regression, random forests, gradient boosting, and deep learning. It emphasizes a statistical learning point of view and a careful examination of generalization error, model interpretability, feature engineering, and the bias-variance tradeoff.

Tools

The tool of choice is Apache Spark on Hadoop’s HDFS. The environment we use is Databricks Community Edition, which runs a highly customized version of the Jupyter Notebook.

Prerequisites

The prerequisites for this course are a basic knowledge of discrete mathematics, calculus, probability, and Python.

We use the following books:

  1. Python for Data Analysis (PFDA), 2nd Edition, by Wes McKinney
  2. An Introduction to Statistical Learning with Applications in R (ISLR) by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
  3. Spark: The Definitive Guide (STDG), upcoming (expected 2018), by B. Chambers and M. Zaharia
  4. Deep Learning (DL) by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

Syllabus
