This is an advanced course. There appear to be no official prerequisites
in Syracuse University's catalog system for taking this class.
Most students have already taken IST 687 - Introduction to Data Science,
which is a good introduction to the field. However, students are
expected to know how to program in Python or R and to have
some background in linear algebra, calculus, probability, and statistics. This means
that even if you are able to register for the class, you might not have the necessary
background to take full advantage of what it has to offer.
If you are in doubt, take the following test, which you should be able to solve relatively easily.
This course is a broad introduction to modern techniques in data science, including elastic net regularized regression, random forests, gradient boosting, and deep learning. It emphasizes a statistical learning point of view and a careful examination of generalization error, model interpretability, feature engineering, and the bias-variance tradeoff.
The tool of choice is Apache Spark on Hadoop’s HDFS. The environment we use is Databricks Community Edition, which runs a highly customized version of the Jupyter Notebook.
The prerequisites for this course are a basic knowledge of discrete mathematics, calculus, probability, and Python.
We use the following books:
- Python for Data Analysis (PFDA), 2nd Edition, by Wes McKinney
- An Introduction to Statistical Learning with Applications in R (ISLR) by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
- Spark: The Definitive Guide (STDG), upcoming (expected 2018), by B. Chambers and M. Zaharia
- Deep Learning (DL) by Ian Goodfellow, Yoshua Bengio, and Aaron Courville