Mathematical Foundations for Big Data (Spring 2016)
Course: MAT 280
CRN: 49752
Title: Mathematical Foundations for Big Data
Class: MF 1:30pm-3:00pm, 2112 Math. Sci. Bldg.
Instructor: Thomas Strohmer
Office: 3144 MSB
Email:"my last name" at math.ucdavis.edu
Office Hours: By appointment
Course Objective:
Experiments, observations, and numerical simulations in many areas of science nowadays generate massive amounts of
data. This rapid growth heralds an era of "data-centric science," which requires new paradigms addressing how data
are acquired, processed, distributed, and analyzed. This course will cover
mathematical models and concepts for developing algorithms that can deal with some of the challenges posed by Big Data.
Prerequisite:
Linear algebra, a basic background in probability, and some programming experience (preferably
Matlab) are required. Basic knowledge of optimization is recommended.
List of topics: (subject to minor changes)
- Principal Component Analysis, Singular Value Decomposition. (See the illustrative sketch after this list.)
- Probability in high dimensions. Concentration of measure, matrix concentration inequalities. Curses and blessings of dimensionality.
- Data clustering, community detection.
- Dimension reduction. Johnson-Lindenstrauss, sketching, random projections. (Also illustrated in the sketch after this list.)
- Stochastic gradient descent.
- Kernel regression.
- Randomized numerical linear algebra.
- Compressive sensing. Efficient acquisition of data, sparsity, low-rank matrix recovery.
- Diffusion maps, manifold learning, intrinsic geometry of massive data sets.
- Some basics on Deep Learning (if time permits).
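As an informal illustration (not part of the official course materials), the following minimal Matlab sketch shows two items from this list: principal components computed via the thin SVD, and a Johnson-Lindenstrauss-style Gaussian random projection. All dimensions and the synthetic data below are assumptions made purely for the example.

  rng(0);                            % fix the random seed for reproducibility
  n = 500; d = 100;                  % n data points in R^d (illustrative sizes)
  X = randn(n,5)*randn(5,d) + 0.1*randn(n,d);  % synthetic, approximately rank-5 data
  Xc = bsxfun(@minus, X, mean(X,1)); % center each coordinate
  [U,S,V] = svd(Xc, 'econ');         % thin SVD of the centered data matrix
  k = 5;                             % number of principal components to keep
  scores = Xc * V(:,1:k);            % PCA scores: projection onto the top-k PCs
  m = 50;                            % target dimension for the random sketch
  P = randn(d,m)/sqrt(m);            % scaled Gaussian projection (JL map)
  Y = Xc * P;                        % randomly projected data in R^m

For m on the order of log(n)/eps^2, the Johnson-Lindenstrauss lemma guarantees that all pairwise distances among the n points are preserved up to a factor of 1 +/- eps with high probability.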
Textbooks:
There is no required textbook. The following books contain some material on these topics (there is no
need to buy them):
- C. Bishop. Pattern Recognition and Machine Learning.
- F. Cucker and D. X. Zhou. Learning Theory: An Approximation Theory Viewpoint.
- S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive Sensing.
- T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
- M. Mahoney. Randomized Algorithms for Matrices and Data.
Grading Scheme:
- 10% Scribing Lectures
- 30% Homework
- 60% Final Project
Scribing Lectures:
Depending on the class size, each student may have to scribe 1-2 lectures.
Scribe notes must be typeset in LaTeX. A template and more details will be posted later.
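Until the official template appears, a scribe-notes skeleton might look like the following (a hypothetical placeholder, not the actual course template):

  \documentclass[11pt]{article}
  \usepackage{amsmath,amssymb,amsthm}
  \newtheorem{theorem}{Theorem}
  \newtheorem{lemma}[theorem]{Lemma}
  \title{MAT 280: Mathematical Foundations for Big Data \\ Lecture N}
  \author{Scribe: Your Name}
  \date{Lecture date}
  \begin{document}
  \maketitle
  \section{Overview}
  % Briefly summarize what the lecture covered.
  \section{Main Results}
  % Definitions, theorems, and proofs from the lecture go here.
  \end{document}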
Homework:
I will assign homework about every other week.
A subset of these problems will be graded.
The homework will be announced here. Late homework will not be accepted.
Final Project:
For the Final Project you will write a report on one of the following topics:
- Describe how some of the methods you learned in this course can be used in your research.
- Find a practical application yourself (not copied from papers/books) that uses the methods you learned in this course; describe how to use them, explain the importance of that application, and state what impact you would expect if you are successful.
- Carry out a thorough numerical comparison of existing algorithms related to one of the topics of this course for a specific application or problem.
More information about the class can be found here.