Mathematics for Data Analysis & Decision Making
Class Time/Place: MWF 10:00 AM - 10:50 AM at Wellman 226
Instructor: Jesus A. De Loera
Office: 3228 Math. Sci. Building
Email: deloera@math.ucdavis.edu
Office Hours: Wed 11:00am-12:30pm, Fri 3:10-4:30pm
(or by appointment).
TA: Ji Chen
Office: 3131 Math. Sci. Building
Email: ljichen@math.ucdavis.edu
Office Hours: Tuesday 2-4pm.
Course Description:
Mathematical models are at the heart of all
data science applications such as information searching (Google),
machine learning (e.g., face recognition algorithms), and all logistic
and planning challenges (e.g., airline-crew scheduling, social
network analysis). To make intelligent decisions, math is indispensable!
This course discusses the mathematics used in the analysis of data and
the models used to make optimal decisions.
Methods include advanced
linear algebra, graph theory, optimization, probability, and geometry.
These are some
of the mathematical tools necessary for data
classification, machine learning, clustering, and pattern recognition,
and for planning, scheduling, optimal allocation, and ranking.
This course is great for students who wish to learn the
mathematical theory behind data science and decision making algorithms
and software.
References:
Unfortunately, there is no single undergraduate textbook that contains all
the relevant mathematics (yet!!).
I will share my notes with the class.
Students who give me corrections will receive extra credit.
Most of my notes are based on the following sources, but you
are NOT required to buy them!
1) Optimization Models, by G. Calafiore and L. El Ghaoui, Cambridge University Press, 2015.
2) Matrix Methods in Data Mining and Pattern Recognition (Fundamentals of Algorithms), by Lars Elden, SIAM.
This textbook has an official website (the author's web site), where you can find a lot of useful information (e.g., errata).
3) A Gentle Introduction to Optimization, by B. Guenin, J. Koenemann, and L. Tuncel, Cambridge University Press, 2015.
4) Who's #1? The science of rating and ranking, by A. Langville and C.D. Meyer.
Here is the syllabus (order may change):
Some Data Analysis and Decision Projects
- Project 1. (weeks 1-2 (6 lectures)) Linear Algebra models for Ranking and Learning from Data.
Eigenvalues and Singular Value Decompositions,
basic graph theory for Network analysis and ranking.
Modeling who is top-ranked. Finding key word
PageRank algorithm and Markov chains: How does Google work?
HOMEWORK: The recognition of a hand-written digit or ranking of electoral votes. Analysis of text documents through networks.
- Project 2. (weeks 3-4 (6 lectures)) Convex Optimization models for Supervised learning and decisions
First steps in
optimization models: linear & quadratic models.
Data fitting/regression vs. sparse regression, support vector machines, LASSO, convex optimization basics. Semidefinite programs.
HOMEWORK: Diagnosis of cancer through Support vector machines. More on text-mining, identifying keywords of an author.
- Project 3. (weeks 5-7 (8 lectures)) General continuous Non-linear Models
Non-linear programs, subgradients, Karush-Kuhn-Tucker optimality conditions, mathematics of neural networks, gradient descent methods.
HOMEWORK: Stocks Index, choosing a stock portfolio through optimization, pricing, supply chain management.
- Project 4. (weeks 7-10 (9 lectures)) Discrete Models
Integer programming, discrete optimization techniques: scheduling, optimal packing of bins and bags, stable assignment problems, routing problems (shortest path), and scheduling and transportation problems (job/transplant allocation).
HOMEWORK: Sudoku solver, network analysis (shortest paths), knapsack.
- Final Project (due final day). Putting it all together:
Mathematical models for optimal decisions require both nonlinear and discrete components. The final project will require you to go from data collection to
decision making. TBA.
Prerequisites and Expectations
- MAT 167 or equivalent (i.e., solid understanding of elementary linear algebra, beyond MAT 22A or MAT 67).
Mathematical maturity equivalent to at least one upper division course with proofs.
- Solid familiarity with programming is required. MATLAB will be
used in the class. The software SCIP will also be used in class.
- Although not required, having taken MAT 168 before MAT 160 would make this
class much easier for you.
- I will provide some tutorials for the software that we will use regularly. For example, if you do not know how to use MATLAB,
then you need to self-study using the MATLAB Primer and other material listed below.
- Create an account at the Math Department. Visit http://www.math.ucdavis.edu/comp/class-accts and follow the instructions.
It is important to create your account before you come to the Lab for the first time. You can then work at the Undergraduate Computer Lab (2118 Math. Sci. Bldg.), from any other lab on campus, or even from your home PC by remotely connecting to one of the departmental servers, such as [point,cosine,sine,tangent].math.ucdavis.edu. The lab is open 9am-5pm on weekdays.
- IMPORTANT WARNINGS:
- Attendance will not be taken. However, whether or not you are
able to attend class, you are responsible for
all the material presented in class.
- This is a 4 unit course! You are expected to work
3 hours at home for each hour of lecture. In other words,
expect to have 10 hours of homework each week.
- Trying to take this class without linear algebra is a bad idea!
Take 167 or 128B first! (ECS 130 is similar too).
- This course is computer heavy. The grade is based on 5
computer projects. If you are not already comfortable with computers,
you are likely to encounter problems. I do not teach
how to program in this class.
- This course is a mathematics course. Proofs and justifications are
expected for all your arguments.
Grading:
The grades will be calculated using the
average and standard deviation of the class. 100 points are possible
which will be divided as follows:
- 4 Regular Projects, 15 points each (the lowest score is dropped),
- 1 midterm (April 26, 2019) 20 points
- 1 Final Project (June 12, 2019) 35 points
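As a quick check of the arithmetic, here is a minimal Python sketch of how a raw score out of 100 could be tallied. It assumes that each regular project is worth 15 points and that only the best three count (3 x 15 = 45, plus 20 and 35); the function name `raw_score` is hypothetical. Letter grades are still set from the class average and standard deviation, which this sketch does not model.

```python
def raw_score(projects, midterm, final_project):
    """Raw course score out of 100 under the assumed point split.

    projects: four regular project scores, each assumed to be out of 15.
    midterm: midterm score out of 20.
    final_project: final project score out of 35.
    """
    if len(projects) != 4:
        raise ValueError("expected exactly 4 regular project scores")
    # Drop the lowest regular project; the remaining three count.
    kept = sorted(projects)[1:]
    return sum(kept) + midterm + final_project


# Hypothetical student: the project score of 9 is the one dropped.
example = raw_score(projects=[12, 15, 9, 14], midterm=16, final_project=30)
print(example)  # 87
```

A perfect record, `raw_score([15, 15, 15, 15], 20, 35)`, gives the full 100 points.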
The following important rules apply:
- The projects due and other material will be posted at the bottom of the course
web site.
LATE PROJECTS WILL NOT BE ACCEPTED. Please do not even ask me for an extension. I will simply not reply to such requests.
- Your work is not graded solely on the final answer.
I expect you to write neatly, with organization and logic,
justify your reasoning, and show all details.
- I will distribute the homework via UC Davis email and CANVAS. Make sure
to read your UC Davis email and CANVAS announcements.
- I will assign some HW problems that require you to use MATLAB, SCIP, Python, or R.
- The projects will include writing code to investigate the application
topics presented in class and theory to understand methods.
SOFTWARE and other RESOURCES:
This class uses MATLAB and SCIP. To access the necessary software, you have the following options:
- Create an account at the Math Department. Visit
http://www.math.ucdavis.edu/comp/class-accts
and follow the instructions.
It is important to create your account before you
come to the Lab for the first time. You can then work at the
Undergraduate Computer Lab (2118 Math. Sci. Bldg.), from any other lab on
campus, or even from your home PC by remotely connecting to one of the
departmental servers, such as [fuzzy,cosine,sine,tangent].math.ucdavis.edu. The
lab is open 9am-5pm on weekdays.
- Use your own account at your own department if your department
has the MATLAB license. This is the case for most of the engineering
departments.
- Buy a Student Version of MATLAB at UCD Bookstore (costs about
$100).
- Install Octave on your own PC. It is free
software that emulates MATLAB. Caution: Most likely you can do all
the lab exercises with it, but I have not tested all the exercises yet.
Visit the official web site of Octave at
http://www.octave.org for downloading and installing information.
An introduction to ZIMPL (the language used to program SCIP) is available
in the ZIMPL Manual. The best way to learn it is to
follow the numerous examples provided in the text.
For MATLAB, please take a look at the following highly useful MATLAB
primers and tutorials.