Martin Mächler: Catalogue data in Autumn Semester 2018

Name Prof. em. Dr. Martin Mächler
Address
Seminar für Statistik (SfS)
ETH Zürich, HG GO 14.2
Rämistrasse 101
8092 Zürich
SWITZERLAND
Telephone +41 44 632 34 08
E-mail maechler@stat.math.ethz.ch
URL http://stat.ethz.ch/~maechler
Department Mathematics
Relationship Retired Adjunct Professor

Number Title ECTS Hours Lecturers
401-3620-68L Student Seminar in Statistics: Statistical Learning with Sparsity (restricted registration)
Number of participants limited to 24.

Mainly for students from the Mathematics Bachelor and Master Programmes who, in addition to the introductory course unit 401-2604-00L Probability and Statistics, have taken at least one core or elective course in statistics. Also offered in the Master Programmes in Statistics and in Data Science.
4 credits 2S M. Mächler, M. H. Maathuis, N. Meinshausen, S. van de Geer
Abstract We study selected chapters from the 2015 book "Statistical Learning with Sparsity" by Trevor Hastie, Rob Tibshirani and Martin Wainwright.

(for details, see below)
Objective During this seminar, we will study roughly one chapter per week from the book. You will obtain a good overview of sparse and high-dimensional modeling in modern statistics.
Moreover, you will practice your self-study and presentation skills.
Content (From the book's preface:) "... summarize the actively developing field of statistical learning with sparsity.
A sparse statistical model is one having only a small number of nonzero parameters or weights. It represents a classic case of “less is more”: a sparse model can be much easier to estimate and interpret than a dense model.
In this age of big data, the number of features measured on a person or object can be large, and might be larger than the number of observations. The sparsity assumption allows us to tackle such problems and extract useful and reproducible patterns from big datasets."
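
As a small illustration of what "only a small number of nonzero parameters" can mean in practice, here is a minimal sketch of the soft-thresholding operator, which gives the lasso solution in the special case of an orthonormal design. The coefficient values and the penalty level are made up for illustration, and Python is used only for this sketch; the seminar itself works with R.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical least-squares coefficients (made-up numbers, not from the book).
ols_coefs = [3.0, -0.4, 0.1, -2.5, 0.3]
lam = 0.5  # penalty level, chosen arbitrarily for this sketch

# Coefficients smaller than lam in absolute value are set exactly to zero;
# the remaining ones are shrunk towards zero by lam.
sparse_coefs = [soft_threshold(b, lam) for b in ols_coefs]
n_nonzero = sum(1 for b in sparse_coefs if b != 0.0)

print(sparse_coefs)
print(n_nonzero)
```

Increasing the assumed penalty `lam` sets more coefficients to zero, which is the sense in which the resulting model becomes sparser.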

When presenting the material, you may occasionally need to consult additional published research, e.g., for "High-Dimensional Inference".
Lecture notes Website with groups, FAQ, topics, slides, and R scripts:
https://stat.ethz.ch/lectures/as18/seminar.php#course_materials
Literature Trevor Hastie, Robert Tibshirani, Martin Wainwright (2015)
Statistical Learning with Sparsity: The Lasso and Generalizations
Monographs on Statistics and Applied Probability 143
Chapman & Hall/CRC
ISBN 9781498712170

Access:

- https://www.taylorfrancis.com/books/9781498712170
(full access via the ETH library network, on campus or through VPN)

- Author's website (includes errata, updated pdf, data):
https://web.stanford.edu/~hastie/StatLearnSparsity/
Prerequisites / Notice We require at least one course in statistics in addition to the 4th semester course Introduction to Probability and Statistics, as well as some experience with the statistical software R.

Topics will be assigned during the first meeting.
401-5640-00L ZüKoSt: Seminar on Applied Statistics 0 credits 1K M. Kalisch, R. Furrer, L. Held, T. Hothorn, M. H. Maathuis, M. Mächler, L. Meier, N. Meinshausen, M. Robinson, C. Strobl, S. van de Geer
Abstract About 5 talks on applied statistics.
Objective See how statistical methods are applied in practice.
Content There will be about 5 talks on how statistical methods are applied in practice.
Prerequisites / Notice This is not a lecture. There is no exam, and no credit points will be awarded. The current program can be found on the web:
http://stat.ethz.ch/events/zukost
Course language is English or German and may depend on the speaker.
401-6215-00L Using R for Data Analysis and Graphics (Part I) (restricted registration) 1.5 credits 1G M. Mächler, M. Tanadini
Abstract The course provides the first part of an introduction to the statistical software R (https://www.r-project.org/) for scientists. Topics covered are data generation and selection, graphical and basic statistical functions, creating simple functions, and basic types of objects.
Objective The students will be able to use the software R for simple data analysis and graphics.
Content The course provides the first part of an introduction to the statistical software R for scientists. R is free software that contains a huge collection of functions with a focus on statistics and graphics. To use R, one has to learn the R programming language, at least at a very rudimentary level. The course aims to facilitate this by providing a basic introduction to R.

Part I of the course covers the following topics:
- What is R?
- R Basics: reading and writing data from/to files, creating vectors & matrices, selecting elements of dataframes, vectors and matrices, arithmetic;
- Types of data: numeric, character, logical and categorical data, missing values;
- Simple (statistical) functions: summary, mean, var, etc., simple statistical tests;
- Writing simple functions;
- Introduction to graphics: scatter plots, boxplots and other high-level plotting functions, embellishing plots with titles, axis labels, etc., adding elements (lines, points) to existing plots.

The course focuses on practical work at the computer. We will make use of the graphical user interface RStudio: www.rstudio.org

Note: Part I of UsingR is complemented and extended by Part II, which is offered during the second part of the semester and which can be taken independently from Part I.
Lecture notes An Introduction to R. http://stat.ethz.ch/CRAN/doc/contrib/Lam-IntroductionToR_LHL.pdf
Prerequisites / Notice The course resources will be provided via the Moodle web learning platform.
Please log in (with your ETH or other university username and password) at
https://moodle-app2.let.ethz.ch/course/view.php?id=1145

Choose the course "Using R for Data Analysis and Graphics" (there is at least one other course about "R"; do not choose the wrong one!)
and follow the instructions for registration.
401-6217-00L Using R for Data Analysis and Graphics (Part II) (restricted registration) 1.5 credits 1G M. Mächler, M. Tanadini
Abstract The course provides the second part of an introduction to the statistical software R for scientists. Topics are data generation and selection, graphical functions, important statistical functions, types of objects, models, programming and writing functions.
Note: This part builds on "Using R... (Part I)", but can be taken independently if the basics of R are already known.
Objective The students will be able to use the software R efficiently for data analysis, graphics and simple programming.
Content The course provides the second part of an introduction to the statistical software R (https://www.r-project.org/) for scientists. R is free software that contains a huge collection of functions with a focus on statistics and graphics. To use R, one has to learn the R programming language, at least at a very rudimentary level. The course aims to facilitate this by providing a basic introduction to R.

Part II of the course builds on Part I and covers the following additional topics:
- Elements of the R language: control structures (if, else, loops), lists, overview of R objects, attributes of R objects;
- More on R functions;
- Applying functions to elements of vectors, matrices and lists;
- Object oriented programming with R: classes and methods;
- Tailoring R: options;
- Extending basic R: packages.

The course focuses on practical work at the computer. We will make use of the graphical user interface RStudio: www.rstudio.org
Lecture notes An Introduction to R. http://stat.ethz.ch/CRAN/doc/contrib/Lam-IntroductionToR_LHL.pdf
Prerequisites / Notice Basic knowledge of R equivalent to "Using R ... (Part I)" (= 401-6215-00L) is a prerequisite for this course.

The course resources will be provided via the Moodle web learning platform.
Please log in (with your ETH or other university username and password) at
https://moodle-app2.let.ethz.ch/course/view.php?id=1145
Choose the course "Using R for Data Analysis and Graphics" and follow the instructions for registration.
447-6221-00L Nonparametric Regression (restricted registration)
Special students from the University of Zurich (UZH) in the Master Program in Biostatistics at UZH cannot register for this course unit electronically. Forward the lecturer's written permission to attend to the Registrar's Office. Alternatively, the lecturer may send an email directly to the Registrar's Office. The Registrar's Office will then register you for the course.
1 credit 1G M. Mächler
Abstract This course focusses on nonparametric estimation of probability densities and regression functions. These recent methods allow modelling without restrictive assumptions such as a linear function. These smoothing methods require a weight function and a smoothing parameter. The focus is on one dimension; higher dimensions and samples of curves are treated briefly. Exercises at the computer.
Objective Knowledge of the estimation of probability densities and regression functions via various statistical methods.
Understanding of the choice of the weight function and of the smoothing parameter, including automatic selection.
Practical application on data sets at the computer.
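
The roles of the weight function and the smoothing parameter mentioned in the abstract can be sketched with a Nadaraya-Watson kernel regression estimate. This is only an illustrative sketch, not necessarily the estimator used in the course: the Gaussian weight function, the bandwidth values, and the data are assumptions made up here, and Python is used only for illustration (the course exercises are done at the computer with other software).

```python
import math

def gaussian_kernel(u):
    """Weight function (assumed here: standard Gaussian density)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted local average of ys at the point x0.

    h is the smoothing parameter (bandwidth): observations with x far
    from x0, relative to h, receive almost no weight.
    """
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total

# Made-up data for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]

# A small h follows the nearest observations closely; a very large h
# approaches the overall mean of ys.
for h in (0.3, 1.0, 10.0):
    print(h, round(nadaraya_watson(2.0, xs, ys, h), 3))
```

No linear form is assumed anywhere in the sketch: the fit at each point is a weighted average whose locality is controlled entirely by `h`.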
447-6245-00L Data Mining (restricted registration)
Special students from the University of Zurich (UZH) in the Master Program in Biostatistics at UZH cannot register for this course unit electronically. Forward the lecturer's written permission to attend to the Registrar's Office. Alternatively, the lecturer may send an email directly to the Registrar's Office. The Registrar's Office will then register you for the course.
1 credit 1G M. Mächler
Abstract Block course covering only prediction problems, aka "supervised learning".

Part 1, Classification: logistic regression, linear/quadratic discriminant analysis, Bayes classifier; additive and tree models; further flexible ("nonparametric") methods.

Part 2, Flexible Prediction: additive models, MARS, Y-Transformation models (ACE,AVAS); Projection Pursuit Regression (PPR), neural nets.
Objective
Content "Data Mining" is a large field, of which this block course treats only so-called prediction problems, aka "supervised learning".

Part 1, Classification, recalls logistic regression and linear / quadratic discriminant analysis (LDA/QDA), extends these (in the framework of the "Bayes classifier") to (generalized) additive (GAM) and tree models (CART), and further mentions other flexible ("nonparametric") methods.

Part 2, Flexible Prediction (of continuous or "class" response/target) contains additive models, MARS, Y-Transformation models (ACE, AVAS); Projection Pursuit Regression (PPR), neural nets.
Lecture notes The block course is based on (German language) lecture notes.
Prerequisites / Notice The exercises are done exclusively with the (free, open source) software "R"
(http://www.r-project.org). The final exam will also take place at the computers, using R (and your brains!).