Catalogue data in Autumn Semester 2018

DAS in Data Science
Core Courses
Foundations Courses
Number | Title | Type | ECTS | Hours | Lecturers
227-0427-00L | Signal Analysis, Models, and Machine Learning | W | 6 credits | 4G | H.-A. Loeliger
Abstract: Mathematical methods in signal processing and machine learning.
I. Linear signal representation and approximation: Hilbert spaces, LMMSE estimation, regularization and sparsity.
II. Learning linear and nonlinear functions and filters: neural networks, kernel methods.
III. Structured statistical models: hidden Markov models, factor graphs, Kalman filter, Gaussian models with sparse events.
Learning objective: The course is an introduction to some basic topics in signal processing and machine learning.
Content: Part I - Linear Signal Representation and Approximation: Hilbert spaces, least squares and LMMSE estimation, projection and estimation by linear filtering, learning linear functions and filters, L2 regularization, L1 regularization and sparsity, singular-value decomposition and pseudo-inverse, principal-components analysis.
Part II - Learning Nonlinear Functions: fundamentals of learning, neural networks, kernel methods.
Part III - Structured Statistical Models and Message Passing Algorithms: hidden Markov models, factor graphs, Gaussian message passing, Kalman filter and recursive least squares, Monte Carlo methods, parameter estimation, expectation maximization, linear Gaussian models with sparse events.
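Part III lists the Kalman filter alongside recursive least squares. As an illustrative sketch (not course material), the scalar case below estimates a constant x from noisy measurements y_k = x + v_k; the function name and its parameters are hypothetical. With no process noise and a very wide prior, the recursion behaves like recursive least squares, and the estimate approaches the running sample mean.

```python
def kalman_constant(measurements, meas_var, prior_mean=0.0, prior_var=1e9):
    """Scalar Kalman filter for a constant state x observed as y_k = x + v_k.

    Assumes zero process noise and measurement-noise variance meas_var.
    Returns the posterior mean and variance after all measurements.
    """
    mean, var = prior_mean, prior_var
    for y in measurements:
        gain = var / (var + meas_var)            # Kalman gain
        mean = mean + gain * (y - mean)          # correct with the innovation y - mean
        var = var * meas_var / (var + meas_var)  # equals (1 - gain) * var, computed stably
    return mean, var


mean, var = kalman_constant([1.0, 2.0, 3.0, 4.0], meas_var=1.0)
print(mean, var)  # close to the sample mean 2.5 and to meas_var / 4 = 0.25
```

Each update shrinks the variance, so later measurements move the estimate less, which is exactly the averaging behaviour of recursive least squares.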
Lecture notes: Lecture notes are provided.
Prerequisites / Notice:
- local bachelor's students: course "Discrete-Time and Statistical Signal Processing" (5th semester)
- others: solid basics in linear algebra and probability theory
Capstone Project
Number | Title | Type | ECTS | Hours | Lecturers
266-0100-00L | Capstone Project (restricted registration) | O | 8 credits | 17A | Professors
Does not take place this semester.
Only for DAS in Data Science.
Abstract
Learning objective
Specialisation Track
Hardware for Machine Learning
Offered in the spring semester.
Image Analysis & Computer Vision
Number | Title | Type | ECTS | Hours | Lecturers
263-5902-00L | Computer Vision | W | 6 credits | 3V + 1U + 1A | M. Pollefeys, V. Ferrari, L. Van Gool
Abstract: The goal of this course is to give students a solid understanding of computer vision and image analysis techniques. The main concepts and techniques are studied in depth, and practical algorithms and approaches are discussed and explored through the exercises.
Learning objective: The objectives of this course are:
1. To introduce the fundamental problems of computer vision.
2. To introduce the main concepts and techniques used to solve them.
3. To enable participants to implement solutions for reasonably complex problems.
4. To enable participants to make sense of the computer vision literature.
Content: Camera models and calibration, invariant features, multiple-view geometry, model fitting, stereo matching, segmentation, 2D shape matching, shape from silhouettes, optical flow, structure from motion, tracking, object recognition, object category recognition.
Prerequisites / Notice: Students are recommended to have taken the Visual Computing lecture, or a similar course introducing basic image processing concepts, before taking this course.
Neural Information Processing
Number | Title | Type | ECTS | Hours | Lecturers
227-1033-00L | Neuromorphic Engineering I (restricted registration) | W | 6 credits | 2V + 3U | T. Delbrück, G. Indiveri, S.-C. Liu
Registration in this class requires the permission of the instructors. Class size is limited by the number of available lab places.
Preference is given to students who require this class as part of their major.
Abstract: This course covers analog circuits with an emphasis on neuromorphic engineering: MOS transistors in CMOS technology, static circuits, dynamic circuits, and systems (silicon neuron, silicon retina, silicon cochlea), with an introduction to multi-chip systems. The lectures are accompanied by weekly laboratory sessions.
Learning objective: Understanding of the characteristics of neuromorphic circuit elements.
Content: Neuromorphic circuits are inspired by the organizing principles of biological neural circuits. Their computational primitives are based on the physics of semiconductor devices. Neuromorphic architectures often rely on collective computation in parallel networks. Adaptation, learning, and memory are implemented locally within the individual computational elements. Transistors are often operated in weak inversion (below threshold), where they exhibit exponential I-V characteristics and low currents. These properties make high-density, low-power implementations feasible for functions that are computationally intensive in other paradigms. Application domains of neuromorphic circuits include silicon retinas and cochleas for machine vision and audition, real-time emulations of networks of biological neurons, and the development of autonomous robotic systems. This course covers devices in CMOS technology (MOS transistor below and above threshold, floating-gate MOS transistor, phototransducers), static circuits (differential pair, current mirror, transconductance amplifiers, etc.), dynamic circuits (linear and nonlinear filters, adaptive circuits), systems (silicon neuron, silicon retina and cochlea), and an introduction to multi-chip systems that communicate events analogous to spikes. The lectures are accompanied by weekly laboratory sessions on the characterization of neuromorphic circuits, from elementary devices to systems.
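The exponential I-V behaviour in weak inversion can be made concrete with a common textbook form of the subthreshold nFET model, with all voltages referenced to the bulk: I_D = I_0 exp(κV_G/U_T) (exp(-V_S/U_T) - exp(-V_D/U_T)). The sketch below assumes illustrative values (I_0 = 1 fA, κ = 0.7, U_T ≈ 25.8 mV near room temperature); these are assumptions for demonstration, not course data. Under them, a gate step of U_T ln(10)/κ multiplies the drain current by ten.

```python
import math

U_T = 0.0258  # thermal voltage kT/q near 300 K, in volts (assumed)


def subthreshold_current(v_g, v_s=0.0, v_d=0.5, i_0=1e-15, kappa=0.7):
    """Weak-inversion nFET drain current in amperes.

    All voltages are in volts and referenced to the bulk; i_0 and kappa are
    illustrative process parameters, not values from the course.
    """
    gate_term = math.exp(kappa * v_g / U_T)
    channel_term = math.exp(-v_s / U_T) - math.exp(-v_d / U_T)
    return i_0 * gate_term * channel_term


decade_step = U_T * math.log(10) / 0.7  # gate step for a 10x current change at kappa = 0.7
ratio = subthreshold_current(0.3 + decade_step) / subthreshold_current(0.3)
print(f"{decade_step * 1000:.0f} mV per decade, ratio {ratio:.3f}")  # 85 mV per decade, ratio 10.000
```

The picoampere-scale current at a 0.3 V gate voltage illustrates the "low currents" mentioned above.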
Literature: S.-C. Liu et al.: Analog VLSI: Circuits and Principles; various publications.
Prerequisites / Notice: In particular, the course is highly recommended for those who intend to take the spring-semester course 'Neuromorphic Engineering II', which teaches the conception, simulation, and physical layout of such circuits with chip design tools.

Prerequisites: A background in the basics of semiconductor physics is helpful but not required.
Statistics
Number | Title | Type | ECTS | Hours | Lecturers
401-0625-01L | Applied Analysis of Variance and Experimental Design | W | 5 credits | 2V + 1U | L. Meier
Abstract: Principles of experimental design, one-way analysis of variance, contrasts and multiple comparisons, multi-factor designs and analysis of variance, complete block designs, Latin square designs, random effects and mixed effects models, split-plot designs, incomplete block designs, two-series factorials and fractional designs, power.
Learning objective: Participants will be able to plan and analyze efficient experiments in the natural sciences. They will gain practical experience using the software R.
Content: Principles of experimental design, one-way analysis of variance, contrasts and multiple comparisons, multi-factor designs and analysis of variance, complete block designs, Latin square designs, random effects and mixed effects models, split-plot designs, incomplete block designs, two-series factorials and fractional designs, power.
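One-way analysis of variance compares the spread between group means with the spread within groups. The course works in R; as a language-neutral sketch, the hypothetical helper below computes the F statistic and its degrees of freedom by hand. Turning F into a p-value would additionally require the F distribution, which is omitted here.

```python
def one_way_anova(groups):
    """Return (F statistic, between-group df, within-group df) for a list of samples."""
    all_values = [x for group in groups for x in group]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: scatter of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

A large F relative to the F(2, 6) distribution indicates that the group means differ by more than within-group noise would explain.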
Literature: G. Oehlert: A First Course in Design and Analysis of Experiments, W.H. Freeman and Company, New York, 2000.
Prerequisites / Notice: Both the exercises and the classes are based on procedures from the freely available, open-source statistical software R, for which an introduction will be given.
401-0649-00L | Applied Statistical Regression | W | 5 credits | 2V + 1U | M. Dettling
Abstract: This course offers a practically oriented introduction to regression modeling methods. The basic concepts and some mathematical background are included, with the emphasis on learning "good practice" that students can apply in their own projects and daily work. Special focus is placed on the use of the statistical software package R for regression analysis.
Learning objective: Students acquire advanced practical skills in linear regression analysis and become familiar with its extensions to generalized linear modeling.
Content: The course starts with the basics of linear modeling and then proceeds to parameter estimation, tests, confidence intervals, residual analysis, model choice, and prediction. Less commonly covered but practically relevant topics include variable transformations, multicollinearity problems and model interpretation, as well as general modeling strategies.

The last third of the course is dedicated to an introduction to generalized linear models: this includes the generalized additive model, logistic regression for binary response variables, binomial regression for grouped data, and Poisson regression for count data.
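The basic linear-modeling workflow (estimate the parameters, then inspect the residuals) fits in a few lines for a single predictor. This is a sketch with a hypothetical helper `fit_simple_linear` that uses the closed-form least-squares solution; the course itself uses R's modeling functions, and the data below are made up.

```python
def fit_simple_linear(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Residuals are the raw material for residual analysis (plots, outlier checks)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals


intercept, slope, residuals = fit_simple_linear([0, 1, 2, 3, 4], [1.1, 2.9, 5.2, 6.8, 9.0])
print(round(intercept, 3), round(slope, 3))  # 1.06 1.97
```

Because the model contains an intercept, the least-squares residuals always sum to zero; systematic patterns in them, not their sum, are what residual analysis looks for.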
Lecture notes: A script will be available.
Literature: Faraway (2005): Linear Models with R
Faraway (2006): Extending the Linear Model with R
Draper & Smith (1998): Applied Regression Analysis
Fox (2008): Applied Regression Analysis and GLMs
Montgomery et al. (2006): Introduction to Linear Regression Analysis
Prerequisites / Notice: Both the exercises and the classes are based on procedures from the freely available, open-source statistical software package R, for which an introduction will be given.

In the Mathematics Bachelor and Master programmes, the two course units 401-0649-00L "Applied Statistical Regression" and 401-3622-00L "Regression" are mutually exclusive. Registration for the examination of one of these two course units is only allowed if you have not registered for the examination of the other course unit.
401-3612-00L | Stochastic Simulation | W | 5 credits | 3G | F. Sigrist
Abstract: This course provides an introduction to statistical Monte Carlo methods. This includes applications of simulation in various fields (Bayesian statistics, statistical mechanics, operations research, financial mathematics), algorithms for generating random variables (accept-reject, importance sampling), estimating the precision of simulations, variance reduction, and an introduction to Markov chain Monte Carlo.
Learning objective: Stochastic simulation (also called the Monte Carlo method) is the experimental analysis of a stochastic model by implementing it on a computer. Probabilities and expected values can be approximated by averaging simulated values, and the central limit theorem gives an estimate of the error of this approximation. The course shows examples of the many applications of stochastic simulation and explains different algorithms used for simulation. These algorithms are illustrated with the statistical software R.
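The idea of averaging simulated values and using the central limit theorem for the error can be shown on a case with a known answer, E[U²] = 1/3 for U uniform on (0, 1). This is a minimal sketch in Python rather than R; the helper name and the fixed seed are arbitrary choices.

```python
import math
import random


def monte_carlo_mean(f, n, rng):
    """Estimate E[f(U)] for U ~ Uniform(0, 1); return (estimate, standard error)."""
    values = [f(rng.random()) for _ in range(n)]
    estimate = sum(values) / n
    sample_var = sum((v - estimate) ** 2 for v in values) / (n - 1)
    # By the central limit theorem the estimate is roughly normal with this std. dev.
    std_error = math.sqrt(sample_var / n)
    return estimate, std_error


rng = random.Random(2018)  # fixed seed for reproducibility
estimate, std_error = monte_carlo_mean(lambda u: u * u, 100_000, rng)
low, high = estimate - 1.96 * std_error, estimate + 1.96 * std_error
print(f"E[U^2] ~ {estimate:.4f}, approx. 95% interval [{low:.4f}, {high:.4f}] (exact: 1/3)")
```

The standard error shrinks like 1/sqrt(n), so each extra digit of precision costs about 100 times more samples; this is what motivates the variance-reduction methods in the course content.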
Content: Examples of simulations in different fields (computer science, statistics, statistical mechanics, operations research, financial mathematics). Generation of uniform random variables. Generation of random variables with arbitrary distributions (quantile transform, accept-reject, importance sampling), simulation of Gaussian processes and diffusions. The precision of simulations, methods for variance reduction. Introduction to Markov chains and Markov chain Monte Carlo (Metropolis-Hastings, Gibbs sampler, Hamiltonian Monte Carlo, reversible jump MCMC).
Lecture notes: A script will be available in English.
Literature: P. Glasserman. Monte Carlo Methods in Financial Engineering. Springer, 2004.

B. D. Ripley. Stochastic Simulation. Wiley, 1987.

Ch. Robert, G. Casella. Monte Carlo Statistical Methods, 2nd edition. Springer, 2004.
Prerequisites / Notice: Familiarity with basic concepts of probability theory (random variables, joint and conditional distributions, laws of large numbers and central limit theorem) will be assumed.
401-3621-00L | Fundamentals of Mathematical Statistics | W | 10 credits | 4V + 1U | S. van de Geer
Abstract: The course covers the basics of inferential statistics.
Learning objective
401-4623-00L | Time Series Analysis | W | 6 credits | 3G | N. Meinshausen
Abstract: Statistical analysis and modeling of observations in temporal order that exhibit dependence. Stationarity, trend estimation, seasonal decomposition, autocorrelations, spectral and wavelet analysis, ARIMA, GARCH, and state-space models. Implementations in the software R.
Learning objective: Understanding of the basic models and techniques used in time series analysis and their implementation in the statistical software R.
Content: This course deals with the modeling and analysis of variables that change randomly over time. Their essential feature is the dependence between successive observations. Applications occur in geophysics, engineering, economics, and finance. Topics covered: stationarity, trend estimation, seasonal decomposition, autocorrelations, spectral and wavelet analysis, ARIMA, GARCH, and state-space models. The models and techniques are illustrated using the statistical software R.
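Dependence between successive observations is easiest to see in a first-order autoregressive model, X_t = φ X_{t-1} + ε_t, whose theoretical autocorrelation at lag h is φ^h when |φ| < 1. The sketch below simulates such a series and computes the sample autocorrelation; the helper names are hypothetical, and the course's own implementations are in R.

```python
import random


def simulate_ar1(phi, n, rng, sigma=1.0):
    """Simulate X_t = phi * X_{t-1} + eps_t with Gaussian noise, starting from X_0 = 0."""
    x, series = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        series.append(x)
    return series


def sample_acf(series, lag):
    """Sample autocorrelation at the given lag (standard biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


rng = random.Random(1)
series = simulate_ar1(0.8, 20_000, rng)
print([round(sample_acf(series, h), 2) for h in (1, 2, 3)])  # near 0.8, 0.64, 0.512
```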
Lecture notes: Not available.
Literature: A list of references will be distributed during the course.
Prerequisites / Notice: Basic knowledge of probability and statistics.
401-3628-14L | Bayesian Statistics | W | 4 credits | 2V
Does not take place this semester.
Abstract: Introduction to the Bayesian approach to statistics: decision theory, prior distributions, hierarchical Bayes models, Bayesian tests and model selection, empirical Bayes, computational methods, Laplace approximation, Monte Carlo and Markov chain Monte Carlo methods.
Learning objective: Students understand the conceptual ideas behind Bayesian statistics and are familiar with common techniques used in Bayesian data analysis.
Content: Topics that we will discuss are:

Differences between the frequentist and Bayesian approaches (decision theory, principles), priors (conjugate priors, Jeffreys priors), tests and model selection (Bayes factors, hyper-g priors in regression), hierarchical models and empirical Bayes methods, and computational methods (Laplace approximation, Monte Carlo and Markov chain Monte Carlo methods).
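Conjugate priors make the posterior available in closed form. As a minimal illustration (a standard textbook example, not taken from the course script), a Beta(α, β) prior on a success probability combined with binomial data gives a Beta(α + successes, β + failures) posterior. Exact fractions are used so the results are easy to check.

```python
from fractions import Fraction


def beta_binomial_posterior(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean_var(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution, as exact fractions."""
    a, b = Fraction(alpha), Fraction(beta)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


# Uniform prior Beta(1, 1), then 7 successes and 3 failures
post_alpha, post_beta = beta_binomial_posterior(1, 1, 7, 3)
print(post_alpha, post_beta, *beta_mean_var(post_alpha, post_beta))  # 8 4 2/3 2/117
```

The posterior mean 2/3 lies between the prior mean 1/2 and the observed frequency 7/10, showing how the prior acts like a small number of pseudo-observations.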
Lecture notes: A script will be available in English.
Literature: Christian Robert, The Bayesian Choice, 2nd edition, Springer 2007.

A. Gelman et al., Bayesian Data Analysis, 3rd edition, Chapman & Hall (2013).

Additional references will be given in the course.
Prerequisites / Notice: Familiarity with basic concepts of frequentist statistics and with basic concepts of probability theory (random variables, joint and conditional distributions, laws of large numbers and central limit theorem) will be assumed.
Machine Learning and Artificial Intelligence
Number | Title | Type | ECTS | Hours | Lecturers
227-0689-00L | System Identification | W | 4 credits | 2V + 1U | R. Smith
Abstract: Theory and techniques for the identification of dynamic models from experimentally obtained system input-output data.
Learning objective: To provide a series of practical techniques for developing dynamical models from experimental data, with the emphasis on models suitable for feedback control design. To provide sufficient theory to enable the practitioner to understand the trade-offs between model accuracy, data quality, and data quantity.
Content: Introduction to modeling: black-box and grey-box models; parametric and non-parametric models; ARX, ARMAX (etc.) models.

Predictive, open-loop, black-box identification methods. Time and frequency domain methods. Subspace identification methods.

Optimal experimental design, Cramér-Rao bounds, input signal design.

Parametric identification methods. On-line and batch approaches.

Closed-loop identification strategies. Trade-off between controller performance and information available for identification.
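An ARX model can be identified by ordinary least squares because it is linear in its parameters. The sketch below assumes a hypothetical first-order model y[t] = a·y[t-1] + b·u[t-1] + e[t] (not an example from the course) and solves the 2×2 normal equations directly. With noise-free data generated by the same model, the parameters are recovered exactly; the random excitation signal keeps the two regressors from being collinear, which echoes the role of input signal design.

```python
import random


def fit_arx11(u, y):
    """Least-squares estimate of (a, b) in y[t] = a*y[t-1] + b*u[t-1] + e[t]."""
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for t in range(1, len(y)):
        y_prev, u_prev = y[t - 1], u[t - 1]
        s_yy += y_prev * y_prev
        s_yu += y_prev * u_prev
        s_uu += u_prev * u_prev
        r_y += y_prev * y[t]
        r_u += u_prev * y[t]
    # Solve [[s_yy, s_yu], [s_yu, s_uu]] @ [a, b] = [r_y, r_u] by Cramer's rule
    det = s_yy * s_uu - s_yu * s_yu
    a = (r_y * s_uu - r_u * s_yu) / det
    b = (r_u * s_yy - r_y * s_yu) / det
    return a, b


rng = random.Random(0)
u = [rng.gauss(0.0, 1.0) for _ in range(200)]  # random excitation signal
y = [0.0]
for t in range(1, 200):
    y.append(0.7 * y[t - 1] + 0.5 * u[t - 1])  # noise-free "true" system
print(fit_arx11(u, y))  # approximately (0.7, 0.5)
```

With measurement noise added to y, the same estimator returns values scattered around the true parameters, and the amount of scatter depends on data quantity and input quality.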
Literature: L. Ljung, "System Identification: Theory for the User", 2nd edition, Prentice Hall, 1999.

G. C. Goodwin and R. L. Payne, "Dynamic System Identification: Experimental Design and Data Analysis", Academic Press, 1977.
Prerequisites / Notice: Control Systems (227-0216-00L) or equivalent.
252-0535-00L | Advanced Machine Learning | W | 8 credits | 3V + 2U + 2A | J. M. Buhmann
Abstract: Machine learning algorithms provide analytical methods to search data sets for characteristic patterns. Typical tasks include the classification of data, function fitting, and clustering, with applications in image and speech analysis, bioinformatics, and exploratory data analysis. This course is accompanied by practical machine learning projects.
Learning objective: Students will become familiar with advanced concepts and algorithms for supervised and unsupervised learning, and will reinforce the statistical knowledge that is indispensable for solving modeling problems under uncertainty. Key concepts are the generalization ability of algorithms and systematic approaches to modeling and regularization. Machine learning projects will provide an opportunity to test the algorithms on real-world data.
Content: The theory of fundamental machine learning concepts is presented in the lecture and illustrated with relevant applications. Students can deepen their understanding by solving both pen-and-paper and programming exercises, in which they implement well-known algorithms and apply them to real-world data.

Topics covered in the lecture include:

Fundamentals:
What is data?
Bayesian Learning
Computational learning theory

Supervised learning:
Ensembles: Bagging and Boosting
Max Margin methods
Neural networks

Unsupervised learning:
Dimensionality reduction techniques
Clustering
Mixture Models
Non-parametric density estimation
Learning Dynamical Systems
Lecture notes: No lecture notes, but slides will be made available on the course webpage.
Literature: C. Bishop. Pattern Recognition and Machine Learning. Springer, 2007.

R. Duda, P. Hart, and D. Stork. Pattern Classification. John Wiley & Sons, second edition, 2001.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2001.

L. Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer, 2004.
Prerequisites / Notice: The course requires solid basic knowledge of analysis, statistics, and numerical methods for CSE, as well as practical programming experience for solving assignments.
Students should have followed at least "Introduction to Machine Learning" or an equivalent course offered by another institution.
263-2400-00L | Reliable and Interpretable Artificial Intelligence | W | 4 credits | 2V + 1U | M. Vechev
Abstract: Creating reliable and explainable probabilistic models is a fundamental challenge in solving the artificial intelligence problem. This course covers some of the latest and most exciting advances that bring us closer to constructing such models.
Learning objective: The main objective of this course is to expose students to the latest and most exciting research in explainable and interpretable artificial intelligence, a topic of fundamental and increasing importance. Upon completion of the course, students should have mastered the underlying methods and be able to apply them to a variety of problems.

To facilitate deeper understanding, an important part of the course will be a group hands-on programming project where students will build a system based on the learned material.
Content: The course covers the following interconnected directions.

Part I: Robust and Explainable Deep Learning
-------------------------------------------------------------

Deep learning technology has made impressive advances in recent years. Despite this progress, however, the fundamental challenge with deep learning remains understanding what a trained neural network has actually learned and how stable that solution is. For example: is the network stable under slight perturbations of the input (e.g., an image)? How easy is it to fool the network into misclassifying obvious inputs? Can we guide the network in a manner beyond simple labeled data?

Topics:
- Attacks: Finding adversarial examples via state-of-the-art attacks (e.g., FGSM, PGD attacks).
- Defenses: Automated methods and tools which guarantee robustness of deep nets (e.g., using abstract domains, mixed-integer solvers)
- Combining differentiable logic with gradient-based methods to train networks that satisfy richer properties.
- Frameworks: AI2, DiffAI, Reluplex, DQL, DeepPoly, etc.
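FGSM perturbs an input x by ε·sign(∇ₓL), a single step in the direction that increases the loss L. To keep the sketch self-contained, a logistic-regression classifier stands in for a deep network (an assumption made for illustration); for the cross-entropy loss its input gradient is (p - y)·w. The weights, input, and ε below are made-up values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fgsm_logistic(x, label, w, b, eps):
    """Fast gradient sign method against a logistic-regression classifier.

    label is 0 or 1; the gradient of the cross-entropy loss w.r.t. x is (p - label) * w.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - label) * wi for wi in w]

    def sign(g):
        return (g > 0) - (g < 0)

    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


w, b = [1.0, -1.0], 0.0
x = [0.2, 0.0]  # score 0.2 > 0, so classified as label 1
x_adv = fgsm_logistic(x, 1, w, b, eps=0.2)
score_adv = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
print(x_adv, score_adv)  # score becomes negative: the prediction flips to 0
```

Each coordinate moves by at most ε, so the perturbation is small in the max-norm, yet it is enough to change the decision; the defenses listed above aim to certify that no such perturbation exists.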

Part II: Program Synthesis/Induction
------------------------------------------------

Synthesis is a new frontier in AI where the computer programs itself from user-provided examples. Synthesis has significant applications for non-programmers as well as for programmers, for whom it can provide massive productivity increases (e.g., data wrangling for data scientists). Modern synthesis techniques excel at learning functions over discrete spaces from (partial) intent. There have been a number of recent, exciting breakthroughs in techniques that discover complex, interpretable/explainable functions from few examples, partial sketches, and other forms of supervision.

Topics:
- Theory of program synthesis: version spaces, counter-example guided inductive synthesis (CEGIS) with SAT/SMT, lower bounds on learning.
- Applications of techniques: synthesis for end users (e.g., spreadsheets) and data analytics.
- Combining synthesis with learning: application to learning from code.
- Frameworks: PHOG, DeepCode.
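The CEGIS loop alternates between a synthesizer, which proposes a program consistent with the examples seen so far, and a verifier, which either certifies the program or returns a counterexample. Real systems use SAT/SMT solvers for both steps; the toy sketch below replaces them with brute-force enumeration over a hypothetical hypothesis space of linear functions a*x + b on a finite input domain.

```python
from itertools import product


def cegis(spec, domain, coeff_range):
    """Find (a, b) with a*x + b == spec(x) for all x in domain, via CEGIS.

    Returns ((a, b), counterexamples_used), or None if no candidate exists.
    """
    candidates = list(product(coeff_range, repeat=2))
    examples = []
    while True:
        # Synthesis step: first candidate consistent with every counterexample so far
        candidate = next(
            ((a, b) for a, b in candidates if all(a * x + b == spec(x) for x in examples)),
            None,
        )
        if candidate is None:
            return None
        a, b = candidate
        # Verification step: look for an input where the candidate disagrees with spec
        counterexample = next((x for x in domain if a * x + b != spec(x)), None)
        if counterexample is None:
            return candidate, examples
        examples.append(counterexample)


result = cegis(lambda x: 2 * x - 3, range(-20, 21), range(-5, 6))
print(result)  # ((2, -3), [-20])
```

Here a single counterexample suffices to pin down the program, which illustrates why CEGIS often needs far fewer examples than the size of the input domain.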

Part III: Probabilistic Programming
----------------------------------------------

Probabilistic programming is an emerging direction, recently also pushed by various companies (e.g., Facebook, Uber, Google), whose goal is to democratize the construction of probabilistic models. In probabilistic programming, the user specifies a model while inference is left to the underlying solver. The idea is that the higher level of abstraction makes it easier to express, understand, and reason about probabilistic models.

Topics:

- Probabilistic Inference: sampling-based and exact symbolic inference, semantics
- Applications of probabilistic programming: bias in deep learning, differential privacy (connects to Part I).
- Frameworks: PSI, Edward2, Venture.
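The division of labour described above (the user writes a generative model, a generic engine performs inference) can be illustrated in plain Python. This is not the API of PSI, Edward2, or Venture; it is a hypothetical sketch in which the model is an ordinary function and the engine is rejection sampling, the simplest sampling-based inference method.

```python
import random


def two_coins(rng):
    """Generative model: two fair coins; we observe whether at least one shows heads."""
    first = rng.random() < 0.5
    second = rng.random() < 0.5
    return first, (first or second)  # (latent variable, observed variable)


def rejection_inference(model, observation, num_samples, rng):
    """Generic engine: keep latent values from runs whose output matches the observation."""
    accepted = []
    while len(accepted) < num_samples:
        latent, observed = model(rng)
        if observed == observation:
            accepted.append(latent)
    return accepted


rng = random.Random(0)
samples = rejection_inference(two_coins, True, 20_000, rng)
print(sum(samples) / len(samples))  # close to the exact posterior 2/3
```

The engine knows nothing about coins: any function returning a (latent, observed) pair can be plugged in, which is the abstraction probabilistic programming aims for. Rejection sampling becomes impractical when the observation is unlikely, which is why real systems use the more sophisticated sampling-based and exact symbolic methods listed above.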
Prerequisites / Notice: The course material is self-contained: the necessary background is covered in the lectures and exercises, together with additional pointers.
263-3210-00L | Deep Learning (restricted registration) | W | 4 credits | 2V + 1U | F. Perez Cruz
Number of participants limited to 300.
Abstract: Deep learning is an area within machine learning that deals with algorithms and models that automatically induce multi-level data representations.
Learning objective: In recent years, deep learning and deep networks have significantly improved the state of the art in many application domains such as computer vision, speech recognition, and natural language processing. This class will cover the mathematical foundations of deep learning and provide insights into model design, training, and validation. The main objective is a thorough understanding of how and why these methods work. There will also be a rich set of hands-on tasks and practical projects to familiarize students with this emerging technology.
Prerequisites / Notice: This is an advanced-level course that requires some basic background in machine learning. More importantly, students are expected to have a very solid mathematical foundation, including linear algebra, multivariate calculus, and probability. The course will make heavy use of mathematics and is not (!) meant to be an extended tutorial on how to train deep networks with tools like Torch or TensorFlow, although that may be a side benefit.

Participation in the course is subject to the following conditions:
1) The number of participants is limited to 300 students (MSc and PhD).
2) Students must have taken the exam in Machine Learning (252-0535-00) or have acquired equivalent knowledge (see the exhaustive list below):

Machine Learning
https://ml2.inf.ethz.ch/courses/ml/

Computational Intelligence Lab
http://da.inf.ethz.ch/teaching/2018/CIL/

Learning and Intelligent Systems/Introduction to Machine Learning
https://las.inf.ethz.ch/teaching/introml-S18

Statistical Learning Theory
http://ml2.inf.ethz.ch/courses/slt/

Computational Statistics
https://stat.ethz.ch/lectures/ss18/comp-stats.php

Probabilistic Artificial Intelligence
https://las.inf.ethz.ch/teaching/pai-f17

Data Mining: Learning from Large Data Sets
https://las.inf.ethz.ch/teaching/dm-f17
263-5210-00L | Probabilistic Artificial Intelligence | W | 4 credits | 2V + 1U | A. Krause
Abstract: This course introduces core modeling techniques and algorithms from statistics, optimization, planning, and control, and studies applications in areas such as sensor networks, robotics, and the Internet.
Learning objective: How can we build systems that perform well in uncertain environments and unforeseen situations? How can we develop systems that exhibit "intelligent" behavior without prescribing explicit rules? How can we build systems that learn from experience in order to improve their performance? We will study core modeling techniques and algorithms from statistics, optimization, planning, and control, along with applications in areas such as sensor networks, robotics, and the Internet. The course is designed for upper-level undergraduate and graduate students.
Content: Topics covered:
- Search (BFS, DFS, A*), constraint satisfaction and optimization
- Tutorial in logic (propositional, first-order)
- Probability
- Bayesian Networks (models, exact and approximate inference, learning)
- Temporal models (Hidden Markov Models, Dynamic Bayesian Networks)
- Probabilistic planning (MDPs, POMDPs)
- Reinforcement learning
- Combining logic and probability
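A* expands nodes in order of f(n) = g(n) + h(n), the cost so far plus a heuristic estimate of the remaining cost; with an admissible heuristic it returns an optimal path. The sketch below uses Python's `heapq` module as the priority queue and finds shortest-path lengths on a hypothetical 4-connected grid with unit step costs and the Manhattan-distance heuristic.

```python
import heapq


def a_star_grid(grid, start, goal):
    """Length of a shortest path on a grid (0 = free, 1 = wall), or None if unreachable."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan distance: admissible for unit-cost 4-connected moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    best_g = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g[cell]:
            continue  # stale queue entry; a cheaper path to this cell was found later
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < best_g.get(nxt, float("inf")):
                    best_g[nxt] = new_g
                    heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt))
    return None


grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(a_star_grid(grid, (0, 0), (2, 0)))  # 6: around the wall via the right column
```

Setting h to zero turns the same code into uniform-cost (Dijkstra) search; the heuristic only changes the order of expansion, not the optimality of the result.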
Prerequisites / Notice: Solid basic knowledge of statistics, algorithms, and programming.
Big Data Systems
Number | Title | Type | ECTS | Hours | Lecturers
252-0834-00L | Information Systems for Engineers | W | 4 credits | 2V + 1U | G. Fourny
Abstract: This course provides the basics of relational databases from the perspective of the user.

We will discover why tables are so incredibly powerful for expressing relations, learn the SQL query language, and learn how to make the most of it. The course also covers support for data cubes (analytics).

After this course, you will be ready for Big Data for Engineers.
Learning objective: After attending this course, you will be able to:

1. Explain in your own words, in the big picture, how a relational database works and what it can do.

2. Explain the relational data model (tables, rows, attributes, primary keys, foreign keys), formally and informally, including the relational algebra operators (select, project, rename, all kinds of joins, division, Cartesian product, union, intersection, etc.).

3. Write non-trivial SQL queries that read from existing relational databases, as well as insert new data and update and delete existing data.

4. Design new schemas to store data in accordance with real-world constraints, such as relationship cardinality.

5. Explain what bad design is and why it matters.

6. Adapt and improve an existing schema to make it more robust against anomalies, thanks to a very good theoretical knowledge of what is called "normal forms".

7. Understand how indices work (hash indices, B-trees), how they are implemented, and how to use them to make queries faster.

8. Access an existing relational database from a host language such as Java, using bridges such as JDBC.

9. Explain what data independence is all about and why it has not aged a bit since the 1970s.

10. Explain, in the big picture, how a relational database is physically implemented.

11. Know and work with CSV, the natural syntax for relational data.

12. Explain the data cube model including slicing and dicing.

13. Store data cubes in a relational database.

14. Map cube queries to SQL.

15. Slice and dice cubes in a UI.

And of course, you will think that tables are the most wonderful object in the world.
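Several of these objectives (primary and foreign keys, SQL data definition and queries, joins, indices, and access from a host language) can be tried out with Python's built-in `sqlite3` module, which plays a role similar to JDBC for Java. The schema and data below are hypothetical; SQLite is used only because it needs no server, and its SQL dialect differs in places from the standard material covered in the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department (dept_id),
        salary  INTEGER
    );
    CREATE INDEX employee_dept_idx ON employee (dept_id);  -- index on the join column
    """
)
conn.executemany("INSERT INTO department VALUES (?, ?)", [(1, "Research"), (2, "Sales")])
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Ada", 1, 120), (2, "Edgar", 1, 110), (3, "Grace", 2, 90)],
)

# Join plus aggregation: one row per department with head count and salary total
rows = conn.execute(
    """
    SELECT d.name, COUNT(*) AS head_count, SUM(e.salary) AS total_salary
    FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id
    GROUP BY d.name
    ORDER BY d.name
    """
).fetchall()
print(rows)  # [('Research', 2, 230), ('Sales', 1, 90)]
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid SQL injection when a host language talks to a database.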
Content:
Using a relational database
=================
1. Introduction
2. The relational model
3. Data definition with SQL
4. The relational algebra
5. Queries with SQL

Taking a relational database to the next level
=================
6. Database design theory
7. Databases and host languages
8. Databases and host languages
9. Indices and optimization
10. Database architecture and storage

Analytics on top of a relational database
=================
12. Data cubes

Outlook
=================
13. Outlook
Literature:
- Lecture material (slides).

- Book: "Database Systems: The Complete Book", H. Garcia-Molina, J. D. Ullman, J. Widom
(It is not necessary to buy the book, as the library has it.)
Prerequisites / Notice: For non-CS/DS students only, BSc and MSc.
Elementary knowledge of set theory and logic.
Knowledge of, and basic experience with, a programming language such as Pascal, C, C++, Java, Haskell, or Python.
263-2800-00L | Design of Parallel and High-Performance Computing | W | 7 credits | 3V + 2U + 1A | T. Hoefler, M. Püschel
Abstract: Advanced topics in parallel/concurrent programming.
Learning objective: Understand concurrency paradigms and models from a higher perspective and acquire skills for designing, structuring, and developing possibly large concurrent software systems. Become able to distinguish parallelism in problem space and in machine space. Become familiar with important technical concepts and with concurrency folklore.
263-3010-00L | Big Data (restricted registration) | W | 8 credits | 3V + 2U + 2A | G. Fourny
Abstract: The key challenge of the information society is to turn data into information, information into knowledge, and knowledge into value. This has become increasingly complex. Data comes in larger volumes, in diverse shapes, and from different sources. Data is more heterogeneous and less structured than forty years ago. Nevertheless, it still needs to be processed fast, with support for complex operations.
Learning objective: This combination of requirements, together with the technologies that have emerged to address them, is typically referred to as "Big Data." This revolution has led to a completely new way to do business, e.g., to develop new products and business models, but also to do science, which is sometimes referred to as data-driven science or the "fourth paradigm".

Unfortunately, the quantity of data produced and available -- now in the Zettabyte range (that's 21 zeros) per year -- keeps growing faster than our ability to process it. Hence, new architectures and approaches for processing it were and are still needed. Harnessing them must involve a deep understanding of data not only in the large, but also in the small.

The field of databases evolves at a fast pace. To be prepared, as far as possible, for the (r)evolutions that will take place in the next few decades, the lecture will emphasize paradigms and core design ideas, while today's technologies will serve as supporting illustrations of them.

After attending this lecture, you should have gained an overview and understanding of the Big Data landscape, which is the basis for making informed decisions, i.e., picking and orchestrating the relevant technologies to address each business use case efficiently and consistently.
Content: This course gives an overview of database technologies and of the most important database design principles that lay the foundations of the Big Data universe. The material is organized along three axes: data in the large, data in the small, data in the very small. A broad range of aspects is covered, with a focus on how they all fit together in the big picture of the Big Data ecosystem.

- physical storage: distributed file systems (HDFS), object storage (S3), key-value stores

- logical storage: document stores (MongoDB), column stores (HBase), graph databases (neo4j), data warehouses (ROLAP)

- data formats and syntaxes (XML, JSON, RDF, Turtle, CSV, XBRL, YAML, protocol buffers, Avro)

- data shapes and models (tables, trees, graphs, cubes)

- type systems and schemas: atomic types, structured types (arrays, maps), set-based type systems (?, *, +)

- an overview of functional, declarative programming languages across data shapes (SQL, XQuery, JSONiq, Cypher, MDX)

- the most important query paradigms (selection, projection, joining, grouping, ordering, windowing)

- paradigms for parallel processing, two-stage (MapReduce) and DAG-based (Spark)

- resource management (YARN)

- what a data center is made of and why it matters (racks, nodes, ...)

- underlying architectures (internal machinery of HDFS, HBase, Spark, neo4j)

- optimization techniques (functional and declarative paradigms, query plans, rewrites, indexing)

- applications.

Large scale analytics and machine learning are outside of the scope of this course.
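The two-stage MapReduce paradigm can be simulated in a single process to show its structure: a map function emits key-value pairs, a shuffle groups values by key, and a reduce function aggregates each group. This word-count sketch is a hypothetical illustration only; real frameworks such as Hadoop MapReduce or Spark distribute these stages across the nodes of a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit (word, 1) for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values that share a key."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big ideas need big systems"]))
# {'big': 3, 'data': 1, 'ideas': 1, 'need': 1, 'systems': 1}
```

Because map calls are independent per document and reduce calls are independent per key, both stages parallelize naturally; the shuffle is the only step that moves data between machines.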
LiteraturePapers from scientific conferences and journals. References will be given as part of the course material during the semester.
Prerequisites / Notice: This course, in the autumn semester, is only intended for:
- Computer Science students
- Data Science students
- CBB students with a Computer Science background

Mobility students in CS are also welcome and encouraged to attend. If you experience any issues while registering, please contact the study administration, who will gladly add you.

Another version of this course will be offered in Spring for students of other departments. However, if you would like to already start learning about databases now, a course worth taking as a preparation/good prequel to the Spring edition of Big Data is the "Information Systems for Engineers" course, offered this Fall for other departments as well, and introducing relational databases and SQL.