Search result: Catalogue data in Autumn Semester 2018

DAS in Data Science
Core Courses
Introductory Courses
Number | Title | Type | ECTS | Hours | Lecturers
227-0427-00L | Signal Analysis, Models, and Machine Learning | W | 6 credits | 4G | H.‑A. Loeliger
Abstract: Mathematical methods in signal processing and machine learning.
I. Linear signal representation and approximation: Hilbert spaces, LMMSE estimation, regularization and sparsity.
II. Learning linear and nonlinear functions and filters: neural networks, kernel methods.
III. Structured statistical models: hidden Markov models, factor graphs, Kalman filter, Gaussian models with sparse events.
Objective: The course is an introduction to some basic topics in signal processing and machine learning.
Content: Part I - Linear Signal Representation and Approximation: Hilbert spaces, least squares and LMMSE estimation, projection and estimation by linear filtering, learning linear functions and filters, L2 regularization, L1 regularization and sparsity, singular-value decomposition and pseudo-inverse, principal-components analysis.
Part II - Learning Nonlinear Functions: fundamentals of learning, neural networks, kernel methods.
Part III - Structured Statistical Models and Message Passing Algorithms: hidden Markov models, factor graphs, Gaussian message passing, Kalman filter and recursive least squares, Monte Carlo methods, parameter estimation, expectation maximization, linear Gaussian models with sparse events.
Lecture notes: Lecture notes.
Prerequisites / Notice: Prerequisites:
- local Bachelor's students: course "Discrete-Time and Statistical Signal Processing" (5th semester)
- others: solid basics in linear algebra and probability theory
Capstone Project
Number | Title | Type | ECTS | Hours | Lecturers
266-0100-00L | Capstone Project | O | 8 credits | 17A | Professors
Restricted registration.
Does not take place this semester.
Only for DAS in Data Science.
Abstract:
Objective:
Specializations
Hardware for Machine Learning
Offered in the spring semester.
Image Analysis & Computer Vision
Number | Title | Type | ECTS | Hours | Lecturers
263-5902-00L | Computer Vision | W | 6 credits | 3V + 1U + 1A | M. Pollefeys, V. Ferrari, L. Van Gool
Abstract: The goal of this course is to provide students with a good understanding of computer vision and image analysis techniques. The main concepts and techniques will be studied in depth and practical algorithms and approaches will be discussed and explored through the exercises.
Objective: The objectives of this course are:
1. To introduce the fundamental problems of computer vision.
2. To introduce the main concepts and techniques used to solve those.
3. To enable participants to implement solutions for reasonably complex problems.
4. To enable participants to make sense of the computer vision literature.
Content: Camera models and calibration, invariant features, multiple-view geometry, model fitting, stereo matching, segmentation, 2D shape matching, shape from silhouettes, optical flow, structure from motion, tracking, object recognition, object category recognition
Prerequisites / Notice: It is recommended that students have taken the Visual Computing lecture or a similar course introducing basic image processing concepts before taking this course.
Neural Information Processing
Number | Title | Type | ECTS | Hours | Lecturers
227-1033-00L | Neuromorphic Engineering I | W | 6 credits | 2V + 3U | T. Delbrück, G. Indiveri, S.‑C. Liu
Restricted registration. Registration in this class requires the permission of the instructors. Class size will be limited to available lab spots.
Preference is given to students that require this class as part of their major.
Abstract: This course covers analog circuits with emphasis on neuromorphic engineering: MOS transistors in CMOS technology, static circuits, dynamic circuits, systems (silicon neuron, silicon retina, silicon cochlea) with an introduction to multi-chip systems. The lectures are accompanied by weekly laboratory sessions.
Objective: Understanding of the characteristics of neuromorphic circuit elements.
Content: Neuromorphic circuits are inspired by the organizing principles of biological neural circuits. Their computational primitives are based on physics of semiconductor devices. Neuromorphic architectures often rely on collective computation in parallel networks. Adaptation, learning and memory are implemented locally within the individual computational elements. Transistors are often operated in weak inversion (below threshold), where they exhibit exponential I-V characteristics and low currents. These properties lead to the feasibility of high-density, low-power implementations of functions that are computationally intensive in other paradigms. Application domains of neuromorphic circuits include silicon retinas and cochleas for machine vision and audition, real-time emulations of networks of biological neurons, and the development of autonomous robotic systems. This course covers devices in CMOS technology (MOS transistor below and above threshold, floating-gate MOS transistor, phototransducers), static circuits (differential pair, current mirror, transconductance amplifiers, etc.), dynamic circuits (linear and nonlinear filters, adaptive circuits), systems (silicon neuron, silicon retina and cochlea) and an introduction to multi-chip systems that communicate events analogous to spikes. The lectures are accompanied by weekly laboratory sessions on the characterization of neuromorphic circuits, from elementary devices to systems.
Literature: S.-C. Liu et al.: Analog VLSI Circuits and Principles; various publications.
Prerequisites / Notice: The course is highly recommended for those who intend to take the spring semester course 'Neuromorphic Engineering II', which teaches the conception, simulation, and physical layout of such circuits with chip design tools.

Prerequisites: Background in basics of semiconductor physics helpful, but not required.
Statistics
Number | Title | Type | ECTS | Hours | Lecturers
401-0625-01L | Applied Analysis of Variance and Experimental Design | W | 5 credits | 2V + 1U | L. Meier
Abstract: Principles of experimental design, one-way analysis of variance, contrasts and multiple comparisons, multi-factor designs and analysis of variance, complete block designs, Latin square designs, random effects and mixed effects models, split-plot designs, incomplete block designs, two-series factorials and fractional designs, power.
Objective: Participants will be able to plan and analyze efficient experiments in the fields of natural sciences. They will gain practical experience by using the software R.
Content: Principles of experimental design, one-way analysis of variance, contrasts and multiple comparisons, multi-factor designs and analysis of variance, complete block designs, Latin square designs, random effects and mixed effects models, split-plot designs, incomplete block designs, two-series factorials and fractional designs, power.
Literature: G. Oehlert: A First Course in Design and Analysis of Experiments, W.H. Freeman and Company, New York, 2000.
Prerequisites / Notice: Both the exercises and the classes will be based on procedures from the freely available, open-source statistical software R, for which an introduction will be given.
401-0649-00L | Applied Statistical Regression | W | 5 credits | 2V + 1U | M. Dettling
Abstract: This course offers a practically oriented introduction to regression modeling methods. The basic concepts and some mathematical background are included, with the emphasis lying on learning "good practice" that can be applied in every student's own projects and daily work life. A special focus will be placed on the use of the statistical software package R for regression analysis.
Objective: The students acquire advanced practical skills in linear regression analysis and are also familiar with its extensions to generalized linear modeling.
Content: The course starts with the basics of linear modeling, and then proceeds to parameter estimation, tests, confidence intervals, residual analysis, model choice, and prediction. More rarely touched but practically relevant topics that will be covered include variable transformations, multicollinearity problems and model interpretation, as well as general modeling strategies.

The last third of the course is dedicated to an introduction to generalized linear models: this includes the generalized additive model, logistic regression for binary response variables, binomial regression for grouped data and Poisson regression for count data.
Lecture notes: A script will be available.
Literature: Faraway (2005): Linear Models with R
Faraway (2006): Extending the Linear Model with R
Draper & Smith (1998): Applied Regression Analysis
Fox (2008): Applied Regression Analysis and GLMs
Montgomery et al. (2006): Introduction to Linear Regression Analysis
Prerequisites / Notice: Both the exercises and the classes will be based on procedures from the freely available, open-source statistical software package R, for which an introduction will be given.

In the Mathematics Bachelor and Master programmes, the two course units 401-0649-00L "Applied Statistical Regression" and 401-3622-00L "Regression" are mutually exclusive. Registration for the examination of one of these two course units is only allowed if you have not registered for the examination of the other course unit.
401-3612-00L | Stochastic Simulation | W | 5 credits | 3G | F. Sigrist
Abstract: This course provides an introduction to statistical Monte Carlo methods. This includes applications of simulations in various fields (Bayesian statistics, statistical mechanics, operations research, financial mathematics), algorithms for the generation of random variables (accept-reject, importance sampling), estimating the precision, variance reduction, and an introduction to Markov chain Monte Carlo.
Objective: Stochastic simulation (also called Monte Carlo method) is the experimental analysis of a stochastic model by implementing it on a computer. Probabilities and expected values can be approximated by averaging simulated values, and the central limit theorem gives an estimate of the error of this approximation. The course shows examples of the many applications of stochastic simulation and explains different algorithms used for simulation. These algorithms are illustrated with the statistical software R.
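The averaging idea in the objective above can be sketched in a few lines. The course itself uses R; the Python below is only an illustration, and the function names (`monte_carlo_mean`, `in_quarter_disc`), the sample size and the quarter-disc example are assumptions made for this sketch, not course material. It estimates a probability by averaging simulated indicator values and uses the central limit theorem to attach an approximate 95% interval.

```python
import math
import random


def monte_carlo_mean(sample, n, seed=0):
    """Average n simulated values; return (estimate, standard error)."""
    rng = random.Random(seed)
    values = [sample(rng) for _ in range(n)]
    mean = sum(values) / n
    # Unbiased sample variance of the simulated values.
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    # By the central limit theorem the estimate is roughly normal
    # with standard deviation sqrt(var / n).
    return mean, math.sqrt(var / n)


def in_quarter_disc(rng):
    """Indicator that a uniform point in the unit square lies in the quarter disc."""
    x, y = rng.random(), rng.random()
    return 1.0 if x * x + y * y <= 1.0 else 0.0


estimate, std_err = monte_carlo_mean(in_quarter_disc, 100_000)
low, high = estimate - 1.96 * std_err, estimate + 1.96 * std_err
print(f"estimate {estimate:.4f}, approx. 95% interval [{low:.4f}, {high:.4f}]")
print(f"exact value pi/4 = {math.pi / 4:.4f}")
```

The standard error shrinks like 1/sqrt(n), so halving the error requires roughly four times as many simulated values.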
Content: Examples of simulations in different fields (computer science, statistics, statistical mechanics, operations research, financial mathematics). Generation of uniform random variables. Generation of random variables with arbitrary distributions (quantile transform, accept-reject, importance sampling), simulation of Gaussian processes and diffusions. The precision of simulations, methods for variance reduction. Introduction to Markov chains and Markov chain Monte Carlo (Metropolis-Hastings, Gibbs sampler, Hamiltonian Monte Carlo, reversible jump MCMC).
Lecture notes: A script will be available in English.
Literature: P. Glasserman: Monte Carlo Methods in Financial Engineering. Springer, 2004.

B. D. Ripley: Stochastic Simulation. Wiley, 1987.

Ch. Robert, G. Casella: Monte Carlo Statistical Methods. Springer, 2004 (2nd edition).
Prerequisites / Notice: Familiarity with basic concepts of probability theory (random variables, joint and conditional distributions, laws of large numbers and central limit theorem) will be assumed.
401-3621-00L | Fundamentals of Mathematical Statistics | W | 10 credits | 4V + 1U | S. van de Geer
Abstract: The course covers the basics of inferential statistics.
Objective:
401-4623-00L | Time Series Analysis | W | 6 credits | 3G | N. Meinshausen
Abstract: Statistical analysis and modeling of observations in temporal order, which exhibit dependence. Stationarity, trend estimation, seasonal decomposition, autocorrelations, spectral and wavelet analysis, ARIMA, GARCH and state space models. Implementations in the software R.
Objective: Understanding of the basic models and techniques used in time series analysis and their implementation in the statistical software R.
Content: This course deals with modeling and analysis of variables which change randomly in time. Their essential feature is the dependence between successive observations. Applications occur in geophysics, engineering, economics and finance. Topics covered: Stationarity, trend estimation, seasonal decomposition, autocorrelations, spectral and wavelet analysis, ARIMA, GARCH and state space models. The models and techniques are illustrated using the statistical software R.
Lecture notes: Not available
Literature: A list of references will be distributed during the course.
Prerequisites / Notice: Basic knowledge in probability and statistics
401-3628-14L | Bayesian Statistics | W | 4 credits | 2V |
Does not take place this semester.
Abstract: Introduction to the Bayesian approach to statistics: Decision theory, prior distributions, hierarchical Bayes models, Bayesian tests and model selection, empirical Bayes, computational methods, Laplace approximation, Monte Carlo and Markov chain Monte Carlo methods.
Objective: Students understand the conceptual ideas behind Bayesian statistics and are familiar with common techniques used in Bayesian data analysis.
Content: Topics that we will discuss are:

Difference between the frequentist and Bayesian approach (decision theory, principles), priors (conjugate priors, Jeffreys priors), tests and model selection (Bayes factors, hyper-g priors in regression), hierarchical models and empirical Bayes methods, computational methods (Laplace approximation, Monte Carlo and Markov chain Monte Carlo methods)
Lecture notes: A script will be available in English.
Literature: Christian Robert, The Bayesian Choice, 2nd edition, Springer 2007.

A. Gelman et al., Bayesian Data Analysis, 3rd edition, Chapman & Hall (2013).

Additional references will be given in the course.
Prerequisites / Notice: Familiarity with basic concepts of frequentist statistics and with basic concepts of probability theory (random variables, joint and conditional distributions, laws of large numbers and central limit theorem) will be assumed.
Machine Learning and Artificial Intelligence
Number | Title | Type | ECTS | Hours | Lecturers
227-0689-00L | System Identification | W | 4 credits | 2V + 1U | R. Smith
Abstract: Theory and techniques for the identification of dynamic models from experimentally obtained system input-output data.
Objective: To provide a series of practical techniques for the development of dynamical models from experimental data, with the emphasis being on the development of models suitable for feedback control design purposes. To provide sufficient theory to enable the practitioner to understand the trade-offs between model accuracy, data quality and data quantity.
Content: Introduction to modeling: Black-box and grey-box models; Parametric and non-parametric models; ARX, ARMAX (etc.) models.

Predictive, open-loop, black-box identification methods. Time and frequency domain methods. Subspace identification methods.

Optimal experimental design, Cramer-Rao bounds, input signal design.

Parametric identification methods. On-line and batch approaches.

Closed-loop identification strategies. Trade-off between controller performance and information available for identification.
Literature: "System Identification; Theory for the User" Lennart Ljung, Prentice Hall (2nd Ed), 1999.

"Dynamic system identification: Experimental design and data analysis", GC Goodwin and RL Payne, Academic Press, 1977.
Prerequisites / Notice: Control systems (227-0216-00L) or equivalent.
252-0535-00L | Advanced Machine Learning | W | 8 credits | 3V + 2U + 2A | J. M. Buhmann
Abstract: Machine learning algorithms provide analytical methods to search data sets for characteristic patterns. Typical tasks include the classification of data, function fitting and clustering, with applications in image and speech analysis, bioinformatics and exploratory data analysis. This course is accompanied by practical machine learning projects.
Objective: Students will be familiarized with advanced concepts and algorithms for supervised and unsupervised learning, and will reinforce the statistics knowledge that is indispensable for solving modeling problems under uncertainty. Key concepts are the generalization ability of algorithms and systematic approaches to modeling and regularization. Machine learning projects will provide an opportunity to test the machine learning algorithms on real world data.
Content: The theory of fundamental machine learning concepts is presented in the lecture, and illustrated with relevant applications. Students can deepen their understanding by solving both pen-and-paper and programming exercises, where they implement and apply famous algorithms to real-world data.

Topics covered in the lecture include:

Fundamentals:
What is data?
Bayesian Learning
Computational learning theory

Supervised learning:
Ensembles: Bagging and Boosting
Max Margin methods
Neural networks

Unsupervised learning:
Dimensionality reduction techniques
Clustering
Mixture Models
Non-parametric density estimation
Learning Dynamical Systems
Lecture notes: No lecture notes, but slides will be made available on the course webpage.
Literature: C. Bishop. Pattern Recognition and Machine Learning. Springer 2007.

R. Duda, P. Hart, and D. Stork. Pattern Classification. John Wiley & Sons, second edition, 2001.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2001.

L. Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer, 2004.
Prerequisites / Notice: The course requires solid basic knowledge in analysis, statistics and numerical methods for CSE as well as practical programming experience for solving assignments.
Students should have followed at least "Introduction to Machine Learning" or an equivalent course offered by another institution.
263-2400-00L | Reliable and Interpretable Artificial Intelligence | W | 4 credits | 2V + 1U | M. Vechev
Abstract: Creating reliable and explainable probabilistic models is a fundamental challenge to solving the artificial intelligence problem. This course covers some of the latest and most exciting advances that bring us closer to constructing such models.
Objective: The main objective of this course is to expose students to the latest and most exciting research in the area of explainable and interpretable artificial intelligence, a topic of fundamental and increasing importance. Upon completion of the course, the students should have mastered the underlying methods and be able to apply them to a variety of problems.

To facilitate deeper understanding, an important part of the course will be a group hands-on programming project where students will build a system based on the learned material.
Content: The course covers the following inter-connected directions.

Part I: Robust and Explainable Deep Learning
-------------------------------------------------------------

Deep learning technology has made impressive advances in recent years. Despite this progress, however, the fundamental challenge with deep learning remains that of understanding what a trained neural network has actually learned, and how stable that solution is. For example: is the network stable to slight perturbations of the input (e.g., an image)? How easy is it to fool the network into misclassifying obvious inputs? Can we guide the network in a manner beyond simple labeled data?

Topics:
- Attacks: Finding adversarial examples via state-of-the-art attacks (e.g., FGSM, PGD attacks).
- Defenses: Automated methods and tools which guarantee robustness of deep nets (e.g., using abstract domains, mixed-integer solvers)
- Combining differentiable logic with gradient-based methods so as to train networks to satisfy richer properties.
- Frameworks: AI2, DiffAI, Reluplex, DQL, DeepPoly, etc.
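
As a rough illustration of the kind of attack named in the topics above, the following sketch applies the fast gradient sign method (FGSM) to a tiny logistic-regression classifier instead of a deep network. The weights, the input and the budget `eps` are hypothetical values chosen for this example, and the helper names are not taken from the course. FGSM moves every input coordinate by `eps` in the direction of the sign of the loss gradient with respect to the input.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic model (hypothetical toy classifier)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One FGSM step: x + eps * sign(gradient of the cross-entropy loss w.r.t. x)."""
    p = predict(w, b, x)
    # For logistic regression with cross-entropy loss, d(loss)/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input, chosen only for illustration.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.3, 0.1, 0.2], 1

x_adv = fgsm(w, b, x, y, eps=0.3)
print("clean prediction:      ", round(predict(w, b, x), 3))
print("adversarial prediction:", round(predict(w, b, x_adv), 3))
```

In this toy setting the perturbation stays within `eps` per coordinate, yet it pushes the predicted probability across the 0.5 decision threshold.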

Part II: Program Synthesis/Induction
------------------------------------------------

Synthesis is a new frontier in AI where the computer programs itself via user-provided examples. Synthesis has significant applications for non-programmers as well as for programmers, where it can provide a massive productivity increase (e.g., wrangling for data scientists). Modern synthesis techniques excel at learning functions over discrete spaces from (partial) intent. There have been a number of recent, exciting breakthroughs in techniques that discover complex, interpretable/explainable functions from few examples, partial sketches and other forms of supervision.

Topics:
- Theory of program synthesis: version spaces, counter-example guided inductive synthesis (CEGIS) with SAT/SMT, lower bounds on learning.
- Applications of techniques: synthesis for end users (e.g., spreadsheets) and data analytics.
- Combining synthesis with learning: application to learning from code.
- Frameworks: PHOG, DeepCode.

Part III: Probabilistic Programming
----------------------------------------------

Probabilistic programming is an emerging direction, recently also pushed by various companies (e.g., Facebook, Uber, Google), whose goal is to democratize the construction of probabilistic models. In probabilistic programming, the user specifies a model while inference is left to the underlying solver. The idea is that the higher level of abstraction makes it easier to express, understand and reason about probabilistic models.

Topics:

- Probabilistic Inference: sampling based, exact symbolic inference, semantics
- Applications of probabilistic programming: bias in deep learning, differential privacy (connects to Part I).
- Frameworks: PSI, Edward2, Venture.
Prerequisites / Notice: The course material is self-contained: the needed background is covered in the lectures and exercises, and additional pointers are provided.
263-3210-00L | Deep Learning | W | 4 credits | 2V + 1U | F. Perez Cruz
Restricted registration. Maximum number of participants: 300
Abstract: Deep learning is an area within machine learning that deals with algorithms and models that automatically induce multi-level data representations.
Objective: In recent years, deep learning and deep networks have significantly improved the state-of-the-art in many application domains such as computer vision, speech recognition, and natural language processing. This class will cover the mathematical foundations of deep learning and provide insights into model design, training, and validation. The main objective is a profound understanding of why these methods work and how. There will also be a rich set of hands-on tasks and practical projects to familiarize students with this emerging technology.
Prerequisites / Notice: This is an advanced level course that requires some basic background in machine learning. More importantly, students are expected to have a very solid mathematical foundation, including linear algebra, multivariate calculus, and probability. The course will make heavy use of mathematics and is not (!) meant to be an extended tutorial of how to train deep networks with tools like Torch or TensorFlow, although that may be a side benefit.

The participation in the course is subject to the following conditions:
1) The number of participants is limited to 300 students (MSc and PhDs).
2) Students must have taken the exam in Machine Learning (252-0535-00) or have acquired equivalent knowledge, see exhaustive list below:

Machine Learning
https://ml2.inf.ethz.ch/courses/ml/

Computational Intelligence Lab
http://da.inf.ethz.ch/teaching/2018/CIL/

Learning and Intelligent Systems/Introduction to Machine Learning
https://las.inf.ethz.ch/teaching/introml-S18

Statistical Learning Theory
http://ml2.inf.ethz.ch/courses/slt/

Computational Statistics
https://stat.ethz.ch/lectures/ss18/comp-stats.php

Probabilistic Artificial Intelligence
https://las.inf.ethz.ch/teaching/pai-f17

Data Mining: Learning from Large Data Sets
https://las.inf.ethz.ch/teaching/dm-f17
263-5210-00L | Probabilistic Artificial Intelligence | W | 4 credits | 2V + 1U | A. Krause
Abstract: This course introduces core modeling techniques and algorithms from statistics, optimization, planning, and control and studies applications in areas such as sensor networks, robotics, and the Internet.
Objective: How can we build systems that perform well in uncertain environments and unforeseen situations? How can we develop systems that exhibit "intelligent" behavior, without prescribing explicit rules? How can we build systems that learn from experience in order to improve their performance? We will study core modeling techniques and algorithms from statistics, optimization, planning, and control and study applications in areas such as sensor networks, robotics, and the Internet. The course is designed for upper-level undergraduate and graduate students.
Content: Topics covered:
- Search (BFS, DFS, A*), constraint satisfaction and optimization
- Tutorial in logic (propositional, first-order)
- Probability
- Bayesian Networks (models, exact and approximate inference, learning)
- Temporal models (Hidden Markov Models, Dynamic Bayesian Networks)
- Probabilistic planning (MDPs, POMDPs)
- Reinforcement learning
- Combining logic and probability
Prerequisites / Notice: Solid basic knowledge in statistics, algorithms and programming
Big Data Systems
Number | Title | Type | ECTS | Hours | Lecturers
252-0834-00L | Information Systems for Engineers | W | 4 credits | 2V + 1U | G. Fourny
Abstract: This course provides the basics of relational databases from the perspective of the user.

We will discover why tables are so incredibly powerful to express relations, learn the SQL query language, and how to make the most of it. The course also covers support for data cubes (analytics).

After this course, you will be ready for Big Data for Engineers.
Objective: After attending this course, you will be able to:

1. Explain in your own words, in the big picture, how a relational database works and what it can do.

2. Explain the relational data model (tables, rows, attributes, primary keys, foreign keys), formally and informally, including the relational algebra operators (select, project, rename, all kinds of joins, division, cartesian product, union, intersection, etc).

3. Perform non-trivial reading SQL queries on existing relational databases, as well as insert new data, update and delete existing data.

4. Design new schemas to store data in accordance with the real world's constraints, such as relationship cardinality.

5. Explain what bad design is and why it matters.

6. Adapt and improve an existing schema to make it more robust against anomalies, thanks to a very good theoretical knowledge of what is called "normal forms".

7. Understand how indices work (hash indices, B-trees), how they are implemented, and how to use them to make queries faster.

8. Access an existing relational database from a host language such as Java, using bridges such as JDBC.

9. Explain what data independence is all about and why it has not aged a bit since the 1970s.

10. Explain, in the big picture, how a relational database is physically implemented.

11. Know and deal with the natural syntax for relational data, CSV.

12. Explain the data cube model including slicing and dicing.

13. Store data cubes in a relational database.

14. Map cube queries to SQL.

15. Slice and dice cubes in a UI.

And of course, you will think that tables are the most wonderful object in the world.
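
To make objectives 13 and 14 above more concrete, here is a minimal sketch of storing a small data cube as a fact table and expressing a slice as a SQL query. It uses Python's built-in `sqlite3` module with an in-memory database. The table name `sales`, its columns and the rows are hypothetical and chosen only for illustration; the course itself may use a different database system and schema.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A fact table: three dimensions (region, year, product) and one measure (amount).
conn.executescript("""
CREATE TABLE sales (region TEXT, year INTEGER, product TEXT, amount REAL);
INSERT INTO sales VALUES
  ('EU', 2017, 'A', 10.0),
  ('EU', 2018, 'A', 12.0),
  ('EU', 2018, 'B', 5.0),
  ('US', 2018, 'A', 7.0);
""")

# Slice on year = 2018, then aggregate the measure over the product dimension
# so that one total per region remains.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE year = ? GROUP BY region ORDER BY region",
    (2018,),
).fetchall()
print(rows)  # [('EU', 17.0), ('US', 7.0)]
conn.close()
```

The `WHERE` clause plays the role of the slice, and `GROUP BY` selects the dimensions that remain visible after aggregation.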
Content: Using a relational database
=================
1. Introduction
2. The relational model
3. Data definition with SQL
4. The relational algebra
5. Queries with SQL

Taking a relational database to the next level
=================
6. Database design theory
7. Databases and host languages
8. Databases and host languages
9. Indices and optimization
10. Database architecture and storage

Analytics on top of a relational database
=================
12. Data cubes

Outlook
=================
13. Outlook
Literature: - Lecture material (slides).

- Book: "Database Systems: The Complete Book", H. Garcia-Molina, J.D. Ullman, J. Widom
(It is not required to buy the book, as the library has it)
Prerequisites / Notice: For non-CS/DS students only, BSc and MSc
Elementary knowledge of set theory and logic
Knowledge as well as basic experience with a programming language such as Pascal, C, C++, Java, Haskell, Python
263-2800-00L | Design of Parallel and High-Performance Computing | W | 7 credits | 3V + 2U + 1A | T. Hoefler, M. Püschel
Abstract: Advanced topics in parallel / concurrent programming.
Objective: Understand concurrency paradigms and models from a higher perspective and acquire skills for designing, structuring and developing possibly large concurrent software systems. Become able to distinguish parallelism in problem space and in machine space. Become familiar with important technical concepts and with concurrency folklore.
263-3010-00L | Big Data | W | 8 credits | 3V + 2U + 2A | G. Fourny
Restricted registration.
Abstract: The key challenge of the information society is to turn data into information, information into knowledge, knowledge into value. This has become increasingly complex. Data comes in larger volumes, diverse shapes, from different sources. Data is more heterogeneous and less structured than forty years ago. Nevertheless, it still needs to be processed fast, with support for complex operations.
Objective: This combination of requirements, together with the technologies that have emerged in order to address them, is typically referred to as "Big Data." This revolution has led to a completely new way to do business, e.g., develop new products and business models, but also to do science -- which is sometimes referred to as data-driven science or the "fourth paradigm".

Unfortunately, the quantity of data produced and available -- now in the Zettabyte range (that's 21 zeros) per year -- keeps growing faster than our ability to process it. Hence, new architectures and approaches for processing it were and are still needed. Harnessing them must involve a deep understanding of data not only in the large, but also in the small.

The field of databases evolves at a fast pace. In order to be prepared, to the extent possible, for the (r)evolutions that will take place in the next few decades, the emphasis of the lecture will be on the paradigms and core design ideas, while today's technologies will serve as supporting illustrations thereof.

After attending this lecture, you should have gained an overview and understanding of the Big Data landscape, which is the basis on which one can make informed decisions, i.e., pick and orchestrate the relevant technologies together for addressing each business use case efficiently and consistently.
Content: This course gives an overview of database technologies and of the most important database design principles that lay the foundations of the Big Data universe. The material is organized along three axes: data in the large, data in the small, data in the very small. A broad range of aspects is covered with a focus on how they all fit together in the big picture of the Big Data ecosystem.

- physical storage: distributed file systems (HDFS), object storage (S3), key-value stores

- logical storage: document stores (MongoDB), column stores (HBase), graph databases (neo4j), data warehouses (ROLAP)

- data formats and syntaxes (XML, JSON, RDF, Turtle, CSV, XBRL, YAML, protocol buffers, Avro)

- data shapes and models (tables, trees, graphs, cubes)

- type systems and schemas: atomic types, structured types (arrays, maps), set-based type systems (?, *, +)

- an overview of functional, declarative programming languages across data shapes (SQL, XQuery, JSONiq, Cypher, MDX)

- the most important query paradigms (selection, projection, joining, grouping, ordering, windowing)

- paradigms for parallel processing, two-stage (MapReduce) and DAG-based (Spark)

- resource management (YARN)

- what a data center is made of and why it matters (racks, nodes, ...)

- underlying architectures (internal machinery of HDFS, HBase, Spark, neo4j)

- optimization techniques (functional and declarative paradigms, query plans, rewrites, indexing)

- applications.

Large scale analytics and machine learning are outside of the scope of this course.
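
The two-stage paradigm mentioned in the list above (MapReduce) can be illustrated with a single-process word count. This is only a sketch of the idea, not of any real framework API: the function names `map_phase`, `shuffle` and `reduce_phase` and the sample documents are assumptions made for this example, and a real system would distribute each phase across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit one (word, 1) pair per word in the document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


# Hypothetical input documents.
documents = ["big data is big", "data in the small"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

Because every map call looks at one document and every reduce call at one key, both phases can run in parallel; only the shuffle step needs to move data between workers.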
Literature: Papers from scientific conferences and journals. References will be given as part of the course material during the semester.
Prerequisites / Notice: This course, in the autumn semester, is only intended for:
- Computer Science students
- Data Science students
- CBB students with a Computer Science background

Mobility students in CS are also welcome and encouraged to attend. If you experience any issue while registering, please contact the study administration and you will be gladly added.

Another version of this course will be offered in Spring for students of other departments. However, if you would like to already start learning about databases now, a course worth taking as a preparation/good prequel to the Spring edition of Big Data is the "Information Systems for Engineers" course, offered this Fall for other departments as well, and introducing relational databases and SQL.