Search result: Catalogue data in Autumn Semester 2018
DAS in Data Science | ||||||
Specialisation Track | ||||||
Hardware for Machine Learning. Offered in the spring semester. | ||||||
Image Analysis & Computer Vision | ||||||
Number | Title | Type | ECTS | Hours | Lecturers | |
---|---|---|---|---|---|---|
263-5902-00L | Computer Vision | W | 6 credits | 3V + 1U + 1A | M. Pollefeys, V. Ferrari, L. Van Gool | |
Abstract | The goal of this course is to provide students with a good understanding of computer vision and image analysis techniques. The main concepts and techniques will be studied in depth and practical algorithms and approaches will be discussed and explored through the exercises. | |||||
Learning objective | The objectives of this course are: 1. To introduce the fundamental problems of computer vision. 2. To introduce the main concepts and techniques used to solve those. 3. To enable participants to implement solutions for reasonably complex problems. 4. To enable participants to make sense of the computer vision literature. | |||||
Content | Camera models and calibration, invariant features, Multiple-view geometry, Model fitting, Stereo Matching, Segmentation, 2D Shape matching, Shape from Silhouettes, Optical flow, Structure from motion, Tracking, Object recognition, Object category recognition | |||||
Prerequisites / Notice | It is recommended that students have taken the Visual Computing lecture or a similar course introducing basic image processing concepts before taking this course. | |||||
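Camera models are the first topic in the content list above. As an illustration only (the course's own notation may differ), the ideal pinhole model maps a point (X, Y, Z) in camera coordinates to pixel coordinates u = f·X/Z + c_x and v = f·Y/Z + c_y. The Python sketch below assumes no lens distortion and square pixels. The intrinsics (f, c_x, c_y) in the usage line are hypothetical values.

```python
def project_point(X, Y, Z, f, cx, cy):
    """Project a 3D point given in camera coordinates onto the image plane
    of an ideal pinhole camera (no lens distortion, square pixels)."""
    if Z <= 0:
        raise ValueError("the point must lie in front of the camera (Z > 0)")
    u = f * X / Z + cx  # perspective division by depth, then shift to the principal point
    v = f * Y / Z + cy
    return u, v


# Hypothetical intrinsics: focal length 800 px, principal point (320, 240)
print(project_point(0.5, -0.25, 2.0, f=800.0, cx=320.0, cy=240.0))  # (520.0, 140.0)
```

Doubling the depth Z halves the offset from the principal point, which is the perspective effect that calibration and multiple-view geometry build on.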
Neural Information Processing | ||||||
Number | Title | Type | ECTS | Hours | Lecturers | |
227-1033-00L | Neuromorphic Engineering I. Registration in this class requires the permission of the instructors. Class size will be limited to available lab spots. Preference is given to students who require this class as part of their major. | W | 6 credits | 2V + 3U | T. Delbrück, G. Indiveri, S.‑C. Liu | |
Abstract | This course covers analog circuits with emphasis on neuromorphic engineering: MOS transistors in CMOS technology, static circuits, dynamic circuits, systems (silicon neuron, silicon retina, silicon cochlea) with an introduction to multi-chip systems. The lectures are accompanied by weekly laboratory sessions. | |||||
Learning objective | Understanding of the characteristics of neuromorphic circuit elements. | |||||
Content | Neuromorphic circuits are inspired by the organizing principles of biological neural circuits. Their computational primitives are based on the physics of semiconductor devices. Neuromorphic architectures often rely on collective computation in parallel networks. Adaptation, learning and memory are implemented locally within the individual computational elements. Transistors are often operated in weak inversion (below threshold), where they exhibit exponential I-V characteristics and low currents. These properties make high-density, low-power implementations feasible for functions that are computationally intensive in other paradigms. Application domains of neuromorphic circuits include silicon retinas and cochleas for machine vision and audition, real-time emulations of networks of biological neurons, and the development of autonomous robotic systems. This course covers devices in CMOS technology (MOS transistor below and above threshold, floating-gate MOS transistor, phototransducers), static circuits (differential pair, current mirror, transconductance amplifiers, etc.), dynamic circuits (linear and nonlinear filters, adaptive circuits), systems (silicon neuron, silicon retina and cochlea) and an introduction to multi-chip systems that communicate events analogous to spikes. The lectures are accompanied by weekly laboratory sessions on the characterization of neuromorphic circuits, from elementary devices to systems. | |||||
Literature | S.-C. Liu et al.: Analog VLSI Circuits and Principles; various publications. | |||||
Prerequisites / Notice | The course is highly recommended for those who intend to take the spring semester course 'Neuromorphic Engineering II', which teaches the conception, simulation, and physical layout of such circuits with chip design tools. Prerequisites: A background in the basics of semiconductor physics is helpful, but not required. | |||||
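The exponential weak-inversion behaviour described in the content above can be illustrated numerically. A minimal Python sketch, assuming the textbook saturation model for an nFET in weak inversion with its source tied to the bulk, I = I0·exp(κ·V_gs/U_T). The values of I0 and κ are illustrative placeholders, not parameters of any real process. The sketch shows why the I-V curve is exponential: each increase of U_T·ln(10)/κ in gate voltage (roughly 85 mV for κ = 0.7) multiplies the current by ten.

```python
import math

U_T = 0.0258  # thermal voltage kT/q at roughly 300 K, in volts
KAPPA = 0.7   # illustrative gate coupling coefficient (placeholder)
I0 = 1e-15    # illustrative pre-exponential current in amperes (placeholder)


def subthreshold_current(v_gs):
    """Saturated drain current of an nFET in weak inversion (source tied to bulk):
    I = I0 * exp(KAPPA * V_gs / U_T). Only meaningful below threshold and for
    V_ds larger than a few U_T."""
    return I0 * math.exp(KAPPA * v_gs / U_T)


# Gate-voltage increase that multiplies the current by ten
decade_step = U_T * math.log(10) / KAPPA
ratio = subthreshold_current(0.3 + decade_step) / subthreshold_current(0.3)
print(f"{decade_step * 1000:.1f} mV per decade, current ratio {ratio:.3f}")
```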
Statistics | ||||||
Number | Title | Type | ECTS | Hours | Lecturers | |
401-0625-01L | Applied Analysis of Variance and Experimental Design | W | 5 credits | 2V + 1U | L. Meier | |
Abstract | Principles of experimental design, one-way analysis of variance, contrasts and multiple comparisons, multi-factor designs and analysis of variance, complete block designs, Latin square designs, random effects and mixed effects models, split-plot designs, incomplete block designs, two-series factorials and fractional designs, power. | |||||
Learning objective | Participants will be able to plan and analyze efficient experiments in the natural sciences. They will gain practical experience using the statistical software R. | |||||
Content | Principles of experimental design, one-way analysis of variance, contrasts and multiple comparisons, multi-factor designs and analysis of variance, complete block designs, Latin square designs, random effects and mixed effects models, split-plot designs, incomplete block designs, two-series factorials and fractional designs, power. | |||||
Literature | G. Oehlert: A First Course in Design and Analysis of Experiments, W.H. Freeman and Company, New York, 2000. | |||||
Prerequisites / Notice | Both the exercises and the classes are based on procedures from the freely available, open-source statistical software R, for which an introduction will be given. | |||||
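One-way analysis of variance, the first model in the content list, compares the spread between group means with the spread within groups. A minimal Python sketch of the F statistic, F = [SS_between/(k-1)] / [SS_within/(n-k)], for k groups and n observations in total. The data are made up. In the course itself this would be done in R (for example with aov()).

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for k groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: how far the group means lie from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: scatter of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]  # made-up data
print(one_way_anova_f(groups))
```

A large F (here the between-group mean square is 19 times the within-group one) is evidence against the hypothesis that all treatment means are equal.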
401-0649-00L | Applied Statistical Regression | W | 5 credits | 2V + 1U | M. Dettling | |
Abstract | This course offers a practically oriented introduction to regression modeling methods. The basic concepts and some mathematical background are included, with the emphasis on learning "good practice" that can be applied in every student's own projects and daily work life. Special focus is placed on the use of the statistical software package R for regression analysis. | |||||
Learning objective | The students acquire advanced practical skills in linear regression analysis and are also familiar with its extensions to generalized linear modeling. | |||||
Content | The course starts with the basics of linear modeling, and then proceeds to parameter estimation, tests, confidence intervals, residual analysis, model choice, and prediction. Less commonly covered but practically relevant topics include variable transformations, multicollinearity problems and model interpretation, as well as general modeling strategies. The last third of the course is dedicated to an introduction to generalized linear models: this includes the generalized additive model, logistic regression for binary response variables, binomial regression for grouped data and Poisson regression for count data. | |||||
Lecture notes | A script will be available. | |||||
Literature | Faraway (2005): Linear Models with R; Faraway (2006): Extending the Linear Model with R; Draper & Smith (1998): Applied Regression Analysis; Fox (2008): Applied Regression Analysis and GLMs; Montgomery et al. (2006): Introduction to Linear Regression Analysis | |||||
Prerequisites / Notice | Both the exercises and the classes are based on procedures from the freely available, open-source statistical software package R, for which an introduction will be given. In the Mathematics Bachelor and Master programmes, the two course units 401-0649-00L "Applied Statistical Regression" and 401-3622-00L "Regression" are mutually exclusive. Registration for the examination of one of these two course units is only allowed if you have not registered for the examination of the other course unit. | |||||
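The basics of linear modeling (parameter estimation and residual analysis) can be sketched for a single predictor. A minimal Python sketch of ordinary least squares for y = b0 + b1·x, using the closed-form estimates b1 = S_xy/S_xx and b0 = ȳ - b1·x̄. The data are made up. In R, the course software, lm(y ~ x) fits the same model.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered cross-product and sum of squares
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residuals are the starting point for residual analysis (plots, outliers, ...)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]  # made-up measurements
b0, b1, res = fit_simple_linear_regression(x, y)
print(f"intercept {b0:.3f}, slope {b1:.3f}")
```

With an intercept in the model, the least-squares residuals always sum to zero, which is a quick sanity check on any implementation.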
401-3612-00L | Stochastic Simulation | W | 5 credits | 3G | F. Sigrist | |
Abstract | This course provides an introduction to statistical Monte Carlo methods. This includes applications of simulations in various fields (Bayesian statistics, statistical mechanics, operations research, financial mathematics), algorithms for the generation of random variables (accept-reject, importance sampling), estimating the precision, variance reduction, introduction to Markov chain Monte Carlo. | |||||
Learning objective | Stochastic simulation (also called Monte Carlo method) is the experimental analysis of a stochastic model by implementing it on a computer. Probabilities and expected values can be approximated by averaging simulated values, and the central limit theorem gives an estimate of the error of this approximation. The course shows examples of the many applications of stochastic simulation and explains different algorithms used for simulation. These algorithms are illustrated with the statistical software R. | |||||
Content | Examples of simulations in different fields (computer science, statistics, statistical mechanics, operations research, financial mathematics). Generation of uniform random variables. Generation of random variables with arbitrary distributions (quantile transform, accept-reject, importance sampling), simulation of Gaussian processes and diffusions. The precision of simulations, methods for variance reduction. Introduction to Markov chains and Markov chain Monte Carlo (Metropolis-Hastings, Gibbs sampler, Hamiltonian Monte Carlo, reversible jump MCMC). | |||||
Lecture notes | A script will be available in English. | |||||
Literature | P. Glasserman, Monte Carlo Methods in Financial Engineering. Springer 2004. B. D. Ripley. Stochastic Simulation. Wiley, 1987. Ch. Robert, G. Casella. Monte Carlo Statistical Methods. Springer 2004 (2nd edition). | |||||
Prerequisites / Notice | Familiarity with basic concepts of probability theory (random variables, joint and conditional distributions, laws of large numbers and central limit theorem) will be assumed. | |||||
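The learning objective above describes the basic Monte Carlo recipe: average simulated values and use the central limit theorem to quantify the error. A minimal Python sketch, assuming we want E[exp(U)] for U uniform on (0, 1), whose exact value e - 1 lets us check the result. The course itself uses R; the choice of integrand and sample size here is purely illustrative.

```python
import math
import random


def monte_carlo_mean(sample, n, seed=0):
    """Estimate E[f(X)] by averaging n simulated values.
    Returns (estimate, standard error), where the standard error
    s / sqrt(n) comes from the central limit theorem."""
    rng = random.Random(seed)
    values = [sample(rng) for _ in range(n)]
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return mean, math.sqrt(var / n)


est, se = monte_carlo_mean(lambda rng: math.exp(rng.random()), n=100_000)
print(f"estimate {est:.4f} +/- {1.96 * se:.4f} (exact value e - 1 = {math.e - 1:.4f})")
```

The error shrinks only like 1/sqrt(n), which is why the variance reduction methods listed in the content matter in practice.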
401-3621-00L | Fundamentals of Mathematical Statistics | W | 10 credits | 4V + 1U | S. van de Geer | |
Abstract | The course covers the basics of inferential statistics. | |||||
Learning objective | ||||||
401-4623-00L | Time Series Analysis | W | 6 credits | 3G | N. Meinshausen | |
Abstract | Statistical analysis and modeling of observations in temporal order, which exhibit dependence. Stationarity, trend estimation, seasonal decomposition, autocorrelations, spectral and wavelet analysis, ARIMA-, GARCH- and state space models. Implementations in the software R. | |||||
Learning objective | Understanding of the basic models and techniques used in time series analysis and their implementation in the statistical software R. | |||||
Content | This course deals with modeling and analysis of variables which change randomly in time. Their essential feature is the dependence between successive observations. Applications occur in geophysics, engineering, economics and finance. Topics covered: Stationarity, trend estimation, seasonal decomposition, autocorrelations, spectral and wavelet analysis, ARIMA-, GARCH- and state space models. The models and techniques are illustrated using the statistical software R. | |||||
Lecture notes | Not available | |||||
Literature | A list of references will be distributed during the course. | |||||
Prerequisites / Notice | Basic knowledge in probability and statistics | |||||
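Autocorrelation, listed in the abstract, measures the dependence between successive observations that the content describes. A minimal Python sketch of the sample autocorrelation function with the usual 1/n normalization (the same convention as R's acf()), applied to a simulated AR(1) process whose theoretical autocorrelation at lag h is φ^h. The value φ = 0.8 and the series length are arbitrary choices for illustration.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r(0), ..., r(max_lag) with the 1/n normalization."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n  # sample autocovariance at lag 0
    acf = []
    for h in range(max_lag + 1):
        ch = sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h)) / n
        acf.append(ch / c0)
    return acf


# Simulate an AR(1) process x[t] = phi * x[t-1] + Gaussian noise
rng = random.Random(1)
phi = 0.8
x = [0.0]
for _ in range(4999):
    x.append(phi * x[-1] + rng.gauss(0.0, 1.0))

r = sample_acf(x, max_lag=3)
print([round(v, 2) for v in r])  # theoretical values: 1, 0.8, 0.64, 0.512
```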
401-3628-14L | Bayesian Statistics. Does not take place this semester. | W | 4 credits | 2V | ||
Abstract | Introduction to the Bayesian approach to statistics: Decision theory, prior distributions, hierarchical Bayes models, Bayesian tests and model selection, empirical Bayes, computational methods, Laplace approximation, Monte Carlo and Markov chain Monte Carlo methods. | |||||
Learning objective | Students understand the conceptual ideas behind Bayesian statistics and are familiar with common techniques used in Bayesian data analysis. | |||||
Content | Topics that we will discuss are: Difference between the frequentist and Bayesian approaches (decision theory, principles), priors (conjugate priors, Jeffreys priors), tests and model selection (Bayes factors, hyper-g priors in regression), hierarchical models and empirical Bayes methods, computational methods (Laplace approximation, Monte Carlo and Markov chain Monte Carlo methods) | |||||
Lecture notes | A script will be available in English. | |||||
Literature | Christian Robert, The Bayesian Choice, 2nd edition, Springer 2007. A. Gelman et al., Bayesian Data Analysis, 3rd edition, Chapman & Hall (2013). Additional references will be given in the course. | |||||
Prerequisites / Notice | Familiarity with basic concepts of frequentist statistics and with basic concepts of probability theory (random variables, joint and conditional distributions, laws of large numbers and central limit theorem) will be assumed. | |||||
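Conjugate priors, listed in the content, make the posterior available in closed form. A minimal Python sketch, assuming binomial data with a Beta(α, β) prior: observing s successes and f failures gives the posterior Beta(α + s, β + f). The prior and the counts are made-up illustrative values.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior combined with binomial data
    yields a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Uniform prior Beta(1, 1); observe 7 successes in 10 trials (made-up data)
post = beta_binomial_update(1, 1, successes=7, failures=3)
print(post, beta_mean(*post))  # posterior Beta(8, 4) with mean 8/12
```

Replacing the uniform prior with the Jeffreys prior Beta(1/2, 1/2) changes only the starting pair; the update rule stays the same.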
Machine Learning and Artificial Intelligence | ||||||
Number | Title | Type | ECTS | Hours | Lecturers | |
227-0689-00L | System Identification | W | 4 credits | 2V + 1U | R. Smith | |
Abstract | Theory and techniques for the identification of dynamic models from experimentally obtained system input-output data. | |||||
Learning objective | To provide a series of practical techniques for the development of dynamical models from experimental data, with the emphasis being on the development of models suitable for feedback control design purposes. To provide sufficient theory to enable the practitioner to understand the trade-offs between model accuracy, data quality and data quantity. | |||||
Content | Introduction to modeling: Black-box and grey-box models; Parametric and non-parametric models; ARX, ARMAX (etc.) models. Predictive, open-loop, black-box identification methods. Time and frequency domain methods. Subspace identification methods. Optimal experimental design, Cramer-Rao bounds, input signal design. Parametric identification methods. On-line and batch approaches. Closed-loop identification strategies. Trade-off between controller performance and information available for identification. | |||||
Literature | "System Identification; Theory for the User" Lennart Ljung, Prentice Hall (2nd Ed), 1999. "Dynamic system identification: Experimental design and data analysis", GC Goodwin and RL Payne, Academic Press, 1977. | |||||
Prerequisites / Notice | Control systems (227-0216-00L) or equivalent. | |||||
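ARX models and parametric identification both appear in the content list. A minimal Python sketch, assuming a first-order ARX model y[t] = a·y[t-1] + b·u[t-1] + e[t] with white equation error, for which ordinary least squares on the regressors (y[t-1], u[t-1]) estimates (a, b). The "true" system, the random binary input and the noise level are hypothetical choices for the demonstration.

```python
import random


def fit_arx1(u, y):
    """Least-squares estimate of (a, b) in y[t] = a*y[t-1] + b*u[t-1] + e[t]."""
    # Accumulate the 2x2 normal equations (Phi^T Phi) theta = Phi^T Y
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for t in range(1, len(y)):
        y1, u1 = y[t - 1], u[t - 1]
        s_yy += y1 * y1
        s_yu += y1 * u1
        s_uu += u1 * u1
        r_y += y1 * y[t]
        r_u += u1 * y[t]
    # Solve the 2x2 system with Cramer's rule
    det = s_yy * s_uu - s_yu * s_yu
    a = (r_y * s_uu - s_yu * r_u) / det
    b = (s_yy * r_u - s_yu * r_y) / det
    return a, b


# Hypothetical system a = 0.7, b = 0.5, excited by a random binary input
rng = random.Random(0)
n = 2000
u = [rng.choice([-1.0, 1.0]) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.7 * y[t - 1] + 0.5 * u[t - 1] + rng.gauss(0.0, 0.05))

a_hat, b_hat = fit_arx1(u, y)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

The random binary input is one simple way to keep the regressors informative; the course's treatment of input signal design addresses this choice systematically.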
252-0535-00L | Advanced Machine Learning | W | 8 credits | 3V + 2U + 2A | J. M. Buhmann | |
Abstract | Machine learning algorithms provide analytical methods to search data sets for characteristic patterns. Typical tasks include the classification of data, function fitting and clustering, with applications in image and speech analysis, bioinformatics and exploratory data analysis. This course is accompanied by practical machine learning projects. | |||||
Learning objective | Students will become familiar with advanced concepts and algorithms for supervised and unsupervised learning, and will reinforce the statistical knowledge that is indispensable for solving modeling problems under uncertainty. Key concepts are the generalization ability of algorithms and systematic approaches to modeling and regularization. Machine learning projects will provide an opportunity to test the machine learning algorithms on real-world data. | |||||
Content | The theory of fundamental machine learning concepts is presented in the lecture and illustrated with relevant applications. Students can deepen their understanding by solving both pen-and-paper and programming exercises, in which they implement well-known algorithms and apply them to real-world data. Topics covered in the lecture include: Fundamentals: What is data?; Bayesian learning; computational learning theory. Supervised learning: ensembles (bagging and boosting); max-margin methods; neural networks. Unsupervised learning: dimensionality reduction techniques; clustering; mixture models; non-parametric density estimation. Learning dynamical systems. | |||||
Lecture notes | No lecture notes, but slides will be made available on the course webpage. | |||||
Literature | C. Bishop. Pattern Recognition and Machine Learning. Springer 2007. R. Duda, P. Hart, and D. Stork. Pattern Classification. John Wiley & Sons, second edition, 2001. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2001. L. Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer, 2004. | |||||
Prerequisites / Notice | The course requires solid basic knowledge in analysis, statistics and numerical methods for CSE as well as practical programming experience for solving assignments. Students should have followed at least "Introduction to Machine Learning" or an equivalent course offered by another institution. | |||||
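Clustering appears among the unsupervised-learning topics. As one standard example (not necessarily the formulation used in the lecture), Lloyd's algorithm for k-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. A minimal Python sketch on two made-up, well-separated 2D blobs:

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm for k-means on 2D points; returns the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty)
        new = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return centroids


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(sorted(kmeans(points, k=2)))
```

Lloyd's algorithm only finds a local optimum in general; on data this well separated it recovers the two blob centres from any initial pair of points.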
263-2400-00L | Reliable and Interpretable Artificial Intelligence | W | 4 credits | 2V + 1U | M. Vechev | |
Abstract | Creating reliable and explainable probabilistic models is a fundamental challenge to solving the artificial intelligence problem. This course covers some of the latest and most exciting advances that bring us closer to constructing such models. | |||||
Learning objective | The main objective of this course is to expose students to the latest and most exciting research in the area of explainable and interpretable artificial intelligence, a topic of fundamental and increasing importance. Upon completion of the course, the students should have mastered the underlying methods and be able to apply them to a variety of problems. To facilitate deeper understanding, an important part of the course will be a group hands-on programming project where students will build a system based on the learned material. | |||||
Content | The course covers the following interconnected directions. Part I: Robust and Explainable Deep Learning ------------------------------------------------------------- Deep learning technology has made impressive advances in recent years. Despite this progress, however, the fundamental challenge with deep learning remains understanding what a trained neural network has actually learned and how stable that solution is. For example: Is the network stable under slight perturbations of the input (e.g., an image)? How easy is it to fool the network into misclassifying obvious inputs? Can we guide the network in ways that go beyond simple labeled data? Topics: - Attacks: finding adversarial examples via state-of-the-art attacks (e.g., FGSM and PGD attacks). - Defenses: automated methods and tools that guarantee the robustness of deep nets (e.g., using abstract domains or mixed-integer solvers). - Combining differentiable logic with gradient-based methods so as to train networks to satisfy richer properties. - Frameworks: AI2, DiffAI, Reluplex, DQL, DeepPoly, etc. Part II: Program Synthesis/Induction ------------------------------------------------ Synthesis is a new frontier in AI in which the computer programs itself from user-provided examples. Synthesis has significant applications for non-programmers as well as for programmers, for whom it can provide massive productivity increases (e.g., data wrangling for data scientists). Modern synthesis techniques excel at learning functions over discrete spaces from (partial) intent. There have been a number of recent, exciting breakthroughs in techniques that discover complex, interpretable/explainable functions from few examples, partial sketches and other forms of supervision. Topics: - Theory of program synthesis: version spaces, counterexample-guided inductive synthesis (CEGIS) with SAT/SMT, lower bounds on learning. - Applications: synthesis for end users (e.g., spreadsheets) and data analytics.
- Combining synthesis with learning: application to learning from code. - Frameworks: PHOG, DeepCode. Part III: Probabilistic Programming ---------------------------------------------- Probabilistic programming is an emerging direction, recently also pushed by various companies (e.g., Facebook, Uber, Google), whose goal is to democratize the construction of probabilistic models. In probabilistic programming, the user specifies a model while inference is left to the underlying solver. The idea is that the higher level of abstraction makes it easier to express, understand and reason about probabilistic models. Topics: - Probabilistic inference: sampling-based and exact symbolic inference, semantics. - Applications of probabilistic programming: bias in deep learning, differential privacy (connects to Part I). - Frameworks: PSI, Edward2, Venture. | |||||
Prerequisites / Notice | The course material is self-contained: the necessary background is covered in the lectures and exercises, with additional pointers provided. | |||||
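The attacks in Part I include FGSM (Fast Gradient Sign Method), which perturbs an input x to x + ε·sign(gradient of the loss with respect to x). The course applies such attacks to deep networks; the minimal Python sketch below instead uses a hand-written logistic-regression classifier so that the input gradient can be written in closed form. The weights, input and ε are hypothetical values chosen so that the attack flips the prediction.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(g):
    """Return -1, 0 or 1 according to the sign of g."""
    return (g > 0) - (g < 0)


def predict_proba(x, w, b):
    """p(y = 1 | x) for a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(x, y, w, b, eps):
    """FGSM against the binary cross-entropy loss of a logistic-regression model.
    For this model the gradient of the loss with respect to x is (p - y) * w."""
    p = predict_proba(x, w, b)
    grad = [(p - y) * wi for wi in w]
    # Step of size eps in every coordinate, in the direction that increases the loss
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


w, b = [2.0, -1.0], 0.0  # hypothetical trained weights
x, y = [0.3, 0.1], 1     # input correctly classified as label 1
x_adv = fgsm(x, y, w, b, eps=0.2)
print(predict_proba(x, w, b), predict_proba(x_adv, w, b))
```

Each coordinate moves by at most ε, so the perturbation stays inside an L-infinity ball around x; PGD repeats such steps and projects back into that ball.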
263-3210-00L | Deep Learning. Number of participants limited to 300. | W | 4 credits | 2V + 1U | F. Perez Cruz | |
Abstract | Deep learning is an area within machine learning that deals with algorithms and models that automatically induce multi-level data representations. | |||||
Learning objective | In recent years, deep learning and deep networks have significantly improved the state-of-the-art in many application domains such as computer vision, speech recognition, and natural language processing. This class will cover the mathematical foundations of deep learning and provide insights into model design, training, and validation. The main objective is a profound understanding of why these methods work and how. There will also be a rich set of hands-on tasks and practical projects to familiarize students with this emerging technology. | |||||
Prerequisites / Notice | This is an advanced-level course that requires some basic background in machine learning. More importantly, students are expected to have a very solid mathematical foundation, including linear algebra, multivariate calculus, and probability. The course will make heavy use of mathematics and is not (!) meant to be an extended tutorial on how to train deep networks with tools like Torch or TensorFlow, although that may be a side benefit. Participation in the course is subject to the following conditions: 1) The number of participants is limited to 300 students (MSc and PhD). 2) Students must have taken the exam in Machine Learning (252-0535-00) or have acquired equivalent knowledge; see the exhaustive list below: Machine Learning https://ml2.inf.ethz.ch/courses/ml/ Computational Intelligence Lab http://da.inf.ethz.ch/teaching/2018/CIL/ Learning and Intelligent Systems/Introduction to Machine Learning https://las.inf.ethz.ch/teaching/introml-S18 Statistical Learning Theory http://ml2.inf.ethz.ch/courses/slt/ Computational Statistics https://stat.ethz.ch/lectures/ss18/comp-stats.php Probabilistic Artificial Intelligence https://las.inf.ethz.ch/teaching/pai-f17 Data Mining: Learning from Large Data Sets https://las.inf.ethz.ch/teaching/dm-f17 | |||||
263-5210-00L | Probabilistic Artificial Intelligence | W | 4 credits | 2V + 1U | A. Krause | |
Abstract | This course introduces core modeling techniques and algorithms from statistics, optimization, planning, and control, and studies applications in areas such as sensor networks, robotics, and the Internet. | |||||
Learning objective | How can we build systems that perform well in uncertain environments and unforeseen situations? How can we develop systems that exhibit "intelligent" behavior, without prescribing explicit rules? How can we build systems that learn from experience in order to improve their performance? We will study core modeling techniques and algorithms from statistics, optimization, planning, and control and study applications in areas such as sensor networks, robotics, and the Internet. The course is designed for upper-level undergraduate and graduate students. | |||||
Content | Topics covered: - Search (BFS, DFS, A*), constraint satisfaction and optimization - Tutorial in logic (propositional, first-order) - Probability - Bayesian networks (models, exact and approximate inference, learning) - Temporal models (Hidden Markov Models, Dynamic Bayesian Networks) - Probabilistic planning (MDPs, POMDPs) - Reinforcement learning - Combining logic and probability | |||||
Prerequisites / Notice | Solid basic knowledge in statistics, algorithms and programming | |||||
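Hidden Markov Models are listed among the temporal models. A minimal Python sketch of the forward algorithm, which computes the probability of an observation sequence by summing over hidden state paths one time step at a time. The two-state "rain/sun" model and its probabilities are a made-up example.

```python
def hmm_forward(init, trans, emit, observations):
    """Forward algorithm: P(observations) under a discrete HMM.
    init[i]     = P(state_0 = i)
    trans[i][j] = P(state_{t+1} = j | state_t = i)
    emit[i][o]  = P(obs_t = o | state_t = i)"""
    n = len(init)
    # alpha[i] = P(obs_0..obs_t, state_t = i)
    alpha = [init[i] * emit[i][observations[0]] for i in range(n)]
    for o in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    return sum(alpha)


init = [0.5, 0.5]                # states: 0 = rain, 1 = sun
trans = [[0.7, 0.3], [0.3, 0.7]]
emit = [[0.9, 0.1], [0.2, 0.8]]  # observations: 0 = umbrella, 1 = no umbrella
print(hmm_forward(init, trans, emit, [0, 0]))
```

The cost is linear in the sequence length (times the square of the number of states), whereas enumerating all hidden paths would be exponential.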
Big Data Systems | ||||||
Number | Title | Type | ECTS | Hours | Lecturers | |
252-0834-00L | Information Systems for Engineers | W | 4 credits | 2V + 1U | G. Fourny | |
Abstract | This course provides the basics of relational databases from the perspective of the user. We will discover why tables are so incredibly powerful to express relations, learn the SQL query language, and how to make the most of it. The course also covers support for data cubes (analytics). After this course, you will be ready for Big Data for Engineers. | |||||
Learning objective | After attending this course, you will be able to: 1. Explain, in the big picture and in your own words, how a relational database works and what it can do. 2. Explain the relational data model (tables, rows, attributes, primary keys, foreign keys), formally and informally, including the relational algebra operators (select, project, rename, all kinds of joins, division, Cartesian product, union, intersection, etc.). 3. Write non-trivial SQL queries that read from existing relational databases, as well as insert new data and update and delete existing data. 4. Design new schemas that store data in accordance with real-world constraints, such as relationship cardinality. 5. Explain what bad design is and why it matters. 6. Adapt and improve an existing schema to make it more robust against anomalies, thanks to a very good theoretical knowledge of what are called "normal forms". 7. Understand how indices work (hash indices, B-trees), how they are implemented, and how to use them to make queries faster. 8. Access an existing relational database from a host language such as Java, using bridges such as JDBC. 9. Explain what data independence is all about and why it has not aged a bit since the 1970s. 10. Explain, in the big picture, how a relational database is physically implemented. 11. Know and deal with the natural syntax for relational data, CSV. 12. Explain the data cube model, including slicing and dicing. 13. Store data cubes in a relational database. 14. Map cube queries to SQL. 15. Slice and dice cubes in a UI. And of course, you will think that tables are the most wonderful object in the world. | |||||
Content | Using a relational database ================= 1. Introduction 2. The relational model 3. Data definition with SQL 4. The relational algebra 5. Queries with SQL Taking a relational database to the next level ================= 6. Database design theory 7. Databases and host languages 8. Databases and host languages 9. Indices and optimization 10. Database architecture and storage Analytics on top of a relational database ================= 12. Data cubes Outlook ================= 13. Outlook | |||||
Literature | - Lecture material (slides). - Book: "Database Systems: The Complete Book", H. Garcia-Molina, J.D. Ullman, J. Widom (It is not required to buy the book, as the library has it) | |||||
Prerequisites / Notice | For non-CS/DS students only, BSc and MSc. Elementary knowledge of set theory and logic. Knowledge of, and basic experience with, a programming language such as Pascal, C, C++, Java, Haskell, or Python. | |||||
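Several of the learning objectives (primary and foreign keys, joins, SQL queries issued from a host language, and mapping a simple roll-up to SQL GROUP BY) can be seen together in a small example. The course names Java and JDBC as the host-language bridge; the sketch below swaps in Python's built-in sqlite3 module instead, and the department/employee schema is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  INTEGER NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)  -- foreign key
    );
    INSERT INTO department VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO employee VALUES
        (1, 'Ada', 7000, 1), (2, 'Alan', 6500, 1), (3, 'Grace', 5000, 2);
""")

# Join the two relations, then aggregate along the department dimension (a roll-up)
rows = conn.execute("""
    SELECT d.name, COUNT(*) AS headcount, SUM(e.salary) AS total_salary
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()
print(rows)  # [('Research', 2, 13500), ('Sales', 1, 5000)]
```

Because foreign keys are enabled, inserting an employee with a non-existent dept_id would be rejected with an integrity error.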
263-2800-00L | Design of Parallel and High-Performance Computing | W | 7 credits | 3V + 2U + 1A | T. Hoefler, M. Püschel | |
Abstract | Advanced topics in parallel / concurrent programming. | |||||
Learning objective | Understand concurrency paradigms and models from a higher perspective and acquire skills for designing, structuring and developing possibly large concurrent software systems. Become able to distinguish parallelism in problem space and in machine space. Become familiar with important technical concepts and with concurrency folklore. | |||||
263-3010-00L | Big Data | W | 8 credits | 3V + 2U + 2A | G. Fourny | |
Abstract | The key challenge of the information society is to turn data into information, information into knowledge, knowledge into value. This has become increasingly complex. Data comes in larger volumes, diverse shapes, from different sources. Data is more heterogeneous and less structured than forty years ago. Nevertheless, it still needs to be processed fast, with support for complex operations. | |||||
Learning objective | This combination of requirements, together with the technologies that have emerged to address them, is typically referred to as "Big Data." This revolution has led to a completely new way of doing business, e.g., developing new products and business models, but also of doing science, which is sometimes referred to as data-driven science or the "fourth paradigm". Unfortunately, the quantity of data produced and available, now in the zettabyte range (that's 21 zeros) per year, keeps growing faster than our ability to process it. Hence, new architectures and approaches for processing it were, and still are, needed. Harnessing them requires a deep understanding of data not only in the large, but also in the small. The field of databases evolves at a fast pace. To be prepared, to the extent possible, for the (r)evolutions that will take place in the next few decades, the lecture emphasizes paradigms and core design ideas, while today's technologies serve as supporting illustrations thereof. After attending this lecture, you should have gained an overview and understanding of the Big Data landscape, which is the basis for making informed decisions, i.e., picking and orchestrating the relevant technologies to address each business use case efficiently and consistently. | |||||
Content | This course gives an overview of database technologies and of the most important database design principles that lay the foundations of the Big Data universe. The material is organized along three axes: data in the large, data in the small, data in the very small. A broad range of aspects is covered, with a focus on how they all fit together in the big picture of the Big Data ecosystem. - physical storage: distributed file systems (HDFS), object storage (S3), key-value stores - logical storage: document stores (MongoDB), column stores (HBase), graph databases (neo4j), data warehouses (ROLAP) - data formats and syntaxes (XML, JSON, RDF, Turtle, CSV, XBRL, YAML, protocol buffers, Avro) - data shapes and models (tables, trees, graphs, cubes) - type systems and schemas: atomic types, structured types (arrays, maps), set-based type systems (?, *, +) - an overview of functional, declarative programming languages across data shapes (SQL, XQuery, JSONiq, Cypher, MDX) - the most important query paradigms (selection, projection, joining, grouping, ordering, windowing) - paradigms for parallel processing, two-stage (MapReduce) and DAG-based (Spark) - resource management (YARN) - what a data center is made of and why it matters (racks, nodes, ...) - underlying architectures (internal machinery of HDFS, HBase, Spark, neo4j) - optimization techniques (functional and declarative paradigms, query plans, rewrites, indexing) - applications. Large-scale analytics and machine learning are outside the scope of this course. | |||||
Literature | Papers from scientific conferences and journals. References will be given as part of the course material during the semester. | |||||
Prerequisites / Notice | This course, in the autumn semester, is only intended for: - Computer Science students - Data Science students - CBB students with a Computer Science background. Mobility students in CS are also welcome and encouraged to attend. If you experience any issue while registering, please contact the study administration and you will gladly be added. Another version of this course will be offered in the spring semester for students of other departments. However, if you would like to start learning about databases now, a good preparation for (or prequel to) the spring edition of Big Data is the "Information Systems for Engineers" course, also offered this autumn for other departments, which introduces relational databases and SQL. |
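The content lists MapReduce as the two-stage paradigm for parallel processing. A minimal single-process Python sketch of its steps (map, shuffle, reduce) for the classic word-count task. A real framework such as Hadoop distributes the map and reduce calls across nodes and performs the shuffle over the network, which this sketch does not attempt; the function names are illustrative, not any framework's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group intermediate values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


print(word_count(["big data big ideas", "data beats ideas"]))
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can run in parallel; only the shuffle needs global communication.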