Biomedical datasets are growing in size and complexity, and discoveries arising from their analysis have important implications for human health and biotechnological advances. Although the potential of biomedical dataset analysis is considerable, preclinical researchers often lack the computational tools to analyze such data. This course will provide the foundations for analyzing large biomedical datasets.
Learning objective
This course aims to provide practical tools for analyzing large biomedical datasets. It is tailored towards experimental researchers in the life sciences who have minimal prior programming experience but a strong interest in exploring big data to solve their own research problems. Through theoretical classes, practical demonstrations, in-class exercises and homework, participants will master computational methods to independently manipulate large datasets, visualize big data effectively, and analyze it with appropriate statistical tools and machine learning approaches. For the final assessment, students will conduct an independent data analysis project on a biomedical problem of their choice, using publicly available population-based biomedical datasets.
Content
While learning the programming skills needed to manipulate and visualize the data, participants will learn the statistical and modeling approaches used in big data analysis. The course will cover:
• Basics of Python programming and UNIX;
• High-performance computing;
• Manipulation and cleaning of large datasets with Pandas;
• Visualization tools (Matplotlib, Seaborn);
• Machine learning and numerical libraries (SciPy, NumPy, Statsmodels, Scikit-Learn);
• Statistical analysis and modeling of big data, with applications to biomedical datasets (statistical learning, distributions, linear and logistic regression, principal component analysis, clustering, classification, time series analysis, tree-based methods, predictive models).
Prerequisites / Notice
A basic understanding of mathematics and statistics, as taught in introductory courses at the Bachelor's level.
Performance assessment
Performance assessment information (valid until the course unit is held again)