364-1140-00L  Hacking for Social Sciences - An Applied Guide to Programming with Data

Semester: Autumn Semester 2021
Lecturers: M. Bannert
Periodicity: yearly recurring course
Language of instruction: English
Comment: Basic experience with either R or Python, e.g., a stats course that was taught using R.


Abstract: The vast majority of data has been created within the last decade. As a result, more and more fields of research are starting to consider and embrace programming to process and analyse data. This course teaches applied programming with data and aims to leverage the open source tech stack to deal with this new wealth and complexity of data.
Objective: The idea behind Hacking for Social Sciences is to build a solid understanding of core technologies and concepts that helps researchers develop a data processing strategy and widens their options when working with data. The course singles out those concepts from software development that are easy to adopt and useful to social scientists. It has three major learning objectives:

- Understand the role of focal components in a data science tech toolbox.
Learn how technologies like R, Python, Git version control, Docker or cloud computing can work together in your research project.
- Learn how to manage and version control source code.
Hacking for Social Sciences teaches how to use Git version control to collaborate professionally, make your research reproducible and your code base persistent.
- Applied data sourcing and data transformation.
Learn how to communicate with SQL databases. Learn how to consume data from different sources using machine-to-machine communication interfaces (APIs) such as the OpenStreetMap geocoding API / Routing Engine or the KOF data API for macroeconomic time series.

Non-Goals:
Hacking for Social Sciences is not a Statistics, Econometrics or Machine Learning course. Experience in these fields helps inasmuch as students will find it easier to motivate the investment in programming and to come up with their own application examples, but profound methodological knowledge is not a prerequisite.
Content: Hacking for Social Scientists is a guide to programming with data. It is tailored to the needs of a field in which scholars’ typical curricula do not contain a strong programming component. Yet this course argues that what the open source community calls a ‘software carpentry’ level is well within reach for a quantitative social scientist and worth the investment: being able to code leverages field-specific expertise and fosters interdisciplinary collaboration, as source code continues to become an important communication channel.

The course contains three blocks that largely follow the three learning objectives presented above. Hacking for Social Sciences deliberately spreads its three blocks over 1-2 months so that students can work on applied examples between sessions and get the most out of the subsequent session.

The first block demonstrates the components of a modern data science tech stack, classifies technologies and gives a big picture overview: from languages such as R and Python to container technology such as Docker. The second block focuses on Git version control, the de facto industry standard to manage source code. Version control is not only crucial to knowledge management and reproducible research, but it is also the backbone of collaboration in distributed teams. The third and final block focuses on data themselves and teaches how to obtain data through machine-to-machine communication. Furthermore, the third block discusses data management in a research project.
Lecture notes: A free and open online book (made with bookdown) is available from https://h4sci.github.io/h4sci-book/. The book/script will be continuously updated during the course to account for participants' questions.
All course materials, including slides, resources and source code, will be made available through: https://h4sci.github.io/
Literature: A free and open online book (made with bookdown) is available from https://h4sci.github.io/h4sci-book/. The book/script will be continuously updated during the course to account for participants' questions.
All course materials, including slides, resources and source code, will be made available through: https://h4sci.github.io/
Prerequisites / Notice: Basic experience with either R or Python, e.g., a stats course that was taught using R.