- Introduction to Data Science
- Getting To Know Your Data
- Overview of Tasks & Techniques: Prediction
- Evaluation and Methodology of Data Science
- Data Engineering
- Overview of Tasks & Techniques: Probabilistic Models
- Overview of Tasks & Techniques: Exploratory Data Mining
- Case Studies in Data Science

- What is data science? Relation to data mining, machine learning, big data and statistics
- Motivating examples
- Why is it interesting?
- Several data science settings
- Introduction to the WEKA tool
- Practical information

- From data to features

- Interactive group discussion
- Representing problems with matrices
- Representing problems with relations
- Example: Text with TFIDF

- Computing simple

- Boxplots
- Scatterplots
- Time series
- Spatial data

- Case studies

- X & Y examples
- Medical data

- The prediction task

- Definition
- Examples
- Format of input / output data

- Prediction algorithms

- Decision trees
- Rule learners
- Linear/logistic regression
- Nearest neighbour learning
- Support vector machines

- Properties of prediction algorithms and practical exercises
- Combining classifiers

- Experimental setup

- Training, tuning, test data
- Holdout method, cross-validation, bootstrap method

- Measuring performance of a model

- Accuracy, ROC curves, precision-recall curves
- Loss functions for regression

- Interpretation of results

- Confidence interval for accuracy
- Hypothesis tests for comparing models and algorithms

- Attribute selection

- Filter methods
- Wrapper methods

- Data discretization

- Unsupervised discretization
- Supervised discretization

- Data transformations

- PCA and variants

- Exercises

- Introduction

- Probabilities
- Rule of Bayes and Conditional Independence

- Naive Bayes

- Application to spam filtering

- Bayesian Networks

- Graphical representation
- Independence and correlation

- Temporal models

- Markov Chains
- Hidden Markov Models

- Introduction to Exploratory Data Mining
- Association discovery

- What is association discovery?
- What are the challenges?
- In detail: Apriori

- Clustering

- What is clustering?
- What are the challenges?
- In detail: agglomerative clustering

- Hands-on: clustering in WEKA

- Eve, the Pharmaceutical Robot Scientist: Data Science for Drug Discovery
- Data science for sports analytics
- Data science for sensor data (Introduction to challenge)