Introduction

The volume and velocity at which data is collected and stored have increased tremendously over the past decades. To build actionable knowledge from very large data sets in time, it is essential to use techniques specifically designed to scale to such data.

This course provides an overview of the most recent and relevant techniques that facilitate the analysis of very large collections of data. It covers the foundational techniques for coping with big data, such as sampling, hashing, approximation methods, and distributed computing. In the hands-on session, participants analyze a big data set themselves on an Apache Hadoop cluster. Several KU Leuven data mining experts are available to help participants with the hands-on assignment. The course aims to give attendees sufficient expertise to choose an appropriate technique and execution platform for the analysis problems they face.

This course is offered by the Machine Learning research group of the Department of Computer Science. The target audience consists of professionals who need to analyze large amounts of data. Some programming experience is recommended. If you have no programming experience at all, you might benefit more from our data science in practice course.