Pentaho Data Integration Advanced

This training is guaranteed to take place

Pentaho Course ID:



Duration: 2 days


Course description

This instructor-led course builds on participants’ basic knowledge of Pentaho Data Integration (PDI). Beyond the basics of creating transformations and jobs, participants learn how to use PDI in real-world scenarios: exposing PDI as a data source for a variety of visualisation options, using PDI’s streaming data processing capabilities, creating transformations with metadata injection, and scaling and optimising a PDI solution.

Participants benefit in particular from the instructor’s practical experience and from hands-on exercises carried out on a full Pentaho installation in a virtual lab environment.

Target Audience

Anyone who has completed the “Pentaho Data Integration Fundamentals” course (PDI1000S for self-study or PDI1000L, instructor-led), as well as anyone with some initial experience working with PDI.


Learning Goals

  • Reduce manual tasks by leveraging the power of metadata injection.
  • Use PDI as a data source for CDA, Data Services, Snowflake, Google BigQuery, and machine learning applications.
  • Leverage PDI’s stream processing capabilities with MQTT, Kafka, and Amazon Kinesis data streams.
  • Scale PDI using Carte clustering, monitoring, and partitioning.
  • Tune PDI with checkpoints and logging.

Course schedule

Day 1
Module 1 Metadata Injection
Lesson Static Metadata Injection
Lesson Standard Metadata Injection
Lesson Metadata Injection (Push-Pull Modes)
Lesson 2-Phase Metadata Injection
Lesson Using Filters in Metadata Injection
Module 2 PDI as an Enterprise Data Hub
Lesson CDA Datasource
Lesson Data Services
Lesson Connecting to a Snowflake Database
Lesson Pentaho Data Integration and Google BigQuery
Lesson Use Case: Credit Card Fraud
Day 2
Module 3 Data Streaming
Lesson MQTT – Mosquitto Service
Lesson MQTT – Sensor Data (IoT)
Lesson Services – ZooKeeper and Kafka
Lesson Kafka – Sensor Data
Lesson Amazon Kinesis Data Streams
Module 4 Scaling an Enterprise Solution
Lesson Master and Slave Server
Lesson Clustering and "group by"
Lesson Stream Partitioning
Lesson Checkpoints

Our Pentaho Trainers

Dirk Rönsch

Djordja Markovic

Laziz Karimov

Tom Haupt

Deepening knowledge of software and technology
