Pentaho Data Integration Fundamentals

This training is guaranteed to take place

Pentaho Course ID



3 days


Course description

With data volumes growing all the time, companies need quick and easy ways to use their data and gain deeper insights. The primary challenge is maintaining a consistent, unified version of information across all sources, in a form suitable for analysis. Pentaho Data Integration provides powerful extract, transform and load (ETL) capabilities, combined with an intuitive, professional graphical development environment and an open, standards-based architecture.

Pentaho Data Integration offers a comprehensive ETL solution:

  • A powerful graphical process designer for ETL developers
  • A wide range of interfaces for integrating virtually any type of data, including large and diverse data sets
  • High scalability and performance, including in-memory caching
  • Big Data integration, analysis and reporting (via Hadoop, NoSQL, traditional OLTP and analytical databases)
  • A modern, open and standards-based architecture

The course includes both presentations and hands-on exercises that cover theory, best practices and design patterns.

Target Audience

This is the third course in our data analysis series. It is aimed at participants who already have experience developing or administering databases, or who want to get started with Pentaho Data Integration.


Learning Goals

After completing this course, you will be able to:

  • install Pentaho Data Integration
  • create and execute basic transformations with steps and hops
  • view transformation results in the step metrics and logging views
  • create database connections and explore data sources with the Database Explorer
  • create complex transformations by configuring the following steps: table input, table output, CSV file input, insert/update, add constants, filter rows, value mapper, stream lookup, add cells, merge rows, sort rows, row normaliser, JavaScript, dimension lookup/update, database lookup, get data from XML, set environment variables and analytic query
  • create transformations that use parameterised values
  • map the structure of an online transaction processing (OLTP) database to the structure of an online analytical processing (OLAP) database
  • load data and write it to different databases
  • use ETL templates to populate a data warehouse
  • create transformations that process slowly changing dimensions
  • create Pentaho Data Integration jobs that: execute multiple transformations, use variables, contain sub-jobs, provide built-in error notification, load and process multiple text files, and convert files into Microsoft Excel format
  • configure logging for transformation steps and job entries and check logged data
  • configure error handling for transformation steps
  • configure the Pentaho Enterprise Repository, including basic security
  • use the repository to: create folders, save transformations and jobs, lock, delete, revise and restore artefacts
  • execute a transformation in Pentaho Data Integration, and schedule and monitor it in the Pentaho Enterprise Console
  • create and delete an index with a transformation
  • create transformations, configure the steps for running on a cluster, run transformations on the cluster, review the results, and monitor the transformation

Course Schedule

Day 1
Module 1 Introduction to Pentaho Data Integration
Lesson Goals
Lesson What is Pentaho Data Integration (PDI)?
Module 2 Transformation basics
Lesson Getting to know the PDI user interface
Lesson Creating a transformation
Exercise Generating rows, sequences and Select Values
Lesson Error handling & introduction to logging
Lesson Introduction to repositories
Module 3 Reading and writing files
Lesson Input and output steps
Lesson Parameters & kettle.properties
Exercise CSV input to multiple outputs using switch/case
Exercise Create a serialised file from several files
Exercise Deserialise files
Day 2
Module 4 Working with databases
Lesson Connect and explore databases
Lesson Tables – Input and output
Exercise Reading and writing database tables
Lesson Steps using insert, update and delete
Lesson Cleansing data
Lesson Using parameters & arguments in SQL queries
Exercise Input with parameters / Table Wizard
Module 5 Data flows and lookups
Lesson Copying and distributing data
Exercise Working with parallel processing
Lesson Lookups
Exercise Lookups & data formatting
Lesson Merging data
Day 3
Module 6 Calculations
Lesson Using the "Group by" step
Lesson Calculator
Exercise Sorting, grouping and calculating order quantities
Lesson Regular expressions
Lesson User Defined Java Expression
Lesson JavaScript
Module 7 Job orchestration
Lesson Introduction to Jobs
Exercise Loading JVM data into a table
Lesson Sending alerts
Lesson Looping & conditions
Exercise Creating a job with a loop
Lesson Executing jobs from a terminal window (Kitchen)
Module 8 Scheduling
Lesson Creating a schedule
Lesson Monitoring scheduled tasks
Module 9 Exploring data integration repositories
Lesson The Pentaho Data Integration repository
Exercise Use of the Pentaho Enterprise repository
Module 10 Detailed Logging
Lesson Detailed Logging

Our Pentaho Trainers

Dirk Rönsch

Djordja Markovic

Laziz Karimov

Tom Haupt

Deepening knowledge of software and technology
