Pentaho Data Integration Fundamentals
This training is guaranteed to take place
Pentaho Course ID
DI1000
Duration:
3 days
Dates:
Course description
With data volumes growing all the time, companies need quick and easy ways to use their data and gain greater insights. The primary challenge is having a consistent, unified version of information across all sources in a form that is suitable for analysis. Pentaho Data Integration provides powerful extract, transform and load (ETL) capabilities, an intuitive and professional graphical development environment, and an open, standards-based architecture.
Pentaho Data Integration offers a comprehensive ETL solution:
- A powerful graphical process designer for ETL developers
- Connectors for integrating virtually any type of data, including diverse and high-volume data
- High scalability and performance, including in-memory caching
- Big Data integration, analysis and reporting (via Hadoop, NoSQL, traditional OLTP and analytical databases)
- A modern, open and standards-based architecture
The course includes both presentations and hands-on exercises that cover theory, best practices, and design patterns.
Target Audience
This is the third course in our data analysis track. It is aimed at participants who already have experience developing or administering databases, or who want to get started with Pentaho Data Integration.
Business User | Business Analyst | Data Analyst | Software Architect | Pentaho Admin | Pentaho Support |
---|---|---|---|---|---|
Learning Goals
After completing this course, you will be able to:
- install Pentaho Data Integration
- create and execute basic transformations with steps and hops
- view transformation results in the metrics and logging views
- create database connections and use the data source via the Database Explorer
- build complex transformations by configuring the following steps: table input, table output, CSV file input, insert/update, add constants, filter rows, value mapper, stream lookup, add sequence, merge join, sort rows, row normaliser, JavaScript, dimension lookup/update, database lookup, get data from XML, set environment variables and analytic query
- create transformations that use parameterised values
- map the structure of an online transaction processing (OLTP) database to the structure of an online analytical processing (OLAP) database
- load data and write it to different databases
- use ETL patterns to populate a data warehouse
- create transformations that process slowly changing dimensions
- create Pentaho Data Integration jobs that execute multiple transformations, use variables, contain sub-jobs, provide built-in error notification, load and process multiple text files, and convert files to Microsoft Excel format
- configure logging for transformation steps and job entries and check logged data
- configure error handling for transformation steps
- configure the Pentaho Enterprise Repository, including basic security
- use the repository to create folders, save transformations and jobs, and lock, delete, version and restore artefacts
- execute a transformation in Pentaho Data Integration, then schedule and monitor it in the Pentaho Enterprise Console
- create and delete an index with a transformation
- create transformations, configure the steps for running on a cluster, run transformations on the cluster, review the results, and monitor the transformation
Course Schedule
Day 1 |
---|
Module 1 | Introduction to Pentaho Data Integration | |
---|---|---|
Lesson | Goals | |
Lesson | What is Pentaho Data Integration (PDI)? |
Module 2 | Transformation basics | |
---|---|---|
Lesson | Getting to know the PDI user interface | |
Lesson | Creating a transformation | |
Exercise | Generating rows, sequences and select values | |
Lesson | Error handling & introduction to logging | |
Lesson | Introduction to repositories |
Module 3 | Reading and writing files | |
---|---|---|
Lesson | Input and output steps | |
Lesson | Parameters & Kettle.properties | |
Exercise | CSV input to multiple outputs using switch/case | |
Exercise | Create a serialisable file from several files | |
Exercise | Deserialise files |
Day 2 |
---|
Module 4 | Working with databases | |
---|---|---|
Lesson | Connect and explore databases | |
Lesson | Tables – Input and output | |
Exercise | Reading and writing database tables | |
Lesson | Steps using insert, update and delete | |
Lesson | Cleansing data | |
Lesson | Using parameters & arguments in SQL queries | |
Exercise | Input with parameters / Table Wizard |
Module 5 | Data flows and lookups | |
---|---|---|
Lesson | Copying and distributing data | |
Exercise | Working with parallel processing | |
Lesson | Lookups | |
Exercise | Lookups & data formatting | |
Lesson | Merging data |
Day 3 |
---|
Module 6 | Calculations | |
---|---|---|
Lesson | Using the "Group by" step | |
Lesson | Calculator | |
Exercise | Sorting, grouping and calculating order quantities | |
Lesson | Regular expressions | |
Lesson | User Defined Java Expression | |
Lesson | JavaScript |
Module 7 | Job orchestration | |
---|---|---|
Lesson | Introduction to Jobs | |
Exercise | Loading JVM data into a table | |
Lesson | Sending alerts | |
Lesson | Looping & conditions | |
Exercise | Creating a job with a loop | |
Lesson | Executing jobs from a terminal window (Kitchen) |
Module 8 | Scheduling | |
---|---|---|
Lesson | Creating a schedule | |
Lesson | Monitoring scheduled tasks |
Module 9 | Exploring data integration repositories | |
---|---|---|
Lesson | The Pentaho Data Integration repository | |
Exercise | Use of the Pentaho Enterprise repository |
Module 10 | Detailed logging | |
---|---|---|
Lesson | Detailed logging |
Our Pentaho trainers
Dirk Rönsch
Djordja Markovic
Laziz Karimov
Tom Haupt
Deepening knowledge of software and technology
- In-House
- Online
- In Fulda