CERN Accelerates its Business Computing with Pentaho

Jan Janke of CERN uses Pentaho for reporting and analytics
Jan Janke, Deputy Leader of Administrative Information Systems at CERN

CERN is the biggest research organization in the world. In 2014, CERN migrated from Business Objects to Pentaho for data integration, reporting and analytics. With more than 15,000 users, CERN is one of the biggest Pentaho implementations. Jan Janke will present the project at the Pentaho Community Meeting. I talked to Jan about the requirements of an international research organization and why CERN migrated from Business Objects to Pentaho.

Jan, who are you?
I am a software developer originally from Leipzig/Germany. I finished university with a joint Franco-German degree (Master equivalent) in “International Business Informatics” in 2004. During my university years and until joining CERN in April 2005 as a staff member, I worked as an IT consultant and software developer for an Austrian/German software house active in the area of business process modeling and integration.

After joining CERN, I first worked as a Java Enterprise/Oracle application developer providing various business-computing services to CERN. In 2011, I became Section Leader, and I am still responsible today for the Administrative Information Systems (AIS) group’s internal development infrastructure, its central data warehouse and the related BI and reporting services, as well as the maintenance and development of other major administrative applications. In 2014, I was appointed Deputy Leader of the Administrative Information Systems group. Since then, I have been co-responsible for the entire business-computing infrastructure of the organization.

My personal interests in informatics cover areas such as full stack web development, JVM programming languages and playing with sports scheduling and simulation algorithms. When not sitting in front of a screen, I like building (bigger) Lego models; I regularly swim (especially longer distances) and bike (principally in the nice and quiet landscapes of the Jura Mountains and the Alps).

Simulation of the collision of two protons producing the Higgs boson that quickly decays into four muons (yellow tracks). © 1995-2017 CERN
In 2012, CERN’s Large Hadron Collider detected the Higgs boson for the first time © 1995-2017 CERN

What will you present in your talk?
In my talk, I will first give a short overview of CERN, the European Organization for Nuclear Research. Working for CERN, the place where the Web was born and (though this is less well known) first capacitive touch screens were developed, is a unique experience every day. I will then present the specific challenges we face in business computing, especially in the areas of data warehousing, business intelligence and reporting.

We have adapted the Pentaho platform to CERN’s specific needs. To keep all content created by CERN under control, we have built an infrastructure around it. The tools we have developed provide enhanced search and scheduling capabilities, add support for delegated management of schemas, including approval and environment promotion features, and integrate with the “git” version control system. I will briefly present these tools and explain how they help us operate in an environment where more than 15,000 potential users are exposed to content served directly or indirectly by Pentaho.

Why did CERN choose Pentaho?
We started to use Pentaho in autumn 2014 after a lengthy BI tool selection process that began with a first evaluation of potential applications in 2012. In the past, we relied on reporting solutions developed in-house alongside Business Objects (now owned by SAP). Today, we use both the business analytics and the data integration products from Pentaho.

With the data integration tools (PDI), we were able to stop relying on manually crafted PL/SQL and shell scripts for important ETL processes. Our new central (near real-time) data warehouse relies solely on PDI for all ETL processes. PDI also helps us stay agnostic about the data sources we read from or write to.

Large Hadron Collider at CERN © 2017 CERN

Why does CERN use Pentaho?
We chose Pentaho because it provides us with a powerful data integration tool and an appealing solution for our general reporting, data analytics and visualization needs. The fact that most of the tools are available as open source software allows our own developers to easily inspect existing code and integrate our tools with it. Here, the fact that Pentaho is written in Java comes in especially handy, as Java, or rather the JVM, is also our technology of choice for all in-house developments made by CERN’s AIS group.

Despite all the hype around Big Data, “simple” reporting and data visualization remain essential for us. Pentaho solutions like CTools or the Report Designer are just as important as, for instance, the Analyzer, which is part of Pentaho’s Enterprise version. All tools have their strong and weak points. We hope that Pentaho, together with the community, continues to maintain and further evolve them. While PDI is certainly an important cornerstone of the Pentaho offering, we hope that strong innovation continues to happen in the business analytics area, too.

What do you expect from Pentaho Community Meeting?
First, I am eager to discover what other users from the international Pentaho community do with Pentaho. Seeing which parts of Pentaho are used where and in which form will help me compare our own Pentaho usage and see in which areas we could benefit from adopting new ideas inspired by the PCM.

In addition, I hope to get some insights into the tools that others are developing on top of the Pentaho technology stack. Finally, besides meeting other Pentaho users and having interesting discussions, I would like to see whether the various tools and enhancements we have made here at CERN could be of some benefit to other members of the Pentaho community.