Environmental data management and analysis with Pentaho

Indoor hydrogeologist taking samples: Kamil Nešetřil

Analyzing environmental data helps us better understand the availability of natural resources. The Technical University of Liberec runs data warehouses that integrate data from different sources of environmental information. Kamil Nešetřil will present this exciting project at the Pentaho Community Meeting (PCM) and show how data warehouses can be built from semi-structured data.

Kamil, who are you?
I wanted to work outdoors, so I studied geology and hydrogeology. However, I ended up as a groundwater modeler; I call myself an “indoor hydrogeologist”. I work in interdisciplinary teams at the Technical University of Liberec (Reichenberg in German) in the Czech Republic. We have IT students, but our research focuses on groundwater and the environment. I have moved toward IT and now connect both worlds.

What is your connection to Pentaho?
Building a groundwater model requires integrating knowledge from scattered data (reports, spreadsheets, maps), understanding the processes (where does the water flow?) and creating a conceptual model (a simplification of this understanding). Finally, if there is enough time, one can build and evaluate the groundwater model itself. A simple model can be just a formula in a spreadsheet.
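As an illustration of how small such a model can be, consider Darcy's law for steady flow through a saturated aquifer cross-section (our choice of example, not one named in the interview). Here Q is the volumetric flow rate, K the hydraulic conductivity, A the cross-sectional area, h the hydraulic head and l the distance along the flow path; the numbers in the worked line are hypothetical, not data from dataearth.cz.

```latex
Q = -K\,A\,\frac{\mathrm{d}h}{\mathrm{d}l}
\qquad\Rightarrow\qquad
|Q| = K\,A\,\frac{\Delta h}{L}
    = 10^{-4}\,\mathrm{m\,s^{-1}} \cdot 10\,\mathrm{m^{2}} \cdot \frac{2\,\mathrm{m}}{100\,\mathrm{m}}
    = 2\times 10^{-5}\,\mathrm{m^{3}\,s^{-1}}
```

The minus sign only says that water flows from higher to lower head; in a spreadsheet the magnitude form on the right is a single cell formula.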

I wanted to use domain-specific software to help me with data integration, visualization and reporting. I reviewed such tools, but I decided to develop a Pentaho-based solution. Integrating data from diverse sources is easier in PDI (Pentaho Data Integration) than in domain-specific data-management tools, and designing a report takes the same effort with a domain-specific tool as with Pentaho. I am not a programmer, so I appreciate Pentaho's design tools and the fact that somebody else has already designed the architecture of the whole system.

What will your talk be about?
I am going to speak about the Hydrogeological Information System (dataearth.cz), which is based on the Pentaho Platform. There are several data warehousing projects for environmental data, and some environmental data-management systems use existing reporting tools. But ours is the first environmental application of the full BI stack. I will show how we build a data warehouse from semi-structured data.
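In Pentaho this work is done with graphical PDI transformations. The plain-Python sketch below only illustrates the general idea of turning semi-structured spreadsheet rows into warehouse tables, under our own assumptions: the synonym table, column names, site codes and key scheme are all hypothetical and not taken from dataearth.cz. Wide rows whose headers are spelled differently in each source file are unpivoted into one long fact table that points to site and parameter dimensions via surrogate keys.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical synonym table: header spellings found in different source
# spreadsheets, mapped to one canonical (parameter, unit) dimension member.
PARAMETER_SYNONYMS = {
    "no3": ("nitrate", "mg/l"),
    "nitrate": ("nitrate", "mg/l"),
    "nitrate [mg/l]": ("nitrate", "mg/l"),
    "ph": ("pH", "-"),
    "water level [m]": ("water_level", "m"),
}


@dataclass(frozen=True)
class Fact:
    site_key: int
    date_key: int  # yyyymmdd integer, a common date-dimension key scheme
    parameter_key: int
    value: float


def get_key(dimension: dict, member) -> int:
    """Return the surrogate key of a dimension member, adding it if new."""
    if member not in dimension:
        dimension[member] = len(dimension) + 1
    return dimension[member]


def parse_value(raw):
    """Parse a number written with a decimal comma or point; None otherwise.

    Thousands separators and values such as "<0.5" are not handled.
    """
    if raw is None:
        return None
    try:
        return float(str(raw).strip().replace(",", "."))
    except ValueError:
        return None


def load(rows):
    """Unpivot wide rows into facts; collect cells that could not be mapped."""
    sites, parameters = {}, {}
    facts, unmapped = [], []
    for row in rows:
        site_key = get_key(sites, row["site"].strip().upper())
        date_key = int(row["date"].strftime("%Y%m%d"))
        for column, raw in row.items():
            if column in ("site", "date"):
                continue
            mapping = PARAMETER_SYNONYMS.get(column.strip().lower())
            value = parse_value(raw)
            if mapping is None or value is None:
                unmapped.append((row["site"], column, raw))
                continue
            parameter_key = get_key(parameters, mapping)
            facts.append(Fact(site_key, date_key, parameter_key, value))
    return sites, parameters, facts, unmapped


# Three rows from three differently formatted (hypothetical) spreadsheets.
rows = [
    {"site": "hv-1", "date": date(2013, 5, 2), "NO3": "12,5", "pH": "7.1"},
    {"site": "HV-1 ", "date": date(2013, 6, 3), "Nitrate [mg/l]": 10.8},
    {"site": "HV-2", "date": date(2013, 6, 3), "Water level [m]": "3,42", "Note": "dry"},
]
sites, parameters, facts, unmapped = load(rows)
```

Here "NO3" and "Nitrate [mg/l]" end up under the same parameter key, while the free-text "Note" column is set aside for manual review instead of being silently dropped. With long-tail data, keeping that review list is arguably as important as the load itself.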

Why are projects like dataearth so important?
Our solution brings BI tools and concepts into a world where there are generally only Excel files and where even IT guys do not know what BI means. The challenge lies in diversity (long-tail data), not volume (big data).

What do you expect from PCM?
I am looking forward to seeing the Pentaho community alive, not just through a screen.