Are current infrastructures suitable for extreme data processing? Technologies for data management

June 13, 2023 - 15:30 - 17:00
Haakon room
Extreme data Analytics
HPC and the connection to industry transformation green and digital
Energy and Green Deal
Data/AI and the Cloud-to-Edge continuum
In person

The Graph-Massivizer project is a first-in-class project that recognizes the potential of graph processing to address economic, societal, and environmental challenges. However, current technologies are not suited to large-scale graph analytics. Graph-Massivizer will develop a series of tools to overcome these limitations, in which the underlying computing infrastructure plays a major role. The project uses European infrastructure across the computing continuum, from pre-exascale high-performance computing (HPC) facilities to local computing clusters with state-of-the-art networking and in-place cybersecurity capabilities. The session will analyze how suitable current infrastructures are for extreme analytics, including large-scale graph processing, and will provide insights and examples from ongoing work in the field. The concrete tools and technologies to be shared and discussed with the audience cover three aspects: (i) graph workload modeling, (ii) energy awareness and sustainability, and (iii) scalable serverless graph processing.

The session will feature a variety of examples that place the technologies in an operational, concrete problem-solving context. In particular, we will showcase a pilot on Green Data Center Digital Twins, which targets sustainable science throughput through scalable, energy-aware exascale operation and a traceable understanding of “total cost of ownership” (TCO), including sustainability indicators and their environmental effects (e.g., GHG emissions). The audience will see how the proposed solutions enable the creation of a novel, graph-based digital twin of a data center; this digital twin will in turn support the construction of sustainable exascale operational models for scientific discovery in the next decade.

Projects like DataCloud (addressing the complete life cycle management of Big Data pipelines through discovery, design, simulation, provisioning, deployment and adaptation across the computing continuum) and EnrichMyData (highly scalable and replicable data enrichment pipelines) face similar challenges and will join the discussion, following the same approach of analysis, technological solutions, and examples.

The session will foster the creation of a collaborative agenda for joint work between the projects promoting the session (Graph-Massivizer, DataCloud, EnrichMyData) and will invite other projects and initiatives on stage to share their experiences. We will also identify major industrial and scientific players based in Luleå, a city well known for its data centers, to participate in the session, and will encourage the different stakeholders to engage in future validation campaigns.


  • Present current needs of extreme data processing with respect to infrastructures, elaborating on major challenges and opportunities for Europe.
  • Include energy efficiency and sustainability aspects as part of the problem equation.
  • Introduce ongoing work in the domain and proposed solutions to overcome main limitations.
  • Showcase examples that range from the use of HPC to computing clusters.
  • Identify stakeholders willing to engage in validation campaigns in the context of ongoing projects so that industrial needs are well aligned with the technical work proposed by the scientific community.
  • Converge towards an agenda of themes that are common to several projects and initiatives and can benefit from joint work to foster collaboration.


Introduction presentation here

Lilit Axner presentation here 

Graph-Massivizer presentation here

DataCloud presentation here

EnrichMyData presentation here

EXAMIND presentation here

Green Data Center Digital Twin presentation here

Dumitru Roman
Senior Research Scientist at SINTEF
Roberta Turra
Data Analytics team lead at CINECA
Radu Prodan
Professor in distributed systems at University of Klagenfurt
Lilit Axner
Programme Officer Infrastructure at EuroHPC Joint Undertaking
Nuria de Lama
Consulting Director at International Data Corporation (IDC)
Jan Martinovic
Head of the Advanced Data Analysis and Simulations Lab at IT4Innovations National Supercomputing Center
Bill Patrowicz
CEO at Kaiser Research
Irena Pavlova
EU Projects Manager at GATE – Big Data Centre of Excellence, Sofia University