Are current infrastructures suitable for extreme data processing? Technologies for data management
Graph-Massivizer is a first-in-class project that recognizes the potential of graph processing to address economic, societal, and environmental challenges. However, current technologies are not suited to large-scale graph analytics. Graph-Massivizer will develop a series of tools to overcome these limitations, and the underlying computing infrastructure plays a major role in this effort. The project uses European infrastructure across the computing continuum, from pre-exascale high-performance computing (HPC) facilities to local computing clusters with state-of-the-art networking and cybersecurity capabilities already in place. The session will analyze how suitable current infrastructures are for extreme analytics, including large-scale graph processing. It will also provide insights into ongoing work in the field and present examples. The concrete tools and technologies to be shared and discussed with the audience cover three aspects: (i) graph workload modeling, (ii) energy awareness and sustainability, and (iii) scalable serverless graph processing.
The session will feature a diversity of examples that place these technologies in an operational, concrete problem-solving context. In particular, we will showcase a pilot on Green Data Center Digital Twins. The pilot targets sustainable science throughput through scalable, energy-aware exascale operation and a traceable understanding of total cost of ownership (TCO), including sustainability indicators and their environmental effects (e.g., GHG emissions). The audience will see how the proposed solutions enable the creation of a novel, graph-based digital twin of a data center. This digital twin will in turn support the construction of sustainable operational models for exascale computing that underpin scientific discovery over the next decade.
Projects such as DataCloud (which addresses the complete life-cycle management of Big Data pipelines through discovery, design, simulation, provisioning, deployment, and adaptation across the computing continuum) and EnrichMyData (which builds highly scalable and replicable data enrichment pipelines) face similar challenges. They will join the discussion following the same approach: analysis, technological solutions, and examples.
The session will foster a collaborative agenda for joint work among the projects promoting the session (Graph-Massivizer, DataCloud, EnrichMyData) and will invite other projects and initiatives to the stage to share their experiences. We will also identify major industrial and scientific players based in Luleå, a location well known for its data centers, and invite them to participate in the session. Finally, we will encourage the different stakeholders to engage in future validation campaigns.
- Present the current infrastructure needs of extreme data processing, elaborating on the major challenges and opportunities for Europe.
- Include energy efficiency and sustainability aspects as part of the problem equation.
- Introduce ongoing work in the domain and proposed solutions to overcome the main limitations.
- Showcase examples ranging from HPC facilities to local computing clusters.
- Identify stakeholders willing to engage in validation campaigns in the context of ongoing projects so that industrial needs are well aligned with the technical work proposed by the scientific community.
- Converge on an agenda of themes that are common to several projects and initiatives and that could benefit from joint work, thereby fostering collaboration.
Presentations:
- Introduction
- Lilit Axner
- Graph-Massivizer
- DataCloud
- EnrichMyData
- EXAMIND
- Green Data Center Digital Twin