FAQs: Your questions answered

Below you'll find answers to the questions we get asked the most.

In the scope of DataBench (Big Data benchmarking), a benchmark is a performance metric used for comparative purposes. DataBench distinguishes between business benchmarks, which are quantitative indicators for evaluating the impact of a Big Data technology on business performance, and technical benchmarks, which evaluate technical indicators or metrics such as performance, latency, etc.

From the technical perspective, existing Big Data benchmarks have primarily focused on the commercial/retail domain related to transaction processing (TPC benchmarks and BigBench) or on applications suitable for graph processing (Hobbit and LDBC, the Linked Data Benchmark Council). The analysis of different sectors in the BDVA has concluded that they all use different mixes of the Big Data types (structured data, time series/IoT, spatial, media, text and graph). Sector-specific industrial benchmarks will thus relate to a selection of the data types important for that sector, and to the corresponding vertical benchmarks adapted for it. The existing holistic industry/application benchmarks have focused primarily on structured and graph data. DataBench will additionally support the newer benchmarks for time series/IoT, spatial, media and text data, driven by the requirements of industrial sectors such as manufacturing, transport, bio-economy, earth observation, health, energy and many others.

A Use Case is a discretely funded effort designed to accomplish a particular business goal or objective through the application of Big Data technology to particular business processes and/or application domains, employing line-of-business and IT resources. Some use cases are cross-domain and can be used to group both technical aspects and business metrics. Examples are: price optimization, fraud risk assessment, customer profiling.

In DataBench, business benchmarks are the result of assembling and calculating performance metrics to be used for comparative purposes. They are quantitative indicators of the business performance improvements achieved through the use of Big Data and Analytics, which other organizations can use as target or best-performance benchmarks. Benchmarks are categorized by type of use case, business process and industry.

In DataBench a business KPI is a multidimensional indicator including one or more of the following dimensions:

  • Time (time period over which the KPI is calculated)
  • Organizational responsibility (organizational unit/department to which the KPI refers)
  • Process (set of activities the KPI evaluates)
  • Product (specific line of business the KPI evaluates)
  • Customer (market segment to which the KPI refers)
  • Project (innovation initiatives the KPI intends to evaluate)

The literature also distinguishes between a financial KPI, which can be measured in economic terms (e.g. sales), and a non-financial KPI, which can be measured but not in economic terms (e.g. customer satisfaction).
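
As an illustration only, the KPI dimensions listed above could be modelled as a small record type. This is a minimal sketch, not the DataBench data model: the class name `BusinessKPI`, its field names and the example values are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BusinessKPI:
    """Hypothetical record for a business KPI (names are illustrative)."""

    name: str
    financial: bool  # True if measurable in economic terms (e.g. sales)
    # Optional dimensions; a KPI includes one or more of them.
    time: Optional[str] = None
    organizational_responsibility: Optional[str] = None
    process: Optional[str] = None
    product: Optional[str] = None
    customer: Optional[str] = None
    project: Optional[str] = None

    def dimensions(self) -> Dict[str, str]:
        """Return only the dimensions that are set for this KPI."""
        candidates = {
            "time": self.time,
            "organizational_responsibility": self.organizational_responsibility,
            "process": self.process,
            "product": self.product,
            "customer": self.customer,
            "project": self.project,
        }
        return {key: value for key, value in candidates.items() if value is not None}

    def __post_init__(self) -> None:
        # Enforce the "one or more dimensions" rule from the definition.
        if not self.dimensions():
            raise ValueError("a business KPI needs at least one dimension")


# Example values are invented for illustration.
sales = BusinessKPI(name="sales", financial=True, time="quarterly", product="retail line")
satisfaction = BusinessKPI(name="customer satisfaction", financial=False, customer="SME segment")
```

Here `sales.dimensions()` returns only the time and product entries, and creating a KPI without any dimension raises a `ValueError`.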

The DataBench Toolbox is a one-stop shop for Big Data benchmarking that offers multiple benefits for different kinds of users. The Toolbox is one of the main outcomes of the EU-funded DataBench project, which addresses a significant gap in current benchmarking activities. DataBench provides a catalog of certifiable benchmarks and evaluation schemes for the performance of Big Data technologies of high business impact and industrial significance. In the DataBench Toolbox you will be able to find:

  • Links to the most prominent initiatives and tools for benchmarking Big Data and AI technical solutions and applications.
  • Ways to register new benchmarks.
  • The option to automate the configuration, deployment, and execution of selected benchmarks.
  • Knowledge about the business KPIs to take into account, organized by different criteria (by industry, by use case, by company size, etc.).
  • Links to existing big data architectural blueprints and reference models to help organizations to map their needs to existing efforts in the field.
  • Access to other tools provided by DataBench, such as the project Handbook or the self-assessment tool.

DataBench will develop a handbook describing the main industrial and business performance benchmarks targeted at industrial users and European technology developers. This handbook will be made available via the Toolbox.

The DataBench Self-Assessment Tool provides organisations using or planning to use Big Data and Analytics (BDA) with the opportunity to benchmark their business performance against their peers (other companies in the same industry and same company size class).

The DataBench Self-Assessment Tool addresses people who are involved in, influence, or are highly knowledgeable about their organisation’s approach to, and potential use of, BDA. A deep technical understanding of the use or development of Big Data systems is not required to fill in the Self-Assessment survey. The tool is user-friendly: answering the 20 questions takes 20 to 25 minutes at most.

Once you have completed the survey, a personalised summary report with an analysis of your answers will be generated and sent to you. Your responses will be compared with responses from other organisations in the same industry and of the same company size.
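
The peer comparison described above can be pictured with a short sketch. It is an assumption-laden illustration, not the tool's actual method: the `Response` type, the numeric `score` field and the chosen summary statistics are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Response:
    """Hypothetical survey response reduced to a single numeric score."""

    industry: str
    size_class: str
    score: float


def peer_comparison(own: Response, others: List[Response]) -> Optional[Dict[str, float]]:
    """Compare one response with peers in the same industry and size class."""
    peers = [
        r for r in others
        if r.industry == own.industry and r.size_class == own.size_class
    ]
    if not peers:
        return None  # no comparable organisations available
    return {
        "peer_count": len(peers),
        "peer_mean": mean(r.score for r in peers),
        # Fraction of peers with a strictly lower score than our own.
        "share_of_peers_below": sum(r.score < own.score for r in peers) / len(peers),
    }


# Invented example data.
own = Response("retail", "SME", 4.0)
others = [
    Response("retail", "SME", 2.0),
    Response("retail", "SME", 5.0),
    Response("retail", "large", 1.0),  # different size class, ignored
    Response("health", "SME", 3.0),    # different industry, ignored
]
result = peer_comparison(own, others)
```

Only the two retail SME responses count as peers here; responses from other industries or size classes are filtered out before any statistic is computed.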

The self-assessment tool is accessible from the menu of the Toolbox.

The BDV Reference Model has been developed by the BDVA, taking into account input from technical experts and stakeholders along the whole Big Data Value chain as well as interactions with other related PPPs. An explicit aim of the BDV Reference Model in the SRIA 4.0 document is to also include logical relationships to other areas of a digital platform such as Cloud, High Performance Computing (HPC), IoT, Networks/5G, CyberSecurity etc.

The BDV Reference Model may serve as a common reference framework to locate Big Data technologies on the overall IT stack. It addresses the main concerns and aspects to be considered for Big Data Value systems.

The BDV Reference Model is structured into horizontal and vertical concerns.

  • Horizontal concerns cover specific aspects along the data processing chain, starting with data collection and ingestion and reaching up to data visualization. Note that the horizontal concerns do not imply a layered architecture. As an example, data visualization may be applied directly to collected data (the data management aspect) without the need for data processing and analytics. Furthermore, data analytics might take place in the IoT area, i.e. as Edge Analytics. The horizontal concerns are therefore logical areas that might execute in different physical layers.
  • Vertical concerns address cross-cutting issues, which may affect all the horizontal concerns. In addition, verticals may also involve non-technical aspects (e.g., standardization covers technical concerns but also non-technical ones).

Given the purpose of the BDV Reference Model to act as a reference framework to locate Big Data technologies, it is purposefully chosen to be as simple and easy to understand as possible. It thus does not have the ambition to serve as a full technical reference architecture. However, the BDV Reference Model is compatible with such reference architectures, most notably the emerging ISO JTC1 WG9 Big Data Reference Architecture – now being further developed in ISO JTC1 SC42 Artificial Intelligence.

The attached document represents the DataBench Framework for technical benchmarks. It extends the dimensions of the BDVA Reference Model with additional aspects covering industry sectors and application areas. The benchmarks are listed chronologically, in the order in which they were introduced. More benchmarks are expected to be added to the Toolbox. For a complete overview of the existing benchmarks, use the search functionality of the Toolbox to browse the benchmark catalogue.