Infrastructure Use Scenarios

The possibilities that Dariah.lab offers in terms of services, equipment, and datasets are illustrated by a number of scenarios, described following the convention adopted in the Standardization Survival Kit (SSK). The SSK is an open tool for publishing research scenarios that document good practices and standards used in digital humanities and cultural heritage research.

The scenarios developed for Dariah.lab are presented to explain what a given infrastructure element offers and how it can be used in different areas, and to let different groups of users assess its usefulness. The scenarios also show how several elements of the infrastructure can be combined, in a defined order, into a data processing workflow.

The set of scenarios presented below will be gradually extended.

Creating subtitles for a video based on its audio track

The aim of the scenario is to generate closed captions for viewers with hearing disabilities, based on a transcription of the soundtrack of the audiovisual material.

  • Objects:
    • Multimedia
    • Sound
    • Text
  • Techniques:
    • Information Retrieval
    • Machine Learning
  • Standards:
    • ISO/IEC 14496
    • XML
    • WebVTT

Extracting audio tracks from audiovisual material

  • Conversion
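The extraction step can be sketched as a call to the ffmpeg command-line tool. This is only an illustration, not the tool that Dariah.lab necessarily uses: the helper name `build_ffmpeg_command`, the file name `lecture.mp4`, and the 16 kHz mono WAV target are assumptions chosen because such a format is commonly accepted by speech recognition systems.

```python
from pathlib import Path


def build_ffmpeg_command(video_path: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg argument list that extracts audio as mono 16-bit PCM WAV.

    Hypothetical helper: output name and sample rate are illustrative choices.
    """
    source = Path(video_path)
    target = source.with_suffix(".wav")  # e.g. lecture.mp4 -> lecture.wav
    return [
        "ffmpeg",
        "-i", str(source),        # input audiovisual file
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # encode audio as 16-bit PCM
        "-ar", str(sample_rate),  # resample to the assumed rate
        "-ac", "1",               # downmix to a single channel
        str(target),
    ]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run() to execute it.
    print(" ".join(build_ffmpeg_command("lecture.mp4")))
```

Building the command as a list, rather than a single string, avoids shell quoting problems when file names contain spaces.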

Selecting how the speech recognition results are to be converted (e.g., numeric notation of dates and times, abbreviations).

Obtaining speech transcriptions for individual audio tracks in XML format, containing the converted orthographic notation and timestamps

  • Data Recognition
  • Transcription

Conversion of the results to the selected subtitle format

  • Translation
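A minimal sketch of this conversion, assuming WebVTT as the selected subtitle format. The XML layout used here (`transcript` and `segment` elements with `start` and `end` attributes in seconds) is hypothetical, since the actual schema of the transcription service is not described in this document; the function names are likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcription output; the real schema may differ.
SAMPLE_XML = """
<transcript>
  <segment start="0.0" end="2.5">Good morning.</segment>
  <segment start="2.5" end="6.04">Today is 12 May 2021.</segment>
</transcript>
"""


def to_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def xml_to_webvtt(xml_text: str) -> str:
    """Turn every <segment> into one WebVTT cue, in document order."""
    root = ET.fromstring(xml_text.strip())
    lines = ["WEBVTT", ""]  # mandatory header followed by a blank line
    for segment in root.iter("segment"):
        start = to_timestamp(float(segment.get("start")))
        end = to_timestamp(float(segment.get("end")))
        lines.append(f"{start} --> {end}")
        lines.append((segment.text or "").strip())
        lines.append("")  # blank line separates cues
    return "\n".join(lines)


if __name__ == "__main__":
    print(xml_to_webvtt(SAMPLE_XML))
```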

Manual modification of speech recognition results in order to correct errors, adjust the length of subtitles to the available display time, and supplement them with descriptions of non-verbal sounds

  • Cleaning
  • Editing
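Although this step is manual, a small script can point the editor at subtitles that are probably too long for their display time. The sketch below is an assumption-laden illustration: the `Cue` class, the function name, and the limit of 17 characters per second are hypothetical values chosen for the example, not figures taken from Dariah.lab.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # display start, in seconds
    end: float    # display end, in seconds
    text: str


def cues_needing_review(cues: list[Cue], max_chars_per_second: float = 17.0) -> list[Cue]:
    """Return cues whose text is too long to read within their display time.

    The default reading-speed limit is an assumed value; adjust it to the
    captioning guidelines actually in use.
    """
    flagged = []
    for cue in cues:
        duration = cue.end - cue.start
        # Cues with no display time at all are always flagged.
        if duration <= 0 or len(cue.text) / duration > max_chars_per_second:
            flagged.append(cue)
    return flagged


if __name__ == "__main__":
    cues = [
        Cue(0.0, 2.0, "Good morning."),
        Cue(2.0, 3.0, "This sentence is far too long to read in one second."),
    ]
    for cue in cues_needing_review(cues):
        print(f"{cue.start:.1f}s: {cue.text}")
```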

Generating an automatic abstract based on article content

The goal is to obtain abstracts of literature articles using unsupervised learning algorithms, in order to enrich the metadata of scientific texts.

  • Objects:
    • Text
  • Techniques:
    • Information Retrieval
    • Encoding
    • Content Analysis

Loading an input file

  • Encoding

Paper content processing

  • Content Analysis

Generating the abstract

  • Modelling
  • Annotating
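To give an idea of what an unsupervised approach to this step can look like, the sketch below implements a simple frequency-based extractive summary: sentences are scored by the average frequency of their words in the whole text, and the top-scoring ones are returned in their original order. This is one basic technique picked for illustration; the document does not state which algorithm the Dariah.lab service uses. The function name, the stopword list, and the sample text are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "with", "that", "this", "it", "as", "by"}


def content_words(text: str) -> list[str]:
    """Lowercase the text and keep only non-stopword tokens."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, max_sentences: int = 2) -> str:
    """Pick the sentences whose words are, on average, most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    article = ("Digital archives preserve manuscripts. "
               "Archives of manuscripts need metadata. "
               "The weather was nice.")
    print(summarize(article, max_sentences=2))
```

No training data or labels are needed, which is what makes the method unsupervised; the trade-off is that it can only reuse sentences already present in the article.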

Writing the result to the output file

Result verification
