Scenarios

Infrastructure Use Scenarios

The services, equipment, and datasets offered by Dariah.lab are illustrated by a number of scenarios that follow the convention adopted in the Standardization Survival Kit (SSK). The SSK is an open tool for publishing research scenarios that document good practices and standards used in digital humanities and cultural heritage research.

The scenarios developed for Dariah.lab explain what a given infrastructure element offers and how it can be used in different areas, so that different groups of users can assess its usefulness. They also show how several elements of the infrastructure can be combined, in a defined order, into a data processing workflow.

The set of scenarios presented below will be gradually extended.

Creating subtitles for a video based on its audio track
Generating automatic abstract based on article content

Creating subtitles for a video based on its audio track

The aim of this scenario is to generate closed captions for viewers with hearing disabilities, based on a transcription of the soundtrack of the audiovisual material.

Objects:
- Multimedia
- Sound
- Text

Techniques:
- Information Retrieval
- Machine Learning

Standards:
- ISO/IEC 14496
- XML
- WebVTT

Extracting audio tracks from audiovisual material

Conversion
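
The extraction step can be sketched as a call to the ffmpeg command-line tool. This is only an illustration: the scenario does not say which tool Dariah.lab uses, and the function names, the WAV output, and the 16 kHz mono settings are assumptions chosen because they are a common input format for speech recognition.

```python
import subprocess


def build_extract_command(video_path, audio_path, stream_index=0, sample_rate=16000):
    """Build an ffmpeg command that writes one audio stream as mono PCM WAV.

    The output settings are assumptions, not values taken from the scenario.
    """
    return [
        "ffmpeg",
        "-y",                            # overwrite the output file if it exists
        "-i", str(video_path),           # input container, e.g. an MP4 file
        "-map", f"0:a:{stream_index}",   # select the n-th audio stream of the input
        "-vn",                           # drop the video stream
        "-acodec", "pcm_s16le",          # 16-bit little-endian PCM
        "-ar", str(sample_rate),         # sample rate in Hz
        "-ac", "1",                      # downmix to a single channel
        str(audio_path),
    ]


def extract_audio(video_path, audio_path, **options):
    """Run ffmpeg; raises subprocess.CalledProcessError if extraction fails."""
    subprocess.run(build_extract_command(video_path, audio_path, **options), check=True)
```

Material with several audio tracks would call `extract_audio` once per `stream_index`, producing one file per track for the transcription step.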

Selecting the conversion applied to speech recognition results (e.g., numeric notation of dates and times, abbreviations).

Obtaining speech transcriptions for individual audio tracks in XML format, containing the converted orthographic notation and timestamps

Data Recognition
Transcription

Conversion of the results to the selected subtitle format

Translation
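
As an illustration of this conversion, the sketch below turns an XML transcription into WebVTT. The XML layout is hypothetical: the scenario does not document the schema, so the `segment` element and its `start` and `end` attributes (in seconds) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def escape_cue_text(text):
    """Collapse whitespace and escape characters that are special in cue text."""
    text = " ".join(text.split())
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def xml_to_webvtt(xml_text):
    """Convert <segment start=".." end="..">text</segment> elements to WebVTT."""
    root = ET.fromstring(xml_text)
    lines = ["WEBVTT", ""]              # file header followed by a blank line
    for segment in root.iter("segment"):  # hypothetical element name
        text = escape_cue_text(segment.text or "")
        if not text:
            continue                    # skip empty segments
        start = float(segment.get("start"))
        end = float(segment.get("end"))
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")                # a blank line ends each cue
    return "\n".join(lines)
```

Writing the returned string to a `.vtt` file gives a subtitle track that can then be corrected by hand in the next step.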

Manual modification of speech recognition results in order to correct errors, adjust the length of subtitles to the available display time, and supplement them with descriptions of non-verbal sounds

Cleaning
Editing
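
Manual editing can be supported by a simple check that points the editor to subtitles too long for their display time. The sketch below is one possible helper, not part of the described infrastructure; the cue representation and the threshold of 17 characters per second are illustrative assumptions.

```python
def find_overlong_cues(cues, max_chars_per_second=17.0):
    """Return (index, rate) pairs for cues that exceed the reading-speed limit.

    `cues` is a list of (start_seconds, end_seconds, text) tuples.
    The default limit is an assumed value and should be set by the editor.
    """
    flagged = []
    for index, (start, end, text) in enumerate(cues):
        duration = end - start
        if duration <= 0:
            # A cue with no display time can never be read.
            flagged.append((index, float("inf")))
            continue
        rate = len(text) / duration     # characters per second of display time
        if rate > max_chars_per_second:
            flagged.append((index, rate))
    return flagged
```

Flagged cues are candidates for shortening, splitting, or extending the display time; the decision itself remains with the human editor.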

Generating automatic abstract based on article content

The goal is to obtain abstracts of literature articles using unsupervised learning algorithms, in order to enrich the metadata of scientific texts.

Objects:
- Text

Techniques:
- Information Retrieval
- Encoding
- Content Analysis

Loading an input file

Encoding

Paper content processing

Content Analysis

Generating abstract

Modelling
Annotating
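
The scenario does not name the algorithm used to generate the abstract. As a stand-in, the sketch below shows a basic frequency-based extractive summarizer, which needs no labelled training data: sentences are scored by the average frequency of their words and the highest-scoring ones are kept in their original order. The stopword list, the function names, and the English-only tokenization are assumptions made for the example.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "that", "this", "with", "as", "by", "it"}


def split_sentences(text):
    """Split text after sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase ASCII words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the top-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    frequencies = Counter(word for s in sentences for word in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favoured by length alone.
        return sum(frequencies[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

The returned text could be stored as an abstract field in the article metadata, after the verification step that closes the scenario.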

Writing result to the output file

Result verification