Scenarios
Infrastructure Use Scenarios
The services, equipment, and datasets offered by Dariah.lab are illustrated by a number of scenarios, described following the convention adopted in the Standardization Survival Kit (SSK). The SSK is an open tool for publishing research scenarios that document good practices and standards used in digital humanities and cultural heritage research.
The scenarios developed for Dariah.lab explain what a given infrastructure element offers and how it can be used in different areas, and they enable assessment of its usefulness for different groups of users. They also illustrate how several elements of the infrastructure can be combined, in a certain order, into a data processing workflow.
The set of scenarios presented below will be gradually extended.
- Creating subtitles for a video based on its audio track
- Generating automatic abstract based on article content
Creating subtitles for a video based on its audio track
The aim of the scenario is to generate closed captions for viewers with hearing disabilities, based on a transcription of the audiovisual material’s soundtrack.
- Objects:
- Multimedia
- Sound
- Text
- Techniques:
- Information Retrieval
- Machine Learning
- Standards:
- ISO/IEC 14496
- XML
- WebVTT
Extracting audio tracks from audiovisual material
- Conversion
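As an illustration of the extraction step, the sketch below builds an ffmpeg command line that writes one audio track of a video to a mono WAV file. The scenario does not name the tool used by Dariah.lab, so the choice of ffmpeg, the 16 kHz sample rate, the output file naming, and the function names `build_extract_command` and `extract_track` are all assumptions made for this example.

```python
import subprocess
from pathlib import Path


def build_extract_command(video: str, track: int = 0, rate: int = 16000) -> list[str]:
    """Return an ffmpeg command that writes one audio track to a mono WAV file.

    The output name (<stem>.track<N>.wav) is a convention invented for this sketch.
    """
    source = Path(video)
    target = source.with_name(f"{source.stem}.track{track}.wav")
    return [
        "ffmpeg",
        "-i", video,               # input audiovisual file
        "-map", f"0:a:{track}",    # select the N-th audio stream of the first input
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to mono
        "-ar", str(rate),          # resample (16 kHz is an assumed default)
        "-c:a", "pcm_s16le",       # uncompressed 16-bit PCM
        str(target),
    ]


def extract_track(video: str, track: int = 0) -> str:
    """Run ffmpeg (must be installed separately) and return the output path."""
    command = build_extract_command(video, track)
    subprocess.run(command, check=True)
    return command[-1]
```

Calling `extract_track("lecture.mp4", track=1)` would write `lecture.track1.wav` next to the input, provided ffmpeg is available on the system path. Separating command construction from execution keeps the command easy to inspect before anything is run.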
Selecting how speech recognition results are to be converted (e.g., digital notation of dates, times, abbreviations).
Obtaining speech transcriptions for individual audio tracks in XML format, containing the converted orthographic notation and timestamps
- Data Recognition
- Transcription
Conversion of the results to the selected subtitle format
- Translation
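The conversion to a subtitle format can be sketched as follows for WebVTT, one of the standards listed for this scenario. The scenario does not describe the XML schema of the transcription, so the `<segment start="..." end="...">` elements with times in seconds, as well as the function names, are hypothetical and would need to be adapted to the actual output of the speech recognition service.

```python
import xml.etree.ElementTree as ET


def to_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def escape_cue_text(text: str) -> str:
    """Escape the characters that have special meaning in WebVTT cue text."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def xml_to_webvtt(xml_text: str) -> str:
    """Convert a transcript in the hypothetical <segment> schema to WebVTT."""
    root = ET.fromstring(xml_text)
    lines = ["WEBVTT", ""]  # mandatory header followed by a blank line
    for segment in root.iter("segment"):
        text = (segment.text or "").strip()
        if not text:
            continue  # skip empty segments
        start = float(segment.get("start"))
        end = float(segment.get("end"))
        lines.append(f"{to_timestamp(start)} --> {to_timestamp(end)}")
        lines.append(escape_cue_text(text))
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)
```

For a transcript containing a single segment from 0 to 2.5 seconds, the function returns the `WEBVTT` header followed by one cue with the timing line `00:00:00.000 --> 00:00:02.500`.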
Manual modification of speech recognition results in order to correct errors, adjust the length of subtitles to the available display time, and supplement them with descriptions of non-verbal sounds
- Cleaning
- Editing
Generating automatic abstract based on article content
The goal is to obtain abstracts of literature articles using unsupervised learning algorithms, in order to enrich the metadata of scientific texts.
- Objects:
- Text
- Techniques:
- Information Retrieval
- Encoding
- Content Analysis
Loading an input file
- Encoding
Paper content processing
- Content Analysis
Generating abstract
- Modelling
- Annotating
Writing result to the output file
Result verification
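The steps of this scenario, from loading the input file to writing the result, can be sketched as a small extractive summarizer. The scenario does not say which unsupervised algorithm is used, so the word-frequency sentence scoring below is a swapped-in, deliberately simple technique. The stop word list, the sentence splitting rule, and all function names are assumptions made for this example.

```python
import re
from collections import Counter
from pathlib import Path

# A tiny English stop word list, assumed for this sketch only.
STOPWORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the this to was with".split()
)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep content words only."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows sentence-final punctuation (naive rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text: str, max_sentences: int = 2) -> str:
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average document frequency of the sentence's content words.
        return sum(freq[w] for w in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


def summarize_file(source: str, target: str, max_sentences: int = 2) -> None:
    """Load an article, generate the abstract, and write it to the output file."""
    article = Path(source).read_text(encoding="utf-8")
    Path(target).write_text(summarize(article, max_sentences), encoding="utf-8")
```

Because the abstract consists of sentences copied verbatim from the article, the final verification step reduces to checking that the selected sentences are representative and read coherently together.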