Dariah.lab | Intelligent Analysis and Interpretation Laboratory

Intelligent Analysis and Interpretation Laboratory

Intelligent Analysis and Interpretation Laboratory offers functionality for processing semantically linked information resources, that is data analysis, detailed interpretation and ongoing interaction with the data. At this stage of processing, the data under study is already enriched in the Automated Enrichment Laboratory and semantically linked in the Supervised Semantic Discovery Laboratory.

In research, inference process is often iterative, i.e., based on the results of the initial analysis, algorithms for the next stage of data processing are formulated, and based on their results, algorithms for further steps are formulated. This is a cycle that is repeated until the researcher obtains satisfactory results. Thus, working with data is very often interactive, requiring supervision by the user and reacting to the results obtained.

Intelligent Analysis and Interpretation Laboratory will provide solutions that allow researchers to supervise the processing of the data they analyze on an ongoing basis and to make changes to the algorithms used. These solutions depend, of course, largely on the type of data and the nature of the research being conducted. The analysis process is different for textual data and different for spatial or multimedia data. The Laboratory will provide sets of basic algorithms for processing data from different areas of digital humanities. Some of them will be more universal in nature, such as methods for analyzing textual data. Others, on the other hand, will be dedicated to data of a specific type and nature, such as press texts.

Text Data Analysis and Interpretation

The Laboratory will offer tools for extensive statistical text analysis. In this area, it will be possible to filter texts by metadata and to group them on the basis of metadata, i.e., to create sub-corpora, for comparative analysis. For texts from various subject areas, one will be able to analyze entities and concepts in terms of their frequency of occurrence with consideration of changes over time, to analyze relationships between entities, or to analyze changes in the meaning of concepts over time and within different sub-corpora.

These analyses are based on data acquired at the earlier stages of data processing, i.e., linguistic tagging, detection of named entities, disambiguation of meanings (e.g. names of people and places), recognition of multi-word lexical units. The results of the analyses will be presented both in textual form and in visual form, i.e., in the form of graphs or charts of various types.

The above mentioned tools can be applied to the analysis of text corpora from different subject areas, but solutions tailored to the nature of the collected data will also be offered, e.g., methods for the analysis and exploration of literature and literary texts, enabling the recognition of literary terms or thematic modeling with literary entities and terms. The peculiarities of data collected in library systems will be taken into account by a tool offered for statistical analyses of frequencies, counts of various metadata components – genres, types of publications, authors, and viewing statistics of classification, topics, types, time. Corpora of press texts are a separate category and tools for their statistical analysis will be also offered.

An example of text corpus analysis results visualization – heat map illustrating the share of topics in document corpus

Interactive Analysis of Multimedia

Tools for analyzing multimedia data are not as universal as those for textual data. In this case, the solutions offered in the Laboratory deal with data of a specific nature and meaning, and their processing has a specific purpose.

One of such solution is developed in the Laboratory for the annotation of emotions expressed in the linguistic and visual layers of a feature film. The purpose of the data analysis is to show the correspondence and/or differences in the manifestation of specific emotions in different languages (Polish, German, Danish) in relation to the same visual elements of the film. The bases of the analysis is the data from the annotation layer generated in the process of metadata enrichment, specifying the emotions assigned to individual elements of the content.

Another solution of this type is a tool for analyzing symbolic representations of various properties of melody, oral expression and gestures. In this case, the data to be analyzed, obtained at the earlier stages of processing audiovisual materials, is the information content calculated for individual discrete elements of musical (pitch classes, rhythmic values) and speech (sounds, syllables, words, gestures) waveforms. These values are a model representation of the effects of human mental predictive mechanisms used in the perception of the original musical and speech waveforms and accompanying gestural phenomena. The results of the analysis are applicable to studies of autonomic nervous system activity, the music industry and automatic speech analysis systems.

Interactive Analysis of Spatial Data

Data often have spatial aspect. They refer to a specific place, describing its character and features. The mere fact that certain types of data were produced in a specific place can also be significant. The application of modern processing, visualization and distribution techniques to historical data allows for a coherent representation of the history of a place and its changes over time.

Analysis of spatial and spatio-temporal data requires establishing links for data of different nature (descriptive and numerical), with reference layers such as administrative units, geographical names or address points. Processing such linked and enriched data, can provide answers to many questions requiring an understanding of the spatial relationships between elements (e.g. distances) and enables finding all locations that meet certain criteria, for example. Their analysis can reveal the existence of certain patterns and their changes over time, or identify areas of concentration of phenomena.

The Laboratory will offer tools for interactive analysis of spatial data supported by geomatics methods, that is, related to the collection, storage and processing of precisely spatial data.