Publications of the Department of Audiovisual Technology

The following list (automatically generated by the University Library) contains the publications from 2016 onwards. Publications up to 2015 can be found on a separate page.

Note: To search through all publications at once, select "Show All" and then use the browser's search function (Ctrl+F).

Results: 171
Created on: Sun, 30 Jun 2024 13:58:38 +0200


Göring, Steve; Ramachandra Rao, Rakesh Rao; Fremerey, Stephan; Raake, Alexander
AVrate Voyager: an open source online testing platform. - In: IEEE 23rd International Workshop on Multimedia Signal Processing, (2021), 6 pp.

Subjective testing is an integral part of many research fields concerned with, e.g., human perception. Lab tests are a popular approach to gather ratings for such subjective evaluations. However, controlled lab tests cannot always be performed, for example when no lab exists, is accessible, or may be used. For this reason, online tests, e.g., using crowdsourcing, are an alternative to traditional lab tests. In this paper we describe a framework for implementing such online tests for audio-, video-, and image-related evaluations or questionnaires. Our framework, AVrate Voyager, builds upon previously developed frameworks for lab tests and the experience gained with them. AVrate Voyager uses scalable web technologies to implement the test framework, which ensures reliable operation. In addition, we added pre-caching strategies to avoid additional influence on play-out, e.g., in the case of video testing. We analyze several tests conducted with the new framework and describe in detail the steps required to adapt the provided tool.
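The pre-caching idea mentioned in the abstract can be illustrated with a minimal Python sketch; this is not AVrate Voyager's actual code. The point is simply that all stimuli are fetched into a local cache before the first rating screen is shown, so that download time cannot influence play-out. The server URL and file names are hypothetical.

```python
# Illustrative sketch only: fetch all test stimuli before playback starts,
# so that download time cannot influence play-out during the test.
# The URL and file list are hypothetical, not part of AVrate Voyager.
import pathlib
import urllib.request

STIMULI = ["video_01.mp4", "video_02.mp4", "video_03.mp4"]  # hypothetical stimuli
BASE_URL = "https://example.org/stimuli/"                   # hypothetical server
CACHE_DIR = pathlib.Path("stimulus_cache")

def precache_stimuli() -> None:
    """Download every stimulus into a local cache directory up front."""
    CACHE_DIR.mkdir(exist_ok=True)
    for name in STIMULI:
        target = CACHE_DIR / name
        if not target.exists():
            urllib.request.urlretrieve(BASE_URL + name, str(target))

if __name__ == "__main__":
    precache_stimuli()
    print("All stimuli cached; the rating session can start without buffering.")
```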



https://doi.org/10.1109/MMSP53017.2021.9733561

Döring, Nicola; Mikhailova, Veronika; Brandenburg, Karlheinz; Broll, Wolfgang; Groß, Horst-Michael; Werner, Stephan; Raake, Alexander
Saying "Hi" to grandma in nine different ways: established and innovative communication media in the grandparent-grandchild relationship. - In: Technology, Mind, and Behavior, ISSN 2689-0208, (2021), 1 p.

https://doi.org/10.1037/tms0000107

Fremerey, Stephan; Reimers, Carolin; Leist, Larissa; Spilski, Jan; Klatte, Maria; Fels, Janina; Raake, Alexander
Generation of audiovisual immersive virtual environments to evaluate cognitive performance in classroom type scenarios. - In: Tagungsband, DAGA 2021 - 47. Jahrestagung für Akustik, (2021), pp. 1336-1339

https://doi.org/10.22032/dbt.50292

Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Enhancement of pixel-based video quality models using meta-data. - In: Electronic imaging, ISSN 2470-1173, Vol. 33 (2021), 9, art00022, pp. 264-1-264-6

Current state-of-the-art pixel-based video quality models for 4K resolution do not have access to explicit meta information such as resolution and framerate and may not include implicit or explicit features that model the related effects on perceived video quality. In this paper, we propose a meta concept to extend state-of-the-art pixel-based models and develop hybrid models incorporating meta-data such as framerate and resolution. Our general approach uses machine learning to incorporate the meta-data into the overall video quality prediction. To this end, we evaluate various machine learning approaches such as SVR, random forest, and extreme gradient boosting trees in terms of their suitability for hybrid model development. We use VMAF to demonstrate the validity of the meta-information concept. Our approach was tested on the publicly available AVT-VQDB-UHD-1 dataset. We show an increase in prediction accuracy for the hybrid models in comparison with that of the underlying pixel-based model. While the proof-of-concept is applied to VMAF, it can also be used with other pixel-based models.
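The meta-data concept can be illustrated with a minimal sketch that is not the authors' implementation: a pixel-based score such as VMAF is combined with resolution and framerate meta-data as input features of a machine-learning regressor. The feature values and MOS labels below are hypothetical placeholders.

```python
# Minimal sketch of the hybrid-model idea: combine a pixel-based score
# (e.g. VMAF) with meta-data such as resolution and framerate in a
# machine-learning regressor. Data are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Columns: VMAF score, encoding height (pixels), framerate (fps)
X = np.array([
    [35.0,  720, 30.0],
    [55.0, 1080, 30.0],
    [70.0, 1440, 50.0],
    [90.0, 2160, 60.0],
    # in practice, hundreds of rated sequences from a dataset such as
    # AVT-VQDB-UHD-1 would be used here
])
y = np.array([2.1, 3.0, 3.8, 4.5])  # hypothetical MOS labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("Predicted MOS:", model.predict(X_test))
```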



https://doi.org/10.2352/ISSN.2470-1173.2021.9.IQSP-264

Ho, Man M.; Zhang, Lu; Raake, Alexander; Zhou, Jinjia
Semantic-driven colorization. - In: Proceedings CVMP 2021, (2021), 1, pp. 1-10

Recent colorization works implicitly predict semantic information while learning to colorize black-and-white images. Consequently, the generated color tends to overflow object boundaries, and semantic faults remain hidden. According to human experience in colorization, our brains first detect and recognize the objects in the photo, then imagine their plausible colors based on many similar objects we have seen in real life, and finally colorize them, as described in Figure 1. In this study, we simulate that human-like process by letting our network first learn to understand the photo and then colorize it. Thus, our work can provide plausible colors at a semantic level. Moreover, the semantic information predicted by a well-trained model becomes interpretable and modifiable. Additionally, we show that Instance Normalization is a missing ingredient for image colorization and re-design the inference flow of U-Net to have two streams of data, providing an appropriate way of normalizing the features extracted from the black-and-white image. As a result, our network can provide plausible colors competitive with typical colorization works for specific objects. Our interactive application is available at https://github.com/minhmanho/semantic-driven_colorization.
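A minimal PyTorch sketch, as a simplified stand-in rather than the authors' network, may help illustrate the two ingredients highlighted in the abstract: Instance Normalization applied to features from the black-and-white image, and a second stream carrying semantic information that is fused with the color stream.

```python
# Illustrative sketch of a two-stream block with Instance Normalization.
# This is a simplified stand-in, not the authors' architecture.
import torch
import torch.nn as nn

class TwoStreamBlock(nn.Module):
    def __init__(self, in_ch: int, sem_ch: int, out_ch: int):
        super().__init__()
        # Color stream: features from the grayscale image, instance-normalized
        self.color = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Semantic stream: e.g. a predicted semantic map (channel count is hypothetical)
        self.semantic = nn.Sequential(
            nn.Conv2d(sem_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, kernel_size=1)

    def forward(self, gray_feat, sem_feat):
        merged = torch.cat([self.color(gray_feat), self.semantic(sem_feat)], dim=1)
        return self.fuse(merged)

block = TwoStreamBlock(in_ch=1, sem_ch=8, out_ch=16)
out = block(torch.randn(1, 1, 64, 64), torch.randn(1, 8, 64, 64))
print(out.shape)  # torch.Size([1, 16, 64, 64])
```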



https://doi.org/10.1145/3485441.3485645

Keller, Dominik; Seybold, Tamara; Skowronek, Janto; Raake, Alexander
Sensorische Evaluierung in der Kinotechnik : wie Videoqualität mit Methoden aus der Lebensmittelforschung bewertet werden kann [Sensory evaluation in cinema technology: how video quality can be assessed with methods from food research]. - In: FKT, ISSN 1430-9947, Vol. 75 (2021), 4, pp. 33-37

Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Towards high resolution video quality assessment in the crowd. - In: 2021 13th International Conference on Quality of Multimedia Experience (QoMEX), (2021), pp. 1-6

Assessing high-resolution video quality is usually performed using controlled, defined, and standardized lab tests. This method of acquiring human ratings in a lab environment is time-consuming and may also not reflect typical viewing conditions. To overcome these disadvantages, crowd testing paradigms have been used for assessing video quality in general. Crowdsourcing-based tests enable a more diverse set of participants and also use the realistic hardware setup and viewing environment of typical users. However, obtaining valid ratings for high-resolution video quality poses several problems. Example issues are that streaming of such high-bandwidth content may not be feasible for some users, or that crowd participants lack an appropriate, high-resolution display device. In this paper, we propose a method to overcome such problems and conduct a crowd test for higher-resolution content by using a 540p cutout from the center of the original 2160p video. To this aim, we use the videos from Test#1 of the publicly available dataset AVT-VQDB-UHD-1, which contains videos up to a resolution of UHD-1. The quality labels available from that lab test allow us to compare the results with the crowd test presented in this paper. It is shown that there is a Pearson correlation of 0.96 between the lab and crowd tests, and hence such crowd tests can reliably be used for quality assessment of higher-resolution content. The overall implementation of the crowd test framework and the results are made publicly available for further research and reproducibility.
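The cutout procedure can be sketched as follows, assuming a 16:9 centre crop (960x540 from a 3840x2160 source) and ffmpeg for the extraction; the exact encoding settings used by the authors are not reproduced here, and the file names are placeholders.

```python
# Sketch of extracting a 540p centre cutout from a 2160p source with ffmpeg.
# Crop size, codec settings, and file names are assumptions for illustration.
import subprocess

SRC_W, SRC_H = 3840, 2160   # UHD-1 source resolution
CUT_W, CUT_H = 960, 540     # 540p cutout (16:9)

x = (SRC_W - CUT_W) // 2    # 1440: horizontal offset of the centre crop
y = (SRC_H - CUT_H) // 2    # 810: vertical offset of the centre crop

cmd = [
    "ffmpeg", "-i", "source_2160p.mp4",
    "-vf", f"crop={CUT_W}:{CUT_H}:{x}:{y}",
    "-c:a", "copy",
    "cutout_540p.mp4",
]
subprocess.run(cmd, check=True)
```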



https://doi.org/10.1109/QoMEX51781.2021.9465425

Keller, Dominik; Vaalgamaa, Markus; Paajanen, Erkki; Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Groovability: using groove as a novel measure for audio QoE with the example of smartphones. - In: 2021 13th International Conference on Quality of Multimedia Experience (QoMEX), (2021), pp. 13-18

Groove in music is a fundamental part of why humans entrain to it and enjoy it. Smartphones have become an important medium for listening to music. Especially when being with others, loudspeaker playback may be the method of choice. However, due to the physical limits of acoustics, smartphones offer sub-optimal audio capabilities for loudspeaker playback. Therefore, it is desirable to measure the Quality of Experience (QoE) of music played on smartphones. While audio playback is often assessed in terms of sound quality, the aim of this work is to address QoE in terms of the meaning or effect that the audio has on the listener. A key component of the meaning of popular music is groove. Hence, in this paper, we study groovability, that is, the ability of a piece of audio technology to convey groove. To instantiate our novel audio QoE assessment method, we apply it to music played by 8 different smartphones. For this purpose, looped 4-bar, loudness-aligned recordings from 24 music pieces of different intrinsic groove were played back on the different smartphones. Our test method uses a multi-stimulus comparison with synchronized playback capability. A total of 62 subjects evaluated groovability using two stimulus subsets. It was found that the proposed methodology is highly effective in distinguishing between the groovability provided by the considered phones. In addition, a reduced-reference model is proposed to predict groovability, using a set of both acoustics- and music-groove-related features. In our formal validation on unknown data, the model is shown to provide good prediction performance with a Pearson correlation greater than 0.90.
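As one hedged illustration of the stimulus preparation mentioned above (loudness-aligned recordings), the following sketch aligns a clip to a common loudness target using pyloudnorm; the -23 LUFS target and the file names are assumptions, not the authors' settings.

```python
# Sketch: align a recording to a common integrated-loudness target so that
# loudness differences do not bias the comparison. Target and file names
# are assumptions for illustration.
import soundfile as sf
import pyloudnorm as pyln

TARGET_LUFS = -23.0  # assumed common target

data, rate = sf.read("clip.wav")                 # placeholder input file
meter = pyln.Meter(rate)                         # ITU-R BS.1770 loudness meter
loudness = meter.integrated_loudness(data)       # measured integrated loudness
aligned = pyln.normalize.loudness(data, loudness, TARGET_LUFS)
sf.write("clip_aligned.wav", aligned, rate)
```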



https://doi.org/10.1109/QoMEX51781.2021.9465440

Robitza, Werner; Ramachandra Rao, Rakesh Rao; Göring, Steve; Raake, Alexander
Impact of spatial and temporal information on video quality and compressibility. - In: 2021 13th International Conference on Quality of Multimedia Experience (QoMEX), (2021), pp. 65-68

Spatial Information (SI) and Temporal Information (TI) are frequently used metrics to classify the spatiotemporal complexity of video content. However, they are mostly computed on original video sources, and their impact on actual encoding efficiency is not known. In this paper, we propose a method to determine the compressibility of video sources, that is, how good the video quality can be under a given bitrate constraint. We show how various aggregations of SI and TI correlate with compressibility scores obtained from a public dataset of H.264/HEVC/VP9 content. We observe that the minimum TI value as well as an existing criticality metric from the literature are good indicators of compressibility, as judged by subjective ratings as well as VMAF and P.1204.3 objective scores.
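SI and TI as commonly defined in ITU-T Rec. P.910 can be computed with a short NumPy/SciPy sketch like the one below; other aggregations studied in the paper (e.g. the minimum TI) can be obtained by replacing the final max() with a different statistic. The random frames in the usage example are placeholders.

```python
# SI: maximum over time of the spatial standard deviation of the
# Sobel-filtered luma frame. TI: maximum over time of the standard
# deviation of the luma frame difference (ITU-T Rec. P.910).
import numpy as np
from scipy import ndimage

def si_ti(frames):
    """frames: iterable of 2-D luma arrays (float), all the same size."""
    si_values, ti_values = [], []
    previous = None
    for frame in frames:
        sobel_h = ndimage.sobel(frame, axis=0)
        sobel_v = ndimage.sobel(frame, axis=1)
        si_values.append(np.hypot(sobel_h, sobel_v).std())
        if previous is not None:
            ti_values.append((frame - previous).std())
        previous = frame
    return max(si_values), max(ti_values)

# Hypothetical usage with random placeholder "frames":
frames = [np.random.rand(540, 960) for _ in range(10)]
print(si_ti(frames))
```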



https://doi.org/10.1109/QoMEX51781.2021.9465452

Ávila Soto, Mauro; Barzegar, Najmeh
I know you are looking to me: enabling eye-gaze communication between small children and parents with visual impairments. - In: AH 2021, (2021), 9, 4 pp.

Eye-gaze interaction is a relevant means of communication from early infancy: the bond between infants and their caretakers is strengthened through eye contact. Parents with visual impairments are excluded from this type of interaction with their children. Today, computer vision technologies make it possible to track eye gaze for various purposes and even enable users with visual impairments to recognize faces. This work starts from the following research question: Can currently available eye-tracking solutions help parents with visual impairments to have eye-gaze interaction with their young infant children? We devised a software prototype based on currently available eye-tracking technologies and tested it with three sets of visually impaired parents and their young infant children to explore whether such parents can be assisted in having eye-gaze interaction with their children. The experience was documented in semi-structured interviews, which were processed with a content analysis technique. The approach received positive feedback regarding functionality and emotional interaction.
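Purely as a hedged illustration of the underlying idea, and not the authors' prototype, a webcam-based sketch could use an off-the-shelf face detector and emit a non-visual cue whenever a frontal face, i.e. a child looking towards the camera, is detected:

```python
# Illustrative sketch only (not the authors' prototype): detect a frontal
# face in the webcam stream and emit a minimal non-visual cue.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
capture = cv2.VideoCapture(0)  # default webcam

try:
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:
            # Terminal bell as a stand-in; a real aid would use speech or haptics.
            print("\a", end="", flush=True)
finally:
    capture.release()
```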



https://doi.org/10.1145/3460881.3460883