The Lifelog Search Challenge (LSC) is an international content retrieval competition that evaluates search over personal lifelog data. At the LSC, content-based search is performed over a multi-modal dataset, continuously recorded by a lifelogger over 27 days, that comprises multimedia content, biometric data, human activity data, and information activity data. In this work, we report on the first LSC, which took place in Yokohama, Japan in 2018 as a special workshop at the ACM International Conference on Multimedia Retrieval (ICMR 2018). We describe the general idea of the challenge, summarise the participating search systems and the evaluation procedure, and analyse the teams' search performance from several perspectives. We identify reasons why some systems performed better than others and provide an outlook, as well as open issues, for upcoming iterations of the challenge.
With advances in digital media processing technologies and the tremendous growth in the amount of digital media being created, new kinds of artworks are becoming possible and are drawing much attention from researchers, industry, and consumers. A related emerging research area is the evaluation of such multimedia artworks with machine learning techniques, which we call “attractiveness computing.” Attractiveness computing is made possible by the large-scale accumulation of multimedia artworks and of consumers' responses to them. In this paper, we review existing research on multimedia artwork analysis and attractiveness computing.
This paper presents an interactive face retrieval framework for clarifying the image representation envisioned by a user. Our system is designed for the situation in which the user wishes to find a person but has only a visual memory of that person. We address a critical challenge of image retrieval: the user cannot provide target-specific information. Instead, the user selects several images that are similar to his or her impression of the target person. Based on the user's selections, our proposed system automatically updates a deep convolutional neural network. By interactively repeating this process, the system reduces the gap between human-perceived and computer-computed similarities and estimates the target image representation. We ran user studies with 10 participants on a public database and confirmed that the proposed framework clarifies the image representation envisioned by the user easily and quickly.
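The interaction loop described in this abstract (the user selects similar images, the system updates its model, then re-ranks) can be sketched as follows. Note that this is an illustrative sketch only: the paper fine-tunes a deep CNN, whereas here a simpler Rocchio-style update of a query embedding, cosine-similarity ranking, and all function names are assumptions standing in for the authors' actual implementation.

```python
import numpy as np

def update_query(query, selected, alpha=0.5):
    """Rocchio-style relevance feedback (a stand-in for the paper's CNN
    update): move the query embedding toward the mean of the embeddings
    of the images the user selected as similar to the target."""
    return (1 - alpha) * query + alpha * selected.mean(axis=0)

def rank(gallery, query):
    """Rank gallery embeddings by cosine similarity to the query,
    most similar first."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return np.argsort(-(g @ q))
```

In each round, the user's selections tighten the query toward the remembered face, so repeated rounds shrink the gap between the user's mental image and the system's ranking.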
Nonparametric topic models such as the hierarchical Dirichlet process (HDP) have been attracting increasing attention for multimedia data analysis. However, existing models for multimedia data are unsupervised: they purely cluster semantically or characteristically related features into latent topics without considering side information such as class labels. In this paper, we present a novel supervised sequential symmetric correspondence HDP (Sup-SSC-HDP) model for multi-class video classification, in which the empirical topic frequencies learned from multimodal video data serve as a predictor of the video class. Qualitative and quantitative assessments demonstrate the effectiveness of Sup-SSC-HDP.
Separating reflection components is a fundamental problem in computer vision and is useful for many applications, such as image quality enhancement. We propose a novel method that improves the accuracy of separating the reflection components of a single image. Although several algorithms for separating reflection components have been proposed, our method can further improve accuracy starting from their results. First, we obtain the diffuse and specular components using an existing separation algorithm. Then, we apply a high-emphasis filter to each component. Because the filter responses become large where the separation fails, we can detect erroneous pixels. We then replace the separation results at these erroneous pixels with those of other reference pixels in the image, chosen according to the similarity between the target and reference pixels. Experimental results show that our method improves the Peak Signal-to-Noise Ratio (PSNR) by up to 13.61 dB.
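The filter-detect-replace refinement described in this abstract can be sketched as below. This is a minimal illustration, not the authors' method: the Laplacian-style high-emphasis filter, the fixed response threshold, and intensity-based similarity between pixels are all simplifying assumptions standing in for the paper's actual filter and similarity measure.

```python
import numpy as np

def high_emphasis(img):
    # Laplacian-style high-emphasis filter (4-neighbour), edge-replicated
    # padding; an assumed stand-in for the paper's filter.
    p = np.pad(img, 1, mode="edge")
    return np.abs(4 * img
                  - p[:-2, 1:-1] - p[2:, 1:-1]
                  - p[1:-1, :-2] - p[1:-1, 2:])

def refine_separation(diffuse, specular, thresh=0.5):
    """Flag pixels whose high-emphasis response is large (likely
    separation failures) and replace each one with the value of the
    most similar reliable pixel elsewhere in the image."""
    resp = high_emphasis(diffuse) + high_emphasis(specular)
    bad = resp > thresh
    good_vals = diffuse[~bad]
    out = diffuse.copy()
    for r, c in zip(*np.nonzero(bad)):
        # "Similarity" here is crude intensity distance; the paper
        # presumably uses a richer patch-based measure.
        out[r, c] = good_vals[np.argmin(np.abs(good_vals - diffuse[r, c]))]
    return out, bad
```

An isolated spike in an otherwise smooth diffuse component produces a large filter response, gets flagged, and is overwritten by a nearby reliable value, which is the intuition behind the reported PSNR gain.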
A major problem in the subjective evaluation of TV image quality is individual variability among participants. Large individual differences in susceptibility to image blurring result in imprecise evaluations and a loss of power to detect statistically significant differences between experimental conditions. In image quality assessments of traditional televisions, observers' visual acuities (VA) should be screened. For emerging TV systems with a wide field of view (FOV), in which objects move quickly relative to the display frame, it is unclear whether screening viewers' VAs is sufficient. In wide-FOV TV image quality evaluations, screening dynamic visual acuity (DVA) might be effective in controlling for these individual differences. We show here a significant correlation between the number of blurred frames reported and the observer's DVA. Therefore, DVA screening is important for avoiding imprecise evaluations and for detecting statistical significance between experimental conditions.