Latest Results
The latest content available from Springer
- Journal of Computer Science and Technology on September 13, 2024 at 12:00 am
- Balancing Accuracy and Training Time in Federated Learning for Violence Detection in Surveillance Videos: A Study of Neural Network Architectures on September 13, 2024 at 12:00 am
Abstract This paper presents an original investigation into the domain of violence detection in videos, introducing an innovative approach tailored to the unique challenges of a federated learning environment. The study encompasses a comprehensive exploration of machine learning techniques, leveraging spatio-temporal features extracted from benchmark video datasets. In a notable departure from conventional methodologies, we introduce a novel architecture, the “Diff Gated” network, designed to streamline preprocessing and training while simultaneously enhancing accuracy. Our exploration of advanced machine learning techniques, such as super-convergence and transfer learning, expands the horizons of federated learning, offering a broader range of practical applications. Moreover, our research introduces a method for seamlessly adapting centralized datasets to the federated learning context, bridging the gap between traditional machine learning and federated learning approaches. The outcome of this study is a remarkable advancement in the field of violence detection, with our federated learning model consistently outperforming state-of-the-art models, underscoring the transformative potential of our contributions. This work represents a significant step forward in the application of machine learning techniques to critical societal challenges.
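The abstract does not say how the centralized datasets are split across clients or how client models are aggregated. As a minimal sketch, assuming the widely used FedAvg rule (a sample-count-weighted average of client parameters) and two simple splitting schemes (an IID shuffle and a label-based non-IID split), turning a centralized dataset into client shards and aggregating the resulting client models could look like this. The function names `partition_iid`, `partition_by_label`, and `fed_avg` are hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List, Sequence, Tuple


def partition_iid(num_samples: int, num_clients: int, seed: int = 0) -> List[List[int]]:
    """IID split (assumed scheme): shuffle indices, then deal them round-robin to clients."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    return [indices[c::num_clients] for c in range(num_clients)]


def partition_by_label(labels: Sequence[int], num_clients: int) -> List[List[int]]:
    """Non-IID split (assumed scheme): group indices by label, give whole groups to clients."""
    groups: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        groups.setdefault(y, []).append(idx)
    shards: List[List[int]] = [[] for _ in range(num_clients)]
    for i, label in enumerate(sorted(groups)):
        shards[i % num_clients].extend(groups[label])
    return shards


def fed_avg(client_updates: List[Tuple[List[float], int]]) -> List[float]:
    """FedAvg aggregation: average parameter vectors weighted by each client's sample count."""
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("no training samples across clients")
    dim = len(client_updates[0][0])
    global_params = [0.0] * dim
    for params, n in client_updates:
        for j in range(dim):
            global_params[j] += params[j] * n / total
    return global_params
```

In one hypothetical round, each client trains locally on its shard and returns `(params, len(shard))`; the server then calls `fed_avg` on the collected pairs to obtain the next global model.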
- Overcoming Spatial Constraints in VR: A Survey of Redirected Walking Techniques on July 1, 2024 at 12:00 am
Abstract As virtual reality (VR) technology strives to provide immersive and natural user experiences, the challenge of aligning vast virtual environments with limited physical spaces remains significant. This survey comprehensively explores the advancements in redirected walking (RDW) techniques aimed at overcoming spatial constraints in VR. RDW addresses this by subtly manipulating users' physical movements to allow for seamless navigation within constrained areas. The survey delves into gain perception mechanisms, detailing how slight discrepancies between virtual and real-world movements can be utilized without user awareness, thus extending the effective navigable space. Various control algorithms for gain-based RDW are analyzed, highlighting their implementation and effectiveness in maintaining immersion and minimizing perceptual disturbances. Furthermore, novel methods extending beyond traditional gain-based techniques are discussed, showcasing innovative approaches that further refine VR interactions. The practical implications of RDW in enhancing safety and reducing physical collisions in VR environments are underscored, alongside its potential to improve user experience by aligning virtual exploration more closely with natural human behavior patterns. Through a thorough review of existing literature and recent advancements, this survey provides a systematic understanding for researchers, developers, and industry professionals. It underscores the importance of RDW in the future of VR, emphasizing RDW's role in making VR more accessible and practical across various applications, from education and training to therapy and entertainment. The paper concludes with a forward-looking perspective on the continued evolution and potential of RDW in revolutionizing virtual reality experiences.
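To make the gain terminology concrete, the following is a minimal sketch of how the three classic gains could be applied to one frame of tracked motion: a rotation gain scales physical head yaw, a translation gain scales physical walking distance, and a curvature gain (1/r) injects extra scene rotation so that walking straight in the virtual world bends the physical path onto an arc of radius r. The bounds in `GainLimits` are illustrative placeholders, not thresholds reported in the survey, and `redirect_step` is a hypothetical helper; choosing the gains each frame is the job of an RDW controller, which this sketch does not implement.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GainLimits:
    """Illustrative imperceptibility bounds (assumed placeholder values, not from the survey)."""
    rotation: Tuple[float, float] = (0.8, 1.2)     # allowed virtual/physical rotation ratio
    translation: Tuple[float, float] = (0.9, 1.1)  # allowed virtual/physical translation ratio
    min_radius_m: float = 10.0                     # tightest allowed physical arc radius


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def redirect_step(phys_yaw_deg: float, phys_dist_m: float, g_rot: float, g_trans: float,
                  radius_m: float, turn_sign: int = 1,
                  limits: GainLimits = GainLimits()) -> Tuple[float, float, float]:
    """Map one frame of physical motion to virtual motion with gain-based RDW.

    Returns (virtual_yaw_deg, virtual_dist_m, injected_yaw_deg). The injected yaw is the
    extra scene rotation from the curvature gain; turn_sign (+1/-1) picks its direction.
    """
    # keep requested gains inside the (assumed) imperceptible ranges
    g_rot = clamp(g_rot, *limits.rotation)
    g_trans = clamp(g_trans, *limits.translation)
    radius_m = max(radius_m, limits.min_radius_m)

    virtual_yaw = g_rot * phys_yaw_deg      # rotation gain: virtual / physical rotation
    virtual_dist = g_trans * phys_dist_m    # translation gain: virtual / physical distance
    # curvature gain g_C = 1/r: walking d metres on a circle of radius r subtends d/r radians
    injected_yaw = turn_sign * math.degrees(phys_dist_m / radius_m)
    return virtual_yaw, virtual_dist, injected_yaw
```

For example, a requested rotation gain of 1.5 during a 90° physical turn is clamped to 1.2 under these placeholder limits, so the user sees a 108° virtual turn.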
- Face Anti-Spoofing with Unknown Attacks: A Comprehensive Feature Extraction and Representation Perspective on July 1, 2024 at 12:00 am
Abstract Face anti-spoofing aims to detect whether the input is a real photo of a user (living) or a fake (spoofing) image. As new types of attacks keep emerging, the detection of unknown attacks, known as Zero-Shot Face Anti-Spoofing (ZSFA), has become increasingly important in both academia and industry. Existing ZSFA methods mainly focus on extracting discriminative features between spoofing and living faces. However, spoofing faces are by nature designed to trick anti-spoofing systems by mimicking living faces; therefore, the deceptive features shared between known attacks and living faces, which existing ZSFA methods have ignored, are essential for comprehensively representing living faces. As a result, existing ZSFA models cannot learn complete representations of living faces and thus fall short of effectively detecting newly emerged attacks. To tackle this problem, we propose an innovative method that effectively captures both the deceptive and the discriminative features distinguishing genuine from spoofing faces. Our method consists of two main components: a two-against-all training strategy and a semantic autoencoder. The two-against-all training strategy is employed to separate deceptive and discriminative features. Importing deceptive features introduces two further issues: the invalidation of categorical functions and a dominance imbalance among different feature dimensions. To address them, we introduce a modified semantic autoencoder, designed to map all extracted features to a semantic space and thereby balance the dominance of each feature dimension. We combine our method with the feature extraction model ResNet50, and experimental results show that the trained ResNet50 model simultaneously achieves feasible detection of unknown attacks and comparably accurate detection of known spoofing. Experimental results confirm the superiority and effectiveness of our proposed method in identifying living faces under interference from both known and unknown spoofing types.
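The abstract describes a *modified* semantic autoencoder without giving the modification. For orientation only, here is the standard linear semantic autoencoder used in zero-shot learning, which is not the authors' variant: a single matrix W encodes features X (d × N) into semantic codes S (k × N) and W^T decodes them, trained by minimizing ||X - WᵀS||² + λ||WX - S||². Setting the gradient to zero yields the Sylvester equation A W + W B = C with A = SSᵀ, B = λXXᵀ, C = (1 + λ)SXᵀ. The function name `train_sae`, the dimensions, and λ are assumptions for illustration.

```python
import numpy as np


def train_sae(X: np.ndarray, S: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form linear semantic autoencoder (standard, unmodified formulation).

    X: d x N feature matrix (one column per sample)
    S: k x N semantic codes
    Returns W (k x d) minimizing ||X - W.T @ S||_F^2 + lam * ||W @ X - S||_F^2.
    """
    d = X.shape[0]
    k = S.shape[0]
    A = S @ S.T                   # k x k
    B = lam * (X @ X.T)           # d x d
    C = (1.0 + lam) * (S @ X.T)   # k x d
    # Solve A W + W B = C via vec(A W + W B) = (I_d kron A + B^T kron I_k) vec(W),
    # where vec stacks columns (Fortran order).
    M = np.kron(np.eye(d), A) + np.kron(B.T, np.eye(k))
    w = np.linalg.solve(M, C.flatten(order="F"))
    return w.reshape((k, d), order="F")
```

After training, `W @ x` projects a feature vector into the semantic space and `W.T @ s` maps a semantic code back. The Kronecker solve costs O((kd)³) and is only suitable for small examples; `scipy.linalg.solve_sylvester` is the usual routine for realistic feature sizes.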
- Learning a Mixture of Conditional Gating Blocks for Visual Question Answering on July 1, 2024 at 12:00 am
Abstract As a Turing test in multimedia, visual question answering (VQA) aims to answer a textual question about a given image. Recently, the "dynamic" property of neural networks has been explored as one of the most promising ways to improve the adaptability, interpretability, and capacity of neural network models. Unfortunately, despite the prevalence of dynamic convolutional neural networks, exploiting dynamics in the transformers of VQA models through all stages in an end-to-end manner remains relatively unexplored and highly nontrivial. Typically, due to the high computation cost of transformers, researchers tend to apply transformers only to the extracted high-level visual features for downstream vision and language tasks. To address this, we introduce a question-guided dynamic layer into the transformer, as it can effectively increase model capacity while requiring fewer transformer layers for the VQA task. In particular, we name this dynamic block in the Transformer the Conditional Multi-Head Self-Attention block (cMHSA). Furthermore, our question-guided cMHSA is compatible with the conditional ResNeXt block (cResNeXt). Thus, we propose a novel model, a mixture of conditional gating blocks (McG), for VQA, which keeps the best of the Transformer, convolutional neural network (CNN), and dynamic networks. The pure conditional gating CNN model and the conditional gating Transformer model can be viewed as special cases of McG. We quantitatively and qualitatively evaluate McG on the CLEVR and VQA-Abstract datasets. Extensive experiments show that McG achieves state-of-the-art performance on these benchmark datasets.
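The abstract does not explain how cMHSA conditions self-attention on the question. One generic way to realize question-guided conditional gating, sketched here purely as an assumption and not as the McG design, is to compute a sigmoid gate per attention head from a pooled question embedding and scale each head's output by that gate before the output projection. The parameter names (`Wq`, `Wk`, `Wv`, `Wo`, `Wg`, `bg`) and the per-head gating granularity are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def question_gated_mhsa(tokens: np.ndarray, question: np.ndarray, params: dict, num_heads: int):
    """Multi-head self-attention with per-head gates computed from the question (hypothetical).

    tokens:   (T, D) visual tokens
    question: (Q,)   pooled question embedding
    params:   Wq, Wk, Wv, Wo of shape (D, D); Wg of shape (Q, num_heads); bg of shape (num_heads,)
    Returns (output of shape (T, D), gates of shape (num_heads,)).
    """
    T, D = tokens.shape
    assert D % num_heads == 0, "model width must be divisible by the number of heads"
    dh = D // num_heads

    def split_heads(x: np.ndarray) -> np.ndarray:
        return x.reshape(T, num_heads, dh).transpose(1, 0, 2)  # (H, T, dh)

    q = split_heads(tokens @ params["Wq"])
    k = split_heads(tokens @ params["Wk"])
    v = split_heads(tokens @ params["Wv"])
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh), axis=-1)  # (H, T, T)
    heads = attn @ v                                                 # (H, T, dh)

    # question-conditioned gate in (0, 1) for each head
    gates = sigmoid(question @ params["Wg"] + params["bg"])          # (H,)
    heads = heads * gates[:, None, None]

    merged = heads.transpose(1, 0, 2).reshape(T, D)                  # concatenate heads
    return merged @ params["Wo"], gates
```

A gate near 0 effectively switches a head off for a given question, which is one simple way a single layer can behave differently per input; the actual cMHSA may condition attention quite differently.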