Latest Results The latest content available from Springer
- Adaptive multimodal prompt for human-object interaction with local feature enhanced transformer on December 1, 2024 at 12:00 am
Abstract Human-object interaction (HOI) detection is an important computer vision task that recognizes the interactions between humans and surrounding objects in an image or video. HOI datasets suffer from a severe long-tailed data distribution because it is difficult to build a dataset that contains all potential interactions. Many HOI detectors have addressed this issue by using visual-language models. However, due to the calculation mechanism of the Transformer, a visual-language model is not good at extracting the local features of input samples. We therefore propose a novel local feature enhanced Transformer that encourages the encoders to extract more informative multi-modal features. Moreover, the application of prompt learning to HOI detection is still at a preliminary stage. We consequently propose a multi-modal adaptive prompt module, which uses an adaptive learning strategy to facilitate the interaction of language and visual prompts. On the HICO-DET and SWIG-HOI datasets, the proposed model achieves 24.21% mAP and 14.29% mAP, respectively, on the full set of interactions. Our code is available at https://github.com/small-code-cat/AMP-HOI.
- StreamTrack: real-time meta-detector for streaming perception in full-speed domain driving scenarios on December 1, 2024 at 12:00 am
Abstract Streaming perception is a crucial task in autonomous driving. It aims to eliminate the inconsistency between perception results and the real environment that is caused by processing delay. In high-speed driving scenarios, this inconsistency becomes larger. Previous research has overlooked streaming perception in high-speed driving scenarios and the robustness of models to object speed. To fill this gap, we first define the full-speed domain streaming perception problem and construct a real-time meta-detector, StreamTrack. Second, to extract motion trends, we propose the Swift Multi-Cost Tracker (SMCT) for fast and accurate data association. Meanwhile, the Direct-Decoupled Prediction Head (DDPH) is introduced to predict future locations. Furthermore, we introduce the Uniform Motion Prior Loss (UMPL), which ensures stable learning of the model for rapidly moving objects. Compared with the strong baseline, our model improves the SAsAP (Speed-Adaptive streaming Average Precision) by 15.46%. Extensive experiments show that our approach achieves state-of-the-art performance in the full-speed domain streaming perception task.
- Temporal graphs anomaly emergence detection: benchmarking for social media interactions on December 1, 2024 at 12:00 am
Abstract Temporal graphs have become an essential tool for analyzing complex dynamic systems with multiple agents. Detecting anomalies in temporal graphs is crucial for various applications, including identifying emerging trends, monitoring network security, understanding social dynamics, tracking disease outbreaks, and understanding financial dynamics. In this paper, we present a comprehensive benchmarking study that compares 12 data-driven methods for anomaly detection in temporal graphs. We conduct experiments on two temporal graphs extracted from Twitter and Facebook, aiming to identify anomalies in group interactions. Surprisingly, our study finds no clear pattern as to which method performs best on such tasks, highlighting the complexity and challenges of anomaly emergence detection in large and dynamic systems. The results underscore the need for further research and innovative approaches to effectively detect emerging anomalies in dynamic systems represented as temporal graphs.
- Solving time-delay issues in reinforcement learning via transformers on December 1, 2024 at 12:00 am
Abstract Observation and action delays in remote control scenarios significantly challenge the decision-making of agents that depend on immediate interactions, particularly for traditional deep reinforcement learning (DRL) algorithms. Existing approaches attempt to tackle this problem through various strategies, such as predicting delayed states or transforming delayed Markov Decision Processes (MDPs) into delay-free equivalents. However, both model-free and model-based methods require extensive online data, making them time-consuming and resource-intensive. To handle time-delay challenges effectively and develop a competent and robust RL algorithm, the Augmented Decision Transformer (ADT) is proposed as the first offline RL algorithm designed to enable agents to manage diverse tasks with various constant delays. It transforms a deterministic delayed MDP (DDMDP) into a standard MDP by simulating trajectories in delayed environments using offline datasets from undelayed environments. The Decision Transformer, an autoregressive model, is then employed to train a decision model based on expected rewards, past state sequences, and past action sequences. Extensive experiments conducted on MuJoCo and Adroit tasks validate the robustness and efficiency of the ADT, with its average performance across all tasks being 56% better than the worst-performing comparative algorithms. The results demonstrate that the ADT can outperform state-of-the-art RL counterparts, achieving superior performance across various tasks with different delay conditions.
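The idea of turning undelayed offline data into a delayed view can be illustrated with a minimal sketch. It uses the textbook state-augmentation construction for a constant delay, in which the agent at step t sees the state from `delay` steps earlier together with the actions issued since then. This is an assumption for illustration only; the abstract does not specify how ADT builds its trajectories, and the function name `delayed_view` is hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def delayed_view(
    states: Sequence[Any], actions: Sequence[Any], delay: int
) -> List[Tuple[Any, Tuple[Any, ...]]]:
    """Build augmented states from an undelayed trajectory.

    For each step t >= delay, the augmented state pairs the stale
    observation states[t - delay] with the actions taken at steps
    t - delay .. t - 1 (the actions whose effects are not yet observed).
    Assumption: constant delay, one action per state.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same length")

    augmented = []
    for t in range(delay, len(states)):
        stale_state = states[t - delay]
        pending_actions = tuple(actions[t - delay:t])
        augmented.append((stale_state, pending_actions))
    return augmented


if __name__ == "__main__":
    view = delayed_view([0, 1, 2, 3, 4], ["a", "b", "c", "d", "e"], delay=2)
    print(view)  # [(0, ('a', 'b')), (1, ('b', 'c')), (2, ('c', 'd'))]
```

Because the augmented state carries every pending action, the next augmented state depends only on the current one and the new action, which is what makes the delayed problem a standard MDP again under this construction.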
- A novel approach for predicting the spread of APT malware in the network on December 1, 2024 at 12:00 am
Abstract Advanced Persistent Threat (APT) attacks are among the most dangerous cyber-attack techniques today. Detecting and predicting the spread of APT malware in a network is therefore an urgent problem, since it supports effective prevention of such attacks. In this paper, we propose a new approach that predicts the spread of APT malware in the network based on the behavior of the APT malware itself. Specifically, we propose combining two Susceptible-Infected-Recovered (SIR) models. The first SIR model predicts the spread of APT malware to devices and computers within the organization, which APT malware often uses as a foothold to escalate privileges toward devices or computers that hold the organization's important and sensitive information. The second SIR model predicts the spread of APT malware to the group of computers that contain sensitive information or that pose a high risk to the organization. Together, the two SIR models provide information about infections between computer groups in the system, which helps predict the spread of APT malware accurately. Combining two SIR models in this way is a new proposal grounded in the observed behavior of APT malware. It also opens up a new approach to related problems of predicting spread on the internet, such as malicious code in wireless sensor networks or malicious information on social networks.
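A rough sketch of what coupling two SIR models might look like is given here. The abstract does not state the equations, so everything specific is an assumption: the forward-Euler time stepping, the cross-infection rate `beta12` that lets infected general devices (group 1) infect sensitive computers (group 2), and all parameter names and values. It is meant only to show how infections in one group can drive infections in the other.

```python
from typing import List, Tuple

State = Tuple[float, float, float, float, float, float]


def simulate_two_sir(
    s1: float, i1: float, r1: float,
    s2: float, i2: float, r2: float,
    beta1: float, gamma1: float,
    beta2: float, gamma2: float,
    beta12: float,
    steps: int, dt: float = 0.1,
) -> List[State]:
    """Simulate two coupled SIR groups with forward-Euler steps.

    Group 1: general devices. Group 2: sensitive computers.
    beta12 (hypothetical) is the rate at which infected group-1
    devices infect susceptible group-2 computers.
    """
    n1 = s1 + i1 + r1
    n2 = s2 + i2 + r2
    history: List[State] = [(s1, i1, r1, s2, i2, r2)]

    for _ in range(steps):
        # Group 1: standard SIR infection and recovery.
        new_inf1 = beta1 * s1 * i1 / n1 * dt
        new_rec1 = gamma1 * i1 * dt
        # Group 2: infected by its own members and by group 1.
        new_inf2 = (beta2 * i2 + beta12 * i1) * s2 / n2 * dt
        new_rec2 = gamma2 * i2 * dt

        s1, i1, r1 = s1 - new_inf1, i1 + new_inf1 - new_rec1, r1 + new_rec1
        s2, i2, r2 = s2 - new_inf2, i2 + new_inf2 - new_rec2, r2 + new_rec2
        history.append((s1, i1, r1, s2, i2, r2))

    return history


if __name__ == "__main__":
    hist = simulate_two_sir(
        s1=99, i1=1, r1=0, s2=20, i2=0, r2=0,
        beta1=0.5, gamma1=0.1, beta2=0.3, gamma2=0.1,
        beta12=0.05, steps=200,
    )
    final = hist[-1]
    print(f"group 1 infected: {final[1]:.2f}, group 2 infected: {final[4]:.2f}")
```

In this sketch group 2 starts with no infections, so any infection there arises through the `beta12` coupling term, which mirrors the abstract's description of general devices serving as a stepping stone toward sensitive machines.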