Abstract:
The limited resolution of visible-light sensors under weak illumination and the insufficient representational capacity of any single modality lead to low vehicle re-identification accuracy. To address this problem, a vehicle re-identification method based on dynamic feature interaction and adaptive multi-modal fusion is proposed. In terms of network architecture, the SimAM module is embedded into the convolutional layers of the YOLOv9 backbone without introducing additional parameters, enabling the modeling of spatial and channel relationships within features and extracting initial representations from the visible, near-infrared, and far-infrared modalities. A multi-modal feature interaction module is then constructed to perform refined feature extraction and cross-modal information exchange, yielding enhanced features for all three modalities. Furthermore, a multi-modal adaptive feature fusion network is designed, in which the weighting coefficient of each modality is adaptively generated from global vectors and mask vectors, achieving effective feature fusion. To handle large intra-class variance, small inter-class differences, and significant appearance variations of the same vehicle across different scenarios, a joint loss function combining cross-entropy loss, contrastive loss, and center loss is introduced. The proposed method is trained and validated on the publicly available RGBN300 and RGBNT100 datasets. Compared with existing methods, the mean average precision (mAP) and the Rank-1, Rank-5, and Rank-10 accuracies are improved to varying degrees: on RGBN300, mAP, Rank-1, Rank-5, and Rank-10 improve by 20.6%, 29.0%, 5.0%, and 3.5%, respectively, and on RGBNT100 by 22.5%, 12.0%, 3.7%, and 3.0%. On RGBNT100, Rank-1, Rank-5, and Rank-10 reach 95.1%, 96.7%, and 96.9%, respectively. These results demonstrate that feature interaction and adaptive multi-modal fusion yield more discriminative features and excellent vehicle re-identification performance.
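For readers unfamiliar with SimAM, the following minimal PyTorch sketch reproduces the parameter-free, energy-based weighting from the published SimAM formulation and illustrates why embedding it in the backbone adds no parameters; the function name and the default regularizer e_lambda = 1e-4 are illustrative, and this is not claimed to be the authors' exact code.

```python
import torch

def simam(x: torch.Tensor, e_lambda: float = 1e-4) -> torch.Tensor:
    """Parameter-free SimAM attention: each activation is re-weighted by
    the sigmoid of its inverse energy, computed from per-channel spatial
    statistics. No learnable parameters are introduced."""
    _, _, h, w = x.shape
    n = h * w - 1                                      # spatial neighbors per channel
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)  # squared deviation from channel mean
    v = d.sum(dim=(2, 3), keepdim=True) / n            # channel variance estimate
    e_inv = d / (4 * (v + e_lambda)) + 0.5             # inverse energy per activation
    return x * torch.sigmoid(e_inv)                    # attention-weighted features
```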
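The abstract does not give the exact form of the adaptive fusion network, but its description (per-modality weights generated from global vectors and mask vectors) suggests a gating design along the following lines. This is a minimal sketch under those assumptions: the AdaptiveFusion module, the average-pooling choice, and the learned mask and gate layers are hypothetical stand-ins, not the paper's actual module.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Sketch of adaptive multi-modal fusion: a scalar weight per modality
    is generated from a pooled global vector modulated by a learned mask
    vector, then used to blend the modality feature maps."""

    def __init__(self, channels: int, num_modalities: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                             # global vector per modality
        self.mask = nn.Parameter(torch.ones(num_modalities, channels))  # mask vectors (assumed learnable)
        self.gate = nn.Linear(channels, 1)                              # scalar score per modality

    def forward(self, feats):
        # feats: list of [B, C, H, W] maps for visible, near-infrared, far-infrared
        globals_ = [self.pool(f).flatten(1) for f in feats]             # [B, C] each
        scores = [self.gate(g * self.mask[i]) for i, g in enumerate(globals_)]
        weights = torch.softmax(torch.cat(scores, dim=1), dim=1)        # [B, M], sums to 1
        fused = sum(w.view(-1, 1, 1, 1) * f
                    for w, f in zip(weights.unbind(dim=1), feats))
        return fused

# Usage: fuse three hypothetical 256-channel feature maps.
rgb, nir, tir = (torch.randn(2, 256, 16, 16) for _ in range(3))
fused = AdaptiveFusion(256)([rgb, nir, tir])   # -> [2, 256, 16, 16]
```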
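The joint training objective can be written as a weighted sum of the three terms named above; the balancing coefficients lambda_1 and lambda_2 are illustrative placeholders rather than values reported in the paper:

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}} + \lambda_{1}\,\mathcal{L}_{\text{con}} + \lambda_{2}\,\mathcal{L}_{\text{center}}
```

Here the cross-entropy term supervises identity classification, the contrastive term enlarges inter-class margins, and the center term pulls features of the same vehicle toward a shared class center, directly addressing the large intra-class variance and small inter-class differences cited as motivation.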