Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Exploiting Multi-View Part-Wise Correlation via an Efficient Transformer for Vehicle Re-Identification
AU - Li, Ming
AU - Liu, Jun
AU - Zheng, Ce
AU - Huang, Xinming
AU - Zhang, Ziming
PY - 2023/12/31
Y1 - 2023/12/31
N2 - Image-based vehicle re-identification (ReID) has witnessed much progress in recent years. However, most existing works struggle to extract robust yet discriminative features from a single image to represent one vehicle instance. We argue that images taken from distinct viewpoints, e.g., front and back, have significantly different appearances and patterns for recognition. To identify each vehicle, these models must capture consistent “ID codes” from entirely different views, causing learning difficulties. Additionally, we claim that part-level correspondences among views, i.e., various vehicle parts observed within the same image and the same part visible from different viewpoints, contribute to instance-level feature learning as well. Motivated by these observations, we propose to extract comprehensive vehicle instance representations from multiple views by modelling part-wise correlations. To this end, we present an efficient transformer-based framework that exploits both inner- and inter-view correlations for vehicle ReID. Specifically, we first adopt a convnet encoder to condense a series of patch embeddings from each view. Then our efficient transformer, consisting of a distillation token and a noise token in addition to a regular classification token, is constructed to enforce interactions among these patch embeddings regardless of whether they are taken from identical or different views. We conduct extensive experiments on widely used vehicle ReID benchmarks, and our approach achieves state-of-the-art performance, demonstrating the effectiveness of our method.
AB - Image-based vehicle re-identification (ReID) has witnessed much progress in recent years. However, most existing works struggle to extract robust yet discriminative features from a single image to represent one vehicle instance. We argue that images taken from distinct viewpoints, e.g., front and back, have significantly different appearances and patterns for recognition. To identify each vehicle, these models must capture consistent “ID codes” from entirely different views, causing learning difficulties. Additionally, we claim that part-level correspondences among views, i.e., various vehicle parts observed within the same image and the same part visible from different viewpoints, contribute to instance-level feature learning as well. Motivated by these observations, we propose to extract comprehensive vehicle instance representations from multiple views by modelling part-wise correlations. To this end, we present an efficient transformer-based framework that exploits both inner- and inter-view correlations for vehicle ReID. Specifically, we first adopt a convnet encoder to condense a series of patch embeddings from each view. Then our efficient transformer, consisting of a distillation token and a noise token in addition to a regular classification token, is constructed to enforce interactions among these patch embeddings regardless of whether they are taken from identical or different views. We conduct extensive experiments on widely used vehicle ReID benchmarks, and our approach achieves state-of-the-art performance, demonstrating the effectiveness of our method.
U2 - 10.1109/TMM.2021.3134839
DO - 10.1109/TMM.2021.3134839
M3 - Journal article
VL - 25
SP - 919
EP - 929
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
SN - 1520-9210
ER -