
SemTrack: Large-Scale Dataset for Semantic Tracking in the Wild

Research output: Contribution in Book/Report/Proceedings with ISBN/ISSN › Conference contribution/Paper › peer-review

Published
  • Pengfei Wang
  • Xiaofei Hui
  • Jing Wu
  • Zile Yang
  • Kian Eng Ong
  • Xinge Zhao
  • Beijia Lu
  • Dezhao Huang
  • Evan Ling
  • Weiling Chen
  • Keng Teck Ma
  • Minhoe Hur
  • Jun Liu
Publication date: 7/12/2024
Host publication: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XXIV
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Place of publication: Cham
Publisher: Springer
Pages: 486-504
Number of pages: 19
ISBN (print): 9783031726910
Original language: English

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 15082
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

Knowing merely where the target is located is insufficient for many real-life scenarios. In contrast, capturing rich details about the tracked target via its semantic trajectory, i.e., who or what the target is interacting with, and when, where, and how these interactions unfold over time, is especially crucial and beneficial for various applications (e.g., customer analytics, public safety). We term such tracking Semantic Tracking and define it as tracking the target specified by the user's input and then, most importantly, capturing the semantic trajectory of this target. Acquiring such information can have significant impacts on sales, public safety, and more. However, there is currently no dataset for such comprehensive tracking of the target. To address this gap, we create SemTrack, a large and comprehensive dataset containing annotations of the target's semantic trajectory. The dataset contains 6.7 million frames from 6,961 videos, covering 52 interaction classes and 115 object classes spanning 10 supercategories across 12 scene types, including both indoor and outdoor environments. We also propose SemTracker, a simple and effective method that incorporates a meta-learning approach to better handle the challenges of this task. Our dataset and code can be found at https://sutdcv.github.io/SemTrack.
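
For illustration, a semantic-trajectory annotation of this kind could be represented along the following lines. This is a minimal Python sketch under assumed names: the classes InteractionSegment and SemanticTrajectory, their fields, and the example labels are hypothetical and are not the official SemTrack annotation schema (see the project page above for the actual format).

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class InteractionSegment:
        # One entry in a target's semantic trajectory (hypothetical schema).
        start_frame: int    # when the interaction begins
        end_frame: int      # when the interaction ends
        interaction: str    # how: e.g., one of the 52 interaction classes
        partner_class: str  # who/what: e.g., one of the 115 object classes
        partner_box: Tuple[int, int, int, int]  # where: (x, y, w, h) of the partner

    @dataclass
    class SemanticTrajectory:
        # The target's location over time plus its interaction history.
        video_id: str
        target_boxes: List[Tuple[int, int, int, int]]  # per-frame target locations
        segments: List[InteractionSegment]             # semantic trajectory entries

    # Example: the tracked target holds a cup from frame 0 to frame 120.
    traj = SemanticTrajectory(
        video_id="example_0001",
        target_boxes=[(10, 20, 50, 80)],
        segments=[InteractionSegment(0, 120, "holding", "cup", (15, 25, 10, 12))],
    )
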