

OHD: An Online Category-Aware Framework for Learning with Noisy Labels under Long-Tailed Distribution

Research output: Contribution to Journal/Magazine › Journal article › peer-review

  • Qihao Zhao
  • Fan Zhang
  • Wei Hu
  • Songhe Feng
  • Jun Liu
Journal publication date: 1/05/2024
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Issue number: 5
Volume: 34
Number of pages: 13
Pages (from-to): 3806-3818
Publication status: Published
Early online date: 4/10/2023
Original language: English

Abstract

Recently, many effective methods have emerged to address the robustness problem of Deep Neural Networks (DNNs) trained with noisy labels. However, existing work on learning with noisy labels (LNL) focuses mainly on balanced datasets, whereas real-world data usually also follow a long-tailed distribution (LTD). In this paper, we propose an online category-aware approach to mitigate the impact of noisy labels and LTD on the robustness of DNNs. First, in the presence of noisy samples, the category frequency of clean samples needed to rebalance the feature space cannot be obtained directly. We therefore design a novel category-aware Online Joint Distribution to dynamically estimate the category frequency of clean samples. Second, previous LNL methods were category-agnostic, so under LTD they easily confuse noisy samples with samples from tail categories. Based on this observation, we propose a Harmonizing Factor strategy that exploits additional information from the category-aware online joint distribution to distinguish clean samples more accurately from both noisy samples and tail-category samples. Finally, we propose Dynamic Cost-sensitive Learning, which utilizes the loss and category frequency of the estimated clean samples to address LNL and LTD jointly. Compared with a wide range of state-of-the-art methods, our approach consistently improves the generalization performance of DNNs on several synthetic datasets and two real-world datasets.
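
The abstract does not give implementation details, but the final component, Dynamic Cost-sensitive Learning, can be illustrated with a minimal sketch: per-sample cross-entropy reweighted by the estimated category frequency of clean samples. Everything below is an assumption for illustration, not the authors' code: the function name, the `clean_mask` input (standing in for the paper's online joint distribution and Harmonizing Factor, which are not reproduced here), and the effective-number weighting of Cui et al. (2019), swapped in as one plausible rebalancing choice.

```python
# Hedged sketch of dynamic cost-sensitive learning (NOT the paper's
# implementation): cross-entropy on samples estimated to be clean,
# reweighted per category by estimated clean-sample frequency.
import torch
import torch.nn.functional as F


def dynamic_cost_sensitive_loss(logits, labels, clean_mask, num_classes, beta=0.999):
    """Reweight CE loss by estimated clean-sample category frequency.

    logits:     (N, C) model outputs for the current mini-batch
    labels:     (N,)   possibly noisy integer labels
    clean_mask: (N,)   boolean mask of samples estimated to be clean;
                       in the paper this estimate comes from the
                       category-aware online joint distribution, here
                       it is simply an input (assumption).
    """
    # Estimated per-category frequency of clean samples in this batch.
    clean_counts = torch.bincount(labels[clean_mask], minlength=num_classes).float()

    # Effective-number class weights (Cui et al., 2019) as a stand-in
    # rebalancing rule; the paper's exact weighting may differ.
    effective_num = 1.0 - torch.pow(beta, clean_counts.clamp(min=1.0))
    weights = (1.0 - beta) / effective_num
    weights = weights / weights.sum() * num_classes  # normalize around 1

    # Per-sample loss, weighted by category and restricted to clean samples.
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    per_sample = per_sample * weights[labels] * clean_mask.float()
    return per_sample.sum() / clean_mask.float().sum().clamp(min=1.0)
```

Under this sketch, tail categories (few estimated clean samples) receive larger weights, so the surviving clean tail samples contribute more to the gradient, which is the qualitative behavior the abstract attributes to combining loss information with estimated clean-sample category frequency.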