
In All Likelihoods: Robust Selection of Pseudo-Labeled Data

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

In All Likelihoods: Robust Selection of Pseudo-Labeled Data. / Rodemann, Julian; Jansen, Christoph; Schollmeyer, Georg et al.
Proceedings of the Thirteenth International Symposium on Imprecise Probabilities: Theories and Applications (ISIPTA '23). PMLR, 2023. p. 412-425 (PMLR; Vol. 215).


Harvard

Rodemann, J, Jansen, C, Schollmeyer, G & Augustin, T 2023, In All Likelihoods: Robust Selection of Pseudo-Labeled Data. in Proceedings of the Thirteenth International Symposium on Imprecise Probabilities: Theories and Applications (ISIPTA '23). PMLR, vol. 215, PMLR, pp. 412-425. <https://proceedings.mlr.press/v215/rodemann23a.html>

APA

Rodemann, J., Jansen, C., Schollmeyer, G., & Augustin, T. (2023). In All Likelihoods: Robust Selection of Pseudo-Labeled Data. In Proceedings of the Thirteenth International Symposium on Imprecise Probabilities: Theories and Applications (ISIPTA '23) (pp. 412-425). (PMLR; Vol. 215). PMLR. https://proceedings.mlr.press/v215/rodemann23a.html

Vancouver

Rodemann J, Jansen C, Schollmeyer G, Augustin T. In All Likelihoods: Robust Selection of Pseudo-Labeled Data. In Proceedings of the Thirteenth International Symposium on Imprecise Probabilities: Theories and Applications (ISIPTA '23). PMLR. 2023. p. 412-425. (PMLR).

Author

Rodemann, Julian ; Jansen, Christoph ; Schollmeyer, Georg et al. / In All Likelihoods : Robust Selection of Pseudo-Labeled Data. Proceedings of the Thirteenth International Symposium on Imprecise Probabilities: Theories and Applications (ISIPTA '23). PMLR, 2023. pp. 412-425 (PMLR).

Bibtex

@inproceedings{9823a0558c6049ae91e9dc1daa210624,
title = "In All Likelihoods: Robust Selection of Pseudo-Labeled Data",
abstract = "Self-training is a simple yet effective method within semi-supervised learning. Self-training{\textquoteright}s rationale is to iteratively enhance training data by adding pseudo-labeled data. Its generalization performance heavily depends on the selection of these pseudo-labeled data (PLS). In this paper, we render PLS more robust towards the involved modeling assumptions. To this end, we treat PLS as a decision problem, which allows us to introduce a generalized utility function. The idea is to select pseudo-labeled data that maximize a multi-objective utility function. We demonstrate that the latter can be constructed to account for different sources of uncertainty and explore three examples: model selection, accumulation of errors and covariate shift. In the absence of second-order information on such uncertainties, we furthermore consider the generic approach of the generalized Bayesian α-cut updating rule for credal sets. We spotlight the application of three of our robust extensions on both simulated and three real-world data sets. In a benchmarking study, we compare these extensions to traditional PLS methods. Results suggest that robustness with regard to model choice can lead to substantial accuracy gains.",
author = "Julian Rodemann and Christoph Jansen and Georg Schollmeyer and Thomas Augustin",
year = "2023",
month = jul,
day = "14",
language = "English",
series = "PMLR",
volume = "215",
publisher = "PMLR",
pages = "412--425",
url = "https://proceedings.mlr.press/v215/rodemann23a.html",
booktitle = "Proceedings of the Thirteenth International Symposium on Imprecise Probabilities: Theories and Applications (ISIPTA '23)",

}
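
The abstract above frames pseudo-label selection (PLS) as a decision problem: in each self-training iteration, the unlabeled point maximizing a multi-objective utility is pseudo-labeled and added to the training data. As a rough illustration only, not the authors' implementation, the following Python sketch scores candidates by their worst-case predicted-class probability across several candidate models, a simple stand-in for robustness to model choice; the choice of models, the min-over-models aggregation, and all function names are assumptions.

# Hypothetical sketch of utility-based pseudo-label selection (PLS) in
# self-training. The min-over-models utility below is illustrative only;
# it mimics robustness to model choice, not the paper's exact criterion.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def self_train(X_lab, y_lab, X_unlab, n_iter=10):
    X_lab, y_lab, X_unlab = map(np.asarray, (X_lab, y_lab, X_unlab))
    for _ in range(min(n_iter, len(X_unlab))):
        # Fit several candidate models to reflect model-choice uncertainty.
        models = [LogisticRegression(max_iter=1000).fit(X_lab, y_lab),
                  DecisionTreeClassifier(max_depth=3).fit(X_lab, y_lab)]
        # Utility of a candidate point: its worst-case (min over models)
        # confidence in the predicted label, a robust aggregation.
        probs = np.stack([m.predict_proba(X_unlab) for m in models])
        conf = probs.max(axis=2)      # per-model confidence, shape (M, N)
        utility = conf.min(axis=0)    # worst case over models, shape (N,)
        best = int(np.argmax(utility))
        # Pseudo-label the selected point and move it from the unlabeled
        # pool to the training data.
        y_new = models[0].predict(X_unlab[best:best + 1])
        X_lab = np.vstack([X_lab, X_unlab[best:best + 1]])
        y_lab = np.concatenate([y_lab, y_new])
        X_unlab = np.delete(X_unlab, best, axis=0)
    return X_lab, y_lab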

RIS

TY - GEN

T1 - In All Likelihoods

T2 - Robust Selection of Pseudo-Labeled Data

AU - Rodemann, Julian

AU - Jansen, Christoph

AU - Schollmeyer, Georg

AU - Augustin, Thomas

PY - 2023/7/14

Y1 - 2023/7/14

N2 - Self-training is a simple yet effective method within semi-supervised learning. Self-training’s rationale is to iteratively enhance training data by adding pseudo-labeled data. Its generalization performance heavily depends on the selection of these pseudo-labeled data (PLS). In this paper, we render PLS more robust towards the involved modeling assumptions. To this end, we treat PLS as a decision problem, which allows us to introduce a generalized utility function. The idea is to select pseudo-labeled data that maximize a multi-objective utility function. We demonstrate that the latter can be constructed to account for different sources of uncertainty and explore three examples: model selection, accumulation of errors and covariate shift. In the absence of second-order information on such uncertainties, we furthermore consider the generic approach of the generalized Bayesian α-cut updating rule for credal sets. We spotlight the application of three of our robust extensions on both simulated and three real-world data sets. In a benchmarking study, we compare these extensions to traditional PLS methods. Results suggest that robustness with regard to model choice can lead to substantial accuracy gains.

AB - Self-training is a simple yet effective method within semi-supervised learning. Self-training’s rationale is to iteratively enhance training data by adding pseudo-labeled data. Its generalization performance heavily depends on the selection of these pseudo-labeled data (PLS). In this paper, we render PLS more robust towards the involved modeling assumptions. To this end, we treat PLS as a decision problem, which allows us to introduce a generalized utility function. The idea is to select pseudo-labeled data that maximize a multi-objective utility function. We demonstrate that the latter can be constructed to account for different sources of uncertainty and explore three examples: model selection, accumulation of errors and covariate shift. In the absence of second-order information on such uncertainties, we furthermore consider the generic approach of the generalized Bayesian α-cut updating rule for credal sets. We spotlight the application of three of our robust extensions on both simulated and three real-world data sets. In a benchmarking study, we compare these extensions to traditional PLS methods. Results suggest that robustness with regard to model choice can lead to substantial accuracy gains.

M3 - Conference contribution/Paper

T3 - PMLR

SP - 412

EP - 425

BT - Proceedings of the Thirteenth International Symposium on Imprecise Probabilities: Theories and Applications (ISIPTA '23)

PB - PMLR

UR - https://proceedings.mlr.press/v215/rodemann23a.html

ER -
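
The abstract also mentions the generalized Bayesian α-cut updating rule for credal sets. A minimal sketch, assuming a Beta-Bernoulli model as an illustrative choice (the model and all names here are assumptions, not taken from the paper): priors whose marginal likelihood falls below a fraction α of the best one are discarded, and each retained prior is updated by Bayes' rule.

# Hypothetical alpha-cut update for a credal set of Beta priors in a
# Bernoulli model (illustrative assumption, not the paper's setting).
import numpy as np
from scipy.special import betaln

def alpha_cut_update(priors, k, n, alpha=0.5):
    """priors: list of (a, b) Beta parameters; k successes in n trials."""
    # Log marginal likelihood of the data under each prior:
    # m(a, b) = B(a + k, b + n - k) / B(a, b)
    logm = np.array([betaln(a + k, b + n - k) - betaln(a, b)
                     for a, b in priors])
    keep = logm >= logm.max() + np.log(alpha)  # the alpha-cut of the set
    # Conjugate Bayesian update of every retained prior.
    return [(a + k, b + n - k)
            for (a, b), kept in zip(priors, keep) if kept]

# Toy usage: credal set {Beta(1,1), Beta(2,8), Beta(8,2)}, 7 of 10 successes.
print(alpha_cut_update([(1, 1), (2, 8), (8, 2)], k=7, n=10, alpha=0.5))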