Research output: Thesis › Doctoral Thesis
}
TY - BOOK
T1 - Managing uncertainty in machine learning techniques
T2 - An investigation of adaptive sampling strategies through land cover mappings
AU - Phillipson, Jordan
PY - 2024
Y1 - 2024
AB - In recent decades, the use of machine learning techniques in classification problems has become increasingly popular across a wide variety of domains. For users to have trust in such classifiers though, one must be able to reliably quantify uncertainty. A common way of quantifying uncertainty in classifiers is through reference sampling, where a smaller set of ground-truths is sampled and compared to their predicted counterparts to make inferences about the precision and accuracy of classifiers using statistical methods. However, classification via machine learning can bring some additional challenges to uncertainty quantification, as machine learning techniques are often (i) trained using data that has not been sampled with formal statistical inference in mind; (ii) black-box when compared to traditional modelling. These issues are further compounded when sampling reference data under conditions suitable for uncertainty quantification is expensive. Here, users are often forced to make a compromise between the degree of uncertainty and the costs of reference sampling, even when the original classifier built using machine learning may be performing well. In short, when it comes to quantifying and reducing uncertainty, it is not just about how well the classifier performs. One must also be able to collect enough data sampled under the right conditions. This thesis explores how users may better manage the cost-benefit trade-offs of reference sampling when quantifying and reducing uncertainty in machine learning classifiers. Specifically, this thesis investigates how a framework for adaptively sampling reference data can be used to better manage uncertainty, using two land cover mapping case studies to evaluate the proposed framework.
With these case studies, the following problems are considered: (i) quantifying uncertainty in area estimation and mappings; (ii) proposing efficient sample designs under uncertainty; (iii) proposing sample designs when the cost of reference sampling varies across a mapped region.
U2 - 10.17635/lancaster/thesis/2372
DO - 10.17635/lancaster/thesis/2372
M3 - Doctoral Thesis
PB - Lancaster University
ER -