Final published version, 4.76 MB, PDF document
Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
Research output: Thesis › Doctoral Thesis
}
TY - BOOK
T1 - Methods for missing time-series data and large spatial data
AU - Duncan, Rachael
PY - 2024
Y1 - 2024
N2 - Performing accurate statistical inference requires high-quality datasets. However, real-world datasets often contain missing variables of varying degrees both spatially and temporally. Alternatively, modelled datasets can provide a complete dataset, but these are often biased. This thesis derives a simplified approach to the skew Kalman filter that tackles the computational issues present in the existing skew Kalman filter by using a secondary dataset to estimate the skewness parameter. In application, this thesis implements the skew Kalman filter using surface-level ozone to bias-correct the modelled ozone data and uses the bias-corrected data to infill missing data in the observed dataset. Further, this thesis explores working with large spatial datasets. When carrying out spatial inference, using all the possible data available allows for more accurate inference. However, spatial models such as Gaussian processes scale cubically with the number of data points and thus quickly become computationally infeasible for moderate to large datasets. Divide-and-conquer methods allow data to be split into subsets, with inference carried out on each subset before recombining. While well documented in the independent setting, these methods are less popular in the spatial setting. This thesis evaluates the performance of divide-and-conquer methods in the spatial setting to achieve approximate results compared to carrying out inference on the full dataset. Finally, this is demonstrated using USA temperature data.
AB - Performing accurate statistical inference requires high-quality datasets. However, real-world datasets often contain missing variables of varying degrees both spatially and temporally. Alternatively, modelled datasets can provide a complete dataset, but these are often biased. This thesis derives a simplified approach to the skew Kalman filter that tackles the computational issues present in the existing skew Kalman filter by using a secondary dataset to estimate the skewness parameter. In application, this thesis implements the skew Kalman filter using surface-level ozone to bias-correct the modelled ozone data and uses the bias-corrected data to infill missing data in the observed dataset. Further, this thesis explores working with large spatial datasets. When carrying out spatial inference, using all the possible data available allows for more accurate inference. However, spatial models such as Gaussian processes scale cubically with the number of data points and thus quickly become computationally infeasible for moderate to large datasets. Divide-and-conquer methods allow data to be split into subsets, with inference carried out on each subset before recombining. While well documented in the independent setting, these methods are less popular in the spatial setting. This thesis evaluates the performance of divide-and-conquer methods in the spatial setting to achieve approximate results compared to carrying out inference on the full dataset. Finally, this is demonstrated using USA temperature data.
U2 - 10.17635/lancaster/thesis/2258
DO - 10.17635/lancaster/thesis/2258
M3 - Doctoral Thesis
PB - Lancaster University
ER -