llmimpute - Missing Data Imputation via Language Models and Statistics

Provides missing data imputation through two complementary engines: a large language model engine that communicates with the 'Anthropic' 'Claude' application programming interface for context-aware semantic imputation, and a fully self-contained offline engine that implements nineteen statistical and machine learning algorithms entirely in base R, with no additional package dependencies. Offline methods include mean, median, mode, last observation carried forward, next observation carried backward, hot-deck, predictive mean matching, k-nearest neighbours, ordinary least-squares regression, Lasso regression fitted by coordinate descent, Ridge regression via its closed-form solution, Bayesian Ridge regression with evidence approximation following MacKay (1992), support vector regression with a radial basis function kernel, classification and regression trees, random forests, gradient boosting, iterative random forest imputation, principal component analysis imputation via iterative singular value decomposition, and nuclear-norm minimisation via singular value thresholding. When no API key is available, the package automatically falls back to the offline engine, so it remains fully operational in environments without internet access. Every imputed value is accompanied by a confidence score and a plain-language reasoning string, producing reproducible audit trails. The automatic method selector chooses an algorithm for each column based on its data type, skewness, missingness rate, and correlations with other columns.
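To make one of the listed offline methods concrete, here is a minimal sketch of k-nearest-neighbours imputation. It is written in standard-library Python purely to illustrate the general technique; it is not the package's base-R implementation. The function name `knn_impute`, the use of `None` to mark missing values, the root-mean-square distance over columns observed in both rows, and the default `k = 2` are all assumptions chosen for illustration.

```python
import math
from typing import List, Optional

# Hypothetical representation: a row is a list of floats, with None marking a missing value.
Row = List[Optional[float]]


def knn_impute(rows: List[Row], k: int = 2) -> List[Row]:
    """Illustrative kNN imputation (not the package's API).

    Each missing entry in column j is replaced by the mean of column j over
    the k nearest donor rows that observe column j. Distance is the
    root-mean-square difference over the other columns observed in both rows.
    The input is not modified; a new list of rows is returned.
    """
    filled: List[Row] = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue  # observed values are kept as-is
            candidates = []
            for m, donor in enumerate(rows):
                if m == i or donor[j] is None:
                    continue  # a donor must observe the target column
                shared = [c for c in range(len(row))
                          if c != j and row[c] is not None and donor[c] is not None]
                if not shared:
                    continue  # no common columns to compare on
                sq = sum((row[c] - donor[c]) ** 2 for c in shared)
                candidates.append((math.sqrt(sq / len(shared)), donor[j]))
            if not candidates:
                raise ValueError(f"no donor observes column {j} for row {i}")
            candidates.sort(key=lambda t: t[0])  # nearest donors first
            nearest = candidates[:k]
            new_row[j] = sum(v for _, v in nearest) / len(nearest)
        filled.append(new_row)
    return filled


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [1.1, 2.1, None], [5.0, 6.0, 7.0], [1.2, 1.9, 3.4]]
    print(knn_impute(data, k=2)[1])
```

With this data and `k = 2`, the two rows closest to the second row are the first and fourth rows, so the missing entry becomes the mean of 3.0 and 3.4. The distant third row does not contribute. Averaging over several donors smooths out noise compared with copying a single neighbour, at the cost of shrinking imputed values toward the local mean.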
