Projects
As part of the MEng (Structured) programme with a focus on Data Science, our students are required to complete a final 60-credit data science research project in which they apply and consolidate the data science knowledge gained throughout the programme. For this purpose, students solve a real-world data science problem, providing solutions for each step of the data science project life cycle, and document the work in a research assignment.
For these projects, we collaborate with industry and academic partners who are willing to propose a topic, to provide the necessary data (if not publicly available) as well as to act as domain mentors. The data set needs to be complete.
If you are interested in partnering with us for such a project, please contact [email protected] for further information about a short project proposal and deadlines.
Project proposals reviewed by the end of term 3 of a given year will be assigned to students for the following year.
Below, find a list of completed research assignments. The assignments are grouped under the year of graduation.
March 2025 Graduation
▶ Set-based Particle Swarm Optimization for Training Support Vector Machines
This research explores the application of set-based particle swarm optimization (SBPSO) to the training of support vector machines (SVMs), addressing challenges in hyperparameter tuning, noisy datasets, and computational efficiency. SVMs, celebrated for their classification precision, often face limitations due to their sensitivity to parameter selection and difficulties in handling high-dimensional or noisy data. SBPSO, an extension of traditional particle swarm optimization (PSO), is tailored for discrete optimization problems, making it a promising approach for optimizing SVM performance.
The study investigates two approaches: standard SBPSO-SVM training and SBPSO-SVM training with Tomek links preprocessing, which enhances data quality by reducing noise and refining decision boundaries. Experiments conducted on five benchmark datasets reveal that both methods significantly reduce the number of support vectors while maintaining competitive accuracy and F1 scores. However, training times were substantially longer than those of standard SVMs, highlighting a need for further optimization.
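As a minimal illustration of the Tomek links idea mentioned above (not the study's implementation), the sketch below finds pairs of points that are each other's nearest neighbour but carry different class labels. Such pairs sit on or near the class boundary and are typical candidates for removal during preprocessing. The function name, the Euclidean metric and the sample data are assumptions made for this example.

```python
import math


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links."""

    def nearest(i):
        # Index of the closest other point by Euclidean distance.
        return min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )

    nn = [nearest(i) for i in range(len(points))]
    links = []
    for i, j in enumerate(nn):
        # A Tomek link: mutual nearest neighbours with different labels.
        if i < j and nn[j] == i and labels[i] != labels[j]:
            links.append((i, j))
    return links


# Hypothetical toy data: points 2 and 3 are very close but differ in class.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0)]
labels = [0, 0, 1, 0, 1]
links = tomek_links(points, labels)
print(links)
```

Whether one or both members of each link are dropped is a design choice; this sketch only identifies the pairs.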
To address these challenges, dynamic control of SBPSO parameters was introduced, alongside advanced preprocessing techniques such as principal component analysis (PCA) with Gaussian mixture model (GMM) noise filtering and Wilson editing. While these enhancements improve training efficiency and performance for complex and noisy datasets, the algorithm still struggles to scale effectively to very noisy, large, and highly complex datasets.
▶ Property tax validation using automatic building footprint extraction from aerial images
Property valuation is essential to determine the rates and taxes needed for municipal services. Property valuation depends on the number and size of buildings on a property. A tedious manual process is used to create the outlines of buildings and calculate the building area. Therefore, this project aims to develop a process to generate building outlines from unmanned aerial vehicle raster images with as little human intervention as possible. The solution developed uses semantic pixel classification to detect buildings and find the building outlines. The outlines can then be used to validate property valuations.
To perform semantic pixel classification, a U-Net architecture was selected. Various experiments were conducted to find the optimal U-Net architecture. The output of the semantic pixel classification was used along with a contour extraction method to extract the building’s outline. Similarly, experiments were conducted to select the optimal contour extraction method. The U-Net model and contour extraction method are combined to create a process capable of extracting building outlines from raster images.
Experiments were performed using a human-in-the-loop approach, a variant of active learning. The training results show accuracy, recall, precision, and intersection over union above 90%. Even though the experiments showed excellent training and validation metrics, the project shows how critical the training data is to predicting test data and determining the quality of image segmentation and building outline extraction. Finally, the process produces vector data that accurately represents 80 to 90% of buildings with an area error of less than one square meter.
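For readers unfamiliar with the intersection over union (IoU) metric reported above, the following sketch shows how it can be computed for two binary segmentation masks. It is an illustrative example only: the function name and the tiny masks are assumptions, and the project's own evaluation code is not shown in the source.

```python
def intersection_over_union(pred, truth):
    """IoU of two equally sized binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly, so their IoU is defined as 1.0 here.
    return intersection / union if union else 1.0


# Hypothetical 3x3 masks: the predicted building is shifted one pixel left.
pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
truth = [[0, 1, 1],
         [0, 1, 1],
         [0, 0, 0]]
iou = intersection_over_union(pred, truth)
print(round(iou, 3))
```

In this toy case two pixels overlap out of six covered in total, so the IoU is one third.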
▶ Credit Scoring and Risk Assessment Using Machine Learning and Overdraft History
Credit scoring, by definition, is a quantitative methodology and evaluation method by which lenders assess whether a borrower (either an individual or a business) is able to repay a debt if credit is granted. A credit score is typically generated at the end of the credit scoring process, and it is a fundamental element that influences an individual’s access to credit. It acts as a gateway to financial resources such as loans, credit cards, and others, highlighting the importance of fairness, non-discrimination, and ethical practices to ensure equitable access to credit, free from prejudice and bias.
Credit history is typically the key factor in traditional scoring methods, including the FICO score, logit models, and expert judgment-based models, among others. As a result, individuals who have never borrowed may be overlooked or subjected to high-interest rates. To address these limitations, this study leverages overdraft information to develop a dynamic, inclusive, and effective credit scoring framework. This framework integrates both traditional credit history data and overdraft data, which is often underutilized but can potentially serve as an indicator of good versus bad borrowers.
After hyperparameter tuning across the algorithms, the results of this study suggest that Naive Bayes is particularly effective when both credit history and overdraft data are available, as it demonstrated minimal misclassifications and robustness in classifying customers correctly. The algorithm performed best on the three tested datasets, achieving accuracy rates of 99.01% for the credit history dataset, 99.5% for the hybrid dataset, and 100% for the overdraft dataset.
▶ Machine Unlearning of Convolutional Neural Networks to Address the Right to Be Forgotten
This research assignment examines whether personally identifiable information can be removed from a convolutional neural network using a machine unlearning algorithm and verified as removed to ensure compliance with the right to be forgotten as outlined in the General Data Protection Regulation. Machine unlearning examines whether data removal can be achieved while preserving machine learning model performance without fully retraining a machine learning model.
Machine unlearning demonstrated effectiveness in removing specific data from the convolutional neural network, as measured by a membership inference attack. The machine unlearning algorithm, which utilises Kullback-Leibler divergence and weight regularisation, enabled the removal of data for a single individual as well as for a forget set composed of a sampled group of individuals without requiring full retraining.
▶ Towards an automated medical image classification pipeline
Radiological departments have high demands for efficiency and diagnostic quality, and the interpretation of radiographs is highly variable between radiographers. The process followed in a radiological department to support patients with health services can be made more efficient. Parts of the process, such as retrieving and processing data, can be automated with artificial intelligence to expedite the process and increase the quality of services offered.
This research assignment covers the creation of a medical imaging dataset that is sourced from open source data sets. A variety of transfer learning models such as residual neural networks, dense neural networks, and efficient neural networks are evaluated on this data set. The best performing models of both components are combined in a transfer learning classification pipeline. The transfer learning pipeline produced a predictive accuracy of 96.3034% on testing data.
▶ Evolving Oblique Decision Trees
This study investigates the induction of classification oblique decision trees using genetic programming, with constraints imposed on the genetic operators and the fitness function. Additionally, the study examines the effect of introducing pre-defined genetic programs in the initial population of the evolutionary process on the performance of the genetic programs in solving classification tasks.
The results demonstrate that using genetic programming with applied constraints for classification purposes is feasible and results in decision trees that perform exceptionally well compared to standard axis-aligned and oblique decision trees, albeit at the cost of increased computational resources.
▶ Horticulture Supplier Delivery Forecast
Supermarket retailers rely on suppliers to meet customer demands, but suppliers often face disruptions that prevent them from delivering the agreed quantities. This is true in the horticultural sector, where weather and logistical challenges affect the delivery reliability. Accurate forecasting of horticultural supplier deliveries is critical for supermarket retailers, as fresh fruit is a key source of revenue.
The research assignment found that the majority of the models outperformed the baseline, with the random forest and GRU models performing best on standard evaluation metrics. The baseline model achieved a mean absolute error (MAE) of 30.35, while the random forest model reduced the MAE to 0.47, demonstrating a significant improvement in forecasting accuracy.
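The mean absolute error used for this comparison is the average of the absolute differences between actual and forecast quantities. The sketch below illustrates the calculation on made-up delivery quantities. The numbers and variable names are hypothetical and are not taken from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical delivered quantities and two sets of forecasts.
actual = [100, 120, 80, 95]
baseline_forecast = [110, 100, 90, 85]
model_forecast = [101, 119, 80, 96]

baseline_mae = mean_absolute_error(actual, baseline_forecast)
model_mae = mean_absolute_error(actual, model_forecast)
print(baseline_mae, model_mae)
```

Because MAE is expressed in the same units as the forecast quantity, a lower value translates directly into fewer units of forecast error per delivery.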
▶ Behavioural Scorecards Development and Machine Learning
This study compares traditional behavioural scorecards based on logistic regression (LR) with machine learning (ML) for credit risk assessment. The study aims to improve predictive performance while maintaining model interpretability to comply with Basel regulatory standards. To achieve this, the study introduces the Bayesian Weight of Evidence Optimizer (BWOpt) for binning optimization in LR models.
Results show that traditional scorecards outperform ML models, particularly with oversampled data. BWOpt-enhanced LR models outperform both ML methods, highlighting the value of feature engineering. Given their balance of interpretability and predictive power, traditional scorecards remain well-suited for regulated environments.
▶ Evaluating Heterogeneous Graph Embeddings for Product Substitute Identification with LLM-Generated Attributes
In the context of the food retail sector, the identification of product substitutes is crucial for several reasons, including the determination of the assortment of store products, the design of marketing campaigns, and the avoidance of potential cannibalisation. This study seeks to investigate product substitutes through the process of product clustering.
The findings indicate that the use of product attributes does not constitute the most effective and scalable approach to achieve substitute product categorisation. This limitation arises from the inherent sensitivity of heterogeneous graphs to both configuration settings and input data, which may require tailored and context-specific model calibrations.
▶ Deriving an Agricultural Soil Quality Index from Soil Microbiome using Autoencoders
Soil quality plays a pivotal role in sustaining ecosystems, influencing climate change and supporting agricultural productivity. Degradation of soil can severely threaten food security and exacerbate global warming. This study proposes the use of autoencoders to develop a soil quality index derived from soil microbiome data.
The soil quality index showed a strong correlation with the Chao1 diversity index and moderate correlations with the Shannon and Simpson diversity indices. The soil quality index derived using a sparse autoencoder was particularly favored due to its simplicity, as it reduces to a sigmoid function during inference, enhancing explainability and interpretability.
▶ Incremental Feature Learning: A Constructive Approach to Training Neural Networks with Dynamic Particle Swarm Optimisation
Incremental feature learning (IFL) is a supervised machine learning (ML) paradigm for feedforward neural networks (NNs), where the input layer of the NN is incrementally constructed over time. The benefits of such a paradigm are twofold: the first is the ability afforded to an NN to dynamically incorporate new features without the need for retraining; the second is a reduction in overfitting behaviour.
The results show that IFL effectively enables NNs to dynamically incorporate new features as they become available over time, and that IFL provides desirable performance in terms of overfitting behaviour and can be used as a regularisation technique.
▶ Grading Infrastructure Conditions through Machine Learning using Infrastructure Report Cards and Media Reports
Public infrastructure is of critical importance to advance job creation and economic growth, yet there is a lack of information regarding infrastructure conditions. This research assignment develops a machine learning model that automatically rates infrastructure conditions from online news articles as an alternative data source.
The findings suggest that the models performed well in in-domain learning and are able to label news articles, but struggled with domain adaptation in cross-domain learning due to misalignment between the features and labels of the two datasets.
▶ Automated Road Detection and Classification for Urban and Rural Areas Using Aerial imagery
This research assignment presents an automated approach to digitise roads from aerial imagery using deep learning techniques, focusing on distinguishing between paved and gravel roads. The solution utilises a DeepLab model based on the EfficientNetV2M architecture to identify and extract roads and perform condition assessment.
The developed pipeline incorporates parallel processing and optimised contour detection algorithms to efficiently handle large datasets. This automated approach significantly reduces the manual effort required for road digitisation, offering a scalable solution for updating digital maps.
▶ Utilizing unsupervised machine learning to identify patterns and anomalies in the JSE Top 40 equities
This study investigates using unsupervised machine learning to uncover hidden relationships and anomalies among the Johannesburg Stock Exchange (JSE) Top 40 equities. By transforming raw time-series data into informative metrics, the research aims to uncover patterns that can be used to inform investment management strategies.
The results indicate that t-SNE combined with hierarchical clustering produced the best-organised clusters. The analysis uncovered both expected sector groupings and notable anomalies, such as companies clustering outside their designated sectors due to similar financial characteristics.
▶ Advancing Distal Radius Fracture Classification Using Metric Learning: A Triplet Neural Network Approach
Recent advancements in computer vision and deep learning have enhanced distal radius fracture analysis. This research investigates the application of metric learning architectures, particularly triplet neural networks, for the classification of distal radius fractures according to the AO/OTA fracture classification system.
The research followed the CRISP-DM process and utilized the GRAZPEDWRI-DX dataset as the source domain for transfer learning. The study aims to address challenges associated with data scarcity and model generalisation while improving automated fracture detection and classification accuracy.
▶ Carbon Dioxide level prediction using model confidence and signal resolution
Indoor air quality (IAQ) is considered to have major health and wellness implications, with indoor air pollution (IAP) estimated to have tenfold the negative impact on people relative to outdoor pollution. This study focuses on the prediction of indoor CO2 concentrations using machine learning algorithms to develop robust monitoring and control systems.
The research addresses the impact of noise in data on prediction performance. It explores a method implementing various wavelets at varying decomposition levels, training an ensemble of LSTMs, and subsequently selecting the most confident models for prediction. For comparative analysis, a predictive model based on fixed signal resolution was also developed.
The results showed that the fixed resolution model demonstrated superior performance, reporting an R² of 0.99. In contrast, the dynamic signal resolution model showed limited prediction capability in areas of high CO2 concentrations, confirming the risk of information loss associated with signal filtering.
▶ Data-Driven Predictive Maintenance for Enhanced Reliability of Continuous Miners in Underground Coal Mining
This research assignment explores the application of data science in analysing electrical data from continuous miners to identify anomalies and alert maintenance personnel of potential failures before they occur. By employing both conventional machine learning and deep learning techniques, the aim is to determine the most effective approach for predictive maintenance in the coal mining sector.
The study represents a pioneering effort in South Africa, focusing on the application of Markov chains for anomaly detection. By leveraging the Markov property and integrating it with the Mahalanobis distance, the research developed a robust framework that enhances anomaly identification. This approach bridges traditional clustering techniques with advanced statistical methods to offer valuable insights for industrial maintenance.
▶ Advancing the Argument for Shallow Models: A Comparative Analysis against Deep Learning Approaches
The rapid adoption of deep learning has led to an overreliance on complex architectures, often at the expense of simpler models. This research evaluates the necessity of deep learning models by conducting a comparative analysis against shallow models to determine when simpler models are preferable for resource efficiency and interpretability.
The findings reveal that shallow models, when properly optimised, can achieve performance levels comparable to those of deep learning models in several contexts. These models offer benefits in terms of lower computational demands and greater interpretability, challenging the prevailing trend of defaulting to deep learning solutions and advocating for a more thoughtful selection of models based on specific application needs.
▶ Data Science Approaches for Addressing Missing Values in the Transcriptome of Plasmodium falciparum
Accurate imputation of missing values in transcriptomic data is imperative for the analysis of 'omics datasets and the discovery of novel antimalarial drugs. This research investigates various missing value imputation techniques, including single imputation, multiple imputation, machine learning, and deep learning approaches.
The Self-Organising Map (SOM) was selected as the optimal imputation method. The results consistently yielded error values lower than the standard deviation of the data, falling within an acceptable range given the natural variability of gene expression data. Subsequent clustering performed on the imputed data confirmed that the SOM imputation adequately preserves the biological structure of the data.
▶ An Automated Computer Vision System to Measure Excavator Productivity
This study develops an excavator productivity model using computer vision techniques to measure and optimise construction operations. Object detection algorithms, including YOLO and Faster R-CNN, were explored for tracking excavator movements. Results indicated that YOLO offered superior generalisation and accuracy for tracking on-site.
A two-phase activity recognition model was developed: first, a VGG16 feature extractor combined with an LSTM model to detect movement, and second, an advanced model to classify specific tasks like soil pick-up and hauling. Despite challenges like lighting variations, the model achieved between 80% and 100% accuracy depending on the environment, demonstrating significant potential for real-time performance tracking.
December 2024 Graduation
▶ Active Learning in Bagging Ensembles
This study investigates the integration of dynamic pattern selection (DPS) and ensemble learning (EL) to enhance the performance of feed-forward neural networks. DPS is an active learning technique that incrementally adds high-error patterns to training data to reduce computational expense, while Bagging combines multiple models to improve generalization.
Experiments on classification and regression problems showed that DPS achieved similar performance to standard backpropagation with lower costs. While the combined "EL AL NN" approach matched EL's generalization in most cases, some instances of overfitting were noted in specific classification tasks, suggesting that while the hybrid model is efficient, it requires careful tuning for certain data structures.
▶ Adaptive Machine Learning for the Optimization of a Water Treatment Clarification System
This research developed an intelligent system to optimize a water clarification process used for treating organically rich wastewater. Previously reliant on manual interaction, the system was transformed into a prescriptive feedback loop that adjusts coagulant and flocculant dosages on a continuous basis.
A Random Forest model proved optimal, achieving a testing RMSE of 0.0761. Live testing verified significant improvements, with a 49.1% increase in overflow quality and a 28.6% reduction in coagulant dosage. The study also successfully implemented online retraining and exploration routines to ensure the model adapts to changing feed water conditions in real-time.
▶ Automated Screening of Chronic Sinusitis from Voice Recordings using Machine Learning
Chronic sinusitis is traditionally screened via invasive endoscopies or expensive imaging. This research proposes a non-invasive alternative: using machine learning to distinguish the speech of sinusitis patients from healthy individuals. Audio features, including Mel-frequency cepstral coefficients and spectral centroid, were extracted from processed voice recordings.
A Deep Neural Network (DNN) outperformed other models, achieving a test accuracy of 0.63. While the results demonstrate that voice-based diagnosis is a viable screening tool, the study suggests that performance could be further enhanced through larger datasets and refined feature selection techniques.
▶ Predicting patient outcomes based on adverse drug events using graph neural networks
This research explores using graph neural networks (GNNs) in pharmacovigilance to predict adverse drug events (ADEs). By representing relationships between patients, drugs, and reactions as a complex graph data model, the study utilized a graph multi-layer perceptron (graph MLP) to enhance prediction accuracy.
The GNN model outperformed conventional approaches, providing deeper insights into drug safety and patient outcomes. The study highlights the potential for GNNs to improve clinical decision-making and personalized medicine, while acknowledging that data quality remains a critical factor in model interpretability and success.
March 2024 Graduation
▶ Convolutional neural network filter selection using genetic algorithms
The large size of convolutional neural networks (CNNs) often hampers their deployment. This project proposes using a genetic algorithm to optimize filter selection and pruning, allowing for the adaptive removal of the least important filters without materially degrading predictive capabilities.
The algorithm achieved 90.91% model compression with only a 0.13% drop in accuracy for audio data models, and even increased accuracy by 2.37% on certain image datasets. These results demonstrate that genetic algorithms can successfully compress any given CNN architecture while maintaining or improving performance.
▶ The value of Zero-rating internet services to provide essential services to low-income communities
This study analyzed usage patterns on the zero-rated MoyaApp platform in South Africa to determine the value of free access to essential services. Using temporal association rule mining, the researcher found that many users initially joined for government grant services but eventually transitioned into regular users of education and job-seeking categories.
The research concludes that zero-rating—or reverse billing data—is an effective strategy for bridging the digital divide. Recommendations include expanding job categories and using recommendation engines to guide low-income users toward services that improve their socio-economic status.
▶ Intelli-Bone: Automated fracture detection and classification in radiographs using transfer learning
Incorrectly diagnosed fractures account for over 80% of reported diagnostic mistakes in emergency departments. This research evaluated AI models, including YOLOv8 and Faster R-CNN, to locate and classify fractures according to the AO/OTA system. To combat data scarcity, transfer learning was employed by pretraining models on larger medical datasets.
Pretraining on domain-specific datasets like GRAZPEDWRI-DX improved mean average precision (mAP50) by 33.6%. The best-performing model, YOLOv8l, achieved a mAP50 of 59.7%, demonstrating that AI can significantly assist healthcare professionals in accurate fracture prognosis.
▶ Evolutionary multi-objective optimisation for truck and drone scheduling
Integrating drones with traditional trucks can optimize last-mile delivery, but balancing delivery time against distance is complex. This study introduces a multi-objective traveling salesman problem with drone interception (TSPDi) using adapted NSGA-II and SPEA2 algorithms.
Results showed that NSGA-II performed better on larger datasets, while SPEA2 had an advantage with fewer nodes. Overall, these multi-objective algorithms outperformed traditional single-objective approaches on larger datasets, proving to be more competitive in reducing both delivery time and operational truck distance.
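Both NSGA-II and SPEA2 rank candidate solutions using Pareto dominance rather than a single score. As a rough sketch of that idea (not the adapted algorithms from the study), the example below filters a set of hypothetical routes, each described by an assumed (delivery time, truck distance) pair, down to the non-dominated set when both objectives are minimised.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one, assuming all objectives are minimised."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions)
    ]


# Hypothetical (delivery time, truck distance) pairs for five routes.
routes = [(30, 12), (25, 15), (35, 10), (32, 14), (25, 16)]
front = pareto_front(routes)
print(front)
```

Here the routes (32, 14) and (25, 16) are dropped because another route is at least as good on both objectives. The remaining routes represent genuine trade-offs between time and distance.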
▶ Evolving encapsulated neural network blocks using a genetic algorithm
This project investigates using a genetic-based evolutionary algorithm to automate the discovery of modular "blocks" within CNNs for image classification. Inspired by ResNet and GoogLeNet, the framework utilizes neuroevolution to evolve architectures through mutation, speciation, and crossover.
The evolved blocks proved to be highly competitive against manually designed counterparts. This study validates that evolutionary computation can successfully automate the discovery of optimal subnetworks, rivaling human-designed architectures and overcoming the limitations of manual design processes.
▶ Machine Learning for Aquaponic System Mortality Prediction and Planting Area Optimisation
This project proposes an IoT-based system for aquaponics that leverages machine learning to improve efficiency. By collecting data on water quality, fish behavior, and plant growth, the system trains models to predict fish mortality and optimize crop growing areas.
The integration of machine learning and IoT has the potential to significantly improve the profitability of aquaponic plants. This could encourage wider adoption of aquaponics as a sustainable farming method by reducing risk and maximizing yield.
▶ Spatio-Temporal Modelling of Road Traffic Fatalities in Western Cape
Responding to the WHO’s Decade of Action for Road Safety, this project sought to develop a machine learning model to predict road fatal events in time and space. Relevant features of the Western Cape were aggregated into an H3 grid to learn spatial patterns.
By moving beyond historical average models currently used in the industry, this research represents the first attempt in South Africa to use deep learning for modeling road traffic fatalities. The resulting decision support system offers a more sophisticated way to allocate resources and improve road safety interventions.
▶ Tree-Based ML Models for Quantifying Mineralogy using Bulk Chemical Compositional Data
Element-to-mineral conversion (EMC) is often hindered when there are more minerals than known elements. This study investigated tree-based machine learning algorithms (Decision Tree, Random Forest, and Extra Trees) to predict mineral grade quantities using geochemical data from the Kalahari Manganese Deposit.
The Extra Trees regressor outperformed traditional least-squares methods, achieving R² scores above 0.5 for most mineral groups. The results indicate that machine learning can overcome the mathematical limitations of traditional EMC, providing more reliable mineral quantity predictions for mineral processing models.
▶ Optimisation algorithms for a dynamic truck and drone scheduling problem
This research addresses the dynamic traveling salesperson problem with drone interception (DTSPDi), where customer coordinates change in real-time. Three algorithms were tested: Ant Colony System (ACS), MAX-MIN Ant System (MMAS), and a modified ACS-KT that transfers pheromone knowledge between time slices.
Benchmarking showed that ACS-KT outperformed the other algorithms in both time and distance dimensions. The study found that ACS-KT is significantly better at handling dynamic environmental changes, proving that "pheromone knowledge" transfer is key to maintaining efficient routes as customer locations shift.
▶ Review of Big Data clustering methods
In an era of complex datasets, this study explores the evolving landscape of big data clustering. It introduces a novel taxonomy categorizing models into four distinct groups, offering a roadmap for scalability and efficiency in the face of the "four Vs": velocity, variety, volume, and veracity.
Through a series of empirical experiments, the research identifies the operational dynamics and performance trade-offs of various algorithms. Insights highlight the efficiency of parallel k-means and mini-batch k-means for large-scale applications, while uncovering computational constraints in models like selective sampling-based scalable sparse subspace clustering (S5C) and purity-weighted consensus clustering (PWCC).
The study provides a comprehensive performance summary and lays the foundation for a centralized database for clustering research, aiming to bridge existing knowledge gaps and facilitate optimal model discovery based on specific infrastructural capabilities.
▶ Clustering free text procurement data
The mining industry faces challenges in leveraging advanced data analytics for data-driven decisions. Company A grapples with 50% of its group-wide procurement spend stored as unstructured text data, hindering in-depth cost analysis due to variations in item descriptions. Manually aggregating these diverse strings for analysis is laborious and inefficient.
This research explored techniques such as TF-IDF feature selection, LSA, and word embedding feature transformation. Exploration of k-means and agglomerative hierarchical clustering (AHC) for text revealed that AHC performed better, yielding a high silhouette coefficient validated by domain experts. Results analyzed in Power BI led to the conclusion that modern approaches to feature selection and dimension reduction are essential for optimal results in clustering free text data.
▶ Few-shot learning for passive acoustic monitoring of endangered species
The Hainan gibbon is facing extinction, and bioacoustics is critical for studying their declining population. Passive acoustic monitoring captures months of data, but the low population numbers of endangered species make manual analysis time-consuming. While machine learning can automate identification, many algorithms require large amounts of data to perform reliably.
This assignment explores few-shot learning using a Siamese framework based on convolutional neural networks (CNNs). Within this framework, contrastive-loss and triplet-loss architectures were investigated. Results indicate that the triplet-loss architecture produces the most accurate models, achieving an accuracy of 99.08% and an F1-score of 0.995, effectively identifying the bioacoustic signature of the Hainan gibbon even with low data volumes.
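To make the two loss functions named above concrete, here is a minimal sketch of one common form of each, computed on plain Python lists standing in for embedding vectors. The margin value, the use of unsquared Euclidean distance in the triplet loss, and the example vectors are assumptions for illustration, not details taken from the study.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

def contrastive_loss(a, b, same_class, margin=1.0):
    """Pull same-class pairs together; push different-class pairs past the margin."""
    d = euclidean(a, b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Hypothetical 2-D embeddings.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]        # same class as the anchor
easy_negative = [2.0, 0.0]   # already far away: contributes zero loss
hard_negative = [0.5, 0.0]   # too close: contributes positive loss

loss_easy = triplet_loss(anchor, positive, easy_negative)
loss_hard = triplet_loss(anchor, positive, hard_negative)
```

The triplet loss is zero once the negative sits at least `margin` further from the anchor than the positive does, so training effort concentrates on the hard triplets.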
▶ Digitization Of Test Pit Log Documents For Development Of A Smart Digital Ground Investigation Companion
Geotechnical companies in South Africa document ground investigations in PDF format, creating a need for digitization to allow for thorough analysis. This project presented an automated way of digitizing documents using an object detection model for layout analysis and optical character recognition (OCR) for extracting alphanumeric characters.
The object detection model was developed by fine-tuning a Faster R-CNN model, while PaddleOCR achieved a word recognition rate of 96% for character extraction. An interactive application was developed to visualize soil characteristics, and a semantic search algorithm was fine-tuned using sentence transformers to allow users to query the dataset using natural language.
▶ Comparison of machine learning models on financial time series data
This research focuses on developing multiple machine learning models combined with a financial trading strategy to compare the performance of different algorithms on financial time series data, including USD/ZAR and ZAR/JPY exchange rates and various global indices. Twelve machine-learning models were developed, ranging from logistic regression to complex neural networks.
Results indicate that the support vector machine (SVM) and baseline logistic regression model performed the best. It was determined that non-neural network machine learning models were less computationally complex and less dependent on a balanced data set than the neural network models, which generally showed poorer performance in forecasting these financial data sets.
▶ Trends in Infrastructure Delivery from Media Reports
Investment in public infrastructure is critical for economic growth, yet data availability for monitoring infrastructure conditions is often limited. Online news articles offer a promising alternative data source. This research applied natural language processing techniques to collect and analyze news from nine South African news websites.
Topic modelling was applied to group articles into specific issues like potholes or sewage spills, with summaries generated by a large language model. A dashboard was designed to visualize these topics, concluding that topic modelling is a feasible and effective way to address data gaps in compiling infrastructure report cards for South Africa.
▶ Investigating sales forecasting in the formal liquor market using deep learning techniques
This research assignment focuses on forecasting sales in the liquor industry by examining the effectiveness of deep learning techniques and a stacked ensemble approach. Time-series forecasting is critical for operations research, and this study involved a thorough analysis of datasets to understand inherent structures in sales data.
The findings indicate that deep learning techniques and ensemble theory can be successfully applied to liquor sales forecasting. A stacked ensemble approach was particularly effective at improving performance while reducing the computational complexity and expenses associated with granular forecasting models, offering a more efficient alternative to traditional methods.
▶ Automated Localisation and Classification of Trauma Implants in Leg X-rays through Deep Learning
Orthopedic surgeons often need to identify failed implants pre-operatively, a process that is currently time-consuming and prone to error. This study investigates the use of deep learning to automate the identification of trauma implants in leg X-rays, using a dataset that presented challenges such as limited data and imbalanced class distributions.
The optimal solution was a two-model pipeline employing a YOLO object detection model and a DenseNet classification model. The pipeline achieved a mean average precision of 0.967 for implant localisation and an accuracy of 73.7% for classification, providing proof that deep learning models are capable of reliably identifying trauma implants.
▶ Association between CNN features for skin cancer diagnosis and clinical ABC-criteria
Convolutional neural networks (CNNs) show promise in classifying skin lesions, but a lack of transparency prevents clinical application. This research developed a methodology to evaluate whether the features used by a CNN correspond to established clinical indicators like the ABCDE criteria and the 7-point skin lesion malignancy checklist.
The study found a strong association between the features extracted by the CNN and the clinical ground truth. Correlation tests and performance analysis on grayscale datasets indicated that colour is a key feature used by the CNN. Overall, the methodology proved that the CNN uses features aligned with clinical standards to determine whether a skin lesion is malignant or benign.
▶ Association between the features used by a convolutional neural network for skin cancer diagnosis and the ABC-criteria and 7-point skin lesion malignancy checklist
Melanoma cases and the associated mortality rate are rising rapidly, making early detection crucial. While Convolutional Neural Networks (CNNs) show promise in improving the efficiency of classifying skin lesions, a lack of transparency in their decision-making prevents clinical application. For approval, it must be shown that the features used by a CNN correspond to clinical indicators like the ABCDE criteria and the 7-point skin lesion malignancy checklist.
In this research, a methodology was developed to evaluate this correspondence using an InceptionResNetV2 model with a leaky ReLU activation. The study investigated associations using statistical methods to establish a ground truth, t-distributed stochastic neighbour embedding (t-SNE) for feature extraction analysis, and Local Interpretable Model-agnostic Explanations (LIME) to provide insights into the decision-making process.
The results showed a strong association between the features used by the CNN and the clinical criteria, with the exception of vascular structures and the colors brown, red, and black. A performance decrease on grayscale datasets confirmed that color is a critical feature for the CNN. While the model was robust to most dataset issues, it showed sensitivity to the presence of hair and immersion fluid. Ultimately, the study demonstrated that the CNN does indeed use clinical features to determine malignancy, supporting its potential for clinical use.
December 2023 Graduation
▶ A dynamic optimisation approach to training feed-forward neural networks that form part of an active learning paradigm
Active learning describes a paradigm of continually selecting the most informative patterns to train a model while training progresses. This research assignment investigates the effect of changing the optimiser of a feed-forward neural network (FFNN) from backpropagation to a dynamic optimisation algorithm, specifically the cooperative quantum-behaved particle swarm optimisation (CQPSO).
It was found that the CQPSO algorithm located and tracked the global minimum of four out of the six problem sets more effectively than backpropagation in the DPS active learning paradigm. Conversely, backpropagation performed better in the SASLA paradigm. The study concludes that CQPSO performance is highly dependent on search space dimensionality and the interdependence of training patterns.
▶ Course Recommendation Based on Content Affinity with Browsing Behaviour
This study aims to build a course recommender system for Physioplus to overcome the "distressing search problem" in MOOC platforms. The system uses a user’s recent Physiopedia browsing history to provide a tailored, rank-ordered list of relevant courses using collaborative filtering (CF) techniques and natural language processing.
The results showed a recall score of 76% and an accuracy rate of 53% in offline experiments. The research suggests that an enhanced recommender engine has significant potential to increase subscriber satisfaction and reduce cancellations by aligning course suggestions with real-time user interests.
▶ An Evolutionary Algorithm for the Vehicle Routing Problem with Drones with Interceptions
This study proposes an evolutionary algorithm (EA) to solve the vehicle routing problem with drones and interceptions (VRPDi), where drones can meet trucks mid-route. The research demonstrates a metaheuristic strategy for scheduling multiple pairs of trucks and drones leaving from a central depot.
Benchmarking against standard VRP datasets showed improvements in total delivery time of between 39% and 60%. While the algorithm effectively solved 50- and 100-node problems, performance and computation time deteriorated as the number of nodes increased, highlighting current scalability limits for this specific evolutionary approach.
▶ Metaheuristics for Training Deep Neural Networks
This research compares the use of metaheuristics—specifically Particle Swarm Optimisation (PSO), Genetic Algorithms (GA), and Differential Evolution (DE)—as alternatives to traditional backpropagation with stochastic gradient descent (SGD) for training deep convolutional neural networks.
Through five different experiments on image datasets, the results concluded that while metaheuristics offer a robust alternative for certain objective functions, SGD continues to perform better in terms of overall accuracy and efficiency for deep architectures. The study highlights the trade-offs between traditional gradient-based methods and population-based search strategies.
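For readers unfamiliar with population-based search, the sketch below shows a basic global-best PSO minimising a toy sphere function. It is an illustration under stated assumptions, not the study's implementation: the inertia and acceleration coefficients, swarm size, search bounds, and the `sphere` objective are all chosen here for demonstration. When training a network, the position vector would hold the network weights and the objective would be the training loss.

```python
import random

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

def pso(objective, dim, n_particles=20, iterations=200,
        w=0.729, c1=1.49445, c2=1.49445, bound=5.0, seed=0):
    """Minimal global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best_pos, best_val = pso(sphere, dim=3)
```

Unlike SGD, this search uses only objective evaluations and no gradients, which is why each iteration costs one loss evaluation per particle.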
▶ Diversity preservation for decomposition particle swarm optimization as feed-forward neural network training algorithm under the presence of concept drift
This project addresses the challenge of "concept drift" in time series forecasting. It investigates diversity preservation techniques for decomposition cooperative particle swarm optimisation (DCPSO) to ensure swarms do not converge too quickly and lose the ability to adapt to environmental changes.
Techniques including random decomposition and diversity-based penalty functions were tested across five non-stationary forecasting problems. The results showed that random decomposition significantly impacted swarm diversity, while the penalty function improved training and generalization errors, though with a slight trade-off in performance.
March 2023 Graduation
▶ Adaptive thresholding for microplot segmentation
The within-season evaluation of experimental wheat plots is often performed via high throughput phenotyping (HTP) on UAV-collected images. This research developed the Adaptive Thresholding Procedure (ATP), an automated method using unsupervised learning to identify and localise microplots without manual grid adjustment.
The ATP yielded superior performance compared to manual methods in favourable conditions, though it faced challenges in weed-heavy environments. Despite this, the tool significantly reduces the time researchers spend on manual post-processing, providing a more scalable solution for agricultural pre-breeding programmes.
▶ Decision Support Guidelines for Selecting Modern Business Intelligence Platforms in Manufacturing
As digitalisation in manufacturing increases, selecting the right Business Intelligence (BI) tool becomes critical. This research uses thematic analysis from industry professional interviews to identify essential criteria and challenges for BI implementation in traditionally "laggard" sectors like manufacturing.
The study proposes a nine-step selection process to help decision-makers evaluate BI software. The findings underscore that while BI is vital for prioritising tasks and maximizing profit, the choice of tool must be balanced against the specific technological and organisational foundations of the manufacturer.
▶ An Investigation into the Automatic Behaviour Classification of the African Penguin
This project applies deep learning to animal behaviour analysis for the endangered African penguin. By using mounted video cameras and non-invasive monitoring, the research developed a dual-model pipeline: first extracting movement coordinates and then classifying the behaviour (e.g., braying, preening, resting).
The models achieved AUC scores between 72.9% and 84.2% across various case studies. This foundational work provides a path toward passive monitoring systems and anomaly detection that can help conservationists respond faster to colony distress.
▶ Set-based Particle Swarm Optimization for Medoids-based Clustering of Stationary and Non-Stationary Data
Set-based particle swarm optimization (SBPSO) substitutes vector-based mechanisms with set theory to find optimal subsets of elements. This research applied SBPSO to medoids-based clustering across fifteen diverse datasets to evaluate its effectiveness in stationary and non-stationary environments.
SBPSO ranked third among seven algorithms, proving to be a viable clustering tool, though it was less effective for datasets with a high number of clusters. The study identified a critical trade-off between swarm diversity and clustering ability, providing a framework for future hyperparameter tuning in dynamic environments.
▶ An Extension of the CRISP-DM Framework to Incorporate Change Management
Digital transformation projects often fail due to human factors rather than technical ones. This research extends the widely used CRISP-DM framework by incorporating a generalised change management model to address barriers such as a lack of the understanding or knowledge needed to operate new technology.
The extended framework was validated against a real-world case study, demonstrating that identifying change management gaps early can significantly improve project adoption. This provides data specialists with a structured roadmap to mitigate the high risk of failure in AI implementations.
▶ An evaluation of state-of-the-art approaches to short-term dynamic forecasting
This research compares traditional statistical forecasting (Exponential Smoothing) with state-of-the-art deep learning (N-BEATS) for short-term order volume forecasting in logistics. Short-term forecasts are inherently stochastic, making the use of exogenous covariates critical for accuracy.
The study found that the N-BEATS model provided a 36.01% improvement in RMSE over traditional models. Furthermore, incorporating external variables resulted in a 16.15% increase in accuracy, suggesting that modern neural architectures are far more consistent at capturing environmental shifts in logistics demand.
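As a rough illustration of the baseline side of this comparison, the sketch below implements simple exponential smoothing with one-step-ahead forecasts, plus the RMSE and relative-improvement calculations used to compare models. The smoothing factor, the initialisation of the level to the first observation, and the sample series are assumptions; the study's exact exponential smoothing variant is not stated here.

```python
import math

def simple_exponential_smoothing(series, alpha):
    """One-step-ahead forecasts; the forecast for step t uses data up to t-1."""
    level = series[0]
    forecasts = [level]  # first forecast is initialised to the first observation
    for value in series[:-1]:
        level = alpha * value + (1 - alpha) * level
        forecasts.append(level)
    return forecasts

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(baseline_error, new_error):
    """Percentage reduction in error relative to the baseline."""
    return (baseline_error - new_error) / baseline_error * 100.0

# Hypothetical daily order volumes.
orders = [10, 12, 11, 13]
ses_forecasts = simple_exponential_smoothing(orders, alpha=0.5)
ses_rmse = rmse(orders, ses_forecasts)
```

A statement such as "a 36.01% improvement in RMSE" corresponds to `relative_improvement(baseline_rmse, model_rmse)` returning 36.01 for the two models' errors on the same test data.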
▶ Cross-Camera Vehicle Tracking in an Industrial Plant Using Computer Vision and Deep Learning
To detect fraud at paper recycling buy-back centres, this research developed a multi-vehicle multi-camera tracking (MVMCT) framework. Using Faster R-CNN and DeepSORT, the system tracks vehicles as they move through different camera feeds in the plant to identify suspicious activity.
The framework achieved an IDF1 score of 0.58 and demonstrated reasonable accuracy in counting stationary vehicles at loading bays. This provides a digital tool for buy-back centres to estimate expected stock volumes and flag discrepancies, protecting the sustainability of the recycling ecosystem.
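For context on the metric reported above, IDF1 is the harmonic mean of identification precision and identification recall, computed from identity-level true positives (IDTP), false positives (IDFP), and false negatives (IDFN). The sketch below shows the calculation; the counts are hypothetical and are not the study's data.

```python
def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denominator = 2 * idtp + idfp + idfn
    return 2 * idtp / denominator if denominator else 0.0

def id_precision(idtp, idfp):
    """Share of predicted detections assigned the correct identity."""
    return idtp / (idtp + idfp) if (idtp + idfp) else 0.0

def id_recall(idtp, idfn):
    """Share of ground-truth detections matched with the correct identity."""
    return idtp / (idtp + idfn) if (idtp + idfn) else 0.0

# Hypothetical identity-matching counts for one camera network.
score = idf1(idtp=80, idfp=20, idfn=40)
precision = id_precision(80, 20)
recall = id_recall(80, 40)
harmonic_mean = 2 * precision * recall / (precision + recall)
```

Because the score rewards keeping the same identity for a vehicle across its whole trajectory, identity switches between camera feeds lower IDF1 even when every individual detection is correct.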
▶ A Bagging Approach to Training Neural Networks using Metaheuristics
This research investigates using metaheuristics like genetic algorithms and particle swarm optimisation to train neural networks using subsampled datasets (bagging). This approach aims to reduce the massive computational costs currently associated with training large architectures on full datasets.
The results indicate that training with sub-samples can maintain similar accuracy to training on full datasets while significantly reducing the risk of overfitting. This highlights metaheuristics as a robust, computationally efficient alternative to traditional stochastic gradient descent for specific training paradigms.
▶ Link prediction of clients and merchants in a rewards program using graph neural networks
This study investigates potential relationships between clients and merchants within a bank's rewards program using GraphSAGE, an inductive Graph Neural Network (GNN) framework. The network is represented as a complex graph of interconnected entities to predict the existence of future links.
The model achieved a ROC AUC value of 0.65. While the sparse nature of the network presented challenges for precision, embedding visualizations successfully identified distinct merchant groups and client clusters. The research highlights the GNN's ability to detect network topology in financial rewards ecosystems despite data sparsity issues.
▶ Evaluating active learning strategies to reduce labelled medical images for CNN classifiers
Training CNNs typically requires massive amounts of manually labelled data, which is both costly and time-consuming in the medical field. This study investigates how active learning—selecting the most informative images for human annotation rather than random selection—can reduce this burden.
Using a chest X-ray pneumonia dataset, the research found that a DenseNet-121 architecture combined with least confidence sampling reduced the number of required labelled images by 39% compared to baseline random sampling, maintaining high performance while significantly lowering annotation costs.
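Least confidence sampling, the query strategy named above, can be sketched in a few lines: each unlabelled image is scored by one minus the classifier's highest class probability, and the highest-scoring images are sent for annotation. The probability values and batch size below are hypothetical; in practice they would come from the current model's predictions on the unlabelled pool.

```python
def least_confidence_scores(probabilities):
    """Uncertainty per sample: 1 minus the highest predicted class probability."""
    return [1.0 - max(p) for p in probabilities]

def select_for_labelling(probabilities, batch_size):
    """Indices of the batch_size samples the model is least confident about."""
    scores = least_confidence_scores(probabilities)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]

# Hypothetical softmax outputs (normal, pneumonia) for four unlabelled images.
pool_probs = [
    [0.95, 0.05],  # confident
    [0.55, 0.45],  # uncertain
    [0.70, 0.30],
    [0.51, 0.49],  # most uncertain
]
query_indices = select_for_labelling(pool_probs, batch_size=2)
```

The selected images are labelled by an annotator, added to the training set, and the model is retrained before the next round of selection.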
▶ A Dynamic Optimization Approach to Active Learning in Neural Networks
Traditional neural network training assumes a static environment, but active learning creates a dynamic training set that changes as the model selects informative instances. This study explores whether dynamic metaheuristics, such as variations of Particle Swarm Optimisation (PSO), outperform backpropagation in these shifting environments.
Testing across seven benchmark datasets showed improved generalization in three cases when using dynamic metaheuristics. However, the results were not consistent across all metrics, leading to the conclusion that while promising, dynamic metaheuristics are not yet a definitive replacement for standard learning algorithms in all active learning scenarios.
▶ Rule Extraction from Financial Time Series
Extracting understandable "rules" from complex financial data is a major challenge in data mining. This research develops a framework for rule induction and extraction that focuses on the shapes and trends of time series rather than just raw values.
The most significant finding was the critical importance of balanced data; predictive performance improved significantly when class imbalance was reduced. The study concludes that the success of rule extraction in finance depends more on data preparation and addressing class imbalance than on the specific algorithm used.
▶ Binning Continuous-Valued Features using Meta-Heuristics
Discretization—partitioning continuous data into "bins"—is a vital preprocessing step for model interpretability. This report proposes a new discretization algorithm that uses Particle Swarm Optimization (PSO) to find optimal bin boundaries for multivariate classification problems.
The proposed PSO-based discretizer was compared against traditional equal-width and equal-frequency binning. While it offered strong interpretability, it was occasionally outperformed by evolutionary cut-point selection when paired with specific classifiers like C4.5, suggesting that the choice of discretizer should be closely aligned with the final model type.
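The two traditional baselines mentioned above can be sketched as follows. Equal-width binning splits the value range into intervals of equal length, while equal-frequency binning places the cut points so that each bin holds roughly the same number of observations. The sample values and the choice of three bins are assumptions for illustration.

```python
from bisect import bisect_right

def equal_width_edges(values, n_bins):
    """Interior cut points that split [min, max] into n_bins equal-length intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]

def equal_frequency_edges(values, n_bins):
    """Interior cut points chosen so each bin holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

def assign_bin(value, edges):
    """Bin index of a value: the number of cut points less than or equal to it."""
    return bisect_right(edges, value)

# Hypothetical skewed feature with one outlier.
feature = [1, 2, 3, 4, 5, 6, 7, 8, 100]

width_edges = equal_width_edges(feature, 3)
freq_edges = equal_frequency_edges(feature, 3)
width_bins = [assign_bin(v, width_edges) for v in feature]
freq_bins = [assign_bin(v, freq_edges) for v in feature]
```

With the outlier present, equal-width binning places eight of the nine values in the first bin, whereas equal-frequency binning puts three values in each bin. A search-based discretizer such as the PSO approach described above optimises the cut points against an objective instead of fixing them by rule.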
▶ A Genetic Algorithm Approach to Tree Bucking using Mechanical Harvester Data
Bucking—the process of crosscutting trees into logs—is an optimization problem where different log lengths and diameters carry different values. This research applies a Genetic Algorithm (GA) to data from mechanical harvesters in South African forests to determine if existing bucking practices could be improved.
By using PSO to tune the GA’s hyperparameters, the study found that the metaheuristic approach outperformed existing manual bucking by a large margin. The results were validated against dynamic programming solutions, proving that GA is a highly effective tool for maximizing timber value at the harvest site.
▶ Crop recommendation system for precision farming: Malawi use case
Precision farming uses data-driven recommendations to boost productivity. This project developed a crop recommendation system for the central region of Malawi, using meteorological and soil data to forecast the best crops (maize, cassava, rice, beans, or sugarcane) for specific farmlands.
After testing ten different classifiers, K-Nearest Neighbours (KNC) emerged as the top performer with 99% accuracy and a simple, fast-training structure. The model was successfully integrated into a test web application, providing a proof-of-concept for ICT-based climate change mitigation in sub-Saharan agriculture.
▶ Financial Time Series Modelling using Gramian Angular Summation Fields
This research investigates encoding financial time series into images using Gramian Angular Summation Fields (GASF) and Markov Transition Fields (MTF), allowing the application of computer vision techniques to financial forecasting.
While traditional time series models performed better in isolation, the study found that a combination of GASF and MTF images allowed the model to learn superior features when used alongside sequence-based approaches. This combinatorial method improved overall model performance for complex financial data sets.
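The GASF encoding referred to above can be sketched briefly: the series is rescaled to [-1, 1], each value is mapped to an angle via the arccosine, and the image entry at (i, j) is the cosine of the sum of the two angles. The min-max rescaling and the three-point sample series below are assumptions for illustration; the study's preprocessing details are not given here.

```python
import math

def gasf(series):
    """Gramian Angular Summation Field of a 1-D series as a list of rows."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Min-max rescale into [-1, 1] so that arccos is defined.
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against tiny floating-point overshoot.
    scaled = [max(-1.0, min(1.0, s)) for s in scaled]
    phi = [math.acos(s) for s in scaled]
    # Entry (i, j) encodes the pair of time steps as cos(phi_i + phi_j).
    return [[math.cos(a + b) for b in phi] for a in phi]

# Hypothetical closing prices.
prices = [1.0, 2.0, 3.0]
image = gasf(prices)
```

The resulting matrix is symmetric, and a series of length n becomes an n-by-n image that a convolutional network can consume like any other single-channel picture.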
▶ Machine Learning-based Nitrogen Fertilizer Guidelines for Canola
Soil degradation and inefficient land practices are major concerns for South African agriculture. This study uses machine learning to predict the optimal amount of nitrogen (N) fertilizer for canola crops to maximize yield while minimizing costs and environmental emissions.
The Random Forest Regressor proved the most accurate, using features like rainfall, soil nitrogen levels, and planting dates. The output is a practical fertilizer recommendation table that helps farmers make data-guided decisions to increase productivity and profit in conservation agriculture systems.
▶ The use of historical tracking data to predict vehicle travel speeds
York Timbers manages a massive 10,000 km road network in its plantations. To optimize timber delivery, this project used GPS tracking data and map-matching techniques to estimate travel speeds across thousands of road segments, including those without active GPS measurements.
A regression tree model achieved the best results, with a mean absolute error of 10.02 km/h. The research suggests that including additional factors like weather data and identifying dangerous road portions would further refine the model's accuracy for industrial logistics optimization.
▶ A Review and Analysis of Imputation Approaches for Missing Data
Missing data is a common hurdle that can introduce significant bias into any dataset. This research assignment evaluates ten different statistical and machine learning imputation methods, including Mean, MCMC, and k-Nearest Neighbor (kNN).
The Markov Chain Monte Carlo (MCMC) method performed best with 75.71% accuracy and the lowest bias. The study concludes that simple statistical methods (like mean imputation) should generally be avoided, while MCMC provides a robust and easy-to-use solution for maintaining data integrity.
▶ Crawler Detection Decision Support: A Neural Network with PSO Approach
Distinguishing between human users, "good" crawlers (like search engines), and "bad" malicious crawlers is vital for website security and traffic analytics. This research implements neural networks optimized with Particle Swarm Optimisation (PSO) to classify website traffic sessions.
The models were tested in both stationary environments and non-stationary environments where "concept drift" occurs (crawlers changing behavior over time). Results demonstrated that quantum-inspired PSO effectively optimized the neural networks to successfully identify malicious crawling patterns in real-time.
▶ A comparative study of metaheuristics for hyper-parameter optimisation
Hyper-parameter optimization is often a "black-box" problem where derivatives aren't available, making standard gradient descent unusable. This study compares traditional methods like Grid and Random search against metaheuristics including Genetic Algorithms (GA) and PSO.
Using a test suite of SVMs, MLPs, and CNNs, the research employed Friedman and Nemenyi tests to rank the algorithms. The results indicate that metaheuristic frameworks offer high-level, problem-independent guidelines that can optimize complex deep learning architectures more efficiently than manual tuning or exhaustive search methods.
▶ Predicting employee burnout using machine learning techniques
While AI is often used for recruitment and attrition, its application in identifying employee burnout is still developing. This research applied multiple classification models to human capital data to proactively guide wellbeing interventions.
The results showed that predicting burnout is highly complex; none of the models achieved more than 50% accuracy, with artificial neural networks performing the best of a difficult set. The study highlights the need for better data quality and ethical guardrails when applying AI to human-centric workplace challenges.
▶ Comparison of Machine Learning Models for the Classification of Fluorescent Microscopy Images
Long COVID symptoms like fatigue and brain fog are often caused by microscopic blood clots that inhibit oxygen exchange. This research investigates using computer vision and machine learning to automate the identification of these microclots in fluorescent microscopy images, a task that is currently labor-intensive and manual.
By extracting specific features from the images, the study compared several algorithms and found that Logistic Regression provided strong, reliable performance in classifying both positive and negative cases. This automation offers a scalable way to speed up diagnosis and help patients access treatment faster.
▶ Anomaly Detection in Support of Predictive Maintenance of Coal Mills
Shifting from preventive to predictive maintenance is a cornerstone of Industry 4.0. This research focuses on coal mills, using supervised machine learning to move away from fixed maintenance schedules and toward a "maintenance when necessary" approach based on real-time data.
The study identifies critical data quality issues in industrial sensor logs and develops a model to predict likely failure points. The findings demonstrate that supervised learning can effectively maximize asset availability, ensuring that heavy machinery is serviced only when failure is imminent, thereby saving costs and reducing downtime.
▶ Unsupervised Models for Identification of Financial Time Series Regimes
Financial markets constantly shift between different "regimes," such as bull and bear markets. Detecting these shifts accurately is vital for portfolio management. This research reviews unsupervised machine learning algorithms to identify regime changes in multivariate data from the Johannesburg Stock Exchange (JSE).
The study compared various algorithms based on their detection accuracy and the profitability of the resulting investment strategies. The results provide investors with a framework for understanding market trends and making more informed, data-driven decisions during periods of high volatility.
▶ Detection of Chronic Kidney Disease (CKD) using Machine Learning Algorithms
Chronic Kidney Disease (CKD) affects 10% of the global population, but early detection is difficult due to overlapping symptoms with other illnesses. This is particularly critical in South Africa, where the majority of the population relies on under-resourced public health systems.
This study developed and evaluated multiple classification models using three international datasets. The goal was to build a high-performing model capable of identifying hidden correlations in patient symptoms, ultimately providing a non-invasive, early-warning tool that could revolutionize treatment access in developing nations.
▶ Feature Engineering Approaches for Financial Time Series Forecasting
Noise and non-stationarity make financial forecasting notoriously difficult. This research tests various feature engineering methods—like Fourier transforms, wavelets, and log-transforms—across models including SVR, MLP, and LSTM to see if "cleaning" the data actually improves price predictions.
The investigation found that while denoising can help specific models, there is no "silver bullet" method. Interestingly, while some techniques improved directional accuracy (predicting if the price goes up or down), the gains in raw price prediction were small, suggesting that future work should focus on alternative data sources rather than just past price history.
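The distinction drawn above between directional accuracy and raw price error can be made concrete with a small sketch. Directional accuracy is taken here as the share of time steps where the predicted move from the previous actual price has the same direction as the realised move. This definition, the treatment of a zero move as "not up", and the sample prices are assumptions for illustration.

```python
def directional_accuracy(actual, predicted):
    """Share of steps where predicted and realised moves point the same way.

    Moves are measured from the previous actual price; a move of zero
    is treated as "not up".
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    hits = 0
    steps = len(actual) - 1
    for t in range(1, len(actual)):
        real_up = actual[t] - actual[t - 1] > 0
        pred_up = predicted[t] - actual[t - 1] > 0
        hits += real_up == pred_up
    return hits / steps

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted prices."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical prices and model forecasts.
actual_prices = [100.0, 102.0, 101.0, 103.0, 104.0]
predicted_prices = [100.0, 101.0, 103.0, 102.0, 103.5]

da = directional_accuracy(actual_prices, predicted_prices)
mae = mean_absolute_error(actual_prices, predicted_prices)
```

A model can score well on one measure and poorly on the other, which is consistent with the observation above that gains in direction did not translate into large gains in price error.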
▶ Forecasting Armed Conflict using LSTM Recurrent Neural Networks
Predicting social conflict can help humanitarian agencies intervene before violence escalates. This dissertation applies Long Short-Term Memory (LSTM) networks to forecast events in the Afghanistan conflict, using world news data from GDELT and event data from the Uppsala Conflict Data Program.
The results show that incorporating news media data—specifically actor and event attributes—significantly improves baseline models. By consolidating media sentiment with actual recorded death tolls, the model produces predictions that are grounded in reality, offering a powerful tool for governments and NGOs.
▶ Comparison of Machine Learning Models on Different Financial Time Series
This study challenges the "Efficient Market Hypothesis" by investigating if AI can find a 1% edge in markets like the S&P 500, Gold, and Bitcoin. It compared seven different models, ranging from simple Linear Regression to complex Gated Recurrent Units (GRU), using both single out-of-sample and walk-forward validation.
Surprisingly, Linear Regression remained the most effective model for many assets due to its simplicity (parsimony). However, for specific assets like the USD/ZAR pair and Gold, Neural Networks (MLP) achieved accuracies of 51% to 53%, suggesting that even a small algorithmic edge can lead to market outperformance.
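Walk-forward validation, mentioned above alongside single out-of-sample testing, repeatedly trains on past data and tests on the block that immediately follows, so that no future information leaks into training. The sketch below generates index splits with an expanding training window; the expanding (as opposed to fixed-length rolling) window and the split sizes are assumptions, since the study's exact configuration is not stated here.

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    The training window starts at index 0 and grows by test_size each step;
    each test block directly follows its training window.
    """
    start = initial_train
    while start + test_size <= n_samples:
        train_idx = list(range(0, start))
        test_idx = list(range(start, start + test_size))
        yield train_idx, test_idx
        start += test_size

# Ten hypothetical observations, first model trained on four, tested two at a time.
splits = list(walk_forward_splits(n_samples=10, initial_train=4, test_size=2))
```

Averaging a metric over all test blocks gives a more conservative performance estimate than a single hold-out period, because the model is evaluated under several different market conditions.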
▶ Proximal Methods for Seedling Detection and Height Assessment
Forest nurseries currently rely on laborious manual sampling to monitor seedling growth. This study proposes a tech-driven alternative: using smartphone RGB imagery and photogrammetry to create 3D digital surface models of seedling trays.
Using a RetinaNet object detection model and AdaBoost regression, the pipeline successfully detected 98.97% of seedlings and accurately assessed their heights. This allows nursery operators to understand stock quantities and growth stages instantly without manual intervention, significantly improving operational efficiency.
▶ Automated Tree Position and Height Estimation from RGB Aerial Imagery
Determining the fiscal value of a forest requires accurate tree counts and height data. This research uses UAV-derived imagery and a hybrid machine learning framework (RetinaNet + SVM + MLP) to improve upon traditional "local maxima" algorithms for a Eucalyptus plantation in KwaZulu-Natal.
The hybrid approach improved tree position accuracy by 15% and height estimation accuracy by 25%. By filtering out "illegitimate" tree positions (like shadows or dead trees) with an SVM, the system provides a much more reliable estimate of total forest biomass and value.
▶ Fantasy Premier League Decision Support: A Meta-learner Approach
Fantasy Premier League is a strategic game with millions of players. This project treats "dream-team" selection as a mathematical optimization problem, using a stacked meta-learner that combines predictions from five different families of machine learning algorithms.
The system uses linear programming to suggest player transfers while respecting budget and squad constraints. In a case study of the 2020/21 season, this AI-driven approach would have ranked in the top 5.98% of all 8 million managers, illustrating the potential of ensemble learning in sports analytics.
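The structure of the selection problem (maximise predicted points subject to budget and squad constraints) can be shown on a toy player pool. The sketch below uses exhaustive enumeration rather than a linear programming solver, purely so that it has no dependencies; the players, costs, predicted points, and the one-per-position quota are all made up for illustration.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Player = Tuple[str, str, float, float]  # (name, position, cost, predicted_points)

# Hypothetical player pool; a real pool would come from the meta-learner's predictions.
PLAYERS: List[Player] = [
    ("GK_A", "GK", 4.5, 3.8),
    ("GK_B", "GK", 5.0, 4.1),
    ("DEF_A", "DEF", 5.5, 4.9),
    ("DEF_B", "DEF", 4.0, 3.2),
    ("MID_A", "MID", 8.0, 6.5),
    ("MID_B", "MID", 6.0, 5.0),
    ("FWD_A", "FWD", 9.0, 7.1),
    ("FWD_B", "FWD", 6.5, 5.2),
]
QUOTA: Dict[str, int] = {"GK": 1, "DEF": 1, "MID": 1, "FWD": 1}  # assumed squad shape
BUDGET = 24.0  # assumed budget


def best_squad(
    players: List[Player], quota: Dict[str, int], budget: float
) -> Tuple[Optional[Tuple[Player, ...]], float]:
    """Return the feasible squad with the highest total predicted points."""
    size = sum(quota.values())
    best, best_points = None, float("-inf")
    for squad in combinations(players, size):
        # Squad constraint: exact number of players per position.
        counts: Dict[str, int] = {}
        for _, position, _, _ in squad:
            counts[position] = counts.get(position, 0) + 1
        if counts != quota:
            continue
        # Budget constraint.
        if sum(p[2] for p in squad) > budget:
            continue
        points = sum(p[3] for p in squad)
        if points > best_points:
            best, best_points = squad, points
    return best, best_points


squad, points = best_squad(PLAYERS, QUOTA, BUDGET)
```

Enumeration is only workable for a handful of players. With a full player pool, the same objective and constraints would be passed to a linear or integer programming solver.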
▶ Requirements for 3D stock assessment of timber on landings and terminals
Inaccurate estimations of log pile volumes can lead to significant bottlenecks in the timber supply chain. This project addresses the need for a low-tech, frequent, and accurate stock assessment system suitable for rural areas, using simple consumer-grade cameras or smartphones.
Using terrestrial Structure from Motion (SfM) and K-means clustering, the system extracts log piles from 3D point clouds and generates "alpha shapes" to estimate their volume. The results indicate that computer vision can provide an accessible and reliable alternative to manual estimation, even in remote environments with limited infrastructure.
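The clustering step can be illustrated with a small K-means (Lloyd's algorithm) sketch on 3D points. The sample coordinates and the farthest-first initialisation are assumptions chosen to keep the example deterministic; the project's own features, initialisation, and number of clusters are not stated here.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float, float]


def kmeans(points: List[Point], k: int, iterations: int = 20) -> Tuple[List[int], List[Point]]:
    """Cluster 3D points with Lloyd's algorithm and return (labels, centroids)."""
    # Assumed initialisation: start from the first point, then repeatedly
    # add the point farthest from the centroids chosen so far.
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids: List[Point] = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical point cloud: two well-separated groups of points.
cloud: List[Point] = [
    (0.0, 0.0, 0.0), (0.2, 0.1, 0.1), (0.1, 0.3, 0.0),
    (5.0, 5.0, 1.0), (5.2, 5.1, 1.1), (4.9, 5.3, 0.9),
]
labels, centroids = kmeans(cloud, k=2)
```

Once points are grouped, each cluster can be treated as a candidate log pile and passed to a surface reconstruction step such as the alpha shapes mentioned above.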
▶ A predictive model for precision tree measurements using applied machine learning
Traditional forest enumeration is laborious and often only covers a small fraction of a plantation. While high-end laser scanning exists, it is expensive and difficult to scale. This thesis investigates a more accessible route: using ordinary smartphone video and Monocular Depth Estimation (MDE) to measure tree diameter at breast height (DBH).
Working with the South African Forestry Company (SAFCOL), the researcher compared AI-generated measurements against "ground truth" fieldwork data. Although the measurement error still needs to be reduced, the model generated spatial representations that closely mirror real-world coordinates, offering a scalable path toward precision forestry using everyday mobile devices.