Current Projects

biplotEZ

Sugnet Lubbe, Johané Nienkemper-Swanepoel, Niël le Roux, Raeesa Ganey, Ruan Buys, Zoë-Mae Adams, Peter Manefeldt

Biplots have proved to be valuable visualisation tools in exploratory data analysis. To date, the use of biplots in interdisciplinary applications has been limited because current implementation tools are restricted to expert users. The biplotEZ R package was published on CRAN as a user-friendly package to enable and empower practitioners and researchers of varying skill levels to apply biplots more widely in many disciplines, especially in the current era of big data. Currently the package makes provision for Principal Component Analysis biplots, Canonical Variate Analysis biplots, Correspondence Analysis biplots and Regression biplots (with either linear regression axes or spline-based axes). Work is continuing on Multiple Correspondence Analysis biplots, Analysis of Distance biplots and more.
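As a rough illustration of what a Principal Component Analysis biplot displays, the sketch below computes sample coordinates and variable axis directions from the singular value decomposition of a column-centred data matrix. It is written in Python with NumPy purely for illustration and does not use or reproduce the biplotEZ interface; the function name pca_biplot_coordinates and the simulated data are assumptions made for this example.

```python
import numpy as np

def pca_biplot_coordinates(X, dims=2):
    """Hypothetical helper: coordinates for a PCA biplot of the rows and columns of X."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)                # column-centre the data
    U, d, Vt = np.linalg.svd(centred, full_matrices=False)
    V = Vt.T[:, :dims]                          # first `dims` principal directions
    samples = centred @ V                       # sample points in the biplot plane
    axes = V                                    # each row gives the direction of one variable axis
    quality = (d[:dims] ** 2).sum() / (d ** 2).sum()  # share of variation displayed
    return samples, axes, quality

# Simulated data, for illustration only (50 samples, 4 variables)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
samples, axes, quality = pca_biplot_coordinates(X)
```

Plotting the rows of samples as points and the rows of axes as directions through the origin gives the basic two-dimensional display; calibrating the axes with tick marks in the original units is a further step not shown here.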

BIPLOTS FOR INDUSTRY

Niël le Roux, Roelof Coetzer (NWU), Ruan Rossouw (SASOL)

A multivariate reactor performance index (RPI) is developed for complex multivariate process monitoring. The newly proposed RPI integrates subject-matter knowledge with a data-driven approach for real-time performance monitoring. A new approach to process deviation monitoring on many variables is presented based on the confidence value (α) at a specified -value. This methodology is proposed as a general data-driven performance index because it is objective and requires very little prior knowledge of the system. A performance index visualised on an appropriate, interactive graph is invaluable when monitoring multiple similar production processes, as it makes it easy to identify visually those processes that are not performing as expected.

EEG SIMULATIONS

Niël le Roux, Pieter Schoonees (Erasmus University)

This research focuses on methods for assessing the similarity of brain responses within and across subjects (individuals). Typically, the data come from fMRI or EEG studies and concern spatiotemporal measures of brain activity while the subject is exposed to some stimulus. A particular focus of these studies is on naturalistic stimuli, which typically means video content such as television and films. This is an important departure from traditional neuroimaging studies, where subjects perform simple tasks multiple times in a highly controlled setting. fMRI offers high spatial resolution by dividing the brain into many voxels, but this comes at the cost of lower temporal resolution, as it takes roughly two seconds to complete a single scan of the brain. In contrast, EEG trades spatial resolution for high temporal resolution. In EEG, a limited number of electrodes (e.g., 64) are placed on the scalp to measure activity; by sacrificing spatial resolution in this way, measurements can be made many times a second (typically 256 or 512 times). Our focus is on the statistical analysis of EEG data. To this end we developed an extensive R-based statistical simulation model for generating EEG data under a wide variety of controlled conditions. This allows us to evaluate the statistical procedures currently used in the analysis of EEG data.

Applying biplotEZ

Sugnet Lubbe, Johané Nienkemper-Swanepoel, Niël le Roux, Raeesa Ganey

This project builds upon the biplotEZ project. The focus of this project is the application of multi-dimensional data visualisations in a broad range of different fields. Collaborative projects in the following application fields have been considered: archaeology, agricultural sciences, chemometrics, industrial applications, finance, sensory profiling, microbiology and wood science. Experience has shown that a collaborative project with ongoing support from MuViSU contributes to deeper insights and interpretation in the application. The knowledge flow goes both ways: application necessitates new theoretical developments; new theoretical developments lead to better understanding.

GENERALISED SINGULAR VALUE DECOMPOSITION FOR THREE-WAY DATA

Sugnet Lubbe, Raeesa Ganey (WITS)

Where the singular value decomposition decomposes a single matrix into three components, the generalised singular value decomposition decomposes two matrices simultaneously, with a single matrix of right singular vectors. This could be useful to visually represent more than one data matrix simultaneously. A paper is being finalised for submission.
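To make the idea of a shared right-hand factor concrete, the following sketch computes one form of the generalised singular value decomposition of two matrices with the same number of columns, using a QR decomposition of the stacked matrices followed by an ordinary SVD. It is an illustrative NumPy construction, not the method of the paper in preparation. The function name gsvd_sketch is hypothetical, and the code assumes that each matrix has at least as many rows as columns, that the stacked matrix has full column rank, and that no generalised singular value is zero. Note that in this form the shared right-hand matrix is invertible but not, in general, orthogonal.

```python
import numpy as np

def gsvd_sketch(A, B):
    """Hypothetical helper: A = U diag(c) Xt and B = V diag(s) Xt with c**2 + s**2 = 1."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    m = A.shape[0]
    # Reduced QR of the stacked matrix; Q has orthonormal columns
    Q, R = np.linalg.qr(np.vstack([A, B]))
    Q1, Q2 = Q[:m], Q[m:]
    # SVD of the top block gives U, the cosines c and an orthogonal W
    U, c, Wt = np.linalg.svd(Q1, full_matrices=False)
    # Columns of Q2 W are mutually orthogonal with norms s = sqrt(1 - c**2)
    Z = Q2 @ Wt.T
    s = np.linalg.norm(Z, axis=0)
    V = Z / s                    # assumes every s > 0
    Xt = Wt @ R                  # right-hand factor shared by A and B
    return U, V, c, s, Xt

# Simulated matrices, for illustration only
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))
B = rng.normal(size=(5, 3))
U, V, c, s, Xt = gsvd_sketch(A, B)
```

Because both matrices are expressed in terms of the same factor Xt, their rows can in principle be displayed against a common set of column directions, which is what makes the decomposition attractive for visualising more than one data matrix at once.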

FAULT DIAGNOSIS IN MULTIVARIATE STATISTICAL PROCESS MONITORING

Sugnet Lubbe, Roelof Coetzer (NWU)

While Prof Coetzer was at SASOL, he and Prof Lubbe co-supervised Dr André Mostert's PhD at UCT. Dr Mostert sadly passed away from COVID-19 in June 2021. Prof Coetzer and Prof Lubbe plan to publish at least two papers from his thesis. Prof Coetzer is organising a session, "Methodologies for process monitoring and fault detection in complex industrial processes", at the International Conference of Computational Methods in Science and Engineering (ICCMSE). Since the conference is in hybrid format, Prof Lubbe will present a paper on this work remotely.

Biplots for linguistic patterns

Raeesa Ganey, Johané Nienkemper-Swanepoel

This project was inspired by The Economist article, "What is the world's loveliest language?", and aims to show how a variety of multivariate visualisations, specifically biplots, enhance the interpretation of complex interactions between linguistic features and aesthetic perceptions.

Biplots for Missing Data

Johané Nienkemper-Swanepoel, Mokgeseng Ramaisa

A paper by Johané, currently under review, provides guidance to users on choosing appropriate imputation strategies based on the underlying data characteristics.

Mokgeseng is completing his Master's under the supervision of Dr Nienkemper-Swanepoel. He has developed methodology to extend GPAbin biplots for categorical data to incomplete continuous data using principal component analysis biplots. GPAbin biplots combine the visualisations obtained from multiple imputations into a single unified display.

Biplots for Text and Sentiment Visualisation

Zoë-Mae Adams, Johané Nienkemper-Swanepoel

The overarching aim of this dissertation is to gain insight from text data by summarising the content and visualising the results of sentiment classification. This can be achieved by developing suitable visualisation tools for the optimal representation of sentiment classification.

The following research objectives will be investigated in pursuit of the overarching aim:

- Review of sentiment visualisation literature
- Application of an adaptive sentiment lexicon to improve sentiment classification accuracy
- Enhancement of the interactive EW-MCA biplot tool
  - The inspection of the ordinal nature of sentiment classification categories
  - The visualisation of topic modelling results

EXPLODING BIPLOTS R PACKAGE

Ruan Buys

Ruan Buys published the R package bipl5 on CRAN as part of his Master's research. The package provides reactive biplots rendered in HTML. The traditional biplot view is enhanced by automated translation of the axes and by superimposing interclass kernel densities on the axes. Work is continuing on integrating bipl5 with biplotEZ.

The correspondence analysis of ordered categorical variables