Epileptic Seizure Prediction using Rotation Forest in a Parallel Environment
Abdulhamit Subasi, Samed Jukic
Effat University, Jeddah, Saudi Arabia; International Burch University, Sarajevo, Bosnia and Herzegovina
Epilepsy is a neurological disorder characterized by a recurrent tendency of the brain to produce abrupt bursts of abnormal electrical activity. To diagnose epileptic seizures, a patient’s EEG must be monitored on many channels for several days, which makes parallel processing essential. We employ MATLAB’s Parallel Computing Toolbox together with wavelet packet decomposition (WPD) and the Rotation Forest classifier. High-level parallel for-loops, special array types, and parallelized numerical algorithms make it possible to parallelize MATLAB applications without CUDA or MPI programming. In multithreaded parallelism, one instance of MATLAB automatically creates multiple concurrent instruction streams. Big data applications in biomedicine are becoming attractive as data generation and storage capacities increase. Processing such data to extract knowledge remains challenging, however, since machine learning methods are not adapted to this requirement. In this study, we analyse EEG signals for epileptic seizure prediction. Specifically, Multiscale Principal Component Analysis (MSPCA) is used for denoising, WPD for feature extraction, and Rotation Forest for classification in a parallel framework to predict epileptic seizures. The results show that the proposed framework reduces the execution time significantly while maintaining a high level of classification performance.
EEG Signal Preprocessing
The EEG data were collected at the Epilepsy Center of the University Hospital of Freiburg and visually inspected by expert neurologists. The recordings are sampled at 256 Hz using 60 channels. They are taken from 21 patients with 88 seizures and include both preictal and seizure-free (interictal) data.
Multiscale PCA (MSPCA) combines the ability of PCA to remove the cross-correlation between variables with the multiscale representation provided by wavelets. To combine the benefits of PCA and wavelets, the observations of each variable are decomposed using the wavelet transform. This transforms the data matrix X into a matrix WX, where W is an orthonormal matrix representing the orthonormal wavelet transformation. The number of principal components to be retained at each scale is not changed by the wavelet decomposition, because the decomposition does not change the fundamental relationship between the variables at any scale [3-5].
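The statement that the wavelet step leaves the relationship between the variables intact can be checked with a short derivation. The notation here is an assumption added for illustration and does not come from the original text: X is taken to be the n × p data matrix (n observations, p variables), and W is partitioned into a block H_L that produces the coarsest approximation and blocks G_1, ..., G_L that produce the details at each scale.

```latex
W^{\top} W = I_n
\;\Longrightarrow\;
(WX)^{\top}(WX) = X^{\top} W^{\top} W X = X^{\top} X,
\qquad
W = \begin{bmatrix} H_L \\ G_L \\ \vdots \\ G_1 \end{bmatrix}
\;\Longrightarrow\;
X^{\top} X = (H_L X)^{\top}(H_L X) + \sum_{s=1}^{L} (G_s X)^{\top}(G_s X).
```

Under these assumptions the cross-product matrix of the variables is unchanged by the transform and splits into a sum of per-scale contributions, which is consistent with applying PCA to the wavelet coefficients scale by scale.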
Unlike the standard wavelet transform, which further splits only the approximations, WPD decomposes both the low-frequency components (approximations) and the high-frequency components (details) at every level. As a result, WPD provides a better frequency resolution for the decomposed signal. A further advantage of wavelet packet decomposition is that nodes from different levels of the decomposition can be combined to reconstruct the original signal.
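The splitting scheme described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the Haar filter, the function names, and the use of node energy as a feature are all assumptions chosen to keep the example short and dependent only on the standard library.

```python
import math


def haar_split(signal):
    """One Haar filter-bank step: return (approximation, detail), each half as long."""
    scale = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_packet(signal, levels):
    """Return the 2**levels leaf nodes of a full wavelet packet tree.

    Assumes len(signal) is divisible by 2**levels. Nodes are returned in
    natural (filter-bank) order, not sorted by frequency.
    """
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            # Unlike the plain wavelet transform, the detail branch is split too.
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def node_energies(nodes):
    """Hypothetical feature vector: the energy of each leaf node."""
    return [sum(c * c for c in node) for node in nodes]


if __name__ == "__main__":
    segment = [4.0, 2.0, 5.0, 7.0, 1.0, 3.0, 6.0, 8.0]
    print(node_energies(wavelet_packet(segment, levels=2)))
```

Because the Haar step is orthonormal, the node energies sum to the energy of the input segment, which gives a quick sanity check on any such decomposition.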
Rotation Forest is a recently introduced, effective method for generating classifier ensembles, in which the training set for every base classifier is formed by using PCA to rotate the original attribute axes. Specifically, to generate the training data for a base classifier, the attribute set F is randomly divided into K subsets and PCA is applied to each subset. All principal components are kept in order to preserve the variability information in the data. K axis rotations therefore take place to form the new attributes for a base classifier. The key idea of Rotation Forest is to encourage diversity and individual accuracy within the ensemble at the same time: diversity is promoted by applying feature extraction separately for every base classifier, and accuracy is sought by keeping all principal components and using the entire data set to train each base classifier.
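The rotation step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' code: every attribute subset is fixed to two attributes so that PCA has a closed form, the number of attributes is assumed to be even, and the 75% subsample used to estimate each PCA is an assumption borrowed from common descriptions of the method. The function names are hypothetical, and class subsampling and the base classifier itself are omitted.

```python
import math
import random


def pair_rotation(xs, ys):
    """Closed-form PCA for two attributes: 2x2 matrix whose columns are the principal axes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    theta = 0.5 * math.atan2(2 * cov, var_x - var_y)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [[cos_t, -sin_t], [sin_t, cos_t]]  # both components are kept


def rotation_matrix(data, rng, sample_fraction=0.75):
    """Build one rotation matrix from random, disjoint attribute pairs."""
    n_features = len(data[0])
    if n_features % 2:
        raise ValueError("this sketch assumes an even number of attributes")
    order = list(range(n_features))
    rng.shuffle(order)  # random division of the attribute set into subsets
    rot = [[0.0] * n_features for _ in range(n_features)]
    for k in range(0, n_features, 2):
        i, j = order[k], order[k + 1]
        # PCA is estimated on a random subsample (assumed fraction).
        sample = rng.sample(data, max(2, int(sample_fraction * len(data))))
        block = pair_rotation([row[i] for row in sample], [row[j] for row in sample])
        rot[i][i], rot[i][j] = block[0][0], block[0][1]
        rot[j][i], rot[j][j] = block[1][0], block[1][1]
    return rot


def rotate(data, rot):
    """Rotate the entire data set; a base classifier would be trained on the result."""
    n = len(rot)
    return [[sum(row[p] * rot[p][q] for p in range(n)) for q in range(n)] for row in data]


if __name__ == "__main__":
    gen = random.Random(0)
    toy = [[gen.gauss(0, 1) for _ in range(4)] for _ in range(20)]
    rotated = rotate(toy, rotation_matrix(toy, random.Random(1)))
    print(rotated[0])
```

Calling rotation_matrix with a different random generator for each base classifier yields a different rotation each time, which is where the diversity of the ensemble comes from, while every classifier still sees all observations.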
Parallel or concurrent computing denotes a group of autonomous processors working together to solve a computational problem. The aim is to reduce the execution time and to make use of larger memory and storage resources. Parallel computing works by dividing the whole computational task and distributing it among the processors. However, the hardware architecture of a multi-processor system differs considerably from that of a single-processor computer, so it requires specifically adapted parallel software.
MATLAB’s Parallel Computing Toolbox is used to solve computationally intensive and data-intensive problems on multicore processors, GPUs, and computer clusters. High-level parallel for-loops, special array types, and parallelized numerical algorithms make it possible to parallelize MATLAB applications without CUDA or MPI programming. Furthermore, the toolbox lets applications use the full processing power of multicore desktops by running MATLAB computational engines locally.
In our experiment, we compare running the MATLAB code serially with running it under multithreaded parallelism (MATLAB parallel). In multithreaded parallelism, one instance of MATLAB automatically creates multiple concurrent instruction streams, which are executed by multiple processors or cores sharing the memory of a single computer. In explicit parallelism, multiple instances of MATLAB run on several processors or computers, mostly with separate memories, and concurrently execute a single MATLAB command or M-function. New programming constructs, including parallel loops and distributed arrays, describe the parallelism.
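The serial versus parallel comparison can be illustrated with a Python analogue of a parallel for-loop over EEG channels. It is a sketch only and does not reproduce the MATLAB code used in the experiment: the per-channel feature (mean signal power) and the function names are hypothetical, and a thread pool stands in for MATLAB's workers.

```python
from concurrent.futures import ThreadPoolExecutor


def channel_feature(samples):
    """Hypothetical per-channel computation: mean signal power."""
    return sum(x * x for x in samples) / len(samples)


def extract_serial(channels):
    """Process the channels one after another."""
    return [channel_feature(ch) for ch in channels]


def extract_parallel(channels, workers=4):
    """Process the channels concurrently; results keep the channel order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(channel_feature, channels))


if __name__ == "__main__":
    toy_channels = [[float(i + j) for j in range(8)] for i in range(6)]
    print(extract_serial(toy_channels) == extract_parallel(toy_channels))
```

The loop body touches one channel at a time and shares no state between iterations, which is the property that makes such a loop safe to parallelize. In CPython, threads do not speed up pure-Python arithmetic, so a process pool or a natively multithreaded numerical library would be needed to observe a reduction in execution time.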
Big data applications, including biomedical ones, are becoming attractive as data generation and storage have increased in recent years. Processing big data to extract knowledge is challenging, however, since machine learning methods are not adapted to this requirement. In this study, we analyse EEG signals for epileptic seizure prediction in a big data scenario using the Rotation Forest classifier. Specifically, MSPCA is used for denoising, WPD for feature extraction, and Rotation Forest for classification in a parallel framework to predict epileptic seizures. The study thus presents signal processing and machine learning algorithms for epileptic seizure prediction in a parallel environment. The results show that the proposed framework reduces the execution time significantly while maintaining a high level of classification performance.