Predictive value of functional MRI and EEG in epilepsy diagnosis after a first seizure

It is often difficult to predict seizure recurrence in subjects who have suffered a first-ever epileptic seizure. In this study, the predictive value of physiological signals measured using Electroencephalography (EEG) and functional MRI (fMRI) is assessed. In particular those patients developing epilepsy (i.e. a second unprovoked seizure) that were initially evaluated as having a low risk of seizure recurrence are of interest. In total, 26 epilepsy patients, of which 8 were initially evaluated as having a low risk of seizure recurrence (i.e. converters), and 17 subjects with only a single seizure were included. All subjects underwent routine EEG as well as fMRI measurements. For diagnostic classification, features related to the temporal dynamics were determined for both the processed EEG and fMRI data. Subsequently, a logistic regression classifier was trained on epilepsy and first-seizure subjects. The trained model was tested using the clinically relevant converters group. The sensitivity, specificity, and AUC (mean ± SD) of the regression model including metrics from both modalities were 74 ± 19%, 82 ± 18%, and 0.75 ± 0.12, respectively. Positive and negative predictive values (mean ± SD) of the regression model with both EEG and fMRI features are 84 ± 14% and 78 ± 12%. Moreover, this EEG/fMRI model showed significant improvements compared to the clinical diagnosis, whereas the models using metrics from either EEG or fMRI do not reach significance (p > 0.05). Temporal metrics computationally derived from EEG and fMRI time signals may clinically aid and synergistically improve the predictive value in a first-seizure sample.


Introduction
Worldwide, approximately 4% of the people will experience an epileptic seizure during lifetime [1], while only 1% will develop epilepsy [2]. When a patient experienced one or more likely epileptic seizure(s), an accurate diagnosis is important to initiate correct treatment. In accordance to International League Against Epilepsy (ILAE) statements, a diagnosis of epilepsy requires at least one unprovoked seizure and/or a risk (>60%) of recurrent seizures or diagnosis of an epilepsy syndrome [3]. Electroencephalography (EEG) can aid the diagnosis of epilepsy. For example, epileptiform abnormalities in the EEG increase the risk of seizure recurrence. Abbreviations: SD, standard deviation; M, male; F, female; AED, antiepileptic drugs; EPI, echo-planar imaging; TR, repetition time; TE, echo time; TI, inversion time; ASR, Artifact Subspace Reconstruction; PLI, Phase Lag Index; SL, synchronization likelihood; FWHM, full-width half maximum; BOLD, blood-oxygen level dependent; fALFF, fractional amplitude of low-frequency fluctuation; ReHo, regional homogeneity; ROC, receiver operating characteristic; PFC, prefrontal cortex. However, in many cases a routine EEG is not sufficient to assess the risk of seizure recurrence. For example, when no epileptiform discharges are observed in the EEG, an adult has a 47% probability of developing epilepsy after a first unprovoked seizure, which increases to 77% when epileptiform dischargers are observed in the EEG [4]. The diagnosis of new onset epilepsy in patients with a first seizure event can be further improved using neuroimaging. For instance, CT or MRI allows detection of abnormalities specific for epilepsy and aid in a more accurate diagnosis [5]. A prior study found MRI abnormalities in 45% of new-onset epilepsy patients, of which half could be identified as epileptogenic lesions [6]. However, a large portion of patients has no structural abnormalities. These patients, without abnormalities on EEG or neuroimaging, cannot be stratified with current methods into patients with low or high risk of seizure recurrence.
To aid the risk assessment of seizure recurrence of patients presenting with a first seizure event, advanced methods focussing on functional dynamics from either EEG or functional MRI (fMRI) measurements have been proposed [7][8][9]. Whereas EEG and fMRI both relate to neuronal activity, their underlying mechanisms are different. Electroencephalography measures spontaneous electrical brain activity directly via electrodes on the scalp, with typical sampling frequencies of 250-500 Hz. In contrast, fMRI measures changes in the blood-oxygen level as an indirect consequence of neural activity, at a typical sampling rate of 500 mHz. While EEG has a superior temporal resolution allowing analysis of high frequency temporal activity, fMRI has the advantage of a high spatial resolution providing insights into specific brain structures, including deep regions. Therefore, EEG and fMRI measurements could ideally complement each other in predicting seizure recurrence. One method that exploits both EEG and fMRI measurements is simultaneous acquisition of EEG and fMRI (EEG-fMRI). It is most commonly used to identify at which time points epileptic activity occurred during the fMRI acquisition [10]. However, EEG-fMRI is a complex method that relies on several technical requirements and high demands of artifact removal [11]. Instead, in this study we will evaluate the diagnostic value of separately acquired EEG and fMRI, which is readily available in clinical practice.
The goal of this study was to assess the predictive value of physiological signals measured using EEG and/or fMRI using a classification model. A classifier is a machine learning approach that aims to distinguish two or more classes based on a number of features. It is a so-called supervised learning technique that identifies the class of new observations based on a training data set. Ideally, the model will be able to better distinguish patients who will and will not develop recurrent seizures. In particular those patients developing epilepsy (i.e. a second unprovoked seizure) that were initially evaluated as having a low risk of seizure recurrence (<60%) are of interest. From here on, this group of patients will be referred to as converters (e.g. converting from a single seizure event to epilepsy). The predictive values (in terms of sensitivity, specificity, positive predictive value and negative predictive value) of both modalities (EEG and fMRI) are assessed separately, as well as the added value of using features from both EEG and fMRI.

Participants
From June 2012 until January 2017, we recruited patients from the so-called ''first-fit" service at the Maastricht University Medical Center. Patients were referred after a neurologic outpatient consultation or a neurologic consultation at the emergency department, for an epileptic seizure. The patients underwent careful history taking, routine EEG, structural and fMRI, electrocardiography (ECG), and laboratory examinations. Patients were included if they suffered an unprovoked first epileptic seizure or if they were diagnosed with epilepsy. Current ILAE definitions were used to diagnose either a single epileptic seizure, or epilepsy [3]. In general, epilepsy was diagnosed after the occurrence of two or more unprovoked epileptic seizures prior to the examination, or if patients had abnormalities in EEG and/or MRI associated with a high chance (!60%) of seizure recurrence within the next 10 years. In total 178 patients were recruited through the first-fit service, of which only patients who suffered from an epileptic seizure are included. For example, patients diagnosed with a possible psychogenic non epileptic seizure or vasovagal syncope were excluded. In total, 26 epilepsy patients, of which 8 were converters (i.e. initially only having had a single seizure), and 17 subjects with a single seizure were included. During clinical follow-up, all patients were independently re-examined by two experienced neurologists (RPWR and MCGV). There were no differences in the follow-up diagnosis between raters. Clinical follow-up ended in September 2018, and the follow-up ranged from 21 to 74 months. Patient characteristics are shown in Table 1. Our institutional review board approved and issued a waiver of informed consent for our retrospective study, which was compliant with the Health Insurance Portability and Accountability Act.

EEG
For all participants, a routine EEG was acquired using a 10-20 system with a sample frequency of either 250 or 256 Hz. The EEG recording takes about 20 minutes and consists of hyperventilation, photo stimulation tasks, and resting-state periods with eyes-open and eyes-closed. In this study, the EEG data acquired during the 3-min hyperventilation task and the subsequent 3min resting period are considered. Frontoparietal and basal temporal electrodes were excluded (Fp1, Fp2, A1, and A2) from further analyses as these channels are particularly vulnerable to myogenic artifacts that could influence the computation of the functional connectivity metrics [8,12].

EEG
The EEG recordings with a 250-Hz sample frequency were first resampled to a sample frequency of 256 Hz using the EEGLAB toolbox in Matlab [13]. Second, to remove slow-wave drifts and high amplitude non-brain activity from the EEG data, a forward-backward filter as well as ASR (Artifact Subspace Reconstruction) was applied [14,15]. Subsequently, all channels were converted to an average reference montage using EEGLAB. The resulting EEG recordings were visually checked for artifacts. To ensure stability of the calculated EEG metrics, the two 3-min periods (hyperventilation and rest) were each divided into six consecutive epochs of 20 s, disregarding the first and last 30 seconds to prevent possible artifacts related to the transition period between tasks. Prior to further analysis, epochs were band-pass filtered into the commonly used frequency bands delta (2-4 Hz), theta (4-8 Hz), lower alpha (8-10.5 Hz), upper alpha (10.5-13 Hz), lower beta (13-20 Hz), higher beta (20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30), and gamma (30-45 Hz) [16].
For each frequency band, the Phase Lag Index (PLI), coherence, and synchronization likelihood (SL) were calculated between all channels and for each epoch. The PLI is a measure of phase synchronization between two time signals based on their phase differences [17]. Coherence measures the linear correlation between two signals as a function of frequency [18]. Synchronization likelihood is a measure of generalized synchronization between two time signals [19]. Each calculated metric was subsequently averaged over all channels and the consecutive epochs for each period, resulting in average 'connectivity'-values per subject for both the hyperventilation and the resting period. This results in a total of 42 EEG features.

fMRI
Preprocessing of the fMRI data and T1w structural images was performed using the Statistical Parametric Mapping software package, SPM12 (https://www.fil.ion.ucl.ac.uk/spm/) in Matlab R2016b. First, a slice timing correction was applied to the functional images. Second, to correct for head displacement, all slices were computationally realigned to the first volume in the sequence. Third, the functional images were smoothed through convolution with a 6-mm full-width half maximum (FWHM) Gaussian kernel. Fourth, a band-pass filter of 0.01-0.1 Hz was applied to extract the frequency band of interest. Fifth, the first 5 volumes (10 s) of the functional images were discarded to ensure steady-state longitudinal magnetization of the blood-oxygen level dependent (BOLD) signal [20,21]. Last, the time signals of each region of interest were averaged and corrected for head motion (relative translational movement [22]) as well as white matter and cerebrospinal fluid (CSF) signals via linear regression to reduce the contribution of physiological noise [23]. The T1w image and corresponding Freesurfer atlas were nonlinearly co-registered to the functional images using Elastix v4.9.0 [24]. The structural images were automatically parcellated into 68 cortical and 14 sub-cortical regions using Freesurfer (version 5.1) based on the Desikan-Killiany atlas [25,26].
Subsequently, fractional amplitude of low-frequency fluctuation (fALFF) analysis was performed. To calculate the fALFFs, the time series of each voxel was Fourier transformed to the frequency domain, and subsequently the power spectrum was calculated in specific frequency subbands. The fALFF is computed per voxel as the ratio of the power spectrum in a specific subband to that of the full frequency range (0-250 mHz). The ratio with the full frequency band helps eliminate the confounding influence of the relatively strong signal of the pulsating CSF and physiologic noise. The amplitudes of fluctuations were calculated for well-known predefined frequency subbands: 10-80 mHz (conventional frequency band), 10-27 mHz (slow-5), 27-73 mHz (slow-4), 73-198 mHz (slow-3), and 198-250 mHz (slow-2) [27]. Subsequently, the fALFF values were averaged over several spatial regions of interest, the frontal cortex, parietal cortex, temporal cortex, occipital cortex, and thalamus.
Furthermore, measures of regional homogeneity (ReHo) were determined using Kendall's coefficient of concordance of the fMRI time series of a given voxel with its 8 immediate (in-slice) neighboring voxels. The ReHo is a measure of functional connectivity based on the (rank) similarity of the fMRI time series in adjacent voxels [28]. Although most prior studies have also used the 26 immediate neighbors of a voxel, due to the anisotropic voxel size (i.e. thicker slices) we have opted to use the 8 in-slice neighbors.
In many prior studies, graph metrics of fMRI time series were found to be different in patients with epilepsy compared to healthy controls [29][30][31][32]. Therefore, besides the fALFF and ReHo measures, the functional times series are also expressed as a network by calculating Pearson's correlation coefficients for each pair of regions. Only those region pairs with a significant Pearson's correlation coefficient (p < 0.05) were added as an edge to the network. Furthermore, these coefficients were used as weights, where a higher correlation is considered a 'stronger' connection. Negative correlations were set to zero because the role of negative weights in the functional network is unclear [33]. Subsequently, the functional network was quantitatively described by two of the most robust and widely applied global graph metrics, the characteristic path length (L) and clustering coefficient (C) [34]. To obtain global graph metrics that are normalized with respect to variations in topology, the global graph metrics are determined relative to the average of 100 random networks with similar degree distribution (k ¼ L=L rand and c ¼ C=C rand ). A measure of small-worldness can now be defined as r ¼ c=k, where r > 1 indicates that a network has a small-world topology. Furthermore, to decrease the occurrence of false positives and false negatives in the network, only the nodes and edges present in the network of at least half of the subjects are considered in the graph analysis (i.e. group thresholding) [35]. The number of edges in each network is chosen such that the functional networks are 80% sparse. Graph metrics were calculated using the Brain Connectivity Toolbox (http://www.brain-connectivity-toolbox.net). This results in a total of 51 fMRI features.

Feature selection
The total feature set consists of 93 features; 51 fMRI and 42 EEG features. Since there is only a limited number of independent observations (i.e. 43 subjects), we first select candidate fMRI and EEG features from the total set based on their potential to distinguish new-onset epilepsy patients from single-seizure subjects using the t-statistic of the independent samples Student t-test. Note that feature selection is performed on new-onset epilepsy and single-seizure subjects only, and the converter group is not considered here (minimizing issue related to over-fitting). Furthermore, since highly correlated features do not add a lot of information, high correlations in the feature set are removed (i.e. when two features are highly correlated, the one with the lowest t-statistic is removed from the feature set). Correlation was assessed using Pearson's correlation coefficient, and a high correlation was defined as |r| > 0.70 [36]. To assess the effect of chosen correlation threshold, the analysis was redone for thresholds related to moderate (|r| > 0.50) and very high (|r| > 0.90) correlations. A common rule of thumb to prevent overfitting is to select ffiffiffiffi N p features, where N is the number of observations. Since our model is trained using 24 independent observations, we include the 4 features with the highest t-statistic in the model. A flowchart of the feature selection is shown in Fig. 1A.

Model training and prediction
A logistic regression model with L2 regularization (i.e. ridge regression) was trained on a subset of epilepsy and single-seizure subjects, after which it was used to distinguish clinically relevant converters from patients with a single seizure. L2 regularization adds the squared magnitude of the regression coefficients as a penalty to prevent over-fitting. The model was trained using a balanced set of 12 randomly selected epilepsy patients and 12 randomly selected single-seizure subjects. The remaining 5 single-seizure subjects as well as 5 randomly selected converters were used as test set. To ensure stable results, the selection of training and test set, as well as model training and testing was repeated 1000 times (Fig. 1B).
To study the predictive value of EEG and fMRI separately and combined, three models are trained. The first is trained using the 4 highest ranked EEG metrics, the second is trained using the 4 highest ranked fMRI metrics, and the third is trained using the 2 highest ranked EEG and the 2 highest ranked fMRI metrics. Predictive value was assessed in terms of the sensitivity and specificity of the classifier. To determine the sensitivity and specificity, the optimal operation point [37] of the receiver operating characteristic (ROC) curve was calculated for each repetition.

Statistics
Model performance was compared to the performance in the clinic. To assess this, we have tracked how many times each subject of the total set (17 single-seizure and 8 converters) was classified as a converter (relative to the number of iterations a specific subject was included in the model), resulting in a 'score' or 'probability' that the subject is classified as converter or single-seizure, and can be binarized using a threshold of 50%. Subsequently, the three regression models can be statistically compared to the clinic (i.e. in the clinic all test subjects are classified as single-seizure subjects) using the McNemar test. To correct for the imbalance of the single-seizure and converter group, a classification cost was added for the converter group proportional to the imbalance (17/8) [38].
Statistical significance was inferred when p < 0.05.

Results
In Fig. 2 the absolute value of the t-statistic is shown for the calculated EEG and fMRI measures. The higher the t-statistic the more likely it is that the measure differs between epilepsy and singleseizure subjects. Noteworthy is the strong resemblance between the EEG features for the rest and hyperventilation periods.
The absolute correlation coefficients between all features are shown in Fig. 3. Relatively high correlations can be observed between each of the fALFF measures (mean ± SD = 0.44 ± 0.26) as well as between the graph metrics (mean ± SD = 0.60 ± 0.32) (note that fALFF measures vary over spatial location while the graph metrics vary over sparsity). In case of the EEG, relatively high correlations are observed between the PLI, coherence, and SL for each frequency band (mean ± SD = 0.67 ± 0.31). Furthermore, the EEG metrics obtained during rest show high correlations with those obtained during hyperventilation. Relative low correlations (mean ± SD = 0.13 ± 0.10) are found between fMRI and EEG measures.
The selected features (after removing highly correlated features) are shown in Table 2. Noteworthy is that all the different EEG metrics (i.e. PLI, SL and Coherence) are selected, although for different frequency bands. Interestingly, the thalamus is the only brain region selected from the fALFF analysis.
Next, the three regression models are trained using selected features from (1) both EEG and fMRI, (2) only EEG and (3) only fMRI. Fig. 4 shows the aggregate ROC curves of the three regression models over all iterations. The sensitivity, specificity, and AUC (mean ± SD) of the regression models with both EEG and fMRI features are 74 ± 19%, 82 ± 18%, and 0.75 ± 0.12, respectively. Using only the EEG features resulted in a sensitivity of 71 ± 22%, specificity of 80 ± 20% and AUC of 0.69 ± 0.14. While, the classifier with only the fMRI features has a sensitivity of 70 ± 30%, a specificity of 67 ± 26%, and AUC of 0.57 ± 0.16. Positive and negative predictive value (mean ± SD) of the regression model with both EEG and fMRI features are 84 ± 14% and 78 ± 12%, respectively.  Using only the EEG features resulted in a positive predictive value of 76 ± 15% and a negative predictive value of 74 ± 14%. While, the classifier with only fMRI features has a positive predictive value of 68 ± 16% and a negative predictive value of 69 ± 16%. The McNemar test revealed that the model trained with features from both EEG and fMRI was significantly better compared to the clinical diagnosis (p = 0.04), while the model with only EEG features showed a trend toward significance (p = 0.07) and the model with only fMRI features was not significantly different from the clinical diagnosis (p = 0.38).
Comparable results are obtained when using a threshold of 0.50 and 0.90 (instead of 0.70) for the absolute correlation coefficient during feature selection (i.e. sensitivity, specificity, positive and negative predictive values, and AUC of EEG & fMRI > EEG > fMRI; results are not shown).

Current findings
In this study, we set out to investigate the predictive value of physiological signals measured using EEG and/or fMRI after a first ever seizure event. We showed that a predictive model with features from both EEG and fMRI gives rise to significant improvements compared to the clinical diagnosis, whereas the other models do not reach significance. While the predictive model trained with only features from the fMRI does not provide a lot of predictive power (AUC = 0.57 ± 0.16), the fMRI measures, combined with EEG, are required to significantly outperform the clinical diagnosis. This result seems to imply that EEG and fMRI measurements complement each other in terms of predictive value after a first seizure.

EEG
To aid the diagnosis of patients presenting with a first seizure, advanced methods focussing on functional dynamics have been proposed previously. For example, Douw et al. showed that SL was higher in the theta frequency band of patients with epilepsy compared to single-seizure subjects, and that it could classify epilepsy with a sensitivity of 62% and specificity of 76% [8]. Remarkably, the SL in the theta frequency band is included as one of the predictors in our model; however, in contrast to Douw et al. it is found to be lower in patients with epilepsy. Synchronization measures are highly dependent on seizure type and activity [39], and differences could be explained due to the heterogeneity of the first-fit clinic samples. Furthermore, Diessen et al. showed that SL of weighted functional networks metrics has the potential to discriminate new-onset epilepsy from single-seizure subjects, with mean specificity of 95% and mean sensitivity of 96% [40]. Moreover, Diessen et al. also showed that phase dependencies between EEG electrodes showed differences between new-onset epilepsy and single-seizure subjects using the PLI and a minimal spanning tree analysis [9]. Overall, higher phase dependencies in the delta and upper beta frequency bands as well as lower synchronicity and coherence in theta and lower alpha frequency bands, respectively, are the best predictors of epilepsy in our sample. While most prior EEG studies in epilepsy have investigated functional connectivity related to ictal activity, there are some resting-state studies showing altered power in theta and alpha and beta bands [41,42].

fMRI
In our study, the fALFF measures in the thalamus are found to be most valuable predictors, since the fALFFs of the slow 2, slow 5 and conventional frequency band measures in the thalamus are three of the four selected features. Prior studies have found varying outcomes of the fALFF measure in epilepsy. For the conventional frequency band, we obtained a decreased fALFF in the thalamus of epilepsy patients compared to single-seizure subjects. Similarly, a decreased fALFF in the prefrontal cortex (PFC) subregion of the thalamus was reported previously in patients with idiopathic generalized epilepsy [43]. However, an increased fALFF was previously reported in the thalamus of untreated first-episode idiopathic epilepsy patients [44] and juvenile myoclonic epilepsy patients [45]. Besides the fALFF measures, the global normalized clustering coefficient was also selected as a valuable predictor. A network with a lower clustering is considered to be less efficiently organized, and was previously related to several focal and generalized types of epilepsy [21,29,30]. In our study, the ReHo measure was not selected for the regression model for any of the frequency subbands.

Clinical perspective
In our study, the predictive model is trained on epilepsy and single-seizure subjects, while it is applied on a sample of converters and single-seizure subjects. Converters are those cases that were evaluated as having a low risk of seizure recurrence (<60%) by neurologists at their first presentation, but developed epilepsy  (e.g. experienced a second unprovoked seizure) in the follow-up. Therefore, methods to correctly classify these cases are especially clinically relevant, since a correct assessment of seizure recurrence risk allows the appropriate follow-up treatment. Here, we have shown that using information from the fMRI and EEG time series can result in a more accurate risk assessment of seizure recurrence after a single seizure event. Furthermore, our model indicates that a large portion of the converters can be identified as having a high seizure risk recurrence after the first ever seizure event, which allows the appropriate follow-up treatments.

Study considerations
A major strength of our study is the characterization of our sample with a clinically relevant subgroup of converters, and an independent re-evaluation of the initial diagnosis by two experienced neurologists. One of the major limitations of this study is the relatively small sample size. Machine learning algorithms excel when an abundant amount of data is available for training. Unfortunate, our small sample size is relatively small which could hamper generalization of the predictive model. Future studies with larger sample sizes are needed to corroborate our findings and to further assess the reproducibility and generalization of the seizure risk recurrence prediction. However, even with our small sample, we can already show that EEG and fMRI measures complement each other in terms of diagnostic value. Another limitation of the current study is the lack of geometric and B 0 field distortion correction of the fMRI data. This could have some bearing on our results, especially in regions prone to susceptibility artifacts such as the inferior frontal and temporal lobe regions. Future measurements should include special acquisition methods for this.

Concluding remark
In this study, the diagnostic value of physiological signals measured using EEG and/or function MRI in a first-seizure clinical sample is evaluated. The predictive value is assessed using logistic regression models, which are tested on subjects that developed epilepsy (e.g. a second unprovoked seizure) while they were initially evaluated as having a low risk of seizure recurrence (<60%) (i.e. converters). The following predictive values (mean ± SD) of the model including metrics from both modalities were found: sensitivity of 74 ± 19%, specificity of 82 ± 18%, positive predictive value of 84 ± 14%, and negative predictive value of 78 ± 12%. The predictive values of the EEG model were: sensitivity of 71 ± 21%, specificity of 80 ± 20%, positive predictive value of 76 ± 15%, and negative predictive value of 74 ± 14%. The predictive values of the fMRI model were: sensitivity of 70 ± 30%, specificity of 67 ± 26%, positive predictive value of 68 ± 16%, and negative predictive value of 69 ± 16%. Moreover, the EEG/fMRI model showed significant improvements compared to the clinical diagnosis, while the other models did not reach significance. These results indicate that analysis of EEG and fMRI time signals complements each other in terms of diagnostic value in a first-seizure sample.