Resting-state EEG for the diagnosis of idiopathic epilepsy and psychogenic nonepileptic seizures: A systematic review

Quantitative markers extracted from resting-state electroencephalogram (EEG) reveal subtle neurophysiological dynamics which may provide useful information to support the diagnosis of seizure disorders. We performed a systematic review to summarize evidence on markers extracted from interictal, visually normal resting-state EEG in adults with idiopathic epilepsy or psychogenic nonepileptic seizures (PNES). Studies were selected from 5 databases and evaluated using the Quality Assessment of Diagnostic Accuracy Studies-2. 26 studies were identified, 19 focusing on people with epilepsy, 6 on people with PNES, and one comparing epilepsy and PNES directly. Results suggest that oscillations along the theta frequency (4-8 Hz) may have a relevant role in idiopathic epilepsy, whereas in PNES there was no evident trend. However, studies were subject to a number of methodological limitations potentially introducing bias. There was often a lack of appropriate reporting and high heterogeneity. Results were not appropriate for quantitative synthesis. We identify and discuss the challenges that must be addressed for valid resting-state EEG markers of epilepsy and PNES to be developed.


Introduction
Epilepsy is a neurological disease defined by the occurrence of at least two unprovoked seizures that are >24 h apart, or one unprovoked seizure and high recurrence risk [1]. The diagnosis of a seizure disorder is clinical; a specialist-led process based on consideration of multiple patient characteristics. This primarily involves a detailed clinical history including a thorough description of the seizure events. To support the diagnosis, scalp electroencephalogram (EEG), magnetic resonance imaging (MRI), and further investigations may be performed as indicated [2].
Diagnostic uncertainty is common following paroxysmal neurological presentations involving transient loss of consciousness, as this can be a symptom of epilepsy, as well as a number of different conditions, including psychogenic nonepileptic seizures (PNES), syncope, metabolic disorders, migraine, sleep and movement disorders, transient ischemic attacks and transient global amnesia [3]. Syncope and PNES are the most common differential diagnoses of epilepsy [3]. Psychogenic nonepileptic seizures are episodes of observable abrupt paroxysmal change in behavior or consciousness in the absence of the electrophysiological changes in the brain that accompany an epileptic seizure [4].
Capturing a patient's typical event on simultaneous video-EEG recording (video telemetry) is the most direct evidence pointing to a diagnosis of epilepsy or PNES. Observing a seizure event while recording normal scalp EEG activity supports a diagnosis of PNES over epilepsy [5]. However, the use of EEG is time- and resource-intensive, and the limited recording time of routine EEG appointments is often inadequate to detect seizure events or interictal abnormalities that occur infrequently. According to a recent meta-analysis, the estimated diagnostic sensitivity of routine EEG for adults with a first unprovoked epileptic seizure was 17.3% (at 94.7% specificity), with a positive test defined by the presence of interictal epileptiform discharges (IEDs; [6]).
Studies on misdiagnosis of epilepsy highlight the difficulty of identifying the nature of seizure events, with reports of misdiagnosis rates in adults between 5.6% and 26% [7]. Differentiating between PNES and epileptic seizures represents a significant problem in clinical practice, resulting in a mean diagnostic delay of 7 years [8]. This is further complicated by the possibility of The study of spontaneous EEG dynamics could be of great relevance to neurological conditions that arise as a result of intrinsic neurophysiological abnormalities, such as the epilepsies of idiopathic origin, which are thought to be genetically determined [15].
While current neurophysiology practice is mainly based on visual inspection of EEG recordings, the development of computational techniques for EEG signal processing has allowed the investigation and analysis of subtle dynamics that are not detectable by visual inspection alone (Box 1). These could potentially have a clinical role as adjunctive diagnostic indicators, should our understanding of these markers in epilepsy and PNES advance enough and demonstrate appropriate levels of validity.
The aim of this review is to systematically summarize current knowledge of resting-state quantitative EEG findings in adults with idiopathic epilepsy or PNES and explore their potential utility as adjunctive diagnostic markers of disease. We will examine methodological limitations and sources of bias in an attempt to guide further advances in the field.

Methods
We performed a systematic review. The protocol for this study was registered in the online PROSPERO database before search execution and can be accessed from crd.york.ac.uk/prospero (Record ID: CRD42020179174). The only deviation from the protocol was the age boundary for inclusion of study populations; this was moved from 18 to 16 years, as we observed that many studies defined people > 16 years of age as adults. Our intention was to perform a meta-analysis, but this was not deemed appropriate for reasons outlined in Section 3.4, i.e., no marker was investigated by more than 5 studies within the same diagnostic group, and a high risk of bias label was assigned to most studies. This manuscript has been prepared according to the PRISMA-DTA (preferred reporting items for systematic reviews and meta-analyses of diagnostic test accuracy studies) guidelines [16]; a checklist is available in Appendix 1.

Objectives
To systematically review the literature concerning the characteristics and diagnostic accuracy of quantitative resting-state EEG markers in people with idiopathic epilepsy or psychogenic nonepileptic seizures.

Index test
Studies were eligible for inclusion if they utilized whole-brain EEG, defined as at least 4 electrodes placed bilaterally to overlie one anterior and one posterior location, and if EEG was recorded during awake resting-state, interictally. Studies were excluded if the recordings took place in close time proximity (i.e. a few hours) to a seizure event, or if there was evidence of interictal epileptiform discharges (IEDs) on the EEG considered for analysis.

Population
Studies were eligible if they included human adults (>=16 years old) with a clinical diagnosis of idiopathic epilepsy or psychogenic nonepileptic seizures.

Comparator
Studies were included only if a control group was present, consisting of people who did not have the same diagnosis as the population group.

Outcomes
Studies were included if they reported group-level descriptors, and/or diagnostic accuracy indices (sensitivity and specificity as a minimum) for the EEG measures examined, or adequate information for these to be calculated or obtained from personal communication.

Type of studies
Studies with analytic designs were included, i.e., observational or experimental if a baseline resting-state condition was present. Languages considered were English, Italian, and Spanish. In order to reduce the influence of convenience sampling, studies with a total sample size n < 20 were excluded. No restrictions by year or type of setting were applied.

Information sources
We searched the following databases for relevant literature up to 17/04/2020: MedLine, EMBASE, PsycINFO, and Web of Science, and the first 200 references as sorted in the relevance ranking of Google Scholar (as recommended by [17]). The exact search strategy is reported in Appendix 2. We scanned the references of all included studies to identify further relevant work. Email alerts were set for all the databases in order to continue screening studies up to the start of data extraction, on 23/07/2020.
To mitigate publication bias, a call for gray literature was emailed to relevant groups identified through the search.

Study selection
A two-stage screening process was followed. In stage one, titles and abstracts were independently screened by two reviewers (IF and SS). In stage two, the full texts of the potentially eligible articles were independently screened by two reviewers (IF and SS), and reasons for exclusion were documented. Any disagreements were resolved through discussion and if necessary, by third party arbitration (PS). Inter-rater reliability was calculated [18].
If study eligibility could not be established following full text screening, authors were contacted, and further details were requested. A maximum of 3 contact attempts were made before excluding studies due to insufficient information available.
When studies included a mixed population cohort (e.g., adults and children, or people with lesional and non-lesional epilepsy), authors were contacted with a data sharing request for the eligible sub-group. In line with our inclusion criterion on sample size, these were requested if the study included a minimum of 20 participants fulfilling the inclusion criteria.
No studies were excluded from the systematic review based on their risk of bias or applicability, as measured by our quality assessment tool (see Section 3.3). High risk of bias was an exclusion criterion for the meta-analysis (see Section 3.4).

Data extraction and quality assessment
Data extraction was carried out by one reviewer (IF) and double-checked by a second reviewer (SS). The data extraction form was developed based on the Cochrane Handbook for Systematic Reviews checklist of items to consider in data collection or data extraction and can be found in Appendix 3 [19]. Study authors were contacted to request missing information or clarify ambiguities. If impossible to obtain otherwise, means and measures of dispersion were approximated from figures. When overlapping reports on the same sample were identified, the "core" paper containing the key study data was considered for data extraction, using the other papers as supplements.
Risk of bias and applicability were assessed independently by two reviewers (IF and SS) by means of the QUADAS-2 (Quality Assessment for Diagnostic Accuracy Studies 2, [20]). As recommended by its guidelines, QUADAS-2 items were tailored to the present review (Appendix 4).

Data synthesis and analysis
Characteristics of included studies were synthesized in results tables, and qualitatively described. Results of the included studies with associated diagnostic accuracy indices (sensitivity and specificity) or effect sizes were reported. Effect sizes were calculated by means of standardized mean difference (Cohen's d; [21]). Cohen's d was calculated based on means and standard deviations or standard errors for all studies with available data, except for [22] and [23] for which the F-statistic and t-statistic values, respectively, were used, and [24] for which Mann-Whitney U-values were used. Statistical synthesis was to be performed if 5 or more studies that investigated the same resting-state EEG metric in people with the same diagnosis and a control group were identified [25]. Moreover, studies must not have been labeled as "low" quality on the QUADAS-2, as meta-analysis of poor-quality studies may be seriously misleading [26]. Since we have not been able to perform a meta-analysis, we do not report all pre-specified meta-analytic methods here. These are extensively described in our protocol (PROSPERO Record ID: CRD42020179174).
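The effect size conversions mentioned above can be sketched as follows. This is a minimal illustration using common textbook formulas, not necessarily the exact formulas applied in this review; in particular, the conversion from Mann-Whitney U (via the rank-biserial correlation) is an assumption, and all function names are hypothetical.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def d_from_means(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d from group means and standard deviations."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)

def sd_from_se(se, n):
    """Recover a standard deviation from a standard error of the mean."""
    return se * math.sqrt(n)

def d_from_t(t, n1, n2):
    """Cohen's d from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def d_from_f(f, n1, n2):
    """Cohen's d from a two-group F statistic (F = t**2); the sign is lost."""
    return d_from_t(math.sqrt(f), n1, n2)

def d_from_u(u, n1, n2):
    """Approximate d from Mann-Whitney U via the rank-biserial correlation.

    Assumed conversion; undefined when the groups do not overlap at all (r = +/-1).
    """
    r = 1 - 2 * u / (n1 * n2)
    return 2 * r / math.sqrt(1 - r**2)

# Hypothetical values: two groups of 20 with means 10 and 8 and SD 2.
example_d = d_from_means(10, 2, 20, 8, 2, 20)
```

With these hypothetical values the pooled standard deviation is 2, so the standardized mean difference is 1.0; the same value is recovered from the corresponding t statistic.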

Study selection
8574 studies were identified through our database search. 2 additional studies were identified through weekly email alerts based on the same search terms. Following duplicate removal, 5305 studies were subject to abstract screening. Of these, 507 were selected for full-text review (94.1% inter-rater agreement, Cohen's κ = 0.66, indicating substantial agreement). Authors from 17 studies were contacted to ask whether a full-text report had been produced from conference abstracts of interest; 8 provided a response. Three confirmed that no full-text report had been created, and these studies were therefore excluded as "abstract only". The 9 studies whose authors did not provide a response after 3 contact attempts were also excluded as "abstract only". Authors from 45 studies were contacted to request clarifications directly related to our inclusion and exclusion criteria; 32 provided a response. The remainder were excluded following 3 contact attempts as not enough information was available to determine eligibility (n = 13). Of those who provided a response, 25 were able to retrieve the information requested. The remainder were excluded as not enough information was available to determine eligibility (n = 7).
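Raw agreement and Cohen's kappa can diverge when most records are excluded by both reviewers, because kappa corrects for agreement expected by chance. The sketch below illustrates this with hypothetical screening counts; these are not the actual counts from this review, and the function name is illustrative.

```python
def agreement_and_kappa(both_include, only_a, only_b, both_exclude):
    """Percent agreement and Cohen's kappa for two raters making include/exclude decisions."""
    n = both_include + only_a + only_b + both_exclude
    p_observed = (both_include + both_exclude) / n
    # Marginal probability that each rater says "include".
    a_inc = (both_include + only_a) / n
    b_inc = (both_include + only_b) / n
    # Agreement expected by chance given those marginals.
    p_chance = a_inc * b_inc + (1 - a_inc) * (1 - b_inc)
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return p_observed, kappa

# Hypothetical counts per 100 screened abstracts: most are excluded by both raters.
p_obs, kappa = agreement_and_kappa(both_include=8, only_a=3, only_b=3, both_exclude=86)
```

In this hypothetical case the raters agree on 94% of records, yet kappa is approximately 0.69, since a large share of that agreement would be expected by chance alone.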
Authors from 13 studies were contacted with a sub-group data sharing request, as only a sub-group of the study participants met our inclusion criteria. In 3 cases, no response was received. In 8 cases, sub-group data were not available due to the nature of the analyses, as only group data were saved. Sub-group data were provided for 2 studies [27,28], which were therefore included in our systematic review. Analyses for the eligible sub-sample have been reported in Appendix 6.
Following a request for additional data for one study [29], we repeated the analyses and found that our results were different from those reported in the original paper. We contacted the authors, who agreed that their original analysis approach was incorrect. It consisted of running independent-sample t-tests on a set of values that included 6 separate repeated measures per participant, rather than averaging across the 6 repeated measures before running a t-test on the mean values. This resulted in p-values reflecting a sample of 168 people rather than the actual sample size of 28. This study was therefore excluded due to unreliable analysis methods.
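The consequence of treating repeated measures as independent observations can be illustrated with simulated data. The sketch below is not a reanalysis of [29]; the data are synthetic, the even split of 14 patients and 14 controls is an assumption, and the helper names are hypothetical. It only shows how the degrees of freedom and the t statistic are inflated.

```python
import math
import random
import statistics

def student_t(a, b):
    """Independent-samples t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def simulate_group(n_participants, n_repeats, group_mean):
    """Synthetic data: each participant has a stable level plus measurement noise."""
    data = []
    for _ in range(n_participants):
        level = rng.gauss(group_mean, 1.0)
        data.append([level + rng.gauss(0, 0.3) for _ in range(n_repeats)])
    return data

patients = simulate_group(14, 6, 0.5)  # 14 + 14 = 28 participants (split is assumed)
controls = simulate_group(14, 6, 0.0)

# Flawed: every repeated measure is treated as an independent observation.
flat_p = [x for person in patients for x in person]
flat_c = [x for person in controls for x in person]
t_flawed, df_flawed = student_t(flat_p, flat_c)

# Appropriate: average the 6 repeats per participant, then compare participant means.
mean_p = [statistics.mean(person) for person in patients]
mean_c = [statistics.mean(person) for person in controls]
t_correct, df_correct = student_t(mean_p, mean_c)

print(df_flawed, df_correct)  # prints: 166 26
```

The group mean difference is identical in both analyses, but the flawed version divides it by a much smaller standard error, which is why its p-values behave as if 168 independent people had been tested.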
After reviewing full texts and clarifying information with authors, a total of 484 studies were excluded for the reasons listed in Fig. 1. Twenty-six studies satisfied all inclusion criteria and were included in our review (100% inter-rater agreement, Cohen's κ = 1). Authors from 9/25 studies for which communication was needed provided additional information during the data extraction phase. No studies were included in quantitative synthesis. Table 1 outlines the main characteristics of the included studies. 19 articles from 12 independent research groups investigated resting-state EEG dynamics in the population with epilepsy as compared to controls, 1 study compared epilepsy and PNES specifically, and 6 studies from 4 independent research groups compared people with PNES and controls.

Studies in epilepsy cohorts
All studies used a case-control design. The total sample sizes ranged from 27 to 158. Fifteen studies had the investigation of group differences as their only outcome, while diagnostic accuracy indices were the only outcome for two studies [30,31]. Three studies examined both outcomes, with diagnostic accuracy indices computed for those predictors that were significant in group-level analyses [27,32,33].
Early studies tended to describe the study population in terms of seizure types, with four studies including people with nonlesional epilepsy characterized by partial seizures [34], or a mixed sample with generalized or partial seizures [27,35,36]. The remainder of the studies categorized their population according to epilepsy types or epilepsy syndromes, including nine studies on idiopathic generalized epilepsy (IGE/PGE) [24,28,30,32,37,38,39,40,41], two studies including both a sample with IGE and a sample with non-lesional focal epilepsy [42,43], two studies studying IGE and non-lesional temporal lobe epilepsy (TLE) as a single sample [31,33], two studies focusing on non-lesional TLE only [22,44], and one study focusing on cryptogenic focal epilepsy only [45].
Most studies had a single comparator consisting of healthy controls, with the exception of 3 studies that included a control group with neuropsychiatric disorders [27,36] (in one of which the control group comprised PNES patients exclusively, [22]). Three studies included an additional control group composed of people with a diagnosis of tension headache [34,36] or first-degree relatives unaffected by epilepsy [39].
The average age for the study populations ranged from 19.4 to 48, with participants having a mean age between 20 and 30 years in 5 studies, between 30 and 40 years in 7 studies, and between 40 and 50 years in 4 studies. For 4 studies, it was impossible to retrieve information on average age.

Table 2a
Visual summary of results for 20 studies examining group differences between people with epilepsy and control, with effect sizes. (See below-mentioned references for further information.) Notes: For each study, the frequency bands examined are represented by coloured cells. Cells are left blank for frequencies that were not investigated. Yellow indicates significant differences with Epilepsy > Control in at least one electrode location; blue indicates significant differences in the opposite direction, i.e. Epilepsy < Control. Gray indicates no significant differences. Values in the cells represent the effect size (Cohen's d) for the difference. n/d: not enough data were available to calculate the effect size. When more than one value for the effect size d is presented, these refer to different types of comparisons carried out by the study; see study-specific notes for further details.
a People with epilepsy were compared to two control groups: neuropsychiatric or headache controls. For absolute power, reported Cohen's d values indicate a range of minimum to maximum values across the channels examined. For relative power, Cohen's d refers to the significant comparison of epilepsy versus neuropsychiatry controls.
b Three comparisons are performed: healthy controls compared to 1) patients with genetic generalized epilepsy (GGE) with photosensitivity (mean power d = 2.14; significant difference); 2) GGE without photosensitivity (mean power d = 0.97; difference not significant); 3) non-lesional focal epilepsy (mean power d = −0.17; difference not significant). For Mean Frequency values, the three effect sizes d reported refer to the same three comparisons, and none of the differences are significant for this measure. Means and standard errors used to calculate Cohen's d were approximated from Fig. 1.
c We considered 1) the supplementary analysis 7 (topographic analysis), comparing people with non-lesional epilepsies (IGE or FE) with poor seizure control to healthy controls (topographically specific significant differences are observed, but an effect size could not be calculated as per-channel data were not available) and 2) the comparison of global alpha power shift between people with IGE and healthy controls, which was calculated based on individual patient data available on the study's online repository (difference not significant: t(63) = −1.04; p = 0.301; d = 0.25; see supplementary analysis in Appendix 6).
d We considered the comparison between healthy controls and epilepsy without abnormalities on EEG recordings. Note that in this paper, the values descriptively reported in the text are different from the values reported in the tables and reports on statistical significance are also not coherent. We therefore did not calculate an effect size for these comparisons. Colour coding for significant differences refer to the information reported in Table 1 results for a sub-group of the 114 participants, obtained by excluding patients with MRI/CT abnormalities and with IEDs on their EEG recordings (see Appendix 6 for analyses and results). Individual patient data for this specific sub-group have been provided by the study authors upon request. The results presented in this table have been derived based on such data, and therefore do not exactly reflect results presented in the original paper, although the overall conclusions remain unchanged. The data on age are approximations, as the age of each participant represents the mean of the 4-year age range. The original paper performed analysis on power values as well. This has not been possible to replicate on the eligible sub-group as raw data for this measure were no longer available.
c In this study, authors examine wakefulness recordings while participants were not performing any tasks (i.e., in resting-state), as assessed by their report of daily activities (personal communication).
d results for a sub-group of the 30 participants, obtained by excluding 3 patients that were <16 years old (see Appendix 6 for analyses and results). Individual patient data for this specific sub-group have been provided by the study authors upon request. The results presented in this table have been derived based on such data, and therefore do not exactly reflect the results presented in the original paper, although the overall conclusions remain the same.

Studies varied in their proportions of males to females, with information on gender not always reported.
Patients were taking antiepileptic drugs (AEDs) in 17/20 studies, were not on any medications in two studies [30,37] and in one study a minority of patients were taking other medications (hypnotics, benzodiazepines, or antidepressants; [27]). People included in "healthy control" groups were not taking any medications in twelve studies. Five studies did not report any information on medication use in healthy controls. Headache controls were taking analgesics or sedatives in one study [34] while they were not taking any medications in one other study [36].

Table 2b
Visual summary of results for 20 studies examining group differences between people with epilepsy and control, with effect sizes (continued). (See below-mentioned references for further information.) Notes: For colour codes, refer to notes in Table 2a. Note that the range of frequency for each cell in this table differs from the one in Table 2a. This subtle change was made to reflect the frequency bands examined by the majority of studies in this table. n/d: not enough data were available to calculate the effect size. When more than one value for the effect size d is presented within one cell, these refer to different types of comparisons carried out by the study; see study-specific notes for further details.
a Effect sizes refer to the comparison between people with IGE and healthy controls. They were calculated based on numerical values obtained via personal communication. People with IGE were also compared to a group of unaffected relatives. No differences between patients and unaffected relatives were found in any band for any measure (with d ranging from 0.001 to 0.

Table 2c
Visual summary of results for 6 studies examining group differences between people with PNES and control, with effect sizes. (See below-mentioned references for further information.) Notes: For colour codes, refer to notes in Table 2a. n/d: not enough data were available to calculate the effect size. When more than one value for the effect size d is presented within one cell, these refer to different types of comparisons carried out by the study; see study-specific notes for further details.

A minority of neuropsychiatry controls were taking hypnotics, benzodiazepines, or antidepressants in one study [27] and were not taking any medications in another study [36]. In one study, information on medication use for a control group with PNES was not provided [22]. In all studies, resting-state EEG recordings used for the analyses were normal on visual inspection, i.e., free from abnormalities such as interictal epileptiform discharges or background slowing. Most studies investigated resting-state EEG while participants were asked to remain awake with their eyes closed, except one study which examined eyes-open recordings [41], and one which examined a mixture of eyes-open and eyes-closed recordings [35]. For one study, this information was unclear [28]. In one study, the resting state was defined as a period of awake recordings during which participants were not performing any tasks, as assessed by reports of daily activities, with no information on eye state ([44], personal communication).
The number of EEG electrodes varied from 8 to 64, with the majority of studies using 19 electrodes. The amount of EEG data used across studies varied from 13 s to 20 min (mean: 2.7; SD: 4.85 min), with epoch length ranging from 1 to 30 s (mean: 10.2; SD: 8.7 s). The range of markers and oscillation frequencies examined is discussed in the next section.

Results of studies in epilepsy cohorts
Group differences for a total of 26 EEG markers were investigated by 20 studies (Tables 2a, 2b). It should be noted that there was a degree of variability between studies in the boundaries of the frequency bands examined, and results reported in Tables 2 and 3 are grouped based on approximate boundaries for visual representation purposes. Please refer to Table 1 for specific details on the frequency bands examined by the individual studies.
Measures of power (9 studies). The most investigated were absolute power and mean power, examined by three and four independent studies, respectively, from 1991 to 2020. The most consistent finding, identified by 5/5 studies investigating the theta band (θ: 4-8 Hz approximately), was an increased absolute or mean theta power in the epilepsy cohorts as compared to controls [35,37,40,41,44]. This effect was large (Cohen's d = 0.92-1.20, as suggested by two studies; Table 2a). Evidence was concordant even though the five studies varied widely in their methodology and patient characteristics, including different epilepsy types. One additional study obtained the same finding for theta power [27], but this has not been reported in Table 2a because it was not possible to confirm whether findings applied to the subsample of eligible (i.e., non-lesional) patients specifically, as individual patient data on power were not provided. Results are mixed for the delta (δ: 1-4 Hz) and alpha (α: 8-13 Hz) bands, with approximately half of the studies reporting increased power in the epilepsy cohorts (d = 1.29-2.14), and half reporting no differences (d = 0.17-4.12). 4/5 studies described increased beta power (β: 13-30 Hz; d = 1.05-1.14), and 2/2 studies described increased gamma power in epilepsy as compared to control (γ: 30-70 Hz; d = 1.61), although it is worth noting that muscle activity artifacts were not always excluded from EEG recordings.
Three studies explored ratios of power between different frequency bands (i.e., mean power shift, ratio of high to low power on the left (PHLL) and right (PHLR), and Relative Power), generally observing a significant shift of power toward low frequency rhythms in epilepsy as compared to controls [34,36,43].
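As an illustration of how absolute and relative band power can be derived from a resting-state segment, the following minimal sketch computes a periodogram with a direct discrete Fourier transform and sums it within the approximate band boundaries given above. It is a simplified stand-in for the spectral estimators used by the included studies, which varied and are not reproduced here; the synthetic signal, sampling rate, and function names are assumptions.

```python
import cmath
import math

# Approximate band boundaries in Hz, following the grouping used in the text.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 70)}

def periodogram(signal, fs):
    """One-sided power spectrum via a direct DFT (adequate for short segments)."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]  # remove the DC offset
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power

def band_powers(signal, fs):
    """Absolute power per band (arbitrary units) and relative power (fraction of the 1-70 Hz total)."""
    freqs, power = periodogram(signal, fs)
    absolute = {
        name: sum(p for f, p in zip(freqs, power) if lo <= f < hi)
        for name, (lo, hi) in BANDS.items()
    }
    total = sum(absolute.values())  # zero only for a flat signal
    relative = {name: value / total for name, value in absolute.items()}
    return absolute, relative

# Synthetic 2-second segment at 256 Hz: a 6 Hz (theta) rhythm plus a weaker 10 Hz (alpha) rhythm.
fs = 256
segment = [math.sin(2 * math.pi * 6 * t / fs) + 0.5 * math.sin(2 * math.pi * 10 * t / fs)
           for t in range(2 * fs)]
absolute, relative = band_powers(segment, fs)
```

For this synthetic segment, theta accounts for about 80% of the total power and alpha for about 20%, since power scales with the square of the amplitude. Ratio measures such as a shift toward low-frequency rhythms can be built from the same per-band values.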
Measures of EEG frequency (4 studies). Measures relating to frequency (i.e., peak frequency, mean frequency/Hjorth Mobility; Table 2a) were investigated by four studies, with one study observing that the highest alpha power value (i.e., peak) appears at lower frequencies in the epilepsy cohort as compared to controls (slowed dominant frequency, d = −0.69 to −0.99; [40]), one study reporting decreased mean frequency (as indexed by Hjorth Mobility) in the epilepsy cohort [34] and two studies reporting no group differences in mean frequency (d = −0.12 to −0.93; [37,42]).

Table 3
Visual summary of results for 5 studies examining diagnostic accuracy indices for the diagnosis of epilepsy, and for 2 studies examining diagnostic accuracy indices for the diagnosis of PNES, with sensitivity and specificity values, and values for the decision threshold. (See below-mentioned references for further information.) Notes: The gradient of green reflects low to high indices of diagnostic accuracy. SD = standard deviation. FC = functional connectivity. Sens = sensitivity. Spec = specificity. n/s = not specified. LOO = leave one out.
a Results for a sub-sample of participants meeting the inclusion criteria for this review. For details, see sub-group analysis in Appendix 6.

Hemispheric differences (2 studies). Two studies focusing on people with non-lesional epilepsy characterized by focal seizures examined measures of hemispheric differences and reported higher power and frequency asymmetry in epilepsy as compared to controls across a range of frequency bands from delta to beta ([34,44]; Table 2a).
Functional connectivity measures (2 studies). Measures of functional connectivity were examined by two studies, one reporting increased Synchronization Likelihood (SL; non-linear method) in patients with generalized and partial seizures in the theta band (d = 0.41; [27]) and one reporting findings in the opposite direction, i.e., decreased Mean Lagged Coherence (MLC; linear method) in patients with focal epilepsy in the theta and alpha bands (d = −0.57 to −1.02; [45]; Table 2a).
Graph theory measures (5 studies). Five studies, three of which were based on the same cohort of patients and controls (see Table 1), investigated graph theory metrics. Findings for Mean Degree, Degree Distribution Variance, Global Order Parameter, Participation Index, Onset index, and Escape Time are all based on evidence from single studies (Table 2b). These generally indicate significantly higher values for the epilepsy cohort as compared to control in the theta and low alpha bands (d = 0.66-1.07; [39,32]), with the exception of Escape Time (which was significantly increased also in the beta and gamma bands; [38]), and Participation and Onset indices (for which no significant differences were detected, d = 0.02-0.79; [24]).
Two independent studies, one examining people with IGE [39] and one examining people with cryptogenic focal epilepsy [45], provided contrasting evidence on measures of Clustering Coefficient and Path Length. Critical Coupling in the theta and low alpha bands was reduced in two independent studies on IGE (d = −0.70 to −0.88; [32,24]; Table 2b).
Chaos and information theory measures (2 studies). Measures based on chaos and information theory were examined by two studies (Table 2b); one reported increased Hurst Exponent (d = 0.67) and decreased Approximate and Sample Entropy (d = −1.21; d = −2.37) in epilepsy as compared to control, indicating higher predictability (lower complexity) and dependency on previous values in the epilepsy resting-state EEG [33]. The second study, on the contrary, reported increased Shannon Spectral Entropy in epilepsy, indicating lower predictability (higher complexity) in epilepsy, when the alpha band specifically was considered (d = 2.26; [28]).
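To make the link between entropy and predictability concrete, the sketch below implements a textbook version of Sample Entropy and applies it to a regular and an irregular synthetic series. The template length m = 2 and tolerance r = 0.2 SD are commonly used defaults assumed here for illustration; they are not taken from [33], and the function name is hypothetical.

```python
import math
import random

def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy with template length m and tolerance r = r_factor * SD (Chebyshev distance)."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    r = r_factor * sd

    def count_matches(length):
        # Use the same n - m templates for both lengths, excluding self-matches.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)      # template pairs matching over m points
    a = count_matches(m + 1)  # pairs that still match over m + 1 points
    if a == 0 or b == 0:
        return float("inf")   # undefined when no matches are found
    return -math.log(a / b)

rng = random.Random(1)  # fixed seed for reproducibility
regular = [math.sin(2 * math.pi * t / 20) for t in range(200)]  # highly predictable rhythm
irregular = [rng.gauss(0, 1) for _ in range(200)]               # white noise

se_regular = sample_entropy(regular)
se_irregular = sample_entropy(irregular)
```

The regular series yields the lower value, which corresponds to the interpretation given above: lower entropy indicates higher predictability of the signal from its own past values.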
Diagnostic accuracy (5 studies). The diagnostic accuracy of 9 groups of measures was explored by five studies, based on three fully independent study samples (Table 3). Two studies focused only on exploring the diagnostic accuracy of resting-state EEG markers [30,31], while the other three studies computed these based on previous exploration of group-level analyses on the same sample [27,32,33]. Three studies focused on the theta and alpha bands, reporting poor discriminatory performance for measures of Power Peak and Mean Degree (sens: 0-0.03 at spec: 1, and spec: 0-0.16 [30,32]), and for a measure of functional connectivity, i.e., Synchronization Likelihood (θ sens: 0.73, spec: 0.82; [27]). Two studies based on the same sample reported high discriminatory power for measures of Wavelet Energy (sens: 0.90, spec: 0.99) and complexity measures (sens: 0.92, spec: 0.90) on examination of the whole frequency spectrum, and for measures based on chaos/fractal theory (sens: 1, spec: 1), which are independent from frequency information [31,33]. Caution should be taken when interpreting these results as none of the studies explored how evidence generalizes to a fully independent sample, except [30] in their examination of Power Peak as based on findings from [29] (which was excluded from this review due to detected inconsistencies in the analysis methods; see Section 3.1). It is therefore unknown whether results suggesting high discriminatory performance only reflect sampling characteristics such as narrow inclusion criteria, or analytical factors such as model overfitting, and how discrimination indices might differ if tested on novel datasets (i.e., independent from the samples where group-level analyses are performed to guide marker selection). Values of the thresholds for discrimination were reported by one study only and were generally optimized based on the sample under study.
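The concern about thresholds optimized on the sample under study can be illustrated with a small sketch. All scores below are hypothetical marker values invented for illustration, and selecting the threshold by Youden's J is an assumption rather than the method of any included study.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive; labels are True for patients, False for controls."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def best_threshold(scores, labels):
    """Pick the lowest threshold maximizing Youden's J = sensitivity + specificity - 1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: sum(sensitivity_specificity(scores, labels, t)) - 1)

labels = [True, True, True, True, False, False, False, False]

# Hypothetical sample used to choose the threshold.
derivation_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
threshold = best_threshold(derivation_scores, labels)
in_sample = sensitivity_specificity(derivation_scores, labels, threshold)

# Hypothetical independent sample evaluated with the same, fixed threshold.
validation_scores = [0.85, 0.5, 0.35, 0.3, 0.45, 0.55, 0.15, 0.05]
out_of_sample = sensitivity_specificity(validation_scores, labels, threshold)
```

In this invented example the threshold selected on the derivation sample gives a sensitivity of 1.0 and a specificity of 0.75 in that sample, but only 0.5 and 0.5 in the independent sample, which is the kind of optimism that validation on novel datasets is intended to expose.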

Studies in PNES cohorts
Six studies based on four independent samples examined resting-state EEG dynamics in people with PNES as compared to a control group (Table 2c). These were published between 2011 and 2020. All studies used a case-control design. Four studies had group-level descriptors as their only outcome [23,46,47,48], one focused on diagnostic accuracy indices only [49], and one examined both [50].
Total sample sizes ranged from 20 to 86. In all of the studies, a group of healthy controls was used as comparator. The average age for the study populations ranged from 20 to 40. In four studies (based on two fully independent study samples), a comparable number of males and females were examined. In two studies, the patient sample had higher prevalence of females [23,49]. None of the participants were taking any medications in three studies [23,48,49], while in the sample shared by the remaining three studies most patients were taking AEDs, benzodiazepines or antidepressant medications [46,47,50].
Resting-state EEG recordings used for the analyses appeared normal on visual inspection in all studies. All examined eyes-closed EEG recordings, except one which examined a mixture of eyes-open and eyes-closed recordings [23]. The number of EEG cap electrodes was 19 or 20 in three studies, and 128 in the remaining three, for which source analysis was used. The amount of EEG data used varied from 20 s to 20 min, with epoch length ranging from 1 to 5 s.

Results of studies in PNES cohorts
PNES and healthy controls did not differ on most of the measures and frequency bands examined. Results for most measures are based on single studies, except for Absolute and Relative Power, Clustering Coefficient and Global Efficiency, each examined by two studies (Table 2c).
Measures of power and EEG frequency (3 studies). Absolute power was investigated by two studies; one reported significantly higher power values in high beta and gamma in PNES as compared to control (d = 0.84-0.91; [23]), while the other reported no differences [50]. For Relative Power, higher delta and theta values in PNES were reported by one study [46], while the other study reported no differences [23]. One study reported no differences in Total Spectral Power and Mean Frequency [23].
Hemispheric differences (1 study). No hemispheric asymmetries were detected on three indices examined by one study [23].
Functional connectivity and graph theory measures (4 studies). Various indices of functional connectivity were explored by four studies, with scattered results [23,46,48,50]. Five measures based on graph theory were examined by two studies, overall indicating no differences between PNES and controls (d = 0 to −0.88), with the exception of a report of higher Assortativity Index in beta in the PNES population (d = 0.63-0.73; [47]), and a report of lower gamma Clustering Coefficient in PNES as compared to control (d = −0.90; [48]).
Diagnostic accuracy (2 studies). Diagnostic accuracy was explored by two separate studies. Descriptive indices of power achieved high discriminative performance in one study (acc: 0.81-0.99; [49]), and Lagged Functional Connectivity was reported to be a good predictor of diagnosis by the second study (sens: 0.67, spec: 0.67; [50]). As no validation was performed on novel samples, no information on generalizability of these models is available. Values of the thresholds for discrimination were not reported.

Quality assessment
Most studies were affected by high risk of bias related to patient selection and the index test, as assessed by the QUADAS-2 [20] (Table 4, Fig. 2). Risk of bias was high or unclear with regard to flow and timing of patient evaluations for most studies, but generally low in relation to the reference standard (risk of bias was considered unclear when lack of reporting prevented evaluation of bias). Concerns regarding applicability of the index test were high or uncertain for most studies, while these were generally low regarding patient selection and reference standard. See Appendix 5 for detailed results for individual studies.
It is important to note that the QUADAS-2 is designed to assess bias in diagnostic accuracy studies. Here, this has been applied to all studies, including those examining group differences. In such cases, an indication for a "high risk of bias" label does not relate to their usefulness for the pathophysiological understanding of seizure disorders. Instead, it reflects on the level of concern should the measures examined be implemented as diagnostic markers or translated to clinical practice without further validation by diagnostic accuracy studies. This is relevant as 10/19 studies exclusively examining differences between groups suggest that the EEG markers studied have the potential to be applied in clinical practice to increase the yield of routine EEG examinations, differentiate between disorders, or develop novel treatment strategies.

Patient selection
Risk of bias was high in all studies due to implementation of case-control designs, and exclusion of "difficult to diagnose" patients such as those with suspected disease and no confirmed diagnosis. These factors lead to overoptimistic estimates of diagnostic accuracy, or effect sizes that are inflated as compared to when cases and controls are sampled from the same population, which more closely reflects what is encountered in clinical reality [51][52][53][54]. Most studies failed to describe their sampling method. 10/26 studies failed to describe demographic features for patients and controls, or control for any differences. Age and gender differences are main confounders in EEG research [55][56][57][58].
Concerns regarding applicability were generally low, indicating confidence that the included patients match the review question in most cases.

Index test
The main reasons why almost all of the studies scored high on risk of bias for the index test were incorporation bias (i.e., failure to implement blinding to diagnosis during selection of EEG epochs for the analyses [59]), and failure to control for the effect of medications on the EEG. Six studies included study cohorts that were not taking any medications, and only three of the remaining twenty studied medication effects quantitatively to rule out confounding. Additionally, measures to prevent or control for daytime sleepiness or circadian effects were implemented by 4/26 studies only. These are main confounders in EEG research [60,61]. However, most of the studies adopted methods for EEG artifact removal, most commonly selection of non-artifactual epochs by means of visual inspection.
There were concerns regarding applicability of the index test, indicating that the conduct and interpretation of the EEG may not be up to state-of-the-art standards, mainly due to lack of reporting on EEG equipment, technical details, and personnel training.

Reference standard
The diagnostic methods used were likely to classify epilepsy or PNES accurately in most cases, with epilepsy diagnoses given by epilepsy specialists according to operational guidelines in most cases, or a diagnosis of PNES made following observation of a typical seizure event on video-EEG in the absence of any EEG changes indicating epilepsy [5]. Therefore, concerns regarding applicability were also generally low.

Flow and timing
Approximately half of the studies failed to report information on the period of participants' recruitment, and whether or not all people who were recruited were subsequently included in the analyses. Most studies reported results selectively, meaning that only a subset of the measures examined was reported, usually based on their statistical significance. Most of the studies reporting all results gave graphical representations only, with no numerical values.

Meta-analysis results
In accordance with our pre-specified criteria, a meta-analysis was not performed as no marker was investigated by more than 5 studies within the same diagnostic group, and a high risk of bias label was assigned to most studies. This field of research is not mature enough to allow quantitative synthesis.

Discussion
This is the first systematic review comprehensively examining resting-state EEG markers in people with idiopathic epilepsy and people with psychogenic nonepileptic seizures (PNES). We summarized studies reporting on the group differences and diagnostic accuracy of quantitative indices computed from interictal EEG recordings without any abnormalities on visual inspection.
Twenty-six relevant studies were identified, 19 of which examined people with epilepsy, 6 people with PNES, and one compared these two populations directly. Although some potentially relevant studies have been excluded due to insufficient information to determine eligibility (Appendix 7), we consider this to be a representative sample of the available evidence relating to our review question.
Results suggest that resting-state EEG recordings have the potential to reveal subtle differences in the spontaneous neural dynamics of the idiopathic epilepsies, with oscillations along the theta frequency (4-8 Hz) likely playing a relevant role.
The association between epilepsy and the theta band has previously been identified [62][63][64][65]. Studies included in the present review consistently indicate that the resting-state EEG of people with idiopathic epilepsy is characterized by (i) increased theta power, and (ii) a pattern of EEG slowing, as indicated by a shift of power and power peak toward lower frequencies. These findings were persistent across a range of conditions including generalized and focal seizure types, eyes-closed and eyes-open recordings, and different clinical and experimental settings.
There is also an indication for aberrant functional connectivity and network organization in idiopathic epilepsy along the theta band (4-8 Hz), extending into low alpha (6-9 Hz). These findings are supported by a smaller number of studies and datasets, with conflicting findings potentially relating to differences in study methods and patient characteristics. Nevertheless, consistency in the frequency bands detected highlights the relevance of these measures, which deserve further investigation. Similarly, measures based on chaos and information theory hold some promise but require further study.
Collectively, results suggest that altered resting-state EEG is an aspect of the pathophysiology of idiopathic epilepsy. Epilepsies of idiopathic origin are associated with EEG slowing, and with the intensified presence of a low frequency rhythm occurring interictally with potentially pathological connectivity and network organization; these are not necessarily detectable by visual inspection alone and reflect a continuous underlying pattern of abnormal neuronal firing and neural communication.
These observations could be explained in the context of the thalamocortical dysrhythmia framework whereby the altering of fine balances in neuronal electrochemistry generates low frequency thalamocortical rhythms, abnormal inhibitory patterns, and disrupted signaling to connected regions [66]. The present review suggests that altered mechanisms could be at work not only before and during ictal states, but also during interictal periods, in the absence of interictal epileptiform discharges (IEDs).
The extent to which findings of the included studies are driven by the effect of AEDs remains to be quantified, as most studies included people on AED monotherapy or polytherapy. AEDs influence quantitative EEG indicators, including oscillations along the theta frequency [67]. However, theta overactivity and EEG slowing occurred also in studies including unmedicated groups of patients [37], or after controlling for medication effects [35,43].
In people with psychogenic nonepileptic seizures, neurophysiological alterations appear to be less marked, as most of the measures examined so far are not significantly different from what is observed in healthy controls, despite the fact that some changes have been sparsely reported. The limited evidence available so far supports the notion that PNES is less likely to have strong neurophysiological origin as detected by resting-state EEG.
The scattered changes described by the included studies would benefit from replication, and do not closely resemble the pattern observed in idiopathic epilepsy. This is encouraging as it opens the possibility of identifying differences (and potentially, diagnostic indicators) between the resting-state EEG of people with epilepsy and people with PNES. This is an unexplored field of research with evidence from a single study so far [22].
Although general trends can be observed, no quantitative summary of the available evidence can be presented due to significant heterogeneity between studies at the level of participant characteristics, EEG and analysis methods. This is in line with a previous systematic review on resting-state EEG specifically focusing on network measures in idiopathic generalized epilepsy, where high heterogeneity was also identified as a factor limiting study comparability [68]. Studies on lesional focal epilepsies have been found to be less heterogeneous and allow meta-analysis [69].
Importantly, in the present review most of the included studies were subject to a number of methodological limitations potentially introducing bias at the level of patient selection and index test procedures. Lack of appropriate reporting on study populations, methods, and results was frequent.
We identified numerous challenges that must be addressed for valid resting-state EEG markers of epilepsy and PNES to be developed. In order for a marker to be useful, analytical validity, clinical validity, and clinical utility must be demonstrated [70,71].
Analytical validity refers to a test's ability to achieve robust and reproducible technical results. Reproducibility of EEG research is a well-known, longstanding issue, described as one of the main limitations to clinical implementation of quantitative EEG since 1987 if not before [72]. This remains true to the present day mainly due to the variety of approaches that can be implemented to acquire, process and analyse EEG data, all of which have the potential to affect results.
Of the 26 studies included in this review, only seven had adequate reporting on the conduct and interpretation of the EEG, as indicated by our assessment on applicability of the index test. Just 2 articles made their analysis code available [30,43], and 5 made their dataset publicly available, or available upon request [24,28,32,39,43]. Only one of the included studies [30] attempted replication of previous research (i.e., [32,39]).
Collective effort must be made to adhere to best practices of EEG data acquisition and analysis, and to report methods and results comprehensively and transparently, including making data and code publicly available. This would additionally benefit study comparability and allow meta-analyses. Future research should comply with the latest recommendations for reproducible EEG research (to date, [73]). Studies using machine learning algorithms should make additional efforts to report information such as model architecture and training parameters (see [74] for reproducibility guidelines) and to improve model transparency and interpretability; methods to assess the predictors' contribution to a model have been proposed [75].
In studies assessing the accuracy of EEG markers to identify a diagnosis, a threshold (i.e., cutoff score) on the predictor is established which segregates participants with a diagnosis from those without. In EEG research, thresholds for discrimination are generally optimized to the specific population under study to yield the highest values of sensitivity and specificity, rather than being pre-specified. Studies should report values of the threshold for test positivity in order to allow comparability and assess whether any differences in diagnostic accuracy are to be ascribed to study characteristics rather than threshold variations.
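To make the role of the threshold concrete, the following minimal Python sketch computes sensitivity and specificity at a given cutoff and contrasts a pre-specified cutoff with one optimized on the sample itself. The marker values, labels, cutoff of 0.30, and function names are all hypothetical, and Youden's J is used only as one common optimization criterion; the included studies did not necessarily use it.

```python
from typing import Sequence, Tuple


def sens_spec(values: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when value >= threshold is called positive."""
    pairs = list(zip(values, labels))
    tp = sum(1 for v, y in pairs if y == 1 and v >= threshold)
    fn = sum(1 for v, y in pairs if y == 1 and v < threshold)
    tn = sum(1 for v, y in pairs if y == 0 and v < threshold)
    fp = sum(1 for v, y in pairs if y == 0 and v >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(values: Sequence[float], labels: Sequence[int]) -> float:
    """Cutoff maximizing Youden's J (sens + spec - 1) on this sample."""
    candidates = sorted(set(values))
    return max(candidates, key=lambda t: sum(sens_spec(values, labels, t)) - 1)


# Hypothetical marker values (1 = patient, 0 = control).
values = [0.10, 0.15, 0.20, 0.22, 0.25, 0.30, 0.35, 0.40]
labels = [0, 0, 0, 1, 1, 0, 1, 1]

prespecified = sens_spec(values, labels, 0.30)  # cutoff fixed in advance
optimized_cutoff = best_threshold(values, labels)
optimized = sens_spec(values, labels, optimized_cutoff)
```

In this toy sample the optimized cutoff yields better apparent accuracy than the pre-specified one, but it is tuned to these particular participants. Without the cutoff value being reported, a reader cannot tell whether differences between studies reflect the marker or the threshold.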
A second step in the development of EEG markers of clinical relevance is establishing their clinical validity. Clinical validity refers to the accuracy with which a test detects a clinical diagnosis [70]. In order for a test to be clinically valid, it needs to produce accurate estimates of diagnostic accuracy, such as sensitivity, specificity and positive and negative predictive values. To this end, it is essential for studies to control for sources of bias which could lead to over- or under-estimation of diagnostic accuracy indices. When selecting the study population, balance between internal validity and generalizability should be carefully considered. All patients suspected of having epilepsy or PNES over a specific period of time should be consecutively enrolled; such a selection would reflect the population in whom the marker under study would be used to inform the diagnostic decision-making [53]. On the contrary, implementing a case-control design whereby a group of patients with known disease is compared to a control group without the condition leads to exaggerated estimates of diagnostic accuracy, especially when cases and controls are sampled from different source populations [51,53,54]. Case-control designs remain essential for understanding the pathophysiology of seizure disorders and to guide future research, but findings should be further validated on appropriately sampled cohorts in order for valid markers of disease to be developed.
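In addition to the spectrum effects described above, predictive values depend on how common the condition is among the people tested, and a case-control sample fixes that proportion artificially. The sketch below applies Bayes' theorem to show this dependence; the sensitivity, specificity, and prevalence figures are hypothetical and are not taken from any included study.

```python
from typing import Tuple


def predictive_values(sens: float, spec: float,
                      prevalence: float) -> Tuple[float, float]:
    """Positive and negative predictive values via Bayes' theorem."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test, two different proportions of affected people.
balanced = predictive_values(0.9, 0.9, 0.5)  # e.g., equal cases and controls
low_prev = predictive_values(0.9, 0.9, 0.1)  # condition present in 1 of 10
```

With identical sensitivity and specificity, the positive predictive value in this example falls from 0.9 in the balanced sample to 0.5 when only one in ten tested people has the condition, which is why predictive values estimated in an artificially balanced sample may not transfer to a consecutively enrolled cohort.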
With regard to the index test, efforts should be made to document technical and analytical details and control for common sources of bias systematically, including EEG artifacts, demographic differences, medication effects, circadian variation, and state of alertness [55,56,60,61,76,77]. Additionally, it is important to ensure independence between the process of selecting and interpreting the resting-state EEG, and that of establishing a diagnosis in order to avoid incorporation bias. This occurs when results of the index test are explicitly used as part of the diagnostic decision-making [78]. While this is reasonable in clinical practice, especially when a diagnosis is established clinically as in the case of epilepsy and PNES, this incorporative process can lead to overestimation of diagnostic accuracy in research studies [59,78]. Independence can be achieved by blinding the investigator who selects resting-state EEG epochs to the diagnosis.
Only after analytical and clinical validity are achieved will resting-state EEG markers hold enough promise for clinical utility to be assessed. This will involve determining whether the marker's ability to identify a disorder is actually useful to inform clinicians in their diagnostic decision-making and whether it provides any advantages to the patients' health outcome over existing diagnostic practices [70].
Identification of EEG markers for the diagnosis of epilepsy or PNES is a challenging task that will require careful consideration of the factors discussed in order to advance the field. There is a need for future research to be collaborative in order to bridge the clinical and computational fields. During our full text screening, we excluded 219 papers claiming to have developed 95-100% accurate classification tools for the diagnosis of epilepsy based on the Bonn EEG dataset [79]. This is composed of a set of resting-state scalp EEG data from 5 healthy volunteers, and a set of intracranial EEG data from 5 people with drug-resistant epilepsy acquired during pre-surgical evaluations. Such patient sampling, intermixing of scalp and intracranial data, and sample size are not appropriate for development of a diagnostic tool; results are neither applicable nor generalizable to different datasets [80]. Authors with a background in computational sciences should make an effort to communicate with the medical field to understand the context and reality of clinical practice and avoid overpromising language which leaves studies vulnerable to being misinterpreted. Such studies further highlight the importance of making appropriate databases publicly available. Prospective data collection consortiums could also be established to combine data from different research groups and allow mega-analyses and replication studies.
The present review has a number of limitations. We have only included studies published in English, Italian, or Spanish. Author response was a potential source of bias as we excluded 20 studies that did not have enough information to determine eligibility. Equally, we included two studies for which authors were able to provide sub-group data for only a fraction of the total sample eligible for inclusion [27,28] but excluded 11 studies for which authors were not able to provide individual patient data. Studies excluded for the aforementioned reasons have been referenced in Appendix 7.
Remission of seizure disorders was not addressed. A number of studies included some people who had been off AEDs and seizure-free for up to 36 years [32,35,38,39]. The question emerges of whether these people still have epilepsy, and therefore meet the inclusion criteria for this review. Criteria for determining resolution of epilepsy have been proposed and defined as a 10-year seizure-free period, the last 5 of which should be off antiepileptic drugs [1]. In this review, we considered epilepsy to be more than seizure expression, as seizure remission might not necessarily represent the absence of subtle neurophysiological alterations that characterize idiopathic propensity to seizures, nor the absence of cognitive and psychosocial associates of this condition [81].

Conclusion
Numerous studies have explored the potential for resting-state EEG markers to describe or differentiate between people with seizure disorders and control groups. This is an emerging field of research, and currently quantitative comparability of studies is not possible. Results highlight the potential for valid quantitative EEG markers to be identified and eventually, for their clinical utility to be assessed. Collective effort is required in order to improve transparency and reproducibility of resting-state EEG research, and to control for sources of bias by addressing shortcomings in study design. This will allow comparability between studies and potentially identification of valid adjunctive markers of disease.

Funding
This work was supported by the Bergqvist Charitable Trust through the Psychiatry Research Trust as a PhD scholarship to Irene Faiman. Professor Young's independent research is funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London. The views expressed are those of the authors and not necessarily those of the funding Trusts, the NHS, the NIHR, or the Department of Health. The funders were not involved in any aspects of this work's planning, execution, or article preparation.