Small cohort of patients with epilepsy showed increased activity on Facebook before sudden unexpected death

Sudden Unexpected Death in Epilepsy (SUDEP) remains a leading cause of death in people with epilepsy. Despite the constant risk for patients and bereavement to family members, to date the physiological mechanisms of SUDEP remain unknown. Here we explore the potential to identify putative predictive signals of SUDEP from online digital behavioral data using text and sentiment analysis tools. Specifically, we analyze Facebook timelines of six patients with epilepsy deceased due to SUDEP, donated by surviving family members. We find preliminary evidence for behavioral changes detectable by text and sentiment analysis tools. Namely, in the months preceding their SUDEP event patient social media timelines show: i) increase in verbosity; ii) increased use of functional words; and iii) sentiment shifts as measured by different sentiment analysis tools. Combined, these results suggest that social media engagement, as well as its sentiment, may serve as possible early-warning signals for SUDEP in people with epilepsy. While the small sample of patient timelines analyzed in this study prevents generalization, our preliminary investigation demonstrates the potential of social media data as complementary data in larger studies of SUDEP and epilepsy.


Supplemental Material
Table S1: Statistics from a negative binomial regression on word count per post. µ1 and n1 correspond to the mean word count and number of posts before the last two months, while µ2 and n2 correspond to the mean word count and number of posts during the last two months before SUDEP. Also included are the intercept of the regression, the coefficient on the last month indicator variable (time_coef), its standard error (time_se), the p-value of the coefficient (time_p), and the dispersion parameter θ with its standard error (θ_se).
A negative binomial model is often used to model over-dispersed count data, i.e. when the variance is considerably larger than the mean [S1]. Here a negative binomial model is estimated through a generalized linear regression with a log link function, regressing word count per post on a dummy variable that indicates whether the post occurs during the last month. The significance of the coefficient on this time indicator measures the significance of the change in the last month relative to all other posts. As shown in Table S1, we see significant increases in the word count per post for four subjects at p < 0.05. The table is ordered according to the rank product of the number of posts before and during the last two months preceding SUDEP. The two subjects with the greatest number of posts in both periods by rank product, subjects 2 and 1, are also the two with the greatest increase in word count; two additional subjects, 6 and 10, also show significant increases. Five subjects show decreases in word count per post, and the decreases are significant for subjects 11 and 3.
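The per-post regression described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the code used for Table S1. The word counts are made-up values and the function names (`nb_loglik`, `fit_two_period_nb`) are hypothetical. The sketch relies on the fact that, with a log link and a single 0/1 regressor, the fitted means equal the two sample means, so only the dispersion θ needs a numerical search. It assumes the NB2 parameterization, with variance µ + µ²/θ.

```python
import math

def nb_loglik(counts, mu, theta):
    """NB2 log-likelihood for counts with mean mu and dispersion theta."""
    return sum(
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
        for y in counts
    )

def fit_two_period_nb(before, last):
    """Fit log(mu) = intercept + time_coef * is_last for one 0/1 indicator."""
    n1, n2 = len(before), len(last)
    mu1, mu2 = sum(before) / n1, sum(last) / n2
    # With a single dummy regressor the fitted means are the sample means.
    intercept = math.log(mu1)
    time_coef = math.log(mu2) - math.log(mu1)

    # Profile log-likelihood in theta, searched on the log scale.
    def profile(log_theta):
        theta = math.exp(log_theta)
        return nb_loglik(before, mu1, theta) + nb_loglik(last, mu2, theta)

    # Golden-section search for the maximum (assumes a single interior peak;
    # the bounds 1e-3 and 1e3 are arbitrary choices for this sketch).
    lo, hi = math.log(1e-3), math.log(1e3)
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(100):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if profile(a) < profile(b):
            lo = a
        else:
            hi = b
    theta = math.exp((lo + hi) / 2)

    # Wald standard error of time_coef from the Fisher information,
    # treating theta as known: Var(log mu_hat) = (1/mu + 1/theta) / n.
    time_se = math.sqrt((1 / mu1 + 1 / theta) / n1 + (1 / mu2 + 1 / theta) / n2)
    z = time_coef / time_se
    time_p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {"intercept": intercept, "time_coef": time_coef,
            "time_se": time_se, "time_p": time_p, "theta": theta}

# Made-up word counts per post, for illustration only.
before = [12, 5, 30, 8, 3, 17, 9, 22, 4, 11]
last = [40, 18, 65, 27, 33, 12]
fit = fit_two_period_nb(before, last)
print(fit)
```

A positive `time_coef` corresponds to more words per post in the final period, and `exp(time_coef)` is the ratio of the two mean word counts. A full analysis would more likely rely on an established GLM routine, which also reports a standard error for θ.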
An alternative formulation is to examine word count per day rather than per post. Some subjects may, for example, start posting short posts with increased frequency during periods of stress. However, for most subjects many days contain zero posts and thus zero words. We can model this with a zero-inflated negative binomial model that also estimates the probability that no words will be posted [S1, S2]. As shown in Table S2, subjects 2 and 1 still have significant increases in word count per day (columns time_coef and time_p), and both are significantly more likely to post during the last two months (columns 0_time_coef and 0_time_p; note that a negative coefficient corresponds to a lower probability of having no posts on a given day). Subjects 8 and 10 are significantly less likely to post during the last two months. Subject 11, however, is significantly more likely to post during the last two months, although with significantly fewer words per day. Subject 3 shows a significant drop in word count per day. Additionally, subjects 7 and 5 are significantly more likely to post in the last two months, but with non-significant changes in word count. This view of the posting behavior also reveals interesting patterns but is not substantially more informative than the per-post negative binomial model.

Table S2: Statistics of a zero-inflated negative binomial regression on word count per day. This is similar to Table S1, but models the word count per day rather than per post, with the addition of a logistic regression model representing the likelihood of no post at all. Included are the intercept of the regression, the coefficient on the last month indicator variable (time_coef), its standard error (time_se), and the p-value of the coefficient (time_p). Additionally, parameters of the logistic regression on no-post probabilities are shown: the intercept (0_intercept), the coefficient on the time indicator (0_time_coef), and the significance of this coefficient (0_time_p).
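To make the sign convention of the zero-inflation component concrete, the following sketch evaluates a zero-inflated negative binomial probability mass function with the Python standard library. All numbers are hypothetical and are not taken from Table S2, and the names `nb_pmf`, `zinb_pmf`, and `logistic` are illustrative. The count mean is held fixed across periods so that only the effect of the no-post coefficient is visible.

```python
import math

def nb_pmf(y, mu, theta):
    """NB2 probability of count y with mean mu and dispersion theta."""
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, theta, pi_zero):
    """Mixture: an extra zero with probability pi_zero, else an NB2 count."""
    base = (1.0 - pi_zero) * nb_pmf(y, mu, theta)
    return pi_zero + base if y == 0 else base

def logistic(x):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical coefficients of the no-post (logistic) component.
zero_intercept = 1.0
zero_time_coef = -0.8  # negative: fewer no-post days in the final period

pi_before = logistic(zero_intercept)
pi_last = logistic(zero_intercept + zero_time_coef)

# Hypothetical count component, kept identical in both periods.
mu, theta = 25.0, 1.5
p_zero_before = zinb_pmf(0, mu, theta, pi_before)
p_zero_last = zinb_pmf(0, mu, theta, pi_last)

# Sanity check: the mixture is a proper distribution.
total = sum(zinb_pmf(y, mu, theta, pi_last) for y in range(2000))

print(f"P(no words on a day), earlier period: {p_zero_before:.3f}")
print(f"P(no words on a day), final period:   {p_zero_last:.3f}")
```

Because the logistic component models the probability of a day with no post, a negative time coefficient lowers that probability, which is why it reads as the subject being more likely to post.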

Figure S1: Subject verbosity per post over different epochs. Difference between word count per post in the period immediately preceding SUDEP compared to word count per post during earlier posting periods. Different selections of the time window for the last posting period are displayed on the x-axis. The box plot on the far left represents all posts before the 12 weeks preceding SUDEP. The blue line represents the p-value of the time coefficient for the negative binomial regression. The direction of the arrow represents the sign of the coefficient: up indicates an increase in word count during the period preceding SUDEP and down indicates a decrease. The horizontal black line represents p = 0.05.

Figure S2: Subject verbosity per day over different epochs. Difference between word count per day in the period immediately preceding SUDEP compared to word count per day during earlier posting periods. Different selections of the time window for the last posting period are displayed on the x-axis. The box plot on the far left represents all posts before the 12 weeks preceding SUDEP. The blue line represents the p-value of the word count time coefficient for the zero-inflated negative binomial regression. The direction of the blue triangle represents the sign of the coefficient: up indicates an increase in word count during the period preceding SUDEP and down indicates a decrease. The red line represents the p-value of the zero-post time coefficient of the regression, with red triangles representing whether there is an increase in the likelihood of any post on a day (up) or a decrease (down). The horizontal black line represents p = 0.05.