There are ongoing concerns about the accuracy of COVID-19 case counts due to issues such as lack of access to testing and difficulty in measuring recoveries. This retrospective study applied machine learning and infoveillance techniques to detect and characterize user-generated conversations on Twitter about COVID-19-related symptoms, experiences with access to testing, and mentions of recovery.
The authors collected tweets from the Twitter public API from March 3 to March 20, 2020. They filtered the tweets for general COVID-19-related keywords and for terms related to COVID-19 symptoms, then analyzed them using an unsupervised machine learning approach.
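The keyword filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the term lists, the function name `is_candidate`, and the rule that a tweet must match both a general term and a symptom term are assumptions, since the summary does not give the actual keywords or matching logic.

```python
import re

# Hypothetical term lists for illustration only; the study's actual
# keywords are not given in this summary.
GENERAL_TERMS = ["covid", "coronavirus"]
SYMPTOM_TERMS = ["fever", "cough", "shortness of breath"]


def _compile(terms):
    # Case-insensitive, whole-word match on any term in the list.
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


GENERAL_RE = _compile(GENERAL_TERMS)
SYMPTOM_RE = _compile(SYMPTOM_TERMS)


def is_candidate(text):
    """Return True if the text matches a general term AND a symptom term.

    Requiring both is an assumption made for this sketch.
    """
    return bool(GENERAL_RE.search(text) and SYMPTOM_RE.search(text))


# Invented example tweets, not drawn from the study's data.
tweets = [
    "Had a fever and dry cough all week, wondering if it's coronavirus",
    "Coronavirus press briefing starts at 5pm",
    "My cough is finally gone",
]
candidates = [t for t in tweets if is_candidate(t)]
```

Only the tweets that pass this filter would move on to the unsupervised analysis; a real pipeline would also need to handle retweets, spelling variants, and non-English text.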
They collected a total of 4,492,954 tweets that contained COVID-19-related symptom terms; 3,465 (<1%) of these included user-generated conversations about experiences perceived to be related to COVID-19, and 63% (n=1112) were from the U.S. They grouped these tweets into five main categories: first- and second-hand reports of COVID-19-related symptoms; symptom reporting concurrent with lack of access to testing; discussion of recovery; confirmation of a negative COVID-19 diagnosis after receiving testing; and users recalling past symptoms and questioning whether they had previously been infected with COVID-19. Co-occurrence of themes was statistically significant for users reporting symptoms together with lack of access to testing and together with discussion of recovery.
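To make the co-occurrence claim concrete, the sketch below tests whether two themes appear together more often than chance would predict. The summary does not say which statistical test the authors used; a Pearson chi-square test of independence on a 2x2 table is assumed here as one common choice, and the counts are invented for illustration rather than taken from the study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom, with no
    continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Invented counts, NOT from the study:
# rows    = tweet reports symptoms (yes / no)
# columns = tweet mentions lack of access to testing (yes / no)
stat, p_value = chi_square_2x2(40, 60, 20, 80)
significant = p_value < 0.05
```

A small `p_value` would indicate that the two themes are not independent in the sample; it would say nothing about whether the users involved were actual COVID-19 cases.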
The authors conclude that many Twitter users reported COVID-19-related symptoms but never got tested due to lack of access. However, it is unclear how many of these users were actual cases, and in the absence of further testing, accurate case estimates may never be known.