Aim 3: Social web and environmental annotation integration

 

Discussion led by Chirag L., Griffin W., Shawn, Jared H., and Tianxi C. about 1) annotation services and implemented servers and 2) linkage algorithms and proximal results.

Courtesy of Jared, here are some of the aims with regard to the social web:

Crowdsourced and Social Media Sources and Analysis

We complement traditional data integration and analysis approaches with publicly available information from social media databases, in an attempt to reliably identify complementary data and evidence of individuals who exhibit signs or symptoms of a disease. These sources offer massive sample sizes and an exceptionally large amount of information on individual geographic, demographic, and behavioral variables. Case statistics generated from social media sources will be compared to previously published, peer-reviewed estimates of disorder and disease prevalence and incidence across various geographic and demographic areas. In a novel extension, they will also be compared to the statistics obtained in PICIs based on Aim 1 and 2 methods, to determine whether the social media method pursued in this Aim complements the data derived in Aims 1 and 2, is reflective of the diseases for which the PICIs were created, and is accurate and reliable.

We propose to:

1. Develop a method to accurately and reliably identify individuals who display targeted disease- and disorder-related behavior, based upon their online activity on social media websites.

2. Demonstrate that accurate identification of diseases or disorders is possible using social media information.

3. Contribute new, meaningful information to the scientific community on the predictors of disorders based on social media-related behaviors, personal online associations, and other social and demographic information.

4. Evaluate the accuracy of the case definition, either via survey or by association with clinical records.

5. Develop methods for data linking to clinical data streams.

6. Use neurodevelopmental disorders (Aim 4) as the representative disease for our method and analysis development.
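One way the accuracy evaluation in item 4 could be framed is as a comparison between the social media case definition and a reference label taken from a survey or clinical record. The sketch below is only an illustration under that assumption: the function name, the boolean input format, and the sample labels are all hypothetical and not part of the proposal.

```python
def case_definition_metrics(predicted, reference):
    """Compare a case definition against a reference standard.

    predicted: list of bool, True if the social media method flags the person.
    reference: list of bool, True if the survey or clinical record confirms a case.
    Returns sensitivity, specificity, and positive predictive value (PPV);
    a value is None when its denominator is zero.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # flagged and confirmed
    fp = sum(1 for p, r in pairs if p and not r)      # flagged, not confirmed
    fn = sum(1 for p, r in pairs if not p and r)      # missed confirmed case
    tn = sum(1 for p, r in pairs if not p and not r)  # correctly not flagged

    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Hypothetical labels for five individuals (illustration only, not real data).
predicted = [True, True, False, False, True]
reference = [True, False, False, True, True]
metrics = case_definition_metrics(predicted, reference)
```

Which of these measures matters most would depend on how the case definition is used; PPV, for example, is sensitive to how common the condition is in the sampled population.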

Current Progress:

1. Working in collaboration with Chirag Patel’s group, determine an appropriate group of keywords associated with asthma.

     Use the Brownstein lab’s geo-tweet database to pull down tweets containing these keywords within a two-month time frame of historical data.

      - Potentially alter parameters to examine seasonality (e.g. better return in spring, when pollen is at its peak)

      - Adjust keywords as needed

2. Perform a targeted prospective pull down of all tweets going forward, after refining the exploratory approach described above.
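The exploratory pull described in step 1 can be sketched as a keyword match restricted to a date window, with monthly counts as a rough first look at seasonality. This is a minimal sketch, not the Brownstein lab's actual interface: the record layout (a dict with "text" and "created_at"), the keyword list, and the dates are all placeholder assumptions, since the real keywords are still to be determined.

```python
import re
from datetime import datetime

# Placeholder keywords (assumption); the real list is still being decided.
KEYWORDS = ["asthma", "inhaler", "wheezing"]


def build_pattern(keywords):
    """Compile one case-insensitive, whole-word pattern for all keywords."""
    escaped = (re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def filter_tweets(tweets, keywords, start, end):
    """Keep tweets posted in [start, end) whose text contains a keyword."""
    pattern = build_pattern(keywords)
    return [
        t for t in tweets
        if start <= t["created_at"] < end and pattern.search(t["text"])
    ]


def monthly_counts(tweets):
    """Count tweets per calendar month, keyed as 'YYYY-MM'."""
    counts = {}
    for t in tweets:
        key = t["created_at"].strftime("%Y-%m")
        counts[key] = counts.get(key, 0) + 1
    return counts


# Made-up records for illustration only.
sample = [
    {"text": "My asthma is terrible today", "created_at": datetime(2014, 4, 3)},
    {"text": "Lost my Inhaler again", "created_at": datetime(2014, 5, 10)},
    {"text": "Nice weather", "created_at": datetime(2014, 4, 15)},
    {"text": "asthma flare", "created_at": datetime(2014, 7, 1)},
]

# Two-month window; shifting start and end is one way to probe seasonality.
matched = filter_tweets(sample, KEYWORDS, datetime(2014, 4, 1), datetime(2014, 6, 1))
counts = monthly_counts(matched)
```

Adjusting the keywords then only means editing the list and rerunning the filter, which keeps the refinement loop in step 1 cheap before committing to the prospective pull.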