Advisory Council December 2013 Meeting Presentation: Global Interactive Network

ADVISORY COUNCIL ON ALZHEIMER'S RESEARCH, CARE, AND SERVICES

Monday, December 2, 2013

 

Global Alzheimer's Association Interactive Network

Arthur W. Toga
GAAIN Alzheimer's Association
Transforming the way researchers approach the study of Alzheimer's disease

 

Problem: Storage

  • Kryder's Law: Storage medium density is increasing faster than the integrated circuit density predicted by Moore's Law
  • Data growth is outpacing storage growth
  • Many researchers do not have sufficient local storage and/or computational resources

Problem: Bandwidth

Problem: Data Analysis

  • Requires expertise across domains to understand data and know what questions may be asked
  • Requires extensive computational resources -- processes can take days even with parallel processing systems
  • Volume and complexity make it difficult to visualize data
  • Difficult to combine data across domains

Neuroimaging Study Size (Typical)

  Year   Size    Equivalent to
  1998   54MB    20 copies of War and Peace
  2005   67MB    24 copies of War and Peace
  2012   531MB   193 copies of War and Peace

Image Data Expansion

Each neuroimaging scan can spawn many derived images, leading to exponential growth

Typical Example:
One 22MB structural scan
Five preprocessed images (176MB)
Eleven postprocessed images (222MB)

22MB of raw data produces 420MB of data for one scan
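
The arithmetic in this example can be checked with a short sketch. The sizes are the ones listed above; the variable names and the "expansion factor" ratio are illustrative additions, not part of the presentation.

```python
# Sizes in MB for a single structural scan, as listed in the example.
raw_mb = 22
preprocessed_mb = 176    # five preprocessed images, combined
postprocessed_mb = 222   # eleven postprocessed images, combined

# Total footprint of one scan once derived images are kept alongside the raw data.
total_mb = raw_mb + preprocessed_mb + postprocessed_mb

# Ratio of total stored data to raw data (an illustrative measure).
expansion_factor = total_mb / raw_mb

print(f"Total per scan: {total_mb} MB")
print(f"Expansion factor: {expansion_factor:.1f}x")
```

Under these figures, every megabyte of raw image data brings roughly nineteen megabytes of storage demand with it.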

Genetic Data

  • Circa 2010 GWAS Data (per sample)
    • 620,000+ rows of data
    • ~81MB
  • 2012: Whole Genome Sequencing (per sample)
    • Standard output from Illumina -- multiple files and formats
    • ~250GB per sample
  • Example
    • 800 subjects x 250GB = 195TB
    • Time to transfer 195TB:
      • High speed internet (90 Mbit/s): 26 days
      • DSL (45 Mbit/s): 59 days
      • Dial-up (56 kbit/s): 100+ years!
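
A rough way to reproduce this kind of estimate is sketched below. It is an idealized calculation that assumes decimal units (1 GB = 1e9 bytes), a fully saturated link, and no protocol overhead; the function name `transfer_days` is hypothetical. Results depend strongly on whether a rate is read as bits or bytes per second: taking the listed rates literally as bits per second gives longer times for the two faster links than the figures above, and the 26-day figure is close to what 90 megabytes per second would give.

```python
SECONDS_PER_DAY = 86_400

def transfer_days(size_bytes: float, rate_bits_per_s: float) -> float:
    """Idealized transfer time in days: no overhead, link fully saturated."""
    return size_bytes * 8 / rate_bits_per_s / SECONDS_PER_DAY

# 800 subjects x 250 GB each, using decimal units (assumption).
dataset_bytes = 800 * 250e9

# Link rates as written on the slide, interpreted literally as bits per second.
links = {
    "High speed internet (90 Mbit/s)": 90e6,
    "DSL (45 Mbit/s)": 45e6,
    "Dial-up (56 kbit/s)": 56e3,
}

for name, rate in links.items():
    days = transfer_days(dataset_bytes, rate)
    print(f"{name}: {days:,.0f} days ({days / 365:,.1f} years)")

# Same 90 figure read as megabytes per second (8x the bit rate).
print(f"90 MB/s: {transfer_days(dataset_bytes, 90e6 * 8):.0f} days")
```

Whichever interpretation is used, the conclusion of the slide holds: moving whole-genome datasets over ordinary network links takes weeks to years, which motivates analyzing data where it is stored.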

Image Data Activity

Research efforts in Alzheimer's disease

Research efforts could be vastly expanded in scope and capabilities if data were linked to a global infrastructure that would enable scientists to access and utilize vast, interlinked repositories of data on thousands of subjects at risk for or already suffering from the ravages of Alzheimer's disease.

GAAIN is the first Global Big Data Network for Alzheimer's Disease

Collaborative effort to provide researchers around the globe with access to a vast repository of Alzheimer's disease research data

Supercomputers and High Availability Storage

 

Data Resources

  • Storage
    • Fault-tolerant storage area network
    • 400 megabytes per second data throughput
    • Near 24/7 availability
  • Protection
    • Daily & weekly on-site backup
    • Monthly off-site backup
  • New Data Center
  • New 820+ node center:
    • > 7400 total cores
    • > 40 TB memory
    • > 4 PB storage

Aggregating accounts into one hub

  • A single location to obtain data from a variety of sources and accounts
  • Users can apply to partnering consortia via GAAIN after browsing the metadata
  • Users' active accounts with partnering consortiums are also active through the GAAIN portal

Example: Klout.com

Klout.com collects data on a user's presence in social media (e.g., Facebook, Twitter, LinkedIn).

Example: Mint.com

Mint.com combines a user's financial information from a variety of sources (e.g., bank accounts, credit cards, loans).

Example: Tripit.com

Tripit.com aggregates a user's travel and booking information (e.g., airline tickets, vacation rentals).

GAAIN Aggregator and Personal Dashboard

GAAIN recognizes a user's existing accounts with partnering data sources and allows the user to analyze the data with our tools and/or apply for access to additional consortia

The dashboard indicates which data sources are unavailable to the user (e.g., the user must apply for access, or the data source is currently offline)
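
One way to picture the dashboard logic is the sketch below. It is purely illustrative: the `SourceStatus` enum, the `dashboard` function, and the sample account and availability values are assumptions made for this example, not a description of how the GAAIN portal is implemented.

```python
from enum import Enum

class SourceStatus(Enum):
    AVAILABLE = "available"
    ACCESS_REQUIRED = "user must apply for access"
    OFFLINE = "data source is currently offline"

def dashboard(user_accounts: set[str], online: dict[str, bool]) -> dict[str, SourceStatus]:
    """Classify each partner source for one user (hypothetical logic)."""
    status = {}
    for source, is_online in online.items():
        if not is_online:
            status[source] = SourceStatus.OFFLINE
        elif source in user_accounts:
            status[source] = SourceStatus.AVAILABLE
        else:
            status[source] = SourceStatus.ACCESS_REQUIRED
    return status

# Made-up example: the user holds an ADNI account; one source is offline.
view = dashboard({"ADNI"}, {"ADNI": True, "AIBL": True, "DIAN": False})
for source, state in view.items():
    print(f"{source}: {state.value}")
```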

Gaain.org Homepage

Log In / Sign Up Page

News and Updates Page

One-stop Data Access

Data from thousands of subjects, including clinical, genetic and imaging data types from our partners

Comprehensive Analytical Tool Stack

Bank of sophisticated imaging and genetic analytical tools available

Tools are supported by the LONI Pipeline

Interactive Filtering/Selecting UI

GAAIN Global Federation Version 1.0

  • Provide federated integrated access to multiple distributed Alzheimer's disease datasets
    • Stepwise model development
      • Phase I: Similar or identical data models
      • Phase II: Different data models but with same representation
        • Such as (all) relational
      • Phase III: Heterogeneous models
        • Relational versus XML ...
    • Integration of data in varying data models
    • "Syntactic and Semantic Heterogeneity"
      • Simply put -- data sources differ in how they represent the same thing!
  • Mediator technology to combine these data
  • Common Data Model based on and linked to CDISC

Data Heterogeneity

  Source                 Fields
  AD Data Consortium X   XADC, XID, SEX, BIRTHYR, MMSE
  ADNI                   RID, .., EXAMDATE, GENDER, DOB, MMSCORE
  AIBL                   RID, .., PTGENDER, PTDOB, MMSETOT
  AD Data Consortium Y   PTID, .., MF, BIRTHDATE, APOE
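
The naming differences shown here can be illustrated with a small field-mapping sketch. The source column names come from the slide; the common-model names (`sex`, `birth`, `mmse`), the pairing of columns to those names, the `to_common` helper, and the sample records are all assumptions made for illustration.

```python
# Source-specific column names mapped to hypothetical common-model names.
# The pairings are inferred from the column names and are an assumption.
FIELD_MAP = {
    "ADNI": {"GENDER": "sex", "DOB": "birth", "MMSCORE": "mmse"},
    "AIBL": {"PTGENDER": "sex", "PTDOB": "birth", "MMSETOT": "mmse"},
    "Consortium X": {"SEX": "sex", "BIRTHYR": "birth", "MMSE": "mmse"},
    "Consortium Y": {"MF": "sex", "BIRTHDATE": "birth"},
}

def to_common(source: str, record: dict) -> dict:
    """Rename a source record's fields to the common model, dropping unmapped ones."""
    mapping = FIELD_MAP[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}

# Made-up records describing the same kind of subject in two schemas.
adni_row = {"RID": 1, "GENDER": "F", "DOB": "1940-01-01", "MMSCORE": 27}
aibl_row = {"RID": 7, "PTGENDER": "F", "PTDOB": "1940-01-01", "MMSETOT": 27}

print(to_common("ADNI", adni_row))
print(to_common("AIBL", aibl_row))
```

Renaming only addresses the syntactic side of the problem. Semantic differences, such as a birth year in one source versus a full birth date in another, would also need value conversion, which this sketch does not attempt.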

Federated Data Access via Mediator

  • Mediation approach
  • One-stop data access
  • Actual integration of data -- not just a clearinghouse
  • Maintain autonomy of each source
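
A minimal sketch of the mediation idea follows, under stated assumptions: the `Source` class, the `federated_query` function, and the sample rows are hypothetical stand-ins, and only the MMSE field names (MMSCORE, MMSETOT) come from the presentation. The point it illustrates is that the mediator rewrites one global query into each source's own vocabulary, and each source evaluates the query over data it continues to hold.

```python
class Source:
    """Hypothetical stand-in for a partner repository that keeps its own schema."""

    def __init__(self, name: str, mmse_field: str, rows: list[dict]):
        self.name = name
        self.mmse_field = mmse_field
        self._rows = rows  # data stays inside the source

    def query(self, field: str, max_value: int) -> list[dict]:
        # The source applies the filter itself, using its native field name.
        return [row for row in self._rows if row[field] <= max_value]

def federated_query(sources: list[Source], max_mmse: int) -> list[dict]:
    """Mediator: translate one global query per source, then merge the answers."""
    results = []
    for src in sources:
        for row in src.query(src.mmse_field, max_mmse):
            results.append({"source": src.name, "mmse": row[src.mmse_field]})
    return results

# Made-up rows for illustration only.
sources = [
    Source("ADNI", "MMSCORE", [{"MMSCORE": 29}, {"MMSCORE": 21}]),
    Source("AIBL", "MMSETOT", [{"MMSETOT": 18}, {"MMSETOT": 30}]),
]

hits = federated_query(sources, max_mmse=24)
for hit in hits:
    print(hit)
```

Because only matching rows leave each source, this style of design returns integrated results without copying whole repositories into a central store.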

A big solution for big data

  • GAAIN serves as a benchmark for large data research efforts
  • Provides seamless connections to a user's existing Alzheimer's disease consortium data accounts
  • Allows researchers to narrow down a study population that relates to their work across multiple partner consortia
  • Provides tools capable of analyzing clinical, imaging and genetic data types via the LONI Pipeline

 

Global Partners and Affiliates

  • LONI
  • neuGRID
  • EMIF
  • ADNI
  • AIBL
  • Dominantly Inherited Alzheimer Network (DIAN)
  • Critical Path Institute

Common Representation Across Partner Data

  • CDISC-CPATH Alzheimer's Therapeutic Area Standard
    • Domain Model
    • Common Data Elements
  • CADRO* Ontology
    • Categories, Topics, Themes
    • Common Data Model linked to CDISC standards and CADRO

*Common Alzheimer Disease Research Ontology (CADRO) is a collaborative effort between the National Institute on Aging (NIA) and the Alzheimer's Association (AA)

Current Status

  • Mediator operational at GAAIN
  • Integration of ADNI, AIBL and NACC data
    • Integrated domain ("global") model developed
    • Mappings created
      • Global model and source
  • Successful federated querying across data sources
  • Identification of necessary analytical tools for meaningful discovery of clinical, imaging and genetic data types
