Project Overview
Project Name: Advancing cancer screening through big data: a national initiative
Partners: Tata Memorial Center (TMC), GISE Hub IITB
Sectors: Healthcare, Data Analysis, Web Development
Description:
The global burden of cancer incidence and mortality is on the rise, necessitating the use of large-scale data handling and multi-dimensional analysis for the crucial first step in combating cancer: early-stage detection through screening. Although substantial data exists in the form of Hospital Based Cancer Registries (HBCRs) and Population Based Cancer Registries (PBCRs), there is a pressing need to centralize and reconcile this data for meaningful outcomes in the field.
The project’s central purpose is thus to develop analysis-ready cancer registries so as to improve cancer screening capabilities across India. The work will involve developing API-based access to cancer data, designing analysis tools for clinical significance, analyzing genomic data, and creating visualization dashboards. The collaborative effort of the project partners seeks to improve cancer research, prevention, early detection, clinical practice, and healthcare infrastructure development in India.
Current Status
The project has just started, with a clear plan and resource requirements identified.
Skills Learned
Participants in this project will have the opportunity to develop and enhance skills in the following areas:
- Large-scale data integration and analysis
- Genomic data analysis
- API development and data repository management
- Data visualization and dashboard creation
There is a strong likelihood of achieving significant research results, with real opportunities for publication in prestigious journals such as Nature and The Lancet.
Qualifications Required
Year of Study: Second year or above
Experience: Applicants must be proficient in at least one programming language, preferably Python, and have an interest in epidemiology and data science. A background in web development and/or data analytics is a plus.
Work Description
Roles: 2 Data Development and Clinical Data Analysis Positions
Initial Stipend: Rs. 20000/month
Project Duration: 2 months (for the first phase)
Tasks/Deliverables:
- Develop and evolve analysis-ready data shared under the cancer registry program through Application Programming Interfaces (APIs)
- Build multi-dimensional genome sequencing analysis tools to process raw genome sequences
- Develop visual analytics tools with interactive dashboards to understand cancer risk factors and social determinants of health
- Build a secure website to host centralized cancer screening database/dashboard tools for India
How to Apply?
Submission Link: https://forms.gle/WcLGnemdHvr3BSFk9
Deadline: 11:59 PM, 15 October 2023
To apply for the project, you must fill out the form above. For additional credit, you may attempt the assignment below and submit it to the best of your ability, using any online tools you find helpful. We will contact you personally if you are shortlisted for an interview.
(Optional) Assignment: gnomAD Data Analysis
You are provided with a subset of genetic variant data from the gnomAD exome data in a VCF file. Your task is to:
- Load and parse the VCF file using Hail, the Python library for genomic data analysis.
- Calculate the allele frequencies for each variant. Plot the counts using matplotlib or any other data visualization library.
- Identify and list the top 10 variants with the highest allele frequencies.
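The arithmetic behind the second and third tasks can be sketched in plain Python, independent of Hail, which may be useful as a cross-check for a Hail-based solution. This is only a minimal sketch: it assumes the INFO column of the VCF carries allele count (AC) and allele number (AN) fields, as gnomAD site files typically do, and every function name below is hypothetical rather than part of any library.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    if path.endswith((".gz", ".bgz")):
        return gzip.open(path, "rt")
    return open(path, "r")


def parse_info(info_field):
    """Turn 'AC=3;AN=100;DB' into {'AC': '3', 'AN': '100', 'DB': True}."""
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def allele_frequencies(lines):
    """Yield (variant_id, AF) per alternate allele, with AF = AC / AN.

    Assumption: AC and AN are present in INFO; records without them,
    or with AN == 0, are skipped.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        info = parse_info(fields[7])
        if "AC" not in info or "AN" not in info:
            continue
        an = int(info["AN"])
        if an == 0:
            continue
        # Multi-allelic sites list one AC value per alternate allele.
        for alt_allele, ac in zip(alt.split(","), info["AC"].split(",")):
            yield f"{chrom}:{pos}:{ref}:{alt_allele}", int(ac) / an


def top_variants(freqs, n=10):
    """Return the n (variant_id, AF) pairs with the highest AF."""
    return sorted(freqs, key=lambda item: item[1], reverse=True)[:n]


def binned_counts(freqs, n_bins=10):
    """Count variants per equal-width AF bin on [0, 1], ready for plotting."""
    counts = Counter(min(int(af * n_bins), n_bins - 1) for _, af in freqs)
    return [counts.get(i, 0) for i in range(n_bins)]
```

With a file at hand, the frequencies could be collected with `freqs = list(allele_frequencies(open_vcf(path)))`, where `path` points to the provided VCF. The list returned by `binned_counts(freqs)` can then be passed to a bar-chart function of matplotlib or any other plotting library, and `top_variants(freqs)` gives the ten most frequent variants.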
Resources
Linked here are some resources that you may find helpful while solving the assignment:
Contact Information
For any general queries, join the ProSpace WhatsApp group: https://chat.whatsapp.com/E09qtrcuShp1uf2w82LCsa
For assignment queries, contact:
Email: 210050001@iitb.ac.in, 210040139@iitb.ac.in
Phone: 9324865787, 9987361968