Computing, Data, and Methodology
The Columbia Population Research Center supports researchers with tools and guidance for collecting, managing, analyzing, and sharing data across the full research lifecycle. Our services are tailored to population and social science research and include secure data environments, statistical and high-performance computing, and emerging AI-assisted methods.
Research Computing Services
Data Platforms
| Platform | Description | Data Sensitivity | Cost |
| --- | --- | --- | --- |
| Secure Data Enclave | Citrix remote desktop where researchers can work with sensitive data and collaborate simultaneously with other members of their project | HIPAA, PII, PHI, RHI | Free for CPRC Affiliates |
| Columbia Data Platform | Cloud-based solution for research data storage, discovery, analysis, collaboration, and archiving | HIPAA, PII, PHI, RHI | Free account for Columbia users; hosting cost may vary |
| LabArchives | Electronic research notebooks designed to replace paper notebooks and lab manuals and to support research productivity | PII, PHI, RHI | Free for all Columbia users |
| Box | Cloud-based service providing a simple and secure way to store and share files and folders online | PII, RHI | Free for SSW faculty |
| AWS | On-demand IT resources such as compute, storage, and databases | PII, PHI through BAA | Free Tier and pay-as-you-go |
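When choosing among these platforms, it can help to treat the listing above as a lookup keyed by data sensitivity. The following is a minimal Python sketch of that idea, not a CPRC tool: the names `PLATFORM_SENSITIVITY` and `platforms_for` are hypothetical, and the labels are simply transcribed from the listing above. It assumes a platform qualifies only when it lists every label a project needs.

```python
# Illustrative sketch only. PLATFORM_SENSITIVITY and platforms_for are
# hypothetical names; the labels are transcribed from the platform listing above.
PLATFORM_SENSITIVITY = {
    "Secure Data Enclave": {"HIPAA", "PII", "PHI", "RHI"},
    "Columbia Data Platform": {"HIPAA", "PII", "PHI", "RHI"},
    "LabArchives": {"PII", "PHI", "RHI"},
    "Box": {"PII", "RHI"},
    "AWS": {"PII", "PHI"},  # the listing notes PHI on AWS is "through BAA"
}


def platforms_for(required):
    """Return the platforms whose listed labels cover every label in `required`."""
    needed = set(required)
    # Subset test: a platform qualifies only if it lists all needed labels.
    return [
        name
        for name, labels in PLATFORM_SENSITIVITY.items()
        if needed <= labels
    ]


if __name__ == "__main__":
    # Example: a project that holds PHI.
    for name in platforms_for({"PHI"}):
        print(name)
```

A lookup like this only narrows the options. The listing's own qualifiers (for example, "through BAA" for AWS) and the University's data classification policy still determine what is actually permitted.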
Globus
Globus is a high-performance data-transfer and sharing platform that allows you to move large and complex datasets directly between any two applications, systems, or local machines, eliminating the need for downloading and then uploading the data.
National High-Performance Computing – ACCESS
ACCESS is an NSF-funded program consisting of a nationwide collection of supercomputing systems available to researchers and educators. ACCESS resources are freely available for Columbia researchers and educators.
Columbia HPC Resources: Insomnia and Ginsburg
CUIT’s High Performance Computing service provides a cluster of computing resources that power transactions across numerous research groups. CPRC has resources in both the Insomnia and Ginsburg computing clusters. This is a free resource for CPRC users. For access, please reach out to [email protected].
Empire AI
Empire AI provides high-performance compute resources to eligible PIs across a consortium of New York State research institutions.
AI Resources
| Service | Description | Data Sensitivity | Cost |
| --- | --- | --- | --- |
| ChatGPT Education | Includes Advanced Data Analysis, the GPT Builder tool, DALL·E image creation and editing, and integrated browsing with Bing | PII, PHI, RHI | $300 per user, annually |
| Chat | CHAT is Columbia University's customizable AI chat platform, powered by LibreChat | Confidential | Paid service based on usage |
| Gemini | Supports multimodal work with text, images, and code | Confidential | Free and Pro versions available |
| NotebookLM | Integrated with Google Docs; helps you synthesize notes, generate insightful questions, and reveal connections within complex material | Confidential | Free and Pro versions available |
For information on Columbia Data Classifications, visit the University Policies Data Classification page.
Qualtrics
Qualtrics is an easy-to-use, full-featured, web-based tool for creating and conducting online surveys.
Stata
Statistical software for data science. CPRC periodically conducts a school-wide bulk purchase. For more information on the next bulk software purchase, please email [email protected].
Consultations and Methodology
CPRC’s Computing and Methods Core assists with all levels of data acquisition and management. The Core can help negotiate licensing arrangements for restricted or proprietary datasets and configure secure environments that meet the most stringent data security requirements. It also helps researchers develop plans for the secure storage of confidential data and provides restricted Windows-based workstations for work with highly sensitive data.
Specifying the Technical Infrastructure (TI) requirements in research proposals can be difficult and time-consuming for researchers unfamiliar with the technology and services available at Columbia. The effort required to gather appropriate technical information can delay the completion of a grant proposal, sometimes to the point of missing a submission deadline. The Computing and Methods Core keeps TI information from around the University on hand to provide CPRC researchers with "boilerplate" text for grant preparation.
The Computing and Methods Core provides consultation for CPRC researchers, connecting them to other faculty with specific types of methodological expertise. For consultation, please email [email protected].
Through a partnership with the Built Environment Health (BEH) research group, CPRC has secured GIS and spatial analysis consulting services for affiliates developing new research projects. Services include:
- Consultation meetings with investigators to help them conceptualize research questions, become familiar with geo-spatial/GIS research concepts, learn about the available geo-spatial data and the statistical analysis methods developed for such research in New York City, and operationalize their definitions of neighborhoods and neighborhood characteristics.
- Consultations on developing a research plan for using neighborhood data.
- Development of Methods and Preliminary Studies text and citations for grant applications.
- Access to and analysis of geo-spatial data for preliminary studies.
- Availability of well-published faculty and/or geographers to serve as collaborators on grant proposals to execute the research.
For existing funded projects where investigators would like GIS support or access to geo-spatial data, the BEH group provides expertise, data, and GIS analyses on a fee-for-service basis.
CPRC affiliates interested in either set of services should reach out to [email protected].
The CPRC Survey Lab is a cluster of faculty and research staff with expertise in multimode survey data collection and hard-to-reach populations. Core Survey Lab projects include the New York City Longitudinal Survey of Wellbeing (NYCLSW, aka “Poverty Tracker”), an ongoing panel study of several thousand New Yorkers who are interviewed every 3 to 4 months, and the New York City Longitudinal Study of Young Children’s Health and Development (aka “Early Childhood Poverty Tracker”), both funded by Robin Hood. Survey Lab researchers also fielded surveys with a hard-to-reach sample of the Fragile Families and Child Wellbeing cohort, completing surveys with several hundred study participants who had been considered lost to follow-up, and are now fielding the Fragile Families Generation 3 study. CPRC faculty can call on Survey Lab staff and management to field surveys or to consult on survey research. The NYCLSW is also available to field test questions and collect experimental data on a probability sample of New Yorkers.
Foundations for Research Computing
Check out events and resources on programming, data science, and other skills related to computational research.
