Computing, Data, and Methodology

Overview of Services

CPRC’s Computing and Methods Core provides a range of services, described below, to assist our affiliates with preparing grant proposals and conducting research. For assistance, direct all queries to [email protected], unless otherwise noted below.

High Performance Computing (HPC) refers to using either supercomputers, or clusters of computers linked together, to function as a single computer. With HPC, users usually log into one central manager system and submit computing jobs using specialized software. HPC is often used to create complex models, handle large amounts of data, run imputation across multiple data sources, and more.
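As an illustration of what "submitting a computing job" can look like, the sketch below builds the text of a batch script in Python. It assumes the cluster uses the Slurm scheduler, which this page does not state. The helper name `build_job_script`, the job name, and the `Rscript impute.R` command are hypothetical placeholders, so check the cluster documentation for the actual scheduler and required options.

```python
def build_job_script(job_name, command, hours=1, cpus=1, mem_gb=4):
    """Return the text of a batch script (hypothetical helper, assumes Slurm)."""
    lines = [
        "#!/bin/sh",
        f"#SBATCH --job-name={job_name}",     # label shown in the job queue
        f"#SBATCH --time={hours:02d}:00:00",  # wall-clock limit, HH:MM:SS
        f"#SBATCH --cpus-per-task={cpus}",    # CPU cores requested
        f"#SBATCH --mem={mem_gb}G",           # memory requested
        "",
        command,                              # the work the job runs
    ]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
script = build_job_script("impute_demo", "Rscript impute.R",
                          hours=2, cpus=4, mem_gb=16)
print(script)
```

On a Slurm cluster, a script like this would typically be saved to a file and submitted from the login node with the `sbatch` command. Shared clusters often require additional options, such as an account name, that are not shown here.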

Habanero & Terremoto Shared HPC Clusters

Habanero, launched in November 2016, and Terremoto, launched in December 2018, are the two shared HPC clusters. As a member of the Social Science Computing Committee (SSCC), CPRC participated in both purchases, which comprised 32 research groups and departments as well as CUIT. The clusters were facilitated by SRCPAC (an advisory group with representatives from all HPC user groups, libraries, and CUIT) and the Office of the Executive Vice President for Research. Both systems also include an education tier, jointly funded by Arts & Sciences and the Fu Foundation School of Engineering and Applied Science, to support computational research classes and other training efforts. CPRC's purchase grants free access to all CPRC-affiliated researchers, and the Computing and Methods Core offers free consultation and training on these systems. Specifications for both clusters can be found on CUIT’s shared research computing website.

The Secure Data Enclave (SDE) system was a pilot launched by the Social Sciences Computing Cluster, of which CPRC is a founding member. The SDE is a virtual computing environment for the analysis of sensitive data, including HIPAA-protected data. The platform allows remote access for authorized users.

Specifying the Technical Infrastructure (TI) requirements in research proposals can be difficult and time-consuming due to a lack of familiarity with the technology and services available at Columbia. The effort required to gather appropriate technical information can delay the completion of a grant proposal, sometimes to the point of missing a submission deadline. The Computing and Methods Core collects TI information from around the University to provide CPRC researchers with "boilerplate" text for grant preparation.

CPRC’s Computing and Methods Core is dedicated to assisting with all levels of data acquisition and management. The Core can help negotiate licensing arrangements for the use of restricted or proprietary datasets and configure secure environments that meet the most stringent data security requirements. It also helps researchers develop plans for the secure storage of confidential data and provides restricted Windows-based workstations for work with highly sensitive data.

Commonly used datasets include but are not limited to the following:

  • Eurostat
  • Institute of Education Sciences (IES) / National Center for Education Statistics (NCES) Restricted Use Data
  • The National Longitudinal Study of Adolescent to Adult Health (Add Health)
  • National Longitudinal Survey of Youth (NLSY)
  • National Survey of Families and Households

The Computing and Methods Core provides methodological consultation for CPRC researchers and connects them to other faculty with specific types of methodological expertise. For consultation, please email [email protected].

Through a partnership with the Built Environment Health (BEH) research group, CPRC has secured GIS and spatial analysis consulting services for affiliates developing new research projects. Services include:

  • Consultation meetings to help investigators conceptualize research questions, become familiar with geo-spatial/GIS research concepts, learn about the available geo-spatial data and the statistical analysis methods developed for such research in New York City, and operationalize their definitions of neighborhoods and neighborhood characteristics.
  • Consultations on developing a research plan for using neighborhood data.
  • Development of Methods and Preliminary Studies text and citations for grant applications.
  • Access to and analysis of geo-spatial data for preliminary studies.
  • Availability of well-published faculty and/or geographers to serve as collaborators on grant proposals and to execute the research.

For existing funded projects where investigators would like GIS support or access to geo-spatial data, the BEH group provides expertise, data, and GIS analyses on a fee-for-service basis.

CPRC affiliates interested in either set of services should reach out to [email protected].

The CPRC Survey Lab is a cluster of faculty and research staff with expertise in multimode survey data collection and hard-to-reach populations. Core Survey Lab projects include the New York City Longitudinal Survey of Wellbeing (NYCLSW, aka “Poverty Tracker”), an ongoing panel study of several thousand New Yorkers who are interviewed every 3-4 months, and the New York City Longitudinal Study of Young Children’s Health and Development (aka “Early Childhood Poverty Tracker”), both funded by Robin Hood. Survey Lab researchers also fielded surveys with a hard-to-reach sample of the Fragile Families and Child Wellbeing cohort, completing surveys with several hundred study participants who had been considered lost to follow-up, and are now fielding the Fragile Families Generation 3 study. CPRC faculty can engage Survey Lab staff and management to field surveys or to consult on survey research. The NYCLSW is also available for field-testing questions and collecting experimental data on a probability sample of New Yorkers.

Foundations for Research Computing Resources

Check out events and resources on programming, data science, and other skills related to computational research.