Gen3 supports a flexible graph-based data model, which can be customized for a wide variety of projects and use cases. At this community event we will hear from several data commons operators about how they created their data dictionaries and which tools and processes they use to update and configure them. The event will include the following presentations:
Introduction to Gen3 Data Models (Michael Fitzsimons, Robert Grossman - Center for Translational Data Science, University of Chicago)
Presentations from Data Commons
Streamlining Gen3 Data Dictionaries: Python Tools and Google Sheets for simple, automated and efficient dictionary development - We will describe our Python Gen3 schema mapping library and how it enables an automated workflow to edit, test, validate and publish Gen3 Data Dictionaries, using a Google Sheet as input. We will then describe how we applied these tools to develop a data dictionary for the Australian Cardiovascular disease Data Commons.
Marion Shadbolt - Australian BioCommons
Spreadsheet-based data ingest with Gen3 dictionary-based validation - Spreadsheet templates are provided to Aotearoa Genomic Data Repository users for data ingest, because they offer a more straightforward user experience than the native submission portal. This talk gives an overview of our particular use cases and motivations, and demonstrates an extensible validation tool used to check metadata captured in a spreadsheet against an arbitrary Gen3 dictionary.
Eirian Perkins - New Zealand eScience Infrastructure (NeSI)
Evolution of the MIDRC Data Model - We will present how the data model for the Medical Imaging Data Resource Center (MIDRC) was created and is maintained. Topics will include a brief introduction to the MIDRC project, considerations for creating a new data model, how the data model was created and is maintained, major changes to the model and how to migrate data, and the introduction and maintenance of derived data elements.
Chris Meyer - Center for Translational Data Science, University of Chicago
Versioning, migrations, and data release processes in the Pediatric Cancer Data Commons - The Pediatric Cancer Data Commons (PCDC) supports multiple independent consortia within a single Gen3 instance and currently has data on more than 35,000 patients. In this session, we will discuss the PCDC’s approach to data set versioning, data releases, and data migrations, highlighting some of the operational impacts of our approach.
Brian Furner - Data for the Common Good, University of Chicago