TSVs, or tab-separated values files, are the most commonly used format for data submission to a Gen3 Data Commons. One TSV is submitted per node in the graphical data model, starting with the root node `program` and then its child node `project`. Note: a data commons administrator will often create these first two nodes for a data submitter, so the data submitter should begin their submission by preparing TSVs for the child node(s) of `project`.
The column headers of a TSV are the properties of a node, and the rows in TSVs are individual data records. Thus, the number of rows in a TSV minus the header is the number of records that will be created or updated when the TSV is submitted to a Gen3 Data Commons.
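As a minimal sketch of the header/record relationship described above, the snippet below parses a small TSV with Python's standard `csv` module. The sample content, the `gender` column, and the helper name `read_records` are illustrative assumptions, not part of any particular Gen3 data dictionary.

```python
import csv
import io

# Hypothetical TSV content; the node and property names are illustrative only.
SAMPLE_TSV = (
    "type\tsubmitter_id\tgender\n"
    "subject\tsubject-001\tfemale\n"
    "subject\tsubject-002\tmale\n"
)


def read_records(tsv_text):
    """Return (properties, records) parsed from tab-separated text.

    properties is the list of column headers; records is one dict per data row.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    records = list(reader)  # the header row is consumed as field names, not as a record
    return reader.fieldnames, records


properties, records = read_records(SAMPLE_TSV)
print("properties:", properties)
print("records to create or update:", len(records))
```

Here the three lines of text yield two records, because the header line supplies the property names rather than a record.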
Every TSV must include a value in the `type` column for every row. This is the name of the node (or `node_id`), and it must be exactly the same for every row.
Set the `submitter_id` of every record by entering a unique text identifier in each row. Do not use the same value in more than one row of your TSV, because every record in a project must have a unique `submitter_id`. This can be any textual identifier, usually something human-readable or something that encodes some information.
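The two rules above (one constant `type` value and a unique `submitter_id` per row) can be checked before submission. The following is a sketch only: `check_tsv` is a hypothetical helper, not a Gen3 tool, and it inspects a single file, so it cannot detect a `submitter_id` that is reused in another TSV of the same project.

```python
import csv
import io
from collections import Counter


def check_tsv(tsv_text):
    """Return a list of human-readable problems found in tab-separated text."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    problems = []

    # Rule 1: 'type' must be present and identical in every row.
    types = {row.get("type") or "" for row in rows}
    if "" in types:
        problems.append("some rows have no 'type' value")
    elif len(types) > 1:
        problems.append("'type' differs between rows: " + ", ".join(sorted(types)))

    # Rule 2: 'submitter_id' must be present and must not repeat within the file.
    counts = Counter(row.get("submitter_id") or "" for row in rows)
    for value, n in sorted(counts.items()):
        if value == "":
            problems.append("some rows have no 'submitter_id' value")
        elif n > 1:
            problems.append(f"submitter_id '{value}' is used in {n} rows")

    return problems


good = "type\tsubmitter_id\nsubject\ts1\nsubject\ts2\n"
bad = "type\tsubmitter_id\nsubject\ts1\nsample\ts1\n"
print(check_tsv(good))
print(check_tsv(bad))
```

An empty list means neither rule was violated in that file.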
Every record in every node in the data commons has the property `id`, which is a UUID, not to be confused with `submitter_id` or `project_id`.
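Links are a special kind of property. Every node besides the root node `program` requires a link to its "parent node". The link is specified by a column header that combines the name of the link to the parent node with an identifying property of the parent record, for example `studies.submitter_id`.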
Note that you can also specify links with the UUID (that is, the `id` property). So you could also use the header `studies.id` for your link, the value of which would be the `id` (UUID) of your study instead of its `submitter_id`.
Some child-parent node relationships are one-to-many or many-to-many (i.e., "-to-many" relationships), meaning that one child record can have multiple parent records. For example, if a single `subject` belonged to two studies, then the `subject` TSV would specify the two links with the headers `studies.submitter_id#1` and `studies.submitter_id#2`. If there were a third study, its header would be `studies.submitter_id#3`. The values of those links are the `submitter_id` properties of the parent study records.
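The numbered-header pattern for "-to-many" links can be generated rather than typed by hand. This sketch assumes that numbering starts at `#1` and that a record with a single parent uses the plain header with no suffix. `link_headers` is a hypothetical helper name.

```python
def link_headers(link_name, parent_submitter_ids):
    """Map link column headers to the submitter_id values of the parent records.

    Assumption: one parent uses '<link>.submitter_id'; several parents use
    '<link>.submitter_id#1', '<link>.submitter_id#2', and so on.
    """
    if len(parent_submitter_ids) == 1:
        return {f"{link_name}.submitter_id": parent_submitter_ids[0]}
    return {
        f"{link_name}.submitter_id#{i}": sid
        for i, sid in enumerate(parent_submitter_ids, start=1)
    }


# A subject that belongs to two studies (identifiers are made up for illustration).
columns = link_headers("studies", ["study-a", "study-b"])
print(columns)
```

The resulting dictionary can be merged into the row for that `subject` before the TSV is written.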
The `program` node has the property `name`, the `project` node has the property `code`, and every other node has the property `submitter_id`. These all serve the same basic function, which is to give that record a more human-readable call-name (the internal call-name is the UUID).
A project also has the property `project_id`, which is the unique combination of the program `name` and the project `code`. For example, if the program is `prog` and the project code is `proj`, then the `project_id` is the dash-separated combination `prog-proj`.
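The dash-separated rule can be written as a one-line helper. The function name `make_project_id` is an illustrative assumption; only the joining rule comes from the text.

```python
def make_project_id(program_name, project_code):
    """Join the program name and the project code with a dash."""
    return f"{program_name}-{project_code}"


project_id = make_project_id("prog", "proj")
print(project_id)  # prog-proj
```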
Here are some sample TSV file templates for example nodes in a Gen3 data dictionary.
See the "Dictionary" section of your Gen3 data commons for the complete list of nodes and their template TSVs.