Research component

Partial information about the contents of a research object's data DAG.

A component entity provides context to nodes in the data DAG of a research object. This can mean commenting on how a node is used, indicating its type of data, giving it a descriptive name, and attaching arbitrary metadata. The research object author uses instances of these entities to add context around important parts of the publication, such as a dataset, a paper, or a piece of code.

Other actors can also create component instances to enrich the context of a research object, or to include the same CIDs as part of their own publications. A user-facing gateway can choose how to reconcile these different sources of information, and can potentially use other authors' components to suggest metadata for similar files.

Since any actor can create instances of a research component, a gateway operator will likely only show components that were created (or accepted) by the DID that published the research object.
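The display policy and the metadata suggestions described above can be sketched as two small filters. This is a minimal sketch under stated assumptions: the schema does not list a creator field, so `author_did` is assumed to be the DID that created the component, and `component_id`, the `accepted` set, and the function names are hypothetical. How a publisher records acceptance is not specified here.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List


@dataclass(frozen=True)
class Component:
    component_id: str        # hypothetical identifier of the component instance
    author_did: str          # assumption: the DID that created the component
    research_object_id: str
    dag_node: str            # CID of the target node, kept as a plain string
    name: str


def visible_components(
    components: Iterable[Component],
    research_object_id: str,
    publisher_did: str,
    accepted: AbstractSet[str] = frozenset(),
) -> List[Component]:
    """Components a gateway would show for one research object.

    Keeps components created by the publishing DID, plus any whose
    component_id is in `accepted` (a hypothetical record of endorsements).
    """
    return [
        c for c in components
        if c.research_object_id == research_object_id
        and (c.author_did == publisher_did or c.component_id in accepted)
    ]


def suggested_names(
    components: Iterable[Component], publisher_did: str, dag_node: str
) -> List[str]:
    """Names other authors gave the same CID, offered only as suggestions."""
    return sorted({
        c.name for c in components
        if c.dag_node == dag_node and c.author_did != publisher_did
    })
```

Keeping suggestions separate from the visible set means other authors' components can inform the publisher without being displayed as if the publisher had written them.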

Schema

| Field | Type | Description |
| --- | --- | --- |
| name | String | Descriptive name of the component |
| mediaType | String | Media type indicating the type of data |
| researchObjectID | ID | Unique identifier of the target research object |
| researchObjectVersion | Commit | Unique identifier of the research object version |
| dagNode | CID | Target node inside the data DAG |
| pathToNode | String | The UnixFS path through the DAG (needed because the same CID can exist in more than one place) |
| metadata | CID | CID of a JSON representation of arbitrary metadata |
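The schema can be modelled as a plain record type. This is a sketch, not a reference implementation: field names follow the table, IDs, commits, and CIDs are kept as plain strings rather than parsed types, and the `type/subtype` check on `mediaType` is an assumption added for illustration. The class name `ResearchComponent` is hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ResearchComponent:
    # Field names mirror the schema table (hence camelCase). All identifiers
    # are plain strings here; a real implementation would parse CIDs etc.
    name: str
    mediaType: str
    researchObjectID: str
    researchObjectVersion: str
    dagNode: str
    pathToNode: str
    metadata: str  # CID pointing at a JSON metadata document

    def __post_init__(self) -> None:
        # Assumed sanity check: media types have the form "type/subtype".
        if "/" not in self.mediaType:
            raise ValueError("mediaType must look like 'type/subtype'")

    def to_json(self) -> str:
        # Deterministic serialization (sorted keys) of all schema fields.
        return json.dumps(asdict(self), sort_keys=True)
```

A usage example with placeholder values (none of these identifiers are real): `ResearchComponent(name="Survey results", mediaType="text/csv", researchObjectID="ro-123", researchObjectVersion="commit-abc", dagNode="bafyDataset", pathToNode="/data/survey.csv", metadata="bafyMetadata").to_json()`.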

Media type

To help gateways pick how to represent data, individual files should have information about their file type attached. This information isn't otherwise available, apart from the extension if one is included in the UnixFS filename field. There is a rich variety of media types to pick from, and a gateway can implement a fallback representation based on the top-level type and map specific viewers to individual subtypes. For example, a gateway could show text/* as regular text, but specialize the view of text/csv data as a table.
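The fallback strategy above amounts to a two-level lookup: try the exact `type/subtype` first, then fall back to the top-level type. The viewer names and mapping tables below are hypothetical placeholders for whatever renderers a gateway actually has.

```python
# Hypothetical viewer names; a gateway would map these to its own renderers.
SUBTYPE_VIEWERS = {
    "text/csv": "table",
    "application/pdf": "pdf",
}
TOP_LEVEL_VIEWERS = {
    "text": "plain-text",
    "image": "image",
    "audio": "audio",
    "video": "video",
}


def pick_viewer(media_type: str) -> str:
    # Drop parameters such as "; charset=utf-8"; type and subtype are
    # case-insensitive, so normalize to lower case before matching.
    essence = media_type.split(";", 1)[0].strip().lower()
    if essence in SUBTYPE_VIEWERS:
        return SUBTYPE_VIEWERS[essence]      # specialized viewer
    top_level = essence.split("/", 1)[0]
    return TOP_LEVEL_VIEWERS.get(top_level, "download")  # generic fallback
```

With these tables, `text/csv` gets the table view, `text/markdown` falls back to plain text, and an unmapped type such as `application/zip` is offered as a download.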

Since not all file types have a media type, a gateway can also draw conclusions from the file extension. Source code files in different programming languages are a good example. Conversely, not all media types have a single corresponding file extension, so both sources of information are needed to paint a rich picture of the content.
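One way to combine the two sources is to keep the component's `mediaType` for choosing the overall view and use the UnixFS filename extension only to add detail, such as a programming language for syntax highlighting. The extension table, the `application/octet-stream` default, and the function name `describe` are assumptions made for this sketch.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Hypothetical extension table for source code, where a registered media
# type is often missing or too generic (e.g. text/plain) to be useful.
CODE_EXTENSIONS = {
    ".py": "python",
    ".r": "r",
    ".jl": "julia",
    ".rs": "rust",
}


def describe(media_type: Optional[str], unixfs_name: Optional[str]) -> Dict[str, Optional[str]]:
    """Merge the component's mediaType with a hint from the filename extension."""
    ext = PurePosixPath(unixfs_name).suffix.lower() if unixfs_name else ""
    return {
        # Assumed default when no media type was attached to the component.
        "mediaType": media_type or "application/octet-stream",
        "language": CODE_EXTENSIONS.get(ext),
    }
```

For a file named `analysis.py` tagged only as `text/plain`, the media type still selects a text view while the extension supplies `python` as the language.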

Motivation of separation from research object

Separating component information from the research object creates a specialized dataset that simplifies reverse CID lookups, such as finding which research objects use a particular dataset or contain the same PDF. Consider the diagram at the top of the page, where we can go through component entities and find which research objects include a particular dataset in their publication.
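Because every component names both a `dagNode` and a `researchObjectID`, a reverse lookup is just an index from CID to the set of research objects that reference it. The sketch below reduces each component to that pair; the string values in the usage note are placeholders, not real CIDs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_reverse_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each dagNode CID to the research objects whose components use it.

    Each input pair is (dagNode, researchObjectID) taken from one component.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for dag_node, research_object_id in pairs:
        index[dag_node].add(research_object_id)
    return dict(index)


def research_objects_using(index: Dict[str, Set[str]], cid: str) -> List[str]:
    # Sorted for stable output; unknown CIDs simply yield an empty list.
    return sorted(index.get(cid, set()))
```

For example, indexing `[("bafyDataset", "ro-1"), ("bafyDataset", "ro-2"), ("bafyPaper", "ro-1")]` and querying `"bafyDataset"` returns both research objects, which is exactly the "who reuses this dataset" question.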

The reason this information is not embedded directly in the IPLD DAG is that doing so would change the CID: a CID is derived from the content it addresses, so adding metadata to a node produces a new CID. This would make it harder to trace reuse of data in other research objects, and would reduce the usefulness of IPFS deduplication on nodes, which leads to higher maintenance costs for the network.
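The effect on content addressing can be illustrated with a plain hash. This sketch uses SHA-256 as a simplified stand-in for a CID (real CIDs also encode a version, codec, and multihash prefix), and `embed_metadata` is a hypothetical wrapper format, not how IPLD actually stores nodes.

```python
import hashlib
import json


def content_id(data: bytes) -> str:
    # Simplified stand-in for a CID: like a real CID, it depends only on the bytes.
    return hashlib.sha256(data).hexdigest()


def embed_metadata(data: bytes, name: str) -> bytes:
    # Hypothetical: store the metadata inside the same node as the data.
    return json.dumps({"name": name, "data": data.hex()}).encode()


def make_component(data: bytes, name: str) -> dict:
    # Separate component: the data node stays untouched and is only referenced.
    return {"dagNode": content_id(data), "name": name}


dataset = b"id,value\n1,0.5\n2,0.7\n"
names = ("Survey", "Survey 2021")  # two authors describe the same data

embedded_ids = {content_id(embed_metadata(dataset, n)) for n in names}
component_ids = {make_component(dataset, n)["dagNode"] for n in names}

print(len(embedded_ids))   # 2: identical data, two identifiers, no deduplication
print(len(component_ids))  # 1: both components point at the same identifier
```

Embedding gives the same bytes a different identifier per description, while separate components leave a single identifier that can be deduplicated and traced across research objects.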
