Introducing Dataset Traceability

Henning Kuich

Nov 13, 2024

3 min

Verisian's code traceability is rooted in a concept called dataset traceability. This type of traceability captures every dataset imported, created, and exported during the execution of your programs. All datasets and their relationships are displayed in a dependency graph that extends across all files in your analysis. Our tools let you navigate all the way from TLFs, through ADaM and SDTM, to raw data inputs, clearly displaying upstream dependencies and downstream effects. The result is a complete, navigable, and transparent data lineage (Figure 1).

Figure 1: Example of Dataset Traceability. Try it in our demo.
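To make the idea of a dataset dependency graph concrete, here is a minimal sketch in Python. It is not how the Verisian Platform is implemented. The dataset names and the `DERIVED_FROM` mapping are made up for illustration, and a real analysis would derive the graph from the programs themselves.

```python
from collections import defaultdict

# Hypothetical lineage: each dataset maps to the datasets it is derived from.
DERIVED_FROM = {
    "sdtm.dm": ["raw.demog"],
    "sdtm.ae": ["raw.adverse"],
    "adam.adsl": ["sdtm.dm"],
    "adam.adae": ["sdtm.ae", "adam.adsl"],
    "tlf.t_ae_summary": ["adam.adae", "adam.adsl"],
}


def upstream(dataset, graph=DERIVED_FROM):
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = set()
    stack = list(graph.get(dataset, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


def downstream(dataset, graph=DERIVED_FROM):
    """Return every dataset affected by a change to `dataset`."""
    # Invert the edges, then reuse the same traversal in the other direction.
    derived_into = defaultdict(list)
    for target, sources in graph.items():
        for source in sources:
            derived_into[source].append(target)
    return upstream(dataset, derived_into)


print(sorted(upstream("tlf.t_ae_summary")))
print(sorted(downstream("sdtm.dm")))
```

In this toy graph, the upstream set of the summary table reaches back to both raw inputs, while the downstream set of `sdtm.dm` contains `adam.adsl`, `adam.adae`, and the table itself. These are the two directions of navigation described above.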

In the example shown in Figure 2 (from our demo study), manually tracing a result to its source requires a programmer to go through eight different files. With dataset traceability, we can automatically create a virtual file containing just the 1,010 lines of code required to create the dataset of interest, instantly providing the complete data lineage of any dataset in your analysis.

Figure 2: Dataset traceability from a result to source data. Explore our demo study here.
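The virtual file can be thought of as a slice of the analysis code: only the steps that contribute to the dataset of interest, kept in execution order. The sketch below illustrates that idea under strong simplifying assumptions. The `Step` record, the file names, and the one-line SAS snippets are hypothetical, each step is assumed to write exactly one dataset, and the steps are assumed to be listed in the order they run. It is not Verisian's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    file: str      # program the step comes from (hypothetical names)
    code: str      # source text of the step
    inputs: tuple  # datasets the step reads
    output: str    # dataset the step writes


# Hypothetical analysis, listed in execution order.
STEPS = [
    Step("sdtm_dm.sas", "data sdtm.dm; set raw.demog; run;", ("raw.demog",), "sdtm.dm"),
    Step("sdtm_vs.sas", "data sdtm.vs; set raw.vitals; run;", ("raw.vitals",), "sdtm.vs"),
    Step("adsl.sas", "data adam.adsl; set sdtm.dm; run;", ("sdtm.dm",), "adam.adsl"),
    Step("advs.sas", "data adam.advs; merge sdtm.vs adam.adsl; by usubjid; run;",
         ("sdtm.vs", "adam.adsl"), "adam.advs"),
    Step("t_demog.sas", "data tlf.t_demog; set adam.adsl; run;", ("adam.adsl",), "tlf.t_demog"),
]


def slice_for(target, steps=STEPS):
    """Return only the steps needed to create `target`, in execution order."""
    needed = {target}
    kept = []
    # Walk backwards: a step is kept if it writes a dataset we still need,
    # and its inputs then become needed as well.
    for step in reversed(steps):
        if step.output in needed:
            kept.append(step)
            needed.update(step.inputs)
    return list(reversed(kept))


def virtual_file(target, steps=STEPS):
    """Concatenate the kept steps, noting the file each one came from."""
    parts = [f"/* from {s.file} */\n{s.code}" for s in slice_for(target, steps)]
    return "\n\n".join(parts)


print(virtual_file("tlf.t_demog"))
```

For `tlf.t_demog`, the slice keeps the steps from `sdtm_dm.sas`, `adsl.sas`, and `t_demog.sas` and drops the two vital-signs steps, which do not contribute to the table. The backward walk is deliberately conservative: if several steps write the same dataset, all of them are kept.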

Besides dataset traceability, Verisian also provides value and variable traceability. To learn more about this, and about how we combine traceability with AI to help statistical programmers, have a look at some of our other articles (see Explore Further below).

Finally, you can try out the Verisian Platform yourself with our demo study.

Explore Further

Traceability and AI for Better Understanding, Communication, and QC of Trials

The Limitations and Opportunities of Large Language Models

Introducing the Verisian Community