AI-based SDTM Define-xml Automation

Henning Kuich

Oct 11, 2024

Today’s pharmaceutical industry has come to rely on metadata-based approaches to generate much of the define-xml. However, one critical part still requires considerable manual effort: the computational methods section, in which programmers describe, in natural language, how each derivation was performed. This work is time-consuming and highly prone to human error. Specifications are sometimes copied into the define-xml to describe the derivations, but there is no guarantee that they match the actual analysis code or provide enough detail for reviewers to assess their validity.

Consistency between define-xml, analysis code, and other submission documents is essential for timely and successful regulatory review. Discrepancies can trigger delays and raise concerns about the integrity of the study, and as long as the process remains manual, that risk stays with every submission. Automating define-xml creation enables sponsors to remove this source of human error, keep documentation consistent with the code, and remain submission ready at all times.

Even though the rise of Large Language Models (LLMs) has led to attempts to automate this process, their inconsistent output and tendency to hallucinate have prevented their adoption across the industry thus far.

At Verisian, we are building a solution that fully automates define-xml generation. Based on our code traceability, we have developed an AI system that auto-generates computational method descriptions in natural language, in addition to providing variable-level traces for every single CDISC variable. We have already deployed this in a live study with Lindus Health, where the system has created 172 computational methods across 15 SDTM domains in minutes.

At the press of a button, you can now generate define-xml in real time that is guaranteed to be consistent with your analysis code, ensuring submission readiness. Any changes made to the study’s derivations are instantly reflected, eliminating the risk of discrepancies.

To use our system, upload the SAS logs generated during the execution of your analysis code to our platform. From these logs, our system extracts the code that was used to derive each variable, generating a complete and deterministic representation of its functionality. Our proprietary AI system then summarizes this representation into a concise description of the computational process. Because the code extraction is deterministic, we achieve a high level of confidence in both accuracy and consistency. Reviewers get the best of both worlds: a natural language summary for orientation and the executed code for detailed questions. This reduces follow-up inquiries and gives sponsors and reviewers greater confidence in the submission (Figures 1-6).
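To make the idea of log-based extraction concrete, here is a minimal Python sketch. It is not Verisian's implementation, which is proprietary and not described in detail here. It assumes that the SAS log echoes each submitted source line with a leading line number, and it ignores page headers, macro-generated (MPRINT) code, and semicolons inside string literals. The function names, the sample log, and the variable names in it are hypothetical and exist only for illustration.

```python
import re

# Assumption: each submitted source line is echoed to the log with a leading
# line number, while NOTE/WARNING/ERROR lines have no such prefix.
SOURCE_LINE = re.compile(r"^\d+\s+(\S.*)$")


def extract_source(log_text):
    """Return the echoed source statements from a SAS log, in order."""
    echoed = []
    for line in log_text.splitlines():
        match = SOURCE_LINE.match(line)
        if match:
            echoed.append(match.group(1).strip())
    # Simplification: split on ';' and ignore semicolons inside string literals.
    code = " ".join(echoed)
    return [part.strip() + ";" for part in code.split(";") if part.strip()]


def statements_assigning(statements, variable):
    """Keep statements that assign to `variable`, directly or after THEN/ELSE."""
    pattern = re.compile(
        rf"(?:^|\bthen\s+|\belse\s+){re.escape(variable)}\s*=", re.IGNORECASE
    )
    return [s for s in statements if pattern.search(s)]


# Made-up log excerpt for illustration only.
SAMPLE_LOG = "\n".join([
    "1          data ae;",
    "2             set raw.ae;",
    "3             if aesdth = 'Y' or aeshosp = 'Y' then aeser = 'Y';",
    "4             else aeser = 'N';",
    "5          run;",
    "NOTE: The data set WORK.AE was created.",
])

for statement in statements_assigning(extract_source(SAMPLE_LOG), "AESER"):
    print(statement)
```

Because this kind of extraction is plain text processing, the same log always yields the same statements. That is the property that makes a deterministic extraction step a useful anchor for an AI-generated summary.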

Figure 1: AESER Origin/Source/Method/Comment section in the AE (Adverse Events) table. Each variable displays the AI-generated summary together with a link to the computational method for the variable.

Figure 2: Computational method entry for AE.AESER from our demo study.

Figure 3: Computational method entries for RFSTDTC and RFENDTC from our demo study.

Figure 4: Computational method entry for LBNRIND from our demo study.

Figure 5: Computational method entry for LBDY from our demo study.

Figure 6: Computational method entry for XPDTC from our demo study.
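For readers less familiar with the underlying format, the sketch below shows how a computational method entry that pairs a natural language summary with the executed code could be written as a Define-XML `MethodDef` element, using Python's standard `xml.etree.ElementTree`. The OID, name, summary text, and SAS snippet are hypothetical placeholders, not output from our system, and a real define-xml needs the surrounding ODM structure that is omitted here.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Serialize ODM elements without a prefix (default namespace).
ET.register_namespace("", ODM_NS)


def build_method_def(oid, name, summary, sas_code):
    """Build a MethodDef with a text description and the SAS code behind it."""
    method = ET.Element(
        f"{{{ODM_NS}}}MethodDef",
        {"OID": oid, "Name": name, "Type": "Computation"},
    )
    description = ET.SubElement(method, f"{{{ODM_NS}}}Description")
    text = ET.SubElement(
        description, f"{{{ODM_NS}}}TranslatedText", {XML_LANG: "en"}
    )
    text.text = summary
    # FormalExpression carries the machine-readable form of the derivation.
    expression = ET.SubElement(
        method, f"{{{ODM_NS}}}FormalExpression", {"Context": "SAS"}
    )
    expression.text = sas_code
    return method


# Hypothetical example values, for illustration only.
method = build_method_def(
    oid="MT.AE.AESER",
    name="Derivation of AESER",
    summary="AESER is 'Y' if the event was fatal or led to hospitalization, else 'N'.",
    sas_code="if aesdth = 'Y' or aeshosp = 'Y' then aeser = 'Y'; else aeser = 'N';",
)
print(ET.tostring(method, encoding="unicode"))
```

Keeping the summary in `Description` and the code in `FormalExpression` means a reviewer can read the prose first and check it against the code in the same entry.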

If you want to see more samples, you can explore the SDTM define-xml generated by our system for our demo study here.

We are leveraging this technology for far more than define-xml automation. Our platform validates that your analysis code adheres to specifications, verifies result outputs, and ensures traceability across entire studies. Paired with our AI system, it will also generate additional submission documentation, such as the reviewer’s guides (SDRG and ADRG). If you would like to learn more about how this can benefit your biometrics team and improve submission readiness, do not hesitate to get in touch.
