
Generative AI Bill of Materials (GBOM)™: A Primer

A Generative AI Bill of Materials (GBOM)™ details an organization's use of GenAI-generated code just as a Software Bill of Materials (SBOM) details its Open Source usage. Use a GBOM to manage the legal, compliance, vendor, regulatory, and team risks that GenAI code presents.

Nov 8, 2023
10 min read

The adoption of Generative AI tools by developers for software creation has undeniably introduced a wave of innovation and efficiency, offering a multitude of benefits to organizations. However, this technological leap forward is not without its inherent complexities and potential pitfalls.

While Generative AI holds immense promise, it poses significant challenges across various domains, including legal, compliance, vendor relationships, and team dynamics. These challenges can encompass concerns related to intellectual property rights, data privacy regulations, contractual obligations, and the need for effective collaboration within cross-functional teams.

The Generative AI Bill of Materials (GBOM)™ serves as a critical mechanism for organizations to proactively address and mitigate these risks.

What is a Generative AI Bill of Materials (GBOM)™?

A Generative AI Bill of Materials (GBOM)™ is a comprehensive list of all of the code in an organization that is created in full or in part by Generative AI tools. It’s a strategic framework designed to help organizations navigate Generative AI terrain with confidence and foresight.

Generative AI tools like ChatGPT and code-specific tools like GitHub Copilot help developers draft code, complete code reviews, create tests, and brainstorm new coding approaches. The GBOM itemizes code created or modified with these tools, provides automated contextual information to understand the risks, and structures organizations’ triaging and disposition of the potential risks.

By providing a structured framework for risk disposition, a GBOM empowers organizations to navigate the complex terrain of Generative AI code confidently, ensuring both operational excellence and risk mitigation in the rapidly evolving world of AI-driven software development.

Lessons from Open Source Software Bill of Materials (SBOM)

A GBOM is analogous to a Software Bill of Materials (SBOM). A GBOM is needed to manage the same sets of risks as an SBOM addresses, while also addressing risks unique to the use of Generative AI code. Diving into SBOMs can, therefore, provide some context on the need for a GBOM.

A Software Bill of Materials is a comprehensive list of the third-party code that a codebase includes. The most common form of third-party code is Open Source code. Developers go to Open Source repositories, most commonly on GitHub, and copy or reference that code for their own work.

It is appropriate and expected for developers to use Open Source code: it saves time by not having to “reinvent the wheel” on components that are not the core intellectual property or “secret sauce” of the organization. As a result, an estimated 70-90% of codebases today contain Open Source code, and some estimates run as high as 97%.

It is now standard for medium-sized and larger companies, as well as companies going through an M&A transaction, to have an SBOM. Why? The benefits of Open Source code are accompanied by significant risks and requirements, including:

  1. Security risk. Open Source code contains significant security risks, such as the Log4j (“Log4Shell”) incident disclosed in late 2021. The good news is that the Open Source community tracks the security risks, publishes the warnings (CVEs), and prioritizes upgrades to address those risks. Perhaps because of the collective responsibility for this code, Open Source code is generally safer from a security perspective than in-house code. Sema’s research on $1T+ worth of software organizations found that organizations have 15 times more security risks in the code they wrote than from Open Source.
  2. Legal risk. Open Source code can come with legal risk, specifically that software users are not following the license requirements of that code. For example, the most restrictive class of Open Source licenses, “Copyleft”, can require users who distribute software built on that code to make their own source code freely available under the same terms. This legal risk is rarely enforced operationally but must be checked and mitigated during transactions. Being in compliance with Open Source licenses is effectively a requirement of technical due diligence.
  3. Vendor compliance. Sophisticated software purchasers such as financial services institutions and health care organizations require their vendors to provide an SBOM to identify Open Source code used in creating their software, in light of the risks described above.
  4. Regulatory compliance. In the US, for example, a Software Bill of Materials is required for vendors doing business with federal agencies following a 2021 Executive Order.

Why do companies need a GBOM?

A Generative AI Bill of Materials is not just a compliance tool; it's a strategic imperative for companies in the digital age. Companies need it for several crucial reasons.

While Generative AI is still in its early stages of regulation compared to Open Source Software, it is evident that regulatory attention is on the rise. Governments and regulatory bodies are increasingly focusing on AI systems due to ethical, accountability, and risk-related concerns.

Furthermore, as companies integrate Generative AI into their operations, they face amplified risks, including data privacy, security, bias, and unintended consequences. A GBOM is essential for identifying, quantifying, and mitigating these risks.

Beyond compliance, it demonstrates transparency and responsible AI practices, and builds trust with customers, partners, and stakeholders.

The overlapping risks include:

  1. Security risk. Generative AI code also contains security risks. Snyk’s recent survey of IT professionals found that 59% raised fears that AI-generated code will introduce security risks based on the original training sets. Scientific research backs up that finding: a Stanford study found that “participants who had access to an AI assistant wrote significantly less secure code than those without access” while believing their code was safer. 
  2. Legal risk. New GenAI code may not receive copyright protection if it is not sufficiently modified by humans: in other words, if there is too much “Pure” GenAI (copied-and-pasted) code rather than “Blended” GenAI code (derived or modified). GenAI code can also infringe on existing copyrights in the code from the training data. In both cases, there is a strong imperative to ensure that customer-facing GenAI code (“distributed” in the Open Source / SBOM sense) is “Blended”, not “Pure”, from the outset (a minimal sketch of this check follows the list).
  3. Vendor compliance. We are already hearing of large software purchasers planning Generative AI disclosure requirements for their vendors, for the reasons above.
  4. Regulatory compliance. US regulations are being developed as of this writing. The new Executive Order may require affirmative disclosure of material created with Generative AI, such as via an electronic watermark. Individual departments and agencies such as the US Navy and Space Force are beginning to regulate or prevent the use of Generative AI tools, too.
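
To make that last legal imperative concrete, here is a minimal sketch, in Python, of how an organization might flag customer-facing files made up mostly of unmodified GenAI code. The GenAIFile record, its field names, and the 90% “Pure” threshold are illustrative assumptions rather than an established standard.

```python
from dataclasses import dataclass

@dataclass
class GenAIFile:
    """Hypothetical per-file record from a GenAI code detection tool."""
    path: str
    genai_percentage: float  # share of the file attributed to GenAI, 0-100
    distributed: bool        # True if the file ships in customer-facing software

def flag_pure_distributed(files: list[GenAIFile],
                          pure_threshold: float = 90.0) -> list[GenAIFile]:
    """Return customer-facing files that are mostly "Pure" (unmodified) GenAI code.

    The 90% threshold is an illustrative assumption, not a legal standard.
    """
    return [f for f in files
            if f.distributed and f.genai_percentage >= pure_threshold]
```

Files returned by a check like this would be candidates for human modification (“blending”) before the code is distributed.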

In addition to the 4 overlapping sets of risks between Open Source and Generative AI code, there is a 5th risk category that is particularly relevant for GenAI code: Team risks.

These team risks take several different forms, including:

  • Developers may not be using GenAI enough and, therefore, not capturing the benefits of a tool that offers substantial productivity improvements;
  • Developers may be using GenAI too much, to the point where there is no longer subject matter expertise in the organization’s code; and
  • Developers may be using Generative AI code in a way that generates additional legal risks, such as an emphasis on “Pure” GenAI code rather than “Blended”.

What is included in a Generative AI Bill of Materials (GBOM)™?

The GBOM is a structured document that provides comprehensive and detailed information regarding the incorporation of GenAI code within a codebase to monitor and manage 5 distinct sets of risks. It includes the following fields (a minimal schema sketch follows the list):

  1. The name of the file with Generative AI;
  2. Its location in the codebase. This matters because different code uses will undoubtedly receive different legal and regulatory treatments, just like “Distributed” vs. “Non-Distributed” code in the Open Source context;
  3. Confidence level that the code is AI-generated. The longer the snippet of code, the higher the confidence that it is AI-generated. It is likely that stakeholders will focus on higher-confidence code, just as security warnings are triaged;
  4. Percentage of GenAI code. In other words, how “Blended” vs. “Pure” is the code. As discussed above, this distinction will likely matter significantly for copyright protection;
  5. Date of code addition and date of code modification. Organizations will use this information to assess the robustness of their GenAI management program. For example, how has the balance between “Pure” and “Blended” GenAI code changed over time;
  6. Contributing developers. The coders who added and modified the code. Developers will need to be involved to help with follow-on triaging in events such as technical due diligence;
  7. Disposition. In certain compliance / legal situations, Code Owners will be expected to have reviewed and triaged the GenAI code for risks, just like they are asked to do today reviewing SBOMs. Dispositions include “No Legal risk - internal tooling” or “Remediation Necessary”;
  8. Next steps. Proper GBOMs should also include work tracking to keep track of the follow-up tasks from the disposition.
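
To make these fields concrete, here is a minimal sketch of how a single GBOM entry might be modeled in Python. The field names, types, and disposition values are illustrative assumptions; no standard GBOM schema exists as of this writing.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum

class Disposition(Enum):
    """Example disposition values; real taxonomies will vary by organization."""
    NO_LEGAL_RISK_INTERNAL_TOOLING = "No Legal risk - internal tooling"
    REMEDIATION_NECESSARY = "Remediation Necessary"
    PENDING_REVIEW = "Pending review"

@dataclass
class GBOMEntry:
    """One row of a GBOM, mirroring the eight fields described above."""
    file_name: str              # 1. file containing Generative AI code
    location: str               # 2. path in the codebase (distributed vs. internal matters)
    confidence: float           # 3. confidence the code is AI-generated, 0-1
    genai_percentage: float     # 4. 0-100; low = "Blended", high = "Pure"
    added_on: datetime          # 5. date the code was added...
    last_modified_on: datetime  #    ...and date it was last modified
    contributing_developers: list[str] = field(default_factory=list)  # 6.
    disposition: Disposition = Disposition.PENDING_REVIEW             # 7.
    next_steps: list[str] = field(default_factory=list)  # 8. follow-up tasks
```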

Sample GBOM excerpt 

Here are two example entries from a GBOM for illustration:
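
The excerpt below is a hypothetical illustration built on the GBOMEntry and Disposition sketch above; the file names, paths, dates, confidence values, and developer handles are invented, while the GenAI percentages, modification times, and dispositions match the explanation that follows.

```python
from datetime import datetime, timedelta

added = datetime(2023, 11, 1, 9, 0)

# Example 1: entirely "Pure" GenAI code, never modified, internal tooling only.
example_1 = GBOMEntry(
    file_name="build_helpers.py",
    location="tools/internal/build_helpers.py",
    confidence=0.95,
    genai_percentage=100.0,
    added_on=added,
    last_modified_on=added,  # unmodified since inclusion
    contributing_developers=["dev_a"],
    disposition=Disposition.NO_LEGAL_RISK_INTERNAL_TOOLING,
)

# Example 2: "Blended" code (10% GenAI), modified one hour after inclusion,
# used in customer-facing Financial Services software.
example_2 = GBOMEntry(
    file_name="payment_router.py",
    location="services/payments/payment_router.py",
    confidence=0.95,
    genai_percentage=10.0,
    added_on=added,
    last_modified_on=added + timedelta(hours=1),
    contributing_developers=["dev_b", "dev_c"],
    disposition=Disposition.REMEDIATION_NECESSARY,
    next_steps=["Reduce total GenAI share to meet the vendor's composition limit"],
)
```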

Both of these examples are flagged with high confidence as generated at least in part by GenAI.

Explanation:

  • In the first example, the GBOM indicates that the code is entirely GenAI (“Pure”) and has not been modified.
  • In the second example, the code was modified one hour after it was included and is only 10% GenAI (“Blended”).
  • The first example of code is only used by the internal development organization, so it is very unlikely that the legal / compliance risks described above will apply.
  • The second example, however, is code used in Financial Services. Even though this particular code is “Blended”, the organization needs to take additional steps to lower the total amount of GenAI code to comply with a vendor’s requirement on total GenAI composition.

Conclusion

While Generative AI regulation and compliance are currently in an early stage, it is imperative to recognize the undeniable course toward increased scrutiny and oversight. When compared to the well-established regulatory frameworks governing Open Source code, Generative AI presents a unique set of challenges encompassing security, vendor relationships, legal considerations, regulatory compliance, and team dynamics. These challenges are poised to grow in importance and complexity, eventually matching or even surpassing the risks associated with Open Source code.

To proactively address these risks, adopting a GenAI Bill of Materials (GBOM) from the outset is the wisest course of action. By leveraging it, organizations can gain a better understanding of their AI landscape, identify vulnerabilities, and devise mitigation strategies before issues escalate. This proactive approach not only safeguards against regulatory pitfalls but also cultivates a culture of responsible AI governance and risk management.

In a world where the stakes for AI-driven decisions are growing, a GBOM stands as a safeguard, ensuring that companies remain well-prepared and resilient in the face of both present and future challenges.


Are you ready?

Sema is now accepting pre-orders for GBOMs as part of the AI Code Monitor.
