
Introducing GenAI to software development: Measurement techniques

Apr 26, 2024
5 min read

When introducing GenAI to your software development lifecycle (SDLC), you’ll need to run a tight ship that keeps your engineering organization in alignment and in compliance. That means understanding where and how, exactly, GenAI code is being utilized.

With a shared set of KPIs, technical organizations can work with greater cohesion to harness the productivity benefits of GenAI while preventing potential problems.

  • CTOs can communicate with a greater degree of clarity, certainty, and accuracy across the c-suite, with general counsel, and with board members.
  • VPEs can more effectively align engineering teams around executive level objectives.
  • Engineering directors gain resources to actively establish, shepherd, and course-correct momentum towards GenAI code adoption milestones.
  • Engineering managers gain support to guide their teams through best practices for writing high-quality code.

The following measurement techniques will help you maintain high standards of code quality.

1. GenAI usage relative to standards

Organizations differ in their standards for how and where GenAI code should be used.

At any given time, anyone responsible for your codebase should be able to answer the question, “Is the share of AI-generated code within the appropriate thresholds for my organization?” In some situations, GenAI code should not be used at all.

While standards will be organization specific, Sema has established a set of best practices that we publish periodically (last update: April 15) as a starting point.

Standards Part 1 of 2: GenAI-Originated Code

Definition: 

  • Included: Code that originated with a GenAI tool, as opposed to created by in-house developers.
  • Not included: Code written in house, copied from an external source (such as open source projects or Google/Stack Overflow), or automatically generated.

Standards:

  • Strength: 5-10% of the codebase
  • Low Risk: <5%, or 10-20%
  • Medium Risk: 20-50%
  • High Risk: >50%

Standards Part 2 of 2: Pure GenAI Code

Definition: 

  • Code that originated with a GenAI tool and was not modified by developers afterwards.
  • By contrast: Blended GenAI code was modified by a developer.

Standards:

  • Strength: <10% of the codebase
  • Low Risk: 10-15%
  • Medium Risk: 15-25%
  • High Risk: >25%

The process to monitor usage against standards should be straightforward. 
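As a rough illustration, the bands above can be expressed as a pair of threshold functions. This is a sketch only, not Sema's implementation; the function names are hypothetical, and the percentage bands are the ones published in the standards above.

```python
def classify_originated(pct: float) -> str:
    """Classify GenAI-originated code share (percent of codebase)
    against the published bands: Strength 5-10%; Low Risk <5% or
    10-20%; Medium Risk 20-50%; High Risk >50%."""
    if 5 <= pct <= 10:
        return "Strength"
    if pct < 5 or pct <= 20:
        return "Low Risk"
    if pct <= 50:
        return "Medium Risk"
    return "High Risk"


def classify_pure(pct: float) -> str:
    """Classify pure (unmodified) GenAI code share against the
    published bands: Strength <10%; Low Risk 10-15%; Medium Risk
    15-25%; High Risk >25%."""
    if pct < 10:
        return "Strength"
    if pct <= 15:
        return "Low Risk"
    if pct <= 25:
        return "Medium Risk"
    return "High Risk"
```

For example, a codebase that is 7% GenAI-originated falls in the Strength band, while one that is 30% pure GenAI would be flagged High Risk.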

2. GenAI code usage by repository

Code is a craftsperson’s job, and the quality of what’s produced depends on the situation for which that code is necessary. Measuring GenAI code at the repository level provides insight into how that code is being used relative to the broader context of the organization’s codebase — and the parts of the codebase on which the Large Language Model (LLM) has been trained.

This level of precision can guide code reviews and troubleshooting protocols, particularly around nuanced issues that can escalate into bigger-picture technical debt problems.
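To make repository-level measurement concrete, here is a minimal sketch that computes each repository's GenAI share, assuming you already have per-repository counts of GenAI-tagged lines and total lines (the repository names and counts below are hypothetical, as is the data source).

```python
# Hypothetical line counts per repository, e.g. exported
# from a code-provenance scanning tool.
repo_line_counts = {
    "payments-service": {"genai": 1200, "total": 18000},
    "internal-tools": {"genai": 4500, "total": 9000},
}


def genai_share_by_repo(counts: dict) -> dict:
    """Return each repository's GenAI code share as a percentage
    of its total lines, skipping empty repositories."""
    return {
        repo: 100 * c["genai"] / c["total"]
        for repo, c in counts.items()
        if c["total"] > 0
    }
```

Comparing these per-repository percentages against the organization-wide thresholds shows where GenAI usage is concentrated — here, `internal-tools` sits at 50% while `payments-service` sits under 7%.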

3. GenAI code usage by developer

Life at work, for developers, is a human experience.

Solving code quality problems, pragmatically speaking, is an educational exercise that requires shepherding, guidance, encouragement, communication, and clear progression towards organizational goals.

The key is to treat this effort as an educational exercise: establish clear expectations, and empower developers with the policies and tools to achieve an end goal.

Here are some practical and tactical recommendations that you can follow as an engineering leader. Depending on your industry, it may make sense to ban the use of GenAI code until a clear set of developer policies is in place. Depending on the size and complexity of your organization, these protocols may include:

  • Specific, tactical recommendations for every level of your engineering organization (directors, managers, developers, specialist responsibilities within teams, etc.)
  • Touchpoints to loop in cross-functional technologists within your organization.
  • Pair programming, mentorship, and teaching protocols to build a stronger sense of alignment between junior and more experienced developers.
  • Systems to proactively mitigate against legal, regulatory, and compliance risks with respect to legislation that has the potential to impact your organization.

Getting started

Measurement starts with implementing the right systems. Take a look below to see how Sema’s AI Code Monitor (AICM) helps.

Keeping track of global GenAI compliance standards 

Periodically, Sema publishes a no-cost newsletter covering new developments in GenAI code compliance. The newsletter shares snapshots and excerpts from Sema’s GenAI Code Compliance Database. Topics include recent highlights of regulations, lawsuits, stakeholder requirements, mandatory standards, and optional compliance standards. The scope is global.

You can sign up to receive the newsletter here.

About Sema Technologies, Inc. 

Sema is the leader in comprehensive codebase scans with over $1T of enterprise software organizations evaluated to inform our dataset. We are now accepting pre-orders for AI Code Monitor, which translates compliance standards into “traffic light warnings” for CTOs leading fast-paced and highly productive engineering teams. You can learn more about our solution by contacting us here.

Disclosure

Sema publications should not be construed as legal advice on any specific facts or circumstances. The contents are intended for general information purposes only. To request reprint permission for any of our publications, please use our “Contact Us” form. The availability of this publication is not intended to create, and receipt of it does not constitute, an attorney-client relationship. The views set forth herein are the personal views of the authors and do not necessarily reflect those of the Firm.

Are you ready?

Sema is now accepting pre-orders for GBOMs as part of the AI Code Monitor.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.