
Help Center



Team Module -- What does "Days Since Last Commit" measure and how are risk levels assessed?

Days since last commit measures the number of days from the day that Sema received code access to the last time a developer committed to any repository. It does not measure the days since the last commit to that particular repository.

 

A developer is considered active if they have made at least one commit to any repository in the last 90 days, measured from the date that Sema received code access. Active developers are likely part of the organization, and are marked Green for strength.

 

If a developer has not committed within 90 days, they are likely not part of the organization, and are marked Red for risk.
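The 90-day rule above can be sketched as follows (a minimal illustration, not Sema's implementation):

```python
from datetime import date

# Illustrative sketch of the 90-day activity rule. `access_date` is the
# date Sema received code access; `last_commit` is the developer's most
# recent commit to ANY repository in scope.
ACTIVE_WINDOW_DAYS = 90

def risk_level(last_commit: date, access_date: date) -> str:
    """Return 'Green' (active) or 'Red' (at risk) per the 90-day rule."""
    days_since = (access_date - last_commit).days
    return "Green" if days_since <= ACTIVE_WINDOW_DAYS else "Red"
```

For example, a developer whose last commit was 31 days before the access date is marked Green; one whose last commit was over a year ago is marked Red.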

 

An interview is required to determine whether active developers are actually no longer part of the organization (marked Green but should be Red) because they left the organization within the last 90 days, and vice-versa (marked Red but should be Green) because they became non-coding members of the team (such as leadership or Product).

 

The risk assessment is focused on whether subject matter expertise is still "in the building" -- it is a standard designed for due diligence.

 

90 days was selected as the standard because:

- Some release cycles can be longer, and involve planning that requires longer than 30 days.

- Some managers may not make more than a few commits over the course of a quarter.

 

This risk assessment is not based on optimal development practice. In general, Sema recommends that engineers commit at least several times per week.

Tagging the AWS Account / 2nd-level data request

If a Code Owner has already tagged their accounts in AWS, does this avoid the need for a second data collection effort?

In AWS there are two levels of tags relevant here: account-level and resource-level (e.g. an individual server).

If their accounts are tagged by customer and product, we can handle this easily without a CSV file. If their resources are tagged by customer and product, we can also handle it, but the likelihood of missing tags is rather high, so the process would be somewhat limited.
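For the resource-level case, a hypothetical pre-check could report which resources lack the needed tags (the "customer" and "product" tag keys follow the text; everything else is illustrative):

```python
# Hypothetical pre-check: given a map of resource id -> tag dict,
# report resources missing the required tags. Not a Sema tool; a
# sketch of how missing resource-level tagging could be surfaced.
def missing_tags(resources: dict, required=("customer", "product")) -> list:
    return [rid for rid, tags in resources.items()
            if not all(key in tags for key in required)]
```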

Dark Web Botnet Leaks -- what information gets on the list?

The Dark Web Botnet Leaks sub-module of the Cyber Security Module provides individuals' credentials, passwords, and personal information that was released to the Dark Web through a security breach such as a botnet/malware attack. Note this scope is more than just credentials and passwords.


The only way for information about a person to get on to that report is if someone associated with the scanned company had access to the information. This includes:
* Employees
* Customers
* Suppliers


It may not be obvious at first what the connection is to the scanned company, but there is one.

Lines of Code

The Sema scan performs a static analysis of all files in each repository, not just executable code.

Sema uses Linguist for its primary line count. This is the same tool GitHub uses to classify code.

This tool excludes Vendor Files and Documentation so only In-House lines of code are counted.

If there are no line breaks in the code, we count a line for every 200 characters.
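The 200-character rule can be illustrated with a rough counter (an approximation for illustration only, not Sema's actual Linguist-based count):

```python
def count_lines(text: str, max_chars: int = 200) -> int:
    """Count lines, treating each 200-character chunk of an unbroken
    run as one line (illustrative approximation of the rule above)."""
    total = 0
    for line in text.split("\n"):
        # a physical line of n chars counts as ceil(n / max_chars)
        # lines, with a minimum of 1
        total += max(1, -(-len(line) // max_chars))
    return total
```

For example, a single unbroken 450-character line counts as 3 lines under this rule.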

Reps and Warranties Insurance

It is very common that when Reps and Warranties Insurance (RWI) is pursued, a Sema scan is provided to the Insurer. A Sema representative frequently joins the call with the Insurer, too.

Why?

Specifically, three of Sema’s modules speak to risk that RWI Insurers are interested in:

1. Intellectual property risk – i.e. CopyLeft risk

2. Code Security—is the code written in a way that lends itself to information loss / hacking

3. Cyber Security—outside in scan of the system’s security

RWI insurers typically appreciate seeing the level of thoroughness that a Sema scan provides, via a 100+ page document.

Does the cyber security scan trigger alerts on the Code Owner’s system?

The cyber security scan sends a small number of data requests, in the form of data packets, to all TCP and UDP ports of the enumerated subdomains.

“Small” means 3 packets per asset per second are sent.

These data requests to the subdomains are automatically sent to the IP address. 

If a Code Owner has a very small number of subdomains (for example, fewer than five), the Code Owner's infrastructure monitoring tool will likely notice the 3 packets/asset/second traffic from the cyber scan. Having so few subdomains is not a recommended practice, and in this situation Sema recommends increasing the number of subdomains to ensure general availability to customers.

What is Third Party Code / Vendor Code?

Sema separates the code in a codebase into three types:

- Documentation is explanations or instructions for developers

- In-House code is code that the organization created itself, whether by employees or contractors

- Vendor code is code created by another organization. It is the same thing as Third Party code. This includes commercial code, where you pay to use the code, and Open Source code, which is publicly accessible... but comes with risks.

Unit test coverage standards

General

   

------------------------------------------------------------------------------
                              Age of Organization
------------------------------------------------------------------------------
Unit test coverage    0-2 years            3-5 years            5+ years
------------------------------------------------------------------------------
0-20%                 Good                 Low                  Very low
20-40%                Good                 Good                 Low
40-60%                Likely excessive     Good                 Good
60-80%                Likely excessive     Likely excessive     Very good
80-100%               Likely excessive     Likely excessive     Likely excessive
------------------------------------------------------------------------------
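The General grid above can be expressed as a simple lookup. Note the band-boundary assignment (e.g. whether exactly 5 years counts as "3-5" or "5+") is an assumption, since the table does not specify it:

```python
# Sketch of the "General" grid as a lookup table.
GENERAL_GRID = {
    "0-20%":   {"0-2": "Good",             "3-5": "Low",              "5+": "Very low"},
    "20-40%":  {"0-2": "Good",             "3-5": "Good",             "5+": "Low"},
    "40-60%":  {"0-2": "Likely excessive", "3-5": "Good",             "5+": "Good"},
    "60-80%":  {"0-2": "Likely excessive", "3-5": "Likely excessive", "5+": "Very good"},
    "80-100%": {"0-2": "Likely excessive", "3-5": "Likely excessive", "5+": "Likely excessive"},
}

def coverage_band(pct: float) -> str:
    bands = ["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"]
    return bands[min(int(pct // 20), 4)]

def age_band(years: float) -> str:
    # boundary assignment is an assumption
    return "0-2" if years <= 2 else ("3-5" if years < 5 else "5+")

def rate_coverage(pct: float, years: float) -> str:
    return GENERAL_GRID[coverage_band(pct)][age_band(years)]
```

For example, 15% coverage at a 1-year-old organization rates "Good", while 70% coverage at a 6-year-old organization rates "Very good".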

Health and Safety Organizations

Health and Safety organizations create software that can impact the physical well-being of individuals -- such as medical software or transportation/traffic control software. Testing standards are higher for these types of organizations.

------------------------------------------------------------------------------
                              Age of Organization
------------------------------------------------------------------------------
Unit test coverage    0-2 years            3-5 years            5+ years
------------------------------------------------------------------------------
0-20%                 Very low             Very low             Very low
20-40%                Very low             Very low             Very low
40-60%                Low                  Low                  Low
60-80%                Good                 Low                  Low
80-100%               Good                 Good                 Good
------------------------------------------------------------------------------

Implementation guide

1. Determine which grid applies -- “General” or “Health and Safety.”

2. Evaluate whether these standards apply to the particular organization or should be adjusted. Factors to consider:

     * The presence and quality of another method to measure code quality other than unit testing

     * Customer/user perceived code quality, measured by customer satisfaction ratings, net retention, and the frequency and severity of bugs.

3. If the unit test standards are overridden, the rationale should be codified and results reviewed quarterly.

4. Once set, the code should be reviewed to see if there are any subsections of the code (at the repository level or within repositories) that require more or less testing.

5. Available budget for creating unit tests should not be a factor in setting optimal unit test coverage. However, an organization may decide to accept lower coverage than the standard due to budget constraints.

Notes

     * The age of the repository/application is not the determining factor; it is the age of the organization. Customers and users will expect that all of its applications meet code quality (unit test coverage) standards based on the organization’s maturity, unless the application is explicitly noted as a Beta/Proof of Concept product.

     * It is possible to have too much test coverage. Development organizations should have crisp and compelling explanations why the standards are exceeded, as developer time spent on unit test writing takes away time from feature/ functionality development. One compelling rationale occurs when the programming language used makes test writing particularly easy/ fast.

File Changes to Commits Ratio

A 100% stacked bar graph showing the proportion of file changes to commits. Sema recommends fewer than 10 file changes per commit.

Increase or decrease granularity using the +/- towards the graph origin.

Filter by:

Datetime, repository, developer
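The guideline above (fewer than 10 file changes per commit) can be sketched as a simple check (an illustration, not Sema's implementation):

```python
# Illustrative check of the "fewer than 10 file changes per commit"
# guideline; `commits` maps commit id -> number of files changed.
def oversized_commits(commits: dict) -> list:
    """Return commit ids at or above the recommended 10-file limit."""
    return [sha for sha, n_files in commits.items() if n_files >= 10]
```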

Commit Summary

File Changes

Shows the number of file changes. Files can be added, modified or removed. Commits usually contain numerous file changes.

Filter by:

datetime, repository, developer

Commits

The commits across all selected repositories.

Filter by:

datetime, repository, developer

Commits over Time (repository breakdown)

This graph shows a count of commits in a given time-frame. Each color block represents a repository.

Increase or decrease granularity using the +/- towards the graph origin.

Filter by:

datetime, repository, developer

File Changes by Repository

Current totals of all file changes for all time.

Filter by:

show/hide repository

Commits by Repository

Current totals of all commits for all time.

Filter by:

show/hide repository

File Changes to Commits Ratio

A 100% stacked bar graph showing the proportion of file changes to commits. Sema recommends fewer than 10 file changes per commit.

Increase or decrease granularity using the +/- towards the graph origin.

Filter by:

Datetime, repository, developer

Average File Changes per Commit

The all time average of file changes per commit.

Filter by:

show/hide repository

Number of Source Files - All Time

A count of files by filetype. Only filetypes associated with source code are displayed.

Filter by:

repository, developer, show/hide filetype

Reports Overview

The Reports page shows a number of visualizations that will be familiar to customers who have seen a Sema Code Health Check, and that are especially useful for cross-repository comparison. There are currently 3 sections to the report:

1. Commit Analysis

2. Line Level Warnings

3. Process Analysis

The reports page shows a default range of the 3 previous calendar months. This can be adjusted using filters.

Graphs showing data over time can be adjusted to show higher or lower granularity. As granularity decreases, data will be aggregated to show the sum of the period, not an average.

Data from reports can be downloaded using the download icon in the upper right hand corner. If the PDF option is not available, contact professional services, or create a support ticket.

How to grant Sema access to your self-hosted repositories (GIT / SVN / GitLab)?

If your repository is self-hosted, follow these instructions to grant Sema access.

1. Create a read-only account for Sema (use 'customers@semasoftware.com' as the contact email address for the account)

2. Provide the username, repository URL and password to customers@semasoftware.com

3. The password can be shared over the phone or via encrypted email for security purposes

How to grant Sema access to your repositories in GitHub, Bitbucket or GitLab?

1. If the repository is public, simply provide Sema with the URL

2. If the repository is private, grant access to the ID “Sema-Customer”

How to sign in to Sema

For Subscription Customers

Current Sema subscription customers can access the platform using credentials supplied by your Professional Services Account Manager.

1. Visit smp.semalab.io

2. Enter the email and password provided by Professional Services

For New Users

Login with a GitHub account to access the free Sema Tech Debt Calculator.

1. Visit smp.semalab.io

2. Select the Log In With GitHub button

3. First time users will complete the account creation steps

4. Select a repository to analyze.

That's it! You can wait on the loading screen until analysis completes or return once the confirmation email has arrived. This can take a few minutes depending on traffic.

Notes:

1. Sema Code Quality Platform is an enterprise subscription product. Access for a single repository in the Tech Debt Calculator is free. To get all the code, teams and process analysis, and analyze all your repositories, contact us.

2. If you are a customer but require new credentials to log in, please contact your Professional Services advisor.

How to find tech debt?

One effective way to improve code and lighten the workload of developers is to address tech debt. While there are many ways to define tech debt (and we're working to solve them all!), we find a few fundamental metrics can deliver great benefits without significant refactoring effort.

Set up the repository

1. Complexity

    1.1 Click to expand the Complexity panel

    1.2 Define the current repository's threshold for Excessive Complexity

    1.3 Set a Target to address - what percentage of the files over the above threshold are you targeting to reduce complexity on?

    1.4 Set an estimated time to address - how long would you expect a developer to spend simplifying or breaking down an overly complex file?

2. Duplicate Blocks are blocks of code that appear more than once.

   2.1 Set a target for what percentage of duplicate blocks you would like to de-duplicate.

    2.2 Set an estimated time to address - how long would you expect a developer to take to find the duplicates and consolidate them into a single block of code?

3. Test Coverage

    3.1 Set a goal for Unit Test lines per Line of Code

         Very good coverage is 1:1

         A new repository with little testing may have next to nothing for test lines.

    3.2 Set an estimated time to create a unit test.

4. Line Level warnings

    4.1 Set an approximate time to fix a line level warning

    4.2 Select the percentage of warnings you aim to fix in each category

5. Developer wages

    5.1 Set an average loaded cost per developer per day. This is used to calculate the cost of work to address the target tech debt.

6. Languages

    6.1 Toggle the languages you would like to include/address in the tech debt total.
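Once the targets are configured, the arithmetic behind the estimates is roughly items to address multiplied by time per item, converted to developer-days. A sketch (the function name and the 8-hour day are illustrative assumptions, not Sema's fields):

```python
# Hypothetical sketch of a "days to fix" estimate: items to address
# x hours per item, converted to developer-days.
def days_to_fix(items_to_address: int, hours_per_item: float,
                hours_per_day: float = 8.0) -> float:
    return items_to_address * hours_per_item / hours_per_day
```

For example, 40 line-level warnings at 2 hours each comes to 10 developer-days.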

Reading the results

1. Days to fix

    1.1 Check the 'Days to fix' section to see an estimate of how long addressing each of the Tech Debt Indicator categories requires.

2. Cost to address

   2.1 Technical Debt Total is a dollar value based on Days to fix and Developer wages

    2.2 Cost per Line of Code is the total divided by Lines of Code. Under $3.00/LoC is a good target.
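The two results above combine as follows (a minimal sketch; names are illustrative):

```python
# Minimal sketch of the two results above.
def tech_debt_totals(days_to_fix: float, daily_dev_cost: float,
                     lines_of_code: int) -> tuple:
    """Return (Technical Debt Total in dollars, Cost per Line of Code)."""
    total = days_to_fix * daily_dev_cost   # Days to fix x Developer wages
    return total, total / lines_of_code    # target: under $3.00/LoC
```

For example, 100 days of work at a $500/day loaded cost against 25,000 lines of code gives a total of $50,000 and $2.00/LoC, inside the $3.00/LoC target.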

What types of line-level warnings does Sema analyze?

Sema searches for the following types of line-level warnings:

Security:

These warnings are known security flaws, which may allow various exploits at varying severity levels. The suggested action is to fix high risk issues as soon as possible.

Environment Sensitive:

These warnings are relevant depending on whether strict mode is turned on. They are issues that may affect functionality in the future. As the underlying platform evolves, the use of deprecated features is risky, given that they may disappear. Sema recommends checking the context in which the code is deployed and fixing these warnings if the underlying libraries are updated.

Misleading

These warnings may mislead developers in such a way as to introduce bugs in the future. Sema recommends fixing these warnings to enhance the readability and extendibility of the organization’s codebase.

Potential Bugs

These warnings may affect functionality in some circumstances. At the very least they are misleading to developers, and they should be fixed: they represent a high risk of either currently affecting functionality or misleading developers in a way that introduces bugs in the future. Sema suggests changing the code to eliminate these warnings as soon as possible.

Smell

These warnings occur when the code structure is abnormal, such as a large or overly complex method or class. Sema suggests fixing these warnings to enhance code readability, but they are considered low risk.

Stylistic

These warnings represent code structure that is abnormal at the line level, e.g. brackets on the same line as a for statement instead of on the next line. This category generally has no bearing on performance. Sema recommends fixing them only if the client considers the style important.

Performance

These warnings represent issues which result in less efficient code. Sema recommends fixing these warnings if performance is important or there is a known issue in the application.

Core Technical Debt Ratio -- Explanation and Calculation Method

Technical debt can be thought of as any imperfection in the code, whether it relates to code quality, security, intellectual property, or other attributes.

All code has, and should have, technical debt: to create code without imperfections is an impossible (and expensive!) task and the code would never be released.

However, too much technical debt can slow down development, impact users, or increase organizational risk. The challenge for organizations is to set the ideal level of technical debt for the stage of the codebase and the company -- typically younger codebases and younger companies have more tech debt, as they figure out product-market fit.

Sema's Core Technical Debt Ratio includes four components of technical debt, all within the Quality module -- Duplicate code, Line level warnings, Excessive complexity, and lack of unit testing. See Core Technical Debt Ratio -- Components for a description of those components.

The Ratio converts these elements of technical debt into a comparable benchmark, normalized for codebase size, based on how much development time it will take to bring the code from current state to the gold standard.

Note that the Core Technical Debt Ratio should not be viewed as a prescription. It is rare that any codebase should achieve "gold standard" levels for the reasons described above. Instead, the Ratio is useful as a quick comparison of codebases.

To calculate Core Technical Debt:

1. The current amount of each of the four components is calculated.

2. The current amount is compared to the gold standard.

3. The development time is calculated to bring the code from current state to gold standard. For example, unit tests are assumed to take 45 minutes to write.

4. That time is translated into the cost of developers, assuming a global average fully-loaded developer cost of $100K/ year.

5. The cost calculated in step 4 is divided by the number of lines of code.
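The five steps above can be sketched for the unit-test component. The 45-minute test-writing estimate and the $100K/year loaded cost come from the text; the 2,000 working hours per year is an assumption for the illustration:

```python
# Sketch of the calculation, using the unit-test component as the
# worked example. 2,000 working hours/year is an assumed figure.
HOURS_PER_YEAR = 2000
HOURLY_COST = 100_000 / HOURS_PER_YEAR   # $50/hour at $100K/year

def unit_test_debt_per_loc(missing_tests: int, lines_of_code: int) -> float:
    hours_needed = missing_tests * 45 / 60   # step 3: dev time to gold standard
    cost = hours_needed * HOURLY_COST        # step 4: translate to developer cost
    return cost / lines_of_code              # step 5: normalize by codebase size
```

For example, 1,000 missing unit tests against 50,000 lines of code works out to 750 hours, $37,500, and $0.75 per line of code.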

Traceability

The ability to trace work items across the software development lifecycle.

TIOBE Index

This index is an indicator of the popularity of programming languages, and it is updated on a monthly basis. You can learn more on its website: https://www.tiobe.com/tiobe-index/

Technical debt, Third-party

Technical debt from third-party code.

Sema Code Quality Platform (SCQP)

This is the gateway to data-driven development - a new way of understanding and improving software quality and technical debt. The platform gives leaders, managers and developers a shared understanding of code, teams, and process.

ScanCode

Sema uses this tool to detect licenses, copyrights, package manifests, direct dependencies, and more, both in source code and in binary files. It is considered the best solution in its class and serves as a reference tool in this domain.

Reusability

Use of existing assets in some form within the software product development process. These assets are products and by-products of the software development lifecycle, including code, software components, test suites, designs, and documentation.

Retention

In the case of developers, Sema defines this as whether or not a developer has contributed in the past 90 days - if they have, we assume they are still within the organization. If they haven't, we assume they are no longer within the organization.

Polymorphism

See Number of Polymorphic Methods (NOP).

Number of Polymorphic Methods (NOP)

Number of methods in a class excluding private, static, and final methods. It is a Design Quality Indicator in the Due Diligence Report.

Number of Methods (NOM)

Number of methods declared within a class. It is a Design Quality Indicator in the Sema Health Check Report.

Number of Hierarchies (NOH)

Total number of “root” classes in the design of the software development project. It is a Design Quality Indicator in the Due Diligence Report.

Name reconciliation

Process that reduces duplicate entries in the CSV file of committers at the beginning of the cleaning process.

Mode Analytics

Collaborative data platform that combines SQL, R, Python, and visual analytics in one place (The Collaborative Data Science Platform | Mode).

Measure of Functional Abstraction (MFA)

Ratio of the number of inherited methods to the total number of methods contained in a class. It is a Design Quality Indicator in the Due Diligence Report.

Measure of Aggregation (MOA)

Count of the number of attributes whose type is contained in user-defined classes. It is a Design Quality Indicator in the Due Diligence Report.

Linter

It is a static code analysis tool used to flag programming errors, bugs, stylistic errors, and suspicious constructs.

Ingestion

Transportation of data from assorted sources to a storage medium where it can be accessed, used, and analyzed by an organization. The destination is typically a data warehouse, data mart, database, or a document store.

Fortify

Fortify Static Code Analyzer uses multiple algorithms and an expansive knowledge base of secure coding rules to analyze an application's source code for exploitable vulnerabilities. This technique analyzes every feasible path that execution and data can follow to identify and remediate vulnerabilities. You can read more in the following link: https://www.microfocus.com/media/data-sheet/fortify_static_code_analyzer_static_application_security_testing_ds.pdf

Fork

When developers take a copy of the source code from one software package and start independent development on it, creating a distinct and separate piece of software. Forks are the standard way to modify open-source code for organization-specific purposes. On GitHub, they can be identified using the UI, and the result returned by the list repositories API endpoint contains a “fork” attribute.

Non-organization repositories can also be cloned and committed to a new repository within the organization. Sema can guess that an organization repository with a specific name is a clone of an open-source repository with the same name, but this is not foolproof.

Moreover, organization repositories can themselves be forked and released to the open-source community for public use. Sema does not analyze these repositories in the Health Check Report, since most of the work can be safely assumed to come from the organization’s developers.
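As noted above, the list-repositories API response carries a boolean "fork" attribute per repository. A minimal sketch that separates forks from originals in such a response (the function name is hypothetical):

```python
# Given a parsed list-repositories API response (a list of dicts,
# each with "name" and a boolean "fork" attribute), split the
# repository names into forks and originals.
def split_forks(repos: list) -> tuple:
    forks = [r["name"] for r in repos if r.get("fork")]
    originals = [r["name"] for r in repos if not r.get("fork")]
    return forks, originals
```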

Flexibility

Ability to allow changes in the design of the software.

Extendibility

Ability to incorporate new functional requirements.

Effectiveness

Ability to fulfill the functionalities defined in the requirements analysis during the software development lifecycle.

Direct Class Coupling (DCC)

Number of other classes a class relates to through a shared attribute or a parameter in a method.

Direct Access Metric (DAM)

Ratio of the number of private and protected attributes to the total number of attributes in the class.

Developer Coaching Grid (DCG)

Measures the changes in code quality and process metrics over time. These metrics are also covered in other sections of the Sema Health Check Report. A DCG consists of up to 4 charts:

1. Contribution: How developers score across a variety of contribution-based metrics like code created and code changed. i.e. How much work is each developer doing, and how are they spending their time?

2. Skill - Line Level Warnings: How developers score on line-level warnings added or removed. i.e. Who writes the most/least clean code

3. Skill - Architectural Impact: How developers score on AQIs. i.e. Who is making the most/least positive impact on code architecture?

4. Summary of Contribution and Skill: A summary of the scores on each of the previous charts.

Design Size in Classes (DSC)

Total number of classes in the design of a software development project. It is a Design Quality Indicator in the Due Diligence Report.

Design Quality Indicators (DQI)

Design Quality Indicators (DQIs) are the base metrics used to calculate the 6 Architecture Quality Indicators (AQIs) we calculate for object-oriented languages.

Average Number of Ancestors (ANA - Abstraction)

Average number of classes in the inheritance tree for each class.

Measure of Aggregation (MOA - Composition)

Count of number of attributes whose type is user defined classes.

Direct Access Metric (DAM - Encapsulation)

Ratio of the number of private and protected attributes to the total number of attributes in the class.

Cohesion Among Methods of Class (CAM - Cohesion)

This metric computes the relatedness among methods of a class based upon the parameter list of the methods. The metric is computed using the summation of the intersection of parameters of a method with the maximum independent set of all parameter types in the class. A metric value close to 1.0 is preferred. (Range 0 to 1)

Class Interface Size (CIS - Messaging)

Number of public methods in a class.

Measure of Functional Abstraction (MFA - Inheritance)

Ratio of the number of inherited methods per the total number of methods within a class.

Direct Class Coupling (DCC - Coupling)

Number of other classes a class relates to, either through a shared attribute or a parameter in a method.

Design Size in Classes (DSC - DesignSize)

Total number of classes in the design.

Number of Methods (NOM - Complexity)

Number of methods declared in a class.

Number of Hierarchies (NOH - Hierarchies)

Total number of “root” classes in the design.

Number of Polymorphic Methods (NOP - Polymorphism)

Any method that can be used by a class and its descendants. Count of the number of methods in a class excluding private, static, and final ones.
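As an illustration, the CAM (Cohesion) metric above could be computed roughly as follows. This is one possible reading of the definition, not Sema's exact implementation:

```python
# Rough sketch of CAM: the average overlap of each method's parameter
# types with the union of all parameter types in the class, normalized
# to the 0-1 range (1.0 is preferred). One reading of the definition;
# behavior for classes with no parameters is an assumption.
def cam(method_param_types: list) -> float:
    all_types = set().union(*method_param_types)
    if not all_types:
        return 1.0  # assumed convention: no parameters anywhere
    n = len(method_param_types)
    return sum(len(p) for p in method_param_types) / (n * len(all_types))
```

For example, a class with methods taking parameter types {int, str} and {int} scores 3 / (2 * 2) = 0.75.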

Common Weakness Enumeration (CWE)

It is a category system for hardware and software weaknesses and vulnerabilities. It is sustained by a community project to understand flaws in software and hardware, creating automated tools that can be used to identify, fix, and prevent these flaws. You can learn more here: https://cwe.mitre.org

Common Vulnerabilities and Exposures (CVE)

It is a system that provides a reference-method for publicly known information-security vulnerabilities and exposures. It is funded by the US National Cyber Security Division of the US Department of Homeland Security, and is maintained by The Mitre Corporation. This corporation defines each vulnerability with a unique identifier. You can learn more here: https://cve.mitre.org

Commit

A record that a set of changes has been safely stored in your local database when using a VCS.

Cohesion Among Methods of Class (CAMC)

Measure of how related the methods contained within a class are, in terms of the parameters they use; calculated as 1 - LackOfCohesionOfMethods(). It is a Design Quality Indicator in the Due Diligence Report.

Clone

Full copy of an existing Git repository.

Cloc

Short for "Count Lines of Code," a tool that counts lines of source code.

Class Interface Size (CIS)

Number of public methods in a class. It is a Design Quality Indicator in the Health Check Report.

Benchmarks

You can learn more about the benchmarks that Sema uses in the Health Check Report here:

Health Check Benchmarks

Average Number of Ancestors (ANA)

Average number of classes in the inheritance tree for each class. It is a Design Quality Indicator in the Sema Health Check Report.

Architectural Quality Indicators (AQI)

Sema analyzes code quality for object-oriented languages based on QMOOD.

Understandability

The ease of learning and comprehending the design implementation details; the properties of a design that enable it to be easily learned and understood. This directly relates to the complexity of the design structure. This is calculated in the following way:

            (-0.33 * ANA) + (0.33 * DAM) - (0.33 * DCC) + (0.33 * CAM) - (0.33 * NOP) - (0.33 * NOM) - (0.33 * DSC)

Reusability

The degree to which a software module or other work product can be used in more than one computer program or software system. This is calculated in the following way:

            (-0.25 * DCC) + (0.25 * CAM) + (0.5 * CIS) + (0.5 * DSC)

Functionality

Classes with given functions that are publicly stated in interfaces used by others. This is calculated in the following way:

            (0.12 * CAM) + (0.22 * NOP) + (0.22 * CIS) + (0.22 * DSC) + (0.22 * NOH)

Extendibility

Measurement of the design's capacity to incorporate new functional requirements. This is how it is calculated:

            (0.5 * ANA) - (0.5 * DCC) + (0.5 * MFA) + (0.5 * NOP)

Effectiveness

The degree to which a design is able to achieve the desired functionality and behavior using object-oriented design concepts and techniques. This is how it is calculated:

            (0.2 * ANA) + (0.2 * DAM) + (0.2 * MOA) + (0.2 * MFA) + (0.2 * NOP)

Flexibility

The ease with which a system or component can be modified for use in applications or environments other than those for which it was specifically designed. The degree of allowance of changes in the design. This is how it is calculated:

            (0.25 * DAM) - (0.25 * DCC) + (0.5 * MOA) + (0.5 * NOP)
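For reference, the six formulas above can be collected into one function. The DQI inputs are assumed to be pre-normalized values; the weights are exactly those listed:

```python
# The six AQI formulas above, collected into one function. Argument
# names match the DQI abbreviations; inputs are assumed normalized.
def aqi_scores(ana, dam, dcc, cam, nop, nom, dsc, cis, noh, mfa, moa):
    return {
        "Understandability": (-0.33 * ana) + (0.33 * dam) - (0.33 * dcc)
                             + (0.33 * cam) - (0.33 * nop) - (0.33 * nom)
                             - (0.33 * dsc),
        "Reusability": (-0.25 * dcc) + (0.25 * cam) + (0.5 * cis) + (0.5 * dsc),
        "Functionality": (0.12 * cam) + (0.22 * nop) + (0.22 * cis)
                         + (0.22 * dsc) + (0.22 * noh),
        "Extendibility": (0.5 * ana) - (0.5 * dcc) + (0.5 * mfa) + (0.5 * nop),
        "Effectiveness": (0.2 * ana) + (0.2 * dam) + (0.2 * moa)
                         + (0.2 * mfa) + (0.2 * nop),
        "Flexibility": (0.25 * dam) - (0.25 * dcc) + (0.5 * moa) + (0.5 * nop),
    }
```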

Abstraction

When used as a Design Quality Indicator (DQI), please see: Average Number of Ancestors (ANA).

Line-level warnings

Sema scans for the following types of line-level warnings:

Security:

These warnings are known security flaws, which may allow various exploits at varying severity levels. The suggested action is to fix high-risk issues as soon as possible.

Examples:

Use of “eval” or “exec” - a classic, high-severity security issue in which an external user submits arbitrary code via a web interface and the server executes it. This could allow an attacker to submit code that accesses the database and sends information back to them.

“The user-supplied array 'lines' is stored directly.”

“eval can be harmful.”
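To illustrate why the eval warning is high severity, here is a minimal Python sketch (Python is used for illustration only; the same class of flaw exists in any language with an eval facility). A safe parser such as `ast.literal_eval` accepts data but rejects executable code, where `eval` would run it:

```python
import ast

def parse_user_value(text):
    """Safely parse a user-supplied literal.

    ast.literal_eval accepts only Python literals (numbers, strings,
    lists, dicts, ...) and raises on anything executable, whereas
    eval(text) would run arbitrary code such as
    "__import__('os').system('...')" submitted by an attacker.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None
```

Here `parse_user_value("[1, 2, 3]")` returns the list, while a payload like `"__import__('os').getcwd()"` is rejected instead of executed.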

Environment Sensitive

These are warnings whose relevance depends on context, such as whether strict mode is enabled. They are issues that may affect functionality in the future: as the underlying platform evolves, reliance on deprecated features is risky, because those features may disappear. Sema recommends checking the context in which the code is deployed and fixing these warnings when the underlying libraries are updated.

Examples

"Unit tests should not contain more than 1 assert(s)."

"If you run in Java5 or newer and have concurrent access, you should use the ConcurrentHashMap implementation"

"Consider replacing this Hashtable with the newer java.util.Map"
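The examples above are Java rules, but the general defense against deprecated-feature risk is language-agnostic: make deprecation warnings fail loudly before the platform upgrade removes the feature. A hedged Python sketch of that idea (the helper and `legacy` function are illustrative, not part of any Sema tooling):

```python
import warnings

def call_strictly(func, *args, **kwargs):
    """Run func with DeprecationWarning escalated to an error, so
    reliance on deprecated features fails in CI instead of silently
    breaking after a library upgrade."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args, **kwargs)

def legacy():
    """Stand-in for code that still uses a deprecated API."""
    warnings.warn("legacy() is deprecated", DeprecationWarning)
    return 42
```

Calling `legacy()` directly still works, but `call_strictly(legacy)` raises, surfacing the deprecated usage early.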

Misleading

These warnings may mislead developers in ways that introduce bugs in the future. Sema recommends fixing these warnings to enhance the readability and extendibility of the organization’s codebase.

Examples

Multiple conditions in an if statement (if this and that or the other and not such-and-such) can confuse a developer, which makes changing this code risky.

“A method should have only one exit point, and that should be the last statement in the method”

“Use equals() to compare object references.”

“Avoid unused imports such as ‘java.net’”
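As a sketch of the compound-condition example above, naming each clause makes the logic safer to change (Python used for illustration; the field names are hypothetical):

```python
def can_ship(order):
    """Hard to modify safely: operator precedence and the negation
    must be re-derived on every change."""
    return order["paid"] and order["in_stock"] or order["backorder_ok"] and not order["fragile"]

def can_ship_clear(order):
    """Same logic, but each clause carries an intention-revealing
    name, so a future edit touches one named condition at a time."""
    paid_and_available = order["paid"] and order["in_stock"]
    safe_backorder = order["backorder_ok"] and not order["fragile"]
    return paid_and_available or safe_backorder
```

Both versions compute identical results; only the second makes its structure obvious to the next developer.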

Potential Bugs

These warnings may affect functionality in some circumstances. At the very least they are misleading to developers, so they represent a high risk: either they already affect functionality, or they may mislead developers into introducing bugs in the future. Sema suggests changing the code to eliminate these warnings as soon as possible.

Examples

“Expected an assignment or function call and instead saw an expression.”

“Invoke equals() on the object you've already ensured is not null”

“Use equals() to compare strings instead of '==' or '!='“
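The last two examples are Java rules about reference equality ('==') versus value equality (equals()), but the same class of bug exists elsewhere. A Python illustration of the analogous pitfall, where `is` compares object identity and `==` compares value:

```python
def same_value(a: str, b: str) -> bool:
    """Correct: compares string contents."""
    return a == b

def same_object(a: str, b: str) -> bool:
    """Bug-prone: compares object identity. Two equal strings built
    at runtime are usually distinct objects, so this can return
    False even when the values match."""
    return a is b
```

Two identical strings assembled at runtime compare equal by value but are distinct objects, so the identity check silently gives the wrong answer.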

Smell

These warnings occur when code structure is abnormal, such as a large or overly complex method or class. Sema suggests fixing these warnings to enhance code readability, but they are considered low risk.

Examples

“Potential violation of Law of Demeter (method chain calls)”

“A high number of imports can indicate a high degree of coupling within an object.”

“High amount of different objects as members denotes a high coupling”
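As a sketch of the Law of Demeter warning above, a method chain reaches through intermediate objects, while delegation keeps the caller coupled to a single collaborator (all class and method names here are illustrative):

```python
class Wallet:
    def __init__(self, balance):
        self.balance = balance

class Customer:
    def __init__(self, balance):
        self.wallet = Wallet(balance)

    def can_afford(self, amount):
        """Delegation: callers ask the customer, not its internals."""
        return self.wallet.balance >= amount

def charge_chained(customer, amount):
    """Demeter violation: reaches through customer into its wallet,
    coupling this code to Customer's internal structure."""
    return customer.wallet.balance >= amount

def charge_delegated(customer, amount):
    """Talks only to the immediate collaborator."""
    return customer.can_afford(amount)
```

If `Customer` later stores funds differently, only `can_afford` changes; every chained caller would have to change too.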

Stylistic

These warnings represent code structure that is abnormal at the line level, e.g. brackets on the same line as a for statement instead of on the next line. This category generally has no bearing on performance. Sema recommends fixing them only if the client considers the style important.

Examples

“Parameter 'category' is not assigned and could be declared final”

“Field comments are required”

“Comments are too large: line too long”

Performance

These warnings represent issues which result in less efficient code. Sema recommends fixing these warnings if performance is important or there is a known issue in the application.

Examples

Deeply nested loops are sensitive to data size: execution time may grow rapidly (for example, quadratically or worse) as data grows. Deeply nested loops may also contain opportunities to reduce the number of iterations needed. Multiple calls to the same method may recalculate the same data repeatedly, which is an opportunity to cache results and skip recalculation.

“StringBuffers can grow quite a lot, and so may become a source of memory leak (if the owning class has a long life time).”

“System.arraycopy is more efficient”
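The repeated-recalculation case above is commonly addressed with result caching (memoization). A minimal Python sketch using the standard library; the `shipping_cost` function and its rate table are hypothetical stand-ins for an expensive computation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def shipping_cost(region: str) -> float:
    """Stand-in for an expensive call (database lookup, remote API).
    With lru_cache, repeated calls with the same region are served
    from memory instead of being recomputed."""
    table = {"EU": 4.5, "US": 3.0}
    return table.get(region, 9.99)
```

The first call for each region does the work; subsequent calls with the same argument hit the cache, which `shipping_cost.cache_info()` can confirm.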

Core Technical Debt Ratio -- Components

The Core Technical Debt Ratio (Core Tech Debt Ratio) includes four common measures of code quality that are consistent across most software languages.

Duplicated code blocks

A duplicate block is a section of code that appears more than once in a repository. Identical blocks of code are tagged as duplicates, provided the block is at least 100 tokens long. Duplicates create additional work because developers must apply each change in two or more places.
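Duplicate detection of this kind can be sketched as hashing sliding windows of tokens. The 100-token threshold is the article's; everything else here (the window handling, the lack of identifier normalization) is a simplified stand-in, not Sema's actual implementation:

```python
from collections import defaultdict

def duplicate_windows(tokens, window=100):
    """Return {token-window: [start offsets]} for every window of
    `window` consecutive tokens that occurs more than once.
    A real scanner would also normalize identifiers and merge
    overlapping matches; this only shows the core idea."""
    seen = defaultdict(list)
    for i in range(len(tokens) - window + 1):
        seen[tuple(tokens[i:i + window])].append(i)
    return {k: v for k, v in seen.items() if len(v) > 1}
```

A token stream with no repetition yields an empty result; a repetitive stream yields each repeated window with all of its start offsets.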

Complexity

Sema uses Cyclomatic Complexity, also known as McCabe Complexity: a count of the independent decision paths (such as IF and ELSE branches) in a file. This is a seminal code quality standard, and the original research is available here: http://www.literateprogramming.com/mccabe.pdf. Highly complex code is difficult to understand and work on, and therefore impedes developer productivity. Overly complex files are candidates to be broken into smaller functional pieces.
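A rough sketch of branch counting for Python source follows. This is a simplified proxy, not full McCabe complexity (which counts every decision point, including exception handlers and comprehensions), and it is not Sema's implementation:

```python
import ast

def branch_count(source: str) -> int:
    """Count decision points (if/elif, loops, boolean operators) in
    Python source -- a simplified cyclomatic-complexity proxy.
    Starts at 1 for the single straight-line path."""
    tree = ast.parse(source)
    count = 1
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.For, ast.While)):
            count += 1  # each branch or loop adds one path
        elif isinstance(node, ast.BoolOp):
            count += len(node.values) - 1  # 'and'/'or' short-circuits
    return count
```

Straight-line code scores 1; each `if`, loop, or boolean operator adds a path, so `if a and b:` scores 3.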

Line-Level Warnings

Line-Level Warnings are potential shortcomings of individual lines of code. These are the warnings that can be detected by linters such as SonarQube. Sema combines thousands of warnings across languages and groups the outputs into language-agnostic categories to prioritize warnings within and across codebases.

Unit Testing

The code scan estimates unit testing coverage by counting the lines of code in directories with “test” or “spec” in the name, and comparing this to non-test lines of code. The "gold standard" of unit testing coverage, per Google, is 80%.