Debugging license problems

Tidelift's license scanning compares your package manager's license metadata with your source repository's license metadata. If the two don't match, or if either is missing, we report a problem and ask you to fix it. We want accurate, machine-readable license information so users can verify and report their license compliance.

Below are some common causes of license problems, steps to debug, and suggested solutions. 

Remember: you can always contact us at lift@tidelift.com—we're happy to help walk you through debugging and deciding on next steps.



GitHub

How GitHub license scanning works

Access and design choices

The license information shown on every GitHub project's main page comes from a parser based on licensee, which you can install and run locally. You can also query it through GitHub's license API.

By design, the license parser's goal is 'no false positives'. It would rather fail than provide an incorrect answer. (Other, more comprehensive, license scanners return probabilities or multiple options, relying on the user to understand and discover the best option.) Unfortunately, because of this design decision, GitHub returns "no license detected" (specifically: NOASSERTION) both when there is too little, and too much, license information. So we can't currently automatically distinguish between the two cases.

What files the parser examines

GitHub's license parser looks at a variety of files, primarily any file in the root directory whose filename includes LICENSE. (It does not, by default, check README and cannot check the package manager's licensing API.) If no license information is found in any of them, it will report "none" as the license.

Most other failure modes will result in a return of "NOASSERTION". It does not attempt to discern any logic or structure - for example, if you (helpfully!) have a LICENSE-DEPENDENCIES file to explain the license of vendored dependencies, as well as the primary LICENSE file, it will not attempt to understand which one is the main license, and will simply return NOASSERTION.

How it parses

From the list of licensing files, the parser discards lines that look like copyright statements ("Copyright (c) 1994 Linus Torvalds") and then compares the remaining contents of those files to the full text of known licenses. If the license information is ill-formatted, slightly tweaked, or not the full license text (for example, "This project is under the BSD license" rather than the actual BSD license), GitHub will report NOASSERTION. Because of this simplistic (but robust) approach to scanning, another common failure case is a LICENSE file that contains more than one license.
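The approach described above can be illustrated with a toy sketch. This is not licensee's actual implementation: the copyright pattern, the `normalize` and `detect` helpers, and the shortened `KNOWN` text are all assumptions made for illustration.

```python
import re

# Assumption: a simplified pattern for "standard" copyright lines. The real
# parser's rules are more involved.
COPYRIGHT_LINE = re.compile(r"^\s*(copyright\b|\(c\)|©)", re.IGNORECASE)


def normalize(text: str) -> str:
    """Drop copyright-looking lines, then collapse whitespace and lowercase."""
    kept = [line for line in text.splitlines() if not COPYRIGHT_LINE.match(line)]
    return " ".join(" ".join(kept).split()).lower()


def detect(license_text: str, known_licenses: dict) -> str:
    """Return the id whose full text matches after normalizing, else NOASSERTION."""
    candidate = normalize(license_text)
    for spdx_id, full_text in known_licenses.items():
        if candidate == normalize(full_text):
            return spdx_id
    return "NOASSERTION"


# Hypothetical, heavily shortened stand-in for a real license text.
KNOWN = {"MIT": "Permission is hereby granted, free of charge, to any person."}

full = "Copyright (c) 1994 Linus Torvalds\n\nPermission is hereby granted,\nfree of charge, to any person."
print(detect(full, KNOWN))                                      # MIT
print(detect("This project is under the MIT license", KNOWN))  # NOASSERTION
```

The second call fails for the same reason as the "This project is under the BSD license" case: a sentence naming a license is not the license text itself, so there is nothing to match against.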



Debugging and fixing GitHub errors

Because GitHub's API responses are limited (either the license or a single, undifferentiated error code), we can't programmatically diagnose why GitHub may report no license for your project. You can check what GitHub reports by looking at the license information shown in the GitHub web view (upper right-hand corner) or, if you have a GitHub API key, by checking the license API.
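As a rough sketch of the API check, the Python below reads the `license` field that GitHub's REST API returns for a repository and sorts it into the three cases this article discusses. The helper names `summarize` and `fetch_repo` are hypothetical, and the wording of the messages is ours, not GitHub's.

```python
import json
import urllib.request
from typing import Optional


def summarize(repo_json: dict) -> str:
    """Classify the `license` field of a repository API response."""
    info = repo_json.get("license")
    if info is None:
        return "none: no license information found"
    spdx_id = info.get("spdx_id")
    if spdx_id == "NOASSERTION":
        return "NOASSERTION: GitHub could not settle on a single license"
    return f"recognized: {spdx_id}"


def fetch_repo(owner: str, repo: str, token: Optional[str] = None) -> dict:
    """Fetch repository metadata from api.github.com (makes a network call)."""
    request = urllib.request.Request(f"https://api.github.com/repos/{owner}/{repo}")
    request.add_header("Accept", "application/vnd.github+json")
    if token:
        # A token is optional for public repositories but raises rate limits.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (placeholder names, requires network access):
# print(summarize(fetch_repo("your-org", "your-repo")))
```

Note that the API gives no more detail than the web view does: a NOASSERTION result still has to be diagnosed by hand using the sections below.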



No license information

If you don't have a LICENSE file (or similar), that is almost certainly the problem.

Resolve it by following GitHub's directions to license the project.



Subtle differences in LICENSE files

If you have only one LICENSE file, and it looks like a valid complete license text to your eye, there may be slight problems (like changes to a handful of words in the license, or formatting/whitespace errors) that cause GitHub to be uncertain about your license.

You can do a quick diagnosis of whether your license is recognized by using the SPDX License Diff browser plugin. You can also compare it to GitHub's internal representation of the standard license by installing licensee and running licensee diff. We can also help you do this: just contact lift@tidelift.com.

These tools may surface problems, like very slightly non-standard text formatting, that can confuse GitHub's parser even on what appear to be correct license files.



Multiple license files

If more than one filename in your repo contains the string LICENSE, GitHub is likely confused by this. 
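A quick way to check for this situation is to list the candidate files yourself. The sketch below is a minimal helper (the name `license_like_files` is hypothetical) that looks only in the repository root and only for the "LICENSE" substring mentioned above. Matching case-insensitively is our assumption, and GitHub's parser may consider other filenames as well.

```python
from pathlib import Path


def license_like_files(repo_root: str) -> list:
    """Return sorted names of top-level files whose name contains 'license'."""
    root = Path(repo_root)
    return sorted(
        path.name
        for path in root.iterdir()
        if path.is_file() and "license" in path.name.lower()
    )


if __name__ == "__main__":
    found = license_like_files(".")
    if len(found) > 1:
        print("Multiple license-like files:", ", ".join(found))
    else:
        print("License-like files:", found)
```

If this prints more than one name, the guidance in the rest of this section applies.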

We can't fix GitHub's license scanning in the case where there is more than one license file with conflicting information. Their suggested solution is to move complex information to a file whose name does not contain "LICENSE". We don't recommend this, because it can obscure the genuine complexity that enterprise users must deal with if they want to comply.

Instead, in this situation we will typically ask you to ensure that your package manager metadata accurately reflects the complex situation. For example, if your source code is under the BSD-3-Clause license, and your LICENSE-DEPENDENCIES file mentions a vendored dependency under Apache-2.0, your package manager's metadata should mention both BSD-3-Clause and Apache-2.0. (Dependencies that are managed through the package manager don't need to be tracked this way - they should surface their licensing to your users through their own license metadata.)

How you do this may depend on which package manager you're using. For example:

  • in npm's package.json, use an SPDX license expression: "license": "(BSD-3-Clause AND Apache-2.0)" (docs)
  • in RubyGems' gemspec or Rakefile, licenses = ['BSD-3-Clause', 'Apache-2.0'] (docs)

More links to package manager documentation are below, under "Package Managers".
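For the npm case above, the following sketch shows one way to build the combined SPDX expression and set it in a package.json document. The helpers `spdx_and` and `with_license` are hypothetical names for this example and are not part of npm or any Tidelift tooling.

```python
import json


def spdx_and(license_ids: list) -> str:
    """Join SPDX identifiers with AND, dropping duplicates but keeping order."""
    unique = list(dict.fromkeys(license_ids))
    expression = " AND ".join(unique)
    # Compound expressions are wrapped in parentheses, as npm's docs show.
    return f"({expression})" if len(unique) > 1 else expression


def with_license(package_json_text: str, license_ids: list) -> str:
    """Return package.json text with its `license` field set to the expression."""
    data = json.loads(package_json_text)
    data["license"] = spdx_and(license_ids)
    return json.dumps(data, indent=2) + "\n"


print(spdx_and(["BSD-3-Clause", "Apache-2.0"]))  # (BSD-3-Clause AND Apache-2.0)
```

AND is the right operator here because users must comply with both licenses. OR would mean they may choose either one, which is a different claim.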



Non-standard copyright statements

GitHub's license parser (and many other licensing tools) distinguishes between copyright statements and license text. Copyright statements are information about who owns the copyright (usually featuring (c), years, and names), while the license text is what tells us what restrictions the copyright owner places on the code. Depending on the tool and use case, copyright statements are either ignored or stored separately.

For example, in this file, licensee sees:

    Copyright (c) 2004-2020 David Heinemeier Hansson

GitHub's license parsing code recognizes this as a copyright statement (because of its standard formatting), and so ignores it when attempting to determine the license.

    Arel originally copyright (c) 2007-2016 Nick Kallen, Bryan Helmkamp,
    Emilio Tagua, Aaron Patterson

This line of the LICENSE is trying to be more informative about the history of copyright ownership in this repo. Unfortunately, it doesn't follow the standard pattern, so GitHub's license-parsing code does not recognize it as either a copyright statement, or a license. (You can see this, though without useful explanation, in the output of licensee diff.) As a result, GitHub reports NOASSERTION about this repository's license.

    Permission is hereby granted, free of charge, to any person obtaining...

The actual license text, which GitHub's license parsing code would correctly pick up as MIT if the unparseable copyright statement above were removed.
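To spot lines like the "Arel originally copyright" one in your own LICENSE file, a small heuristic can help. The sketch below assumes a simplified notion of a "standard" copyright line (one that begins with "Copyright", "(c)", or ©). This is our approximation for illustration, not the pattern GitHub's parser actually uses, and the function name is hypothetical.

```python
import re

# Assumption: simplified stand-in for the parser's copyright pattern. Only
# lines that *begin* with a copyright marker count as standard statements.
STANDARD_COPYRIGHT = re.compile(r"^\s*(copyright\b|\(c\)|©)", re.IGNORECASE)

LINES = [
    "Copyright (c) 2004-2020 David Heinemeier Hansson",
    "Arel originally copyright (c) 2007-2016 Nick Kallen, Bryan Helmkamp,",
    "Permission is hereby granted, free of charge, to any person obtaining...",
]


def suspicious_copyright_lines(lines: list) -> list:
    """Lines that mention copyright but do not start like a standard statement."""
    return [
        line
        for line in lines
        if "copyright" in line.lower() and not STANDARD_COPYRIGHT.match(line)
    ]


for line in suspicious_copyright_lines(LINES):
    print("Check this line:", line)
```

Run against the three example lines, this flags only the "Arel originally copyright" line. On a full license file it will also flag ordinary license sentences that mention copyright, so treat the output as a list of lines to review rather than a list of errors.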


Package managers

Most package managers provide metadata about the licenses used by a package. However, the specifications for each of these are inconsistent, which can make building enterprise tooling difficult.

Add missing license metadata

One of the most common problems with package manager metadata is that it is missing. The lifter dashboard will alert you and prompt you to fix it if we couldn't find the license for a package.

Usually the package manager will have a metadata field for this. For example,
