Tidelift's license scanning compares the license metadata reported by your package manager with the license metadata in your source repository. If the two don't match, or if either is missing, we report a problem and ask you to fix it. We want accurate, machine-readable license information so users can verify and report their license compliance.
Below are some common causes of license problems, steps to debug, and suggested solutions.
Remember: you can always contact us at lift@tidelift.com—we're happy to help walk you through debugging and deciding on next steps.
GitHub
How GitHub license scanning works
Access and design choices
The license parser that finds the license information shown on each GitHub repository's main page is based on licensee, which can be installed and run locally. Its results are also available through GitHub's license API.
By design, the license parser's goal is 'no false positives'. It would rather fail than provide an incorrect answer. (Other, more comprehensive license scanners return probabilities or multiple options, relying on the user to understand and discover the best option.) Unfortunately, because of this design decision, GitHub returns "no license detected" (specifically: NOASSERTION) both when there is too little license information and when there is too much. So we can't currently distinguish between the two cases automatically.
What files the parser examines
GitHub's license parser looks at a variety of files, primarily any file in the root directory whose filename includes LICENSE. (It does not, by default, check README, and it cannot check the package manager's licensing API.) If no license information is found in any of them, it will report "none" as the license.
Most other failure modes result in "NOASSERTION". The parser does not attempt to discern any logic or structure. For example, if you (helpfully!) have a LICENSE-DEPENDENCIES file explaining the licenses of vendored dependencies alongside the primary LICENSE file, it will not try to work out which one is the main license and will simply return NOASSERTION.
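The file-selection behavior described above can be sketched as follows. This is a simplified illustration, not licensee's code: it assumes a candidate is any root-level filename containing "LICENSE" (case-insensitively), whereas licensee's real filename patterns are broader. The function name `license_file_outcome` is hypothetical.

```python
def license_file_outcome(root_filenames: list[str]) -> tuple[str, list[str]]:
    """Classify a repository root the way the text above describes.

    Hypothetical, simplified rule: any root-level filename containing
    "license" (any case) is a candidate license file.
    """
    candidates = sorted(n for n in root_filenames if "license" in n.lower())
    if not candidates:
        # Nothing to scan, so the reported license is "none".
        return "none", candidates
    if len(candidates) > 1:
        # e.g. LICENSE plus LICENSE-DEPENDENCIES: no attempt is made to
        # decide which file holds the main license.
        return "NOASSERTION", candidates
    # Exactly one file: its contents go on to be compared with known licenses.
    return "scan", candidates
```

For example, `license_file_outcome(["README.md", "LICENSE", "LICENSE-DEPENDENCIES"])` reports NOASSERTION even though both files may be perfectly accurate.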
How it parses
From the list of licensing files, the parser discards lines that look like copyright statements ("Copyright (c) 1994 Linus Torvalds") and then compares the remaining contents of those files to the full text of known licenses. If the license information is ill-formatted, slightly tweaked, or not the full license text (for example, "This project is under the BSD license" rather than the actual BSD license), GitHub will report NOASSERTION. Because of this simplistic (but robust) approach to scanning, another common failure case is a LICENSE file containing more than one license.
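The comparison step can be sketched roughly like this. It is an illustration of the idea, not licensee's implementation: it assumes copyright lines have already been removed, normalizes whitespace and case, and uses `difflib.SequenceMatcher` with a hypothetical 0.95 threshold as a stand-in for licensee's own matching. Anything that doesn't match exactly one known license falls back to NOASSERTION, in keeping with the "no false positives" design.

```python
import difflib
import re


def normalize(text: str) -> str:
    # Collapse every run of whitespace and ignore case, so that line
    # wrapping alone does not count as a difference.
    return re.sub(r"\s+", " ", text).strip().lower()


def match_license(body: str, known: dict[str, str], threshold: float = 0.95) -> str:
    """Return the key of the one known license `body` matches, else "NOASSERTION"."""
    candidate = normalize(body)
    matches = [
        key
        for key, reference in known.items()
        if difflib.SequenceMatcher(
            None, candidate, normalize(reference), autojunk=False
        ).ratio()
        >= threshold
    ]
    # A one-line summary ("This project is under the BSD license") or two full
    # licenses pasted into one file scores below the threshold against every
    # single reference text; an ambiguous double match is refused as well.
    return matches[0] if len(matches) == 1 else "NOASSERTION"
```

The threshold is the knob that trades recall for precision: raising it makes the matcher refuse more lightly edited files, which is exactly the behavior the "no false positives" goal implies.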
Debugging and fixing GitHub errors
Because GitHub's API responses are limited (either the license or a single, undifferentiated error code), we can't programmatically diagnose why GitHub may report no license for your project. You can check what GitHub reports by looking at the license information shown in the GitHub web view (upper right-hand corner) or, if you have a GitHub API key, by checking the license API.
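If you prefer to script this check, a minimal sketch using only the Python standard library might query the repository license endpoint, `GET /repos/{owner}/{repo}/license`. The helper names and verdict strings are assumptions for illustration. As we understand the API, it answers 404 when no license file is detected (and also when the repository doesn't exist or isn't visible to you), and it reports an `spdx_id` of `NOASSERTION` for a license file it can't identify. A token is optional for public repositories but raises the rate limit.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

API = "https://api.github.com/repos/{owner}/{repo}/license"


def interpret(status: int, payload: Optional[dict]) -> str:
    """Turn a license API response into a short verdict (hypothetical helper)."""
    if status == 404:
        return "no license file detected"
    # "license" can be null in the JSON, so guard both levels.
    spdx_id = ((payload or {}).get("license") or {}).get("spdx_id")
    if spdx_id in (None, "NOASSERTION"):
        return "license file found but not recognized (NOASSERTION)"
    return f"recognized as {spdx_id}"


def fetch_license(owner: str, repo: str, token: Optional[str] = None) -> str:
    """Query GitHub's license endpoint for one repository."""
    request = urllib.request.Request(
        API.format(owner=owner, repo=repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request) as response:
            return interpret(response.status, json.load(response))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return interpret(404, None)
        raise  # rate limiting, auth failures, etc. are not license answers
```

Calling `fetch_license("your-org", "your-repo")` (placeholder names) returns one of the three verdicts. As noted above, it cannot tell you *why* a file was not recognized.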
No license information
If you don't have a LICENSE file (or similar), that is almost certainly the problem.
Resolve it by following GitHub's directions to license the project.
Subtle differences in LICENSE files
If you have only one LICENSE file, and it looks like a valid, complete license text to your eye, there may be slight problems (like changes to a handful of words in the license, or formatting/whitespace errors) that cause GitHub to be uncertain about your license.
You can quickly check whether your license is recognized by using the SPDX License Diff browser plugin. You can also compare it to GitHub's internal representation of the standard license by installing licensee and running licensee diff. We can also help you do this; just contact lift@tidelift.com.
These tools may surface problems, like very slightly non-standard text formatting, that can confuse GitHub's parser even on what appear to be correct license files.
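For a rough local comparison without installing anything, Python's `difflib` can list word-level differences between your LICENSE and a reference copy of the standard text (for example, one saved from the SPDX license list). This is an illustrative sketch, not a substitute for the tools named above; the file paths are hypothetical. Because it splits on whitespace, it highlights changed words but deliberately hides pure whitespace differences, which would need a character-level comparison.

```python
import difflib
import re
from pathlib import Path


def words(text: str) -> list[str]:
    # Split on any whitespace so reflowed lines don't show up as changes.
    return re.findall(r"\S+", text)


def license_word_diff(yours: str, reference: str) -> list[str]:
    """Return only changed words: '- word' is in the reference, '+ word' is in yours."""
    diff = difflib.ndiff(words(reference), words(yours))
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Hypothetical paths: your license, and the standard text saved next to it.
    yours = Path("LICENSE").read_text(encoding="utf-8")
    reference = Path("reference/MIT.txt").read_text(encoding="utf-8")
    for change in license_word_diff(yours, reference):
        print(change)
```

An empty result means the two texts differ only in whitespace; anything printed is a wording change worth reverting to the standard text.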
Multiple license files
If more than one filename in your repo contains the string LICENSE, GitHub is likely confused by this.
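To see every file that could be involved, a short sketch can walk the repository and list each filename containing the string LICENSE, such as LICENSE-DEPENDENCIES or a vendored library's LICENSE.txt. Matching case-insensitively and skipping the `.git` directory are our assumptions; the function names are hypothetical.

```python
import os


def license_like(paths: list[str]) -> list[str]:
    """Keep paths whose file name contains "license" in any case, sorted."""
    return sorted(p for p in paths if "license" in os.path.basename(p).lower())


def find_license_files(repo_root: str = ".") -> list[str]:
    """Walk the repository and return repo-relative paths of license-like files."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune Git's internal metadata directory from the walk.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            paths.append(os.path.relpath(os.path.join(dirpath, name), repo_root))
    return license_like(paths)


if __name__ == "__main__":
    for path in find_license_files():
        print(path)
```

The substring test is intentionally loose, so it will also flag names like licenses.py; treat the output as a list to review rather than a verdict.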
We can't fix GitHub's license scanning in the case where there is more than one license file with conflicting information. GitHub's suggested solution is to move the complex information into a file whose name does not contain "LICENSE". We don't recommend this, because it can obscure the genuine complexity that enterprise users must deal with if they want to comply.
Instead, in this situation we will typically ask you to ensure that your package manager metadata accurately reflects the complex situation. For example, if your source code is under the BSD-3-Clause license, and your LICENSE-DEPENDENCIES file mentions a vendored dependency under Apache-2.0, your package manager's metadata should mention both BSD-3-Clause and Apache-2.0. (Dependencies that are managed through the package manager don't need to be tracked this way: they should surface their licensing to your users through their own license metadata.)
How you do this may depend on which package manager you're using. For example:
- in npm's package.json, use an SPDX license expression: "license": "BSD-3-Clause AND Apache-2.0" (docs)
- in RubyGems' gemspec or Rakefile, set spec.licenses = ["BSD-3-Clause", "Apache-2.0"] (docs)
More links to package manager documentation are below, under "Package Managers".
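One way to keep that metadata honest is to derive the expression from the list of licenses you know apply. The sketch below (helper names are hypothetical) joins SPDX identifiers with AND and checks that an existing package.json license field names each of them. It only compares identifier tokens case-insensitively; it is not an SPDX expression parser, so it does not interpret OR, WITH, or parentheses.

```python
import json
import re


def spdx_and(identifiers: list[str]) -> str:
    """Join license identifiers into an SPDX AND expression, de-duplicated, in order."""
    return " AND ".join(dict.fromkeys(identifiers))


def missing_from_package_json(package_json: str, required: list[str]) -> list[str]:
    """Return required identifiers that package.json's "license" field doesn't mention."""
    expression = json.loads(package_json).get("license", "")
    if not isinstance(expression, str):
        expression = ""  # legacy object/array forms are out of scope for this sketch
    # Pull out identifier-like tokens; parentheses and whitespace are discarded.
    tokens = {t.lower() for t in re.findall(r"[A-Za-z0-9.+-]+", expression)}
    return [ident for ident in required if ident.lower() not in tokens]
```

With the example from the text, `spdx_and(["BSD-3-Clause", "Apache-2.0"])` produces the string to put in package.json, and `missing_from_package_json` flags Apache-2.0 if only BSD-3-Clause was declared.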
Non-standard copyright statements
GitHub's license parser (like many other licensing tools) distinguishes between copyright statements and license text. Copyright statements are information about who owns the copyright (usually featuring (c), years, and names), while the license text tells us what restrictions the copyright owner places on the code. Depending on the tool and use case, copyright statements are either ignored or stored separately.
For example, in this file, licensee sees:
Copyright (c) 2004-2020 David Heinemeier Hansson
Arel originally copyright (c) 2007-2016 Nick Kallen, Bryan Helmkamp,
Emilio Tagua, Aaron Patterson
Permission is hereby granted, free of charge, to any person obtaining...
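To illustrate why lines like these matter, here is a sketch of a deliberately simplified copyright filter. The rule it uses (a line is a copyright statement only if it begins with "Copyright", "(c)", or "©") is an assumption for illustration and not licensee's actual pattern. Under that rule the first line is discarded, but "Arel originally copyright (c) ..." and its continuation line "Emilio Tagua, Aaron Patterson" survive, so they would be compared as if they were license text and make the file differ from the standard license.

```python
import re

# Hypothetical, simplified rule: a copyright statement starts (after optional
# whitespace) with "Copyright", "(c)", or the © symbol, in any case.
COPYRIGHT_LINE = re.compile(r"^\s*(copyright\b|\(c\)|©)", re.IGNORECASE)


def split_copyright(text: str) -> tuple[list[str], list[str]]:
    """Separate recognized copyright lines from everything else."""
    copyright_lines, remaining = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no information for this comparison
        if COPYRIGHT_LINE.match(line):
            copyright_lines.append(line)
        else:
            remaining.append(line)
    return copyright_lines, remaining


sample = """Copyright (c) 2004-2020 David Heinemeier Hansson

Arel originally copyright (c) 2007-2016 Nick Kallen, Bryan Helmkamp,
Emilio Tagua, Aaron Patterson

Permission is hereby granted, free of charge, to any person obtaining..."""

found, rest = split_copyright(sample)
# Only the first line is recognized; both Arel lines stay in `rest`,
# in front of the permission text.
print(rest)
```
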
Package managers
Most package managers provide metadata about the licenses used by a package. However, each package manager specifies this metadata differently, which can make building enterprise tooling difficult.
Add missing license metadata
One of the most common problems with package manager metadata is that it is missing. The lifter dashboard will alert you and prompt you to fix it if we couldn't find the license for a package.
Usually the package manager will have a metadata field for this. For example:
- Maven has a licenses section in pom.xml; ideally, use an SPDX identifier in the name field.
- npm uses an SPDX identifier in the license field of package.json.
- PyPI packages list license classifiers.
- RubyGems has license and licenses fields, which can be set to SPDX identifiers.
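As a rough way to check what a package currently declares, the sketch below reads three of the formats listed above from their raw text. It covers only the common shapes (a string license field in package.json, license name elements in pom.xml, and "License :: ..." trove classifiers) and ignores others; the function names are hypothetical. RubyGems is left out because a gemspec is Ruby code rather than a data file.

```python
import json
import xml.etree.ElementTree as ET


def npm_licenses(package_json: str) -> list[str]:
    """Read the SPDX expression from package.json's "license" field, if it is a string."""
    value = json.loads(package_json).get("license")
    return [value] if isinstance(value, str) else []


def _local(tag: str) -> str:
    # Strip an XML namespace prefix such as "{http://maven.apache.org/POM/4.0.0}".
    return tag.rsplit("}", 1)[-1]


def maven_licenses(pom_xml: str) -> list[str]:
    """Collect <name> values from every <license> element in a pom.xml."""
    names = []
    for element in ET.fromstring(pom_xml).iter():
        if _local(element.tag) == "license":
            for child in element:
                if _local(child.tag) == "name" and child.text:
                    names.append(child.text.strip())
    return names


def pypi_licenses(classifiers: list[str]) -> list[str]:
    """Return the last segment of each 'License :: ...' trove classifier."""
    return [c.split("::")[-1].strip() for c in classifiers if c.startswith("License ::")]
```

An empty list from any of these is the "missing metadata" case the dashboard alerts on; a classifier such as "MIT License" is a human-readable name rather than an SPDX identifier, so it may still need mapping.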