Open source software contains licensing data in order to communicate, in a human- and machine-readable manner, information about which licenses and copyright notices apply to creators’ software and intellectual property.
Risk-averse customers need to monitor the license of each of their open source dependencies for every release of that dependency. These customers also need to understand the licenses embedded in any additional files in the dependency download. Having accurate data that is updated per release ensures that customers understand their risk profile and that risk policies can be built and applied accurately.
It’s important that this data is refreshed on a per-release basis, because licenses can change between releases, and using a release without meeting its new terms can result in copyright violations. TinyMCE and iText are two recent examples of projects whose licensing changes brought a very different set of requirements for legal usage. For example, when an open source project with restrictive licensing is used inside your product, the license may require that you distribute the source code of your product.
Licenses are not always declared or discoverable. When Tidelift is unable to identify any license through automated collection, a manual review process is queued.
Our process
Tidelift maintains machine-readable SPDX license data for over 1 million open source packages. For our supported ecosystems, we perform an automated license normalization process:
- Ingest the raw license string found in each release of the package
- Ingest any license information found in the package's source repository
- Use textual analysis to clean this data up, detect inconsistencies or typos, and normalize it to an SPDX license expression
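The three steps above can be illustrated with a minimal sketch. It is not Tidelift's implementation: the `ALIASES` table, the `clean` and `normalize` functions, and the returned dictionary shape are all hypothetical, and a real pipeline would use a far larger mapping plus fuzzy matching.

```python
import re

# Hypothetical alias table mapping cleaned-up raw strings to SPDX identifiers.
# Illustrative only; a real table would be much larger.
ALIASES = {
    "mit": "MIT",
    "mit license": "MIT",
    "apache 2.0": "Apache-2.0",
    "apache license 2.0": "Apache-2.0",
    "apache license version 2.0": "Apache-2.0",
    "bsd 3 clause": "BSD-3-Clause",
    "isc": "ISC",
}


def clean(raw):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9.+]+", " ", raw.lower())
    return " ".join(text.split())


def normalize(release_license, repo_license):
    """Combine the release string and the repository string into one SPDX id.

    Returns a dict with the SPDX identifier (or None) and a flag saying
    whether a human should look at the package.
    """
    candidates = [
        ALIASES.get(clean(raw)) for raw in (release_license, repo_license) if raw
    ]
    found = {c for c in candidates if c is not None}
    # Accept only when every available source was recognized and they agree.
    if len(found) == 1 and None not in candidates:
        return {"spdx": found.pop(), "needs_review": False}
    return {"spdx": None, "needs_review": True}
```

In this sketch, a missing, unrecognized, or conflicting license string never yields a guess; it only raises the review flag.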
When a customer queries any package, it triggers a deeper analysis by our internal data team. This process ensures accurate, standardized, and consistent licensing data.
The process that this team uses for each request is:
- Review the source code repository, package management / build tools, package files including POM files, source and/or JAR files, manifest files, and the Internet Archive for licensing documents
- Verify parent / child relationships and that the licensing applies to all
- When the package source bundles other packages → The team performs package download analysis to determine conjunctive (AND) or disjunctive (OR) SPDX expressions
- When there are multiple licenses detected → The team performs package download analysis to determine if and how each license applies
- When there is no license file detected in this analysis → If all (ALL!) efforts to identify a license have failed, we use the SPDX identifier "NONE", annotated with a comment of “unable to identify license for the project”.
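The last three rules in this list can be sketched as a small function. This is an illustration under assumptions, not the team's actual tooling: the name `concluded_expression` and its two inputs (licenses the user may choose between, and licenses of bundled code that all apply) are hypothetical.

```python
# Comment text quoted from the process described above.
NO_LICENSE_COMMENT = "unable to identify license for the project"


def concluded_expression(own_choices, bundled=()):
    """Build an SPDX expression and an optional comment.

    own_choices: licenses the user may pick from (joined with OR).
    bundled: licenses of bundled packages that all apply (joined with AND).
    """
    choices = sorted(set(own_choices))
    required = sorted(set(bundled))
    if len(choices) == 1:
        # Avoid "MIT AND MIT" when bundled code shares the package's license.
        required = [lic for lic in required if lic != choices[0]]
    if not choices and not required:
        return "NONE", NO_LICENSE_COMMENT

    terms = []
    if len(choices) > 1 and required:
        # AND binds tighter than OR in SPDX, so the OR group needs parentheses.
        terms.append("(" + " OR ".join(choices) + ")")
    elif choices:
        terms.append(" OR ".join(choices))
    terms.extend(required)
    return " AND ".join(terms), None
```

For example, a dual-licensed package that bundles BSD-3-Clause code would come out of this sketch as `(Apache-2.0 OR MIT) AND BSD-3-Clause`.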
All of this is based on the specific version of each package, and the review and verification can go very deep, including comparing embedded license text word by word against the SPDX text of a license.
Non-SPDX licenses
Any license that does not match a license on the SPDX license list, or that has been modified from a listed license to the point where it is no longer considered a match to the original, is assigned a LicenseRef identifier.
Guidelines for modifying license text are found in Annex B of the SPDX specification. Section B.3.4, “Guideline: replaceable text”, indicates that any text in red font can be replaced with the author’s own wording, and Section B.3.5, “Guideline: omittable text”, indicates that any text in blue font can be omitted from the license. In these instances the license is considered the same as the original license. If the text of the license is modified beyond these allowable changes, the "new" license is identified as LicenseRef-tidelift-***.
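To make the replaceable and omittable text rules concrete, here is a minimal sketch of template matching. Everything in it is an assumption made for illustration: the `(kind, value)` segment format, the `matches_template` function, and the tiny `TEMPLATE` (which is not a real license) are hypothetical, and the sketch simply ignores case and whitespace differences. Real SPDX templates use their own markup and more matching rules.

```python
import re


def _norm(text):
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())


def matches_template(candidate, segments):
    """Return True if candidate matches the template segments.

    segments is a list of (kind, value) pairs, where kind is:
      "text"        - must appear as written
      "replaceable" - any non-empty wording is accepted (value is ignored)
      "omittable"   - may appear as written or be left out
    """
    parts = []
    for kind, value in segments:
        if kind == "text":
            parts.append(re.escape(_norm(value)))
        elif kind == "replaceable":
            parts.append(".+?")
        elif kind == "omittable":
            parts.append("(?:" + re.escape(_norm(value)) + ")?")
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    pattern = re.compile(r"\s*".join(parts), re.DOTALL)
    return pattern.fullmatch(_norm(candidate)) is not None


# Hypothetical template, invented for this example only.
TEMPLATE = [
    ("text", "Copyright (c)"),
    ("replaceable", ""),
    ("text", "Permission is granted to use this software."),
    ("omittable", "All rights reserved."),
]
```

A text for which `matches_template` returns False has been changed beyond the allowable edits, which is the case where a LicenseRef identifier would apply instead of the original SPDX identifier.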
Our Service Level Objective for this research, going from ‘license is not detected’ to ‘concluded license’, is 4 business days.
Why our data is better
Tidelift’s data is more accurate and has fewer false positives than any other data source in the market today.
Our data team, under the leadership of Tidelift’s co-founder and General Counsel Luis Villa, has the industry expertise and experience to expedite your internal legal processes. The team can efficiently examine and explain which conditions lead to very nuanced concluded licenses. It can also quickly recognize variants of licenses, like the multiple variants of the BSD license, and spot when a particular detected license is outside the bounds of the SPDX matching guidelines.
Tidelift also partners with the maintainers of thousands of the most commonly used open source packages! Our contracts with these open source creators ensure that they:
- Use an open source license for their packages, such as those approved by the Open Source Initiative.
- Fix any inconsistencies or mistakes in their licensing data. We've found that a surprising number of packages have mistakes in this area.
- Agree to work with Tidelift and any Tidelift customer to fix violations before filing a lawsuit.
- Certify that they wrote the code they've contributed to the project, or that they got it from another source where it was under an appropriate open source license.
Use cases
This data allows our customers to answer the following questions:
- What licenses are in use in our organization?
- Are these licenses appropriate for use, given the risk profile of certain applications in the organization?
- How widespread is license misuse in our organization?
- Is our use of open source software in alignment with our appetite for revenue and reputational risk as an organization?
Outcomes
- We reduce the number of false positives and incorrect licensing data that can lead to misapplied policies
- We eliminate the time and cost for developers to track down licensing details for your legal team
- We increase operational efficiency for your legal team
- We clarify and reduce risk for your organization
Using our licensing data
If you're pulling data into other tools, Tidelift's package information APIs provide SPDX license expressions along with the source of the data. When a detected license is “unknown”, the license analysis process described above is triggered. Read more about our licenses APIs in our API documentation.
If you're using Tidelift's policy management or SBOM tracking tools, the allowed license standard can flag the use of disallowed licenses in your SBOMs once you’ve configured a licensing policy. You can also generate a report listing the packages in use that violate your licensing policy, so you can prioritize and work through them, and export a list of the licenses used in your organization.
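If you export this data and want to run a similar check yourself, the idea can be sketched as follows. This is not Tidelift's API or policy engine: the `sbom` mapping of package name to SPDX expression, the function names, and the package names in the usage example are all hypothetical. The check is deliberately conservative in that it flags a package if any license named in its expression is not allowed, even inside an OR.

```python
import re
from collections import Counter


def license_ids(expression):
    """Extract license identifiers from a simple SPDX expression.

    Operators are dropped, and so is the exception name after WITH.
    """
    ids = []
    skip_next = False
    for token in re.findall(r"[A-Za-z0-9.+-]+", expression):
        if skip_next:
            skip_next = False
        elif token == "WITH":
            skip_next = True
        elif token not in ("AND", "OR"):
            ids.append(token)
    return ids


def find_violations(sbom, allowed):
    """Return (package, disallowed licenses) pairs for a name -> expression map."""
    violations = []
    for package, expression in sbom.items():
        disallowed = sorted(set(license_ids(expression)) - set(allowed))
        if disallowed:
            violations.append((package, disallowed))
    return violations


def license_inventory(sbom):
    """Count how many packages mention each license identifier."""
    return Counter(lic for expr in sbom.values() for lic in set(license_ids(expr)))
```

A short usage example with made-up packages:

```python
sbom = {
    "pkg-a": "MIT",
    "pkg-b": "GPL-3.0-only",
    "pkg-c": "Apache-2.0 OR MIT",
}
print(find_violations(sbom, allowed={"MIT", "Apache-2.0"}))
print(license_inventory(sbom))
```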