Aqua Nautilus researchers evaluated the vulnerability disclosure process for tens of thousands of open-source projects and found flaws in the process. These flaws allowed harvesting the vulnerabilities before they were patched and announced. This could enable attackers to exploit security holes before the project’s users are alerted.
By conducting an extensive analysis of commits, pull requests, and issues on GitHub, and by extracting insights from the National Vulnerability Database (NVD) dataset, this research yielded many findings. In this blog we describe our work, process, and research methods, highlight the stages of vulnerability discovery, and show the gravity of early exposure of vulnerabilities in open-source projects. We also advocate for standardized responsible disclosure processes.
Vulnerability disclosure stages
To understand the current research, we’d like to clarify some definitions related to the vulnerability disclosure process:
- ‘0-day’: A vulnerability that is unknown to the maintainer of the project.
- ‘1-day’: A vulnerability that is known to the maintainer. Typically, the CVE/announcement is publicly published. At this stage, there is typically an available patch. However, the release of this patch can sometimes be delayed by a day or, in some cases, may not be released at all.
In our research, we found many vulnerabilities that can’t be categorized as either ‘0-day’ or ‘1-day’ because they fall between these definitions (“between the times”). We therefore conclude that the world of vulnerability disclosure is more complex than a binary distinction, and we offer two more states in the vulnerability disclosure process: ‘Half-Day’ (also written ‘0.5-Day’) and ‘0.75-Day’.
‘Half-Day’ (‘0.5-Day’):
- A vulnerability that is known to the maintainer.
- Alarmingly, information about the vulnerability is exposed to the public through platforms such as GitHub (commit/PR/issue), NVD, etc.
- A commit to fixing the vulnerability may have been created, but an official release is not yet available.
- A CVE may or may not be assigned during this phase.
The risks of ‘Half-Day’: Attackers can harvest any indicators of a new vulnerability being formed or disclosed on public platforms (GitHub, NVD). Using messages and metadata found in PRs, commits, and issues, attackers can locate references to the vulnerable code, use a reported proof of concept (if one exists), and even write their own exploit.
To further illustrate our point, imagine a case where there is an open issue on GitHub about a possible vulnerability in the project. This was acknowledged by the maintainer and a commit that fixes the vulnerable code exists and refers to the issue. Nevertheless, the latest release of the project does not include the commit that resolves the vulnerability.
‘0.75-Day’:
- A vulnerability that is known to the maintainer.
- An official patch is available.
- A CVE or CPE is not available.
The risks of ‘0.75-Day’: Attackers can harvest any indicators of a new vulnerability being formed or disclosed on public platforms (GitHub, NVD). Using messages and metadata found in PRs, commits, and issues, attackers can locate references to the vulnerable code, use a reported proof of concept (if one exists), and even write their own exploit.
The main difference between this state and ‘Half-Day’ is that even though an official patch is available, a CVE or CPE has not been assigned yet. As a result, vulnerability scanning tools can’t detect the affected component in your environment, and you may not be aware that you need to fix it.
Please note that all stages from ‘Zero-Day’ to ‘One-Day’ carry the same level of seriousness and impact for practitioners. This is because the details of the vulnerability and the patch are not publicly available, and security solutions are unable to detect or offer remediation before reaching the ‘One-Day’ status.
Naturally, these are not the only scenarios that exist out there. But in our research, we found that these are the most common ones, and easiest to harvest, as we demonstrate in this blog.
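The four states described above can be expressed as a small decision procedure. The following is a minimal sketch, not a formal taxonomy: the class name, the field names, and the reduction of each state to three or four yes/no facts are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DisclosureState(Enum):
    ZERO_DAY = "0-day"
    HALF_DAY = "Half-Day"
    THREE_QUARTER_DAY = "0.75-Day"
    ONE_DAY = "1-day"


@dataclass
class VulnFacts:
    """Hypothetical yes/no facts about a vulnerability at a point in time."""
    known_to_maintainer: bool
    publicly_exposed: bool        # visible in a commit/PR/issue, on NVD, etc.
    official_release_with_fix: bool
    cve_published: bool


def classify(facts: VulnFacts) -> Optional[DisclosureState]:
    """Map the facts to one of the states defined in this post.

    Returns None for situations the post does not name, such as a
    vulnerability that is being handled privately with no public trace.
    """
    if not facts.known_to_maintainer:
        return DisclosureState.ZERO_DAY
    if not facts.official_release_with_fix:
        # Exposed in public but no official release yet; a CVE may or may not exist.
        return DisclosureState.HALF_DAY if facts.publicly_exposed else None
    if not facts.cve_published:
        # Patched release exists, but scanners have no CVE/CPE to match against.
        return DisclosureState.THREE_QUARTER_DAY
    return DisclosureState.ONE_DAY
```

For example, a fix that is visible in a public pull request but not yet part of any release classifies as ‘Half-Day’, whether or not a CVE has been assigned.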
Below, we present several case studies that demonstrate the impact of our research on the open-source vulnerability disclosure process. Although there is no concrete evidence to suggest that attackers are actively exploiting this in the wild, it is reasonable to assume that threat actors may harvest information from open-source projects. They could be using this data to gain a deeper understanding of the projects and to search for potential vulnerabilities.
In this section we will show two case studies of flaws in the vulnerability disclosure process.
Case Study 1: Analysis of Log4Shell (CVE-2021-44228) Disclosure Process
To shed light on this topic, let’s examine the disclosure and patch timeline of the well-known Log4Shell vulnerability. We will highlight the inherent discrepancies in the disclosure/reporting process.
For many practitioners, the Log4Shell vulnerability marks a turning point in how they view vulnerabilities. Many indicate that they had to work day and night to patch this vulnerability when it first came out. As a quick refresher, Log4Shell was discovered and reported to Apache by Alibaba on November 24, 2021. It then gained wider attention with a tweet on December 9, 2021, and rapidly became a significant concern.
We’ll investigate the activity found in the relevant repository on GitHub during this period to highlight the evolving states of the vulnerability.
As they say, a picture is worth a thousand words, so we’ll begin with a visual timeline chart, followed by a detailed breakdown of each stage.
Breakdown of the Chart:
- The vulnerability was reported to the Apache team on November 24th, 2021.
- On November 30th, 2021, six days later, a maintainer opened a pull request on GitHub with a commit that fixed the issue. From this point on, the vulnerability and its details are available to anyone on GitHub and are practically publicly exposed.
- On December 5th, 2021, five days after the pull request, it was merged. However, no official patch was available at this time – the fix was solely in the open-source code.
- On December 6th, 2021, a day later, the first official patch became available on Apache’s website.
In summary, regarding the ‘Half-Day’ window: Over a span of 6 days (from November 30th, 2021 to December 6th, 2021) the vulnerability was exposed on public platforms (like GitHub). This timeframe allowed attackers to detect the problem, pinpoint the vulnerable code, and potentially craft an exploit before users became aware and could implement a patch.
It’s worth noting that, at this stage, scanning tools couldn’t detect the issue because the CVE number and CPE for the vulnerability hadn’t been created yet.
So, we proceed to the ‘0.75-Day’ window.
- On December 10th, 2021, four days later, an official CVE identifier for the vulnerability was released. This marked the first time that some vulnerability scanning tools had the necessary data to detect this vulnerability.
In summary, regarding the ‘0.75-Day’ window: Over a span of 4 days (from December 6th, 2021 to December 10th, 2021), the vulnerability was exposed on open-source platforms. An official patch was available from Apache during this time. However, attackers could still exploit this vulnerability against users who hadn’t applied the patch. This is because only after December 10th could scanning tools effectively identify this CVE in user environments.
- On December 13th, the CVE was assigned its score and CPE by NIST. This gave scanning tools a more detailed insight into the vulnerability’s impact on different software, products, and versions. Furthermore, it helped users prioritize this CVE in their vulnerability management system.
Another example of this process flaw is Text4Shell (CVE-2022-42889). This case is even worse than the previous one, because the windows are much longer and illustrate a more severe scenario.
There were 75 days of ‘0.5-Day’ windows, and 14 days of ‘0.75-Day’ windows.
The initial pull request addressing this issue was made on July 16th, 2022. It was merged on September 23rd, 2022, and the official release for this issue was available on September 29th, 2022, marking 75 days since the first indication of this vulnerability on GitHub. Only 14 days later, on October 13th, 2022, the CVE was identified by NVD and publicly announced.
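The window lengths quoted in this case study follow directly from the dates. The short sketch below recomputes them; the dates are the ones given in the text, while the dictionary keys and function name are hypothetical names chosen for illustration.

```python
from datetime import date

# Dates taken from the timelines described in this case study.
TIMELINES = {
    "Log4Shell (CVE-2021-44228)": {
        "public_exposure": date(2021, 11, 30),   # fix PR opened on GitHub
        "official_release": date(2021, 12, 6),   # first official patch
        "cve_published": date(2021, 12, 10),     # CVE identifier released
    },
    "Text4Shell (CVE-2022-42889)": {
        "public_exposure": date(2022, 7, 16),
        "official_release": date(2022, 9, 29),
        "cve_published": date(2022, 10, 13),
    },
}


def disclosure_windows(timeline: dict) -> dict:
    """Return the Half-Day and 0.75-Day window lengths in days."""
    return {
        "half_day": (timeline["official_release"] - timeline["public_exposure"]).days,
        "three_quarter_day": (timeline["cve_published"] - timeline["official_release"]).days,
    }


for name, timeline in TIMELINES.items():
    w = disclosure_windows(timeline)
    print(f"{name}: Half-Day {w['half_day']} days, 0.75-Day {w['three_quarter_day']} days")
# Log4Shell (CVE-2021-44228): Half-Day 6 days, 0.75-Day 4 days
# Text4Shell (CVE-2022-42889): Half-Day 75 days, 0.75-Day 14 days
```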
Case Study 2: Half-Day and 0.75-Day at Binwalk (CVE-2022-4510)
If you’re unfamiliar with Binwalk, it is a tool used for searching a given binary image for embedded files and executable code. We are showcasing this CVE here because we were able to catch this case in real time and observe the behavior and timeline of this issue using our first method to harvest vulnerabilities from open source projects.
For a quick refresher, on January 31, 2023, ONEKEY released a blog about Remote Command Execution in the Binwalk tool. At the time of publication, the vulnerability had yet to be patched. This is not a rare case; publishing early sometimes helps warn users and applies pressure on the project team to release a patch.
Below we review the process and different milestones during the vulnerability disclosure as they appear here.
Dates may vary slightly due to discrepancies between different information sources.
Breakdown of the Chart:
- The vulnerability was reported to ReFirmLabs on GitHub via a pull request, on October 26th, 2022.
The reporter mentioned that they took the liberty to report it openly since another issue (#556) was resolved in this manner, and they could not find any security/coordinated disclosure policy or contact information.
At this point, the issue was publicly exposed, making it possible to identify this vulnerability before its official disclosure. This pull request also included the commit containing the vulnerable code and a Proof of Concept, which could allow attackers to exploit this vulnerability.
- On January 26th, 2023, 92 days later, the vulnerability was published on NVD with a description and a reference to the GitHub pull request, which included a description of the vulnerability and the Proof of Concept (PoC). This allowed attackers who harvest data from NVD to learn about the vulnerability before a working patch was available to users.
- On February 1st, 2023, six days later, the pull request was merged, and an official patch was released.
In summary, regarding the ‘Half-Day’ window: Over a span of 98 days (from October 26th, 2022, to February 1st, 2023), the vulnerability was exposed on public platforms. This timeframe allowed attackers to detect the vulnerability and possibly exploit it before users became aware and could implement a patch.
These case studies are just a small part of the many cases and scenarios we come across.
How is it possible to identify such vulnerabilities on a large scale? Through our research, we discovered two sources that enabled us to collect data from various popular open-source projects, aiming to identify ‘Half-Day’ and ‘0.75-Day’ vulnerabilities within them over time.
Our aim is to assist practitioners in detecting and mitigating issues within their vulnerability disclosure lifecycle. To this end, we first want to bring this issue to your attention and secondly, to help you reduce the risk of exposing sensitive information about vulnerabilities during the disclosure process.
We will illustrate our methodology in two parts: first on GitHub, and then on NVD.
Harvesting Method 1: GitHub Pull Requests, Commit Messages, and Issues
On GitHub, we carried out the following steps:
- Compiled a list of around 15,000 popular GitHub projects along with their URLs.
- Utilized the GitHub REST API to analyze Open Pull Requests, issues, and commit messages.
- Within the Pull Requests, issues, and commit messages, we searched for “trigger words” as listed below, which could indicate vulnerabilities or unwanted behavior. This word list is only a subset of the complete one. Some terms are given higher weight than others to prioritize the possibility of a real vulnerability or security issue.
- We reduced the data volume by checking for existing releases that were published after the Pull Request, commit, or issue was merged/closed, etc. – indicating a high likelihood of an official patch for the identified “vulnerability”. We ended up with ~2,200 relevant results which were further analyzed and narrowed to ~50 (~2.3%) projects which required further in-depth analysis.
Trigger words (excerpt): use after free
One example of a vulnerability/issue identified by this method spanned from November 2022 to February 2023.
As you can imagine, this approach yields many false positives. It’s essential to understand the context of each project, issue, and PR. Some terms that might seem indicative of a security issue, like ‘overflow’, can have various meanings. For instance, the term ‘overflow’ often appears in the context of GUI overflow, unrelated to security concerns. If you wish to replicate this process at scale, you’ll need several GitHub tokens because of the large number of API requests. If you’re only focusing on a project under your responsibility, a few GitHub API tokens should suffice, and understanding the context becomes simpler. It’s also advisable to use an LLM to process the results and filter out irrelevant ones.
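To make the weighting idea concrete, here is a minimal sketch of trigger-word scoring over PR titles, issue text, or commit messages. It is not the word list or scoring used in the research: only “use after free” and “overflow” come from the text above, and the weights, the threshold, and the function names are hypothetical.

```python
import re

# Hypothetical weights. An ambiguous term such as "overflow" gets a low weight
# because it often refers to GUI/CSS overflow rather than a memory-safety bug.
TRIGGER_WEIGHTS = {
    "use after free": 3.0,
    "overflow": 1.0,
}

# Hypothetical cut-off above which a text is queued for manual review.
REVIEW_THRESHOLD = 2.0


def score_text(text: str, weights: dict = TRIGGER_WEIGHTS):
    """Return (total_score, per_phrase_scores) for one message.

    Multi-word phrases also match when written with hyphens,
    e.g. "use-after-free".
    """
    lowered = text.lower()
    hits = {}
    for phrase, weight in weights.items():
        words = [re.escape(w) for w in phrase.split()]
        pattern = r"\b" + r"[\s\-]+".join(words) + r"\b"
        count = len(re.findall(pattern, lowered))
        if count:
            hits[phrase] = count * weight
    return float(sum(hits.values())), hits


def needs_review(text: str) -> bool:
    """True when the weighted score reaches the review threshold."""
    total, _ = score_text(text)
    return total >= REVIEW_THRESHOLD
```

With these illustrative weights, a message such as "Fix use-after-free in parser" is flagged for review, while "Fix CSS overflow in sidebar" scores below the threshold. Any real deployment would still need the contextual review described above.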
Harvesting Method 2: Monitoring NVD Early Exposure
Sometimes, CVEs are uploaded to the NVD before an official patch is released, which is too early in the disclosure process. This can occur for various reasons. Interestingly, some entries reference their GitHub commit/PR/issue while vulnerability verification is still ongoing, exposing them to attackers who could potentially “harvest” them and develop an exploit.
Consider this: when you have the commit/PR that fixes a security issue, it becomes easier to understand the context of the CVE and develop an exploit because you have access to both the vulnerable code and the implemented fix, and sometimes even a Proof of Concept within the PR.
By utilizing the NVD API to fetch recently pushed CVEs and searching for GitHub references, we can then check if the commit/PR referenced by NVD has a release on GitHub that includes them. If not, this often presents a ‘Half-Day’ scenario, where a vulnerability is exposed without a patch at that stage.
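The check described above can be sketched in two steps: pull GitHub commit/PR/issue links out of a CVE's reference URLs, then test whether any release was published after the fix landed. This is an illustrative sketch only and is not the implementation of the tool described below. It assumes the reference URLs and the timestamps (for example, a pull request's `merged_at` and each release's `published_at` from the GitHub REST API) have already been fetched, and it uses "a release published after the fix" as a proxy for "a release that includes the fix". All function names and the sample URLs are hypothetical.

```python
import re
from datetime import datetime
from typing import List, Optional

# Matches GitHub commit, pull request, and issue URLs.
GITHUB_REF = re.compile(
    r"^https://github\.com/([^/\s]+)/([^/\s]+)/(commit|pull|issues)/([0-9a-fA-F]+)"
)


def github_refs(reference_urls: List[str]) -> List[dict]:
    """Extract GitHub commit/PR/issue references from a CVE's reference URLs."""
    refs = []
    for url in reference_urls:
        match = GITHUB_REF.match(url)
        if match:
            owner, repo, kind, ident = match.groups()
            refs.append({"owner": owner, "repo": repo, "kind": kind, "id": ident})
    return refs


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2023-02-01T12:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def is_half_day_candidate(fix_timestamp: Optional[str],
                          release_timestamps: List[str]) -> bool:
    """True when no release was published after the fix landed.

    fix_timestamp is None when the referenced fix has not been merged yet.
    """
    if fix_timestamp is None:
        return True
    fixed_at = parse_ts(fix_timestamp)
    return not any(parse_ts(r) > fixed_at for r in release_timestamps)


# Hypothetical sample data for illustration.
urls = [
    "https://github.com/example-org/example-repo/pull/123",
    "https://example.com/advisory",
]
refs = github_refs(urls)
candidate = is_half_day_candidate("2023-02-01T12:00:00Z", ["2023-01-15T09:00:00Z"])
```

Every candidate produced this way still needs manual verification, because a release published after the merge date does not necessarily contain the fix, and a project may ship patches through channels other than GitHub releases.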
Aqua Nautilus CVE-Half-Day-Watcher
For this method, we have developed a tool that you can find on GitHub.
It’s designed to scan the NVD for potential ‘0.5-Day’ vulnerabilities, going back as many days as you require. You’ll need to provide a GitHub token for the tool to query the GitHub API, specify your desired time frame for scanning the NVD, and define the minimum star rating for the projects you’re interested in.
Although this tool is in its PoC stage, it consistently delivers a large number of daily results. Each result has a significant likelihood of representing a ‘0.5-Day’ vulnerability, necessitating only your verification to confirm whether it is indeed one.
An example of the results from November 5, 2023: Here we chose to scan a very small timeframe of two days back, and we got around 20 results of potential ‘0.5-Day’ vulnerabilities, including some from very popular projects.
In contrast to the previous method, which generated numerous false positives and required effort to determine if an issue triggered by a “trigger word” is indeed a security vulnerability, this method assures a high probability that the issues caught are genuine vulnerabilities.
Summary and Mitigations:
Our research aims to minimize the risk of early exposure of vulnerabilities during the disclosure process.
By analyzing GitHub activity and NVD entries, we’ve developed methods to detect potential security issues before they become public knowledge.
Our goal is to minimize the gap between vulnerability discovery and patch release, reducing the window of opportunity for attackers.
One might claim that there is already extensive discussion about the “integrity” of CVEs, accompanied by considerable criticism on this topic. People argue that there is an influx of vulnerabilities, which are often perceived as unnecessary and may generate excessive noise for security teams due to their potentially redundant impact. For instance, some CVEs may be unreachable or irrelevant in various scenarios, among other issues.
To help the community, we would like to suggest some mitigation steps that every open-source maintainer should adopt:
- Responsible Disclosure:
  - Leverage GitHub’s private reporting feature to manage vulnerabilities discreetly.
  - Create a responsible disclosure policy that outlines a secure process for vulnerability management.
- Proactive Scanning of your Open-Source commits/issues/PRs: Conduct regular scans for trigger words in code commits, issues, and PRs to prevent early exposure.
- Runtime Protection: Our findings emphasize the crucial role of runtime protection strategies in bolstering security, particularly when a patch for a newly discovered and undisclosed vulnerability has not yet been released.
By combining these strategies, we enhance the security posture against the early exploitation of vulnerabilities, ensuring a safer open-source ecosystem.