Behind the Code: Discovering and Protecting Exposed Secrets

In this webinar, Aqua Nautilus researchers present original research on exposed secrets across GitHub and cloud registries, including cases where tokens from SAP, Microsoft, Mozilla, and Cisco provided access to millions of artifacts, internal registries, and even network devices. The session delivers practical methods to uncover hidden secrets and prevent them from being exploited.
Duration 30:34
Presented By:
"Our research showed that if organizations only scan with a regular git clone, they miss about eighteen percent of potential secrets in their code base."
Yakir Kadkoda, Dir. of Vulnerability Research
Aqua Nautilus
Transcript
Hello, everyone. Thank you for joining us today, and welcome to our session, Behind the Code: Discovering and Protecting Exposed Secrets. In this session, we'll explore how secrets can be unintentionally exposed and share techniques to help you identify and protect them before they are exploited.
So some details about us. My name is Yakir Kadkoda, and with me is Ofek Itach. Hello, everyone. We are security researchers from Team Nautilus at Aqua Security. We each have different experience, such as in development, penetration testing, red teaming, and threat hunting.
Currently, we are mainly focused on vulnerability research across different platforms, particularly in cloud technology.
And we would also like to thank Asaf Morag for contributing to this research.
Let's discuss what we will cover during this session. We will introduce original research on secrets and how they can be hidden in your code base. Then we will also present methods to uncover these hidden secrets, and we will offer mitigation strategies to address these issues.
Let's briefly discuss the research studies that we will be presenting. First, we will talk about Kubernetes secrets and how they can be hidden in your codebase.
Then we will discuss shadow IT on GitHub repositories.
And lastly, we will present our latest research titled Phantom Secrets.
Now, Ofek will continue.
Hello, everybody.
So let's start by talking about Kubernetes secrets and how they can be hidden in your codebase.
So what are Kubernetes secrets?
Kubernetes secrets are objects we can define in Kubernetes.
They are used for sensitive information like credentials.
For instance, we use these secrets to pull images from a private container registry.
Secrets can be defined using the CLI or declared in YAML files and then applied to the Kubernetes instance.
Here is how a typical Kubernetes secret looks. Note that there are several types of secrets, mainly for different use cases and validation purposes. In Kubernetes, these secrets are base64 encoded, which means that they are not secured by default, and anyone with access to these YAML files can decode them.
Let's understand how we use this type of secret. Assume we have a GitHub repository containing a YAML file with a Kubernetes secret object.
We clone the project and have this file locally.
When we run the kubectl apply command with the YAML file, we instruct our Kubernetes cluster to store this secret, encoded in base64.
Now the secret is stored in our etcd database.
Let's focus on the fact that we got this secret from GitHub, and an attacker can do the same. Anyone with access to a GitHub account can search for this type of secret using just the GitHub search feature, and that's exactly what we did. We focused on two types of Kubernetes secrets that contain container registry addresses, usernames, and passwords, all encoded in base64 format.
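To make the decoding step concrete, here is a minimal Python sketch. It assumes the secret is of the kubernetes.io/dockerconfigjson type, whose .dockerconfigjson field holds a base64-encoded JSON document with an "auths" map. The function name and the sample registry, user, and password are hypothetical and only for illustration.

```python
import base64
import json


def decode_registry_secret(encoded: str) -> list[dict]:
    """Decode the base64 value of a .dockerconfigjson field into registry credentials."""
    config = json.loads(base64.b64decode(encoded))
    credentials = []
    for registry, entry in config.get("auths", {}).items():
        username = entry.get("username")
        password = entry.get("password")
        # Some files only carry "auth", which is base64("username:password").
        if "auth" in entry and (username is None or password is None):
            decoded = base64.b64decode(entry["auth"]).decode()
            username, _, password = decoded.partition(":")
        credentials.append(
            {"registry": registry, "username": username, "password": password}
        )
    return credentials


# Hypothetical sample data, not a real credential.
sample_auth = base64.b64encode(b"ci-user:not-a-real-password").decode()
sample = base64.b64encode(
    json.dumps({"auths": {"registry.example.com": {"auth": sample_auth}}}).encode()
).decode()
```

Calling decode_registry_secret(sample) returns the registry address together with the plain-text username and password, which is why base64 encoding alone offers no protection.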
We wrote several regex patterns to hunt specifically for these secrets across all of GitHub using the GitHub advanced search feature. As a result, we found numerous instances of these secrets. All we then had to do was decode them and look for interesting results, and the results were very worrying. We found numerous tokens for several container registries, and almost half of the tokens were valid. These tokens allowed us to pull images and even gave us write access to some of the registries.
Let's briefly go over some examples. One of the tokens we found on GitHub belonged to SAP, which is a huge company. This token allowed us to gain full access to their Artifactory, where we managed to find over ninety-five million artifacts. Imagine a case where an attacker gains access to such an Artifactory and starts enumerating it, finding more passwords and learning about the organization. This could be devastating for the organization. That's why we immediately reported it to SAP, and they mitigated the issue and acknowledged us.
Another example from our research is Docker Hub accounts. Out of ninety-four Docker Hub accounts that we found, sixty-four contained valid credentials with no MFA. These credentials allowed us to authenticate with high privileges to their Docker accounts. Some of these credentials belonged to big companies, and you can see in the table below the number of images in each Docker account and the total number of pulls that each account had. These are huge numbers.
So let's suggest some mitigation strategies for this kind of secret exposure. First of all, the issue is not with Kubernetes secrets themselves but with encoded secrets, which are often missed by secret scanners due to performance concerns. To address the problem, we made sure that Trivy, an open source scanner made by Aqua that includes secret scanning, uses custom regex patterns to search for this kind of secret. Also, if you really want to keep your secrets in your codebase, you can consider using solutions like Sealed Secrets and Mozilla SOPS, which store your secrets in encrypted form.
Now let's move on to the next study, which is research about shadow IT in GitHub repositories.
So first, to understand shadow IT we need to understand what approved IT is.
Approved IT includes any device or software that the organization has approved for use and can monitor and control. Shadow IT includes anything unmanaged, and employees' personal GitHub repositories can sometimes be considered shadow IT.
We analyzed past research conducted by us to determine how many tokens we found in personal GitHub accounts of employees versus the organization's official GitHub accounts. The results revealed that most of the time, the tokens were in personal GitHub repositories of employees. These personal repositories are not scanned by the default scanners configured in your organization.
So let's disclose some interesting findings. We discovered a token in a personal GitHub account of a Microsoft employee.
The token granted access to an internal Azure Container Registry used by projects like Azure IoT Edge.
The token was privileged, allowing read and write access to other images, thereby posing a potential risk for a supply chain attack against Azure.
We demonstrated the severity by pushing a test image in our POC, then we reported our findings to Microsoft, which took action and mitigated the risk and rotated the token.
Another example: we found an internal token in the personal GitHub account of a Red Hat employee, which had access to an internal registry used by Red Hat. We reported this token to the Red Hat team, and they rotated the token and mitigated the risk.
So these were two cases of shadow IT in personal GitHub repositories at huge organizations. Now let's discuss how we can deal with shadow IT risks.
First of all, teach your employees to scan their personal GitHub accounts for secrets and to avoid using company secrets in their personal repositories. Second, actively scan GitHub and internal repositories for identifiers of your company, such as specific domain names and other related keywords, to find possible leaks. Last, always give specific scopes to tokens when generating them, and ensure they have a short lifespan.
Thank you, Ofek. So now let's jump into our latest research, called Phantom Secrets.
To capture your attention in this part of the session, I will start by sharing two key findings we discovered in the Phantom Secrets research. This will illustrate why it is crucial to scan repositories with the strategies we will demonstrate.
Our first finding was a secret belonging to the Mozilla organization.
This secret was in the organization's GitHub repository, and it was related to something that's called the fuzzing manager.
Basically, it's an infrastructure that contains details of fuzzing runs. If you don't know what fuzzing is, it is a type of dynamic code analysis that helps find bugs or potential vulnerabilities by providing a lot of inputs to a system.
This token was valid, and in our POC, we were able to access a lot of fuzzing results of the Firefox project. Each one of these results represents a potential security vulnerability against the Firefox browser.
Since the Tor browser is based on the Firefox browser, each result could potentially apply to the Tor browser as well.
Of course, we reported this finding to the Mozilla organization, which quickly rotated this token and secured this critical exposure.
Another interesting finding with Mozilla was a token for their telemetry dashboard. This telemetry contains aggregate data of Firefox users, used to enhance the performance of the Firefox browser, but it also includes sensitive data about Mozilla and their business.
Of course, we reported this finding as well, and they quickly rotated this token and mitigated the issue.
So now that we have finished with the examples, let's delve into the research and discuss the findings we uncovered.
In the past, we heard a lot of horror stories about leaked secrets in various source code management platforms.
These stories raised great awareness among developers and organizations around secret exposure. However, secrets in organizations still leak, mainly into the company codebase, especially if they have open source projects.
And secrets can leak into any type of file. It can be an API token in a JavaScript file, a temporary Terraform file containing AWS credentials, or even an ELF or PE binary file that contains secrets. Let's quickly describe how we, as developers and organizations, deal with leaked secrets.
Many organizations nowadays are aware of the risks of leaked secrets.
And a lot of organizations use something called Git pre-commit hooks to prevent secrets from being pushed to GitHub, or another source code management platform, in the first place.
These hooks alert the user that their commit may contain a secret.
We also want to actively search our codebase for secrets by using secret scanning tools. So developers, AppSec teams, and automation workflows will run secret scanning tools to search for secrets in their codebase.
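As an illustration of what such a hook might do, here is a minimal Python sketch. The two regex rules and the function names are assumptions chosen for the example, not the rule set of any particular tool; real scanners ship far larger rule sets.

```python
import re
import subprocess
import sys

# Illustrative patterns only (assumed for this sketch).
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(diff_text: str) -> list[tuple[int, str]]:
    """Return (diff line number, rule name) for every added line that matches a pattern."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        if not line.startswith("+") or line.startswith("+++"):
            continue  # only inspect lines that the commit adds
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


def main() -> int:
    # "git diff --cached" shows what is staged for the commit being created.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    findings = find_secrets(diff)
    for number, name in findings:
        print(f"possible secret ({name}) at diff line {number}", file=sys.stderr)
    return 1 if findings else 0  # a non-zero exit status makes Git abort the commit
```

To use it as a hook, the script would be saved as an executable .git/hooks/pre-commit file that ends by calling sys.exit(main()).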
And of course, we want to repeat this scanning often.
We have mentioned secret scanning tools before, and it's important to mention that many are available out there. It's also important to mention that each secret scanning tool operates differently.
Some are pattern based, while others use entropy to find unknown secrets.
There is no good or bad approach; each has its own advantages and disadvantages.
But there will always be blind spots for secret scanning tools. For example, not every secret follows a specific pattern.
And sometimes, things in our code have high entropy, which can cause entropy based scanners to generate false positives.
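The entropy approach can be sketched in a few lines of Python. The threshold of 3.5 bits per character and the minimum length of 20 are assumptions picked for illustration; actual tools choose their own values.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def looks_random(token: str, threshold: float = 3.5, min_length: int = 20) -> bool:
    """Flag long tokens whose character distribution is close to random."""
    return len(token) >= min_length and shannon_entropy(token) >= threshold
```

With these assumed settings, an ordinary Git object hash such as 3b18e512dba79e4c8300dd08aeb37f8e728b8dad is flagged even though it is not a secret, which is the kind of false positive mentioned above, while a repetitive string like password_password_password is not.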
And secrets can exist in any file or form, not just as a parameter of a URL.
And besides all of these blind spots, we found more issues that will cause us to find only part of the exposed secrets in our code base.
So let's reveal more blind spots that we found in this research.
For some background, most secret scanning tools either run the git clone command behind the scenes or instruct their users to run it to download the repository before scanning it.
For those less familiar with it, the git clone command is simple. When we want to scan our project or repository for secrets, we use the Git CLI and provide the URL of the repository containing the code.
Git will then download the repository from our source code management platform, such as GitHub.
After this, we can run our secret scanning tools, like Gitleaks and others, on our project, as we now have the code on our local machine.
It turns out that due to edge cases and design issues with Git and some source code management platforms, using the git clone command is not enough, and we may miss content that contains secrets.
Later, we will provide you with a severe example of findings resulting from this issue.
During this research, we will mainly focus on GitHub since it's the most popular source code management platform, but the findings are relevant for other source code management platforms as well, for example, GitLab, Bitbucket, and more.
We can categorize secrets into three different categories. First, secrets that are accessible via the git clone command. Then, secrets that are only accessible via the git clone --mirror command. And then, secrets that are only accessible through something that's called the cache view of the source code management platform.
Let's start with the first category, secrets that are accessible via the git clone command.
So this category, secrets that are accessible via the git clone command, covers our standard practice, where developers or automation workflows use the git clone command to create a local copy of the project and then run a secret scanning tool on it. When we run the secret scanner on the code fetched with the git clone command, it will detect any secrets that are present in the codebase and its history, including all the branches available on the source code management platform. For instance, if you have a branch called bugfix on GitHub and a secret was accidentally committed somewhere in the history of this repository, using the git clone command followed by the detect command of a secret scanning tool will identify the secret.
Now we are at the interesting part of this research, so pay attention: secrets that are accessible only in the mirrored version of our repository.
In February 2022, a cybersecurity company called Nightwatch released an interesting finding that resulted in a CVE known as GitBleed.
In this CVE, they mentioned that additional parts of the repository only become visible when using the git clone --mirror command.
They highlighted that this issue is related to secrets not being properly removed and to other behaviors of Git. Here, we will expand on this research and explain other scenarios that we found, and even flaws that can hide secrets if a user performs a regular scan with the git clone command.
The bottom line of this finding is that when you use the git clone command, a specific set of secrets may be discovered.
However, when you use the git clone --mirror command, the range of potential secrets will be larger. This is because git clone --mirror provides more content than a regular git clone command.
We can't explain this finding without explaining some basic git terminology.
I will try to keep it simple and provide what you need to understand this issue. In Git, we have three basic objects. First, blob. A blob is a binary representation of a file's content.
Then we have tree. A tree points to blobs and contains their names, because blobs do not have file names. Think of a tree as a directory or subdirectory.
Then we have commit. A commit contains a reference to a tree along with other metadata such as the author, time, and more. It's also important to mention that commits, blobs, and trees all have unique identifiers, which are SHA-1 hashes, one for each object in the project.
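To show where these identifiers come from, here is a short Python sketch for the blob case. It relies on Git's object format, in which the SHA-1 is computed over a small header followed by the content; the function name is ours.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob with this content."""
    # Git hashes the header "blob <size in bytes>" plus a NUL byte, then the content.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

Because the ID depends only on the content, the same file always gets the same blob ID, and the file name lives in the tree rather than in the blob.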
The next subject I want to mention is branches and references.
Basically, references are just pointers to Git commits.
One type of reference is a branch. In this example, we can see that we have two references, or branches, called master and bugfix.
And instead of using the short name of the branch, we can also refer to it by its full name, for example, refs/heads/master.
Similarly, the bugfix branch's full name is refs/heads/bugfix. For simplicity, we can omit part of the long name and just use the short name, like bugfix, without mentioning the refs/heads/ part, and it will still be understood by Git.
Now that we know references are just pointers to commits, and branches are considered references, let's understand how the git clone command works.
When we clone a project, we can enter it and observe a file called the config file. That file contains information about how Git fetches the data from the remote source code management platform to our local workstation.
We can view this specific configuration, and on the left side we can see that everything starting with refs/heads/ will be mapped to the following references in our local copy, in this case, refs/remotes/origin/.
Now let's see what will happen when we clone the project with the git clone --mirror command.
With the git clone --mirror command, we can view the same config file, and we can see that we take all the references, not just references that start with refs/heads/, which are actually branches. So we will get all the references that exist on the remote, in this case, GitHub.
When we use the git clone --mirror command, we bring not only branches but every reference that exists on GitHub, and this is what actually creates the gap between the content that we get from the git clone command and the content that we get with the git clone --mirror command.
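The difference between the two fetch mappings can be sketched in Python. The two refspec strings are the values Git normally writes to the config file for a regular clone and for a mirror clone; the matching function is a simplified illustration that only handles a single trailing wildcard, not Git's full refspec logic.

```python
from typing import Optional

DEFAULT_REFSPEC = "+refs/heads/*:refs/remotes/origin/*"  # regular git clone
MIRROR_REFSPEC = "+refs/*:refs/*"                         # git clone --mirror


def apply_refspec(refspec: str, remote_ref: str) -> Optional[str]:
    """Return the local name a remote ref maps to, or None if the refspec skips it."""
    source, destination = refspec.lstrip("+").split(":")
    prefix = source.rstrip("*")
    if not remote_ref.startswith(prefix):
        return None  # this reference is never fetched
    return destination.rstrip("*") + remote_ref[len(prefix):]
```

Under the default mapping, refs/heads/bugfix becomes refs/remotes/origin/bugfix, while a reference such as refs/pull/1/head maps to nothing and is skipped. Under the mirror mapping, every reference is copied under its original name.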
Let's go to an example.
Let's assume that we have a Git repository with the following commits and references that point to this commit history.
A regular git clone command will only fetch branches, so it will fetch only references that start with refs/heads/, as we saw in the config file before.
A mirror clone fetches everything. It will give us an exact copy of the repository as it exists on the remote. And because of this, with a regular clone, we are left with some references that we missed, which may point to a commit or a commit history with secrets.
So let's give you one example of things that we found that will not be scanned with a regular git clone command: pull requests.
Pull requests on GitHub are stored as references in the format refs/pull/. They are read only, and you cannot delete them without GitHub support.
Because they start with refs/pull/, they will only be fetched in a mirror-cloned version of your repository. So you must use the git clone --mirror command to find them.
And this is how it looks when you list the references of your project on the remote. In this case, you can see that there are several references for pull requests, including pull request number one on GitHub.
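As a rough sketch of how such a listing could be checked automatically, the Python function below takes the text printed by git ls-remote (one hash and one reference name per line) and keeps the references outside the branch namespace. Tags are skipped as well, on the assumption that a regular clone normally brings them along; the function name is hypothetical.

```python
def hidden_refs(ls_remote_output: str) -> list[str]:
    """List references that are neither HEAD, branches, nor tags."""
    hidden = []
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        _sha, ref = line.split(None, 1)  # "<hash><whitespace><reference name>"
        if ref == "HEAD" or ref.startswith(("refs/heads/", "refs/tags/")):
            continue
        hidden.append(ref)
    return hidden
```

Feeding it the output of git ls-remote for a GitHub repository with pull requests would return names such as refs/pull/1/head, which are the references a regular clone leaves behind.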
We found that if your repository has pull requests containing secrets, and the pull request was closed, squashed and merged, or opened from a fork, you will not find the secrets with a secret scanning tool unless you scan the mirrored version of your repository.
And there are other scenarios, which you can read about in our blog, that you will not find or scan if you are using only the git clone command. So this is why it's so important to always use the git clone --mirror command in order to find all the available references in your repository that might contain secrets.
Now, we are at the deepest part of where secrets can be hidden. And it's something that's called the cache view in the source code management.
So, what are cache views? In simple words, the source code management platform saves every commit that was ever pushed to it in a caching mechanism.
This means that everything committed to a source code management platform persists.
So even if you remove a commit containing a secret, it doesn't eliminate access to it.
And if you have the commit hash, you can retrieve it using the source code management API or the GUI. For instance, you can use curl or your browser, enter the hash, and it will show you commits that might not even exist in the mirrored version of your repository.
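A minimal Python sketch of the API route follows. It uses GitHub's REST endpoint for fetching a single commit; the function names are ours, and the owner, repository, and hash are whatever values you supply. Unauthenticated requests are rate limited, so a real script would likely add a token.

```python
import json
import urllib.request


def commit_api_url(owner: str, repo: str, sha: str) -> str:
    """Build the GitHub REST API URL for a single commit."""
    return f"https://api.github.com/repos/{owner}/{repo}/commits/{sha}"


def fetch_commit(owner: str, repo: str, sha: str) -> dict:
    """Download the commit metadata and file patches as a dictionary."""
    request = urllib.request.Request(
        commit_api_url(owner, repo, sha),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The returned document can then be passed to a secret scanner in the same way as any other commit content.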
Not long ago, a cybersecurity company called Neodyme released a blog post about some of these findings.
Here we will expand on this and add some more, but you can read their blog post. It's a great one about how it's possible to find hidden commits using the GitHub API.
So first of all, we must say that in order to find a cache view, we must know the SHA-1 hash of the commit in order to reveal its contents and scan it for secrets.
In addition to this, the space of possible SHA-1 values for a single project is huge, so there are many possible commit hashes. But in this research, we found four different ways to get these hidden SHA-1 hashes for our project and then scan them for secrets.
As we mentioned earlier, there are several methods to find the hashes of the cache view commits, and you can read the blog for a more technical description. This is just an overview of some of them. First, it's possible to utilize the redirect mechanism of GitHub and other source code management platforms, and actually brute force them in order to find commit hashes. Second, it's possible to find them via the GitHub API, and it's also possible to find them via the GUI of GitHub. And our biggest discovery is that there are databases that record every action made on GitHub, making it possible to gather commit hashes from these databases.
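The database method could look roughly like the Python sketch below. It assumes the archive stores one JSON event per line in the shape of GitHub's public events, where a PushEvent carries "before", "head", and a "commits" list in its payload; that layout, the function name, and the organization filter are assumptions for illustration and should be checked against the dataset you actually use.

```python
import json


def commit_hashes_from_events(lines, org: str) -> set[str]:
    """Collect commit hashes from push events that belong to one organization."""
    hashes = set()
    for line in lines:
        event = json.loads(line)
        if event.get("type") != "PushEvent":
            continue
        if not event.get("repo", {}).get("name", "").startswith(org + "/"):
            continue
        payload = event.get("payload", {})
        for key in ("before", "head"):
            value = payload.get(key)
            # An all-zero hash marks "no previous commit", so it is skipped.
            if value and set(value) != {"0"}:
                hashes.add(value)
        hashes.update(commit["sha"] for commit in payload.get("commits", []))
    return hashes
```

Each collected hash can then be looked up on the platform and scanned, even if no reference points to it anymore.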
You can read our blog for a more technical description of this method and how you can actually find secrets or commit hashes that belong to your organization.
In our research, we decided to conduct a test to determine if there are any long hidden secrets yet to be discovered.
What we discovered was surprising.
We scanned the top one hundred organizations on GitHub, which include a lot of companies and collectively contain around fifty thousand repositories.
Then we scanned these repositories for secrets that exist only in the mirrored version of the repository.
We found that if organizations only scan their repositories with a regular git clone command, they will miss approximately eighteen percent of the potential secrets in their codebase, which is a huge number.
So let's see more interesting and severe findings that we discovered during this research, besides Mozilla.
One critical token that we found was associated with Cisco. It was a privileged Meraki API token.
A Meraki API token is highly sensitive because it grants access to network devices enabling configuration and more.
This token provided us access to network devices, SNMP secrets, camera footage, and more at several Fortune 500 companies.
This could lead to a severe incident.
Our final example involved a privileged Azure account token utilized by a major healthcare company.
This token provided us access to Azure AD, their container registry in Azure ACR, and their Kubernetes instances in Azure AKS.
We have observed numerous examples and scenarios in which secrets can be concealed.
Let's propose some mitigation strategies to prevent these hidden secrets.
First, immediately rotate any secret that was accidentally pushed to an SCM. We observed that such secrets can remain concealed yet accessible.
Remember, cache views on SCM platforms can still expose old commits, and if you discover a secret, please contact GitHub support to have it completely deleted.
Remember, the most effective approach is to use pre-commit hooks to scan for secrets before each commit.
To sum everything up, please use git clone --mirror to scan your repository for secrets. If you are scanning the mirrored version of your repository for the first time, we recommend scanning the differences between the regular and the mirror clones. This approach helps you discover secrets that have not been discovered before.
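One possible way to compute that difference is sketched below in Python. It assumes you already have a regular clone and a mirror clone of the same repository on disk, and it uses git rev-list --all to list every commit reachable from any reference in each; the function names and paths are hypothetical.

```python
import subprocess


def all_commits(repo_path: str) -> set[str]:
    """Every commit reachable from any reference in the repository at repo_path."""
    output = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(output.split())


def mirror_only_commits(regular: set[str], mirror: set[str]) -> set[str]:
    """Commits that exist in the mirror clone but not in the regular clone."""
    return mirror - regular
```

For example, mirror_only_commits(all_commits("app"), all_commits("app.git")) would return the commits that only the mirror clone contains, which is the set worth scanning first.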
We also recommend using the GitHub dataset and other techniques shown to identify any exposed commits.
So now we have finished the session. We have talked about Kubernetes secrets, about shadow IT in GitHub repositories, and about Phantom Secrets.
You can read our blog posts about this research and these studies on the Aqua blog. Thank you for watching.
So thank you everyone and now we have some time for questions.