Dependency confusion is a vulnerability that became widely known in 2021, when Alex Birsan discovered it. It is one of the most impactful vulnerabilities known today, because its consequences are as severe as, or even worse than, those of a Remote Code Execution attack.
Dependency confusion abuses common features of the package managers of well-known programming languages to trick targets into pulling malicious dependencies created by an attacker.
Let us go through some essential concepts before understanding the Dependency Confusion attack.
What are Dependencies?
In programming, dependencies are packages or libraries of code that your software project or environment needs in order to work. Programming languages offer effortless ways to install these dependencies from public code repositories for that language. Dependencies are code modules bundled for use in your application code, which lets you reuse existing code for often-solved problems by importing it into your apps.
Public code repositories are registries where anybody can freely publish code packages for others to use. Organizations and individuals then install these packages in their own projects and software, even though the packages do not belong to them.
Public and Private dependencies
Dependencies that are accessible to all end users and organizations are public. Everyone on the web has access to public repositories, any user can create and publish their own package there, and other users can then reuse it.
Dependencies created and owned by an organization for internal use are private dependencies. Only users who have been explicitly granted access or, in the case of organization repositories, specific organization members can access them. Nobody outside the organization can use these registries.
To understand this concept thoroughly, here is an example: to run a tool called SecurityBoatRecon, we need the following dependencies on our system.
The dependencies mentioned in the first four blocks are public dependencies, and the dependencies in the remaining three are private SecurityBoat dependencies.
A package manager keeps a record of the software installed on your computer and makes it simple to install new programs, upgrade application code to newer versions, or uninstall software.
Ruby gems are available on RubyGems, Python's pip uses PyPI (the Python Package Index), and Node uses npm and the npm registry. A package is installed with:
pip install package_name
npm install package_name
gem install GEMNAME
Let us understand them one by one.
npm is used to install Node packages, and the npm registry (npmjs) is its principal repository. A project's dependencies are normally listed in a file called package.json in the project's root directory.
npm install package_name
This command installs the named package and records it in package.json; running npm install without a package name installs every dependency already listed in package.json.
npm lets packages define preinstall scripts that run automatically when the package is installed. The owner of a public package can use such a script to collect basic information about the machine on which the package is being installed.
pip is used to install Python packages, and PyPI is its principal repository. A project's dependencies are normally listed in a file called requirements.txt in the project's root directory.
If you execute a program in Python that depends on the “express” library and the library is not installed on your machine, you will see the following error:
ModuleNotFoundError: No module named 'express'
Python provides an easy solution for that. To install the “express” library from PyPI, type:
pip install express
Whenever an import fails like this, the usual recommendation is to install the missing dependency with pip.
pip's command-line interface provides two options that control which indexes pip installs packages from.
--extra-index-url adds an index while keeping the original index URL (PyPI). This lets pip install the public packages that your private package may depend on, but pip then treats both indexes as equal sources.
--index-url replaces the default package index, which is the standard index pip uses to install Python packages, with a custom index. However, you lose access to public PyPI packages unless the custom index also serves them.
How does Pip Package Installation work?
At installation time, pip checks:
1. whether the package is available on the public index, and which versions it offers;
2. whether the package is available on the private index passed as an internal index (for example with --extra-index-url);
3. if the package name exists on both the public and the private index, which source has the higher version number, and selects that one.
Thus, an attacker can hijack a private package by publishing a public package with the same name and a higher version number.
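The selection rule above can be modelled in a few lines of Python. This is a simplified sketch, not pip's real resolver: it assumes each index is a plain dict from package names to dotted version strings, it ignores pre-releases, wheel tags and everything else pip weighs, and the index contents are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn a plain dotted version such as '1.1.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def pick_candidate(name: str, indexes: dict) -> tuple:
    """Return (index_label, version) for the highest version of `name`
    offered by any configured index. The index a candidate comes from
    plays no part in the choice; in this sketch, equal versions are
    tie-broken by index label."""
    candidates = [
        (parse_version(version), label, version)
        for label, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"no configured index provides {name!r}")
    _, label, version = max(candidates)
    return label, version


# Hypothetical indexes: the private one hosts the real internal package,
# and an attacker has published the same name on the public one.
indexes = {
    "private (--extra-index-url)": {"analytics_securityboat": ["1.1.1", "1.1.2"]},
    "public (PyPI)": {"analytics_securityboat": ["99.0.0"]},
}

print(pick_candidate("analytics_securityboat", indexes))
# prints ('public (PyPI)', '99.0.0')
```

The attacker's 99.0.0 wins only because it is the largest number on offer; nothing in the rule asks where the package came from.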
Typosquatting is a related technique that tricks victims with similar-looking package names on the npm, PyPI and RubyGems registries. The idea is to publish as many misspelled variants of a single package name as possible, so that if a victim makes a typing mistake, they download a malicious package published under the misspelled name. For example, typosquatting names for the package “setting_libraries1” would be “seting_libraries1” or “setting_librarie1”. Another method is to publish public packages under the names of packages that no longer exist in public registries. Both methods increase the chance that victims install, and run, packages whose code they do not control.
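Both typo examples above are single-character deletions. The minimal Python sketch below generates such variants, assuming only deletions and swaps of neighbouring characters are of interest (real typosquatting exploits many more kinds of mistake). Defenders can use a list like this to see which look-alikes of their own package names are already registered.

```python
def typo_variants(name: str) -> set:
    """Return simple typo variants of a package name:
    single-character deletions and swaps of two neighbouring characters."""
    deletions = {name[:i] + name[i + 1:] for i in range(len(name))}
    swaps = {
        name[:i] + name[i + 1] + name[i] + name[i + 2:]
        for i in range(len(name) - 1)
    }
    # Swapping two identical neighbours (such as "tt") gives back the original.
    return (deletions | swaps) - {name}


variants = typo_variants("setting_libraries1")
print("seting_libraries1" in variants, "setting_librarie1" in variants)
# prints True True
```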
Why does a dependency confusion attack happen?
Dependency confusion, as the name suggests, tricks an installer (npm, pip, gem) into fetching a malicious package from a public registry instead of the intended package from an organization's private registry. It happens because the default software development tools retrieve third-party packages from both public and private repositories.
1. When a dependency is declared, the installer installs the required library on the machine, whether it is public or private.
2. Ideally, the installer should fetch the private dependency first.
3. Instead, it looks for the dependency with the higher version number.
4. So, if a public and a private dependency share the same name, the installer fetches whichever has the higher version number.
This is the root cause of dependency confusion.
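Step 2 describes how the installer ideally should behave. The sketch below models that policy, assuming dict-shaped indexes as before: if the private index knows a name at all, only its versions are considered. It is an illustrative policy, not a pip feature; pip treats all configured indexes alike.

```python
def _version_key(version: str) -> tuple:
    """Comparable tuple for a plain dotted version string."""
    return tuple(int(part) for part in version.split("."))


def resolve_private_first(name: str, private_index: dict, public_index: dict) -> tuple:
    """Return (source, version). If the private index knows `name` at all,
    only its versions are considered, so a higher public version cannot win;
    the public index is used only for names the organization does not host."""
    if name in private_index:
        return "private", max(private_index[name], key=_version_key)
    if name in public_index:
        return "public", max(public_index[name], key=_version_key)
    raise LookupError(f"{name!r} is not available on any index")


private = {"analytics_securityboat": ["1.1.1", "1.1.2"]}
public = {"analytics_securityboat": ["99.0.0"], "requests": ["2.31.0"]}
print(resolve_private_first("analytics_securityboat", private, public))
# prints ('private', '1.1.2')
```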
How does a dependency confusion attack happen?
There are three steps involved in a dependency confusion attack:
1. Fetching private dependency names
2. Verifying private dependency names
3. Creating and Deploying packages on Public Registries
1. Fetching private dependency names
To get the names of the private dependencies a company uses, try to obtain its package.json, requirements.txt and Gemfile files in the following ways:
GitHub: dependency files for Python, Node and Ruby can often be found on GitHub. For Node.js, simply search GitHub for package.json; similarly, search for requirements.txt for Python and Gemfile for Ruby.
Company posts and online forums: posts and solutions published on the internet by the company or its developers sometimes contain package names.
Package hosting services: companies' package names are sometimes leaked on package hosting services.
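Once the dependency files have been obtained, the names can be extracted mechanically. This is a simplified Python sketch that assumes plain name-based entries; VCS URLs, local paths and other complex requirement syntax are not handled.

```python
import json
import re


def npm_dependency_names(package_json_text: str) -> set:
    """Collect package names from the dependency sections of a package.json."""
    manifest = json.loads(package_json_text)
    names = set()
    for section in ("dependencies", "devDependencies",
                    "optionalDependencies", "peerDependencies"):
        names.update(manifest.get(section, {}))
    return names


# A requirement line starts with a project name; version specifiers,
# extras and markers that follow it are ignored by this simplified parser.
_REQUIREMENT_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def pip_requirement_names(requirements_text: str) -> set:
    """Collect project names from a requirements.txt (simplified)."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and options such as --extra-index-url or -r.
        if not line or line.startswith("-"):
            continue
        match = _REQUIREMENT_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


# Matches lines such as: gem 'rails', '~> 7.0'  or  gem "analytics_securityboat"
_GEM_LINE = re.compile(r"""\s*gem\s+['"]([^'"]+)['"]""")


def gemfile_names(gemfile_text: str) -> set:
    """Collect gem names declared with `gem` in a Gemfile (simplified)."""
    return {m.group(1) for m in map(_GEM_LINE.match, gemfile_text.splitlines()) if m}


package_json = '{"dependencies": {"express": "^4.18.2", "analytics_securityboat": "^1.1.1"}}'
print(sorted(npm_dependency_names(package_json)))
# prints ['analytics_securityboat', 'express']
```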
2. Verifying private dependency names
After discovering package names with the methods above, check each one against the registry for its type (the npm registry, PyPI or RubyGems). If the name is present, it is a public package; if it is absent, it is likely a private package, and the name is free for anyone to register.
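The registry check can be automated. This Python sketch assumes the public metadata endpoints listed in REGISTRY_URLS, each of which answers HTTP 404 for a name it does not host. The status-fetching function is a parameter, so the logic can be exercised without network access.

```python
import urllib.error
import urllib.parse
import urllib.request

# Public metadata endpoints; each answers HTTP 404 for a name it does not host.
REGISTRY_URLS = {
    "npm": "https://registry.npmjs.org/{name}",
    "pypi": "https://pypi.org/pypi/{name}/json",
    "rubygems": "https://rubygems.org/api/v1/gems/{name}.json",
}


def registry_url(ecosystem: str, name: str) -> str:
    """Build the lookup URL; '/' in scoped npm names (@scope/pkg) is encoded."""
    return REGISTRY_URLS[ecosystem].format(name=urllib.parse.quote(name, safe="@"))


def http_status(url: str) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def is_public(ecosystem: str, name: str, get_status=http_status) -> bool:
    """True if the public registry hosts `name`, False if it does not."""
    status = get_status(registry_url(ecosystem, name))
    if status == 200:
        return True
    if status == 404:
        return False  # absent: likely private, and free for anyone to claim
    raise RuntimeError(f"unexpected HTTP {status} for {name!r} on {ecosystem}")
```

For example, is_public("npm", "analytics_securityboat") performs a live lookup against the npm registry. Any status other than 200 or 404 (rate limiting, for instance) raises an error instead of being misread as "private".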
3. Creating and Deploying packages on Public Registries
Now, let us assume we found a private package
analytics_securityboat ^1.1.1
and see how to create packages in npm and Python and deploy them to a public repository under its name.
1. Install npm, for example on Debian-based systems with:
apt install npm
Then register on the npm registry and log in from the command line; npm will then allow you to create a new public package.
2. Specify the package details to create a package.json file, for example:
"description": "This is a fake package",
"scripts": {
  "test": "echo \"Error: anything\" && exit 1"
}
3. Add a preinstall script to the scripts section while creating package.json, as shown below:
"scripts": {
  "preinstall": "node index.js",
  "test": "echo \"Error: anything\" && exit 1"
}
4. index.js contains the script that the preinstall hook runs while the package is being installed on any machine that fetches this public package.
5. Put your own code in index.js. For example, if you want to retrieve the hostname, username, etc., the script retrieves those details.
6. package.json and index.js are then published to the public npm registry under the same name as the private package.
7. Once anyone installs the package on their machine, the script fetches details such as the hostname and username.
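The package.json from the steps above can be generated with a small Python helper for a harmless proof-of-concept package. The version number matters: it must be higher than the internal one and, for consumers that declare ^1.1.1, still inside that caret range (at least 1.1.1 and below 2.0.0), otherwise npm will not select it. The version 1.99.0 and the helper names are hypothetical choices.

```python
import json
from pathlib import Path


def parse_version(version: str) -> tuple:
    """Comparable tuple for a plain dotted version string."""
    return tuple(int(part) for part in version.split("."))


def satisfies_caret(version: str, base: str) -> bool:
    """True if `version` falls in npm's ^base range for a base of 1.0.0 or
    higher: same major version and not lower than base. (Caret rules for
    0.x versions differ and are not handled here.)"""
    v, b = parse_version(version), parse_version(base)
    return v[0] == b[0] and v >= b


def poc_manifest(name: str, version: str) -> dict:
    """package.json content for a harmless proof-of-concept package."""
    return {
        "name": name,
        "version": version,
        "description": "This is a fake package",
        "scripts": {
            # npm runs the preinstall script automatically during installation.
            "preinstall": "node index.js",
            "test": "echo \"Error: anything\" && exit 1",
        },
    }


def write_manifest(directory: Path, manifest: dict) -> Path:
    """Write package.json into `directory` and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "package.json"
    path.write_text(json.dumps(manifest, indent=2) + "\n")
    return path


internal_range_base = "1.1.1"  # from analytics_securityboat ^1.1.1
poc_version = "1.99.0"         # hypothetical: higher, but still inside ^1.1.1
print(satisfies_caret(poc_version, internal_range_base))  # prints True
manifest = poc_manifest("analytics_securityboat", poc_version)
print(json.dumps(manifest["scripts"], indent=2))
```

Calling write_manifest(Path("poc-package"), manifest) would then create the file next to index.js. A version such as 2.0.0 would be higher but would fall outside ^1.1.1, so consumers pinned to that range would never pick it.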
Likewise, for Python, setup.py needs to be configured so that the package runs code on the internal hosts that install it.
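pip executes setup.py when it builds a package from a source distribution, so code at the top of setup.py plays the role that index.js plays for npm. This sketch shows only the information-gathering part from step 5 and prints the result locally; the function name is hypothetical.

```python
import getpass
import os
import platform
import socket


def collect_install_context() -> dict:
    """Gather the kind of details step 5 mentions (hostname, username, ...)."""
    try:
        username = getpass.getuser()
    except Exception:  # some containers have no resolvable login name
        username = "unknown"
    return {
        "hostname": socket.gethostname(),
        "username": username,
        "os": platform.system(),
        "cwd": os.getcwd(),
    }


if __name__ == "__main__":
    # A proof of concept would call this at the top of setup.py, before
    # setuptools.setup(...); here the result is only printed locally.
    for key, value in collect_install_context().items():
        print(f"{key}: {value}")
```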
Impacts of Dependency Confusion
One objective of adversaries who upload packages to public repositories with any of these techniques is to execute malicious code on the device that pulls the package. That device could be a developer's workstation or a build server.
When the malicious code runs, it can be used for credentials theft and lateral movement within the environment it is running in.
It is also possible for malicious code to make its way from the build server to production environments. The malicious package often carries over the original, safe functionality the user expected, reducing the likelihood of discovery.
Google’s security policy states, “if an attacker injects any code at all, it’s pretty much game over.” But with continuous deployment (CD) becoming more common, it is becoming harder and harder to spot such attacks before they are released to users.
Attackers pursue a range of goals. The theft of resources for cryptocurrency mining, harvesting username and password combinations for credential stuffing, and data scraping are a few examples.
Using version pinning
You can reduce the risk of substitution attacks by pinning dependencies to explicit version numbers, such as 3.5.4 rather than >= 3.5 or 3.5.*. This way, a package manager will not switch to a dependency from a public repository just because it has a higher version number. Pinning alone does not stop an attacker who publishes the exact pinned version, which is why it works best combined with client-side verification.
The version pinning feature is a client-side control.
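Unpinned entries can be found automatically. In this minimal Python check, a line counts as pinned only if it has a single == specifier without wildcards; trailing hash options and environment markers are stripped, and other requirement syntax is not handled.

```python
def unpinned_requirements(requirements_text: str) -> list:
    """Return requirement lines that are not pinned to one exact version.

    A line counts as pinned only with a single '==' specifier and no
    wildcard, e.g. 'flask==2.3.3'. Comments, option lines (starting with
    '-'), trailing hash options and environment markers are ignored."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # Drop trailing options such as --hash=... and markers after ';'.
        spec = line.split(" --", 1)[0].split(";", 1)[0].replace(" ", "")
        pinned = "==" in spec and "*" not in spec and "," not in spec
        if not pinned:
            unpinned.append(line)
    return unpinned


requirements = """\
flask==2.3.3
requests>=2.0
django==3.*
analytics_securityboat
"""
print(unpinned_requirements(requirements))
# prints ['requests>=2.0', 'django==3.*', 'analytics_securityboat']
```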
Using client-side verification
Integrity checking on the client side during dependency installation can prevent dependency confusion and substitution attacks. Python's pip, for example, has a hash-checking mode that verifies every downloaded dependency against a SHA256 hash recorded in advance on the client side. In that case, a substitution attack has to control both the server and the client to succeed. Maven provides plugins that verify the PGP signatures of dependencies; signatures show whether an artefact has been changed after it was created.
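The hash comparison behind client-side verification is straightforward. This Python sketch checks an artifact against a SHA256 value recorded in advance and builds a requirements line in the name==version --hash=sha256:... form that pip's hash-checking mode reads. The package contents in the usage are hypothetical bytes.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_hex: str) -> bool:
    """True only if the downloaded bytes match the hash recorded in advance.
    A substituted package from another source fails this check."""
    return hmac.compare_digest(sha256_hex(data), expected_hex.strip().lower())


def hash_line(name: str, version: str, data: bytes) -> str:
    """requirements.txt entry in the form pip's hash-checking mode reads."""
    return f"{name}=={version} --hash=sha256:{sha256_hex(data)}"


artifact = b"contents of analytics_securityboat-1.1.1"  # hypothetical bytes
recorded = sha256_hex(artifact)
print(verify_artifact(artifact, recorded))                 # prints True
print(verify_artifact(b"tampered " + artifact, recorded))  # prints False
print(hash_line("analytics_securityboat", "1.1.1", artifact))
```

Because the expected hash comes from the client's own records, an attacker who controls only the public registry cannot make a different package pass.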
You should never allow a client to fetch any code packages directly from the internet or from untrusted sources. Instead, implement the following controls:
If third-party packages are pulled from an external repository, make sure they are pulled through an internal proxy instead of directly from the internet. Additional security controls can then be deployed at the proxy layer, and it becomes easier to investigate which packages were pulled if a security incident occurs.
Disallow packages from being pulled directly from external repositories.
Using a single private internal repository
One method for combating dependency confusion or substitution attacks is to use a single private internal repository. This keeps the package manager from going outside the internal repository.
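Whether pip is actually restricted to the internal repository can be checked in its configuration. This Python sketch inspects only a pip.conf-style file; settings passed through environment variables or on the command line are not covered, and the host pypi.internal.example is a hypothetical placeholder.

```python
import configparser
from urllib.parse import urlparse


def audit_pip_config(config_text: str, allowed_host: str) -> list:
    """Return problems that would let pip reach an index other than the
    single internal repository served from `allowed_host`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    problems = []
    for section in parser.sections():
        options = parser[section]
        if "extra-index-url" in options:
            problems.append(f"[{section}] extra-index-url adds a second index")
        index_url = options.get("index-url")
        if index_url is not None and urlparse(index_url).hostname != allowed_host:
            problems.append(f"[{section}] index-url points at {index_url}")
    if not any("index-url" in parser[s] for s in parser.sections()):
        problems.append("no index-url set: pip falls back to public PyPI")
    return problems


bad = """\
[global]
index-url = https://pypi.internal.example/simple
extra-index-url = https://pypi.org/simple
"""
print(audit_pip_config(bad, "pypi.internal.example"))
# prints ['[global] extra-index-url adds a second index']
```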
In the end, systems will always be vulnerable to security issues unless their users are willing to take the necessary security measures. The only way to eliminate dependency confusion completely would be to stop using dependencies in projects, which is counter-productive and adds risk, because developers would have to write their own versions of dependencies in areas where they may lack experience. Blocking all dependencies is never a promising idea. Instead, the goal is to keep an inventory of all components of your organization, its programming languages and its CI/CD systems, to ensure that security baselines and consistent configuration are met across all of them, and to run active monitoring that alerts you when something goes wrong.