Internet-Draft | Handling DNSSEC-Unvalidated Data | April 2025 |
Zhang, et al. | Expires 9 October 2025 | [Page] |
Due to the prevalence of DNSSEC (Domain Name System Security Extensions) misconfigurations, many domain administrators troubleshoot the records of DNSSEC-signed domains via queries with CD (Checking Disabled) bit set. However, as DNS resolvers are not forced to perform DNSSEC validation for CD=1 queries, the unvalidated data introduced during troubleshooting could be mixed up with the routine ones in the resolver cache. Recent research has revealed that the reuse of the cached unvalidated data in subsequent resolutions could lead to the risk of Denial-of-Service (DoS). This document clarifies the definition of unvalidated data in the context of DNSSEC. Then, it demonstrates the DoS vulnerabilities of current DNS resolver implementations due to the reuse of cached unvalidated data. Accordingly, it provides several recommendations for DNSSEC-validating resolvers to handle the unvalidated data and mitigate the risk of DoS, so as to improve the availability of DNSSEC-signed domains.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 9 October 2025.¶
Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Domain Name System Security Extensions (DNSSEC) aim to protect the authenticity and integrity of DNS data against cache poisoning attacks. Over the past decade, DNSSEC has been increasingly deployed by both domains and DNS resolvers throughout the Internet. However, because of the design complexity, various misconfigurations of DNSSEC records frequently occur in the wild, resulting in massive service outages due to DNSSEC validation failure [IANIX24].¶
To troubleshoot DNSSEC record configurations and manage DNSSEC validation locally, one can leverage the CD (Checking Disabled) bit in the DNS message header as proposed in [RFC4035]. Specifically, the set of the CD bit in a DNS query indicates that the DNS client does not force the target DNS resolver to perform DNSSEC validation on the received records. Typically, DNSSEC-validating resolvers accept troubleshooting queries with CD=1 from arbitrary clients they serve.¶
However, current DNSSEC specifications are ambiguous about how DNS resolvers should handle the data introduced during troubleshooting, where records that have not passed DNSSEC validation could remain in the resolver's cache. Even though the unvalidated records are generally associated to the lowest trust level according to [Li23], the resolver could still reuse them in subsequent resolution tasks without re-querying for the validated ones. Previous study has revealed that the reuse of cached unvalidated data could result in persistent validation failures of DNSSEC-signed domains, i.e., Denial-of-Service (DoS).¶
To this end, this document proposes guidelines on how DNSSEC-validating resolvers should handle the unvalidated data introduced during DNSSEC troubleshooting. First, it clarifies the definition of unvalidated data in the context of DNSSEC. Then, it presents the DoS vulnerabilities in current resolver implementations due to the reuse of cached unvalidated data when resolving DNSSEC-signed domains. Finally, it provides recommendations for DNSSEC-validating resolvers to optimize the caching and reusing of the unvalidated data.¶
The key words "MUST", "MUST NOT", "SHOULD" and "MAY" in this document are to be interpreted as described in [RFC2119].¶
Section 4.7 of [RFC4035] proposed BAD cache to facilitate the caching of resource records that have been proven invalid, so as to avoid unnecessary query retries when encountering DNSSEC misconfigurations at the domain side. This document uses the term "unvalidated" to further refer to the DNSSEC-related data that have not passed DNSSEC validation in the resolver cache, including resource records that have not been validated, cannot be validated, or have been proven invalid. The unvalidated data also include the EDNS(0) (Extension Mechanisms for DNS) status of a particular nameserver host cached by the resolver, as Section 6.2.2 of [RFC6891] have stated that the DO (DNSSEC OK) bit proposed by [RFC3225] is only signaled through EDNS(0) to indicate DNSSEC support.¶
This document demonstrates a novel DoS attack surface introduced by the CD bit, where the resolver's reusing of cached unvalidated data could lead to persistent resolution failures of DNSSEC-signed domains.¶
In general, an attacker could bypass DNSSEC validation by sending DNS queries for troubleshooting (i.e., CD=1), causing a DNSSEC-validating resolver to accept and cache forged data without validation. Later, when handling legitimate DNS queries that demand DNSSEC validation, the resolver reuses the cached unvalidated records, leading to validation failures due to missing or bogus records along the chain of trust, such as DNSKEY or DS records with their signatures either removed or manipulated. Note that the forged, unvalidated data are not directly queried by clients in routine DNS operations, and thus simply retrying the failed DNS queries (e.g., typically for A records) cannot discard them. As a result, the resolver continuously encounters SERVFAIL responses, even if it has already obtained the valid records queried by the client. Additionally, the attacker can break the DNSSEC chain of trust by tampering with the cached information of nameservers, such as their IP addresses and EDNS(0) capabilities, thereby leading to validation failures for all DNSSEC-signed domains hosted on those nameservers.¶
Specifically, this document illustrates three variants of this DoS attack as follows.¶
First, the attacker sends a DNS query with CD=1 to the target DNSSEC-validating resolver, requesting the DNSKEY/DS record of the victim domain. Next, the attacker returns a forged response, in which the RRSIG record of domain's DNSKEY/DS is either removed or manipulated. Due to the set of the CD bit, the resolver is not required to perform DNSSEC validation. Instead, it caches the received records. Later, when an ordinary client queries the resolver for the victim domain's A record with CD=0, the resolver reuses the unvalidated DNSKEY/DS record in the cache rather than issuing new queries. In the case of RRSIG removed, the validation fails because the RRSIG required to establish the DNSSEC trust chain is missing. In the case of RRSIG manipulated, the manipulated RRSIG causes the validation of the DNSKEY/DS record to fail. However, as the unvalidated DNSKEY/DS cache entries are not directly hit by the client query, the resolver fails to discard them even after retrying the failed routine queries. Instead, it continuously reuses them until their TTLs (Time-to-Live) expire, leading to persistent validation failure for the duration of the caching TTL.¶
To prevent the resolver from obtaining necessary records for DNSSEC validation, an attacker can also forge the IP addresses of the nameservers. Particularly, apart from tampering with unsigned glue records stored in the parent zone, the attack is also applicable even if both a domain and all the nameserver zones along its delegation chain are properly signed by DNSSEC. Specifically, the attacker first sends a DNS query with CD=1 to the resolver, requesting the A/AAAA record of the victim domain's nameserver. Next, the attacker returns a response containing forged A/AAAA records, which is cached by the resolver without strict validation. Next, when an ordinary client sends queries with CD=0 to the resolver for domains delegating to this nameserver, the resolver reuses the forged IP address of the nameserver to query. If the forged IP address is inactive, the resolver will either return SERVFAIL or timeout to respond. Similar to the variant in Section 3.1, the resolver does not retry querying for the nameserver IP address after the resolution failure, and the DoS duration lasts long to the caching TTL.¶
Different from the previous variants that exploit resource records, the attacker can also trick the resolver into caching the fact that the nameserver host of the victim domain is not EDNS(0) capable, so as to prevent the resolver's requesting for RRSIG records and break DNSSEC validation. Note that some resolver implementations (e.g., [BIND9]) do not follow Section 3 of [RFC3225] to handle the EDNS(0) capability information, i.e., only fall back to queries without EDNS(0) when receiving a response with error status codes (e.g., FORMERR, NOTIMP) from the target authoritative nameserver. Instead, they simply treat the nameserver as EDNS(0)-incapable when receiving a response with no EDNS(0) OPT records, which makes this attack variant possible.¶
Specifically, after triggering an arbitrary DNS response from the victim's nameserver to the target resolver, the attacker strips off the EDNS(0) OPT record in the additional section. When the resolver receives the response, it considers that the nameserver IP is not capable of EDNS(0), and caches this information for future queries. Then, when an ordinary client requests for DNSSEC-signed domains served by the same nameserver host and requires validation, the resolver removes OPT in the query to the host. Subsequently, the nameserver host determines that the resolver cannot handle DNSSEC records, hence does not respond with the RRSIGs of the queried records. Hence, the resolver fails DNSSEC validation due to missing RRSIGs. The DoS duration depends on how long the EDNS0 capability information is cached by the resolver, which is usually determined by the resolver's refresh interval to retry adding OPT in queries, e.g., 30 minutes for [BIND9].¶
This document recommends that the unvalidated resource records obtained for CD=1 queries SHOULD be cached following the rules for the BAD cache specified in Section 4.7 of [RFC4035] and Section 2.6 of [RFC9520]. Specifically, the caching of unvalidated records MUST be assigned with a TTL. This TTL SHOULD be small by default, which typically ranges from 3 to 30 seconds, and MUST NOT exceed 5 minutes.¶
To prevent DoS attacks triggered by intentional response forging, DNSSEC-validating resolvers SHOULD be conservative to save records into the BAD cache. Specifically, to distinguish between forged responses and domain-side misconfigurations, this document suggests that DNSSEC-validating resolvers SHOULD perform a heuristic number of retries to validate the records, e.g., 5 times of retry queries to each available authoritative nameserver as implemented by [Unbound]. Resolvers SHOULD only move the records to the BAD cache when all the validation attempts fail. Note that the number of retries MUST be upper-bounded, thereby avoiding excess queries in case of misconfigurations at the domain side.¶
Based on the type of the records that have not been validated, this document assigns different priorities for resolvers to validate them.¶
When receiving CD=0 queries, i.e., DNSSEC validation is demanded, DNSSEC-validating resolvers MUST use the validated version of all records along domain's DNSSEC chain of trust.¶
If any record along domain's DNSSEC chain of trust has existed in cache but has been proven invalid, resolvers MUST expire the record from the cache and attempt to obtain the potentially valid one following the retry policy described in Section 4.2.¶
As referral records (e.g., NS, CNAME) themselves are not necessarily required in constructing DNSSEC chain of trust, this document allows resolvers to validate these records with a lower priority. Specifically, when receiving CD=0 queries, resolvers MAY use the cached unvalidated referral records to obtain necessary records along domain's DNSSEC chain of trust, and MAY not validate the referral records if the chain of trust of the queried domain has already passed DNSSEC validation. Otherwise, resolvers MAY expire the cached unvalidated referral records and attempt to obtain the potentially valid ones following the retry policy described in Section 4.2.¶
[RFC4035] stated that DNSSEC-validating resolvers SHOULD answer from the BAD cache upon receiving CD=1 queries, and MUST return RCODE 2 (Server Failure) when CD is not set. However, current RFCs remain ambiguous about whether records in the BAD cache could also be involved in other resolution processes, especially when they are not directly hit by the clients' query. For example, when a client queries for domain's A record and demands DNSSEC validation, the DNSKEY record of the queried zone is involved in the resolution, while the DNSKEY could exist in the BAD cache.¶
To handles these cases, this document further specifies that the reuse scope of records in the BAD cache SHOULD be restricted to only answering CD=1 queries that directly hit the cache with a matched (QNAME, QCLASS, QTYPE) tuple. When records in the BAD cache are not hit directly, but are involved in other resolution processes where DNSSEC validation is demanded, the resolver SHOULD expire them from the cache and attempt to obtain the potentially valid ones following the retry policy described in Section 4.2.¶
If a nameserver hosts any DNSSEC-signed domains, it must be capable of EDNS(0) to serve DNSSEC records (e.g., RRSIG). Hence, when resolving a DNSSEC-signed domain, it is expected that the corresponding nameserver is always EDNS(0)-capable. Section 6.2.2 of [RFC6891] suggested that if DNSSEC is required, no fallback to non-EDNS(0) queries should be performed.¶
This document further strengthens this condition that DNSSEC-validating resolvers MUST always enable EDNS(0) for queries to the nameservers of DNSSEC-signed domains. Upon receiving responses with error status codes (e.g., NOTIMP, FORMERR), resolvers SHOULD follow the retry policy described in Section 4.2 and move the nameserver-related records (e.g., domain's NS records or nameserver's A/AAAA records) into the BAD cache if necessary, and cache them in accordance with Section 4.1.¶
There could be cases where some nameserver hosts fail to enable EDNS(0) temporarily due to accidental errors, and Section 6.2.2 of [RFC6891] stated that resolvers MAY cache that a nameserver host is EDNS(0)-incapable for a brief time. This document agrees with the caching of the EDNS(0)-incapable information, in order to avoid fallback delays and disrupting the resolution of unsigned domains hosted on the same nameserver.¶
Nevertheless, this document further recommends that DNSSEC-validating resolvers SHOULD constantly verify the EDNS(0) capability status of nameserver hosts to mitigate the risk of attack variant described in Section 3.3. Specifically, this document suggests that the verification interval SHOULD be aligned with the TTL of the BAD cache, i.e., typically range from 3 to 30 seconds, and MUST NOT be more than 5 minutes. After caching that a nameserver host is EDNS(0)-incapable for the specific interval, DNSSEC-validating resolvers should retry to enable EDNS(0) in subsequent queries and see if the nameserver host returns good responses.¶
The major security risk introduced in the above recommendations is the risk of traffic amplification-based DoS attacks, which could consume resources to overload the resolver and the queried authoritative nameserver. To flexibly balance the risk of traffic amplification and persistent resolution failure based on specific operating scenarios, this document suggests that resolver implementations SHOULD make the variables configurable by users, including the restricted caching TTL described in Section 4.1, number of validation retries before saving into the BAD cache described in Section 4.2, and the verification interval of EDNS(0) status described in Section 6.2.¶
Specifically, this document discusses the potential risks of its recommendations as follows.¶
Section 5.9 of [RFC6840] suggested that DNSSEC-validating stub or forwarding resolvers always set the CD bit on queries to their upstream recursive resolvers. Hence, the implementation recommended in Section 4.1 could inadvertently reduce the records' caching duration at the recursive upstreams, even if the records are actually legitimate. Given the shortened TTL, there could be potential risks where the recursive upstreams frequently query the authoritative nameservers to refresh their caches, leading to traffic amplification. However, since the downstream DNSSEC-validating stub or forwarding resolvers are able to cache the records with their original TTL once the DNSSEC validation passes, clients are expected to be served by the cached validated records without frequently re-querying the authoritative nameservers. Also, once the recursive upstreams receive a CD=0 query and perform DNSSEC validation, they can also cache and serve the validated version of the records with their original TTL if the validation passes.¶
As stated in Section 4.2, DNSSEC-validating resolvers are forced to implement an upper bound to the number of validation retries before saving records into the BAD cache. Such upper bound is expected to be controllable and prevent the resolver from triggering excess queries.¶
The verification interval of EDNS(0) status recommended in Section 6.2 of this document is aligned with the BAD cache and negative cache TTL suggested by [RFC9520], which is the state-of-the-art standard to balance the risk of DoS due to the cached failure status and frequent re-querying.¶
This document contains no actions for IANA.¶