Internet-Draft | automation-preferences-ext | April 2025
Peiyuan | Expires 10 October 2025
This document specifies extensions to the automation-preferences.txt protocol, providing advanced controls for server-side automation permissions. It builds upon the core specification by adding sophisticated features such as rate limiting, automation technology restrictions, API permissions, session requirements, and HTML asset annotations. These extensions enable content providers to exercise more granular control over automated interactions while maintaining backward compatibility with implementations of the core protocol.¶
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://datatracker.ietf.org/doc/draft-liao-aipref-autoctl-ext/. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-liao-aipref-autoctl-ext/.¶
Discussion of this document takes place on the AI Preferences Working Group mailing list (mailto:[email protected]), which is archived at https://mailarchive.ietf.org/arch/browse/ai-control/. Subscribe at https://www.ietf.org/mailman/listinfo/ai-control/.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 10 October 2025.¶
Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
This document extends the automation-preferences.txt protocol defined in "Protocol for Basic Automation Control" [CORE-SPEC] by introducing advanced directives and capabilities for more sophisticated control over automated interactions. These extensions address complex automation scenarios while maintaining backward compatibility with implementations of the core specification.¶
The extensions defined in this document enable content providers to exercise more granular control over automated access, including rate limiting, specific technology restrictions, API usage policies, session validation requirements, and asset-level annotation methods. These capabilities are designed to complement the basic controls provided by the core specification, offering a progressive path to more comprehensive automation management.¶
This document builds upon the core specification without modifying its requirements. All directives and mechanisms defined in the core specification remain valid and are not redefined here. This document assumes familiarity with the core specification and uses its terminology and concepts throughout.¶
The extensions defined in this document are OPTIONAL for both servers and clients. Implementations that support only the core specification are considered compliant with the automation-preferences.txt protocol, though they will not benefit from the advanced controls defined here.¶
When both core and extended directives are present in an automation-preferences.txt file, parsers that do not support the extensions defined in this document MUST ignore the unrecognized directives, as specified in the core specification's extension mechanism.¶
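The following non-normative sketch illustrates how a core-only parser might skip directives it does not recognize. The "Name: value" line syntax and "#" comments are taken from the example file in this document; the function name and the `known` parameter are hypothetical, and the set of core directive names would come from [CORE-SPEC], not from this sketch.

```python
def parse_known_directives(text, known):
    """Return {name: value} for recognized directives, ignoring the rest.

    ASSUMPTION: one "Name: value" pair per line, "#" starts a comment line.
    `known` is a hypothetical set of directive names the parser supports.
    """
    parsed = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a directive line
        name = name.strip()
        if name in known:
            parsed[name] = value.strip()
        # Unrecognized directives are ignored rather than treated as errors.
    return parsed
```

Under these assumptions, a file containing `RequestLimit: 60/minute` is still accepted by a parser whose `known` set holds only core names; the extended line is simply dropped.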
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This document uses the terminology defined in the automation-preferences.txt protocol [CORE-SPEC]. The following additional terms are introduced in this document:¶
Rate limiting: Constraints on the frequency or concurrency of automated requests to prevent excessive server load.¶
Automation technology: Specific tools or frameworks used for automation, such as headless browsers or browser automation protocols.¶
XHR/Fetch: XMLHttpRequest or Fetch API calls performed programmatically.¶
Session validation: Mechanisms to verify that automated requests are part of a legitimate user session.¶
Asset annotation: Metadata embedded within HTML documents to specify automation policies for individual content elements.¶
This section defines additional directives that extend the automation-preferences.txt protocol. These directives may be used alongside the core directives in any group within the automation-preferences.txt file.¶
Rate limiting directives specify constraints on the frequency and concurrency of automated requests to prevent excessive server load. The following directives are defined:¶
RequestLimit: Specifies the maximum number of requests allowed within a time period, expressed as a count followed by a time unit (e.g., "60/minute"). Supported time units are "second", "minute", "hour", and "day".¶
ConcurrentLimit: Specifies the maximum number of concurrent connections allowed from a single client.¶
Example:¶
    RequestLimit: 60/minute
    ConcurrentLimit: 5
Rate limiting directives apply to all requests within the scope of the group, regardless of HTTP method. If no rate limiting directives are specified, clients SHOULD NOT assume any specific rate limits, but SHOULD implement reasonable self-throttling to avoid overloading the server.¶
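As a non-normative illustration, a client could translate a RequestLimit value into a minimum spacing between requests. The function names are hypothetical. Even spacing is only one possible self-throttling strategy, and this document does not state whether bursts within the period are acceptable.

```python
import math
import re

# Seconds per time unit; the four units are those listed for RequestLimit.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_request_limit(value):
    """Parse a value such as "60/minute" into (count, period_in_seconds)."""
    match = re.fullmatch(r"(\d+)/(second|minute|hour|day)", value.strip())
    if match is None:
        raise ValueError(f"unrecognized RequestLimit value: {value!r}")
    return int(match.group(1)), UNIT_SECONDS[match.group(2)]


def min_interval_seconds(value):
    """Even spacing between requests that keeps a client within the limit."""
    count, period = parse_request_limit(value)
    # ASSUMPTION: a count of zero is read as "no requests allowed".
    return math.inf if count == 0 else period / count
```

With the example value "60/minute", `min_interval_seconds` yields one second between requests; a client would sleep for at least that long before issuing the next request in the group's scope.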
Automation technology directives specify whether specific automation tools or frameworks are permitted. The following directives are defined:¶
AllowCDP: Boolean value indicating whether the use of Chrome DevTools Protocol (CDP) is permitted.¶
AllowHeadless: Boolean value indicating whether the use of headless browsers is permitted.¶
AllowSelenium: Boolean value indicating whether the use of Selenium WebDriver is permitted.¶
AllowPuppeteer: Boolean value indicating whether the use of Puppeteer is permitted.¶
AllowPlaywright: Boolean value indicating whether the use of Playwright is permitted.¶
Example:¶
    AllowCDP: false
    AllowHeadless: false
    AllowSelenium: false
    AllowPuppeteer: false
    AllowPlaywright: false
If an automation technology directive is not specified, clients SHOULD NOT assume that the use of that technology is permitted. Implementations SHOULD respect these directives when applicable, even if the specific detection methods may vary.¶
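A client might resolve these directives as in the non-normative sketch below, which treats an absent directive as "not permitted" in line with the guidance above. The function names are hypothetical, and accepting "true"/"false" case-insensitively is an assumption; the exact boolean syntax is not restated in this document.

```python
TECH_DIRECTIVES = (
    "AllowCDP",
    "AllowHeadless",
    "AllowSelenium",
    "AllowPuppeteer",
    "AllowPlaywright",
)


def parse_bool(value):
    """Parse a boolean directive value (ASSUMPTION: case-insensitive)."""
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"not a boolean directive value: {value!r}")


def permitted_technologies(group):
    """Map each technology directive to True/False for one parsed group.

    `group` is a hypothetical dict of directive name -> raw string value.
    A directive that is absent is treated as not permitted.
    """
    return {
        name: parse_bool(group[name]) if name in group else False
        for name in TECH_DIRECTIVES
    }
```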
API and XHR permission directives specify rules for API usage and automated use of XMLHttpRequest, Fetch, or AJAX. The following directives are defined:¶
APIAutomation: Indicates how API endpoints may be accessed by automated clients. Valid values are:¶
AllowXHR: Indicates how XMLHttpRequest or Fetch API may be used by automated clients. Valid values are:¶
DisallowFetchFrom: Comma-separated list of URL patterns from which automated XHR/Fetch requests are prohibited. Wildcards MAY be used.¶
Example:¶
    APIAutomation: with-key-only
    AllowXHR: read-only
    DisallowFetchFrom: /account/*, /checkout/*, /admin/*
If API and XHR permission directives are not specified, clients SHOULD assume the most restrictive value (i.e., "none" for APIAutomation and AllowXHR).¶
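The non-normative sketch below shows one way a client could evaluate DisallowFetchFrom and apply the restrictive default. It assumes that "*" is the only wildcard and that it matches any run of characters, including "/"; this document does not define the wildcard grammar. All function names are hypothetical.

```python
import re

# Most restrictive values, used when a directive is absent.
RESTRICTIVE_DEFAULTS = {"APIAutomation": "none", "AllowXHR": "none"}


def effective_value(group, name):
    """Return the directive value from a parsed group, or the restrictive default."""
    return group.get(name, RESTRICTIVE_DEFAULTS[name])


def parse_patterns(value):
    """Split a comma-separated DisallowFetchFrom value into patterns."""
    return [p.strip() for p in value.split(",") if p.strip()]


def pattern_to_regex(pattern):
    """ASSUMPTION: only "*" is special and matches any characters."""
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))


def fetch_disallowed(path, directive_value):
    """True if `path` matches any pattern in the DisallowFetchFrom value."""
    return any(
        pattern_to_regex(p).fullmatch(path)
        for p in parse_patterns(directive_value)
    )
```

With the example value above, `/account/settings` is reported as disallowed while `/products/1` is not.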
Session requirement directives specify whether automated requests must be part of a legitimate user session. The following directives are defined:¶
RequireHumanInitiatedSession: Boolean value indicating whether automated requests must be part of a session that was initiated by a human user.¶
SessionValidation: Specifies the method used to validate sessions. Valid values are:¶
SessionTTL: Specifies the maximum time-to-live for a session, expressed as a duration (e.g., "30m", "2h", "1d").¶
RequireUserAgent: Boolean value indicating whether automated requests must include a valid User-Agent header.¶
Example:¶
    RequireHumanInitiatedSession: true
    SessionValidation: cookie-based
    SessionTTL: 1h
    RequireUserAgent: true
If session requirement directives are not specified, clients SHOULD NOT assume any specific session requirements, but SHOULD include a valid User-Agent header in all requests.¶
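A SessionTTL value could be converted to a duration as in the following non-normative sketch. Only the suffixes that appear in the examples above ("m", "h", "d") are accepted; whether other units or compound values are valid is not stated in this document, so they are rejected here. The function name is hypothetical.

```python
import re
from datetime import timedelta

# Suffixes seen in the SessionTTL examples, mapped to timedelta keywords.
_TTL_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_session_ttl(value):
    """Parse a SessionTTL value such as "30m", "2h", or "1d"."""
    match = re.fullmatch(r"(\d+)([mhd])", value.strip())
    if match is None:
        raise ValueError(f"unrecognized SessionTTL value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_TTL_UNITS[unit]: amount})
```

A client holding a session could compare the session's age against the returned `timedelta` and re-establish the session once the limit is exceeded.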
In addition to a site-level automation-preferences.txt file, automation preferences MAY be embedded directly within HTML documents to annotate individual assets. This mechanism enables content creators to specify fine-grained automation policies for particular content items.¶
Authors SHOULD use JSON-LD structured data markup embedded in a <script> element. The JSON object SHOULD use a defined type (e.g., "AutomationPolicyAnnotation") and include relevant fields that mirror those used in automation-preferences.txt.¶
Note that unlike site-wide directives, asset-level annotations SHOULD NOT include HTTP method restrictions, request limits, or concurrency limits, as these concepts apply to endpoints and services rather than to individual content assets.¶
Example:¶
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "AutomationPolicyAnnotation",
      "automationPolicy": "limited",
      "allowCDP": false,
      "allowHeadless": false,
      "automationPurpose": {
        "require": true,
        "allowed": [[PLACEHOLDER_PURPOSE1], [PLACEHOLDER_PURPOSE2]],
        "disallowed": [[PLACEHOLDER_PURPOSE3]]
      },
      "contactEmail": "[email protected]"
    }
    </script>
When both an automation-preferences.txt file and HTML asset annotations are present, the more specific rule (typically the HTML annotation) SHALL be applied to the corresponding content asset. Clients supporting HTML asset annotations SHOULD parse and respect these annotations when present.¶
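A client could locate such annotations with a standard HTML parser, as in the non-normative sketch below. It uses Python's `html.parser` and `json` modules; the class and function names are hypothetical, and matching on the "@type" value "AutomationPolicyAnnotation" assumes the type name used in the example above.

```python
import json
from html.parser import HTMLParser


class _LdJsonCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self.blocks.append("".join(self._buffer))
            self._in_ld_json = False


def extract_annotations(html):
    """Return all JSON-LD objects typed as AutomationPolicyAnnotation."""
    collector = _LdJsonCollector()
    collector.feed(html)
    collector.close()
    found = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is skipped, not treated as fatal
        if isinstance(data, dict) and data.get("@type") == "AutomationPolicyAnnotation":
            found.append(data)
    return found
```

The annotations returned here would then take precedence over the site-level group for the asset they describe.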
The annotation schema MAY include any directives defined in the core or extension specifications. Fields in the annotation SHOULD use camelCase naming to align with JSON-LD conventions, while maintaining semantic equivalence to the corresponding directives in the automation-preferences.txt file.¶
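The correspondence between camelCase annotation fields and directive names, together with the precedence of the annotation over the site-level group, might be sketched as follows. This is non-normative. It assumes that a directive name is obtained by capitalizing the first letter of the field, which fits flat fields such as "allowCDP" but not nested objects such as "automationPurpose", and that values on both sides have already been normalized to the same types. All names are hypothetical.

```python
# Directives that asset-level annotations are not expected to carry
# (HTTP method restrictions, request limits, and concurrency limits).
SITE_ONLY = {"AllowedMethods", "DisallowedMethods", "RequestLimit", "ConcurrentLimit"}


def field_to_directive(field):
    """ASSUMPTION: "allowCDP" corresponds to "AllowCDP", and so on."""
    return field[:1].upper() + field[1:]


def effective_policy(site_group, annotation):
    """Overlay an asset annotation on the site-level group for one asset."""
    merged = dict(site_group)
    for field, value in annotation.items():
        if field.startswith("@"):
            continue  # JSON-LD keywords such as @context and @type
        name = field_to_directive(field)
        if name in SITE_ONLY:
            continue  # endpoint-level concepts stay under site-level control
        merged[name] = value  # the more specific rule wins
    return merged
```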
The extensions defined in this document maintain backward compatibility with implementations of the core specification. This compatibility is achieved through the following mechanisms:¶
All directives defined in this document are OPTIONAL. Implementations that support only the core specification can safely ignore these directives, as specified in the core specification's extension mechanism.¶
The extensions do not modify or override the behavior of any directives defined in the core specification.¶
Extended directives enhance but do not replace core functionality.¶
Implementations supporting these extensions SHOULD degrade gracefully when interacting with servers or clients that support only the core specification:¶
Servers supporting extensions SHOULD still process all core directives correctly, even if extended directives are also present.¶
Clients supporting extensions SHOULD still honor all core directives, even if they do not recognize extended directives in a file.¶
When HTML asset annotations are not supported by a client, the client SHOULD fall back to the site-level automation-preferences.txt file for guidance.¶
This approach ensures that the introduction of extensions does not break existing implementations while providing a path for enhanced functionality.¶
Servers implementing the extensions defined in this document SHOULD:¶
Employ detection mechanisms (e.g., CDP fingerprinting, headless browser detection) to identify automated clients using specific technologies.¶
Implement rate limiting according to the specified directives.¶
Validate sessions as required by the session requirement directives.¶
Process HTML asset annotations when interpreting automation policies for specific content.¶
Respond with appropriate HTTP status codes for non-compliant requests, such as:¶
Clients supporting these extensions SHOULD:¶
Honor rate limiting directives by self-throttling requests.¶
Respect automation technology restrictions by avoiding prohibited tools.¶
Adhere to API and XHR permissions as specified.¶
Establish and maintain valid sessions when required.¶
Parse and respect HTML asset annotations when present.¶
Both servers and clients MAY implement additional detection and enforcement mechanisms beyond those explicitly described in this document, as long as they maintain compatibility with the specified directives.¶
In addition to the security considerations mentioned in the core specification, the extensions defined in this document introduce the following considerations:¶
Rate Limiting: Implementations of rate limiting SHOULD use secure methods to track request counts and prevent circumvention through IP spoofing or other means.¶
Technology Detection: Methods used to detect specific automation technologies can be circumvented by sophisticated clients. Servers SHOULD employ multiple detection approaches and adapt to evolving evasion techniques.¶
Session Validation: Session validation mechanisms SHOULD be resistant to replay attacks and session hijacking attempts.¶
HTML Asset Annotations: Parsing of JSON-LD annotations MUST be performed securely to prevent injection attacks or denial-of-service through malformed input.¶
The extensions provide more granular control over automated access, which can enhance security, but they also introduce complexity that may lead to misconfiguration. Implementers SHOULD carefully test and validate their configurations to ensure they provide the intended protections.¶
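For the HTML asset annotation consideration, a defensive parsing step might look like the non-normative sketch below. The size cap is a hypothetical value chosen for illustration, and the function name is likewise hypothetical; real deployments would choose limits suited to their own content.

```python
import json

# Hypothetical upper bound on annotation size, in characters.
MAX_ANNOTATION_CHARS = 64 * 1024


def safe_parse_annotation(raw, max_chars=MAX_ANNOTATION_CHARS):
    """Parse an annotation body, returning None instead of failing.

    Oversized input is rejected before parsing, malformed or excessively
    nested JSON is caught, and only a JSON object is accepted. The text is
    only ever parsed as data and never evaluated as code.
    """
    if len(raw) > max_chars:
        return None
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, RecursionError):
        return None
    if not isinstance(data, dict):
        return None
    return data
```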
This document has no IANA actions.¶
Future work on the automation-preferences.txt protocol may include:¶
Soliciting further feedback from browser vendors, content owners, and developers of AI models and automation tools.¶
Developing reference implementations and comprehensive detection libraries.¶
Formalizing the protocol in collaboration with the IETF and W3C.¶
Expanding interoperability with related protocols for consistent content preference signaling across the web.¶
Standardizing the HTML asset annotation schema through formal registration with schema.org or similar organizations.¶
The following is an example of an automation-preferences.txt file that includes both core and extended directives:¶
    # Automation preferences for example.com
    # Version: 2.0
    # Last updated: 2025-04-08

    # Group 1: Applies to the entire site
    Host: example.com
    Scope: /
    AutomationPolicy: limited
    AllowedMethods: GET, HEAD
    DisallowedMethods: POST, PUT, DELETE, PATCH
    RequireAutomationPurpose: true
    AllowedPurposes: [PLACEHOLDER_PURPOSE1], [PLACEHOLDER_PURPOSE2]
    DisallowedPurposes: [PLACEHOLDER_PURPOSE3]
    ContactEmail: [email protected]

    # Extended directives
    RequestLimit: 60/minute
    ConcurrentLimit: 5
    AllowCDP: false
    AllowHeadless: false
    AllowSelenium: false
    AllowPuppeteer: false
    AllowPlaywright: false
    APIAutomation: with-key-only
    RequireUserAgent: true
    AllowXHR: read-only
    DisallowFetchFrom: /account/*, /checkout/*, /admin/*
    RequireHumanInitiatedSession: true
    SessionValidation: cookie-based
    SessionTTL: 1h

    # Group 2: Specific preferences for the /admin/ path
    Host: example.com
    Scope: /admin/
    AutomationPolicy: strict
    AllowedMethods: GET
    DisallowedMethods: POST, PUT, DELETE, PATCH
    AllowedPurposes: [PLACEHOLDER_PURPOSE1]
    DisallowedPurposes: [PLACEHOLDER_PURPOSE2], [PLACEHOLDER_PURPOSE3]

    # Extended directives for admin path
    RequestLimit: 10/minute
    ConcurrentLimit: 2
    RequireHumanInitiatedSession: true
    SessionValidation: token-based
    SessionTTL: 30m
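The non-normative sketch below shows one way a client might read a multi-group file like the example above. It rests on two assumptions that this document does not state: that each Host line opens a new group, and that the group with the longest matching Scope prefix applies to a request. The authoritative grouping and matching rules are those of [CORE-SPEC]. Inheritance between groups is not modeled, and all function names are hypothetical.

```python
def parse_groups(text):
    """Split a file into groups of {directive: value}.

    ASSUMPTION: each "Host:" line opens a new group.
    """
    groups = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a directive line
        name, value = name.strip(), value.strip()
        if name == "Host" or not groups:
            groups.append({})
        groups[-1][name] = value
    return groups


def select_group(groups, host, path):
    """ASSUMPTION: the longest matching Scope prefix for the host wins."""
    candidates = [
        g for g in groups
        if g.get("Host") == host and path.startswith(g.get("Scope", "/"))
    ]
    return max(candidates, key=lambda g: len(g.get("Scope", "/")), default=None)
```

Under these assumptions, a request for a path below /admin/ on example.com would be governed by Group 2 and its RequestLimit of 10/minute, while other paths would fall under Group 1.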