Advanced URL Filtering
Guidelines for URL Category Exceptions
Use these guidelines to define exceptions to category-based filtering of web traffic.
Where can I use this? | What do I need? |
---|---|
 | This feature has no prerequisites. |
The following guidelines describe how to populate
URL category exception lists (custom URL categories or external dynamic
lists of URLs), with examples that use both wildcards and specific
entries.
Basic Guidelines For URL Category Exception Lists
Consider the potential matches an entry might have before
adding it to a URL category exception list. The following guidelines
specify how to create an entry that blocks or allows the websites
and pages you intend.
By default, the firewall automatically appends a trailing
slash (/) to domain entries that do not end in a trailing
slash (/) or asterisk (*). The addition of the trailing slash
changes the URLs that the firewall considers a match and for which
it enforces policy. In non-wildcard domain entries, the trailing slash
limits matches to the given domain and its subdirectories. For example, example.com (example.com/ after processing)
matches itself and example.com/search.
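The trailing slash rule can be sketched in a few lines of Python. This is a simplified model, not the firewall's implementation: it assumes that a "domain entry" is any entry without a / path component, and the function name append_end_token is hypothetical.

```python
def append_end_token(entry: str) -> str:
    """Return an entry as the firewall would store it with the trailing
    slash behavior enabled (hypothetical helper, simplified model)."""
    # Entries that already end in "/" or "*", or that name a page or
    # subdirectory (they contain "/"), are left unchanged.
    if "/" in entry or entry.endswith("*"):
        return entry
    # Plain domain entries, including ones ending in "^", get a trailing slash.
    return entry + "/"


for raw in ["example.com", "example.com/", "google.^", "mail.example.*",
            "paloaltonetworks.com/example", "example.com/*"]:
    print(f"{raw!r:35} -> {append_end_token(raw)!r}")
```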
In
wildcard domain entries (entries with asterisks or carets), the
trailing slash limits matches to URLs that conform to the specified
pattern. For example, to match the entry *.example.com,
a URL must include at least one subdomain and end with the root
domain, example.com. The pattern is: <subdomain>.example.com; news.example.com is
a match, but example.com is not because it
lacks a subdomain.
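The following Python sketch shows why example.com does not match *.example.com once the trailing slash is appended. It compares domain tokens one by one, treating * as one or more tokens and ^ as exactly one. Requiring both token lists to be fully consumed is an assumed way of modeling the trailing slash; match_tokens and host_matches are hypothetical names.

```python
def match_tokens(pattern: list[str], tokens: list[str]) -> bool:
    """'*' consumes one or more tokens, '^' exactly one; literals must be equal.
    Both lists must be used up, so nothing may follow the last pattern token."""
    if not pattern:
        return not tokens
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # Try every possible run of one or more tokens for the asterisk.
        return any(match_tokens(rest, tokens[i:]) for i in range(1, len(tokens) + 1))
    if tokens and (head == "^" or head == tokens[0]):
        return match_tokens(rest, tokens[1:])
    return False


def host_matches(entry: str, host: str) -> bool:
    """Compare a wildcard domain entry with a host name (case-insensitive)."""
    return match_tokens(entry.lower().split("."), host.lower().split("."))


for host in ["news.example.com", "support.tools.example.com", "example.com"]:
    print(host, host_matches("*.example.com", host))
```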
We recommend manually adding trailing slashes
to clarify the intended matching behavior of an entry for anyone
who inspects it. The trailing slash is invisible when added by the
firewall.
Panorama™ management servers running PAN-OS® 10.2
can only enable this feature for firewalls on the same software
version. To enable this feature for firewalls running PAN-OS 10.1
or earlier, use the following CLI commands on each firewall:
```
admin@PA-850> debug device-server append-end-token on
admin@PA-850> configure
admin@PA-850# commit
```
To disable this feature, select Device > Setup > Content-ID > URL Filtering,
and then deselect Append Ending Token. Be aware that disabling this feature
may block or allow access to more URLs than you anticipate: the firewall
instead adds an implicit asterisk to the end of domain entries that
do not end in a / or *. For example, if you add example.com to
a URL list of allowed websites, the firewall interprets that entry
as example.com.*. As a result, the firewall
allows access to sites such as example.com.domain.xyz. URL Category Exceptions (PAN-OS
10.1 and earlier) describes the firewall’s behavior when you disable
this feature.
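To contrast the two behaviors, here is a Python sketch limited to literal domain entries (no wildcards). It models the trailing slash as "the domain must end where the entry ends" and, as an assumption, models the implicit asterisk as "any tokens may follow the entry". The function literal_entry_matches and its append_end_token flag are hypothetical and do not correspond to a firewall API.

```python
import re

# Token separators from these guidelines: . / ? & = ; +
SEPARATORS = re.compile(r"[./?&=;+]+")


def tokens(text: str) -> list[str]:
    return [t for t in SEPARATORS.split(text.lower()) if t]


def literal_entry_matches(entry: str, url: str, append_end_token: bool = True) -> bool:
    entry_tokens = tokens(entry)
    if append_end_token:
        # "example.com" is treated as "example.com/": the domain must end
        # where the entry ends, but subdirectories are still allowed.
        host = url.lower().partition("/")[0]
        return tokens(host) == entry_tokens
    # Assumed model of "example.com.*": any tokens may follow the entry.
    return tokens(url)[: len(entry_tokens)] == entry_tokens


for url in ["example.com", "example.com/search", "example.com.domain.xyz"]:
    print(url,
          literal_entry_matches("example.com", url, append_end_token=True),
          literal_entry_matches("example.com", url, append_end_token=False))
```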
- List entries are case-insensitive.
- Omit http and https from URL entries.
- Each URL entry can be up to 255 characters in length.
- Enter an exact match for the IP address or URL you want to block or allow, or use wildcards to create a pattern match. Different entries result in different exact matches. If you enter the URL for a specific web page (example.com/contact), the firewall limits matches to that page alone. An exact domain entry restricts matches to the domain itself and its subdirectories.
- If a website or page is accessible from more than one URL, consider adding the URLs most commonly used to access it to your exception list (for example, blog.paloaltonetworks.com and paloaltonetworks.com/blog).
- The entry example.com is distinct from www.example.com. The domain name is the same, but the second entry contains the www subdomain.
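Several of these guidelines can be checked before an entry is added. The Python sketch below uses two hypothetical helpers: check_entry flags a leading http:// or https:// and entries over 255 characters, and dedupe drops entries that differ only in case. It keeps example.com and www.example.com separate because they are distinct entries.

```python
MAX_ENTRY_LENGTH = 255  # per-entry limit from the guidelines


def check_entry(entry: str) -> list[str]:
    """Hypothetical helper: report violations of the basic guidelines."""
    problems = []
    if entry.lower().startswith(("http://", "https://")):
        problems.append("omit http:// and https:// from the entry")
    if len(entry) > MAX_ENTRY_LENGTH:
        problems.append(f"entry has {len(entry)} characters; the limit is {MAX_ENTRY_LENGTH}")
    return problems


def dedupe(entries: list[str]) -> list[str]:
    """Drop entries that differ only in case, because entries are case-insensitive.
    Subdomains such as www are significant, so they are not stripped."""
    seen, result = set(), []
    for entry in entries:
        key = entry.lower()
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


print(check_entry("https://example.com"))
print(dedupe(["Example.com", "example.com", "www.example.com"]))
```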
Palo Alto Networks does not support regular expressions
in custom URL category or external dynamic list entries. You
must know the specific URLs or construct the URL patterns you want
to match using wildcards and the following characters: . / ? & = ; +.
Wildcard Guidelines for URL Category Exception Lists
You can use asterisks (*) and carets (^) in URL category
exception lists to configure a single entry to match multiple subdomains,
domains, top-level domains (TLDs), or pages without specifying exact
URLs.
How to Use Asterisk (*) and Caret (^) Wildcards
The following
characters are token separators: . / ? & = ; +.
Every string separated by one or two of these characters is a token. Use
wildcard characters as token placeholders to indicate that a specific
token can contain any value. In the entry docs.paloaltonetworks.com, the
tokens are “docs”, “paloaltonetworks”, and “com”.
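The tokenization rule can be illustrated with a Python sketch that splits on runs of one or two separator characters. The helper name tokenize is hypothetical. Dropping the empty strings produced by leading or trailing separators (for example, a trailing slash) is an assumption made so that only real tokens remain.

```python
import re

# Token separators from the guidelines: . / ? & = ; +
SEPARATOR_RUN = re.compile(r"[./?&=;+]{1,2}")


def tokenize(text: str) -> list[str]:
    """Split an entry or URL into tokens (hypothetical helper)."""
    return [token for token in SEPARATOR_RUN.split(text.lower()) if token]


print(tokenize("docs.paloaltonetworks.com"))
print(tokenize("example.com/search?q=test"))
```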
The following
table describes how asterisks and carets work and provides examples.
Asterisk (*) | Caret (^) |
---|---|
Indicates one or more variable subdomains, domains, TLDs, or subdirectories. You can use an asterisk after a trailing slash, for example, example.com/*. | Indicates exactly one variable subdomain, root domain, or TLD. You cannot use a caret after a trailing slash; for example, the entry example.com/^ is invalid. |
Example: *.domain.com matches docs.domain.com and abc.xyz.domain.com. | Example: ^.domain.com matches docs.domain.com and blog.domain.com. |

Key Point: Asterisks match a greater range of URLs than carets. An asterisk corresponds
to one or more consecutive tokens, while a caret corresponds to
exactly one token. An entry like xyz.*.com therefore matches
more sites than xyz.^.^.com: xyz.*.com matches
sites with any number (one or more) of tokens between xyz and com, while xyz.^.^.com matches only sites
with exactly two tokens between them.
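To quantify the difference between xyz.*.com and xyz.^.^.com, the following Python sketch computes the fewest and the most domain tokens a host pattern can match. It assumes, as described above, that a literal or ^ stands for exactly one token and * stands for one or more. The function token_span is hypothetical.

```python
import math


def token_span(host_pattern: str) -> tuple[int, float]:
    """Return (fewest, most) domain tokens a host pattern can match."""
    fewest, most = 0, 0.0
    for token in host_pattern.split("."):
        fewest += 1  # every token, wildcard or not, needs at least one token
        # An asterisk removes the upper bound; anything else adds exactly one.
        most = math.inf if token == "*" else most + 1
    return fewest, most


for pattern in ["xyz.*.com", "xyz.^.^.com"]:
    print(pattern, token_span(pattern))
```

Because the upper bound for xyz.*.com is unbounded, it can match every site that xyz.^.^.com matches and more.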
- A wildcard must be the only character within a token. For example, example*.com is an invalid entry because example and * are in the same token. An entry can contain wildcards in more than one token, however.
- You can use asterisks and carets in the same entry (for example, *.example.^).
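The placement rules for wildcards can be checked with the following Python sketch. It flags any token in which a wildcard shares space with other characters, and any caret that appears after the first slash. Treating every caret in the path portion as invalid is an assumption based on the rule that carets cannot follow a trailing slash. The function wildcard_errors is hypothetical.

```python
import re

# Token separators from the guidelines: . / ? & = ; +
SEPARATORS = re.compile(r"[./?&=;+]")


def wildcard_errors(entry: str) -> list[str]:
    """Hypothetical helper: report wildcard placement problems in one entry."""
    errors = []
    for token in SEPARATORS.split(entry):
        # A wildcard must be the only character in its token.
        if ("*" in token or "^" in token) and token not in ("*", "^"):
            errors.append(f"wildcard shares a token with other characters: {token!r}")
    _host, slash, path = entry.partition("/")
    if slash and "^" in path:
        errors.append("carets cannot be used after the trailing slash")
    return errors


for entry in ["example*.com", "*.example.^", "example.com/^", "example.com/*"]:
    print(entry, wildcard_errors(entry))
```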
Do not create an entry with consecutive
asterisks (*) or more than nine consecutive carets (^)—entries like
these can affect firewall performance.
For example, do
not add an entry like mail.*.*.com. Instead,
depending on the range of websites you want to control access to,
enter mail.*.com or mail.^.^.com.
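The following Python sketch flags the two performance-sensitive patterns: consecutive asterisk tokens and runs of more than nine carets. The helper name performance_warnings is hypothetical; the limit of nine comes from the guideline above.

```python
import re

# Token separators from the guidelines: . / ? & = ; +
SEPARATORS = re.compile(r"[./?&=;+]")
MAX_CONSECUTIVE_CARETS = 9


def performance_warnings(entry: str) -> list[str]:
    """Hypothetical helper: warn about wildcard runs that can affect performance."""
    tokens = SEPARATORS.split(entry)
    warnings = []
    if any(a == "*" and b == "*" for a, b in zip(tokens, tokens[1:])):
        warnings.append("consecutive asterisks")
    run = longest = 0
    for token in tokens:
        # Length of the current run of caret tokens.
        run = run + 1 if token == "^" else 0
        longest = max(longest, run)
    if longest > MAX_CONSECUTIVE_CARETS:
        warnings.append(f"{longest} consecutive carets (more than {MAX_CONSECUTIVE_CARETS})")
    return warnings


for entry in ["mail.*.*.com", "mail.*.com", "mail.^.^.com"]:
    print(entry, performance_warnings(entry))
```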
URL Category Exception List—Examples
The following table displays example URL list entries,
matching sites, and explanations for the matching behavior when
the firewall automatically appends trailing slashes.
The entries in this table do not contain a trailing slash
to reflect that the firewall appends one to applicable entries in
the background. Additionally, exception lists may contain entries
added before the trailing slash guidance. URL Category Exceptions—Examples (PAN-OS
10.1) shows matching behavior when the firewall does not append
trailing slashes by default.
We recommend manually adding
trailing slashes to clarify the intended matching behavior of an
entry for anyone who inspects it. The trailing slash is invisible
if added by the firewall.
URL Exception List Entry | Matching Sites | Explanation |
---|---|---|
Example Set 1 | | |
paloaltonetworks.com | paloaltonetworks.com, paloaltonetworks.com/network-security/security-subscriptions | The firewall appends a trailing slash to the entry, limiting matches to the exact domain and its subdirectories. |
paloaltonetworks.com/example | paloaltonetworks.com/example | The firewall does not append a trailing slash to this entry because the subdirectory example follows the domain. When you enter the URL for a specific web page, the firewall applies the exception action to that page only. |
Example Set 2: Asterisks | | |
*.example.com | www.example.com, docs.example.com, support.tools.example.com | The asterisk expands matches to all example.com subdomains. The firewall appends a trailing slash to the entry, excluding matches to the right of example.com, the root domain. |
mail.example.* | mail.example.com, mail.example.co.uk, mail.example.com/#inbox | The asterisk expands matches to any URL following the mail.example.<TLD> pattern. This entry yields the same matches with or without the trailing slash feature enabled. |
example.*.com | example.yoursite.com, example.es.domain.com, example.abc.xyz.com | The asterisk expands matches to URLs where the left-most subdomain is example and the top-level domain is com. The trailing slash excludes matches to the right of the TLD. |
example.com/* | example.com/photos, example.com/blog/latest, any example.com subdirectory | The domain is followed by a / and an asterisk, which indicates that a subdirectory must be present. The asterisk serves as a token placeholder for any example.com subdirectory. The firewall does not append a trailing slash because the entry ends in an asterisk. |
Example Set 3: Carets | | |
google.^ | google.com, google.info, google.com/search?q=paloaltonetworks | The caret expands matches to URLs whose domain is google followed by a single TLD. The trailing slash excludes matches that extend the domain to the right of the TLD; subdirectories and queries, such as /search?q=paloaltonetworks, still match. Patterns such as example.co.^ are typically used to match country-specific domains such as example.co.jp. However, with generic top-level domains (gTLDs), a pattern like example.co.^ can also match example.co.info or example.co.amzn, which may not belong to the same organization. |
^.google.com | www.google.com, news.google.com | The caret expands matches to single-level subdomains of google.com. The firewall appends a trailing slash to the entry, excluding matches to the right of the root domain. |
^.^.google.com | www.maps.google.com, support.tools.google.com | The two carets expand matches to URLs that include exactly two consecutive subdomains before google.com. The firewall appends a trailing slash to the entry, excluding matches to the right of the root domain. |
google.^.com | google.example.com, google.company.com | The caret expands matches to URLs where google is the left-most subdomain, followed by exactly one token and then com. The firewall appends a trailing slash to the entry, excluding matches to the right of the TLD. |
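The matching behavior in this table can be approximated with a small Python model. It is a sketch under simplified assumptions, not PAN-OS code. An entry without a path, or ending in a bare trailing slash, is treated as a domain entry: the domain must match exactly and any subdirectory may follow. An entry with a path, such as a page or example.com/*, must match the URL path token for token. The names entry_matches and EXAMPLES are hypothetical, and EXAMPLES simply restates the table rows.

```python
import re

# Token separators from the guidelines: . / ? & = ; +
PATH_SEPARATORS = re.compile(r"[./?&=;+]+")


def match_tokens(pattern: list[str], tokens: list[str]) -> bool:
    """'*' consumes one or more tokens, '^' exactly one; literals must be equal."""
    if not pattern:
        return not tokens
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        return any(match_tokens(rest, tokens[i:]) for i in range(1, len(tokens) + 1))
    if tokens and (head == "^" or head == tokens[0]):
        return match_tokens(rest, tokens[1:])
    return False


def path_tokens(path: str) -> list[str]:
    return [t for t in PATH_SEPARATORS.split(path) if t]


def entry_matches(entry: str, url: str) -> bool:
    entry, url = entry.lower(), url.lower()  # entries are case-insensitive
    host_pattern, slash, path_pattern = entry.partition("/")
    host, _, path = url.partition("/")
    if not match_tokens(host_pattern.split("."), host.split(".")):
        return False
    if not slash or not path_pattern:
        # Domain entry: the domain must end here, but any subdirectory may follow.
        return True
    # Page or path pattern: the path must match the pattern token for token.
    return match_tokens(path_tokens(path_pattern), path_tokens(path))


EXAMPLES = {
    "paloaltonetworks.com": ["paloaltonetworks.com",
                             "paloaltonetworks.com/network-security/security-subscriptions"],
    "paloaltonetworks.com/example": ["paloaltonetworks.com/example"],
    "*.example.com": ["www.example.com", "docs.example.com", "support.tools.example.com"],
    "mail.example.*": ["mail.example.com", "mail.example.co.uk", "mail.example.com/#inbox"],
    "example.*.com": ["example.yoursite.com", "example.es.domain.com", "example.abc.xyz.com"],
    "example.com/*": ["example.com/photos", "example.com/blog/latest"],
    "google.^": ["google.com", "google.info", "google.com/search?q=paloaltonetworks"],
    "^.google.com": ["www.google.com", "news.google.com"],
    "^.^.google.com": ["www.maps.google.com", "support.tools.google.com"],
    "google.^.com": ["google.example.com", "google.company.com"],
}

for entry, urls in EXAMPLES.items():
    for url in urls:
        print(f"{entry:30} {url:60} {entry_matches(entry, url)}")
```

The model also reproduces the documented non-matches, for example *.example.com against example.com and example.com/* against example.com with no subdirectory.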