Advanced URL Filtering classifies websites based on site content, features, and safety. A URL can
have up to four URL categories that indicate the
likelihood that the site will expose you to threats. As PAN-DB, the Advanced URL
Filtering URL database, categorizes sites, firewalls with Advanced URL Filtering enabled
can leverage that knowledge to enforce your organization’s security policies. In
addition to the protection offered by PAN-DB, Advanced URL Filtering provides real-time
analysis using machine learning (ML) to defend against new and unknown threats. This
analysis protects against malicious URLs that are updated or introduced before URL
filtering databases have had an opportunity to analyze and add the content, a delay that
would otherwise give attackers an open window in which to launch precision attack campaigns. Advanced URL
Filtering compensates for the coverage gaps inherent in database solutions by providing
real-time URL analysis on a per-request basis. The ML-based models used by Advanced URL
Filtering have been trained, and are continuously updated, to detect various malicious
URLs, phishing web pages, and command-and-control (C2) activity.
Websites that indicate the presence of certain advanced threats
are additionally processed through a cloud-based inline deep learning
system, using detectors and analyzers that complement the ML models
used by Advanced URL Filtering. Deep learning detectors can process
larger data sets and can better identify complex malicious patterns
and behaviors through multi-layered neural networks. When Advanced
URL Filtering receives HTTP response data from the firewall upon
receipt of a suspicious web request, the deep learning detectors
analyze the data further to provide inline protection
against evasive zero-day web attacks. These attacks include cloaked websites,
in which web page contents are surreptitiously retrieved from unknown
websites (and can include malicious content that URL databases are unable
to account for), multi-step attacks, CAPTCHA challenges, and previously unseen
one-time-use URLs. Because evasive malicious websites are in a constant state
of flux, the detectors and analyzers used to categorize websites
are updated and deployed automatically as Palo Alto Networks threat
researchers improve the detection logic, all without requiring the
administrator to download update packages.
When a user requests a web page, the firewall queries user-added
exceptions and PAN-DB for the site’s risk category. PAN-DB determines
the category using URL information from Unit 42, WildFire, passive DNS,
Palo Alto Networks telemetry data, and the Cyber Threat Alliance,
together with various analyzers. If the URL displays
risky or malicious characteristics, the web payload data is also
submitted to Advanced URL Filtering in the cloud for real-time analysis,
which generates additional analysis data. The resulting risk category
is then retrieved by the firewall and is used to enforce the web-access
rules based on your policy configuration. The firewall also
caches site categorization information for new entries to enable
fast retrieval for subsequent requests, and it removes URLs that
users have not accessed recently so that the cache accurately reflects
the traffic in your network. Additionally, checks built into PAN-DB
cloud queries ensure that the firewall receives the latest URL categorization
information. If you do not have Internet connectivity or an active
URL filtering license, no queries are made to PAN-DB.
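The caching behavior described above (keep recently requested URLs, drop ones that have not been accessed recently) can be illustrated with a small sketch. This is not the firewall's implementation: the source does not state the eviction algorithm, so least-recently-used (LRU) eviction is an assumption here, and the CategoryCache class, its capacity parameter, and the example URLs are hypothetical names chosen for illustration.

```python
from collections import OrderedDict
from typing import Optional


class CategoryCache:
    """Hypothetical bounded URL -> category cache (LRU eviction is an assumption)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Ordered from least recently used (front) to most recently used (back).
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def __contains__(self, url: str) -> bool:
        return url in self._entries

    def get(self, url: str) -> Optional[str]:
        """Return the cached category, or None on a miss. A hit refreshes recency."""
        category = self._entries.get(url)
        if category is not None:
            self._entries.move_to_end(url)
        return category

    def put(self, url: str, category: str) -> None:
        """Cache a category and evict the least recently used entry if over capacity."""
        self._entries[url] = category
        self._entries.move_to_end(url)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


cache = CategoryCache(capacity=2)
cache.put("a.example", "news")
cache.put("b.example", "games")
cache.get("a.example")            # refreshes a.example
cache.put("c.example", "malware")  # evicts b.example, the least recently used
```

A bounded cache of this kind tracks the traffic actually seen on the network: entries that users keep requesting stay available for fast retrieval, while stale ones age out.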
The firewall
determines a website’s URL category by comparing it to entries in
1) custom URL categories, 2) external dynamic lists (EDLs), and
3) predefined URL categories, in order of precedence.
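The order of precedence above can be sketched as a first-match lookup across three sources. This is a simplified illustration, not firewall code: the resolve_category function and the dictionary arguments are hypothetical, and exact string matching stands in for whatever URL matching rules the firewall actually applies.

```python
from typing import Dict, Optional, Tuple


def resolve_category(
    url: str,
    custom_categories: Dict[str, str],
    external_dynamic_lists: Dict[str, str],
    predefined_categories: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (source, category) from the first source that contains the URL.

    Sources are checked in the documented order of precedence. Exact-match
    dictionary lookup is an assumption made to keep the sketch short.
    """
    sources = (
        ("custom URL category", custom_categories),
        ("external dynamic list", external_dynamic_lists),
        ("predefined URL category", predefined_categories),
    )
    for source_name, table in sources:
        if url in table:
            return source_name, table[url]
    return None  # no source has an entry for this URL


custom = {"intranet.example": "allowed-internal"}
edl = {"intranet.example": "blocklist", "bad.example": "blocklist"}
predefined = {"bad.example": "malware", "news.example": "news"}

# intranet.example appears in both the custom category and the EDL;
# the custom category wins because it is checked first.
result = resolve_category("intranet.example", custom, edl, predefined)
```

Because the loop returns on the first match, an entry in a custom URL category always overrides an EDL entry for the same URL, and an EDL entry overrides the predefined category.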
Firewalls configured to analyze URLs in
real time using machine learning on the dataplane provide
an additional layer of security against phishing websites and JavaScript
exploits. The ML models used by local inline categorization identify
currently unknown and future variants of URL-based threats that
match the characteristics that Palo Alto Networks has identified
as malicious. To keep up with the latest changes in the threat landscape,
local inline categorization ML models are added or updated via content
releases.
When the firewall checks PAN-DB for a URL, it also looks for
critical updates, such as URLs that previously qualified as benign
but are now malicious.
If you believe PAN-DB has incorrectly categorized a site, you can submit a change request in your browser
through Test A Site or directly from the firewall logs.
Did you know?
Technically, the
firewall caches URLs on both the management plane and the dataplane:
PAN-OS 9.0 and later releases do not download PAN-DB seed databases.
Instead, upon activation of the URL filtering license, the firewall populates
the cache as URL queries are made.
The management plane holds more URLs and communicates directly with PAN-DB. When the firewall
can't find a URL’s category in the cache and performs a lookup in PAN-DB, it
caches the retrieved category information in the management plane. The
management plane passes that information along to the dataplane, which also
caches it and uses it to enforce policy.
The dataplane holds fewer URLs and receives information from the management plane. After the
firewall checks URL category exception
lists (custom URL categories and external dynamic lists) for a
URL, it looks in the dataplane cache. If the firewall doesn't find the URL in the
dataplane cache, it checks the management plane cache and, if the category information
isn't there either, queries PAN-DB.
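The lookup sequence described above (exception lists, then the dataplane cache, then the management plane cache, then PAN-DB) can be sketched as follows. This is a minimal model under stated assumptions, not the firewall's implementation: the TieredLookup class, its attribute names, and the cloud_lookup callable standing in for a PAN-DB query are all hypothetical. Copying a management plane hit into the dataplane cache is also an assumption; the source only states that the dataplane receives information from the management plane.

```python
from typing import Callable, Dict, Tuple


class TieredLookup:
    """Hypothetical model of the exception -> dataplane -> management plane -> PAN-DB order."""

    def __init__(self, exceptions: Dict[str, str], cloud_lookup: Callable[[str], str]) -> None:
        self.exceptions = exceptions      # custom URL categories and EDL entries
        self.dataplane: Dict[str, str] = {}         # smaller cache, used for enforcement
        self.management_plane: Dict[str, str] = {}  # larger cache, talks to PAN-DB
        self.cloud_lookup = cloud_lookup  # stand-in for a PAN-DB cloud query

    def categorize(self, url: str) -> Tuple[str, str]:
        """Return (category, where_it_was_found)."""
        if url in self.exceptions:
            return self.exceptions[url], "exception list"
        if url in self.dataplane:
            return self.dataplane[url], "dataplane cache"
        if url in self.management_plane:
            category = self.management_plane[url]
            self.dataplane[url] = category  # assumption: hit is passed down to the dataplane
            return category, "management plane cache"
        # Miss everywhere: query the cloud, then cache in both planes as described.
        category = self.cloud_lookup(url)
        self.management_plane[url] = category
        self.dataplane[url] = category
        return category, "PAN-DB"


def fake_pan_db(url: str) -> str:
    """Placeholder for a cloud query; always answers 'unknown'."""
    return "unknown"


lookup = TieredLookup(exceptions={"intranet.example": "allowed-internal"},
                      cloud_lookup=fake_pan_db)
first = lookup.categorize("site.example")   # resolved by the cloud stand-in
second = lookup.categorize("site.example")  # now served from the dataplane cache
```

In this model the second request for the same URL never leaves the dataplane, which is the point of caching the result on both planes after a cloud lookup.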