Learn how Palo Alto Networks URL Filtering classifies
URLs and uses machine learning to protect against web threats and
enable safe internet access.
Advanced URL Filtering classifies websites based on
site content, features, and safety. A URL can have up to four URL
categories, including risk categories (high, medium,
and low) that indicate the likelihood that the site will expose
you to threats. As PAN-DB, the Advanced URL Filtering URL database,
categorizes sites, firewalls with Advanced URL Filtering enabled
can leverage that knowledge to enforce your organization’s security
policies. In addition to the protection offered by the PAN-DB database,
Advanced URL Filtering provides real-time analysis using machine learning
to defend against new and unknown threats. This protects against
malicious URLs that are updated or introduced before URL filtering
databases have had an opportunity to analyze and add the content, a gap
that otherwise gives attackers an open window in which to launch
targeted attack campaigns. Advanced URL Filtering compensates for the
coverage gaps inherent in database solutions by providing real-time URL
analysis on a per-request basis. The ML-based models used by Advanced URL
Filtering have been trained, and are continuously updated, to detect
various malicious URLs, phishing web pages, and command-and-control (C2).
When a user requests a web page, the firewall queries user-added
exceptions and PAN-DB for the site’s risk category. PAN-DB uses
URL information from Unit 42, WildFire, passive DNS, Palo Alto Networks
telemetry data, and the Cyber Threat Alliance, and applies
various analyzers to determine the category. If the URL displays
risky or malicious characteristics, it is also submitted to Advanced
URL Filtering in the cloud for real-time analysis, which generates
additional analysis data. The firewall then retrieves the resulting
risk category and uses it to enforce web-access rules based
on your policy configuration. The firewall also caches
site categorization information for new entries to enable fast retrieval
for subsequent requests, and it removes URLs that users have not
accessed recently so that the cache accurately reflects the traffic in
your network. In addition, checks built into PAN-DB cloud queries
ensure that the firewall receives the latest URL categorization information.
If you do not have internet connectivity or an active URL filtering license,
no queries are made to PAN-DB.
The firewall
determines a website’s URL category by comparing it to entries in
1) custom URL categories, 2) external dynamic lists (EDLs), and
3) predefined URL categories, in order of precedence.
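The order of precedence above can be sketched in a few lines of Python.
This is only an illustration, not the firewall's implementation: the
function name resolve_category, the dictionary-based tables, and the
example hostnames are all hypothetical, and the sketch assumes simple
exact-match lookups.

```python
def resolve_category(url, custom_categories, edl_entries, predefined):
    """Return (source, category) from the first source that lists the URL.

    Sources are checked in order of precedence: custom URL categories,
    then external dynamic lists (EDLs), then predefined URL categories.
    Exact-match lookup is a simplifying assumption of this sketch.
    """
    sources = [
        ("custom", custom_categories),
        ("edl", edl_entries),
        ("predefined", predefined),
    ]
    for source_name, table in sources:
        if url in table:
            return source_name, table[url]
    return None  # not listed in any source


# Hypothetical example data; the hostnames and category names are made up.
custom = {"intranet.example.com": "internal-tools"}
edl = {"bad.example.net": "edl-blocklist", "intranet.example.com": "edl-other"}
predefined = {"bad.example.net": "malware", "news.example.org": "news"}

print(resolve_category("intranet.example.com", custom, edl, predefined))
print(resolve_category("bad.example.net", custom, edl, predefined))
print(resolve_category("news.example.org", custom, edl, predefined))
```

In this sketch, a URL that appears in more than one source takes its
category from the highest-precedence source that lists it.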
Firewalls configured to analyze URLs in
real time using machine learning on the dataplane provide
an additional layer of security against phishing websites and JavaScript
exploits. The inline ML models used to identify these URL-based
threats extend to currently unknown as well as future variants of
threats that match characteristics that Palo Alto Networks has identified
as malicious. To keep up with the latest changes in the threat landscape,
inline ML models are added or updated via content releases.
When the firewall checks PAN-DB for a URL, it also looks for
critical updates, such as URLs that previously qualified as benign
but are now malicious.
The firewall caches URLs on both the
management plane and the dataplane.
PAN-OS 9.0 and later releases do not download PAN-DB seed
databases. Instead, upon activation of the URL filtering license,
the firewall populates the cache as URL queries are made.
The management plane holds more URLs and communicates directly
with PAN-DB. When the firewall cannot find a URL’s category in the
cache and performs a lookup in PAN-DB, it caches the retrieved category
information in the management plane. The management plane passes
that information along to the dataplane, which also caches it and
uses it to enforce policy.
The dataplane holds fewer URLs and receives information from
the management plane. After the firewall checks URL category exception lists (custom
URL categories and external dynamic lists) for a URL, the next place
it looks is the dataplane. Only if the firewall cannot find the
URL categorized in the dataplane does it check the management plane
and, if the category information is not there, PAN-DB.
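The lookup order described above (dataplane cache, then management plane
cache, then PAN-DB) can be modeled as a small two-tier cache. The sketch
below is a hedged illustration only. The class names, the capacities, and
the cloud_lookup callable are hypothetical. It also makes two assumptions
that the text does not confirm: least-recently-used eviction stands in for
"removes URLs that users have not accessed recently," and a management
plane hit is copied into the dataplane cache.

```python
from collections import OrderedDict


class LruCache:
    """Small cache that evicts the least recently used entry (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, url):
        if url not in self._entries:
            return None
        self._entries.move_to_end(url)  # mark as recently used
        return self._entries[url]

    def put(self, url, category):
        self._entries[url] = category
        self._entries.move_to_end(url)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the oldest entry


class UrlCategoryLookup:
    """Checks the dataplane cache, then the management plane cache, then the cloud."""

    def __init__(self, cloud_lookup, dp_capacity=2, mp_capacity=4):
        # The dataplane holds fewer URLs than the management plane.
        self.dataplane = LruCache(dp_capacity)
        self.management_plane = LruCache(mp_capacity)
        self.cloud_lookup = cloud_lookup  # hypothetical stand-in for a PAN-DB query
        self.cloud_queries = 0

    def categorize(self, url):
        category = self.dataplane.get(url)
        if category is not None:
            return category, "dataplane"
        category = self.management_plane.get(url)
        if category is not None:
            self.dataplane.put(url, category)  # assumption: hit is copied down
            return category, "management-plane"
        # Miss in both caches: query the cloud, then cache in both planes.
        # The sketch assumes the cloud lookup always returns a category.
        self.cloud_queries += 1
        category = self.cloud_lookup(url)
        self.management_plane.put(url, category)
        self.dataplane.put(url, category)
        return category, "cloud"


# Hypothetical data standing in for PAN-DB.
fake_pan_db = {"a.example": "news", "b.example": "shopping", "c.example": "malware"}
lookup = UrlCategoryLookup(fake_pan_db.get)

first = lookup.categorize("a.example")   # both caches empty, so the cloud is queried
second = lookup.categorize("a.example")  # now served from the dataplane cache
lookup.categorize("b.example")
lookup.categorize("c.example")           # dataplane (capacity 2) evicts a.example
third = lookup.categorize("a.example")   # still held by the management plane
print(first, second, third, lookup.cloud_queries)
```

Because the caches start empty and fill only as queries are made, the first
request for each URL in this model always reaches the cloud lookup.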