How Advanced URL Filtering Works

Learn how Palo Alto Networks URL Filtering classifies URLs and uses machine learning to protect against web threats and enable safe internet access.
Advanced URL Filtering classifies websites based on site content, features, and safety. A URL can have up to four URL categories, including risk categories (high, medium, and low) that indicate how likely the site is to expose you to threats. As PAN-DB, the Advanced URL Filtering URL database, categorizes sites, firewalls with Advanced URL Filtering enabled can use that knowledge to enforce your organization’s security policies.
In addition to the protection offered by PAN-DB, Advanced URL Filtering provides real-time analysis using machine learning to defend against new and unknown threats. This protects against malicious URLs that are updated or introduced before URL filtering databases have had an opportunity to analyze and add the content, a delay that otherwise gives attackers an open window in which to launch precision attack campaigns. Advanced URL Filtering compensates for the coverage gaps inherent in database solutions by analyzing URLs in real time on a per-request basis. The ML-based models used by Advanced URL Filtering have been trained, and are continuously updated, to detect various malicious URLs, phishing web pages, and command-and-control (C2).
When a user requests a web page, the firewall queries user-added exceptions and PAN-DB for the site’s risk category. PAN-DB draws on URL information from Unit 42, WildFire, passive DNS, Palo Alto Networks telemetry data, and the Cyber Threat Alliance, and applies various analyzers to determine the category. If the URL displays risky or malicious characteristics, it is also submitted to Advanced URL Filtering in the cloud for real-time analysis, which generates additional analysis data. The firewall then retrieves the resulting risk category and uses it to enforce web-access rules based on your policy configuration. The firewall also caches site categorization information for new entries to enable fast retrieval for subsequent requests, and it removes URLs that users have not accessed recently so that the cache accurately reflects the traffic in your network. In addition, checks built into PAN-DB cloud queries ensure that the firewall receives the latest URL categorization information. If you do not have internet connectivity or an active URL filtering license, no queries are made to PAN-DB.
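The caching behavior described above, keeping recently requested URLs and dropping those that have not been accessed for a while, resembles a least-recently-used (LRU) cache. The following Python sketch illustrates that general idea only. The class name `UrlCategoryCache`, the capacity, and the sample URLs and categories are hypothetical, and the firewall's actual eviction logic is not documented here and may differ.

```python
from collections import OrderedDict
from typing import Optional


class UrlCategoryCache:
    """Illustrative cache that drops the least recently accessed URL when full.

    This is an assumption-based model, not the firewall's implementation.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, url: str) -> Optional[str]:
        category = self._entries.get(url)
        if category is not None:
            # An access marks the URL as recently used.
            self._entries.move_to_end(url)
        return category

    def put(self, url: str, category: str) -> None:
        self._entries[url] = category
        self._entries.move_to_end(url)
        if len(self._entries) > self.capacity:
            # Evict the entry that has gone longest without being accessed.
            self._entries.popitem(last=False)


cache = UrlCategoryCache(capacity=2)
cache.put("example.com", "business-and-economy")
cache.put("example.org", "news")
cache.get("example.com")           # refreshes example.com
cache.put("example.net", "games")  # cache is full, so example.org is evicted
```

In this sketch, a later request for `example.org` would miss the cache and require a fresh lookup, while `example.com` stays cached because it was accessed more recently.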
The firewall determines a website’s URL category by comparing it to entries in 1) custom URL categories, 2) external dynamic lists (EDLs), and 3) predefined URL categories, in order of precedence.
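The order of precedence above can be sketched in a few lines of Python. This is a simplified illustration, not the firewall's matching logic: the function `resolve_category` and its parameters are hypothetical, and matching is reduced to exact hostname comparison, whereas real URL matching also involves paths and wildcards.

```python
from typing import Dict, Optional, Set


def resolve_category(
    host: str,
    custom_categories: Dict[str, Set[str]],
    edls: Dict[str, Set[str]],
    predefined: Dict[str, str],
) -> Optional[str]:
    """Return the first matching category, checking sources in precedence order."""
    # 1) Custom URL categories take precedence over everything else.
    for name, hosts in custom_categories.items():
        if host in hosts:
            return name
    # 2) External dynamic lists (EDLs) are checked next.
    for name, hosts in edls.items():
        if host in hosts:
            return name
    # 3) Predefined URL categories are the fallback.
    return predefined.get(host)


# Hypothetical sample data for illustration only.
custom = {"Approved-Partners": {"partner.example.com"}}
edls = {"Blocked-EDL": {"partner.example.com", "bad.example.net"}}
predefined = {
    "partner.example.com": "business-and-economy",
    "bad.example.net": "malware",
    "news.example.org": "news",
}

partner = resolve_category("partner.example.com", custom, edls, predefined)
blocked = resolve_category("bad.example.net", custom, edls, predefined)
fallback = resolve_category("news.example.org", custom, edls, predefined)
```

Here `partner.example.com` appears in all three sources, and the custom URL category is the one returned because it is checked first.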
Firewalls configured to analyze URLs in real time using machine learning on the dataplane provide an additional layer of security against phishing websites and JavaScript exploits. The inline ML models used to identify these URL-based threats extend to currently unknown as well as future variants of threats that match characteristics that Palo Alto Networks has identified as malicious. To keep up with the latest changes in the threat landscape, inline ML models are added or updated via content releases.
When the firewall checks PAN-DB for a URL, it also looks for critical updates, such as URLs that previously qualified as benign but are now malicious.
If you believe PAN-DB has incorrectly categorized a site, you can submit a URL category change request in your browser through Test A Site or directly from the firewall logs.
Did you know?
Technically, the firewall caches URLs on both the management plane and the dataplane:
  • PAN-OS 9.0 and later releases do not download PAN-DB seed databases. Instead, upon activation of the URL filtering license, the firewall populates the cache as URL queries are made.
  • The management plane holds more URLs and communicates directly with PAN-DB. When the firewall cannot find a URL’s category in the cache and performs a lookup in PAN-DB, it caches the retrieved category information in the management plane. The management plane passes that information along to the dataplane, which also caches it and uses it to enforce policy.
  • The dataplane holds fewer URLs and receives information from the management plane. After the firewall checks URL category exception lists (custom URL categories and external dynamic lists) for a URL, the next place it looks is the dataplane. Only if the firewall cannot find the URL categorized in the dataplane does it check the management plane and, if the category information is not there, PAN-DB.
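The lookup order described in the bullets above can be modeled as a tiered cache. The Python sketch below is a minimal illustration under stated assumptions: `lookup_category`, its parameters, and the plain dictionaries standing in for each cache are hypothetical, and copying a management-plane hit into the dataplane cache is an assumption based on the statement that the management plane passes information along to the dataplane.

```python
from typing import Callable, Dict, Optional, Tuple


def lookup_category(
    url: str,
    exceptions: Dict[str, str],
    dataplane_cache: Dict[str, str],
    management_cache: Dict[str, str],
    query_pan_db: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Return (category, source), checking each tier in the documented order."""
    # Exception lists (custom URL categories and EDLs) are checked first.
    if url in exceptions:
        return exceptions[url], "exception-list"
    # Next, the smaller dataplane cache.
    if url in dataplane_cache:
        return dataplane_cache[url], "dataplane"
    # Then the larger management plane cache.
    if url in management_cache:
        category = management_cache[url]
        dataplane_cache[url] = category  # assumption: a hit is passed down
        return category, "management-plane"
    # Finally, query PAN-DB and cache the result on both planes.
    category = query_pan_db(url)
    if category is not None:
        management_cache[url] = category
        dataplane_cache[url] = category
    return category, "pan-db"


# Hypothetical stand-in for the PAN-DB cloud service.
cloud = {"example.com": "business-and-economy"}
dataplane: Dict[str, str] = {}
management: Dict[str, str] = {}

first = lookup_category("example.com", {}, dataplane, management, cloud.get)
second = lookup_category("example.com", {}, dataplane, management, cloud.get)
```

In this sketch the first request falls through to the cloud stand-in, and the second is served from the dataplane cache without another query.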