Regular Expression (Regex) in Legacy Search
Regular expression (regex) is a mechanism for describing a pattern rather than a static value. Legacy Search users on the Pro plan or above can incorporate regex into their Censys Search Language (CSL) queries. A wildcard stands in for any number of arbitrary characters with no further constraint. A regex works like a more specific wildcard: it defines which characters may appear and in what order.
Note
Censys regex searches are case-insensitive except when the exact match operator (=) is used. For example, services.software.vendor:/De[l]+/ returns results where the word is either capitalized or lowercase, while services.software.vendor=/De[l]+/ only returns results for the capitalized word.
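To make the difference concrete, the sketch below approximates both operators locally with Python's re module. It is not Censys's engine. It assumes that ":" behaves like a case-insensitive match of the whole value and "=" like a case-sensitive one. The vendor strings are made-up sample values.

```python
import re

# Hypothetical vendor values, for illustration only.
VENDORS = ["Dell", "dell", "DELL", "Del", "Cisco"]

PATTERN = r"De[l]+"


def colon_match(value: str) -> bool:
    """Approximates `field:/De[l]+/`: case-insensitive, whole-value match."""
    return re.fullmatch(PATTERN, value, flags=re.IGNORECASE) is not None


def exact_match(value: str) -> bool:
    """Approximates `field=/De[l]+/`: case-sensitive, whole-value match."""
    return re.fullmatch(PATTERN, value) is not None


print([v for v in VENDORS if colon_match(v)])  # ['Dell', 'dell', 'DELL', 'Del']
print([v for v in VENDORS if exact_match(v)])  # ['Dell', 'Del']
```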
Regex use-case
A regular expression provides great flexibility in defining search criteria to return relevant matches from large data sets.
As a simple example, take a wildcard search such as: services.http.response.body: *.js*
This query asks, "Which hosts with an HTTP service have a response body containing any string with .js in it?" While simple, it is noisy: it returns tens of millions of results.
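The noise comes from how little the wildcard constrains. The sketch below uses Python's fnmatch as a stand-in for CSL wildcard matching. This is an assumption, since the two are only similar: in both, "*" means "any run of characters". The sample strings are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical strings that might appear in response bodies.
SAMPLES = ["app.js", "config.json", "login.jsp", "README.md"]


def wildcard_hits(pattern: str, values: list[str]) -> list[str]:
    # fnmatchcase treats `*` as any run of characters (case-sensitive).
    return [v for v in values if fnmatchcase(v, pattern)]


print(wildcard_hits("*.js*", SAMPLES))  # ['app.js', 'config.json', 'login.jsp']
```

The pattern catches .json and .jsp as well as .js. That is the kind of over-matching a more specific regex avoids.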
A regular expression provides criteria for what the value must look like without limiting it to a single, static string.
For example, services.http.response.headers.location=/.*(\.\.\/)+.*(\.asp|\.php|\.js|\.cgi).*/ asks, "Which hosts have an HTTP Location header that includes the sequence ../ (a pattern associated with directory traversal attacks), followed by one of the more common executable page types like .js, .php, or .asp?"
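The sketch below tries a version of the Location-header pattern above locally. Periods and the slash are escaped so that they match literally. It uses Python's re.fullmatch to model a whole-value match like the "=" operator, which is an assumption about how Censys anchors the pattern. The header values are hypothetical.

```python
import re

# Location-header pattern with literal periods and slash escaped.
LOCATION_PATTERN = re.compile(r".*(\.\.\/)+.*(\.asp|\.php|\.js|\.cgi).*")

# Hypothetical Location header values.
HEADERS = [
    "/static/../../admin/login.php",
    "/app/../index.asp?next=home",
    "/docs/index.html",
    "/scripts/run.cgi",
]


def flags_traversal(location: str) -> bool:
    # fullmatch requires the pattern to cover the entire header value.
    return LOCATION_PATTERN.fullmatch(location) is not None


print([h for h in HEADERS if flags_traversal(h)])
```

Only the first two values are flagged. The third has no executable extension, and the fourth has no "../" sequence.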
Backslashes in regex
A backslash escapes a character that would otherwise be interpreted as an operator. For example, a period (.) in a regular expression matches any single character, so you must put a backslash before a period (\.) to match a literal period.
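The sketch below shows the effect of escaping, using Python's re module as a local stand-in. The hostnames are hypothetical.

```python
import re

# Hypothetical hostnames.
NAMES = ["www.censys.io", "wwwXcensysXio"]

unescaped = re.compile(r"www.censys.io")  # '.' matches any single character
escaped = re.compile(r"www\.censys\.io")  # '\.' matches only a literal period

print([n for n in NAMES if unescaped.fullmatch(n)])  # ['www.censys.io', 'wwwXcensysXio']
print([n for n in NAMES if escaped.fullmatch(n)])  # ['www.censys.io']
```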
Best practices
Regular expressions are extremely powerful but computationally expensive. Experience helps in judging when to use one. As a basic rule, use a regex only when a plain value or wildcard match isn't sufficient.
Regex against fields with robust string values
Regular expressions are valid for most fields, but they are best suited to fields with long string values.
Popular host fields to write regular expressions for include:
services.http.response.body
services.banner
Note
To search the full HTML markup in HTTP response bodies, use the exact match (=) operator. Make sure to add generic regex wildcards (.*) before and after the expression to account for "everything else" in the body.
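The reason for the surrounding .* is that the expression must account for the entire body, not just the part you care about. The sketch below illustrates this with Python's re.fullmatch, on the assumption that this models exact-match behavior. The HTML body is a hypothetical, shortened sample. re.DOTALL is used because Python's "." skips newlines by default. This article does not say how Censys treats newlines.

```python
import re

# Hypothetical, heavily shortened HTML body; real bodies span many lines.
BODY = "<html>\n<head><title>Example Login</title></head>\n<body></body>\n</html>"


def matches_whole_body(pattern: str, body: str) -> bool:
    # fullmatch models a match that must cover the entire body.
    # re.DOTALL lets `.` cross newlines.
    return re.fullmatch(pattern, body, flags=re.DOTALL) is not None


print(matches_whole_body(r"<title>Example Login</title>", BODY))  # False
print(matches_whole_body(r".*<title>Example Login</title>.*", BODY))  # True
```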
Popular certificate fields to write regular expressions for include:
parsed.subject_dn
parsed.names
Regex examples in CSL
The following example queries show the power of regular expressions.
Query | Description |
---|---|
services.http.response.headers.x_forwarded_for: /.*,.*/ | HTTP responses whose X-Forwarded-For header lists multiple comma-separated addresses, indicating a host behind a proxy. |
names: /.*\..*\.censys\.io/ | Certificates that contain a name at least two labels below censys.io (for example, a.b.censys.io). |
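The sketch below checks the two table patterns locally. They are copied without their /.../ delimiters and run through Python's re.fullmatch. It assumes whole-value matching and uses hypothetical sample values (documentation-range IP addresses and made-up hostnames). It is a quick way to see what each pattern does and does not match before running it in Censys.

```python
import re

# Patterns from the table, without the surrounding /.../ delimiters.
XFF = re.compile(r".*,.*")
CENSYS_SUBDOMAIN = re.compile(r".*\..*\.censys\.io")

# Hypothetical sample values.
xff_values = ["203.0.113.7", "203.0.113.7, 198.51.100.2"]
names = ["censys.io", "search.censys.io", "a.b.censys.io", "a.b.c.censys.io"]

print([v for v in xff_values if XFF.fullmatch(v)])  # ['203.0.113.7, 198.51.100.2']
print([n for n in names if CENSYS_SUBDOMAIN.fullmatch(n)])  # ['a.b.censys.io', 'a.b.c.censys.io']
```

The second pattern needs two literal periods before ".censys.io". A single subdomain such as search.censys.io therefore does not match, while deeper names do.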
Syntax reference
Use this official regex syntax reference to see how to construct regular expressions.