Regex in CenQL

A regular expression (regex) describes a pattern of characters. Use regex in Censys Query Language (CenQL) queries in the Censys Platform to match field values against a pattern instead of an exact value.

Regex is particularly useful in the following cases:

  • Investigating Internet assets that may be impersonating another company or organization.
  • Identifying malicious programs with indicators that match a pattern but not a specific value.

In the Platform, queries that incorporate regex are "Advanced Queries" and cost 8 credits each to run. Advanced Queries are only available to Platform Starter and Enterprise users.

This article explains how to use regex in the Platform and provides some example queries.

In CenQL, use the =~ operator to search for regex matches in Censys data. The =~ operator is case-sensitive.

Anchors

Regex in CenQL queries is not anchored by default. A regex matches a field if any part of the field value matches the pattern.

Use the ^ and $ anchors to require that a match begins at the start of the field value, ends at the end of it, or both. As anchors, ^ may only be used as the first character of a regex and $ as the last. Reference the examples below for details.

Regex query: web.hostname=~`\w{3}\.censys\.\w{3}`
Hits (returned by query): docs.censys.com, mail.censys.com.mx, martini.censys.cloud, www.censys.com, www.censys.biz, app.censys.com, go2.censys.com, random.censys.xyz, community.censys.com
Misses (not returned by query): app.censys.io

Regex query: web.hostname=~`^\w{3}\.censys\.\w{3}$`
Hits (returned by query): www.censys.com, www.censys.biz, app.censys.com, go2.censys.com
Misses (not returned by query): docs.censys.com, mail.censys.com.mx, martini.censys.cloud, app.censys.io, random.censys.xyz, community.censys.com
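
The effect of anchoring can be reproduced offline. The following is a minimal sketch in Python, assuming that Python's re module behaves like the CenQL regex engine for these simple patterns. It is a stand-in for illustration, not the engine the Platform uses, and the helper name hits is hypothetical. re.search is used because it matches anywhere in the value, which mirrors the unanchored behavior described above.

```python
import re

# Hostnames taken from the hits and misses listed above.
HOSTNAMES = [
    "docs.censys.com", "mail.censys.com.mx", "martini.censys.cloud",
    "www.censys.com", "www.censys.biz", "app.censys.com", "go2.censys.com",
    "random.censys.xyz", "community.censys.com", "app.censys.io",
]

def hits(pattern, values):
    """Return the values in which the pattern matches anywhere (unanchored search)."""
    # re.ASCII limits \w to [A-Za-z0-9_], as documented for CenQL.
    rx = re.compile(pattern, re.ASCII)
    return [v for v in values if rx.search(v)]

unanchored = hits(r"\w{3}\.censys\.\w{3}", HOSTNAMES)
anchored = hits(r"^\w{3}\.censys\.\w{3}$", HOSTNAMES)

print("unanchored:", unanchored)
print("anchored:  ", anchored)
```

Without anchors, only app.censys.io is excluded, because io is shorter than three characters. With both anchors, only hostnames made of exactly three word characters, .censys., and three more word characters remain.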

Backticks

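Regex in CenQL can be input as a raw string wrapped in backticks ( ` ) or as a string wrapped in double quotes ( " ). If you use double quotes, you must double escape special regex characters. For example, a literal period is written \. inside backticks and \\. inside double quotes.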
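
When queries are generated by a script, the two quoting styles can be produced from one regex. This is a minimal sketch in Python with hypothetical helper names (backtick_form, double_quoted_form). It assumes that doubling each backslash is all the double-quoted form requires, which matches the double-quoted example queries in this article but is not a confirmed, complete escaping rule for CenQL.

```python
def backtick_form(field, regex):
    """Build a regex clause using a raw, backtick-wrapped string."""
    # Inside backticks the regex is written exactly as-is.
    return f"{field}=~`{regex}`"

def double_quoted_form(field, regex):
    """Build the same clause using a double-quoted string."""
    # Assumption: only backslashes need doubling. Other characters,
    # such as an embedded double quote, are not handled here.
    escaped = regex.replace("\\", "\\\\")
    return f'{field}=~"{escaped}"'

pattern = r".*\.google\..*"
raw_query = backtick_form("cert.names", pattern)
quoted_query = double_quoted_form("cert.names", pattern)

print(raw_query)     # cert.names=~`.*\.google\..*`
print(quoted_query)  # cert.names=~".*\\.google\\..*"
```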

Operators and assertions

Regular expressions in CenQL may use the following operators and assertions.

Operator or assertion   Use
\                       Use to escape the characters ., +, (), {}, [], ", *, ?, :, \, /, ^, or $.
.                       Matches any character.
+                       Repeats the preceding character one or more times.
*                       Repeats the preceding character zero or more times.
()                      Constitutes a group. Useful for targeting specific top-level domains (TLDs) or file extensions, as in (org|com|net|biz|xyz) or (exe|py|msi|jar).
|                       An "or" operator. Matches successfully if the pattern on either side of the operator is present. See the TLD example for () above.
[]                      Matches any one of the characters contained within the brackets. Use - to indicate a range. For example, [a-c] matches one lowercase character a, b, or c; the match is case-sensitive. Use ^ as the first character within the brackets to negate the set. For example, c[^e]nsys will match cansys and c0nsys. censys will not match.
{}                      Defines the minimum and maximum number of times the preceding character or character set can repeat. For example, c[^e]{3}nsys will match caaansys. ceeensys will not match. Use , inside the braces to specify a range: e{2,4} will match from 2 to 4 e characters.
^                       An assertion indicating the beginning of a regex input.
$                       An assertion indicating the end of a regex input.
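
The examples in this table can be checked with a short script. The sketch below uses Python's re module as a stand-in for the CenQL engine, which is an assumption that holds only for basic syntax like this. The helper name matches and the hostnames in the TLD cases are illustrative.

```python
import re

def matches(pattern, value):
    """Unanchored match: True if the pattern is found anywhere in the value."""
    return re.search(pattern, value) is not None

CASES = [
    (r"c[^e]nsys", "cansys"),         # negated set: any character except e
    (r"c[^e]nsys", "c0nsys"),
    (r"c[^e]nsys", "censys"),
    (r"c[^e]{3}nsys", "caaansys"),    # exactly three non-e characters
    (r"c[^e]{3}nsys", "ceeensys"),
    (r"^e{2,4}$", "e"),               # anchored so the bounds apply to the whole value
    (r"^e{2,4}$", "eee"),
    (r"^e{2,4}$", "eeeee"),
    (r"\.(org|com|net)$", "censys.com"),  # group with alternation
    (r"\.(org|com|net)$", "censys.io"),
]

results = {(p, v): matches(p, v) for p, v in CASES}

for (p, v), ok in results.items():
    print(f"{p:22} {v:12} {'match' if ok else 'no match'}")
```

The e{2,4} cases are anchored on purpose. Without ^ and $, the value eeeee would still match, because a run of two to four e characters appears inside it.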

Character classes

Regular expressions in CenQL may use the following character classes.

Character class   Use
\w                Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_].
\W                Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_].
\d                Matches any numeric digit. Equivalent to [0-9].
\D                Matches any character that is not a digit. Equivalent to [^0-9].
\s                Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. Equivalent to [\t-\n\r ].
\S                Matches a single character other than white space. Equivalent to [^\t-\n\r ].
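
The bracket equivalents for \w, \W, \d, and \D can be confirmed by comparing each shorthand with its bracket form over every ASCII character. This is a sketch using Python's re module with the re.ASCII flag, on the assumption that it mirrors the CenQL definitions listed here. \s and \S are left out because Python's definition of white space also includes vertical tab and form feed, so it is not a fair proxy for those two rows. The helper name same_set is hypothetical.

```python
import re

ASCII_CHARS = [chr(i) for i in range(128)]

# Shorthand class -> bracket expression it is documented to equal.
EQUIVALENTS = {
    r"\w": r"[A-Za-z0-9_]",
    r"\W": r"[^A-Za-z0-9_]",
    r"\d": r"[0-9]",
    r"\D": r"[^0-9]",
}

def same_set(shorthand, bracket):
    """True if both patterns accept exactly the same single ASCII characters."""
    a = re.compile(shorthand, re.ASCII)
    b = re.compile(bracket, re.ASCII)
    return all(
        bool(a.fullmatch(ch)) == bool(b.fullmatch(ch)) for ch in ASCII_CHARS
    )

agreement = {s: same_set(s, b) for s, b in EQUIVALENTS.items()}

for shorthand, ok in agreement.items():
    print(shorthand, "==", EQUIVALENTS[shorthand], ":", ok)
```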

Example queries

Tip: Use Collections to monitor changes to regex query results over time and webhooks to receive alerts about them.

Query description and link

Query syntax

Certificates that contain an eTLD+3 or greater subdomain of example.com

cert.names=~`.*\..*\.example\.com$`

HTTP responses originating from behind a proxy

host.services.endpoints.http.headers:(key="X-Forwarded-For" and value=~".*,.*") or web.endpoints.http.headers:(key="X-Forwarded-For" and value=~".*,.*")

Possible Google impersonations

host.services:(cert.names=~".*\\.google\\..*" and not cert.parsed.subject.organization:"Google" and not cert.parsed.subject.organization:"Alphabet" and labels.value: {VPN, WAF, DEFAULT_LANDING_PAGE, LOGIN_PAGE})

Web endpoints with a certificate issuer DN that matches a pattern associated with Viper C2

web.cert.parsed.issuer_dn=~`^C=\w{2},\s+ST=[a-z0-9]{8},\s+L=[a-z0-9]{8},\s+O=[a-z0-9]{8},\sOU=[a-z0-9]{8},\sCN=[a-z0-9]{8}$`

Web properties with an Okta login page

web.hostname=~`.*\.okta\.com` and web.labels.value="LOGIN_PAGE" and web.endpoints.http.status_code="200"

Swagger API docs on bare IPs on ports other than 80 and 443

web.endpoints: (path: {"/swagger/index.html"} and http.status_code: 200) and web.hostname=~ `^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$` and not web.port= {443, 80}
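
Before spending credits on an Advanced Query, it can help to try a pattern against a few sample values locally. The sketch below does this for three of the patterns above, using Python's re module as an approximation of the CenQL engine (an assumption). The sample values are made up for illustration and use documentation-reserved names and addresses.

```python
import re

# Patterns copied from the example queries above (backtick form).
ETLD_PLUS_3 = re.compile(r".*\..*\.example\.com$")
FORWARDED_CHAIN = re.compile(r".*,.*")
IPV4 = re.compile(
    r"^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)

def is_match(rx, value):
    """Unanchored search, like the =~ operator as described in this article."""
    return rx.search(value) is not None

# Hypothetical sample values: (pattern, value, result).
samples = [
    (ETLD_PLUS_3, "a.b.example.com"),
    (ETLD_PLUS_3, "www.example.com"),   # only one label before example.com
    (FORWARDED_CHAIN, "203.0.113.7, 198.51.100.2"),
    (FORWARDED_CHAIN, "203.0.113.7"),   # single address, no comma
    (IPV4, "192.0.2.10"),
    (IPV4, "256.0.0.1"),                # 256 is not a valid octet
    (IPV4, "api.example.com"),
]

outcome = {(rx.pattern, value): is_match(rx, value) for rx, value in samples}

for (pattern, value), ok in outcome.items():
    print(f"{value:28} {'match' if ok else 'no match'}")
```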