Inventory Aggregation API

Aggregations provide detailed counts of data points that are deeply nested in structured data models, such as Censys Attack Surface Management representations of Internet-facing hosts and web entities.

Use aggregations to discover patterns, gain insight, and better understand the makeup of an external attack surface.

Obtain Aggregations

Collect counts of values using the Aggregate endpoint in the Inventory API.

This endpoint returns a single-page result: a report on how frequently each value of a specified field occurs across all inventory assets matching a search query.

ASM API URL

https://app.censys.io/api/

Method and Path

POST /inventory/v1/aggregate

Request Body

JSON-formatted object containing an aggregate specification.
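As a minimal sketch, the request could be assembled in Python as shown here. The `build_aggregate_request` helper is hypothetical, and passing the API key in a `Censys-Api-Key` header is an assumption that this section does not state. The body shape (`workspaces`, `query`, `aggregation`) follows the example requests in this document. The code only builds the request; nothing is sent.

```python
import json
import urllib.request

BASE_URL = "https://app.censys.io/api/"
AGGREGATE_PATH = "inventory/v1/aggregate"


def build_aggregate_request(api_key, workspace_id, query, aggregation):
    """Build (but do not send) a POST request for the Aggregate endpoint."""
    body = {
        "workspaces": [workspace_id],
        "query": query,
        "aggregation": aggregation,
    }
    return urllib.request.Request(
        BASE_URL + AGGREGATE_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key travels in this header.
            "Censys-Api-Key": api_key,
        },
        method="POST",
    )


request = build_aggregate_request(
    api_key="your-api-key",
    workspace_id="your-workspace-id",
    query="type=HOST and host.cloud:*",
    aggregation={"term": {"field": "host.cloud", "number_of_buckets": 10}},
)
```

To send it, the request object can be passed to `urllib.request.urlopen` and the response body parsed with `json.load`.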

Aggregation Types

Several aggregation types are supported:

  • Cardinality Aggregation: A count of the unique values for a field.
  • Filter Aggregation: A count filtered by a provided query.
  • Nested Aggregation: A count of all the documents nested in a repeated field.
  • Reverse Nested Aggregation: A count of parents of a nested field.
  • Rare Term Aggregation: A breakdown of least frequent values for a field.
  • Term Aggregation: A breakdown of most frequent values for a field.

These types can be used recursively to produce counts within counts.

Term

A term aggregation returns a count for each of the most frequent values (terms) of a specified field across all assets matching a search.

📘

Note

This aggregation is the most familiar to current users. It is the type available in Censys Search on the Report page.

In the body of the request, as part of the term object, supply the dot-delimited key of the field to be aggregated, as well as the maximum number_of_buckets (that is, values) to provide counts for. The maximum allowed is 1000.
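A small Python sketch of building the term object while enforcing the stated limit. The `term_aggregation` helper is hypothetical, and the lower bound of 1 is an assumption; only the upper limit of 1000 comes from this document.

```python
MAX_TERM_BUCKETS = 1000  # upper limit stated in this document


def term_aggregation(field, number_of_buckets):
    """Return a term aggregation object, rejecting out-of-range bucket counts."""
    # Assumption: at least one bucket must be requested.
    if not 1 <= number_of_buckets <= MAX_TERM_BUCKETS:
        raise ValueError(
            f"number_of_buckets must be between 1 and {MAX_TERM_BUCKETS}"
        )
    return {"term": {"field": field, "number_of_buckets": number_of_buckets}}


aggregation = term_aggregation("host.cloud", 10)
```

Checking the limit locally avoids a round trip for a request that would exceed the maximum.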

Example term aggregate request

Return the 10 most common values of the host.cloud field in the workspace’s inventory, each with a count of the hosts reporting that value, ordered from most to least common.

{
    "workspaces": [
        "your-workspace-id"
    ],
    "query": "type=HOST and host.cloud:*",
    "aggregation": {
        "term": {
            "field": "host.cloud",
            "number_of_buckets": 10
        }
    }
}

Example term success response

The aggregate was executed successfully, and 1,169 entities matched the query. The key of each bucket is a value for the host.cloud field, and the count is the number of hosts with that value.

{
    "queryDurationMillis": 167,
    "totalCount": 1169, // the number of entities matching the query
    "result": {
        "term": {
            "buckets": [
                {
                    "key": "CloudFlare Inc", // the most common value for the field
                    "count": 948, // the number of entities with this value
                    "subResult": null
                },
                {
                    "key": "Amazon AWS",
                    "count": 77,
                    "subResult": null
                },
                {
                    "key": "Microsoft Corporation",
                    "count": 50,
                    "subResult": null
                },
                {
                    "key": "Akamai Technologies, Inc.",
                    "count": 39,
                    "subResult": null
                },
                {
                    "key": "Confluence Networks Inc",
                    "count": 17,
                    "subResult": null
                },
                {
                    "key": "GoDaddy Operating Company, LLC.",
                    "count": 19,
                    "subResult": null
                },
                {
                    "key": "Microsoft Azure",
                    "count": 18,
                    "subResult": null
                }
            ],
            "otherCount": 0,
            "errorUpperBound": 0
        }
    }
}

🚧

Warning

Nested fields (such as host.services in the asset schemas) won’t work with simple term aggregations because these fields contain an array of objects. Use the nested aggregation instead.
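Returning to the term success response, its buckets might be read in Python along these lines. This is a sketch that assumes the response has already been parsed into a dictionary; the // annotations in the example are explanatory and would not parse as JSON. The `bucket_shares` helper is hypothetical, and the sample data is abbreviated to the first two buckets.

```python
# Abbreviated copy of the example term response, already parsed.
response = {
    "queryDurationMillis": 167,
    "totalCount": 1169,
    "result": {
        "term": {
            "buckets": [
                {"key": "CloudFlare Inc", "count": 948, "subResult": None},
                {"key": "Amazon AWS", "count": 77, "subResult": None},
            ],
            "otherCount": 0,
            "errorUpperBound": 0,
        }
    },
}


def bucket_shares(response):
    """Return (key, count, fraction of totalCount) for each term bucket."""
    total = response["totalCount"]
    buckets = response["result"]["term"]["buckets"]
    return [(b["key"], b["count"], b["count"] / total) for b in buckets]


for key, count, share in bucket_shares(response):
    print(f"{key}: {count} hosts ({share:.1%} of matching entities)")
```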

Nested

A nested aggregation returns a count of the total number of documents nested within a repeated field present across all of the entities matching a query.

In the body of the request, in the nested object, supply the dot-delimited path to the nested field.

Example nested aggregate request

Return the count of all services present on all virtual hosts in the workspace’s inventory.

{
    "workspaces": [
        "your-workspace-id"
    ],
    "query": "host.name: *",
    "aggregation": {
        "nested": {
            "path": "host.services"
        }
    }
}

Example nested success response

The aggregate was executed successfully, and 16,336 services across the 4,435 virtual hosts matched the query.

{
    "queryDurationMillis": 186,
    "totalCount": 4435, // the number of entities matching the query
    "result": {
        "nested": {
            "count": 16336, // the number of nested documents across all the entities matching the query
            "subResult": null
        }
    }
}

Sub-aggregation

A sub-aggregation performs an aggregation within one previously specified. Sub-aggregations are the same types as top-level aggregations.

In the body of the request, add a sub_aggregation object after the initial aggregation, and embed another aggregation in the object.

...
    "aggregation": {
        ...,
        "sub_aggregation":{...}
    }
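Because each level has the same shape, the nesting can be generated rather than written by hand. The following Python sketch uses a hypothetical `chain` helper that places each aggregation inside the previous one's sub_aggregation key; it illustrates the structure only and is not part of the API.

```python
def chain(*aggregations):
    """Nest each aggregation under the previous one as its sub_aggregation."""
    if not aggregations:
        raise ValueError("at least one aggregation is required")
    nested = None
    # Build from the innermost aggregation outward.
    for aggregation in reversed(aggregations):
        level = dict(aggregation)  # copy so the caller's dict is not modified
        if nested is not None:
            level["sub_aggregation"] = nested
        nested = level
    return nested


aggregation = chain(
    {"nested": {"path": "host.services"}},
    {"filter": {"query": "software.risks:*"}},
)
```

The resulting dictionary can be used as the value of "aggregation" in the request body.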

Filter

A filter aggregation narrows the counted documents to only those that match a query. This aggregation is often used as a sub_aggregation.

In the body of the request, in the filter object, supply the query in the Censys Search Language that filters the counted results.

Example filter aggregate request

Return the count of the name-based services that have a software risk in the workspace’s inventory.

{
    "workspaces": [
        "your-workspace-id"
    ],
    "query": "host.name: * and host.services.software.risks:*",
    "aggregation": {
        "nested": {
            "path": "host.services"
        },
        "sub_aggregation": {
            "filter": {
                "query": "software.risks:*"
            }
        }
    }
}

Example filter success response

The aggregate was executed successfully: 90 virtual hosts matched the query, those hosts have 267 services in total, and 183 of those services have a software risk.

{
    "queryDurationMillis": 26272,
    "totalCount": 90,  // the number of entities matching the query
    "result": {
        "nested": {
            "count": 267, // the number of nested documents across all the entities matching the query
            "subResult": {
                "filter": {
                    "count": 183, // the number of nested documents filtered by the filter query
                    "subResult": null
                }
            }
        }
    }
}
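Count-only results such as this one form a chain of subResult objects. A minimal Python sketch for walking that chain, assuming the "result" object has been parsed into a dictionary (the `walk_counts` helper is hypothetical):

```python
def walk_counts(result):
    """Yield (aggregation_type, count) for each level of a count-only result."""
    while result:
        # Each level is a one-key object such as {"nested": {...}}.
        kind, payload = next(iter(result.items()))
        yield kind, payload["count"]
        result = payload.get("subResult")


# The "result" object from the example filter response.
result = {
    "nested": {
        "count": 267,
        "subResult": {"filter": {"count": 183, "subResult": None}},
    }
}

levels = list(walk_counts(result))
for kind, count in levels:
    print(kind, count)
```

This applies only to levels that carry a "count" key, such as nested, filter, and reverse nested results; term results hold their counts in buckets instead.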

Rare term aggregation

A rare term aggregation returns a count for each of the lowest frequency values present in the inventory for a specified field (for example, term) across all assets matching a search. Unlike term aggregations, this type of aggregation takes a numerical definition of "rare" instead of a number of buckets.

Why?

For example, if 20 unique values each appear in only 1 document, it isn’t possible to accurately return "the 10 least common values." Defining rare by a count instead lets the aggregate include every value that fits the definition, however many or few exist.
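The difference can be illustrated locally in Python. This sketch uses made-up field values and a hypothetical `rare_terms` helper; it mimics the idea of a count threshold and is not how the service computes its results.

```python
from collections import Counter

# Hypothetical field values: 20 values seen once each, plus one common value.
values = [f"value-{i}" for i in range(20)] + ["common"] * 50
counts = Counter(values)


def rare_terms(counts, max_count):
    """Return every (value, count) whose count is at most max_count."""
    rare = [(key, n) for key, n in counts.items() if n <= max_count]
    # Rarest first, ties broken alphabetically for a stable order.
    return sorted(rare, key=lambda item: (item[1], item[0]))


rare = rare_terms(counts, max_count=1)
print(len(rare))  # all 20 tied values are returned, not an arbitrary 10
```

With a threshold, all 20 tied values come back together, so no arbitrary subset has to be chosen.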

In the body of the request, in the rare_term object, supply the dot-delimited key of the field to be aggregated, as well as the maximum count (maxCount) a value can have and still be reported as rare. The maximum allowed is 100.

Example rare term aggregate request

Return the provinces with 10 or fewer critical or high-risk hosts, including the number of hosts.

{
    "workspaces": [
        "{{workspace_id}}"
    ],
    "query": "host.services.risks.severity:{critical, high}",
    "aggregation": {
        "rareTerm": {
            "field": "host.location.province",
            "maxCount": 10
        }
    }
}

Example rare term success response

The aggregate executed successfully and returned the provinces with 10 or fewer matching hosts reporting that province as their location.

{
    "queryDurationMillis": 192,
    "totalCount": 485, // the number of entities matching the query
    "result": {
        "rareTerm": {
            "buckets": [
                {
                    "key": "Alabama", // the least common value for the term of the entities matching the query
                    "count": 1, // the number of entities with the province value shown in the key
                    "subResult": null
                },
                {
                    "key": "Alaska",
                    "count": 1,
                    "subResult": null
                },
                {
                    "key": "Baladiyat ad Dawhah",
                    "count": 1,
                    "subResult": null
                },
                {
                    "key": "Colorado",
                    "count": 1,
                    "subResult": null
                },
                {
                    "key": "Haifa",
                    "count": 1,
                    "subResult": null
                },
                {
                    "key": "Iowa",
                    "count": 1,
                    "subResult": null
                },
                {
                    "key": "Jerusalem",
                    "count": 1,
                    "subResult": null
                },
                {
                    "key": "Land Berlin",
                    "count": 1,
                    "subResult": null
                },
                {
                    "key": "Maryland",
                    "count": 1,
                    "subResult": null
                },
                {
                    "key": "Massachusetts",
                    "count": 1,
                    "subResult": null
                }
            ]
        }
    }
}

Reverse nested

A reverse nested aggregation enables aggregating on parent documents from within nested documents.

This aggregation is used in conjunction with the nested aggregation.

In the body of the request, as part of the reverse_nested object, supply the dot-delimited path to the field to be aggregated.

Example reverse nested aggregate request

For each of the 10 most common extended service names in the inventory, return a count of the hosts running at least one service with that name.

{
    "workspaces": [
        "your-workspace-id"
    ],
    "query": "host.ip:* and not host.name:*",
    "aggregation": {
        "nested": {
            "path": "host.services"
        },
        "sub_aggregation": {
            "term": {
                "field": "host.services.extended_service_name",
                "number_of_buckets": 10
            },
            "sub_aggregation": {
                "reverse_nested": {
                    "path": "host"
                }
            }
        }
    }
}

Example reverse nested success response

The aggregate executed successfully and found, for each of the 10 most common extended service names in the inventory, the count of hosts with at least one service of that name.

{
    "queryDurationMillis": 1793,
    "totalCount": 8778, // the number of entities matching the query
    "result": {
        "nested": {
            "count": 6214, // the number of nested documents across all the entities matching the query
            "subResult": {
                "term": {
                    "buckets": [
                        {
                            "key": "HTTP", // the most common value for the term on hosts matching the query
                            "count": 3167, // the number of nested documents whose value for the term is the key
                            "subResult": {
                                "reverseNested": {
                                    "count": 1776, // the number of parent documents with at least 1 of the services counted above
                                    "subResult": null
                                }
                            }
                        },
                        {
                            "key": "HTTPS",
                            "count": 1928,
                            "subResult": {
                                "reverseNested": {
                                    "count": 1549,
                                    "subResult": null
                                }
                            }
                        },
                        {
                            "key": "UNKNOWN",
                            "count": 293,
                            "subResult": {
                                "reverseNested": {
                                    "count": 252,
                                    "subResult": null
                                }
                            }
                        },
                        {
                            "key": "SSH",
                            "count": 123,
                            "subResult": {
                                "reverseNested": {
                                    "count": 109,
                                    "subResult": null
                                }
                            }
                        },
                        {
                            "key": "ANYCONNECT",
                            "count": 99,
                            "subResult": {
                                "reverseNested": {
                                    "count": 99,
                                    "subResult": null
                                }
                            }
                        },
                        {
                            "key": "DNS",
                            "count": 91,
                            "subResult": {
                                "reverseNested": {
                                    "count": 91,
                                    "subResult": null
                                }
                            }
                        },
                        {
                            "key": "SMTP-STARTTLS",
                            "count": 83,
                            "subResult": {
                                "reverseNested": {
                                    "count": 52,
                                    "subResult": null
                                }
                            }
                        },
                        {
                            "key": "NTP",
                            "count": 70,
                            "subResult": {
                                "reverseNested": {
                                    "count": 70,
                                    "subResult": null
                                }
                            }
                        },
                        {
                            "key": "IMAPS",
                            "count": 64,
                            "subResult": {
                                "reverseNested": {
                                    "count": 34,
                                    "subResult": null
                                }
                            }
                        },
                        {
                            "key": "POP3S",
                            "count": 58,
                            "subResult": {
                                "reverseNested": {
                                    "count": 33,
                                    "subResult": null
                                }
                            }
                        }
                    ],
                    "otherCount": 238,
                    "errorUpperBound": 0
                }
            }
        }
    }
}
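A response like this can be flattened into one row per service name. The Python sketch below assumes the response has been parsed into a dictionary and uses a hypothetical `service_host_rows` helper; the sample data is abbreviated to the first two buckets of the example.

```python
def service_host_rows(response):
    """Return (service name, service count, host count) for each term bucket."""
    term = response["result"]["nested"]["subResult"]["term"]
    rows = []
    for bucket in term["buckets"]:
        hosts = bucket["subResult"]["reverseNested"]["count"]
        rows.append((bucket["key"], bucket["count"], hosts))
    return rows


# Abbreviated copy of the example reverse nested response, already parsed.
response = {
    "totalCount": 8778,
    "result": {"nested": {"count": 6214, "subResult": {"term": {
        "buckets": [
            {"key": "HTTP", "count": 3167,
             "subResult": {"reverseNested": {"count": 1776, "subResult": None}}},
            {"key": "HTTPS", "count": 1928,
             "subResult": {"reverseNested": {"count": 1549, "subResult": None}}},
        ],
        "otherCount": 238,
        "errorUpperBound": 0,
    }}}},
}

rows = service_host_rows(response)
for name, services, hosts in rows:
    print(f"{name}: {services} services on {hosts} hosts")
```

A service count higher than the host count, as with HTTP here, means that some hosts run more than one service with that name.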

Cardinality

A cardinality aggregation returns only the number of unique values for a field in the workspace’s inventory.

This aggregation is useful when trying to figure out the number_of_buckets needed for a term aggregation.

In the body of the request, in the cardinality object, provide the dot-delimited field whose unique values are counted.

Example cardinality request

Return a count of the number of unique operating system vendors in use on hosts in the inventory.

{
    "workspaces": [
        "your-workspace-id"
    ],
    "query": "type=HOST",
    "aggregation": {
        "cardinality": {
            "field": "host.operating_system.vendor"
        }
    }
}

Example cardinality success response

The aggregate executed successfully, and hosts in the inventory reported 19 unique operating system vendors.

{
    "queryDurationMillis": 81,
    "totalCount": 13186, // the total number of entities matching the query
    "result": {
        "cardinality": {
            "value": 19 // the number of unique values for OS vendor across all entities matching the query
        }
    }
}
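As a sketch of the sizing use case, a cardinality result could feed a follow-up term aggregation. The `term_from_cardinality` helper is hypothetical. The cap of 1000 is the term aggregation limit stated earlier in this document; the floor of 1 is an assumption.

```python
MAX_TERM_BUCKETS = 1000  # term aggregation limit stated in this document


def term_from_cardinality(field, cardinality_response):
    """Build a term aggregation sized to the number of unique values."""
    unique_values = cardinality_response["result"]["cardinality"]["value"]
    # Assumption: request at least one bucket even if no values were found.
    buckets = max(1, min(unique_values, MAX_TERM_BUCKETS))
    return {"term": {"field": field, "number_of_buckets": buckets}}


# The "result" portion of the example cardinality response, already parsed.
cardinality_response = {"result": {"cardinality": {"value": 19}}}
aggregation = term_from_cardinality(
    "host.operating_system.vendor", cardinality_response
)
```

Here the follow-up term aggregation would request 19 buckets, one for each operating system vendor.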

More example requests

These requests can be copied and pasted into your API client. Replace the placeholder text in the workspaces record with your organization’s workspace ID.

Example 1: Common Non-HTTP services

What are the 100 most common non-HTTP services on hosts in the inventory, and what are the five most common ports each of those services runs on?

This aggregate returns:

  • The number of hosts with a service that is not in the HTTP family.
  • The total number of services on those hosts.
  • The number of non-HTTP services on those hosts.
  • The 100 most common service names.
  • The five most common ports each of those services is running on.

{
    "workspaces": [
        "your-workspace-id"
    ],
    "query": "host.services: (not service_name: {HTTP, CWMP, KUBERNETES, PROMETHEUS, ELASTICSEARCH})",
    "aggregation": {
        "nested": {
            "path": "host.services"
        },
        "sub_aggregation": {
            "filter": {
                "query": "not host.services.service_name: {HTTP, CWMP, KUBERNETES, PROMETHEUS, ELASTICSEARCH}"
            },
            "sub_aggregation": {
                "term": {
                    "field": "host.services.service_name",
                    "number_of_buckets": 100
                },
                "sub_aggregation": {
                    "term": {
                        "field": "host.services.port",
                        "number_of_buckets": 5
                    }
                }
            }
        }
    }
}

Example 2: Common page titles

What are the 1,000 most common HTML titles of name-based HTTPS services returning a 200 status code?

{
    "workspaces": [
        "your-workspace-id"
    ],
    "query": "host.name: * and host.services:(extended_service_name: HTTPS and http.response.status_code: 200)",
    "aggregation": {
        "nested": {
            "path": "host.services"
        },
        "sub_aggregation": {
            "filter": {
                "query": "extended_service_name: HTTPS and http.response.status_code: 200"
            },
            "sub_aggregation": {
                "term": {
                    "field": "host.services.http.response.html_title",
                    "number_of_buckets": 1000
                }
            }
        }
    }
}

Example 3: Page titles of unencrypted web pages on port 80

What are the 1,000 most common HTML titles of services on port 80 not returning a 301 status code?

This aggregation returns:

  • The number of hosts with a service on port 80 not returning an HTTP 301.
  • The total number of services on those hosts.
  • The number of services on port 80 (same as the first number).
  • The 1,000 most common HTML titles for those services.
  • The five most common HTTP status codes returned by services with each of those titles.

{
    "workspaces": [
        "your-workspace-id"
    ],
    "query": "host.services: (port: 80 and not http.response.status_code: 301)",
    "aggregation": {
        "nested": {
            "path": "host.services"
        },
        "sub_aggregation": {
            "filter": {
                "query": "port: 80 and not http.response.status_code: 301"
            },
            "sub_aggregation": {
                "term": {
                    "field": "host.services.http.response.html_title",
                    "number_of_buckets": 1000
                },
                "sub_aggregation": {
                    "term": {
                        "field": "host.services.http.response.status_code",
                        "number_of_buckets": 5
                    }
                }
            }
        }
    }
}

Example 4: Top 10 host ports with most common risk categories of high severity risks

This aggregation returns:

  • The number of hosts with a high severity risk.
  • The total number of services on all of those hosts.
  • The number of services with a high severity risk.
  • The total number of risks on those services.
  • The number of high severity risks on those services.
  • The 10 most common risk categories of the high severity risks.
  • The number of services that each high-risk category is on.
  • The 10 most common port numbers of those services.

{
    "workspaces": [
        "your-workspace-id"
    ],
    "query": "host.services.risks.severity: high",
    "aggregation": {
        "nested": {
            "path": "host.services"
        },
        "sub_aggregation": {
            "filter": {
                "query": "host.services.risks.severity: high"
            },
            "sub_aggregation": {
                "nested": {
                    "path": "host.services.risks"
                },
                "sub_aggregation": {
                    "filter": {
                        "query": "severity: high"
                    },
                    "sub_aggregation": {
                        "term": {
                            "field": "host.services.risks.categories",
                            "number_of_buckets": 10
                        },
                        "sub_aggregation": {
                            "reverse_nested": {
                                "path": "host.services"
                            },
                            "sub_aggregation": {
                                "term": {
                                    "field": "host.services.port",
                                    "number_of_buckets": 10
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

Example 5: Top 10 software packages reported by services with a high severity software risk

This aggregation returns:

  • The number of hosts with a software risk.
  • The total number of services on those hosts.
  • The number of services with a software risk.
  • The number of software risks.
  • The top 10 software risk types.
  • The number of services with each of the top 10 risks.
  • The 10 most common software packages reported by the services with each risk type.

{
    "workspaces": [
        "your-workspace-id"
    ],
    "query": "host.services.software.risks:*",
    "aggregation": {
        "nested": {
            "path": "host.services"
        },
        "sub_aggregation": {
            "filter": {
                "query": "software.risks:*"
            },
            "sub_aggregation": {
                "nested": {
                    "path": "host.services.software.risks"
                },
                "sub_aggregation": {
                    "term": {
                        "field": "host.services.software.risks.type",
                        "number_of_buckets": 10
                    },
                    "sub_aggregation": {
                        "reverse_nested": {
                            "path": "host.services"
                        },
                        "sub_aggregation": {
                            "term": {
                                "field": "host.services.software.uniform_resource_identifier",
                                "number_of_buckets": 10
                            }
                        }
                    }
                }
            }
        }
    }
}