Threat Dataset

The Censys Threat Dataset provides a real-time view of active adversary infrastructure by scanning and mapping malware, threat actors, and tactics to services or endpoints running on exposed hosts and web properties. Since the Threat Dataset is continuously updated, security teams can query the data to identify, investigate, and block threats earlier in the attack lifecycle.

The dataset adds threat-specific objects and fields to existing host and web property records. This enriched data powers the Threat Hunting Module, enabling users to query threat records, pivot with CensEye to discover related infrastructure, and validate suspicious findings using Live Discovery and Live Rescan.

Understanding the structure of the dataset helps you interpret relationships between threats, services, and actors, so you can better investigate malicious activity.

Data model structure

The diagram below illustrates how the Censys platform structures threat-related data. It shows how the dataset models relationships between assets, services, threats, and actors. These objects and relationships define how threats are identified, contextualized, and correlated across infrastructure, giving you the context needed to investigate threats and uncover related malicious infrastructure.

In the diagram above, each object is shown with its corresponding fields. The top-level asset can be a host (1.1.1.1) or a web property (censys.io:7443). Hosts can expose multiple services, and threats are detected at the web property and service level. A service (443/HTTP) may be linked to one or more threats when Censys identifies behavioral patterns, such as HTTP response characteristics or protocol fingerprints. These patterns form the basis of detection.

Each threat (Mythic) is tied to a specific service on a host or endpoint on a web property. Threats include metadata and can be linked to malware (Mythic) and to one or more actors. Actors (FIN7) are groups that have used the identified malware in past or current operations. In this way, the data connects the presence of malware on a host or service to the groups known to have used it.
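The relationships described above can be sketched as a small set of Python dataclasses. This is only an illustration of the host, service, threat, malware, and actor chain, not the Censys schema or API: the class names and the actors_on_host helper are hypothetical, and each class carries only a few of the fields listed later on this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Actor:
    name: str


@dataclass
class Malware:
    name: str


@dataclass
class Threat:
    name: str
    malware: Malware  # a threat references one malware object
    actors: List[Actor] = field(default_factory=list)


@dataclass
class Service:
    port: int
    threats: List[Threat] = field(default_factory=list)


@dataclass
class Host:
    ip: str
    services: List[Service] = field(default_factory=list)


def actors_on_host(host: Host) -> List[str]:
    """Walk host -> services -> threats -> actors and return unique actor names."""
    seen: List[str] = []
    for service in host.services:
        for threat in service.threats:
            for actor in threat.actors:
                if actor.name not in seen:
                    seen.append(actor.name)
    return seen


# Illustrative values reused from the examples in the text above.
mythic = Threat(name="Mythic", malware=Malware(name="Mythic"), actors=[Actor(name="FIN7")])
host = Host(ip="1.1.1.1", services=[Service(port=443, threats=[mythic])])
print(actors_on_host(host))
```

Walking the chain from the host downward mirrors how the dataset attaches actor context: the actor is never linked to the host directly, only through a threat detected on one of its services.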

Dataset objects

Host object

Each host record includes:

  • ip: The IP address of the host.
  • services: Services exposed on the host, such as HTTP or SSH.

Web property object

Web properties are domain-based assets like hostnames or subdomains that map to IPs. They enable detection and tracking of threats linked to name-based infrastructure.

Each web property record includes:

  • hostname: The fully qualified domain name (FQDN) of the web property (login.censys.com).
  • port: The service port associated with the property (443 for HTTPS).
  • endpoints: The resolved IP addresses and related metadata for where the property points.
  • threats: Threat objects associated with this web property. Threats are identified based on observed behavior like fingerprints.
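As a rough illustration of the fields above, the following sketch treats a web property record as a plain Python dictionary. The record contents, the example.com hostname, and both helper functions are hypothetical. The shape of endpoints is not specified on this page, so it is left empty.

```python
from typing import Any, Dict, List


def web_property_key(record: Dict[str, Any]) -> str:
    """Return the hostname:port identifier for a web property record."""
    return f"{record['hostname']}:{record['port']}"


def threat_names(record: Dict[str, Any]) -> List[str]:
    """Return the names of the threats attached to the record, if any."""
    return [threat["name"] for threat in record.get("threats", [])]


# Hypothetical record; only the four top-level keys come from the list above.
web_property = {
    "hostname": "c2.example.com",
    "port": 443,
    "endpoints": [],  # structure not described here, left empty
    "threats": [{"name": "Mythic"}],
}

print(web_property_key(web_property), threat_names(web_property))
```

The hostname:port form matches how web properties are written elsewhere on this page (censys.io:7443).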

Service object

Each service record includes:

  • port: The port number on which the service is exposed (e.g., 80 for HTTP).
  • protocol: The transport layer protocol used by the service, typically TCP or UDP.
  • threats: Threat objects associated with this service. Threats are identified based on observed behavior like fingerprints.
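Because threats hang off individual services rather than the host itself, it can help to flatten a host record into one row per detected threat. The sketch below assumes a host record is available as a nested dictionary with the field names listed above. The record itself is made up and uses an IP address from the 192.0.2.0/24 documentation range.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_service_threats(host: Dict[str, Any]) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (ip, port, protocol, threat name) for each threat on each service."""
    for service in host.get("services", []):
        for threat in service.get("threats", []):
            yield host["ip"], service["port"], service["protocol"], threat["name"]


# Hypothetical host record for illustration only.
host_record = {
    "ip": "192.0.2.10",
    "services": [
        {"port": 22, "protocol": "TCP", "threats": []},
        {"port": 443, "protocol": "TCP", "threats": [{"name": "Cobalt Strike"}]},
    ],
}

rows = list(iter_service_threats(host_record))
print(rows)
```

Services without threats produce no rows, so the output lists only the ports that need attention.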

Threat object

Censys defines a threat as infrastructure that has identifiable malware fingerprints. Censys identifies threats by matching behavioral patterns, such as fingerprints, protocol responses, or infrastructure traits, to known malware or adversary tooling. Context is provided from URL endpoints associated with generic malware operations. Each threat is tied to a specific service on a host or endpoint on a web property.

Each threat record includes:

  • name: The name of the threat, for example Mythic.

  • id: An identifier that lets you query, reference, and pivot on that specific threat across the Censys platform.

  • description: A detailed description of the threat, shown on the Threat Detail page.

  • type: Describes the role of the service, such as C2 server or botnet node. Threat types are available in the Threat Type dropdown on the Threat Hunting homepage.

  • tactic: Describes how the threat behaves and the purpose of the activity, such as command and control.
    The screenshot below is from the Threat Detail page. To view this, go to the Explore homepage, scroll down, and click on Cobalt Strike.

    Key-value pairs from the threat dataset on the Threat Detail page.

  • source: Indicates the origin of the threat detection, identifying whether it was discovered by Censys or another provider.

  • confidence: Measures how reliably Censys identifies a threat on Internet-exposed assets. Confidence scores range from 0 to 1 and help users assess the likelihood of false positives in detected software fingerprints. The levels are 0.25 (low), 0.50 (moderate), 0.75 (high), and 1 (very high).

  • malware: The malware associated with the threat. A threat references one malware object.

  • actors: List[Actor] The actors historically associated with that malware. Actors are available in the Threat Group dropdown on the Threat Hunting homepage.

    📘

    Note

    Actors are represented in two ways. The top-level name is the primary name used to identify the group, APT40 for example. The field below it lists all known aliases and names for the group. This makes it easier to search across different tools, regardless of naming conventions.

In simple terms, a threat object represents an accumulation of relationships between observed infrastructure behavior and known adversary activity. It connects what we see (a pattern on a service) to what we know (malware, tactics, and threat actors). Threat types and tactics offer context to the nature and intent of the threat.
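The confidence levels listed for the threat object can be turned into labels when triaging results. The sketch below is one possible mapping, not Censys behavior. This page only names the values 0.25, 0.50, 0.75, and 1, so treating an in-between score as the highest level it reaches, and labeling scores under 0.25 as "unrated", are assumptions made here.

```python
from typing import List, Tuple

# Level values and labels as listed for the confidence field, highest first.
CONFIDENCE_LEVELS: List[Tuple[float, str]] = [
    (1.0, "very high"),
    (0.75, "high"),
    (0.50, "moderate"),
    (0.25, "low"),
]


def confidence_label(score: float) -> str:
    """Map a 0-1 confidence score to the highest listed level it reaches."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    for threshold, label in CONFIDENCE_LEVELS:
        if score >= threshold:
            return label
    # Assumption: scores under 0.25 have no label on this page.
    return "unrated"


for value in (0.25, 0.5, 0.75, 1.0):
    print(value, confidence_label(value))
```

A helper like this makes it easy to filter out low-confidence detections before spending time on investigation.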

Actor object

Actors are groups that have used the identified malware in past or current operations. The screenshot below is from a Cobalt Strike Threat Detail page. To view this, go to the Explore homepage, scroll down, and click on Cobalt Strike.

Each actor record includes:

  • name: List[Text] The top-level name is the primary name used to identify the group, APT40 for example.
  • all_names: Lists all known aliases and names. This makes it easier to search across different tools, regardless of naming conventions. In the screenshot above, it's the list of names under APT40.
  • mitreGroupId: The resource ID that links to MITRE for additional context about the actor.
  • malpediaGroupId: The resource ID that links to Malpedia for additional context about the actor.
  • description: A description of the actor that provides additional context.
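Since different tools may refer to the same group by different names, a lookup usually needs to check both name and all_names. The sketch below shows one way to do that with a case-insensitive comparison. The actor record is hypothetical, name is treated as a single string for simplicity, and "Example Alias" is a placeholder rather than a real alias.

```python
from typing import Any, Dict


def actor_matches(actor: Dict[str, Any], query: str) -> bool:
    """Return True if query equals the actor's primary name or any alias."""
    wanted = query.strip().casefold()
    names = [actor.get("name", "")] + list(actor.get("all_names", []))
    return any(wanted == candidate.strip().casefold() for candidate in names)


# Hypothetical actor record; the alias is a placeholder.
actor = {
    "name": "APT40",
    "all_names": ["APT40", "Example Alias"],
}

print(actor_matches(actor, "apt40"))
print(actor_matches(actor, "example alias"))
print(actor_matches(actor, "FIN7"))
```

Exact matching after case folding avoids accidental hits on names that merely share a prefix, such as APT4 and APT40.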

Malware object

Each malware record includes:

  • id: A unique identifier for the malware object.
  • name: The primary name Censys uses to identify the malware.
  • all_names: List[Text] Known names or aliases for the malware, used for threat correlation and search.
  • malpediaId_id: The ID for the malware entry in Malpedia.
  • description: The Malpedia description that provides additional context.
  • confidence: Measures how reliably Censys identifies malware on Internet-exposed assets. Confidence scores range from 0 to 1 and help users assess the likelihood of false positives in detected software fingerprints. The levels are 0.25 (low), 0.50 (moderate), 0.75 (high), and 1 (very high).