Get Started with Data Downloads

The Censys Data Downloads API provides access to bulk Censys Platform data. This guide explains the structure of the data downloads and how to retrieve data files.

Structure

Data downloads have a three-tier hierarchical structure:

  • Datasets
  • Dataset snapshots
  • Snapshot files

A dataset defines the type and scope of data available. Each dataset contains one or more snapshots, which are generally published daily. Each snapshot consists of one or more files.

For example, the 2025-01-01 snapshot within the host-ipv4 dataset contains all IPv4 host data for that day.

To obtain historical data for a particular date, you must retrieve the dataset snapshot for that date.

Datasets

Censys provides the following datasets. The datasets available to you will vary based on what your organization has access to.

Dataset | Description
host-ipv4 | Contains IPv4 assets and their associated services. Includes CVE data if your organization has access to it.
host-ipv6 | Contains IPv6 assets and their associated services. Includes CVE data if your organization has access to it.
web | Contains web properties and their associated endpoints. Includes CVE data if your organization has access to it.
certificates-v2-full | Contains records and details about certificates as of a given date.
certificates-v2-incremental | Contains records and details about new certificates. These are daily updates to the certificate dataset that are not included in v2-full.
threats | Contains asset information and attendant threat data.

File size

Approximate file sizes for the datasets are as follows.

Dataset | Approximate size
certificates-v2-full (Avro) | 29 TB
certificates-v2-incremental | 130 GB
host-ipv4 (Parquet) | 500 GB
host-ipv6 (Parquet) | 7 GB
web | 1.2 TB
threats | 1 GB

Data schema

Data in the snapshots is structured following the Platform data schema.

The following tables provide more information about how data is structured for each asset type. The tables are not exhaustive. They show how data is generally arranged for hosts, web properties, and certificates, and focus on key fields for common use cases.

Host data schema

Detailed information about the host data schema is provided below.

Hosts are composed of a top-level field for the IP address and GroupType fields for related host data like location, autonomous_system, whois, dns, and a list of observed services.

Core host information

Field name | Repetition | Logical type | Physical type | Description
ip | Optional | String | ByteArray | The IP address of the host.

Host location data (location)

Field name | Repetition | Logical type | Physical type | Description
continent | Optional | String | ByteArray | The English name of the detected continent.
country | Optional | String | ByteArray | The English name of the detected country.
country_code | Optional | String | ByteArray | The detected two-letter ISO 3166-1 alpha-2 country code (US, CN, GB, RU, and so on).
city | Optional | String | ByteArray | The English name of the detected city.
postal_code | Optional | String | ByteArray | The postal code (if applicable) of the detected location.
timezone | Optional | String | ByteArray | The IANA time zone database name of the detected location.
province | Optional | String | ByteArray | The state or province name of the detected location.
registered_country | Optional | String | ByteArray | The English name of the registered country.
registered_country_code | Optional | String | ByteArray | The registered country's two-letter ISO 3166-1 alpha-2 country code (US, CN, GB, RU, and so on).
coordinates | Optional | null | Group | The estimated coordinates of the detected location.
latitude | Optional | null | Double | The estimated latitude.
longitude | Optional | null | Double | The estimated longitude.

Host autonomous system data (autonomous_system)

Field name | Repetition | Logical type | Physical type | Description
asn | Optional | null | Int64 | The autonomous system number of the host's autonomous system.
description | Optional | String | ByteArray | Brief description of the autonomous system.
bgp_prefix | Optional | String | ByteArray | The autonomous system's CIDR.
name | Optional | String | ByteArray | The friendly name of the autonomous system.
country_code | Optional | String | ByteArray | The autonomous system's two-letter ISO 3166-1 alpha-2 country code (US, CN, GB, RU, and so on).
organization | Optional | String | ByteArray | The name of the organization managing the autonomous system.

Host WHOIS information (whois)

Field name | Repetition | Logical type | Physical type | Description
network | Optional | null | Group (handle, name, cidrs, created, updated, allocation_type) | Details about the network block.
handle | Optional | String | ByteArray | Handle for the network object.
name | Optional | String | ByteArray | Name of the network.
cidrs | Required | List (element: String) | List (element: ByteArray) | List of CIDR blocks.
created | Optional | Timestamp | Int64 | Network creation timestamp (Microseconds).
updated | Optional | Timestamp | Int64 | Network last update timestamp (Microseconds).
allocation_type | Optional | String | ByteArray | Type of IP address allocation.
organization | Optional | null | Group (handle, name, address, street, city, state, postal_code, country, abuse_contacts, admin_contacts, tech_contacts) | Details about the organization owning the network.
handle | Optional | String | ByteArray | Handle for the organization object.
name | Optional | String | ByteArray | Name of the organization.
address | Optional | String | ByteArray | Organization's general address.
street | Optional | String | ByteArray | Street address.
city | Optional | String | ByteArray | City.
state | Optional | String | ByteArray | State/Province.
postal_code | Optional | String | ByteArray | Postal code.
country | Optional | String | ByteArray | Country.
abuse_contacts | Required | List (element: Group (handle, name, email)) | List (element: Group) | List of abuse contact information.
admin_contacts | Required | List (element: Group (handle, name, email)) | List (element: Group) | List of administrative contact information.
tech_contacts | Required | List (element: Group (handle, name, email)) | List (element: Group) | List of technical contact information.

Host DNS information (dns)

Field name | Repetition | Logical type | Physical type | Description
reverse_dns | Optional | null | Group (names, resolve_time) | Reverse DNS lookup details.
names | Required | List (element: String) | List (element: ByteArray) | List of resolved DNS names (hostnames).
resolve_time | Optional | Timestamp | Int64 | Timestamp of the DNS resolution (Microseconds).
forward_dns | Required | List (element: Group (name, record_type, resolve_time)) | List (element: Group) | Forward DNS lookup details.
name | Optional | String | ByteArray | The DNS record name (e.g., domain name).
record_type | Optional | String | ByteArray | The type of DNS record (e.g., A, CNAME, MX).
resolve_time | Optional | Timestamp | Int64 | Timestamp of the DNS resolution (Microseconds).
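
Timestamp fields such as resolve_time are stored as Int64 values in microseconds. The following is a minimal Python sketch of converting one to a timezone-aware datetime. It assumes the values count microseconds since the Unix epoch in UTC, which is the usual Parquet convention but is not stated in this guide. The function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: values are microseconds since 1970-01-01T00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(micros: int) -> datetime:
    """Convert an Int64 microsecond timestamp to an aware UTC datetime."""
    # Integer timedelta arithmetic avoids the float rounding of fromtimestamp().
    return EPOCH + timedelta(microseconds=micros)


print(micros_to_datetime(1_735_689_600_000_000).isoformat())
```

Libraries such as pyarrow or DuckDB typically perform this conversion for you when the column carries a timestamp logical type, so a helper like this is mainly useful when you handle the raw integers yourself.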

Host service information (services)

Field name | Repetition | Logical type | Physical type | Description
endpoints | Required | List (element: Group (element)) | List (element: Group) | List of endpoints (HTTP, Elasticsearch, Kubernetes, etc.) on this service.
port | Optional | null | Int64 | The port number the service is running on.
protocol | Optional | String | ByteArray | The application layer protocol (e.g., http, ssh).
transport_protocol | Optional | String | ByteArray | The transport layer protocol (e.g., tcp, udp).
scan_time | Optional | Timestamp | Int64 | The time the service was scanned (Microseconds).
banner | Optional | null | ByteArray | The raw service banner/response.
banner_hash_sha256 | Optional | null | ByteArray | The SHA256 hash of the service banner.
tls | Optional | null | Group | TLS/SSL handshake information.
cert | Optional | null | Group | SSL/TLS certificate details.
jarm | Optional | null | Group | JARM fingerprinting information.
software | Required | List (element: Group (element)) | List (element: Group) | List of software detected running the service.
hardware | Required | List (element: Group (element)) | List (element: Group) | List of hardware details associated with the service.
operating_systems | Required | List (element: Group (element)) | List (element: Group) | List of operating systems identified for the service.
labels | Required | List (element: Group (element)) | List (element: Group) | Arbitrary labels applied to the service.
vnc | Optional | null | Group | Details specific to the VNC protocol.
rdp | Optional | null | Group | Details specific to the RDP protocol.
ssh | Optional | null | Group | Details specific to the SSH protocol.
mysql | Optional | null | Group | Details specific to the MySQL protocol.
ipmi | Optional | null | Group | Details specific to the IPMI protocol.
amqp | Optional | null | Group | Details specific to the AMQP protocol.
memcached | Optional | null | Group | Details specific to the Memcached protocol.
mssql | Optional | null | Group | Details specific to the MSSQL protocol.
oracle | Optional | null | Group | Details specific to the Oracle protocol.
redis | Optional | null | Group | Details specific to the Redis protocol.
snmp | Optional | null | Group | Details specific to the SNMP protocol.
postgres | Optional | null | Group | Details specific to the PostgreSQL protocol.
mongodb | Optional | null | Group | Details specific to the MongoDB protocol.
bacnet | Optional | null | Group | Details specific to the BACnet protocol.
dnp3 | Optional | null | Group | Details specific to the DNP3 protocol.
dns | Optional | null | Group | Details specific to the DNS protocol.
ftp | Optional | null | Group | Details specific to the FTP protocol.
imap | Optional | null | Group | Details specific to the IMAP protocol.
ipp | Optional | null | Group | Details specific to the IPP protocol.
modbus | Optional | null | Group | Details specific to the Modbus protocol.
mqtt | Optional | null | Group | Details specific to the MQTT protocol.
ntp | Optional | null | Group | Details specific to the NTP protocol.
pc_anywhere | Optional | null | Group | Details specific to the pcAnywhere protocol.
pop3 | Optional | null | Group | Details specific to the POP3 protocol.
s7 | Optional | null | Group | Details specific to the Siemens S7 protocol.
smb | Optional | null | Group | Details specific to the SMB protocol.
smtp | Optional | null | Group | Details specific to the SMTP protocol.
telnet | Optional | null | Group | Details specific to the Telnet protocol.
fox | Optional | null | Group | Details specific to the Fox protocol.
openvpn | Optional | null | Group | Details specific to the OpenVPN protocol.
coap | Optional | null | Group | Details specific to the CoAP protocol.
sip | Optional | null | Group | Details specific to the SIP protocol.
team_viewer | Optional | null | Group | Details specific to the Team Viewer protocol.
x11 | Optional | null | Group | Details specific to the X11 protocol.
skinny | Optional | null | Group | Details specific to the Cisco Skinny protocol.
pptp | Optional | null | Group | Details specific to the PPTP protocol.
mms | Optional | null | Group | Details specific to the MMS protocol.
ike | Optional | null | Group | Details specific to the IKE protocol.
ssdp | Optional | null | Group | Details specific to the SSDP protocol.
upnp | Optional | null | Group | Details specific to the UPnP protocol.
any_connect | Optional | null | Group | Details specific to the Cisco AnyConnect protocol.
ldap | Optional | null | Group | Details specific to the LDAP protocol.
activemq | Optional | null | Group | Details specific to the ActiveMQ protocol.
checkpoint_topology | Optional | null | Group | Details specific to Check Point Topology.
dhcpdiscover | Optional | null | Group | Details specific to DHCP discovery.
epmd | Optional | null | Group | Details specific to the Erlang Port Mapper Daemon protocol.
ethereum | Optional | null | Group | Details specific to the Ethereum protocol.
krpc | Optional | null | Group | Details specific to the Kademlia RPC protocol.
l2tp | Optional | null | Group | Details specific to the L2TP protocol.
monero_p2p | Optional | null | Group | Details specific to the Monero P2P protocol.
opc_ua | Optional | null | Group | Details specific to the OPC UA protocol.
rocketmq | Optional | null | Group | Details specific to the RocketMQ protocol.
socks | Optional | null | Group | Details specific to the SOCKS protocol.
zeromq | Optional | null | Group | Details specific to the ZeroMQ protocol.
eip | Optional | null | Group | Details specific to the EtherNet/IP protocol.
elf_file | Optional | null | Group | Details about a detected ELF file.
tplink_kasa | Optional | null | Group | Details specific to the TP-Link Kasa protocol.
darkcomet | Optional | null | Group | Details specific to the DarkComet RAT protocol.
dcerpc | Optional | null | Group | Details specific to the DCE/RPC protocol.
hikvision | Optional | null | Group | Details specific to the Hikvision protocol.
chromecast | Optional | null | Group | Details specific to the Chromecast protocol.
crestron_cp3 | Optional | null | Group | Details specific to the Crestron CP3 protocol.
cwmp | Optional | null | Group | Details specific to the CWMP protocol.
dvr_ip | Optional | null | Group | Details specific to the DVR/IP protocol.
etcd | Optional | null | Group | Details specific to the etcd protocol.
gearman | Optional | null | Group | Details specific to the Gearman protocol.
hid_vertx | Optional | null | Group | Details specific to the HID Vertx protocol.
iota | Optional | null | Group | Details specific to the IOTA protocol.
iscsi | Optional | null | Group | Details specific to the iSCSI protocol.
lpd | Optional | null | Group | Details specific to the LPD protocol.
mdns | Optional | null | Group | Details specific to the mDNS protocol.
minecraft | Optional | null | Group | Details specific to the Minecraft protocol.
murmur | Optional | null | Group | Details specific to the Murmur protocol.
nbd | Optional | null | Group | Details specific to the NBD protocol.
nfs_mountd | Optional | null | Group | Details specific to the NFS mountd protocol.
nmea | Optional | null | Group | Details specific to the NMEA protocol.
onvif | Optional | null | Group | Details specific to the ONVIF protocol.
pgbouncer | Optional | null | Group | Details specific to the PgBouncer protocol.
portmap | Optional | null | Group | Details specific to the Portmap/RPCBind protocol.
rdate | Optional | null | Group | Details specific to the RDate protocol.
realport | Optional | null | Group | Details specific to the RealPort protocol.
ripple | Optional | null | Group | Details specific to the Ripple protocol.
rlogin | Optional | null | Group | Details specific to the RLogin protocol.
rtsp | Optional | null | Group | Details specific to the RTSP protocol.
sap_router | Optional | null | Group | Details specific to the SAP Router protocol.
scpi | Optional | null | Group | Details specific to the SCPI protocol.
ser2net | Optional | null | Group | Details specific to the Ser2Net protocol.
seven_days_to_die | Optional | null | Group | Details specific to the 7 Days to Die game protocol.
spice | Optional | null | Group | Details specific to the SPICE protocol.
steam | Optional | null | Group | Details specific to the Steam protocol.
tacacs_plus | Optional | null | Group | Details specific to the TACACS+ protocol.
tibia | Optional | null | Group | Details specific to the Tibia game protocol.
unitronics_pcom | Optional | null | Group | Details specific to the Unitronics PCOM protocol.
ventrilo | Optional | null | Group | Details specific to the Ventrilo protocol.
weblogic_t3 | Optional | null | Group | Details specific to the WebLogic T3 protocol.
winrm | Optional | null | Group | Details specific to the WinRM protocol.
ws_discovery | Optional | null | Group | Details specific to the WS-Discovery protocol.
nats_io | Optional | null | Group | Details specific to the NATS.io protocol.

Web property data schema

Detailed information about the web property data schema is provided below.

Internet assets that respond to hostname-based scans are classified as web properties.

Web properties are identified by a hostname and a port. Hostnames can be name-based records (such as app.censys.io) or IP-based records (such as 104.18.10.85). Example names of web property records include app.censys.io:443 and 104.18.10.85:8880.
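
As an illustration of the naming scheme, the following Python sketch splits a web property name into its hostname and port. It assumes the name always has the form `<hostname>:<port>` as in the examples above. The function name is hypothetical, and bracketed IPv6 literals are not handled.

```python
def split_web_property(name: str) -> tuple[str, int]:
    """Split a name such as 'app.censys.io:443' into (hostname, port)."""
    # rpartition splits on the last colon, so the port is the final component.
    hostname, sep, port = name.rpartition(":")
    if not sep or not hostname or not port.isdigit():
        raise ValueError(f"expected '<hostname>:<port>', got {name!r}")
    return hostname, int(port)


print(split_web_property("app.censys.io:443"))
print(split_web_property("104.18.10.85:8880"))
```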

You can use web properties to explore websites, APIs, web-based applications, and much more.

Top-level web property fields and groups

Field name | Repetition | Logical type | Physical type | Description
hostname | Optional | String | ByteArray | The hostname of the web property.
scan_time | Optional | Timestamp | Int64 | The time that the web property was scanned.
endpoints | Required | List | List | A list of network endpoints discovered for the web property.
port | Optional | N/A | Int64 | The port number that was scanned on the web property.
operating_systems | Required | List | List | List of operating systems identified for the web property/underlying server.
hardware | Required | List | List | List of hardware details for the web property/underlying server.
software | Required | List | List | List of software running the web property or on the server.
jarm | Optional | null | Group | JARM fingerprinting details for a TLS session used by the property.
labels | Required | List | List | Arbitrary tags or intelligence labels applied to the web property.
tls | Optional | null | Group | TLS/SSL handshake negotiation details.
cert | Optional | null | Group | SSL/TLS certificate details.

Web property endpoint data (endpoints)

Field name | Repetition | Logical type | Physical type | Description
path | Optional | String | ByteArray | The URI path inspected on the endpoint (e.g., /api/v1).
endpoint_type | Optional | String | ByteArray | A classification of the endpoint (e.g., web, elasticsearch).
scan_time | Optional | Timestamp | Int64 | Timestamp of the individual endpoint scan (Microseconds).
ip | Optional | String | ByteArray | The IP address associated with the endpoint.
banner | Optional | null | ByteArray | The raw service banner/response.
banner_hash_sha256 | Optional | null | ByteArray | SHA256 hash of the service banner.
http | Optional | null | Group | HTTP protocol-specific metadata.
elasticsearch | Optional | null | Group | Elasticsearch server details.
kubernetes | Optional | null | Group | Kubernetes API/cluster details.
prometheus | Optional | null | Group | Prometheus monitoring metrics.
fortigate | Optional | null | Group | Fortigate device specific info.
cobalt_strike | Optional | null | Group | Cobalt Strike team server profile details.
pprof | Optional | null | Group | Go pprof profiling metrics endpoint info.
prometheus_target | Optional | null | Group | Prometheus target metrics.
graphql | Optional | null | Group | GraphQL endpoint details.
ivanti_avalanche | Optional | null | Group | Ivanti Avalanche server info.
ollama | Optional | null | Group | Ollama large language model server info.
chrome_devtools | Optional | null | Group | Chrome DevTools protocol details.
plex_media_server | Optional | null | Group | Plex Media Server information.
redlion_web | Optional | null | Group | Red Lion Web Server information.
scada_view | Optional | null | Group | SCADA View application server information.
open_directory | Optional | null | Group | Open directory contents.
screenshots | Required | List | List | List of captured screenshots for this endpoint.

Certificate data schema

Detailed information about the certificate data schema is provided below.

The Censys certificate dataset is the most exhaustive collection of X.509 documents in existence. It consists of over 15 billion records, and new records are added daily.

In the Censys Platform, certificates are indexed and identified by their "fingerprint," which is the SHA-256 digest of the entire raw certificate.
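
The fingerprint computation can be sketched in a few lines of Python. This assumes that the "raw certificate" is the DER-encoded bytes (for example, the contents of the raw field) and that the fingerprint is rendered as lowercase hexadecimal. Neither detail is stated in this guide, and the function name is illustrative.

```python
import hashlib


def certificate_fingerprint(raw_der: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the raw certificate bytes."""
    return hashlib.sha256(raw_der).hexdigest()


# Placeholder bytes stand in for a real DER-encoded certificate.
print(certificate_fingerprint(b"placeholder certificate bytes"))
```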

Top-level certificate fields

Field name | Repetition | Logical type | Physical type | Description
fingerprint_sha256 | Optional | bytes | bytes | SHA256 fingerprint of the certificate.
fingerprint_sha1 | Optional | bytes | bytes | SHA1 fingerprint of the certificate.
fingerprint_md5 | Optional | bytes | bytes | MD5 fingerprint of the certificate.
tbs_fingerprint_sha256 | Optional | bytes | bytes | SHA256 fingerprint of the To Be Signed (TBS) part.
tbs_no_ct_fingerprint_sha256 | Optional | bytes | bytes | SHA256 fingerprint of the TBS part without the Certificate Transparency (CT) extension.
spki_subject_fingerprint_sha256 | Optional | bytes | bytes | SHA256 fingerprint of the Subject Public Key Info (SPKI).
parent_spki_subject_fingerprint_sha256 | Optional | bytes | bytes | SHA256 fingerprint of the parent certificate's SPKI.
parsed | Optional | record | Group | Parsed details extracted from the certificate content.
precert | Optional | boolean | boolean | Indicates if the certificate is a precertificate.
revoked | Optional | boolean | boolean | Indicates if the certificate is known to be revoked.
names | Required | string | array | List of all domain names (CN, SANs) associated with the certificate.
validation_level | Optional | string | string | The type of validation (e.g., DV, OV, EV).
validation | Optional | record | Group | External validation status from various trust stores.
revocation | Optional | record | Group | Revocation check details (OCSP and CRL).
ct | Optional | record | Group | Certificate Transparency log entry data.
ever_seen_in_scan | Optional | boolean | boolean | Whether this certificate has ever been observed in a scan.
raw | Optional | bytes | bytes | The raw certificate bytes.
added_at | Optional | Timestamp | long | Timestamp when the certificate was first seen (Microseconds).
modified_at | Optional | Timestamp | long | Timestamp of the last modification (Microseconds).
validated_at | Optional | Timestamp | long | Timestamp of the last validation check (Microseconds).
parse_status | Optional | string | string | Status of the certificate parsing process.
zlint | Optional | record | Group | ZLint (certificate linter) validation results.
labels | Required | string | array | Arbitrary labels applied to the certificate.
not_valid_after | Optional | Timestamp | long | The certificate's expiration time (Microseconds).
inserted_at | Optional | Timestamp | long | Time when the record was inserted into the database (Microseconds).

Parsed certificate data (parsed)

Field name | Repetition | Logical type | Physical type | Description
version | Optional | long | long | The X.509 certificate version.
serial_number | Optional | string | string | The certificate's unique serial number.
issuer_dn | Optional | string | string | The Issuer's Distinguished Name (raw string).
issuer | Optional | record | Group | Parsed fields of the Issuer's distinguished name (e.g., common_name, country).
subject_dn | Optional | string | string | The Subject's Distinguished Name (raw string).
subject | Optional | record | Group | Parsed fields of the Subject's distinguished name.
subject_key_info | Optional | record | Group | Details about the Subject's public key (algorithm, modulus/curve info).
validity_period | Optional | record | Group | Start and end dates of the certificate's validity.
signature | Optional | record | Group | Details about the digital signature (algorithm, validity status).
extensions | Optional | record | Group | Parsed X.509 extensions (e.g., key_usage, subject_alt_name).
unknown_extensions | Required | record | array | List of unrecognized X.509 extensions.
redacted | Optional | boolean | boolean | Indicates if certificate information has been redacted.
serial_number_hex | Optional | string | string | The serial number in hexadecimal format.
ja4x | Optional | string | string | JA4X certificate fingerprint value.

Format

Snapshots are available in Avro and Parquet formats.

The specific formats available may vary on a snapshot-by-snapshot basis.

Asynchronous downloads

Recent snapshots are generally ready to download without delay. However, when downloading older snapshots or the current day's snapshot, the files endpoint may return a 202 status with the message "The files for this snapshot are currently being generated, and will be available soon. Please request them later."

For large datasets, it may take several hours before the data is ready to download. You can periodically retry the request until the files become available.
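
The retry behavior described above can be sketched as a small polling loop in Python. The `fetch` callable, the function name, the attempt count, and the delay are all hypothetical. The sketch also assumes that a ready snapshot is signaled by HTTP 200, which this guide does not state explicitly.

```python
import time
from typing import Callable, Optional, Tuple


def poll_until_ready(
    fetch: Callable[[], Tuple[int, dict]],
    max_attempts: int = 12,
    delay_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch() until it stops returning 202; return the body, or None on timeout."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:  # assumption: 200 means the file list is ready
            return body
        if status != 202:
            raise RuntimeError(f"unexpected HTTP status {status}")
        # 202 means the files are still being generated; wait before retrying.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    return None


# Usage with a stub that reports "generating" twice before it succeeds.
responses = iter([(202, {}), (202, {}), (200, {"file": []})])
result = poll_until_ready(lambda: next(responses), sleep=lambda seconds: None)
print(result)
```

Passing `fetch` and `sleep` as parameters keeps the loop independent of any particular HTTP client and makes it easy to exercise without waiting.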

Data download API endpoints

API endpoints are available for the following operations.

Endpoint | Description
GET https://data.censys.io/api/v1/datasets | Retrieve a list of the datasets available to an organization.
GET https://data.censys.io/api/v1/datasets/{dataset}/snapshots | Retrieve the list of snapshots available for a dataset.
GET https://data.censys.io/api/v1/datasets/{dataset}/snapshots/{snapshot}/files/{format} | Retrieve the list of files available for a snapshot.
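
The three endpoint paths can be assembled with a few small helpers. This is a minimal Python sketch whose function names are illustrative. The lowercase format value in the usage line is an assumption, because this guide does not specify the casing that the {format} path segment expects.

```python
from urllib.parse import quote

BASE_URL = "https://data.censys.io/api/v1"


def datasets_url() -> str:
    """URL that lists the datasets available to an organization."""
    return f"{BASE_URL}/datasets"


def snapshots_url(dataset: str) -> str:
    """URL that lists the snapshots of one dataset."""
    return f"{BASE_URL}/datasets/{quote(dataset, safe='')}/snapshots"


def files_url(dataset: str, snapshot: str, fmt: str) -> str:
    """URL that lists the files of one snapshot in one format."""
    return (
        f"{snapshots_url(dataset)}/{quote(snapshot, safe='')}"
        f"/files/{quote(fmt, safe='')}"
    )


# "parquet" is an assumed spelling of the format segment.
print(files_url("host-ipv4", "1234", "parquet"))
```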

Example workflow

Prerequisites

  • Your Censys organization ID. To find your organization ID:

    1. Go to the Censys Platform web console. Ensure that you have your Enterprise account selected.
    2. Your organization ID is provided in the URL after org=.
  • Your Personal Access Token (PAT).

You must include your organization ID and PAT in all of your requests. Your organization ID is used to determine which datasets you have access to.
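
As a sketch of what "include your organization ID and PAT" looks like in code, the following Python builds a request object with the PAT as a Bearer token and the organization ID as the org query parameter, mirroring the curl commands in this guide. The function name and the placeholder credentials are hypothetical. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url: str, token: str, org_id: str) -> Request:
    """Attach the PAT as a Bearer token and the organization ID as ?org=."""
    query = urlencode({"org": org_id})
    return Request(f"{url}?{query}", headers={"Authorization": f"Bearer {token}"})


# Placeholder values; substitute your own PAT and organization ID.
req = build_request(
    "https://data.censys.io/api/v1/datasets",
    token="my-token",
    org_id="my-org-id",
)
print(req.full_url)
print(req.get_header("Authorization"))
```

To send the request, pass the object to `urllib.request.urlopen` or translate the same header and query parameter to the HTTP client you prefer.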

List and retrieve data

  1. To list the datasets that you have access to, use the following:
    curl -H "Authorization: Bearer <your-api-token>" "https://data.censys.io/api/v1/datasets?org=<your-organization-id>"
    This will return a list of datasets, such as the example response provided below.
    {
      "datasets": [
        {
          "id": "host-ipv4",
          "description": "Daily IPv4 Host Snapshots"
        }
      ]
    }
  2. List the available snapshots for a dataset using the following example. This will retrieve the snapshots available for the host-ipv4 dataset.
    curl -H "Authorization: Bearer <your-api-token>" "https://data.censys.io/api/v1/datasets/host-ipv4/snapshots?org=<your-organization-id>"
    You will receive a response similar to the one below.
    {
      "dataset": "host-ipv4",
      "snapshots": [
        {
          "id": "1234",
          "addedTime": "2025-01-01T00:00:00.00000000Z",
          "expireTime": "2025-11-30T00:00:00.00000000Z",
          "formats": [
            {
              "name": "AVRO"
            },
            {
              "name": "PARQUET"
            }
          ]
        }
      ]
    }
  3. Use the id field from a snapshot, together with one of its listed formats, to list the files that constitute the snapshot.
    curl -H "Authorization: Bearer <your-api-token>" "https://data.censys.io/api/v1/datasets/host-ipv4/snapshots/1234/files/<format>?org=<your-organization-id>"
    You will receive a response similar to the one below.
    {
      "dataset": "host-ipv4",
      "snapshot": "1234",
      "file": [
        {
          "url": "<signed url>",
          "sizeBytes": 10000
        }
      ]
    }
  4. Use the signed URL to download the files or use a tool like DuckDB to query the files without downloading them.
    SELECT * FROM parquet_scan('<signed-url>') LIMIT 1;
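
The responses from steps 2 and 3 can also be processed programmatically. The Python sketch below picks the newest snapshot that offers a given format and sums the size of a snapshot's files. It relies only on the field names shown in the example responses above. The function names are hypothetical, and the sample data repeats the example values rather than real API output.

```python
from typing import Optional


def latest_snapshot_with_format(snapshots_response: dict, fmt: str) -> Optional[dict]:
    """Return the newest snapshot that lists the given format, or None."""
    candidates = [
        snap
        for snap in snapshots_response.get("snapshots", [])
        if any(f.get("name") == fmt for f in snap.get("formats", []))
    ]
    # Assumes addedTime strings share one ISO 8601 UTC layout,
    # so comparing them as text orders them chronologically.
    return max(candidates, key=lambda snap: snap["addedTime"], default=None)


def total_size_bytes(files_response: dict) -> int:
    """Sum sizeBytes over the files listed for a snapshot."""
    return sum(f["sizeBytes"] for f in files_response.get("file", []))


snapshots = {
    "dataset": "host-ipv4",
    "snapshots": [
        {
            "id": "1234",
            "addedTime": "2025-01-01T00:00:00.00000000Z",
            "formats": [{"name": "AVRO"}, {"name": "PARQUET"}],
        }
    ],
}
files = {
    "dataset": "host-ipv4",
    "snapshot": "1234",
    "file": [{"url": "<signed url>", "sizeBytes": 10000}],
}

print(latest_snapshot_with_format(snapshots, "PARQUET")["id"])
print(total_size_bytes(files))
```

Checking the total size before downloading is worthwhile given the dataset sizes listed earlier in this guide.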

BigQuery configuration

To ensure that your data exports function as expected when using BigQuery, enable the following boolean fields.

Format | Field | Additional information
Parquet | enableListInference | BigQuery reference guide
Avro | useAvroLogicalTypes | BigQuery reference guide
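
For orientation, the sketch below shows where these two fields sit in a BigQuery load job configuration as expressed in the REST API's JSON (the `configuration.load` object). The helper name is hypothetical, and the dictionaries are fragments only: a real load job also needs source URIs and a destination table.

```python
import json


def bigquery_load_options(source_format: str) -> dict:
    """Return the format-specific part of a BigQuery load configuration."""
    fmt = source_format.upper()
    if fmt == "PARQUET":
        # enableListInference is nested under parquetOptions.
        return {"sourceFormat": "PARQUET", "parquetOptions": {"enableListInference": True}}
    if fmt == "AVRO":
        # useAvroLogicalTypes is a top-level field of the load configuration.
        return {"sourceFormat": "AVRO", "useAvroLogicalTypes": True}
    raise ValueError(f"unsupported format: {source_format!r}")


print(json.dumps(bigquery_load_options("parquet"), indent=2))
print(json.dumps(bigquery_load_options("avro"), indent=2))
```

The BigQuery client libraries and the bq command-line tool expose the same two settings under their own naming, so consult the BigQuery reference for the interface you use.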