Download Certs 2.0 Data
The Censys global scanning engine certificate repository is the largest in the world and is growing every day.
A certificate’s contents are immutable after it is issued, although accompanying data can change, such as:
- which certificate transparency logs it has been submitted to
- whether it has been revoked
- whether it has ever been seen in a Censys global scanning engine scan of the Internet
Censys provides these changes to certificate records and new records representing newly seen certificates in a daily download.
Each day, an incremental dataset with just that day’s certificate record changes and new record additions is available to download. It contains new certificate records and diffs in existing certificate records’ metadata.
How downloads work
If you want to download certificate data for unlimited querying and use in custom workflows, complete a one-time download of the full certificate repository, followed by a daily download containing the day’s changes to be applied to the local copy of the full dataset.
A full snapshot of all of the certificates in the repository is available on the 1st of every month in the certificates-v2-full series.
Incremental downloads are available every day, including on the 1st of every month, in the certificates-v2-incremental series. If your client runs less often than once per day, apply the changes from every incremental update it missed, in date and time order.
Resource preparation for the dataset
The certificate dataset is large, both in terms of the overall storage space necessary to accommodate its daily growth and the client requirements for downloading the incremental changes published daily.
Series sizes
The certificates dataset is continually growing. Since the beginning of 2023, Censys's global scanning engine has added about 500,000,000 new certificates to the repository each month.
As of summer 2023, the size of the certificates-v2-full dataset is about 12TB.
As of summer 2023, the size of each daily certificates-v2-incremental dataset is about 30-60GB.
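For rough capacity planning, the figures above can be turned into a back-of-the-envelope estimate. The sketch below assumes decimal units (1 TB = 1000 GB) and a hypothetical retention window for downloaded incremental files. The sizes are the summer 2023 figures quoted above, so treat the result as a lower bound for a dataset that keeps growing.

```python
from typing import Tuple

# Approximate summer 2023 sizes quoted in this section.
FULL_SNAPSHOT_TB = 12             # certificates-v2-full
INCREMENTAL_GB_RANGE = (30, 60)   # one day of certificates-v2-incremental


def staging_space_tb(retained_days: int) -> Tuple[float, float]:
    """Return (low, high) TB needed to hold one full snapshot plus
    `retained_days` of downloaded incremental files (1 TB = 1000 GB)."""
    low_gb, high_gb = INCREMENTAL_GB_RANGE
    low = FULL_SNAPSHOT_TB + retained_days * low_gb / 1000
    high = FULL_SNAPSHOT_TB + retained_days * high_gb / 1000
    return low, high


low, high = staging_space_tb(retained_days=31)
print(f"Plan for roughly {low:.1f} to {high:.1f} TB of download space")
```

This only covers the downloaded files. The space used by your local, merged copy depends on how you store it.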
Data formatting
The files containing the Certificates 2.0 datasets are serialized in the Avro binary format, which stores the data schema within each file. To get started with Avro, visit the official Avro site.
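Because each file carries its own schema, a client can inspect a downloaded file before committing to a full parse. The following is a minimal, standard-library-only sketch that reads the header of an Avro object container file: a 4-byte magic value (the ASCII letters Obj followed by the byte 0x01), a metadata map, and a 16-byte sync marker, as laid out in the Avro specification. The function names are illustrative. For reading the records themselves, an Avro library is the practical choice.

```python
import io
from typing import BinaryIO, Dict, Tuple

AVRO_MAGIC = b"Obj\x01"


def read_long(stream: BinaryIO) -> int:
    """Decode one Avro long (variable-length, zigzag-encoded integer)."""
    shift = 0
    raw = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of stream while reading a long")
        byte = chunk[0]
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo zigzag encoding


def read_avro_header(stream: BinaryIO) -> Tuple[Dict[str, bytes], bytes]:
    """Return (metadata, sync_marker) from an Avro object container file."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata: Dict[str, bytes] = {}
    while True:
        count = read_long(stream)
        if count == 0:          # a zero-length block ends the map
            break
        if count < 0:           # negative count: a byte size follows
            count = -count
            read_long(stream)   # block size in bytes, not needed here
        for _ in range(count):
            key = stream.read(read_long(stream)).decode("utf-8")
            metadata[key] = stream.read(read_long(stream))
    return metadata, stream.read(16)


# Usage on a downloaded file (the file name is taken from the examples):
# with open("certificates-000000000000.avro", "rb") as f:
#     metadata, _sync = read_avro_header(f)
#     print(metadata["avro.schema"].decode("utf-8"))
```

The avro.schema entry holds the embedded schema as JSON text, and avro.codec, when present, names the block compression codec.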
Complete a one-time download of the Full Certificates Snapshot
You need to download the full snapshot one time.
After that, you download the incremental dataset each day and apply its changes to your copy of the dataset.
Use the Search URL:
- Base URL:
https://search.censys.io
With this API path:
- Path:
/api/v1/data/
And this Series Endpoint:
- Series name:
certificates-v2-full
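The three parts above combine into a single series URL. Here is a minimal sketch using the /api/v1/data/ path that appears in the example details_url values. The helper names are illustrative, and HTTP Basic authentication with an API ID and secret is an assumption to confirm against your account's API documentation.

```python
import base64
import json
import urllib.request

BASE_URL = "https://search.censys.io"
DATA_PATH = "/api/v1/data/"


def series_url(series_name: str) -> str:
    """Join the base URL, API path, and series name."""
    return f"{BASE_URL}{DATA_PATH}{series_name}"


def get_json(url: str, api_id: str, api_secret: str) -> dict:
    """GET a URL and decode its JSON body.

    Assumption: the endpoint accepts HTTP Basic auth (API ID + secret).
    """
    token = base64.b64encode(f"{api_id}:{api_secret}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


print(series_url("certificates-v2-full"))
# Usage (requires credentials and network access):
# series = get_json(series_url("certificates-v2-full"), API_ID, API_SECRET)
```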
Example 200 response
{
  "id": "certificates-v2-full",
  "name": "Full Set of X.509 Certificates",
  "description": "Parsed X.509 certificates featuring all certificates known to Censys. Schema version 2.",
  "results": {
    "latest": {
      "id": "2023-03-01T12:50:16.804634Z",
      "timestamp": "20230301T125017",
      "details_url": "https://search.censys.io/api/v1/data/certificates-v2-full/2023-03-01T12:50:16.804634Z"
    },
    "historical": [
      {
        "id": "2023-03-01T12:50:16.804634Z",
        "timestamp": "20230301T125017",
        "details_url": "https://search.censys.io/api/v1/data/certificates-v2-full/2023-03-01T12:50:16.804634Z"
      }
    ]
  }
}
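Given a decoded series response shaped like the example above, picking out the newest result is a two-key lookup. This is a small sketch that assumes the response has already been parsed into a Python dict. The function name is illustrative.

```python
from typing import Tuple


def latest_result(series: dict) -> Tuple[str, str]:
    """Return (result_id, details_url) for the newest result in a series."""
    latest = series["results"]["latest"]
    return latest["id"], latest["details_url"]


# Trimmed copy of the example response above.
example_series = {
    "id": "certificates-v2-full",
    "results": {
        "latest": {
            "id": "2023-03-01T12:50:16.804634Z",
            "timestamp": "20230301T125017",
            "details_url": "https://search.censys.io/api/v1/data/certificates-v2-full/2023-03-01T12:50:16.804634Z",
        },
        "historical": [],
    },
}

result_id, details_url = latest_result(example_series)
print(result_id)
print(details_url)
```

Recording the result ID of the full snapshot you download is useful later, when deciding which incremental datasets still need to be applied.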
Then, follow up with a GET request to the details_url to see the list of files comprising the result.
GET https://search.censys.io/api/v1/data/certificates-v2-full/2023-03-01T12:50:16.804634Z
Example 200 response (truncated to a single file for display)
{
  "series": {
    "id": "certificates-v2-full",
    "name": "Full Set of X.509 Certificates"
  },
  "id": "2023-03-01T12:50:16.804634Z",
  "timestamp": "20230301T125017",
  "task_id": null,
  "metadata": null,
  "total_size": 12336264834346,
  "files": {
    "certificates-000000000000.avro": {
      "compressed_size": 73423483,
      "download_path": "https://file-host-02.censys.io/snapshots/certificates-v2-full/2023-03-01T12:50:16.804634Z/certificates-000000000000.avro",
      "compressed_md5_fingerprint": "c399b93f9cb1e6c5b697955b718c96e",
      "file_type": null,
      "compression_type": null
    }
  }
}
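The files object in the details response maps each file name to its download metadata. The sketch below flattens it into a sorted work list for a downloader. It assumes the response has been parsed into a dict with the keys shown in the example. The class and function names are illustrative, and the sample values are placeholders, not real data.

```python
from typing import List, NamedTuple


class RemoteFile(NamedTuple):
    name: str
    download_path: str
    compressed_size: int
    md5: str


def list_files(details: dict) -> List[RemoteFile]:
    """Flatten the "files" object into a list sorted by file name."""
    return [
        RemoteFile(
            name=name,
            download_path=info["download_path"],
            compressed_size=info["compressed_size"],
            md5=info["compressed_md5_fingerprint"],
        )
        for name, info in sorted(details["files"].items())
    ]


# Placeholder data in the same shape as the example response.
example_details = {
    "files": {
        "certificates-000000000001.avro": {
            "compressed_size": 200,
            "download_path": "https://example.invalid/certificates-000000000001.avro",
            "compressed_md5_fingerprint": "0" * 32,
        },
        "certificates-000000000000.avro": {
            "compressed_size": 100,
            "download_path": "https://example.invalid/certificates-000000000000.avro",
            "compressed_md5_fingerprint": "f" * 32,
        },
    }
}

files = list_files(example_details)
print(len(files), "files,", sum(f.compressed_size for f in files), "bytes listed")
```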
Finally, download each file by issuing a GET request to each download_path.
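With files this large, it is worth checking each download against the compressed_md5_fingerprint from the details response. The following sketch streams a file to disk and verifies it. It assumes that the fingerprint is the hex MD5 digest of the file exactly as downloaded and that download_path can be fetched with a plain GET. Neither is confirmed here, and the function names are illustrative.

```python
import hashlib
import shutil
import urllib.request


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected_md5: str) -> None:
    """Raise ValueError if the file does not match the expected digest."""
    actual = md5_of_file(path)
    if actual != expected_md5.lower():
        raise ValueError(f"{path}: MD5 {actual} != expected {expected_md5}")


def download_file(url: str, dest: str) -> None:
    """Stream a URL to a local file without loading it into memory."""
    with urllib.request.urlopen(url, timeout=60) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


# Usage (requires network access):
# download_file(download_path, "certificates-000000000000.avro")
# verify_md5("certificates-000000000000.avro", compressed_md5_fingerprint)
```

A failed check is a signal to delete the file and retry the download, not to proceed.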
Schedule client to download the daily incremental series
The incremental dataset is not just new certificate records. The Censys global scanning engine now regularly re-validates the trust and revocation information of unexpired certificates and updates the relevant values in the structured data and labels.
Note
Apply changes from each incremental dataset in order.
Using the same URL and endpoint, request the new incremental dataset series:
- Series name:
certificates-v2-incremental
If you need to apply changes from more than one incremental dataset, retrieve the ID of the latest result and the IDs of the historical results you need. Only incremental datasets with a timestamp later than that of the full snapshot you downloaded contain updates that need to be applied.
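Selecting the right incremental results comes down to a filter and a sort on the result IDs. The sketch below assumes that result IDs are UTC timestamps in the format seen in the examples (with fractional seconds) and that the series response has been parsed into a dict. The function names are illustrative.

```python
from datetime import datetime, timezone
from typing import List

ID_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_result_id(result_id: str) -> datetime:
    """Parse a result ID such as 2023-03-07T12:50:11.773781Z as UTC."""
    return datetime.strptime(result_id, ID_FORMAT).replace(tzinfo=timezone.utc)


def incrementals_to_apply(series: dict, full_snapshot_id: str) -> List[dict]:
    """Return incremental results newer than the full snapshot, oldest first."""
    results = list(series["results"].get("historical", []))
    latest = series["results"].get("latest")
    if latest:
        results.append(latest)
    cutoff = parse_result_id(full_snapshot_id)
    unique = {r["id"]: r for r in results}  # latest may also be in historical
    newer = [r for r in unique.values() if parse_result_id(r["id"]) > cutoff]
    return sorted(newer, key=lambda r: parse_result_id(r["id"]))


# Placeholder series response with three incremental results.
example_incremental = {
    "results": {
        "latest": {"id": "2023-03-07T12:50:11.773781Z"},
        "historical": [
            {"id": "2023-03-05T12:50:10.000001Z"},
            {"id": "2023-02-28T12:50:09.000001Z"},
            {"id": "2023-03-07T12:50:11.773781Z"},
        ],
    }
}

for result in incrementals_to_apply(example_incremental, "2023-03-01T12:50:16.804634Z"):
    print(result["id"])
```

Returning the results oldest first means the caller can apply them in the same order it iterates over them.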
GET https://search.censys.io/api/v1/data/certificates-v2-incremental
Example 200 response
{
  "id": "certificates-v2-incremental",
  "name": "Incremental Updates to X.509 Certificates",
  "description": "Parsed X.509 certificates as incremental updates to the last full series snapshot. Schema version 2.",
  "results": {
    "latest": {
      "id": "2023-03-07T12:50:11.773781Z",
      "timestamp": "20230307T125012",
      "details_url": "https://search.censys.io/api/v1/data/certificates-v2-incremental/2023-03-07T12:50:11.773781Z"
    },
    "historical": []
  }
}
Then, follow up with a GET request to the details_url of the result you need in order to see the list of files comprising that result.
GET https://search.censys.io/api/v1/data/certificates-v2-incremental/2023-03-07T12:50:11.773781Z
Example 200 response (truncated to a single file for display)
{
  "series": {
    "id": "certificates-v2-incremental",
    "name": "Incremental Updates to X.509 Certificates"
  },
  "id": "2023-03-07T12:50:11.773781Z",
  "timestamp": "20230307T125012",
  "task_id": null,
  "metadata": null,
  "total_size": 24252152323,
  "files": {
    "certificates-000000000000.avro": {
      "compressed_size": 34138,
      "download_path": "https://file-host-02.censys.io/snapshots/certificates-v2-incremental/2023-03-07T12:50:11.773781Z/certificates-000000000000.avro",
      "compressed_md5_fingerprint": "2f69439ebada1bc20bc6391a2ffa484f",
      "file_type": null,
      "compression_type": null
    },
    ...
  }
}
Finally, download each file by issuing a GET request to each download_path.
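Because incremental datasets have to be applied in order, a scheduled client benefits from remembering the last result it applied so that a missed or repeated run does not skip or double-apply a day. The following is one possible sketch that keeps that marker in a small JSON state file. The file layout, key name, and function names are all hypothetical.

```python
import json
import os
import tempfile


def load_last_applied(state_path: str, default_id: str) -> str:
    """Return the last applied result ID, or default_id on first run.

    On first run, default_id would be the ID of the full snapshot.
    """
    try:
        with open(state_path, encoding="utf-8") as f:
            return json.load(f)["last_applied_id"]
    except FileNotFoundError:
        return default_id


def save_last_applied(state_path: str, result_id: str) -> None:
    """Record the last applied result ID, replacing the file atomically."""
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"last_applied_id": result_id}, f)
    os.replace(tmp_path, state_path)  # atomic rename on the same filesystem


# Usage inside a daily job (apply_result is a placeholder for your own logic):
# last_id = load_last_applied("certs_state.json", full_snapshot_id)
# for result in results_newer_than(last_id):
#     apply_result(result)
#     save_last_applied("certs_state.json", result["id"])
```

Saving the marker only after a result has been fully applied means an interrupted run resumes from the last completed dataset.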
Ensure your client supports Avro formatting
The files containing the Certificates 2.0 datasets are serialized in the Avro binary format, which stores the data schema within each file.
Thanks to the compression features of the Avro format, the full Censys global scanning engine dataset is now about 12TB (compared to 26TB when the dataset was encoded in JSON), but always make sure that your client can accommodate these storage needs.
To get started with Avro, visit the official Avro site.