I've been using Certificate Transparency as a source to detect phishing campaigns, using simple regex and fuzzy term monitoring. Recently I started developing an app that simplifies some of that workflow and adds features like additional intel sources (urlscan, phishtank) and automated screenshot creation for validation.

The app contains Faust agents, a Flask API and a VueJS-based UI to manage plain and simple keyword monitoring. The Faust streaming agents download new certificates, perform keyword matching, create screenshots and perform additional enrichment. The application is easy to extend with other data sources or other types of validation and automated response.

The application makes use of Faust tables to persist the transparency offsets. In case of a crash, the application will still fetch all certificates it missed during the downtime. You'll never miss a certificate :)
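In Faust terms this boils down to a table keyed by log URL. A minimal sketch (not the repo's actual code; the broker address is a placeholder and the table name follows the post):

```python
import faust

app = faust.App('streamio-transparency', broker='kafka://localhost:9092')

# Last processed tree size per transparency log. Faust persists the table
# to a Kafka changelog topic, so after a crash the offsets are restored
# and fetching resumes where it left off instead of skipping entries.
tree_states = app.Table('ct-sources-states', default=int)
```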

The project is available at https://github.com/d3vzer0/streamio. The repo contains the API (Flask), the Faust agents and the UI. Still WIP, so please leave suggestions. I tried to summarise below how each Faust agent and the general flow work. Enjoy!

streamio-faust/transparency: Streaming certificate transparency
The streamio-faust/transparency process is responsible for checking transparency sources for new records, fetching those records and decoding the certificates. The output of this process is stored in a Kafka topic that is used by a secondary stream for wordmatching.

Faust flow responsible for transparency streaming
  • get_sources: Downloads the list of transparency source URLs from Google every 15 seconds. Pushes each URL to the ct_sources topic.
  • get_tree_size: Fetches the count of available certificates (i.e. the tree size) from each transparency source. If the total count is higher than at the previous check (stored inside the ct-sources-states table), pushes the source URL to the ct_sources_changed topic.
  • get_records: Downloads the certificates added since the last execution (the range between the stored offset and the newest tree size). Updates the processed count and hands the certificates to the decode_certs agent.
  • decode_certs: Decodes each certificate to JSON and sends the result to the ct_certs_decoded topic (a rough sketch of this flow follows the list).
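
To make the flow above more concrete, here is a rough sketch of what these agents could look like. This is illustrative only, not the repo's actual code: broker and topic names follow the post, the endpoints are the standard RFC 6962 ones (/ct/v1/get-sth and /ct/v1/get-entries), batching is simplified, and the MerkleTreeLeaf parsing needed to get from a raw log entry to DER bytes is stubbed out.

```python
import aiohttp
import faust
from cryptography import x509

app = faust.App('streamio-transparency', broker='kafka://localhost:9092')
ct_sources = app.topic('ct_sources', value_type=str)
ct_sources_changed = app.topic('ct_sources_changed', value_type=str)
ct_certs_decoded = app.topic('ct_certs_decoded')
tree_states = app.Table('ct-sources-states', default=int)


@app.agent(ct_sources)
async def get_tree_size(sources):
    """Push a source to ct_sources_changed when its tree size has grown."""
    async for source in sources:
        async with aiohttp.ClientSession() as session:
            # Standard RFC 6962 endpoint; assumes the log URL ends with '/'.
            async with session.get(f'{source}ct/v1/get-sth') as resp:
                tree_size = (await resp.json())['tree_size']
        if tree_size > tree_states[source]:
            await ct_sources_changed.send(value=source)


@app.agent(ct_sources_changed)
async def get_records(sources):
    """Fetch only the entries added since the last processed offset."""
    async for source in sources:
        start = tree_states[source]
        async with aiohttp.ClientSession() as session:
            async with session.get(
                f'{source}ct/v1/get-entries',
                params={'start': start, 'end': start + 255},  # batching simplified
            ) as resp:
                entries = (await resp.json())['entries']
        tree_states[source] = start + len(entries)  # persisted via the changelog
        for entry in entries:
            await decode_certs.send(value=entry)


@app.agent()
async def decode_certs(entries):
    """Decode each certificate to JSON and publish it for word matching."""
    async for entry in entries:
        cert = x509.load_der_x509_certificate(leaf_to_der(entry))
        try:
            san = cert.extensions.get_extension_for_class(x509.SubjectAlternativeName)
            domains = san.value.get_values_for_type(x509.DNSName)
        except x509.ExtensionNotFound:
            domains = []
        await ct_certs_decoded.send(value={
            'subject': cert.subject.rfc4514_string(),
            'all_domains': domains,
        })


def leaf_to_der(entry: dict) -> bytes:
    """Placeholder: parse the MerkleTreeLeaf / extra_data of a log entry and
    return the embedded DER certificate. The real parsing lives in the repo."""
    raise NotImplementedError
```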

The certificates present in the ct_certs_decoded topic will be read by a different Faust agent to perform initial keyword matching.

PS. You can also use a log shipper like Logstash to store all decoded certificates in Elasticsearch. The Logstash config is available in the same streamio repo.

streamio-faust/wordmatching: Perform word matching
The streamio-faust/wordmatching process compares regular expressions and fuzzy terms against each certificate present in the ct_certs_decoded Kafka topic. Both the regular expressions and the fuzzy terms are managed via the user interface.

Faust flow responsible for term matching
  • fuzzy_match_ct / regex_match_ct: Read all decoded certificates from the ct_certs_decoded topic and compare them against a global variable containing the list of terms to monitor. When a term matches, the URL is sent to the wordmatching-hits topic (see the sketch after this list).
  • update_filters: When a new filter/term is added for monitoring (via the management UI), an update is sent to the wordmatching-update topic. The update_filters agent reads these messages, fetches the latest regular expressions and fuzzy terms from the database, and updates the global filters variable used by the *_match_ct agents.
  • matched_certs: When a URL/domain/certificate is matched, the matched_certs agent reformats the message and stores the URL in a MongoDB collection.
Overview of matched entries
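
As a rough illustration of the pattern described above (again, not the repo's actual code): the fuzzy scoring uses rapidfuzz as a stand-in, and the threshold, message fields and database call are assumptions.

```python
import re
import faust
from rapidfuzz import fuzz  # stand-in for whatever fuzzy matcher the repo uses

app = faust.App('streamio-wordmatching', broker='kafka://localhost:9092')
ct_certs_decoded = app.topic('ct_certs_decoded')
wordmatching_update = app.topic('wordmatching-update')
wordmatching_hits = app.topic('wordmatching-hits')

# Global filters shared by the *_match_ct agents, refreshed by update_filters.
FILTERS = {'regex': [], 'fuzzy': []}


@app.agent(ct_certs_decoded)
async def regex_match_ct(certs):
    """Send a hit whenever a domain matches one of the regular expressions."""
    async for cert in certs:
        for domain in cert.get('all_domains', []):
            for pattern in FILTERS['regex']:
                if re.search(pattern, domain):
                    await wordmatching_hits.send(
                        value={'url': domain, 'filter': pattern, 'type': 'regex'})


@app.agent(ct_certs_decoded)
async def fuzzy_match_ct(certs):
    """Send a hit whenever a domain is close enough to a monitored term."""
    async for cert in certs:
        for domain in cert.get('all_domains', []):
            for term in FILTERS['fuzzy']:
                if fuzz.partial_ratio(term, domain) > 90:  # arbitrary threshold
                    await wordmatching_hits.send(
                        value={'url': domain, 'filter': term, 'type': 'fuzzy'})


@app.agent(wordmatching_update)
async def update_filters(updates):
    """Reload the monitored terms whenever the UI signals a change."""
    async for _update in updates:
        FILTERS.update(load_filters_from_db())


def load_filters_from_db() -> dict:
    """Placeholder for the database query that returns the latest
    regular expressions and fuzzy terms managed in the UI."""
    raise NotImplementedError
```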

streamio-faust/snapshot: Create screenshots
The streamio-faust/snapshot process contains agents that create snapshots of matching URLs. By default, a screenshot is automatically created whenever a domain/URL matches a keyword.

Screenshot creation flow
  • The snapshot_url agent makes use of Selenium Grid to create a screenshot of every URL present in the wordmatching-hits topic (i.e. the result of our wordmatching agents). The image itself is stored in MongoDB GridFS (a rough sketch follows at the end of this section).
  • You can enable automated monitoring for each URL. This means that for every URL with monitoring enabled, a screenshot is created every 30 minutes and stored in the database. You can toggle monitoring per URL via the UI.
Overview of unique screenshots per URL
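
A minimal sketch of what this screenshot flow could look like; the Selenium Grid URL, the MongoDB layout, the hit message format and the monitoring flag are assumptions, and the blocking Selenium/PyMongo calls are kept simple for readability.

```python
import gridfs
import faust
from pymongo import MongoClient
from selenium import webdriver

app = faust.App('streamio-snapshot', broker='kafka://localhost:9092')
wordmatching_hits = app.topic('wordmatching-hits')

# GridFS bucket used to store the raw PNG screenshots (names are assumptions).
db = MongoClient('mongodb://localhost:27017')['streamio']
fs = gridfs.GridFS(db)


def take_screenshot(url: str) -> bytes:
    """Render the URL on a Selenium Grid node and return the PNG bytes."""
    driver = webdriver.Remote(
        command_executor='http://selenium-hub:4444/wd/hub',  # assumed Grid URL
        options=webdriver.ChromeOptions(),
    )
    try:
        driver.get(url)
        return driver.get_screenshot_as_png()
    finally:
        driver.quit()


@app.agent(wordmatching_hits)
async def snapshot_url(hits):
    """Screenshot every matched URL and keep the image in GridFS."""
    async for hit in hits:
        # Assumes the hit carries a bare domain under the 'url' key.
        image = take_screenshot(f"https://{hit['url']}")
        fs.put(image, filename=hit['url'])


@app.timer(interval=1800.0)  # every 30 minutes
async def monitor_urls():
    """Re-screenshot URLs that have monitoring enabled in the UI."""
    for doc in db.urls.find({'monitor': True}):  # assumed collection and flag
        fs.put(take_screenshot(f"https://{doc['url']}"), filename=doc['url'])
```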