Darktrace

An email-classification system working in a corporate environment might try to cluster inbound emails sent by the same person or entity into groups (henceforth “campaigns”). Similarity classification is a non-trivial problem, however: in phishing, for example, the sending address may change subtly over the course of a campaign. Given the degree of automation in phishing production, a single actor can easily change the email address or subject line with each email in the campaign. Incidentally, this also poses a problem for systems that rely on maintaining lists of known abusive email addresses.

We created a system that tracks a wide range of indices based on the From header, subject line, URLs, and attachments. When an email arrives, the system checks these indices for similarities to other recent emails. If the email is similar to one that already belongs to a cluster, it is added to that cluster; if it is similar to one that does not yet belong to a cluster, the two form a new cluster. We could not assume that any single index would remain fixed during a campaign, so the system also tolerates a degree of fuzziness.
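The clustering step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the `Email` fields, the `fuzzy_match` helper, the 0.85 threshold, and the rule that a single matching index is enough are all hypothetical, and the sketch keeps every email rather than only recent ones.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Email:
    sender: str                       # value of the From header
    subject: str
    urls: frozenset = frozenset()
    cluster_id: Optional[int] = None  # None until the email joins a cluster


def fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Assumed fuzziness rule: strings match if their similarity ratio is high."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def is_similar(a: Email, b: Email) -> bool:
    # Assumption: one matching index is enough. Attachments are omitted here.
    return (
        fuzzy_match(a.sender, b.sender)
        or fuzzy_match(a.subject, b.subject)
        or bool(a.urls & b.urls)
    )


class Clusterer:
    def __init__(self):
        self.seen = []      # emails observed so far (no time window in this sketch)
        self._next_id = 0

    def add(self, email: Email) -> Optional[int]:
        """Join an existing cluster, form a new one, or stay unclustered."""
        for other in self.seen:
            if is_similar(email, other):
                if other.cluster_id is None:
                    # The matching email was unclustered: form a new cluster.
                    other.cluster_id = self._next_id
                    self._next_id += 1
                email.cluster_id = other.cluster_id
                break
        self.seen.append(email)
        return email.cluster_id


clusterer = Clusterer()
first = Email("billing@paypa1-support.com", "Your invoice 1042 is overdue")
second = Email("billing@paypa1-supp0rt.com", "Your invoice 1043 is overdue")
unrelated = Email("newsletter@example.org", "Weekly engineering digest")
ids = [clusterer.add(e) for e in (first, second, unrelated)]
```

In this example the second email differs from the first by one character in the address and one in the subject, so it forms a cluster with it, while the unrelated newsletter stays unclustered.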

An additional problem is that some emails are similar only by coincidence. If a coincidental similarity is established between one email and several others, and further coincidental similarities are then established between those emails and new arrivals, a runaway effect can quickly cluster a large number of emails that do not constitute a real campaign.
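The runaway effect can be reproduced with a toy example. The sketch below assumes a deliberately naive rule (hypothetical, chosen only for illustration) in which an email joins a cluster if it shares any one feature with any member. The four example emails are invented; the first and last have nothing in common, yet they end up in the same cluster through a chain of coincidences.

```python
FEATURES = ("sender", "subject", "url")


def shares_feature(a: dict, b: dict) -> bool:
    """Two emails count as similar if they agree on any one tracked feature."""
    return any(a[k] == b[k] for k in FEATURES)


def single_link_clusters(emails: list) -> list:
    """Merge transitively: a link to any member pulls an email into the cluster."""
    clusters = []
    for email in emails:
        linked, rest = [], []
        for cluster in clusters:
            hit = any(shares_feature(email, member) for member in cluster)
            (linked if hit else rest).append(cluster)
        # Every cluster the new email touches is fused into one.
        merged = [m for cluster in linked for m in cluster] + [email]
        clusters = rest + [merged]
    return clusters


a = {"sender": "hr@corp.example", "subject": "Quarterly update",
     "url": "https://intranet.example/q3"}
b = {"sender": "news@vendor.example", "subject": "Quarterly update",
     "url": "https://cdn.example/logo.png"}
c = {"sender": "courier@post.example", "subject": "Your parcel",
     "url": "https://cdn.example/logo.png"}
d = {"sender": "shop@store.example", "subject": "Your parcel",
     "url": "https://store.example/track"}

clusters = single_link_clusters([a, b, c, d])
```

Here `a` links to `b` through a common subject, `b` to `c` through a shared hosted image, and `c` to `d` through another common subject, so all four land in one cluster even though `shares_feature(a, d)` is false.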

A mechanism for self-correction

Rather than tracking ever finer criteria, we added a mechanism for self-correction. It addresses coincidental similarity by detecting fluctuation in sequences of scores that are computed for a different purpose elsewhere in the system. We posited that emails sent by the same actor with the same intent would be marked by the Darktrace anomaly-detection system in a predictable way, and we hypothesized that the chronological sequence of these anomaly scores would therefore be smooth if the emails are indeed from the same actor. By contrast, if the system clusters emails that are in fact unrelated, the real-time sequence of anomaly scores would be far more likely to fluctuate unpredictably. Our data shows this hypothesis to be correct, so we built a similarity classifier that computes a measure of fluctuation in real time. If the measure exceeds a certain threshold, the classifier stops clustering and removes the cluster’s status as a campaign.
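The text does not say which measure of fluctuation is used, so the following is only a sketch under assumptions: it takes the mean absolute difference between consecutive anomaly scores as the measure, assumes scores lie in the range 0 to 1, and uses a hypothetical threshold of 0.25. The class name `FluctuationMonitor` and its parameters are illustrative, not part of the described system.

```python
class FluctuationMonitor:
    """Tracks how much a cluster's anomaly scores jump between consecutive emails."""

    def __init__(self, threshold: float = 0.25, min_scores: int = 3):
        self.threshold = threshold    # hypothetical cut-off for "too erratic"
        self.min_scores = min_scores  # wait for a few scores before judging
        self.count = 0
        self.last_score = None
        self.total_jump = 0.0
        self.is_campaign = True

    def fluctuation(self) -> float:
        """Assumed measure: mean absolute change between consecutive scores."""
        steps = self.count - 1
        return self.total_jump / steps if steps > 0 else 0.0

    def observe(self, score: float) -> bool:
        """Feed the next score in arrival order; return the campaign status."""
        if not self.is_campaign:
            return False  # status already revoked, clustering has stopped
        if self.last_score is not None:
            self.total_jump += abs(score - self.last_score)
        self.last_score = score
        self.count += 1
        if self.count >= self.min_scores and self.fluctuation() > self.threshold:
            self.is_campaign = False
        return self.is_campaign


smooth = FluctuationMonitor()
smooth_status = [smooth.observe(s) for s in (0.80, 0.82, 0.79, 0.81, 0.83)]

erratic = FluctuationMonitor()
erratic_status = [erratic.observe(s) for s in (0.80, 0.10, 0.95, 0.20)]
```

Because the measure is updated incrementally with a running total, each new email costs constant time, which fits the real-time requirement. The smooth sequence keeps its campaign status throughout, while the erratic one loses it as soon as three scores are available.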

Researcher

Dr. Steven Haworth

Research abstracts

Fast anomaly detection in process chains using a multi-stage classifier

Ranking long lists of file names by relevance and sensitive content

A network activity analysis system for detecting stealthy cryptocurrency mining

Autonomous detection of the intended function of a corporate inbox through meta-scoring

Using epidemiology theory to identify the most damaging network devices

Automatic identification of scanned IP ranges

A real-time, self-correcting similarity classifier for emails

Using graph theory to identify critical nodes in computer networks

Analyzing network activity to detect compromised devices that send spam emails

Detecting and preventing misdirected emails using the semantics of correspondence

Research

A real-time, self-correcting similarity classifier for emails

Recognizing when similar but subtly different emails were sent by the same sender or as part of a campaign, while also recognizing when the similarity is coincidental.
