The use of HTML and CSS adds complexity to email, and with it the potential for exploitation. For example, malicious emails often use legitimate CSS styling or user-invisible text to conceal a malicious payload and to mimic the layout of a known company.
We have found that legitimate email communication broadly falls into three categories:
The categories are characterized by features such as the frequency of CSS appearance, the frequency of HTML node appearance, and HTML tree depth. A classifier can use these categories to further direct feature extraction and tracking. The ability to quantify the complexity and style of an HTML document, and to track changes over time or against a model, allows the detection of anomalous and potentially malicious email communications.
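As a rough illustration of the kind of features named above, the following sketch counts HTML nodes, CSS occurrences, and maximum tree depth for a single email body using only the Python standard library. It is not the product's implementation: the names `ComplexityParser` and `html_features`, the choice to count `<style>` blocks and inline `style` attributes as "CSS appearance", and the depth convention (root element at depth 1) are all assumptions made for the example.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never take a closing tag, so they must not deepen the tree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ComplexityParser(HTMLParser):
    """Hypothetical helper: collects simple structural statistics."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()   # frequency of each HTML node type
        self.style_blocks = 0         # number of <style> elements
        self.inline_styles = 0        # number of elements with a style="" attribute
        self.depth = 0                # number of currently open elements
        self.max_depth = 0            # deepest element seen so far

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "style":
            self.style_blocks += 1
        if any(name == "style" for name, _ in attrs):
            self.inline_styles += 1
        # The new element sits one level below the currently open elements.
        self.max_depth = max(self.max_depth, self.depth + 1)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        # Assumption: tags are reasonably balanced. Unclosed tags in
        # malformed email HTML will inflate the measured depth.
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def html_features(html):
    """Return a small feature dictionary for one HTML document."""
    parser = ComplexityParser()
    parser.feed(html)
    parser.close()
    return {
        "node_count": sum(parser.tag_counts.values()),
        "tag_counts": dict(parser.tag_counts),
        "max_depth": parser.max_depth,
        "style_blocks": parser.style_blocks,
        "inline_styles": parser.inline_styles,
    }
```

Feature dictionaries like this could be computed per sender and compared over time. How the features are weighted, and how they map onto the categories, is not specified here and is left open in this sketch.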
This approach has been incorporated into the Darktrace Antigena Email product and contributes to detecting account takeovers and behavioral anomalies.