As a result, repeated patterns can be observed in network traffic, such as a read of a file followed by a write of the same file under a new extension, typically accompanied by a MIME-type change because the encrypted content no longer matches the original file format.
However, SMB is a notoriously chatty protocol and, for backwards-compatibility reasons, offers a wide range of commands for accomplishing the same task. For instance, a write may be accomplished by WRITE, WRITE_RAW, WRITE_ANDX, WRITE_AND_CLOSE, etc. The chattiness of the protocol also means that the write of a single file may be spread across multiple commands, not all of which arrive in the expected order.
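To make the normalization burden concrete, the following is a minimal sketch of collapsing write variants into a single logical write per file. The event tuple `(command, filename, length)` and the function name are hypothetical, and the command set contains only the variants named above, so it is knowingly incomplete.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Only the write variants named in the text; the real list is longer ("etc.").
WRITE_COMMANDS = {"WRITE", "WRITE_RAW", "WRITE_ANDX", "WRITE_AND_CLOSE"}

# Hypothetical event shape: (command name, filename, payload length in bytes).
Event = Tuple[str, str, int]


def bytes_written_per_file(events: Iterable[Event]) -> Dict[str, int]:
    """Sum the payload of every write variant per file, ignoring arrival order."""
    totals: Dict[str, int] = defaultdict(int)
    for command, filename, length in events:
        if command in WRITE_COMMANDS:
            totals[filename] += length
    return dict(totals)
```

Even this simple aggregation depends on enumerating every write variant correctly, which illustrates how fragile operation-level matching can be.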
The ransomware may also encrypt the file in different ways (e.g. by first renaming, then encrypting, or vice versa), which often makes it difficult to reliably identify possible steps of encryption on a file-by-file basis.
Consequently, instead of trying to identify ransomware on an operation-by-operation basis, it is more fruitful to analyze higher-level activity, beginning with the end goal of the ransomware: encryption.
Encryption is reflected in a number of higher-level changes to files, including MIME-type changes and the addition of suspicious extensions. Sorting all filenames observed in the SMB session alphabetically places similar filenames next to each other, so adjacent entries yield a list of pairs of similar filenames. Each pair can then be ordered by the time at which its two filenames were first observed, and the resulting ordered pair represents a potential encryption step.
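The pairing step can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: `first_seen` is a hypothetical mapping from filename to first-observation timestamp, and two neighbours count as "similar" when they share a directory and the part of the name before the first dot, which is our own stand-in for a similarity criterion.

```python
import ntpath
from typing import Dict, List, Tuple


def stem_key(path: str) -> Tuple[str, str]:
    """Assumed similarity key: (directory, name up to the first dot), lowercased."""
    directory, base = ntpath.split(path)  # SMB paths use backslashes
    return directory.lower(), base.split(".", 1)[0].lower()


def candidate_encryption_steps(first_seen: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return (earlier, later) filename pairs that may be encryption steps."""
    names = sorted(first_seen, key=str.lower)  # alphabetical order
    steps = []
    for a, b in zip(names, names[1:]):  # adjacent entries only
        if stem_key(a) != stem_key(b):
            continue
        # Order the pair by first observation time: original first, result second.
        earlier, later = sorted((a, b), key=first_seen.__getitem__)
        steps.append((earlier, later))
    return steps
```

Because each pair is ordered by observation time rather than by name, a new extension that sorts before the original one (for example `a.crypt` before `a.doc`) is still reported with the original file first.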
The two files can be compared to reveal MIME-type and extension changes, among other things. We can then analyze the general statistics of these properties for the entire session, and robustly identify ransomware encryption, in a manner that is not affected by the low-level complications of the SMB protocol.
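A minimal sketch of the session-level comparison is given below. The input shapes are assumptions: `pairs` is a list of (before, after) filenames, and `mime_of` is a hypothetical mapping from filename to the MIME type observed for it. The thresholds in `looks_like_encryption` are illustrative placeholders, not tuned values.

```python
import ntpath
from collections import Counter
from typing import Dict, List, Tuple


def session_statistics(pairs: List[Tuple[str, str]], mime_of: Dict[str, str]) -> dict:
    """Aggregate MIME-type and extension changes over all candidate pairs."""
    total = len(pairs)
    mime_changes = 0
    new_extensions: Counter = Counter()
    for before, after in pairs:
        mime_before, mime_after = mime_of.get(before), mime_of.get(after)
        # Count a change only when both MIME types are known and differ.
        if mime_before and mime_after and mime_before != mime_after:
            mime_changes += 1
        ext_before = ntpath.splitext(before)[1].lower()
        ext_after = ntpath.splitext(after)[1].lower()
        if ext_after and ext_after != ext_before:
            new_extensions[ext_after] += 1
    top_ext, top_count = (new_extensions.most_common(1) or [(None, 0)])[0]
    return {
        "pairs": total,
        "mime_change_ratio": mime_changes / total if total else 0.0,
        "dominant_new_extension": top_ext,
        "dominant_extension_ratio": top_count / total if total else 0.0,
    }


def looks_like_encryption(stats: dict, min_pairs: int = 20, min_ratio: float = 0.8) -> bool:
    """Illustrative decision rule; both thresholds are assumed values."""
    return (
        stats["pairs"] >= min_pairs
        and stats["mime_change_ratio"] >= min_ratio
        and stats["dominant_extension_ratio"] >= min_ratio
    )
```

The decision rests on ratios over the whole session, so a handful of missed or misordered operations shifts the statistics only slightly instead of breaking detection outright.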