First Part Of Why Edit Distance Is A Critical Detection Tool – Charles Leaver

March 17, 2017

Written By Jesse Sampson And Presented By Charles Leaver CEO Ziften


Why do attackers keep using the same techniques? The simple answer is that they still work. For instance, Cisco’s 2017 Cyber Security Report tells us that after years of decline, spam email with malicious attachments is once again on the rise. In that traditional attack vector, malware authors usually mask their activities by using a filename similar to a common system process.

There is not always a connection between a file’s path name and its contents: anyone who has tried to hide sensitive information by giving it a boring name like “taxes”, or changed the extension on a file attachment to get around email rules, knows this. Malware authors understand this as well, and will often name their malware to resemble common system processes. For example, “iexplore.exe” is Internet Explorer, but “iexplorer.exe” with an extra “r” could be anything. It’s easy even for experts to overlook this slight difference.

The opposite problem, known .exe files running in unusual locations, is easy to solve using SQL sets and string functions.
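As a rough illustration of that set-based check, here is a minimal sketch in Python rather than the SQL the post refers to. The `EXPECTED_DIRS` baseline, the `unexpected_locations` helper, and the sample paths are all hypothetical; a real baseline would come from your own environment.

```python
import ntpath  # Windows-style path handling that works on any OS

# Hypothetical baseline: executable name -> directories where it is expected.
EXPECTED_DIRS = {
    "svchost.exe": {r"c:\windows\system32", r"c:\windows\syswow64"},
    "explorer.exe": {r"c:\windows"},
}

def unexpected_locations(paths):
    """Return paths whose file name is known but whose directory is not expected."""
    flagged = []
    for path in paths:
        # Lower-case first so the comparison is case-insensitive, as on Windows.
        directory, name = ntpath.split(path.lower())
        expected = EXPECTED_DIRS.get(name)
        if expected is not None and directory not in expected:
            flagged.append(path)
    return flagged

# Made-up observations for illustration only.
observed = [
    r"C:\Windows\System32\svchost.exe",
    r"C:\Users\Public\svchost.exe",
    r"C:\Windows\explorer.exe",
]
flagged = unexpected_locations(observed)
print(flagged)
```

The string function here is the path split; the set membership test plays the role of the SQL set comparison.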


What about the other case, finding close matches to the executable name? Most people begin their search for near string matches by sorting the data and visually scanning for discrepancies. This usually works well for a small set of data, perhaps even a single system. Finding these patterns at scale, however, requires an algorithmic approach. One established technique for “fuzzy matching” is to use Edit Distance, which counts the minimum number of single-character edits needed to turn one string into another.
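To make the idea concrete, here is a minimal sketch of the classic Levenshtein form of edit distance in Python. It is not the Vertica function discussed in this post, and the `edit_distance` name is just an illustrative choice.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from the first i characters of a to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

print(edit_distance("iexplorer.exe", "iexplore.exe"))
print(edit_distance("cvshost.exe", "svchost.exe"))
```

Only two rows of the dynamic-programming table are kept at a time, so memory use grows with the shorter string rather than with the product of both lengths.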

What’s the best approach to calculating edit distance? For Ziften, our technology stack includes HP Vertica, which makes this task easy. The web is full of data scientists and data engineers singing Vertica’s praises, so it will suffice to mention that Vertica makes it easy to create custom functions that take full advantage of its power, from C++ power tools to statistical modeling scalpels in R and Java.

This Git repo is maintained by Vertica enthusiasts working in industry. It’s not an official offering, but the Vertica team is definitely aware of it, and moreover is thinking every day about how to make Vertica more useful for data scientists, so it is a good space to watch. Best of all, it contains a function to calculate edit distance! There are also some other tools for natural language processing here, like word tokenizers and stemmers.

By using edit distance on the top executable paths, we can quickly find the closest match to each of our top hits. This is an interesting dataset, as we can sort by distance to find the closest matches over the whole dataset, or we can sort by frequency of the top path to see what is the nearest match to our most commonly used processes. This data can also surface on contextual “report card” pages, to show, e.g., the top 5 closest strings for a given path. Below is an example to give a sense of usage, based on real data ZiftenLabs observed in a customer environment.


Setting an upper limit of 0.2 seems to find good results in our experience, but the point is that these can be adjusted to fit individual use cases. Did we find any malware? We see that “teamviewer_.exe” (should be just “teamviewer.exe”), “iexplorer.exe” (should be “iexplore.exe”), and “cvshost.exe” (should be “svchost.exe”, unless perhaps you work for CVS pharmacy…) all look strange. Since we’re already in our database, it’s also trivial to get the associated MD5 hashes, Ziften suspicion scores, and other attributes to do a deeper dive.
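The thresholding step might look roughly like the Python sketch below. It rests on two assumptions that the post does not confirm: that the 0.2 limit applies to edit distance divided by the length of the longer string, and that bare file names are compared rather than full paths. The `normalized_distance` and `near_matches` helpers and the `known` list are hypothetical; the `seen` names are the ones discussed above plus two controls.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, repeated here so this sketch runs on its own."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ch_a != ch_b)))
        previous = current
    return previous[-1]

def normalized_distance(a: str, b: str) -> float:
    """Assumed normalization: distance divided by the longer string's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

def near_matches(candidates, known, limit=0.2):
    """Return (candidate, nearest known name, score) for near but inexact matches."""
    results = []
    for name in candidates:
        best = min(known, key=lambda k: normalized_distance(name, k))
        score = normalized_distance(name, best)
        if 0 < score <= limit:  # skip exact matches and anything too far away
            results.append((name, best, round(score, 3)))
    return sorted(results, key=lambda r: r[2])  # closest matches first

known = ["teamviewer.exe", "iexplore.exe", "svchost.exe"]
seen = ["teamviewer_.exe", "iexplorer.exe", "cvshost.exe",
        "svchost.exe", "notepad.exe"]
hits = near_matches(seen, known)
for hit in hits:
    print(hit)
```

Under these assumptions all three odd names fall below the 0.2 limit, while the exact match and the unrelated “notepad.exe” are left out. Raising or lowering `limit` trades recall against noise, which is the tuning the post describes.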


In this particular real-life environment, it turned out that teamviewer_.exe and iexplorer.exe were portable applications, not known malware. We helped the customer with further investigation of the user and system where we observed the portable applications, since use of portable apps on a USB drive could be evidence of suspicious activity. The more troubling find was cvshost.exe. Ziften’s intelligence feeds indicate that this is a suspect file. Searching for the MD5 hash of this file on VirusTotal confirms the Ziften data, indicating that this is a potentially serious Trojan that may be part of a botnet or doing something even more malicious. Once the malware was found, however, it was easy to resolve the issue and make sure it stays resolved using Ziften’s ability to kill and persistently block processes by MD5 hash.

Even as we develop advanced predictive analytics to detect malicious patterns, it is important that we continue to improve our capabilities to hunt for known patterns and old tricks. Just because new threats emerge does not mean the old ones go away!

If you enjoyed this post, watch this space for part 2 of this series where we will apply this technique to hostnames to find malware droppers and other malicious websites.
