I have spent a lot of time recently with a “major Wall Street financial organization,” looking at Data Leak Prevention (DLP) tools and, specifically, how the Compliance component of Sendmail’s Mailstream Manager milter stacks up against their existing tool, a leading DLP vendor’s flagship product.
Before getting into the details, it is important to understand that Sendmail doesn’t claim to replace DLP products. A true, full-featured DLP implementation looks at all egress points from the Enterprise (as well as intra-company data transfers, data at rest, and any number of other things) and applies company policies as appropriate to protect confidential information. Sendmail, on the other hand, primarily looks at email — either at the routing layer or at the gateways (or both.)
Numbers I have seen, though, suggest that as much as 80% of data leakage occurs via email — simple things like employees emailing work home so they can extend their work day, or setting Outlook to send a copy of their mail to their Blackberry. Whatever the cause, a lot of “stuff” that a company doesn’t want to let out leaves via email.
So this “major Wall Street financial organization” has looked at what they are actually monitoring, what they really care about, and how they are doing their jobs, and they’ve come to the conclusion that more effectively monitoring outbound email is mission critical. As a result, they invited us in to show what we can do in a proof-of-concept.
Banks, particularly large international banks, have a really tricky compliance model. Things that are okay in Dallas are illegal in Paris and things that are of concern in Singapore might be happening in London. Keeping track of the various laws and regulations, as well as how they are enforced and the penalties for failing to follow the rules, is the full-time job of a whole group of lawyers literally around the world. Fortunately for Sendmail, all we have to do is provide a flexible framework against which the bank’s compliance team can build, test, and deploy policy rule-sets that implement the compliance model… That, in a nutshell, is what Mailstream Manager is.
The DLP vendor we were competing against utilizes an interesting technique to intercept email — they use the SPAN port of the gateway router to copy every packet to their device. The packets are collected and assembled on the device. Once the message is recreated, it is run against the policies and their very robust interface will allow compliance officers to take action.
I see a few weaknesses in this model — some are inescapable, and some, I suppose, can be circumvented.
The weaknesses I see include the fact that messages that leave the gateway mail server using TLS, or messages that are encrypted prior to transmission (Voltage, PGP, etc.), cannot be read and therefore cannot be scanned. Additionally, messages that are captured on the SPAN port have already been delivered out of the Enterprise before you spot the violation. At best you can only reprimand the sender and train him not to send that type of information out in the future. At worst you will already have let confidential information, such as Social Security numbers or credit card numbers, out. The vendor does offer an inline solution (at considerable additional expense), but the bank’s messaging team, justifiably, does not wish to have a black box touching each message that goes out. DLP vendors are not MTA vendors. I also question the ability of any single device to keep up with the huge processing requirements necessary to reconstruct and monitor the packets of a multinational bank’s internet traffic. In our test environment, it simply missed some messages. In fairness, I am told that the production interfaces and appliance hardware are considerably more powerful and this type of dropout does not occur.
In any event, the Sendmail approach is to do the DLP processing as a component of the message transmission (making DLP a messaging application.) This overcomes many of the weaknesses identified above and allows us to help the bank reduce architectural complexity, and implement more robust policies. Since the Sendmail server sees the message prior to any enterprise encryption, and prior to entering the TLS tunnel, those concerns are eliminated. As every message goes through a Sendmail server, we don’t have to worry about missing a message (or having the horsepower on hand to reassemble the packet stream.) Because we have the ability to queue messages, even during periods of unexpectedly high volume, we can meet the load. Best of all, Sendmail is an MTA vendor, so the messaging team has confidence in our ability to properly manage the message.
The Proof of Concept required us to show the ability to selectively apply policies based (for our test) on the home country of the sender. Additionally we had to show the effectiveness of our policies vis-à-vis the established DLP vendor.
Our approach was straightforward.
Step one was to split off a copy of the message to operate against. The original message was sent on its way out so the DLP vendor’s server could take its shot.
Next I configured the lab gateway server to do an LDAP lookup on the sender. Based on the information returned I was able to identify the sender’s home country. I also grabbed the sender’s real name and the email address of his/her direct supervisor.
Third, the message dropped down through various high-recall policies that Sendmail’s policy guru, Daniel Hedrick, had built for the job. If we had a possible hit, the message was forwarded, based on the home country, to a high-precision server that managed country-specific policies. Since most messages will not trigger a policy hit, this methodology allowed us to drop non-hit messages and increase performance at the gateway. Considering that, in production, millions of messages a day will be looked at, even small performance increases are important.
The country-specific server performed a deep (high precision) scan on the message and either found a policy violation or not. Again, we just dropped messages that were not in violation.
If there was a violation, we used the information obtained in the earlier LDAP lookup to do three things: 1) We sent a note to the sender informing them that their message was in violation of a specified policy (which we described) and how to avoid future violations; 2) we sent a note to the sender’s supervisor informing him of the violation and describing the nature (but not the content — in support of some of the draconian privacy laws in the EU) of the violation; and 3) we placed a copy of the violation (if it was severe enough) into quarantine and notified compliance of the violation.
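For readers who think better in code, the flow described above can be sketched roughly as follows. This is a minimal Python illustration, not Mailstream Manager policy syntax. The in-memory `DIRECTORY` stands in for the LDAP lookup, and every name, address, pattern, and the severity threshold of 5 are assumptions made up for the example.

```python
import re
from dataclasses import dataclass

# Stand-in for the LDAP directory (hypothetical data). A real deployment
# would query LDAP for the sender's name, home country, and supervisor.
DIRECTORY = {
    "alice@example.com": {"name": "Alice Smith", "country": "US",
                          "supervisor": "carol@example.com"},
    "bruno@example.com": {"name": "Bruno Martin", "country": "FR",
                          "supervisor": "dana@example.com"},
}

# Stage 1 pattern: any nine-digit number, with or without dashes.
NINE_DIGITS = re.compile(r"(?<!\d)\d{3}-?\d{2}-?\d{4}(?!\d)")


@dataclass
class Violation:
    policy: str
    severe: bool


def high_recall_hit(body):
    """Stage 1: cheap, broad check run at the gateway on every message."""
    return bool(NINE_DIGITS.search(body))


def us_policy(body):
    """Stage 2 (US only): stricter check that requires an 'SSN' label."""
    hits = re.findall(r"(?i)\bssn\b[:\s]*\d{3}-\d{2}-\d{4}", body)
    if not hits:
        return None
    # Hypothetical severity rule: five or more labeled numbers is severe.
    return Violation("US-SSN", severe=len(hits) >= 5)


# Country-specific high-precision policies (only one is sketched here).
COUNTRY_POLICIES = {"US": us_policy}


def process(sender, body):
    """Return the list of actions to take for one copied message."""
    person = DIRECTORY.get(sender)
    # Assumption: unknown senders and non-hits are simply dropped.
    if person is None or not high_recall_hit(body):
        return []
    policy = COUNTRY_POLICIES.get(person["country"])
    violation = policy(body) if policy else None
    if violation is None:
        return []
    # Notes carry the policy name only, never the message content.
    actions = [
        ("notify_sender", sender, violation.policy),
        ("notify_supervisor", person["supervisor"], violation.policy),
    ]
    if violation.severe:
        actions.append(("quarantine_and_notify_compliance", sender,
                        violation.policy))
    return actions
```

Calling `process("alice@example.com", "SSN: 512-34-5678")` yields the two notification actions, while a message with no nine-digit number never reaches the country-specific stage at all, which is where the gateway performance gain comes from.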
Once we were done, we had a beautifully simple configuration that proved to be superior to the DLP vendor’s in their testing in the lab. We were able to capture 100% of test violations with a considerably lower false positive rate than that of the DLP vendor. Out of our pool of messages, the DLP vendor never saw about 10% of the messages (I found this to be important, but, as described above, the customer was certain that those results would not be seen in the production environment so that finding was discounted.) Additionally, the existing DLP vendor had a very high false positive rate: in testing, if they identified 100 Social Security Numbers, 44 of them were not real Social Security Numbers (66 were, and we identified all of them.) We were able to predict, based on their rules, and then show in the lab, that we would reduce false positives dramatically. This is less impressive than it sounds, as this drop in false positives is more a function of the effort placed into developing good policies than it is a reflection on the underlying technology. I have no doubt that if our policies were translated onto their policy engine their false positives would drop as well. Having said that, I do find it fascinating that their core business is implementing policies and their policy sets were so poor that we were able to highlight dramatic differences. In practice, high false positive rates cause the compliance team to discount lower-scoring violations because they are generally not worth the trouble to sort out.
In the end the customer was impressed. It was their view that we were the technical equal, as far as implementing policies, to the existing DLP vendor in the email space. We beat the DLP vendor with TLS and encryption issues and we beat the DLP vendor in terms of simplicity and flexibility. When the customer considered the fact that moving from a monitoring solution to a blocking solution, where we would operate on the message itself rather than on a copy, was a simple matter of modifying a few policies at any point in the future versus modifying their architecture and deploying more new hardware for the DLP vendor, we again came out on top. Our solution was also priced millions (literally) less. For Sendmail, DLP is an application within our framework rather than a standalone solution.
I believe we surprised the customer with how robust our product was.
Did we get the business? I don’t know yet. I do know that the Messaging Engineering Team, Compliance Engineering Team, Messaging Operations Team, and the Compliance Operations Team all endorsed Sendmail as a viable solution for their compliance monitoring and future enforcement needs. I’d like to think that, if I were them, I’d select us regardless of costs — our solution was just more elegant but I might be biased. With the current financial mess, the fact that we can save them a substantial amount of money, now and in the future, while meeting their technical challenges with a robust and elegant solution, should bode well for Sendmail.
Bob
Informative writeup!
Passive DLP Vendor. Wall Street. Millions to implement. Not hard to figure out.
Symantec/Vontu?
What is the cost of false positives?
I would be more concerned about false negatives.
JW
Hi Jonathan,
Thank you for your note.
I cannot identify the competitor (or the customer)…sorry.
You asked two questions that I can address: What is the cost of false positives and, in reverse, shouldn’t we be more concerned about false negatives?
In an organization with hundreds of thousands of employees sending dozens (if not more) of messages out every day, there are real costs related to false positives. The primary cost is in the area of compliance review and enforcement. The bank we are working with has a lot of 9-digit numbers going out in email. There are so many, in fact, that currently, unless a message contains 50 or more numbers that appear to be US Social Security Numbers, they just ignore it. When they turn that threshold down to (for example) 5, they get flooded with false positives and the Compliance Team cannot address all the possible violations. Because so many of the possible violations are innocent, the compliance team quickly begins to ignore the issue and real violations go undiscovered.
Sendmail’s compliance/DLP component uses a mixture of text matching and business logic to reduce false positives. For example, the number 123456789 would be caught by a text match as being a 9-digit number, and some DLP vendors would flag it. By using some simple checks, though, we can determine that this number is not a valid SSN (it has never been issued by the Social Security Administration) and ignore it. Conversely, a number in the form “SSN: xxx-xx-xxxx” has a couple of “secondary attributes” (the “SSN:” and the two dashes) that, if identified, let us be more confident that it is a real SSN. By using a proprietary scoring system, and fairly complex rules that help us identify primary and secondary attributes, we are able to virtually eliminate false positives. Also, although this was a simple example, please note that we can do the same thing via custom policies to identify almost any type of information that might be contained in an email message. Mailstream Manager is very flexible.
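To make the primary/secondary attribute idea concrete, here is a rough Python sketch. It is emphatically not our proprietary scoring system: the weights, the 30-character look-back window, and the function names are assumptions chosen for illustration. The structural checks reflect numbers the Social Security Administration does not assign (area 000, 666, or 900 to 999; group 00; serial 0000). Those rules alone do not reject 123456789, so the sketch also uses a small deny-list of well-known placeholder numbers, which is likewise an assumption.

```python
import re

# Primary attribute: nine digits, optionally split 3-2-4 by dashes.
CANDIDATE = re.compile(r"(?<!\d)(\d{3})(-?)(\d{2})(-?)(\d{4})(?!\d)")

# Secondary attribute: a label shortly before the number.
LABEL = re.compile(r"\b(ssn|social security)\b", re.IGNORECASE)

# Hypothetical deny-list of well-known placeholder or publicized numbers.
KNOWN_BOGUS = {"123456789", "078051120"}


def structurally_valid(area, group, serial):
    """Reject numbers that cannot be real SSNs."""
    if area == "000" or area == "666" or area >= "900":
        return False
    if group == "00" or serial == "0000":
        return False
    return area + group + serial not in KNOWN_BOGUS


def score_ssn_candidates(text):
    """Return (matched_text, score) for each plausible SSN in text."""
    results = []
    for m in CANDIDATE.finditer(text):
        area, dash1, group, dash2, serial = m.groups()
        if not structurally_valid(area, group, serial):
            continue  # business-logic check: ignore impossible numbers
        score = 1  # primary attribute: a plausible nine-digit number
        if dash1 == "-" and dash2 == "-":
            score += 2  # secondary attribute: both dashes present
        window = text[max(0, m.start() - 30):m.start()]
        if LABEL.search(window):
            score += 3  # secondary attribute: "SSN" label nearby
        results.append((m.group(0), score))
    return results
```

For instance, in the text `Ref 123456789. SSN: 512-34-5678. Acct 900123456. Id 512345678`, the first and third numbers are discarded outright, the labeled and dashed number scores 6, and the bare nine-digit number scores only 1. A policy can then act only on candidates above a chosen score, which is what keeps stray 9-digit account or reference numbers from flooding the Compliance Team.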
False negatives are also, as you point out, a real issue. However, primarily because of the good work done in support of anti-spam engines over the last several years, it is very difficult to “miss” content in an email message that you are actively looking for. In lab testing we have a false negative rate, with our tuned policies going against known content, of zero.
The key, of course, is knowing what you are looking for, and knowing how to describe it within the capabilities of the policy engine…which is the province of our policy guru, Daniel Hedrick.
I hope this addressed your questions.
All the best,
Bob
p.s. Sendmail received the Purchase Order for this project from the bank yesterday.