Evaluating Born-in-the-Cloud DLP Solutions: Is Proxy- or API-Based Right for You?

David Politis

March 12, 2015

6 minute read

If you’re an IT admin at a sizeable organization, there’s a good chance you need to comply with a data security standard like PCI, HIPAA, FISMA, FERPA, or SOX. From a regulatory standpoint, these standards are essential. From a technology and infrastructure standpoint, they can be challenging, particularly when it comes to answering the question: “How do you make sure your employees aren’t sharing sensitive information?”

That’s the crux of data loss prevention (DLP), a core part of IT infrastructure for the past decade. With legacy on-premises systems, solutions to this problem came with great cost, complexity, and inconvenience. With newer cloud office systems, the paradigm has shifted significantly for both IT and DLP providers, creating an entirely new playing field.

At BetterCloud, we’ve worked with thousands of large enterprises that have implemented cloud office systems, and we have gained a deep understanding of how DLP solutions have evolved in this context. The landscape is complex but evolving in exciting ways that make it possible to provide more value to customers at lower costs.

Today, we announced our DLP solution for Google Apps, which is now available to BetterCloud Enterprise customers. While developing this solution, we received a lot of questions from customers, analysts, and investors about how it works, why it’s different, and what it ultimately means to BetterCloud and our customers. This post explains the major differences between the cloud-based DLP solutions available, and we believe it will help customers better understand the landscape as they evaluate different options.

Cloud-Based Proxy

Cloud-based proxy providers apply the architecture of legacy network proxies to solutions purpose-built for the cloud, offering a subscription-based model in which they host and manage network proxies on behalf of customers.

How It Works

Data coming from any third-party cloud application that the service has authenticated or whitelisted is routed through a secure proxy, where access to that cloud app can be inspected and controlled. Because the proxy acts as an intermediary for third-party requests to connect with your data, organizations gain more holistic visibility into the types of data being shared.

Pros

Using a proxy to intercept the flow of data into and out of your organization blankets all integrated cloud services and helps to address concerns about the security and regulatory compliance practices of third-party vendors. This level of control, knowing that users are accessing their cloud applications through the secure proxy no matter the device or product, can allow organizations to implement an otherwise prohibitive BYOD policy. Proxy-based solutions often cover the most popular apps, but they frequently require cumbersome setup.

In the case of BYOD, all activity on the user’s device, personal or work-related, is routed through the solution provider’s proxy, potentially causing frustratingly slow load times for end users on the go, not to mention a host of privacy concerns.

Cons

But when it comes to cloud-based proxying, there’s one critical downside: it hinges on uptime. If the provider’s servers go down, you’re done. Your users can lose access to their email and documents, and even if they manage to circumvent the downed proxy through a back door (another ever-present risk), any activity that occurs during downtime is essentially unregulated and never passes through the DLP system.

Proxy-based solutions offer more protection against third-party attacks, but slower load times and a single point of failure.

Though no vendor is immune to the occasional service disruption, proxy-based solutions are, by design, especially susceptible. With only a surface-deep relationship to your data, the solution is blind to requests for access when the proxy is down, leaving the door open for malicious activity and giving IT no ability to remediate this activity at a later point in time. Proxy DLP solutions also often interfere with the functionality of a cloud-based application. This can cause issues for your users when they try to complete tasks within the application, and it sometimes leads organizations to disable a portion of an application to accommodate the network proxy.

Cost

Proxy-based solutions can also cause crippling performance issues, since users’ data usage and activity on the web must pass through the proxy server, an extra step that can potentially add seconds to load time. To compensate for the delay, implementing this type of solution can require additional bandwidth that isn’t factored into the cost of the solution itself. And if the performance issues aren’t addressed? Don’t be surprised if users look for a way around the proxy. Couple hidden bandwidth costs with a list price averaging anywhere from $60 to $100 per user per year, and both the technical implications and price tag of a proxy-based solution seem steep.

API-Based

Whereas proxy-based systems take a broad and shallow approach to DLP—sitting between the source of data and the person or service requesting access—API-based solutions do the opposite, hooking deeply into your data at its source.

How It Works

An API (or application programming interface) is essentially a channel into a piece of software or a service, making communication and integration with third-party tools possible. For a cloud platform like Google Apps, APIs allow vendors to connect with, manipulate, and enhance the platform, and to build functionality it does not natively offer, such as DLP.

Not all API-based solutions are equal, but the best ones promise flexibility, simple setup, and reasonable cost.

API-based solutions do not force your data through an intermediary proxy and instead connect with and protect your data at its source.

Pros

API access allows solution providers to connect with your data at its source, including even metadata that is never surfaced through a UI (and thus lost on proxies). From a data writing perspective, API-based solutions also afford IT the ability to make changes to their data. This is another important distinction from proxying, which simply blocks access to a document containing sensitive data without the option to take corrective action. For example, if an employee shares a spreadsheet containing social security numbers, a proxy would detect the SSNs in real time and lock access to the document. An API-based solution could also quickly detect the sharing, but it goes a step further by allowing IT to change the document’s sharing settings or owner, or to edit its content.

Other significant advantages of API-based DLP are ease of setup and lack of complexity. Once the customer grants API access to the platform in question, the solution is up and running. Changes to firewall or network settings, or to where you send traffic, are not required. IT can safely test and implement these solutions without disrupting users’ work.
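To make the spreadsheet example concrete, here is a minimal Python sketch of detect-then-remediate logic. It is an illustration under stated assumptions, not how any particular product works: the `Document` class, the `remediate` function, and the simple `NNN-NN-NNNN` pattern are all hypothetical, and a real solution would call the platform’s own APIs to read content and change sharing.

```python
import re
from dataclasses import dataclass, field

# Assumed pattern: SSN-like strings written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class Document:
    """Hypothetical stand-in for a cloud file; not a real Google Apps object."""
    title: str
    content: str
    owner: str
    shared_with: list = field(default_factory=list)


def find_ssns(text):
    """Return every SSN-like string found in the text."""
    return SSN_PATTERN.findall(text)


def remediate(doc, allowed_domain):
    """If the document contains SSN-like strings, unshare it from external users.

    Returns the list of collaborators that were removed.
    """
    if not find_ssns(doc.content):
        return []
    suffix = "@" + allowed_domain
    removed = [user for user in doc.shared_with if not user.endswith(suffix)]
    doc.shared_with = [user for user in doc.shared_with if user.endswith(suffix)]
    return removed


doc = Document(
    title="Payroll",
    content="Jane Doe, 123-45-6789",
    owner="hr@example.com",
    shared_with=["cfo@example.com", "friend@gmail.com"],
)
print(remediate(doc, "example.com"))  # collaborators outside example.com
print(doc.shared_with)                # collaborators that remain
```

The point of the sketch is the second step: rather than only blocking access, the document itself is changed (here, its sharing list) while internal collaborators keep working.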

Cons

Until recently, most API-based DLP solutions have relied on time-based scans (commonly running every 24 hours), which can leave time for malicious activity to occur before you even realize what’s happened. For that reason, not all API-based solutions are created equal. The products fall into two major camps, those built on scheduled APIs and those built on real-time APIs, and the difference between the two can be the difference between effective DLP and effectively none at all.

Scheduled vs. Real-Time API-Based Solutions

An API call, or a request to scan your data for changes, that is scheduled once per day leaves a large window for malicious activity to occur undetected. A user could take any number of actions that cause a great deal of harm and data loss in the time between scheduled syncs.

With a real-time API-based solution, any sensitive or personally identifiable information is detected as soon as it is exposed, giving IT time to react, identify the user, and take corrective action.
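The size of that window is simple arithmetic, sketched below in Python. The function names are hypothetical, the scan is assumed to run at fixed multiples of the interval, and the 30-second notification latency is an illustrative assumption rather than a measured figure.

```python
import math


def scheduled_detection_delay(event_hour, scan_interval_hours=24):
    """Hours from an event until the next scheduled scan.

    Assumes scans run at hours 0, interval, 2 * interval, and so on.
    """
    next_scan = math.ceil(event_hour / scan_interval_hours) * scan_interval_hours
    return next_scan - event_hour


def realtime_detection_delay(notification_latency_seconds):
    """Hours from an event until a change notification arrives (latency is assumed)."""
    return notification_latency_seconds / 3600


# A file shared one hour after the daily scan waits 23 hours to be seen.
print(scheduled_detection_delay(event_hour=1))
# With an assumed 30-second notification latency, the wait is under a minute.
print(realtime_detection_delay(30))
```

In the worst case, an event that lands just after a scan goes undetected for nearly the full interval, which is why the scan frequency matters so much for scheduled solutions.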

Real-time APIs are fairly new, and since vendors often integrate with several APIs to build a single product, go-to-market solutions relying solely on real-time APIs are hard to come by. As cloud service providers transition their APIs to listen for changes in real time, solutions leveraging them will become more and more common in the coming years.

Cost

Despite the deep API integration and development effort required to build this type of solution, API-based products also remain the best value, ranging from $10 to $24 per user per year.
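For a rough sense of scale, the list prices quoted in this post can be multiplied out for an organization of a given size. The short Python sketch below does that; the 1,000-user headcount is a hypothetical example, and the figures exclude the hidden bandwidth costs mentioned for proxies.

```python
# Per-user, per-year list price ranges quoted in this post (USD).
PROXY_PRICE_RANGE = (60, 100)
API_PRICE_RANGE = (10, 24)


def annual_cost_range(users, price_range):
    """Return the (low, high) annual list cost for a given number of users."""
    low, high = price_range
    return users * low, users * high


users = 1000  # hypothetical organization size
print("Proxy-based:", annual_cost_range(users, PROXY_PRICE_RANGE))
print("API-based:  ", annual_cost_range(users, API_PRICE_RANGE))
```

At 1,000 users, this works out to $60,000 to $100,000 per year for a proxy-based product versus $10,000 to $24,000 for an API-based one.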

Final Thoughts

Having built our own DLP solution for Google Apps using Google’s Drive Activity Report and Push Notifications APIs, we’re somewhat biased. But before the first line of code was written, we spent significant time researching all of the options and painting a full picture of the DLP-delivery landscape. If you’re interested in learning more about our DLP solution for Google Apps or just speaking in more detail about how the DLP solutions on the market compare, feel free to drop us a note.
