Protecting SaaS data in an AI-powered world

Joel Hans | Last updated on April 26, 2024 | 7 minute read

As a service that helps folks back up and restore terabytes of data every week, we’ve seen data loss of all stripes. Through countless chats with customers, we’ve learned that 40% of SaaS users have suffered data loss, making your odds of keeping your data fully intact not much better than a coin flip.

We’ve also validated that nearly 90% of all data loss incidents come from human error and unclear communication, like a malicious developer on their way out the door or a well-meaning employee who accidentally deletes a production database. The next-biggest threat after human error? Third-party integrations gone rogue.

Those data loss events have waxed and waned over the years, but the pattern itself has never truly been disrupted. The rapid adoption of AI tools capable of overwriting your SaaS data threatens to do just that. And with 85% of IT pros still not understanding the true implications of the Shared Responsibility Model for protecting SaaS data, that data has never been more at risk of loss or corruption…

Not at the hands of humans, but through the stealthy and dangerously fast meddling of AI.

Data exposure or leakage to AIs: A known problem

Today, most conversations around the risks of AI involve exposure or leakage.

According to a recent report from Netskope, enterprise AI app usage in mid-2023 rose by 22.5% in just two months, with organizations of more than 1,000 employees using 3-5 AI apps daily. The risk of employees accidentally leaking sensitive data while hoping to extract insights or get feedback is rising.

This is a real danger: the same report found that 22 out of every 10,000 enterprise users sent ChatGPT queries containing confidential code, totaling 158 such incidents per month. AI tools can quickly tip from “helpful assistant” to “friendly-looking backdoor in the corporate firewall,” putting you at risk of accidental disclosure, breaches of personally identifiable information (PII), or leaks of passwords and API keys, all of which make an attacker’s job far easier.

Or gift-wrap all the data they wanted in the first place.

Recent research from the Offensive AI Research Lab and Salt Labs demonstrates the viability of other data exposure risks. In the former, attackers can capture packets containing the length of each token the AI streams to you, then use a different LLM to infer the content with surprising accuracy. In the latter, vulnerabilities in how ChatGPT handled plugins created situations where you could easily and unknowingly authenticate an attacker’s fake plugin via OAuth.

Whatever the means, the risk is clear: the data you send to and receive from AI tools is a tempting place for attackers to eavesdrop. It’s also an active risk: by typing a question or pasting code into an AI tool’s prompt, you implicitly agree to let the AI process, and potentially store, that information. At least this is a danger you can understand. You can train yourself and others to use AI more carefully, or enforce policies that punish offenders.

But what happens when you willingly open the door to AI? That’s the conversation we’re eager to start.

AI-caused data loss: A new, unconsidered threat

Some AIs only need to read your data. An AI assistant might hook into your customer relationship management (CRM) SaaS to analyze prospects who are most likely to close or suggest new touchpoints with long-term customers. With these tools, the biggest risk is data exposure or leakage of confidential PII.

An AI tool can only manipulate or destroy your SaaS data if it has write access. That might seem like an easy line to hold, but many of the SaaS platforms your organization uses today already have AI assistants built in.

Others have app marketplaces, like the Shopify App Store, where you can install AI tools with a click or two. Start a search for “YOUR_SAAS_APP + AI,” and you’ll find dozens of services that promise to reorganize your data or improve your efficiency… if only you approve a simple OAuth request, giving it unabridged write access to your data.
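To make that risk concrete, here is a minimal sketch of auditing which installed apps hold write-level OAuth scopes. The app names and scope strings are hypothetical, loosely modeled on Shopify-style read_*/write_* scopes; where the inventory of granted scopes comes from depends on your SaaS platform’s admin or OAuth tooling.

```python
# Hypothetical inventory of OAuth scopes granted to third-party (AI) apps.
# A real audit would pull this list from your SaaS platform's admin/OAuth API.
GRANTED_SCOPES = {
    "ai-product-describer": ["read_products", "write_products", "write_content"],
    "ai-crm-enricher": ["read_customers"],
}

# Scope prefixes that imply the app can modify or delete data, not just read it.
DESTRUCTIVE_PREFIXES = ("write_", "delete_", "manage_")

def flag_write_access(granted: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return only the apps (and scopes) that can change your data."""
    risky = {}
    for app, scopes in granted.items():
        write_scopes = [s for s in scopes if s.startswith(DESTRUCTIVE_PREFIXES)]
        if write_scopes:
            risky[app] = write_scopes
    return risky

if __name__ == "__main__":
    for app, scopes in flag_write_access(GRANTED_SCOPES).items():
        print(f"{app} can modify your data via: {', '.join(scopes)}")
```

Even a crude check like this makes it obvious which AI integrations could change or delete your data rather than merely read it.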

Folks have heard many warnings from data protection and security experts about not moving data outside the organization and about staying vigilant against phishing attacks. Still, we haven’t yet learned the collective lesson about letting AIs run amok, not just in a browser tab but within our vital SaaS platforms.

AIs already have a history of producing bizarre, misleading, or downright unsettling outputs. With write access, AI tools have enormous potential to write those flawed outputs directly into your SaaS data. The risks range from obvious to insidious:

  • Accidental deletion of data, large and small.
  • Unchecked manipulation, such as letting AI tools automatically rewrite Shopify product descriptions or swap the backgrounds of product images, ultimately producing results you never wanted.
  • Bizarre/misleading/unsettling output written directly onto your SaaS data.
  • An AI that works exactly as expected, overwriting your CRM data with a structure it thinks is “better” but you despise.

At face value, these sound like the usual causes of human-caused data loss: low-quality output, or unclear communication that leads someone to run DROP DATABASE on production. But AIs, with all their cloud-based GPU acceleration, operate at a computational speed and scale even the most careless (or malicious) humans could never match. Limited only by processing power and the maximum write speed of your SaaS platform’s database, an AI tool can corrupt or delete your data in milliseconds.

Most humans need at least a few minutes.
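One way to claw some of that speed back, purely as an illustration: don’t give an AI integration direct write access at all, and instead queue its proposed changes behind a human approval step with a hard cap on batch size. Everything below, from the ProposedChange shape to the apply_change stub, is a hypothetical sketch rather than a feature of any particular AI tool or SaaS platform.

```python
from dataclasses import dataclass

@dataclass
class ProposedChange:
    # Hypothetical shape of an edit an AI integration wants to make to a SaaS record.
    record_id: str
    field: str
    old_value: str
    new_value: str

def apply_change(change: ProposedChange) -> None:
    # Stand-in for the real SaaS API call (e.g., updating a single CRM field).
    print(f"Applied: {change.record_id}.{change.field} -> {change.new_value!r}")

def review_and_apply(changes: list[ProposedChange], max_batch: int = 25) -> None:
    """Keep a human between the AI and your data: cap batch size, approve each write."""
    if len(changes) > max_batch:
        raise RuntimeError(
            f"{len(changes)} changes proposed; refusing to review batches over {max_batch}."
        )
    for change in changes:
        prompt = (f"{change.record_id}.{change.field}: "
                  f"{change.old_value!r} -> {change.new_value!r}. Apply? [y/N] ")
        if input(prompt).strip().lower() == "y":
            apply_change(change)
        else:
            print("Skipped.")

if __name__ == "__main__":
    review_and_apply([
        ProposedChange("cust_042", "segment", "Enterprise", "SMB"),
    ])
```

It slows the AI down to human speed on purpose, which is exactly the point.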

You can’t bypass these risks by employing AI experts or picking tools based on open-source large language models (LLMs). You’re always fighting the sheer complexity of the training data and purposeful layers of opacity (in the case of AIs built on ChatGPT, Gemini, or Bard), with few opportunities to proactively understand how these systems “think” so you can prevent data loss. With no way to truly understand why an AI chose to manipulate your data in a particular destructive way, you can’t diagnose the true root cause the way your engineers or IT staff would for a traditional outage and postmortem.

It’s a pervasive and dangerous combination: Our curiosity and fear of missing out on the “AI revolution” meets the discoverability, stealthiness, and speed of some very expensive GPUs.

What tools and resources can prevent AIs from shredding your SaaS data?

You can’t completely mitigate the threat of AI tools, particularly those happily rewriting your vital SaaS data, but you can take some meaningful steps if you:

  • Design and implement a comprehensive Data Loss Prevention (DLP) plan. Get started by bringing on a DLP specialist, prioritizing employee education, and evaluating the many platforms that protect against insider threats and help maintain compliance. (A minimal sketch of one such check follows this list.)
  • Create policies to prevent employees from granting AI tools unfettered access to SaaS data. Data security should be part of your company’s culture, just like health and safety. Regular training should ensure employees understand the risks of allowing AIs to write SaaS data, particularly in light of the Shared Responsibility Model.
  • Fight AIs with AIs. Many data loss prevention platforms, like Nightfall AI, Forcepoint, and others, are using LLMs to identify high-risk data, automate enforcement, identify patterns of unauthorized manipulation, and even reinforce the training you’ve already completed.
  • Buy or build AIs that run on your internal infrastructure/networking. Just like you might host an on-premises edition of Jira, you can partially protect yourself by having more ownership of the LLMs you’re letting into your SaaS data. You’ll be better protected against data exposure and can more easily pull the plug if it corrupts your data.
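As promised in the DLP item above, here is a small, hedged illustration of the kind of check such a plan might include: scanning text bound for a third-party AI tool for obvious secrets before it ever leaves your network. The regex patterns and the send_to_ai function are illustrative assumptions, not any particular vendor’s API; real DLP platforms use far more sophisticated detection.

```python
import re

# Illustrative patterns only; a real DLP rule set would be far broader and vetted.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Private key block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "Bearer token": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]{20,}"),
    "Email address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def send_to_ai(prompt: str) -> None:
    # Hypothetical stand-in for whatever AI assistant or API your team uses.
    print("Prompt forwarded to the AI tool.")

def scan_prompt(prompt: str) -> list[str]:
    """Return the names of any sensitive-data patterns found in the prompt."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(prompt)]

def guarded_send(prompt: str) -> None:
    """Block (or route for review) prompts that appear to contain secrets."""
    findings = scan_prompt(prompt)
    if findings:
        raise ValueError(f"Prompt blocked; possible sensitive data: {', '.join(findings)}")
    send_to_ai(prompt)

if __name__ == "__main__":
    guarded_send("Summarize our public product roadmap for me.")  # forwarded
    try:
        guarded_send("Debug this config: AKIAABCDEFGHIJKLMNOP")   # blocked
    except ValueError as err:
        print(err)
```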

These solutions can help prevent future data loss, but they do nothing for the terrifying moment when all of your roadblocks and guardrails have failed and your vital SaaS data is truly gone. As mentioned before, the Shared Responsibility Model that all SaaS providers operate under means they have no duty to protect your data from AI tools that delete or corrupt it. They often don’t even have the means to do so, however nicely you might ask.

With the usage of AI tools growing at such a frenetic pace, proper SaaS backups are more necessary than ever.

Protecting yourself from AI-caused data loss

The only foolproof protection against AIs manipulating your SaaS data is automated daily backups you can easily restore. Manually exporting CSV or JSON files regularly and then trying to re-upload them after a major data loss incident is just a worthless facade.

Apologies if that feels like a personal condemnation; manual processes have always been inadequate. But against the sheer speed and scope of AIs manipulating data, you’re like a single ant standing brave but ill-fated against an incoming landslide.

Fast-moving and impossible-to-track incidents like AI-caused data loss are exactly why Rewind exists. We’ve already helped thousands of organizations protect themselves against human-caused data loss. AIs may have replaced people as the culprit and dramatically amplified the speed and scope of the threat, but the solution hasn’t changed.

Rewind helps you safeguard your vital SaaS data on over a dozen platforms, like Shopify, GitHub, Jira, Confluence, and Mailchimp, with support for others like HubSpot coming soon. With Rewind, the first time an AI tool runs amok in your data—or the next time, if you’ve been a particularly unlucky early adopter—you’ll be ready to wrestle back control from the LLMs in just a few minutes.


Joel Hans
Joel Hans writes copy and marketing content that energizes startups with the technical and strategic storytelling they need to win developer trust. Learn more about how he helps clients like ngrok, CNCF, Rewind, and others at commitcopy.com.