Confidentiality for Translators and Interpreters in the Age of AI

Speech recognition, artificial intelligence, and other computer-assisted tools can save us time, improve quality and boost productivity.

But... what about confidentiality?

In this article, we’ll take a closer look at how leading terminology, speech recognition and other AI tools for translators and interpreters address confidentiality and data protection.

If it’s online, it’s never 100% confidential

To be clear: No matter how much protection a tool supposedly offers, with so many moving parts, data sent over the internet is never truly safe.

But the internet has permeated our businesses. 

Email. Online collaboration. Remote conferencing platforms.

Every cloud-based tool shares and stores information over the internet. What’s a language professional to do?

First, remember that online companies are incentivised to keep customer data safe. It should be in their interest to avoid leaks and protect information through good security practices such as strong encryption. Before you sign up for a service, do some due diligence on the provider’s security track record.

Second, take control of keeping your data safer: use a password manager like 1Password, turn on two-factor authentication wherever it’s available, and use a good Virtual Private Network (VPN) tool like TunnelBear when you access the internet through unsecured public Wi-Fi in coffee shops or airports.

(If you’re an insiders member, check out our in-depth training on Perfect Passwords and learn how to create, store, and access strong passwords that protect your accounts and your clients’ data.)

Third, read on for more about which tools offer the best data protection. 😉

Terminology tools 

Except for extremely client-specific terms, terminology is not usually confidential. After all, it’s basically a list of words related to a specific subject. Therefore, sharing term lists with a colleague you’re working with is generally safe. 

Trickier issues arise when using terminology extraction technology with private or confidential documents. Let’s see how the leading tools address this aspect of confidentiality.

InterpretBank

InterpretBank is the only widely used tool that extracts terminology on your device without connecting to the cloud.

InterpretBank also includes translation suggestions for individual entries or for an entire glossary. This feature sends terms to a server and requests translations; neither searches nor results are logged. You can also disable the program’s web access and only use offline resources included in InterpretBank, like IATE.

If you choose to share a glossary through InterpretBank, it is uploaded to your account in the cloud, but according to the company, they “do not pass any of your collected information to third parties.”

InterpretBank’s privacy page lacks information about how data is processed when working with the cloud-based Automatic Speech Recognition feature. As it most likely connects to commercial speech recognition engines, it should not be used with internal or confidential content.

BoothMate

The terminology extraction feature built into BoothMate is simple but efficient.

Copy a text and its translation into the source and target language columns. Read through the text, highlight terms and their translations, and add them to your glossary by hitting Enter.

According to the developer, texts copied into the terminology extraction interface are never uploaded to the web, so your data is fully confidential.

Sketch Engine

Sketch Engine’s OneClick Terms is one of the most powerful terminology extraction tools on the market. (Learn more in my blog post, How to use Sketch Engine to extract terminology from a document or parallel texts in just a few clicks.)

OneClick Terms offers high-quality terminology extraction by comparing term candidates in your document(s) to huge reference corpora. This requires the power of cloud computing, but there’s good news: According to the developers, your data is processed in a secure data center, is never shared, and is automatically deleted after three days. 
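
Curious what “comparing term candidates to a reference corpus” looks like in practice? Here is a minimal Python sketch of the general idea. It is not the tool’s actual code or algorithm: the function names (tokenize, rank_terms), the smoothing value, and the tiny sample texts are all assumptions made up for illustration. Words that are far more frequent in your document than in general language float to the top as term candidates.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words.

    Simplification: only the letters a-z are kept, so accented
    characters and multi-word terms are not handled here.
    """
    return re.findall(r"[a-z]+", text.lower())


def rank_terms(document, reference, smoothing=1.0, top=5):
    """Rank words by how typical they are of `document` vs. `reference`.

    Hypothetical helper: the score is the ratio of the two relative
    frequencies (per million words), with a smoothing constant added
    so that words missing from the reference do not divide by zero.
    """
    doc_counts = Counter(tokenize(document))
    ref_counts = Counter(tokenize(reference))
    doc_total = max(sum(doc_counts.values()), 1)
    ref_total = max(sum(ref_counts.values()), 1)

    scores = {}
    for word, count in doc_counts.items():
        doc_freq = count * 1_000_000 / doc_total
        ref_freq = ref_counts[word] * 1_000_000 / ref_total
        scores[word] = (doc_freq + smoothing) / (ref_freq + smoothing)

    # Highest score first: these are the best term candidates.
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Made-up sample texts, for illustration only.
document = (
    "The arbitration clause governs arbitration between the parties. "
    "The arbitration panel decides."
)
reference = (
    "The cat sat on the mat. The dog and the cat met the parties "
    "between the trees."
)

candidates = rank_terms(document, reference)
print(candidates)
```

In this toy example, a specialised word like “arbitration” ranks first, while an everyday word like “the” drops out of the list. Real tools go much further (multi-word terms, grammar-aware candidates, huge reference corpora), which is why this kind of extraction typically runs in the cloud.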

The company behind Sketch Engine, Lexical Computing, is also certified under ISO 27001. (Learn more about this certification here.)

Speech recognition tools

As speech recognition continues to improve, more interpreters are incorporating it into their work. (Here’s an example from insiders member Lilia Pino Blouin.) 

Most speech recognition takes place online, since significant computing power is required to obtain high-quality results. 

(Programs like Dragon Dictate and Cabolo do provide offline speech recognition, but Dragon only supports a small number of languages, must be trained for a single user’s voice, and only runs on Windows, while Cabolo targets corporate users and is probably too expensive for individual interpreters.)

Tools designed for the general public, like Otter.ai and Maestra, use commercial speech recognition engines. It’s probably fine to use them for public meetings, but I’d avoid them in confidential settings unless your client explicitly permits it.

The newest computer-assisted interpreting tool, Cymo Note, is based on speech recognition. The developers address confidentiality by not storing your data on their servers and referring you to the privacy policies for each of the available speech recognition engines. 

In general, I recommend educating clients about how speech recognition can help you produce high-quality work for their benefit, and requesting their explicit written permission to use such tools.

Artificial intelligence

AI tools like ChatGPT, Claude, Copilot, Gemini, and NotebookLM are taking the world by storm, and language professionals are no exception. Most AI tools collect interaction data by default, so it’s important to review your settings before using them with client content.

ChatGPT

There are tons of great ways to use ChatGPT while keeping your data – and your clients’ data – safe. For example, use ChatGPT to power up your vocab, write emails, generate interpreting practice speeches, or find, win and keep your best clients.

But before you do so… make one quick tweak to your settings!

When you first set up a ChatGPT account, the “Improve the model for everyone” setting is turned on by default, which means the information you share with ChatGPT can be used to train the model! 🤯

But you can toggle this off in seconds to exclude your conversations from the training data.

To do so:

  • Click your profile icon

  • Select Settings

  • Go to Data Controls

  • Turn off “Improve the model for everyone”

Pro tip: Use “Temporary Chat” for conversations you don’t want ChatGPT to keep a record of.

To start a Temporary Chat:

  • Click the Temporary Chat icon, a dotted speech bubble in the top-right corner

Claude

Claude is great at processing lengthy documents and generating natural-sounding text. You can use it as a writing assistant, a proofreader, a QA specialist, and more.

While considered one of the more ethical AI chatbots – it’s built by Anthropic, a Public Benefit Corporation legally committed to not sharing or selling your data – it may still use your conversations to improve its models.

To opt out:

  • Click your profile icon

  • Go to Settings

  • Select Privacy

  • Turn off “Help improve Claude”

Copilot

Copilot helps you work faster and smarter in Microsoft 365 apps and in Edge.

As a Microsoft product, it’s covered by the company’s Services Agreement and Privacy Statement, which state that Microsoft collects information about your interactions to improve its services.

To manage your privacy settings in Copilot:

  • Click your username

  • Click Privacy

  • (Optional:) Toggle off “Model training on text,” “Model training on voice,” and “Personalization and memory”

  • (Optional:) Click “Delete Memory”

If using Copilot in Edge:

  • Click the Copilot icon on the top right

  • Click Settings

  • Click Privacy

  • (Optional:) Toggle off Context clues

Gemini

Gemini is useful for vocabulary research, terminology extraction, glossary creation, translation, and more.

But since Google uses conversations to improve its models, it’s best to avoid sharing confidential information and to adjust your activity settings.

To do so:

  • Open Gemini

  • Click Settings & help (gear icon, bottom left)

  • Select Activity

  • Turn off “Keep Activity”

  • (Optional:) Delete your Activity for the last hour, last day, all time, or custom range

  • (Optional:) Set an auto-delete schedule for your Gemini Apps activity (older than 3 months, 18 months, or 36 months)

NotebookLM

Unlike other AI tools that pull from the internet or broad data models, NotebookLM works exclusively with the files you upload. But how safe is it to share client materials?

Google states that NotebookLM does not use personal data (including your source uploads) to train its models. However, it also notes that human reviewers may review queries, uploads, and responses to improve the service.

If you’re using a standard Google account (rather than Google Workspace or Google Workspace for Education), avoid sharing sensitive or confidential materials, and do not upload copyrighted content unless you have permission.

Readwise Reader 

I’m a huge fan of the AI-powered Ghostreader feature in Readwise Reader. Add a publicly available article, PDF, or video to your Reader, then use Ghostreader to define words in context, look up people & places, simplify complex language, and summarize paragraphs or entire documents. You can also generate lists of terms or key ideas in a flash, or extract terminology from your text. (To learn how, check out my insiders webinar on AI-Powered Preparation with Readwise Reader.)

As Readwise’s Privacy Policy does not address how the tool’s artificial intelligence features handle your data, I wouldn’t upload any confidential documents to the service. However, it’s a great tool for publicly available documents or videos.

Notion AI

Released in February 2023, Notion AI is an incredibly powerful preparation tool. Create lists of terms, acronyms, and definitions in one or more languages, see how words are used in context, extract terminology, align texts or compile fact sheets. (Learn more in my insiders webinar, AI-Powered Preparation with Notion.)

The best part? Notion’s strong security policy for AI-powered features, which explicitly states: “The Notion AI Writing Suite will not use your data to train our models...We do not allow any partners or 3rd parties to use your data for training their models or any other purpose.”

Notion also has strong security practices, including encryption, quarterly independent security audits, and GDPR compliance.

As a result, Notion AI might just be the safest, easiest, and most affordable way for translators and interpreters to tap into the power of ChatGPT without their data being used to train the model.

Machine translation

Machine translation tools can significantly streamline our work, offering a first version of a text we’re translating or a speech we’ll interpret.

The same principles outlined above apply to machine translation: Do not simply hand over your valuable information to algorithm training. Instead, pick machine translation engines where your confidential information is explicitly excluded from training data.

Google Translate

Do not use Google Translate for sensitive information.

Everything you translate with Google Translate is stored and analyzed by Google. Google’s Terms of Service explicitly grant the company the right to use “automated systems and algorithms to analyze your content...to recognize patterns in data.”

DeepL

DeepL offers a free plan and several paid options. DeepL’s terms explicitly state that data processed using the Free version of the service may be stored. Opt instead for a Pro license, where data is encrypted end-to-end, text is immediately deleted after translation, and “DeepL Pro subscribers’ texts are not used to train our models.” (You’ll get plenty of other perks, too, like DeepL plug-ins for a range of CAT tools.)

Artificial intelligence and machine translation

Do not use Google Translate or the free version of DeepL to translate confidential material. As explained above, your information may be used to train the machine.

Instead, opt for a safer option. Set up your own account with the AI tool(s) of your choice, and turn off “Improve the model for everyone” (or the tool’s equivalent model-training setting).

Final thoughts on data protection in the age of AI

Confidentiality is the cornerstone of translation and interpreting ethics. This should also extend to the tools we use to do our work. 

Some computer-assisted interpreting tools offer much stronger data protection than others.

Your client’s data is safe with features that work offline, like the term extraction in InterpretBank and BoothMate.

Speech recognition requires considerable computing power and is nearly always web-based. Do not use it for confidential meetings unless your client explicitly approves.

If using machine translation, opt for a paid version where your data does not train the machine.

And if you’re going to use artificial intelligence, turn off “Improve the model for everyone” (or your tool’s equivalent setting) to keep your content out of training data.

Finally, educate your clients. Explain how AI tools can help you work better, as well as how you and these companies protect their data. Get explicit written permission to use speech recognition or other artificial intelligence technology, and where possible, include this in your contracts. It’s always better to be safe than sorry! 

Disclaimer: This article does not constitute legal advice. Consult a local attorney to obtain legal advice about your personal situation and your country’s legal system.

By making use of the information provided herein in any way whatsoever, you waive all claims of liability of any and every nature whatsoever against techforword Sàrl and Joshua Goldsmith.
