Most AI providers try to enhance their products by training them with both public information and user data. However, the latter method puts a privacy-conscious company like Apple in a difficult position. How can it improve its Apple Intelligence technology without compromising the privacy of its users? It’s a tough challenge, but the company believes it has found a solution.
Synthetic data vs. real data
OpenAI, Google, Microsoft, and Meta train their products partly by analyzing your chats, with the goal of improving the reliability and accuracy of their AIs by learning from real conversations. While you can generally opt out of this type of data sharing, the process varies from product to product, so the burden falls on you to figure out how to turn it off.
Apple has always prided itself on being more privacy-focused than its tech rivals. To that end, the company has relied on something called synthetic data to train and improve its AI products. Created using Apple’s own large language model (LLM), synthetic data attempts to mimic the essence of real data.
For example, the AI may create a synthetic email that is similar in topic and style to an actual message. The objective is to teach the AI how to summarize that email, a feature already built into Apple Mail.
Apple’s solution: ‘Differential privacy’
The problem with synthetic data is that it can't fully replicate the human touch found in real-world content. This limitation has led Apple to adopt a complementary approach known as differential privacy, a technique that adds calibrated randomness to data so that no individual's contribution can be traced back to them. As described in a blog post Apple published Monday, the company uses it to combine synthetic data with anonymized signals derived from real data. Here's how it works.
Let’s say Apple wants to teach its AI how to summarize an email. The company starts by creating a large number of synthetic emails on various topics. Apple then generates an embedding for each synthetic message to capture key elements such as language, topic, and length. These embeddings are sent to Apple users who have opted into analytics sharing on their devices.
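An embedding is just a vector of numbers that summarizes properties of a piece of text. Apple hasn't published its embedding model, so the Swift sketch below is deliberately simplified, with a few hand-picked features standing in for what would really be the output of a learned model:

```swift
import Foundation

// A toy "embedding": a fixed-length vector of simple text features.
// Apple's real embeddings come from a learned model; these hand-picked
// features are hypothetical and only meant to show the shape of the data.
func toyEmbedding(of email: String) -> [Double] {
    let words = email.lowercased().split(separator: " ")
    let charCount = Double(email.count)
    let wordCount = Double(words.count)
    let avgWordLength = wordCount > 0 ? charCount / wordCount : 0
    let questionMarks = Double(email.filter { $0 == "?" }.count)
    return [charCount, wordCount, avgWordLength, questionMarks]
}

// Two emails about the same topic should produce nearby vectors.
let a = toyEmbedding(of: "Want to play tennis at 3 tomorrow?")
let b = toyEmbedding(of: "Free for tennis tomorrow afternoon?")
```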
Each device selects a small sample of actual user emails and generates its own embeddings. The device then determines which synthetic embeddings most closely match the language, topic, and other characteristics of those user emails. Because each device reports its matches through a differentially private channel, Apple learns only which synthetic embeddings were matched most often across many devices, never which device matched what. The company can then curate these top samples to further refine the data or begin using them to train its AI.
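Apple describes this matching step only in prose. A hedged sketch of how a device might score synthetic embeddings against a local email, with a simple cosine-similarity search standing in for whatever Apple actually uses:

```swift
import Foundation

// Cosine similarity: how closely two embedding vectors point in the
// same direction, regardless of their magnitude.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).map { $0 * $1 }.reduce(0, +)
    let magA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
    let magB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
    return dot / (magA * magB)
}

// On-device: find the index of the synthetic embedding that best
// matches one of the user's local email embeddings.
func nearestSyntheticMatch(localEmail: [Double],
                           synthetic: [[Double]]) -> Int {
    let scores = synthetic.map { cosineSimilarity(localEmail, $0) }
    // Indices are compared by their similarity scores; the synthetic
    // list is assumed to be non-empty.
    return scores.indices.max { scores[$0] < scores[$1] }!
}
```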
As one example provided by Apple, imagine that an email about playing tennis is one of the top embeddings. A similar message is generated by replacing “tennis” with “soccer” or another sport and added to the list for curation or training. Altering the topic and other elements of each email helps the AI learn how to create better summaries for a wider variety of messages.
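That substitution step is simple enough to sketch directly. Assuming a plain string swap (Apple hasn't detailed its curation pipeline), it might look like this:

```swift
import Foundation

// Expand a top-matching synthetic email into several variants by
// swapping out its topic, so training covers a wider range of messages.
// The topic list here is made up for illustration.
func expandVariants(of email: String, topic: String,
                    replacements: [String]) -> [String] {
    replacements.map { email.replacingOccurrences(of: topic, with: $0) }
}

let base = "Want to play tennis at 3 p.m. tomorrow?"
let variants = expandVariants(of: base, topic: "tennis",
                              replacements: ["soccer", "basketball"])
// ["Want to play soccer at 3 p.m. tomorrow?",
//  "Want to play basketball at 3 p.m. tomorrow?"]
```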
How is Apple protecting privacy?
If you're wondering how this process actually protects your privacy, start with the fact that device analytics sharing is turned off by default, so only users who opt in participate in training at all.
You can view and change this setting on any Apple device. Go to Settings (System Settings on a Mac) and select Privacy & Security. Scroll to the bottom of the screen and tap Analytics & Improvements. You'll see the name and description of each option so you can decide which data, if any, you want to share. The full range of options will be available with iOS/iPadOS 18.5 and macOS 15.5.
Furthermore, the data from the sampled user emails never leaves the device and is never shared with Apple. The device of someone who has opted in to analytics sharing sends a signal to the company indicating which synthetic emails are closest to the actual user emails. However, this signal does not reference an IP address, Apple account, or any other data associated with the user.
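The textbook version of this idea is "randomized response": the device sometimes reports its true match and sometimes a random one, so no single report can be pinned on any user, yet the truth emerges in aggregate. The sketch below is a minimal illustration of that mechanism, not Apple's production algorithm, which uses more carefully calibrated noise:

```swift
import Foundation

// Randomized response, a basic local differential privacy mechanism
// (a stand-in for Apple's actual system): most of the time the device
// reports its true best match, but sometimes it reports a random index.
func privatize(trueMatch: Int, candidateCount: Int) -> Int {
    if Double.random(in: 0..<1) < 0.75 {
        return trueMatch
    }
    return Int.random(in: 0..<candidateCount)
}

// Across many devices the injected noise averages out: the server can
// see which synthetic emails are genuinely popular, while any single
// report remains plausibly random.
```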
Apple has already been using differential privacy for its Genmoji feature, which uses AI to create custom emoji based on your descriptions. In this case, the company has been able to identify popular prompts and patterns without linking them to specific users. Looking ahead, Apple said it plans to expand its use of differential privacy to other AI features, including Image Playground, Image Wand, Memories Creation, Writing Tools, and Visual Intelligence.
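On the server side, the Genmoji case boils down to a privacy-preserving popularity count: which prompts show up often across many devices, without knowing who sent which. A rough sketch, again not Apple's actual algorithm:

```swift
import Foundation

// Server-side sketch: tally anonymized prompt reports, perturb each
// count with a little random noise, and keep only prompts that clear a
// threshold. Rare (potentially identifying) prompts never surface.
// This is a crude stand-in for the calibrated noise and thresholds a
// real differential privacy deployment would use.
func popularPrompts(reports: [String], threshold: Double) -> [String] {
    var counts: [String: Int] = [:]
    for prompt in reports {
        counts[prompt, default: 0] += 1
    }
    return counts
        .filter { Double($0.value) + Double.random(in: -5...5) >= threshold }
        .map { $0.key }
}
```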
“Building on our many years of experience using techniques like differential privacy, as well as new techniques like synthetic data generation, we are able to improve Apple Intelligence features while protecting user privacy for users who opt in to the device analytics program,” Apple said in its blog post. “These techniques allow Apple to understand overall trends without learning information about any individual, such as what prompts they use or the content of their emails.”