Introduction:
I’m less concerned about the potential AI doomsday scenarios some experts warn of and more worried about safeguarding user privacy in AI services like ChatGPT and its competitors. The thought of tech giants or third parties misusing large language models (LLMs) to collect more user data is disconcerting.
This is why I’m opposed to building chatbots into platforms like Facebook Messenger and WhatsApp, and why I noted that Google did not adequately address user privacy during its AI-centric Pixel 8 event.
My concerns appear to be valid. The issue isn’t just that tech giants could exploit LLMs to gather personal information for ad revenue; ChatGPT and similar models turn out to be even more capable than previously believed. A study has revealed that LLMs can deduce personal details even when users never explicitly share them.
Intelligent Data Discovery:
What’s even more troubling is the potential for malicious actors to exploit chatbots to unearth these secrets. With nothing more than seemingly innocuous text samples from a user, one could discern their location, occupation, or even their race. Considering how early we still are in the development of artificial intelligence, the study underscores the need for stronger privacy protections in services like ChatGPT.
It’s worth noting that ChatGPT launched without robust privacy protections for users, and its safeguards remain limited. It took OpenAI several months just to add a setting that lets ChatGPT users prevent their conversations with the chatbot from being used for training.
Fast-forward to early October: researchers from ETH Zurich published a study highlighting the privacy risks that come with ChatGPT and similar products now being widely accessible. They demonstrated that even seemingly innocuous comments posted online, devoid of any personal information, can be used to infer a user’s location when processed by OpenAI’s GPT-4, the most advanced ChatGPT engine.
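To make that concrete, here is a minimal sketch of what such an attribute-inference query could look like, assuming the OpenAI Python SDK; the sample comment, the prompt wording, and the model choice are illustrative assumptions on my part, not the researchers’ actual code.

```python
# Illustrative sketch only: asking GPT-4 to guess an author's likely location
# from a seemingly innocuous comment. Assumes the OpenAI Python SDK (v1+) is
# installed and the OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()

# A hypothetical comment with no explicit location in it.
comment = (
    "There is this nasty intersection on my commute where I always get stuck "
    "waiting for a hook turn while the tram rattles past."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "Guess the author's most likely city from the text and "
                "briefly explain the clues you used."
            ),
        },
        {"role": "user", "content": comment},
    ],
)

print(response.choices[0].message.content)
# The phrase "hook turn" plus the tram reference points strongly to Melbourne,
# even though the author never named a place. That is the kind of leakage the
# study measured at scale.
```

Nothing in the prompt is exotic; the privacy risk comes entirely from what the model already knows about the world.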
The study used text snippets from more than 500 Reddit profiles, and GPT-4 correctly inferred private attributes with an accuracy between 85% and 95%. For instance, an LLM could deduce a user’s race with high confidence from a mere mention of living near a restaurant in New York City, by combining that detail with population statistics for the surrounding area.
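To see why a single neighborhood mention can carry so much signal, here is a rough illustration of the base-rate reasoning involved; the demographic figures below are made up purely for the example and do not come from the study or any census.

```python
# Hypothetical numbers, used only to illustrate how area-level statistics can
# turn an innocuous detail into a confident demographic guess.
p_group_citywide = 0.15    # assumed share of the group across the whole city
p_group_near_spot = 0.70   # assumed share in the small area around the restaurant

# Before reading the comment, guessing that the author belongs to the group is
# right about 15% of the time. After the author mentions living near that
# restaurant, the same guess is right about 70% of the time.
print(f"Prior (citywide base rate):          {p_group_citywide:.0%}")
print(f"Posterior (given the neighborhood):  {p_group_near_spot:.0%}")
```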
Tech giants are already developing personal AI features, such as Fitbit apps that analyze a user’s training performance from personal data points the user provides. The study’s findings, however, rest on far simpler inputs: ordinary text that can expose personal details, including sensitive information like health data, that users never explicitly shared with the AI.
The concerns go beyond tech giants using LLMs to boost ad revenue. Malicious actors could use publicly available LLMs to uncover details about a target’s race or location, or steer conversations so that targets inadvertently reveal personal information. Repressive regimes could likewise use LLMs to identify and target dissidents.
The study’s authors emphasize the need for a broader discussion of the privacy implications of LLMs and for more comprehensive privacy protections. They have been in active discussions with the companies whose LLMs they used in their research, including OpenAI, Google, Meta, and Anthropic.
Conclusion:
As a supporter of AI services like ChatGPT, I hope to see more substantial discussions about user privacy and the implementation of built-in safeguards in ChatGPT and similar platforms to prevent the misuse of personal data.