AT2k Design BBS Message Area
Area: Slashdot (Local Database), message 83 of 117
From: VRSS
To: All
Subject: Google Releases VaultGemma, Its First Privacy-Preserving LLM
Date/Time: September 16, 2025 9:00 AM

Feed: Slashdot
Feed Link: https://slashdot.org/
---

Title: Google Releases VaultGemma, Its First Privacy-Preserving LLM

Link: https://yro.slashdot.org/story/25/09/16/00020...

An anonymous reader quotes a report from Ars Technica: The companies seeking
to build larger AI models have been increasingly stymied by a lack of high-
quality training data. As tech firms scour the web for more data to feed
their models, they could increasingly rely on potentially sensitive user
data. A team at Google Research is exploring new techniques to make the
resulting large language models (LLMs) less likely to 'memorize' any of that
content. LLMs have non-deterministic outputs, meaning you can't exactly
predict what they'll say. While the output varies even for identical inputs,
models do sometimes regurgitate something from their training data -- if
trained with personal data, the output could be a violation of user privacy.
In the event copyrighted data makes it into training data (whether
accidentally or on purpose), its appearance in outputs can cause a different
kind of headache for developers.

Differential privacy can prevent such memorization by introducing calibrated
noise during the training phase, but it comes with drawbacks in accuracy and
compute requirements. Until now, no one had quantified the degree to which
those drawbacks alter the scaling laws of AI models. The team worked from the
assumption that model performance is primarily affected by the noise-batch
ratio, which compares the volume of randomized noise to the size of the
original training data. By running experiments with varying model sizes and
noise-batch ratios, the team established a basic understanding of
differential-privacy scaling laws: a balance between the compute budget, the
privacy budget, and the data budget. In short, more noise leads to lower-
quality outputs unless it is offset by a higher compute budget (FLOPs) or
data budget (tokens). The paper details the scaling laws for private LLMs,
which could help developers find an ideal noise-batch ratio to make a model
more private.

This work has led to a new Google model called VaultGemma, the company's
first open-weight model trained with differential privacy to minimize
memorization risks. It is built on the older Gemma 2 foundation and sized at
1 billion parameters, and Google says it performs comparably to non-private
models of similar size. It is available now from Hugging Face and Kaggle.
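The "calibrated noise during the training phase" described above is typically
implemented as DP-SGD: each example's gradient is clipped to a fixed norm
bound, and Gaussian noise scaled to that bound is added to the batch sum
before averaging. A minimal plain-Python sketch of one such step (the
clipping bound, noise multiplier, and toy gradients here are illustrative
assumptions, not values from the paper):

```python
import math
import random

def clip_gradient(grad, max_norm):
    """Scale a per-example gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(per_example_grads, max_norm, noise_multiplier, rng):
    """Sum clipped per-example gradients, add calibrated Gaussian noise, average.

    The noise standard deviation is noise_multiplier * max_norm, so the noise
    is calibrated to the largest influence any single example can have.
    """
    batch = len(per_example_grads)
    dim = len(per_example_grads[0])
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    return [(summed[i] + rng.gauss(0.0, sigma)) / batch for i in range(dim)]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, -0.4], [-6.0, 8.0]]  # toy per-example gradients
noisy = dp_sgd_step(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng)
print(noisy)  # a privatized estimate of the average gradient
```

Because the noise is divided by the batch size, larger batches shrink the
noise-batch ratio. That is the trade-off the scaling-law experiments
quantify: holding privacy fixed, you buy back output quality with more
compute (larger batches, more FLOPs) or more data (tokens).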

Read more of this story at Slashdot.

---
VRSS v2.1.180528


VADV-PHP Copyright © 2002-2025 Steve Winn, Aspect Technologies. All Rights Reserved.
Virtual Advanced Copyright © 1995-1997 Roland De Graaf.
v2.1.250224