Model Inversion Attack
A model inversion attack reconstructs sensitive training data by probing a model's outputs, for example by recovering features of the records it was trained on. It is a privacy threat because the model itself can leak the data it learned from.
Also known as: model inversion
A trained model encodes information about its training data, and a model inversion attack tries to pull that information back out. By querying the model and analyzing its responses or confidence scores, an attacker reconstructs features of the underlying records, potentially recovering private attributes the model was never meant to reveal.
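To make the query-and-analyze loop concrete, here is a minimal sketch of one simple form of the attack: guessing a single sensitive attribute. It assumes the attacker already knows a record's non-sensitive features and its label, and can query the model for a confidence score. The model here is a hypothetical fixed logistic regression, and `WEIGHTS`, `BIAS`, `predict_proba`, and `invert_sensitive_attribute` are illustrative names, not part of any real library. Practical attacks typically also weight each candidate by how common it is in the population; that step is omitted.

```python
import math

# Hypothetical stand-in for a deployed model: a fixed logistic regression.
# The last weight belongs to the sensitive feature the attacker wants to recover.
WEIGHTS = [0.8, -0.5, 2.0]
BIAS = -1.0


def predict_proba(features):
    """Return the model's confidence that the record belongs to class 1."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def invert_sensitive_attribute(known_features, observed_label, candidates):
    """Try each candidate value for the unknown attribute and keep the one
    that makes the model most confident in the label the attacker observed."""
    best_value, best_conf = None, -1.0
    for value in candidates:
        p1 = predict_proba(list(known_features) + [value])
        conf = p1 if observed_label == 1 else 1.0 - p1
        if conf > best_conf:
            best_value, best_conf = value, conf
    return best_value, best_conf


# The attacker knows two features and the label, and guesses the third.
guess, confidence = invert_sensitive_attribute([1.0, 0.5], observed_label=1,
                                               candidates=[0, 1])
print(guess, round(confidence, 3))
```

The attack needs nothing beyond ordinary prediction access: the confidence score alone tells the attacker which candidate value fits the record best.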
It’s a serious risk wherever a model was trained on sensitive data and is exposed to outside queries. It sits alongside membership inference (which asks whether a record was in the training set) as a reason teams in regulated settings care about what models memorize. Defenses center on limiting that memorization (differential privacy, careful output exposure, and avoiding overfitting) so the model’s responses reveal as little as possible about any individual training example.
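Of these defenses, careful output exposure is the easiest to illustrate in a few lines. The sketch below assumes a hypothetical serving wrapper, `expose_prediction`, that sits between the model and the caller and returns either the label alone or a coarsely rounded confidence instead of the raw score. The function name, modes, and 0.5 threshold are assumptions made for illustration. Coarsening the output reduces the signal an attacker can read from each query but does not eliminate it, and it is not a substitute for training-time protections such as differential privacy.

```python
def expose_prediction(prob_positive, mode="label", decimals=1):
    """Limit what a caller learns from a single prediction.

    mode="label"   returns only the predicted class.
    mode="rounded" returns the class plus a coarsely rounded confidence.
    """
    if not 0.0 <= prob_positive <= 1.0:
        raise ValueError("prob_positive must be in [0, 1]")
    label = 1 if prob_positive >= 0.5 else 0
    if mode == "label":
        return {"label": label}
    if mode == "rounded":
        return {"label": label, "confidence": round(prob_positive, decimals)}
    raise ValueError(f"unknown mode: {mode!r}")


# The raw score 0.8249 is never returned to the caller.
print(expose_prediction(0.8249))                  # label only
print(expose_prediction(0.8249, mode="rounded"))  # label plus coarse confidence
```

Label-only output gives the attacker the least to work with, at the cost of hiding confidence from legitimate callers who may need it; rounding is a middle ground.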