NLPGuard

NLPGuard is a library for annotating and identifying protected attributes in text data and for mitigating NLP classifiers' reliance on them. It consists of three main components: Explainer, Identifier, and Moderator.

This document provides an overview of these components and their role in improving fairness and interpretability in NLP models.

Explainer

The Explainer component of the nlpguard library extracts the words that contribute most to a classifier's predictions.

Overview

The Explainer identifies the features (words) that most influence the model’s decisions, helping users interpret and debug the behavior of NLP models.
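To make the idea concrete, here is a minimal sketch of one common way to score word importance: leave-one-out occlusion, which deletes each word in turn and measures how much the classifier's score drops. This is an assumed stand-in, not NLPGuard's API and not necessarily the attribution method the Explainer uses; `occlusion_importance`, `toy_predict_proba`, and `TRIGGERS` are hypothetical names, and the toy classifier only imitates a trained model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def occlusion_importance(
    texts: List[str],
    predict_proba: Callable[[str], float],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank words by the total drop in classifier score when each is deleted.

    Hypothetical helper illustrating leave-one-out occlusion; not nlpguard's API.
    """
    totals: Dict[str, float] = defaultdict(float)
    for text in texts:
        tokens = text.split()
        base = predict_proba(text)
        for i, token in enumerate(tokens):
            # Re-score the text with only the i-th token removed.
            occluded = " ".join(tokens[:i] + tokens[i + 1:])
            totals[token.lower()] += base - predict_proba(occluded)
    # Largest total drop first; ties broken alphabetically for stable output.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical toy classifier standing in for a trained model: the probability
# of the positive ("toxic") class rises by a fixed weight per trigger word.
TRIGGERS = {"stupid": 0.4, "women": 0.3}


def toy_predict_proba(text: str) -> float:
    score = 0.1 + sum(TRIGGERS.get(w.lower(), 0.0) for w in text.split())
    return min(score, 1.0)


corpus = ["women are stupid", "the weather is nice", "stupid women drivers"]
print(occlusion_importance(corpus, toy_predict_proba, top_k=2))
```

Summing the score drops over the whole corpus favors words that matter often; averaging per occurrence would be a reasonable alternative design.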

For more details, refer to:
- Explainer Overview
- API Documentation

Identifier

The Identifier component of the nlpguard library determines which of the words extracted by the Explainer are protected attributes.

Overview

The Identifier flags features that correspond to sensitive or protected attributes, such as race, gender, or age, so that potential bias in the model can be addressed.
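A minimal sketch of the identification step, assuming a simple lexicon lookup: each word from the explanation step is checked against a hand-made mapping from words to protected categories. `PROTECTED_LEXICON` and `identify_protected` are hypothetical and not part of nlpguard; a lexicon this small misses most protected terms, and NLPGuard's Identifier may use a different method entirely.

```python
from typing import Dict, List, Tuple

# Hypothetical, deliberately tiny lexicon mapping lowercase words to the
# protected category they refer to. Not part of nlpguard.
PROTECTED_LEXICON: Dict[str, str] = {
    "women": "gender",
    "men": "gender",
    "asian": "race",
    "hispanic": "race",
    "elderly": "age",
    "teenagers": "age",
}


def identify_protected(words: List[str]) -> List[Tuple[str, str]]:
    """Return (word, category) pairs for the words found in the lexicon."""
    flagged: List[Tuple[str, str]] = []
    for word in words:
        # Case-insensitive lookup; the original spelling is kept in the output.
        category = PROTECTED_LEXICON.get(word.lower())
        if category is not None:
            flagged.append((word, category))
    return flagged


# Words as they might come out of an explanation step (illustrative input).
important_words = ["stupid", "Women", "elderly", "weather"]
print(identify_protected(important_words))
# [('Women', 'gender'), ('elderly', 'age')]
```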

For more details, refer to:
- Identifier Overview
- API Documentation

Moderator

The Moderator component of the nlpguard library modifies the original training dataset to produce a mitigated version intended to reduce the classifier's reliance on protected attributes.

Overview

The Moderator supports fairness by mitigating the model’s dependency on the sensitive features flagged by the Identifier.
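One simple way to build a mitigated dataset is to delete the flagged words from every training text while keeping the labels unchanged. The sketch below assumes that strategy; `remove_protected_words` is a hypothetical helper rather than nlpguard's API, and the Moderator may mitigate differently.

```python
from typing import List, Set, Tuple


def remove_protected_words(
    dataset: List[Tuple[str, int]],
    protected: Set[str],
) -> List[Tuple[str, int]]:
    """Return a new dataset with protected-attribute words deleted from each text.

    Hypothetical helper; `protected` is expected to hold lowercase words.
    The input dataset is left unmodified.
    """
    mitigated: List[Tuple[str, int]] = []
    for text, label in dataset:
        # Keep every whitespace-separated token that is not a protected word.
        kept = [tok for tok in text.split() if tok.lower() not in protected]
        mitigated.append((" ".join(kept), label))
    return mitigated


train = [("women are stupid", 1), ("the weather is nice", 0)]
print(remove_protected_words(train, {"women"}))
# [('are stupid', 1), ('the weather is nice', 0)]
```

Because tokens are split on whitespace, a word with attached punctuation (for example `women,`) would not be removed; a proper tokenizer would be needed for that.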

For more details, refer to:
- Moderator Overview
- API Documentation