Identifier API Documentation
This section provides a detailed reference of the classes, methods, and attributes available in the Identifier module.
- class nlpguard.identifier.identifier.ChatGPTIdentifier(openai_api_key='')
Bases: Identifier
ChatGPT Identifier Class. This class is responsible for identifying protected attributes using ChatGPT.
- openai_api_key
OpenAI API key.
- Type:
str
- _abc_impl = <_abc._abc_data object>
- static _aggregate_protected_categories_votes(df, protected_category_column_name='chatgpt_protected_category')
- static _chatgpt_annotate(tk, temperature=0.3, chatgpt_model='gpt-3.5-turbo') → dict
Annotates the input token using ChatGPT.
- Parameters:
tk (str) – The token to be annotated.
temperature (float, optional) – The temperature of the ChatGPT model. Defaults to 0.3.
chatgpt_model (str, optional) – The ChatGPT model to be used. Defaults to “gpt-3.5-turbo”.
- Returns:
The annotated token.
- Return type:
dict
- _clean_chatgpt_responses(df_raw, protected_category_column_name='chatgpt_protected_category')
Cleans the raw ChatGPT responses.
- Parameters:
df_raw (pd.DataFrame) – Raw dataframe containing the ChatGPT response.
protected_category_column_name (str, optional) – Column name containing the protected category. Defaults to “chatgpt_protected_category”.
- Returns:
Cleaned dataframe.
- Return type:
pd.DataFrame
- annotate_protected_attributes(tokens, temperature=0.3)
Annotates the protected attributes using ChatGPT.
- Parameters:
tokens (list(str)) – List of tokens to be annotated.
temperature (float, optional) – Sampling temperature. Defaults to 0.3.
- Returns:
Tuple containing the annotated results and exception logs.
- Return type:
tuple(list(dict), list(str))
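Since the documented return value is a pair (annotated results, exception logs), callers would typically unpack both and inspect the log before trusting the results. The following is a minimal sketch of that calling pattern only. `FakeIdentifier` is a hypothetical stand-in so the example runs without an API key, and the dictionary keys `token` and `protected_category` are assumptions, not documented output fields.

```python
from typing import Dict, List, Tuple


class FakeIdentifier:
    """Hypothetical stand-in that mirrors the documented return shape."""

    def annotate_protected_attributes(
        self, tokens: List[str], temperature: float = 0.3
    ) -> Tuple[List[Dict[str, str]], List[str]]:
        results: List[Dict[str, str]] = []
        errors: List[str] = []
        for tk in tokens:
            if not tk.strip():
                # Log the failure instead of raising, so one bad token
                # does not abort the whole batch.
                errors.append(f"empty token: {tk!r}")
                continue
            # Assumed keys; the real output fields are not documented here.
            results.append({"token": tk, "protected_category": "none"})
        return results, errors


identifier = FakeIdentifier()
annotations, exception_logs = identifier.annotate_protected_attributes(
    ["nurse", " ", "grandmother"], temperature=0.3
)
print(len(annotations), "annotated,", len(exception_logs), "failed")
```

With the real class, the same unpacking would presumably apply to `ChatGPTIdentifier(openai_api_key=...).annotate_protected_attributes(tokens)`.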
- class nlpguard.identifier.identifier.Identifier
Bases: ABC
Abstract Identifier Class.
- _abc_impl = <_abc._abc_data object>
- abstract annotate_protected_attributes(**kwargs)
Abstract method that annotates the protected attributes.
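To illustrate what the abstract method implies for subclasses, here is a minimal sketch. The `Identifier` class below is a local stand-in that copies only the documented signature, and `KeywordIdentifier`, its lookup table, and the output keys are hypothetical and not part of nlpguard.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Identifier(ABC):
    """Local stand-in for the abstract base, for illustration only."""

    @abstractmethod
    def annotate_protected_attributes(self, **kwargs):
        """Annotate the protected attributes."""


class KeywordIdentifier(Identifier):
    """Hypothetical subclass that labels tokens from a fixed lookup table."""

    LOOKUP = {"she": "gender", "elderly": "age"}

    def annotate_protected_attributes(self, **kwargs) -> List[Dict[str, str]]:
        tokens = kwargs.get("tokens", [])
        return [
            {"token": tk, "protected_category": self.LOOKUP.get(tk.lower(), "none")}
            for tk in tokens
        ]


try:
    Identifier()  # the abstract base cannot be instantiated directly
    instantiated = True
except TypeError:
    instantiated = False

labels = KeywordIdentifier().annotate_protected_attributes(tokens=["She", "table"])
print(instantiated, labels)
```

Any concrete subclass has to override `annotate_protected_attributes`; otherwise instantiation fails with `TypeError`, as the `try` block demonstrates.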
- class nlpguard.identifier.identifier.LLamaIdentifier(hf_endpoint, hf_token)
Bases: Identifier
The LLamaIdentifier class for identifying protected attributes using LLaMA-based models.
- hf_endpoint
The endpoint of the model on Hugging Face Inference API.
- Type:
str
- hf_headers
The headers for the Hugging Face API.
- Type:
dict
- _abc_impl = <_abc._abc_data object>
- static _aggregate_protected_categories_votes(df, protected_category_column_name='llama_protected_category') → DataFrame
Aggregates the votes (annotations) for each word’s assigned categories.
- Parameters:
df (pd.DataFrame) – The cleaned dataframe with LLaMA annotations.
protected_category_column_name (str, optional) – The name of the protected category column. Defaults to “llama_protected_category”.
- Returns:
The aggregated dataframe.
- Return type:
pd.DataFrame
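The aggregation rule itself is not specified here. One plausible reading is a majority vote over repeated annotations of the same token. The sketch below shows that idea with plain Python tuples rather than a DataFrame. The function name `aggregate_votes` and the majority rule are assumptions, not the library's implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_votes(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return the most frequent category per token (assumed majority rule)."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for token, category in rows:
        votes[token][category] += 1
    # most_common(1) yields the top (category, count) pair; on a tie the
    # category that was seen first wins.
    return {token: counts.most_common(1)[0][0] for token, counts in votes.items()}


# Each row is one (token, category) annotation, e.g. from repeated model calls.
rows = [
    ("she", "gender"),
    ("she", "gender"),
    ("she", "none"),
    ("table", "none"),
]
majority = aggregate_votes(rows)
print(majority)
```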
- _clean_llama_responses(df_raw, protected_category_column_name='llama_protected_category') → DataFrame
Cleans and standardizes the responses for analysis.
- Parameters:
df_raw (pd.DataFrame) – The raw dataframe with LLaMA annotations.
protected_category_column_name (str, optional) – The name of the protected category column. Defaults to “llama_protected_category”.
- Returns:
The cleaned dataframe.
- Return type:
pd.DataFrame
- _llama_annotate(tk, temperature=0.3, prompt_template=None) → str
Generates annotation for a token using the Hugging Face Inference API.
- Parameters:
tk (str) – The token to be annotated.
temperature (float, optional) – The sampling temperature for the model. Defaults to 0.3.
prompt_template (str, optional) – The prompt template for the model. Defaults to None.
- Returns:
The annotation generated by the model.
- Return type:
str
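As a rough illustration of how a token, a temperature, and a prompt template might be combined into a Hugging Face Inference API request, the sketch below builds the headers and JSON body without sending anything. The helper `build_request`, the default template, and the `{token}` placeholder are assumptions. Only the `Authorization: Bearer` header and the `inputs`/`parameters` payload layout follow the general Inference API convention.

```python
from typing import Any, Dict, Optional

# Hypothetical fallback prompt; the library's actual default is not documented.
DEFAULT_TEMPLATE = "Classify the token: {token}"


def build_request(
    tk: str,
    hf_token: str,
    temperature: float = 0.3,
    prompt_template: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble headers and JSON body for a text-generation request."""
    template = prompt_template or DEFAULT_TEMPLATE
    headers = {"Authorization": f"Bearer {hf_token}"}
    payload = {
        "inputs": template.format(token=tk),
        "parameters": {"temperature": temperature},
    }
    return {"headers": headers, "json": payload}


request = build_request("nurse", hf_token="hf_example")
print(request["json"]["inputs"])
```

The resulting dictionary could be passed to an HTTP client along with the endpoint URL, for example `requests.post(hf_endpoint, **request)`.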
- annotate_protected_attributes(tokens, temperature=0.3)
Annotates tokens with protected attributes using a LLaMA-based model.
- Parameters:
tokens (list(str)) – The list of tokens to be annotated.
temperature (float, optional) – The temperature for sampling from the model. Defaults to 0.3.
- Returns:
The list of annotated tokens with protected attributes.
- Return type:
list(dict)