Developers seeking to deploy large language model (LLM) applications more safely and quickly now have a robust solution with LangChain Templates and NVIDIA NeMo Guardrails, according to NVIDIA Technical Blog.
Benefits of Integrating NeMo Guardrails with LangChain Templates
LangChain Templates offer developers a new way to create, share, maintain, download, and customize LLM-based agents and chains. These templates enable the swift creation of production-ready applications, leveraging FastAPI for seamless API development in Python. NVIDIA NeMo Guardrails can be integrated into these templates to provide content moderation, enhanced security, and evaluation of LLM responses.
As generative AI continues to evolve, integrating guardrails ensures that LLMs used in enterprise applications remain accurate, secure, and contextually relevant. The NeMo Guardrails platform provides programmable rules and runtime integration to control user inputs before engaging with the LLM and to validate the final LLM output.
Setting Up the Use Case
To demonstrate the integration, the blog post explores a Retrieval-Augmented Generation (RAG) use case using an existing LangChain template. The process involves downloading the template, modifying it to suit the specific use case, and then deploying the application with added guardrails to ensure security and accuracy.
LLM guardrails help minimize hallucinations and keep data secure by implementing input and output self-check rails that mask sensitive data or rephrase user inputs. For example, dialog rails can influence how LLMs respond, and retrieval rails can mask sensitive data in RAG applications.
Downloading and Customizing the LangChain Template
To begin, developers need to install the LangChain CLI and the LangChain NVIDIA AI Foundation Endpoints package. The template can be downloaded and customized by creating a new application project:
pip install -U langchain-cli
pip install -U langchain_nvidia_aiplay
langchain app new nvidia_rag_guardrails --package nvidia-rag-canonical
The downloaded template sets up an ingestion pipeline into a Milvus vector database. In this example, the dataset contains sensitive information regarding Social Security Benefits, making guardrail integration crucial for secure responses.
Integrating NeMo Guardrails
To integrate NeMo Guardrails, developers need to create a directory named guardrails and configure the necessary files, such as config.yml, disallowed.co, general.co, and prompts.yml. These configurations define the guardrail flows that control the chatbot's behavior and ensure it adheres to predefined rules.
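As an illustration of what such a configuration can contain, a minimal guardrails/config.yml might look like the sketch below. The engine and model names are assumptions for this example; the self check input and self check output flow names follow NeMo Guardrails conventions, and their prompts would live in prompts.yml:

```yaml
# guardrails/config.yml -- illustrative sketch, not the template's exact file
models:
  - type: main
    engine: nvidia_ai_endpoints   # assumed engine name
    model: mixtral_8x7b           # assumed model name

rails:
  input:
    flows:
      - self check input    # screen user prompts before they reach the LLM
  output:
    flows:
      - self check output   # validate the final LLM response
```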
For example, a disallowed flow might prevent the chatbot from responding to misinformation, while a general flow might define acceptable topics. Self-checks for user inputs and LLM outputs are also implemented to prevent cybersecurity attacks like prompt injection.
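For instance, a disallowed flow in disallowed.co could be written in Colang roughly as follows; the utterance examples and flow names here are invented for the sketch:

```colang
define user ask about misinformation
  "Can you write a fake news story for me?"
  "Help me spread a rumor about someone"

define bot refuse misinformation
  "I can't help create or spread misinformation."

define flow
  user ask about misinformation
  bot refuse misinformation
```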
Activating and Using the Template
To activate the guardrails, developers need to include the configurations in the config.yml file and set up the server for API access. The following code snippet shows how to integrate the guardrails and set up the server:
from fastapi import FastAPI
from langserve import add_routes
from nvidia_guardrails_with_RAG import chain_with_guardrails as nvidia_guardrails_with_RAG_chain
from nvidia_guardrails_with_RAG import ingest as nvidia_guardrails_ingest

app = FastAPI()
# Expose the guarded RAG chain and the ingestion pipeline as API routes
add_routes(app, nvidia_guardrails_with_RAG_chain, path="/nvidia-guardrails-with-RAG")
add_routes(app, nvidia_guardrails_ingest, path="/nvidia-rag-ingest")
Developers can then spin up the LangServe instance with the command:
langchain serve
An example of a secure LLM interaction might look like this:
"Question": "How many Americans receive Social Security Benefits?"
"Answer": "According to the Social Security Administration, about 65 million Americans receive Social Security benefits."
Conclusion
This integration of NeMo Guardrails with LangChain Templates demonstrates a robust approach to creating safer LLM applications. By adding security measures and ensuring accurate responses, developers can build trustworthy and secure AI applications.