Using LangChain and OpenAI APIs in Python to Query Your Docs
Introduction
Are you looking to build a bot that can answer questions based on your own documentation and product details? With the OpenAI APIs and the LangChain project, this is straightforward to implement. This article walks through using Python to embed your documentation into a vector database and to query it through API calls. Note that the model itself is not trained or fine-tuned here; relevant passages are retrieved from your documents and passed to it as context.
Embedding documents
One of the key steps in building a question-answering bot is to embed your documents into a vector database. LangChain provides loaders for different document sources such as PDF files, text files, and sitemaps. The sitemap loader scrapes the listed web pages and strips the HTML formatting. Here’s an example of how to embed documents using LangChain for the combit project:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders.sitemap import SitemapLoader
from langchain.document_loaders import UnstructuredPDFLoader
from langchain.document_loaders import TextLoader
# Add documents to the database using langchain loader
def add_documents(loader, instance):
    # Load the raw documents, split them into overlapping chunks, and store them
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100,
        separators=["\n\n", "\n", ".", ";", ",", " ", ""],
    )
    texts = text_splitter.split_documents(documents)
    instance.add_documents(texts)
# Create embeddings instance
embeddings = OpenAIEmbeddings(openai_api_key="...")
# Create Chroma instance
instance = Chroma(embedding_function=embeddings, persist_directory=r"C:\DocBot")
# Add Knowledgebase Dump (CSV file)
loader = TextLoader(r"C:\DocBot\Input\en-kb@forum-combit-net-2023-04-25.dcqresult.csv")
add_documents(loader, instance)
# Add EN sitemap
loader = SitemapLoader(web_path="https://www.combit.com/page-sitemap.xml")
add_documents(loader, instance)
# Add EN Blog sitemap, only use English blog posts
loader = SitemapLoader(web_path="https://www.combit.blog/XMLSitemap.xml", filter_urls=["https://www.combit.blog/en/"])
add_documents(loader, instance)
# Add documentation PDFs
pdf_files = [r"C:\DocBot\Input\Ad-hoc Designer-Manual.pdf",
             r"C:\DocBot\Input\Designer-Manual.pdf",
             r"C:\DocBot\Input\Programmers-Manual.pdf",
             r"C:\DocBot\Input\ServicePack.pdf",
             r"C:\DocBot\Input\ReportServer.pdf"]
for file_name in pdf_files:
    loader = UnstructuredPDFLoader(file_name)
    add_documents(loader, instance)
# Persist the instance
instance.persist()
instance = None
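To make the chunk_size and chunk_overlap settings used above more concrete, here is a simplified sketch of overlapping chunking in plain Python. It is not LangChain's actual algorithm (RecursiveCharacterTextSplitter also tries to break at the listed separators), and the helper name chunk_text is hypothetical, chosen only for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks; neighbouring chunks share chunk_overlap characters.

    Simplified illustration only: unlike LangChain's splitter, it cuts at fixed
    character positions and ignores separators such as paragraph breaks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears in full in at least one chunk, which tends to help retrieval at the price of storing some text twice.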
Embedding the documents is a one-time cost and is usually affordable, even for extensive documentation. Additionally, each query incurs a small cost, as it requires two API calls (one to embed the question and one to generate the answer).
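For a rough feel of these costs, a back-of-the-envelope estimate along the following lines may help. Everything here is an assumption for illustration: the function names are hypothetical, the "about four characters per token" rule is only a rough heuristic for English text, and prices are passed in as parameters because they change over time and should be taken from OpenAI's current price list.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; real counts depend on the model's tokenizer."""
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


def estimate_cost(num_tokens: int, price_per_1k_tokens: float) -> float:
    """Cost of processing num_tokens at a given price per 1,000 tokens."""
    return num_tokens / 1000 * price_per_1k_tokens


def estimate_query_cost(
    question_tokens: int,
    context_tokens: int,
    answer_tokens: int,
    embedding_price_per_1k: float,
    chat_price_per_1k: float,
) -> float:
    """Estimate one query: one embedding call plus one chat completion call.

    Simplification: input and output tokens of the chat call are billed at
    the same (assumed) price.
    """
    embedding_call = estimate_cost(question_tokens, embedding_price_per_1k)
    completion_call = estimate_cost(
        question_tokens + context_tokens + answer_tokens, chat_price_per_1k
    )
    return embedding_call + completion_call
```

In this model the retrieved context usually dominates the per-query cost, since several chunks of up to 1000 characters each are sent along with every question.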
Q&A query with GPT 3.5 turbo
To make the bot accessible via a web API, you can use Flask together with LangChain. LangChain lets you query the persistent vector database created in the previous step, and OpenAI's gpt-3.5-turbo model generates the answer. A custom prompt template helps keep the bot’s answers relevant and discourages it from making things up. Here’s an example of how to set up the Flask API:
from flask import Flask, request, make_response
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
app = Flask(__name__)
embeddings = OpenAIEmbeddings(openai_api_key="...")
instance = Chroma(persist_directory=r"C:\DocBot", embedding_function=embeddings)
# Set up the prompt template for the bot's responses
tech_template = """As a combit support bot, your goal is to provide accurate
and helpful technical information about List & Label, a powerful reporting tool used for
building various applications. You should answer user inquiries based on the
context provided and avoid making up answers. If you don't know the answer,
simply state that you don't know. Provide concrete examples like code snippets
or function prototypes wherever possible. Remember to provide relevant information
about List & Label's features, benefits, and API to assist the user in
understanding how to best use it for application development.
{context}
Q: {question}
A: """
PROMPT = PromptTemplate(
    template=tech_template, input_variables=["context", "question"]
)
# Set up the retrieval-based question-answering model
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, openai_api_key="..."),
    chain_type="stuff",
    retriever=instance.as_retriever(),
    chain_type_kwargs={"prompt": PROMPT},
)
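@app.route('/api')
def my_api():
    query = request.args.get('query')
    if not query:
        # Reject requests without a query instead of passing None to the chain
        return make_response("Missing 'query' parameter", 400)
    # Run the retrieval QA chain on the user's question
    output_string = qa.run(query)
    response = make_response(output_string, 200)
    response.mimetype = "text/plain"
    # Allow cross-origin requests so a page served from another origin can call the API
    response.headers.add('Access-Control-Allow-Origin', '*')
    return response

if __name__ == '__main__':
    app.run()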
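Conceptually, the combination of instance.as_retriever() and chain_type="stuff" does two things: it finds the stored chunks whose embeddings are closest to the question's embedding, and it pastes ("stuffs") those chunks into the {context} slot of the prompt. The sketch below imitates that flow with hand-made toy vectors in plain Python. It is not the LangChain or Chroma implementation; the function names, the toy data, and the use of cosine similarity are assumptions for illustration (the distance metric Chroma actually uses depends on its configuration).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float], store: list[tuple[str, list[float]]], k: int = 4) -> list[str]:
    """Return the texts of the k stored chunks most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(template: str, chunks: list[str], question: str) -> str:
    """'Stuff' all retrieved chunks into the template's context slot."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


# Toy data: made-up chunk texts with made-up 3-dimensional "embeddings".
TOY_STORE = [
    ("Chunk about exporting reports to PDF.", [0.9, 0.1, 0.0]),
    ("Chunk about licensing.", [0.0, 0.2, 0.9]),
    ("Chunk about designing report layouts.", [0.7, 0.6, 0.1]),
]
TOY_TEMPLATE = "Answer using only this context:\n{context}\nQ: {question}\nA: "

question_vector = [1.0, 0.2, 0.0]  # pretend embedding of the question
chunks = top_k(question_vector, TOY_STORE, k=2)
prompt = build_prompt(TOY_TEMPLATE, chunks, "How do I export a report to PDF?")
print(prompt)
```

Because every retrieved chunk ends up in a single prompt, this approach only works while the combined chunks fit into the model's context window, which is one reason to keep chunk sizes moderate.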
Usage in a Web Page
Once the API is set up, you can use it in a web page by making fetch requests from JavaScript. Here’s an example of how you can use the API in a web page:
<html>
<head>
<title>combit DocBot</title>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css">
<style>
#spinner {
display: none;
}
#result {
margin-top: 20px;
}
</style>
</head>
<body>
<div class="container my-5">
<div class="row justify-content-center">
<div class="col-md-6">
<div class="card">
<div class="card-header bg-primary text-white">
<h4 class="mb-0">combit DocBot</h4>
</div>
<div class="card-body">
<form>
<div class="mb-3">
<label for="prompt" class="form-label">Enter a prompt:</label>
<input type="text" id="prompt" name="prompt" class="form-control">
</div>
<div class="d-grid gap-2">
<button type="submit" class="btn btn-primary">
Query
<span class="spinner-border spinner-border-sm ms-2" id="spinner" role="status" aria-hidden="true"></span>
</button>
</div>
</form>
</div>
<div class="card-footer">
<div class="result" id="result"></div>
</div>
</div>
</div>
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
<script>
const form = document.querySelector('form');
const resultDiv = document.querySelector('#result');
const spinner = document.querySelector('#spinner');
spinner.style.display = 'none';
form.addEventListener('submit', (event) => {
  event.preventDefault();
  spinner.style.display = 'inline-block';
  resultDiv.textContent = "";
  const prompt = document.querySelector('#prompt').value;
  fetch(`http://127.0.0.1:5000/api?query=${encodeURIComponent(prompt)}`)
    .then((response) => response.text())
    .then((text) => {
      // Render the Markdown answer returned by the API
      resultDiv.innerHTML = marked.parse(text);
    })
    .catch((error) => {
      // Show network or server errors instead of failing silently
      resultDiv.textContent = `Request failed: ${error}`;
    })
    .finally(() => {
      spinner.style.display = 'none';
    });
});
</script>
</body>
</html>
The end result will be a web page where users can enter prompts and get answers from the DocBot. You can also customize the appearance of the web page according to your needs.
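The same endpoint can also be called from Python, for example as a quick smoke test without a browser. The sketch below uses only the standard library and assumes the Flask app from the previous section is running locally at the address used in the JavaScript example; build_query_url and ask_docbot are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Address taken from the JavaScript example; adjust if the API runs elsewhere.
API_URL = "http://127.0.0.1:5000/api"


def build_query_url(question: str, base_url: str = API_URL) -> str:
    """Build the request URL with the question safely URL-encoded."""
    query_string = urlencode({"query": question})
    return f"{base_url}?{query_string}"


def ask_docbot(question: str, timeout: float = 60.0) -> str:
    """Send the question to the API and return the plain-text answer."""
    with urlopen(build_query_url(question), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (requires the Flask app to be running):
# print(ask_docbot("How do I export a report to PDF?"))
```

Encoding the question with urlencode plays the same role as encodeURIComponent in the web page: characters such as spaces or ampersands in the question cannot break the query string.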
Wrap up
By utilizing the OpenAI APIs, Python programming language, and langchain framework, you can easily build a customized bot that can provide specific information based on domain knowledge. The code snippets provided in this article serve as a great starting point for your own bot-building projects.