!pip install -qqq langchain==0.0.228 --progress-bar off
!pip install -qqq chromadb==0.3.26 --progress-bar off
!pip install -qqq sentence-transformers==2.2.2 --progress-bar off
!pip install -qqq auto-gptq==0.2.2 --progress-bar off
!pip install -qqq einops==0.6.1 --progress-bar off
!pip install -qqq unstructured==0.8.0 --progress-bar off
!pip install -qqq transformers==4.30.2 --progress-bar off
!pip install -qqq torch==2.0.1 --progress-bar off

Here is the list of imports we'll use:

from pathlib import Path

import torch
from auto_gptq import AutoGPTQForCausalLM
from langchain.chains import ConversationalRetrievalChain
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import DirectoryLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import HuggingFacePipeline
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from transformers import AutoTokenizer, GenerationConfig, TextStreamer, pipeline

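Data

The chatbot's custom knowledge comes from the FAQ section of the Skyscanner Help Center. We'll take 12 questions with their answers and create a separate text file for each. Let's start with a helper function that creates the files.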

questions_dir = Path("skyscanner")
questions_dir.mkdir(exist_ok=True, parents=True)

def write_file(question, answer, file_path):
    # Store one question/answer pair in the "Q: ... / A: ..." format.
    text = f"""
Q: {question}
A: {answer}
""".strip()
    with Path(questions_dir / file_path).open("w") as text_file:
        text_file.write(text)

Each question and answer is stored in the following format:

Q: Sample question?
A: Sample answer.
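
A quick way to check this format is to write one sample file into a temporary directory and read it back. The sketch below is only an illustration: `write_qa_file` and the temporary directory are hypothetical stand-ins and are not part of the original notebook.

```python
import tempfile
from pathlib import Path


def write_qa_file(directory: Path, question: str, answer: str, file_name: str) -> Path:
    """Write one question/answer pair in the 'Q: ... / A: ...' format."""
    text = f"Q: {question}\nA: {answer}"
    target = directory / file_name
    target.write_text(text, encoding="utf-8")
    return target


# Write a sample file to a throwaway directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = write_qa_file(Path(tmp), "Sample question?", "Sample answer.", "sample.txt")
    content = path.read_text(encoding="utf-8")

print(content)
```

Keeping one question per file means each loaded document later maps back to exactly one FAQ entry through its source path.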

Time to write the files:

write_file(
question="How do I search for flights on Skyscanner?",
answer="""
Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. For tips on how best to search, please head over to our search tips page.

If you're looking for inspiration for your next trip, why not try our everywhere feature. Or, if you want to hang out and ensure the best price, you can set up price alerts to let you know when the price changes.
""".strip(),
file_path="question_1.txt",
)

write_file(
question="What are mash-ups?",
answer="""
These are routes where you fly with different airlines, because it's cheaper than booking with just one. For example:

If you wanted to fly London to New York, we might find it's cheaper to fly out with British Airways and back with Virgin Atlantic, rather than buy a round-trip ticket with one airline. This is called a "sum-of-one-way" mash-up. Just in case you're interested.

Another kind of mash-up is what we call a "self-transfer" or a "non-protected transfer". For example:

If you wanted to fly London to Sydney, we might find it's cheaper to fly London to Dubai with Emirates, and then Dubai to Sydney with Qantas, rather than booking the whole route with one airline.

Pretty simple, right?

However, what's really important to bear in mind is that mash-ups are NOT codeshares. A codeshare is when the airlines have an alliance. If anything goes wrong with the route - a delay, say, or a strike - those airlines will help you out at no extra cost. But mash-ups DO NOT involve an airline alliance. So if something goes wrong with a mash-up, it could cost you more money.
""".strip(),
file_path="question_2.txt",
)

write_file(
question="Why have I been blocked from accessing the Skyscanner website?",
answer="""
Skyscanner's websites are scraped by bots many millions of times a day which has a detrimental effect on the service we're able to provide. To prevent this, we use a bot blocking solution which checks to ensure you're using the website in a normal manner.

Occasionally, this may mean that a genuine user may be wrongly flagged as a bot. This can be for a number of potential reasons, including, but not limited to:

You're using a VPN which we have had to block due to excessive bot traffic in the past
You're using our website at super speed which manages to beat our rate limits
You have a plug-in on your browser which could be interfering with how our website interacts with you as a user
You're using an automated browser
If you've been blocked during normal use, please send us your IP address (this website may help: http://www.whatismyip.com/), the website you're accessing (e.g. www.skyscanner.net) and the date/time this happened, via the Contact Us button below and we'll look into it as quickly as possible.
""".strip(),
file_path="question_3.txt",
)

write_file(
question="Where is my booking confirmation?",
answer="""
You should get a booking confirmation by email from the company you bought your travel from. This can sometimes go into your spam/junk mail folder, so it's always worth checking there.

If you still can't find it, try getting in touch with the company you bought from to find out what's going on.

To find out who you need to contact, check the company name next to the charge on your bank account.
""".strip(),
file_path="question_4.txt",
)

write_file(
question="How do I change or cancel my booking?",
answer="""
For all questions about changes, cancellations, and refunds - as well as all other questions about bookings - you'll need to contact the company you bought travel from. They'll have all the info about your booking and can advise you.

You'll find 1000s of travel agencies, airlines, hotels and car rental companies that you can buy from through our site and app. When you buy from one of these travel partners, they will take your payment (you'll see their name on your bank or credit card statement), contact you to confirm your booking, and provide any help or support you might need.

If you bought from one of these partners, you'll need to contact them as they have all the info about your booking. We unfortunately don't have any access to bookings you made with them.
""".strip(),
file_path="question_5.txt",
)

write_file(
question="I booked the wrong dates / times",
answer="""
If you have found that you have booked the wrong dates or times, please contact the airline or travel agent that you booked your flight with as they will be able to help you change your flights to the intended dates or times.

The search box below can help you find the contact details for the travel provider you booked with.

You can search flexible or specific dates on Skyscanner to find your preferred flight, and when you select a flight on Skyscanner you are transferred to the website where you will make and pay for your booking. Once you are redirected to the airline or travel agent website, you might be required to select dates again, depending on the website. In all cases, you will be shown the flight details of your selection and you are required before confirming payment to state that you have checked all details and agreed to the terms and conditions. We strongly recommend that you always check this information carefully, as travel information can be subject to change.
""".strip(),
file_path="question_6.txt",
)

write_file(
question="I entered the wrong email address",
answer="""
Please contact the airline or travel agent you booked with as Skyscanner does not have access to bookings made with airlines or travel agents.

If you can't remember who you booked with, you can check your credit card statement for a company name.

The search box below can help you find the contact details for the travel provider you booked with.
""".strip(),
file_path="question_7.txt",
)

write_file(
question="Luggage",
answer="""
Depending on the flight provider, the rules, conditions and prices for luggage (including sports equipment) do vary.
It's always a good idea to check with the airline or travel agent directly (and you should be shown the options when you make your booking).
""".strip(),
file_path="question_8.txt",
)

write_file(
question="Changes, cancellation and refunds",
answer="""
For changes, cancellations or refunds, we recommend that you contact the travel provider (airline or travel agent) agent that you completed your booking with.

As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information. Depending on the type of ticket you've booked, there may be different options for changes, cancellations and refunds, and the travel provider will be best placed to advise on these.
""".strip(),
file_path="question_9.txt",
)

write_file(
question="Why does the price sometimes change when I am redirected to a flight provider?",
answer="""
Flight prices and availability change constantly, so we make sure the data is updated regularly to reflect this. When you redirect to a travel provider's site, the price is updated again so you can be sure that you will always see the best price available from the airline or travel agent at time of booking.

We make every effort to ensure the information you see on Skyscanner is accurate and up to date, but very occasionally there can be reasons why a price change has not updated accurately on the site. If you see a price difference between Skyscanner and a travel provider, please contact us with all the flight details (from, to, dates, departure times, airline and travel agent if applicable) and we will investigate further.
""".strip(),
file_path="question_10.txt",
)

write_file(
question="Why is Skyscanner free?",
answer="""
Does Skyscanner charge commission?

Nope. Skyscanner is always free to search, and we never charge you any hidden fees.

Want to know how do we do it?

Well, we search through thousands of sites to find you the best deals for flights, hotels and car hire. That includes everything from fancy hotels to low cost airlines, so no matter what your budget is, we'll help you get there.

See a price you like? We'll connect you to that airline or travel company so you can book with them directly. And for this referral, the airline or travel company pays us a small fee.

And that's all there is to it!
""".strip(),
file_path="question_11.txt",
)

write_file(
question="Are my details safe?",
answer="""
We take your privacy and safety online very seriously. We'll never sell, share or pass on your IP details, cookies, personal info and location data to others unless it's required by law, or it's necessary for one of the reasons set out in our Privacy Policy.
""".strip(),
file_path="question_12.txt",
)

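Model

We picked the model for this project based on the LMSYS leaderboard. The selection criteria were strong performance, availability on the HuggingFace Hub, and the ability to run real-time inference on a single T4 GPU. Our final choice is Nous-Hermes-13b by Nous Research, which was trained on GPT-4 synthetic data. To speed up inference, we'll use a quantized version of the model.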
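
Quantization also matters for memory. As a rough back-of-the-envelope sketch, the weights of a 13-billion-parameter model can be sized as below. The 16 GB figure for a T4 is the card's nominal memory; the estimate ignores activations, the attention cache, and the extra scales that quantized formats store, so treat the numbers as approximate.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


params = 13e9        # 13 billion parameters
t4_memory_gb = 16    # nominal T4 memory

fp16_gb = weight_memory_gb(params, 16)
int4_gb = weight_memory_gb(params, 4)

print(f"fp16 weights:  {fp16_gb:.1f} GB (fits on T4: {fp16_gb < t4_memory_gb})")
print(f"4-bit weights: {int4_gb:.1f} GB (fits on T4: {int4_gb < t4_memory_gb})")
```

Under these assumptions the half-precision weights alone exceed a single T4, while the 4-bit weights leave room for inference.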

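Luckily, a quantized version of this model is available on the HuggingFace Hub, so we can load it with the AutoGPTQ library. We'll use the 4-bit quantized model provided by TheBloke. Let's load it: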

DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"

model_name_or_path = "TheBloke/Nous-Hermes-13B-GPTQ"
model_basename = "nous-hermes-13b-GPTQ-4bit-128g.no-act.order"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
model_name_or_path,
model_basename=model_basename,
use_safetensors=True,
trust_remote_code=True,
device=DEVICE,
)

generation_config = GenerationConfig.from_pretrained(model_name_or_path)

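Note that we have to specify exactly which file to load by setting the model_basename parameter. We also load the matching tokenizer and generation config.

To test the model, we need to format the input prompt the way the model expects, much as we would for LLaMa models: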

question = (
"Which programming language is more suitable for a beginner: Python or JavaScript?"
)
prompt = f"""
### Instruction: {question}
### Response:
""".strip()

Let's run the prompt through the tokenizer and the model:

%%time
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(DEVICE)
with torch.inference_mode():
    output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
CPU times: user 3.59 s, sys: 1.1 s, total: 4.69 s
Wall time: 12.4 s

This took about 12 seconds on a single T4 GPU. Let's see what the model generated:

print(tokenizer.decode(output[0]))
<s> ### Instruction: Which programming language is more suitable for a
beginner: Python or JavaScript?
### Response:Python is generally considered more suitable for beginners due to
its readability and simplicity compared to JavaScript.</s>

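Note that the model generated a response relevant to the question. The decoded output starts with a <s> token and ends with a </s> token, which the model uses to mark the beginning and end of a sequence. We need to remove these tokens before returning the response to the user.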
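
One way to do this cleanup by hand is sketched below. It is a plain string helper written for illustration (`clean_response` is a hypothetical name), and the decoded string is a shortened made-up example. In practice, passing skip_special_tokens=True to tokenizer.decode removes the special tokens for you.

```python
def clean_response(decoded: str, prompt: str, special_tokens=("<s>", "</s>")) -> str:
    """Remove special tokens and the echoed prompt from a decoded generation."""
    text = decoded
    for token in special_tokens:
        text = text.replace(token, "")
    text = text.strip()
    # The model echoes the prompt before its answer, so drop it if present.
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


sample_prompt = "### Instruction: Which language?\n### Response:"
decoded = "<s> ### Instruction: Which language?\n### Response:Python is a good choice.</s>"

cleaned = clean_response(decoded, sample_prompt)
print(cleaned)
```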

Finally, let's look at the model's generation config to see which parameters it uses:

generation_config
GenerationConfig {
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"transformers_version": "4.30.2"
}

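Building a Pipeline

LangChain simplifies working with HuggingFace LLMs through its HuggingFacePipeline class. Here we create a text-generation pipeline from our model and tokenizer. We also attach a text streamer so the response is streamed to the user as it is generated. The streamer also takes care of removing the <s> and </s> tokens, as well as the prompt, from the response.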

streamer = TextStreamer(
tokenizer, skip_prompt=True, skip_special_tokens=True, use_multiprocessing=False
)

pipe = pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
max_length=2048,
temperature=0,
top_p=0.95,
repetition_penalty=1.15,
generation_config=generation_config,
streamer=streamer,
batch_size=1,
)

llm = HuggingFacePipeline(pipeline=pipe)

Let's test it by passing the prompt through the pipeline:

response = llm(prompt)
Python is generally considered to be more suitable for beginners due to its readability and simplicity compared to JavaScript.

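Great! The response matches what we got when running the model directly, but it has been cleaned up and is ready to return to the user.

Embedding Documents

In keeping with our preference for free and open-source models, we'll use the E5-base-v2 embedding model, first introduced in the paper "Text Embeddings by Weakly-Supervised Contrastive Pre-training". We chose it mainly for its strong showing on the Massive Text Embedding Benchmark (MTEB) leaderboard, where it ranked seventh. Since the model is available on the HuggingFace Hub, we can load it with the HuggingFaceEmbeddings class.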

embeddings = HuggingFaceEmbeddings(
model_name="embaas/sentence-transformers-multilingual-e5-base",
model_kwargs={"device": DEVICE},
)

We can use LangChain's DirectoryLoader to load the text files as LangChain documents, which makes their contents easy to work with:

loader = DirectoryLoader("./skyscanner/", glob="**/*txt")
documents = loader.load()
len(documents)
12

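Given our model's context limit (at most 2048 tokens) and the length of the documents, we need to split the documents into smaller chunks. We can use CharacterTextSplitter to split them into chunks of 512 characters with no overlap between chunks.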

text_splitter = CharacterTextSplitter(chunk_size=512, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
texts[4]
Document(
page_content='Q: Changes, cancellation and refunds A: For changes, cancellations or refunds, we recommend that you contact the travel provider (airline or travel agent) agent that you completed your booking with.',
metadata={'source': 'skyscanner/question_9.txt'}
)
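
To build intuition for what the splitter does, here is a simplified sketch of separator-based splitting: the text is cut on blank lines and the pieces are greedily merged while they fit within the chunk size. This is an approximation written for illustration, not LangChain's implementation, and `split_text` is a hypothetical helper. Note that a single piece longer than the limit is kept whole here, which would explain why a chunk can end up longer than 512 characters.

```python
def split_text(text: str, chunk_size: int = 512, separator: str = "\n\n") -> list:
    """Split on a separator, then merge neighbouring pieces up to chunk_size characters."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # an oversized single piece is kept whole
    if current:
        chunks.append(current)
    return chunks


sample = "Q: Luggage\nA: Rules vary.\n\n" + "x" * 30 + "\n\n" + "y" * 30
chunks = split_text(sample, chunk_size=60)
for chunk in chunks:
    print(len(chunk), repr(chunk[:25]))
```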

To create (and store) the embeddings, we'll use Chroma. Specifically, its from_documents method creates the database for us.

db = Chroma.from_documents(texts, embeddings)
db.similarity_search("flight search")
[
Document(
page_content="Q: How do I search for flights on Skyscanner? A: Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. For tips on how best to search, please head over to our search tips page.\n\nIf you're looking for inspiration for your next trip, why not try our everywhere feature. Or, if you want to hang out and ensure the best price, you can set up price alerts to let you know when the price changes.",
metadata={'source': 'skyscanner/question_1.txt'}
),
Document(
page_content="You're using a VPN which we have had to block due to excessive bot traffic in the past You're using our website at super speed which manages to beat our rate limits You have a plug-in on your browser which could be interfering with how our website interacts with you as a user You're using an automated browser If you've been blocked during normal use, please send us your IP address (this website may help: http://www.whatismyip.com/), the website you're accessing (e.g. www.skyscanner.net) and the date/time this happened, via the Contact Us button below and we'll look into it as quickly as possible.",
metadata={'source': 'skyscanner/question_3.txt'}
),
Document(
page_content='Q: I booked the wrong dates / times A: If you have found that you have booked the wrong dates or times, please contact the airline or travel agent that you booked your flight with as they will be able to help you change your flights to the intended dates or times.\n\nThe search box below can help you find the contact details for the travel provider you booked with.',
metadata={'source': 'skyscanner/question_6.txt'}
),
Document(
page_content='You can search flexible or specific dates on Skyscanner to find your preferred flight, and when you select a flight on Skyscanner you are transferred to the website where you will make and pay for your booking. Once you are redirected to the airline or travel agent website, you might be required to select dates again, depending on the website. In all cases, you will be shown the flight details of your selection and you are required before confirming payment to state that you have checked all details and agreed to the terms and conditions. We strongly recommend that you always check this information carefully, as travel information can be subject to change.',
metadata={'source': 'skyscanner/question_6.txt'}
)
]
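
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below shows the idea with cosine similarity over tiny made-up vectors. The vectors, the file names used as keys, and the `top_k` helper are illustrative assumptions; the distance metric and index that Chroma actually uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
doc_vecs = {
    "question_1.txt": [0.9, 0.1, 0.0],
    "question_3.txt": [0.1, 0.8, 0.3],
    "question_6.txt": [0.7, 0.2, 0.1],
}
query_vec = [1.0, 0.0, 0.0]

results = top_k(query_vec, doc_vecs, k=2)
print(results)
```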

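With an intelligent agent (the LLM) and the ability to retrieve relevant information (similarity search over the database), you have the basic building blocks of a chatbot.

Conversational Chain

To tie the components together, we'll use a LangChain chain. Let's start by defining the input prompt.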

template = """
### Instruction: You're a travelling support agent that is talking to a customer.

Use only the chat history and the following information
{context}
to answer in a helpful manner to the question. If you don't know the answer -
say that you don't know. Keep your replies short, compassionate and informative.

{chat_history}
### Input: {question}
### Response:
""".strip()

prompt = PromptTemplate(
input_variables=["context", "question", "chat_history"], template=template
)

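Our prompt contains three variables: context (the retrieved document chunks), chat_history (the previous turns of the conversation), and question (the current user input).

For the memory, we'll use LangChain's ConversationBufferMemory class, which stores the conversation history: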

memory = ConversationBufferMemory(
memory_key="chat_history",
human_prefix="### Input",
ai_prefix="### Response",
output_key="answer",
return_messages=True,
)

Note that we also set the human_prefix and ai_prefix parameters, which are used to format the history in the prompt and to tell user turns apart from AI turns.
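
To see what the prefixes do, here is a rough sketch of how a buffer-style memory could render past turns as a string. `format_history` is a hypothetical helper and only approximates the library's behaviour; the point is that the history ends up in the same "### Input" / "### Response" style as the rest of the prompt.

```python
def format_history(turns, human_prefix="### Input", ai_prefix="### Response"):
    """Render (user, assistant) turns as prefixed lines, one message per line."""
    lines = []
    for user_text, ai_text in turns:
        lines.append(f"{human_prefix}: {user_text}")
        lines.append(f"{ai_prefix}: {ai_text}")
    return "\n".join(lines)


history = format_history(
    [("How flight search works?", "Skyscanner helps you find flights.")]
)
print(history)
```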

Let's create a LangChain chain using the prompt defined above.

chain = ConversationalRetrievalChain.from_llm(
llm,
chain_type="stuff",
retriever=db.as_retriever(),
memory=memory,
combine_docs_chain_kwargs={"prompt": prompt},
return_source_documents=True,
verbose=True,
)

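The chain combines the LLM, the retriever, and the memory. It also uses our prompt to shape the output. Finally, we set return_source_documents=True so that the source documents retrieved from the database are returned as well.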
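
The "stuff" chain type simply places all retrieved chunks into the prompt's context slot. The following sketch imitates that step with plain string formatting. The shortened template, the `stuff_documents` helper, and the two toy chunks are assumptions made for illustration, not LangChain internals.

```python
TEMPLATE = """
### Instruction: You're a travelling support agent that is talking to a customer.

Use only the chat history and the following information
{context}
to answer in a helpful manner to the question.

{chat_history}
### Input: {question}
### Response:
""".strip()


def stuff_documents(chunks, question, chat_history=""):
    """Join all chunks into one context block and fill in the prompt template."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, chat_history=chat_history, question=question)


# Made-up chunks standing in for the retriever's results.
toy_chunks = [
    "Q: Luggage A: Rules vary by airline.",
    "Q: Are my details safe? A: We take your privacy seriously.",
]
filled = stuff_documents(toy_chunks, "Can I bring skis?")
print(filled)
```

Because every chunk is pasted in verbatim, the number and size of retrieved chunks directly determine how much of the context window the prompt consumes.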

Let's try it out:

question = "How flight search works?"
answer = chain(question)

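Chain output

Entering new chain…

Entering new chain…
Prompt after formatting:

### Instruction: You're a travelling support agent that is talking to a customer.

Use only the chat history and the following information

Q: How do I search for flights on Skyscanner? A: Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. For tips on how best to search, please head over to our search tips page.

If you're looking for inspiration for your next trip, why not try our everywhere feature. Or, if you want to hang out and ensure the best price, you can set up price alerts to let you know when the price changes.

As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information. Depending on the type of ticket you've booked, there may be different options for changes, cancellations and refunds, and the travel provider will be best placed to advise on these.

You can search flexible or specific dates on Skyscanner to find your preferred flight, and when you select a flight on Skyscanner you are transferred to the website where you will make and pay for your booking. Once you are redirected to the airline or travel agent website, you might be required to select dates again, depending on the website. In all cases, you will be shown the flight details of your selection and you are required before confirming payment to state that you have checked all details and agreed to the terms and conditions. We strongly recommend that you always check this information carefully, as travel information can be subject to change.

Q: Why does the price sometimes change when I am redirected to a flight provider? A: Flight prices and availability change constantly, so we make sure the data is updated regularly to reflect this. When you redirect to a travel provider's site, the price is updated again so you can be sure that you will always see the best price available from the airline or travel agent at time of booking.

### Input: How flight search works?

### Response:

Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. Our search algorithm scans hundreds of airlines and travel agents to bring you the most comprehensive range of flight options, including direct and connecting flights, stopovers, and layovers. For tips on how best to search, please visit our Search Tips page. As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information.

Finished chain.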

Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. Our search algorithm scans hundreds of airlines and travel agents to bring you the most comprehensive range of flight options, including direct and connecting flights, stopovers, and layovers. For tips on how best to search, please visit our Search Tips page. As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information.

Great! Our assistant answered the question using the custom knowledge. Let's look at the structure of the answer:

answer.keys()
dict_keys(['question', 'chat_history', 'answer', 'source_documents'])

The answer contains the question, the chat history, the answer, and the source documents. Let's ask another question:

question = "I bought flight tickets, but I can't find any confirmation. Where is it?"
response = chain(question)

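Chain output

Entering new chain…

Entering new chain…
Prompt after formatting:

Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

H: How flight search works?

A: Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. Our search algorithm scans hundreds of airlines and travel agents to bring you the most comprehensive range of flight options, including direct and connecting flights, stopovers, and layovers. For tips on how best to search, please visit our Search Tips page. As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information.

Follow Up Input: I bought flight tickets, but I can't find any confirmation. Where is it?

Standalone question: Can you help me find my flight ticket confirmation?

Finished chain.

Entering new chain…

Entering new chain…

### Instruction: You're a travelling support agent that is talking to a customer. Use only the chat history and the following information

Q: Where is my booking confirmation?
A: You should get a booking confirmation by email from the company you bought your travel from. This can sometimes go into your spam/junk mail folder, so it's always worth checking there.

If you still can't find it, try getting in touch with the company you bought from to find out what's going on.

To find out who you need to contact, check the company name next to the charge on your bank account.

Q: I booked the wrong dates / times
A: If you have found that you have booked the wrong dates or times, please contact the airline or travel agent that you booked your flight with as they will be able to help you change your flights to the intended dates or times.

The search box below can help you find the contact details for the travel provider you booked with.

Q: I entered the wrong email address
A: Please contact the airline or travel agent you booked with as Skyscanner does not have access to bookings made with airlines or travel agents.

If you can't remember who you booked with, you can check your credit card statement for a company name.

The search box below can help you find the contact details for the travel provider you booked with.

As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information. Depending on the type of ticket you've booked, there may be different options for changes, cancellations and refunds, and the travel provider will be best placed to advise on these.

H: How flight search works?

A: Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. Our search algorithm scans hundreds of airlines and travel agents to bring you the most comprehensive range of flight options, including direct and connecting flights, stopovers, and layovers. For tips on how best to search, please visit our Search Tips page. As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information.

Finished chain.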

You should receive an email confirmation from the airline or travel agent you booked with. Sometimes this email might end up in your junk/spam folder. Try searching your inbox and spam folder first before reaching out to the airline or travel agent for assistance.

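The answer is quite good! However, the chain does something unexpected: it rephrases the question (see the verbose output above) and then answers the rephrased question. In the current LangChain version, we haven't found a way to turn this off. The rephrasing is meant to reduce the burden on the agent's memory, but it can slow the agent down and doesn't always produce what we want. So we'll remove the rephrasing step.

QA Chain with Memory

We'll recreate our chain using load_qa_chain: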

memory = ConversationBufferMemory(
memory_key="chat_history",
human_prefix="### Input",
ai_prefix="### Response",
input_key="question",
output_key="output_text",
return_messages=False,
)

chain = load_qa_chain(
llm, chain_type="stuff", prompt=prompt, memory=memory, verbose=True
)

This type of chain doesn't rephrase the question and has no retriever to search the documents, so we have to do the search ourselves:

question = "How flight search works?"
docs = db.similarity_search(question)
answer = chain.run({"input_documents": docs, "question": question})
Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. Our search algorithm scans hundreds of airlines and travel agents to bring you the most comprehensive range of flight options, including direct and connecting flights, stopovers, and layovers. For tips on how best to search, please visit our Search Tips page. As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information.

Great, we got the same answer! This approach takes a bit more work, but it's more flexible and lets us reuse the same chain. Let's try another question:

question = "I entered wrong email address during my flight booking. What should I do?"
docs = db.similarity_search(question)
answer = chain.run({"input_documents": docs, "question": question})
Please contact the airline or travel agent you booked with as Skyscanner does not have access to bookings made with airlines or travel agents. If you can't remember who you booked with, you can check your credit card statement for a company name. The search box below can help you find the contact details for the travel provider you booked with.

Good. Let's put all the useful pieces together into one convenient package.

Support Chatbot

Let's create a class that makes our chatbot easy to use:

DEFAULT_TEMPLATE = """
### Instruction: You're a travelling support agent that is talking to a customer.

Use only the chat history and the following information
{context}
to answer in a helpful manner to the question. If you don't know the answer -
say that you don't know. Keep your replies short, compassionate and informative.

{chat_history}
### Input: {question}
### Response:
""".strip()

class Chatbot:
    def __init__(
        self,
        text_pipeline: HuggingFacePipeline,
        embeddings: HuggingFaceEmbeddings,
        documents_dir: Path,
        prompt_template: str = DEFAULT_TEMPLATE,
        verbose: bool = False,
    ):
        prompt = PromptTemplate(
            input_variables=["context", "question", "chat_history"],
            template=prompt_template,
        )
        self.chain = self._create_chain(text_pipeline, prompt, verbose)
        self.db = self._embed_data(documents_dir, embeddings)

    def _create_chain(
        self,
        text_pipeline: HuggingFacePipeline,
        prompt: PromptTemplate,
        verbose: bool = False,
    ):
        # The memory keeps the chat history in the prompt's own format.
        memory = ConversationBufferMemory(
            memory_key="chat_history",
            human_prefix="### Input",
            ai_prefix="### Response",
            input_key="question",
            output_key="output_text",
            return_messages=False,
        )

        return load_qa_chain(
            text_pipeline,
            chain_type="stuff",
            prompt=prompt,
            memory=memory,
            verbose=verbose,
        )

    def _embed_data(
        self, documents_dir: Path, embeddings: HuggingFaceEmbeddings
    ) -> Chroma:
        # Load the text files, split them into chunks, and index them.
        loader = DirectoryLoader(documents_dir, glob="**/*txt")
        documents = loader.load()
        text_splitter = CharacterTextSplitter(chunk_size=512, chunk_overlap=0)
        texts = text_splitter.split_documents(documents)
        return Chroma.from_documents(texts, embeddings)

    def __call__(self, user_input: str) -> str:
        # Retrieve the relevant chunks, then let the chain answer with them.
        docs = self.db.similarity_search(user_input)
        return self.chain.run({"input_documents": docs, "question": user_input})

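Our chatbot takes a text pipeline, an embedding model, and a directory of documents. Its two main jobs are creating the chain and building the database. We also define the __call__ method so that calling the chatbot is as simple as calling a function.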
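
The retrieve-then-answer flow inside __call__ can be tried without a GPU by swapping in simple stand-ins. Everything in this sketch is hypothetical: `KeywordStore` replaces the vector database with word matching, `CountingChain` replaces the QA chain with a function that reports what it received, and `MiniChatbot` mirrors only the call path.

```python
class KeywordStore:
    """Hypothetical stand-in for the vector database: matches on shared words."""

    def __init__(self, docs):
        self.docs = docs

    def similarity_search(self, query):
        words = set(query.lower().split())
        return [doc for doc in self.docs if words & set(doc.lower().split())]


class CountingChain:
    """Hypothetical stand-in for the QA chain: reports what it received."""

    def run(self, inputs):
        return f"{len(inputs['input_documents'])} document(s) for: {inputs['question']}"


class MiniChatbot:
    def __init__(self, chain, db):
        self.chain = chain
        self.db = db

    def __call__(self, user_input):
        # Same two steps as the real class: retrieve, then answer.
        docs = self.db.similarity_search(user_input)
        return self.chain.run({"input_documents": docs, "question": user_input})


bot = MiniChatbot(
    CountingChain(),
    KeywordStore(["luggage rules vary", "booking confirmation email"]),
)
reply = bot("lost luggage")
print(reply)
```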

Let's create an instance of the class:

chatbot = Chatbot(llm, embeddings, "./skyscanner/")

To chat with the bot, we'll use a while loop that asks the user for input. If the user types bye or goodbye, we break out of the loop:

import warnings

warnings.filterwarnings("ignore", category=UserWarning)

while True:
    user_input = input("You: ")
    # Stop the chat when the user says goodbye.
    if user_input.lower() in ["bye", "goodbye"]:
        break
    # The streamer prints the answer as it is generated.
    answer = chatbot(user_input)
    print()

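You: How flight search works?

A: Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. Our search algorithm scans hundreds of airlines and travel agents to bring you the most comprehensive range of flight options, including direct and connecting flights, stopovers, and layovers.

You: I bought flight tickets, but I can't find any confirmation. Where is it?

A: It sounds like you purchased your tickets from another company or airline rather than through Skyscanner. To find your booking confirmation, please contact the company you bought the tickets from directly. They can provide more details about your booking and its confirmation.

You: I entered wrong email address during my flight booking. What should I do?

A: Unfortunately, since Skyscanner isn't involved in the actual booking process, we can't change the email address directly. However, you can contact the airline or travel agent you booked with and ask whether they can update your email address. Alternatively, you can try using the search function on our site to look up your booking and see whether you can find and correct it yourself.

You: What are the advantages of mash-up flights?

A: Mash-up flights offer several advantages over traditional round-trip flights. Here are the main ones:

  1. Cost savings: By combining two separate one-way flights, you can often save money compared with buying a single round-trip ticket.
  2. More flexibility: Mash-up flights let you plan your trip more flexibly. If you need to change your return date or time, you only adjust that part instead of rebooking the whole trip.
  3. More choice: Flying with multiple airlines gives you a wider range of destinations and departure times, so you can pick the flights that best fit your schedule and preferences.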

Original article: https://www.mlexpert.io/blog/support-chatbot-using-custom-knowledge-base-with-langchain
