Based on the Mistral 7B prompt template above, we construct the keyword-extraction prompt, which consists of an Example Prompt and a Keyword Prompt. The Example Prompt is a worked one-shot example of keyword extraction, while the Keyword Prompt is the prompt that asks the LLM to output the keywords. An example is shown below:

example_prompt = """
<s>[INST]
I have the following document:
- The website mentions that it only takes a couple of days to deliver but I still have not received mine.

Please give me the keywords that are present in this document and separate them with commas.
Make sure you to only return the keywords and say nothing else. For example, don't say:
"Here are the keywords present in the document"
[/INST] meat, beef, eat, eating, emissions, steak, food, health, processed, chicken</s>"""

The Keyword Prompt makes use of KeyBERT's [DOCUMENT] tag, which marks where the document text will be inserted:

keyword_prompt = """
[INST]
I have the following document:
- [DOCUMENT]

Please give me the keywords that are present in this document and separate them with commas.
Make sure you to only return the keywords and say nothing else. For example, don't say:
"Here are the keywords present in the document"
[/INST]"""

The complete keyword-extraction prompt is the concatenation of the Example Prompt and the Keyword Prompt:

>>> prompt = example_prompt + keyword_prompt
>>> print(prompt)
"""
<s>[INST]
I have the following document:
- The website mentions that it only takes a couple of days to deliver but I still have not received mine.

Please give me the keywords that are present in this document and separate them with commas.
Make sure you to only return the keywords and say nothing else. For example, don't say:
"Here are the keywords present in the document"
[/INST] meat, beef, eat, eating, emissions, steak, food, health, processed, chicken</s>
[INST]
I have the following document:
- [DOCUMENT]

Please give me the keywords that are present in this document and separate them with commas.
Make sure you to only return the keywords and say nothing else. For example, don't say:
"Here are the keywords present in the document"
[/INST]"""

Extracting Keywords with KeyLLM
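The code below assumes that generator is the Mistral 7B text-generation pipeline built in the earlier section. For reference, a minimal sketch of that setup might look as follows; the model id and generation parameters here are illustrative assumptions, not necessarily the exact values used earlier.

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Assumed setup from the earlier section: a Hugging Face text-generation
# pipeline wrapping Mistral 7B Instruct (model id and parameters are illustrative)
model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

generator = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=50,
    repetition_penalty=1.1
)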

from keybert.llm import TextGeneration
from keybert import KeyLLM

# Load it in KeyLLM
llm = TextGeneration(generator, prompt=prompt)
kw_model = KeyLLM(llm)

documents = [
    "The website mentions that it only takes a couple of days to deliver but I still have not received mine.",
    "I received my package!",
    "Whereas the most powerful LLMs have generally been accessible only through limited APIs (if at all), Meta released LLaMA's model weights to the research community under a noncommercial license."
]

keywords = kw_model.extract_keywords(documents)

The output is as follows:

[['deliver', 'days', 'website', 'mention', 'couple', 'still', 'receive', 'mine'], ['package', 'received'], ['LLM', 'API', 'accessibility', 'release', 'license', 'research', 'community', 'model', 'weights', 'Meta']]

You are free to adapt the prompt to specify the type of keywords to extract, how long they should be, and even, if the LLM is multilingual, which language the keywords should be returned in.
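As an illustration only (this variant is not from the original example), a modified Keyword Prompt could cap the number of keywords and request them in another language:

# Hypothetical variation: at most 5 keywords, returned in Spanish
custom_keyword_prompt = """
[INST]
I have the following document:
- [DOCUMENT]

Please give me at most 5 Spanish keywords that are present in this document and separate them with commas.
Make sure you to only return the keywords and say nothing else.
[/INST]"""

llm = TextGeneration(generator, prompt=example_prompt + custom_keyword_prompt)
kw_model = KeyLLM(llm)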

To switch to another LLM, such as ChatGPT, see: https://maartengr.github.io/KeyBERT/guides/llms.html
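Following the pattern in that guide, switching to an OpenAI model would look roughly like the sketch below; the API key is a placeholder, and the exact interface should be checked against the linked guide for your installed KeyBERT version.

import openai
from keybert.llm import OpenAI
from keybert import KeyLLM

# Create an OpenAI-backed LLM (placeholder API key)
client = openai.OpenAI(api_key="YOUR_API_KEY")
llm = OpenAI(client)

# Load it in KeyLLM and extract keywords as before
kw_model = KeyLLM(llm)
keywords = kw_model.extract_keywords(documents)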

Extracting Keywords More Efficiently with KeyLLM

Running the LLM over thousands of documents one by one is not the most efficient approach. Instead, we can cluster the documents first and only then extract keywords. The idea works as follows: first, embed all documents to convert them into numeric representations; second, find which documents are most similar to each other, on the assumption that highly similar documents will share the same keywords, so extraction does not need to run on every document; third, extract keywords from only one document per cluster and assign those keywords to all documents in the same cluster.

from keybert import KeyLLM
from sentence_transformers import SentenceTransformer

# Extract embeddings
model = SentenceTransformer('BAAI/bge-small-en-v1.5')
embeddings = model.encode(documents, convert_to_tensor=True)

# Load it in KeyLLM
kw_model = KeyLLM(llm)

# Extract keywords
keywords = kw_model.extract_keywords(
    documents,
    embeddings=embeddings,
    threshold=.5
)

Increasing the threshold to around .95 will only group documents that are nearly identical, whereas setting it to around .5 will group documents that are about the same topic.
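To get a feel for what the threshold does, you can inspect the pairwise cosine similarities of the document embeddings yourself; this is purely an illustrative check, since KeyLLM performs the grouping internally.

from sentence_transformers import util

# Pairwise cosine similarities between the document embeddings;
# pairs scoring above the threshold end up in the same group
similarities = util.cos_sim(embeddings, embeddings)
print(similarities)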

The keywords returned by extract_keywords are:

>>> keywords
[['deliver', 'days', 'website', 'mention', 'couple', 'still', 'receive', 'mine'],
 ['deliver', 'days', 'website', 'mention', 'couple', 'still', 'receive', 'mine'],
 ['LLaMA', 'model', 'weights', 'release', 'noncommercial', 'license', 'research', 'community', 'powerful', 'LLMs', 'APIs']]

In this example, the first two documents were grouped together and received the same keywords. Instead of passing all three documents to the LLM, we only passed two. With thousands of documents, this can speed things up considerably.

Extracting Keywords More Efficiently with KeyBERT and KeyLLM

In the previous example, we manually passed the document embeddings to KeyLLM and essentially performed zero-shot keyword extraction. We can extend this further by leveraging KeyBERT. Since KeyBERT generates keywords and embeds the documents itself, we can use it not only to simplify the pipeline but also to suggest candidate keywords to the LLM. These suggested keywords help the LLM decide which keywords to use. Moreover, this lets everything in KeyBERT be used together with KeyLLM!

Extracting keywords with KeyBERT and KeyLLM takes only three lines of code:

from keybert import KeyLLM, KeyBERT
# Load it in KeyLLM
kw_model = KeyBERT(llm=llm, model='BAAI/bge-small-en-v1.5')

# Extract keywords
keywords = kw_model.extract_keywords(documents, threshold=0.5)

The output is as follows:

>>> keywords
[['deliver', 'days', 'website', 'mention', 'couple', 'still', 'receive', 'mine'],
 ['deliver', 'days', 'website', 'mention', 'couple', 'still', 'receive', 'mine'],
 ['LLaMA', 'model', 'weights', 'release', 'license', 'research', 'community', 'powerful', 'LLMs', 'APIs', 'accessibility']]
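If you want the LLM to actually see the candidate keywords that KeyBERT suggests, the prompt passed to TextGeneration can, per the KeyBERT documentation, also contain a [CANDIDATES] tag that is replaced with those candidates; the wording below is only a sketch and should be verified against the docs for your KeyBERT version.

from keybert.llm import TextGeneration
from keybert import KeyBERT

# Sketch: a prompt that exposes KeyBERT's candidate keywords to the LLM
candidate_prompt = """
[INST]
I have the following document:
- [DOCUMENT]

Here are some candidate keywords for this document: [CANDIDATES]

Please give me the keywords that are present in this document and separate them with commas.
Make sure you to only return the keywords and say nothing else.
[/INST]"""

llm = TextGeneration(generator, prompt=example_prompt + candidate_prompt)
kw_model = KeyBERT(llm=llm, model='BAAI/bge-small-en-v1.5')
keywords = kw_model.extract_keywords(documents, threshold=0.5)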

