As shown in Figure 1, the re-ranking task acts like an intelligent filter. When the retriever pulls multiple contexts from the indexed collection, those contexts can differ in how relevant they are to the user's query: some may be highly relevant (highlighted with red boxes in Figure 1), while others may be only marginally relevant or even irrelevant (highlighted with green and blue boxes in Figure 1).

The task of re-ranking is to assess the relevance of these contexts and prioritize the ones most likely to yield an accurate, relevant answer, so that the LLM gives precedence to these top-ranked contexts when generating its answer, improving the accuracy and quality of the response.

Simply put, re-ranking is like helping you pick the most relevant references out of a pile of study material during an open-book exam, so that you can answer the question more efficiently and accurately.
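Conceptually, the whole step boils down to scoring every retrieved context against the query and keeping the highest-scoring ones. The following is a minimal, library-agnostic sketch; the rerank and score_fn names are illustrative and not taken from any particular framework:

from typing import Callable, List, Tuple

def rerank(
    query: str,
    contexts: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[float, str]]:
    """Score each retrieved context against the query and keep the top_n."""
    scored = [(score_fn(query, ctx), ctx) for ctx in contexts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

The top_n contexts returned here are what would then be passed to the LLM for answer generation.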

The re-ranking approaches described in this article fall into the following two types:

2. Using a Re-Ranking Model as the Re-Ranker

Unlike an embedding model, a re-ranking model takes the query and a context as input and directly outputs a similarity score rather than an embedding. Note that re-ranking models are optimized with a cross-entropy loss, so the relevance scores are not confined to a particular range and can even be negative.
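To make this concrete, here is a small sketch of scoring (query, context) pairs with the FlagEmbedding package that ships the bge re-rankers; the package name and the compute_score call follow its public documentation, but treat the exact API as an assumption that may vary between versions:

# pip install FlagEmbedding   (assumed package providing the bge re-rankers)
from FlagEmbedding import FlagReranker

# A cross-encoder style re-ranker: it reads the query and the passage together
# and returns a raw relevance score, which is unbounded and can be negative.
reranker = FlagReranker("BAAI/bge-reranker-base", use_fp16=False)

query = "Can you provide a concise description of the TinyLlama model?"
passages = [
    "TinyLlama is a compact 1.1B language model pretrained on around 1 trillion tokens.",
    "Bubble sort repeatedly swaps adjacent elements that are out of order.",
]

scores = reranker.compute_score([[query, p] for p in passages])
for passage, score in sorted(zip(passages, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {passage}")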

At present, not many re-ranking models are available. One option is Cohere's online model [1], accessed through an API; there are also open-source models such as bge-reranker-base and bge-reranker-large.
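For the hosted option, a call to Cohere's rerank endpoint through its Python SDK looks roughly like the sketch below; the model name and the shape of the response object are assumptions based on Cohere's public documentation and differ between SDK versions:

import cohere

co = cohere.Client("YOUR_COHERE_API_KEY")

# The service scores each document against the query and returns them ranked.
response = co.rerank(
    model="rerank-english-v2.0",   # assumed model id
    query="Can you provide a concise description of the TinyLlama model?",
    documents=[
        "TinyLlama is a compact 1.1B language model pretrained on around 1 trillion tokens.",
        "Bubble sort repeatedly swaps adjacent elements that are out of order.",
    ],
    top_n=2,
)
# Each result carries the original document index and a relevance score
# (attribute names vary slightly across SDK versions).
for result in response.results:
    print(result.index, result.relevance_score)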

Figure 2 below shows the evaluation results for the hit rate and Mean Reciprocal Rank (MRR) metrics.

From these evaluation results, it can be seen that adding a re-ranking model improves both metrics.
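As a quick reminder of what the two metrics measure, here is a minimal sketch of computing hit rate and MRR over a set of queries, assuming each query has a single known relevant document id; all names here are illustrative:

from typing import Dict, List, Tuple

def hit_rate_and_mrr(
    ranked_ids: Dict[str, List[str]],   # query -> retrieved doc ids in ranked order
    gold_id: Dict[str, str],            # query -> the single relevant doc id
    k: int = 3,
) -> Tuple[float, float]:
    hits = 0
    reciprocal_ranks = []
    for query, ids in ranked_ids.items():
        top_k = ids[:k]
        if gold_id[query] in top_k:
            hits += 1                                   # counted as a hit@k
            reciprocal_ranks.append(1.0 / (top_k.index(gold_id[query]) + 1))
        else:
            reciprocal_ranks.append(0.0)
    n = len(ranked_ids)
    return hits / n, sum(reciprocal_ranks) / n          # (hit rate, MRR)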

3. A Demonstration Using the bge-reranker-base Model

3.1 Environment Setup

Import the relevant libraries and set the environment and global variables:

import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_KEY"

from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker
from llama_index.schema import QueryBundle

dir_path = "YOUR_DIR_PATH"

The directory contains only one PDF file, the paper "TinyLlama: An Open-Source Small Language Model" [2].

(py) Florian:~ Florian$ ls /Users/Florian/Downloads/pdf_test/tinyllama.pdf

3.2 Building a Simple Retriever with LlamaIndex

documents = SimpleDirectoryReader(dir_path).load_data()
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k = 3)

3.3 Basic Retrieval

query = "Can you provide a concise description of the TinyLlama model?"
nodes = retriever.retrieve(query)
for node in nodes:
    print('----------------------------------------------------')
    display_source_node(node, source_length = 500)

The display_source_node function is adapted from the llama_index source code [3] and modified as follows:

from llama_index.schema import ImageNode, MetadataMode, NodeWithScore
from llama_index.utils import truncate_text

def display_source_node(
    source_node: NodeWithScore,
    source_length: int = 100,
    show_source_metadata: bool = False,
    metadata_mode: MetadataMode = MetadataMode.NONE,
) -> None:
    """Display source node"""
    source_text_fmt = truncate_text(
        source_node.node.get_content(metadata_mode=metadata_mode).strip(), source_length
    )
    text_md = (
        f"Node ID: {source_node.node.node_id} \n"
        f"Score: {source_node.score} \n"
        f"Text: {source_text_fmt} \n"
    )
    if show_source_metadata:
        text_md += f"Metadata: {source_node.node.metadata} \n"
    if isinstance(source_node.node, ImageNode):
        text_md += "Image:"

    print(text_md)
    # display(Markdown(text_md))
    # if isinstance(source_node.node, ImageNode) and source_node.node.image is not None:
    #     display_image(source_node.node.image)

Below are the results of the basic retrieval, i.e., the top-3 nodes before re-ranking:

----------------------------------------------------
Node ID: 438b9d91-cd5a-44a8-939e-3ecd77648662
Score: 0.8706055408845863
Text: 4 Conclusion In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote transparency in the open-source LLM pre-training community, we have released all relevant information, including our pre-training code, all intermediate model checkpoints, and the details of our data processing steps. With its compact architecture and promising performance, TinyLlama can enable end-user applications on mobile devices, and serve as a lightweight platform for testing aw...
----------------------------------------------------
Node ID: ca4db90f-5c6e-47d5-a544-05a9a1d09bc6
Score: 0.8624531691777889
Text: TinyLlama: An Open-Source Small Language Model Peiyuan Zhang* Guangtao Zeng* Tianduo Wang Wei Lu StatNLP Research Group Singapore University of Technology and Design {peiyuan_zhang, tianduo_wang, luwei}@sutd.edu.sg guangtao_zeng@mymail.sutd.edu.sg Abstract We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances contr...
----------------------------------------------------
Node ID: e2d97411-8dc0-40a3-9539-a860d1741d4f
Score: 0.8346160605298356
Text: Although these works show a clear preference on large models, the potential of training smaller models with larger dataset remains under-explored. Instead of training compute-optimal language models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing solely on training compute-optimal language models. Inference-optimal language models aim for optimal performance within specific inference constraints This is achieved by training models with more tokens...

3.4 Re-Ranking

To re-rank the nodes above, we use the bge-reranker-base model here:

print('------------------------------------------------------------------------------------------------')
print('Start reranking...')

reranker = FlagEmbeddingReranker(
    top_n = 3,
    model = "BAAI/bge-reranker-base",
)

query_bundle = QueryBundle(query_str=query)
ranked_nodes = reranker._postprocess_nodes(nodes, query_bundle = query_bundle)
for ranked_node in ranked_nodes:
    print('----------------------------------------------------')
    display_source_node(ranked_node, source_length = 500)

The results after re-ranking are as follows:

------------------------------------------------------------------------------------------------
Start reranking...
----------------------------------------------------
Node ID: ca4db90f-5c6e-47d5-a544-05a9a1d09bc6
Score: -1.584416151046753
Text: TinyLlama: An Open-Source Small Language Model Peiyuan Zhang* Guangtao Zeng* Tianduo Wang Wei Lu StatNLP Research Group Singapore University of Technology and Design {peiyuan_zhang, tianduo_wang, luwei}@sutd.edu.sg guangtao_zeng@mymail.sutd.edu.sg Abstract We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances contr...
----------------------------------------------------
Node ID: e2d97411-8dc0-40a3-9539-a860d1741d4f
Score: -1.7028117179870605
Text: Although these works show a clear preference on large models, the potential of training smaller models with larger dataset remains under-explored. Instead of training compute-optimal language models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing solely on training compute-optimal language models. Inference-optimal language models aim for optimal performance within specific inference constraints This is achieved by training models with more tokens...
----------------------------------------------------
Node ID: 438b9d91-cd5a-44a8-939e-3ecd77648662
Score: -2.904750347137451
Text: 4 Conclusion In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote transparency in the open-source LLM pre-training community, we have released all relevant information, including our pre-training code, all intermediate model checkpoints, and the details of our data processing steps. With its compact architecture and promising performance, TinyLlama can enable end-user applications on mobile devices, and serve as a lightweight platform for testing aw...

Clearly, after re-ranking, the node with ID ca4db90f-5c6e-47d5-a544-05a9a1d09bc6 has moved from rank 2 to rank 1, which means the most relevant context is now placed first.

4. Using an LLM as the Re-Ranker

Existing LLM-based re-ranking methods can be roughly divided into three categories: 1) fine-tuning the LLM on the re-ranking task; 2) prompting the LLM to perform re-ranking; 3) using the LLM for data augmentation during training.

Prompting the LLM to re-rank has a relatively low cost. Below is a demonstration using RankGPT [4], which has been integrated into LlamaIndex [5].

The idea behind RankGPT is to perform zero-shot passage re-ranking with an LLM (such as ChatGPT, GPT-4, or another LLM). It applies a permutation generation approach and a sliding-window strategy to re-rank passages efficiently.

As shown in Figure 3, the paper [6] proposes three feasible approaches.

The first two are conventional approaches: each document is given a score, and all passages are then sorted by that score.

The paper proposes the third approach, permutation generation. Specifically, instead of relying on an external score, the model ranks the passages directly end to end; in other words, it uses the LLM's semantic understanding to order all candidate passages by relevance. However, the number of candidate documents is usually very large while the LLM's input is limited, so it is usually impossible to feed in all of the text at once.
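To make the permutation-generation idea concrete, here is a rough sketch of how such a prompt could be built and its output parsed. The prompt wording and the parsing logic are simplified illustrations of the idea, not the exact prompts or code used by RankGPT:

import re
from typing import List

def build_ranking_prompt(query: str, passages: List[str]) -> str:
    """Number the candidate passages and ask the LLM for a relevance ordering."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Rank the following passages by relevance to the query.\n"
        f"Query: {query}\n{numbered}\n"
        "Answer with the ranking only, for example: [2] > [1] > [3]"
    )

def parse_permutation(llm_output: str, num_passages: int) -> List[int]:
    """Extract the 1-based passage ids in ranked order, ignoring duplicates and out-of-range ids."""
    seen, order = set(), []
    for token in re.findall(r"\[(\d+)\]", llm_output):
        idx = int(token)
        if 1 <= idx <= num_passages and idx not in seen:
            seen.add(idx)
            order.append(idx)
    # Append any passages the model forgot to mention, keeping their original order.
    order += [i for i in range(1, num_passages + 1) if i not in seen]
    return order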

Therefore, as shown in Figure 4, a sliding-window method that follows the idea of bubble sort is introduced: only 4 texts are ranked at a time, then the window moves and the next 4 texts are ranked. After iterating over the whole set of texts, we end up with the best-performing texts at the top.
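The sliding-window pass can then be sketched as follows. The window and step sizes are illustrative (the description above uses groups of 4), rank_window stands in for a call to the LLM using a prompt like the one above, and sliding from the back of the list toward the front, so that relevant passages drift upward as in bubble sort, is my reading of the idea rather than a claim about RankGPT's exact implementation:

from typing import Callable, List

def sliding_window_rerank(
    query: str,
    passages: List[str],
    rank_window: Callable[[str, List[str]], List[int]],   # returns a 1-based permutation
    window_size: int = 4,
    step: int = 2,
) -> List[str]:
    """Re-order one window at a time so the most relevant passages bubble to the top."""
    ranked = list(passages)
    start = max(len(ranked) - window_size, 0)
    while True:
        window = ranked[start:start + window_size]
        permutation = rank_window(query, window)           # e.g. [2, 1, 3, 4]
        ranked[start:start + window_size] = [window[i - 1] for i in permutation]
        if start == 0:
            break
        start = max(start - step, 0)
    return ranked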

Note that a newer version of LlamaIndex is required to use RankGPT. The version installed earlier (0.9.29) does not include the code RankGPT needs, so a new conda environment was created with LlamaIndex version 0.9.45.post1.

The code is simple: building on the code from the previous section, just set RankGPT as the re-ranker.

from llama_index.postprocessor import RankGPTRerank
from llama_index.llms import OpenAI

reranker = RankGPTRerank(
    top_n = 3,
    llm = OpenAI(model="gpt-3.5-turbo-16k"),
    # verbose=True,
)

The overall results are as follows:

(llamaindex_new) Florian:~ Florian$ python /Users/Florian/Documents/rerank.py
----------------------------------------------------
Node ID: 20de8234-a668-442d-8495-d39b156b44bb
Score: 0.8703492815379594
Text: 4 Conclusion In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote transparency in the open-source LLM pre-training community, we have released all relevant information, including our pre-training code, all intermediate model checkpoints, and the details of our data processing steps. With its compact architecture and promising performance, TinyLlama can enable end-user applications on mobile devices, and serve as a lightweight platform for testing aw...
----------------------------------------------------
Node ID: 47ba3955-c6f8-4f28-a3db-f3222b3a09cd
Score: 0.8621633467539512
Text: TinyLlama: An Open-Source Small Language Model Peiyuan Zhang* Guangtao Zeng* Tianduo Wang Wei Lu StatNLP Research Group Singapore University of Technology and Design {peiyuan_zhang, tianduo_wang, luwei}@sutd.edu.sg guangtao_zeng@mymail.sutd.edu.sg Abstract We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances contr...
----------------------------------------------------
Node ID: 17cd9896-473c-47e0-8419-16b4ac615a59
Score: 0.8343984516104476
Text: Although these works show a clear preference on large models, the potential of training smaller models with larger dataset remains under-explored. Instead of training compute-optimal language models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing solely on training compute-optimal language models. Inference-optimal language models aim for optimal performance within specific inference constraints This is achieved by training models with more tokens...
------------------------------------------------------------------------------------------------
Start reranking...
----------------------------------------------------
Node ID: 47ba3955-c6f8-4f28-a3db-f3222b3a09cd
Score: 0.8621633467539512
Text: TinyLlama: An Open-Source Small Language Model Peiyuan Zhang* Guangtao Zeng* Tianduo Wang Wei Lu StatNLP Research Group Singapore University of Technology and Design {peiyuan_zhang, tianduo_wang, luwei}@sutd.edu.sg guangtao_zeng@mymail.sutd.edu.sg Abstract We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advances contr...
----------------------------------------------------
Node ID: 17cd9896-473c-47e0-8419-16b4ac615a59
Score: 0.8343984516104476
Text: Although these works show a clear preference on large models, the potential of training smaller models with larger dataset remains under-explored. Instead of training compute-optimal language models, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusing solely on training compute-optimal language models. Inference-optimal language models aim for optimal performance within specific inference constraints This is achieved by training models with more tokens...
----------------------------------------------------
Node ID: 20de8234-a668-442d-8495-d39b156b44bb
Score: 0.8703492815379594
Text: 4 Conclusion In this paper, we introduce TinyLlama, an open-source, small-scale language model. To promote transparency in the open-source LLM pre-training community, we have released all relevant information, including our pre-training code, all intermediate model checkpoints, and the details of our data processing steps. With its compact architecture and promising performance, TinyLlama can enable end-user applications on mobile devices, and serve as a lightweight platform for testing aw...

The results show that after re-ranking, the top-1 result is the correct text containing the answer, which is consistent with the result obtained earlier with the re-ranking model.

5. Evaluation

The evaluation uses BAAI's bge-reranker-base model, as shown in the following code:

reranker = FlagEmbeddingReranker(
    top_n = 3,
    model = "BAAI/bge-reranker-base",
    use_fp16 = False
)

# or using LLM as reranker
# from llama_index.postprocessor import RankGPTRerank
# from llama_index.llms import OpenAI
# reranker = RankGPTRerank(
#     top_n = 3,
#     llm = OpenAI(model="gpt-3.5-turbo-16k"),
#     # verbose=True,
# )

query_engine = index.as_query_engine(       # add reranker to query_engine
    similarity_top_k = 3,
    node_postprocessors=[reranker]
)
# query_engine = index.as_query_engine()    # original query_engine
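With the re-ranker plugged into the query engine, running a query is a one-liner, and the response also exposes the source nodes that survived re-ranking; attribute access is shown as in LlamaIndex 0.9.x, to the best of my knowledge:

query = "Can you provide a concise description of the TinyLlama model?"
response = query_engine.query(query)
print(response)

# Inspect which re-ranked chunks the answer was grounded on.
for node_with_score in response.source_nodes:
    print(node_with_score.node.node_id, node_with_score.score)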

Reference: https://ai.plainenglish.io/advanced-rag-03-using-ragas-llamaindex-for-rag-evaluation-84756b82dca7

6. Conclusion

In summary, this article has introduced the principles of re-ranking and two mainstream approaches.

Of these, the approach that uses a re-ranking model is lightweight and adds little overhead.

The approach that uses an LLM, on the other hand, performs well on multiple benchmarks [7] but is more expensive, and it performs well only with ChatGPT and GPT-4; its performance is poorer with other open-source models such as FLAN-T5 and Vicuna-13B.

Therefore, in real-world projects a specific trade-off has to be made.

References:

[1] https://txt.cohere.com/rerank/

[2] https://arxiv.org/pdf/2401.02385.pdf

[3] https://github.com/run-llama/llama_index/blob/v0.9.29/llama_index/response/notebook_utils.py

[4] https://arxiv.org/pdf/2304.09542.pdf

[5] https://github.com/run-llama/llama_index/blob/v0.9.45.post1/llama_index/postprocessor/rankGPT_rerank.py

[6] https://arxiv.org/pdf/2304.09542.pdf

[7] https://arxiv.org/pdf/2304.09542.pdf

This article is reproduced from the WeChat official account @ArronAI.
