99热99操99射,日本黄色网址大全,一本一本久久α久久精品66

下面稍微介紹一下幾個關鍵步驟：

步驟1:從PDF中提取圖像

使用unstructured庫抽取PDF信息，并創建一個文本和圖像列表。提取的圖像需要存儲在特定的文件夾中。

# Extract images, tables, and chunk textfrom unstructured.partition.pdf import partition_pdf
raw_pdf_elements = partition_pdf( filename="LCM_2020_1112.pdf", extract_images_in_pdf=True, infer_table_structure=True, chunking_strategy="by_title", max_characters=4000, new_after_n_chars=3800, combine_text_under_n_chars=2000, image_output_dir_path=path,)

步驟2：創建矢量數據庫

準備矢量數據庫，并將圖像URI和文本添加到矢量數據庫中。

# Create chromavectorstore = Chroma( collection_name="mm_rag_clip_photos", embedding_function=OpenCLIPEmbeddings())
# Get image URIs with .jpg extension onlyimage_uris = sorted( [ os.path.join(path, image_name) for image_name in os.listdir(path) if image_name.endswith(".jpg") ])
print(image_uris)# Add imagesvectorstore.add_images(uris=image_uris)
# Add documentsvectorstore.add_texts(texts=texts)

步驟3：QnA管道

? ? ? ?最后一部分是Langchain QnA管道。

chain = ( { "context": retriever | RunnableLambda(split_image_text_types), "question": RunnablePassthrough(), } | RunnableLambda(prompt_func) | model | StrOutputParser())

如上面的腳本所示，有兩個重要的自定義函數在這個管道中發揮著重要作用，分別是：split_image_text_types和prompt_func。

對于split_image_text_types函數，使用CLIP嵌入獲取相關圖像后，還需要將圖像轉換為base64格式，因為GPT-4-Vision的輸入是base64格式。

prompt_func函數是描述如何構建prompt工程。在這里，我們將“問題”和“base64圖像”放在提示中。

二、帶圖像摘要的多向量檢索器

? ? ? ?根據langchain的觀察結果，該方法比多模態嵌入方法的準確性更高，它使用GPT-4V提取摘要。

如果使用多矢量方法摘要時，需要提交文檔中的所有圖片進行摘要。當在矢量數據庫中進行相似性搜索時，圖像的摘要文本也可以提供信息。

一般來說，摘要用于查找相關圖像，相關圖像輸入到多模態LLM以回答用戶查詢。這兩條信息需要分開，這就是為什么我們在這種情況下使用多向量檢索器[3]的原因。

優點：圖像檢索精度高

缺點：這種方法非常昂貴，因為GPT-4-Vision非常昂貴，尤其是如果你想總結許多圖像，如果你有成本問題，可能會是一個問題。

完整的代碼可以參考langchain cookbook[4]，下圖是整體架構圖：

? ? ? ?該方法也可以很好地回答圖像中的問題，唯一的問題是整個過程比較緩慢，大概是因為GPT-4-Vision需要時間來處理整個事情。

下面稍微介紹一下幾個關鍵步驟：

步驟1：提取圖像

該步驟與上述多模態嵌入方法類似。

步驟2：生成文本和圖像摘要

使用generate_text_summarys和generate_mg_summarys函數生成每個文本和圖像的摘要。數據結構，如下所示：

{"input":text, "summary":text_summaries}, {"input":table, "summary":table_summaries},{"input":image_base64, "summary":image_summaries }

此結構將添加到多矢量檢索器中。

步驟3：創建多矢量檢索器

? ? ? 使用下面的python腳本將上述數據結構添加到多向量檢索器中，我們還需要指定矢量數據庫。

# The vectorstore to use to index the summariesvectorstore = Chroma( collection_name="mm_rag_cj_blog", embedding_function=OpenAIEmbeddings())
# Create retrieverretriever_multi_vector_img = create_multi_vector_retriever( vectorstore, text_summaries, texts, table_summaries, tables, image_summaries, img_base64_list,)

? ? ?使用create_multi_vector_requirer函數初始化多向量。langchain中的MultiVectorRetriever是一個創建多向量檢索器的類，在這里我們需要添加向量存儲、文檔存儲和key_id作為輸入。

? ? ? 該案例中，我們的文檔存儲設置為InMemoryStore，將不會持久化。因此需要使用另一個langchain存儲類來使其持久化，可以參考[5]。

docstore很重要，它存儲我們的圖像和文本，而不是摘要。摘要存儲在矢量存儲中。

def create_multi_vector_retriever( vectorstore, text_summaries, texts, table_summaries, tables, image_summaries, images): """ Create retriever that indexes summaries, but returns raw images or texts """
 # Initialize the storage layer store = InMemoryStore() id_key = "doc_id"
 # Create the multi-vector retriever retriever = MultiVectorRetriever( vectorstore=vectorstore, docstore=store, id_key=id_key, )
 # Helper function to add documents to the vectorstore and docstore def add_documents(retriever, doc_summaries, doc_contents): doc_ids = [str(uuid.uuid4()) for _ in doc_contents] summary_docs = [ Document(page_content=s, metadata={id_key: doc_ids[i]}) for i, s in enumerate(doc_summaries) ] retriever.vectorstore.add_documents(summary_docs) retriever.docstore.mset(list(zip(doc_ids, doc_contents)))
 # Add texts, tables, and images # Check that text_summaries is not empty before adding if text_summaries: add_documents(retriever, text_summaries, texts) # Check that table_summaries is not empty before adding if table_summaries: add_documents(retriever, table_summaries, tables) # Check that image_summaries is not empty before adding if image_summaries: add_documents(retriever, image_summaries, images)
 return retriever

步驟4：QnA管道

最后一步與前面的方法類似。只有檢索器不同，現在我們使用的是多向量檢索器。