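Image by the author

Nowadays we take a huge number of photos with our smartphones and share many of them on social networks or messaging apps. Sometimes, though, images alone are not enough to fully convey the precious moments we capture in daily life, with family, or on memorable trips.

What if we could use Generative AI to put into words what our photos mean, and let the AI tell the story of those moments? You could publish the text online, share it with friends and family, or keep it as a personal journal.

Since this is a tool I would really like to use myself, I decided to build it as a creative developer rather than as a researcher, ML engineer, or data scientist. I was interested in leveraging and integrating a set of powerful Google APIs to get the job done.

This article comes with a Jupyter/Colab notebook containing the detailed steps of the whole solution. It covers everything from extracting EXIF photo metadata, to using the Google Maps API to obtain information about where a photo was taken, to using Generative AI APIs (Vertex Imagen for image captioning and the Vertex Palm API for blog post generation).

The output of this pipeline is a generated blog post describing the whole photo album. You can upload your own album to the Colab notebook and see how Generative AI puts into words the moments your camera recorded.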

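Setup

This project relies on Google Cloud Platform (GCP) to access the APIs. To run it on Colab, you can use an existing GCP account or sign up for a new one and get $300 in free credits.

If you want to run the notebook on Colab with the provided photos or your own, the notebook's setup guide walks you through installing the required libraries, authenticating to GCP with your Google identity, obtaining a Google Maps Platform API key, and enabling the following APIs:

The last setup step is to download the sample photos I provide from my trips to Los Angeles and San Francisco.

Processing the photos

In this part of the notebook, you configure the path of the folder containing the album photos. The notebook then processes the photos with the Pillow imaging library to perform the following tasks: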
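One such task is presumably reading the GPS position stored in a photo's EXIF metadata, which holds latitude and longitude as degrees, minutes, and seconds plus a hemisphere letter. The following is a minimal sketch of that conversion, not the notebook's actual code. The function names and the sample values are hypothetical. With Pillow, the GPS mapping would typically come from something like `Image.open(path).getexif().get_ifd(0x8825)`, which is an assumption about the Pillow version in use.

```python
def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere letter to signed decimal degrees."""
    degrees, minutes, seconds = (float(v) for v in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation
    return -decimal if ref in ("S", "W") else decimal


def gps_to_latlng(gps_info):
    """Return (lat, lng) from an EXIF GPS mapping, or None when no position is stored."""
    # Standard EXIF GPS tag ids:
    # 1=GPSLatitudeRef, 2=GPSLatitude, 3=GPSLongitudeRef, 4=GPSLongitude
    try:
        lat = dms_to_decimal(gps_info[2], gps_info[1])
        lng = dms_to_decimal(gps_info[4], gps_info[3])
    except KeyError:
        return None
    return lat, lng


# Hypothetical GPS block, shaped like what an EXIF reader returns
sample_gps = {1: "N", 2: (33, 56, 33.0), 3: "W", 4: (118, 24, 29.0)}
print(gps_to_latlng(sample_gps))
```

Returning `None` for photos without GPS data lets the rest of the pipeline skip the Maps lookups for those photos instead of failing.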

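Extracting the location and nearby places with the Google Maps API

Google Maps offers many specialized APIs for different tasks. Here we use the following APIs:

Once the Maps Platform API key is set up, calling the Geocoding API and the Places API is straightforward.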

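import googlemaps

# MAPS_API_KEY is the Google Maps Platform key obtained during setup
gmaps = googlemaps.Client(key=MAPS_API_KEY)
# Reverse geocoding: addresses that match the photo's coordinates
locations = gmaps.reverse_geocode(latlng=(lat, lng))
# Places within `radius` meters of the photo's coordinates
nearby_places = gmaps.places_nearby(location=(lat, lng), radius=radius)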
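The two responses are nested structures, so some post-processing is needed before they can go into a prompt. Here is a minimal sketch of one way to do that, assuming the usual response shapes: a list of results with a `formatted_address` field for reverse geocoding, and a dict with a `results` list of named places for the nearby search. The helper name `summarize_location` and the sample data are hypothetical, not taken from the notebook.

```python
def summarize_location(locations, nearby_places, max_places=5):
    """Pick a readable address and a few nearby place names from the two responses."""
    # reverse_geocode returns a list of results; the first is typically the most specific
    address = locations[0].get("formatted_address", "") if locations else ""
    # places_nearby returns a dict whose "results" list holds the places found
    names = [p["name"] for p in nearby_places.get("results", []) if "name" in p]
    return address, names[:max_places]


# Made-up responses, trimmed to the fields used above
sample_locations = [{"formatted_address": "1 World Way, Los Angeles, CA 90045, USA"}]
sample_nearby = {"results": [{"name": "Star Alliance Lounge"}, {"name": "Relay"}]}

address, names = summarize_location(sample_locations, sample_nearby)
print(address, names)
```

Capping the number of nearby places keeps each photo's entry short, which matters once a whole album is concatenated into a single prompt.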

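Photo captioning with Generative AI and Vertex Imagen

In this part of the notebook we start using Generative AI. Vertex Imagen provides an API for image captioning, that is, describing the content of a picture as text.

To use it, we first need to initialize the Vertex AI SDK with your GCP project.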

import vertexai
from vertexai.vision_models import ImageTextModel, Image

vertexai.init(project=PROJECT_ID)
model = ImageTextModel.from_pretrained("imagetext")

Getting a caption from an image is then simple.

source_image = Image.load_from_file(location=path)
captions = model.get_captions(
    image=source_image,
    number_of_results=1,
    language="en",
)

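Designing the prompt

In the context of large language models (LLMs), a prompt is the input or query given to the model to guide it toward generating a response. The quality and specificity of the prompt are crucial in shaping the model's output.

LLMs are usually fine-tuned to follow the instructions in a prompt, which lets them perform tasks they were not explicitly trained on. Designing a good prompt usually takes a trial-and-error process of interacting with the LLM and checking whether the output is close to (or better than) what was expected.

This project requires a prompt that instructs the LLM to generate a post describing the moments captured in a set of photos. That means writing specific instructions for the LLM that spell out the input format (a list containing the photo metadata), the rules to follow (for example, referencing the <Photo id> when describing a photo), and the expected output format, which is text interleaved with photo placeholders.

You can refer to the prompt template I provide, which is wrapped in the function below. Note that the template has placeholders for the photo descriptions and for a context paragraph, since you will probably want to give the LLM more information about the circumstances in which the photos were taken.

Prompt engineering

Prompt engineering techniques are a collection of prompt design patterns, discovered and proposed by researchers or the community, that help LLMs produce better output. Few-shot prompting is one of them: you provide a few examples of inputs and expected outputs, as in the prompt template below. In my tests with the Vertex Palm API, this technique helped get the desired output in most cases.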

def generate_prompt(context, pictures_infos):
    prompt = f"""
You are a copywriter and journalist.
Can you help me to write a photo tour that describes the moments
registered in a photo album from a context and some information
I provide about the photos?

The items were already sorted by the date and time the photos were taken.
Pay attention to the dates and time to infer how many days were
covered by these photos and at which time of the day they were taken.

Please include descriptions of all the photos taken.
Only report places or experiences that are described by the
photo informations.

The photos information has the following structure:

- <Photo id> | Date the photo was taken | Time the photo was taken |
Photo Description generated by an LLM |
Approximate Locations where the photo was taken |
Approximate Nearby locations where photo was taken

Here is an example of photo information and how it should be generated
in plain text, interleaving photo descriptions and the <Photo id>.

Example photo information:
- <Photo 0> | Date: 08/04/2023 | Time: 07:53:13 |
Photo Description: a man stands in front of a sign that says
welcome to the united states |
Possible Photo Locations: BURBERRY LAX TERMINAL B, Los Angeles
International Airport, Terminal B, Los Angeles, Los Angeles County |
Possible Photo Nearby locations: Los Angeles, Star Alliance Lounge,
ICE International Currency Exchange, Relay, Bank of America.

Expected output:
I was happy finally arriving to my destination, Los Angeles.
While I went into US Customs my heart was filled of anxiety
to leave the airport and get to visit the city.
<Photo 0>

```
Photos album context: {context}

Photos description:
{pictures_infos}
```
"""
    return prompt

Generating the prompt

In this example, we provide the photos metadata and a short context paragraph to generate the prompt from the template above.

album_context = """I flew to Los Angeles for a short trip,
and the album contains the photos
from the day I arrived there.
The man in those photos is myself.
"""

blog_prompt = generate_prompt(album_context, photos_info_concat)
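The `photos_info_concat` string is built earlier in the notebook from the EXIF, Maps, and captioning results. As a rough sketch of how such a string might be assembled, assuming the one-line-per-photo layout seen in the rendered prompt in this article, the hypothetical helper `format_photo_info` below joins those fields. The sample photo data is illustrative only.

```python
from datetime import datetime


def format_photo_info(photo_id, taken_at, caption, locations, nearby):
    """Build one metadata line for a photo, using pipe-separated fields."""
    # Example rendering: 08/04/2023 (Friday) 07:53 AM
    when = taken_at.strftime("%m/%d/%Y (%A) %I:%M %p")
    return (
        f"- <Photo {photo_id}> | Date and time: {when} | "
        f"Photo Description: {caption} | "
        f"Locations: {', '.join(locations)} | "
        f"Possible Nearby locations: {', '.join(nearby)}"
    )


# Illustrative input: (timestamp, caption, locations, nearby places) per photo
photos = [
    (datetime(2023, 8, 4, 7, 53, 13),
     "a man stands in front of a sign that says welcome to the united states",
     ["Los Angeles International Airport", "Los Angeles"],
     ["Star Alliance Lounge", "Relay"]),
]

# Sort by capture time so that photo ids follow the chronology of the trip
photos.sort(key=lambda p: p[0])
photos_info_concat = "\n".join(
    format_photo_info(i, *photo) for i, photo in enumerate(photos)
)
print(photos_info_concat)
```

Sorting before numbering matters because the prompt tells the model that the items are already in chronological order.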

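Now give it a try. Just copy the prompt below, produced by this process, and paste it into a consumer LLM chat system such as Bard.
You may be as impressed with the result as I was!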

You are a copywriter and journalist.
Can you help me write a photo tour that describes the moments registered in a
photo album from a context and some information I provide about the photos?

The items are already sorted by the time the photos was taken.
Pay attention to the dates and time to infer how many days were
covered by these photos and in which time of the day they were taken.
Please include descriptions of all the photos taken.
Do not report any place or experience that is not described by the
photo informations.
The photos information has following structure:
- <Photo id> | Date the photo was taken | Time the photo was taken |
Photo Description generated by an LLM |
Approximate Locations where the photo was taken |
Approximate Nearby locations where photo was taken
Here is an example of photo information and how it should be generated in plain
text,
interleaving photo descriptions and the <Photo id>.
Example photo information:
- <Photo 0> | Date: 08/04/2023 | Time: 07:53:13 |
Photo Description: a man stands in front of a sign that says welcome to the
united states |
Possible Photo Locations: BURBERRY LAX TERMINAL B, Los Angeles International
Airport, Terminal B, Los Angeles, Los Angeles County |
Possible Photo Nearby locations: Los Angeles, Star Alliance Lounge, ICE
International Currency Exchange, Relay, Bank of America
Expected output:
I was happy finally arriving to my destination, Los Angeles.
While I went into US Customs my heart was filled of anxiety to leave the airport
and get to visit the city.
<Photo 0>

```
Photos album context: I flew to Los Angeles for a short trip, and the album
contains the photos from the day I arrived there. The man in those photos is myself.

Photos description:
- <Photo 0> | Date and time: 08/04/2023 (Friday) 07:53 AM | Photo Description: a
man stands in front of a sign that says welcome to the united states | Locations: BURBERRY
LAX TERMINAL B, Los Angeles International Airport, Los Angeles, Los Angeles County,
California | Possible Nearby locations: Los Angeles, Star Alliance Lounge, ICE
International Currency Exchange, Relay, Bank of America
- <Photo 1> | Date and time: 08/04/2023 (Friday) 09:32 AM | Photo Description: a man in a
nasa shirt is sitting in a white car | Locations: Los Angeles International Airport, Los
Angeles, Los Angeles County, California, United States | Possible Nearby locations: Los
Angeles
- <Photo 2> | Date and time: 08/04/2023 (Friday) 09:59 AM | Photo Description: a man in a
white shirt is driving a mustang | Locations: Westchester, Los Angeles, Los Angeles
County, California, United States | Possible Nearby locations: Plaza Towers OBGYN:
Lawrence Bruksch, MD, LA Fitness, Dr. Jitsen Chang, Obstetrician-gynecologist, Kinecta
Federal Credit Union - Westchester, Clarity Retirement
- <Photo 3> | Date and time: 08/04/2023 (Friday) 10:29 AM | Photo Description: a man
wearing a nasa shirt stands on a beach | Locations: Los Angeles, Los Angeles County,
California, United States | Possible Nearby locations: Los Angeles, Venice
- <Photo 4> | Date and time: 08/04/2023 (Friday) 11:29 AM | Photo Description: a man sits
on a bench in front of a subba gump shrimp restaurant | Locations: Santa Monica, Los
Angeles County, California, United States | Possible Nearby locations: Bubba Gump Shrimp
Co., Santa Monica Pier Rock Shop, Pier Burger, Santa Monica Police Pier Substation, 66-To-
Cali
- <Photo 5> | Date and time: 08/04/2023 (Friday) 11:43 AM | Photo Description: a man
stands on a pier with a ferris wheel in the background | Locations: Santa Monica, Los
Angeles County, California, United States | Possible Nearby locations: Santa Monica Pier,
The eCenter, Character Drawings, Santa Monica Pier, ビーチ・サインズ&モア
- <Photo 6> | Date and time: 08/04/2023 (Friday) 11:46 AM | Photo Description: a man
stands on a pier with a seagull sitting on the railing | Locations: Santa Monica, Los
Angeles County, California, United States | Possible Nearby locations: Santa Monica,
Pacific Plunge, Inkie’s Scrambler, Fun 'N' Games, Pacific Wheel
- <Photo 7> | Date and time: 08/04/2023 (Friday) 11:52 AM | Photo Description: a man with
a backpack that says o'neill on it | Locations: Santa Monica, Los Angeles County,
California, United States | Possible Nearby locations: Coffee Bean & Tea Leaf, Japadog (at
Santa Monica Pier), Santa Monica Trapeze School, Pacific Park on the Santa Monica Pier,
Funnel Cakes
- <Photo 8> | Date and time: 08/04/2023 (Friday) 12:10 PM | Photo Description: a man poses
in front of the cheesecake factory | Locations: Downtown, Santa Monica, Los Angeles
County, California, United States | Possible Nearby locations: Forever 21, Tiffany & Co.,
Louis Vuitton Santa Monica Place, Pandora Jewelry, Johnny Was
- <Photo 9> | Date and time: 08/04/2023 (Friday) 12:32 PM | Photo Description: a plate of
food with a napkin that says the cheesecake factory | Locations: Downtown, Santa Monica,
Los Angeles County, California, United States | Possible Nearby locations: Forever 21,
Tesla, Nike Santa Monica, Louis Vuitton Santa Monica Place, Pandora Jewelry
- <Photo 10> | Date and time: 08/04/2023 (Friday) 01:15 PM | Photo Description: a man
stands in front of a blue tesla model x | Locations: Downtown, Santa Monica, Los Angeles
County, California, United States | Possible Nearby locations: Forever 21, Tiffany & Co.,
Louis Vuitton Santa Monica Place, Pandora Jewelry, Johnny Was
- <Photo 11> | Date and time: 08/04/2023 (Friday) 05:03 PM | Photo Description: a green
trolley is parked in front of a gap store | Locations: La Brea, Central LA, Los Angeles,
Los Angeles County, California | Possible Nearby locations: Haagen-Dazs Ice Cream Shops,
Wetzel's Pretzels, Nike The Grove, Gap, Bar Verde
- <Photo 12> | Date and time: 08/04/2023 (Friday) 05:44 PM | Photo Description: a variety
of caramel apples are displayed in a store | Locations: La Brea, Central LA, Los Angeles,
Los Angeles County, California | Possible Nearby locations: Los Angeles, The Original
Farmers Market, The Dog Bakery - Fresh Baked Treats & Dog Birthday Cakes, Marconda's,
Littlejohn's English Toffee House & Fine Candies
- <Photo 13> | Date and time: 08/04/2023 (Friday) 06:01 PM | Photo Description: a man is
holding a scoop of ice cream in front of a sign that says " drinks " | Locations: Farmers
Market, La Brea, Central LA, Los Angeles, Los Angeles County | Possible Nearby locations:
Los Angeles, The Original Farmers Market, Littlejohn's English Toffee House & Fine
Candies, Hutchco Technologies, Marconda's
- <Photo 14> | Date and time: 08/04/2023 (Friday) 06:06 PM | Photo Description: cars are
parked in front of a ross store | Locations: 3rd / Ogden, La Brea, Central LA, Los
Angeles, Los Angeles County | Possible Nearby locations: A1 Locksmith & Keys, GapBody, 3rd
/ Ogden, 3rd & Ogden (Eastbound), Karsaz & Associates
- <Photo 15> | Date and time: 08/04/2023 (Friday) 09:33 PM | Photo Description: a hotel
room with a blue blanket on the bed | Locations: Eagle Rock, Northeast Los Angeles, Los
Angeles, Los Angeles County, California | Possible Nearby locations: Welcome Inn, North
East Los Angeles Hotel Owners Association, Kandoo Kitchen, Inland Faculty Medical Group
Inc, Pathway Healthcare
- <Photo 16> | Date and time: 08/04/2023 (Friday) 09:58 PM | Photo Description: two boxes
of food on a table with a fork | Locations: Eagle Rock, Northeast Los Angeles, Los
Angeles, Los Angeles County, California | Possible Nearby locations: Welcome Inn, North
East Los Angeles Hotel Owners Association, MV, Inland Faculty Medical Group Inc, Pathway
Healthcare
```

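Generating the post

Now we use the TextGenerationModel from the Vertex Palm API to submit the prompt designed earlier and get the generated post. You can adjust parameters such as temperature, top_k, and top_p to configure how random or creative the generated text is, as described in the code comments and in the API documentation.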

from vertexai.language_models import TextGenerationModel
generation_model = TextGenerationModel.from_pretrained("text-bison")

def generate_text(prompt, temperature=1.0,
                  top_p=0.4, top_k=40, max_output_tokens=1024):
    parameters = {
        # Temperature controls the degree of randomness in token selection.
        "temperature": temperature,
        # Tokens are selected from most probable to least until the sum
        # of their probabilities equals the top_p value.
        "top_p": top_p,
        # A top_k of 1 means the selected token is the most probable
        # among all tokens.
        "top_k": top_k,
        # Token limit determines the maximum amount of text output.
        "max_output_tokens": max_output_tokens,
    }

    generated_text = generation_model.predict(prompt=prompt, **parameters).text
    return generated_text
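To build some intuition for how top_k and top_p narrow the choice of the next token, here is a toy illustration on a made-up probability table. It is a simplified sketch of the general idea described in the comments above, not how the Palm API is implemented. The function name and the probabilities are hypothetical.

```python
def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most probable tokens, then cut the list once top_p mass is reached."""
    # Rank tokens from most to least probable and keep at most top_k of them
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        # Stop as soon as the kept tokens cover the requested probability mass
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1 before sampling
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Made-up next-token distribution
next_token_probs = {"pier": 0.5, "beach": 0.3, "hotel": 0.15, "airport": 0.05}

print(filter_top_k_top_p(next_token_probs, top_k=40, top_p=0.4))
print(filter_top_k_top_p(next_token_probs, top_k=2, top_p=1.0))
```

In this toy model, a low top_p such as the 0.4 default above leaves only the single most likely token, while raising it admits less probable tokens and makes the text more varied.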

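The output of the Palm API is a generated post in which the <Photo id> placeholders are interleaved with the descriptions. The LLM decides where in the text to include the photos. Here is an example. Afterwards, I use a regular expression to find these photo placeholders and replace them with the actual photos.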

I was happy finally arriving to my destination, Los Angeles.
While I went into US Customs my heart was filled of anxiety
to leave the airport and get to visit the city.
<Photo 0>

I rented a car and drove to my hotel in Eagle Rock.
The hotel was nice and comfortable.
<Photo 14>

The next morning I went to Santa Monica Pier.
I had lunch at Bubba Gump Shrimp Co. and then walked around the pier.
<Photo 4>, <Photo 5>, <Photo 6>, <Photo 7>

In the afternoon I went to the Cheesecake Factory.
I had a delicious meal and then went shopping at the mall.
<Photo 8>, <Photo 9>, <Photo 10>

In the evening I went to Farmers Market.
I bought some caramel apples and ice cream.
<Photo 11>, <Photo 12>, <Photo 13>

It was a long day but I had a lot of fun.
I can't wait to explore more of Los Angeles tomorrow.
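The placeholder replacement step could look roughly like the sketch below. It assumes the photos are available as a list of file paths indexed by photo id, and that the target format is Markdown image syntax. Both are assumptions made for illustration, and `embed_photos` is a hypothetical name rather than the notebook's function.

```python
import re

# Matches placeholders such as <Photo 0> and captures the numeric id
PHOTO_TAG = re.compile(r"<Photo (\d+)>")


def embed_photos(post, photo_paths):
    """Swap each <Photo N> placeholder for a Markdown image pointing at photo_paths[N]."""
    def replace(match):
        index = int(match.group(1))
        # Leave unknown ids untouched instead of failing on an invented placeholder
        if index >= len(photo_paths):
            return match.group(0)
        return f"![Photo {index}]({photo_paths[index]})"

    return PHOTO_TAG.sub(replace, post)


sample_post = "I rented a car.\n<Photo 0>, <Photo 1>"
print(embed_photos(sample_post, ["lax.jpg", "car.jpg"]))
```

Because the model may emit an id that does not exist in the album, the function leaves such placeholders unchanged, which keeps them easy to spot in the final text.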

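Below you will see examples of the posts that "Photo StoryTelling" generated for my two trips. LLM output is non-deterministic and varies in quality and in fidelity to the facts described in the prompt. To generate different responses, you may want to try different settings for temperature, top_p, and top_k, or simply send a new request to the TextGenerationModel.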
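Since the prompt asks for all photos to be described, one simple fidelity check is to list the photo ids that never appear in a generated post and to retry when some are missing. The sketch below is one possible approach, not part of the original notebook. Both helper names are hypothetical, and `generate` stands for any callable that maps a prompt to text, such as the `generate_text` function above.

```python
import re


def missing_photo_ids(post, photo_count):
    """Return the ids in range(photo_count) that have no <Photo N> placeholder in the post."""
    used = {int(n) for n in re.findall(r"<Photo (\d+)>", post)}
    return [i for i in range(photo_count) if i not in used]


def best_of(generate, prompt, photo_count, attempts=3):
    """Call generate(prompt) up to `attempts` times and keep the post missing the fewest photos."""
    best_post, best_missing = None, None
    for _ in range(attempts):
        post = generate(prompt)
        missing = missing_photo_ids(post, photo_count)
        if best_missing is None or len(missing) < len(best_missing):
            best_post, best_missing = post, missing
        # Stop early once every photo is referenced
        if not missing:
            break
    return best_post, best_missing


print(missing_photo_ids("Arrival <Photo 0>\nPier <Photo 2>", 4))
```

Applied to the example post shown earlier for the 17-photo Los Angeles album, `missing_photo_ids` would report photos 1, 2, 3, 15, and 16 as unreferenced.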

AI-generated post about my Los Angeles trip

album_context = """I flew to Los Angeles for a short trip, 
and the album contains the photos
from the day I arrived there.
The man in those photos is myself.
"""
blog_prompt = generate_prompt(album_context, photos_info_concat)
generated_post = generate_text(prompt=blog_prompt)

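I was happy to finally arrive at my destination, Los Angeles.
As I went through US Customs, I was anxious to leave the airport and explore the city.

I took a taxi to my hotel in Eagle Rock. The room was small but comfortable.

After dropping off my luggage, I went out to explore the city. I first drove to the Santa Monica Pier.
I walked along the pier, enjoying the sights and sounds of the ocean. I even rode the Ferris wheel!

After spending some time at the pier, I went to the Cheesecake Factory for lunch. The food was delicious and the service was excellent.

In the afternoon, I went to the Farmers Market in La Brea. I bought some fresh produce and flowers. I also had some ice cream.

I went back to my hotel room to relax and wrap up the day.
I was tired from all the exploring, but I was also excited to see what the next day would bring.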

AI-generated post about my San Francisco trip

album_context_sf = """Me and my wife travelled to San Francisco.
We spent a single day there. We rented a car in SF and visited
many places during that day.
The man in the pictures is myself and the woman is my wife.
"""
blog_prompt_sf = generate_prompt(album_context_sf, photos_info_concat_sf)
generated_post_sf = generate_text(prompt=blog_prompt_sf)

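My wife and I went to San Francisco. We spent a single day there. We rented a car in San Francisco and visited many places that day.

We started our day at San Francisco International Airport. We were excited to finally be in San Francisco and ready to explore the city.

We drove to Russian Hill and found a parking spot. We walked around the neighborhood, taking in the sights and sounds.

We walked onto the Golden Gate Bridge and took some photos. It was a sunny day and the bridge was stunning.

We sat on a bench and watched the boats go by. It was so peaceful and relaxing.

We walked back to the car and drove to the Marina District. We strolled along the lake and enjoyed the view.

We stopped at a restaurant in the Marina District for dinner. The food was delicious and the atmosphere was lively.

After dinner we walked around the Marina District and took some more photos.

We drove to Fisherman's Wharf and walked around the shops and restaurants. We had some delicious seafood for dinner.

After dinner, we walked around Fisherman's Wharf and took some more photos. We really enjoyed our time in this neighborhood.

We drove to Fort Mason and walked around Ghirardelli Square. We had some delicious chocolate and ice cream.

After dinner we walked around Fort Mason and took some more photos.

We had a wonderful time in San Francisco, and we can't wait to come back again soon.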

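Conclusion

If you have read this far, you can appreciate how powerful it is to combine data extraction (such as getting EXIF metadata from images), data augmentation (such as using the Google Maps API to determine a location from geographic coordinates), prompt engineering (such as few-shot prompting), and Generative AI (such as Vertex Imagen and the Palm API). In this case, these techniques together generated interesting blog posts describing photo albums.

I hope you enjoyed this project and feel like trying it out, perhaps with your own photos, to see what kind of blog post it generates about your own special moments!

Original article: https://medium.com/google-developer-experts/photo-storytelling-leveraging-generative-ai-and-google-apis-to-compose-posts-from-your-photo-cce8e30f4d57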
