
To get started with the Google Speech-to-Text API, first install the google-cloud-speech package in your Python environment and enable the Speech-to-Text API in your Google Cloud project.
%pip install --upgrade --quiet google-cloud-speech
Follow the Google Cloud quickstart guide to create a project and enable the API (for example, with gcloud services enable speech.googleapis.com).
Before calling the Google Speech-to-Text API, prepare your project_id and a file_path. The audio file can be either a Google Cloud Storage URI or a local file path.
from google.cloud import speech

# Create a client and point it at a sample audio file in Cloud Storage.
client = speech.SpeechClient()
file_path = 'gs://cloud-samples-data/speech/brooklyn_bridge.raw'
audio = speech.RecognitionAudio(uri=file_path)

# Describe the audio: raw 16-bit linear PCM at 16 kHz, US English.
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US',
)

# Synchronous recognition blocks until the transcript is ready.
response = client.recognize(config=config, audio=audio)
for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))
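The example above reads audio from Cloud Storage. For a local file, pass the raw bytes through the content field of RecognitionAudio instead of a uri. A minimal sketch, reusing the client and config from the snippet above and assuming a local 16 kHz LINEAR16 file named audio.raw (a placeholder path):

# 'audio.raw' is a placeholder; substitute your own local file.
with open('audio.raw', 'rb') as f:
    audio = speech.RecognitionAudio(content=f.read())
response = client.recognize(config=config, audio=audio)
for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))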
You can customize recognition through the config parameter, for example to select a different recognition model or enable extra features such as automatic punctuation.
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US',
    enable_automatic_punctuation=True,
)
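Beyond punctuation, RecognitionConfig also accepts fields for choosing a recognition model. A hedged sketch, assuming the v1 API where the model and use_enhanced fields are available:

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US',
    enable_automatic_punctuation=True,
    model='phone_call',  # request a model tuned for telephony audio
    use_enhanced=True,   # opt into the enhanced variant where offered
)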
Access to Google APIs can be unstable in some regions; routing requests through an API proxy service can improve reliability.
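If you route traffic through such a proxy, the client can be pointed at a custom endpoint via client_options. A minimal sketch, where speech-proxy.example.com is a hypothetical proxy host standing in for your actual service:

from google.api_core.client_options import ClientOptions
from google.cloud import speech

# 'speech-proxy.example.com' is a placeholder for your proxy's endpoint.
client = speech.SpeechClient(
    client_options=ClientOptions(api_endpoint='speech-proxy.example.com')
)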
The Google Speech-to-Text API limits a single synchronous request to roughly one minute of audio (or 10 MB of data). Longer recordings can be split into smaller files and processed one at a time.
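As an alternative to splitting files, the library also offers asynchronous recognition for longer audio stored in Cloud Storage. A minimal sketch, with gs://your-bucket/long_audio.raw as a placeholder URI:

from google.cloud import speech

client = speech.SpeechClient()
audio = speech.RecognitionAudio(uri='gs://your-bucket/long_audio.raw')  # placeholder
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US',
)

# long_running_recognize returns an operation immediately;
# result() blocks until transcription finishes or the timeout expires.
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=300)
for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))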
Make sure the language_code in config matches the language spoken in the audio file; a mismatch sharply degrades recognition accuracy.
Below is a complete example of transcribing a speech file to text with Python.
from google.cloud import speech

client = speech.SpeechClient()
gcs_uri = 'gs://cloud-samples-data/speech/brooklyn_bridge.raw'
audio = speech.RecognitionAudio(uri=gcs_uri)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US',
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))
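Each result may carry several candidate transcripts; the first alternative is the most likely one, and it exposes a confidence score alongside the text. A short sketch reusing the response from the example above:

for result in response.results:
    best = result.alternatives[0]
    print('Transcript: {}'.format(best.transcript))
    print('Confidence: {:.2f}'.format(best.confidence))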
The following example shows how to capture speech from a microphone in real time and stream it to the API for transcription. Besides google-cloud-speech, it requires the pyaudio and six packages.
import os
import re
import sys

from google.cloud import speech
import pyaudio
from six.moves import queue

# Point the client library at your service-account key file.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'your-path-to-credentials.json'

RATE = 16000
CHUNK = int(RATE / 10)  # 100 ms of audio per buffer
class MicrophoneStream(object):
    """Opens a microphone stream as a generator yielding audio chunks."""

    def __init__(self, rate, chunk):
        self._rate = rate
        self._chunk = chunk
        self._buff = queue.Queue()  # thread-safe buffer of audio data
        self.closed = True

    def __enter__(self):
        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=self._rate,
            input=True,
            frames_per_buffer=self._chunk,
            # PyAudio invokes _fill_buffer from a background thread
            # whenever a new chunk of audio arrives.
            stream_callback=self._fill_buffer,
        )
        self.closed = False
        return self

    def __exit__(self, type, value, traceback):
        self._audio_stream.stop_stream()
        self._audio_stream.close()
        self.closed = True
        # Unblock the generator so it can terminate cleanly.
        self._buff.put(None)
        self._audio_interface.terminate()

    def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
        self._buff.put(in_data)
        return None, pyaudio.paContinue

    def generator(self):
        while not self.closed:
            # Block until at least one chunk is available.
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]
            # Drain anything else already buffered, without blocking.
            try:
                while True:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
            except queue.Empty:
                pass
            yield b''.join(data)
def listen_print_loop(responses):
    num_chars_printed = 0
    for response in responses:
        if not response.results:
            continue
        result = response.results[0]
        if not result.alternatives:
            continue
        transcript = result.alternatives[0].transcript
        # Pad with spaces to overwrite a longer previous interim result.
        overwrite_chars = ' ' * (num_chars_printed - len(transcript))
        if not result.is_final:
            # Interim result: rewrite the current console line in place.
            sys.stdout.write(transcript + overwrite_chars + '\r')
            sys.stdout.flush()
            num_chars_printed = len(transcript)
        else:
            print(transcript + overwrite_chars)
            # Saying "exit" or "quit" ends the session.
            if re.search(r'\b(exit|quit)\b', transcript, re.I):
                print('Exiting..')
                break
            num_chars_printed = 0
def main():
    language_code = 'zh'
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=RATE,
        language_code=language_code,
    )
    streaming_config = speech.StreamingRecognitionConfig(
        config=config, interim_results=True
    )
    with MicrophoneStream(RATE, CHUNK) as stream:
        audio_generator = stream.generator()
        requests = (
            speech.StreamingRecognizeRequest(audio_content=content)
            for content in audio_generator
        )
        responses = client.streaming_recognize(streaming_config, requests)
        listen_print_loop(responses)

if __name__ == '__main__':
    main()
This article has covered the fundamentals of Google's speech recognition technology, its application scenarios, installation and setup, usage, and solutions to common problems, with working code examples showing how to transcribe audio to text using the Google Speech-to-Text API. We hope it helps you get started quickly and apply the technology in real projects.
With its accuracy and ease of use, Google's speech recognition gives developers a powerful tool and continues to advance intelligent speech processing. Through continued learning and practice, we can make better use of this technology and create more value.
If you have further questions or ideas about Google speech recognition, feel free to discuss them in the comments.