Microphone Permissions

Before we start recording audio, make sure the necessary permissions to access the microphone have been granted.

For Windows

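1. Open Settings.
2. Go to Privacy > Microphone (Privacy & security > Microphone on Windows 11).
3. Make sure "Microphone access for this device" is turned on.
4. Make sure the application you are using (for example, your Python IDE) is allowed to access the microphone.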

For macOS

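1. Open System Preferences (System Settings on macOS 13 Ventura and later).
2. Go to Security & Privacy > Privacy (Privacy & Security on macOS 13 and later).
3. Select Microphone from the left-hand menu.
4. Make sure the application you are using (for example, your Python IDE) is checked.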

Setup

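We need a few libraries to record and process audio:

- pyaudio: captures audio from the microphone.
- wave: reads and writes .wav files (part of the Python standard library).
- tempfile: creates temporary files to store recordings (part of the Python standard library).
- simpleaudio: plays back audio (for debugging).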

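To install the prerequisites, run the following snippet. It uses Jupyter notebook syntax ("!" runs a shell command and "%pip" installs into the current kernel). The brew commands assume macOS with Homebrew. On other platforms, install PortAudio and ffmpeg with your system's package manager if needed.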

# Prerequisites for the Python modules (macOS / Homebrew)
!brew install ffmpeg
!brew install portaudio

# Audio processing (wave and tempfile ship with Python, so they need no install)
%pip install -q simpleaudio
%pip install -q pyaudio

# Clipboard management
%pip install -q pyperclip

# Speech transcriber: openai>=1.0 is required for "from openai import OpenAI"
%pip install -q --upgrade openai

# Securing API keys
%pip install -q python-dotenv
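After installing, it can help to confirm that every module is importable from the notebook's kernel before moving on. The sketch below is a hypothetical check that uses only the standard library's importlib. Its module list is an assumption based on the packages above, and note that python-dotenv is imported under the name dotenv.

```python
import importlib.util

# Import names for the packages used in this tutorial (assumed list).
# python-dotenv is imported as "dotenv"; wave and tempfile are standard library.
REQUIRED_MODULES = ["pyaudio", "wave", "tempfile", "simpleaudio", "pyperclip", "openai", "dotenv"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All modules found.")
```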

Recording Audio from the Device Microphone

We will create a function that handles audio recording. It will support both manual and timed recording:

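1. Set up a temporary file. We create a temporary file to store the recorded audio. Because we pass delete=False, the file is not removed automatically when it is closed, so we delete it ourselves after the audio has been processed.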

temp_file = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)

temp_file_name = temp_file.name
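Because delete=False turns off automatic deletion, the file outlives its handle and cleanup is our job. Here is a minimal standard-library sketch of that lifecycle. The placeholder comment marks where recording and transcription would happen.

```python
import os
import tempfile

# delete=False: the file is NOT removed when the handle is closed,
# so it can be reopened by name (for example by wave.open) and deleted later.
temp_file = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
temp_file_name = temp_file.name
temp_file.close()  # release our handle; the file itself stays on disk

print(os.path.exists(temp_file_name))  # True: still present after close()

# ... record into temp_file_name and send it to the API here ...

os.remove(temp_file_name)              # manual cleanup
print(os.path.exists(temp_file_name))  # False
```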

2. Callback function. PyAudio calls this function during recording, and it writes each chunk of audio data to the temporary file.

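def callback(data_input, frame_count, time_info, status):
    # wav_file is the open wave file from the enclosing record_audio() scope
    wav_file.writeframes(data_input)
    # Input-only stream: no output data to return; keep the stream running
    return None, pyaudio.paContinue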
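PyAudio invokes this callback on its own audio thread each time a buffer of frames_per_buffer frames is ready. The sketch below shows the pattern without a microphone by driving the same callback by hand with a synthetic 440 Hz tone. Here fake_microphone is a hypothetical stand-in for the input stream, and the returned 0 is only a placeholder for pyaudio.paContinue, so PyAudio does not need to be installed.

```python
import io
import math
import struct
import wave

RATE = 16000
CHUNK = 1024  # frames per buffer, matching frames_per_buffer in record_audio


def fake_microphone(seconds, freq=440.0):
    """Hypothetical stand-in for a microphone: yields CHUNK-sized buffers of
    16-bit little-endian mono PCM containing a sine tone."""
    total = int(RATE * seconds)
    for start in range(0, total, CHUNK):
        n = min(CHUNK, total - start)
        samples = [
            int(12000 * math.sin(2 * math.pi * freq * (start + i) / RATE))
            for i in range(n)
        ]
        yield struct.pack(f"<{n}h", *samples)


buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(RATE)

    def callback(data_input, frame_count, time_info, status):
        # Same body as the tutorial's callback: append the buffer to the WAV file.
        wav_file.writeframes(data_input)
        return None, 0  # placeholder for pyaudio.paContinue (PyAudio not imported here)

    # Call the callback once per buffer, as PyAudio would from its audio thread.
    for chunk in fake_microphone(2):
        callback(chunk, len(chunk) // 2, None, 0)

buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    n_frames, rate = reader.getnframes(), reader.getframerate()
print(n_frames, rate)  # 32000 16000
```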

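3. Record audio. We set up the microphone to capture audio and save it to the temporary file.

Open the .wav file for writing and set the audio format: 1 channel, 16-bit samples, and a 16,000 Hz sample rate.

These values are common choices for speech recognition:

1 channel (mono): mono audio is sufficient for speech recognition and carries half the data of stereo.
16-bit samples: a good balance between audio quality and file size.
16,000 Hz sample rate: widely used for speech recognition because it captures frequencies up to 8 kHz (half the sample rate). That covers the range that matters most for speech while keeping file sizes manageable.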
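To make the size trade-off concrete: 16,000 samples per second × 2 bytes per sample × 1 channel comes to 32,000 bytes per second of uncompressed audio, so a 5-second recording is 160,000 bytes of samples. The small standard-library check below also accounts for the 44-byte header that the wave module writes for plain PCM files. The helper names are hypothetical.

```python
import io
import wave

SAMPLE_RATE = 16000  # samples per second
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono


def wav_bytes_per_second(rate=SAMPLE_RATE, width=SAMPLE_WIDTH, channels=CHANNELS):
    """Size of the raw PCM payload for one second of audio."""
    return rate * width * channels


def silent_wav(seconds, rate=SAMPLE_RATE, width=SAMPLE_WIDTH, channels=CHANNELS):
    """Build an in-memory WAV file of digital silence in the given format."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(width)
        wf.setframerate(rate)
        wf.writeframes(b"\x00" * (wav_bytes_per_second(rate, width, channels) * seconds))
    return buffer.getvalue()


print(wav_bytes_per_second())  # 32000
print(len(silent_wav(5)))      # 160044 = 44-byte header + 160000 bytes of samples
```

For reference, OpenAI's documentation states a 25 MB upload limit for its Audio API at the time of writing. In this format, that corresponds to roughly 13 minutes of audio.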

初始化 PyAudio 并開(kāi)始錄音。

import pyaudio
import wave
import tempfile
import time


def record_audio(timed_recording=False, record_seconds=5):
    # Create a named temporary .wav file; delete=False keeps it on disk after closing
    temp_file = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    temp_file_name = temp_file.name
    # Close our handle so wave.open() can reopen the path and os.remove() can
    # delete it later (an open handle causes file-locking errors on Windows)
    temp_file.close()

    def callback(data_input, frame_count, time_info, status):
        # Called by PyAudio on a background thread with each captured buffer
        wav_file.writeframes(data_input)
        return None, pyaudio.paContinue

    with wave.open(temp_file_name, "wb") as wav_file:
        wav_file.setnchannels(1)  # Mono channel
        wav_file.setsampwidth(2)  # 16-bit samples
        wav_file.setframerate(16000)  # 16kHz sample rate

        audio = pyaudio.PyAudio()
        # The stream starts immediately and begins invoking the callback
        stream = audio.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=16000,
            input=True,
            frames_per_buffer=1024,
            stream_callback=callback,
        )

        if timed_recording:
            print(f"Recording for {record_seconds} seconds...")
            time.sleep(record_seconds)
        else:
            input("Press Enter to stop recording...")

        # Stop the stream before the with-block closes wav_file
        stream.stop_stream()
        stream.close()
        audio.terminate()

    return temp_file_name

This function lets us record either for a fixed duration (timed_recording=True) or until the user presses Enter (timed_recording=False). It returns the path of the temporary .wav file.
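Before sending a recording to the API, it can be useful to check how long it actually is, for example to confirm that a timed recording captured roughly the requested length. Below is a hypothetical helper that reads the duration from the WAV header using the standard-library wave module. It includes a self-check on half a second of generated silence in the tutorial's format.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Hypothetical helper: return the length of a WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


# Self-check: half a second of silence in the tutorial's format (mono, 16-bit, 16 kHz)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "silence.wav")
    with wave.open(demo_path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(16000)
        wf.writeframes(b"\x00\x00" * 8000)  # 8000 frames of 2 bytes each
    demo_duration = wav_duration_seconds(demo_path)

print(demo_duration)  # 0.5
```

With the recorder above, wav_duration_seconds(record_audio(timed_recording=True, record_seconds=5)) should return a value close to 5. It will not necessarily be exactly 5, because recording stops on a timer rather than after a fixed number of frames.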

Transcribing or Translating Audio

Now let's create a function that handles transcription (for English audio) and translation into English (for non-English audio):

1. Import the OpenAI library

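We use the OpenAI library to access the Audio Whisper API. To use the OpenAI API, you need an API key, which you can obtain from the OpenAI website.

For security, create a .env file in your project directory and store your OpenAI API key there. This way, you avoid hard-coding sensitive information in your code.

1. Create a .env file in the same directory as your notebook.

2. Add your OpenAI API key to the .env file in the following format:

OPEN_AI_API_KEY=your_actual_api_key_here

3. Use the dotenv library to load the environment variables in your notebook.

Once the key is in place, you can set up the client in code as follows: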

from openai import OpenAI
from dotenv import load_dotenv
import os

# Load the OpenAI API key from the .env file
load_dotenv()
openai_api_key = os.getenv("OPEN_AI_API_KEY")

# Set up your OpenAI API client
client = OpenAI(api_key=openai_api_key)
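If the .env file is missing or the variable name is misspelled, os.getenv() silently returns None. In current versions of the openai package, the client then falls back to the OPENAI_API_KEY variable. If that is also unset, the client raises an error that names OPENAI_API_KEY rather than the variable you actually used. A hypothetical helper that fails earlier with a clearer message:

```python
import os


def require_env(name):
    """Hypothetical helper: return an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Check your .env file and call load_dotenv() before this."
        )
    return value


# In the setup cell above, this would replace the bare os.getenv() call:
# openai_api_key = require_env("OPEN_AI_API_KEY")
```

As an alternative, naming the variable OPENAI_API_KEY in .env lets OpenAI() pick it up without passing api_key explicitly.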

2. Transcribe or translate the audio

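Open the recorded audio file and send it to the OpenAI Audio Whisper API. English audio goes to the transcriptions endpoint. Audio in other languages goes to the translations endpoint, which returns an English translation. In both cases, the API returns the text.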

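def process_audio(file_name, is_english=True, prompt=""):
    # Uses the `client` created in the setup step above
    with open(file_name, "rb") as audio_file:
        if is_english:
            # Transcribe English speech; the optional prompt guides style and spelling
            response = client.audio.transcriptions.create(
                model="whisper-1", file=audio_file, prompt=prompt
            )
        else:
            # Translate non-English speech into English text
            response = client.audio.translations.create(
                model="whisper-1", file=audio_file
            )

        return response.text.strip()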

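Note: You can use a prompt to guide the transcription of your recording. Prompts serve several purposes, such as correcting spellings, specifying the language, recognizing acronyms, removing or keeping filler words, adding punctuation, and more. In process_audio, the prompt is passed only to the transcription endpoint.

See the Audio Whisper API reference to learn more. Alternatively, you can check out prestontuggle's AI Cookbook Recipe.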
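Whisper treats the prompt as preceding context rather than as a list of commands, so it tends to imitate the prompt's style and spellings. One way to use this is to append correctly spelled names and acronyms to a style hint. The helper below is hypothetical, and the vocabulary is only an example:

```python
def build_prompt(style="", vocabulary=()):
    """Hypothetical helper: combine a style hint with correctly spelled terms
    (names, acronyms, jargon) that Whisper might otherwise mishear."""
    parts = []
    if style:
        parts.append(style.strip())
    terms = sorted(set(vocabulary))  # deduplicate and keep a stable order
    if terms:
        parts.append("Terms: " + ", ".join(terms) + ".")
    return " ".join(parts)


prompt = build_prompt(
    style="English spoken. Proper grammar and punctuation. Skip fillers.",
    vocabulary=["PyAudio", "OpenAI", "Whisper"],
)
print(prompt)
# English spoken. Proper grammar and punctuation. Skip fillers. Terms: OpenAI, PyAudio, Whisper.
```

The result can be passed as the prompt argument of process_audio() or transcribe_audio().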

Copying to the Clipboard

1. Import pyperclip

這個(gè)庫(kù)有助于將文本復(fù)制到剪貼板。

import pyperclip

2. Copy the transcription

Copy the transcribed text to the clipboard and print a confirmation message.

def copy_to_clipboard(text):
    pyperclip.copy(text)
    print("Result copied to clipboard!")

Main Code Snippet

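Here is the complete function that records audio, transcribes (or translates) it, and copies the resulting text to the clipboard. It builds on record_audio(), process_audio(), and copy_to_clipboard() defined above.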

import simpleaudio as sa
import os


def transcribe_audio(
    debug: bool = False,
    prompt: str = "",
    timed_recording: bool = False,
    record_seconds: int = 5,
    is_english: bool = True,
) -> str:
    """
    Records audio from the microphone and transcribes or translates it using OpenAI's API.

    Args:
        debug (bool): If True, plays back the recorded audio for verification.
        prompt (str): A prompt to guide the transcription (only used for English).
        timed_recording (bool): If True, records for a set duration. If False, records until user input.
        record_seconds (int): The number of seconds to record if timed_recording is True.
        is_english (bool): If True, uses transcription. If False, uses translation to English.

    Returns:
        str: The transcription or translation of the recorded audio.
    """
    # Record audio to a temporary .wav file
    temp_file_name = record_audio(timed_recording, record_seconds)

    try:
        # Debug playback
        if debug:
            print("Playing back recorded audio...")
            playback = sa.WaveObject.from_wave_file(temp_file_name)
            play_obj = playback.play()
            play_obj.wait_done()

        # Process audio (transcribe or translate)
        result = process_audio(temp_file_name, is_english, prompt)
    finally:
        # Clean up the temporary file even if playback or the API call fails
        os.remove(temp_file_name)

    # Copy result to clipboard
    copy_to_clipboard(result)

    return result

Demo

# Demo: Transcribe 5 seconds of spoken English with proper grammar and punctuation
result = transcribe_audio(
    debug=True,
    prompt="English spoken. Proper grammar and punctuation. Skip fillers.",
    timed_recording=True,
    record_seconds=5,
    is_english=True,
)

print("\nTranscription/Translation:", result)