Setting Up Voice Message Transcription in OpenClaw
This guide configures OpenClaw to automatically transcribe incoming voice messages using an Azure OpenAI `gpt-4o-transcribe` deployment. After setup, users can send voice messages via Discord, Telegram, etc., and the agent receives plain text; the transcription is fully transparent to the user.
Prerequisites
- OpenClaw installed and running
- An Azure OpenAI resource with a `gpt-4o-transcribe` model deployment
Step 1: Create the Azure OpenAI Transcription Deployment
If you don’t have one yet:
- Go to Azure OpenAI Studio
- Select your Azure OpenAI resource
- Go to Deployments → Create new deployment
- Model: `gpt-4o-transcribe`
- Give it a deployment name (e.g. `gpt-4o-transcribe`)
- Note down:
  - Resource name: the subdomain in your endpoint URL (e.g. `my-resource` from `https://my-resource.openai.azure.com`)
  - Deployment name: what you named it (e.g. `gpt-4o-transcribe`)
  - API key: found in Azure Portal → your resource → Keys and Endpoint
Step 2: Test the Endpoint
Before configuring OpenClaw, verify the endpoint works:
curl -s "https://<your-resource>.openai.azure.com/openai/deployments/<your-deployment>/audio/transcriptions?api-version=2025-03-01-preview" \
-H "api-key: <your-api-key>" \
-F "file=@test-audio.mp3"
You should get a JSON response with the transcribed text. If you get an error, check your resource name, deployment name, and API key.
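If you script this check, the endpoint URL has a fixed shape that can be assembled from the three values noted in Step 1. A minimal sketch (the function and parameter names are illustrative, not part of OpenClaw or the Azure SDK):

```python
# Build the Azure OpenAI transcription endpoint URL from its parts.
# Names here are illustrative helpers, not an official API.

def transcription_url(resource: str, deployment: str,
                      api_version: str = "2025-03-01-preview") -> str:
    """Return the audio/transcriptions URL for an Azure OpenAI deployment."""
    return (
        f"https://{resource}.openai.azure.com"
        f"/openai/deployments/{deployment}"
        f"/audio/transcriptions?api-version={api_version}"
    )

url = transcription_url("my-resource", "gpt-4o-transcribe")
print(url)
```

If the printed URL does not match the one in your curl test, one of the three values is wrong.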
Step 3: Configure OpenClaw
Edit `openclaw.json` and add or update the `tools.media.audio` section:
{
"tools": {
"media": {
"audio": {
"enabled": true,
"models": [
{
"type": "cli",
"command": "curl",
"args": [
"-s",
"https://<your-resource>.openai.azure.com/openai/deployments/<your-deployment>/audio/transcriptions?api-version=2025-03-01-preview",
"-H",
"api-key: <your-api-key>",
"-F",
"file=@{{MediaPath}}"
]
}
]
}
}
}
}
Placeholders to replace
| Placeholder | Example | Where to find |
|---|---|---|
| `<your-resource>` | `my-aoai-eastus` | Azure Portal → your OpenAI resource → Overview → Endpoint URL subdomain |
| `<your-deployment>` | `gpt-4o-transcribe` | Azure OpenAI Studio → Deployments → deployment name |
| `<your-api-key>` | `abc123...` | Azure Portal → your OpenAI resource → Keys and Endpoint → Key 1 or Key 2 |
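If you prefer to fill the placeholders programmatically rather than by hand, a sketch of the substitution (the values are the table's examples, not real credentials; note that `{{MediaPath}}` is deliberately left untouched):

```python
# Substitute the three user placeholders into the curl args.
# {{MediaPath}} is intentionally NOT replaced -- OpenClaw fills it at runtime.

values = {  # example values from the table above; use your own
    "<your-resource>": "my-aoai-eastus",
    "<your-deployment>": "gpt-4o-transcribe",
    "<your-api-key>": "abc123...",
}

args = [
    "-s",
    "https://<your-resource>.openai.azure.com/openai/deployments/"
    "<your-deployment>/audio/transcriptions?api-version=2025-03-01-preview",
    "-H",
    "api-key: <your-api-key>",
    "-F",
    "file=@{{MediaPath}}",
]

filled = args
for placeholder, value in values.items():
    filled = [a.replace(placeholder, value) for a in filled]

print(filled)
```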
Important: Do NOT change `{{MediaPath}}`
`{{MediaPath}}` is an OpenClaw template variable. At runtime, OpenClaw automatically replaces it with the actual path to the received audio file. Leave it exactly as `{{MediaPath}}`.
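To illustrate the behavior (this is a sketch of what the gateway does, not OpenClaw's internal code; the file path is hypothetical): at runtime, each configured argument has `{{MediaPath}}` replaced with the received file's path, producing the final curl argv.

```python
# Sketch of the gateway's runtime template substitution (illustrative only).

def render_args(args, media_path):
    """Replace the {{MediaPath}} template variable in each argument."""
    return [a.replace("{{MediaPath}}", media_path) for a in args]

argv = ["curl"] + render_args(
    ["-s", "-F", "file=@{{MediaPath}}"],
    "/tmp/voice-message-123.ogg",  # hypothetical path chosen by the gateway
)
print(argv)
```

This is why the literal `{{MediaPath}}` string must survive in your config: if you replace it yourself, the gateway has nothing to substitute and curl uploads a nonexistent file.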
Step 4: Restart OpenClaw
openclaw gateway restart
Step 5: Verify
- Send a voice message to your OpenClaw bot (via Discord, Telegram, etc.)
- The agent should respond to the spoken content as text
- Check status — the media summary should show:
📎 Media: audio ok
If the agent doesn’t understand the voice message or responds with something unrelated, check:
- Is `curl` available on the system? (`which curl`)
- Are the Azure credentials correct? (re-run the test from Step 2)
- Is the `tools.media.audio` section properly nested in `openclaw.json`? (validate JSON syntax)
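For the last check, a quick way to validate both the JSON syntax and the nesting using only the Python standard library (the sample below parses an inline string; point `json.load` at your real `openclaw.json` instead):

```python
import json

# Validate openclaw.json syntax and the tools.media.audio nesting.
# Inline sample config for illustration; load your real file in practice.
config_text = """
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [{"type": "cli", "command": "curl", "args": ["-s"]}]
      }
    }
  }
}
"""

config = json.loads(config_text)  # raises ValueError on a syntax error
audio = config.get("tools", {}).get("media", {}).get("audio", {})

assert audio.get("enabled") is True, "tools.media.audio.enabled must be true"
assert audio.get("models"), "tools.media.audio.models must be non-empty"
print("config OK")
```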
How It Works
The transcription pipeline runs before the message reaches the agent:
User sends voice message
↓
OpenClaw gateway receives audio file
↓
Gateway runs the configured curl command with the audio file
↓
Azure OpenAI returns transcribed text (JSON)
↓
Gateway extracts text and delivers it to the agent as a normal message
↓
Agent sees plain text, responds normally
The agent never sees the audio file — it only receives the transcribed text. This is a gateway-level feature, not a skill.
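The pipeline above can be sketched as follows (a toy model, not OpenClaw's actual implementation; the transcriber is stubbed where the real gateway shells out to curl):

```python
# Toy sketch of the gateway-level transcription pipeline (illustrative).

def handle_incoming(media_path, transcribe, deliver):
    """Transcribe an audio file and hand plain text to the agent."""
    result = transcribe(media_path)  # gateway runs curl; Azure returns JSON
    text = result["text"]            # gateway extracts the transcribed text
    return deliver(text)             # agent receives only plain text

# Stub transcriber standing in for the curl + Azure OpenAI call.
fake_transcribe = lambda path: {"text": "hello from a voice message"}
delivered = handle_incoming("/tmp/audio.ogg", fake_transcribe, lambda t: t)
print(delivered)
```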
Full openclaw.json Context
The `tools.media.audio` config sits inside the top-level `tools` object. Here’s where it fits in the overall structure:
{
"agents": { ... },
"channels": { ... },
"gateway": { ... },
"tools": {
"media": {
"audio": {
"enabled": true,
"models": [
{
"type": "cli",
"command": "curl",
"args": [
"-s",
"https://<your-resource>.openai.azure.com/openai/deployments/<your-deployment>/audio/transcriptions?api-version=2025-03-01-preview",
"-H",
"api-key: <your-api-key>",
"-F",
"file=@{{MediaPath}}"
]
}
]
}
},
"exec": { ... }
}
}