Aller au contenu

Protocole Voice

Communication audio temps réel avec Opus codec.

Port

4126/UDP


Technologies

Composant Technologie
Codec Opus
Capture PortAudio
Transport UDP

Types de Messages

Type Value Direction Description
VoiceJoin 0x0300 C→S Rejoindre voice channel
VoiceJoinAck 0x0301 S→C Confirmation
VoiceLeave 0x0302 C→S Quitter voice
VoiceFrame 0x0303 Both Audio Opus encodé
VoiceMute 0x0304 Both Mute/unmute

Structures

VoiceJoin (38 bytes)

struct VoiceJoin {
    SessionToken token;      // 32 bytes - Réutilise l'auth TCP
    char roomCode[6];        // 6 bytes
};

VoiceFrame (5-485 bytes)

struct VoiceFrame {
    uint8_t  speaker_id;    // 1 byte - Qui parle
    uint16_t sequence;      // 2 bytes - Détection perte paquets
    uint16_t opus_len;      // 2 bytes - Taille données Opus
    uint8_t  opus_data[480]; // Max 480 bytes Opus
    // Header: 5 bytes, Total max: 485 bytes
};

VoiceMute (2 bytes)

struct VoiceMute {
    uint8_t player_id;  // 1 byte
    uint8_t muted;      // 1 byte - 0 = unmuted, 1 = muted
};

Paramètres Opus

Paramètre Valeur Description
Sample Rate 48000 Hz Standard Opus
Channels 1 (mono) Voix uniquement
Frame Size 960 samples 20ms à 48kHz
Bitrate 32000 bps Qualité VoIP
Max Frame Size 480 bytes Taille max encodée

Architecture

flowchart LR
    subgraph Client Émetteur
        Mic[🎤 Micro] --> PA1[PortAudio]
        PA1 --> Enc[Opus Encode]
        Enc --> UDP1[UDP Send]
    end

    UDP1 --> Server
    Server --> UDP2[UDP Recv]

    subgraph Client Récepteur
        UDP2 --> Dec[Opus Decode]
        Dec --> PA2[PortAudio]
        PA2 --> Spk[🔊 Speaker]
    end

Implémentation

VoiceChatManager

class VoiceChatManager {
    OpusEncoder* encoder_;
    OpusDecoder* decoder_;
    PaStream* inputStream_;
    PaStream* outputStream_;
    UDPSocket& socket_;

    std::queue<VoiceDataPacket> incoming_;
    bool muted_ = false;

public:
    VoiceChatManager(UDPSocket& socket)
        : socket_(socket)
    {
        // Init Opus
        int error;
        encoder_ = opus_encoder_create(48000, 1,
            OPUS_APPLICATION_VOIP, &error);
        decoder_ = opus_decoder_create(48000, 1, &error);

        opus_encoder_ctl(encoder_,
            OPUS_SET_BITRATE(32000));

        // Init PortAudio
        Pa_Initialize();
        setupStreams();
    }

    void capture() {
        if (muted_) return;

        float samples[960];
        Pa_ReadStream(inputStream_, samples, 960);

        // Encode
        uint8_t encoded[256];
        int len = opus_encode_float(encoder_,
            samples, 960, encoded, 256);

        // Send
        VoiceDataPacket packet{
            .sender_id = localPlayerId_,
            .sequence = ++sequence_,
            .timestamp = now(),
            .opus_length = static_cast<uint16_t>(len)
        };
        std::memcpy(packet.opus_data, encoded, len);

        socket_.broadcast(packet);
    }

    void playback() {
        while (!incoming_.empty()) {
            auto& packet = incoming_.front();

            // Decode
            float samples[960];
            opus_decode_float(decoder_,
                packet.opus_data, packet.opus_length,
                samples, 960, 0);

            // Play
            Pa_WriteStream(outputStream_, samples, 960);

            incoming_.pop();
        }
    }
};

Push-to-Talk vs VAD

PTT

void VoiceChatManager::setPTT(bool pressed) {
    if (pressed && !muted_) {
        capturing_ = true;
    } else {
        capturing_ = false;
    }
}

VAD (Voice Activity Detection)

bool VoiceChatManager::detectVoice(float* samples, int count) {
    float energy = 0;
    for (int i = 0; i < count; i++) {
        energy += samples[i] * samples[i];
    }
    energy /= count;

    return energy > vadThreshold_;
}

Jitter Buffer

Pour gérer les variations de latence :

class JitterBuffer {
    std::map<uint32_t, VoiceDataPacket> buffer_;
    uint32_t playbackSeq_ = 0;
    static constexpr int BUFFER_SIZE = 3;  // 60ms

public:
    void push(VoiceDataPacket packet) {
        buffer_[packet.sequence] = std::move(packet);
    }

    std::optional<VoiceDataPacket> pop() {
        // Wait until we have enough buffer
        if (buffer_.size() < BUFFER_SIZE)
            return std::nullopt;

        auto it = buffer_.find(playbackSeq_++);
        if (it != buffer_.end()) {
            auto packet = std::move(it->second);
            buffer_.erase(it);
            return packet;
        }

        // Packet lost - generate silence or interpolate
        return generateSilence();
    }
};

Latence

Composant Latence
Capture ~10ms
Encoding ~20ms
Network Variable
Jitter buffer ~60ms
Decoding ~20ms
Playback ~10ms
Total ~120ms + network