Latency Recorder is the event ledger for voice calls.
It tracks turn boundaries, builds turn summaries, computes durations, and writes one persisted latency record per communication.

What changed in this model

  • VAD events are now treated as a call-level global stream.
  • Voice orchestrator turn advancement still supports VAD:speech_started.
  • Turn-start derivation for turns after 0 uses the closest global VAD:speech_ended, subject to a 1200ms distance guard.
  • All duration computation is centralized in duration_calculator.go.
  • Dashboard latency exposure is restricted to agent_latency.

Event model

There are two event scopes:
  1. Turn-scoped events (sanitized timeline used for per-turn breakdowns).
  2. Call-scoped VAD events (raw, unsanitized stream used for turn-start and human-speech totals).
The call-level VAD stream is accessible from recorder snapshots and is written once per breakdown payload under VADEvents.

Turn boundaries

  • Turn 0 starts when call_started is marked.
  • STT boundaries:
    • interim_transcription always starts a new turn.
    • finished_transcription starts a new turn only when no interim transcription was seen in that turn.
  • orchestrator:user_heard_all_data is emitted only from telephony onUserFinishedHearing callbacks. It is not synthesized as part of STT/VAD turn-boundary transitions.
  • Recorder stop closure always emits turn_finish.description=recorder_stopped.
  • orchestrator:initial_message_completed is a marker event and does not advance turn boundaries.
  • Voice orchestrator boundaries:
    • VAD:speech_started remains a valid boundary event for turn advancement.

Turn-start derivation for turns > 0

Turn start is derived with the following steps:
  1. Build the sanitized in-turn event window (same event window persisted for turn timelines).
  2. Compute the turn’s first event from that sanitized window.
  3. Back-search within that same window for the latest VAD:speech_ended at or before the first event.
  4. Accept it only when absolute distance is <= 1200ms.
  5. Otherwise fall back to:
    • latest finished_transcription timestamp - silence_detection_threshold.
Threshold constants:
  • Speech detection threshold: VAD speech_start_frames * frame_duration_ms (default 300ms)
  • Silence detection threshold: VAD speech_end_frames * frame_duration_ms (default 500ms)
  • Closest-silence max distance: 1200ms

Duration calculation ownership

duration_calculator.go owns all duration math:
  • Per-turn durations (stt_tail_latency_ms, tts_total_ms, gaps, pipeline total, etc.)
  • Call-level durations:
    • total_call_duration_ms
    • agent_speech_duration_ms
    • human_speech_duration_ms
latency_recorder.go is responsible for event recording and turn-boundary behavior, not metric math. breakdown_writer.go formats payloads from calculator outputs.

Call-level metric definitions

  • total_call_duration_ms:
    • Earliest turn start to latest turn stop for the communication.
  • agent_speech_duration_ms:
    • Sum, across turns, of the span from Telephony:start to the earliest of:
      • user_heard_all_data
      • user-started turn_finish.
  • human_speech_duration_ms:
    • Sum of matched global VAD:speech_started -> VAD:speech_ended spans that meet the configured VAD speech-start threshold.

Dashboard exposure

Only agent_latency is exposed in dashboard query entities.

Breakdown payload structure

Top-level fields:
  • OrchestratorType
  • VADEvents (global call-level VAD list, optional)
  • CallDurations (call-level aggregates, optional)
  • Turns
Each item in Turns is a turn summary object that includes turn metadata plus:
  • Durations
  • Events
Durations and Events are intentionally emitted as the final fields in each turn summary object. Turn-level Events omit VAD records so that VAD appears once, at call level.
TurnSummary.StopReason is built from definitive orchestrator events only, using deterministic tokens joined with |:
  • turn_finish or turn_finish.description when provided
  • user_heard_all_data when present
  • idle_timeout_warning when present
  • idle_timeout_fired when present
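As a rough illustration of the StopReason token rule, here is a small Go sketch. The stopSignals type and buildStopReason function are hypothetical. The sketch assumes that tokens are emitted in the order listed above and that a non-empty turn_finish.description replaces the bare turn_finish token.

```go
package main

import (
	"fmt"
	"strings"
)

// stopSignals is a hypothetical summary of which definitive orchestrator
// events were observed in a turn.
type stopSignals struct {
	TurnFinish            bool
	TurnFinishDescription string
	UserHeardAllData      bool
	IdleTimeoutWarning    bool
	IdleTimeoutFired      bool
}

// buildStopReason joins the tokens for the observed events with "|".
func buildStopReason(s stopSignals) string {
	var tokens []string
	if s.TurnFinish {
		if s.TurnFinishDescription != "" {
			tokens = append(tokens, s.TurnFinishDescription)
		} else {
			tokens = append(tokens, "turn_finish")
		}
	}
	if s.UserHeardAllData {
		tokens = append(tokens, "user_heard_all_data")
	}
	if s.IdleTimeoutWarning {
		tokens = append(tokens, "idle_timeout_warning")
	}
	if s.IdleTimeoutFired {
		tokens = append(tokens, "idle_timeout_fired")
	}
	return strings.Join(tokens, "|")
}

func main() {
	fmt.Println(buildStopReason(stopSignals{
		TurnFinish:            true,
		TurnFinishDescription: "recorder_stopped",
		UserHeardAllData:      true,
	}))
}
```

A fixed token order keeps the string deterministic, so identical turns always produce identical StopReason values.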
CallDurations contains:
  • total_call_duration_ms
  • agent_speech_duration_ms
  • human_speech_duration_ms
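The payload shape described so far could be modelled in Go roughly as below. This is a sketch under assumptions: the Event fields, the TurnIndex field, and the JSON key casing are hypothetical, and the real structs in breakdown_writer.go may differ. It relies on two documented behaviours of encoding/json: struct fields are encoded in declaration order, and omitempty drops nil slices and nil pointers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical, simplified event record.
type Event struct {
	Name        string `json:"name"`
	TimestampMs int64  `json:"timestamp_ms"`
}

// CallDurations holds the call-level aggregates.
type CallDurations struct {
	TotalCallDurationMs   int64 `json:"total_call_duration_ms"`
	AgentSpeechDurationMs int64 `json:"agent_speech_duration_ms"`
	HumanSpeechDurationMs int64 `json:"human_speech_duration_ms"`
}

// TurnSummary declares Durations and Events last so they are emitted last.
type TurnSummary struct {
	TurnIndex  int // hypothetical metadata field
	StopReason string
	Durations  map[string]int64
	Events     []Event
}

// Breakdown is the top-level payload; optional fields use omitempty.
type Breakdown struct {
	OrchestratorType string
	VADEvents        []Event        `json:",omitempty"`
	CallDurations    *CallDurations `json:",omitempty"`
	Turns            []TurnSummary
}

// encodeBreakdown returns the compact JSON form of the payload.
func encodeBreakdown(b Breakdown) (string, error) {
	raw, err := json.Marshal(b)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	out, err := encodeBreakdown(Breakdown{
		OrchestratorType: "voice",
		CallDurations:    &CallDurations{TotalCallDurationMs: 60000, AgentSpeechDurationMs: 21000, HumanSpeechDurationMs: 18000},
		Turns: []TurnSummary{{
			TurnIndex:  1,
			StopReason: "turn_finish",
			Durations:  map[string]int64{"agent_latency_ms": 420},
			Events:     []Event{{Name: "STT:finished_transcription", TimestampMs: 1000}},
		}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```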
EoT timeout breakdown semantics:
  • EoTQueryTimeout.DurationMs (eot_query_timeout) is measured from EoT:start.
  • EoTFalseNegativeTimeout.DurationMs (eot_timeout_false_negative) is measured from decision-bearing EoT:finish.
  • eot_latency_ms is measured from EoT:start to the first terminal EoT outcome in the turn:
    • EoT:finish with decision=true
    • EoT:eot_query_timeout
    • EoT:eot_timeout_false_negative

Runtime logs

  • Turn latency breakdown includes per-turn latency fields and flattened summary fields (no nested TurnSummary object).
    • Core fields:
      • turn_start_timestamp
      • turn_stop_timestamp
      • user_heard_all_data
      • tools_called
      • tools (when present)
    • Timeout/idle duration fields (only when present):
      • eot_query_timeout_duration_ms
      • eot_false_negative_timeout_duration_ms
      • idle_timeout_warning_duration_ms
      • idle_timeout_fired_duration_ms
    • Grouped component sections for easier log inspection:
      • stt
        • stt_ms
        • provider, model
      • eot
        • eot_ms
        • provider, model
        • has_query_timeout
        • has_false_negative_timeout
      • llm
        • llm_text_ttft_ms
        • llm_text_total_ms
        • llm_audio_ttft_ms
        • llm_audio_total_ms
        • llm_function_call_request_average_ms
        • provider, model
      • tts
        • tts_ttft_ms
        • tts_total_ms
        • tts_cache_lookup_avg_ms
        • provider, model
      • etc (remaining cross-component/gap durations)
        • agent_latency_ms
        • silence_to_llm_first_token_ms
        • stt_to_llm_gap_ms
        • llm_to_tts_gap_ms
        • tts_to_telephony_gap_ms
        • pipeline_total_ms
        • stt_to_eot_ms
        • eot_to_tts_ms
        • llm_to_tts_ready_ms
    • Top-level duplication policy:
      • component and cross-component duration fields are emitted only inside the grouped sections above (not duplicated at top level).
      • top-level fields are reserved for turn summary metadata/state: turn_start_timestamp, turn_stop_timestamp, user_heard_all_data, tools_called, tools, timeout durations, and idle-timeout durations.
  • Speech durations logged is emitted once per call and carries:
    • total_call_duration_ms
    • agent_speech_duration_ms
    • human_speech_duration_ms
  • Tool duration logged is emitted once per tool invocation with:
    • agent_turn_id, customer, agent_id, orchestrator_type
    • tool_name, duration_ms
    • provider, model (when available)

Persistence path

On shutdown, BuildPersistencePayload produces:
  • Breakdown JSON
  • Aggregated communication averages/totals
UpsertLatencyStats stores these values in comm_latency_stats. The persisted row also includes agent_id for direct agent-scoped filtering. Upsert-column resolution is covered by unit tests and no longer depends on integration-only environment setup.