Conversation analytics

Conversations are rich with insights beyond words alone. By quantifying who spoke when, how long they spoke, how fast they talked, how much silence occurred, and if multiple people spoke simultaneously (overlap), organizations gain a clearer picture of conversational dynamics.

Who Benefits:

  • Project Managers & Team Leads: Understand team communication patterns, ensuring balanced participation and effective information exchange.
  • CX Teams: Verify that customers get ample talk time and agents aren’t rushing or dominating the discussion.
  • Business Analysts & Strategists: Identify communication bottlenecks and improve training and workflows based on measured engagement.
  • Compliance & QA Officers: Check whether calls adhere to standards of fairness, responsiveness, and clarity.

Value Proposition:
By turning raw conversation audio into quantifiable metrics, teams can quickly diagnose interaction issues, highlight improvements, and ensure more productive, customer-centered communications.


Data Dictionary & Schema

Objective:
Extract key conversation metrics—duration, silence, overlap, speaker ratios, talk times, and pace—from a timestamped, diarized conversation. The userId field is optional and should only be included if provided as part of the input data.

JSON Schema:

{
  "analytics": {
    "duration": <number>,
    "silence": {
      "seconds": <number>
    },
    "overlap": {
      "percent": <number>,
      "seconds": <number>
    },
    "speakers": [
      {
        "id": "<string>",
        "name": "<string>",
        "ratio": <number>,
        "talkTime": {
          "seconds": <number>
        },
        "listenTime": {
          "seconds": <number>
        },
        "pace": {
          "wpm": <number>
        }
        // Include "userId": "<string>" only if it is provided in the input
      }
    ]
  }
}

Data Dictionary:

FieldMeaning for Your BusinessExample
analyticsEncapsulates all conversation metricsN/A
durationTotal conversation duration in seconds3600 (60 mins)
silence.secondsAmount of silence in seconds600 (10 mins)
overlap.percentPercent of conversation where multiple people spoke at once10.0
overlap.secondsTotal overlap time in seconds360
speakersArray of speaker-level metrics[ {...}, {...} ]
speakers[].idUnique identifier for a speaker"speaker-1"
speakers[].nameThe speaker’s display name"John"
speakers[].ratioSpeaker’s talk ratio compared to others4.0 (John spoke 4x more than Arya)
speakers[].talkTime.secondsTotal talk time in seconds per speaker2400 (40 mins)
speakers[].listenTime.secondsTotal listen time in seconds (if applicable)0 if none
speakers[].pace.wpmWords per minute for that speaker100 wpm
speakers[].userIdOptional unique user identifier if provided in input"user-123"

If No Analytics:

{
  "analytics": {
    "duration": 0,
    "silence": { "seconds": 0 },
    "overlap": { "percent": 0.0, "seconds": 0 },
    "speakers": []
  }
}

Confidence & Calibration

Confidence Considerations:

  • Duration, silence, and overlap are time-based and usually straightforward, leading to high confidence.
  • Ratio, talk time, and pace depend on accurate speaker diarization and word counting. If speech-to-text or speaker identification is uncertain, consider assigning lower confidence scores internally (though not explicitly required in the schema).
  • Iterative testing with known samples will refine accuracy. Gathering feedback from PMs, CX leads, or compliance officers will guide threshold adjustments or metric definitions.

Calibration Steps:

  1. Test on Known Conversations: Validate metrics against manually timed samples.
  2. Adjust Pace Logic: If wpm seems off, refine speech recognition or pause handling.
  3. Solicit Feedback: Adjust metrics if stakeholders need more granularity or different interpretations.
  4. Continuous Improvement: Update logic as conversation patterns shift (e.g., shorter standups, longer support calls).

Prompt Construction & Instructions

Role Specification & Reiteration:

  • The system is a “highly experienced assistant” focused on extracting these analytics.
  • Reiterate the schema and instructions multiple times to avoid confusion.
  • Output only the JSON—no reasoning steps, no extra text.

No Hallucination:

  • Only use data derivable from the conversation (e.g., timestamps, speech durations).
  • If uncertain about any metric, default to zero or omit optional fields.

Strict Formatting:

  • Return only the JSON structure.
  • Include userId for a speaker only if provided in the input data.

Prompt Implementation

System Message (Role: System):
You are a highly experienced assistant that extracts conversation analytics (duration, silence, overlap, speaker metrics) from a timestamped, diarized transcript. Your output must follow this JSON schema:

{
  "analytics": {
    "duration": <number>,
    "silence": {
      "seconds": <number>
    },
    "overlap": {
      "percent": <number>,
      "seconds": <number>
    },
    "speakers": [
      {
        "id": "<string>",
        "name": "<string>",
        "ratio": <number>,
        "talkTime": {
          "seconds": <number>
        },
        "listenTime": {
          "seconds": <number>
        },
        "pace": {
          "wpm": <number>
        }
        // Include "userId": "<string>" only if provided in the input
      }
    ]
  }
}

Instructions (Reiterated):

  1. Calculate total duration in seconds.
  2. Identify silence duration in seconds.
  3. Determine overlap percent and seconds.
  4. For each speaker:
    • Assign id, name.
    • Calculate ratio of their talk time compared to others.
    • talkTime.seconds and listenTime.seconds as applicable.
    • pace.wpm = words per minute if available.
    • Include userId only if present in input data.
  5. Return only the JSON—no extra text or reasoning.
  6. If no metrics can be derived, return zeros or empty arrays.

Chain-of-Thought (Hidden):

  • Reason silently, don’t display reasoning steps.

No Hallucination:

  • Stick to data that can be derived from the transcript’s timeline and speaker info.
  • If uncertain, use defaults (e.g., 0 seconds, 0.0 percent).

System Summary:
Read the transcript, compute analytics, and return them strictly in the defined JSON format. No additional commentary."


User Message (Role: User):
"Analyze the following conversation and return the analytics as instructed:

[TRANSCRIPT_JSON][TRANSCRIPT_JSON]"

(Replace [TRANSCRIPT_JSON] with your actual conversation data.)