Questions

Conversations often contain explicit or implicit questions that indicate information gaps, decision points, or areas requiring follow-up. Identifying these questions helps teams understand which issues remain unresolved, what additional data is needed, and who might need to provide more input.

Stakeholders & Benefits:

  • Product Managers & Team Leads: Quickly surface unresolved questions that need answers to move projects forward.
  • CX Teams: Identify customer inquiries that must be addressed, improving responsiveness and satisfaction.
  • Business Analysts: Highlight knowledge gaps that require research or additional data, informing more accurate analysis.
  • Compliance & QA Officers: Confirm that all critical inquiries—internal or customer-facing—have been acknowledged.

Value Proposition:
Structured extraction of questions ensures nothing falls through the cracks. Teams can track all inquiries, direct them to the right experts, and ensure timely follow-up actions.


Data Dictionary & Schema

Objective:
Extract all questions from a timestamped, diarized transcript and return them in a standardized JSON format. Questions may be explicit (“What features are most relevant?”) or implicit (“Do you think we should...?”). They can be directed at a specific person or general.
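The extraction itself is performed by the LLM prompt described later; purely as an illustration of the explicit/implicit distinction, a rough rule-of-thumb classifier might look like the sketch below. The starter phrases and scores are our own illustrative choices, not part of any specification.

```python
# Illustrative heuristic only: real extraction is done by the LLM prompt.
# IMPLICIT_STARTERS is an assumed, non-exhaustive list of phrasings.
IMPLICIT_STARTERS = ("do you", "should we", "can we", "could you", "would it")

def classify_utterance(text: str):
    """Return (is_question, score) for a single utterance."""
    lowered = text.strip().lower()
    if lowered.endswith("?"):
        # Explicit question: high confidence
        return True, 0.9
    if lowered.startswith(IMPLICIT_STARTERS):
        # Implicit question phrasing without a question mark: medium confidence
        return True, 0.6
    return False, 0.0
```

A heuristic like this is useful mainly as a sanity check on the LLM's output, not as a replacement for it.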

Required JSON Schema:

{
  "questions": [
    {
      "id": "<string>",
      "text": "<string>",
      "type": "question",
      "score": <number>,
      "entities": [
        {
          "type": "<string>",
          "text": "<string>",
          "value": {
            "channel": "<string>"
          }
        }
      ]
    }
  ]
}

Data Dictionary:

  • questions: A list of recognized questions from the conversation. Example: N/A
  • id: Unique identifier for the question. Example: "question-1"
  • text: The textual representation of the question. Example: "What features are most relevant?"
  • type: The classification; always "question". Example: "question"
  • score: Confidence that this statement is a question (0.0–1.0). Example: 0.9
  • entities: Extracted details about who or what the question references (if any). Example: [{"type":"channel","text":"Ken","value":{"channel":"Ken"}}]
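To make the field definitions concrete, here is a hypothetical payload matching the schema, together with a minimal field check. The helper name `validate` and the exact-key check are our own conventions, not part of any API.

```python
# The five fields every question object must carry, per the data dictionary.
REQUIRED_FIELDS = {"id", "text", "type", "score", "entities"}

# Hypothetical example payload matching the schema above.
example = {
    "questions": [
        {
            "id": "question-1",
            "text": "What features are most relevant?",
            "type": "question",
            "score": 0.9,
            "entities": [
                {"type": "channel", "text": "Ken", "value": {"channel": "Ken"}}
            ],
        }
    ]
}

def validate(payload: dict) -> bool:
    """Check that every question carries exactly the five required fields."""
    return all(set(q) == REQUIRED_FIELDS for q in payload.get("questions", []))
```

Note that the empty result `{"questions": []}` is also valid under this check.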

If No Questions Found:
Return:

{
  "questions": []
}

Confidence & Calibration

Confidence Scoring:

  • High (0.8–1.0): Clearly phrased questions or direct inquiries (“What is the deadline?”, “Does Ken know more?”).
  • Medium (0.5 to below 0.8): Slightly ambiguous questions (“Should we consider...?”, “Do you think...?”).
  • Low (below 0.5): Vague or potentially rhetorical phrases that may not warrant classification as a question.
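When consuming scores downstream, the tiers above can be mapped with a small helper. The function name and the decision to treat 0.8 and 0.5 as inclusive lower bounds are our own choices, consistent with the tier definitions.

```python
def confidence_tier(score: float) -> str:
    """Map a confidence score to the high/medium/low tiers.
    Boundaries: score >= 0.8 is high, score >= 0.5 is medium, else low."""
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"
```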

Calibration Tips:

  1. Test on Sample Data:
    Run the prompt on transcripts. Verify it captures all genuine questions and excludes non-question chatter.

  2. Refine Entity Recognition:
    If the model struggles to identify the channel (person/team) in the question, provide additional guidance or domain-specific examples.

  3. Seek Stakeholder Feedback:
    Ensure that product managers, CX leads, or compliance officers find the extracted questions useful and actionable.

  4. Iterative Improvement:
    As communication styles evolve, update instructions to recognize new phrasings of questions.
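Tip 1 (testing on sample data) can be made quantitative by comparing extracted question texts against a hand-labeled gold set. The sketch below uses exact string matching, which is a simplification; real transcripts may need normalization or fuzzy matching.

```python
def recall_precision(extracted: list[str], gold: list[str]) -> tuple[float, float]:
    """Compare extracted question texts against a hand-labeled gold set.
    Returns (recall, precision); exact-match comparison is a simplification."""
    extracted_set, gold_set = set(extracted), set(gold)
    true_pos = len(extracted_set & gold_set)
    recall = true_pos / len(gold_set) if gold_set else 1.0
    precision = true_pos / len(extracted_set) if extracted_set else 1.0
    return recall, precision
```

Low recall suggests genuine questions are being missed; low precision suggests non-question chatter is being captured.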


Prompt Construction & Instructions

Role Specification & Reiteration:

  • The system is a “highly experienced assistant” specialized in identifying questions within transcripts.
  • Reiterate instructions to ensure compliance.
  • Include no commentary or reasoning steps in the final output.

Avoid Hallucination:

  • Only label what is clearly a question based on the conversation text.
  • If uncertain, assign a lower score or omit the question entirely.

Strict Formatting:

  • Return only the JSON structure—no extra text or explanations.
  • If no questions are found, return {"questions": []}.

No messageIds or beginOffset:

  • Keep the schema simple and focused on id, text, type, score, and entities.
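If the model nonetheless emits extra fields such as messageIds or beginOffset, they can be dropped defensively before the payload reaches downstream consumers. This filtering step is our suggestion, not part of the schema itself.

```python
# The only keys the simplified schema allows on a question object.
ALLOWED_KEYS = {"id", "text", "type", "score", "entities"}

def strip_extra_keys(question: dict) -> dict:
    """Drop keys outside the simplified schema (e.g. messageIds, beginOffset)
    so downstream consumers always see a stable shape."""
    return {k: v for k, v in question.items() if k in ALLOWED_KEYS}
```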

Prompt for Implementation

System Message (Role: System):
You are a highly experienced assistant specializing in identifying questions from a timestamped, diarized conversation. A question is any utterance seeking information, confirmation, or clarity. Your output must follow this JSON schema:

{
  "questions": [
    {
      "id": "<string>",
      "text": "<string>",
      "type": "question",
      "score": <number>,
      "entities": [
        {
          "type": "<string>",
          "text": "<string>",
          "value": {
            "channel": "<string>"
          }
        }
      ]
    }
  ]
}

Instructions (Reiterated):

  1. Identify all questions (explicit or implicit) from the transcript.
  2. Assign a unique id to each question (e.g., "question-1").
  3. text should capture the entire question phrase.
  4. type must be "question".
  5. Assign a score reflecting confidence.
  6. If the question references a person/team/role, represent that entity with type: "channel" and specify "channel": "<Name>".
  7. Return only the JSON object—no explanations, no reasoning steps.
  8. If no questions are found, return {"questions": []}.

Chain-of-Thought (Hidden):

  • Reason internally about which utterances are questions.
  • Do not include reasoning in the final answer.

No Hallucination:

  • Only extract actual questions present in the transcript.
  • If uncertain, lower the score or omit the question.

System Summary:
Read the transcript, identify questions, and return them as per the schema. No extra text beyond the JSON.


User Message (Role: User):
"Analyze the following conversation and return the extracted questions as instructed:

[TRANSCRIPT_JSON]"

(Replace [TRANSCRIPT_JSON] with your actual conversation data.)
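A minimal sketch of wiring this prompt into code is shown below. The system prompt is abbreviated, the function names are ours, and the defensive parser assumes only that the model was asked to return the JSON object described above.

```python
import json

def build_messages(transcript_json: str) -> list[dict]:
    """Assemble the system/user message pair described above.
    The system prompt is abbreviated here; use the full text in practice."""
    system_prompt = (
        "You are a highly experienced assistant specializing in identifying "
        "questions from a timestamped, diarized conversation. ..."
    )
    user_prompt = (
        "Analyze the following conversation and return the extracted "
        f"questions as instructed:\n\n{transcript_json}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def parse_response(raw: str) -> list[dict]:
    """Parse the model's reply defensively: return an empty list if the
    JSON is malformed or the `questions` key is missing or wrong-typed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []
    questions = payload.get("questions", [])
    return questions if isinstance(questions, list) else []
```

The messages list matches the common system/user chat format; adapt it to whichever model client you use.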