Questions

Conversations often contain explicit or implicit questions that indicate information gaps, decision points, or areas requiring follow-up. Identifying these questions helps teams understand which issues remain unresolved, what additional data is needed, and who might need to provide more input.

Stakeholders & Benefits:

  • Product Managers & Team Leads: Quickly surface unresolved questions that need answers to move projects forward.
  • CX Teams: Identify customer inquiries that must be addressed, improving responsiveness and satisfaction.
  • Business Analysts: Highlight knowledge gaps that require research or additional data, informing more accurate analysis.
  • Compliance & QA Officers: Confirm that all critical inquiries—internal or customer-facing—have been acknowledged.

Value Proposition:
Structured extraction of questions ensures nothing falls through the cracks. Teams can track all inquiries, direct them to the right experts, and ensure timely follow-up actions.


Data Dictionary & Schema

Objective:
Extract all questions from a timestamped, diarized transcript and return them in a standardized JSON format. Questions may be explicit (“What features are most relevant?”) or implicit (“Do you think we should...?”). They can be directed at a specific person or general.
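The extraction itself is performed by the LLM prompt described later; purely as an illustration of the explicit/implicit distinction, a rough rule-of-thumb classifier might look like the sketch below. The starter phrases and scores are our own illustrative choices, not part of any specification.

```python
# Illustrative heuristic only: real extraction is done by the LLM prompt.
# IMPLICIT_STARTERS is an assumed, non-exhaustive list of phrasings.
IMPLICIT_STARTERS = ("do you", "should we", "can we", "could you", "would it")

def classify_utterance(text: str):
    """Return (is_question, score) for a single utterance."""
    lowered = text.strip().lower()
    if lowered.endswith("?"):
        # Explicit question: high confidence
        return True, 0.9
    if lowered.startswith(IMPLICIT_STARTERS):
        # Implicit question phrasing without a question mark: medium confidence
        return True, 0.6
    return False, 0.0
```

A heuristic like this is useful mainly as a sanity check on the LLM's output, not as a replacement for it.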

Required JSON Schema:

{
  "questions": [
    {
      "id": "<string>",
      "text": "<string>",
      "type": "question",
      "score": <number>,
      "entities": [
        {
          "type": "<string>",
          "text": "<string>",
          "value": {
            "channel": "<string>"
          }
        }
      ]
    }
  ]
}

Data Dictionary:

  • questions: A list of recognized questions from the conversation. Example: N/A
  • id: Unique identifier for the question. Example: "question-1"
  • text: The textual representation of the question. Example: "What features are most relevant?"
  • type: The classification; always "question". Example: "question"
  • score: Confidence that this statement is a question (0.0–1.0). Example: 0.9
  • entities: Extracted details about who or what the question references (if any). Example: [{"type":"channel","text":"Ken","value":{"channel":"Ken"}}]
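To make the field definitions concrete, here is a hypothetical payload matching the schema, together with a minimal field check. The helper name `validate` and the exact-key check are our own conventions, not part of any API.

```python
# The five fields every question object must carry, per the data dictionary.
REQUIRED_FIELDS = {"id", "text", "type", "score", "entities"}

# Hypothetical example payload matching the schema above.
example = {
    "questions": [
        {
            "id": "question-1",
            "text": "What features are most relevant?",
            "type": "question",
            "score": 0.9,
            "entities": [
                {"type": "channel", "text": "Ken", "value": {"channel": "Ken"}}
            ],
        }
    ]
}

def validate(payload: dict) -> bool:
    """Check that every question carries exactly the five required fields."""
    return all(set(q) == REQUIRED_FIELDS for q in payload.get("questions", []))
```

Note that the empty result `{"questions": []}` is also valid under this check.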

If No Questions Found:
Return:

{
  "questions": []
}

Confidence & Calibration

Confidence Scoring:

  • High (0.8–1.0): Clearly phrased questions or direct inquiries (“What is the deadline?”, “Does Ken know more?”).
  • Medium (0.5 to below 0.8): Slightly ambiguous questions (“Should we consider...?”, “Do you think...?”).
  • Low (below 0.5): Vague or potentially rhetorical phrases that may not warrant classification as a question.
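When consuming scores downstream, the tiers above can be mapped with a small helper. The function name and the decision to treat 0.8 and 0.5 as inclusive lower bounds are our own choices, consistent with the tier definitions.

```python
def confidence_tier(score: float) -> str:
    """Map a confidence score to the high/medium/low tiers.
    Boundaries: score >= 0.8 is high, score >= 0.5 is medium, else low."""
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"
```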

Calibration Tips:

  1. Test on Sample Data:
    Run the prompt on transcripts. Verify it captures all genuine questions and excludes non-question chatter.

  2. Refine Entity Recognition:
    If the model struggles to identify the channel (person/team) in the question, provide additional guidance or domain-specific examples.

  3. Seek Stakeholder Feedback:
    Ensure that product managers, CX leads, or compliance officers find the extracted questions useful and actionable.

  4. Iterative Improvement:
    As communication styles evolve, update instructions to recognize new phrasings of questions.
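Tip 1 (testing on sample data) can be made quantitative by comparing extracted question texts against a hand-labeled gold set. The sketch below uses exact string matching, which is a simplification; real transcripts may need normalization or fuzzy matching.

```python
def recall_precision(extracted: list[str], gold: list[str]) -> tuple[float, float]:
    """Compare extracted question texts against a hand-labeled gold set.
    Returns (recall, precision); exact-match comparison is a simplification."""
    extracted_set, gold_set = set(extracted), set(gold)
    true_pos = len(extracted_set & gold_set)
    recall = true_pos / len(gold_set) if gold_set else 1.0
    precision = true_pos / len(extracted_set) if extracted_set else 1.0
    return recall, precision
```

Low recall suggests genuine questions are being missed; low precision suggests non-question chatter is being captured.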


Prompt Construction & Instructions

Role Specification & Reiteration:

  • The system is a “highly experienced assistant” specialized in identifying questions within transcripts.
  • Reiterate instructions to ensure compliance.
  • Include no commentary or reasoning steps in the final output.

Avoid Hallucination:

  • Only label what is clearly a question based on the conversation text.
  • If uncertain, assign a lower score or omit the question entirely.

Strict Formatting:

  • Return only the JSON structure—no extra text or explanations.
  • If no questions are found, return {"questions": []}.

No messageIds or beginOffset:

  • Keep the schema simple and focused on id, text, type, score, and entities.
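If the model nonetheless emits extra fields such as messageIds or beginOffset, they can be dropped defensively before the payload reaches downstream consumers. This filtering step is our suggestion, not part of the schema itself.

```python
# The only keys the simplified schema allows on a question object.
ALLOWED_KEYS = {"id", "text", "type", "score", "entities"}

def strip_extra_keys(question: dict) -> dict:
    """Drop keys outside the simplified schema (e.g. messageIds, beginOffset)
    so downstream consumers always see a stable shape."""
    return {k: v for k, v in question.items() if k in ALLOWED_KEYS}
```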

Prompt for Implementation

System Message (Role: System):
You are a highly experienced assistant specializing in identifying questions from a timestamped, diarized conversation. A question is any utterance seeking information, confirmation, or clarity. Your output must follow this JSON schema:

{
  "questions": [
    {
      "id": "<string>",
      "text": "<string>",
      "type": "question",
      "score": <number>,
      "entities": [
        {
          "type": "<string>",
          "text": "<string>",
          "value": {
            "channel": "<string>"
          }
        }
      ]
    }
  ]
}

Instructions (Reiterated):

  1. Identify all questions (explicit or implicit) from the transcript.
  2. Assign a unique id to each question (e.g., "question-1").
  3. text should capture the entire question phrase.
  4. type must be "question".
  5. Assign a score reflecting confidence.
  6. If the question references a person/team/role, represent that entity with type: "channel" and specify "channel": "<Name>".
  7. Return only the JSON object—no explanations, no reasoning steps.
  8. If no questions are found, return {"questions": []}.

Chain-of-Thought (Hidden):

  • Reason internally about which utterances are questions.
  • Do not include reasoning in the final answer.

No Hallucination:

  • Only extract actual questions present in the transcript.
  • If uncertain, lower the score or omit the question.

System Summary:
Read the transcript, identify questions, and return them as per the schema. No extra text beyond the JSON.


User Message (Role: User):
"Analyze the following conversation and return the extracted questions as instructed:

[TRANSCRIPT_JSON]"

(Replace [TRANSCRIPT_JSON] with your actual conversation data.)
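A minimal sketch of wiring this prompt into code is shown below. The system prompt is abbreviated, the function names are ours, and the defensive parser assumes only that the model was asked to return the JSON object described above.

```python
import json

def build_messages(transcript_json: str) -> list[dict]:
    """Assemble the system/user message pair described above.
    The system prompt is abbreviated here; use the full text in practice."""
    system_prompt = (
        "You are a highly experienced assistant specializing in identifying "
        "questions from a timestamped, diarized conversation. ..."
    )
    user_prompt = (
        "Analyze the following conversation and return the extracted "
        f"questions as instructed:\n\n{transcript_json}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def parse_response(raw: str) -> list[dict]:
    """Parse the model's reply defensively: return an empty list if the
    JSON is malformed or the `questions` key is missing or wrong-typed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []
    questions = payload.get("questions", [])
    return questions if isinstance(questions, list) else []
```

The messages list matches the common system/user chat format; adapt it to whichever model client you use.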