Extracting Action Items from Meeting Transcripts with NLP
A 45-minute meeting ends. Six people were in the room. Four commitments were made, two of them conditional on a decision that happened in the last ten minutes. An hour later, someone asks who owns the infrastructure migration. Nobody remembers exactly. This is not a process failure. It is a memory failure, and it happens in every meeting at every company.
Action item extraction from meeting transcripts is the NLP task that turns spoken commitments into structured records. The core challenge is that commitments in natural speech are expressed in dozens of different ways: "I'll handle the deployment," "Can you get that to Sarah by Thursday?", "We should probably schedule a follow-up." No two speakers phrase ownership and deadline the same way.
There are three practical approaches to this problem, each with different accuracy and complexity tradeoffs. Rule-based extraction using regex and modal verb patterns is fast and transparent. Transformer-based extraction using fine-tuned T5 or similar models is more robust but heavier to run. Large language model prompting using structured output is the most accurate but requires an API call per meeting. The right choice depends on your latency budget, data sensitivity, and accuracy requirements.
In this guide, you will build all three approaches and wire them to a MeetStream webhook so that action items are extracted automatically when a meeting ends. We will cover fetching the transcript, running each extraction method, and pushing results to a destination like Jira or Notion. Let's get into it.
Rule-Based Extraction with Regex and Modal Patterns
Rule-based extraction works by matching syntactic patterns that correlate with commitments. Modal verbs (will, should, need to, going to) combined with a person mention or first-person pronoun are strong signals. Explicit phrases like "action item", "take ownership", "follow up" are even stronger. Deadline mentions ("by Friday", "end of week", "before the next sprint") round out the extraction.
```python
import re
from typing import Dict, List

# Patterns ordered by specificity
ACTION_PATTERNS = [
    # Explicit ownership: "Alice will", "I'll", "we need to"
    r"(?P<owner>[A-Z][a-z]+|I|We)(?:\'ll| will| should| needs? to| going to| am going to)\s+(?P<action>[a-z][^.?!]{5,60})",
    # Passive assignment: "action item for Bob"
    r"action item(?:\s+for\s+(?P<owner>[A-Z][a-z]+))?[:\s]+(?P<action>[^.?!]{5,80})",
    # Imperative with name: "Bob, please send"
    r"(?P<owner>[A-Z][a-z]+),?\s+(?:please\s+)?(?P<action>(?:send|write|review|schedule|follow up|check|confirm|update|share)[^.?!]{3,60})",
]

DEADLINE_PATTERN = r"by\s+(?:end of (?:day|week|month)|(?:Monday|Tuesday|Wednesday|Thursday|Friday)|(?:tomorrow|next week))"

def extract_action_items_regex(transcript: List[Dict]) -> List[Dict]:
    results = []
    for turn in transcript:
        text = turn["text"]
        speaker = turn["speaker"]
        for pattern in ACTION_PATTERNS:
            for match in re.finditer(pattern, text, re.IGNORECASE):
                # Fall back to the speaker when the pattern has no owner
                # group, or the optional owner group did not match.
                owner = match.groupdict().get("owner") or speaker
                action = match.group("action").strip()
                deadline_match = re.search(DEADLINE_PATTERN, text, re.IGNORECASE)
                deadline = deadline_match.group(0) if deadline_match else None
                results.append({
                    "owner": speaker if owner in ("I", "We") else owner,
                    "action": action,
                    "deadline": deadline,
                    "source_text": text,
                    "method": "regex",
                })
                break  # one match per pattern per turn
    return results
```

Regex extraction has a precision problem: it fires on rhetorical uses of modal verbs ("You should probably talk to product about that" is a suggestion, not a commitment). Adding a post-filter that checks for specificity (does the extracted action contain a verb and a noun?) reduces false positives significantly.
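One way to sketch that post-filter without pulling in a full POS tagger is a small lexicon of action verbs plus a minimum-length check. The verb list and thresholds below are illustrative assumptions, not a tested lexicon; a real tagger (spaCy, NLTK) would give better recall:

```python
import re

# Illustrative verb lexicon -- an assumption, extend for your domain.
ACTION_VERBS = {
    "send", "write", "review", "schedule", "follow", "check",
    "confirm", "update", "share", "draft", "deploy", "fix", "ship",
}

def is_specific_action(action: str) -> bool:
    """Heuristic post-filter: keep extractions that start with a known
    action verb and mention a concrete object (at least two more words)."""
    words = re.findall(r"[a-z']+", action.lower())
    if len(words) < 3:
        return False
    return words[0] in ACTION_VERBS

# Usage: filter regex hits before reporting them.
# is_specific_action("send the deployment checklist")  -> True
# is_specific_action("probably talk")                  -> False
```

Applied after `extract_action_items_regex`, this drops the vague modal-verb hits while keeping concrete, verb-first commitments.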
Transformer-Based Extraction with a Fine-Tuned Model
A sequence-to-sequence model like T5 or BART fine-tuned on meeting data produces much cleaner extractions because it learns the semantics of commitment language, not just surface syntax. The Hugging Face Hub hosts models fine-tuned on the AMI corpus (meeting recordings) and similar datasets that can be adapted to action item extraction.
```python
from typing import Dict, List

from transformers import pipeline

# Model fine-tuned on meeting summarization
extractor = pipeline(
    "text2text-generation",
    model="knkarthick/MEETING_SUMMARY",
    max_length=256,
)

def extract_with_transformer(transcript_text: str) -> str:
    prompt = f"Extract action items from this meeting transcript: {transcript_text}"
    result = extractor(prompt, max_length=256, num_beams=4)
    return result[0]["generated_text"]

# For per-turn extraction using an NLI classifier approach:
classifier = pipeline(
    "zero-shot-classification",
    model="cross-encoder/nli-deberta-v3-small",
)

def is_action_item(text: str, threshold: float = 0.6) -> bool:
    result = classifier(
        text,
        candidate_labels=[
            "action item or task assignment",
            "general discussion or information",
        ],
    )
    return (
        result["labels"][0] == "action item or task assignment"
        and result["scores"][0] > threshold
    )

def extract_action_items_transformer(transcript: List[Dict]) -> List[Dict]:
    results = []
    for turn in transcript:
        if is_action_item(turn["text"]):
            results.append({
                "owner": turn["speaker"],
                "action": turn["text"],
                "method": "transformer",
            })
    return results
```
The NLI-based classification approach is practical for production: it runs on CPU in 50-100ms per turn, handles paraphrase well, and requires no training data. The tradeoff is that it classifies full turns rather than extracting structured fields like owner and deadline separately.
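That tradeoff can be softened by layering the regex deadline pattern from the previous section over the turns the classifier accepts. A minimal sketch, with the classifier injected as a callable so the snippet stands alone (the trivial stand-in classifier is for illustration only; substitute the NLI pipeline in practice):

```python
import re
from typing import Callable, Dict, List

DEADLINE_PATTERN = r"by\s+(?:end of (?:day|week|month)|Monday|Tuesday|Wednesday|Thursday|Friday|tomorrow|next week)"

def classify_then_extract(
    transcript: List[Dict],
    is_action_item: Callable[[str], bool],
) -> List[Dict]:
    """Run a turn-level classifier, then pull deadlines from accepted turns."""
    results = []
    for turn in transcript:
        if not is_action_item(turn["text"]):
            continue
        deadline = re.search(DEADLINE_PATTERN, turn["text"], re.IGNORECASE)
        results.append({
            "owner": turn["speaker"],
            "action": turn["text"],
            "deadline": deadline.group(0) if deadline else None,
            "method": "transformer+regex",
        })
    return results

# Usage with a stand-in classifier (replace with the NLI pipeline):
demo = [{"speaker": "Bob", "text": "I'll send the report by Friday"}]
items = classify_then_extract(demo, lambda t: "'ll" in t)
```

This recovers deadlines cheaply; owner resolution still defaults to the speaker, which is the main remaining gap versus full structured extraction.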
LLM Prompting with Structured Output
For the highest accuracy, especially on complex meetings with cross-speaker dependencies, an LLM prompted for structured output is the right tool. GPT-4o or Claude can return a JSON array of action items with owner, action, deadline, and priority fields extracted in a single pass over the full transcript.
```python
import json
from typing import Dict, List

import openai

client = openai.OpenAI()

ACTION_ITEM_PROMPT = """
You are analyzing a meeting transcript to extract action items.
For each action item, identify:
- owner: the person responsible (use the speaker name if implicit)
- action: the specific task to complete (concise, verb-first)
- deadline: any mentioned deadline (null if none)
- priority: high/medium/low based on urgency signals
- source_quote: the exact quote that triggered this action item
Return a JSON object with a single key "action_items" holding an array.
If no action items exist, return {{"action_items": []}}.
Do not include discussion items, suggestions, or open questions.
Transcript:
{transcript}
"""

def extract_action_items_llm(transcript: List[Dict]) -> List[Dict]:
    formatted = "\n".join(
        f"{turn['speaker']}: {turn['text']}"
        for turn in transcript
    )
    prompt = ACTION_ITEM_PROMPT.format(transcript=formatted)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    content = response.choices[0].message.content
    parsed = json.loads(content)
    return parsed.get("action_items", []) if isinstance(parsed, dict) else parsed
```
Setting temperature=0 is important for structured extraction. You want deterministic output, not creative variation. The response_format: json_object parameter forces valid JSON and eliminates the need to strip markdown code fences from the response.
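Valid JSON is not the same as valid data: the model can still omit fields the prompt asked for. A light validation pass before pushing items downstream is cheap insurance. A sketch (field names follow the prompt above; the required-field set and the "medium" priority default are assumptions):

```python
from typing import Dict, List

# Assumed minimum: an item without an owner and an action is unusable.
REQUIRED_FIELDS = ("owner", "action")

def validate_action_items(raw: List[Dict]) -> List[Dict]:
    """Drop entries missing required fields; normalize optional ones."""
    valid = []
    for item in raw:
        if not all(item.get(f) for f in REQUIRED_FIELDS):
            continue
        valid.append({
            "owner": item["owner"],
            "action": item["action"],
            "deadline": item.get("deadline"),           # may be None
            "priority": item.get("priority", "medium"),  # assumed default
            "source_quote": item.get("source_quote", ""),
        })
    return valid

# Usage: the second entry is dropped for lacking an owner.
cleaned = validate_action_items([
    {"owner": "Sara", "action": "draft the RFC", "priority": "high"},
    {"action": "missing owner, dropped"},
])
```

Running this between extraction and the Jira/Notion push keeps malformed entries from turning into half-filled tickets.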

End-to-End Pipeline: MeetStream to Jira
The full pipeline connects a MeetStream webhook to action item extraction to a destination. The bot.stopped webhook fires when the bot leaves the meeting. At that point, the full transcript is available via the transcript endpoint.
```python
from typing import Dict, List

import httpx
from fastapi import FastAPI, Request

app = FastAPI()

MEETSTREAM_API_KEY = "YOUR_API_KEY"
JIRA_TOKEN = "YOUR_JIRA_TOKEN"
JIRA_BASE_URL = "https://your-org.atlassian.net"
JIRA_PROJECT_KEY = "ENG"

async def fetch_transcript(bot_id: str) -> List[Dict]:
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"https://api.meetstream.ai/api/v1/transcript/{bot_id}/get_transcript",
            headers={"Authorization": f"Token {MEETSTREAM_API_KEY}"},
        )
        return response.json()

async def create_jira_task(action_item: Dict) -> str:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{JIRA_BASE_URL}/rest/api/3/issue",
            # Bearer tokens work for Jira Server / Data Center PATs;
            # Jira Cloud uses Basic auth with email + API token instead.
            headers={
                "Authorization": f"Bearer {JIRA_TOKEN}",
                "Content-Type": "application/json",
            },
            json={
                "fields": {
                    "project": {"key": JIRA_PROJECT_KEY},
                    "summary": action_item["action"],
                    "description": {
                        "type": "doc",
                        "version": 1,
                        "content": [{
                            "type": "paragraph",
                            "content": [{
                                "type": "text",
                                "text": action_item.get("source_quote", ""),
                            }],
                        }],
                    },
                    # Jira Cloud resolves assignees by accountId, not display
                    # name; map speaker names to accountIds in production.
                    "assignee": {"displayName": action_item.get("owner")},
                    "issuetype": {"name": "Task"},
                },
            },
        )
        data = response.json()
        return data.get("key")

@app.post("/webhook/meeting-ended")
async def handle_meeting_ended(request: Request):
    body = await request.json()
    if body.get("event") != "bot.stopped":
        return {"status": "ignored"}
    bot_id = body["bot_id"]
    transcript = await fetch_transcript(bot_id)
    action_items = extract_action_items_llm(transcript)
    created_issues = []
    for item in action_items:
        issue_key = await create_jira_task(item)
        created_issues.append(issue_key)
    return {"created": created_issues, "count": len(created_issues)}
```
Pushing to Notion Instead of Jira
Notion is a common destination for teams that use it as their knowledge base. The Notion API accepts a block-structured payload. Each action item becomes a to-do block in a meeting notes page.
```python
from typing import Dict, List

import httpx

async def push_to_notion(page_id: str, action_items: List[Dict], notion_token: str):
    blocks = [
        {
            "object": "block",
            "type": "to_do",
            "to_do": {
                "rich_text": [{
                    "type": "text",
                    "text": {
                        "content": f"[{item.get('owner', 'TBD')}] {item['action']}"
                    },
                }],
                "checked": False,
            },
        }
        for item in action_items
    ]
    async with httpx.AsyncClient() as client:
        await client.patch(
            f"https://api.notion.com/v1/blocks/{page_id}/children",
            headers={
                "Authorization": f"Bearer {notion_token}",
                "Notion-Version": "2022-06-28",
            },
            json={"children": blocks},
        )
```
Deduplication and Confidence Scoring
Running multiple extractors on the same transcript produces duplicates. An action item mentioned three times in a meeting may be extracted by both the regex pass and the LLM pass. Deduplication using sentence similarity (cosine similarity on sentence embeddings) catches near-duplicates that exact string matching misses.
```python
from typing import Dict, List

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def deduplicate_action_items(items: List[Dict], threshold: float = 0.85) -> List[Dict]:
    if not items:
        return []
    texts = [item["action"] for item in items]
    embeddings = model.encode(texts)
    keep = [True] * len(items)
    for i in range(len(items)):
        if not keep[i]:
            continue
        for j in range(i + 1, len(items)):
            # Cosine similarity between the two action embeddings
            sim = np.dot(embeddings[i], embeddings[j]) / (
                np.linalg.norm(embeddings[i]) * np.linalg.norm(embeddings[j])
            )
            if sim > threshold:
                keep[j] = False
    return [item for item, k in zip(items, keep) if k]
```
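The confidence-scoring half of this section can piggyback on the same merge: an item surfaced by more than one extractor is more trustworthy than one surfaced by a single pass. A minimal sketch, assuming near-duplicates have already been grouped (for example by the embedding similarity above) and that each item carries the method tag used throughout this guide; the weights and agreement bonus are illustrative assumptions:

```python
from typing import Dict, List

# Illustrative per-method base scores -- assumptions, tune on your own data.
METHOD_WEIGHTS = {"regex": 0.5, "transformer": 0.7, "llm": 0.9}

def score_action_items(groups: List[List[Dict]]) -> List[Dict]:
    """Keep one item per duplicate group, scored by the strongest method
    that found it, plus a small bonus for multi-method agreement."""
    scored = []
    for group in groups:
        methods = {item.get("method", "regex") for item in group}
        base = max(METHOD_WEIGHTS.get(m, 0.5) for m in methods)
        bonus = 0.05 * (len(methods) - 1)  # agreement bonus
        best = max(group, key=lambda i: METHOD_WEIGHTS.get(i.get("method"), 0.0))
        scored.append({**best, "confidence": min(1.0, base + bonus)})
    return scored

# Usage: one group where regex and LLM agree; the LLM version is kept.
items = score_action_items([[
    {"action": "send the report", "method": "regex"},
    {"action": "send report", "method": "llm"},
]])
```

Items below a confidence floor can then be routed to a human review queue instead of straight into Jira.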
Choosing the Right Extraction Method
| Method | Latency | Accuracy | Structured fields | Cost |
|---|---|---|---|---|
| Regex | Under 10ms | Low to medium | Partial | Zero |
| NLI classifier | 50-150ms/turn | Medium | No (classification only) | Inference cost |
| Fine-tuned T5 | 200-500ms | Medium to high | Yes (with prompt) | Inference cost |
| GPT-4o / Claude | 2-8 seconds | High | Yes (structured JSON) | API cost per call |
For most production deployments, the LLM approach is the right default. The cost per meeting (typically a few cents) is negligible compared to the value of reliably captured action items. If data privacy is a constraint, a self-hosted T5 model handles extraction locally with reasonable accuracy.
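These tradeoffs also suggest a layered deployment: try the LLM extractor first and fall back to the local methods when the API errors out. A sketch of that routing, with the extractor callables injected (in practice you would pass the `extract_action_items_llm` and `extract_action_items_regex` functions built earlier; the stand-ins below just demonstrate the fallback firing):

```python
from typing import Callable, Dict, List

def extract_with_fallback(
    transcript: List[Dict],
    extractors: List[Callable[[List[Dict]], List[Dict]]],
) -> List[Dict]:
    """Try extractors in priority order; fall back on error or empty result."""
    for extract in extractors:
        try:
            items = extract(transcript)
        except Exception:
            continue  # e.g. API timeout -> try the next method
        if items:
            return items
    return []

# Stand-ins: the first extractor fails, the second succeeds.
def flaky(_transcript):
    raise TimeoutError("API down")

def local(_transcript):
    return [{"action": "send notes", "method": "regex"}]

result = extract_with_fallback([], [flaky, local])
```

The blanket `except Exception` is deliberately coarse for the sketch; in production you would catch the specific API errors you expect and log the fallback.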

Conclusion
Action item extraction from meeting transcripts is a solved problem at the technique level. The engineering work is connecting the pieces: getting clean, speaker-labeled transcript data from a meeting bot, running your extraction method of choice, and routing results to where your team actually works. MeetStream's webhook system handles the data delivery side. If you want to test this pipeline against a real meeting, the MeetStream app is a fast way to get a bot running and transcripts flowing to your webhook in minutes.
Frequently Asked Questions
What is the best NLP method for extracting action items from meeting transcripts?
For most production use cases, LLM-based extraction with a structured prompt (GPT-4o or Claude) produces the highest accuracy. It handles implicit ownership, conditional commitments, and deadline references that rule-based and smaller transformer approaches miss. If latency or cost is a constraint, an NLI-based zero-shot classifier that identifies action-item turns is a good middle ground before applying finer extraction logic to those turns only.
How do I extract action items from a transcript without training data?
Zero-shot and few-shot approaches work without labeled training data. An NLI classifier using cross-encoder/nli-deberta-v3-small classifies turns as action items without any fine-tuning. An LLM with a well-crafted prompt produces structured extractions in a single API call. Regex patterns on modal verbs and explicit phrases require only domain knowledge about how commitments are phrased in your specific meeting type.
How do I handle action items that span multiple speaker turns?
Commitments in meetings are often collaborative: one person makes a request across two turns, another person confirms in a third turn. Processing only individual turns misses this. Pass the full transcript to an LLM prompt rather than processing turn-by-turn, and the model resolves cross-turn dependencies naturally. If you must process locally, use a sliding window of 3-5 turns as the classification unit rather than individual utterances.
How do I push meeting action items to Jira automatically?
Connect the MeetStream bot.stopped webhook to a server that fetches the transcript, extracts action items, and calls the Jira REST API to create issues. Map the extracted owner to a Jira assignee by matching speaker names against your team's Jira user list. The key field mapping is: extracted action text to issue summary, source quote to description, and deadline mention to due date if your Jira instance has that field enabled.
Can I run action item extraction in real time during a meeting?
Yes. Connect to the transcription.processed webhook in MeetStream, which fires per utterance with an end_of_turn flag. When end_of_turn is true, run your classifier on the completed turn. Regex and NLI classifiers have low enough latency to run synchronously. For LLM-based extraction, run asynchronously and buffer results rather than blocking the webhook response. Real-time extraction lets you display a live action item list to participants during the meeting.
What NLP method is most reliable for extracting action items?
Fine-tuned sequence labeling using BERT-based models trained on the AMI and ICSI meeting corpora outperforms rule-based and zero-shot approaches, achieving F1 above 0.72 on held-out test sets. For production use without training data, an LLM prompt that requests structured JSON output with action, assignee, and deadline fields achieves comparable quality with no training overhead.
How do I identify who is responsible for each action item?
After extracting the action item text, run a named entity recognition pass to identify person names in the surrounding utterances. Cross-reference with the meeting participant list to resolve informal references ("John" to "John Smith"). If no assignee is found, fall back to the speaker who stated the action item as the implicit owner.
How do I handle vague action items that lack deadlines or clear owners?
Flag action items missing a deadline or assignee with a requires_clarification field set to true. Surface these in your post-meeting summary with a prompt to the meeting organizer to assign them before closing the meeting record. Alternatively, route them to a Slack message that lets the organizer tag a teammate directly from the action item.
What is the best data format for storing extracted action items?
Store each action item as a JSON object with fields: id (UUID), meeting_id, text (the action statement), speaker (who stated it), assignee (resolved person name or null), due_date (extracted or null), source_timestamp_ms (position in transcript), and status (open, in_progress, done). This schema integrates cleanly with project management APIs like Jira and Trello.
