Course → Module 7: APIs as Research Tools
Session 6 of 7

Individual research tasks are useful. A complete research workflow is a system. The difference: a task answers one question. A workflow answers every question your content needs, consistently, every time you produce something.

This session defines the complete research workflow, from defining questions to delivering a research brief that makes generation faster and more accurate.

The Six-Stage Research Workflow

```mermaid
graph TD
    A["Stage 1: Define research questions"] --> B["Stage 2: Query multiple APIs"]
    B --> C["Stage 3: Filter by relevance and reliability"]
    C --> D["Stage 4: Extract key data points and quotes"]
    D --> E["Stage 5: Organize into research brief"]
    E --> F["Stage 6: Feed into generation pipeline"]
    C -->|"HUMAN CHECK"| C2["Review source quality"]
    C2 --> D
    style A fill:#2a2a28,stroke:#c8a882,color:#ede9e3
    style C2 fill:#2a2a28,stroke:#c47a5a,color:#ede9e3
    style E fill:#2a2a28,stroke:#6b8f71,color:#ede9e3
```

Stage 1: Define Research Questions

Research questions are not the same as your content topic. The topic is what you are writing about. The research questions are the specific facts, data points, and perspectives you need to write about it credibly.

| Topic | Bad Research Question | Good Research Question |
| --- | --- | --- |
| Remote work trends | What is remote work? | What % of US companies offer permanent remote work as of 2025? |
| AI in healthcare | How is AI used in healthcare? | Which FDA-approved AI diagnostic tools are in clinical use? |
| Content marketing ROI | Does content marketing work? | What is the median cost per lead for content marketing vs. paid ads in B2B SaaS? |

Good research questions are specific enough that the answer is a fact or a data point, not an overview. Write 5-10 research questions per piece of content. Each question should map to a specific claim or section in your planned outline.
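A minimal sketch of this mapping in Python, assuming you track your plan in code. The topic key, section names, and questions are illustrative; the helper flags outline sections no question supports:

```python
# Hypothetical research plan: each question maps to the outline
# section it will support. All names here are illustrative.
research_plan = {
    "remote-work-2025": [
        {
            "section": "Adoption statistics",
            "question": "What % of US companies offer permanent remote work as of 2025?",
        },
        {
            "section": "Industry comparison",
            "question": "Which industries have the highest remote-work adoption rates?",
        },
    ],
}


def unmapped_sections(plan, outline):
    """Return outline sections that no research question supports yet."""
    covered = {q["section"] for questions in plan.values() for q in questions}
    return [section for section in outline if section not in covered]
```

Running `unmapped_sections` against your outline before querying any API catches coverage gaps while they are still cheap to fix.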

Stage 2: Query Multiple APIs

No single search API covers everything. A robust research stage queries multiple sources: Tavily for general web results, Google Search grounding for fact verification, news APIs for current events, and specialized data APIs for domain-specific information.

Your script takes each research question and queries the appropriate APIs. General questions go to Tavily. Current event questions go to a news API. Data-specific questions (financial data, government statistics) go to specialized APIs. The query routing can be manual (you decide which API gets each question) or automated (a simple classifier determines the best API based on the question type).
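The automated variant of that routing can start as simple as keyword matching. A sketch, where the returned API names are placeholders for whatever clients you actually configure:

```python
def route_question(question):
    """Naive keyword router: choose an API for a research question.
    The returned names ("news_api", "data_api", "tavily") are
    placeholders for your configured clients, not real endpoints."""
    q = question.lower()
    # Current-event phrasing goes to a news API.
    if any(k in q for k in ("this week", "today", "latest", "breaking")):
        return "news_api"
    # Financial/statistical phrasing goes to a specialized data API.
    if any(k in q for k in ("revenue", "gdp", "census", "stock price")):
        return "data_api"
    # Everything else falls through to general web search.
    return "tavily"
```

A keyword router is brittle but transparent; you can swap in a small classifier later without changing the rest of the pipeline.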

Stage 3: Filter by Relevance and Reliability

This is the stage that must include a human check. APIs return results ranked by relevance, but relevance is not reliability: a highly relevant result from an unreliable source is worse than a moderately relevant result from a credible source. Automating source-quality assessment is possible but risky, so Stage 3 should always include human judgment. A reviewer catching one bad source is worth more than the five minutes of review time saved.

Filtering criteria include: source reputation (established publications vs. content farms), publication date (current vs. outdated), author credentials (expert vs. unknown), and corroboration (single source vs. multiple sources confirming the same data).
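Those criteria can be turned into a rough pre-filter score so the human reviewer sees the strongest candidates first. A sketch with illustrative weights and field names; the final accept/reject decision stays with the reviewer:

```python
def source_score(result, trusted_domains, current_year=2025):
    """Heuristic pre-filter score for a search result.
    `result` is assumed to be a dict with url/year/author_known/
    corroborated fields; weights are illustrative, not calibrated."""
    score = 0
    domain = result["url"].split("/")[2]          # e.g. "www.reuters.com"
    if any(domain.endswith(d) for d in trusted_domains):
        score += 2                                # source reputation
    if result.get("year", 0) >= current_year - 2:
        score += 1                                # publication recency
    if result.get("author_known"):
        score += 1                                # author credentials
    if result.get("corroborated"):
        score += 2                                # multi-source confirmation
    return score
```

Sorting results by this score does not replace the human check; it just orders the queue so obvious content-farm hits sink to the bottom.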

Stage 4: Extract Key Data

Raw search results contain full articles or excerpts. Your content does not need full articles. It needs specific data points: statistics, quotes, dates, names, and facts. The extraction stage pulls these from the filtered results and formats them for use.

Extraction can be partially automated. An AI call with instructions like "From this article, extract: (1) all statistics mentioned with their sources, (2) any direct quotes from named individuals, (3) key dates and events" produces a structured summary that is more useful than the full text.
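A minimal sketch of building that extraction prompt. The model call itself is omitted, since it depends on your provider; only the prompt construction is shown, and the JSON key names are an assumption:

```python
# Template mirroring the extraction instructions above; the JSON
# output keys are illustrative, not a provider requirement.
EXTRACTION_PROMPT = """From this article, extract:
(1) all statistics mentioned with their sources,
(2) any direct quotes from named individuals,
(3) key dates and events.
Return JSON with keys: statistics, quotes, dates.

Article:
{article}"""


def build_extraction_prompt(article_text):
    """Fill the template with one filtered article's text."""
    return EXTRACTION_PROMPT.format(article=article_text)
```

Asking for structured JSON rather than free text makes the extracted data easy to merge into the Stage 5 brief programmatically.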

Stage 5: Organize Into Research Brief

The research brief is the deliverable of the research workflow. It is a structured document that gives your generation pipeline everything it needs to produce accurate, well-sourced content.

| Brief Section | Contents |
| --- | --- |
| Overview | 1-2 paragraph summary of findings |
| Key data points | Each point with source citation and date |
| Notable quotes | Direct quotes with speaker, source, and date |
| Source list | All sources with URL, publication date, and reliability assessment |
| Unanswered questions | Research questions that could not be answered from available sources |
| Contradictions | Where sources disagree, with both positions documented |
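The brief structure above can be sketched as a small data type with a text renderer, so every brief your workflow emits has the same shape. Field and class names are illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class ResearchBrief:
    """One research brief, mirroring the section table above.
    Tuple shapes are illustrative assumptions."""
    overview: str
    data_points: list = field(default_factory=list)   # (fact, source, date)
    quotes: list = field(default_factory=list)        # (quote, speaker, source, date)
    sources: list = field(default_factory=list)       # (url, date, reliability)
    unanswered: list = field(default_factory=list)    # question strings
    contradictions: list = field(default_factory=list)

    def to_text(self):
        """Render the brief as plain text for the generation call."""
        lines = ["## Overview", self.overview, "", "## Key data points"]
        lines += [f"- {fact} ({src}, {date})" for fact, src, date in self.data_points]
        lines += ["", "## Unanswered questions"]
        lines += [f"- {q}" for q in self.unanswered]
        return "\n".join(lines)
```

Keeping "Unanswered questions" in the rendered output matters: it tells the generation stage which claims it must not attempt.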

Stage 6: Feed Into Generation Pipeline

The research brief becomes the primary context for your AI generation call. The system prompt instructs the model to write from the provided sources. The brief is included in the user message or as a document attachment. The model synthesizes the researched information into your content format and voice.
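A sketch of assembling that call's input, assuming the common chat-message convention (a list of role/content dicts); adapt the shape to your provider's SDK:

```python
def build_generation_messages(system_prompt, outline, brief_text):
    """Combine system prompt, outline, and research brief into a
    chat-style message list. The role/content dict shape is a
    common convention, not a specific provider's API."""
    user_content = (
        "Write only from the sources in the research brief below, "
        "citing them for every factual claim.\n\n"
        f"OUTLINE:\n{outline}\n\n"
        f"RESEARCH BRIEF:\n{brief_text}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]
```

Putting the "write only from the provided sources" instruction next to the brief itself, rather than only in the system prompt, reinforces grounding at the point where the model reads the sources.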

```mermaid
graph LR
    A["Research brief"] --> B["+ System prompt (voice, format, rules)"]
    B --> C["+ Content outline"]
    C --> D["API generation call"]
    D --> E["Draft with citations"]
    style A fill:#2a2a28,stroke:#6b8f71,color:#ede9e3
    style E fill:#2a2a28,stroke:#c8a882,color:#ede9e3
```

The entire workflow, from question definition to research brief delivery, can run in 5-15 minutes for a single piece of content when automated. Manual research for the same breadth of coverage would take 1-3 hours. The time savings compound with volume: research 10 articles in an hour instead of a day.

Assignment

  1. Design your research workflow as a documented process. For your specific content type, define: What research questions do you typically need answered? Which APIs serve each question? How do you filter results?
  2. Create a research brief template. Define the sections, the format for each section, and what information must be included. The template should be reusable for any piece of content in your domain.
  3. Execute the full workflow once for a real piece of content. Time each stage. Identify bottlenecks. Where does the workflow slow down? Where could additional automation help?