Course → Module 7: APIs as Research Tools
Session 7 of 7

API results are not gospel. Search results can be wrong, outdated, or from unreliable sources. Data APIs can have errors, lag times, or schema changes that silently break your pipeline. The skill is knowing when API results are trustworthy enough to use directly and when they require human verification.

The Trust Spectrum

Not all data is equal. A company's stock price from a financial API is nearly real-time and highly reliable. A health claim from a general web search is potentially unreliable regardless of how confident the search result sounds. Trust should be assigned by data type, not by data source alone.

```mermaid
graph LR
    A["Use directly<br/>(high trust)"] --> B["Verify sample<br/>(medium trust)"]
    B --> C["Verify all<br/>(low trust)"]
    D["Stock prices<br/>Government statistics<br/>Official company data"] --> A
    E["News reports<br/>Industry surveys<br/>Academic citations"] --> B
    F["Health/medical claims<br/>Legal interpretations<br/>Unattributed statistics"] --> C
    style A fill:#2a2a28,stroke:#6b8f71,color:#ede9e3
    style B fill:#2a2a28,stroke:#c8a882,color:#ede9e3
    style C fill:#2a2a28,stroke:#c47a5a,color:#ede9e3
```
| Trust Level | Action | Data Types | Verification Method |
|---|---|---|---|
| High (use directly) | Include in content without additional checking | Real-time market data, government APIs, official company filings | None needed (source is authoritative) |
| Medium (verify sample) | Spot-check 20-30% of data points | News articles, industry reports, academic papers | Cross-reference 1 in 3 claims with a second source |
| Low (verify all) | Every claim requires independent verification | Health claims, legal statements, user-generated content, unattributed statistics | Manual verification of each claim against primary source |
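The table above can be encoded as a simple lookup so the verification requirement is applied mechanically rather than by memory. This is a minimal sketch; the data-type keys and sample fractions are illustrative assumptions you would replace with your own matrix:

```python
import math

# Trust matrix sketch: maps a data type to its trust level and the fraction
# of claims that must be independently verified. Keys and fractions here
# are illustrative assumptions, not a canonical taxonomy.
TRUST_MATRIX = {
    "market_data":       {"level": "high",   "verify_fraction": 0.0},
    "government_stats":  {"level": "high",   "verify_fraction": 0.0},
    "news_article":      {"level": "medium", "verify_fraction": 0.33},
    "industry_report":   {"level": "medium", "verify_fraction": 0.33},
    "health_claim":      {"level": "low",    "verify_fraction": 1.0},
    "unattributed_stat": {"level": "low",    "verify_fraction": 1.0},
}

def claims_to_verify(data_type: str, claim_count: int) -> int:
    """How many claims need manual verification for this data type."""
    # Unknown data types default to low trust: verify everything.
    entry = TRUST_MATRIX.get(data_type, {"verify_fraction": 1.0})
    return math.ceil(entry["verify_fraction"] * claim_count)
```

Defaulting unknown types to full verification is the safe failure mode: a source you have not classified yet gets the low-trust treatment until you decide otherwise.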

Common API Data Failures

Understanding how API data fails helps you design better verification protocols.

| Failure Mode | What Happens | How to Detect |
|---|---|---|
| Stale data | API returns cached results that are hours, days, or months old | Check response timestamps, compare to known current values |
| Source contamination | Search results include AI-generated content citing other AI-generated content | Trace claims to primary sources, not secondary articles |
| Relevance mismatch | High-ranked result is topically related but does not actually answer the query | Read the actual content, do not rely on snippet and title alone |
| Schema changes | API updates its response format, breaking your parsing code | Validate response structure before processing |
| Rate limit degradation | API returns lower-quality results when you approach rate limits | Monitor result quality at different request volumes |
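The schema-change failure mode is the easiest to catch in code: check the response shape before you parse it, so a silent format change becomes a loud error. The field names and types below are hypothetical; use your API's documented schema:

```python
# Minimal structure check before processing an API response.
# EXPECTED_FIELDS is a hypothetical schema; substitute the fields
# your API actually documents.
EXPECTED_FIELDS = {"timestamp": str, "value": float, "source": str}

def validate_response(payload: dict) -> list[str]:
    """Return a list of schema problems; an empty list means safe to parse."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(
                f"wrong type for {field}: {type(payload[field]).__name__}"
            )
    return problems
```

Returning the full list of problems, rather than failing on the first one, makes the log entry useful when the provider changes several fields at once.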

Source contamination is the most insidious failure mode. When AI-generated content cites other AI-generated content, errors compound. Always trace statistics and claims back to the primary source, not the article that mentions them.

Building a Trust Matrix

A trust matrix maps your specific data sources to trust levels and verification requirements. Build one for your content area and reference it every time you pull API data.

```mermaid
graph TD
    A["API data arrives"] --> B{"What type of data?"}
    B -->|"Quantitative from<br/>authoritative source"| C["Trust Level: High<br/>Use directly"]
    B -->|"Factual claim from<br/>news/industry source"| D["Trust Level: Medium<br/>Verify 1 in 3"]
    B -->|"Health, legal, or<br/>unattributed claim"| E["Trust Level: Low<br/>Verify everything"]
    C --> F["Include in content"]
    D --> G["Spot-check sample"]
    E --> H["Full verification"]
    G -->|"Sample passes"| F
    G -->|"Sample fails"| I["Reject source,<br/>find alternative"]
    H -->|"Verified"| F
    H -->|"Failed"| I
    style C fill:#2a2a28,stroke:#6b8f71,color:#ede9e3
    style D fill:#2a2a28,stroke:#c8a882,color:#ede9e3
    style E fill:#2a2a28,stroke:#c47a5a,color:#ede9e3
```
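That decision flow can be sketched as a single routing function. The `verify` callback stands in for whatever manual or automated check you use; the 1-in-3 sampling is the medium-trust rule from the matrix:

```python
# Sketch of the trust-matrix routing logic. `verify` is a placeholder
# callback that returns True if a claim checks out against a second source.
def route_data(trust_level: str, claims: list[str], verify) -> str:
    """Return 'include' or 'reject' per the decision flow."""
    if trust_level == "high":
        return "include"          # authoritative source: use directly
    if trust_level == "medium":
        sample = claims[::3]      # spot-check roughly 1 in 3 claims
        return "include" if all(verify(c) for c in sample) else "reject"
    # Low trust (or anything unclassified): verify every claim.
    return "include" if all(verify(c) for c in claims) else "reject"
```

A failed sample rejects the whole source, not just the failed claim, matching the flowchart: one bad spot-check means you go find an alternative source.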

Practical Verification Techniques

Cross-reference with a second API. If Tavily returns a statistic, verify it with a Google Search grounding call. If both sources agree, confidence increases. If they disagree, investigate further.
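For numeric claims, "both sources agree" needs a tolerance, since two APIs rarely report the exact same figure. The fetch calls are placeholders here; only the comparison logic is shown, and the 5% tolerance is an illustrative assumption:

```python
# Agreement check for cross-referencing a numeric claim across two APIs
# (e.g. a Tavily result vs. a grounding call). The 5% relative tolerance
# is an assumption; tighten it when exact figures matter.
def values_agree(a: float, b: float, tolerance: float = 0.05) -> bool:
    """True if two independently reported values agree within tolerance."""
    if a == b:
        return True
    denom = max(abs(a), abs(b))
    return abs(a - b) / denom <= tolerance
```

When `values_agree` returns False, treat it as a trigger to investigate, not as proof either source is wrong: the two APIs may be reporting different time windows or definitions.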

Check the publication date. A search result from 2019 may be irrelevant or outdated for a 2026 article. Always check whether the data is still current.
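A staleness check is simple to automate when the API returns an ISO timestamp. The 365-day cutoff below is an illustrative assumption; tune it to how fast your topic moves:

```python
from datetime import datetime, timezone

# Flag results older than a freshness window. The default max_age_days
# is an assumption, not a universal rule.
def is_stale(published_iso: str, max_age_days: int = 365) -> bool:
    """True if the result's publication date is older than the window."""
    published = datetime.fromisoformat(published_iso)
    if published.tzinfo is None:
        # Assume UTC when the API omits timezone info.
        published = published.replace(tzinfo=timezone.utc)
    age = datetime.now(timezone.utc) - published
    return age.days > max_age_days
```

This catches the "2019 result in a 2026 article" case automatically, though a passing date still does not guarantee the underlying data is current.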

Trace to the primary source. When an article says "According to a McKinsey report," find the actual McKinsey report. The article may have misquoted it, taken it out of context, or cited a report that does not exist.

Watch for circular citations. Article A cites Article B, which cites Article C, which cites Article A. This happens more than you think, especially with frequently repeated statistics. Find the original study or dataset.
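If you record who cites whom as you trace a claim, a loop is easy to detect: you revisit an article before reaching a primary source. The graph shape below is illustrative, with each article citing at most one upstream source:

```python
# Detect a citation loop by following "cites" links from a starting
# article. A value of None marks a primary source (end of the chain).
# The single-parent graph shape is a simplifying assumption.
def has_citation_cycle(citations: dict, start: str) -> bool:
    """True if following citations from `start` loops back on itself."""
    seen = set()
    current = start
    while current is not None:
        if current in seen:
            return True
        seen.add(current)
        current = citations.get(current)
    return False
```

A chain that terminates in an actual study or dataset is the goal; a chain that returns `True` here means none of the articles is the origin of the claim.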

When to Accept Imperfect Data

Not every claim needs forensic verification. If you are writing a general overview and a statistic is directionally correct (the exact number might be 37% or 42%, but the point is "roughly a third"), the verification bar is lower than if you are writing an analysis where the exact percentage matters.

The decision is always: what is the cost of being wrong? If a wrong number undermines your argument, verify it. If a wrong number is a supporting detail that does not change the conclusion, a reasonable approximation is acceptable as long as you signal the approximation ("roughly," "approximately," "about").

Assignment

  1. Create a trust matrix for your most-used data sources. For each source (Tavily results, news API results, financial data, government data, etc.), assign a trust level: High (use directly), Medium (verify sample), or Low (verify all).
  2. For each trust level, define the verification method: what do you check, how do you check it, and what constitutes a pass or fail?
  3. Implement this matrix in your workflow documentation. The next time you pull API data for content, apply the matrix and document how it performs. Does the trust assignment match reality? Adjust as needed.