You ask ChatGPT: "Who are the leading industrial pump distributors in Indonesia?" It gives you a list. Your company is not on it. You have been distributing industrial pumps in Indonesia for fifteen years. You have ISO certification. You have institutional clients. But ChatGPT has never heard of you.

This is not a bug. It is a direct consequence of how large language models decide what to cite. And understanding that mechanism is the first step toward fixing it.

How ChatGPT decides what to cite

ChatGPT is a large language model trained on a massive corpus of text data. When it answers a question, it generates responses based on patterns learned during training. It does not search the web in real time (except when using the browsing feature). Otherwise, it works from what it absorbed up to its training data cutoff.

This means three things for your company:

First, you need to have been in the training data. If your company name, description, and industry context do not appear in the text corpus that GPT was trained on, you do not exist to ChatGPT. It cannot cite what it has never encountered.

Second, you need to have appeared in authoritative sources within the training data. ChatGPT does not treat all text equally. Text from Wikipedia, academic papers, major news outlets, and government databases carries more weight than text from random blog posts or obscure websites. A single mention in Wikipedia creates more entity association than a hundred mentions across low-authority blogs.

Third, your entity needs to be corroborated across multiple sources. ChatGPT is more likely to cite entities that appear consistently across different source types. If your company is mentioned in Wikipedia AND in industry databases AND in news articles AND in academic papers, ChatGPT has high confidence. If you only appear on your own website, confidence is near zero.

What ChatGPT cites by source type

Based on testing industry queries across multiple sectors and analyzing which companies ChatGPT recommends, a pattern emerges in which source types drive citations.

Figure (chart not reproduced here): Estimated influence of source types on ChatGPT citations for industry/company recommendation queries. Based on manual testing across 50+ industry queries, not a scientific study.

The pattern is clear. Sources you do not control (Wikipedia, news, government registries) have the highest influence. Sources you fully control (your website, your social media) have the lowest. This is not an accident. It is how trust works in AI systems. Independent corroboration outweighs self-declaration.

The training data cutoff problem

ChatGPT's knowledge has a cutoff date. The exact date varies by model version, but it always lags the present, and web browsing covers only part of the gap with more recent information. The core knowledge base, the information that shapes confident citations, comes from the training corpus.

This creates a timing problem. If your company only started building entity infrastructure in late 2025, that work has not yet been incorporated into ChatGPT's training data. It takes time for new information to propagate through the training pipeline:

You publish or get mentioned (Day 0). A news article mentions your company. You create a Wikidata entry. You get listed in an industry database.

Content gets crawled and indexed (Days to weeks). Search engines and web crawlers discover the content. It becomes part of the publicly available web corpus.

Training data is assembled (Months). OpenAI assembles training data from web crawls, licensed content, and curated datasets. Your mention is potentially included, depending on the source authority and the crawl coverage.

Model is trained (Months). The model is trained on the assembled data. Your entity becomes part of the model's knowledge, weighted by the authority of the sources it appeared in.

Model is deployed (After training). The new model version reaches users. Now ChatGPT can cite your company when relevant queries come in.

Total elapsed time from initial mention to ChatGPT citation: roughly 6-18 months, depending on when in the training cycle your content was published.

This is why building entity presence now, even before results are visible, is essential. You are planting seeds for future training data inclusion.
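To make the 6-18 month estimate concrete, the short Python sketch below turns a first-mention date into an earliest and latest expected citation date. The function names and the example date are made up for illustration, and the window is only the rough range quoted above, not a guarantee.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping the day to 28 so it stays valid."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, min(d.day, 28))


def citation_window(first_mention: date, min_months: int = 6, max_months: int = 18):
    """Return (earliest, latest) dates implied by the article's rough 6-18 month range."""
    return add_months(first_mention, min_months), add_months(first_mention, max_months)


# Hypothetical example: first authoritative mention published on 1 November 2025.
earliest, latest = citation_window(date(2025, 11, 1))
print(f"Earliest plausible citation: {earliest}")
print(f"Latest plausible citation:   {latest}")
```

For a mention dated 1 November 2025, the window runs from 1 May 2026 to 1 May 2027.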

Five reasons ChatGPT skips your company

Reason 1: You only exist on your own website

Your company has a website. It describes what you do. It has an About page, a Services page, and a Contact page. That is all. No external presence. No directory listings. No press coverage. No structured data in third-party repositories.

ChatGPT sees your website as a single self-referential source. A company's claim of expertise, made on its own website, is not evidence. It is marketing. ChatGPT needs independent confirmation.

Reason 2: Your name collides with a more prominent entity

If your company name is common or overlaps with a better-known entity, ChatGPT's entity resolution may not distinguish you from that entity. "Global Tech Solutions" could be a hundred companies. "Ibrahim Anwar" collides with Anwar Ibrahim, the Malaysian prime minister. The less distinctive your name, the more entity corroboration you need to establish a separate identity.

As I discussed in "Why AI does not mention your company name," this collision problem is especially acute for Indonesian companies whose names are often generic or follow common naming patterns.

Reason 3: Your industry presence is fragmented

You have a LinkedIn page under one name, a Tokopedia store under another, a website with a third variation, and a government registration under the legal name. ChatGPT cannot reconcile these into a single entity. Instead of one strong entity with four corroborating sources, you have four weak entities with one source each.

Reason 4: You have no presence in high-authority sources

Your company is mentioned in blog posts, forum threads, and social media comments. But not in Wikipedia, not in industry databases, not in news articles, and not in academic papers. ChatGPT weights source authority heavily. A thousand low-authority mentions do not compensate for zero high-authority mentions.

Reason 5: Your structured data is missing or broken

When AI training pipelines crawl your website, they look for structured data (schema.org markup) to extract entity information efficiently. If your website has no Organization schema, no sameAs links, and no structured entity declarations, the training pipeline has to guess what your website represents. That guessing is unreliable. Companies with clean structured data are extracted more reliably into training datasets.
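As a rough illustration of a clean entity declaration, the Python sketch below assembles a schema.org Organization object and prints it as a JSON-LD script tag for a page's HTML. The company name, URLs, address, and founding date are placeholders invented for this example; the property names (name, description, url, logo, foundingDate, address, sameAs) are standard schema.org vocabulary.

```python
import json


def organization_jsonld(name, description, url, logo, founding_date,
                        locality, country, same_as):
    """Build a schema.org Organization object as a plain dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "description": description,
        "url": url,
        "logo": logo,
        "foundingDate": founding_date,
        "address": {
            "@type": "PostalAddress",
            "addressLocality": locality,
            "addressCountry": country,
        },
        # sameAs should list only profiles you actually control and have verified.
        "sameAs": list(same_as),
    }


# Every value below is a hypothetical placeholder, not a real company.
org = organization_jsonld(
    name="PT Contoh Pompa Industri",
    description="Distributor of industrial pumps in Indonesia.",
    url="https://www.example.com",
    logo="https://www.example.com/logo.png",
    founding_date="2010-01-15",
    locality="Jakarta",
    country="ID",
    same_as=[
        "https://www.linkedin.com/company/example",
        "https://www.wikidata.org/wiki/Q0000000",
    ],
)

snippet = '<script type="application/ld+json">\n' + json.dumps(org, indent=2) + "\n</script>"
print(snippet)
```

The printed tag can be pasted into the page head. The name used here should be the same canonical name used on every other platform.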

What to do before the next training cut

You cannot control when the next GPT model is trained. But you can control what it finds when it is. Here is the priority-ordered action list:

Immediate (This week)

Add Organization schema to your website. Include: name, description, url, logo, foundingDate, address, sameAs (linking to all your verified profiles). This makes your website machine-readable for AI training pipelines.

Create a Wikidata entry. Wikidata is a primary source for AI training data. A well-structured entry with your company's core attributes (type, country, website, founding date, industry) puts you in one of the highest-authority structured data sources available.

Audit your naming consistency. List every platform where your company appears. Check that the name is identical everywhere. Standardize any variations. This is unglamorous work that has enormous impact on entity reconciliation.
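The naming audit can be done in a spreadsheet, but a few lines of Python make the check repeatable. This is a minimal sketch: the company name and the listings are hypothetical, and the comparison is a strict exact match because the goal is an identical name everywhere.

```python
def find_name_mismatches(canonical: str, listings: dict[str, str]) -> dict[str, str]:
    """Return the platforms whose displayed name is not identical to the canonical name."""
    return {
        platform: shown
        for platform, shown in listings.items()
        if shown.strip() != canonical
    }


# Hypothetical audit data: platform -> company name exactly as displayed there.
canonical_name = "PT Contoh Pompa Industri"
listings = {
    "website": "PT Contoh Pompa Industri",
    "linkedin": "Contoh Pompa",
    "tokopedia": "ContohPompa Official",
    "government registration": "PT Contoh Pompa Industri",
}

mismatches = find_name_mismatches(canonical_name, listings)
for platform, shown in mismatches.items():
    print(f"{platform}: shows {shown!r}, expected {canonical_name!r}")
```

Each platform the script prints is one to standardize before moving on to directory listings.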

Short-term (This month)

Get listed in 3-5 industry directories. Not generic web directories. Industry-specific databases that AI systems consider authoritative for your sector. For Indonesian industries, this means the KADIN directory (the Indonesian Chamber of Commerce and Industry), relevant ministry databases, and international databases like Kompass.

Optimize your LinkedIn company page. LinkedIn data is a significant source in AI training corpora. Complete your company profile with: description, founding year, headquarters, website URL, employee count, and specialties. Use your canonical company name.

Verify and complete your Google Business Profile. GBP data feeds into Google's Knowledge Graph, which feeds into Gemini's training data. A complete, verified GBP with photos, reviews, and accurate category selection is a high-value entity signal.

Medium-term (Next 3-6 months)

Earn press coverage. Even one editorial mention in a recognized publication changes your entity profile significantly. Pitch real stories. A product launch, a partnership announcement, an industry milestone. Not press releases distributed through wire services. Actual editorial coverage where a journalist writes about you.

Publish original research or data. Content that provides unique data, frameworks, or analysis that does not exist elsewhere is the type of content AI systems value most. Write a white paper based on your operational data. Publish an industry report. Create something citable.

Build institutional connections. Partnerships with universities, government agencies, or industry associations create mentions on institutional websites. These are among the highest-authority sources for AI training data.

The Perplexity advantage

While ChatGPT relies primarily on training data, Perplexity performs real-time web searches for every query. This means Perplexity can cite your company even if it is not in ChatGPT's training data. The prerequisite is that your company appears in search results for relevant queries and has sufficient entity signals for Perplexity's system to consider you authoritative.

This makes Perplexity a useful early indicator. If Perplexity cites your company but ChatGPT does not, your entity infrastructure is working. It just has not been incorporated into ChatGPT's training data yet. If Perplexity also does not cite you, your entity infrastructure needs more work before you can expect any AI citations.

Google AI Overview operates similarly to Perplexity, combining Knowledge Graph data with real-time retrieval. Companies that invest in entity infrastructure tend to see results in retrieval-augmented AI systems (Perplexity, Google AI Overview) months before they see results in training-data systems (ChatGPT, Gemini base model).
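The Perplexity-versus-ChatGPT comparison in this section amounts to a small decision rule, sketched below in Python. The function name and wording are illustrative. The branch for a company ChatGPT already cites is an assumption added for completeness, since this section only covers the case where ChatGPT does not cite you.

```python
def diagnose(perplexity_cites: bool, chatgpt_cites: bool) -> str:
    """Map manual test results to the reading of them given in this section."""
    if chatgpt_cites:
        # Assumed branch: not covered in the text above.
        return "ChatGPT already cites you; keep checking the details for accuracy."
    if perplexity_cites:
        return "Entity infrastructure is working; it is not in ChatGPT's training data yet."
    return "Entity infrastructure needs more work before you can expect AI citations."


# Example: cited by Perplexity, not yet by ChatGPT.
print(diagnose(perplexity_cites=True, chatgpt_cites=False))
```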

What ChatGPT knows about your industry

Before worrying about why ChatGPT does not mention your company specifically, test what ChatGPT knows about your industry in your market. Ask broad questions: "What are the main industrial pump suppliers in Indonesia?" or "Who provides book conservation services in Southeast Asia?"

The answer will fall into one of three patterns, and each tells you something different:

If ChatGPT names specific competitors, it means your industry has entity representation in the training data. Your competitors did something you did not. Analyze what sources they appear in that you do not.

If ChatGPT gives generic answers without naming companies, your entire industry has weak entity representation in the training data. This is actually an opportunity. The first company in your sector to build proper entity infrastructure will dominate AI citations when the training data catches up.

If ChatGPT hallucinates company names, it means the AI has some industry knowledge but cannot confidently attribute it to specific entities. The entity signals are weak across the board. Building strong entity corroboration for your company in this environment makes you the default citation when confidence thresholds are met.

Monitoring and tracking AI mentions

Once you start building entity infrastructure, you need to track whether it is working. Quarterly testing is the minimum. Monthly is better.

Run a standard set of queries across ChatGPT, Perplexity, Gemini, and Google AI Overview. Document the results. Track changes over time. The queries should include:

Brand-name queries: "Tell me about [your company name]."

Industry queries: "Who are the top [your industry] companies in [your market]?"

Product/service queries: "Where can I find [your product/service] in [your region]?"

Comparison queries: "Compare [your company] and [competitor]."

Track which platforms cite you, what they say about you (accuracy matters), and how the responses change over time. This data tells you whether your entity infrastructure investment is producing results.
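A fixed query set is easier to keep consistent when it is generated from templates. The Python sketch below builds the four query types listed above and summarizes manually recorded results per platform. It does not call any AI service: you paste each query into the platform yourself and record whether you were cited. The company details and results are hypothetical.

```python
QUERY_TEMPLATES = {
    "brand": "Tell me about {company}.",
    "industry": "Who are the top {industry} companies in {market}?",
    "product": "Where can I find {product} in {region}?",
    "comparison": "Compare {company} and {competitor}.",
}


def build_queries(**details: str) -> dict[str, str]:
    """Fill every template with the same company details."""
    return {kind: template.format(**details) for kind, template in QUERY_TEMPLATES.items()}


def citation_rate(results: list[dict]) -> dict[str, float]:
    """Share of recorded queries, per platform, in which the company was cited."""
    totals: dict[str, int] = {}
    hits: dict[str, int] = {}
    for row in results:
        platform = row["platform"]
        totals[platform] = totals.get(platform, 0) + 1
        hits[platform] = hits.get(platform, 0) + int(row["cited"])
    return {platform: hits[platform] / totals[platform] for platform in totals}


# Hypothetical company details.
queries = build_queries(
    company="PT Contoh Pompa Industri",
    industry="industrial pump",
    market="Indonesia",
    product="industrial pumps",
    region="Java",
    competitor="PT Pesaing Hipotetis",
)

# Hypothetical results from one manual test round.
results = [
    {"platform": "Perplexity", "kind": "brand", "cited": True},
    {"platform": "Perplexity", "kind": "industry", "cited": False},
    {"platform": "ChatGPT", "kind": "brand", "cited": False},
    {"platform": "ChatGPT", "kind": "industry", "cited": False},
]

for kind, text in queries.items():
    print(f"{kind}: {text}")
print(citation_rate(results))
```

Saving each round's results with its test date gives you the month-over-month comparison described above.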

The patience requirement

AI visibility is a slow game. The timeline from "start building entity infrastructure" to "ChatGPT consistently cites your company" is typically 6-18 months. There are no shortcuts. You cannot hack the training data. You cannot bribe the model. You build real entity presence in real authoritative sources and wait for the training cycle to incorporate it.

The companies that are already visible in ChatGPT today started building their entity presence years ago. Most of them did not do it intentionally for AI. They did it because they maintained Wikipedia pages, got press coverage, published in industry databases, and kept their digital presence consistent. AI visibility was a side effect of good entity hygiene.

That is the core insight. AI visibility is not a new discipline. It is the result of being a verifiable entity. If the machines can verify who you are across multiple authoritative sources, they will cite you. If they cannot, they will not. Everything else is details.

The Entity Infrastructure courses provide implementation paths for each stage of this process, with specific checklists and timelines adapted for companies at different starting points.

Frequently Asked Questions

Does ChatGPT use my website content when answering questions?

Only if your website content was included in ChatGPT's training data. ChatGPT does not crawl your website in real time (unless using the browsing feature for specific queries). During training, OpenAI crawls the web and includes content from various sources. Your website may or may not be in the training corpus. If it is, the weight given to it depends on your domain's perceived authority. A company website with Organization schema, external corroboration, and consistent entity signals is weighted higher than a generic website with no structured data.

If I create content specifically for ChatGPT to learn from, will it work?

Not in the way you might hope. ChatGPT learns from its training data, which is assembled from broad web crawls and licensed datasets. You cannot target ChatGPT specifically. Instead, create content that is valuable, unique, and published on authoritative platforms. Content that gets cited by other authoritative sources, referenced in academic papers, or syndicated by industry publications has the highest chance of being included in training data with meaningful weight. Content created "for AI" that no human reads or references has minimal impact.

My competitor was mentioned by ChatGPT with inaccurate information. How does that happen?

ChatGPT's training data includes conflicting information from multiple sources. When sources disagree, the model may generate a response that blends or misrepresents the actual data. This is hallucination. It happens more frequently when an entity has inconsistent data across sources. Your competitor may have enough entity presence to be cited but not enough corroboration for ChatGPT to get the details right. Consistent, accurate data across sources reduces hallucination risk for your own entity.

Should I use ChatGPT plugins or GPTs to increase my visibility?

Creating a custom GPT or ChatGPT plugin does not increase your visibility in ChatGPT's general responses. Custom GPTs are sandboxed applications that only surface when a user specifically selects them. They do not influence how the base ChatGPT model responds to general queries. Focus on entity infrastructure (training data presence, corroboration, structured data) rather than ChatGPT-specific features for broad visibility.

2026-03-28

The companies that show up in ChatGPT are the ones that bothered to be verifiable.