SEO
March 10, 2026
18 min read

Complete SEO & GEO Strategy Guide 2026 — Technical Audit, SSR, Tools & AI Search

A comprehensive technical guide to ranking in both traditional Google search and AI-powered engines (ChatGPT, Perplexity, Gemini). Covers technical SEO, GEO strategy, SSR verification, schema markup, and every tool you need to audit and monitor.

SEO, GEO, Technical SEO, SSR, Next.js, Schema, AI Search, Tools

In 2026, ranking your website requires two parallel strategies: traditional SEO for Google and Bing, and GEO (Generative Engine Optimization) for AI-powered search engines like ChatGPT, Perplexity, Google Gemini, and Claude. These are not the same problem. Traditional search rewards crawlable, keyword-relevant content. AI search rewards factual accuracy, structured data, authoritative signals, and content that can be directly cited in a generated answer. This guide covers the full technical stack for both — along with every tool you need to audit and monitor your progress.

What Is GEO (Generative Engine Optimization)?

GEO is the practice of optimising your website so that AI search engines — ChatGPT Browse, Perplexity, Google AI Overviews, and Claude — can discover, understand, and cite your content in their generated answers. Unlike traditional SEO (which drives clicks to your site), GEO drives brand citations and recommendations within AI-generated responses, where users may never visit your site at all.

  • Traditional SEO goal: Rank #1 for a keyword → user clicks → traffic to your site
  • GEO goal: AI engine cites your brand/page in a generated answer → user trusts your authority without clicking
  • GEO ranking signals: Factual accuracy, named authorship (E-E-A-T), structured schema data, external citations, llms.txt accessibility, and consistent NAP (Name/Address/Phone) across the web
  • Key difference: AI models are trained on static snapshots of the web but also use real-time retrieval (RAG); your content needs to be both in the training data AND retrievable in real-time
  • AI crawlers to allow: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, Google-Extended, Bytespider (ByteDance)
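One practical use of the crawler list above is flagging AI-bot hits in your own server logs, so you can confirm these crawlers actually reach your pages. A minimal sketch; the helper and the example user-agent strings are illustrative, only the bot tokens come from the list above:

```javascript
// Identify AI crawlers by their user-agent token so you can
// confirm in server logs that they are reaching your pages.
const AI_CRAWLERS = ['GPTBot', 'ClaudeBot', 'PerplexityBot', 'Google-Extended', 'Bytespider'];

function detectAICrawler(userAgent) {
  // Returns the matching crawler name, or null for ordinary traffic
  return AI_CRAWLERS.find((bot) => userAgent.includes(bot)) ?? null;
}

console.log(detectAICrawler('Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot')); // → 'GPTBot'
console.log(detectAICrawler('Mozilla/5.0 (Windows NT 10.0) Chrome/120.0'));                      // → null
```

Run this against a day of access logs and you get a quick census of which AI engines are (or are not) crawling you.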

Part 1 – Technical SEO Foundation

Core Web Vitals — LCP, INP, CLS

Core Web Vitals are Google's page experience signals. Poor scores directly impact rankings. Measure with PageSpeed Insights (pagespeed.web.dev) and Lighthouse:

  • LCP (Largest Contentful Paint) — Target < 2.5s. Caused by: unoptimised hero images, render-blocking resources, slow server response (TTFB). Fix: Use next/image with the priority prop, serve static assets from a CDN, and preload the hero image with <link rel="preload" as="image"> (HTTP/2 Server Push is deprecated and no longer supported by major browsers).
  • INP (Interaction to Next Paint — replaced FID in March 2024) — Target < 200ms. Caused by: heavy JavaScript on the main thread, large React re-renders, unoptimised event handlers. Fix: Code-split with React.lazy, use web workers for heavy computation.
  • CLS (Cumulative Layout Shift) — Target < 0.1. Caused by: images without explicit width/height, dynamically injected content above the fold, web fonts causing FOUT. Fix: Always specify image dimensions; use font-display: swap; reserve space for ad slots.

Measure real-user CWV data (field data) in Google Search Console → Core Web Vitals report, not just lab data from Lighthouse. Lab data is a single synthetic test; field data aggregates real user experiences and is what Google actually uses for ranking.
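Google's published thresholds for each metric can be encoded in a small helper for bucketing your own field data (for example, values pulled from the CrUX API) the same way GSC does. The function itself is illustrative; the thresholds are Google's documented "good" and "poor" boundaries:

```javascript
// Classify field CWV values against Google's published thresholds.
// 'good' / 'needs-improvement' / 'poor' matches how GSC buckets URLs.
const THRESHOLDS = {
  lcp: { good: 2500, poor: 4000 }, // milliseconds
  inp: { good: 200,  poor: 500 },  // milliseconds
  cls: { good: 0.1,  poor: 0.25 }, // unitless layout-shift score
};

function rateVital(metric, value) {
  const t = THRESHOLDS[metric];
  if (value <= t.good) return 'good';
  if (value <= t.poor) return 'needs-improvement';
  return 'poor';
}

console.log(rateVital('lcp', 2100)); // → 'good'
console.log(rateVital('inp', 350));  // → 'needs-improvement'
console.log(rateVital('cls', 0.3));  // → 'poor'
```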

URL Structure and Canonicalization

  • Use clean, descriptive slugs: /blog/install-n8n-aws-ec2 not /blog/post?id=124
  • Canonical tags on every page — prevent duplicate content penalties from URL parameters, pagination, and CDN mirrors
  • Consistent protocol and trailing slash — pick https://domain.com (no slash) OR https://domain.com/ and enforce it everywhere including in sitemap, schema, and internal links
  • Hreflang for multi-language sites — must be fully reciprocal (each language page must reference all others) or Google ignores them entirely
  • Pagination — Google retired rel="next"/"prev" as an indexing signal in 2019; instead keep each paginated page self-canonical and crawlable, or consolidate the series under a "view all" page and canonicalise to it
typescript
// Next.js — set canonical in generateMetadata
import type { Metadata } from 'next';

export const metadata: Metadata = {
  title: 'Page Title | Brand',
  description: 'Description 120-160 chars',
  alternates: {
    canonical: 'https://domain.com/exact-page-path', // no trailing slash
  },
  openGraph: {
    url: 'https://domain.com/exact-page-path', // must match canonical
  },
};
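The "pick one form and enforce it everywhere" rule is easiest to keep when a single normaliser produces every canonical URL. A sketch assuming the conventions from the example above (https, no trailing slash); the tracking-parameter list is a common but non-exhaustive selection:

```javascript
// Normalise any internal URL to one canonical form:
// https, no hash, no tracking params, no trailing slash.
function canonicalize(rawUrl) {
  const url = new URL(rawUrl);
  url.protocol = 'https:';
  url.hash = '';
  // Strip common tracking parameters that create duplicate URLs
  for (const param of ['utm_source', 'utm_medium', 'utm_campaign', 'fbclid', 'gclid']) {
    url.searchParams.delete(param);
  }
  // Remove the trailing slash everywhere except the bare domain root
  if (url.pathname.length > 1 && url.pathname.endsWith('/')) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}

console.log(canonicalize('http://domain.com/blog/post/?utm_source=x'));
// → 'https://domain.com/blog/post'
```

Feed its output into `alternates.canonical`, `openGraph.url`, the sitemap, and internal links so all four can never disagree.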

XML Sitemap and robots.txt

  • XML sitemap — must include all canonical URLs; exclude paginated URLs, tag pages with thin content, and noindexed pages; submit in GSC and Bing Webmaster Tools
  • sitemap.xml in Next.js — use the app/sitemap.ts file to generate dynamically; include lastmod dates from your CMS for accurate crawl scheduling
  • robots.txt — must explicitly Allow key crawlers; block /admin, /api, and other non-public paths; link to your sitemap at the bottom
  • llms.txt — new standard (llmstxt.org) at the root of your domain; provides an AI-readable index of your site for LLM training and RAG retrieval
text
# robots.txt — allow all crawlers including AI bots
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /_next/

# Explicitly allow AI crawlers (GEO critical)
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Bytespider
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
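The dynamic sitemap mentioned above can be sketched as an app/sitemap.js route (Next.js App Router). The posts array below is a stand-in for whatever your CMS returns; the function is shown as a plain declaration, with the required default export noted in a comment:

```javascript
// app/sitemap.js — Next.js serves /sitemap.xml from the array this
// file's default export returns. In the real file: export default sitemap
const posts = [
  { slug: 'install-n8n-aws-ec2', updatedAt: '2026-02-14' },
  { slug: 'nextjs-ssr-seo', updatedAt: '2026-03-01' },
];

function sitemap() {
  const base = 'https://yourdomain.com';
  return [
    { url: base, lastModified: new Date(), changeFrequency: 'weekly', priority: 1 },
    ...posts.map((post) => ({
      url: `${base}/blog/${post.slug}`,       // canonical form: no trailing slash
      lastModified: new Date(post.updatedAt), // accurate lastmod aids crawl scheduling
      changeFrequency: 'monthly',
      priority: 0.7,
    })),
  ];
}
```

Because the lastModified dates come from the CMS rather than the build timestamp, crawlers can tell which URLs actually changed.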

Schema Markup (JSON-LD)

JSON-LD schema is one of the highest-ROI technical SEO tasks you can do. It directly feeds Google's Knowledge Graph, powers rich results (FAQ dropdowns, review stars, HowTo carousels), and is a primary signal for AI search engines to understand who you are and what you do:

json
{
  "@context": "https://schema.org",
  "@type": ["LocalBusiness", "ProfessionalService"],
  "name": "Your Company Name",
  "url": "https://yourdomain.com",
  "foundingDate": "2019",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "Your exact address",
    "addressLocality": "City",
    "addressCountry": "IN"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.9",
    "reviewCount": "47"
  },
  "sameAs": [
    "https://linkedin.com/company/your-company",
    "https://github.com/your-company",
    "https://twitter.com/your-company"
  ]
}
  • LocalBusiness + ProfessionalService — correct type for a service agency (not just Organization); required for AggregateRating rich results
  • Article schema — for blog posts: requires author (Person type with name + sameAs LinkedIn), datePublished, dateModified, mainEntityOfPage, speakable
  • FAQPage schema — for FAQ sections: each Question/Answer pair must match visible page text exactly
  • HowTo schema — for step-by-step guides: each step requires name and text; optionally include image
  • BreadcrumbList — on every page; helps Google understand your site hierarchy; renders as visual breadcrumbs in SERPs
  • Validate at: validator.schema.org and Google Rich Results Test (search.google.com/test/rich-results)
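The Article requirements listed above can be assembled programmatically so every post gets consistent markup. A sketch; the post object shape and the example author are hypothetical, the field names come from the bullet above:

```javascript
// Build Article JSON-LD from post data, covering the required
// fields listed above (author, datePublished/dateModified, mainEntityOfPage).
function articleSchema(post) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Article',
    headline: post.title,
    datePublished: post.publishedAt,
    dateModified: post.updatedAt,
    mainEntityOfPage: { '@type': 'WebPage', '@id': post.url },
    author: {
      '@type': 'Person',
      name: post.author.name,
      sameAs: [post.author.linkedin], // named author + profile link (E-E-A-T)
    },
  };
}

const json = JSON.stringify(articleSchema({
  title: 'Complete SEO & GEO Strategy Guide 2026',
  publishedAt: '2026-03-10',
  updatedAt: '2026-03-10',
  url: 'https://yourdomain.com/blog/seo-geo-guide',
  author: { name: 'Jane Doe', linkedin: 'https://linkedin.com/in/janedoe' },
}));
// Embed the string in the page head inside
// <script type="application/ld+json"> … </script>
```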

Part 2 – Why SSR Matters for SEO (and How to Verify It)

Client-Side Rendering (CSR) is a silent SEO killer. When a page is CSR — all content rendered by JavaScript in the browser — Googlebot sees an empty HTML shell on first crawl. While Googlebot eventually executes JavaScript, this adds latency of days to weeks for indexing. More critically, AI crawlers (GPTBot, ClaudeBot, PerplexityBot) do NOT execute JavaScript at all — they see only the initial HTML response. If your content is rendered by JavaScript, AI search engines simply cannot read it.

How to Verify Your SSR Is Working

The definitive test: fetch the raw HTTP response (no JavaScript execution) and check if your visible content is present in the HTML:

bash
# Method 1: curl — fetch raw HTML without JavaScript
curl -s https://yourdomain.com/your-page | grep -i 'your-headline'
# If the headline appears: SSR is working
# If the output is empty or shows only <div id="root"></div>: CSR only

# Method 2: Check the HTTP response with full headers
curl -sI https://yourdomain.com/ | grep -E 'content-type|x-powered-by|cache-control'

# Method 3: wget and inspect HTML
wget -q -O - https://yourdomain.com/ | grep -o '<h1[^>]*>.*</h1>'

# Method 4: Check Googlebot's rendered version
# Google Search Console → URL Inspection → 'View Crawled Page'
# Compare 'HTML' tab (raw response) vs 'Screenshot' tab (rendered)
# If HTML tab is empty but Screenshot shows content: CSR only
javascript
// Node.js script to test SSR — runs without a browser
const https = require('https');

https.get('https://yourdomain.com/', (res) => {
  let html = '';
  res.on('data', (chunk) => html += chunk);
  res.on('end', () => {
    const hasH1 = /<h1[^>]*>/.test(html);
    const hasTitle = /<title>[^<]{10,}/.test(html);
    const hasDescription = /meta name="description"/.test(html);
    const hasLdJson = /application\/ld\+json/.test(html);
    console.log({ hasH1, hasTitle, hasDescription, hasLdJson });
    // All should be true for a properly SSR'd page
  });
});

In Google Search Console URL Inspection, compare the "HTML" tab (what Googlebot receives as raw HTTP response) with the "Screenshot" tab (fully rendered page). If the HTML tab is empty or missing key content that appears in the screenshot, your page is CSR-only. Fix: use Next.js Server Components, getServerSideProps, or generateStaticParams to ensure all critical content is in the initial HTML response.

Next.js SSR vs Static vs CSR — Which to Use

  • Server Components (Next.js 13+ App Router default) — HTML generated on the server: statically at build time by default, or per-request when dynamic APIs are used; fully SSR'd and crawlable
  • generateStaticParams (Static Site Generation) — HTML pre-built at deploy time; fastest for crawlers; best for blog posts, product pages, and content that doesn't change per-user
  • Client Components with "use client" — Rendered in browser only; invisible to AI crawlers; use only for interactive UI elements (forms, animations, charts) never for content
  • Hybrid approach — Static generation for content + Client Components for interactivity: the fastest and most SEO-friendly architecture for content-driven sites

Part 3 – On-Page SEO Technical Checklist

  • Title tag — 30-60 characters; include primary keyword near the start; unique on every page; format: "Primary Keyword — Secondary Keyword | Brand"
  • Meta description — 120-160 characters; include a call-to-action; unique on every page; no keyword stuffing (Google rewrites stuffed descriptions)
  • H1 tag — exactly one H1 per page; must contain the primary keyword; should match (but not duplicate) the title tag
  • Heading hierarchy — H2 for main sections, H3 for subsections; do not skip levels; screen readers and crawlers rely on proper hierarchy
  • Internal linking — every page should have at least 3-5 internal links from topically related pages; use descriptive anchor text (not "click here")
  • Image alt text — every image must have descriptive alt text; include keywords where naturally relevant; decorative images use alt=""
  • Page speed — target TTFB < 600ms, LCP < 2.5s; compress images (WebP/AVIF), minimise CSS/JS, use CDN
  • Mobile-first design — Google uses mobile-first indexing; test with Chrome DevTools device emulation and Lighthouse's mobile audit (Google retired its standalone Mobile-Friendly Test in December 2023)
  • Open Graph + Twitter Card tags — required for social sharing and some AI citation tools; always include og:image (1200x630px)
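Several of the rules in this checklist (title and description length, single H1) are mechanical, so they can be verified in CI against raw page HTML. A simplified sketch, regex-based rather than a full HTML parser:

```javascript
// Audit title, meta description, and H1 count in a raw HTML string,
// using the length targets from the checklist above.
function auditOnPage(html) {
  const title = (html.match(/<title>([^<]*)<\/title>/i) || [])[1] || '';
  const desc = (html.match(/<meta\s+name="description"\s+content="([^"]*)"/i) || [])[1] || '';
  const h1Count = (html.match(/<h1[\s>]/gi) || []).length;
  return {
    titleOk: title.length >= 30 && title.length <= 60,
    descriptionOk: desc.length >= 120 && desc.length <= 160,
    singleH1: h1Count === 1,
  };
}
```

Running this over every page of a crawl quickly surfaces the handful of templates responsible for most on-page failures.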

Part 4 – GEO Optimisation Strategy

AI search engines use a fundamentally different retrieval mechanism than traditional search. Understanding it lets you optimise directly for citation in AI-generated answers:

The llms.txt Standard

The llms.txt file (llmstxt.org) is an emerging standard — a Markdown-formatted index at https://yourdomain.com/llms.txt that gives AI systems a concise, structured map of your site. Think of it as robots.txt for AI models — but instead of access rules, it provides factual context about who you are and what content is worth reading:

markdown
# Your Company Name

> One-sentence factual description of what you do and who you serve.
> Founded YYYY. Location. Rating/credibility signal.

## Services
- [Service Name](/services/service-slug): Brief description
- [Service Name](/services/service-slug): Brief description

## Blog (Technical Guides)
- [Article Title](/blog/slug): One-line summary
- [Article Title](/blog/slug): One-line summary

## Key Pages
- [About](/about): Company background, team, and credentials
- [Portfolio](/portfolio): Case studies and client work
- [Contact](/contact): How to reach us

## Optional: Full content index
/llms-full.txt
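A file in the format above can be generated from structured site data at build time so it never drifts from the real sitemap. A sketch; all names, paths, and summaries are placeholders:

```javascript
// Generate llms.txt in the format shown above from structured site data.
function buildLlmsTxt(site) {
  const lines = [`# ${site.name}`, '', `> ${site.description}`, ''];
  for (const [section, pages] of Object.entries(site.sections)) {
    lines.push(`## ${section}`);
    for (const p of pages) lines.push(`- [${p.title}](${p.path}): ${p.summary}`);
    lines.push('');
  }
  return lines.join('\n');
}

const txt = buildLlmsTxt({
  name: 'Your Company Name',
  description: 'One-sentence factual description. Founded YYYY. Location.',
  sections: {
    Services: [{ title: 'SEO Audits', path: '/services/seo-audits', summary: 'Technical SEO audits' }],
  },
});
// txt begins with '# Your Company Name' and lists each section as '## …'
```

Serve the result as a static file (or a route handler) at /llms.txt with a text/plain content type.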

Schema and Structured Data for AI Discovery

  • Person schema for all named authors — AI engines use E-E-A-T signals; anonymous content ("BitPixel Team") is less citable than named experts with LinkedIn sameAs links
  • speakable schema on articles — marks key paragraphs as high-authority answers; used by Google Assistant and increasingly by AI retrieval systems
  • FAQPage schema — directly feeds AI question-answering; each Q/A pair may appear verbatim in AI-generated responses
  • LocalBusiness with aggregateRating — signals legitimacy and trust to AI models evaluating whether to cite your business
  • Consistent NAP across the web — Name, Address, Phone must be identical on your site, Google Business Profile, Bing Places, LinkedIn, Crunchbase, and all directory listings
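NAP consistency can be spot-checked by normalising away formatting differences before comparing listings. A sketch; the example business records are hypothetical:

```javascript
// Compare NAP (Name/Address/Phone) records from different listings
// after normalising punctuation, case, and phone formatting.
function normalizeNAP({ name, address, phone }) {
  const squeeze = (s) => s.toLowerCase().replace(/[^a-z0-9]/g, '');
  return {
    name: squeeze(name),
    address: squeeze(address),
    phone: phone.replace(/\D/g, ''), // digits only
  };
}

function napMatches(a, b) {
  const [x, y] = [normalizeNAP(a), normalizeNAP(b)];
  return x.name === y.name && x.address === y.address && x.phone === y.phone;
}

const site = { name: 'Example Agency', address: '12 MG Road, Bengaluru', phone: '+91 98765 43210' };
const gbp  = { name: 'Example Agency', address: '12 M.G. Road Bengaluru', phone: '919876543210' };
console.log(napMatches(site, gbp)); // → true (same business, different formatting)
```

Run the comparison across your site footer, Google Business Profile export, and directory listings; any false result marks a listing to correct.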

Content Structure for AI Citation

  • Lead with the direct answer — AI engines use the first 1-2 sentences of a section as citation candidates; put the key fact first, explanation second
  • Use specific, verifiable data — "3.2 second average load time" is more citable than "fast load times"; AI models prefer precise claims
  • Include a "Sources" or "References" section on factual articles — signals to AI that your content follows journalistic/academic conventions
  • Structure with H2/H3 hierarchy — AI retrieval systems parse headings to understand content structure; well-structured content is more likely to be pulled accurately
  • Answer questions directly — format key sections as "Question: Answer" to match how AI models extract FAQ-style information
  • Avoid unsourced bold claims — AI models are trained to avoid citing unverified statistics; claims like "340% average ROI" without a source make your content less likely to be cited

Part 5 – SEO Tools: Complete Audit Stack

  • Google Search Console (free) — The ground truth for crawl data, indexing status, Core Web Vitals, Manual Actions, and search performance. Check weekly.
  • Google PageSpeed Insights (free) — pagespeed.web.dev — Lab and field CWV data; mobile and desktop scores; specific fix recommendations with code examples.
  • Lighthouse (free, built into Chrome DevTools) — Open DevTools (F12, or Cmd+Option+I on macOS) → Lighthouse tab; audit Performance, Accessibility, Best Practices, and SEO in one report (the separate PWA category was removed from Lighthouse in 2024).
  • Google Rich Results Test (free) — search.google.com/test/rich-results — Validate structured data and preview how schema renders as rich results in SERPs.
  • Schema Markup Validator (free) — validator.schema.org — Strict Schema.org validation; catches errors that the Rich Results Test misses.
  • Screaming Frog SEO Spider (free up to 500 URLs) — Desktop crawler that audits title tags, descriptions, H1s, canonical tags, internal links, redirects, and response codes across your whole site.
  • Ahrefs Webmaster Tools (free for site owners) — Backlink analysis, top pages by organic traffic, and broken link detection. Paid plans add keyword research and rank tracking.
  • Semrush (paid) — All-in-one: keyword research, competitor analysis, site audit, position tracking, and content gap analysis.
  • Bing Webmaster Tools (free) — Essential for Bing SEO; submit sitemaps, use the site scan feature, and set up IndexNow for instant indexing notification on content changes.
  • Ahrefs / Majestic / Moz — Authority metrics (Ahrefs DR, Majestic Trust Flow, Moz DA) and backlink profile analysis; use to benchmark against competitors and identify high-value link opportunities.

Part 6 – GEO Tools: Check AI Citation and Visibility

  • Perplexity.ai (free) — Search for your brand name and key topics you want to rank for; note which domains are cited in answers. This is your primary GEO benchmarking tool.
  • ChatGPT with Browse (free/paid) — Test prompts like "best [service] companies in [location]" and see if your brand appears; compare with and without the Browse plugin enabled.
  • Google AI Overviews — Search Google for target keywords; if an AI Overview appears, note which sources are cited. These citations correlate with strong E-E-A-T signals.
  • llms.txt Checker (llmstxt.org) — Validate your llms.txt file format against the official spec.
  • Diffbot (paid) — Structured knowledge graph extraction tool that shows how AI systems parse and represent your content; useful for identifying what information is being misread.
  • BrightEdge / Conductor (paid, enterprise) — Include AI Overview tracking and GEO citation monitoring as part of their SEO platforms.
  • Manual brand search — Regularly search "<your brand> reviews", "<your brand> vs competitors", and key service queries in ChatGPT, Perplexity, and Gemini to audit citation frequency.

Part 7 – How to Audit Your SSR, Schema, and Crawlability

  • 1. SSR check — curl -s https://yourdomain.com/your-page | grep "<h1" — if empty, your page is CSR-only
  • 2. Schema validation — paste page HTML into validator.schema.org AND Google Rich Results Test; fix all errors before warnings
  • 3. robots.txt check — curl https://yourdomain.com/robots.txt — verify AI crawlers are allowed and sitemap URL is listed
  • 4. llms.txt check — curl https://yourdomain.com/llms.txt — ensure it returns valid Markdown with correct URLs
  • 5. Canonical consistency — Screaming Frog → "Canonicals" tab → every indexable page should carry a self-referencing canonical; never combine noindex with a canonical tag, since the two directives send contradictory signals
  • 6. Structured data coverage — Screaming Frog → Custom Extraction with an XPath to find pages missing JSON-LD: //script[@type="application/ld+json"]
  • 7. Internal link audit — Screaming Frog → "Inlinks" tab for any page with 0 internal links (orphan pages); these are invisible to crawlers
  • 8. Core Web Vitals — GSC → Core Web Vitals report → fix all "Poor" URLs first; then "Needs Improvement"
  • 9. Manual Actions — GSC → Manual Actions → should show "No issues detected"; if not, this is your highest priority fix
  • 10. Index coverage — GSC → Pages → "Not indexed" section → categorise reasons: Crawled-not-indexed (thin content), Discovered-not-indexed (crawl budget issue), Excluded by noindex (intentional)
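Steps 1 and 3 above can be scripted for CI. A simplified sketch: the robots check below handles only the common User-agent/Disallow layout, not the full robots exclusion protocol (for example, it ignores wildcard-group fallback):

```javascript
// True if a robots.txt group naming <bot> explicitly contains "Disallow: /".
// (Per the robots spec a bot also falls back to the "*" group when it has
// no named group — not handled in this sketch.)
function robotsBlocksBot(robotsTxt, bot) {
  let agents = [];
  let inDirectives = false;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.trim();
    const ua = line.match(/^user-agent:\s*(\S+)/i);
    if (ua) {
      // Consecutive User-agent lines share one group; a directive ends it
      if (inDirectives) { agents = []; inDirectives = false; }
      agents.push(ua[1]);
    } else if (/^(dis)?allow:/i.test(line)) {
      inDirectives = true;
      if (/^disallow:\s*\/\s*$/i.test(line) && agents.includes(bot)) return true;
    }
  }
  return false;
}

// The SSR check from step 1: the raw (non-rendered) HTML must already
// contain an <h1> before any JavaScript runs.
const hasServerRenderedH1 = (html) => /<h1[\s>]/i.test(html);
```

Fetch robots.txt and a few key pages, run both checks, and fail the build if an AI crawler is blocked or a page ships without server-rendered content.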

SEO & GEO Monthly Monitoring Checklist

  • ✓ Google Search Console — check for new Manual Actions, Security Issues, and Core Web Vitals regressions
  • ✓ GSC Performance — compare impressions/clicks month-over-month; investigate drops > 20%
  • ✓ Bing Webmaster Tools — check site scan results and submit any new content via IndexNow
  • ✓ Screaming Frog — run a full crawl; fix new broken links (404s), redirect chains, and missing meta tags
  • ✓ PageSpeed Insights — test your 5 highest-traffic pages; LCP, INP, CLS should all be in the green
  • ✓ Schema validation — re-validate structured data on any recently edited pages
  • ✓ GEO audit — search for your brand and top 3 service keywords in Perplexity and ChatGPT; document whether you are cited
  • ✓ Backlink profile — check for new toxic links in Ahrefs/GSC Links report; disavow if necessary
  • ✓ Content freshness — update dateModified in Article schema on posts with new information; Google favours recently updated content for many queries
  • ✓ robots.txt and llms.txt — verify no accidental Disallow rules have been added by deployments

Want a professional SEO and GEO audit for your website? BitPixel Coders performs full technical SEO audits, schema markup implementation, SSR diagnosis, and GEO optimisation for Next.js, Laravel, and WordPress sites.

Get a Free Consultation