Tutorial

Astro Deep Dive: Content Collections — Schema, Querying, and Type-Safe MDX

Your blog has 50 posts and a typo in one date field blows up the whole page. Content Collections catch schema violations at build time — before your readers do. This post covers the Content Layer API, Zod schemas, dynamic routes, and image validation.

Tin Dang April 10, 2026 11 min read

Structured data flowing through validation gates into a type-safe content pipeline

Your blog has 50 posts. One has a date field that reads "Aprll 10" instead of a proper date. Another is missing its required tags array entirely. A third references hero-iamge.jpg — a file that does not exist. You discover none of this until a reader reports a broken page in production.

Content Collections prevent all three failures. Every field in every content file is validated against a Zod schema at build time. A typo in a date? Build fails. Missing tag array? Build fails. A hero image pointing to a file that is not on disk? Build fails — before your readers ever see the damage.

In Part 1 we scaffolded the project. In Part 2 we wired up islands for interactivity. Now we build the content layer that makes the whole thing worth deploying.

The Content Layer API

Astro 5 replaced the legacy content system with the Content Layer API. The old approach relied on magic directories and implicit conventions. The new system is explicit: you define collections with loaders, validate with Zod schemas, and query with type-safe functions.

The single source of truth is src/content.config.ts. Not src/content/config.ts (that was Astro 4). Not a contentlayer.config.ts. One file, one schema definition, full TypeScript inference.

Defining Collections

A collection needs two things: a loader that tells Astro where to find files, and a schema that tells Astro what shape the data must have.

Here is the blog collection from Blogcraft:

import { defineCollection, z } from 'astro:content';
import { glob } from 'astro/loaders';

const blog = defineCollection({
  loader: glob({ pattern: '**/*.mdx', base: './src/content/blog' }),
  schema: ({ image }) => z.object({
    title: z.string().max(100),
    description: z.string().max(300),
    pubDate: z.coerce.date(),
    updatedDate: z.coerce.date().optional(),
    heroImage: image(),
    heroAlt: z.string(),
    author: z.string().default('default'),
    tags: z.array(z.string()),
    category: z.enum([
      'engineering', 'architecture', 'data', 'devops',
      'ai-ml', 'rust', 'frontend', 'tutorial'
    ]),
    draft: z.boolean().default(false),
    featured: z.boolean().default(false),
    series: z.object({
      slug: z.string(),
      order: z.number(),
    }).optional(),
    toc: z.boolean().default(true),
    readingTime: z.number().optional(),
  }),
});

Every piece of this schema earns its place. z.string().max(100) on the title prevents runaway SEO titles. z.coerce.date() accepts both 2026-04-10 and "2026-04-10T00:00:00Z" — it coerces the string to a Date object. z.enum() on category restricts values to a known set, so a typo like "tutoral" fails the build instead of silently creating an uncategorized post.

The image() helper deserves special attention.

Image Validation with image()

The image() function is not a standard Zod type. It is an Astro-provided schema helper passed through the destructured { image } parameter in the schema function. It does two things no regular z.string() can:

Validates the file exists at build time. If your frontmatter says heroImage: ../../assets/images/heroes/missing.jpg and that file is not on disk, the build fails immediately with a clear error pointing to the exact file and field.
Returns processed image metadata. The resolved value is not a string path — it is an image object with src, width, height, and format properties. This metadata feeds directly into Astro’s <Image> and <Picture> components for responsive image generation.

When you use the processed image in a layout:

---
import { Picture } from 'astro:assets';

const { post, headings, readingTime } = Astro.props;
---

<article>
  <Picture
    src={post.data.heroImage}
    alt={post.data.heroAlt}
    widths={[400, 800, 1200]}
    formats={['avif', 'webp', 'jpeg']}
    class="hero-image"
  />
  <!-- ... -->
</article>

Astro’s image service (Sharp) generates optimized variants at build time — AVIF, WebP, and JPEG fallback at multiple widths. The <Picture> component emits a <picture> element with <source> tags and proper srcset attributes. Zero runtime cost, zero layout shift (because width and height are known from the schema).

Multiple Collection Types

Content Collections are not limited to MDX. Any structured data with a loader and schema qualifies. Blogcraft uses three collections:

const authors = defineCollection({
  loader: glob({ pattern: '**/*.json', base: './src/content/authors' }),
  schema: z.object({
    name: z.string(),
    bio: z.string(),
    avatar: z.string(),
    social: z.object({
      github: z.string().url().optional(),
      twitter: z.string().url().optional(),
      linkedin: z.string().url().optional(),
    }).optional(),
  }),
});

const series = defineCollection({
  loader: glob({ pattern: '**/*.yaml', base: './src/content/series' }),
  schema: ({ image }) => z.object({
    title: z.string().max(100),
    description: z.string().max(300),
    coverImage: image(),
    coverAlt: z.string(),
    status: z.enum(['ongoing', 'complete']),
    comingSoon: z.array(z.string()).default([]),
  }),
});

export const collections = { blog, authors, series };

JSON for authors. YAML for series metadata. MDX for posts. The glob loader handles all of them — you just change the pattern and base path. Each collection gets its own fully-typed schema, and Astro generates TypeScript types for all of them.

Querying Collections

Astro 5 provides two primary query functions: getCollection for lists and getEntry for single items.

import { getCollection, getEntry } from 'astro:content';

// All published posts, sorted newest first
const posts = (await getCollection('blog', ({ data }) => !data.draft))
  .sort((a, b) => b.data.pubDate.valueOf() - a.data.pubDate.valueOf());

// Single author by ID (filename without extension)
const author = await getEntry('authors', 'default');

// Filter by tag
const rustPosts = await getCollection('blog', ({ data }) =>
  !data.draft && data.tags.includes('rust')
);

// Filter by series
const seriesPosts = (await getCollection('blog', ({ data }) =>
  !data.draft && data.series?.slug === 'astro-deep-dive'
)).sort((a, b) => (a.data.series?.order ?? 0) - (b.data.series?.order ?? 0));

The filter callback in getCollection runs at build time. It receives the full typed data object, so TypeScript knows data.draft is a boolean, data.tags is string[], and data.series is { slug: string; order: number } | undefined. No type assertions. No as any. The schema is the type.

Dynamic Routes with getStaticPaths

Every blog post needs a URL. In Astro 5, dynamic routes use getStaticPaths to generate pages at build time:

---
import { getCollection, render } from 'astro:content';
import BlogPostLayout from '../../layouts/BlogPostLayout.astro';
import Callout from '../../components/mdx/Callout.astro';
import ComparisonTable from '../../components/mdx/ComparisonTable.astro';
import Tabs from '../../components/mdx/Tabs.astro';
// ... other MDX components

export async function getStaticPaths() {
  const posts = await getCollection('blog', ({ data }) => !data.draft);
  return posts.map((post) => ({
    params: { slug: post.id },
    props: { post },
  }));
}

const { post } = Astro.props;
const { Content, headings, remarkPluginFrontmatter } = await render(post);
const readingTime = remarkPluginFrontmatter?.minutesRead ?? '5 min read';

const mdxComponents = {
  Callout,
  ComparisonTable,
  Tabs,
  // ... register all MDX components
};
---
<BlogPostLayout post={post} headings={headings} readingTime={readingTime}>
  <Content components={mdxComponents} />
</BlogPostLayout>

Two critical Astro 5 changes to note here:

render() is a standalone import, not a method on the entry. The old entry.render() from Astro 4 is gone. You import render from astro:content and call render(entry).
post.id is the content identifier. In Astro 5’s Content Layer, id is derived from the filename (minus the extension). There is no separate slug property — id serves as both the identifier and the route parameter.

Type Safety: Schema to TypeScript

When you run astro dev or astro build, Astro generates TypeScript types from your schemas into .astro/types.d.ts. Every getCollection('blog') call returns CollectionEntry<'blog'>[] where CollectionEntry<'blog'> has a fully typed data property matching your Zod schema.

This means:

Autocomplete works on post.data.title, post.data.tags, post.data.series?.order
Accessing post.data.nonexistent is a compile error
The category field autocompletes to 'engineering' | 'architecture' | 'data' | ...
post.data.pubDate is typed as Date, not string

You never write an interface for your frontmatter. The schema is the interface.

Series Support

Linking posts within a series requires the series field in the schema and a query that groups by series slug:

import { getCollection } from 'astro:content';

export async function getSeriesPosts(seriesSlug: string) {
  return (await getCollection('blog', ({ data }) =>
    !data.draft && data.series?.slug === seriesSlug
  )).sort((a, b) =>
    (a.data.series?.order ?? 0) - (b.data.series?.order ?? 0)
  );
}

export function getSeriesNavigation(
  posts: Awaited<ReturnType<typeof getSeriesPosts>>,
  currentOrder: number,
) {
  const currentIndex = posts.findIndex(
    (p) => p.data.series?.order === currentOrder,
  );
  return {
    prev: currentIndex > 0 ? posts[currentIndex - 1] : null,
    next: currentIndex < posts.length - 1 ? posts[currentIndex + 1] : null,
    total: posts.length,
    current: currentIndex + 1,
  };
}

The layout calls getSeriesNavigation and renders prev/next links. Because the series order is a number in the schema, sorting is deterministic. No string comparison surprises.

Schema Parity with Your CMS

If you use a CMS like Keystatic alongside Content Collections, the CMS field definitions must match the Zod schema exactly. A field that is required in the schema but optional in the CMS means authors can save content that fails the build.

Blogcraft enforces parity with a prebuild script:

// Parse the Zod schema and Keystatic config programmatically
// Compare field names, types, required/optional status
// Fail the build if any drift is detected

// Runs during: pnpm build → prebuild hook
// Exit code 1 = drift detected, build aborted

This script runs before every build. If someone adds a field to the Keystatic config without updating content.config.ts (or vice versa), the build fails with a clear diff showing exactly what drifted.

The Content Pipeline

MDX files do not go straight from source to HTML. They pass through a pipeline of remark and rehype plugins, each transforming the content:

Each plugin operates on the AST (abstract syntax tree) at a specific stage. Remark plugins work on the Markdown AST (mdast). Rehype plugins work on the HTML AST (hast). The order within each stage matters — rehypeSlug must run before rehypeAutolinkHeadings because autolink needs the IDs that slug generates.

Here is the relevant section from astro.config.ts:

markdown: {
  remarkPlugins: [remarkGfm, remarkReadingTime],
  rehypePlugins: [
    rehypeSlug,
    [rehypeAutolinkHeadings, { behavior: 'append' }],
    [rehypeMermaid, {
      strategy: 'inline-svg',
      mermaidConfig: {
        fontFamily: 'General Sans, sans-serif',
        theme: 'neutral',
      },
    }],
  ],
},

The remarkReadingTime plugin is a custom remark plugin that counts words and injects a minutesRead string into remarkPluginFrontmatter. The render() function exposes this via its return value, so the layout can display “8 min read” without any client-side calculation.

Integration Order: Why It Matters

In astro.config.ts, the integrations array is ordered. This is not cosmetic — it determines the build pipeline:

integrations: [
  expressiveCode({ /* ... */ }),  // MUST come first
  mdx(),                          // MUST come after expressiveCode
  react(),
  keystatic(),
  sitemap(),
],

Expressive Code must register before MDX. It intercepts code fences and transforms them into styled blocks with syntax highlighting, line numbers, and frame decorations. If MDX processes the code fences first, Expressive Code never sees them and your code blocks render as unstyled <pre> tags.

This is not documented prominently. It will cost you an hour of debugging if you get it wrong.

Approach	Schema Validation	Type Safety	Image Processing	Multi-Format
Astro Content Layer	Zod at build time	Full TypeScript inference	Built-in image() helper	MDX, JSON, YAML, custom loaders
Contentlayer (deprecated)	Zod-like, custom DSL	Generated types	Manual	MDX, JSON
Manual frontmatter parsing	None (runtime errors)	Manual interfaces	Manual	Whatever you build
gray-matter + manual Zod	Possible but DIY	Manual wiring	Manual	Whatever you build

Putting It All Together

Here is the complete content.config.ts that powers Blogcraft — three collections, three formats, full type safety:

import { defineCollection, z } from 'astro:content';
import { glob } from 'astro/loaders';

const blog = defineCollection({
  loader: glob({ pattern: '**/*.mdx', base: './src/content/blog' }),
  schema: ({ image }) => z.object({
    title: z.string().max(100),
    description: z.string().max(300),
    pubDate: z.coerce.date(),
    updatedDate: z.coerce.date().optional(),
    heroImage: image(),
    heroAlt: z.string(),
    author: z.string().default('default'),
    tags: z.array(z.string()),
    category: z.enum([
      'engineering', 'architecture', 'data', 'devops',
      'ai-ml', 'rust', 'frontend', 'tutorial',
    ]),
    draft: z.boolean().default(false),
    featured: z.boolean().default(false),
    series: z.object({ slug: z.string(), order: z.number() }).optional(),
    toc: z.boolean().default(true),
    readingTime: z.number().optional(),
  }),
});

const authors = defineCollection({
  loader: glob({ pattern: '**/*.json', base: './src/content/authors' }),
  schema: z.object({
    name: z.string(),
    bio: z.string(),
    avatar: z.string(),
    social: z.object({
      github: z.string().url().optional(),
      twitter: z.string().url().optional(),
      linkedin: z.string().url().optional(),
    }).optional(),
  }),
});

const series = defineCollection({
  loader: glob({ pattern: '**/*.yaml', base: './src/content/series' }),
  schema: ({ image }) => z.object({
    title: z.string().max(100),
    description: z.string().max(300),
    coverImage: image(),
    coverAlt: z.string(),
    status: z.enum(['ongoing', 'complete']),
    comingSoon: z.array(z.string()).default([]),
  }),
});

export const collections = { blog, authors, series };

Every MDX file, every JSON author profile, every YAML series definition passes through Zod validation before a single HTML page is generated. A malformed date, a missing image, a misspelled category — the build catches it.

What We Covered

This post established the content foundation:

Content Layer API with defineCollection and glob loader replaces the legacy content system
Zod schemas validate every field at build time, including the image() helper that catches missing files
content.config.ts is the single source of truth for all content types
getCollection and render provide type-safe querying and rendering (note: render(entry), not entry.render())
Dynamic routes via getStaticPaths generate one HTML page per content entry
Remark and rehype plugins transform MDX through a deterministic pipeline
Integration order in astro.config.ts is load-bearing — Expressive Code before MDX, always

In Part 4, we deploy all of this to Cloudflare Pages with GitHub Actions, Lighthouse CI, and the configuration that makes deploy day boring.

Next in this series

Astro Deep Dive: Deployment, CI/CD, and Lighthouse 100/100