Convert Markdown to HTML, PDF, and DOCX

Markdown is the format you write in. HTML, PDF, and DOCX are the formats you usually need to send. This page covers the tools, the pitfalls, and how to automate the conversion.

Markdown to HTML

The simplest conversion, and the one Markdown was originally designed for. Every Markdown renderer produces HTML as its output; the only differences are which Markdown extensions are supported (footnotes, tables, task lists, math equations) and how strictly the output is sanitized.

For programmatic conversion, widely used libraries include markdown-it (JavaScript), commonmark.py (Python), blackfriday or goldmark (Go), and cmark (C, the CommonMark reference implementation). For command-line use, Pandoc handles every dialect you are likely to encounter.
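As a rough sketch of the command-line route, the Python snippet below shells out to Pandoc. It assumes Pandoc is installed and on the PATH; the function names, the file names, and the choice of gfm as the default input dialect are illustrative, not part of any fixed recipe.

```python
import shutil
import subprocess
from pathlib import Path


def html_command(source: Path, target: Path, dialect: str = "gfm") -> list[str]:
    """Build the pandoc argument list for a Markdown-to-HTML conversion."""
    return [
        "pandoc",
        str(source),
        "--from", dialect,   # input dialect, e.g. gfm, commonmark, markdown
        "--to", "html5",
        "--standalone",      # emit a complete document rather than a fragment
        "--output", str(target),
    ]


def convert_to_html(source: Path, target: Path, dialect: str = "gfm") -> None:
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(html_command(source, target, dialect), check=True)
```

Calling convert_to_html(Path("README.md"), Path("README.html")) would write a standalone HTML file next to the source. Keeping the command builder separate from the runner makes it easy to inspect or log the exact invocation.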

Markdown to PDF

PDF conversion has two paths. The print-style path uses a headless browser (driven by Puppeteer or Playwright) to render the HTML and capture it as PDF. The output looks like a printed web page, which is fine for casual documents but gives you little control over PDF-specific features like bookmarks, named destinations, or accessibility tagging.
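The print-style path might look like the following sketch, which uses Playwright's Python bindings. It assumes the playwright package and its Chromium build are installed; the page size and margins are arbitrary example values, and both function names are hypothetical.

```python
from pathlib import Path


def pdf_options(target: Path) -> dict:
    """Keyword arguments passed to Playwright's page.pdf()."""
    return {
        "path": str(target),
        "format": "A4",
        "print_background": True,  # include CSS backgrounds (off by default)
        "margin": {"top": "20mm", "bottom": "20mm", "left": "15mm", "right": "15mm"},
    }


def html_to_pdf(html_file: Path, target: Path) -> None:
    """Render a local HTML file in headless Chromium and save it as PDF."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()            # PDF output is Chromium-only
        page = browser.new_page()
        page.goto(html_file.resolve().as_uri())  # file:// URL for the local file
        page.pdf(**pdf_options(target))
        browser.close()
```

The result is whatever the browser's print stylesheet produces, so any page-break or header tweaks belong in CSS rather than in this script.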

The publication-quality path goes through LaTeX. Pandoc converts Markdown to LaTeX, then a LaTeX engine (xelatex, lualatex) compiles to PDF. This gives you proper typography, footnote handling, citation management, and equation rendering, but requires a working LaTeX installation and the patience to debug occasional package conflicts.
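A minimal sketch of the LaTeX route, again in Python: the helper below only assembles the Pandoc command, which can then be passed to subprocess.run(cmd, check=True). The helper name and the decision to always add a table of contents are assumptions for illustration; the --citeproc flag requires a reasonably recent Pandoc (2.11 or later).

```python
from pathlib import Path
from typing import Optional

# Engines this sketch accepts; pandoc itself supports others as well.
ENGINES = ("xelatex", "lualatex", "pdflatex")


def latex_pdf_command(
    source: Path,
    target: Path,
    engine: str = "xelatex",
    bibliography: Optional[Path] = None,
) -> list[str]:
    """Build a pandoc command that compiles Markdown to PDF through LaTeX."""
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine: {engine}")
    cmd = [
        "pandoc",
        str(source),
        f"--pdf-engine={engine}",
        "--toc",  # table of contents; optional, shown here as an example
        "--output", str(target),
    ]
    if bibliography is not None:
        # Resolve citations against the given bibliography file.
        cmd += ["--citeproc", f"--bibliography={bibliography}"]
    return cmd
```

Rejecting unknown engine names up front is a small design choice: it surfaces a typo immediately instead of after Pandoc has already started a slow LaTeX run.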

Markdown to DOCX

Almost always done through Pandoc. The conversion preserves headings, lists, basic formatting, tables, and inline code; it does not preserve custom CSS, JavaScript, or embedded HTML. If your DOCX recipient expects styled content, you can supply a Pandoc reference document (an existing DOCX file whose styles Pandoc applies to the output, passed with the --reference-doc option) so the output matches your house style.
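The reference-document workflow can be sketched in Python as two small command builders. The file name house-style.docx and both function names are placeholders; the flags themselves come from Pandoc's manual.

```python
from pathlib import Path
from typing import Optional


def docx_command(
    source: Path, target: Path, reference_doc: Optional[Path] = None
) -> list[str]:
    """Build a pandoc command for Markdown-to-DOCX conversion."""
    # pandoc infers the DOCX output format from the target's extension.
    cmd = ["pandoc", str(source), "--output", str(target)]
    if reference_doc is not None:
        # Styles are taken from this file; its text content is ignored.
        cmd.append(f"--reference-doc={reference_doc}")
    return cmd


def default_reference_command(target: Path) -> list[str]:
    """Build a command that writes pandoc's built-in reference.docx to target."""
    return [
        "pandoc",
        "--output", str(target),
        "--print-default-data-file", "reference.docx",
    ]
```

A typical routine would be to run the second command once, restyle the resulting file in Word, and then pass it as reference_doc on every later conversion.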

What gets lost in conversion

Every format loses something. HTML loses Markdown’s syntactic simplicity once rendered. PDF loses everything dynamic: links work, but there are no hover states, no JavaScript, and no responsive layout. DOCX loses tight typographic control because Word’s layout engine does not respect CSS-style precision.

For long-form content meant to be archived (research papers, books, regulatory filings), PDF is the only one of the three designed to preserve layout indefinitely; HTML rendering changes as browsers evolve, and DOCX rendering changes as Word evolves. For collaborative documents meant to be edited, DOCX is still the lingua franca. For everything else, HTML, generated fresh from Markdown, is the most future-proof option.

Automating conversion in a pipeline

If you maintain Markdown documentation and need to publish in multiple formats, a CI pipeline that runs Pandoc on every commit eliminates the manual conversion step entirely. GitHub Actions can convert Markdown to PDF and attach the result as a release asset, or push the generated HTML to GitHub Pages, or upload the DOCX to a shared drive. The setup is usually under fifty lines of YAML and pays for itself the first time you avoid manually re-exporting a document because someone fixed a typo.
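The workflow YAML itself depends on your repository, but the step it runs can be a short build script. The Python sketch below is one possible shape: the docs and build directory names and the function names are assumptions, and the PDF targets additionally need a LaTeX engine on the CI runner.

```python
import subprocess
from pathlib import Path

# Output formats to produce for every Markdown source file.
FORMATS = ("html", "pdf", "docx")


def plan_builds(docs_dir: Path, out_dir: Path) -> list[list[str]]:
    """Return one pandoc command per (source file, output format) pair."""
    commands = []
    for source in sorted(docs_dir.rglob("*.md")):
        for ext in FORMATS:
            # Mirror the source tree under out_dir, swapping the extension.
            target = out_dir / source.relative_to(docs_dir).with_suffix(f".{ext}")
            commands.append(
                ["pandoc", str(source), "--standalone", "--output", str(target)]
            )
    return commands


def main(docs_dir: str = "docs", out_dir: str = "build") -> int:
    """Run every planned conversion; stops at the first failure."""
    for cmd in plan_builds(Path(docs_dir), Path(out_dir)):
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
    return 0
```

A CI job would install Pandoc, call main(), and then upload the build directory as an artifact or publish it. Because check=True raises on the first failed conversion, a broken document fails the build instead of silently shipping stale output.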
