Marklift

Documentation

URL → clean Markdown. Marklift fetches a webpage, extracts its main content, and converts it to LLM-friendly Markdown. Built for agents and pipelines.

Install
Requires Node.js 18 or later.
npm install marklift
Usage (programmatic)
The source adapter is inferred from the URL: twitter.com or x.com → twitter, reddit URLs → reddit, anything else → website.
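import { urlToMarkdown } from "marklift";

// Top-level await requires an ES module (e.g. a .mjs file or "type": "module").
// Inferred source: website
const result = await urlToMarkdown("https://example.com/article", {
  timeout: 10_000,
});
// Inferred source: twitter
const tweet = await urlToMarkdown("https://x.com/user/status/123");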

console.log(result.title);
console.log(result.markdown);
console.log(result.wordCount, result.sections.length, result.links.length);
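The URL-based source rule described above can be approximated as follows. This is a hypothetical sketch, not Marklift's implementation: the name inferSource is invented, and treating subdomains such as www. or old. as matches is an assumption. Unlike Marklift, which reports bad input as InvalidUrlError, this sketch simply lets the URL constructor throw a TypeError.

```typescript
// Hypothetical re-creation of Marklift's URL → source rule.
// The real matching logic may differ (e.g. in which subdomains it accepts).
type Source = "website" | "twitter" | "reddit";

function inferSource(rawUrl: string): Source {
  // new URL() throws a TypeError on malformed input (Marklift uses InvalidUrlError).
  const host = new URL(rawUrl).hostname.toLowerCase();

  // True when host is the domain itself or one of its subdomains (www., old., ...).
  const isDomain = (domain: string): boolean =>
    host === domain || host.endsWith("." + domain);

  if (isDomain("twitter.com") || isDomain("x.com")) return "twitter";
  if (isDomain("reddit.com")) return "reddit";
  return "website";
}

console.log(inferSource("https://x.com/user/status/123")); // twitter
console.log(inferSource("https://old.reddit.com/r/node")); // reddit
console.log(inferSource("https://example.com/article")); // website
```

Matching on the parsed hostname rather than on a substring of the URL keeps lookalike domains such as notx.com from being routed to the twitter adapter.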
CLI
Install globally: npm install -g marklift
marklift https://example.com
marklift https://x.com/user/status/123   # twitter adapter
marklift https://reddit.com/r/...         # reddit adapter
marklift https://example.com --json      # full result as JSON
marklift https://example.com --timeout 15000
marklift https://example.com --chunk-size 2000
marklift https://example.com --source website

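Options: --source (website | twitter | reddit) overrides the source inferred from the URL; --timeout sets the request timeout; --chunk-size controls how the Markdown is split into chunks; --json prints the full result as JSON.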

Streaming
urlToMarkdownStream(url, options?) is an async generator that yields MarkdownChunk objects; each chunk's content field holds the next piece of the Markdown output.
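import { urlToMarkdownStream } from "marklift";

// Write each chunk to stdout as soon as it arrives.
for await (const chunk of urlToMarkdownStream("https://blog.example.com/post")) {
  process.stdout.write(chunk.content);
}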
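When a pipeline needs the whole document but still wants to consume it incrementally (for example to report progress), the chunks can be buffered. The sketch below runs offline: fakeStream is a hypothetical stand-in for urlToMarkdownStream, and ChunkLike declares only the content field used in the example above (the real MarkdownChunk type may carry more). It assumes, as the stdout example implies, that concatenating chunk contents in order reproduces the full Markdown.

```typescript
// Minimal stand-in for Marklift's MarkdownChunk: only `content` is assumed.
interface ChunkLike {
  content: string;
}

// Drain an async iterable of chunks into one Markdown string,
// reporting the running character count after each chunk.
async function collectMarkdown(
  chunks: AsyncIterable<ChunkLike>,
  onProgress?: (charsSoFar: number) => void,
): Promise<string> {
  const parts: string[] = [];
  let total = 0;
  for await (const chunk of chunks) {
    parts.push(chunk.content);
    total += chunk.content.length;
    onProgress?.(total);
  }
  return parts.join("");
}

// Hypothetical stand-in for urlToMarkdownStream(url), so the sketch runs offline.
async function* fakeStream(): AsyncGenerator<ChunkLike> {
  yield { content: "# Title\n\n" };
  yield { content: "Body content…\n" };
}

collectMarkdown(fakeStream(), (n) => console.log(`received ${n} chars`)).then(
  (markdown) => console.log(markdown),
);
```

Because collectMarkdown accepts any AsyncIterable, the same function works unchanged when passed urlToMarkdownStream(url) directly.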
Errors

InvalidUrlError: the URL is malformed or does not use HTTP(S)
FetchError: network failure, timeout, or non-2xx HTTP status
ParseError: Readability extraction or other parsing failure
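FetchError is the only one of these errors that can stem from a transient condition (a network blip or timeout), so a pipeline might retry it and fail fast on the other two. The sketch below is hypothetical: withRetry and isRetryable are invented names, and it assumes each Marklift error's name property equals its class name, which avoids importing the classes. If Marklift exports the error classes, an instanceof check is more robust. Note that a non-2xx response such as a 404 is also a FetchError and would be retried here too.

```typescript
// Hypothetical retry wrapper. Assumes Marklift errors expose `name`
// equal to their class name ("FetchError", "InvalidUrlError", "ParseError").
function isRetryable(err: unknown): boolean {
  // A bad URL or a parse failure will not fix itself on a second attempt.
  return err instanceof Error && err.name === "FetchError";
}

async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (!isRetryable(err) || attempt >= maxAttempts) throw err;
      // Exponential backoff: baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage with Marklift (not executed here):
// const result = await withRetry(() => urlToMarkdown("https://example.com/article"));
```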

Markdown format (per source)
Each adapter outputs Markdown consisting of a frontmatter block (delimited by --- lines) followed by the body.

Website (reddit uses the same format). Format type: website.

---
source: https://example.com/article
canonical: https://example.com/article
title: Example Article Title
description: Short meta description
author: John Doe
published_at: 2025-01-12
language: en
content_hash: <sha256>
word_count: 1243
---

# Title

Body content…

Twitter

---
platform: twitter
source: https://twitter.com/username/status/1234567890
tweet_id: 1234567890
author:
  name: Author Name
published_at: 2025-01-10T18:22:00Z
language: en
content_hash: <sha256>
---

Body content…
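Downstream code often needs the frontmatter fields separately from the body. The sketch below splits a Marklift document into a metadata record and a body. It is a deliberately small, hypothetical parser (parseMarklift and Frontmatter are invented names) that understands only the shapes shown in the examples above: flat key: value lines plus one level of indented nesting, as in the twitter author block. Every value stays a string; for anything richer, a real YAML library is the safer choice.

```typescript
// Values stay strings; one level of nesting is allowed (e.g. author.name).
type Frontmatter = Record<string, string | Record<string, string>>;

function parseMarklift(doc: string): { meta: Frontmatter; body: string } {
  const lines = doc.split(/\r?\n/);
  if (lines[0] !== "---") return { meta: {}, body: doc };
  const end = lines.indexOf("---", 1);
  if (end === -1) return { meta: {}, body: doc };

  const meta: Frontmatter = {};
  let parent: Record<string, string> | null = null;
  for (const line of lines.slice(1, end)) {
    const m = /^(\s*)([\w-]+):\s*(.*)$/.exec(line);
    if (!m) continue; // skip lines this tiny parser does not understand
    const [, indent, key, value] = m;
    if (indent.length > 0 && parent) {
      parent[key] = value; // indented field inside a nested block
    } else if (value === "") {
      parent = {}; // `key:` with no value opens a nested block
      meta[key] = parent;
    } else {
      parent = null;
      meta[key] = value;
    }
  }
  // Drop the blank line(s) separating the frontmatter from the body.
  const body = lines.slice(end + 1).join("\n").replace(/^\n+/, "");
  return { meta, body };
}

const sample = [
  "---",
  "platform: twitter",
  "tweet_id: 1234567890",
  "author:",
  "  name: Author Name",
  "---",
  "",
  "Body content…",
].join("\n");

const { meta, body } = parseMarklift(sample);
console.log(meta.tweet_id); // 1234567890
console.log((meta.author as Record<string, string>).name); // Author Name
console.log(body); // Body content…
```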
