How Dolex Works

Dolex turns CSV files into expert-level visualizations. This page walks through how the system works under the hood — from how Claude communicates with Dolex to how charts get rendered.

01

The Big Picture

Dolex is an npm package that exposes visualization intelligence through three interfaces: an MCP server (for AI assistants), a React component library (for apps), and a programmatic API (for scripts). All three share the same pattern library, selector, and rendering pipeline.

AI Assistant (Claude, GPT, etc.)
  → MCP Protocol (stdio / JSON-RPC)
  → Dolex MCP Server (src/mcp/index.ts)
  → Pattern Selector (43 patterns scored)
  → Spec Generation (VisualizationSpec)
  → HTML Builder (self-contained D3 doc)
  → MCP Apps Shell (inline rendering in Claude)

The key insight: an LLM will always suggest bar/line/pie. Dolex's pattern selector encodes design expertise as scoring rules — it knows when a bump chart, beeswarm, Sankey diagram, or waffle chart is the right answer.

02

MCP Protocol Layer

The Model Context Protocol lets AI assistants call external tools over a standardized JSON-RPC transport. Dolex runs as a stdio MCP server — the assistant's host process spawns it, sends tool calls over stdin, and reads results from stdout.
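As a concrete reference, here is roughly what one exchange looks like on the wire. The JSON-RPC envelope and the tools/call method follow the MCP specification; the tool name and arguments shown mirror Dolex's visualize tool but are illustrative examples, not a captured trace.

```typescript
// Illustrative JSON-RPC 2.0 request as written to the server's stdin.
// Envelope shape per MCP's tools/call method; arguments are examples.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "visualize",
    arguments: {
      intent: "compare sales by region",
      data: [{ region: "West", sales: 420 }, { region: "East", sales: 310 }],
    },
  },
};

// Each message is serialized as JSON and framed on the stdio stream.
const wireFrame = JSON.stringify(request);
```

The response travels back the same way on stdout, keyed by the same `id`.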

Claude Desktop (host process) ↔ stdio ↔ Dolex Server (McpServer instance)

Server Identity
  name: 'dolex', version: '1.0.0'

Transport
  StdioServerTransport (JSON-RPC 2.0 over stdin/stdout)

For Claude Desktop, a shell wrapper (scripts/mcp-server.sh) sets up nvm PATH before launching the server, so Node.js is found regardless of the host environment.
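For reference, a Claude Desktop entry for this setup might look like the following. The mcpServers key is Claude Desktop's standard config shape; the path is a placeholder for wherever Dolex is installed on your machine:

```json
{
  "mcpServers": {
    "dolex": {
      "command": "/path/to/dolex/scripts/mcp-server.sh"
    }
  }
}
```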

03

All 17 Tools

The server registers 17 tools organized into five groups. Tools marked (App) use MCP Apps for inline chart rendering.

Visualization

visualize (App)

Chart data from inline arrays, cached query results, or loaded CSVs. Provide one data source: data array, resultId from query_data, or sourceId + sql to query a loaded CSV server-side (saves tokens). Scores all 43 patterns → returns a compact response with specId, recommended pattern + reasoning, alternatives, and data shape summary. The full spec with data is stored server-side; the pre-rendered chart HTML goes to the UI via structuredContent. Set title and subtitle upfront to avoid a refine round-trip. Use maxAlternativeChartTypes to control how many alternatives (default: 2, set 0 for none). Set includeDataTable to add a companion sortable data table with linked highlighting. Optional pattern parameter to force a specific chart type.
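To make the three mutually exclusive data sources concrete, here are sketches of the argument objects for each path. Field names follow the parameter descriptions above; the ids are placeholders:

```typescript
// Path 1: inline data array, with title set upfront.
const inlineCall = {
  intent: "compare sales by region",
  data: [{ region: "West", sales: 420 }, { region: "East", sales: 310 }],
  title: "Sales by Region",        // avoids a refine round-trip just for titles
  maxAlternativeChartTypes: 0,     // skip alternatives entirely
};

// Path 2: reference a cached query result by id.
const fromCachedResult = {
  intent: "top products by revenue",
  resultId: "res-123",             // returned by a prior query_data call
};

// Path 3: server-side SQL against a loaded CSV (rows never enter the chat).
const fromLoadedCsv = {
  intent: "monthly revenue trend",
  sourceId: "src-abc",             // returned by load_csv
  sql: "SELECT month, SUM(revenue) AS revenue FROM sales GROUP BY month",
  includeDataTable: true,          // compound chart + linked table
};
```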

list_patterns

Returns all 43 patterns with descriptions, data requirements, and capabilities. The LLM uses this to understand what's available before calling visualize.

refine_visualization (App)

Tweaks a visualization by specId: sort, limit, flip axes, change colors, highlight values, update titles. Pass the specId from a previous visualize or refine call — no data round-trip needed. Use selectAlternative to switch to a different pattern from the original alternatives. Returns a compact response with just specId + changes applied (no spec echo).

CSV Data

load_csv

Loads a CSV file or directory by path. Datasets persist across server restarts — re-loading an existing dataset reconnects automatically. Set detail: "compact" to get just column names/types + row counts (saves tokens); default "full" includes stats, top values, and sample rows.

list_data

Lists all loaded datasets with IDs and table counts.

remove_data

Removes a loaded dataset by ID.

describe_data

Re-examines a dataset mid-conversation. Set detail: "compact" for just column names/types + row counts (saves tokens); default "full" includes numeric stats, categorical top values, and sample rows.

query_data

Runs a SQL query against a dataset and returns tabular results plus a resultId. Pass that resultId to visualize to chart the same data without re-sending rows (saves tokens). Supports JOINs, GROUP BY, window functions, CTEs, and custom aggregates (MEDIAN, STDDEV, P25, P75, P10, P90). Useful for exploration, validation, or query-then-visualize workflows.
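The query-then-visualize workflow described above can be sketched as two calls. The argument shapes follow the tool descriptions in this section; the ids and table names are made up for the example:

```typescript
// Step 1: explore with query_data (custom aggregates like MEDIAN supported).
const queryCall = {
  sourceId: "src-abc",
  sql: "SELECT region, MEDIAN(order_value) AS median_order FROM orders GROUP BY region",
};

// query_data returns tabular rows plus a resultId for reuse.
const queryResult = { resultId: "res-123", rows: [] as object[] };

// Step 2: chart the same data by reference, without re-sending rows.
const visualizeCall = {
  resultId: queryResult.resultId,
  intent: "median order value by region",
};
```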

analyze_data

Examines a loaded dataset and generates a structured analysis plan with ready-to-execute queries. Returns 4-6 analysis steps covering trends, comparisons, distributions, and relationships — each with a title, question, query, and suggested chart patterns. Use this after load_csv to get an automatic analysis plan.

Derived Columns

transform_data

Creates computed columns using expressions like zscore(revenue), rank(sales), or (revenue - cost) / revenue * 100. Columns start as session-only; promote them to persist across sessions.
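As an illustration of the expression semantics, here is what a column like zscore(revenue) computes: each value's distance from the column mean in units of standard deviation. This is a sketch of the math, not Dolex's implementation:

```typescript
// zscore semantics: (value - mean) / population standard deviation.
function zscore(values: number[]): number[] {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
  const std = Math.sqrt(variance);
  return values.map((v) => (v - mean) / std);
}

const revenue = [100, 200, 300];
const scores = zscore(revenue); // mean 200, std ≈ 81.65
```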

promote_columns

Promotes working columns to derived (persisted). Derived columns are saved to .dolex.json and automatically restored when the CSV is reloaded.

list_transforms

Lists all columns for a table grouped by layer: source (original CSV), derived (persisted), working (session-only).

drop_columns

Drops derived or working columns. Validates that no other columns depend on it before dropping.

Privacy & Cache Management

server_status

Shows what data Dolex currently holds in memory: cached visualization specs (with data row counts), query result cache size, loaded CSV datasets, and server uptime. Use this to audit data retention.

clear_cache

Clears cached data from server memory. Scope options: "all" (specs + results), "specs" (visualization specs and their embedded data), "results" (query result cache). Use after working with sensitive data or to free memory.

Debugging & Export

export_html

Returns the full, self-contained HTML for a previously-created visualization. Pass a specId — the returned HTML is a complete document with embedded D3 and data, suitable for saving to a file or opening in a browser.

screenshot

Renders a visualization to a PNG image via headless Chromium. Returns a base64-encoded PNG. Pass a specId from any visualize or refine call. Requires Playwright to be installed.

04

Pattern Selector Intelligence

The selector (src/patterns/selector.ts) is the core IP. It analyzes data shape + user intent, scores every registered pattern against a set of hand-tuned selection rules, and returns ranked recommendations.

Data + Columns (rows, types, cardinality) + Intent String ("compare sales by region")
  → buildMatchContext() (analyze data shape)
  → parseIntent() (classify intent category)
  → scorePattern() × 43 (run selection rules, sum weights)
  → sort by score (top N+1 → generateSpec())
  → SelectionResult (recommended + alternatives)

Key functions

selectPattern(data, columns, intent, options?)
  Primary entry point. Builds context, scores all patterns, returns ranked recommendations.

buildMatchContext(data, columns, intent)
  Analyzes data into a PatternMatchContext: row count, column types, cardinality, time series detection, hierarchy detection, negative values.

parseIntent(intent)
  Classifies the intent string into a primary category (comparison, distribution, time, etc.) by keyword matching. Returns scores for all categories.

scorePattern(pattern, ctx, intentResult)
  Evaluates one pattern: runs its selectionRules against the context, adds the category alignment boost, returns the total score + matched rules.

selectColumnsForPattern(pattern, columns)
  Picks the best columns for a pattern based on its category: time patterns get a date column first, comparison gets a categorical first, scatter gets two numerics, etc.

buildRecommendation(scored, data, columns)
  Generates a VisualizationSpec by calling the pattern's generateSpec() with the selected columns. Returns null if spec generation fails.

Selection Rules

Each of the 43 patterns defines an array of selection rules. A rule has a condition (human-readable label), a weight (positive = boost, negative = penalty), and a matches(ctx) function that tests the data context.

// Example: bar chart selection rule
{
  condition: 'Moderate categories (5-15)',
  weight: 30,
  matches: (ctx) =>
    ctx.dataShape.categoryCount >= 5 &&
    ctx.dataShape.categoryCount <= 15,
}

The scoring pipeline sums all matching rule weights, adds a category alignment boost when the inferred intent matches the pattern's category, and sorts. The top recommendation always wins — no randomness, fully deterministic.
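The scoring step can be sketched as follows. The rule shape matches the bar-chart example above; the context fields, types, and the boost constant are simplified assumptions for illustration, not Dolex's actual code:

```typescript
interface Ctx { categoryCount: number; intentCategory: string }
interface Rule { condition: string; weight: number; matches: (ctx: Ctx) => boolean }
interface Pattern { id: string; category: string; rules: Rule[] }

const CATEGORY_BOOST = 20; // assumed value for illustration

// Sum the weights of matching rules, then add a fixed boost when the
// inferred intent category matches the pattern's own category.
function scorePattern(p: Pattern, ctx: Ctx): number {
  const ruleScore = p.rules
    .filter((r) => r.matches(ctx))
    .reduce((sum, r) => sum + r.weight, 0);
  const boost = p.category === ctx.intentCategory ? CATEGORY_BOOST : 0;
  return ruleScore + boost;
}

const bar: Pattern = {
  id: "bar",
  category: "comparison",
  rules: [
    { condition: "Moderate categories (5-15)", weight: 30,
      matches: (ctx) => ctx.categoryCount >= 5 && ctx.categoryCount <= 15 },
    { condition: "Too many categories (>30)", weight: -20,
      matches: (ctx) => ctx.categoryCount > 30 },
  ],
};

// 8 categories + comparison intent: 30 (matched rule) + 20 (boost) = 50
const score = scorePattern(bar, { categoryCount: 8, intentCategory: "comparison" });
```

Because scoring is a pure sum over deterministic rules, the same data and intent always produce the same ranking.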

Force Pattern Override

The pattern parameter (MCP) / forcePattern option (API) lets callers bypass scoring. The selector still runs the full pipeline (so alternatives are available), then promotes or constructs the forced pattern as the recommended result. Unknown IDs or spec failures fall back gracefully with a note in the reasoning.

05

Data Flow: visualize Tool

Here's exactly what happens when the visualize tool is called, step by step.

1. MCP Tool Call: visualize (inline data, resultId, or sourceId + SQL)
2. Data Resolution: inline array, cached result lookup, or a query against your CSV
3. Column Inference: auto-detect types if not provided
4. Pattern Selection: selectPattern() → recommended + alternatives
5. Title/Subtitle + Colors: apply inline overrides + palette / highlight to the spec
6. Compound Decision: shouldCompound() → wrap with a data table?
7. HTML Generation: buildChartHtml() or buildCompoundHtml()
8. Store + Respond: SpecStore saves the full spec; the text response carries specId + metadata only

Token-Efficient Response

The text content returned to the LLM is compact — just a specId, recommended pattern + title + reasoning, alternatives (pattern + reasoning only), and a data shape summary. The full spec with data is stored server-side in the SpecStore and never round-tripped through the conversation. The pre-rendered chart HTML goes to the UI via structuredContent.

For refinements, the LLM just passes back the specId — no data, no encoding, no config. The refine response is equally lean: just specId + an array of changes applied (no spec echo). This reduces a typical visualize + 2 refines workflow from ~36,500 tokens to ~3,000.

Additional token-saving features: set title and subtitle on the initial visualize call to avoid a refine round-trip just for titles. Use resultId from a query_data call to pass data by reference instead of re-sending rows. Use detail: "compact" on load_csv / describe_data to get minimal column metadata when full stats aren't needed.

Key functions

handleVisualizeCore(selectPatterns)
  Shared core for all visualize data paths. Takes resolved data + args, infers columns, threads forcePattern, applies color prefs, decides compound wrapping, generates HTML, stores the spec in SpecStore, returns a compact response.

handleVisualize(selectPatterns, deps?)
  MCP handler factory. Resolves data from args.data (inline), args.resultId (cached query result), or args.sourceId + args.sql (server-side CSV query), then delegates to handleVisualizeCore.

inferColumns(data)
  Heuristic column detection: numeric (>70% parseable numbers), date (key name or YYYY- prefix), id (high cardinality + id-like name), categorical (everything else).

applyColorPreferences(spec, prefs)
  Mutates a spec to set the palette, highlight values, or color field from the colorPreferences parameter.

shouldCompound(spec, options)
  Decides if a chart should be wrapped with a companion data table. Considers data size, pattern type, and caller preference.

buildCompoundSpec(spec, columns)
  Wraps an atomic VisualizationSpec into a CompoundVisualizationSpec with chart + table views.

SpecStore.save(spec, columns, alternatives)
  Stores the full spec + alternatives server-side and returns a specId. Max 100 entries with LRU eviction. Entries expire after 1 hour; a background timer purges stale data every 5 minutes.

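The column-inference heuristic above can be sketched like this. The 70% threshold and the YYYY- prefix come from the description; the regexes and structure are simplified assumptions, and the id-detection branch is omitted for brevity:

```typescript
type ColumnType = "numeric" | "date" | "categorical";

// numeric: >70% of values parse as numbers
// date: key name suggests dates, or every value starts with a YYYY- prefix
// categorical: everything else
function inferColumnType(key: string, values: string[]): ColumnType {
  const numericShare =
    values.filter((v) => v !== "" && !Number.isNaN(Number(v))).length /
    values.length;
  if (numericShare > 0.7) return "numeric";
  if (/date|month|year|day/i.test(key) || values.every((v) => /^\d{4}-/.test(v)))
    return "date";
  return "categorical";
}
```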
06

CSV Mode

CSV mode is Dolex's answer to the biggest problem in AI-driven data visualization: token cost. A typical CSV might have 500,000 rows — passing that through the AI assistant's context window would burn tens of thousands of tokens per visualization. CSV mode ensures the raw data never touches the AI assistant at all.

The Token-Efficient Architecture

The AI assistant never sees your data. It sees a compact profile (column types, stats, top values, a few sample rows) and writes a tiny query. The Dolex server — running locally on your machine — handles the actual data, runs the query, selects the pattern, and renders the chart. The result that flows back through the assistant is a spec + pre-rendered HTML.

AI Assistant (sees profiles only)
  1. load_csv → {"name": "sales", "path": "/data/sales.csv"}
Dolex Server (loads the CSV locally)
  → returns a profile (~200 tokens: column types, stats, 5 sample rows)
AI Assistant (reads the profile; understands the data shape without seeing the data)
  2. visualize → {"sourceId": "src-abc", "sql": "SELECT ...", "intent": "..."}
Dolex Server (runs the query against local data)
  → pattern selection + rendering happen server-side with the actual data
  → returns specId + metadata (~200 tokens of compact JSON + pre-rendered chart HTML for the UI)

Result: Visualizing a 500,000-row CSV costs roughly 500 tokens total (profile + query + compact specId response) instead of millions of tokens for inline data. Refinements cost ~100 tokens each (just specId + refinement text). Your CSV files stay on your machine, the AI assistant works from the profile, and Dolex does the heavy lifting.

CSV Persistence

Loaded CSVs persist to ~/.dolex/sources.json automatically. On server restart, list_data shows previously-loaded datasets immediately. The first query triggers a lazy reload — no upfront overhead. Re-calling load_csv with an existing name is idempotent: it reloads and returns the schema without duplicating the entry.

Compact vs. Full Schema

Both load_csv and describe_data accept a detail parameter. In "full" mode (default), you get numeric stats, categorical top values, and sample rows. In "compact" mode, you get just column names, types, and row counts — perfect for multi-table sources where the assistant already knows the schema and just needs a quick reminder.

What the AI Assistant Sees vs. What It Doesn't

Sees (small, via tool responses)

  • Column names and inferred types
  • Numeric stats, top values, samples (full mode)
  • Column names + types + row counts only (compact mode)
  • Table names and row counts
  • specId + recommended pattern + reasoning (no data)
  • resultId for query-then-visualize reuse

Never sees (stays on your machine)

  • The actual data rows
  • Internal queries (compiled and run server-side)
  • Query result sets (fed directly into renderer)
  • The rendered chart HTML (goes to UI, not LLM)
  • CSV file paths beyond the dataset ID

The Server-Side Pipeline

load_csv (CSV file)
  → SourceManager (persistent, lazy connections)
  → Column Profiles (compact or full detail)
  → Query (select, filter, groupBy, join)
  → SQLite (runs the query locally)
  → Result Rows + Cache (returns a resultId for reuse)

SQL Capabilities

Claude writes standard SQL. Dolex runs it locally via SQLite with some handy extensions.

Aggregations

Standard SQL plus custom aggregates:
SUM, AVG, COUNT, MIN, MAX
MEDIAN, STDDEV, P10, P25, P75, P90

Window Functions

Rankings and comparisons:
ROW_NUMBER(), RANK(), DENSE_RANK()
LAG(), LEAD(), SUM() OVER()

JOINs

Query across multiple CSVs in the same folder:
SELECT * FROM orders JOIN customers ON ...

CTEs

Common Table Expressions for complex queries:
WITH totals AS (...) SELECT ...
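Putting the capabilities above together, here is an illustrative query string combining a CTE, a custom aggregate, and a window function. The table and column names are made up for the example:

```typescript
// CTE + MEDIAN (custom aggregate) + RANK() (window function) in one query.
const sql = `
  WITH regional AS (
    SELECT region, MEDIAN(order_value) AS median_order
    FROM orders
    GROUP BY region
  )
  SELECT region, median_order,
         RANK() OVER (ORDER BY median_order DESC) AS rnk
  FROM regional
`;
```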

Key functions

SourceManager(persistPath?)
  Constructor. If persistPath is set, loads saved sources from disk on startup and auto-saves on every mutation.

SourceManager.add(name, config)
  Registers a CSV dataset and persists it. Returns a dataset ID. Does NOT load yet (lazy).

SourceManager.connect(sourceId)
  Loads the CSV into memory for querying. Called lazily on the first query.

SourceManager.getSchema(sourceId)
  Returns tables and column profiles. Full mode: types, stats, top values, samples. Compact mode: just names and types.

SourceManager.querySql(sourceId, sql, maxRows?)
  Executes a SQL SELECT against your data and returns the result rows. Supports custom aggregates (MEDIAN, STDDEV, P25, P75, P10, P90). Results are cached with a resultId for reuse.

07

Compound Visualizations

A compound visualization pairs a chart with a sortable data table and links them with interactive highlighting. When you hover a bar, the corresponding table row highlights — and vice versa.

shouldCompound() (decides yes/no)
  → buildCompoundSpec() (wraps spec + table)
  → buildCompoundHtml() (CSS grid + interaction bus)

Chart View

The primary visualization, rendered via the pattern's HTML builder in an iframe. Emits highlight events on hover.

Table View

A sortable, scrollable data table showing the underlying data. Receives highlight events and scrolls to + highlights matching rows.

The interaction bus uses postMessage between the parent compound document and the chart iframe. The compound HTML builder produces a single self-contained document with CSS grid layout.

08

Rendering Pipeline

Three rendering targets share a single source of truth: the D3 renderers.

D3 Renderers (src/renderers/d3/, 43 files) feed three targets:

  • React Components: useChart() hook wraps D3
  • HTML Builders: esbuild bundles D3 → IIFE
  • Playground: direct D3 in the browser

The Build Pipeline

HTML builders are 2-line wrapper files (html/render/*.render.ts) that re-export from D3 renderers. The npm run build:bundles command uses esbuild to compile each renderer into an IIFE string stored in _generated/bundles.ts.

D3 Renderer (e.g. renderers/d3/comparison/bar.ts)
  → Render Wrapper (html/render/bar.render.ts)
  → esbuild → IIFE string in bundles.ts

This means there's one source of truth for each chart's rendering logic. Edit a D3 renderer, run npm run build:bundles, and both React and HTML outputs pick up the change.
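A sketch of what the bundling step does per renderer, using esbuild's build API. The paths and surrounding script are illustrative; the option combination (bundle + iife format + write: false) is what producing an in-memory IIFE string requires:

```typescript
const bundleOptions = {
  entryPoints: ["html/render/bar.render.ts"],
  bundle: true,            // inline D3 and the renderer into one output
  format: "iife" as const, // wrap in an immediately-invoked expression
  write: false,            // keep the output in memory instead of on disk
};

// With esbuild installed, the generated string would be captured like:
//   import { build } from "esbuild";
//   const result = await build(bundleOptions);
//   const iife = result.outputFiles[0].text; // written into _generated/bundles.ts
```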

D3 Renderer Organization

comparison/ (9 patterns)

bar (vertical + horizontal), diverging-bar, slope-chart, connected-dot-plot, bump-chart, lollipop, bullet, grouped-bar, waterfall

distribution/ (7 patterns)

histogram, beeswarm, violin, ridgeline, strip-plot, box-plot, density-plot

composition/ (9 patterns)

stacked-bar, waffle, treemap, sunburst, circle-pack, metric, donut, marimekko, icicle

time/ (7 patterns)

line, area, small-multiples, sparkline-grid, calendar-heatmap, stream-graph, horizon-chart

relationship/ (5 patterns)

scatter, connected-scatter, parallel-coordinates, radar, heatmap

flow/ (4 patterns)

sankey, alluvial, chord, funnel

geo/ (2 patterns)

choropleth, proportional-symbol

09

MCP Apps Integration

MCP Apps lets tools render rich UI inline in the AI assistant. Dolex uses this to show charts directly in the Claude Desktop conversation instead of just returning JSON.

Tool Call (visualize / refine)
  → Handler (returns content + structuredContent)
      content: compact JSON for the LLM (specId, reasoning)
      structuredContent.html: pre-rendered chart HTML
  → App Shell (ui://dolex/chart.html)
  → Chart Iframe (srcdoc = tool result HTML)

How the App Shell Works

The app shell (src/mcp/app-shell.ts) is a minimal HTML page registered as a resource at ui://dolex/chart.html. Claude Desktop loads it once, then for each tool call:

  1. The host sends a ui/notifications/tool-result notification with the chart HTML
  2. The shell creates a nested srcdoc iframe with the chart HTML
  3. The shell sends ui/notifications/size-changed with the desired height (500px charts, 700px compound)

CSP Policy

Both tool _meta.ui.csp and resource content _meta.ui.csp must allow d3js.org (D3.js library). Geo map data and the TopoJSON parser are bundled inline.

Dual Response

Every App tool returns both content (compact JSON with specId + metadata) and structuredContent (HTML the user sees). The LLM never sees the rendered chart or the data — it works from the specId and pattern metadata.
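The dual response can be sketched as follows. The content array with a text part follows the MCP tool-result shape; the specific field contents are illustrative:

```typescript
// content is what the model reads; structuredContent is what the user sees.
const toolResult = {
  content: [
    {
      type: "text",
      text: JSON.stringify({
        specId: "spec-123",
        recommended: { pattern: "bump-chart", reasoning: "rank changes over time" },
      }),
    },
  ],
  structuredContent: {
    html: "<!doctype html><html><!-- pre-rendered D3 chart --></html>",
  },
};
```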

10

Color System

Colors are applied through the colorPreferences parameter on the visualize tool, or via natural language refinement with refine_visualization (using a specId from a previous call).

Named Palettes

categorical, blue, green, purple, warm
blueRed, greenPurple, tealOrange, redGreen
traffic-light, profit-loss, temperature

Highlight Mode

Specify values to emphasize — all others become muted gray. Custom highlight colors, muted color, and muted opacity.

Color Field

Override which data field drives color encoding. Automatically sets nominal type if not specified.

The applyColorPreferences() function mutates the spec's encoding.color before HTML generation. All palettes are WCAG AA compliant and tested for colorblind safety.
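To make the three modes concrete, here are sketches of colorPreferences payloads. The top-level field names follow this section's descriptions; the nested option names (values, mutedOpacity) are assumptions for illustration:

```typescript
// Mode 1: a named palette.
const paletteMode = { palette: "tealOrange" };

// Mode 2: highlight specific values; everything else becomes muted gray.
const highlightMode = {
  highlight: {
    values: ["West"],   // the emphasized values
    mutedOpacity: 0.35, // assumed option name for illustration
  },
};

// Mode 3: override which field drives color (nominal type inferred).
const colorFieldMode = { colorField: "region" };
```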

11

Data Privacy & Cache Management

Dolex runs entirely on your machine. No telemetry, no analytics, no data sent to Anthropic or any third party. This section explains what data the server holds and how to manage it.

What's in Memory

The MCP server is a long-lived process — it starts when Claude Desktop launches and stays running for the session. Data accumulates across tool calls and is automatically cleaned up by TTL-based expiration.

Spec Store

Cached visualization specs with embedded data. 1-hour TTL, max 100 entries. Background cleanup every 5 minutes.

Result Cache

Query results from query_data calls. 10-minute TTL, max 20 entries. Auto-evicted on access.

Loaded CSVs

CSV files loaded into memory for querying. Persist until remove_data or server restart.

Inspecting & Clearing Data

Two tools give you visibility and control over cached data:

server_status

Shows cached spec count and total data rows, result cache size, connected sources, and server uptime. Use this to audit what's in memory before clearing.

clear_cache

Clears cached data by scope: "all", "specs", or "results". Use after working with sensitive data.

HTML Output

Every chart embeds its data as JSON in the HTML document for client-side rendering. Data is capped at 10,000 rows per view to limit exposure. The HTML lives in a sandboxed iframe and is not persisted by the MCP host.

CSV Persistence

Loaded CSV paths persist to ~/.dolex/sources.json automatically. Use remove_data to delete entries when done.

What Leaves Your Machine

D3.js CDN (d3js.org): an HTTP GET only, to load the D3.js charting library at render time.

No telemetry. No analytics. No data sent to Anthropic or any third party. Pattern selection, data processing, and chart rendering all happen locally in the Dolex server process on your machine.