How Dolex Works
Dolex turns CSV files into expert-level visualizations. This page walks through how the system works under the hood — from how Claude communicates with Dolex to how charts get rendered.
The Big Picture
Dolex is an npm package that exposes visualization intelligence through three interfaces: an MCP server (for AI assistants), a React component library (for apps), and a programmatic API (for scripts). All three share the same pattern library, selector, and rendering pipeline.
The key insight: left to its own devices, an LLM will suggest bar/line/pie for nearly everything. Dolex's pattern selector encodes design expertise as scoring rules — it knows when a bump chart, beeswarm, Sankey diagram, or waffle chart is the right answer.
MCP Protocol Layer
The Model Context Protocol lets AI assistants call external tools over a standardized JSON-RPC transport. Dolex runs as a stdio MCP server — the assistant's host process spawns it, sends tool calls over stdin, and reads results from stdout.
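On the wire, a tool call is ordinary JSON-RPC, one message per line. A simplified sketch of the request an assistant's host might send (the MCP SDK handles the actual framing; the payload here is illustrative):

```typescript
// Illustrative shape of a single MCP tool call crossing the stdio
// transport: one JSON-RPC 2.0 request per line.
const request = {
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/call',
  params: {
    name: 'list_patterns', // one of the Dolex tools listed below
    arguments: {},
  },
};

// Newline-delimited JSON is the stdio framing: one message per line.
const wire = JSON.stringify(request) + '\n';
```

The server replies with a matching JSON-RPC response on stdout, keyed by the same `id`.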
Server Identity
name: 'dolex'
version: '1.0.0'
Transport
StdioServerTransport
JSON-RPC 2.0 over stdin/stdout
For Claude Desktop, a shell wrapper (scripts/mcp-server.sh) sets up nvm PATH before launching the server, so Node.js is found regardless of the host environment.
All 17 Tools
The server registers 17 tools organized into five groups. Tools marked with App use MCP Apps for inline chart rendering.
Visualization
visualize (App)
Chart data from inline arrays, cached query results, or loaded CSVs. Provide exactly one data source: a data array, a resultId from query_data, or sourceId + sql to query a loaded CSV server-side (saves tokens). Scores all 43 patterns and returns a compact response with a specId, the recommended pattern plus reasoning, alternatives, and a data-shape summary. The full spec with data is stored server-side; the pre-rendered chart HTML goes to the UI via structuredContent. Set title and subtitle upfront to avoid a refine round-trip. Use maxAlternativeChartTypes to control how many alternatives are returned (default: 2; set 0 for none). Set includeDataTable to add a companion sortable data table with linked highlighting. The optional pattern parameter forces a specific chart type.
list_patterns
Returns all 43 patterns with descriptions, data requirements, and capabilities. The LLM uses this to understand what's available before calling visualize.
refine_visualization (App)
Tweaks a visualization by specId: sort, limit, flip axes, change colors, highlight values, update titles. Pass the specId from a previous visualize or refine call — no data round-trip needed. Use selectAlternative to switch to a different pattern from the original alternatives. Returns a compact response with just the specId and the changes applied (no spec echo).
CSV Data
load_csv
Loads a CSV file or directory by path. Datasets persist across server restarts — re-loading an existing dataset reconnects automatically. Set detail: "compact" to get just column names/types + row counts (saves tokens); the default "full" includes stats, top values, and sample rows.
list_data
Lists all loaded datasets with IDs and table counts.
remove_data
Removes a loaded dataset by ID.
describe_data
Re-examines a dataset mid-conversation. Set detail: "compact" for just column names/types + row counts (saves tokens); the default "full" includes numeric stats, categorical top values, and sample rows.
query_data
Runs a SQL query against a dataset and returns tabular results plus a resultId. Pass that resultId to visualize to chart the same data without re-sending rows (saves tokens). Supports JOINs, GROUP BY, window functions, CTEs, and custom aggregates (MEDIAN, STDDEV, P25, P75, P10, P90). Useful for exploration, validation, or query-then-visualize workflows.
analyze_data
Examines a loaded dataset and generates a structured analysis plan with ready-to-execute queries. Returns 4-6 analysis steps covering trends, comparisons, distributions, and relationships — each with a title, question, query, and suggested chart patterns. Use this after load_csv to get an automatic analysis plan.
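A typical query-then-visualize round trip, shown as hypothetical tool-call argument objects. The parameter names (resultId, title, maxAlternativeChartTypes) come from the tool descriptions above; the sourceId, SQL, and resultId values are made up for illustration:

```typescript
// Step 1: run a SQL query against a loaded CSV. query_data returns rows
// plus a resultId for reuse.
const queryCall = {
  tool: 'query_data',
  arguments: {
    sourceId: 'sales', // illustrative dataset id
    sql: 'SELECT region, SUM(revenue) AS total FROM orders GROUP BY region ORDER BY total DESC',
  },
};

// Step 2: chart the cached result by reference, so the rows are never
// re-sent through the conversation.
const visualizeCall = {
  tool: 'visualize',
  arguments: {
    resultId: 'res_123',          // illustrative id returned by query_data
    title: 'Revenue by Region',   // set upfront to avoid a refine round-trip
    maxAlternativeChartTypes: 0,  // skip alternatives to save tokens
  },
};
```

The second call costs only the size of these arguments, not the size of the data.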
Derived Columns
transform_data
Creates computed columns using expressions like zscore(revenue), rank(sales), or (revenue - cost) / revenue * 100. Columns start as session-only; promote them to persist across sessions.
promote_columns
Promotes working columns to derived (persisted). Derived columns are saved to .dolex.json and automatically restored when the CSV is reloaded.
list_transforms
Lists all columns for a table, grouped by layer: source (original CSV), derived (persisted), working (session-only).
drop_columns
Drops derived or working columns. Validates that no other columns depend on them before dropping.
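As a sketch of the semantics, an expression like zscore(revenue) computes the following. This is illustrative only; Dolex's actual expression engine is not shown:

```typescript
// Standardize a column: (value - mean) / standard deviation.
function zscore(values: number[]): number[] {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  // Population standard deviation (a common choice; the real engine may differ).
  const sd = Math.sqrt(values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n);
  // Guard against zero variance so constant columns map to 0, not NaN.
  return values.map((v) => (sd === 0 ? 0 : (v - mean) / sd));
}
```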
Privacy & Cache Management
server_status
Shows what data Dolex currently holds in memory: cached visualization specs (with data row counts), query result cache size, loaded CSV datasets, and server uptime. Use this to audit data retention.
clear_cache
Clears cached data from server memory. Scope options: "all" (specs + results), "specs" (visualization specs and their embedded data), "results" (query result cache). Use after working with sensitive data or to free memory.
Debugging & Export
export_html
Returns the full, self-contained HTML for a previously-created visualization. Pass a specId — the returned HTML is a complete document with embedded D3 and data, suitable for saving to a file or opening in a browser.
screenshot
Renders a visualization to a PNG image via headless Chromium and returns it base64-encoded. Pass a specId from any visualize or refine call. Requires Playwright to be installed.
Pattern Selector Intelligence
The selector (src/patterns/selector.ts) is the core IP. It analyzes data shape + user intent, scores every registered pattern against a set of hand-tuned selection rules, and returns ranked recommendations.
Key functions
| Function | Purpose |
|---|---|
| selectPattern(data, columns, intent, options?) | Primary entry point. Builds context, scores all patterns, returns ranked recommendations. |
| buildMatchContext(data, columns, intent) | Analyzes data into a PatternMatchContext: row count, column types, cardinality, time-series detection, hierarchy detection, negative values. |
| parseIntent(intent) | Classifies the intent string into a primary category (comparison, distribution, time, etc.) by keyword matching. Returns scores for all categories. |
| scorePattern(pattern, ctx, intentResult) | Evaluates one pattern: runs its selectionRules against the context, adds a category alignment boost, returns the total score plus matched rules. |
| selectColumnsForPattern(pattern, columns) | Picks the best columns for a pattern based on its category. Time patterns get a date first; comparison gets a categorical first; scatter gets two numerics; etc. |
| buildRecommendation(scored, data, columns) | Generates a VisualizationSpec by calling the pattern's generateSpec() with selected columns. Returns null if spec generation fails. |
Selection Rules
Each of the 43 patterns defines an array of selection rules. A rule has a condition (human-readable label), a weight (positive = boost, negative = penalty), and a matches(ctx) function that tests the data context.
```typescript
// Example: bar chart selection rule
{
  condition: 'Moderate categories (5-15)',
  weight: 30,
  matches: (ctx) =>
    ctx.dataShape.categoryCount >= 5 &&
    ctx.dataShape.categoryCount <= 15,
}
```
The scoring pipeline sums all matching rule weights, adds a category alignment boost when the inferred intent matches the pattern's category, and sorts. The top recommendation always wins — no randomness, fully deterministic.
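Sketched in code, that scoring step might look like the following. Type names and the boost constant are assumptions; only the rule shape mirrors the example:

```typescript
// Illustrative context: data shape plus the inferred intent category.
interface MatchContext {
  dataShape: { categoryCount: number };
  intentCategory: string; // e.g. 'comparison', 'distribution', 'time'
}

interface SelectionRule {
  condition: string; // human-readable label
  weight: number;    // positive = boost, negative = penalty
  matches: (ctx: MatchContext) => boolean;
}

const CATEGORY_ALIGNMENT_BOOST = 20; // assumed value, not from the source

function scorePattern(
  patternCategory: string,
  rules: SelectionRule[],
  ctx: MatchContext,
): { score: number; matchedRules: string[] } {
  let score = 0;
  const matchedRules: string[] = [];
  // Sum the weights of every rule whose predicate matches the data context.
  for (const rule of rules) {
    if (rule.matches(ctx)) {
      score += rule.weight;
      matchedRules.push(rule.condition);
    }
  }
  // Deterministic boost when the inferred intent aligns with the pattern's category.
  if (ctx.intentCategory === patternCategory) score += CATEGORY_ALIGNMENT_BOOST;
  return { score, matchedRules };
}
```

Because every rule is a pure predicate and the boost is a fixed constant, the same data and intent always produce the same ranking.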
Force Pattern Override
The pattern parameter (MCP) / forcePattern option (API) lets callers bypass scoring. The selector still runs the full pipeline (so alternatives are available), then promotes or constructs the forced pattern as the recommended result. Unknown IDs or spec failures fall back gracefully with a note in the reasoning.
Data Flow: visualize Tool
Here's what happens when the visualize tool is called.
Token-Efficient Response
The text content returned to the LLM is compact — just a specId, recommended pattern + title + reasoning, alternatives (pattern + reasoning only), and a data shape summary. The full spec with data is stored server-side in the SpecStore and never round-tripped through the conversation. The pre-rendered chart HTML goes to the UI via structuredContent.
For refinements, the LLM just passes back the specId — no data, no encoding, no config. The refine response is equally lean: just specId + an array of changes applied (no spec echo). This reduces a typical visualize + 2 refines workflow from ~36,500 tokens to ~3,000.
Additional token-saving features: set title and subtitle on the initial visualize call to avoid a refine round-trip just for titles. Use resultId from a query_data call to pass data by reference instead of re-sending rows. Use detail: "compact" on load_csv / describe_data to get minimal column metadata when full stats aren't needed.
Key functions
| Function | Purpose |
|---|---|
| handleVisualizeCore(selectPatterns) | Shared core for all visualize data paths. Takes resolved data + args, infers columns, threads forcePattern, applies color prefs, decides compound wrapping, generates HTML, stores the spec in SpecStore, returns a compact response. |
| handleVisualize(selectPatterns, deps?) | MCP handler factory. Resolves data from args.data (inline), args.resultId (cached query result), or args.sourceId + args.sql (server-side CSV query), then delegates to handleVisualizeCore. |
| inferColumns(data) | Heuristic column detection: numeric (>70% parseable numbers), date (key name or YYYY- prefix), id (high cardinality + id-like name), categorical (everything else). |
| applyColorPreferences(spec, prefs) | Mutates a spec to set the palette, highlight values, or color field from the colorPreferences parameter. |
| shouldCompound(spec, options) | Decides if a chart should be wrapped with a companion data table. Considers data size, pattern type, and caller preference. |
| buildCompoundSpec(spec, columns) | Wraps an atomic VisualizationSpec into a CompoundVisualizationSpec with chart + table views. |
| SpecStore.save(spec, columns, alternatives) | Stores the full spec + alternatives server-side and returns a specId. Max 100 entries with LRU eviction. Entries expire after 1 hour — a background timer purges stale data every 5 minutes. |
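The inferColumns() heuristic from the table can be sketched as follows. The 70% numeric threshold is stated above; the date and id tests are illustrative approximations of "key name or YYYY- prefix" and "high cardinality + id-like name":

```typescript
type ColumnType = 'numeric' | 'date' | 'id' | 'categorical';

function inferColumnType(name: string, values: string[]): ColumnType {
  if (values.length === 0) return 'categorical';
  // Numeric: more than 70% of values parse as numbers.
  const parseable = values.filter((v) => v !== '' && !Number.isNaN(Number(v))).length;
  if (parseable / values.length > 0.7) return 'numeric';
  // Date: date-like column name, or values with a YYYY- prefix.
  if (/date|time|month|year/i.test(name) || values.every((v) => /^\d{4}-/.test(v))) {
    return 'date';
  }
  // Id: id-like name with high cardinality (nearly every value unique).
  const unique = new Set(values).size;
  if (/(^|_)id$/i.test(name) && unique / values.length > 0.9) return 'id';
  // Everything else is categorical.
  return 'categorical';
}
```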
CSV Mode
CSV mode is Dolex's answer to the biggest problem in AI-driven data visualization: token cost. A typical CSV might have 500,000 rows — passing that through the AI assistant's context window would burn millions of tokens per visualization. CSV mode ensures the raw data never touches the AI assistant at all.
The Token-Efficient Architecture
The AI assistant never sees your data. It sees a compact profile (column types, stats, top values, a few sample rows) and writes a tiny query. The Dolex server — running locally on your machine — handles the actual data, runs the query, selects the pattern, and renders the chart. The result that flows back through the assistant is a spec + pre-rendered HTML.
Result: Visualizing a 500,000-row CSV costs roughly 500 tokens total (profile + query + compact specId response) instead of millions of tokens for inline data. Refinements cost ~100 tokens each (just specId + refinement text). Your CSV files stay on your machine, the AI assistant works from the profile, and Dolex does the heavy lifting.
CSV Persistence
Loaded CSVs persist to ~/.dolex/sources.json automatically. On server restart, list_data shows previously-loaded datasets immediately. The first query triggers a lazy reload — no upfront overhead. Re-calling load_csv with an existing name is idempotent: it reloads and returns the schema without duplicating the entry.
Compact vs. Full Schema
Both load_csv and describe_data accept a detail parameter. In "full" mode (default), you get numeric stats, categorical top values, and sample rows. In "compact" mode, you get just column names, types, and row counts — perfect for multi-table sources where the assistant already knows the schema and just needs a quick reminder.
What the AI Assistant Sees vs. What It Doesn't
Sees (small, via tool responses)
- Column names and inferred types
- Numeric stats, top values, samples (full mode)
- Column names + types + row counts only (compact mode)
- Table names and row counts
- specId + recommended pattern + reasoning (no data)
- resultId for query-then-visualize reuse
Never sees (stays on your machine)
- The actual data rows
- Internal queries (compiled and run server-side)
- Query result sets (fed directly into renderer)
- The rendered chart HTML (goes to UI, not LLM)
- CSV file paths beyond the dataset ID
The Server-Side Pipeline
SQL Capabilities
Claude writes standard SQL. Dolex runs it locally via SQLite with some handy extensions.
Aggregations
SUM, AVG, COUNT, MIN, MAX
MEDIAN, STDDEV, P10, P25, P75, P90
Window Functions
ROW_NUMBER(), RANK(), DENSE_RANK()
LAG(), LEAD(), SUM() OVER()
JOINs
SELECT * FROM orders JOIN customers ON ...
CTEs
WITH totals AS (...) SELECT ...
Key functions
| Function | Purpose |
|---|---|
| SourceManager(persistPath?) | Constructor. If persistPath is set, loads saved sources from disk on startup and auto-saves on every mutation. |
| SourceManager.add(name, config) | Registers a CSV dataset and persists it. Returns a dataset ID. Does NOT load the data yet (lazy). |
| SourceManager.connect(sourceId) | Loads the CSV into memory for querying. Called lazily on first query. |
| SourceManager.getSchema(sourceId) | Returns tables and column profiles. Full mode: types, stats, top values, samples. Compact mode: just names and types. |
| SourceManager.querySql(sourceId, sql, maxRows?) | Executes a SQL SELECT against the data and returns result rows. Supports custom aggregates (MEDIAN, STDDEV, P25, P75, P10, P90). Results are cached with a resultId for reuse. |
Compound Visualizations
A compound visualization pairs a chart with a sortable data table and links them with interactive highlighting. When you hover a bar, the corresponding table row highlights — and vice versa.
Chart View
Table View
The interaction bus uses postMessage between the parent compound document and the chart iframe. The compound HTML builder produces a single self-contained document with CSS grid layout.
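A minimal sketch of that postMessage bus follows. The message type and field names are assumptions; the real protocol inside Dolex may differ:

```typescript
// Message exchanged between the compound document and the chart iframe.
type HighlightMessage = { type: 'dolex:highlight'; key: string | null };

// Structural iframe type so the sketch stays DOM-library-free.
interface ChartFrame {
  contentWindow: { postMessage(msg: unknown, targetOrigin: string): void } | null;
}

// Parent side: forward a table-row hover into the chart iframe.
function sendHighlight(frame: ChartFrame, key: string | null): void {
  const msg: HighlightMessage = { type: 'dolex:highlight', key };
  frame.contentWindow?.postMessage(msg, '*');
}

// Chart side: validate an incoming message before reacting to it.
function parseHighlight(data: unknown): HighlightMessage | null {
  const msg = data as HighlightMessage | null;
  return msg && msg.type === 'dolex:highlight' ? msg : null;
}
```

A key of null would clear the highlight in both views; the reverse direction (chart hover to table row) uses the same message shape.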
Rendering Pipeline
Three rendering targets share a single source of truth: the D3 renderers.
The Build Pipeline
HTML builders are 2-line wrapper files (html/render/*.render.ts) that re-export from D3 renderers. The npm run build:bundles command uses esbuild to compile each renderer into an IIFE string stored in _generated/bundles.ts.
This means there's one source of truth for each chart's rendering logic. Edit a D3 renderer, run npm run build:bundles, and both React and HTML outputs pick up the change.
D3 Renderer Organization
comparison/ (9 patterns)
bar (vertical + horizontal), diverging-bar, slope-chart, connected-dot-plot, bump-chart, lollipop, bullet, grouped-bar, waterfall
distribution/ (7 patterns)
histogram, beeswarm, violin, ridgeline, strip-plot, box-plot, density-plot
composition/ (9 patterns)
stacked-bar, waffle, treemap, sunburst, circle-pack, metric, donut, marimekko, icicle
time/ (7 patterns)
line, area, small-multiples, sparkline-grid, calendar-heatmap, stream-graph, horizon-chart
relationship/ (5 patterns)
scatter, connected-scatter, parallel-coordinates, radar, heatmap
flow/ (4 patterns)
sankey, alluvial, chord, funnel
geo/ (2 patterns)
choropleth, proportional-symbol
MCP Apps Integration
MCP Apps lets tools render rich UI inline in the AI assistant. Dolex uses this to show charts directly in the Claude Desktop conversation instead of just returning JSON.
How the App Shell Works
The app shell (src/mcp/app-shell.ts) is a minimal HTML page registered as a resource at ui://dolex/chart.html. Claude Desktop loads it once, then for each tool call:
- The host sends a ui/notifications/tool-result notification with the chart HTML
- The shell creates a nested srcdoc iframe with the chart HTML
- The shell sends ui/notifications/size-changed with the desired height (500px for charts, 700px for compound views)
CSP Policy
Both the tool's _meta.ui.csp and the resource content's _meta.ui.csp must allow d3js.org (the D3.js library). Geo map data and the TopoJSON parser are bundled inline.
Dual Response
Each tool call returns both content (compact JSON with specId + metadata) and structuredContent (the HTML the user sees). The LLM never sees the rendered chart or the data — it works from the specId and pattern metadata.
Color System
Colors are applied through the colorPreferences parameter on the visualize tool, or via natural language refinement with refine_visualization (using a specId from a previous call).
Named Palettes
blueRed, greenPurple, tealOrange, redGreen
traffic-light, profit-loss, temperature
Highlight Mode
Calls out specific values with an accent color; set highlight values in colorPreferences.
Color Field
Colors marks by a data column; set the color field in colorPreferences.
The applyColorPreferences() function mutates the spec's encoding.color before HTML generation. All palettes are WCAG AA compliant and tested for colorblind safety.
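A sketch of that mutation, with illustrative spec and preference shapes (the real types live in the Dolex source):

```typescript
// Caller-supplied color preferences (field names are illustrative).
interface ColorPreferences {
  palette?: string;     // e.g. 'tealOrange'
  highlight?: string[]; // category values to call out
  colorField?: string;  // column to color marks by
}

interface ColorEncoding {
  palette?: string;
  highlight?: string[];
  field?: string;
}

interface Spec {
  encoding: { color: ColorEncoding };
}

// Mutates the spec in place before HTML generation, as the text describes.
function applyColorPreferences(spec: Spec, prefs: ColorPreferences): void {
  if (prefs.palette) spec.encoding.color.palette = prefs.palette;
  if (prefs.highlight) spec.encoding.color.highlight = prefs.highlight;
  if (prefs.colorField) spec.encoding.color.field = prefs.colorField;
}
```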
Data Privacy & Cache Management
Dolex runs entirely on your machine. No telemetry, no analytics, no data sent to Anthropic or any third party. This section explains what data the server holds and how to manage it.
What's in Memory
The MCP server is a long-lived process — it starts when Claude Desktop launches and stays running for the session. Data accumulates across tool calls and is automatically cleaned up by TTL-based expiration.
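TTL plus LRU cleanup of this kind can be sketched as a small store class. The 1-hour TTL and 100-entry cap below follow the Spec Store figures quoted elsewhere on this page; the implementation itself is illustrative, not Dolex's actual code:

```typescript
class TtlLruStore<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private maxEntries = 100,
    private ttlMs = 60 * 60 * 1000, // 1 hour
  ) {}

  save(id: string, value: V): void {
    // Evict the least-recently-used entry when at capacity.
    if (!this.entries.has(id) && this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(id, { value, expiresAt: Date.now() + this.ttlMs });
  }

  get(id: string): V | undefined {
    const entry = this.entries.get(id);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(id); // expired entries are dropped on access
      return undefined;
    }
    // Re-insert so Map insertion order doubles as recency order.
    this.entries.delete(id);
    this.entries.set(id, entry);
    return entry.value;
  }

  // A background timer would call this periodically to purge stale data.
  purgeExpired(now = Date.now()): number {
    let purged = 0;
    for (const [id, entry] of this.entries) {
      if (now > entry.expiresAt) {
        this.entries.delete(id);
        purged++;
      }
    }
    return purged;
  }
}
```

Using a Map's insertion order as the recency order keeps the sketch dependency-free; a production store might track recency explicitly.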
Spec Store
Full visualization specs (with embedded data) from visualize calls. 1-hour TTL, max 100 entries with LRU eviction.
Result Cache
Cached results from query_data calls. 10-minute TTL, max 20 entries. Auto-evicted on access.
Loaded CSVs
Stay in memory until remove_data or server restart.
Inspecting & Clearing Data
Two tools give you visibility and control over cached data:
server_status
Shows cached visualization specs, query result cache size, loaded datasets, and server uptime.
clear_cache
Clears cached data by scope: "all", "specs", or "results". Use after working with sensitive data.
HTML Output
Every chart embeds its data as JSON in the HTML document for client-side rendering. Data is capped at 10,000 rows per view to limit exposure. The HTML lives in a sandboxed iframe and is not persisted by the MCP host.
CSV Persistence
Loaded CSV paths persist to ~/.dolex/sources.json automatically. Use remove_data to delete entries when done.
What Leaves Your Machine
| Connection | What's Sent |
|---|---|
D3.js CDN (d3js.org) | HTTP GET only — loads the D3.js charting library at render time |
No telemetry. No analytics. No data sent to Anthropic or any third party. Pattern selection, data processing, and chart rendering all happen locally in the Dolex server process on your machine.