How Dolex Works
Dolex turns CSV files into expert-level visualizations. This page walks through how the system works under the hood — from how Claude communicates with Dolex to how charts get rendered.
The Big Picture
Dolex is an npm package that exposes visualization intelligence through three interfaces: an MCP server (for AI assistants), a React component library (for apps), and a programmatic API (for scripts). All three share the same pattern library, selector, and rendering pipeline.
The key insight: left to its own devices, an LLM will suggest bar/line/pie for nearly everything. Dolex's pattern selector encodes design expertise as scoring rules — it knows when a bump chart, beeswarm, Sankey diagram, or waffle chart is the right answer.
MCP Protocol Layer
The Model Context Protocol lets AI assistants call external tools over a standardized JSON-RPC transport. Dolex runs as a stdio MCP server — the assistant's host process spawns it, sends tool calls over stdin, and reads results from stdout.
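On the wire, a tool call is ordinary JSON-RPC, one message per line. A simplified sketch of the request an assistant's host might send (the MCP SDK handles the actual framing; the payload here is illustrative):

```typescript
// Illustrative shape of a single MCP tool call crossing the stdio
// transport: one JSON-RPC 2.0 request per line.
const request = {
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/call',
  params: {
    name: 'list_patterns', // one of the Dolex tools listed below
    arguments: {},
  },
};

// Newline-delimited JSON is the stdio framing: one message per line.
const wire = JSON.stringify(request) + '\n';
```

The server replies with a matching JSON-RPC response on stdout, keyed by the same `id`.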
Server Identity
name: 'dolex'
version: '1.0.0'
Transport
StdioServerTransport
JSON-RPC 2.0 over stdin/stdout
For Claude Desktop, a shell wrapper (scripts/mcp-server.sh) sets up nvm PATH before launching the server, so Node.js is found regardless of the host environment.
All 17 Tools
The server registers 17 tools organized into five groups. Tools marked with App use MCP Apps for inline chart rendering.
Visualization
visualize (App)
Chart data from inline arrays, cached query results, or loaded CSVs. Provide exactly one data source: a data array, a resultId from query_data, or sourceId + sql to query a loaded CSV server-side (saves tokens). Scores all 43 patterns and returns a compact response with a specId, the recommended pattern plus reasoning, alternatives, and a data-shape summary. The full spec with data is stored server-side; the pre-rendered chart HTML goes to the UI via structuredContent. Set title and subtitle upfront to avoid a refine round-trip. Use maxAlternativeChartTypes to control how many alternatives are returned (default: 2; set 0 for none). Set includeDataTable to add a companion sortable data table with linked highlighting. The optional pattern parameter forces a specific chart type.
list_patterns
Returns all 43 patterns with descriptions, data requirements, and capabilities. The LLM uses this to understand what's available before calling visualize.
refine_visualization (App)
Tweaks a visualization by specId: sort, limit, flip axes, change colors, highlight values, update titles. Pass the specId from a previous visualize or refine call — no data round-trip needed. Use selectAlternative to switch to a different pattern from the original alternatives. Returns a compact response with just the specId and the changes applied (no spec echo).
CSV Data
load_csv
Loads a CSV file or directory by path. Datasets persist across server restarts — re-loading an existing dataset reconnects automatically. Set detail: "compact" to get just column names/types + row counts (saves tokens); the default "full" includes stats, top values, and sample rows.
list_data
Lists all loaded datasets with IDs and table counts.
remove_data
Removes a loaded dataset by ID.
describe_data
Re-examines a dataset mid-conversation. Set detail: "compact" for just column names/types + row counts (saves tokens); the default "full" includes numeric stats, categorical top values, and sample rows.
query_data
Runs a SQL query against a dataset and returns tabular results plus a resultId. Pass that resultId to visualize to chart the same data without re-sending rows (saves tokens). Supports JOINs, GROUP BY, window functions, CTEs, and custom aggregates (MEDIAN, STDDEV, P25, P75, P10, P90). Useful for exploration, validation, or query-then-visualize workflows.
analyze_data
Examines a loaded dataset and generates a structured analysis plan with ready-to-execute queries. Returns 4-6 analysis steps covering trends, comparisons, distributions, and relationships — each with a title, question, query, and suggested chart patterns. Use this after load_csv to get an automatic analysis plan.
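A typical query-then-visualize round trip, shown as hypothetical tool-call argument objects. The parameter names (resultId, title, maxAlternativeChartTypes) come from the tool descriptions above; the sourceId, SQL, and resultId values are made up for illustration:

```typescript
// Step 1: run a SQL query against a loaded CSV. query_data returns rows
// plus a resultId for reuse.
const queryCall = {
  tool: 'query_data',
  arguments: {
    sourceId: 'sales', // illustrative dataset id
    sql: 'SELECT region, SUM(revenue) AS total FROM orders GROUP BY region ORDER BY total DESC',
  },
};

// Step 2: chart the cached result by reference, so the rows are never
// re-sent through the conversation.
const visualizeCall = {
  tool: 'visualize',
  arguments: {
    resultId: 'res_123',          // illustrative id returned by query_data
    title: 'Revenue by Region',   // set upfront to avoid a refine round-trip
    maxAlternativeChartTypes: 0,  // skip alternatives to save tokens
  },
};
```

The second call costs only the size of these arguments, not the size of the data.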
Derived Columns
transform_data
Creates computed columns using expressions like zscore(revenue), rank(sales), or (revenue - cost) / revenue * 100. Columns start as session-only; promote them to persist across sessions.
promote_columns
Promotes working columns to derived (persisted). Derived columns are saved to .dolex.json and automatically restored when the CSV is reloaded.
list_transforms
Lists all columns for a table, grouped by layer: source (original CSV), derived (persisted), working (session-only).
drop_columns
Drops derived or working columns. Validates that no other columns depend on them before dropping.
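As a sketch of the semantics, an expression like zscore(revenue) computes the following. This is illustrative only; Dolex's actual expression engine is not shown:

```typescript
// Standardize a column: (value - mean) / standard deviation.
function zscore(values: number[]): number[] {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  // Population standard deviation (a common choice; the real engine may differ).
  const sd = Math.sqrt(values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n);
  // Guard against zero variance so constant columns map to 0, not NaN.
  return values.map((v) => (sd === 0 ? 0 : (v - mean) / sd));
}
```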
Privacy & Cache Management
server_status
Shows what data Dolex currently holds in memory: cached visualization specs (with data row counts), query result cache size, loaded CSV datasets, and server uptime. Use this to audit data retention.
clear_cache
Clears cached data from server memory. Scope options: "all" (specs + results), "specs" (visualization specs and their embedded data), "results" (query result cache). Use after working with sensitive data or to free memory.
Debugging & Export
export_html
Returns the full, self-contained HTML for a previously-created visualization. Pass a specId — the returned HTML is a complete document with embedded D3 and data, suitable for saving to a file or opening in a browser.
screenshot
Renders a visualization to a PNG image via headless Chromium and returns it base64-encoded. Pass a specId from any visualize or refine call. Requires Playwright to be installed.
Pattern Selector Intelligence
The selector (src/patterns/selector.ts) is the core IP. It analyzes data shape + user intent, scores every registered pattern against a set of hand-tuned selection rules, and returns ranked recommendations.
Key functions
| Function | Purpose |
|---|---|
| selectPattern(data, columns, intent, options?) | Primary entry point. Builds context, scores all patterns, returns ranked recommendations. |
| buildMatchContext(data, columns, intent) | Analyzes data into a PatternMatchContext: row count, column types, cardinality, time-series detection, hierarchy detection, negative values. |
| parseIntent(intent) | Classifies the intent string into a primary category (comparison, distribution, time, etc.) by keyword matching. Returns scores for all categories. |
| scorePattern(pattern, ctx, intentResult) | Evaluates one pattern: runs its selectionRules against the context, adds a category alignment boost, returns the total score plus matched rules. |
| selectColumnsForPattern(pattern, columns) | Picks the best columns for a pattern based on its category. Time patterns get a date first; comparison gets a categorical first; scatter gets two numerics; etc. |
| buildRecommendation(scored, data, columns) | Generates a VisualizationSpec by calling the pattern's generateSpec() with selected columns. Returns null if spec generation fails. |
Selection Rules
Each of the 43 patterns defines an array of selection rules. A rule has a condition (human-readable label), a weight (positive = boost, negative = penalty), and a matches(ctx) function that tests the data context.
```typescript
// Example: bar chart selection rule
{
  condition: 'Moderate categories (5-15)',
  weight: 30,
  matches: (ctx) =>
    ctx.dataShape.categoryCount >= 5 &&
    ctx.dataShape.categoryCount <= 15,
}
```
The scoring pipeline sums all matching rule weights, adds a category alignment boost when the inferred intent matches the pattern's category, and sorts. The top recommendation always wins — no randomness, fully deterministic.
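Sketched in code, that scoring step might look like the following. Type names and the boost constant are assumptions; only the rule shape mirrors the example:

```typescript
// Illustrative context: data shape plus the inferred intent category.
interface MatchContext {
  dataShape: { categoryCount: number };
  intentCategory: string; // e.g. 'comparison', 'distribution', 'time'
}

interface SelectionRule {
  condition: string; // human-readable label
  weight: number;    // positive = boost, negative = penalty
  matches: (ctx: MatchContext) => boolean;
}

const CATEGORY_ALIGNMENT_BOOST = 20; // assumed value, not from the source

function scorePattern(
  patternCategory: string,
  rules: SelectionRule[],
  ctx: MatchContext,
): { score: number; matchedRules: string[] } {
  let score = 0;
  const matchedRules: string[] = [];
  // Sum the weights of every rule whose predicate matches the data context.
  for (const rule of rules) {
    if (rule.matches(ctx)) {
      score += rule.weight;
      matchedRules.push(rule.condition);
    }
  }
  // Deterministic boost when the inferred intent aligns with the pattern's category.
  if (ctx.intentCategory === patternCategory) score += CATEGORY_ALIGNMENT_BOOST;
  return { score, matchedRules };
}
```

Because every rule is a pure predicate and the boost is a fixed constant, the same data and intent always produce the same ranking.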
Force Pattern Override
The pattern parameter (MCP) / forcePattern option (API) lets callers bypass scoring. The selector still runs the full pipeline (so alternatives are available), then promotes or constructs the forced pattern as the recommended result. Unknown IDs or spec failures fall back gracefully with a note in the reasoning.
Data Flow: visualize Tool
Here's what happens when the visualize tool is called.
Token-Efficient Response
The text content returned to the LLM is compact — just a specId, recommended pattern + title + reasoning, alternatives (pattern + reasoning only), and a data shape summary. The full spec with data is stored server-side in the SpecStore and never round-tripped through the conversation. The pre-rendered chart HTML goes to the UI via structuredContent.
For refinements, the LLM just passes back the specId — no data, no encoding, no config. The refine response is equally lean: just specId + an array of changes applied (no spec echo). This reduces a typical visualize + 2 refines workflow from ~36,500 tokens to ~3,000.
Additional token-saving features: set title and subtitle on the initial visualize call to avoid a refine round-trip just for titles. Use resultId from a query_data call to pass data by reference instead of re-sending rows. Use detail: "compact" on load_csv / describe_data to get minimal column metadata when full stats aren't needed.
Key functions
| Function | Purpose |
|---|---|
| handleVisualizeCore(selectPatterns) | Shared core for all visualize data paths. Takes resolved data + args, infers columns, threads forcePattern, applies color prefs, decides compound wrapping, generates HTML, stores the spec in SpecStore, returns a compact response. |
| handleVisualize(selectPatterns, deps?) | MCP handler factory. Resolves data from args.data (inline), args.resultId (cached query result), or args.sourceId + args.sql (server-side CSV query), then delegates to handleVisualizeCore. |
| inferColumns(data) | Heuristic column detection: numeric (>70% parseable numbers), date (key name or YYYY- prefix), id (high cardinality + id-like name), categorical (everything else). |
| applyColorPreferences(spec, prefs) | Mutates a spec to set the palette, highlight values, or color field from the colorPreferences parameter. |
| shouldCompound(spec, options) | Decides if a chart should be wrapped with a companion data table. Considers data size, pattern type, and caller preference. |
| buildCompoundSpec(spec, columns) | Wraps an atomic VisualizationSpec into a CompoundVisualizationSpec with chart + table views. |
| SpecStore.save(spec, columns, alternatives) | Stores the full spec + alternatives server-side and returns a specId. Max 100 entries with LRU eviction. Entries expire after 1 hour — a background timer purges stale data every 5 minutes. |
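The inferColumns() heuristic from the table can be sketched as follows. The 70% numeric threshold is stated above; the date and id tests are illustrative approximations of "key name or YYYY- prefix" and "high cardinality + id-like name":

```typescript
type ColumnType = 'numeric' | 'date' | 'id' | 'categorical';

function inferColumnType(name: string, values: string[]): ColumnType {
  if (values.length === 0) return 'categorical';
  // Numeric: more than 70% of values parse as numbers.
  const parseable = values.filter((v) => v !== '' && !Number.isNaN(Number(v))).length;
  if (parseable / values.length > 0.7) return 'numeric';
  // Date: date-like column name, or values with a YYYY- prefix.
  if (/date|time|month|year/i.test(name) || values.every((v) => /^\d{4}-/.test(v))) {
    return 'date';
  }
  // Id: id-like name with high cardinality (nearly every value unique).
  const unique = new Set(values).size;
  if (/(^|_)id$/i.test(name) && unique / values.length > 0.9) return 'id';
  // Everything else is categorical.
  return 'categorical';
}
```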
CSV Mode
CSV mode is Dolex's answer to the biggest problem in AI-driven data visualization: token cost. A typical CSV might have 500,000 rows — passing that through the AI assistant's context window would burn millions of tokens per visualization. CSV mode ensures the raw data never touches the AI assistant at all.
The Token-Efficient Architecture
The AI assistant never sees your data. It sees a compact profile (column types, stats, top values, a few sample rows) and writes a tiny query. The Dolex server — running locally on your machine — handles the actual data, runs the query, selects the pattern, and renders the chart. The result that flows back through the assistant is a spec + pre-rendered HTML.
Result: Visualizing a 500,000-row CSV costs roughly 500 tokens total (profile + query + compact specId response) instead of millions of tokens for inline data. Refinements cost ~100 tokens each (just specId + refinement text). Your CSV files stay on your machine, the AI assistant works from the profile, and Dolex does the heavy lifting.
CSV Persistence
Loaded CSVs persist to ~/.dolex/sources.json automatically. On server restart, list_data shows previously-loaded datasets immediately. The first query triggers a lazy reload — no upfront overhead. Re-calling load_csv with an existing name is idempotent: it reloads and returns the schema without duplicating the entry.
Compact vs. Full Schema
Both load_csv and describe_data accept a detail parameter. In "full" mode (default), you get numeric stats, categorical top values, and sample rows. In "compact" mode, you get just column names, types, and row counts — perfect for multi-table sources where the assistant already knows the schema and just needs a quick reminder.
What the AI Assistant Sees vs. What It Doesn't
Sees (small, via tool responses)
- Column names and inferred types
- Numeric stats, top values, samples (full mode)
- Column names + types + row counts only (compact mode)
- Table names and row counts
- specId + recommended pattern + reasoning (no data)
- resultId for query-then-visualize reuse
Never sees (stays on your machine)
- The actual data rows
- Internal queries (compiled and run server-side)
- Query result sets (fed directly into renderer)
- The rendered chart HTML (goes to UI, not LLM)
- CSV file paths beyond the dataset ID
The Server-Side Pipeline
SQL Capabilities
Claude writes standard SQL. Dolex runs it locally via SQLite with some handy extensions.
Aggregations
SUM, AVG, COUNT, MIN, MAX
MEDIAN, STDDEV, P10, P25, P75, P90
Window Functions
ROW_NUMBER(), RANK(), DENSE_RANK()
LAG(), LEAD(), SUM() OVER()
JOINs
SELECT * FROM orders JOIN customers ON ...
CTEs
WITH totals AS (...) SELECT ...
Key functions
| Function | Purpose |
|---|---|
| SourceManager(persistPath?) | Constructor. If persistPath is set, loads saved sources from disk on startup and auto-saves on every mutation. |
| SourceManager.add(name, config) | Registers a CSV dataset and persists it. Returns a dataset ID. Does NOT load the data yet (lazy). |
| SourceManager.connect(sourceId) | Loads the CSV into memory for querying. Called lazily on first query. |
| SourceManager.getSchema(sourceId) | Returns tables and column profiles. Full mode: types, stats, top values, samples. Compact mode: just names and types. |
| SourceManager.querySql(sourceId, sql, maxRows?) | Executes a SQL SELECT against the data and returns result rows. Supports custom aggregates (MEDIAN, STDDEV, P25, P75, P10, P90). Results are cached with a resultId for reuse. |
Compound Visualizations
A compound visualization pairs a chart with a sortable data table and links them with interactive highlighting. When you hover a bar, the corresponding table row highlights — and vice versa.
Chart View
Table View
The interaction bus uses postMessage between the parent compound document and the chart iframe. The compound HTML builder produces a single self-contained document with CSS grid layout.
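A minimal sketch of that postMessage bus follows. The message type and field names are assumptions; the real protocol inside Dolex may differ:

```typescript
// Message exchanged between the compound document and the chart iframe.
type HighlightMessage = { type: 'dolex:highlight'; key: string | null };

// Structural iframe type so the sketch stays DOM-library-free.
interface ChartFrame {
  contentWindow: { postMessage(msg: unknown, targetOrigin: string): void } | null;
}

// Parent side: forward a table-row hover into the chart iframe.
function sendHighlight(frame: ChartFrame, key: string | null): void {
  const msg: HighlightMessage = { type: 'dolex:highlight', key };
  frame.contentWindow?.postMessage(msg, '*');
}

// Chart side: validate an incoming message before reacting to it.
function parseHighlight(data: unknown): HighlightMessage | null {
  const msg = data as HighlightMessage | null;
  return msg && msg.type === 'dolex:highlight' ? msg : null;
}
```

A key of null would clear the highlight in both views; the reverse direction (chart hover to table row) uses the same message shape.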
Rendering Pipeline
Three rendering targets share a single source of truth: the D3 renderers.
The Build Pipeline
HTML builders are 2-line wrapper files (html/render/*.render.ts) that re-export from D3 renderers. The npm run build:bundles command uses esbuild to compile each renderer into an IIFE string stored in _generated/bundles.ts.
This means there's one source of truth for each chart's rendering logic. Edit a D3 renderer, run npm run build:bundles, and both React and HTML outputs pick up the change.
D3 Renderer Organization
comparison/ (9 patterns)
bar (vertical + horizontal), diverging-bar, slope-chart, connected-dot-plot, bump-chart, lollipop, bullet, grouped-bar, waterfall
distribution/ (7 patterns)
histogram, beeswarm, violin, ridgeline, strip-plot, box-plot, density-plot
composition/ (9 patterns)
stacked-bar, waffle, treemap, sunburst, circle-pack, metric, donut, marimekko, icicle
time/ (7 patterns)
line, area, small-multiples, sparkline-grid, calendar-heatmap, stream-graph, horizon-chart
relationship/ (5 patterns)
scatter, connected-scatter, parallel-coordinates, radar, heatmap
flow/ (4 patterns)
sankey, alluvial, chord, funnel
geo/ (2 patterns)
choropleth, proportional-symbol
MCP Apps Integration
MCP Apps lets tools render rich UI inline in the AI assistant. Dolex uses this to show charts directly in the Claude Desktop conversation instead of just returning JSON.
How the App Shell Works
The app shell (src/mcp/app-shell.ts) is a minimal HTML page registered as a resource at ui://dolex/chart.html. Claude Desktop loads it once, then for each tool call:
- The host sends a ui/notifications/tool-result notification with the chart HTML
- The shell creates a nested srcdoc iframe with the chart HTML
- The shell sends ui/notifications/size-changed with the desired height (500px for charts, 700px for compound views)
CSP Policy
Both the tool's _meta.ui.csp and the resource content's _meta.ui.csp must allow d3js.org (the D3.js library). Geo map data and the TopoJSON parser are bundled inline.
Dual Response
Each tool call returns both content (compact JSON with specId + metadata) and structuredContent (the HTML the user sees). The LLM never sees the rendered chart or the data — it works from the specId and pattern metadata.
Color System
Colors are applied through the colorPreferences parameter on the visualize tool, or via natural language refinement with refine_visualization (using a specId from a previous call).
Named Palettes
blueRed, greenPurple, tealOrange, redGreen
traffic-light, profit-loss, temperature
Highlight Mode
Calls out specific values with an accent color; set highlight values in colorPreferences.
Color Field
Colors marks by a data column; set the color field in colorPreferences.
The applyColorPreferences() function mutates the spec's encoding.color before HTML generation. All palettes are WCAG AA compliant and tested for colorblind safety.
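A sketch of that mutation, with illustrative spec and preference shapes (the real types live in the Dolex source):

```typescript
// Caller-supplied color preferences (field names are illustrative).
interface ColorPreferences {
  palette?: string;     // e.g. 'tealOrange'
  highlight?: string[]; // category values to call out
  colorField?: string;  // column to color marks by
}

interface ColorEncoding {
  palette?: string;
  highlight?: string[];
  field?: string;
}

interface Spec {
  encoding: { color: ColorEncoding };
}

// Mutates the spec in place before HTML generation, as the text describes.
function applyColorPreferences(spec: Spec, prefs: ColorPreferences): void {
  if (prefs.palette) spec.encoding.color.palette = prefs.palette;
  if (prefs.highlight) spec.encoding.color.highlight = prefs.highlight;
  if (prefs.colorField) spec.encoding.color.field = prefs.colorField;
}
```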
Data Privacy & Cache Management
Dolex runs entirely on your machine. No telemetry, no analytics, no data sent to Anthropic or any third party. This section explains what data the server holds and how to manage it.
What's in Memory
The MCP server is a long-lived process — it starts when Claude Desktop launches and stays running for the session. Data accumulates across tool calls and is automatically cleaned up by TTL-based expiration.
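TTL plus LRU cleanup of this kind can be sketched as a small store class. The 1-hour TTL and 100-entry cap below follow the Spec Store figures quoted elsewhere on this page; the implementation itself is illustrative, not Dolex's actual code:

```typescript
class TtlLruStore<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private maxEntries = 100,
    private ttlMs = 60 * 60 * 1000, // 1 hour
  ) {}

  save(id: string, value: V): void {
    // Evict the least-recently-used entry when at capacity.
    if (!this.entries.has(id) && this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(id, { value, expiresAt: Date.now() + this.ttlMs });
  }

  get(id: string): V | undefined {
    const entry = this.entries.get(id);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(id); // expired entries are dropped on access
      return undefined;
    }
    // Re-insert so Map insertion order doubles as recency order.
    this.entries.delete(id);
    this.entries.set(id, entry);
    return entry.value;
  }

  // A background timer would call this periodically to purge stale data.
  purgeExpired(now = Date.now()): number {
    let purged = 0;
    for (const [id, entry] of this.entries) {
      if (now > entry.expiresAt) {
        this.entries.delete(id);
        purged++;
      }
    }
    return purged;
  }
}
```

Using a Map's insertion order as the recency order keeps the sketch dependency-free; a production store might track recency explicitly.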
Spec Store
Full visualization specs (with embedded data) from visualize calls. 1-hour TTL, max 100 entries with LRU eviction.
Result Cache
Cached results from query_data calls. 10-minute TTL, max 20 entries. Auto-evicted on access.
Loaded CSVs
Stay in memory until remove_data or server restart.
Inspecting & Clearing Data
Two tools give you visibility and control over cached data:
server_status
Shows cached visualization specs, query result cache size, loaded datasets, and server uptime.
clear_cache
Clears cached data by scope: "all", "specs", or "results". Use after working with sensitive data.
HTML Output
Every chart embeds its data as JSON in the HTML document for client-side rendering. Data is capped at 10,000 rows per view to limit exposure. The HTML lives in a sandboxed iframe and is not persisted by the MCP host.
CSV Persistence
Loaded CSV paths persist to ~/.dolex/sources.json automatically. Use remove_data to delete entries when done.
What Leaves Your Machine
| Connection | What's Sent |
|---|---|
D3.js CDN (d3js.org) | HTTP GET only — loads the D3.js charting library at render time |
No telemetry. No analytics. No data sent to Anthropic or any third party. Pattern selection, data processing, and chart rendering all happen locally in the Dolex server process on your machine.