Core Concepts

Understand how Ballast organizes data, manages access, and powers search.

Collections

A collection is the fundamental unit in Ballast—a searchable index that contains data from one or more sources. Each collection:

Has its own vector embedding space
Can contain data from multiple sources (Slack + Google Drive + PostgreSQL, for example)
Has independent access controls (members with viewer/editor/admin roles)
Exposes its own MCP server for AI agent integration
Can be searched via API with a collection-scoped API key

Think of collections as purpose-built knowledge bases. You might have separate collections for “Engineering Docs”, “Customer Support”, and “Sales Collateral”—each with different sources and access permissions.

Sources and Connections

A source is an integration type (PostgreSQL, Slack, Google Drive, etc.). A source connection is a configured instance of that source attached to a collection.

When you connect a source:

Authentication: OAuth flow or credential entry
Discovery: Ballast lists available content (tables, channels, folders)
Configuration: You select what to sync
Sync: Data is fetched, chunked, and embedded
Indexing: Vectors are stored for semantic search
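The stages above can be pictured as a small pipeline. The sketch below is a toy, in-memory model that assumes an already-authenticated source returning plain-text items. The names `discover`, `chunk`, `embed`, `sync`, and `Index` are hypothetical. The hash-based "embedding" is a deterministic stand-in, not Ballast's embedding model or API.

```python
import hashlib
import math
from dataclasses import dataclass, field

# Hypothetical already-authenticated source: container name -> text items.
SOURCE = {
    "#eng": ["Deploys run on Tuesdays.", "Rollbacks need approval."],
    "#random": ["Lunch is at noon."],
}

def discover(source: dict[str, list[str]]) -> list[str]:
    """Discovery: list the containers (tables, channels, folders) available to sync."""
    return sorted(source)

def chunk(text: str, max_words: int = 50) -> list[str]:
    """Toy chunker: fixed windows of max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text: str, dims: int = 8) -> list[float]:
    """Stand-in embedding: a deterministic, hash-derived unit vector (not a real model)."""
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b - 128 for b in digest[:dims]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

@dataclass
class Index:
    """Indexing: in-memory store of (chunk text, vector) pairs."""
    entries: list[tuple[str, list[float]]] = field(default_factory=list)

def sync(source: dict[str, list[str]], selected: list[str], index: Index) -> int:
    """Sync: fetch the selected containers, chunk, embed, and store. Returns chunks stored."""
    stored = 0
    for container in selected:
        for item in source[container]:
            for piece in chunk(item):
                index.entries.append((piece, embed(piece)))
                stored += 1
    return stored

available = discover(SOURCE)
index = Index()
count = sync(SOURCE, ["#eng"], index)  # Configuration: only #eng was selected
```

Only the selected container is fetched. Unselected content (`#random`) never reaches the index.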

Ballast supports 60+ integrations across databases, cloud storage, SaaS apps, and developer tools. See Integrations for the full list.

Personal vs Shared Sources

Ballast distinguishes between two connection types:

Shared Sources

Configured at the collection level, visible to all collection members:

Databases (PostgreSQL, MySQL, BigQuery)
Shared drives (Google Shared Drives, SharePoint)
Team tools (Slack workspaces, GitHub orgs, Jira projects)

Shared sources are indexed once and available to everyone with collection access.

Personal Sources

Connected by individual users, encrypted with user-specific keys:

Personal Gmail
Personal Google Drive (My Drive)
Personal Slack DMs
Personal calendar

When you search, Ballast merges results from shared sources with your personal sources. Other users—including admins—cannot access your personal source data.
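Here is a hypothetical sketch of the merge rule. Each result records which user, if any, owns it. Personal results go only to that user, and admins get no exception. `Result` and `merged_results` are illustrative names. The sketch covers only the visibility rule, not the per-user encryption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Result:
    text: str
    score: float
    owner: Optional[str] = None  # None for shared sources; otherwise the user who connected it

def merged_results(results: list[Result], user: str, limit: int = 10) -> list[Result]:
    """Keep every shared result plus the caller's own personal results, best score first."""
    visible = [r for r in results if r.owner is None or r.owner == user]
    return sorted(visible, key=lambda r: r.score, reverse=True)[:limit]

candidates = [
    Result("Q3 roadmap (shared drive)", 0.82),
    Result("Alice's email about the roadmap", 0.91, owner="alice"),
    Result("Bob's DM about the roadmap", 0.95, owner="bob"),
]
alice_view = merged_results(candidates, user="alice")
admin_view = merged_results(candidates, user="admin")
```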

Entities and Chunking

When Ballast syncs a source, content is processed into:

Entities: Individual items (documents, messages, database rows)
Chunks: Smaller segments optimized for search

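Ballast uses structure-aware chunking that respects document boundaries (paragraphs, code blocks, table cells) rather than splitting at arbitrary character limits. Because each unit stays intact, a chunk is less likely to cut a sentence, code block, or table row in half, which significantly improves search relevance.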
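The idea can be illustrated with a simplified, hypothetical chunker. It respects only paragraph boundaries (blank lines), not code blocks or table cells. A naive fixed-width splitter is included for comparison. Neither function is Ballast's implementation.

```python
def chunk_by_paragraph(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars.

    A paragraph is never split; one longer than max_chars becomes its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks

def chunk_by_chars(text: str, max_chars: int = 500) -> list[str]:
    """Naive baseline: cut every max_chars characters, ignoring structure."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

doc = (
    "Remote work is allowed three days a week.\n\n"
    "Managers approve exceptions.\n\n"
    "Equipment is reimbursed."
)
structured = chunk_by_paragraph(doc, max_chars=60)
naive = chunk_by_chars(doc, max_chars=60)
```

With a 60-character limit, the naive splitter ends its first chunk at "Managers approve ", separating that sentence from its conclusion. The paragraph-aware version keeps every sentence whole.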

Each entity and chunk stores:

Original content
Vector embedding (for semantic search)
Metadata (source, timestamps, author, etc.)
Relations to other entities (optional)
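A hypothetical record shape for the fields listed above follows. The field names, the `kind` discriminator, and the use of `related_ids` to link a reply to its parent message are illustrative assumptions, not Ballast's schema.

```python
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class StoredItem:
    """Hypothetical record for an entity or chunk; field names are illustrative."""
    item_id: str
    kind: Literal["entity", "chunk"]
    content: str                                             # original content
    embedding: list[float]                                   # vector for semantic search
    metadata: dict[str, str] = field(default_factory=dict)   # source, timestamps, author, ...
    related_ids: list[str] = field(default_factory=list)     # optional relations to other entities

thread_root = StoredItem(
    item_id="msg-1",
    kind="entity",
    content="Deploy freeze starts Friday.",
    embedding=[0.12, -0.40, 0.91],
    metadata={"source": "slack", "author": "sarah", "created_at": "2024-05-03T16:00:00Z"},
)
reply = StoredItem(
    item_id="msg-2",
    kind="entity",
    content="Does the freeze include hotfixes?",
    embedding=[0.10, -0.35, 0.88],
    metadata={"source": "slack", "author": "dev", "created_at": "2024-05-03T16:05:00Z"},
    related_ids=[thread_root.item_id],  # the reply links back to the message it answers
)
```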

Search Modes

Ballast supports multiple search strategies:

Semantic Search

Converts your query to a vector embedding and finds chunks with similar vectors. Best for natural language questions:

What's our policy on remote work?
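At its core, semantic search ranks stored vectors by how similar they are to the query vector. The sketch below assumes cosine similarity as the measure and scans every chunk by brute force. It uses hand-written 3-dimensional vectors in place of real embeddings. A production index would typically use an approximate nearest-neighbor structure rather than a full scan.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_search(query_vec: list[float], chunks: dict[str, list[float]],
                    k: int = 3) -> list[tuple[str, float]]:
    """Rank chunks by cosine similarity to the query vector and return the top k."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors; real embedding models produce hundreds of dimensions.
chunks = {
    "remote-work-policy": [0.9, 0.1, 0.0],
    "office-parking": [0.1, 0.9, 0.1],
    "wfh-equipment": [0.7, 0.2, 0.1],
}
query = [0.8, 0.15, 0.05]  # pretend embedding of "What's our policy on remote work?"
top = semantic_search(query, chunks, k=2)
```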

Keyword Search

Traditional text matching. Best for exact terms, code, or identifiers:

TICKET-1234

Hybrid Search

Combines semantic and keyword approaches, balancing conceptual meaning with exact term matches.
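How Ballast weights the two signals isn't specified here. One common technique is reciprocal rank fusion (RRF). RRF merges ranked lists using only rank positions, so raw scores never need to be on the same scale. The sketch below assumes RRF with the conventional constant k = 60. It pairs a toy keyword ranker with a semantic ranking supplied by hand.

```python
import re

def keyword_rank(query: str, docs: dict[str, str]) -> list[str]:
    """Toy keyword ranking: count exact (case-insensitive) token matches, drop non-matches."""
    def tokens(text: str) -> list[str]:
        return re.findall(r"[\w-]+", text.lower())
    terms = set(tokens(query))
    scores = {doc_id: sum(tok in terms for tok in tokens(text)) for doc_id, text in docs.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [doc_id for doc_id in ranked if scores[doc_id] > 0]

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each document by sum(1 / (k + rank)) over every list it appears in."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

docs = {
    "TICKET-1234": "TICKET-1234: login fails after password reset",
    "ticket-guide": "How to file a support ticket for login problems",
    "changelog": "Fixed TICKET-1234 in release 2.3",
}
keyword = keyword_rank("TICKET-1234", docs)
semantic = ["ticket-guide", "TICKET-1234", "changelog"]  # assumed output of a vector search
hybrid = reciprocal_rank_fusion([semantic, keyword])
```

Documents found by both methods rise to the top. The semantic-only match (`ticket-guide`) still appears, just lower.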

Query Expansion

AI generates variations of your query to improve recall. For “remote work policy”, it might also search for “work from home guidelines” and “distributed team rules”.
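A minimal sketch of how expanded queries can be combined, under stated assumptions. A hard-coded table stands in for the AI step. A fake search function returns fixed results. Each document keeps its best score across all variants. `expand_query`, `search_with_expansion`, and `fake_search` are hypothetical names.

```python
from typing import Callable

SearchFn = Callable[[str], list[tuple[str, float]]]

def expand_query(query: str) -> list[str]:
    """Stand-in for the AI step: a hard-coded table of variations (hypothetical)."""
    variations = {
        "remote work policy": ["work from home guidelines", "distributed team rules"],
    }
    return [query] + variations.get(query, [])

def search_with_expansion(query: str, search: SearchFn, limit: int = 10) -> list[str]:
    """Run every variation, keep each document's best score, and rank the union."""
    best: dict[str, float] = {}
    for variant in expand_query(query):
        for doc_id, score in search(variant):
            best[doc_id] = max(score, best.get(doc_id, float("-inf")))
    return sorted(best, key=best.get, reverse=True)[:limit]

# Hypothetical search results per query string.
FAKE_RESULTS = {
    "remote work policy": [("hr-handbook", 0.71)],
    "work from home guidelines": [("wfh-faq", 0.88), ("hr-handbook", 0.65)],
    "distributed team rules": [("eng-norms", 0.60)],
}

def fake_search(q: str) -> list[tuple[str, float]]:
    return FAKE_RESULTS.get(q, [])

expanded = search_with_expansion("remote work policy", fake_search)
plain = fake_search("remote work policy")
```

The original query alone finds one document, and the expanded set finds three. This is the recall gain the section describes.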

Federated Search

For some sources (databases, external APIs), Ballast queries the source in real time rather than searching a local index, so results always reflect the source's current state.

AI Features

Ballast integrates AI throughout the search pipeline:

Filter Interpretation: Extract structured filters from natural language (“emails from Sarah last week” → author filter + date range)
Reranking: Use Cohere or Jina to reorder results by relevance
Answer Generation: Synthesize answers from top search results
Chart Generation: Create visualizations from structured data
Query Routing: Automatically select the best search strategy
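Ballast uses AI for filter interpretation. The toy rule-based stand-in below only shows the shape of the input and the structured output. The output keys are hypothetical. It also assumes "last week" means the previous Monday-to-Sunday calendar week.

```python
import re
from datetime import date, timedelta

def interpret_filters(query: str, today: date) -> dict[str, object]:
    """Rule-based stand-in for AI filter extraction (illustrative only)."""
    filters: dict[str, object] = {}
    author = re.search(r"\bfrom (\w+)", query, re.IGNORECASE)
    if author:
        filters["author"] = author.group(1).lower()
    if re.search(r"\blast week\b", query, re.IGNORECASE):
        this_monday = today - timedelta(days=today.weekday())
        filters["date_from"] = this_monday - timedelta(days=7)  # previous Monday
        filters["date_to"] = this_monday - timedelta(days=1)    # previous Sunday
    return filters

filters = interpret_filters("emails from Sarah last week", today=date(2024, 5, 15))
```

A fixed rule like this would misread "work from home" as an author named "home". Handling that ambiguity is the job of a language model in this step.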

API Keys and Scopes

API keys provide programmatic access. Each key has a scope:

Scope	Access
Collection	Single collection only
Organization	All collections in org
User	Includes personal sources

Keys are prefixed with bk_ and stored as bcrypt hashes—even Ballast can’t retrieve your key after creation.
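Ballast stores bcrypt hashes. The sketch below shows the same store-only-a-hash pattern with Python's standard library. It uses salted PBKDF2 (`hashlib.pbkdf2_hmac`) as a stand-in for bcrypt so it runs without third-party packages. The 200,000-iteration count and the key length are assumptions, not Ballast's settings.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # PBKDF2 work factor (bcrypt uses its own cost parameter instead)

def _hash(key: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", key.encode(), salt, ITERATIONS)

def create_api_key() -> tuple[str, bytes, bytes]:
    """Return (plaintext key, salt, hash). Only the salt and hash are stored."""
    key = "bk_" + secrets.token_urlsafe(32)
    salt = secrets.token_bytes(16)
    return key, salt, _hash(key, salt)

def verify_api_key(presented: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-hash the presented key and compare in constant time."""
    if not presented.startswith("bk_"):
        return False
    return hmac.compare_digest(_hash(presented, salt), stored_hash)

key, salt, stored_hash = create_api_key()  # show `key` to the user once, then discard it
```

The server keeps only `salt` and `stored_hash`, so a lost key cannot be recovered. It can only be replaced.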

Access Control

Collections use role-based access:

Role	Permissions
Viewer	Search and browse
Editor	Add sources, configure sync
Admin	Manage members, API keys
Owner	Full control, delete collection

Organization-level roles (Admin, Owner) provide cross-collection permissions.
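A hypothetical permission check for the collection roles above follows. It assumes the roles are cumulative, so each role includes the permissions of the roles below it. The table implies this but does not state it. The action names are illustrative.

```python
from enum import IntEnum

class Role(IntEnum):
    """Collection roles, ordered so a higher role includes every lower role's permissions."""
    VIEWER = 1
    EDITOR = 2
    ADMIN = 3
    OWNER = 4

# Minimum role for each action (action names are hypothetical).
REQUIRED_ROLE = {
    "search": Role.VIEWER,
    "browse": Role.VIEWER,
    "add_source": Role.EDITOR,
    "configure_sync": Role.EDITOR,
    "manage_members": Role.ADMIN,
    "manage_api_keys": Role.ADMIN,
    "delete_collection": Role.OWNER,
}

def can(role: Role, action: str) -> bool:
    """True if `role` meets the minimum role required for `action`."""
    return role >= REQUIRED_ROLE[action]
```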

Sync and Scheduling

Syncs can be:

Manual: Triggered via UI or API
Scheduled: Cron-based (every 15 minutes, hourly, daily)
Webhook-triggered: Real-time for supported sources

Ballast uses incremental sync where possible—only fetching content that changed since the last sync. Sync cursors track progress for reliable resumption.
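A minimal sketch of cursor-based incremental sync follows. It assumes the source can list items modified after a given point. The cursor is a `(timestamp, item_id)` pair, so items that share a timestamp are neither skipped nor fetched twice on resume. `Item`, `fetch_changed_since`, and `save_cursor` are hypothetical names. A real connector would persist the cursor in durable storage rather than a list. Deletions are not modelled.

```python
from dataclasses import dataclass
from typing import Callable

Cursor = tuple[int, str]  # (last-modified timestamp, item id) of the last item processed

@dataclass(frozen=True)
class Item:
    item_id: str
    updated_at: int  # last-modified time, e.g. Unix seconds
    body: str

def fetch_changed_since(source: list[Item], cursor: Cursor) -> list[Item]:
    """Hypothetical source query: items strictly after the cursor, oldest first."""
    changed = [i for i in source if (i.updated_at, i.item_id) > cursor]
    return sorted(changed, key=lambda i: (i.updated_at, i.item_id))

def incremental_sync(source: list[Item], cursor: Cursor, index: dict[str, str],
                     save_cursor: Callable[[Cursor], None]) -> Cursor:
    """Upsert only changed items, persisting the cursor after each one."""
    for item in fetch_changed_since(source, cursor):
        index[item.item_id] = item.body
        cursor = (item.updated_at, item.item_id)
        save_cursor(cursor)  # an interrupted run can resume from the last saved cursor
    return cursor

index: dict[str, str] = {}
saved: list[Cursor] = []
source = [Item("a", 100, "v1"), Item("b", 200, "v1")]
first = incremental_sync(source, (0, ""), index, saved.append)   # fetches a and b
source = [Item("a", 300, "v2"), Item("b", 200, "v1")]            # only a changed since
second = incremental_sync(source, first, index, saved.append)    # fetches only a
```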

Dashboards

The Chat interface can generate visualizations. Save these as dashboards:

Live Dashboards: Auto-refresh on schedule
Snapshots: Frozen point-in-time view
Published Dashboards: Shareable via public URL (with optional password)

Dashboards support custom layouts—rearrange, resize, and organize charts as needed.