Core Concepts

Understand how Ballast organizes data, manages access, and powers search.

Collections

A collection is the fundamental unit in Ballast—a searchable index that contains data from one or more sources. Each collection:

  • Has its own vector embedding space
  • Can contain data from multiple sources (Slack + Google Drive + PostgreSQL, for example)
  • Has independent access controls (members with viewer/editor/admin roles)
  • Exposes its own MCP server for AI agent integration
  • Can be searched via API with a collection-scoped API key

Think of collections as purpose-built knowledge bases. You might have separate collections for “Engineering Docs”, “Customer Support”, and “Sales Collateral”—each with different sources and access permissions.

Sources and Connections

A source is an integration type (PostgreSQL, Slack, Google Drive, etc.). A source connection is a configured instance of that source attached to a collection.
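
The distinction between a source and a source connection can be sketched as a small data model. This is an illustration only: the class and field names (`Source`, `SourceConnection`, `Collection`, `selected`) are hypothetical and are not Ballast's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    """An integration type, e.g. "postgresql" or "slack" (hypothetical model)."""
    name: str


@dataclass
class SourceConnection:
    """A configured instance of a source, attached to one collection."""
    source: Source
    selected: list[str]  # what to sync: tables, channels, folders


@dataclass
class Collection:
    name: str
    connections: list[SourceConnection] = field(default_factory=list)


# One integration type, configured twice with different selections.
postgres = Source("postgresql")
docs = Collection("Engineering Docs")
docs.connections.append(SourceConnection(postgres, ["public.tickets"]))
support = Collection("Customer Support")
support.connections.append(SourceConnection(postgres, ["public.faq"]))
```

The same source type appears in both collections, but each connection carries its own configuration.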

When you connect a source:

  1. Authentication: OAuth flow or credential entry
  2. Discovery: Ballast lists available content (tables, channels, folders)
  3. Configuration: You select what to sync
  4. Sync: Data is fetched, chunked, and embedded
  5. Indexing: Vectors are stored for semantic search

Ballast supports 60+ integrations across databases, cloud storage, SaaS apps, and developer tools. See Integrations for the full list.

Personal vs Shared Sources

Ballast distinguishes between two connection types:

Shared Sources

Configured at the collection level, visible to all collection members:

  • Databases (PostgreSQL, MySQL, BigQuery)
  • Shared drives (Google Shared Drives, SharePoint)
  • Team tools (Slack workspaces, GitHub orgs, Jira projects)

Shared sources are indexed once and available to everyone with collection access.

Personal Sources

Connected by individual users, encrypted with user-specific keys:

  • Personal Gmail
  • Personal Google Drive (My Drive)
  • Personal Slack DMs
  • Personal calendar

When you search, Ballast merges results from shared sources with your personal sources. Other users—including admins—cannot access your personal source data.
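
The merge rule above can be sketched as a visibility filter over search hits. This is a simplified illustration: the `Hit` type and its `owner` field are hypothetical, and the sketch models only who may see a result, not the per-user encryption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    score: float
    owner: Optional[str]  # None for shared sources, a user id for personal ones


def visible_hits(hits: list[Hit], user_id: str) -> list[Hit]:
    """Keep shared hits plus the caller's own personal hits, best score first."""
    allowed = [h for h in hits if h.owner is None or h.owner == user_id]
    return sorted(allowed, key=lambda h: h.score, reverse=True)


hits = [
    Hit("shared-wiki-1", 0.82, None),
    Hit("alice-gmail-7", 0.91, "alice"),
    Hit("bob-drive-3", 0.95, "bob"),
]
print([h.chunk_id for h in visible_hits(hits, "alice")])
# prints ['alice-gmail-7', 'shared-wiki-1']
```

Bob's personal hit has the highest score but never reaches Alice, whatever her role.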

Entities and Chunking

When Ballast syncs a source, content is processed into:

  • Entities: Individual items (documents, messages, database rows)
  • Chunks: Smaller segments optimized for search

Ballast uses structure-aware chunking that respects document boundaries (paragraphs, code blocks, table cells) rather than splitting at arbitrary character limits. Keeping each chunk a coherent unit tends to improve search relevance, because a match is less likely to begin or end mid-thought.
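
To show the general idea, here is a minimal sketch that treats blank lines as block boundaries and packs whole blocks into chunks. It is not Ballast's chunker: the function name `chunk_by_structure`, the `max_chars` parameter, and the blank-line heuristic are all assumptions made for illustration, and a real implementation would parse code blocks and tables properly.

```python
import re


def chunk_by_structure(text: str, max_chars: int = 200) -> list[str]:
    """Split on blank lines, then pack whole blocks into chunks up to max_chars."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    chunks: list[str] = []
    current = ""
    for block in blocks:
        candidate = f"{current}\n\n{block}" if current else block
        if len(candidate) <= max_chars or not current:
            # Fits, or is a single oversized block that is kept whole.
            current = candidate
        else:
            chunks.append(current)
            current = block
    if current:
        chunks.append(current)
    return chunks


doc = "Intro paragraph.\n\ndef f():\n    return 1\n\nClosing paragraph."
chunks = chunk_by_structure(doc, max_chars=30)
print(len(chunks))  # prints 3
```

With a 30-character budget, the function body stays in one chunk instead of being cut partway through.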

Each entity and chunk stores:

  • Original content
  • Vector embedding (for semantic search)
  • Metadata (source, timestamps, author, etc.)
  • Relations to other entities (optional)

Search Modes

Ballast supports multiple search strategies:

Semantic Search

Converts your query to a vector embedding and finds chunks with similar vectors. Best for natural language questions:

What's our policy on remote work?
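
A minimal sketch of the ranking step, assuming cosine similarity as the distance measure (Ballast's actual metric and embedding model are not specified here). The three-dimensional vectors are hand-made stand-ins for real embeddings, and the chunk ids are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vec: list[float], index: dict[str, list[float]],
                    top_k: int = 2) -> list[str]:
    """Return the ids of the top_k chunks most similar to the query vector."""
    ranked = sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)
    return ranked[:top_k]


# Toy index: chunk id -> embedding (real embeddings have hundreds of dimensions).
index = {
    "remote-work-policy": [0.9, 0.1, 0.0],
    "expense-guidelines": [0.1, 0.9, 0.0],
    "office-floor-plan": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of the question above
print(semantic_search(query_vec, index))
# prints ['remote-work-policy', 'expense-guidelines']
```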

Keyword Search

Traditional text matching. Best for exact terms, code, or identifiers:

TICKET-1234

Hybrid Search

Combines semantic and keyword approaches. Balances meaning with exact matches.
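
One common way to combine two ranked lists is reciprocal rank fusion (RRF). The sketch below uses RRF purely as an illustration; this page does not say which fusion method Ballast uses, and the document ids are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


semantic = ["doc-a", "doc-b", "doc-c"]  # ranked by vector similarity
keyword = ["doc-c", "doc-a", "doc-d"]   # ranked by text match
print(reciprocal_rank_fusion([semantic, keyword]))
# prints ['doc-a', 'doc-c', 'doc-b', 'doc-d']
```

Items that appear in both lists (`doc-a`, `doc-c`) rise above items found by only one strategy.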

Query Expansion

AI generates variations of your query to improve recall. For “remote work policy”, it might also search for “work from home guidelines” and “distributed team rules”.

Federated Search

For some sources (databases, external APIs), Ballast queries the source in real time rather than searching a local index, so results reflect the source's current state instead of the last sync.

AI Features

Ballast integrates AI throughout the search pipeline:

  • Filter Interpretation: Extract structured filters from natural language (“emails from Sarah last week” → author filter + date range)
  • Reranking: Use Cohere or Jina to reorder results by relevance
  • Answer Generation: Synthesize answers from top search results
  • Chart Generation: Create visualizations from structured data
  • Query Routing: Automatically select the best search strategy

API Keys and Scopes

API keys provide programmatic access. Each key has a scope:

Scope          Access
Collection     Single collection only
Organization   All collections in the organization
User           Includes personal sources

Keys are prefixed with bk_ and stored as bcrypt hashes—even Ballast can’t retrieve your key after creation.
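
The store-only-a-hash pattern can be sketched as follows. Note the substitution: bcrypt is not in Python's standard library, so this sketch uses `hashlib.pbkdf2_hmac` as a stand-in one-way hash. The function names, key length, and iteration count are assumptions for illustration, not Ballast's implementation.

```python
import hashlib
import hmac
import os
import secrets

PREFIX = "bk_"
ITERATIONS = 100_000  # illustrative work factor


def create_key() -> tuple[str, bytes, bytes]:
    """Return (plaintext key, salt, hash). Only the salt and hash would be stored."""
    key = PREFIX + secrets.token_urlsafe(32)
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", key.encode(), salt, ITERATIONS)
    return key, salt, digest


def verify_key(presented: str, salt: bytes, digest: bytes) -> bool:
    """Re-hash the presented key and compare in constant time."""
    if not presented.startswith(PREFIX):
        return False
    candidate = hashlib.pbkdf2_hmac("sha256", presented.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


key, salt, digest = create_key()   # the plaintext key is shown to the user once
print(verify_key(key, salt, digest))         # prints True
print(verify_key("bk_wrong", salt, digest))  # prints False
```

Because the hash cannot be reversed, the plaintext key exists only at creation time.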

Access Control

Collections use role-based access:

Role     Permissions
Viewer   Search and browse
Editor   Add sources, configure sync
Admin    Manage members, API keys
Owner    Full control, delete collection

Organization-level roles (Admin, Owner) provide cross-collection permissions.
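
A permission check over the collection roles might look like the sketch below. It assumes roles are cumulative (each role includes everything the roles below it can do), which the table suggests but does not state. The action names are hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    VIEWER = 1
    EDITOR = 2
    ADMIN = 3
    OWNER = 4


# Minimum role needed per action (hypothetical action names).
REQUIRED_ROLE = {
    "search": Role.VIEWER,
    "add_source": Role.EDITOR,
    "manage_members": Role.ADMIN,
    "delete_collection": Role.OWNER,
}


def can(role: Role, action: str) -> bool:
    """True if the role meets or exceeds the minimum role for the action."""
    return role >= REQUIRED_ROLE[action]


print(can(Role.EDITOR, "search"))            # prints True
print(can(Role.EDITOR, "manage_members"))    # prints False
print(can(Role.OWNER, "delete_collection"))  # prints True
```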

Sync and Scheduling

Syncs can be:

  • Manual: Triggered via UI or API
  • Scheduled: Cron-based (for example, every 15 minutes, hourly, or daily)
  • Webhook-triggered: Real-time for supported sources

Ballast uses incremental sync where possible—only fetching content that changed since the last sync. Sync cursors track progress for reliable resumption.
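
Cursor-based incremental sync can be sketched like this. The `FakeSource` class is a stand-in for a real integration, and using a last-modified timestamp as the cursor is an assumption; actual cursors vary by source.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSource:
    """Stand-in for a real source: items are (updated_at, item_id) pairs."""
    items: list[tuple[int, str]] = field(default_factory=list)

    def changed_since(self, cursor: int) -> list[tuple[int, str]]:
        return sorted(i for i in self.items if i[0] > cursor)


def incremental_sync(source: FakeSource, cursor: int) -> tuple[list[str], int]:
    """Fetch only items newer than the cursor; return their ids and the new cursor."""
    changed = source.changed_since(cursor)
    new_cursor = changed[-1][0] if changed else cursor
    return [item_id for _, item_id in changed], new_cursor


source = FakeSource([(100, "doc-1"), (105, "doc-2")])
ids, cursor = incremental_sync(source, cursor=0)  # first run fetches everything
print(ids, cursor)  # prints ['doc-1', 'doc-2'] 105

source.items.append((110, "doc-3"))
ids, cursor = incremental_sync(source, cursor)    # second run fetches only the change
print(ids, cursor)  # prints ['doc-3'] 110
```

Persisting the cursor after each run is what lets an interrupted sync resume where it left off rather than starting over.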

Dashboards

The Chat interface can generate visualizations, which you can save as one of three dashboard types:

  • Live Dashboards: Auto-refresh on schedule
  • Snapshots: Frozen point-in-time view
  • Published Dashboards: Shareable via public URL (with optional password)

Dashboards support custom layouts—rearrange, resize, and organize charts as needed.