Core Concepts
Understand how Ballast organizes data, manages access, and powers search.
Collections
A collection is the fundamental unit in Ballast—a searchable index that contains data from one or more sources. Each collection:
- Has its own vector embedding space
- Can contain data from multiple sources (Slack + Google Drive + PostgreSQL, for example)
- Has independent access controls (members with viewer/editor/admin roles)
- Exposes its own MCP server for AI agent integration
- Can be searched via API with a collection-scoped API key
Think of collections as purpose-built knowledge bases. You might have separate collections for “Engineering Docs”, “Customer Support”, and “Sales Collateral”—each with different sources and access permissions.
Sources and Connections
A source is an integration type (PostgreSQL, Slack, Google Drive, etc.). A source connection is a configured instance of that source attached to a collection.
When you connect a source, the connection goes through these stages in order:
- Authentication: OAuth flow or credential entry
- Discovery: Ballast lists available content (tables, channels, folders)
- Configuration: You select what to sync
- Sync: Data is fetched, chunked, and embedded
- Indexing: Vectors are stored for semantic search
Ballast supports 60+ integrations across databases, cloud storage, SaaS apps, and developer tools. See Integrations for the full list.
Personal vs Shared Sources
Ballast distinguishes between two connection types:
Shared Sources
Configured at the collection level, visible to all collection members:
- Databases (PostgreSQL, MySQL, BigQuery)
- Shared drives (Google Shared Drives, SharePoint)
- Team tools (Slack workspaces, GitHub orgs, Jira projects)
Shared sources are indexed once and available to everyone with collection access.
Personal Sources
Connected by individual users, encrypted with user-specific keys:
- Personal Gmail
- Personal Google Drive (My Drive)
- Personal Slack DMs
- Personal calendar
When you search, Ballast merges results from shared sources with your personal sources. Other users—including admins—cannot access your personal source data.
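The merge described above can be sketched roughly as follows. This is an illustrative sketch, not Ballast's implementation: the `Hit` type, the `owner` field, and the `merge_results` function are hypothetical names, and it assumes scores from shared and personal sources are directly comparable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    score: float           # higher means more relevant (assumed comparable across sources)
    owner: Optional[str]   # None for shared sources, a user id for personal sources


def merge_results(shared: List[Hit], personal: List[Hit],
                  user_id: str, limit: int = 10) -> List[Hit]:
    """Combine shared hits with the searching user's own personal hits."""
    # Drop personal hits that belong to anyone other than the searcher.
    visible = [h for h in personal if h.owner == user_id]
    merged = sorted(shared + visible, key=lambda h: h.score, reverse=True)
    return merged[:limit]
```

The ownership filter is applied before ranking, so another user's personal data never enters the candidate set, whatever role the searcher holds.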
Entities and Chunking
When Ballast syncs a source, content is processed into:
- Entities: Individual items (documents, messages, database rows)
- Chunks: Smaller segments optimized for search
Ballast uses structure-aware chunking that respects document boundaries (paragraphs, code blocks, table cells) rather than splitting at arbitrary character limits. Keeping each chunk a coherent unit improves search relevance.
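To make the idea concrete, here is a minimal sketch of boundary-respecting chunking for Markdown-like text. It is not Ballast's chunker: the function names and the `max_chars` limit are assumptions, and it handles only paragraphs and fenced code blocks.

```python
from typing import List

FENCE = "`" * 3  # marker that opens and closes a fenced code block


def split_blocks(text: str) -> List[str]:
    """Split text into paragraphs and fenced code blocks, never cutting inside a fence."""
    blocks: List[str] = []
    current: List[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
            current.append(line)
            continue
        if not line.strip() and not in_fence:
            # A blank line outside a fence ends the current block.
            if current:
                blocks.append("\n".join(current))
                current = []
            continue
        current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def pack_chunks(blocks: List[str], max_chars: int = 800) -> List[str]:
    """Greedily pack whole blocks into chunks; an oversized block becomes its own chunk."""
    chunks: List[str] = []
    buf = ""
    for block in blocks:
        candidate = block if not buf else buf + "\n\n" + block
        if len(candidate) <= max_chars or not buf:
            buf = candidate
        else:
            chunks.append(buf)
            buf = block
    if buf:
        chunks.append(buf)
    return chunks
```

The size limit only decides where to break between blocks, never inside one, so a code block or paragraph is always embedded whole.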
Each entity and chunk stores:
- Original content
- Vector embedding (for semantic search)
- Metadata (source, timestamps, author, etc.)
- Relations to other entities (optional)
Search Modes
Ballast supports multiple search strategies:
Semantic Search
Converts your query to a vector embedding and finds chunks with similar vectors. Best for natural language questions:
What's our policy on remote work?
Keyword Search
Traditional text matching. Best for exact terms, code, or identifiers:
TICKET-1234
Hybrid Search
Combines semantic and keyword approaches. Balances meaning with exact matches.
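This page does not say how Ballast combines the two result lists. One widely used technique is reciprocal rank fusion (RRF), sketched below purely as an illustration. The function name and the constant `k = 60` are assumptions, not Ballast settings.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of chunk ids into a single ranking."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); items ranked high in several lists win.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


semantic = ["a", "b", "c"]   # hypothetical chunk ids from semantic search
keyword = ["c", "a", "d"]    # hypothetical chunk ids from keyword search
fused = reciprocal_rank_fusion([semantic, keyword])
```

RRF uses only rank positions, so it does not require the semantic and keyword scores to be on the same scale.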
Query Expansion
AI generates variations of your query to improve recall. For “remote work policy”, it might also search for “work from home guidelines” and “distributed team rules”.
Federated Search
For some sources (databases, external APIs), Ballast queries the source in real time rather than searching a local index, so results reflect the source's current state.
AI Features
Ballast integrates AI throughout the search pipeline:
- Filter Interpretation: Extract structured filters from natural language (“emails from Sarah last week” → author filter + date range)
- Reranking: Use Cohere or Jina to reorder results by relevance
- Answer Generation: Synthesize answers from top search results
- Chart Generation: Create visualizations from structured data
- Query Routing: Automatically select the best search strategy
API Keys and Scopes
API keys provide programmatic access. Each key has a scope:
| Scope | Access |
|---|---|
| Collection | Single collection only |
| Organization | All collections in org |
| User | Includes personal sources |
Keys are prefixed with bk_ and stored as bcrypt hashes—even Ballast can’t retrieve your key after creation.
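The sketch below illustrates why a hashed key cannot be recovered: only a salted, one-way digest is stored, and verification re-hashes the candidate. It is an assumption-laden illustration. Ballast uses bcrypt, whereas this example substitutes PBKDF2 from the Python standard library so that it runs without third-party packages. The function names, key length, and iteration count are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

PREFIX = "bk_"          # key prefix stated on this page
ITERATIONS = 100_000    # illustrative work factor, not a Ballast setting


def create_key() -> Tuple[str, bytes, bytes]:
    """Return (plaintext key shown once, salt, digest to store)."""
    key = PREFIX + secrets.token_urlsafe(32)
    salt = secrets.token_bytes(16)
    # PBKDF2 stands in for bcrypt here; both are one-way, salted functions.
    digest = hashlib.pbkdf2_hmac("sha256", key.encode(), salt, ITERATIONS)
    return key, salt, digest


def verify_key(candidate: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the candidate and compare it to the stored digest in constant time."""
    if not candidate.startswith(PREFIX):
        return False
    digest = hashlib.pbkdf2_hmac("sha256", candidate.encode(), salt, ITERATIONS)
    return hmac.compare_digest(digest, stored)
```

Only the salt and digest are persisted. The plaintext key exists solely in the return value of `create_key`, which is why a lost key has to be replaced rather than recovered.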
Access Control
Collections use role-based access:
| Role | Permissions |
|---|---|
| Viewer | Search and browse |
| Editor | Add sources, configure sync |
| Admin | Manage members, API keys |
| Owner | Full control, delete collection |
Organization-level roles (Admin, Owner) provide cross-collection permissions.
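A role check for the collection roles in the table could look like the sketch below. It assumes roles are cumulative (each role includes the permissions of the roles listed above it), which this page does not state explicitly. The action names are hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    VIEWER = 1
    EDITOR = 2
    ADMIN = 3
    OWNER = 4


# Minimum role per action, following the table; action names are made up for illustration.
REQUIRED = {
    "search": Role.VIEWER,
    "add_source": Role.EDITOR,
    "manage_members": Role.ADMIN,
    "delete_collection": Role.OWNER,
}


def can(role: Role, action: str) -> bool:
    """Return True if the role meets the minimum required for the action."""
    # Assumes cumulative roles: a higher role can do everything a lower one can.
    return role >= REQUIRED[action]
```

For example, `can(Role.EDITOR, "search")` is true under the cumulative assumption, while `can(Role.EDITOR, "manage_members")` is false.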
Sync and Scheduling
Syncs can be:
- Manual: Triggered via UI or API
- Scheduled: Cron-based (every 15 minutes, hourly, daily)
- Webhook-triggered: Real-time for supported sources
Ballast uses incremental sync where possible—only fetching content that changed since the last sync. Sync cursors track progress for reliable resumption.
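Cursor-based incremental sync can be sketched as follows. This is a simplified illustration, not Ballast's sync engine: `Item`, `fetch_changed`, and `incremental_sync` are hypothetical names, the source is an in-memory list, and the cursor is assumed to be a last-modified timestamp.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Item:
    id: str
    updated_at: int  # e.g. Unix timestamp of the last modification
    content: str


def fetch_changed(source: List[Item], cursor: int) -> List[Item]:
    """Stand-in for a source API call: items modified after the cursor, oldest first."""
    changed = [item for item in source if item.updated_at > cursor]
    return sorted(changed, key=lambda item: item.updated_at)


def incremental_sync(source: List[Item], index: Dict[str, str], cursor: int) -> int:
    """Apply changes to the index and return the advanced cursor."""
    for item in fetch_changed(source, cursor):
        index[item.id] = item.content
        # Advance after each item so an interrupted sync can resume from here.
        # A real implementation would also handle items sharing one timestamp.
        cursor = item.updated_at
    return cursor
```

Persisting the returned cursor between runs is what lets the next sync skip unchanged content.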
Dashboards
The Chat interface can generate visualizations. Save these as dashboards:
- Live Dashboards: Auto-refresh on schedule
- Snapshots: Frozen point-in-time view
- Published Dashboards: Shareable via public URL (with optional password)
Dashboards support custom layouts—rearrange, resize, and organize charts as needed.