Knowledge Base

Knowledge Base Search Methods and Parameters

This section covers FastGPT's knowledge base architecture, including its QA storage format and multi-vector mapping, to help you build better knowledge bases. It also explains each search parameter. This guide focuses on practical usage rather than in-depth theory.

Understanding Vectors

FastGPT uses an Embedding-based RAG approach for its knowledge base. To use FastGPT effectively, you need a basic understanding of how Embedding vectors work and their characteristics.

Human text, images, videos, and other media cannot be directly understood by computers. To determine whether two pieces of text are similar or related, they typically need to be converted into a computer-readable format — vectors are one such method.

A vector is essentially an array of numbers. The "distance" between two vectors can be calculated using mathematical formulas; the smaller the distance, the more similar the vectors. Because text, images, videos, and other media can be converted into vectors, this distance serves as a measure of similarity between them. Vector search leverages this principle.

Since text comes in many types with countless combinations, exact matching is hard to guarantee when converting to vectors for similarity comparison. In vector-based knowledge bases, a top-k recall approach is typically used — finding the top k most similar results and passing them to an LLM for further semantic evaluation, logical reasoning, and summarization, enabling knowledge base Q&A. This makes vector search the most critical step in the process.
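The top-k recall described above can be sketched as follows. This uses cosine similarity over toy 2-D vectors; a real system uses an embedding model to produce the vectors and an approximate index such as HNSW instead of brute-force scoring:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of magnitudes; 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_recall(query_vec, entries, k):
    # entries: list of (text, vector) pairs; return the k most similar texts,
    # which would then be handed to an LLM for semantic evaluation and summarization.
    scored = sorted(entries, key=lambda e: cosine_similarity(query_vec, e[1]), reverse=True)
    return [text for text, _vec in scored[:k]]
```

Note that top-k recall does not guarantee the results are actually relevant, only that they are the closest available; the LLM downstream still has to judge them.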

Many factors affect vector search accuracy, including: vector model quality, data quality (length, completeness, diversity), and retriever precision (the speed vs. accuracy tradeoff). Search query quality is equally important.

Retriever precision is relatively straightforward to address, while training vector models is complex, so optimizing data quality and query quality becomes the key focus.

Improving Vector Search Accuracy

  1. Better tokenization and chunking: When a text segment has complete and singular structure and semantics, accuracy improves. Many systems optimize their tokenizers to preserve data completeness.
  2. Streamline index content by reducing vector content length: Shorter, more precise index content improves search accuracy, though it may narrow the search scope. Best suited for scenarios requiring strict answers.
  3. Increase index quantity: Add multiple index entries for the same chunk to improve recall.
  4. Optimize search queries: In practice, user questions are often vague or incomplete. Refining the query (search term) can significantly improve accuracy.
  5. Fine-tune vector models: Off-the-shelf vector models are general-purpose and may underperform in specific domains. Fine-tuning can greatly improve domain-specific search results.

FastGPT Knowledge Base Architecture

Data Storage Structure

In FastGPT, a knowledge base consists of three parts: libraries, collections, and data entries. A collection can be thought of as a "file." A library can contain multiple collections, and a collection can contain multiple data entries. The smallest search scope is the library: a search always spans the entire library. Collections are only for organizing and managing data and do not affect search results (at least for now).

Vector Storage Structure

FastGPT uses PostgreSQL's PG Vector extension as the vector retriever, with HNSW indexing. PostgreSQL is used solely for vector search (this engine can be swapped for other databases), while MongoDB handles all other data storage.

In MongoDB's dataset.datas collection, vector source data is stored along with an indexes field that records corresponding vector IDs. This is an array, meaning a single data entry can map to multiple vectors.

In PostgreSQL, a vector field stores the vectors. During search, vectors are recalled first, then their IDs are used to look up the original data in MongoDB. If multiple vectors map to the same source data, they are merged and the highest vector score is used.
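A minimal sketch of that merge step (the tuple layout and function name here are illustrative, not FastGPT's actual schema):

```python
def merge_recalls(recalled):
    # recalled: (vector_id, source_data_id, score) rows from the vector store.
    # Several vectors may point at the same MongoDB data entry; keep only the
    # highest score per entry, then rank entries by that score.
    best = {}
    for _vector_id, data_id, score in recalled:
        if score > best.get(data_id, -1.0):
            best[data_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```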

Purpose and Usage of Multi-Vector Mapping

In a single vector, content length and semantic richness are often at odds. FastGPT uses multi-vector mapping to map a single data entry to multiple vectors, preserving both data completeness and semantic richness.

You can add multiple vectors to a longer text so that if any one vector is matched during search, the entire data entry is recalled.

This means you can continuously improve data chunk accuracy through annotation.

Search Pipeline

  1. Use Query Optimization for coreference resolution and query expansion, improving multi-turn conversation search capability and semantic richness.
  2. Use Concat Query to improve Rerank accuracy during multi-turn conversations.
  3. Use RRF (Reciprocal Rank Fusion) to merge results from multiple search channels.
  4. Use Rerank for secondary sorting to improve precision.
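Step 3, RRF merging, can be sketched like this (k = 60 is the conventional RRF constant; FastGPT's exact value is not specified here):

```python
def rrf_merge(rankings, k=60):
    # rankings: one ranked list of result ids per search channel
    # (e.g. semantic search, full-text search, rerank output).
    # Each result contributes 1 / (k + rank) per channel; contributions are summed,
    # so results ranked well across several channels rise to the top.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only uses ranks, not raw scores, it can merge channels whose scores are on incomparable scales (vector distance vs. full-text relevance).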

Search Parameters

Search Modes

Semantic Search

Semantic search calculates the vector distance between the user's query and knowledge base content to determine "similarity": mathematical similarity, not linguistic similarity.

Pros:

  • Understands similar semantics
  • Cross-language understanding (e.g., Chinese query matching English content)
  • Multimodal understanding (text, images, audio/video, etc.)

Cons:

  • Depends on model training quality
  • Inconsistent accuracy
  • Affected by keywords and sentence completeness

Full-Text Search

Uses traditional full-text search. Best for finding key subjects, predicates, and other specific terms.

Hybrid Search

Combines vector search and full-text search, merging results using the RRF formula. Generally produces richer and more accurate results.

Since hybrid search covers a large range and cannot directly filter by similarity, a rerank model is typically used to re-sort results and filter by rerank scores.

Result Reranking

Uses a ReRank model to re-sort search results. In most cases, this significantly improves accuracy. Rerank models work better with complete questions (with proper subjects and predicates), so query optimization is usually applied before search and reranking. Reranking produces a score between 0 and 1 representing the relevance between the search content and the query; this score is typically more accurate than vector similarity scores and can be used for filtering.
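Since rerank scores live on a comparable 0-1 scale, threshold filtering is straightforward. A sketch (obtaining the scores from an actual rerank model is not shown; the function name and threshold are illustrative):

```python
def filter_by_rerank(results, min_score=0.5):
    # results: (chunk_text, rerank_score) pairs, score already in [0, 1].
    # Drop low-relevance chunks and return the rest, best first.
    kept = [r for r in results if r[1] >= min_score]
    kept.sort(key=lambda r: r[1], reverse=True)
    return kept
```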

FastGPT uses RRF to merge rerank results, vector search results, and full-text search results into the final output.

Search Filters

Reference Limit

The maximum number of tokens to reference per search.

Instead of using top k, we found that in mixed knowledge bases (Q&A plus documents), chunk lengths vary significantly, making top-k results unstable. Using a token limit provides more consistent control over how much content is referenced.
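The token-budget approach can be sketched as follows (whitespace splitting stands in for a real tokenizer such as tiktoken; the function name is illustrative):

```python
def truncate_by_tokens(ranked_chunks, max_tokens, count_tokens=lambda t: len(t.split())):
    # ranked_chunks: chunk texts, best match first.
    # Keep chunks in ranked order until adding the next one would exceed the budget,
    # so long and short chunks consume the same overall token allowance.
    kept, used = [], 0
    for text in ranked_chunks:
        cost = count_tokens(text)
        if used + cost > max_tokens:
            break
        kept.append(text)
        used += cost
    return kept
```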

Minimum Relevance

A value between 0 and 1 used to filter out low-relevance search results.

This only takes effect when using Semantic Search or Result Reranking.

Query Optimization

Background

In RAG, we need to perform embedding searches against the database based on the input query to find similar content (i.e., knowledge base search).

During search, especially in multi-turn conversations, follow-up questions often fail to find relevant content because knowledge base search only uses the "current" question. Consider a conversation along these lines:

  User: What is the QA structure?
  AI: (answers from the knowledge base)
  User: What's the second point?

When the user asks "What's the second point?", the system searches for "What's the second point?" in the knowledge base, which returns nothing useful. The actual query should be "What is the QA structure?". This is why we need a Query Optimization module to complete the user's current question, enabling the knowledge base search to find relevant content.

How It Works

Before performing data retrieval, the model first performs coreference resolution and query expansion. This resolves ambiguous references and enriches the query's semantic content. You can view the optimized query in the conversation details after each interaction.
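A sketch of the kind of prompt such a module might send to the model before retrieval (the wording and function are illustrative, not FastGPT's actual prompt):

```python
def build_query_optimization_prompt(history, question):
    # history: (role, content) turns from the conversation so far.
    # Asks the model to resolve references (coreference resolution) and expand
    # the question into a standalone search query before the search runs.
    turns = "\n".join(f"{role}: {content}" for role, content in history)
    return (
        "Rewrite the final question as a complete, standalone search query, "
        "resolving pronouns and references using the conversation history.\n\n"
        f"History:\n{turns}\n\nQuestion: {question}\nStandalone query:"
    )
```

The model's completion is then used as the search term in place of the user's raw question.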
