Knowledge Base Search Methods and Parameters
This section covers FastGPT's knowledge base architecture, including its QA storage format and multi-vector mapping, to help you build better knowledge bases. It also explains each search parameter. This guide focuses on practical usage rather than in-depth theory.
Understanding Vectors
FastGPT uses an Embedding-based RAG approach for its knowledge base. To use FastGPT effectively, you need a basic understanding of how Embedding vectors work and their characteristics.
Human text, images, videos, and other media cannot be directly understood by computers. To determine whether two pieces of text are similar or related, they typically need to be converted into a computer-readable format — vectors are one such method.
A vector is essentially an array of numbers. The "distance" between two vectors can be calculated using mathematical formulas — the smaller the distance, the more similar the vectors. This maps back to text, images, videos, and other media to measure similarity between them. Vector search leverages this principle.
Since text comes in many types with countless combinations, exact matching is hard to guarantee when converting to vectors for similarity comparison. In vector-based knowledge bases, a top-k recall approach is typically used — finding the top k most similar results and passing them to an LLM for further semantic evaluation, logical reasoning, and summarization, enabling knowledge base Q&A. This makes vector search the most critical step in the process.
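The distance-and-top-k idea above can be sketched with toy vectors. This is a minimal illustration using cosine distance, not FastGPT's actual retrieval code; real embedding models emit vectors with hundreds of dimensions.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Distance = 1 - cosine similarity; smaller means more similar."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query: np.ndarray, corpus: list, k: int) -> list:
    """Return the indices of the k nearest corpus vectors to the query."""
    distances = [cosine_distance(query, v) for v in corpus]
    return sorted(range(len(corpus)), key=lambda i: distances[i])[:k]

# Toy 3-dimensional "embeddings" standing in for real model output.
corpus = [np.array([1.0, 0.0, 0.0]),
          np.array([0.9, 0.1, 0.0]),
          np.array([0.0, 1.0, 0.0])]
query = np.array([1.0, 0.05, 0.0])

print(top_k(query, corpus, k=2))  # indices of the two nearest entries
```

The k results recalled this way are then handed to the LLM, which does the actual semantic evaluation and summarization.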
Many factors affect vector search accuracy, including: vector model quality, data quality (length, completeness, diversity), and retriever precision (the speed vs. accuracy tradeoff). Search query quality is equally important.
Retriever precision is relatively straightforward to tune, while training vector models is complex and costly, so optimizing data quality and query quality becomes the key focus.
Improving Vector Search Accuracy
- Better tokenization and chunking: When a text segment has complete and singular structure and semantics, accuracy improves. Many systems optimize their tokenizers to preserve data completeness.
- Streamline `index` content by reducing vector content length: shorter, more precise `index` content improves search accuracy, though it may narrow the search scope. Best suited for scenarios requiring strict answers.
- Increase `index` quantity: add multiple `index` entries for the same `chunk` to improve recall.
- Optimize search queries: in practice, user questions are often vague or incomplete. Refining the query (search term) can significantly improve accuracy.
- Fine-tune vector models: Off-the-shelf vector models are general-purpose and may underperform in specific domains. Fine-tuning can greatly improve domain-specific search results.
FastGPT Knowledge Base Architecture
Data Storage Structure
In FastGPT, a knowledge base consists of three parts: libraries, collections, and data entries. A collection can be thought of as a "file." A library can contain multiple collections, and a collection can contain multiple data entries. The smallest searchable unit is the library — searches span the entire library. Collections are only for organizing and managing data and do not affect search results (at least for now).

Vector Storage Structure
FastGPT uses PostgreSQL's PG Vector extension as the vector retriever, with HNSW indexing. PostgreSQL is used solely for vector search (this engine can be swapped for other databases), while MongoDB handles all other data storage.
In MongoDB's dataset.datas collection, vector source data is stored along with an indexes field that records corresponding vector IDs. This is an array, meaning a single data entry can map to multiple vectors.
In PostgreSQL, a vector field stores the vectors. During search, vectors are recalled first, then their IDs are used to look up the original data in MongoDB. If multiple vectors map to the same source data, they are merged and the highest vector score is used.
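The merge step described above can be sketched as follows. The `data_id` and `score` field names are illustrative, not FastGPT's actual schema: vector hits that point at the same source entry are collapsed, keeping the best score.

```python
def merge_by_source(recalled: list) -> list:
    """Collapse vector hits that share a source data ID, keeping the
    highest score among them (field names are illustrative)."""
    best = {}
    for hit in recalled:
        data_id = hit["data_id"]
        if data_id not in best or hit["score"] > best[data_id]["score"]:
            best[data_id] = hit
    # Highest score first, as a search pipeline would return them.
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)

hits = [
    {"data_id": "a", "score": 0.81},
    {"data_id": "b", "score": 0.77},
    {"data_id": "a", "score": 0.92},  # second vector of the same entry
]
print(merge_by_source(hits))  # entry "a" appears once, with score 0.92
```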

Purpose and Usage of Multi-Vector Mapping
In a single vector, content length and semantic richness are often at odds. FastGPT uses multi-vector mapping to map a single data entry to multiple vectors, preserving both data completeness and semantic richness.
You can add multiple vectors to a longer text so that if any one vector is matched during search, the entire data entry is recalled.
This means you can continuously improve data chunk accuracy through annotation.
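Multi-vector mapping can be pictured as one data entry carrying several index texts, each embedded separately. A hit on any of them recalls the whole entry. The field names below are illustrative, not FastGPT's exact schema; the second index is the kind of annotation you might add over time.

```python
# Hypothetical shape of one data entry with several indexes.
entry = {
    "q": "How do I reset my password?",
    "a": "Open Settings -> Account -> Reset password ...",
    "indexes": [
        {"text": "How do I reset my password?", "vector_id": "v1"},
        {"text": "forgot password / can't log in", "vector_id": "v2"},  # added annotation
        {"text": "password reset steps", "vector_id": "v3"},
    ],
}

def recall_entry(matched_vector_id: str, entries: list):
    """A hit on any index vector recalls the full data entry."""
    for e in entries:
        if any(ix["vector_id"] == matched_vector_id for ix in e["indexes"]):
            return e
    return None

print(recall_entry("v2", [entry])["q"])  # the whole entry comes back
```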
Search Pipeline
- Use `Query Optimization` for coreference resolution and query expansion, improving multi-turn conversation search capability and semantic richness.
- Use `Concat Query` to improve `Rerank` accuracy during multi-turn conversations.
- Use `RRF` (Reciprocal Rank Fusion) to merge results from multiple search channels.
- Use `Rerank` for secondary sorting to improve precision.
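The `RRF` step combines ranked lists from the different search channels by scoring each document as the sum of 1 / (k + rank) over every list it appears in. Here is a minimal sketch; k = 60 is a commonly used constant, assumed here rather than taken from FastGPT's implementation.

```python
def rrf_merge(rankings: list, k: int = 60) -> list:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic  = ["doc_a", "doc_b", "doc_c"]   # from vector search
full_text = ["doc_b", "doc_d", "doc_a"]   # from full-text search
print(rrf_merge([semantic, full_text]))
```

Documents ranked well by multiple channels (like `doc_b` above) rise to the top, even if no single channel ranked them first.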

Search Parameters
Search Modes
Semantic Search
Semantic search calculates the vector distance between the user's query and knowledge base content to determine "similarity" — mathematical similarity, not linguistic.
Pros:
- Understands similar semantics
- Cross-language understanding (e.g., Chinese query matching English content)
- Multimodal understanding (text, images, audio/video, etc.)
Cons:
- Depends on model training quality
- Inconsistent accuracy
- Affected by keywords and sentence completeness
Full-Text Search
Uses traditional full-text search. Best for finding key subjects, predicates, and other specific terms.
Hybrid Search
Combines vector search and full-text search, merging results using the RRF formula. Generally produces richer and more accurate results.
Since hybrid search covers a large range and cannot directly filter by similarity, a rerank model is typically used to re-sort results and filter by rerank scores.
Result Reranking
Uses a ReRank model to re-sort search results. In most cases, this significantly improves accuracy. Rerank models work better with complete questions (with proper subjects and predicates), so query optimization is usually applied before search and reranking. Reranking produces a score between 0 and 1 representing the relevance between the search content and the query; this score is typically more accurate than vector similarity scores and can be used for filtering.
FastGPT uses RRF to merge rerank results, vector search results, and full-text search results into the final output.
Search Filters
Reference Limit
The maximum number of tokens to reference per search.
Instead of using top k, we found that in mixed knowledge bases (Q&A + document), different chunk lengths vary significantly, making top k results unstable. Using a token limit provides more consistent control.
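The token-budget truncation described above can be sketched as follows, assuming chunks arrive already sorted by relevance and each carries a precomputed token count (both illustrative assumptions):

```python
def take_within_budget(chunks: list, max_tokens: int) -> list:
    """Keep the highest-ranked chunks until the token budget is spent."""
    kept, used = [], 0
    for chunk in chunks:
        if used + chunk["tokens"] > max_tokens:
            break
        kept.append(chunk)
        used += chunk["tokens"]
    return kept

ranked = [{"text": "long document chunk", "tokens": 900},
          {"text": "short QA pair", "tokens": 120},
          {"text": "another chunk", "tokens": 600}]
print([c["tokens"] for c in take_within_budget(ranked, max_tokens=1200)])
# -> [900, 120]; the third chunk would exceed the 1200-token limit
```

Because the budget is measured in tokens rather than chunks, a mix of short Q&A pairs and long document chunks yields a stable amount of context for the LLM.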
Minimum Relevance
A value between 0 and 1 that filters out low-relevance search results.
This only takes effect when using Semantic Search or Result Reranking.
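The filter itself is a simple threshold over whichever score is available (semantic similarity or rerank score); a minimal sketch with an illustrative `score` field:

```python
def filter_by_relevance(results: list, min_score: float) -> list:
    """Drop results whose relevance score falls below the threshold."""
    return [r for r in results if r["score"] >= min_score]

results = [{"id": 1, "score": 0.86}, {"id": 2, "score": 0.41}]
print(filter_by_relevance(results, min_score=0.5))  # only id 1 survives
```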
Query Optimization
Background
In RAG, we need to perform embedding searches against the database based on the input query to find similar content (i.e., knowledge base search).
During search — especially in multi-turn conversations — follow-up questions often fail to find relevant content because knowledge base search only uses the "current" question. Consider this example:

When the user asks "What's the second point?", the system searches for "What's the second point?" in the knowledge base, which returns nothing useful. The actual query should be "What is the QA structure?". This is why we need a Query Optimization module to complete the user's current question, enabling the knowledge base search to find relevant content. Here's the result after optimization:

How It Works
Before performing data retrieval, the model first performs coreference resolution and query expansion. This resolves ambiguous references and enriches the query's semantic content. You can view the optimized query in the conversation details after each interaction.
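A coreference-resolution step of this kind is usually prompt-driven: the conversation history and the latest question are packed into a rewrite instruction for the model. The prompt wording below is illustrative, not FastGPT's actual prompt.

```python
def build_query_optimization_prompt(history: list, question: str) -> str:
    """Assemble a hypothetical coreference-resolution / query-expansion
    prompt from the conversation history and the latest question."""
    dialog = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        "Rewrite the user's last question so it is self-contained: resolve "
        "pronouns and vague references using the conversation below, and "
        "expand it with closely related phrasings.\n\n"
        f"Conversation:\n{dialog}\n\n"
        f"Last question: {question}\n\n"
        "Rewritten question:"
    )

history = [("user", "What is the QA structure?"),
           ("assistant", "It stores each entry as a question plus an answer...")]
prompt = build_query_optimization_prompt(history, "What's the second point?")
print(prompt)
```

Sent to an LLM, a prompt like this turns "What's the second point?" into a standalone query such as "What is the second point of the QA structure?", which the knowledge base can actually match.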