pg_jieba User Guide
`pg_jieba` is a third-party PostgreSQL extension based on the Jieba Chinese word segmentation library. It is designed for Chinese full-text search. Written Chinese has no spaces between words, so `pg_jieba` splits the text into individual words that PostgreSQL can index and query. This article explains how to enable and use `pg_jieba` in ServBay.
Installing pg_jieba
ServBay ships with the `pg_jieba` extension, so you only need to enable it in each database where you want to use it:
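Connect to the PostgreSQL database:

```bash
psql -U your_username -d your_database
```

Create the extension:

```sql
CREATE EXTENSION pg_jieba;
```

Verify the installation with the psql `\dx` meta-command. `pg_jieba` should appear in the list of installed extensions:

```sql
\dx
```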
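`\dx` only works inside psql, not in GUI tools or application code. Below is a minimal sketch of the same check in plain SQL, using the `pg_extension` system catalog. `IF NOT EXISTS` is added so that the setup can be re-run without errors:

```sql
-- Idempotent: does nothing if pg_jieba is already enabled in this database
CREATE EXTENSION IF NOT EXISTS pg_jieba;

-- Returns one row (name and installed version) when the extension is enabled
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_jieba';
```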
Configuring pg_jieba
After enabling `pg_jieba`, configure PostgreSQL's text search so that Chinese text is segmented correctly and can be searched.
Creating a Text Search Configuration
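Create a text search configuration named `chinese` that uses the `jieba` parser provided by the extension:

```sql
CREATE TEXT SEARCH CONFIGURATION chinese (PARSER = jieba);
```

Map token types to a dictionary. The mapping below passes nouns (`n`), verbs (`v`), adjectives (`a`), idioms (`i`), interjections (`e`), and common fixed expressions (`l`) through PostgreSQL's built-in `simple` dictionary. Tokens of any type without a mapping are discarded and will not be searchable:

```sql
ALTER TEXT SEARCH CONFIGURATION chinese ADD MAPPING FOR n,v,a,i,e,l WITH simple;
```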
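PostgreSQL's built-in `ts_debug` and `to_tsvector` functions show how the `chinese` configuration splits a sentence and which tokens are kept. The sample sentence is only an illustration. The exact tokens depend on the Jieba dictionary bundled with your build, so no output is shown here. In `ts_debug`, a row with an empty `dictionaries` array and NULL `lexemes` is a token whose type has no mapping, which means it is dropped:

```sql
-- Token-by-token view: token type (alias), the token text, the dictionaries
-- mapped for that type, and the lexemes that end up indexed
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('chinese', '中文分词是文本处理中的重要步骤');

-- The final tsvector produced for the same sentence
SELECT to_tsvector('chinese', '中文分词是文本处理中的重要步骤');
```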
Using pg_jieba for Full-Text Search
The following example shows how to use `pg_jieba` for full-text search.
Creating Sample Tables and Data
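Create a table:

```sql
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT
);
```

Insert sample data. The rows are in Chinese because `pg_jieba` segments Chinese text. Each row's English translation is given in a comment:

```sql
INSERT INTO documents (content) VALUES
    ('我爱自然语言处理'),                  -- I love natural language processing
    ('中文分词是文本处理中的重要步骤'),    -- Chinese word segmentation is an important step in text processing
    ('pg_jieba 是一个很好的中文分词工具'); -- pg_jieba is a great Chinese word segmentation tool
```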
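Before building an index, you can inspect the `tsvector` each row produces. This is the data the GIN index in the next step stores. Below is a minimal check, assuming the `chinese` configuration created earlier:

```sql
-- One row per document: the original text and its segmented, indexable form
SELECT id, content, to_tsvector('chinese', content) AS tsv
FROM documents
ORDER BY id;
```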
Creating a Full-Text Search Index
Create a GIN index on the segmented content:

```sql
CREATE INDEX idx_gin_content ON documents USING gin (to_tsvector('chinese', content));
```
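PostgreSQL uses an expression index only when a query repeats the indexed expression exactly, including the configuration name. Queries should therefore call `to_tsvector('chinese', content)` in that same form. On a three-row table the planner usually prefers a sequential scan. The sketch below disables sequential scans for the current session purely to confirm that the index is usable. This is a diagnostic setting, not one to keep enabled:

```sql
-- Diagnostic only: discourage sequential scans in this session
SET enable_seqscan = off;

-- The plan is expected to reference idx_gin_content (typically via a Bitmap Index Scan)
EXPLAIN
SELECT id, content
FROM documents
WHERE to_tsvector('chinese', content) @@ to_tsquery('chinese', '中文 & 分词');

-- Restore the default planner behaviour
RESET enable_seqscan;
```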
Executing Full-Text Search
Execute a search query:

```sql
SELECT * FROM documents
WHERE to_tsvector('chinese', content) @@ to_tsquery('chinese', '中文 & 分词');
```

This query returns the documents whose content contains both "中文" (Chinese) and "分词" (word segmentation).
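Search results are usually ordered by relevance and shown with the matched words highlighted. Below is a sketch using PostgreSQL's built-in `ts_rank` and `ts_headline` with the same `chinese` configuration. By default, `ts_headline` wraps matched words in `<b>` and `</b>`. The `WHERE` clause keeps the exact indexed expression, so the GIN index still applies:

```sql
-- Rank matching rows and return a highlighted snippet for each
SELECT id,
       ts_rank(to_tsvector('chinese', content), query) AS rank,
       ts_headline('chinese', content, query)          AS snippet
FROM documents,
     to_tsquery('chinese', '中文 & 分词') AS query
WHERE to_tsvector('chinese', content) @@ query
ORDER BY rank DESC;
```

For free-text user input, `plainto_tsquery('chinese', ...)` builds the query without requiring explicit `&` operators.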
Custom Dictionary
You can customize the dictionary used by `pg_jieba` so that domain-specific terms are segmented the way your application needs.
Adding Custom Words
Create a custom dictionary file at `/Applications/ServBay/etc/scws/custom_dict.txt`.

Add words to the file, one per line. For example, to keep "自然语言处理" (natural language processing) and "中文分词" (Chinese word segmentation) together as single words:

```plaintext
自然语言处理
中文分词
```

Configure `pg_jieba` to use the custom dictionary:

```sql
SET pg_jieba.dict_path = '/Applications/ServBay/etc/scws/custom_dict.txt';
```
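`SET` changes a parameter only for the current session, and `SET LOCAL` narrows it further to the current transaction. Other connections do not see the new path. After the `SET`, you can check the value with `SHOW` or `current_setting()`. Passing `true` as the second argument makes `current_setting` return NULL instead of raising an error if the parameter is not set:

```sql
-- Value as seen by this session
SHOW pg_jieba.dict_path;

-- Same check as an expression; returns NULL rather than erroring if unset
SELECT current_setting('pg_jieba.dict_path', true) AS dict_path;
```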
Reloading the Dictionary
Reload the dictionary:

```sql
SELECT jieba_reload_dict();
```
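After reloading, `ts_debug` shows whether the custom entries are now kept together as single tokens. If such a token appears with NULL `lexemes`, its token type is not one of the types mapped in the `chinese` configuration. The GIN index created earlier also still holds lexemes computed with the old dictionary, because existing rows are not re-segmented automatically. Rebuilding the index keeps it consistent with what new queries produce. The sentence below is only an example:

```sql
-- Check segmentation: '自然语言处理' is expected to appear as one token
-- if the custom dictionary was picked up
SELECT alias, token, lexemes
FROM ts_debug('chinese', '我爱自然语言处理');

-- Rebuild the full-text index so existing rows are re-segmented
REINDEX INDEX idx_gin_content;
```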
Summary
`pg_jieba` is a powerful Chinese word segmentation tool. With a small amount of configuration, it provides efficient Chinese full-text search in PostgreSQL. ServBay ships with the `pg_jieba` extension, so you can start using it by following the steps in this article. A custom dictionary further improves segmentation for your application's vocabulary.