zhparser Usage Guide
`zhparser` is a third-party PostgreSQL extension for processing Chinese text. It provides word segmentation and full-text search support for Chinese. ServBay ships with `scws` (Simple Chinese Word Segmentation), the segmentation library that `zhparser` is built on, so you can also use `scws` custom dictionaries. This article explains how to enable and use `zhparser` in ServBay.
Installing zhparser
ServBay already includes the `zhparser` extension module, so you only need to enable it in the database. Here are the steps to enable `zhparser`:

Connect to the PostgreSQL database:

```bash
psql -U your_username -d your_database
```

Create the extension:

```sql
CREATE EXTENSION zhparser;
```

Verify the installation:

```sql
\dx
```
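The `\dx` meta-command only works inside `psql`. If you are connected through another client, the same information can be read from the system catalog. A minimal sketch, assuming the extension was created in the database you are currently connected to:

```sql
-- List the zhparser extension and its installed version.
-- Returns no rows if the extension is not enabled in this database.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'zhparser';
```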
Configuring zhparser
After enabling `zhparser`, you need to set up a text search configuration so that PostgreSQL can use it for Chinese word segmentation and full-text search.
Configuring Text Search Configuration
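Create a text search configuration:

```sql
CREATE TEXT SEARCH CONFIGURATION chinese (PARSER = zhparser);
```

Add token mappings. This maps the zhparser token types `n`, `v`, `a`, `i`, `e`, and `l` (nouns, verbs, adjectives, and so on) to the built-in `simple` dictionary:

```sql
ALTER TEXT SEARCH CONFIGURATION chinese ADD MAPPING FOR n,v,a,i,e,l WITH simple;
```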
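To confirm that the configuration segments text as expected, you can run it on a sample sentence. The sketch below assumes the `chinese` configuration created above; the sample sentence is only illustrative, and the exact tokens depend on the dictionary in use.

```sql
-- Show each token the parser emits, its token type,
-- and the dictionaries that were consulted for it.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('chinese', '中文分词是文本处理的重要步骤');

-- Show the resulting tsvector that full-text search operates on.
SELECT to_tsvector('chinese', '中文分词是文本处理的重要步骤');
```

Tokens whose type is not covered by the mapping show an empty dictionary list in the `ts_debug` output and are left out of the `tsvector`.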
Using zhparser for Full-Text Search
Below is an example showing how to use `zhparser` for full-text search.
Creating Sample Tables and Data
Create a table:

```sql
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT
);
```
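Insert sample data. The rows are written in Chinese so that a Chinese search query can match them:

```sql
INSERT INTO documents (content) VALUES
    ('我爱自然语言处理'),                   -- "I love natural language processing"
    ('中文分词是文本处理的重要步骤'),       -- "Chinese word segmentation is an important step in text processing"
    ('zhparser 是一个很好的中文分词工具');  -- "zhparser is a great Chinese word segmentation tool"
```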
Creating Full-Text Search Index
Create a GIN index:

```sql
CREATE INDEX idx_gin_content ON documents USING gin (to_tsvector('chinese', content));
```
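An expression index like this one is only considered when a query uses the same expression, that is, the two-argument form `to_tsvector('chinese', content)`. As an alternative sketch, on PostgreSQL 12 or later you could store the vector in a generated column and index that column instead. The column name `content_tsv` and the index name `idx_gin_content_tsv` are hypothetical.

```sql
-- Store the segmented vector alongside the text (requires PostgreSQL 12+).
ALTER TABLE documents
    ADD COLUMN content_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('chinese', content)) STORED;

-- Index the stored column rather than the expression.
CREATE INDEX idx_gin_content_tsv ON documents USING gin (content_tsv);

-- Queries can then reference the column directly.
SELECT id, content
FROM documents
WHERE content_tsv @@ to_tsquery('chinese', '中文 & 分词');
```

With either approach, vectors that were computed before a dictionary or segmentation change keep their old tokens until the rows are rewritten or the index is rebuilt.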
Executing Full-Text Search
Execute a search query:

```sql
SELECT *
FROM documents
WHERE to_tsvector('chinese', content) @@ to_tsquery('chinese', '中文 & 分词');
```

This query returns documents that contain both 中文 ("Chinese") and 分词 ("word segmentation").
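To order matches by relevance instead of returning them in arbitrary order, PostgreSQL's built-in `ts_rank` function can be applied to the same vector and query. A minimal sketch, using the `documents` table and `chinese` configuration from this guide:

```sql
-- Rank matching rows; higher rank means a stronger match.
SELECT id,
       content,
       ts_rank(to_tsvector('chinese', content), query) AS rank
FROM documents,
     to_tsquery('chinese', '中文 & 分词') AS query
WHERE to_tsvector('chinese', content) @@ query
ORDER BY rank DESC;
```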
Custom Dictionary
ServBay comes with `scws`, and you can use `scws` custom dictionaries to better meet specific application needs.
Adding Custom Words
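Create a custom dictionary file:

```plaintext
/Applications/ServBay/etc/scws/custom_dict.txt
```

Add words to the file, one word per line (here, "natural language processing" and "Chinese word segmentation"):

```plaintext
自然语言处理
中文分词
```

Configure `zhparser` to use the custom dictionary:

```sql
SET zhparser.dict_path = '/Applications/ServBay/etc/scws/custom_dict.txt';
```

Upstream zhparser also documents a `zhparser.extra_dicts` parameter for loading additional dictionaries. It is set in `postgresql.conf` and names dictionary files located in PostgreSQL's `tsearch_data` directory.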
Reloading the Dictionary
Reload the dictionary:

```sql
SELECT zhprs_reload_dict();
```
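After changing the dictionary, it can help to check how a word from it is segmented. The sketch below uses PostgreSQL's `ts_parse` function with the `zhparser` parser; the sample text is an assumption chosen to contain one of the custom words. If the custom word is in effect, you would expect it to appear as a single token rather than as several shorter ones.

```sql
-- List the raw tokens the parser produces for the sample text.
SELECT tokid, token
FROM ts_parse('zhparser', '我爱自然语言处理');
```

Depending on how the dictionary is loaded, the change may only become visible in a new database connection.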
Adjusting Segmentation Mode
`zhparser` supports several segmentation settings, and you can adjust them as needed.
Setting Segmentation Mode
Enable duality segmentation, which automatically groups otherwise isolated single characters into two-character tokens:

```sql
SET zhparser.seg_with_duality = true;
```

Disable duality segmentation, leaving such characters as they are:

```sql
SET zhparser.seg_with_duality = false;
```
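Besides `zhparser.seg_with_duality`, the upstream zhparser documentation lists further session-level options, such as `zhparser.multi_short` (short-word compound segmentation) and `zhparser.punctuation_ignore`. A sketch for comparing the effect of one option, assuming it is available in the ServBay build and that a `chinese` text search configuration exists:

```sql
-- Segment the sample sentence with short-word compound segmentation off...
SET zhparser.multi_short = off;
SELECT to_tsvector('chinese', '中文分词是文本处理的重要步骤');

-- ...and again with it on, to compare the two token sets.
SET zhparser.multi_short = on;
SELECT to_tsvector('chinese', '中文分词是文本处理的重要步骤');

-- Return to the server default for this session.
RESET zhparser.multi_short;
```

Because indexed vectors are computed at write time, settings should be the same when indexing and when querying; otherwise the query tokens may not line up with the stored ones.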
Summary
`zhparser` is a capable Chinese word segmentation tool. With a small amount of configuration, you can run Chinese full-text search in PostgreSQL. ServBay ships with both the `zhparser` extension module and `scws`, so you can use custom dictionaries as well. By customizing dictionaries and adjusting the segmentation settings, you can tune the segmentation results to meet specific application needs.