How to Enable ServBay Built-in SCWS Module
ServBay, an integrated web development environment, ships with the SCWS module built in, and enabling it takes only a small configuration change. SCWS (Simple Chinese Word Segmentation) is a Chinese word segmentation engine that splits Chinese text into words quickly and accurately, which makes it well suited to search engines, text analysis, and similar applications.
Introduction to the SCWS Module
SCWS is an open-source Chinese word segmentation engine specifically designed for processing Chinese text. By combining dictionary matching and statistical models, it provides efficient and accurate word segmentation. SCWS not only supports basic word segmentation but also advanced functionalities like keyword extraction and part-of-speech tagging.
Main Features
- Efficient Segmentation: SCWS adopts an efficient segmentation algorithm that can quickly process large-scale Chinese text.
- High Accuracy: SCWS combines dictionary matching with statistical models to improve segmentation accuracy.
- Supports Various Features: In addition to basic segmentation, SCWS supports advanced features like keyword extraction and part-of-speech tagging.
- Easy Integration: SCWS provides rich APIs, enabling developers to easily integrate it into various applications.
- Open-source: SCWS is open-source software, allowing developers to customize and extend it as needed.
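As an illustration of the keyword extraction and part-of-speech features listed above, the following is a minimal sketch rather than a definitive implementation. It assumes the SCWS module has already been enabled and that the dictionary and rule files exist at the paths used later in this guide. The helper name `scws_top_keywords` is hypothetical, and the result keys (`word`, `times`, `attr`) follow the SCWS PHP extension's `get_tops()` output as commonly documented, so verify them against your installed version.

```php
<?php
// Hypothetical helper: return the top keywords of a text using SCWS.
// Assumes the dictionary and rule paths from this guide; adjust as needed.
function scws_top_keywords(string $text, int $limit = 5): array
{
    $dict = '/Applications/ServBay/etc/scws/dict.utf8.xdb';
    $rule = '/Applications/ServBay/etc/scws/rules.utf8.ini';

    // Bail out quietly when the extension or dictionary is unavailable.
    if (!extension_loaded('scws') || !is_readable($dict)) {
        return [];
    }

    $scws = scws_new();
    $scws->set_charset('utf8');
    $scws->set_dict($dict);
    if (is_readable($rule)) {
        $scws->set_rule($rule);
    }
    $scws->send_text($text);

    // get_tops() returns the highest-weighted words, or false on failure.
    $tops = $scws->get_tops($limit);
    $scws->close();

    return $tops ?: [];
}

// "I am Chinese, and I love my country."
foreach (scws_top_keywords('我是中国人，我爱我的祖国。') as $item) {
    // word, part-of-speech tag, number of occurrences
    echo $item['word'], "\t", $item['attr'], "\t", $item['times'], "\n";
}
```

The helper returns an empty array instead of raising an error when SCWS is not yet enabled, so it is safe to run before completing the setup steps.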
SCWS Module Version in ServBay
ServBay supports multiple PHP versions, each pre-installed with the corresponding SCWS module. The specific versions are as follows:
- PHP 5.6 - 8.4: SCWS 1.2.3
How to Enable the SCWS Module
By default, the SCWS module is disabled. To enable it, edit the configuration file of the corresponding PHP version. The detailed steps are as follows:
Step 1: Locate the Configuration File
First, locate the conf.d directory of the corresponding PHP version. For example, to enable the SCWS module for PHP 8.3, edit the following file:
/Applications/ServBay/etc/php/8.3/conf.d/scws.ini
Step 2: Edit the Configuration File
Open the scws.ini file and uncomment the lines below so that the file reads as follows:
[scws]
; Uncomment the following line to enable scws
extension = scws.so
scws.default.charset = gbk
scws.default.fpath = /Applications/ServBay/etc/scws
Step 3: Restart the PHP Service
In the ServBay service management panel, restart the corresponding PHP service. For example, restart the PHP 8.3 service. After the restart, the SCWS module will be successfully loaded.
Verify if the SCWS Module is Successfully Loaded
You can verify that the SCWS module has loaded by creating a simple PHP file. Create a phpinfo.php file in the root directory of the web server with the following content:
<?php
phpinfo();
?>
Open https://servbay.host/phpinfo.php in a browser and look for an SCWS section on the PHP info page. If SCWS information appears, the module has been loaded successfully.
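For a more targeted check than scanning the full PHP info page, a short script can report the module state directly. This is a sketch under assumptions: the helper name `scws_status` is hypothetical, and the directive names are the ones shown in scws.ini above. Run it through the same PHP version you configured, since a different PHP binary may load a different set of ini files.

```php
<?php
// Hypothetical helper: summarize whether SCWS is loaded and configured.
function scws_status(): array
{
    $loaded  = extension_loaded('scws');
    $scanned = php_ini_scanned_files(); // comma-separated list, or false

    return [
        'loaded'   => $loaded,
        // True when a scws.ini file was picked up from a conf.d directory.
        'ini_seen' => is_string($scanned) && strpos($scanned, 'scws.ini') !== false,
        // ini_get() returns false if the directive is not registered.
        'charset'  => $loaded ? ini_get('scws.default.charset') : null,
        'fpath'    => $loaded ? ini_get('scws.default.fpath') : null,
    ];
}

var_export(scws_status());
echo "\n";
```

If `loaded` is false while `ini_seen` is true, the `extension = scws.so` line is probably still commented out or the PHP service has not been restarted.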
Creating SCWS Dictionary
Before using SCWS for word segmentation, you need to create and configure a dictionary file. The dictionary file used by SCWS can be an ordinary text file or a binary xdb file. Here are the steps to create a dictionary:
Step 1: Prepare the Dictionary File
Create a plain text file containing the required words and their frequencies. The format of the file is as follows:
word1 frequency1
word2 frequency2
For example:
China 1000
Beijing 800
Shanghai 600
Save this file as dict.txt.
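When the word list comes from another source, such as a database or a log of search terms, the text file can be generated instead of written by hand. The sketch below assumes the "word frequency" layout described above, with one entry per line; the function name `build_dict_lines` is hypothetical.

```php
<?php
// Hypothetical helper: turn a word => frequency map into dictionary lines.
function build_dict_lines(array $frequencies): string
{
    arsort($frequencies); // Highest frequency first; the order is cosmetic.

    $lines = [];
    foreach ($frequencies as $word => $frequency) {
        $lines[] = $word . ' ' . (int) $frequency;
    }

    return implode("\n", $lines) . "\n";
}

$content = build_dict_lines([
    'Beijing'  => 800,
    'China'    => 1000,
    'Shanghai' => 600,
]);

// Write the file next to this script and show what was written.
file_put_contents(__DIR__ . '/dict.txt', $content);
echo $content;
```

Sorting by frequency only makes the file easier to review; the entries themselves are unchanged.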
Step 2: Generate xdb Format Dictionary File
SCWS provides a tool for generating xdb dictionary files, and it comes pre-installed with ServBay. Use the following command to generate the xdb file:
scws-gen-dict -i dict.txt -o dict.utf8.xdb
This command converts dict.txt into the dict.utf8.xdb file.
Step 3: Configure SCWS to Use the Dictionary File
Place the generated dict.utf8.xdb file in the /Applications/ServBay/etc/scws directory and ensure that the charset and dictionary path are configured correctly in the scws.ini file. Note that the charset is set to utf8 here to match the UTF-8 dictionary:
[scws]
; Uncomment the following line to enable scws
extension = scws.so
scws.default.charset = utf8
scws.default.fpath = /Applications/ServBay/etc/scws
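Before running any segmentation code, it can help to confirm that the data files are where the configuration expects them. The following sketch assumes the file names used in this guide (dict.utf8.xdb and rules.utf8.ini) and falls back to the ServBay directory when the `scws.default.fpath` directive is not available; the helper name `check_scws_files` is hypothetical.

```php
<?php
// Hypothetical helper: report which expected SCWS data files are readable.
function check_scws_files(string $dir, array $names): array
{
    $report = [];
    foreach ($names as $name) {
        $report[$name] = is_readable(rtrim($dir, '/') . '/' . $name);
    }

    return $report;
}

// Use the configured path when available, otherwise the path from this guide.
$dir = ini_get('scws.default.fpath') ?: '/Applications/ServBay/etc/scws';

foreach (check_scws_files($dir, ['dict.utf8.xdb', 'rules.utf8.ini']) as $name => $ok) {
    echo $name, ': ', $ok ? 'readable' : 'missing', "\n";
}
```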
Example Usage
After enabling the SCWS module and configuring the dictionary, you can use SCWS for Chinese word segmentation in PHP code. Here is a simple example:
Example Code
<?php
// Initialize SCWS
$scws = scws_new();
$scws->set_charset('utf8');
$scws->set_dict('/Applications/ServBay/etc/scws/dict.utf8.xdb');
$scws->set_rule('/Applications/ServBay/etc/scws/rules.utf8.ini');

// Text to be segmented ("I am Chinese, and I love my country.")
$text = "我是中国人，我爱我的祖国。";

// Perform segmentation
$scws->send_text($text);

// Get segmentation results; get_result() returns one batch of words per call
// and false once the text is exhausted
while ($result = $scws->get_result()) {
    foreach ($result as $word) {
        echo $word['word'] . "\n";
    }
}

// Release SCWS resources
$scws->close();
?>
In the above code, we first initialize SCWS and set the charset, dictionary, and rule file. Then we pass the text to SCWS and read the segmentation results in a loop until none remain. Finally, we release the SCWS resources.
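The same flow can be wrapped in a reusable function that fails gracefully when the module or dictionary is missing. This is a sketch under assumptions, not a definitive implementation: the function name `scws_segment` is hypothetical, and the `attr` key and the `set_ignore()` call follow the SCWS PHP extension as commonly documented, so check them against your installed version.

```php
<?php
// Hypothetical wrapper: segment text and return each word with its POS tag.
function scws_segment(string $text, string $dict, string $rule): array
{
    // Return an empty result rather than failing when SCWS is unavailable.
    if (!extension_loaded('scws') || !is_readable($dict)) {
        return [];
    }

    $scws = scws_new();
    $scws->set_charset('utf8');
    $scws->set_dict($dict);
    if (is_readable($rule)) {
        $scws->set_rule($rule);
    }
    $scws->set_ignore(true); // Skip punctuation in the results.
    $scws->send_text($text);

    $words = [];
    while ($batch = $scws->get_result()) {
        foreach ($batch as $item) {
            $words[] = ['word' => $item['word'], 'attr' => $item['attr']];
        }
    }
    $scws->close();

    return $words;
}

$words = scws_segment(
    '我是中国人，我爱我的祖国。', // "I am Chinese, and I love my country."
    '/Applications/ServBay/etc/scws/dict.utf8.xdb',
    '/Applications/ServBay/etc/scws/rules.utf8.ini'
);

foreach ($words as $w) {
    echo $w['word'], "\t", $w['attr'], "\n";
}
```

Collecting the results into an array keeps the segmentation step separate from output, which makes the function easier to reuse in search indexing or text analysis code.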
Conclusion
ServBay provides a convenient way to manage and enable the SCWS module. With a small configuration change and a service restart, developers can enable SCWS in any supported PHP version and use its word segmentation, keyword extraction, and part-of-speech tagging features for Chinese text processing. Its speed, accuracy, and feature set make SCWS a strong choice for Chinese text analysis.