How to Enable ServBay's Built-in SCWS Module
As a powerful integrated web development tool, ServBay comes with the SCWS module, which is very easy to activate. SCWS (Simple Chinese Word Segmentation) is an efficient Chinese word segmentation engine capable of quickly and accurately segmenting Chinese text. It's particularly suitable for applications such as search engines and text analysis.
Introduction to the SCWS Module
SCWS is an open-source Chinese word segmentation engine designed specifically for Chinese text processing. It combines dictionary matching with statistical models to offer efficient and precise segmentation capabilities. SCWS supports not only basic segmentation functions but also keyword extraction and part-of-speech tagging.
Main Features
- Efficient Segmentation: SCWS uses efficient algorithms to process large-scale Chinese text quickly.
- High Accuracy: By combining dictionary matching and statistical models, SCWS offers significant advantages in segmentation accuracy.
- Supports Various Functions: In addition to basic segmentation, SCWS supports advanced functions like keyword extraction and part-of-speech tagging.
- Easy to Integrate: SCWS provides a rich API, allowing developers to easily integrate it into various applications.
- Open Source: SCWS is open source, enabling developers to customize and extend it as needed.
SCWS Module Versions in ServBay
ServBay supports multiple PHP versions, with the corresponding SCWS modules pre-installed for each version. The specific versions are as follows:
- PHP 5.6 - 8.4: SCWS 1.2.3
How to Enable the SCWS Module
By default, the SCWS module is disabled. The activation process is straightforward: navigate to Language - PHP, select the PHP version for which you want to enable the module (for example, PHP 8.4), click Extensions on the right, and then toggle the switch next to the SCWS module. Save your changes, and you're set.
You can also enable the module or change its configuration manually, as described in the steps below:
Step 1: Locate the Configuration File
First, locate the conf.d directory for the respective PHP version. For instance, to enable the SCWS module for PHP 8.3, edit the following file:
/Applications/ServBay/etc/php/8.3/conf.d/scws.ini
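If it is unclear which conf.d directory a given PHP build actually reads, PHP itself can report it. The sketch below uses the built-in php_ini_scanned_files() function to list the scanned .ini files and pick out scws.ini. The helper name findScannedIni is hypothetical, and the result depends on which ServBay PHP binary runs the script.

```php
<?php
// Hypothetical helper: return the scanned .ini files whose file name equals $basename.
function findScannedIni(string $basename, ?string $scanned = null): array
{
    // php_ini_scanned_files() returns a comma-separated list, or false if nothing was scanned.
    $scanned = $scanned ?? (php_ini_scanned_files() ?: '');
    $matches = [];
    foreach (explode(',', $scanned) as $path) {
        $path = trim($path);
        if ($path !== '' && basename($path) === $basename) {
            $matches[] = $path;
        }
    }
    return $matches;
}

// Print the scws.ini file(s) loaded by the PHP binary running this script, if any.
print_r(findScannedIni('scws.ini'));
```

An empty array suggests that this PHP version does not scan a scws.ini file, so the path may belong to a different PHP version.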
Step 2: Edit the Configuration File
Open the scws.ini file and uncomment the following lines:
[scws]
; Uncomment the following line to enable scws
extension = scws.so
scws.default.charset = gbk
scws.default.fpath = /Applications/ServBay/etc/scws
Step 3: Restart PHP Service
In the ServBay service management panel, restart the respective PHP service. For example, restart the PHP 8.3 service. Once the restart is complete, the SCWS module will be successfully loaded.
Verifying SCWS Module Load Success
To verify that the SCWS module has been loaded, create a PHP file named phpinfo.php in the web server's root directory with the following content:
<?php
phpinfo();
?>
Visit https://servbay.host/phpinfo.php and look for SCWS information on the PHP information page. If SCWS details are present, the module has been successfully loaded.
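The same check can be scripted instead of read off the phpinfo() page. This is a minimal sketch that relies only on PHP's built-in extension_loaded(), function_exists(), and ini_get(). The function name describeScwsSetup is hypothetical, and the ini keys are the ones that appear in scws.ini.

```php
<?php
// Hypothetical helper: summarize whether SCWS is loaded and how it is configured.
function describeScwsSetup(): array
{
    $loaded = extension_loaded('scws');
    return [
        'loaded'   => $loaded,
        'scws_new' => function_exists('scws_new'),
        // The ini values are only read when the extension is loaded.
        'charset'  => $loaded ? ini_get('scws.default.charset') : null,
        'fpath'    => $loaded ? ini_get('scws.default.fpath') : null,
    ];
}

var_dump(describeScwsSetup());
```

Run it through the same PHP version that serves the site, since each version has its own conf.d directory.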
Creating an SCWS Dictionary
Before using SCWS for word segmentation, you need to create and configure a dictionary file. SCWS dictionaries can be either plain text files or binary xdb files. Here are the steps to create a dictionary:
Step 1: Prepare the Dictionary File
Create a plain text file that includes the necessary words and their frequencies. The file format is as follows:
Word1 Frequency1
Word2 Frequency2
For example:
China 1000
Beijing 800
Shanghai 600
Save this file as dict.txt.
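When the word list comes from application data, the file can be generated rather than typed by hand. The sketch below writes the "Word Frequency" layout shown above. The helper name renderDictionary is hypothetical, and the single-space separator simply mirrors the example; adjust it if your SCWS tooling expects a different delimiter.

```php
<?php
// Hypothetical helper: render word => frequency pairs as "Word Frequency" lines.
function renderDictionary(array $entries): string
{
    $lines = [];
    foreach ($entries as $word => $frequency) {
        $lines[] = $word . ' ' . (int) $frequency;
    }
    return implode("\n", $lines) . "\n";
}

$entries = ['China' => 1000, 'Beijing' => 800, 'Shanghai' => 600];

// Write dict.txt into the current working directory.
file_put_contents('dict.txt', renderDictionary($entries));
```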
Step 2: Generate xdb Format Dictionary File
SCWS provides a tool to generate xdb format dictionary files. This tool is pre-installed with ServBay and can be used with the following command to generate an xdb file:
scws-gen-dict -i dict.txt -o dict.utf8.xdb
This command converts dict.txt into dict.utf8.xdb.
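If the conversion is part of a build or deploy script, the command can be assembled from PHP. This sketch only reuses the -i and -o flags from the command above and assumes scws-gen-dict is on the PATH of the user running the script. The helper name buildGenDictCommand is hypothetical.

```php
<?php
// Hypothetical helper: build the scws-gen-dict command line for the given paths.
function buildGenDictCommand(string $input, string $output): string
{
    // escapeshellarg() quotes each path so spaces and special characters are safe.
    return 'scws-gen-dict -i ' . escapeshellarg($input) . ' -o ' . escapeshellarg($output);
}

$command = buildGenDictCommand('dict.txt', 'dict.utf8.xdb');
echo $command, "\n";

// Uncomment to run the conversion; a non-zero $exitCode indicates failure.
// exec($command, $outputLines, $exitCode);
```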
Step 3: Configure SCWS to Use the Dictionary File
Place the generated dict.utf8.xdb file in the /Applications/ServBay/etc/scws directory, and make sure the charset and the dictionary path in the scws.ini file match it:
[scws]
; Uncomment the following line to enable scws
extension = scws.so
scws.default.charset = utf8
scws.default.fpath = /Applications/ServBay/etc/scws
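A quick preflight check can catch a misplaced dictionary before any segmentation code runs. The sketch below reports which expected files are missing from the SCWS directory. The helper name missingScwsFiles is hypothetical, and rules.utf8.ini is the rule file name used by the sample code in this guide; whether it exists on your machine depends on your ServBay installation.

```php
<?php
// Hypothetical helper: list the expected files that are missing or unreadable in $dir.
function missingScwsFiles(string $dir, array $expected): array
{
    $missing = [];
    foreach ($expected as $name) {
        $path = rtrim($dir, '/') . '/' . $name;
        if (!is_file($path) || !is_readable($path)) {
            $missing[] = $path;
        }
    }
    return $missing;
}

$missing = missingScwsFiles('/Applications/ServBay/etc/scws', ['dict.utf8.xdb', 'rules.utf8.ini']);
echo $missing
    ? 'Missing: ' . implode(', ', $missing) . "\n"
    : "All SCWS files found.\n";
```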
Usage Example
Once the SCWS module is enabled and the dictionary is configured, SCWS can be used in PHP code for Chinese word segmentation. Below is a simple example:
Sample Code
<?php
// Initialize SCWS
$scws = scws_new();
$scws->set_charset('utf8');
$scws->set_dict('/Applications/ServBay/etc/scws/dict.utf8.xdb');
$scws->set_rule('/Applications/ServBay/etc/scws/rules.utf8.ini');

// Text to segment
$text = "我是中国人,我爱我的祖国。";

// Perform segmentation
$scws->send_text($text);

// Retrieve segmentation results
while ($result = $scws->get_result()) {
    foreach ($result as $word) {
        echo $word['word'] . "\n";
    }
}

// Release SCWS resources
$scws->close();
?>
In the above code, we first initialize SCWS and set the charset, dictionary, and rule file. Then, we pass the text to be segmented to SCWS and loop through to retrieve the segmentation results. Finally, we release SCWS resources.
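The keyword extraction and part-of-speech tagging mentioned earlier can be sketched in the same style. This example assumes that the extension's get_tops() method and the attr field of each result entry behave as described in the upstream SCWS PHP documentation; treat both as assumptions to confirm against your installed version. The helper name formatToken is hypothetical, and the script skips the SCWS calls when the extension is not loaded.

```php
<?php
// Hypothetical helper: format one result entry as "word/tag".
function formatToken(array $token): string
{
    // 'attr' is assumed to hold the part-of-speech tag; fall back to '?' if it is absent.
    return $token['word'] . '/' . ($token['attr'] ?? '?');
}

if (extension_loaded('scws')) {
    $text = "我是中国人,我爱我的祖国。";

    $scws = scws_new();
    $scws->set_charset('utf8');
    $scws->set_dict('/Applications/ServBay/etc/scws/dict.utf8.xdb');
    $scws->set_rule('/Applications/ServBay/etc/scws/rules.utf8.ini');

    // Keyword extraction: ask for up to 5 top-weighted words.
    $scws->send_text($text);
    $tops = $scws->get_tops(5) ?: [];
    foreach ($tops as $top) {
        echo 'keyword: ', $top['word'], "\n";
    }

    // Part-of-speech tagging: resend the text, then print each word with its tag.
    $scws->send_text($text);
    while ($batch = $scws->get_result()) {
        foreach ($batch as $token) {
            echo formatToken($token), "\n";
        }
    }

    $scws->close();
} else {
    echo "SCWS extension is not loaded.\n";
}
```

The text is sent twice so that the keyword pass and the tagging pass each start from the beginning of the input, regardless of how the extension tracks its read position.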
Conclusion
ServBay offers a convenient way to manage and enable the SCWS module. With a simple configuration change and a service restart, developers can enable SCWS for any supported PHP version and use its segmentation, keyword extraction, and part-of-speech tagging features for Chinese text analysis and processing.