SCWS Chinese Word Segmentation in ServBay: Installation, Configuration & Usage Guide 
As a robust local web development environment, ServBay comes pre-integrated with many essential tools and packages for developers. SCWS (Simple Chinese Word Segmentation) is a high-efficiency Chinese segmentation system, crucial for processing Chinese texts in scenarios such as search, NLP, content analytics, and more. ServBay already has SCWS and its PHP module preinstalled, so there's no need for any complicated extra installation steps. This guide provides detailed instructions on how to configure and use SCWS within ServBay, including both command-line tools and PHP API usage.
Overview 
SCWS is a high-performance Chinese word segmentation library, especially suitable for scenarios where you need to segment large amounts of Chinese text quickly and accurately. It supports multiple segmentation modes, custom dictionaries, and rules, making it a fundamental tool for building Chinese search, recommendation, and text analysis applications. ServBay has integrated SCWS into its distribution, providing a precompiled PHP extension, which greatly simplifies using SCWS in your local development setup.
Prerequisites 
- You have successfully installed and are running ServBay on macOS.
Installation & Configuration 
Installation 
ServBay is designed to deliver a ready-to-use development environment. As a vital Chinese language processing tool, SCWS is already preinstalled with ServBay. There's no need for additional downloads or compilation. The executable files, configuration files, and dictionaries for SCWS are all centrally located in your ServBay installation directory, typically at /Applications/ServBay/ by default.
Configuration 
The default SCWS configuration file can be found at /Applications/ServBay/etc/scws/scws.ini within your ServBay installation. You can modify this file according to your specific needs to adjust the segmentation behavior, character sets, dictionaries, and rule settings of SCWS.
Here’s an example of the default configuration file:
ini
[charset]
default = utf8
[rule]
rules = /Applications/ServBay/etc/scws/rules.ini
[dict]
dict = /Applications/ServBay/etc/scws/dict.utf8.xdb
- [charset]: Specifies the default character set, usually left as utf8.
- [rule]: Specifies the path to the segmentation rules file.
- [dict]: Specifies the path to the word dictionary file. You may specify multiple dictionary files separated by commas (,).
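A wrong path under [rule] or [dict] is easy to miss, so it can help to check the referenced files from the terminal. The following is a minimal sketch, not a ServBay tool: the function name check_scws_paths is hypothetical, and the parsing assumes the simple "key = value" layout shown above, with multiple dictionaries separated by commas.

```bash
# Hypothetical helper: report whether each file referenced by the "rules" and
# "dict" keys of an scws.ini-style file exists. Assumes one "key = value" pair
# per line, with multiple dictionaries separated by commas.
check_scws_paths() {
    ini_file=$1
    missing=0
    [ -r "$ini_file" ] || { echo "cannot read $ini_file" >&2; return 2; }
    # Extract the values of the two keys, split on commas, trim whitespace.
    paths=$(sed -n \
        -e 's/^[[:space:]]*rules[[:space:]]*=[[:space:]]*//p' \
        -e 's/^[[:space:]]*dict[[:space:]]*=[[:space:]]*//p' \
        "$ini_file" | tr ',' '\n' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
    while IFS= read -r file_path; do
        [ -n "$file_path" ] || continue
        if [ -e "$file_path" ]; then
            echo "ok: $file_path"
        else
            echo "missing: $file_path"
            missing=1
        fi
    done <<EOF
$paths
EOF
    return "$missing"
}

# Example, using the default location named in this guide:
# check_scws_paths /Applications/ServBay/etc/scws/scws.ini
```

The function returns 0 when every listed file exists and 1 when at least one is missing, so it can be used in an if statement before running a longer job.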
Basic Usage: Command-Line Tool 
SCWS provides a powerful command-line utility, scws, which allows you to test or batch-process Chinese text segmentation right in your terminal. ServBay has included the scws executable in its bin directory. Typically, /Applications/ServBay/bin is already added to your system's PATH, enabling you to run scws commands directly in the terminal.
Segmentation Examples 
Below are some basic examples of using the scws command-line tool:
Segment a String 
Pipe a string directly to the scws command:
bash
echo "这是一个中文分词的例子" | scws -i
Segment Text from a File 
Use the -i option to specify the input file, and -o to specify the output file:
bash
scws -i input.txt -o output.txt
Specify Segmentation Rules 
Use the -r option to provide a custom rules file path:
bash
scws -i input.txt -o output.txt -r /path/to/your/rules.ini
Specify Custom Dictionary 
Use the -d option to specify a custom dictionary file path:
bash
scws -i input.txt -o output.txt -d /path/to/your/dict.utf8.xdb
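For batch processing, the -i and -o options described above can be wrapped in a loop over a directory. This is a rough sketch under stated assumptions: the function name segment_dir, the SCWS_BIN override variable, and the .seg.txt output suffix are all hypothetical, and it assumes scws accepts -i and -o exactly as shown in the examples above.

```bash
# Hypothetical helper: segment every .txt file in a source directory and write
# the results to a destination directory, one "scws -i ... -o ..." call per file.
# SCWS_BIN is an assumed override variable; it defaults to "scws" on the PATH.
segment_dir() {
    src_dir=$1
    dst_dir=$2
    scws_bin=${SCWS_BIN:-scws}
    mkdir -p "$dst_dir" || return 1
    for input in "$src_dir"/*.txt; do
        # Skip the unexpanded pattern when the directory has no .txt files.
        [ -e "$input" ] || continue
        output="$dst_dir/$(basename "$input" .txt).seg.txt"
        "$scws_bin" -i "$input" -o "$output" || return 1
    done
}

# Example (directory names are placeholders):
# segment_dir ./articles ./segmented
```

The loop stops at the first file that scws fails on, which makes a bad dictionary or rules path visible early instead of producing a directory of empty outputs.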
Advanced Usage 
Custom Dictionaries 
For improved segmentation accuracy—especially for industry-specific terminology, person names, place names, or new words—you can create custom dictionaries. SCWS uses an efficient xdb format dictionary. You can convert a text-format dictionary to an xdb file using the scws-gen tool provided by ServBay.
Steps to Create a Custom Dictionary: 
- Create a text file, e.g., custom_dict.txt. Each line contains a word, optionally followed by a space and its weight (an integer influencing segmentation priority).
  ServBay 10
  Local development environment 8
  Chinese word segmentation 9
- Use the scws-gen tool to generate an xdb dictionary file. scws-gen is also located in ServBay's bin directory.
  bash
  scws-gen -i custom_dict.txt -o custom_dict.xdb
- Edit the [dict] section of your SCWS configuration file /Applications/ServBay/etc/scws/scws.ini and add the path to your custom dictionary after the default, separated by a comma.
  ini
  [dict]
  dict = /Applications/ServBay/etc/scws/dict.utf8.xdb,/path/to/your/custom_dict.xdb
  Make sure /path/to/your/custom_dict.xdb matches where you actually store your custom dictionary.
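When the word list grows over time, the text file from the first step can be maintained with a small helper before regenerating the xdb file. The sketch below is illustrative only: add_dict_entry is a hypothetical function name, and it assumes the one-entry-per-line "word weight" layout described in the steps above.

```bash
# Hypothetical helper: append "word weight" to a text dictionary unless the
# word is already listed. Assumes one "word weight" entry per line.
add_dict_entry() {
    dict_txt=$1
    word=$2
    weight=$3
    # Reject empty or non-integer weights.
    case $weight in
        ''|*[!0-9]*) echo "weight must be a non-negative integer" >&2; return 1 ;;
    esac
    touch "$dict_txt" || return 1
    # Leave the file untouched if the first field of any line equals the word.
    if awk -v w="$word" '$1 == w { found = 1 } END { exit !found }' "$dict_txt"; then
        return 0
    fi
    printf '%s %s\n' "$word" "$weight" >> "$dict_txt"
}

# Example, followed by the conversion command from the steps above:
# add_dict_entry custom_dict.txt "ServBay" 10
# add_dict_entry custom_dict.txt "中文分词" 9
# scws-gen -i custom_dict.txt -o custom_dict.xdb
```

Because an existing word is never overwritten, changing a weight still means editing the text file by hand and running scws-gen again.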
Tuning Segmentation Rules 
The rules file (default: /Applications/ServBay/etc/scws/rules.ini) defines how SCWS handles ambiguities or complex Chinese structures. Editing the rules file often requires an in-depth understanding of SCWS’s segmentation algorithm. For most users, using the default rules combined with custom dictionaries is sufficient. If you need to tweak the rules, do so carefully and refer to the official SCWS documentation for rule file formats and syntax (if documentation is included with the SCWS version shipped by ServBay).
Sample rules file content (generally contains pattern-matching rules):
ini
[rule]
# Add custom segmentation rules here
# Example: Define a simple rule
# pattern = result
Using the PHP API 
For developers building web applications with PHP, the PHP environment in ServBay already comes with the SCWS extension module enabled. This means you don’t need to install or configure any extra PHP extension; you can invoke the SCWS API directly in your PHP code for Chinese text segmentation.
You can verify whether the SCWS extension is enabled by visiting ServBay’s built-in phpinfo() page.
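The same check can be approximated from the terminal with PHP's module listing. This is a minimal sketch: php_has_scws and the PHP_BIN override variable are hypothetical names, and it assumes the php binary you run is the one ServBay manages. The CLI and the web server can load different configurations, so the phpinfo() page remains the authoritative view for web requests.

```bash
# Hypothetical helper: succeed if the given PHP binary lists the scws module.
# PHP_BIN is an assumed override variable; it defaults to "php" on the PATH.
php_has_scws() {
    php_bin=${PHP_BIN:-php}
    # "php -m" prints one loaded module name per line.
    "$php_bin" -m 2>/dev/null | grep -qix 'scws'
}

if php_has_scws; then
    echo "scws extension is loaded"
else
    echo "scws extension is not loaded (or php was not found)"
fi
```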
Usage Example 
Here's a sample PHP script demonstrating how to use the SCWS API to perform segmentation:
php
<?php
// Ensure the SCWS extension is loaded
if (!extension_loaded('scws')) {
    die("SCWS extension is not loaded.");
}
// The text to be segmented
$text = "ServBay 是一款强大的本地 Web 开发环境,支持 PHP、Node.js、Python 等多种语言,并集成了 MySQL、Nginx 等软件包。";
// Open an SCWS segmenter instance
$sh = scws_open();
// Set the character set, usually matching your text encoding
scws_set_charset($sh, 'utf8');
// Specify dictionary and rule file paths
// Make sure these are the actual SCWS file paths in the ServBay environment
$dict_path = '/Applications/ServBay/etc/scws/dict.utf8.xdb';
$rule_path = '/Applications/ServBay/etc/scws/rules.ini';
if (!file_exists($dict_path)) {
    die("SCWS dictionary file not found: " . $dict_path);
}
if (!file_exists($rule_path)) {
    die("SCWS rules file not found: " . $rule_path);
}
scws_set_dict($sh, $dict_path);
scws_set_rule($sh, $rule_path);
// Send the text to be segmented to the SCWS instance
scws_send_text($sh, $text);
// Retrieve segmentation results
echo "Original Text: " . $text . "\n";
echo "Segmentation Result:\n";
// Loop through and print the segmentation results
// $res is an array; each element represents a segmented word with details (word, attribute, etc.)
while ($res = scws_get_result($sh)) {
    foreach ($res as $word_info) {
        // Print the word itself
        echo $word_info['word'] . " ";
        // Optionally, print part of speech or weight if needed, e.g.:
        // echo "Word: " . $word_info['word'] . ", POS: " . $word_info['attr'] . ", Weight: " . $word_info['idf'] . "\n";
    }
}
echo "\n";
// Close the SCWS instance and release resources
scws_close($sh);
?>
You can save this code as a .php file (e.g., segment_test.php), place it in the website root directory of ServBay (/Applications/ServBay/www/servbay.demo/ if you have a site named servbay.demo), and then access it via your browser or run it from the terminal using PHP CLI to view the segmentation results.
Common PHP Extension Functions 
Here are some commonly used core functions in the SCWS PHP extension:
- scws_open(): Initializes and opens an SCWS segmenter instance. Returns a resource handle on success, or false on failure.
- scws_set_charset($sh, $charset): Sets the character set for the segmenter instance $sh.
- scws_set_dict($sh, $dict_path, $mode = SCWS_XDICT_XDB): Sets the dictionary path for the segmenter instance $sh. $mode specifies the dictionary format and defaults to the xdb format; SCWS_XDICT_TXT selects the text format (deprecated, xdb is recommended). Usually, just provide the $dict_path to the xdb file.
- scws_set_rule($sh, $rule_path): Sets the rule file path for the segmenter instance $sh.
- scws_send_text($sh, $text): Submits the $text to be segmented to the segmenter instance $sh for processing.
- scws_get_result($sh): Retrieves segmentation results from the segmenter instance $sh. Returns a detailed array for each chunk until processing is complete, then returns false.
- scws_close($sh): Closes the segmenter instance $sh and releases resources.
For more advanced functions (such as ignoring punctuation, segmentation modes, retrieving word weights, and more), refer to the SCWS PHP extension official documentation.
Frequently Asked Questions (FAQ) 
1. What should I do if SCWS segmentation results are inaccurate? 
- Solution: First, check that the dict and rule file paths in the configuration file /Applications/ServBay/etc/scws/scws.ini are correct and that these files exist and are readable. For domain-specific texts or new words, create a custom dictionary (use scws-gen to generate the xdb format) and add its path to the configuration. Adjusting word weights or segmentation rules may help further, but requires more in-depth knowledge.
2. What if SCWS is slow or segmentation performance is poor? 
- Solution: Ensure that SCWS is using the optimized xdb dictionary format rather than the older text format. The xdb format provides faster loading and lookups. In your config, make sure the dictionary path points to an xdb file. For large texts, consider splitting the processing into smaller chunks.
3. What if the SCWS command-line tool can't be found or won't run? 
- Solution: This usually means ServBay's executable directory isn't in your system PATH variable. Try running the command using the full path, e.g., /Applications/ServBay/bin/scws -i .... Alternatively, add /Applications/ServBay/bin to PATH in your shell profile (such as ~/.bash_profile or ~/.zshrc), then reload the profile or restart your terminal.
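Scripts that must work whether or not PATH is set up can resolve the binary first and fall back to the full path. The following is a sketch under assumptions: find_scws and the SERVBAY_BIN override variable are hypothetical names, and the fallback directory is the default /Applications/ServBay/bin mentioned above.

```bash
# Hypothetical helper: print the path of a usable scws binary, preferring the
# one on PATH and falling back to ServBay's default bin directory.
# SERVBAY_BIN is an assumed override for non-default install locations.
find_scws() {
    fallback=${SERVBAY_BIN:-/Applications/ServBay/bin}/scws
    if command -v scws >/dev/null 2>&1; then
        command -v scws
    elif [ -x "$fallback" ]; then
        printf '%s\n' "$fallback"
    else
        echo "scws not found on PATH or at $fallback" >&2
        return 1
    fi
}

# Example:
# scws_bin=$(find_scws) && "$scws_bin" -i input.txt -o output.txt
```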
4. Why does scws_open() fail, or why is the function missing in PHP? 
- Solution: This means the SCWS PHP extension isn't loaded in your ServBay PHP environment. Check the active PHP version in ServBay, then view its phpinfo() page (ServBay often provides a shortcut) to see if scws is listed and enabled. If not enabled, check your PHP configuration file (php.ini) for a line like extension=scws.so, and make sure scws.so exists in the PHP extensions directory (which ServBay pre-configures). If issues persist, try restarting the ServBay service.
Summary 
SCWS is a powerful and efficient Chinese word segmentation system. With ServBay’s integrated package and PHP extension, developers can easily install, configure, and use SCWS in their local macOS environment—either for text processing via the command-line or dynamic segmentation in PHP applications. By following this guide, you'll quickly get started and integrate SCWS into your projects, enhancing your Chinese text processing capabilities.
