Aiming to be the best jieba-based Chinese word segmentation plugin for Vim/Neovim.
For English, see below.
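Vim (and many other text editors) uses word motions to move the cursor within a line. This works well for languages such as English that separate words with spaces, but is hard to use for languages such as Chinese that do not.
jieba is a Python package for Chinese word segmentation. Many plugin projects, such as Jieba (VSCode), Deno bridge jieba (Emacs), and jieba_nvim (neovim), already use it to make editing Chinese text easier. However, I have not found a jieba plugin for Vim 8/9, so I developed this one.
- Enhances Vim word motions so that they can handle Chinese characters.
- Extensively tested, covering a wide range of edge cases.
- Written in Rust + Python for speed.
- Precompiled libraries are provided for major platforms, so no local Rust development environment is needed.
This plugin is developed with Python3 + Rust; Vim/Neovim requires +python3.
For vim-plug, install the latest stable version with: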
Plug 'kkew3/jieba.vim', { 'tag': 'v1.0.4', 'do': './build.sh' }
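where ./build.sh downloads the precompiled shared library.
Though usually unnecessary, in rare cases you may need to enter the plugin directory and adjust the pyo3 python ABI version in rust_backend/Cargo.toml to match the python3 version in vim. You can check vim's python3 version by running the following in a terminal:
vim +"py3 print(sys.version)"
Neovim users can install with lazy.nvim:
{
    "kkew3/jieba.vim",
    tag = "v1.0.4",
    build = "./build.sh",
    init = function()
        vim.g.jieba_vim_lazy = 1
        vim.g.jieba_vim_keymap = 1
    end,
}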
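- Augments eight Vim word motions, namely b, B, ge, gE, w, W, e, E, so that they work on both Chinese and English text. For example, the augmented w does not jump over Chinese punctuation whereas W does.
- Fully compatible with Vim's native word motions in ASCII documents without Chinese. Combined with lazy loading (see the g:jieba_vim_lazy switch below), the plugin can be left always on (for certain file types).
- If the optional repeat-support plugin is installed, the . key repeats the last word operation. For example, dw. is equivalent to dwdw.
- Previews the jump destinations of word motions. Chinese word segmentation is sometimes ambiguous, and even when it is not, jieba may not align with human intuition, so the destination of a Chinese word motion is not always obvious. Users may then want to preview where an upcoming jump will land.
This plugin is designed to be non-intrusive: it maps no keys by default, but provides some commands and <Plug>(...) mappings for users to configure.
The following <Plug>() mappings are provided, where X denotes one of the eight Vim word motion keys mentioned above, i.e. b, B, ge, gE, w, W, e, E:
- <Plug>(Jieba_preview_X): previews the destination of the augmented X.
- <Plug>(Jieba_X): the augmented X, usable in normal, operator-pending and visual modes, and together with a count. For example, assuming w has been mapped to <Plug>(Jieba_w), 3w jumps three words forward and d3w deletes three words forward.
Users may map keys to these <Plug>() mappings in their .vimrc, for example: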
nmap <LocalLeader>jw <Plug>(Jieba_preview_w)
" etc., and
map w <Plug>(Jieba_w)
" etc.
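A convenience switch g:jieba_vim_keymap is provided; setting it to 1 in .vimrc enables the nmap, xmap and omap mappings for the eight word motions.
- g:jieba_vim_lazy (default 1): whether (1/0) to delay loading the jieba dictionary until Chinese text appears.
- g:jieba_vim_user_dict (default empty): when set to a nonempty string, loads the custom user dictionary at that file path.
- g:jieba_vim_keymap (default 0): whether (1/0) to enable the keymap automatically.
To run the tests for the Rust implementation locally: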
# Core code
cd rust_backend/jieba_vim_rs_core
cargo test
# The verifiable_case feature can be enabled to verify that the tests themselves
# are correct; this requires junegunn/vader.vim
# (https://github.com/junegunn/vader.vim).
#cargo test -F verifiable_case
# Test utility code
cd ../jieba_vim_rs_test
cargo test
See TODO.md.
Apache license v2.
Vim (and many other text editors) uses word motions to move the cursor within a line. They work well for space-delimited languages like English, but not for languages like Chinese, where there are no spaces between words.
jieba is a Python library for Chinese word segmentation. It has been used in various projects (e.g. Jieba (for VSCode), Deno bridge jieba (for Emacs), jieba_nvim (for neovim)) to facilitate better word motions when editing Chinese. However, I haven't seen one for Vim. That's why I developed this plugin.
Features overview:
- Enhanced Vim word motions for Chinese characters.
- Extensive testing covering various edge cases.
- Built with Rust + Python for better performance.
- Precompiled libraries available for major platforms, no local Rust environment required.
This plugin was developed using Python3 + Rust.
The +python3 feature is required for Vim/Neovim to use jieba.vim.
For vim-plug, the latest stable version is installable using:
Plug 'kkew3/jieba.vim', { 'tag': 'v1.0.4', 'do': './build.sh' }
where ./build.sh downloads the precompiled shared library.
Local compilation is attempted only if the shared library cannot be found.
Though not usually necessary, users may need to adjust the pyo3 python ABI in rust_backend/Cargo.toml under the plugin directory after downloading the plugin, so that it matches the python3 version vim is compiled against.
Vim's python3 version can be checked with the following command in a terminal:
vim +"py3 print(sys.version)"
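As a rough aid for the ABI adjustment described above, the sketch below turns a Python version into a PyO3-style feature name. PyO3 exposes Cargo features named like abi3-py38 that select the minimum supported Python version; whether rust_backend/Cargo.toml in this plugin uses exactly that naming is an assumption, and abi3_feature is a hypothetical helper, not part of the plugin. The script only reflects vim's python3 if it is run with the same interpreter vim links against.

```python
import sys


def abi3_feature(version_info=sys.version_info):
    """Build a PyO3-style abi3 feature name from a (major, minor, ...) version.

    Hypothetical helper: the exact feature used by this plugin's Cargo.toml
    is an assumption and should be checked against the file itself.
    """
    major, minor = version_info[:2]
    return f"abi3-py{major}{minor}"


if __name__ == "__main__":
    # Print the feature name matching the interpreter running this script.
    print(abi3_feature())
```

Because an abi3 feature names a minimum version, the value in Cargo.toml would need to be at or below the version vim reports.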
For Neovim users, it can be installed using lazy.nvim:
{
    "kkew3/jieba.vim",
    tag = "v1.0.4",
    build = "./build.sh",
    init = function()
        vim.g.jieba_vim_lazy = 1
        vim.g.jieba_vim_keymap = 1
    end,
}
- Augment eight Vim word motions (i.e. b, B, ge, gE, w, W, e, E) so that they work on Chinese and English text at the same time. The augmented behavior stays close to the original. For example, the augmented w won't jump over Chinese punctuation whereas W will.
- The augmented word motions are compatible with Vim's original word motions when handling ASCII text without Chinese. Together with lazy loading (see the option g:jieba_vim_lazy), this makes it possible to leave the plugin on (for certain file types).
- If the optional repeat-support plugin is installed, the . key can be used to repeat the last word operation. For example, dw. is equivalent to dwdw.
- Preview the destination of a word motion beforehand. Chinese word segmentation is sometimes ambiguous, and even when it is not, jieba may not align well with human intuition, so it is not always evident where a word motion will jump to. In such cases, users may want to preview the jumps first.
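To make the preview idea concrete, here is a minimal sketch of how a segmentation could be turned into forward w destinations on one line. It is not the plugin's actual algorithm: the token list is a hard-coded, hypothetical segmentation rather than real jieba output, word_starts and preview_w are made-up names, and offsets are counted in characters (Vim columns are byte-based), with whitespace and punctuation handling left out.

```python
from itertools import accumulate


def word_starts(tokens):
    """Return the character offset at which each token begins."""
    if not tokens:
        return []
    lengths = [len(t) for t in tokens]
    # Each token starts where the previous ones end.
    return [0] + list(accumulate(lengths))[:-1]


def preview_w(tokens, cursor):
    """Offsets that successive forward `w` jumps would visit from `cursor`."""
    return [s for s in word_starts(tokens) if s > cursor]


# Hypothetical segmentation of "我喜欢中文分词"; a real segmenter may differ.
tokens = ["我", "喜欢", "中文", "分词"]
starts = word_starts(tokens)
targets = preview_w(tokens, 0)
```

With this segmentation the word starts are offsets 0, 1, 3 and 5, so a preview from the first character would mark offsets 1, 3 and 5.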
This plugin is designed to be nonintrusive, i.e. not providing any default keymaps.
However, various commands and <Plug>(...)
mappings are provided for users to manually configure to their needs.
Provided commands:
- JiebaPreviewCancel: clears the preview markup
Provided <Plug>() mappings, where X denotes one of the eight Vim word motion keys, i.e. b, B, ge, gE, w, W, e, E:
- A mapping equivalent to the command JiebaPreviewCancel.
- <Plug>(Jieba_preview_X): previews the destination of the augmented X.
- <Plug>(Jieba_X): the augmented X. This mapping is usable in normal, operator-pending and visual modes, and accepts a count. For example, assuming that w has been mapped to <Plug>(Jieba_w), 3w jumps three words forward and d3w deletes three words forward.
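The count and operator examples above can be illustrated with a simplified, single-line model. This is a sketch under stated assumptions, not the plugin's implementation: the word-start offsets come from a hypothetical segmentation, jump_w and delete_w are invented names, offsets are in characters, and Vim's end-of-line and multi-line special cases are ignored.

```python
from bisect import bisect_right


def jump_w(starts, cursor, count=1):
    """Offset reached after `count` forward word jumps from `cursor`.

    Clamps to the last word start when there are not enough words left.
    `starts` must be a non-empty, sorted list of word-start offsets.
    """
    idx = bisect_right(starts, cursor) + count - 1
    return starts[min(idx, len(starts) - 1)]


def delete_w(text, starts, cursor, count=1):
    """Text left after deleting from `cursor` up to the `count`-th next word start."""
    idx = bisect_right(starts, cursor) + count - 1
    # Running out of words deletes to the end of the line in this model.
    end = starts[idx] if idx < len(starts) else len(text)
    return text[:cursor] + text[end:]


# Hypothetical segmentation: 我 / 喜欢 / 中文 / 分词 / 很 / 有趣
text = "我喜欢中文分词很有趣"
starts = [0, 1, 3, 5, 7, 8]
```

In this model 3w from offset 0 lands on offset 5, and d3w from the same position removes the first three words and leaves 分词很有趣.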
Users may map keys to these <Plug>() mappings on their own. For example:
nmap <LocalLeader>jw <Plug>(Jieba_preview_w)
" etc., and
map w <Plug>(Jieba_w)
" etc.
A convenience option g:jieba_vim_keymap is provided. When set to 1, it enables keymaps for the eight word motions under nmap, xmap and omap.
- g:jieba_vim_lazy (default 1): Whether or not (1/0) to delay loading the jieba dictionary until a Chinese character occurs.
- g:jieba_vim_user_dict (default empty): When set to a nonempty string, load the custom user dictionary at this file path.
- g:jieba_vim_keymap (default 0): Whether or not (1/0) to enable the jieba keymap.
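The lazy-loading option can be pictured as deferring an expensive dictionary load until some text actually contains Chinese. The sketch below is only an illustration: how the plugin detects Chinese is not stated here, so the check against the CJK Unified Ideographs block (U+4E00 to U+9FFF) is an assumption, and contains_chinese and LazySegmenter are hypothetical names.

```python
import re

# Assumption: "Chinese character" means the CJK Unified Ideographs block only.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def contains_chinese(text):
    """Return True if `text` has at least one character in the assumed range."""
    return _CJK.search(text) is not None


class LazySegmenter:
    """Defers an expensive loader until a line containing Chinese is seen."""

    def __init__(self, loader):
        self._loader = loader
        self._resource = None

    @property
    def loaded(self):
        return self._resource is not None

    def prepare(self, line):
        """Load the resource on first Chinese input; report whether it is loaded."""
        if self._resource is None and contains_chinese(line):
            self._resource = self._loader()
        return self.loaded


# Stand-in loader; a real one would load the segmentation dictionary.
seg = LazySegmenter(lambda: {"dummy": 1})
```

Pure ASCII lines never trigger the loader in this model, which is consistent with the claim that the plugin can stay enabled in documents without Chinese.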
To run the tests against the Rust implementation locally:
# Core logic
cd rust_backend/jieba_vim_rs_core
cargo test
# verifiable_case feature can be enabled to verify the correctness of the tests.
# junegunn/vader.vim (https://github.com/junegunn/vader.vim) is required.
#cargo test -F verifiable_case
# Test utilities
cd ../jieba_vim_rs_test
cargo test
See TODO.md.
Apache license v2.