Using the Elasticsearch IK analyzer from Golang
Related resources
https://github.com/olivere/elastic/wiki
https://github.com/olivere/elastic
IK analyzer
Installation
- Look up the IK version that matches your Elasticsearch (ES) version in this table:
IK version | ES version |
---|---|
master | 7.x -> master |
6.x | 6.x |
5.x | 5.x |
1.10.6 | 2.4.6 |
1.9.5 | 2.3.5 |
1.8.1 | 2.2.1 |
1.7.0 | 2.1.1 |
1.5.0 | 2.0.0 |
1.2.6 | 1.0.0 |
1.2.5 | 0.90.x |
1.1.3 | 0.20.x |
1.0.0 | 0.16.2 -> 0.19.0 |
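- Download that IK release and extract it into an ik directory under the Elasticsearch plugins directory (plugins/ik)
- Restart Elasticsearch (the plugin must be installed on every node in the cluster)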
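To confirm the plugin loaded after the restart, one option is to call GET /_cat/plugins?format=json and look for the component analysis-ik, the name the IK plugin registers. The Go sketch below uses only the standard library; the cluster address http://localhost:9200 and the helper names fetchCatPlugins and nodesWithPlugin are assumptions for illustration, not part of olivere/elastic.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// catPlugin mirrors one row of GET /_cat/plugins?format=json.
type catPlugin struct {
	Name      string `json:"name"`      // node name
	Component string `json:"component"` // plugin name, e.g. "analysis-ik"
	Version   string `json:"version"`   // plugin version
}

// fetchCatPlugins downloads the _cat/plugins listing.
// baseURL (e.g. "http://localhost:9200") is an assumed cluster address.
func fetchCatPlugins(baseURL string) ([]byte, error) {
	resp, err := http.Get(baseURL + "/_cat/plugins?format=json")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// nodesWithPlugin returns the names of the nodes that list the given
// plugin component in a _cat/plugins JSON response.
func nodesWithPlugin(body []byte, component string) ([]string, error) {
	var rows []catPlugin
	if err := json.Unmarshal(body, &rows); err != nil {
		return nil, fmt.Errorf("decode _cat/plugins: %w", err)
	}
	var nodes []string
	for _, r := range rows {
		if r.Component == component {
			nodes = append(nodes, r.Name)
		}
	}
	return nodes, nil
}

func main() {
	body, err := fetchCatPlugins("http://localhost:9200")
	if err != nil {
		fmt.Println("cluster not reachable:", err)
		return
	}
	nodes, err := nodesWithPlugin(body, "analysis-ik")
	if err != nil {
		panic(err)
	}
	fmt.Println("nodes with IK installed:", nodes)
}
```

Comparing the returned node names with the nodes in your cluster shows whether any node is still missing the plugin.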
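Examples
- 1. Default analyzer (no analyzer or tokenizer is given, so the index's default analyzer is used; unless configured otherwise this is the standard analyzer, which splits Chinese text into single characters)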
GET /cms_index/_analyze
{
"text": "我是中国人"
}
- 2. IK analyzer (ik_max_word)
GET /cms_index/_analyze
{
"text": "我们是软件工程师",
"tokenizer":"ik_max_word"
}
- 3. IK analyzer (ik_smart)
GET /cms_index/_analyze
{
"text":"我们是软件工程师",
"tokenizer":"ik_smart"
}
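The three _analyze calls above differ only in the tokenizer parameter. A minimal Go sketch with net/http and encoding/json, assuming a cluster at http://localhost:9200; analyzeBody, newAnalyzeRequest and tokens are hypothetical helper names. It uses POST, which Elasticsearch accepts for _analyze in the same way as GET with a body.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// analyzeRequest is the JSON body for POST /{index}/_analyze.
// An empty Tokenizer is omitted, so the index's default analyzer applies.
type analyzeRequest struct {
	Text      string `json:"text"`
	Tokenizer string `json:"tokenizer,omitempty"`
}

// analyzeBody encodes the body of one _analyze call.
func analyzeBody(text, tokenizer string) ([]byte, error) {
	return json.Marshal(analyzeRequest{Text: text, Tokenizer: tokenizer})
}

// newAnalyzeRequest builds, but does not send, an _analyze request.
func newAnalyzeRequest(baseURL, index, text, tokenizer string) (*http.Request, error) {
	body, err := analyzeBody(text, tokenizer)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/"+index+"/_analyze", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// tokens pulls the token strings out of an _analyze response body.
func tokens(respBody []byte) ([]string, error) {
	var r struct {
		Tokens []struct {
			Token string `json:"token"`
		} `json:"tokens"`
	}
	if err := json.Unmarshal(respBody, &r); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(r.Tokens))
	for _, t := range r.Tokens {
		out = append(out, t.Token)
	}
	return out, nil
}

func main() {
	for _, tk := range []string{"", "ik_max_word", "ik_smart"} {
		req, err := newAnalyzeRequest("http://localhost:9200", "cms_index", "我们是软件工程师", tk)
		if err != nil {
			panic(err)
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			fmt.Println("cluster not reachable:", err)
			return
		}
		var raw bytes.Buffer
		_, _ = raw.ReadFrom(resp.Body)
		resp.Body.Close()
		toks, err := tokens(raw.Bytes())
		if err != nil {
			panic(err)
		}
		fmt.Printf("tokenizer=%q -> %v\n", tk, toks)
	}
}
```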
- 4. Search the title field with a match query
GET cms_index/_search
{
"query":{
"match":{"title":"测试"}
}
}
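The same search from Go: build the match body, POST it to /cms_index/_search, and read hits.hits from the response. A sketch assuming the standard _search response layout; matchQueryBody and hitIDs are hypothetical names. With olivere/elastic the query part would be elastic.NewMatchQuery("title", "测试").

```go
package main

import (
	"encoding/json"
	"fmt"
)

// matchQueryBody returns {"query":{"match":{field:text}}}.
// Elasticsearch analyzes text with the field's search analyzer before matching.
func matchQueryBody(field, text string) ([]byte, error) {
	return json.Marshal(map[string]any{
		"query": map[string]any{
			"match": map[string]any{field: text},
		},
	})
}

// searchResponse keeps only the parts of a _search response used here.
type searchResponse struct {
	Hits struct {
		Hits []struct {
			ID     string          `json:"_id"`
			Source json.RawMessage `json:"_source"`
		} `json:"hits"`
	} `json:"hits"`
}

// hitIDs returns the _id of every hit, in the order Elasticsearch returned them.
func hitIDs(respBody []byte) ([]string, error) {
	var r searchResponse
	if err := json.Unmarshal(respBody, &r); err != nil {
		return nil, err
	}
	ids := make([]string, 0, len(r.Hits.Hits))
	for _, h := range r.Hits.Hits {
		ids = append(ids, h.ID)
	}
	return ids, nil
}

func main() {
	body, err := matchQueryBody("title", "测试")
	if err != nil {
		panic(err)
	}
	// POST this to http://localhost:9200/cms_index/_search (address assumed).
	fmt.Println(string(body))
}
```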
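- What is the difference between ik_max_word and ik_smart?
ik_max_word: splits text at the finest granularity and exhausts every possible combination. For example, "中华人民共和国国歌" is split into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌".
ik_smart: splits text at the coarsest granularity. For example, "中华人民共和国国歌" is split into "中华人民共和国, 国歌".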
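A setup shown in the IK plugin's README combines the two: index with ik_max_word, so documents carry many overlapping tokens, and search with ik_smart, so queries are split coarsely. A Go sketch of the create-index body for Elasticsearch 7.x typeless mappings (6.x needs a mapping type level in between); the field list and the names ikTextField and ikMappingBody are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ikTextField maps one text field so documents are indexed with the
// fine-grained ik_max_word analyzer and queries use the coarser ik_smart.
func ikTextField() map[string]any {
	return map[string]any{
		"type":            "text",
		"analyzer":        "ik_max_word",
		"search_analyzer": "ik_smart",
	}
}

// ikMappingBody builds the body for PUT /{index} with the given text fields.
func ikMappingBody(fields ...string) ([]byte, error) {
	props := make(map[string]any, len(fields))
	for _, f := range fields {
		props[f] = ikTextField()
	}
	return json.Marshal(map[string]any{
		"mappings": map[string]any{"properties": props},
	})
}

func main() {
	// Send with PUT http://localhost:9200/cms_index before indexing documents:
	// the index-time analyzer of an existing field cannot be changed in place.
	body, err := ikMappingBody("title", "content")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```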
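Comma analyzer
- Example (assuming the tags field uses an analyzer that splits on commas, "一,二" is analyzed into 一 and 二, and documents containing either tag match)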
GET cms_index/_search
{
"query":{
"match":{"tags":"一,二"}
}
}
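The notes do not show how the comma analyzer is defined. One way to get this behavior, assumed here, is Elasticsearch's built-in pattern analyzer with pattern ","; the analyzer name comma and the helper names are hypothetical. commaTokens roughly mimics what that analyzer does to a string (split on commas, lowercase, skip empty pieces), so the match above can be reasoned about without a cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// commaIndexBody builds a PUT /{index} body that defines a pattern
// analyzer splitting on commas and applies it to the tags field.
func commaIndexBody() ([]byte, error) {
	return json.Marshal(map[string]any{
		"settings": map[string]any{
			"analysis": map[string]any{
				"analyzer": map[string]any{
					"comma": map[string]any{ // hypothetical analyzer name
						"type":    "pattern",
						"pattern": ",",
					},
				},
			},
		},
		"mappings": map[string]any{
			"properties": map[string]any{
				"tags": map[string]any{"type": "text", "analyzer": "comma"},
			},
		},
	})
}

// commaTokens approximates the pattern analyzer above: split on ",",
// lowercase each piece, skip empty pieces. Spaces are NOT trimmed, so
// "一, 二" would yield " 二" with a leading space.
func commaTokens(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if part == "" {
			continue
		}
		out = append(out, strings.ToLower(part))
	}
	return out
}

func main() {
	body, err := commaIndexBody()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	fmt.Println(commaTokens("一,二")) // the two terms the match query looks for
}
```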
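- Multi-condition query (bool/should: a document matches if title matches 测试 or tags matches 一; from and size return the first 10 hits)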
GET cms_index/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": {
"query": "测试"
}
}
},
{
"match": {
"tags": {
"query": "一"
}
}
}
]
}
},
"from": 0,
"size": 10
}
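The bool/should body above can be generated from an ordered list of field/text pairs. A sketch; matchClause and shouldQueryBody are hypothetical names. A slice rather than a map keeps the clause order stable, since Go randomizes map iteration order. With olivere/elastic the same query would be built with elastic.NewBoolQuery().Should(...) and the search service's From and Size methods.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// matchClause is one "field matches text" condition.
type matchClause struct {
	Field string
	Query string
}

// shouldQueryBody builds a bool query whose should clauses are OR-ed:
// with no must/filter clause, at least one of them has to match.
// from and size select the page of hits to return.
func shouldQueryBody(clauses []matchClause, from, size int) ([]byte, error) {
	should := make([]any, 0, len(clauses))
	for _, c := range clauses {
		should = append(should, map[string]any{
			"match": map[string]any{c.Field: map[string]any{"query": c.Query}},
		})
	}
	return json.Marshal(map[string]any{
		"query": map[string]any{"bool": map[string]any{"should": should}},
		"from":  from,
		"size":  size,
	})
}

func main() {
	body, err := shouldQueryBody([]matchClause{{"title", "测试"}, {"tags", "一"}}, 0, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```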
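- Nested bool query: status must be 1, and 测试 must match at least one of nickName (wildcard), research, content or doctorStyle. This is only the query clause, with every parameter written out at its default value; wrap it in {"query": ...} to send it to _search.
{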
"bool": {
"must": [
{
"bool": {
"should": [
{
"wildcard": {
"nickName": {
"wildcard": "*测试*",
"boost": 1
}
}
},
{
"match": {
"research": {
"query": "测试",
"operator": "OR",
"analyzer": "ik_max_word",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"match": {
"content": {
"query": "测试",
"operator": "OR",
"analyzer": "ik_max_word",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
},
{
"match": {
"doctorStyle": {
"query": "测试",
"operator": "OR",
"analyzer": "ik_max_word",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
{
"match": {
"status": {
"query": 1,
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
Common query types
- term
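- term performs an exact match (a precise query): the search term is used as-is and is not analyzed (split into tokens) first. terms does the same for a list of values and matches documents whose field equals any of them.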
GET cms_index/_search
{
"query" : {
"term": {
"id": "22"
}
}
}
GET cms_index/_search
{
"query" : {
"terms": {
"id": ["22","23"]
}
}
}
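From Go, term and terms bodies differ only in taking one value or a list. A sketch, with termBody and termsBody as hypothetical names. Because the value is not analyzed, term suits keyword, numeric or id-like fields; against a text field indexed with ik_max_word it only matches when the value equals one of the indexed tokens exactly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// termBody builds an exact-match query: value is compared with the indexed
// terms as-is, without running any analyzer on it.
func termBody(field string, value any) ([]byte, error) {
	return json.Marshal(map[string]any{
		"query": map[string]any{"term": map[string]any{field: value}},
	})
}

// termsBody matches documents whose field equals any of the values.
func termsBody(field string, values ...any) ([]byte, error) {
	return json.Marshal(map[string]any{
		"query": map[string]any{"terms": map[string]any{field: values}},
	})
}

func main() {
	one, err := termBody("id", "22")
	if err != nil {
		panic(err)
	}
	many, err := termsBody("id", "22", "23")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(one))  // {"query":{"term":{"id":"22"}}}
	fmt.Println(string(many)) // {"query":{"terms":{"id":["22","23"]}}}
}
```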
- match
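- match analyzes the query text first, splitting it into terms, and then matches those terms (by default a document that matches any one of them is a hit)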
- match_phrase
- Known as phrase search: every term must appear in the document, and the terms must be adjacent and in the same order as in the query
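The difference between match and match_phrase comes down to token positions. The Go sketch below is a simplified model, not Elasticsearch's implementation: it assumes document and query have already been analyzed into tokens with consecutive positions, models match with its default operator or, and models match_phrase with slop 0. The token list in main is hand-written for illustration, not real IK output.

```go
package main

import "fmt"

// token is an analyzed term plus its position in the field.
type token struct {
	Term string
	Pos  int
}

// matchAny models match with the default operator "or":
// the document matches if it contains at least one query term.
func matchAny(doc []token, query []string) bool {
	present := make(map[string]bool, len(doc))
	for _, t := range doc {
		present[t.Term] = true
	}
	for _, q := range query {
		if present[q] {
			return true
		}
	}
	return false
}

// matchPhrase models match_phrase with slop 0: all query terms must occur
// in the same order at consecutive positions.
func matchPhrase(doc []token, query []string) bool {
	if len(query) == 0 {
		return false
	}
	positions := make(map[string]map[int]bool)
	for _, t := range doc {
		if positions[t.Term] == nil {
			positions[t.Term] = make(map[int]bool)
		}
		positions[t.Term][t.Pos] = true
	}
	for start := range positions[query[0]] {
		found := true
		for i := 1; i < len(query); i++ {
			if !positions[query[i]][start+i] {
				found = false
				break
			}
		}
		if found {
			return true
		}
	}
	return false
}

func main() {
	// Hand-written tokens for illustration only.
	doc := []token{{"我们", 0}, {"是", 1}, {"软件", 2}, {"工程师", 3}}
	fmt.Println(matchAny(doc, []string{"工程师", "软件"}))    // true
	fmt.Println(matchPhrase(doc, []string{"工程师", "软件"})) // false: wrong order
	fmt.Println(matchPhrase(doc, []string{"软件", "工程师"})) // true: adjacent, same order
}
```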