Merge pull request #86 from yanghaiji/es_feature
Es feature
yanghaiji authored Feb 10, 2022
2 parents d79a5bb + d9aa7b1 commit 892bd5a
Showing 4 changed files with 264 additions and 1 deletion.
1 change: 1 addition & 0 deletions note/README.md
@@ -349,6 +349,7 @@
- [🔖 Installing Elasticsearch](elasticsearch/basics/install_elasticsearch.md)
- [🔖 Elasticsearch: creating, updating, and deleting index data](elasticsearch/basics/insert_update_delete.md)
- [🔖 Elasticsearch: common query expressions](elasticsearch/basics/query_dsl.md)
- [🔖 Elasticsearch: common batch operation APIs](elasticsearch/basics/batch.md)

## [🔖 Interview question summary](InterviewQuestions/JavaYouHuo面试题总结.md)

3 changes: 2 additions & 1 deletion note/elasticsearch/README.md
@@ -5,4 +5,5 @@

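## ElasticSearch common syntax and commands
- [Elasticsearch: creating, updating, and deleting index data](basics/insert_update_delete.md)
- [Elasticsearch: common query expressions](basics/query_dsl.md)
- [Elasticsearch: common query expressions](basics/query_dsl.md)
- [Elasticsearch: common batch operation APIs](basics/batch.md)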
261 changes: 261 additions & 0 deletions note/elasticsearch/basics/batch.md
@@ -0,0 +1,261 @@
# Bulk indexing, updating, and deleting

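> So far you have indexed one document at a time. That is fine for playing around, but it carries a performance penalty from at least two directions:
>
> - Your application has to wait for Elasticsearch's reply before it can move on.
> - Elasticsearch has to process all the data of a request for every single document indexed.
>
> If you need more indexing speed, Elasticsearch offers a bulk API, which you can use to index multiple documents at once.

![image-20220208135630422](../img/20220208135630422.png)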

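As the figure shows, you can do this over HTTP, just as you have been doing to index documents so far, and you get back a single reply containing the results of all the indexing requests.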

## Batch queries

### mget

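> Until now every lookup fetched a single document. Fetching several documents that way costs one network round trip per document, which is inefficient. The `mget` API retrieves them all in one request: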
```json
# Fetch the documents with IDs 1 and 2 from the megacorp index
GET /_mget
{
"docs": [
{
"_index": "megacorp",
"_id": "1"
},
{
"_index": "megacorp",
"_id": "2",
"_source" :[
"first_name" ,
"last_name"
]
}
]
}

```

With this form a single request can fetch documents from different indices, and `_source` selects which fields are returned for each document.

If all the documents live in the same index, the request can be shortened:

```json
GET /megacorp/_mget
{
"ids": [
1,
2
]
}
```

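`mget` matters in practice: whenever a query needs several documents at once, use the batch API rather than issuing one request per document.
Cutting the number of network round trips can improve performance several times over, and sometimes by an order of magnitude.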
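
As a rough illustration of the request shape described above, the following Python sketch builds an `_mget` body from a list of (index, id) pairs. The helper name `build_mget_body` and its `source_fields` parameter are hypothetical, introduced only for this example; the sketch assembles the JSON body and does not contact a cluster.

```python
import json


def build_mget_body(doc_refs, source_fields=None):
    """Build an _mget request body from (index, id) pairs.

    source_fields, if given, is applied to every entry as a _source filter.
    This helper is illustrative only; it is not part of any client library.
    """
    docs = []
    for index, doc_id in doc_refs:
        entry = {"_index": index, "_id": str(doc_id)}
        if source_fields:
            entry["_source"] = list(source_fields)
        docs.append(entry)
    return {"docs": docs}


body = build_mget_body(
    [("megacorp", 1), ("megacorp", 2)],
    source_fields=["first_name", "last_name"],
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the body of `GET /_mget` with whatever HTTP client is in use, so two documents cost one round trip instead of two.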

### msearch

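The multi search API executes several searches from a single API request. The request format is similar to the bulk API format and uses newline-delimited JSON (NDJSON).

The structure is as follows: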

```
header\n
body\n
header\n
body\n
```

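- The header specifies `index`, `type`, `search_type`, `preference`, `routing`, and similar options, telling Elasticsearch where to run the search in the body that follows. Mapping types are deprecated in Elasticsearch 7.x and removed in 8.0, so `type` applies only to older versions.

  `{"index":"<indexName>","type":"<typeName>"}`

  If every header uses the same index (or the same index and type), it can be specified once in the URL as a global default, and the headers no longer need to set it.

  The header line itself cannot be omitted, however: use `{}\n` as a placeholder. Otherwise Elasticsearch treats the next body as a header, and the parsing of the whole batch becomes misaligned.

- The body contains the search itself; its format is that of a `_search` request body.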

```json
GET /_msearch
{"index":"megacorp"}
{"query":{"match":{"about":"I love"}}}
{"index":"megacorp"}
{"aggs":{"all_interests":{"terms":{"field": "interests.keyword"}}}}
```

When every search targets the same index, the request can be shortened to:

```json
GET /megacorp/_msearch
{}
{"query":{"match":{"about":"I love"}}}
{}
{"aggs":{"all_interests":{"terms":{"field": "interests.keyword"}}}}
```
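
The header/body pairing can also be assembled programmatically. The sketch below is a minimal illustration, assuming the searches are held as (header, body) pairs; `build_msearch_payload` is a hypothetical helper name. It writes `{}` for a missing header so the pairs never shift out of alignment, and ends the payload with a newline.

```python
import json


def build_msearch_payload(searches):
    """Serialize (header, body) pairs into an NDJSON _msearch payload.

    A header of None becomes the empty placeholder {} so that every body
    is still preceded by exactly one header line.
    """
    lines = []
    for header, body in searches:
        # json.dumps escapes newlines inside strings, so each object stays on one line.
        lines.append(json.dumps(header or {}, separators=(",", ":")))
        lines.append(json.dumps(body, separators=(",", ":")))
    return "\n".join(lines) + "\n"


payload = build_msearch_payload([
    (None, {"query": {"match": {"about": "I love"}}}),
    (None, {"aggs": {"all_interests": {"terms": {"field": "interests.keyword"}}}}),
])
print(payload, end="")
```

The string in `payload` corresponds to the body of `GET /megacorp/_msearch` shown above.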

The response contains an array in which each element corresponds, in order, to one search (body).

```
{
"responses":[<body1Response>,<body2Response>, ...]
}
```

If one search fails, the error is reported in its own entry of the response and does not affect the results of the other searches.
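
Because failures are reported per entry, client code usually has to walk the `responses` array rather than rely on the HTTP status alone. The following sketch separates successful entries from failed ones; the function name and the sample response are made up for illustration and abbreviated to the fields the code reads.

```python
def split_msearch_responses(result):
    """Split an _msearch result into successful and failed entries.

    Each returned tuple starts with the position of the entry, which matches
    the position of the corresponding search in the request.
    """
    succeeded, failed = [], []
    for position, response in enumerate(result["responses"]):
        if "error" in response:
            failed.append((position, response["error"]))
        else:
            succeeded.append((position, response))
    return succeeded, failed


# Hypothetical, abbreviated sample: the second search failed.
sample = {
    "responses": [
        {"status": 200, "hits": {"hits": []}},
        {"status": 400, "error": {"type": "parsing_exception", "reason": "sample reason"}},
    ]
}
succeeded, failed = split_msearch_responses(sample)
print(len(succeeded), "succeeded;", len(failed), "failed")
```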

## Batch create, update, and delete

### bulk

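Which operation types can a bulk request carry?

1. `delete`: deletes a document; only one JSON line is needed (the action line, with no document body)
2. `create`: equivalent to `PUT /index/_create/id` (`PUT /index/type/id/_create` in versions with mapping types); creates the document only if it does not already exist and fails otherwise
3. `index`: an ordinary PUT; it either creates the document or replaces the existing one in full
4. `update`: performs a partial update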

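The bulk API is strict about JSON layout: each JSON object must sit on a single line with no line breaks inside it, consecutive JSON objects must be separated by a newline, and the last line must also end with a newline.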

```json
POST _bulk
{"delete":{"_index":"megacorp","_id":30}}
{"create":{"_index":"megacorp","_id":5}}
{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
{"create":{"_index":"megacorp","_id":6}}
{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
{"index":{"_index":"megacorp","_id":7}}
{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
{"index":{"_index":"megacorp","_id":8}}
{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
{"update":{"_index":"megacorp","_id":1}}
{"doc":{"age":55}}

```
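
A bulk body like the one above can be generated rather than written by hand. This is a minimal sketch, assuming operations are given as (action, metadata, source) tuples; `build_bulk_payload` is a hypothetical helper, not a client-library function. It enforces the one-object-per-line rule and the rule that `delete` carries no source line.

```python
import json


def build_bulk_payload(operations):
    """Serialize (action, metadata, source) tuples into an NDJSON _bulk payload.

    source must be None for delete and a dict for create, index, and update.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}, separators=(",", ":")))
        if action == "delete":
            if source is not None:
                raise ValueError("delete takes no source line")
            continue
        if source is None:
            raise ValueError(f"{action} requires a source line")
        lines.append(json.dumps(source, separators=(",", ":")))
    # The payload must end with a newline.
    return "\n".join(lines) + "\n"


bulk_payload = build_bulk_payload([
    ("delete", {"_index": "megacorp", "_id": 30}, None),
    ("create", {"_index": "megacorp", "_id": 5}, {"first_name": "John", "age": 25}),
    ("update", {"_index": "megacorp", "_id": 1}, {"doc": {"age": 55}}),
])
print(bulk_payload, end="")
```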

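**Within a bulk request, the failure of any single operation does not affect the others**; the response reports the error for each operation that failed: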

```json
{
"took" : 30,
"errors" : true,
"items" : [
{
"delete" : {
"_index" : "megacorp",
"_type" : "_doc",
"_id" : "30",
"_version" : 2,
"result" : "not_found",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 18,
"_primary_term" : 1,
"status" : 404
}
},
{
"create" : {
"_index" : "megacorp",
"_type" : "_doc",
"_id" : "5",
"status" : 409,
"error" : {
"type" : "version_conflict_engine_exception",
"reason" : "[5]: version conflict, document already exists (current version [3])",
"index_uuid" : "HJSUaB6lSWKLKfY0treFXg",
"shard" : "0",
"index" : "megacorp"
}
}
},
{
"create" : {
"_index" : "megacorp",
"_type" : "_doc",
"_id" : "6",
"status" : 409,
"error" : {
"type" : "version_conflict_engine_exception",
"reason" : "[6]: version conflict, document already exists (current version [3])",
"index_uuid" : "HJSUaB6lSWKLKfY0treFXg",
"shard" : "0",
"index" : "megacorp"
}
}
},
{
"index" : {
"_index" : "megacorp",
"_type" : "_doc",
"_id" : "7",
"_version" : 4,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 19,
"_primary_term" : 1,
"status" : 200
}
},
{
"index" : {
"_index" : "megacorp",
"_type" : "_doc",
"_id" : "8",
"_version" : 4,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 20,
"_primary_term" : 1,
"status" : 200
}
},
{
"update" : {
"_index" : "megacorp",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "noop",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 8,
"_primary_term" : 1,
"status" : 200
}
}
]
}

```
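
Since the top-level `errors` flag only says that something failed, the `items` array has to be inspected to find out what. The sketch below collects the entries that carry an `error` object; the function name is hypothetical, and the sample is an abbreviated copy of the response above.

```python
def failed_bulk_items(response):
    """Return one summary dict per bulk item that carries an error object."""
    failures = []
    if not response.get("errors"):
        return failures
    for position, item in enumerate(response["items"]):
        # Each item is a single-key dict: {"<action>": {...result...}}.
        (action, result), = item.items()
        if "error" in result:
            failures.append({
                "position": position,
                "action": action,
                "id": result["_id"],
                "status": result["status"],
                "type": result["error"]["type"],
            })
    return failures


# Abbreviated from the response shown above.
sample_response = {
    "errors": True,
    "items": [
        {"delete": {"_id": "30", "result": "not_found", "status": 404}},
        {"create": {"_id": "5", "status": 409,
                    "error": {"type": "version_conflict_engine_exception"}}},
        {"index": {"_id": "7", "result": "updated", "status": 200}},
    ],
}
for failure in failed_bulk_items(sample_response):
    print(failure)
```

This sketch only reports items that contain an `error` object, so the `not_found` delete is not listed; code that cares about such cases would also need to check `status` or `result`.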

If every operation targets the same index, the index can be specified in the URL, as with `mget`:

```
POST /megacorp/_bulk
{"delete":{"_id":30}}
{"delete":{"_id":31}}
{"delete":{"_id":32}}
```

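**Optimal bulk size:** a bulk request is loaded into memory, so performance drops if the request is too large, and finding the best bulk size takes repeated experimentation. A common starting point is 1,000 to 5,000 documents per request, increased gradually. Measured by payload size, 5 to 15 MB is a good range.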
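
One way to experiment with bulk size is to cap each request by both document count and byte size. The sketch below is a minimal illustration under that assumption; `chunk_bulk_operations` and its default limits are hypothetical starting values taken from the ranges above, not tuned recommendations.

```python
import json


def chunk_bulk_operations(operations, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Group NDJSON operations into bulk payloads bounded by count and size.

    Each element of operations is one complete operation as a string: the
    action line, the optional source line, and the trailing newline(s).
    A single operation larger than max_bytes is still sent on its own.
    """
    batch, batch_bytes = [], 0
    for operation in operations:
        operation_bytes = len(operation.encode("utf-8"))
        if batch and (len(batch) >= max_docs or batch_bytes + operation_bytes > max_bytes):
            yield "".join(batch)
            batch, batch_bytes = [], 0
        batch.append(operation)
        batch_bytes += operation_bytes
    if batch:
        yield "".join(batch)


operations = [
    json.dumps({"index": {"_id": i}}) + "\n" + json.dumps({"age": i}) + "\n"
    for i in range(2500)
]
batches = list(chunk_bulk_operations(operations, max_docs=1000))
print([batch.count("\n") // 2 for batch in batches])
```

Each yielded string is a complete body for `POST /megacorp/_bulk`; adjusting `max_docs` and `max_bytes` between runs is one way to search for the best size on a given cluster.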
Binary file added note/elasticsearch/img/20220208135630422.png