Deep pagination queries can severely degrade the performance of an ES cluster; scroll queries avoid this problem.
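To make the cost concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the usual from + size behavior, where each shard collects its own top from + size entries and the coordinating node merges all of them. The function name and the shard count of 5 are made up for illustration.

```python
def candidates_to_merge(from_: int, size: int, shards: int) -> int:
    """Rough count of sorted entries the coordinating node merges for one from + size page."""
    per_shard = from_ + size  # each shard returns its own top (from + size) entries
    return per_shard * shards


# Hypothetical 5-shard index, page size 10.
first_page = candidates_to_merge(0, 10, 5)
deep_page = candidates_to_merge(9990, 10, 5)
print(first_page, deep_page)
```

The deeper the page, the more entries are sorted and thrown away just to return 10 documents. A scroll keeps a search context alive and hands out consecutive batches instead.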
Example
Data preparation
PUT student
{
  "mappings" : {
    "properties" : {
      "name" : {
        "type" : "keyword"
      },
      "age" : {
        "type" : "integer"
      }
    }
  }
}
POST _bulk
{ "index" : { "_index" : "student", "_id" : "1" } }
{ "name" : "张三", "age": 12}
{ "index" : { "_index" : "student", "_id" : "2" } }
{ "name" : "李四", "age": 10 }
{ "index" : { "_index" : "student", "_id" : "3" } }
{ "name" : "王五", "age": 11 }
{ "index" : { "_index" : "student", "_id" : "4" } }
{ "name" : "陈六", "age": 11 }
Query example 1:
Request:
GET student/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "name": "desc"
  },
  "size": 10
}
Response:
{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAPMWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==",
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "陈六",
          "age" : 11
        },
        "sort" : [
          "陈六"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "王五",
          "age" : 11
        },
        "sort" : [
          "王五"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "李四",
          "age" : 10
        },
        "sort" : [
          "李四"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "张三",
          "age" : 12
        },
        "sort" : [
          "张三"
        ]
      }
    ]
  }
}
The response contains a _scroll_id; the next request passes it as scroll_id to fetch the following batch:
POST /_search/scroll
{
  "scroll" : "1m",
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAPMWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ=="
}
Response:
{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAPMWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==",
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
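No more data is returned: the first request already fetched all 4 documents, so hits is empty. In this run the _scroll_id in the response has the same value as the one sent in the request (ES does not guarantee this, so always use the most recently returned ID). Repeating the request returns the same empty result.
Because the scroll context is kept alive for 1 minute (1m), a request made more than 1 minute after the last scroll request fails with an error.
Repeat the request after 1 minute: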
POST /_search/scroll
{
  "scroll" : "1m",
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAPMWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ=="
}
This time the request fails:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "search_context_missing_exception",
        "reason" : "No search context found for id [243]"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : -1,
        "index" : null,
        "reason" : {
          "type" : "search_context_missing_exception",
          "reason" : "No search context found for id [243]"
        }
      }
    ],
    "caused_by" : {
      "type" : "search_context_missing_exception",
      "reason" : "No search context found for id [243]"
    }
  },
  "status" : 404
}
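The keep-alive behavior can be pictured with a toy model in Python. This is only a sketch of the idea that each scroll request restarts the keep-alive window. It is not how ES implements search contexts, the class and method names are invented for illustration, and a real cluster frees expired contexts in the background, so the timing is not exact to the second.

```python
class ScrollContextMissing(Exception):
    """Stands in for the search_context_missing_exception shown above."""


class ToyScrollStore:
    """Toy in-memory model of scroll contexts (hypothetical, for illustration only)."""

    def __init__(self):
        self._expires_at = {}  # scroll_id -> time in seconds at which the context expires

    def open(self, scroll_id: str, keep_alive_s: float, now: float) -> None:
        # The initial search creates the context with its first keep-alive window.
        self._expires_at[scroll_id] = now + keep_alive_s

    def scroll(self, scroll_id: str, keep_alive_s: float, now: float) -> None:
        expires = self._expires_at.get(scroll_id)
        if expires is None or now > expires:
            self._expires_at.pop(scroll_id, None)
            raise ScrollContextMissing(scroll_id)
        # A successful scroll request restarts the keep-alive window.
        self._expires_at[scroll_id] = now + keep_alive_s


store = ToyScrollStore()
store.open("abc", keep_alive_s=60, now=0)
store.scroll("abc", keep_alive_s=60, now=50)   # within the window; expiry moves to t=110
store.scroll("abc", keep_alive_s=60, now=100)  # still within the window; expiry moves to t=160
try:
    store.scroll("abc", keep_alive_s=60, now=200)  # more than 60 s after the last request
    expired = False
except ScrollContextMissing:
    expired = True
print(expired)
```

The keep-alive therefore only needs to cover the time between two consecutive requests, not the whole export.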
Query example 2:
Request:
GET student/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "name": "desc"
  },
  "size": 3
}
Response:
{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAFwWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==",
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "陈六",
          "age" : 11
        },
        "sort" : [
          "陈六"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "王五",
          "age" : 11
        },
        "sort" : [
          "王五"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "李四",
          "age" : 10
        },
        "sort" : [
          "李四"
        ]
      }
    ]
  }
}
Only 3 documents were returned.
Continue with a request based on the scroll_id:
POST /_search/scroll
{
  "scroll" : "1m",
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAFwWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ=="
}
Response:
{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAFwWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==",
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "张三",
          "age" : 12
        },
        "sort" : [
          "张三"
        ]
      }
    ]
  }
}
This returned the 1 remaining document.
A further request returns no data.
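When should the query stop?
Stop when a response contains fewer hits than the size specified in the first request. An empty hits array always means the scroll is exhausted.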
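2020-08-06: According to https://www.elastic.co/guide/cn/elasticsearch/guide/current/scroll.html, when multiple shards are involved a response may contain more hits than the specified size. That article targets older ES versions; I have not yet found an equivalent statement in the official documentation for current ES versions.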
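Given the uncertainty about page sizes, a conservative client loop can simply stop on the first empty page. The following is a minimal sketch in Python, not a client for a real cluster: fetch_next stands for whatever function issues POST /_search/scroll, and here it is replaced by an in-memory fake that mirrors query example 2 (4 documents, size 3). All names in it are hypothetical.

```python
from typing import Callable, Iterator

Page = dict  # a response body shaped like the ones shown above


def iterate_scroll(first_page: Page, fetch_next: Callable[[str], Page]) -> Iterator[dict]:
    """Yield every hit, following _scroll_id until a page comes back empty."""
    page = first_page
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Always pass on the scroll ID from the most recent response.
        page = fetch_next(page["_scroll_id"])


# In-memory stand-in for the scroll endpoint: 4 documents, batches of 3.
docs = [{"_id": "4"}, {"_id": "3"}, {"_id": "2"}, {"_id": "1"}]
SIZE = 3
cursor = {"pos": SIZE}  # the first page already consumed docs[0:SIZE]


def make_page(hits: list) -> Page:
    return {"_scroll_id": "fake-id", "hits": {"hits": hits}}


def fake_fetch_next(scroll_id: str) -> Page:
    start = cursor["pos"]
    cursor["pos"] += SIZE
    return make_page(docs[start:start + SIZE])


ids = [hit["_id"] for hit in iterate_scroll(make_page(docs[:SIZE]), fake_fetch_next)]
print(ids)
```

Stopping as soon as a page has fewer hits than size would save the final round trip here, but it relies on every earlier page being exactly full; stopping on an empty page does not.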