Elasticsearch 7: Scroll Queries


#Elasticsearch notes


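Deep pagination with from/size puts a heavy load on an ES cluster, because every shard has to collect and sort from + size hits for each page. A scroll query avoids this by keeping a search context open and returning the results in successive batches.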
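
To make the cost concrete, here is a rough sketch of the arithmetic, assuming the usual from/size behaviour in which each shard produces its own top from + size hits and the coordinating node merges them. The function name and the example numbers are illustrative only and do not come from Elasticsearch.

```python
def docs_merged_for_page(from_: int, size: int, shards: int) -> int:
    """Rough upper bound on the hits the coordinating node merges for one from/size page."""
    per_shard = from_ + size      # each shard returns up to its own top (from + size) hits
    return per_shard * shards     # all of them are merged, and most are then discarded


# Hypothetical numbers: the first page versus a very deep page of a 5-shard index.
print(docs_merged_for_page(0, 10, 5))       # 50
print(docs_merged_for_page(10_000, 10, 5))  # 50050
```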

Example

Preparing the data

PUT student
{
  "mappings" : {
    "properties" : {
      "name" : {
        "type" : "keyword"
      },
      "age" : {
        "type" : "integer"
      }
    }
  }
}


POST _bulk
{ "index" : { "_index" : "student", "_id" : "1" } }
{ "name" : "张三", "age": 12}
{ "index" : { "_index" : "student", "_id" : "2" } }
{ "name" : "李四", "age": 10 }
{ "index" : { "_index" : "student", "_id" : "3" } }
{ "name" : "王五", "age": 11 }
{ "index" : { "_index" : "student", "_id" : "4" } }
{ "name" : "陈六", "age": 11 }
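
If the test data is loaded from a script rather than from the Kibana console, the same bulk body could be generated as in the sketch below. The helper name build_bulk_body is made up for this example. The resulting string would be sent as the body of POST _bulk, which expects newline-delimited JSON ending with a newline.

```python
import json

students = [
    ("1", "张三", 12),
    ("2", "李四", 10),
    ("3", "王五", 11),
    ("4", "陈六", 11),
]


def build_bulk_body(index: str, rows) -> str:
    """Return an NDJSON body for POST _bulk: one action line plus one source line per document."""
    lines = []
    for doc_id, name, age in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"name": name, "age": age}, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("student", students)
print(len(body.splitlines()))  # 8 lines: 4 action lines and 4 source lines
```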

Query example 1:

Request:

GET student/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "name": "desc"
  },
  "size": 10
}

Response:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAPMWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==",
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "陈六",
          "age" : 11
        },
        "sort" : [
          "陈六"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "王五",
          "age" : 11
        },
        "sort" : [
          "王五"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "李四",
          "age" : 10
        },
        "sort" : [
          "李四"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "张三",
          "age" : 12
        },
        "sort" : [
          "张三"
        ]
      }
    ]
  }
}

The response contains a _scroll_id. The next request passes that value as scroll_id:

POST /_search/scroll
{
  "scroll" : "1m",
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAPMWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ=="
}
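
In client code, the follow-up request body can be derived from the previous response instead of being copied by hand. The sketch below is a minimal illustration. The helper name next_scroll_request and the trimmed response are assumptions made for this example.

```python
def next_scroll_request(response: dict, keep_alive: str = "1m") -> dict:
    """Build the body for POST /_search/scroll from the previous response."""
    return {
        "scroll": keep_alive,                 # keep the scroll context open for another period
        "scroll_id": response["_scroll_id"],  # always take the id from the latest response
    }


# Trimmed-down stand-in for the first response shown above.
first_response = {
    "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAPMWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==",
    "hits": {"hits": []},
}
print(next_scroll_request(first_response))
```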

Response:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAPMWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==",
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

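No more data is returned, because the first request (size 10) already fetched all 4 documents. The _scroll_id in the response is the same value that was sent in the request. Repeating the request returns the same response as above.

Because the scroll context is kept alive for only 1 minute (1m), a request sent more than 1 minute after the previous one fails.

After 1 minute, query again: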

POST /_search/scroll
{
  "scroll" : "1m",
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAPMWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ=="
}

This time the request fails:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "search_context_missing_exception",
        "reason" : "No search context found for id [243]"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : -1,
        "index" : null,
        "reason" : {
          "type" : "search_context_missing_exception",
          "reason" : "No search context found for id [243]"
        }
      }
    ],
    "caused_by" : {
      "type" : "search_context_missing_exception",
      "reason" : "No search context found for id [243]"
    }
  },
  "status" : 404
}
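
A client can recognise this situation from the error body. The sketch below checks the root_cause entries for search_context_missing_exception. The function name is made up for this example and the error body is a trimmed copy of the one above.

```python
def scroll_context_expired(body: dict) -> bool:
    """Return True if an error body reports a missing (expired or cleared) scroll context."""
    causes = body.get("error", {}).get("root_cause", [])
    return any(c.get("type") == "search_context_missing_exception" for c in causes)


# Trimmed copy of the error body shown above.
error_body = {
    "error": {
        "root_cause": [
            {
                "type": "search_context_missing_exception",
                "reason": "No search context found for id [243]",
            }
        ],
        "type": "search_phase_execution_exception",
    },
    "status": 404,
}

print(scroll_context_expired(error_body))              # True
print(scroll_context_expired({"hits": {"hits": []}}))  # False
```

Once the context is gone the scroll cannot be resumed, so the usual reaction is to run the initial search again.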

Query example 2:

Request:

GET student/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "name": "desc"
  },
  "size": 3
}

Response:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAFwWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==",
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "陈六",
          "age" : 11
        },
        "sort" : [
          "陈六"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "王五",
          "age" : 11
        },
        "sort" : [
          "王五"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "李四",
          "age" : 10
        },
        "sort" : [
          "李四"
        ]
      }
    ]
  }
}

Only 3 documents were returned.

Continue with a request based on the scroll_id:

POST /_search/scroll
{
  "scroll" : "1m",
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAFwWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ=="
}

Response:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAFwWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==",
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "张三",
          "age" : 12
        },
        "sort" : [
          "张三"
        ]
      }
    ]
  }
}

1 document was returned.

Any further request returns no data.

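When should you stop querying?

Stop when a response contains fewer hits than the size specified in the first request. This also covers the case where the total is an exact multiple of size: the last request then returns an empty hits array, and 0 is smaller than size.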
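
The stop rule can be sketched as a small loop. This is only an illustration under stated assumptions: fetch_next stands in for POST /_search/scroll, and the fake_* helpers simulate the student index in memory (size 3, as in example 2) so that the snippet runs without a cluster.

```python
from typing import Callable, Dict, Iterator, List


def scroll_all(first_page: Dict, fetch_next: Callable[[str], Dict], size: int) -> Iterator[Dict]:
    """Yield every hit, stopping once a page contains fewer than `size` hits."""
    page = first_page
    while True:
        hits = page["hits"]["hits"]
        yield from hits
        if len(hits) < size:          # short (or empty) page: nothing left to fetch
            break
        page = fetch_next(page["_scroll_id"])


# In-memory stand-in for the student index, already sorted by name descending.
DOCS = [{"_id": i} for i in ("4", "3", "2", "1")]
SIZE = 3
cursor = {"pos": 0}
requests_made: List[str] = []


def fake_page() -> Dict:
    """Return the next SIZE documents, mimicking one scroll response."""
    start = cursor["pos"]
    cursor["pos"] = start + SIZE
    return {"_scroll_id": "fake-id", "hits": {"hits": DOCS[start:start + SIZE]}}


def fake_fetch_next(scroll_id: str) -> Dict:
    """Hypothetical replacement for the real scroll request."""
    requests_made.append(scroll_id)
    return fake_page()


ids = [hit["_id"] for hit in scroll_all(fake_page(), fake_fetch_next, SIZE)]
print(ids)                 # ['4', '3', '2', '1']
print(len(requests_made))  # 1
```

With size 3 and 4 documents, the loop issues exactly one follow-up request, which matches example 2.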

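2020-08-06: According to https://www.elastic.co/guide/cn/elasticsearch/guide/current/scroll.html, when multiple shards are involved, the number of hits returned may exceed the specified size. That guide targets older ES versions, and I have not yet found a corresponding statement in the official documentation for current ES versions.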



(End of article)