Elasticsearch 介绍 Elasticsearch 7:快速上手 Elasticsearch 7:关于 Index、Type、Document Elasticsearch 7:安装与启动 Elasticsearch 7:Kibana 的使用 Elasticsearch 7:下载历史版本 Elasticsearch 7:文档唯一性 Elasticsearch 7:默认端口与端口设置 Elasticsearch 7:创建和删除索引 Elasticsearch 7:自定义 mapping 和 settings Elasticsearch 7:设置索引副本数量和分片数量 Elasticsearch 7:查看所有索引 Elasticsearch 7:数据类型 Elasticsearch 7:字符串类型 keyword 、text Elasticsearch 7:数组 Elasticsearch 7:添加和更新文档 Elasticsearch 7:通过 _bulk 批量添加文档 Elasticsearch 7:使用 from 、size 进行分页查询 Elasticsearch 7:查询中使用 sort 进行排序 Elasticsearch 7:查询结果只展示部分字段 Elasticsearch 7:查询结果中展示 _version 字段 Elasticsearch 7:使用 ignore_above 限制字符串长度 Elasticsearch 7:动态映射 Elasticsearch 7:doc_values 属性 Elasticsearch 7:刷新周期 refresh_interval Elasticsearch 7:使用 _refresh 刷新索引 Elasticsearch 7:分片(shard)限制 Elasticsearch 7:使用 _cat thread_pool 查询线程池运行情况 Elasticsearch 7:事务日志 translog Elasticsearch 7:文档 _id 的长度限制 Elasticsearch 7:分片 shard Elasticsearch 7:滚动查询 Elasticsearch 7:聚合查询 Elasticsearch 7:索引模板 Elasticsearch 7:获取文档所属的 shard Elasticsearch 7:获取版本号 Elasticsearch 7:获取指定 shard 中的文档 Elasticsearch 7:获取 shard 统计信息 Elasticsearch 7:搜索实战 Elasticsearch 7:Python 客户端 Elasticsearch 7:Java TransportClient API 客户端 Elasticsearch 7:Java REST Client API 客户端 Elasticsearch:将 SQL 转换为 DSL Elasticsearch 6 快速上手 Elasticsearch 5 快速上手 Elasticsearch 5:禁止自动创建索引 Elasticsearch 5:禁止动态增加字段 Elasticsearch 产品版本支持周期 基于 Elasticsearch 的站内搜索引擎实战

Elasticsearch 7:滚动查询


#Elasticsearch


深度分页查询会给ES集群性能带来极大损耗,滚动查询可以避免这种情况。

示例

数据准备

PUT student
{
  "mappings" : {
    "properties" : {
      "name" : {
        "type" : "keyword"
      },
      "age" : {
        "type" : "integer"
      }
    }
  }
}


POST _bulk
{ "index" : { "_index" : "student", "_id" : "1" } }
{ "name" : "张三", "age": 12}
{ "index" : { "_index" : "student", "_id" : "2" } }
{ "name" : "李四", "age": 10 }
{ "index" : { "_index" : "student", "_id" : "3" } }
{ "name" : "王五", "age": 11 }
{ "index" : { "_index" : "student", "_id" : "4" } }
{ "name" : "陈六", "age": 11 }

查询示例1:

查询

GET student/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "name": "desc"
  },
  "size": 10
}

结果:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAPMWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==",
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "陈六",
          "age" : 11
        },
        "sort" : [
          "陈六"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "王五",
          "age" : 11
        },
        "sort" : [
          "王五"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "李四",
          "age" : 10
        },
        "sort" : [
          "李四"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "张三",
          "age" : 12
        },
        "sort" : [
          "张三"
        ]
      }
    ]
  }
}

返回结果中有 _scroll_id ,我们下次查询基于 scroll_id 查询:

POST /_search/scroll
{
    "scroll" : "1m", 
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAPMWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==" 
}

结果:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAPMWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==",
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

已经查不到数据了,返回值中的 scroll_id 和查询是的 scroll_id 是一个值。继续查询时,返回值和上面的相同。

因为指定保存 scroll 上下文的时间是 1分钟( 1m ),所以,过1分钟后查询,会报错。

1分钟后,继续查:

POST /_search/scroll
{
    "scroll" : "1m", 
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAPMWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==" 
}

此时会报错:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "search_context_missing_exception",
        "reason" : "No search context found for id [243]"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : -1,
        "index" : null,
        "reason" : {
          "type" : "search_context_missing_exception",
          "reason" : "No search context found for id [243]"
        }
      }
    ],
    "caused_by" : {
      "type" : "search_context_missing_exception",
      "reason" : "No search context found for id [243]"
    }
  },
  "status" : 404
}

查询示例2:

请求:

GET student/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "name": "desc"
  },
  "size": 3
}

响应:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAFwWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==",
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "陈六",
          "age" : 11
        },
        "sort" : [
          "陈六"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "王五",
          "age" : 11
        },
        "sort" : [
          "王五"
        ]
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "李四",
          "age" : 10
        },
        "sort" : [
          "李四"
        ]
      }
    ]
  }
}

只获取了3条数据。

基于 scroll_id 继续请求:

POST /_search/scroll
{
    "scroll" : "1m", 
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAFwWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==" 
}

响应:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAFwWVlRranJQRUJSZTJ2SHl4UUFpcGo5QQ==",
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "张三",
          "age" : 12
        },
        "sort" : [
          "张三"
        ]
      }
    ]
  }
}

查询到了1条。

再继续查询,则查不到数据了。

什么时候结束查询?

当查询结果数量小于第一次查询执行的 size时,结束查询。

2020-08-06: 按照 https://www.elastic.co/guide/cn/elasticsearch/guide/current/scroll.html 的说法,涉及到多个分片时,查询结果可能超过指定的 size 大小。这篇文章针对的是较旧的 ES 版本,目前的新版本ES官方文档,暂时没找到相关描述。



( 本文完 )