What to do when unassigned (UNASSIGNED) shards turn an Amazon Elasticsearch Service (AES) cluster yellow

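Overview

As the title says, one day the cluster status of my Amazon Elasticsearch Service (AES from here on) domain suddenly turned yellow. This post explains how to deal with a cluster whose status has turned yellow.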

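Why the cluster status turned yellow

According to the AWS documentation, a yellow status means the following:

A yellow cluster status means that the primary shards for all indices are allocated to nodes in the cluster, but the replica shards for at least one index are not.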

docs.aws.amazon.com

Identifying the problem

First, check the state of the cluster with the following command.

$ curl -X GET {endpoint}/_cluster/health?pretty=true
{
  "cluster_name" : "{cluster_name}",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 41,
  "active_shards" : 80,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 2,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 97.5609756097561
}
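
As a rough sketch of how this response could be checked from a script, the snippet below parses a health response and pulls out the fields that matter for this investigation. The function name summarize_health and the trimmed sample are illustrative assumptions, not part of any Elasticsearch client.

```python
import json


def summarize_health(health_json: str) -> dict:
    """Pick out the fields of a _cluster/health response used for triage."""
    health = json.loads(health_json)
    unassigned = health["unassigned_shards"]
    delayed = health["delayed_unassigned_shards"]
    return {
        "status": health["status"],
        "unassigned_shards": unassigned,
        "delayed_unassigned_shards": delayed,
        # Unassigned shards that are not simply waiting on a delay timer;
        # these are the ones worth inspecting with allocation explain.
        "unassigned_not_delayed": unassigned - delayed,
    }


# Trimmed copy of the response above (only the fields used here).
SAMPLE_HEALTH = """
{
  "status": "yellow",
  "active_shards": 80,
  "unassigned_shards": 2,
  "delayed_unassigned_shards": 0
}
"""

if __name__ == "__main__":
    print(summarize_health(SAMPLE_HEALTH))
```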

The health response shows that unassigned_shards is 2. Next, dig into the per-shard information with the following command.

$ curl -X GET {endpoint}/_cat/shards?h=index,shard,prirep,state,unassigned.reason
index1              1 p STARTED    
index1              1 r STARTED    
index1              2 p STARTED    
index1              2 r STARTED    
index1              4 p STARTED    
index1              4 r STARTED    
index1              3 r STARTED    
index1              3 p STARTED    
index1              0 p STARTED    
index1              0 r STARTED    
index2 1 p STARTED    
index2 1 r STARTED    
index2 2 p STARTED    
index2 2 r UNASSIGNED ALLOCATION_FAILED
index2 4 p STARTED    
index2 4 r STARTED    
index2 3 p STARTED    
index2 3 r UNASSIGNED ALLOCATION_FAILED
index2 0 r STARTED    
index2 0 p STARTED
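
With many indices, the unassigned rows are easy to miss in a listing like this. The following is a minimal sketch of filtering them out, assuming the exact column order requested above (index, shard, prirep, state, unassigned.reason). The names ShardRow, parse_cat_shards and unassigned are hypothetical helpers, and the sample is an excerpt of the output above.

```python
from typing import List, NamedTuple


class ShardRow(NamedTuple):
    index: str
    shard: int
    prirep: str  # "p" for primary, "r" for replica
    state: str
    reason: str  # empty when no unassigned.reason is reported


def parse_cat_shards(text: str) -> List[ShardRow]:
    """Parse _cat/shards?h=index,shard,prirep,state,unassigned.reason output."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        reason = parts[4] if len(parts) > 4 else ""
        rows.append(ShardRow(parts[0], int(parts[1]), parts[2], parts[3], reason))
    return rows


def unassigned(rows: List[ShardRow]) -> List[ShardRow]:
    """Keep only the rows whose state is UNASSIGNED."""
    return [r for r in rows if r.state == "UNASSIGNED"]


SAMPLE = """\
index2 1 p STARTED
index2 1 r STARTED
index2 2 p STARTED
index2 2 r UNASSIGNED ALLOCATION_FAILED
index2 3 p STARTED
index2 3 r UNASSIGNED ALLOCATION_FAILED
"""

for row in unassigned(parse_cat_shards(SAMPLE)):
    print(f"{row.index}[{row.shard}] {row.prirep} {row.reason}")
```

For the excerpt, this prints one line each for replica shards 2 and 3 of index2, both with the reason ALLOCATION_FAILED.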

The _cat/shards output shows that the replica shards for shards 2 and 3 of index2 are not allocated. To find out why these shards ended up in ALLOCATION_FAILED, use the following command.

$ curl -X GET {endpoint}/_cluster/allocation/explain?pretty
{
  "index" : "index2",
  "shard" : 3,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2021-04-21T17:26:41.698Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[index2][3]: obtaining shard lock timed out after 5000ms]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [ {
    "node_id" : "4CKlBkmrSOqZIGwfSPgufQ",
    "node_name" : "4CKlBkm",
    "node_decision" : "no",
    "deciders" : [ {
      "decider" : "max_retry",
      "decision" : "NO",
      "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-04-21T17:26:41.698Z], failed_attempts[5], delayed=false, details[failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[index2][3]: obtaining shard lock timed out after 5000ms]; ], allocation_status[no_attempt]]]"
    } ]
  }, {
    "node_id" : "6AhVfEvhRziT3i5kaoaPfg",
    "node_name" : "6AhVfEv",
    "node_decision" : "no",
    "deciders" : [ {
      "decider" : "max_retry",
      "decision" : "NO",
      "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-04-21T17:26:41.698Z], failed_attempts[5], delayed=false, details[failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[index2][3]: obtaining shard lock timed out after 5000ms]; ], allocation_status[no_attempt]]]"
    }, {
      "decider" : "awareness",
      "decision" : "NO",
      "explanation" : "there are too many copies of the shard allocated to nodes with attribute [zone], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
    } ]
  }, {
    "node_id" : "Wl2W4K1ESxGUWdiZQE65zQ",
    "node_name" : "Wl2W4K1",
    "node_decision" : "no",
    "deciders" : [ {
      "decider" : "max_retry",
      "decision" : "NO",
      "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-04-21T17:26:41.698Z], failed_attempts[5], delayed=false, details[failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[index2][3]: obtaining shard lock timed out after 5000ms]; ], allocation_status[no_attempt]]]"
    } ]
  }, {
    "node_id" : "kORYnIFEQX-yZB0SFrcnIQ",
    "node_name" : "kORYnIF",
    "node_decision" : "no",
    "deciders" : [ {
      "decider" : "max_retry",
      "decision" : "NO",
      "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-04-21T17:26:41.698Z], failed_attempts[5], delayed=false, details[failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[index2][3]: obtaining shard lock timed out after 5000ms]; ], allocation_status[no_attempt]]]"
    }, {
      "decider" : "same_shard",
      "decision" : "NO",
      "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[index2][3], node[kORYnIFEQX-yZB0SFrcnIQ], [P], s[STARTED], a[id=88w2bSjVTiGrreCA9UzgeQ]]"
    }, {
      "decider" : "awareness",
      "decision" : "NO",
      "explanation" : "there are too many copies of the shard allocated to nodes with attribute [zone], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
    } ]
  } ]
}

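The details field shows that the in-memory shard lock could not be obtained (ShardLockObtainFailedException) and that all five allocation attempts failed. The cause appears to have been a communication failure between nodes.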
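
Called without a request body, as above, the explain API picks an unassigned shard itself (here it picked shard 3). It also accepts a JSON body with index, shard and primary to ask about one specific shard. The sketch below builds such a body for shard 2 and condenses a response into the deciders that answered NO for each node. The helper names are hypothetical, and the sample is abbreviated from the response above (two of the four nodes, explanations omitted).

```python
import json


def explain_request_body(index: str, shard: int, primary: bool) -> str:
    """JSON body asking _cluster/allocation/explain about one specific shard."""
    return json.dumps({"index": index, "shard": shard, "primary": primary})


def blocking_deciders(explain: dict) -> dict:
    """Map each node name to the deciders that answered NO for that node."""
    result = {}
    for node in explain.get("node_allocation_decisions", []):
        result[node["node_name"]] = [
            d["decider"] for d in node.get("deciders", []) if d["decision"] == "NO"
        ]
    return result


# Abbreviated from the response above.
SAMPLE_EXPLAIN = {
    "index": "index2",
    "shard": 3,
    "node_allocation_decisions": [
        {"node_name": "4CKlBkm",
         "deciders": [{"decider": "max_retry", "decision": "NO"}]},
        {"node_name": "kORYnIF",
         "deciders": [{"decider": "max_retry", "decision": "NO"},
                      {"decider": "same_shard", "decision": "NO"},
                      {"decider": "awareness", "decision": "NO"}]},
    ],
}

print(explain_request_body("index2", 2, False))
print(blocking_deciders(SAMPLE_EXPLAIN))
```

Read this way, the response shows that on nodes 4CKlBkm and Wl2W4K1 the only decider saying NO is max_retry, so those nodes become candidates as soon as the retry limit stops blocking the shard.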

Fix

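In the /_cluster/health response, delayed_unassigned_shards was 0. delayed_unassigned_shards counts the unassigned shards whose allocation is only being delayed; Elasticsearch allocates those by itself once the delay expires. Here the value is 0 while unassigned_shards is 2, and the allocation explain output shows that the retry limit of 5 has already been used up, so these shards will not recover on their own.

The shards counted in unassigned_shards still exist; they are simply not allocated to a node. The allocation therefore has to be retried, which can be triggered with the following command.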

$ curl -X PUT {endpoint}/{index_name}/_settings -H 'Content-Type: application/json' -d '
{
  "index.allocation.max_retries" : 10
}
'
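
The same request can also be assembled from a script. The following is a minimal sketch using only the Python standard library. It assumes the domain accepts unsigned requests from your network, as the curl commands in this post do. The endpoint is a made-up placeholder, and build_max_retries_request is a hypothetical helper. The request is only built and printed; sending it is left commented out.

```python
import json
import urllib.request


def build_max_retries_request(endpoint: str, index: str,
                              max_retries: int) -> urllib.request.Request:
    """Build (but do not send) PUT {endpoint}/{index}/_settings."""
    body = json.dumps({"index.allocation.max_retries": max_retries}).encode("utf-8")
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical endpoint, for illustration only.
req = build_max_retries_request("https://search-example.invalid", "index2", 10)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Sending is left commented out so that the sketch has no side effects:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```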

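index.allocation.max_retries is the number of times allocation is retried after a failure. Updating it makes the master node retry allocation for the shards of the specified index: the new limit (10) is higher than the five failed attempts recorded so far, so the max_retry decider no longer blocks them. After running the command, check the shard information again.

$ curl -X GET {endpoint}/_cat/shards?h=index,shard,prirep,state,unassigned.reason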
index1              1 p STARTED    
index1              1 r STARTED    
index1              2 p STARTED    
index1              2 r STARTED    
index1              4 p STARTED    
index1              4 r STARTED    
index1              3 r STARTED    
index1              3 p STARTED    
index1              0 p STARTED    
index1              0 r STARTED    
index2 1 p STARTED    
index2 1 r STARTED    
index2 2 p STARTED    
index2 2 r INITIALIZING ALLOCATION_FAILED
index2 4 p STARTED    
index2 4 r STARTED    
index2 3 p STARTED    
index2 3 r INITIALIZING ALLOCATION_FAILED
index2 0 r STARTED    
index2 0 p STARTED

The shards that were UNASSIGNED have changed to INITIALIZING! Wait a little longer and check again: they switch to STARTED and the unassigned.reason disappears.

$ curl -X GET {endpoint}/_cat/shards?h=index,shard,prirep,state,unassigned.reason
index1              1 p STARTED    
index1              1 r STARTED    
index1              2 p STARTED    
index1              2 r STARTED    
index1              4 p STARTED    
index1              4 r STARTED    
index1              3 r STARTED    
index1              3 p STARTED    
index1              0 p STARTED    
index1              0 r STARTED    
index2 1 p STARTED    
index2 1 r STARTED    
index2 2 p STARTED    
index2 2 r STARTED
index2 4 p STARTED    
index2 4 r STARTED    
index2 3 p STARTED    
index2 3 r STARTED
index2 0 r STARTED    
index2 0 p STARTED

Finally, check the cluster status.

$ curl -X GET {endpoint}/_cluster/health?pretty=true
{
  "cluster_name" : "{cluster_name}",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 41,
  "active_shards" : 82,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

The cluster status is green again!
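
Instead of re-running the health check by hand, the wait can be scripted. The sketch below polls a health source until it reports green. The name wait_for_green, the attempt count and the interval are arbitrary assumptions, and the health source is passed in as a function so that the example runs without a cluster (a canned sequence stands in for real GET /_cluster/health calls).

```python
import time
from typing import Callable


def wait_for_green(fetch_health: Callable[[], dict], attempts: int = 30,
                   interval_sec: float = 10.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the status is green; return False if attempts run out."""
    for attempt in range(attempts):
        health = fetch_health()
        if health["status"] == "green":
            return True
        if attempt < attempts - 1:
            sleep(interval_sec)  # wait before the next check
    return False


# Stand-in for real health responses: yellow twice, then green.
_responses = iter([
    {"status": "yellow", "initializing_shards": 2},
    {"status": "yellow", "initializing_shards": 1},
    {"status": "green", "initializing_shards": 0},
])

# The sleep function is replaced with a no-op so the demo finishes at once.
print(wait_for_green(lambda: next(_responses), sleep=lambda _seconds: None))
```

The health API itself also accepts wait_for_status and timeout query parameters, which make the server hold the request until the status is reached or the timeout passes; that may be simpler than client-side polling when a single blocking call is acceptable.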

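Summary

This time an AES cluster suddenly turned yellow because some of its replica shards were no longer allocated to nodes. To bring the cluster back to green, the unassigned shards have to be allocated to nodes again. Changing index.allocation.max_retries makes the master node retry the allocation. This is an effective way to recover when there is nothing wrong with the shards themselves.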