====== Elasticsearch cluster ======

From the Elasticsearch website:

> Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine. Architected from the ground up for use in distributed environments where reliability and scalability are must haves, Elasticsearch gives you the ability to move easily beyond simple full-text search. Through its robust set of APIs and query DSLs, plus clients for the most popular programming languages, Elasticsearch delivers on the near limitless promises of search technology.

We currently run Elasticsearch version 5.1.1.

====== Usage ======

The native, direct way to access the cluster is its REST API: simply send HTTP requests, for example with curl, to one of the nodes of the stack.

Example: http://192.168.102.227:9200/_cat/indices?v (lists the indices present on the cluster)

For very heavy queries it is also possible to install a slave node on your own machine.
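A few typical queries, as a minimal sketch; the index name and the ''message'' field in the last request are hypothetical, only the ''logstash-'' naming scheme comes from this page:

<code bash>
# cluster-wide health summary (green / yellow / red, shard counts, ...)
curl -s 'http://192.168.102.227:9200/_cluster/health?pretty'

# list the indices present on the cluster
curl -s 'http://192.168.102.227:9200/_cat/indices?v'

# simple URI search; index name and "message" field are hypothetical
curl -s 'http://192.168.102.227:9200/logstash-2017.01.01/_search?q=message:error&pretty'
</code>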
====== Setting up the cluster ======

===== Node definitions =====

Our cluster consists of four (K)VMs, each with 5 GB of RAM, 32 GB of disk and 2 cores + 2 sockets.

^ Name | Luffy | Dhalsim | Padon | Magus |
^ hostname | es-luffy | es-dhalsim | es-padon | es-magus |
^ IP | 192.168.102.227 | 192.168.102.228 | 192.168.102.229 | 192.168.102.231 |

To keep tuning simple, the VMs have no swap. Otherwise we would have to adjust the swappiness or set a memory lock to prevent the JVM running Elasticsearch from being swapped out, which would severely degrade performance.

===== Installing Elasticsearch =====

We use the Elasticsearch repositories to make upgrades easier.

<code bash>
# go through the local HTTP proxy to reach the outside world
export http_proxy=http://192.168.102.61:82
# import the Elastic signing key and add the 5.x APT repository
wget -qO - http://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb http://packages.elastic.co/elasticsearch/5.x/debian stable main" | sudo tee /etc/apt/sources.list.d/elasticsearch-5.x.list
sudo apt update
sudo apt install openjdk-8-jre elasticsearch
</code>

===== Configuration =====

Most of it lives in ''/etc/elasticsearch/'', which contains three important files:

==== Elasticsearch ====

''/etc/elasticsearch/elasticsearch.yml'' (shown here for es-luffy):

<code yaml>
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please see the documentation for further information on configuration options.
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: es-minet
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: es-luffy
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 192.168.102.227
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["192.168.102.228", "192.168.102.229", "192.168.102.231"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 3
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
gateway.expected_nodes: 4
gateway.expected_master_nodes: 4
gateway.expected_data_nodes: 4
gateway.recover_after_time: 5m
gateway.recover_after_nodes: 3
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
</code>
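Each node gets the same file with its own ''node.name'', ''network.host'' and unicast host list. Once the service is up on all four machines, a quick sanity check that the cluster actually formed — a sketch, assuming the systemd unit shipped by the Debian package:

<code bash>
# start the service on each node
sudo systemctl start elasticsearch

# all four nodes should be listed, with '*' marking the elected master
curl -s 'http://192.168.102.227:9200/_cat/nodes?v&h=name,ip,master,heap.percent'
</code>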
==== JVM ====

Edit the file ''/etc/elasticsearch/jvm.options'':

<code>
## JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms3g
-Xmx3g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

## optimizations

# disable calls to System#gc
-XX:+DisableExplicitGC

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# force the server VM (remove on 32-bit client JVMs)
-server

# explicitly set the stack size (reduce to 320k on 32-bit client JVMs)
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# use old-style file permissions on JDK9
-Djdk.io.permissionsUseCanonicalPath=true

# flags to keep Netty from being unsafe
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${heap.dump.path}

## GC logging

#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime

# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${loggc}

# Elasticsearch 5.0.0 will throw an exception on unquoted field names in JSON.
# If documents were already indexed with unquoted fields in a previous version
# of Elasticsearch, some operations may throw errors.
#
# WARNING: This option will be removed in Elasticsearch 6.0.0 and is provided
# only for migration purposes.
#-Delasticsearch.json.allow_unquoted_field_names=true
</code>
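To confirm the heap settings were picked up after a restart, a quick sketch (''heap.max'' is one of the stock ''_cat/nodes'' columns):

<code bash>
# JVM flags actually applied to the running process
ps -o args= -C java | tr ' ' '\n' | grep -E '^-Xm[sx]'

# heap ceiling as reported by Elasticsearch itself
curl -s 'http://192.168.102.227:9200/_cat/nodes?h=name,heap.max'
</code>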
==== Logging ====

Logging is configured in ''/etc/elasticsearch/log4j2.properties'':

<code>
status = error

# log action execution errors for easier debugging
logger.action.name = org.elasticsearch.action
logger.action.level = debug

appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%m%n

appender.rolling.type = RollingFile
appender.rolling.name = rolling
appender.rolling.fileName = ${sys:es.logs}.log
appender.rolling.layout.type = PatternLayout
appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%.-10000m%n
appender.rolling.filePattern = ${sys:es.logs}-%d{yyyy-MM-dd}.log
appender.rolling.policies.type = Policies
appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.rolling.policies.time.interval = 1
appender.rolling.policies.time.modulate = true

rootLogger.level = info
rootLogger.appenderRef.console.ref = console
rootLogger.appenderRef.rolling.ref = rolling

appender.deprecation_rolling.type = RollingFile
appender.deprecation_rolling.name = deprecation_rolling
appender.deprecation_rolling.fileName = ${sys:es.logs}_deprecation.log
appender.deprecation_rolling.layout.type = PatternLayout
appender.deprecation_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%.-10000m%n
appender.deprecation_rolling.filePattern = ${sys:es.logs}_deprecation-%i.log.gz
appender.deprecation_rolling.policies.type = Policies
appender.deprecation_rolling.policies.size.type = SizeBasedTriggeringPolicy
appender.deprecation_rolling.policies.size.size = 1GB
appender.deprecation_rolling.strategy.type = DefaultRolloverStrategy
appender.deprecation_rolling.strategy.max = 4

logger.deprecation.name = org.elasticsearch.deprecation
logger.deprecation.level = warn
logger.deprecation.appenderRef.deprecation_rolling.ref = deprecation_rolling
logger.deprecation.additivity = false

appender.index_search_slowlog_rolling.type = RollingFile
appender.index_search_slowlog_rolling.name = index_search_slowlog_rolling
appender.index_search_slowlog_rolling.fileName = ${sys:es.logs}_index_search_slowlog.log
appender.index_search_slowlog_rolling.layout.type = PatternLayout
appender.index_search_slowlog_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %marker%.-10000m%n
appender.index_search_slowlog_rolling.filePattern = ${sys:es.logs}_index_search_slowlog-%d{yyyy-MM-dd}.log
appender.index_search_slowlog_rolling.policies.type = Policies
appender.index_search_slowlog_rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.index_search_slowlog_rolling.policies.time.interval = 1
appender.index_search_slowlog_rolling.policies.time.modulate = true

logger.index_search_slowlog_rolling.name = index.search.slowlog
logger.index_search_slowlog_rolling.level = trace
logger.index_search_slowlog_rolling.appenderRef.index_search_slowlog_rolling.ref = index_search_slowlog_rolling
logger.index_search_slowlog_rolling.additivity = false

appender.index_indexing_slowlog_rolling.type = RollingFile
appender.index_indexing_slowlog_rolling.name = index_indexing_slowlog_rolling
appender.index_indexing_slowlog_rolling.fileName = ${sys:es.logs}_index_indexing_slowlog.log
appender.index_indexing_slowlog_rolling.layout.type = PatternLayout
appender.index_indexing_slowlog_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %marker%.-10000m%n
appender.index_indexing_slowlog_rolling.filePattern = ${sys:es.logs}_index_indexing_slowlog-%d{yyyy-MM-dd}.log
appender.index_indexing_slowlog_rolling.policies.type = Policies
appender.index_indexing_slowlog_rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.index_indexing_slowlog_rolling.policies.time.interval = 1
appender.index_indexing_slowlog_rolling.policies.time.modulate = true

logger.index_indexing_slowlog.name = index.indexing.slowlog.index
logger.index_indexing_slowlog.level = trace
logger.index_indexing_slowlog.appenderRef.index_indexing_slowlog_rolling.ref = index_indexing_slowlog_rolling
logger.index_indexing_slowlog.additivity = false
</code>
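For reference, Elasticsearch 5.x resolves ''${sys:es.logs}'' to ''path.logs'' plus the cluster name, so with our settings the files should land as follows (assuming an otherwise default setup):

<code bash>
# main log: /var/log/elasticsearch/<cluster.name>.log
tail -f /var/log/elasticsearch/es-minet.log

# deprecation warnings are kept in their own rolling file
tail /var/log/elasticsearch/es-minet_deprecation.log
</code>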
===== OS configuration =====

Elasticsearch performs much better when it has plenty of resources, so we raise the number of files it may keep open and let it use all the memory it has. Create the file ''/etc/security/limits.d/elasticsearch.conf'':

<code>
# raise the open-file limit for the elasticsearch user
elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
# allow the process to lock its memory (see bootstrap.memory_lock)
elasticsearch - memlock unlimited
</code>

Note that Elasticsearch will happily use **all the resources** it can get in order to be as fast as possible. We nevertheless give it only half of the memory, because the rest has to be left for Apache Lucene (the search library Elasticsearch is built on).

===== Cluster maintenance with Curator =====

Elasticsearch is rather naive in how it manages data: it assumes it will always have enough disk space and RAM to keep every index around and searchable. We therefore have to manage that aspect ourselves to keep the cluster lean and fast.

To remove an index "by hand", for example after a configuration blunder, use the REST API: https://www.elastic.co/guide/en/elasticsearch/reference/5.1/indices-delete-index.html

Otherwise we use Curator, which since its latest version is driven by configuration files:

close30.yml:
<code yaml>
---
actions:
  1:
    action: close
    description: >-
      Close indices older than 30 days (based on index name), for logstash-
      prefixed indices.
    options:
      delete_aliases: False
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30
      exclude:
</code>

delete60.yml:
<code yaml>
---
actions:
  1:
    action: delete_indices
    description: >-
      Delete indices older than 60 days (based on index name), for logstash-
      prefixed indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 60
      exclude:
</code>

open60.yml:
<code yaml>
---
actions:
  1:
    action: open
    description: >-
      Open indices older than 30 days but younger than 60 days (based on
      index name), for logstash- prefixed indices.
    options:
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30
      exclude:
    - filtertype: age
      source: name
      direction: younger
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 60
      exclude:
</code>
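Before wiring these into cron, each action file can be tested without touching any index: Curator's ''--dry-run'' flag only logs what it would have done. A sketch, assuming Curator 4.x and the client configuration in ''/etc/curator/curator.yml'' used below:

<code bash>
# log the actions that would be taken, without performing them
curator --dry-run --config /etc/curator/curator.yml /etc/curator/close30.yml
</code>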
We run close30 and delete60 daily from ''cron.daily/curator'':

<code bash>
curator --config /etc/curator/curator.yml /etc/curator/close30.yml
curator --config /etc/curator/curator.yml /etc/curator/delete60.yml
</code>

open60 is run by hand when needed:

<code bash>
curator --config /etc/curator/curator.yml /etc/curator/open60.yml
</code>

====== Monitoring ======

Besides the traditional monitoring of the **elasticsearch** service (checking that the process is running and listening on the right port), we use **UserParameters** in the Zabbix agent configuration for a closer watch on the cluster and on each node. ''<node_ip>'' stands for the IP of the node the agent runs on:

<code>
UserParameter=nb_node,curl -s -XGET http://<node_ip>:9200/_cluster/health | sed -n -e 's/^.*"number_of_nodes"://p' | cut -d , -f 1
UserParameter=nb_data_node,curl -s -XGET http://<node_ip>:9200/_cluster/health | sed -n -e 's/^.*"number_of_data_nodes"://p' | cut -d , -f 1
UserParameter=active_shard,curl -s -XGET http://<node_ip>:9200/_cluster/health | sed -n -e 's/^.*"active_shards"://p' | cut -d , -f 1
UserParameter=active_primary_shard,curl -s -XGET http://<node_ip>:9200/_cluster/health | sed -n -e 's/^.*"active_primary_shards"://p' | cut -d , -f 1
UserParameter=unassigned_shards,curl -s -XGET http://<node_ip>:9200/_cluster/health | sed -n -e 's/^.*"unassigned_shards"://p' | cut -d , -f 1
UserParameter=relocating_shards,curl -s -XGET http://<node_ip>:9200/_cluster/health | sed -n -e 's/^.*"relocating_shards"://p' | cut -d , -f 1
UserParameter=initializing_shards,curl -s -XGET http://<node_ip>:9200/_cluster/health | sed -n -e 's/^.*"initializing_shards"://p' | cut -d , -f 1
UserParameter=delayed_unassigned_shards,curl -s -XGET http://<node_ip>:9200/_cluster/health | sed -n -e 's/^.*"delayed_unassigned_shards"://p' | cut -d , -f 1
UserParameter=number_of_pending_tasks,curl -s -XGET http://<node_ip>:9200/_cluster/health | sed -n -e 's/^.*"number_of_pending_tasks"://p' | cut -d , -f 1
UserParameter=task_max_waiting_in_queue_millis,curl -s -XGET http://<node_ip>:9200/_cluster/health | sed -n -e 's/^.*"task_max_waiting_in_queue_millis"://p' | cut -d , -f 1
UserParameter=document_count,curl -s -XGET http://<node_ip>:9200/_cat/count | cut -d " " -f 3
UserParameter=heap_use_percent,curl -s -XGET http://<node_ip>:9200/_cat/nodes?h=ip,heap.percent | grep <node_ip> | tr -s ' ' | cut -d ' ' -f 2
UserParameter=file_desc_percent,curl -s -XGET http://<node_ip>:9200/_cat/nodes?h=ip,file_desc.percent | grep <node_ip> | tr -s ' ' | cut -d ' ' -f 2
UserParameter=ram_percent,curl -s -XGET http://<node_ip>:9200/_cat/nodes?h=ip,ram.percent | grep <node_ip> | tr -s ' ' | cut -d ' ' -f 2
UserParameter=index_total,curl -s -XGET http://<node_ip>:9200/_cat/nodes?h=ip,indexing.index_total | grep <node_ip> | tr -s ' ' | cut -d ' ' -f 2
UserParameter=indexing.delete_total,curl -s -XGET http://<node_ip>:9200/_cat/nodes?h=ip,indexing.delete_total | grep <node_ip> | tr -s ' ' | cut -d ' ' -f 2
UserParameter=search.query_total,curl -s -XGET http://<node_ip>:9200/_cat/nodes?h=ip,search.query_total | grep <node_ip> | tr -s ' ' | cut -d ' ' -f 2
</code>

This also gives us nice Zabbix graphs that are quite useful for understanding what is going on in the cluster.
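To check an item before building graphs on top of it, a sketch using Zabbix's own tooling (the item keys are the ones defined above):

<code bash>
# on the node itself: ask the local agent to evaluate a single item
zabbix_agentd -t nb_node

# from the Zabbix server: query the item over the network
zabbix_get -s 192.168.102.227 -k active_shard
</code>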