OpenDistro Multi-node ElasticSearch Cluster

Emmanuel Apau
3 min read · Jul 1, 2020


If you’re like me, you have arrived here after googling configuration examples of a self-hosted ElasticSearch (ES) cluster. Here you go: https://github.com/kave/opendistro-elasticsearch

Still reading? Good, let's answer a few questions:

Why OpenDistro?

ElasticSearch started as a fully open source project, but over time (and popularity) Elastic began putting features behind a proprietary paywall. Hence OpenDistro.

Since OpenDistro is backed by AWS and used in their ElasticSearch service, it gave me confidence that when the option to move to the cloud became available, the migration would be simpler.

Multi-Node Cluster Configuration Walkthrough

We’re going with a multi-node setup because, by distributing the documents in an index across multiple shards and distributing those shards across multiple nodes, Elasticsearch can ensure redundancy: it protects against hardware failures and increases query capacity as nodes are added to the cluster. As the cluster grows (or shrinks), Elasticsearch automatically migrates shards to rebalance the cluster.
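
For a concrete example, once the cluster below is running, you can create an index whose shards and replicas get spread across the nodes automatically (the index name here is illustrative):

curl -XPUT 'https://admin:admin@local.es:9200/my-index?pretty' \
  --cacert $PWD/certs/root-ca.pem \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"number_of_shards": 3, "number_of_replicas": 1}}'

With one replica per shard, losing any single node still leaves a full copy of the data available on the others.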

The repo’s docker-compose configuration will create one master node and two persisted data nodes, communicating between containers via TLS.
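
For a rough idea of the shape of such a file (a hedged sketch, not the repo's exact configuration; the image tag, service names, and variables are illustrative), a master node service typically looks like:

es-master:
  image: amazon/opendistro-for-elasticsearch:1.8.0
  environment:
    - cluster.name=es-cluster
    - node.name=es-master
    - node.master=true
    - node.data=false
    - discovery.seed_hosts=es-master,es-data1,es-data2
    - cluster.initial_master_nodes=es-master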

Steps:

1. Clone the repo: https://github.com/kave/opendistro-elasticsearch

2. Create a local certificate authority and unique private keys & certificates for each node. I’ve simplified this by providing a script that creates local self-signed certificates:

cd certs
./ssl_creation.sh
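
The script does the heavy lifting, but for reference, creating a root CA and signing a node certificate with openssl looks roughly like this (file names and subjects are illustrative, not necessarily what the script uses):

# Generate a root CA key and self-signed certificate
openssl genrsa -out root-ca-key.pem 2048
openssl req -new -x509 -sha256 -days 365 -key root-ca-key.pem \
  -out root-ca.pem -subj "/CN=root-ca"

# Generate a node key, create a CSR, and sign it with the root CA
openssl genrsa -out node-key.pem 2048
openssl req -new -key node-key.pem -out node.csr -subj "/CN=node-1"
openssl x509 -req -sha256 -days 365 -in node.csr -CA root-ca.pem \
  -CAkey root-ca-key.pem -CAcreateserial -out node.pem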

3. Add the root authority certificate (root.pem) to your keychain. macOS Keychain instructions are below, with a CLI alternative after the steps:

macOS Keychain
  • Open the macOS Keychain app
  • Go to File > Import Items…
  • Select your root certificate file (i.e. root.pem)
Trust Root CA
  • Double click on your root certificate in the list
  • Expand the Trust section
  • Change the When using this certificate: select box to Always Trust
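
CLI alternative, using macOS’s built-in security tool (assuming root.pem sits in the repo’s certs directory):

sudo security add-trusted-cert -d -r trustRoot \
  -k /Library/Keychains/System.keychain certs/root.pem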

4. Increase Docker’s memory allocation to 6GB. I’ve currently set each node to 2GB, but feel free to edit this for your use case.

5. Launch the containers. This will bring up all 3 containers:

docker-compose --compatibility up
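
If you’re running the stack in the foreground, you can confirm from another terminal that all three services are up:

docker-compose ps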

Done 🙌🏿


6. Confirm cluster health via the health check endpoint.

From the repo root directory:

curl 'https://admin:admin@local.es:9200/_cluster/health?pretty' --cacert $PWD/certs/root-ca.pem
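
The response should look roughly like this trimmed example (your cluster name and shard counts may differ). A status of green means all primary and replica shards are allocated:

{
  "cluster_name" : "es-cluster",
  "status" : "green",
  "number_of_nodes" : 3,
  "active_primary_shards" : 5,
  "active_shards" : 10
}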

7. Success! 🚀

Tips for Production

  • Set Xmx and Xms of your Java heap space to no more than 50% of your physical RAM.

config/cluster.conf:

ES_JAVA_OPTS=-Xms1g -Xmx1g

docker-compose.yml:

deploy:
  resources:
    limits:
      memory: 2G
  • Ensure ES can create the necessary number of threads. This can be done by setting ulimit -u 4096 as root before starting Elasticsearch, or by setting nproc to 4096 in /etc/security/limits.conf.
<app_user>         soft       nproc          4096
<app_user>         hard       nproc          4096
  • Increase file descriptors. Set ulimit -n 65535 as root before starting Elasticsearch, or set nofile to 65535 in /etc/security/limits.conf.
<app_user>         soft       nofile          65535
<app_user>         hard       nofile          65535
  • If you are bind-mounting a local directory or file, it must be readable by the elasticsearch user. Also, this user must have write access to the data and log dirs. A good strategy is to grant group access to gid 0 for the local directory. For example, to prepare a local directory for storing data through a bind-mount:
mkdir esdatadir
chmod g+rwx esdatadir
chgrp 0 esdatadir
  • If possible, you want your nodes on different machines but on the same network, to improve redundancy and reduce the blast radius of a disaster. Shard allocation awareness (sketched below) lets Elasticsearch take that layout into account.
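
A minimal sketch of that awareness setting for each node’s elasticsearch.yml, assuming an arbitrary attribute name of rack_id (give each machine a different value):

node.attr.rack_id: rack-1
cluster.routing.allocation.awareness.attributes: rack_id

With this set, Elasticsearch will try to keep a shard and its replicas on nodes with different rack_id values.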

For more tech musings, feel free to follow me here.
