OpenDistro Multi-node ElasticSearch Cluster
If you’re like me, you have arrived here after googling configuration examples of a self-hosted ElasticSearch (ES) cluster. Here you go: https://github.com/kave/opendistro-elasticsearch
Still reading? Good, let’s answer a few questions:
Why OpenDistro?
ElasticSearch started as a fully open-source project, but over time (and with popularity) began putting features behind a proprietary paywall. Hence OpenDistro. You can read more about this here:
Since OpenDistro is backed by AWS and used in their ElasticSearch service, it gave me confidence that when the option to move to the cloud becomes available, the migration should be simpler.
Multi-Node Cluster Configuration Walkthrough
We’re going with a multi-node setup because, by distributing the documents in an index across multiple shards and distributing those shards across multiple nodes, Elasticsearch can ensure redundancy. This both protects against hardware failures and increases query capacity as nodes are added to the cluster. As the cluster grows (or shrinks), Elasticsearch automatically migrates shards to rebalance the cluster.
The repo’s docker-compose configuration will create 1 master node & 2 data nodes with persisted storage, communicating between containers via TLS.
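To make the shard and replica idea concrete, here’s roughly what creating an index with explicit shard and replica counts looks like once the cluster below is running. The index name and the counts are purely illustrative; the host, credentials, and CA path match the health check used later in this walkthrough:
curl -XPUT 'https://admin:admin@local.es:9200/my-index?pretty' \
  --cacert $PWD/certs/root-ca.pem \
  -H 'Content-Type: application/json' \
  -d '{ "settings": { "number_of_shards": 2, "number_of_replicas": 1 } }'
With 2 data nodes, each primary shard’s replica lands on the other data node, so losing one node doesn’t lose data.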
Steps:
1. Clone the repo: https://github.com/kave/opendistro-elasticsearch
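For example, from a terminal:
git clone https://github.com/kave/opendistro-elasticsearch.git
cd opendistro-elasticsearch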
2. Create a local certificate authority and unique private keys & certificates for each node. I’ve simplified this by providing a script to create local self-signed certificates:
cd certs
./ssl_creation.sh
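If you want to sanity-check the output, you can verify a node certificate against the root CA (the exact file names depend on what the script generated, so adjust accordingly):
openssl verify -CAfile root-ca.pem node.pem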
3. Add the root authority certificate (root-ca.pem) to your Keychain. I’ll provide macOS Keychain instructions below:
- Open the macOS Keychain Access app
- Go to File > Import Items…
- Select your root certificate file (i.e. root-ca.pem)
- Double-click on your root certificate in the list
- Expand the Trust section
- Change the “When using this certificate:” select box to Always Trust
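If you’d rather skip the GUI, the same trust can usually be established from the terminal with macOS’s security tool (this writes to the System keychain and will prompt for your password):
sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain certs/root-ca.pem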
4. Increase Docker memory to 6GB. I’ve currently set each node to have 2GB, but feel free to edit this for your use case.
5. Launch the containers. This will bring up all 3 containers:
docker-compose --compatibility up
Done 🙌🏿
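You can quickly confirm all three containers came up:
docker-compose ps
Each service should report a State of Up before you move on to the health check.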
6. Confirm cluster health via the health check endpoint.
From the repo root directory:
curl 'https://admin:admin@local.es:9200/_cluster/health?pretty' --cacert $PWD/certs/root-ca.pem
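A healthy cluster responds with something along these lines (the cluster name comes from the compose config and the output is trimmed here, but the important bits are a green status and 3 nodes):
{
  "cluster_name" : "odfe-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 2,
  ...
}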
7. Success! 🚀
Tips for Production
- Set Xmx and Xms of your Java heap space to no more than 50% of your physical RAM.
config/cluster.conf:
ES_JAVA_OPTS=-Xms1g -Xmx1g
docker-compose.yml:
deploy:
  resources:
    limits:
      memory: 2G
- Ensure ES can create the necessary number of threads. This can be done by setting ulimit -u 4096 as root before starting Elasticsearch, or by setting nproc to 4096 in /etc/security/limits.conf:
<app_user> soft nproc 4096
<app_user> hard nproc 4096
- Increase VM file descriptors. Set ulimit -n 65535 as root before starting Elasticsearch, or set nofile to 65535 in /etc/security/limits.conf:
<app_user> soft nofile 65535
<app_user> hard nofile 65535
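To double-check that both limits actually apply inside a running container (the container name is a placeholder; use whatever docker-compose ps reports):
docker exec <es_container> sh -c 'ulimit -n && ulimit -u'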
- If you are bind-mounting a local directory or file, it must be readable by the elasticsearch user. Also, this user must have write access to the data and log dirs. A good strategy is to grant group access to gid 0 for the local directory. For example, to prepare a local directory for storing data through a bind-mount:
mkdir esdatadir
chmod g+rwx esdatadir
chgrp 0 esdatadir
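With the directory prepared, the bind-mount itself is just a volumes entry in docker-compose.yml. The service name below is a placeholder for your data node, and /usr/share/elasticsearch/data is the image’s default data path:
<data_node_service>:
  volumes:
    - ./esdatadir:/usr/share/elasticsearch/data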
- If possible, run your nodes on different machines on the same network, to improve redundancy and reduce the blast radius of a disaster.
For more tech musings, feel free to follow me here.