How to get regular snapshots with Elasticsearch


Last week I played around with the HCL Connections documentation for backing up Elasticsearch, in the article Backup Elasticsearch Indices in Component Pack.

In the end I found that I couldn't get the snapshot restored, and that I would have to run a command outside of my Kubernetes cluster to get a daily snapshot. That's not what I want.

So my first idea was to move the job defined in the Helm chart into a Kubernetes CronJob. I changed the definition accordingly, and now the snapshot job runs from within Kubernetes.

I added a new default variable:

cronTimes: "0 6,18 * * *"

Without changing this default, the deployed CronJob runs at 6:00 and 18:00 (6 am and 6 pm) and creates a snapshot.
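
For illustration, here is a minimal sketch of how such a value can be consumed in the chart's CronJob template. The field layout, names and image are assumptions for this sketch, not the shipped chart:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: elasticsearch-backup                  # hypothetical name
spec:
  schedule: {{ .Values.cronTimes | quote }}   # "0 6,18 * * *" by default
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: es-backup:latest         # placeholder image
              command: ["/backup.sh"]         # placeholder snapshot script
          restartPolicy: OnFailure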

So what happens if we want to restore a snapshot? If we add the restore script to the same Helm chart as our backup script, we have to delete the installation and lose all logs of our backup jobs. The snapshots are still there, but the history is gone.

So I created separate Helm charts: first the CronJob to create snapshots, and second the Job to restore a snapshot. The restore script restores all indices in the snapshot, which fails because some of them are system indices and always open. So the restore failed every time in my tests.

The biggest caveat with the restore script is that it closes all indices. Each index would automatically open again after a successful restore, but since the restore fails, all indices stay closed.

I tried adding command options to the delivered restore command to restore only the icmetrics*, orient-me-collection and quickresults indices, but the restore script was too limited for me. A sketch of the plain API call I wanted follows below.
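
For comparison, restoring only those indices against the standard Elasticsearch _restore API would look roughly like this, using the sendRequest.sh helper shown in the next section. This is a sketch: the repository name connectionsmetrics is the one used later in this article, and the snapshot name is a placeholder you have to replace with a real one. The target indices must be closed or deleted before the restore:

./sendRequest.sh POST /_snapshot/connectionsmetrics/daily-snap-2022.09.01/_restore -H 'Content-Type: application/json' -d '
{
  "indices": "icmetrics*,orient-me-collection,quickresults",
  "ignore_unavailable": true,
  "include_global_state": false
}
'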

I wanted to offer the Helm charts for download here, but I'm not happy with the restore script, so there is no download for the moment.

Use Kibana and create a snapshot policy

In former Component Pack versions, there was a Helm chart to deploy the Elastic Stack (Kibana, Logstash and Filebeat). This chart is still contained in the Component Pack package, but the images are missing.

I asked for updated images in a support case and got them from HCL support. As far as I know, this Helm chart and the images are not available on FlexNet yet, but I'm confident that support will send them to you on request.

In Kibana we can define policies for automatic snapshots. These can be configured through the web UI, which also shows the HTTP request that is sent to Elasticsearch. So we can configure these snapshots without installing Kibana at all.

To create a snapshot every evening:

  1. Open a shell in one of the es-client pods:
kubectl exec -it -c es-client $(kubectl get pods -l component=elasticsearch7,role=client | awk '/client/{print $1}' | head -n 1) -- bash
  2. The backup store is mounted into all Elasticsearch pods, so there is no need to change anything on the deployments or statefulsets. Change into the probe directory and create the snapshot lifecycle management (SLM) policy:
cd /opt/elasticsearch-7.10.1/probe
./sendRequest.sh PUT /_slm/policy/daily-snapshot -H 'Content-Type: application/json' -d '
{
  "name": "<daily-snap-{now/d}>",
  "schedule": "0 31 16 * * ?",
  "repository": "connectionsmetrics",
  "config": {
    "indices": [
      "ic*",
      "quickresults",
      "orient-me-collection"
    ],
    "ignore_unavailable": true
  },
  "retention": {
    "expire_after": "3d",
    "min_count": 3,
    "max_count": 5
  }
}
'

This creates a scheduled snapshot of the configured indices (ic*, quickresults and orient-me-collection) every day at 16:31 UTC (the leading 0 in the schedule is the seconds field). Retention keeps at least 3 and at most 5 snapshots and automatically deletes snapshots older than 3 days.
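
To verify that the policy was stored and to trigger a first run without waiting for the schedule, the standard SLM endpoints can be called through the same helper script (assuming it passes the method and path through to Elasticsearch like the PUT above):

./sendRequest.sh GET /_slm/policy/daily-snapshot
./sendRequest.sh POST /_slm/policy/daily-snapshot/_execute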

The schedule syntax is documented in the Elasticsearch reference under cron expressions.
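
Elasticsearch uses a Quartz-style cron syntax with a leading seconds field; one of the two day fields must be a question mark:

# <seconds> <minutes> <hours> <day-of-month> <month> <day-of-week> [<year>]
0 31 16 * * ?    # every day at 16:31:00
0 0 * * * ?      # at the start of every hour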

Elasticsearch snapshots are automatically deduplicated!

Snapshots are automatically deduplicated to save storage space and reduce network transfer costs. To back up an index, a snapshot makes a copy of the index’s segments and stores them in the snapshot repository. Since segments are immutable, the snapshot only needs to copy any new segments created since the repository’s last snapshot.

Each snapshot is also logically independent. When you delete a snapshot, Elasticsearch only deletes the segments used exclusively by that snapshot. Elasticsearch doesn’t delete segments used by other snapshots in the repository.

So adding more snapshots wastes no disk space. I'm playing around with hourly snapshots at the moment:

cd /opt/elasticsearch-7.10.1/probe
./sendRequest.sh PUT /_slm/policy/hourly-snapshot -H 'Content-Type: application/json' -d '
{
  "name": "<hourly-snap-{now/d}>",
  "schedule": "0 0 * * * ?",
  "repository": "connectionsmetrics",
  "config": {
    "indices": [
      "ic*",
      "quickresults",
      "orient-me-collection"
    ],
    "ignore_unavailable": true
  },
  "retention": {
    "expire_after": "1d",
    "min_count": 6,
    "max_count": 12
  }
}
'

This can all be set with Kibana, but there is no need to deploy it if you don't use it otherwise. The API calls above are enough to configure the snapshots.
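
To check that the policies actually produce snapshots, you can list the repository contents with the standard _cat endpoint (again through the helper script):

./sendRequest.sh GET "/_cat/snapshots/connectionsmetrics?v"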

