
Ensuring Data Integrity: Elasticsearch Scheduled Backup with SLM Policy

In the realm of data management, ensuring the integrity and availability of information is paramount. For Elasticsearch clusters handling critical data, implementing a robust backup strategy becomes indispensable. This blog explores the effective use of Elasticsearch's Snapshot Lifecycle Management (SLM) policies combined with Amazon S3 storage for scheduled backups, ensuring data reliability and recovery readiness.

Introduction

Elasticsearch, a powerful distributed search and analytics engine, is widely used to manage vast amounts of structured and unstructured data. As organizations rely more on Elasticsearch for real-time data insights, the need to safeguard this data through regular backups grows in importance. This blog delves into setting up and managing scheduled backups using Elasticsearch's Snapshot Lifecycle Management (SLM) feature, leveraging Amazon S3 as the storage backend.

Prerequisites

Before diving into the backup setup, it's crucial to ensure that the Elasticsearch instance is configured to access Amazon S3 storage. This involves verifying or adding S3 credentials to the Elasticsearch keystore, which can be done either directly within the Elasticsearch pod or by utilizing Kubernetes secrets for a more secure approach.
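
The S3 credentials live in the Elasticsearch keystore as secure settings. A minimal sketch of adding them from inside a Kubernetes pod, assuming a hypothetical pod name es-master-0 and the default S3 client:

kubectl exec -it es-master-0 -- bin/elasticsearch-keystore add s3.client.default.access_key
kubectl exec -it es-master-0 -- bin/elasticsearch-keystore add s3.client.default.secret_key

Once the keystore is updated, reload the secure settings so the new credentials take effect without a node restart:

curl -X POST "localhost:9200/_nodes/reload_secure_settings?pretty"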

Step-by-Step Guide to Setting Up Scheduled Backups

1. Register Snapshot Repository

First, a snapshot repository must be registered in Elasticsearch to define where snapshots are stored. The request below registers a repository named snap1 backed by an S3 bucket named 'app-es-backup', with snapshots written under the base path '1hour-backup/'.

curl -X PUT "localhost:9200/_snapshot/snap1?pretty" -H 'Content-Type: application/json' -d'{
  "type": "s3",
  "settings": {
    "bucket": "app-es-backup",
    "base_path": "1hour-backup/"
  }
}'
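
Once registered, it is worth confirming that every node can actually reach the bucket before the first scheduled snapshot runs. Elasticsearch's repository verification API checks this directly:

curl -X POST "localhost:9200/_snapshot/snap1/_verify?pretty"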

2. Create SLM Policies

With the repository in place, define an SLM policy that governs snapshot scheduling, naming, and retention. The policy below, named 1hr-snapshots, uses the cron expression 0 0 * * * ? to take a snapshot at the top of every hour, capturing all indices along with the cluster's global state (critical metadata such as cluster settings and index templates). Its retention rules keep each snapshot for three days, while always retaining at least 5 and never more than 14 snapshots.

curl -X PUT "localhost:9200/_slm/policy/1hr-snapshots?pretty" -H 'Content-Type: application/json' -d'{
  "schedule": "0 0 * * * ?",  
  "name": "<snap1-{now{yyyy-MM-dd_HH:mm:ss|Asia/Kolkata}}>",
  "repository": "snap1",
  "config": {
    "indices": ["*"],
    "ignore_unavailable": false,
    "include_global_state": true
  },
  "retention": {
    "expire_after": "3d",
    "min_count": 5,
    "max_count": 14
  }
}'
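
Rather than waiting for the next scheduled run, the policy can be triggered immediately to confirm it works end to end. The SLM execute API returns the name of the snapshot it starts:

curl -X POST "localhost:9200/_slm/policy/1hr-snapshots/_execute?pretty"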

3. Verifying and Managing Backups

Once policies are configured, it's essential to verify their status and manage snapshots effectively. Elasticsearch provides commands to list existing policies and snapshots, ensuring transparency and control over the backup process.

Listing SLM Policies:

curl -X GET "localhost:9200/_slm/policy?pretty"

Listing Snapshots:

curl -X GET "localhost:9200/_cat/snapshots/snap1?v&s=id"

Ensuring Data Integrity: Restoration Process

In the unfortunate event of data loss or corruption, restoring from these snapshots becomes crucial. The restoration process involves registering the snapshot repositories, listing available snapshots, and then initiating a restore operation on the desired snapshot.

1. Register Snapshot Repository (for Restoration):

Ensure that the repository (snap1) is registered as per the backup setup process.
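
When restoring on a separate cluster, a safer option is to register the same bucket as a read-only repository, so the restore cluster cannot modify or delete snapshots the source cluster is still writing. A sketch reusing the settings from the backup setup:

curl -X PUT "localhost:9200/_snapshot/snap1?pretty" -H 'Content-Type: application/json' -d'{
  "type": "s3",
  "settings": {
    "bucket": "app-es-backup",
    "base_path": "1hour-backup/",
    "readonly": true
  }
}'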

2. Listing Snapshots (for Restoration):

Before restoration, list the available snapshots to identify the specific snapshot intended for restoration.

curl -X GET "localhost:9200/_cat/snapshots/snap1?v&s=id"

3. Restoring a Specific Snapshot:

To restore a specific snapshot, replace <snapshot_name> with an actual name from the listing above (with the naming pattern used in this policy, the timestamp is already part of the snapshot name):

curl -X POST "localhost:9200/_snapshot/snap1/<snapshot_name>/_restore?pretty" -H 'Content-Type: application/json' -d'{
  "indices": "*",
  "ignore_unavailable": false,
  "include_global_state": true
}'
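
Note that an index cannot be restored while an open index with the same name exists in the cluster. Either close or delete the conflicting indices first, or restore under new names. A minimal sketch, assuming a hypothetical index named my-index:

curl -X POST "localhost:9200/_snapshot/snap1/<snapshot_name>/_restore?pretty" -H 'Content-Type: application/json' -d'{
  "indices": "my-index",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1"
}'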

Conclusion

Implementing Elasticsearch scheduled backups with SLM policies ensures that your organization's critical data is protected against unforeseen circumstances. By leveraging Amazon S3 as a reliable storage solution, Elasticsearch users can confidently manage data integrity and facilitate timely recovery when needed. This structured approach not only enhances operational resilience but also aligns with best practices in data management across distributed systems.

In summary, the combination of Elasticsearch's powerful capabilities with robust backup strategies provides a solid foundation for maintaining data integrity and availability in dynamic enterprise environments. By following the outlined steps and best practices, organizations can safeguard their Elasticsearch data effectively, supporting continuous operations and mitigating risks associated with data loss.