Atlas metadata is stored in the form of HBase tables (atlas_janus or atlas_titan[1] and ATLAS_ENTITY_AUDIT_EVENTS) and Solr collections (vertex_index, fulltext_index, and edge_index).
a. Create the HBase table snapshots (use atlas_titan instead of atlas_janus on Atlas versions earlier than 1.0 [1]):
hbase shell
hbase> snapshot 'atlas_janus', 'atlas_janus_snapshot_<insert-date-here>'
hbase> snapshot 'ATLAS_ENTITY_AUDIT_EVENTS', 'atlas_entity_audit_events_snap_<insert-date-here>'
b. Export the snapshots from the server terminal:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'atlas_janus_snapshot_<insert-date-here>' -copy-to /tmp/hbasebackup/
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'atlas_entity_audit_events_snap_<insert-date-here>' -copy-to /tmp/hbasebackup/
c. The directory '/tmp/hbasebackup/' now contains the table backup.
Note: the '.hbase-snapshot' directory is also needed to restore the HBase snapshot.
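The snapshot and export steps above can be scripted so that the date stamp is filled in the same way everywhere. The following is a minimal sketch, not a tested procedure: the helper names (snapshot_cmd, export_cmd), the BACKUP_DATE and EXPORT_DIR variables, and the yyyymmdd date format are assumptions made for illustration. The script only prints the commands; it does not run them.

```shell
#!/bin/sh
# Dry-run sketch: prints the HBase snapshot and export commands for one date stamp.
# Helper names, variables, and the date format are illustrative assumptions.

# Date stamp used in place of <insert-date-here>; override by setting BACKUP_DATE.
BACKUP_DATE="${BACKUP_DATE:-$(date +%Y%m%d)}"
# Export destination taken from the steps above.
EXPORT_DIR="${EXPORT_DIR:-/tmp/hbasebackup/}"

# snapshot_cmd TABLE SNAPSHOT_PREFIX -> HBase shell statement that creates the snapshot
snapshot_cmd() {
  printf "snapshot '%s', '%s_%s'\n" "$1" "$2" "$BACKUP_DATE"
}

# export_cmd SNAPSHOT_PREFIX -> ExportSnapshot command line for that snapshot
export_cmd() {
  printf "hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot '%s_%s' -copy-to %s\n" \
    "$1" "$BACKUP_DATE" "$EXPORT_DIR"
}

# Print the commands for both Atlas tables (atlas_titan replaces atlas_janus on Atlas < 1.0).
snapshot_cmd atlas_janus atlas_janus_snapshot
snapshot_cmd ATLAS_ENTITY_AUDIT_EVENTS atlas_entity_audit_events_snap
export_cmd atlas_janus_snapshot
export_cmd atlas_entity_audit_events_snap
```

After reviewing the printed output, the snapshot statements could be piped into "hbase shell -n" (non-interactive mode) and the export lines run directly from the terminal.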
- In addition to the HBase tables, Atlas data is stored in three Solr collections: vertex_index, fulltext_index, and edge_index. These need to be backed up from Ambari Infra Solr.
- The backup command backs up the Solr indexes and configuration for a specified collection. It takes one copy of the index from each shard. For configuration, it backs up the configSet associated with the collection, along with the collection metadata.
Use the following syntax to call the Solr backup API with curl:
http://<infra-solr-host:port>/solr/admin/collections?action=BACKUP&name=myBackupName&collection=<myCollectionName>&location=/path/to/my/shared/drive
For example:
curl -ivk "https://host1.example.com:8886/solr/admin/collections?action=BACKUP&name=vertex_index_bkp&collection=vertex_index&location=/opt/vertex_index_backup"
curl -ivk "https://host1.example.com:8886/solr/admin/collections?action=BACKUP&name=edge_index_bkp&collection=edge_index&location=/opt/edge_index_backup"
curl -ivk "https://host1.example.com:8886/solr/admin/collections?action=BACKUP&name=fulltext_index_bkp&collection=fulltext_index&location=/opt/fulltext_index_backup"
Alternatively, a backup can be requested through each collection's replication handler:
curl -ivk "http://host1.example.com:8886/solr/vertex_index/replication?command=backup&name=vertex_index_backup&location=/opt/vertex_index"
curl -ivk "http://host1.example.com:8886/solr/edge_index/replication?command=backup&name=edge_index_backup&location=/opt/edge_index"
curl -ivk "http://host1.example.com:8886/solr/fulltext_index/replication?command=backup&name=fulltext_index_backup&location=/opt/fulltext_index"
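Because the three Collections API calls differ only in the collection name, they can be generated in a loop. The following is a minimal sketch that follows the naming used in the examples above; the backup_url helper and the SOLR_BASE and BACKUP_ROOT variables are assumptions made for illustration, and the host is the same placeholder as above. The script prints the curl commands for review and does not contact Solr.

```shell
#!/bin/sh
# Dry-run sketch: prints one Collections API BACKUP call per Atlas collection.
# SOLR_BASE and BACKUP_ROOT are placeholders; the helper name is illustrative.

SOLR_BASE="${SOLR_BASE:-https://host1.example.com:8886/solr}"
BACKUP_ROOT="${BACKUP_ROOT:-/opt}"

# backup_url COLLECTION -> BACKUP URL named <collection>_bkp, stored under <root>/<collection>_backup
backup_url() {
  printf '%s/admin/collections?action=BACKUP&name=%s_bkp&collection=%s&location=%s/%s_backup\n' \
    "$SOLR_BASE" "$1" "$1" "$BACKUP_ROOT" "$1"
}

for collection in vertex_index edge_index fulltext_index; do
  # Printed for review only; nothing is executed here.
  printf 'curl -ivk "%s"\n' "$(backup_url "$collection")"
done
```

Keeping the URL construction in one function means the backup name and location pattern only have to be changed in a single place.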
Note:
- If the cluster is kerberized, run kinit with the Solr keytab first and add "--negotiate -u :" after the -ivk flags in the curl commands above.
Example:
curl -ivk --negotiate -u : https://host1.example.com:8886
- If you have multiple Solr instances, make sure you create the Solr backup directories on all of them. Otherwise the backup fails with a complaint that shared storage must be used for the backup.
- In some cases, if shards are present on different nodes, the backup might fail with the following message:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://host1.example.com:8886/solr: Failed to backup core=vertex_index_shard because org.apache.solr.common.SolrException: Directory to contain snapshots doesn't exist: file:///opt/vertex_index. Note that Backup/Restore of a SolrCloud collection requires a shared file system mounted at the same path on all nodes!"
For this scenario, Solr recommends backing up to an HDFS repository; see the Solr documentation on backup repositories.
A quick workaround is to create the directory /opt/vertex_index manually beforehand on every Solr node other than the node where the backup API is called.
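The directory workaround and the Kerberos note above can be sketched together as follows. This is a minimal illustration under stated assumptions: SOLR_NODES is a hypothetical host list, ssh access to every node is assumed, and the helper names (mkdir_cmd, curl_prefix) and the KERBERIZED switch are invented for this example. The directories are the locations used in the replication examples above and would presumably need to be writable by the user running Solr. The script prints the commands and does not execute them.

```shell
#!/bin/sh
# Dry-run sketch: prints the mkdir call for each Solr node and the curl prefix to use.
# SOLR_NODES is a hypothetical host list; ssh access to each node is assumed.

SOLR_NODES="${SOLR_NODES:-host1.example.com host2.example.com}"
KERBERIZED="${KERBERIZED:-false}"

# mkdir_cmd NODE DIR -> command that creates the backup directory on one node
mkdir_cmd() {
  printf 'ssh %s mkdir -p %s\n' "$1" "$2"
}

# curl_prefix -> curl invocation prefix, with "--negotiate -u :" on kerberized clusters
curl_prefix() {
  if [ "$KERBERIZED" = "true" ]; then
    # Run kinit with the Solr keytab before using this prefix.
    printf 'curl -ivk --negotiate -u :\n'
  else
    printf 'curl -ivk\n'
  fi
}

# Print one mkdir command per node and backup directory.
for node in $SOLR_NODES; do
  for dir in /opt/vertex_index /opt/edge_index /opt/fulltext_index; do
    mkdir_cmd "$node" "$dir"
  done
done

# Print the prefix to put in front of the backup URLs.
curl_prefix
```

Creating the directories on every node, including the one that receives the API call, is harmless because "mkdir -p" does nothing when the directory already exists.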
Back up all the files under the /etc/atlas/conf folder on your Atlas Metadata Server. The steps above were compiled to back up and restore the Atlas HBase and Solr data.
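The configuration folder can be archived into a single date-stamped file. The following is a minimal sketch; the backup_conf function, the archive name pattern, and the destination directory are assumptions made for illustration, not part of Atlas.

```shell
#!/bin/sh
# Sketch: archive a configuration directory into a date-stamped tarball.
# backup_conf and the archive naming are illustrative assumptions.

# backup_conf SRC_DIR DEST_DIR
#   Creates DEST_DIR/atlas-conf-<yyyymmdd>.tar.gz and prints the archive path.
backup_conf() {
  src="$1"
  dest="$2"
  archive="$dest/atlas-conf-$(date +%Y%m%d).tar.gz"
  mkdir -p "$dest" || return 1
  # -C makes the paths inside the archive relative to the parent of SRC_DIR.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  printf '%s\n' "$archive"
}
```

For example, "backup_conf /etc/atlas/conf /tmp/atlas_conf_backup" would write the archive under the hypothetical destination /tmp/atlas_conf_backup, and "tar -tzf" on the printed path lists its contents.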
[1] In Atlas versions earlier than 1.0, the HBase table is atlas_titan. In Atlas 1.0 and later, it is atlas_janus. In HDP 2.x the table is atlas_titan, whereas in HDP 3.x it is atlas_janus.