
Restore of Elasticsearch indexes in json form is inoperable on standalone nodes #457

Open

terrorobe opened this issue Nov 29, 2018 · 10 comments

@terrorobe (Contributor) commented Nov 29, 2018

The following code will not achieve the desired results, for two reasons:

```shell
# Restore exported audit and hookshot logs to 2.12.9 and newer single nodes and
# all releases of cluster
if $CLUSTER || [ "$(version $GHE_REMOTE_VERSION)" -ge "$(version 2.12.9)" ]; then
  echo "Restoring Elasticsearch Audit logs ..."
  ghe-restore-es-audit-log "$GHE_HOSTNAME" 1>&3
  echo "Restoring hookshot logs ..."
  ghe-restore-es-hookshot "$GHE_HOSTNAME" 1>&3
fi
```

Firstly, the version check fails due to the included v in $GHE_REMOTE_VERSION:

```
++ version v2.15.3
++ echo v2.15.3
++ awk -F. '{ printf("%d%03d%03d%03d\n", $1,$2,$3,$4); }'
++ version 2.12.9
++ echo 2.12.9
++ awk -F. '{ printf("%d%03d%03d%03d\n", $1,$2,$3,$4); }'
+ '[' 0015003000 -ge 2012009000 ']'
```
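The root cause is that awk's `%d` conversion coerces the non-numeric field `v2` to 0. A standalone reproduction (a sketch using the same `version()` body as backup-utils, not appliance code):

```shell
#!/bin/sh
# Same version() as backup-utils: pack each dotted component into a
# zero-padded fixed-width integer for numeric comparison.
version() {
  echo "$@" | awk -F. '{ printf("%d%03d%03d%03d\n", $1,$2,$3,$4); }'
}

# With the leading "v", awk's %d coerces the field "v2" to 0, so
# v2.15.3 packs to 0015003000 and loses against 2012009000.
version v2.15.3   # -> 0015003000
version 2.15.3    # -> 2015003000
version 2.12.9    # -> 2012009000
```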

Secondly, the imports are run before ghe-config-apply is triggered. This means that all imported data will be written to the ES data directory that is going to be overwritten on the next ghe-config-apply invocation.

On second reading, the import does not happen during the respective ghe-restore scripts but during ghe-storage-prepare, so that should work.

On third reading, the import will not happen because ghe-storage-prepare will only mv /data/user/elasticsearch-restore to /data/user/elasticsearch. Importing of the json dumps happens in elasticsearch-post-start (systemd ExecStartPost for the elasticsearch unit) which only checks /data/user/elasticsearch-restore.

Given that the data in the json dumps should be present in the rsync backup anyhow, I'm inclined to remove the json restore for standalone appliances. The additional json dumps are helpful when trying to recover from corrupted indexes, but they don't seem to provide any additional value during regular restores.

Thoughts?

@lildude (Member) commented Nov 29, 2018

Nice catch with the version comparison... I suspect I may have missed something somewhere, as I'm pretty sure this did work correctly when I implemented it.

That said, as things stand, neither condition would be true and the JSON dumps won't be restored on single nodes, so this is working as you would like by pure luck 😄

@lildude (Member) commented Nov 29, 2018

I've just checked, and the version string should not have the v in it: it comes from ghe-negotiate-version run on the appliance, which does not return the v.

Here's what I get when I test this against 2.15:

```
$ ghe-negotiate-version backup-utils 2.14.0
GitHub Enterprise version 2.15.3
backup-utils 2.14.0 is supported.
$
```

The first line is what is used for $GHE_REMOTE_VERSION, so that makes your first point moot but does make the second one relevant.

@lildude (Member) commented Nov 29, 2018

> On third reading, the import will not happen because ghe-storage-prepare will only mv /data/user/elasticsearch-restore to /data/user/elasticsearch. Importing of the json dumps happens in elasticsearch-post-start (systemd ExecStartPost for the elasticsearch unit) which only checks /data/user/elasticsearch-restore.

I've only just seen this update. So are you saying this isn't really a problem now?

@terrorobe (Contributor, Author) commented Nov 29, 2018

We're using the output of ghe-host-check for GHE_REMOTE_VERSION:

```shell
ghe_remote_version_required () {
  if [ -z "$GHE_REMOTE_VERSION" ]; then
    _out=$(ghe-host-check "$@")
    echo "$_out"
    # override hostname w/ ghe-host-check output because the port could have
    # been autodetected to 122.
    GHE_HOSTNAME=$(echo "$_out" | sed 's/Connect \(.*:[0-9]*\) OK.*/\1/')
    export GHE_HOSTNAME
    GHE_REMOTE_VERSION=$(echo "$_out" | sed 's/.*(\(.*\))/\1/')
    export GHE_REMOTE_VERSION
    ghe_parse_remote_version "$GHE_REMOTE_VERSION"
    ghe_remote_version_config "$GHE_REMOTE_VERSION"
  fi
  true
}
```

ghe-host-check includes the v in the output:

```shell
echo "Connect $hostname:$port OK (v$version)"
```

Example:

```
terrorobe@nostalgia ~/github/backup-utils (git)-[tags/v2.15.1] % ./bin/ghe-host-check
Connect gheaws.michaelrenner.at:122 OK (v2.15.3)
```

This GHE_REMOTE_VERSION is then used as input for version:

```shell
if $CLUSTER || [ "$(version $GHE_REMOTE_VERSION)" -ge "$(version 2.13.0)" ]; then
```

which doesn't deal with v's:

```shell
version() {
  echo "$@" | awk -F. '{ printf("%d%03d%03d%03d\n", $1,$2,$3,$4); }';
}
```
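One possible fix (just a sketch, and it narrows the function to its first argument) would be to strip an optional leading v before the awk conversion:

```shell
#!/bin/sh
# Sketch of a v-tolerant version(): drop a leading "v" (if any) from the
# first argument before packing the components into a sortable integer.
version() {
  echo "${1#v}" | awk -F. '{ printf("%d%03d%03d%03d\n", $1,$2,$3,$4); }'
}

version v2.15.3   # -> 2015003000
version 2.12.9    # -> 2012009000
```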

@terrorobe (Contributor, Author) commented Nov 29, 2018

> > On third reading, the import will not happen because ghe-storage-prepare will only mv /data/user/elasticsearch-restore to /data/user/elasticsearch. Importing of the json dumps happens in elasticsearch-post-start (systemd ExecStartPost for the elasticsearch unit) which only checks /data/user/elasticsearch-restore.
>
> I've only just seen this update. So are you saying this isn't really a problem now?

From my cursory reading it looks like even if we were to copy the compressed json files correctly to the appliance, the files would never be loaded into Elasticsearch.

@lildude (Member) commented Nov 29, 2018

> We're using the output of ghe-host-check for GHE_REMOTE_VERSION:

Oh yes. Not sure why I thought it was from ghe-negotiate-version. That of course brings us back to my original point that the JSON restore would never happen on single nodes 😄

@lildude (Member) commented Nov 30, 2018

> From my cursory reading it looks like even if we were to copy the compressed json files correctly to the appliance, the files would never be loaded into Elasticsearch.

I've looked into this, and once the version issue has been resolved, then yes, the JSON files are loaded as part of the restore process itself, for example here for audit logs:

```shell
if $CLUSTER || [ -n "$configured" ]; then
  for index in $(cat $tmp_list | sed 's/\.gz$//g'); do
    ghe_verbose "* Restoring $index"
    echo "export PATH=\$PATH:/usr/local/share/enterprise && sudo gzip -dc $GHE_REMOTE_DATA_USER_DIR/elasticsearch-restore/$index | ghe-es-load-json 'http://localhost:9201/$index'" |
      ghe-ssh "$GHE_HOSTNAME" -- /bin/bash 1>&3
  done
  ghe-ssh "$GHE_HOSTNAME" -- "sudo sh -c 'rm $GHE_REMOTE_DATA_USER_DIR/elasticsearch-restore/*.gz'" 1>&3
```

But you're also right in that this is a bit pointless as they'll then be overwritten by exactly the same indices that have been transferred as part of...

```shell
if ! $CLUSTER && [ -d "$GHE_RESTORE_SNAPSHOT_PATH/elasticsearch" ]; then
  echo "Restoring Elasticsearch indices ..."
  ghe-restore-es-rsync "$GHE_HOSTNAME" 1>&3
fi
```

... when ghe-config-apply is run, but only if that step actually restored something, which won't be the case if there isn't an elasticsearch/ directory in the backup.

I'm sure I had a reason for this belt-and-braces approach at the time, but can't think of it right now.

@terrorobe (Contributor, Author) commented Nov 30, 2018

> I've looked into this and once the version issue has been resolved, yes, the JSON files are loaded because they're loaded as part of the restore process itself, for example here for audit logs:

You are absolutely right; I blanked on the `|| [ -n "$configured" ]` part of the condition 😞

In any case, do you have any objections to removing the version check from

```shell
if $CLUSTER || [ "$(version $GHE_REMOTE_VERSION)" -ge "$(version 2.12.9)" ]; then
  echo "Restoring Elasticsearch Audit logs ..."
  ghe-restore-es-audit-log "$GHE_HOSTNAME" 1>&3
  echo "Restoring hookshot logs ..."
  ghe-restore-es-hookshot "$GHE_HOSTNAME" 1>&3
fi
```

so that JSON dumps are only restored on cluster?

If not, I can prepare a PR for that.

@lildude (Member) commented Nov 30, 2018

> In any case, do you have any objections to removing the version check

Hold off for the moment. I want to try and remember why I implemented it in the first place.

> so that JSON dumps are only restored on cluster?

Other than being a slight waste of time (assuming it worked; fix coming), what problems is this currently causing?

@terrorobe (Contributor, Author) commented Nov 30, 2018

We've got customers with GiBs of audit logs due to earlier LDAP-sync over-logging, so for them I assume it will be a quite noticeable waste of time. I'm not sure how relevant that is in the context of the complete restore time, though. Hookshot indexes can also be hefty for active appliances.

There's also the topic of correctness: we do some validation that the JSON dumps match the index in question, but at the end of the day the rsync copy will always be closer to the actual data on the source server than data that has been dumped and re-ingested.
