I recently helped migrate some internal services, including DNS, from conventional Linux boxes to Infoblox devices. Nifty bits of kit, but one issue occurred during the migration: a RAC node eviction! This is an interesting scenario as it illustrates something that ideally should not happen: a conflict between the goals of reliability and manageability.
Wearing my DBA hat, I care most about uptime, uptime, uptime†. Part of planning for that is minimizing dependencies, especially on external systems that may not be designed for a level of HA equivalent to that of the RAC system. A chain is only as strong as its weakest link, after all. But wearing my general Unix engineering hat, I recognize that /etc/hosts files on every node are time-consuming and indeed error-prone to edit by hand, even though keeping them does eliminate the external dependency. Of course there’s no need to use names at all; the machines are perfectly happy with just IP addresses. The names are for the benefit of the human operators.
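To make the "error-prone to edit by hand" point concrete, here is a minimal sketch of how one might check a node's hosts file against what the central DNS says. Everything in it is an assumption for illustration, not something from the migration itself: the function names `parse_hosts` and `find_mismatches`, the sample hostnames, and the `resolve` callable, which stands in for whatever direct DNS lookup you have available and is expected to return an IP string or `None`. (The system resolver would be a poor choice here, since it typically consults /etc/hosts itself.)

```python
def parse_hosts(text):
    """Map each name in hosts-file text to its IP address (first entry wins)."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            entries.setdefault(name, ip)
    return entries


def find_mismatches(entries, resolve):
    """Return {name: (hosts_ip, dns_ip)} for names where the sources disagree.

    `resolve` is a hypothetical callable: name -> IP string, or None if the
    name does not resolve.
    """
    mismatches = {}
    for name, hosts_ip in entries.items():
        dns_ip = resolve(name)
        if dns_ip != hosts_ip:
            mismatches[name] = (hosts_ip, dns_ip)
    return mismatches


# Hypothetical sample data using documentation-range addresses.
sample = """
# cluster nodes
192.0.2.11  rac1 rac1.example.com
192.0.2.21  rac1-vip   # virtual IP
"""
dns_view = {
    "rac1": "192.0.2.11",
    "rac1.example.com": "192.0.2.11",
    "rac1-vip": "192.0.2.22",
}
print(find_mismatches(parse_hosts(sample), dns_view.get))
```

Passing the resolver in as a parameter keeps the comparison logic testable without a network, and makes it explicit which source of truth is being consulted.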
Hopefully now we have arrived at a situation that reconciles this conflict: centrally managed, highly-available name resolution that we can all trust as a single source of truth. The cost is, well, a cost (money), plus the introduction of a new type of device to maintain. Engineering is always a matter of compromise.
† the other nodes kept running so there was no actual interruption in service, but still…