Choosing the Right Disaster Recovery Solution for Your Enterprise Applications
By Fadi Kanafani, Middle East Managing Director & General Manager at NetApp
In today’s constantly connected global business environment, protecting your data from disasters (natural disasters, multiple disk failures, and man-made mishaps) and recovering applications instantly without data loss requires a cost-effective and reliable data protection strategy. IT demands are changing rapidly, requiring the rapid repurposing and reconfiguration of data centers. Using multiple new management tools is not a viable option. Enterprises are looking for a single business continuity solution that meets all their requirements—from protecting a small number of volumes and single applications to protecting large clusters and multisite environments.
Digital transformation efforts, coupled with rapid cloud and AI adoption across sectors, are boosting global DR trends. A report from MarketWatch indicates that the global DR solutions market is expected to grow at a CAGR of 19.8% between 2020 and 2025 and to reach US$3,778.2 million by 2025.
Enterprises need to put in place a robust and fail-proof DR strategy to avoid catastrophic loss of assets and data and to restore business operations quickly in the event of natural disasters, hardware failures, human error, cybercrime, or other unexpected events.
But there is more to DR than simply retrieving data. It is also about how fast the data is recovered to ensure business continuity and how securely it is stored and handled.
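As a quick sanity check on the market figures, the compound growth arithmetic can be worked backward to the implied starting value. This is a minimal sketch that assumes five compounding periods between 2020 and 2025; the report itself is not quoted here as stating a 2020 base, so the result is only an illustration of what the two cited numbers imply together.

```python
def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Work a compound annual growth rate backward to the starting value."""
    return final_value / (1 + cagr) ** years


# Figures cited in the article.
final_2025_musd = 3778.2   # US$ million, projected for 2025
cagr = 0.198               # 19.8% per year
years = 5                  # assumption: 2020 -> 2025 is five compounding periods

base_2020_musd = implied_base(final_2025_musd, cagr, years)
print(f"Implied 2020 market size: about US${base_2020_musd:,.0f} million")
```

Under that assumption the implied 2020 market is roughly US$1.5 billion, which means the projection corresponds to the market growing to about two and a half times its size over five years.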
Native Application Replication vs Storage Replication
Conventionally, most enterprise-grade databases and applications provide their own replication technology, both synchronous and asynchronous. Database replication tools such as Oracle Data Guard work on individual databases and have the advantage that the standby database is continuously in recovery mode. It is therefore fast and easy to activate the standby database as the new primary in the event of a disaster.
The real challenge occurs when there is a minor bug in the software stack or in the config settings that renders the entire standby database useless. A bug could also interfere with the failover of the remaining application servers, middleware, monitoring systems, and so on.
Despite these risks, many enterprises use different replication mechanisms for different applications. For example, they might use a separate tool for each of Oracle, SQL Server, MySQL, IBM DB2, and so on, which makes overall IT management highly complex from the CXO perspective.
An alternative to application replication is storage mirroring. If the mirrors are in sync, then you can be certain that you can fail over in case of a disaster because all the data is present in the mirror. You can also speed up the overall recovery and application startup process by configuring automated storage failover rather than performing a manual DR operation. In the case of end-to-end DR, it might take a few minutes to mount the storage devices to the DR servers and bring up the databases to use the remote data set.
Solve Your DR Problem Cost Effectively
Today, various leading storage vendors offer protection at various scales with extra hardware, but most of these options are neither cost effective nor reliable enough for relational databases. Furthermore, the ability to use an expensive secondary facility for value-added purposes such as development, testing, and analytics is relatively limited.
We designed the cost-effective and reliable NetApp MetroCluster® and SnapMirror® technologies from scratch as business continuity tools (for example, DR, backup, and restore scenarios). By their architectural design, these products do not depend on any host platform hardware, operating system, driver, database, application, filesystem, volume manager, or any other component on the application host.
NetApp MetroCluster is a continuous-availability solution that protects critical data and provides data availability 24/7, and it is also a set-it-once technology. After MetroCluster instances are correctly configured, you only need to monitor and administer them. As the environment grows with new applications and workloads, data is automatically replicated and protected. You do not incur any change management or administration overhead. MetroCluster can be configured in a number of different ways depending on your environment, as shown in the figure below. With the latest releases of NetApp ONTAP® storage, MetroCluster IP ISL can cover a maximum distance of up to 700 km with a maximum round-trip latency of 10 ms.
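To put the distance and latency limits in perspective, the following minimal sketch estimates how much of a 10 ms round-trip budget is consumed by signal propagation alone. It assumes light travels through optical fiber at roughly 200 km per millisecond (about two thirds of its speed in a vacuum); the function names are hypothetical and are not part of any NetApp tool. Real fiber routes are usually longer than the geographic distance between sites, so measured latency should always take precedence over an estimate like this one.

```python
FIBER_KM_PER_MS = 200.0  # assumption: approximate signal speed in optical fiber


def propagation_rtt_ms(fiber_km: float) -> float:
    """Round-trip propagation delay over a fiber path, ignoring equipment delay."""
    return 2 * fiber_km / FIBER_KM_PER_MS


def within_limits(fiber_km: float, measured_rtt_ms: float,
                  max_km: float = 700.0, max_rtt_ms: float = 10.0) -> bool:
    """Check a link against the distance and latency limits cited in the article."""
    return fiber_km <= max_km and measured_rtt_ms <= max_rtt_ms


rtt_at_max = propagation_rtt_ms(700.0)   # propagation delay at the distance limit
headroom_ms = 10.0 - rtt_at_max          # what is left for switches, routers, etc.
print(f"Propagation RTT at 700 km: {rtt_at_max:.1f} ms, headroom: {headroom_ms:.1f} ms")
```

Under these assumptions, propagation at the maximum distance uses most of the latency budget, which leaves only a few milliseconds for network equipment along the path.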
Rise of SnapMirror Synchronous
NetApp SnapMirror Synchronous (SM-S), introduced in NetApp ONTAP 9.5, offers the flexibility to synchronously protect a subset of volumes in a cluster rather than the entire cluster, as in MetroCluster. In addition, replication can take place between ONTAP storage systems that are on different platforms.
Conclusion
The most important part of a DR strategy is evaluating the overall RPO and RTO, depending on the nature of the disaster, which could be limited to storage, databases, or the network or potentially could affect the entire site.
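One way to structure this evaluation is to list each disaster scenario with its worst-case data loss and estimated recovery time, and then compare those against the RPO and RTO targets. The sketch below is only an illustration: the scenario names follow the scopes mentioned above, but every number and identifier in it is hypothetical and should be replaced with values measured in your own environment.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    data_loss_s: float   # worst-case data loss in seconds (hypothetical value)
    recovery_s: float    # estimated time to restore service in seconds (hypothetical value)


def unmet(scenarios, rpo_s: float, rto_s: float):
    """Return the names of scenarios that exceed the RPO or the RTO target."""
    return [s.name for s in scenarios
            if s.data_loss_s > rpo_s or s.recovery_s > rto_s]


# Hypothetical estimates for illustration only.
scenarios = [
    Scenario("storage failure", data_loss_s=0, recovery_s=120),
    Scenario("database corruption", data_loss_s=300, recovery_s=1800),
    Scenario("full site outage", data_loss_s=0, recovery_s=600),
]

gaps = unmet(scenarios, rpo_s=60, rto_s=900)
print("Scenarios missing their objectives:", gaps)
```

Any scenario that appears in the result indicates where the current DR design falls short of the stated objectives and needs a different protection mechanism or revised targets.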
Choosing a database replication technology, such as Oracle Data Guard, Oracle GoldenGate, availability groups, or other options, might be acceptable for a few mission-critical databases. However, managing a large-scale infrastructure in this way, with hundreds or thousands of different databases along with their middleware and front-end applications, is too complex and expensive.
Simple and efficient storage DR solutions like NetApp SnapMirror and NetApp MetroCluster are significantly more cost effective than application replication tools and can be easily managed by IT generalists without the need for additional expertise. NetApp recommends evaluating both of these synchronous replication solutions and deciding between them based on your specific requirements. To learn more about NetApp synchronous and asynchronous replication solutions, see the SnapMirror and MetroCluster product pages.