Availability Groups

Seeding an Availability Group Replica from Snapshot

Background

If you’ve been using Availability Groups, you’re familiar with the replica seeding (sometimes called initializing, preparing or data sychronization) process. Seeding is a size of data operation, copying data from a primary replica to one or more secondary replicas. This is required before joining a database to an Availability Group. You can seed a replica with backup and restore or automatic seeding, each with its own challenges. Regardless of which method you use, the seeding operation can take a long amount of time. The time it takes to seed a replica is based on the size of the database, the speed of the network, and storage. If you have multiple replicas seeding all of them is N times the fun!

Why Did Your Availability Group Creation Fail?

Availability Groups are a fantastic way to provide high availability and disaster recovery for your databases, but it isn’t exactly the easiest thing in the world to pull off correctly. To do it right there’s a lot of planning and effort that goes into your Availability Group topology. The funny thing about AGs is as hard as they are to plan…they’re pretty easy to implement…but sometimes things can go wrong. In this post I’m going to show you how to look into things when creating your AGs fails.

Monitoring SLAs with SQL Monitor Reporting

Proactive Reporting for SQL Server

If you’re a return reader of this blog you know I write often about monitoring and performance of Availability Groups. I’m a very big proponent of using monitoring techniques to ensure you’re meeting your service level agreements in terms of recovery time objective and recovery point objective. In my in person training sessions on “Performance Monitoring AlwaysOn Availability Groups”, I emphasize the need for knowing what your system’s baseline for healthy replication and knowing when your system deviates from that baseline. From a monitoring perspective, there are really two key concepts here I want to dig into…reactive monitoring and proactive monitoring.

Understanding Network Latency and Impact on Availability Group Replication

When designing Availability Group systems one of the first pieces of information I ask clients for is how much transaction log their databases generate. *Roughly*, this is going to account for how much data needs to move between their Availability Group Replicas. With that number we can start working towards the infrastructure requirements for their Availability Group system. I do this because I want to ensure the network has a sufficient amount of bandwidth to move the transaction log generated between all the replicas. Basically are the pipes big enough to handle the generated workload. But bandwidth is only part of the story, we also need to ensure latency is low. Why, well we’re going to explore that together in this post!

Using Extended Events to Visualize Availability Group Replication Internals

SQL 2014 Service Pack 2 was recently released by Microsoft and there is a ton of great new features and enhancements in this release.This isn’t just a collection of bug fixes…there’s some serious value in this Service Pack. Check out the full list here. One of the key things added in this Service Pack is an enhancement of the Extended Events for AlwaysOn Availability Group replication.

Why are the new Availability Group Extended Event interesting?

If you’ve used Availability Groups in production systems with high transaction volumes you know that replication latency can impact your availability. If you want to brush up on that check out our blog posts on AG Replication Latency, Monitoring for replication latency, and issues with the DMVs when monitoring. These new extended events add insight at nearly every point inside your Availability Group’s replication. More importantly they also include duration. So using these Extended Events we can pinpoint latency inside our Availability Group replication.

Speaking at IT/Dev Connections!

I’m proud to announce that I will be speaking at IT/Dev Connections on October 11th 2016 in Las Vegas! I can’t begin to tell you how excited I am to be able to speak at this conference! I look forward to seeing you there!

Here’s the information on the talk!

Designing High Availability Database Systems using AlwaysOn Availability Groups

**Track: **Development Platform Tools and Devops

Abstract:

Are you looking for a high availability solution for your business critical application? You’re heard about AlwaysOn Availability Groups and they seem like a good solution, but you don’t know where to start. It all starts with a solid design. In this session we introduce the core concepts needed to design a Availability Group based system. Covering topics such as recovery objectives, replica placement, failover requirements, synchronization models, quorum, backup and recovery and monitoring. This session is modeled after real world client engagements conducted by Centino Systems that have lead to many successful Availability Groups based systems supporting tier 1 business critical applications.

Availability Group DMVs Reporting Incorrect Values

In my opinion one of the key features of SQL Server 2016 is the rebuilt and optimized log redo mechanism for AlwaysOn Availability Groups. Check out the many new AG features here. Check out my posts here and here to learn about how Availability Groups move data.

Early last week I was conducting a load test using SQL Server 2016 and wanted to compare the performance of the log redo thread with that of SQL Server 2014. To establish baseline the performance of 2014, I constructed a load test using a heavy insert workload on the primary. To measure that workload I used the following script to pull database replication performance data from sys.dm_hadr_database_replica_states

Monitoring Availability Groups with Redgate SQL Monitor

In previous posts here and here we discussed AlwaysOn Availability Group replication latency and monitoring concepts, specifically the importance of monitoring the send_queue and redo_queue. In this post I’m going to show you a technique for monitoring Availability Group replication latency with Redgate SQL Monitor and its Custom Metric functionality.

Here’s the rub, monitoring AGs is a little interesting for the following reasons

  1. We’re interested in trending and monitoring and that isn’t built into SQL Server or SSMS’s AlwaysOn Dashboard.  Both provide only point in time values.
  2. We’ll need to monitor the health of the Availability Group as a whole. So we want to track performance data on all replicas in the AG. But interestingly the redo queue and send queue values in the DMVs on the primary are always NULL. So we need to get those values from the secondary replicas.
  3. Further, to work this into SQL Monitor’s Custom Metric framework we’ll need to limit our query’s result set to a single row and value.

Redo Queue

The redo queue is the amount of log records that haven’t been sent to a secondary replica in an AG. We want to track this as it is a measure of the amount of data on a secondary that is not yet redone into the database and can impact operations offloaded to secondaries

Designing for offloaded log backups in AlwaysOn Availability Groups – Part 2 – Monitoring

AlwaysOn Availability Groups have made a big splash in the SQL world and are quickly becoming the HA and DR technology of choice for many SQL Server environments. Crucial to their success is the ability to move data between the replicas in the Availability Group quickly and efficiently. In the previous [post][1] we discussed design concepts for offloaded backups in AlwaysOn Availability Groups, specifically we focused on how data is moved between AGs and the potential impact on backups and recovery. It is important to measure and trend replication health and this article introduces techniques and queries that you can use in your environment to measure and trend replication health and some of the nuances of the data reported in DMVs. ### Measuring Availability Group Replication Latency Central to measuring replication health is the [sys.dm_hadr_database_replica_states][2] DMV. On the primary replica this DMV returns rows representing the current state for each database and it’s replicas participating in AvailabilityGroups.  The key fields we’re going to focus on for our monitoring are: * log_send_queue_size – the amount of log records not sent to a secondary * redo_queue_size – the amount of log records not yet redone on the secondary * last_commit_time – the time of the last committed log record on a replica * last_redo_time – the time of the last log record was redone on a replica