The Setup: A High-Stakes Data Transfer

In the world of database administration, moving data is a daily task. But when the table you need to archive contains over 8.6 billion rows, a routine task becomes a high-stakes, multi-day operation. This was the exact scenario I faced recently. The goal was simple: copy the contents of a massive table from a production database to an archive database.

The initial transfer began with an export wizard and ran for nearly a week before an unexpected server restart interrupted the process mid-flight. With over 2.6 billion rows already copied, the last thing I wanted to do was truncate and start over. I needed a robust, restartable script to pick up where the wizard left off.

My first step was to create a standard batching script. The logic is simple and follows industry best practices:

  1. Find the highest primary key (ComponentAttributeID) that was successfully copied to the destination.
  2. In a loop, insert the next batch of rows (e.g., 500,000) from the source where the primary key is greater than the last one copied.
  3. Use ORDER BY on the primary key to ensure we process records sequentially.

Theoretically, with a clustered index on the primary key, this operation should be efficient. SQL Server should just seek to the starting point in the index and read the next 500,000 rows in order. Simple, right?

-- The "Best Practice" Batch Script
DECLARE @BatchSize INT = 500000;
DECLARE @LastMaxID BIGINT;
DECLARE @RowsAffected INT = 1;

-- Get our starting point (ISNULL guards against an empty destination)
SET @LastMaxID = ISNULL((SELECT MAX(ComponentAttributeID) FROM ArchiveDB.dbo.TheBigTable), 0);

WHILE @RowsAffected > 0
BEGIN
    INSERT INTO ArchiveDB.dbo.TheBigTable (...)
    SELECT TOP (@BatchSize) ...
    FROM ProductionDB.dbo.TheBigTable
    WHERE ComponentAttributeID > @LastMaxID
    ORDER BY ComponentAttributeID; -- This should be efficient!

    SET @RowsAffected = @@ROWCOUNT;
    SELECT @LastMaxID = MAX(ComponentAttributeID) FROM ArchiveDB.dbo.TheBigTable;
END

I launched the script, expecting to see steady progress. Instead, I saw a server brought to its knees.

The First Clue: A tempdb Under Siege

The script was incredibly slow, and server monitoring lit up with alarms: tempdb was under extreme pressure, with massive log file growth and disk I/O. This is a classic symptom of a huge SORT operation spilling from memory to disk.
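If you want to confirm a spill like this yourself, a standard DMV check (a generic sketch, not the exact query from this incident) shows which sessions are allocating tempdb pages. Internal-object allocations are where sort and hash spills land:

```sql
-- Which sessions are consuming tempdb right now?
-- Internal-object pages hold sort/hash spill work; pages are 8 KB.
SELECT  t.session_id,
        SUM(t.internal_objects_alloc_page_count) * 8 / 1024 AS internal_mb,
        SUM(t.user_objects_alloc_page_count)     * 8 / 1024 AS user_mb
FROM    sys.dm_db_task_space_usage AS t
GROUP BY t.session_id
ORDER BY internal_mb DESC;
```

A single session dominating the internal_mb column is a strong hint that one query's Sort or Hash operator is behind the tempdb growth.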

But why would it be sorting? The ORDER BY clause was on the clustered primary key. There should be no sorting involved! My investigation began.

The Usual Suspects: A Process of Elimination

When facing a performance mystery, you start with a checklist of the most common culprits.

  1. Missing or Incorrect Index? I double-checked. The source table had a clustered index on ComponentAttributeID. This wasn’t the problem.
  2. Triggers on the Destination? An INSERT trigger on the destination table could be firing and performing its own complex logic. I checked the system views. No triggers.
  3. A Sneaky Computed Column? There was a computed column on the destination table. If it were non-persisted and called a user-defined function (UDF), it could execute that function for every single one of the 500,000 rows, causing a “death by a thousand cuts.” While a good suspect, a quick check of the definition showed it wasn’t calling a UDF. It was a red herring.
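Each of those checks can be answered with a quick catalog query. A sketch using the table names from this post (not necessarily the exact queries I ran):

```sql
-- 1. Is the clustered index really on the key column?
SELECT  i.name, i.type_desc, c.name AS key_column
FROM    ProductionDB.sys.indexes AS i
JOIN    ProductionDB.sys.index_columns AS ic
        ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN    ProductionDB.sys.columns AS c
        ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE   i.object_id = OBJECT_ID('ProductionDB.dbo.TheBigTable')
  AND   i.type_desc = 'CLUSTERED';

-- 2. Any triggers on the destination table?
SELECT  name, is_disabled, is_instead_of_trigger
FROM    ArchiveDB.sys.triggers
WHERE   parent_id = OBJECT_ID('ArchiveDB.dbo.TheBigTable');

-- 3. Computed columns: persisted? calling a UDF in the definition?
SELECT  name, is_persisted, definition
FROM    ArchiveDB.sys.computed_columns
WHERE   object_id = OBJECT_ID('ArchiveDB.dbo.TheBigTable');
```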

With the easy answers ruled out, it was time to get the ground truth: the Actual Execution Plan.
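There are two standard ways to grab that plan; the session_id below is hypothetical:

```sql
-- Option 1: capture the actual plan for statements run in this session
SET STATISTICS XML ON;
-- ... run one batch of the INSERT ... SELECT here ...
SET STATISTICS XML OFF;

-- Option 2: pull the plan for a query already running in another session
SELECT  qp.query_plan
FROM    sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
WHERE   r.session_id = 52;  -- hypothetical session_id of the slow batch
```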

The Smoking Gun: When Parallelism Turns Against You

The execution plan revealed the shocking truth. Even with the clustered index available, the optimizer was choosing a Parallel Plan. And here’s the kicker: to execute in parallel, it had to split the work across multiple threads. However, to satisfy the ORDER BY ... TOP clause, it had to gather all the streams back together and re-sort the billions of remaining rows before it could select the top 500,000.

The parallelism was defeating the purpose of the index.

The first fix seemed obvious: force a serial plan. I killed the query and added a hint to the end of the SELECT statement:

...
ORDER BY ComponentAttributeID
OPTION (MAXDOP 1);

This tells SQL Server to use only a single thread, which should force it to use the clustered index in its natural, efficient order. I reran the script, confident I had solved it.

The execution plan confirmed the query was now running in serial. But incredibly, the giant Sort operator was still there! The optimizer was so stubbornly convinced that it needed to read and sort all remaining billions of rows that it ignored the perfectly good index path.

This is a rare but frustrating behavior of the optimizer. When a direct hint fails, you escalate. I tried the FAST hint, which tells the optimizer to prioritize getting the first N rows as quickly as possible.

...
ORDER BY ComponentAttributeID
OPTION (FAST 500000);

Again, I checked the plan. And again, the optimizer ignored the hint and insisted on its inefficient Sort.

The Final Showdown: Trusting Your Gut

After trying every conventional hint in the book, I was stumped. The optimizer was unshakeable. It was at this point that a nagging thought I’d had from the very beginning came back to me: “I wonder if that ORDER BY is the problem.”

Theoretically, it shouldn’t be. On a clustered key, it should be the most efficient path. But after seeing the optimizer fail repeatedly, I realized I had to abandon theory and focus on the one operator that was causing the pain: the Sort. The only way to guarantee the Sort disappears is to remove the clause that requires it: ORDER BY.

This feels wrong. How do you process a batch in sequence without ordering the source? By letting go of sequential processing. The new goal was simply to move any 500,000 rows that hadn’t been copied yet. This changes the reads from a clean, sequential scan to a more scattered pattern, but it completely eliminates the tempdb-killing sort.

To achieve this, I used an older but still effective T-SQL command, SET ROWCOUNT, which simply stops processing after a given number of rows are affected. (Its effect on INSERT, UPDATE, and DELETE is deprecated in a future SQL Server release, but current versions still honor it.)

Here is the final, victorious script:

-- The Final, Unconventional Solution
DECLARE @BatchSize INT = 500000;
DECLARE @LastMaxID BIGINT = [Your Starting ID];
DECLARE @RowsAffected INT = 1;

WHILE @RowsAffected > 0
BEGIN
    -- Limit the number of rows for the next statement
    SET ROWCOUNT @BatchSize;

    INSERT INTO ArchiveDB.dbo.TheBigTable (...)
    SELECT ...
    FROM ProductionDB.dbo.TheBigTable
    WHERE ComponentAttributeID > @LastMaxID;
    -- NOTE: The ORDER BY clause is gone!

    -- Capture the rows affected and immediately turn off the limit
    SET @RowsAffected = @@ROWCOUNT;
    SET ROWCOUNT 0;

    IF @RowsAffected = 0 BREAK;

    -- We still need to know where we are, so we get the new max ID from the destination
    SELECT @LastMaxID = MAX(ComponentAttributeID) FROM ArchiveDB.dbo.TheBigTable;
END

I captured the execution plan one last time. It was perfect. The Sort was gone. The row estimates were correct. The estimated cost was a tiny fraction of the previous plans. The tempdb usage flatlined. The transfer was finally running at full speed.
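If you would rather avoid SET ROWCOUNT in newer code, the same trick should work with TOP: without an ORDER BY, TOP simply takes an arbitrary batch, so no Sort operator is required. A sketch of the equivalent loop (untested against this particular workload):

```sql
-- Same no-sort batching, using TOP instead of SET ROWCOUNT
WHILE @RowsAffected > 0
BEGIN
    INSERT INTO ArchiveDB.dbo.TheBigTable (...)
    SELECT TOP (@BatchSize) ...
    FROM   ProductionDB.dbo.TheBigTable
    WHERE  ComponentAttributeID > @LastMaxID;

    SET @RowsAffected = @@ROWCOUNT;

    SELECT @LastMaxID = MAX(ComponentAttributeID)
    FROM   ArchiveDB.dbo.TheBigTable;
END
```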

Lessons from the Trenches

This deep dive into a single query was a powerful reminder of some core database truths:

  • Trust, But Verify: “Best Practices” are your starting point, not an infallible law. My initial script was theoretically perfect, but the optimizer disagreed with it. Always trust the execution plan over the theory.
  • The Optimizer Has Quirks: The SQL Server Query Optimizer is a phenomenal piece of engineering, but it’s not perfect. In this rare edge case, a combination of factors led it down a deeply inefficient path.
  • Simplify to Isolate: When a complex query misbehaves and refuses to be hinted, break it down. We ultimately won by simplifying the query to the point where the optimizer had no choice but to generate an efficient plan.
  • Listen to Your Instincts: My initial suspicion about the ORDER BY clause was correct. While it’s essential to follow a methodical process, don’t dismiss that nagging feeling that something is fundamentally wrong. Dig until you prove it right or wrong.

In the end, we managed to wrangle the billion-row table and its stubborn query plan into submission. The data is now safely archived, and I have a new favorite war story to tell.
