RFC: Framerate Limiting Blog Series

Kaldaien · September 1, 2020, 1:22am

I know a few users have expressed interest in reading this stuff, but I suspect they will regret it before the end of Part I, as things get insanely technical very quickly.

Part I. Fundamentals

VSYNC: Limiting for All the Wrong Reasons

VSYNC synchronizes updates to the memory that constantly feeds attached display devices. It allows modification of this memory only during a brief period each refresh cycle, when the electron gun you never knew your display had returns from scanning the bottom of the screen to the top and stops shooting electrons at you.

Obviously something is amiss here: what electron gun?

The gun is gone, but the digital signal standards still behave according to the same principles as their analog predecessors. VGA scanned raster lines across your screen one-by-one and then returned the gun to the top-left corner (from the bottom-right) to begin scanning again. During this period of time, the gun turns off to prevent scanning a diagonal line.

We call the period in the signal where no image is produced the Vertical Blanking period. You can do anything you want with the RAMDAC-attached memory during VBLANK because no display device is going to do anything with the signal it receives. Thus, it is convenient to shove a new finished frame into RAMDAC memory during VBLANK; tearing is impossible.

VSYNC is something that has to be implemented by the driver. The RAMDAC ↔ VRAM connection is what is being synchronized, and none of that is visible to an application programmer. (Graphics APIs can be asked to signal an event on VBLANK, but this is more for convenience; you cannot possibly use this event for the purpose VBLANK was intended.)

The only reason VSYNC causes a game to obey a certain speed limit is that the game runs out of places to store new images until a finished frame can be moved to the front-buffer for scan-out. With VSYNC on and only two buffers, the only opportunity to perform this move comes once per refresh, during the Vertical Blanking period.

That is with two buffers, of course.
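
To put a number on that speed limit, here is a minimal sketch. It assumes every frame takes exactly the same time to render and that a swap can only happen at VBLANK; the function name `vsync_double_buffered_fps` is made up for illustration and is not taken from any API.

```cpp
#include <cmath>

// Effective framerate with VSYNC on and only two buffers.
// Assumption: each frame takes exactly render_ms, and a finished frame
// has to wait for the next VBLANK before it can reach the front-buffer.
double vsync_double_buffered_fps(double render_ms, double refresh_hz)
{
    const double refresh_ms = 1000.0 / refresh_hz;

    // Each frame occupies a whole number of refresh intervals (at least one).
    const double intervals = std::fmax(1.0, std::ceil(render_ms / refresh_ms));

    return refresh_hz / intervals;
}
```

Under these assumptions, a 17 ms frame on a 60 Hz display just misses VBLANK and the game runs at 30 fps, not 59.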

If we introduce a third buffer, penalties for missing VBLANK get pushed elsewhere (increased latency vs. invisible frames).

There are multiple ways queuing can be implemented when you have a front-buffer and two or more back-buffers. Older graphics APIs always used a FIFO (First In, First Out) queue and required that any rendered image be displayed on screen for at least one refresh.

FIFO-based triple-buffering adds constant latency as the buffer swaps always occur in sequence, no matter how old an image in the queue is.

Newer methods for triple-buffering (i.e. Unsequenced Flip Model, which NVIDIA refers to as Fast Sync and AMD refers to as Enhanced Sync) remove the sequencing from buffer swaps; at each screen refresh, the front-buffer becomes the most recent finished frame. In simpler terms, old frames are dropped.
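
The difference between the two policies comes down to which finished frame gets picked at each refresh. A rough sketch follows; it models the queue as a plain `std::deque` of frame numbers, and the names `QueuePolicy` and `pick_frame_at_vblank` are hypothetical. No driver or API exposes its queue this way.

```cpp
#include <deque>

enum class QueuePolicy { Fifo, LatestWins };

// Chooses which finished frame becomes the front-buffer at this refresh.
// `finished` holds frame numbers, oldest first. Returns -1 when nothing new
// is ready, meaning the previous front-buffer is simply scanned out again.
int pick_frame_at_vblank(std::deque<int>& finished, QueuePolicy policy)
{
    if (finished.empty())
        return -1;

    int chosen;
    if (policy == QueuePolicy::Fifo) {
        // Sequenced: the oldest frame goes out, newer ones keep waiting.
        chosen = finished.front();
        finished.pop_front();
    } else {
        // Unsequenced: the newest frame goes out, older ones are dropped.
        chosen = finished.back();
        finished.clear();
    }
    return chosen;
}
```

With frames 1 and 2 both finished before a refresh, the FIFO policy shows frame 1 and leaves frame 2 queued for the following refresh, while the latest-wins policy shows frame 2 and frame 1 is never seen.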

Triple-Buffering: Queueing Suddenly Has Options!

Sequencing (Drop vs. Hold)

Placement (Copy vs. Move/Flip)

Memory Reserve (Backbuffers Eat Memory!)

Multi-buffer D3D11 and D3D12 Swapchains are
remarkably different.
Driver implicitly handles resource allocation
for each backbuffer in D3D11; the engine is
required to micromanage this stuff in D3D12.

 * D3D12 buffer queuing is set in stone by
   whatever setup the engine was designed for
   and SK cannot add extra backbuffers.

Part II. Implementation

Timing Frames the Stateless Way

Time is Relative, but Previous and Next are not

Latency Problems: Yield to More Accurate Clocks

Reference Frame Time = t0 + dT * N = …

Start Frame (t0): QueryPerformanceCounter (…)
Time Slice (dT): (1000.0 / TargetFPS)
Frame Index (N)

This Frame = t0 + dT * N
Next Frame = t0 + dT * (N + 1)
Prev Frame = t0 + dT * (N - 1)
Wait?
--
  If ( CurrentTime < t0 + dT * (N + 1) ) [Delay]
  Else                                   [Continue]
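
The outline above could be sketched roughly as follows. This is not the actual limiter code: it swaps in the portable `std::chrono::steady_clock` in place of QueryPerformanceCounter, it assumes "stateless" means the frame index can be recomputed from the current time alone, and `FrameSchedule` and all of its members are names invented for the example.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

using Clock = std::chrono::steady_clock; // stand-in for QueryPerformanceCounter

struct FrameSchedule {
    Clock::time_point t0; // Start Frame
    Clock::duration   dT; // Time Slice (one frame at the target FPS)

    // Reference Frame Time = t0 + dT * N
    Clock::time_point frame_time(std::int64_t n) const { return t0 + dT * n; }

    // Frame index recomputed from the clock alone (assumes now >= t0).
    std::int64_t frame_index(Clock::time_point now) const { return (now - t0) / dT; }
};

// Builds a schedule for a target framerate, starting at t0.
FrameSchedule make_schedule(Clock::time_point t0, double target_fps)
{
    const std::chrono::duration<double> seconds_per_frame(1.0 / target_fps);
    return { t0, std::chrono::duration_cast<Clock::duration>(seconds_per_frame) };
}

// Wait? If CurrentTime < t0 + dT * (N + 1) then [Delay], else [Continue].
void wait_for_next_frame(const FrameSchedule& schedule, std::int64_t n)
{
    const Clock::time_point next = schedule.frame_time(n + 1);
    if (Clock::now() < next)
        std::this_thread::sleep_until(next); // [Delay]; a plain sleep is coarse
    // else [Continue]: already past the deadline, do not wait
}
```

Because every deadline is derived from t0 and N, and never from the time the previous frame happened to finish, a late frame does not push all of the following deadlines back.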

Part III. Thread Scheduling Black Magic

Delaying Execution Like a Boss

Sharing the CPU when Time is Limited

Priority Thread Scheduling Misconceptions

Correctly Waiting on Win32 Message Threads

The Finer Details of Rescheduling

Schrodinger’s Clock: CPU Timestamps

Windows Includes a Task-based Scheduler
… and you should be using it!

Part IV. Best Practices

Your Reference Clock (t0) is VBLANK
period …

Non-Blocking Presentation

Submit Work Early using a Waitable Swapchain
Unoptimized Third-party Overlays Perform Better
( If you wait after Present rather than before )

Restart and Re-Sync Your Clock Frequently

Clock is aligned to VBLANK; when in doubt
re-align your clock and restart frame count.
+ When changing framerate limit (duh!)
+ When changing display modes
+ When switching applications
+ When running multiple frames behind target
  ... many more opportunities
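
A hedged sketch of the restart logic, using the same t0 / dT / N terms as Part II. `LimiterClock`, `restart` and `too_far_behind` are hypothetical names, and the sketch does not obtain the VBLANK time itself; the caller is assumed to pass in whatever timestamp it has aligned to VBLANK.

```cpp
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

struct LimiterClock {
    Clock::time_point t0{};    // reference clock, ideally aligned to VBLANK
    Clock::duration   dT{};    // time slice per frame
    std::int64_t      frame = 0;

    // Re-align the clock and restart the frame count, e.g. after a
    // framerate limit change, a display mode change or an app switch.
    void restart(Clock::time_point aligned_t0, Clock::duration new_dT)
    {
        t0    = aligned_t0;
        dT    = new_dT;
        frame = 0;
    }

    // True when `now` is more than max_frames_behind slices past the
    // deadline of the next frame, i.e. the limiter is running behind target.
    bool too_far_behind(Clock::time_point now, std::int64_t max_frames_behind) const
    {
        const Clock::time_point next = t0 + dT * (frame + 1);
        return now > next + dT * max_frames_behind;
    }
};
```

A caller would check `too_far_behind` once per frame and call `restart` when it returns true, instead of trying to catch up on the missed frames.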

Aemony · September 1, 2020, 5:01am

Oooh, looks good so far.

Also, you can prevent the code syntax “highlighting” by specifying the “text” markup/language on the first line, right after the three backticks.

So like ```text and then on the next line start the actual code section.

With:

Driver implicitly handles resource allocation
for each backbuffer in D3D11; the engine is
required to micromanage this stuff in D3D12.

 * D3D12 buffer queuing is set in stone by
   whatever setup the engine was designed for
   and SK cannot add extra backbuffers.

Without:

Driver implicitly handles resource allocation
for each backbuffer in D3D11; the engine is
required to micromanage this stuff in D3D12.

 * D3D12 buffer queuing is set in stone by
   whatever setup the engine was designed for
   and SK cannot add extra backbuffers.

Kaldaien · September 1, 2020, 5:56am

Ah hah! That’s why it keeps trying to highlight keywords.

Anyway, I need to make the intro a lot more concise. That was my attempt to engage the people I am about to massively confuse with the remainder of the discussion, lol. Give ’em at least something they can briefly understand before it becomes numbers and math and Computer Science.

It looks like Microsoft recently updated their HDR primer.

That’ll be my next series of articles.