I know a few users have expressed interest in reading this material, but I suspect that by the end of Part I some of them will regret it; things get insanely technical very quickly.
Part I. Fundamentals
VSYNC: Limiting for All the Wrong Reasons
VSYNC synchronizes updates to the memory that continuously feeds attached display devices. It allows modification of this memory only during a brief window each refresh cycle, while the electron gun you never knew your display had returns from scanning the bottom of the screen to the top and stops shooting electrons at you.
Obviously something is amiss here: what electron gun?
The gun is gone, but digital signal standards still behave according to the same principles as their analog predecessors. VGA scanned raster lines across your screen one-by-one, then returned the gun from the bottom-right to the top-left corner to begin scanning again. During this retrace, the gun turns off so it does not draw a diagonal line across the screen.
We call the period in the signal where no image is produced the Vertical Blanking (VBLANK) period. You can do anything you want with the RAMDAC-attached memory during VBLANK because no display device will do anything with the signal it receives. Thus, it is convenient to shove a newly finished frame into RAMDAC memory during VBLANK; tearing is impossible.
VSYNC is something that has to be implemented by the driver; what is being synchronized is the RAMDAC <-> VRAM connection, and none of that is visible to an application programmer. (Graphics APIs can be asked to signal an event on VBLANK, but that exists mostly for convenience; you cannot use the event for the purpose VBLANK was originally intended.)
The only reason VSYNC causes a game to obey a certain speed limit is that the game runs out of places to store new images until a finished frame can be moved to the front-buffer for scan-out. With VSYNC on and only two buffers, the only opportunity to perform that move comes once per refresh, during the Vertical Blanking period.
That is with two buffers, of course.
If we introduce a third buffer, the penalty for missing VBLANK is pushed elsewhere: either increased latency or invisible (dropped) frames, depending on how the queue is managed.
There are multiple ways queuing can be implemented when you have a front-buffer and two or more back-buffers. Older graphics APIs always used a FIFO (First In, First Out) queue and required that any image rendered be displayed on screen for at least one refresh.
FIFO-based triple-buffering adds constant latency because buffer swaps always occur in sequence, no matter how old an image in the queue is.
Newer triple-buffering methods (i.e. the Unsequenced Flip Model, which NVIDIA refers to as Fast Sync and AMD refers to as Enhanced Sync) remove the sequencing from buffer swaps: on each screen refresh, the front-buffer becomes the most recent finished frame. In simpler terms, old frames are dropped.
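The difference between the two queuing models can be sketched with a trivial simulation (the helper names here are illustrative; real swapchain queues live inside the driver):

```cpp
#include <deque>

// Simulated back-buffer queue holding IDs of finished frames awaiting scan-out.

// FIFO model: each refresh presents the OLDEST queued frame; everything
// rendered is eventually shown, each entry one refresh staler than the last.
int present_fifo(std::deque<int>& q) {
    int frame = q.front();   // oldest finished frame
    q.pop_front();
    return frame;
}

// Unsequenced flip model (Fast Sync / Enhanced Sync): each refresh presents
// the NEWEST queued frame and discards everything older, unseen.
int present_flip(std::deque<int>& q) {
    int frame = q.back();    // most recent finished frame
    q.clear();               // older frames are dropped, never shown
    return frame;
}
```

With frames {1, 2, 3} all finished before the same refresh, FIFO shows frame 1 (and will show 2 and 3 on later refreshes, each increasingly stale), while the flip model shows frame 3 immediately and frames 1 and 2 are never seen.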
Triple-Buffering: Queueing Suddenly Has Options!
- Sequencing (Drop vs. Hold)
- Placement (Copy vs. Move/Flip)
- Memory Reserve (Backbuffers Eat Memory!)
Multi-buffer D3D11 and D3D12 Swapchains are Different
The driver implicitly handles resource allocation for each backbuffer in D3D11; in D3D12, the engine is required to micromanage this itself.
* D3D12 buffer queuing is set in stone by whatever setup the engine was designed for, and SK cannot add extra backbuffers.
Part II. Implementation
Timing Frames the Stateless Way
- Time is Relative, but Previous and Next are not
- Latency Problems: Yield to More Accurate Clocks
Reference Frame Time = t0 + dT * N

  Start Frame (t0): QueryPerformanceCounter (…)
  Time Slice  (dT): (1000.0 / TargetFPS)
  Frame Index (N):  frames elapsed since t0

This Frame = t0 + dT * N
Next Frame = t0 + dT * (N + 1)
Prev Frame = t0 + dT * (N - 1)

Wait? -- If ( CurrentTime < t0 + dT * (N + 1) ) [Delay] Else [Continue]
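A minimal, portable sketch of this stateless scheme, using std::chrono's steady_clock in place of QueryPerformanceCounter (names are illustrative, not SK's actual implementation). The key property: every deadline is derived from t0 and N, so rounding error never accumulates frame-to-frame.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

struct FrameLimiter {
    using clock = std::chrono::steady_clock;

    clock::time_point        t0 = clock::now(); // Start Frame (t0)
    std::chrono::nanoseconds dT;                // Time Slice  (dT)
    std::uint64_t            N  = 0;            // Frame Index (N)

    explicit FrameLimiter(double target_fps)
        : dT(static_cast<std::int64_t>(1.0e9 / target_fps)) {}

    // Deadline of an arbitrary frame: t0 + dT * N
    clock::time_point deadline(std::uint64_t n) const {
        return t0 + dT * static_cast<std::int64_t>(n);
    }

    // Wait? -- If ( CurrentTime < t0 + dT * (N + 1) ) [Delay] Else [Continue]
    void wait() {
        auto next = deadline(N + 1);            // Next Frame = t0 + dT * (N + 1)
        if (clock::now() < next)
            std::this_thread::sleep_until(next);
        ++N;                                    // advance the frame index
    }
};
```

Call `wait()` once per frame; because the target is always computed from the reference clock rather than from the previous frame's wake-up time, a late frame does not shift every subsequent deadline.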
Part III. Thread Scheduling Black Magic
Delaying Execution Like a Boss
- Sharing the CPU when Time is Limited
- Priority Thread Scheduling Misconceptions
- Correctly Waiting on Win32 Message Threads
The Finer Details of Rescheduling
- Schrödinger’s Clock: CPU Timestamps
- Windows Includes a Task-based Scheduler
… and you should be using it!
Part IV. Best Practices
Your Reference Clock (t0) is VBLANK
Submit Work Early using a Waitable Swapchain
Unoptimized Third-party Overlays Perform Better
( If you wait after Present rather than before )
Restart and Re-Sync Your Clock Frequently
- Clock is aligned to VBLANK; when in doubt,
  re-align your clock and restart the frame count.
+ When changing framerate limit (duh!)
+ When changing display modes
+ When switching applications
+ When running multiple frames behind target
... many more opportunities
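A minimal sketch of the re-sync logic, assuming the stateless t0 / dT / N representation from Part II (again portable std::chrono rather than QPC, and the threshold of two frames is an illustrative choice, not SK's):

```cpp
#include <chrono>
#include <cstdint>

using sclock = std::chrono::steady_clock;

struct LimiterClock {
    sclock::time_point       t0 = sclock::now();   // reference clock (align to VBLANK when possible)
    std::uint64_t            N  = 0;               // frames counted since t0
    std::chrono::nanoseconds dT { 16'666'667 };    // 60 FPS time slice

    // How many frames behind schedule we currently are (0 if on time).
    std::uint64_t frames_behind() const {
        auto elapsed = sclock::now() - t0;
        if (elapsed.count() < 0) return 0;
        auto ideal = static_cast<std::uint64_t>(elapsed / dT);
        return ideal > N ? ideal - N : 0;
    }

    // Re-align the reference clock and restart the frame count.
    void resync() { t0 = sclock::now(); N = 0; }
};

// Called on framerate-limit changes, display-mode changes, application
// switches, or whenever we have fallen multiple frames behind target.
void maybe_resync(LimiterClock& c, bool settings_changed) {
    if (settings_changed || c.frames_behind() > 2)
        c.resync();
}
```

Without the re-sync, a stateless limiter would try to "catch up" every missed deadline at once after a stall; restarting t0 and N instead accepts the lost time and resumes clean pacing from the current VBLANK.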