Topic-Free Mega Thread - v 1.11.2020

Gotcha… we’ll see where things go :slight_smile:

Until about a month or two ago, the notion of using framerate limiting to control latency never occurred to me. In fact, from my perspective, the best result you could hope for when limiting framerate would be no net increase in latency. The number of places where framerate limiting can accidentally add latency far outnumbers the one or two places where it can reduce it, and consistent frametimes and low latency are opposing goals most of the time.

I saw a video produced by Battle(non)sense that confirmed my suspicions (RTSS never decreases latency, only increases it). Going by software measurement tools, Special K's limiter actually does decrease input latency, and hammering out precisely why would be great :slight_smile:

I need to have the results validated with a physical test of some sort before publishing suggested design guidelines. My thinking right now is between getting hold of an LDAT, buying a 300 Hz G-Sync monitor I will never use for anything other than its latency OSD, or finding a magician with a high-speed camera :stuck_out_tongue:
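
To make the "where you limit" point concrete, here's a minimal sketch (hypothetical function names and timing strategy, not Special K's actual implementation) contrasting the naive approach of waiting after Present with waiting just before input sampling, which is the one spot where a limiter can actually shave latency off instead of adding it:

```cpp
// Minimal frame-limiter sketch; hypothetical helper names, not Special K's code.
// The point: *where* the wait happens decides whether the limiter adds latency
// (input grows stale while we sleep) or removes it (we sleep first, sample last).
#include <chrono>
#include <thread>

using frame_clock = std::chrono::steady_clock;

// Naive placement: sleep AFTER Present. Anything sampled earlier in the frame
// is a full frame interval old by the time it reaches the screen.
void wait_after_present (frame_clock::time_point& next_frame,
                         std::chrono::nanoseconds  interval)
{
  std::this_thread::sleep_until (next_frame);
  next_frame += interval;
}

// Latency-aware placement: burn the idle time BEFORE sampling input and
// simulating, so the work submitted to the GPU is based on the newest input.
// The short spin at the end tightens pacing beyond OS timer granularity.
void wait_before_input (frame_clock::time_point& next_frame,
                        std::chrono::nanoseconds  interval)
{
  std::this_thread::sleep_until (next_frame - std::chrono::milliseconds (1));

  while (frame_clock::now () < next_frame)
    ; // busy-wait the last ~1 ms for consistent pacing

  next_frame += interval;
}
```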

If you haven’t seen it yet, this video is very fascinating.

Somehow I missed that video. Awesome, GamersNexus never ceases to amaze me :slight_smile:


[image: NVIDIA Control Panel "Max Framerate" setting]

NVIDIA could really have picked a better name for that setting.

“Max Framerate” … :man_facepalming:. Your framerate will be at its maximum when that is turned off.

There is no way NVIDIA did not realize the connotation and this has got to be a joke. Any rational person would have chosen a name such as “Framerate Limit” or “Framerate Target,” or at the very least not given the setting a possible state of “Off.”


Better than Low, Medium, High, I guess.
But yeah, a custom string of "Unlimited" instead of 0 / Off would probably be a bit more fitting.

What was that Crysis 2 used again?

Gamer → Advanced → Hardcore I believe.

Think it got patched once D3D11 support landed, though, with the addition of an Ultra preset and the renaming of the others to Low, Medium and High instead. :smiley:
(EDIT: Extreme, not Ultra, so that one got a name change too.)

EDIT: It kinda works for DOOM or Wolfenstein because you get a funny little picture depicting the difficulty level choice despite some weird naming, plus it's a standard tiered selection from lowest to highest.

Although UI design and naming conventions are a whole topic of their own, ha ha.

The setting directly above it is cause for alarm too, because it’s never explained that turning “Low Latency Mode” ‘On’ or setting it to ‘Ultra’ will create stutter.

Instead, they discuss "GPU queuing." That is technically what the setting adjusts, but nobody who does not already understand how frame queuing works will get anything out of that explanation. Thus, the person the tooltip is supposed to be assisting needs another tooltip that explains the tooltip.

I know it seems silly, but this setting has the potential to wreck frame pacing more than any other setting in the driver, and they assume the end-user can work out in their head why the GPU is supposed to queue multiple frames and what happens when you prevent it from doing so.
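
For anyone wondering what is actually being adjusted: in D3D11 the application itself can cap the render-ahead queue through DXGI, which is the same knob the driver setting overrides from the outside. A rough sketch (the function name is mine; error handling omitted):

```cpp
// Application-side version of the "how many frames may be queued" knob that
// Low Latency Mode overrides from the driver side. Sketch only; assumes a
// valid ID3D11Device and omits error handling.
#include <d3d11.h>
#include <dxgi.h>

void SetRenderAheadLimit (ID3D11Device* pDevice, UINT maxQueuedFrames)
{
  IDXGIDevice1* pDXGIDev = nullptr;

  if (SUCCEEDED (pDevice->QueryInterface (__uuidof (IDXGIDevice1),
                                          (void **)&pDXGIDev)))
  {
    // 1 = at most one frame queued ahead of the GPU: lowest latency, but any
    //     CPU or GPU hiccup shows up immediately as a stutter
    // 3 = the DXGI default: more buffering, smoother pacing, more latency
    pDXGIDev->SetMaximumFrameLatency (maxQueuedFrames);
    pDXGIDev->Release ();
  }
}
```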

lol

[image: in-game performance metrics setting with "Ultra Nightmare" as the highest level]

It works the other way too… there’s nothing funnier than the setting that fills your entire screen with performance information being called “Ultra Nightmare.”


Yeah, the FPS Limit (V3?) and related latency settings could do with a full tooltip explanation. I'm not sure if NVIDIA added a good one in their driver settings, or if what these actually do and the potential trade-offs from enabling some of them are still only covered in forum posts and articles, like the recent latency one for Ampere.

Setting Low Latency Mode to Ultra seems like a given the way it's presented, without the extra info that's needed here. :slight_smile:

EDIT: One of the driver engineers or a community representative did a breakdown and explanation of these options when they were added, and I believe the settings themselves have been tweaked a bit since that first inclusion, but it's still a bit vague unless the user looks up what differs between Low and Ultra for the latency option.

Or what you would want as the optimal setting for avoiding micro-stutters, framerate spikes and hitches by altering the queue count here.

EDIT: Or that Ultra, at least if it works the same as it used to, also disables multi-threading optimizations in addition to setting the pre-rendered frame queue to zero.
(I'd have to look it up; I think it was zero for High, and zero plus multi-threading being disabled for Ultra.)

EDIT: Might have been revamped; Off, On and Ultra are now the only settings here?

Wonder if that means Off is a queue value of 3 as per the default, setting it to On is the former default of 1, and Ultra is the only one that reduces it to 0 (immediate). But I can't find whether it still affects other options, from back when it was first explained in more detail what these did.
(Think it was Off, Low, Medium and High, but I could be misremembering it, since it bundles at least two prior settings into this single option.)

For the life of me, I don't even know what "multi-threading optimization" is in the graphics driver. I know of a few optimization techniques involving the driver offloading stuff to separate threads, but it's all completely unrelated and API-specific…

D3D11 has one such optimization where the driver creates a background thread to compile and cache optimized versions of shaders after a game has loaded the base unoptimized version. It’s basically doing what D3D12 games spend an eternity doing the first time you launch them, only … the game is playable while the driver does this in D3D11 :slight_smile:

OpenGL has some optimizations where textures are converted to a more efficient color format (e.g. RGBA → BGRA) on a background thread.
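
If I remember that one right, the practical takeaway is that an application can sidestep the conversion entirely by uploading in the layout the driver prefers. A hedged example (the exact fast path varies by vendor and driver version):

```cpp
// Uploading texture data as BGRA so the driver has nothing to swizzle on a
// background thread. GL_BGRA + GL_UNSIGNED_INT_8_8_8_8_REV is the commonly
// cited fast path on Windows drivers; sketch only, loader/context assumed.
#include <GL/glew.h>

void UploadTextureBGRA (GLuint tex, int width, int height,
                        const void* bgra_pixels)
{
  glBindTexture (GL_TEXTURE_2D, tex);

  // Internal format stays GL_RGBA8; only the *source* layout changes, so the
  // driver can copy the data straight through instead of converting it.
  glTexImage2D  (GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, bgra_pixels);
}
```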


I guess this is why D3D12 / Vulkan came along ultimately, engine devs were tired of guessing all the wacky things the drivers were doing to try and optimize stuff in the background :stuck_out_tongue:

Those optimizations sound nice on paper, but if they can be disabled in driver settings then this means your game’s going to perform differently w/ and w/o these optimizations that aren’t documented anywhere in the first place. Way too many driver hacks holding the older APIs together.


Incidentally, “Low Latency Mode” itself is a driver hack that only works with the older APIs. Vk and D3D12 give explicit control over command queuing to the engine, which is why I couldn’t magically make Horizon: Zero Dawn use triple-buffering after it shipped the way it did :frowning:
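
To illustrate why there's no driver knob left to turn: in D3D12 (and Vulkan) the engine bounds its own frames-in-flight with a fence it owns, so the buffering depth is baked into the game. A generic sketch of that pattern (not any particular game's code):

```cpp
// Generic D3D12 frames-in-flight pattern (sketch; not any specific engine's
// code). The engine owns the fence, so the driver can't re-shape this queue
// the way it can with D3D11's render-ahead setting.
#include <windows.h>
#include <d3d12.h>

static constexpr UINT kMaxFramesInFlight = 2; // the engine's choice, baked in

struct FrameSync {
  ID3D12Fence* fence      = nullptr;
  HANDLE       fenceEvent = nullptr;
  UINT64       fenceValue = 0;
  UINT64       frameFence [kMaxFramesInFlight] = {};
};

// Called after submitting a frame's command lists
void EndFrame (ID3D12CommandQueue* queue, FrameSync& sync, UINT frameIndex)
{
  sync.frameFence [frameIndex] = ++sync.fenceValue;
  queue->Signal (sync.fence, sync.fenceValue);
}

// Called before recording the next frame that reuses this slot
void BeginFrame (FrameSync& sync, UINT frameIndex)
{
  // This wait -- not a driver heuristic -- is what bounds queued frames
  if (sync.fence->GetCompletedValue () < sync.frameFence [frameIndex])
  {
    sync.fence->SetEventOnCompletion (sync.frameFence [frameIndex],
                                      sync.fenceEvent);
    WaitForSingleObject (sync.fenceEvent, INFINITE);
  }
}
```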

I assumed it had to do with NVIDIA optimizations through NVAPI. There's a document on it somewhere, but it covers lower-level driver optimizations and extending D3D functionality, akin to OpenGL and now Vulkan extensions.

AMD does the same through ADL or an AMD_AGS.dll file, whereas NVIDIA often uses something like NVAPI.dll and a few other files, depending on which SDKs are used here.
(Horizon Zero Dawn and HDR, I think, goes through this as one example case for AMD.)

Command contexts, multi-threaded rendering and deferred rendering… yeah, I need to find that document on this. It gets a bit muddy with how it's utilized, and then there's AMD's hardware inefficiency and lack of support for D3D11.1, plus NVIDIA doing a major overhaul with the 338 drivers I think, and not just games supporting it natively, which back then was Civilization V and that's about it.
(AMD had support for that one game initially through some other hacked-together solution, but it got removed as far as I know.)

EDIT: It's been some years, though, since NVIDIA did that, so it's going to take a while to find it all. :smiley:

2012 somewhere I believe, near a 60% performance uplift.

EDIT: That took a while, 337.50 for that driver at least.

Hmm two years off there with the date.

Yeah, the NVIDIA driver can evaluate deferred command lists in parallel on each thread that is creating them. AMD has to wait until the command lists are spliced together and played back on an immediate context.

AMD’s drivers don’t really do multi-threading in D3D11 :-\ They’ll accept command list generation on independent threads, but all the overhead of validation happens during command list merging, rather than at the time of creation the way NV’s drivers do it.

NVIDIA’s drivers have much less work to do when replaying combined command lists because the thread that created them already looked at them and determined they’re valid and ready for merging.
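
For context, this is the D3D11 pattern being described: worker threads record into deferred contexts and the main thread replays the finished command lists on the immediate context. Where the validation cost lands, at FinishCommandList on the worker or at ExecuteCommandList on the main thread, is the driver's decision, which is exactly the NVIDIA/AMD difference above. A bare-bones sketch:

```cpp
// Bare-bones D3D11 deferred-context pattern (sketch only; resource binding,
// synchronization and error handling omitted).
#include <d3d11.h>

// Worker thread: record commands into a deferred context
ID3D11CommandList* RecordOnWorker (ID3D11Device* device)
{
  ID3D11DeviceContext* deferred = nullptr;
  device->CreateDeferredContext (0, &deferred);

  // ... issue draw calls, state changes, etc. on `deferred` ...

  // NVIDIA's driver can do most of its validation here, on the worker thread
  ID3D11CommandList* cmdList = nullptr;
  deferred->FinishCommandList (FALSE, &cmdList);
  deferred->Release ();

  return cmdList;
}

// Main thread: splice the recorded lists into the immediate context
void SubmitOnMain (ID3D11DeviceContext* immediate, ID3D11CommandList* cmdList)
{
  // If the driver defers validation to this point instead, every worker's
  // work gets re-processed serially right here
  immediate->ExecuteCommandList (cmdList, FALSE);
  cmdList->Release ();
}
```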


So a collection then, same as some of the earlier ones.


EDIT: Yeah, it's basically like running it through an emulator.

Yeah, I've been trying to read up on it, although I'm still not too sure about the NVIDIA driver option for multi-threaded optimizations and what exactly you'd lose. It has to be some additional driver optimizations, so nothing major performance-wise.

And then there are AMD's plans for D3D11, but that's an extensive code overhaul even if their hardware now has the capabilities (some info pointed at a lack of cache memory as an issue in supporting this earlier), plus AMD's focus is on D3D12 and Vulkan. The usage of compute shaders has also resulted in some gains, as that seems to avoid a few hardware bottlenecks with their cards, so it's not as limited by ROPs and rasterization performance when handled via compute instead.

Think I read somewhere too that D3D11 is still somewhat limited in its multi-threaded design and implementation, whereas D3D12 has a better way of handling this, plus the optional and mandatory states of some of these features.

It just needs developers to actually utilize them. Looks like NVIDIA is putting an end to their SLI support, which does have explicit API support in D3D12 and in Vulkan as of 1.1 I think, but that's barely used at all.

Maybe that's also something we'll see change with the new console hardware, and as PC CPUs go up to hexa-core instead of quad-core for mainstream and mid-range components, not just the high-end or enthusiast segments, along with the prices for these processors.

Possibly also AMD's improvements in overall performance, including gaming and single-threaded, even if it's not quite on par; Intel still has a bit of a lead in the market and in what's popular.
(Zen+ and Zen 2 after that certainly made a bit of an impact though; Zen 3 will hopefully continue that.)

GPU-wise, no idea. It's going to be interesting to see what AMD has for October with Zen 3 and RDNA2, plus NVIDIA and, tomorrow I think, the full 3090 GPU review date, with an estimated "up to" 20% performance over the 3080 and more VRAM, but also a significant price difference.
(Seems to be around ~10% in some of the earlier benchmarks, but that can change once drivers mature a bit more, depending on how these additional cores scale and which workloads the hardware will truly excel at.)

The smaller performance difference between the 3080 and 3090 is also interesting for the Super or Ti type refreshes of the 3080 and 3090. It's possible the 3080 Ti could end up faster in more gaming-oriented workloads, with 20 GB of VRAM too, while the 3090 could be a bit faster in workstation, ray tracing and similar application work without needing the full Quadro series of cards.
(Home usage and smaller servers or systems.)

3090 availability with this bot script stuff going on might be a problem too, but maybe only for the first batch or so, before NVIDIA gets manufacturing and shipping going in full and restocks more quickly after that. :slight_smile:

Still, though, I was expecting a 20 - 25% gain for the 3090, letting the 3080S hit 10 - 15% with extra VRAM as an additional incentive, slotting in between the two in terms of pricing.

And a 3070S as something similar: a close-to, but less costly, alternative to the 3080, especially with the price variance across both reference and third-party designs.

It can still be positioned like that, but if the 3080S displaces the 3090 performance-wise, that's going to be a bit of an odd position for the 3090 in price/performance, and if there's only a minor performance gain for the 3080S, you're effectively paying for a higher-VRAM variant.
(Though going by the Gigabyte SKU lineup, a regular 3080 20GB might also happen.)

EDIT: Should see more info on how the 3090 fares tomorrow, I guess. For now it's Control at 8K, or 7680x4320, but with DLSS 2.0 Ultra Performance (however it was named), so actually 2560x1440 scaled up.

And then Forza 4 at native, which already runs really well, plus some early YouTube(?) profiles getting the cards from NVIDIA for, I assume, marketing and hype purposes; it also shows some initial performance figures, in addition to a few other leaked sources for how the GPU performs. :slight_smile:

The 3080 Ti needs to have either 12GB or 24GB, or 11/22 if they are stingy. The reason is bandwidth. These cards need all the memory bandwidth they can get. There's not much use in having 20% more SMs if bandwidth stays the same.

Yeah, the initial listing was for 20 GB, but it could change depending on the memory controller and design, plus the bus width and overall bandwidth and performance.

GDDR6X at 1 GB per module also requires a bit more board space than GDDR6 with 2 GB modules, although the extra performance is probably a nice boost for memory-intensive workloads.

Curious how AMD is planning RDNA2 here with 192-bit and 256-bit buses, though final specs could differ from earlier leaks. This is supposedly offset by a ridiculously large cache, going from 2, 4 or 6 MB to now 128 MB, and it'll be interesting how that turns out, especially compared to using a 300-bit bus size or larger.

Cost and complications as well; the 290 with its ring-bus 512-bit design, which AMD has never attempted again, other than how HBM stacking affects total bandwidth, I suppose.

Err, let's see, the way that worked was 1024 bits per stack (eight 128-bit channels), and four stacks total, thus Fury = 4096-bit I think, or Vega with two stacks at 2048-bit, though effective bandwidth and performance is more than just the bus width.
Fury was capped by its clock speeds, so it had problems exceeding 340 - 360 GB/s if I remember correctly; Vega had lower clock speeds than initially planned and fewer stacks, outside of the re-purposed MI50 GPU that became the Radeon VII, but that one had other changes too.
(Marketing and such, I suppose, is where these bigger numbers get thrown around more frequently.)
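
Running the numbers, the theoretical figures fall straight out of bus width and data rate, which is also why the ~340 - 360 GB/s above reads like an effective/measured figure rather than the spec-sheet one:

```
peak bandwidth (GB/s) = bus width (bits) / 8 × data rate (Gbps per pin)

Fury (HBM1, 4 stacks)   : 4096 / 8 × 1.0  = 512 GB/s theoretical
Vega 64 (HBM2, 2 stacks): 2048 / 8 × 1.89 ≈ 484 GB/s theoretical
```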

Could also open things up for a Navi20 HBM2 GPU, but I'm not expecting it outside of the professional lineup, FirePro or whatever AMD calls their cards these days.
(Maybe through Apple and their Mac Pro GPU designs.)

But I don't expect it, although HBM isn't entirely ruled out for the top-end RDNA2 GPU models just yet. (The cost of it, with the interposer plus the HBM2 or HBM2E memory itself.)

EDIT: RDNA targets the gaming segment too, whereas CDNA is what focuses on compute, so I would assume GDDR6 is the choice AMD goes with here.

HBM could have its uses, but I don't know if the extra cost is worth it yet.
(Plus if AMD wants to match or challenge NVIDIA on GPU pricing, that might be more flexible with GDDR6 instead.)

Wasn’t that basically a setting that allowed the driver thread/NVAPI thread to change which core it gets executed on freely?

I remember reading that somewhere anyway, that disabling it would result in the driver thread being restricted to a single CPU core instead of the sporadic behavior it otherwise experiences where it switches core multiple times every second.


Interesting. I thought it still had a primary reliance on CPU0, as seen in games like the recent Assassin's Creed ones, but maybe those aren't the best examples.

AMD hovers around a lower overall CPU utilization, whereas NVIDIA is almost constantly pegged at 99% on this core.

I dunno… This is how the old TweakGuides site puts it:

Threaded Optimization: This setting controls the use of multithreaded optimization for games on systems with multi-core/HyperThreaded CPUs. In theory, by allowing the driver to offload certain GPU-related processing tasks as separate thread(s) on available CPU cores, performance can be improved.

The available settings are Auto, On and Off. Auto allows the GeForce driver to determine if it applies threaded optimization, On forcibly enables it, and Off disables it. In testing this setting in several games, I saw no performance impact. It should be noted however that none of the games I tested were old enough to have been developed before multi-core CPUs became available to consumers (around 2005). User feedback indicates that some older games, as well as some OpenGL games, will exhibit problems if Threaded Optimization is set to On.

It is recommended that Threaded Optimization be set to Auto under Global Settings. This will allow the driver to determine if and how to use threaded optimization for various games. You can turn this setting Off under the Program Settings tab for particular (typically old) games which have problems at the Auto or On setting.


Do mid-towers that can fit a lot of drives exist? I have about five internal drives and need space for more.