EDIT: Root caused. Two problems: (1.) The Dell monitors at work have some problems in "Game" mode. They work fine in "Standard" mode. I'm guessing "Standard" uses some (latency adding?) logic to correct for some color distortion of the panel, and this logic gets turned off in "Game" mode. (2.) Displays tested at work are slow IPS panels with larger gamut than the fast displays at home. The hue-to-warm-to-white transition in my algorithm is too punchy in the yellows for large gamut displays. Algorithm needs some more tuning there.
I have a shadertoy program which presents photo editing controls and a different way to handle over-exposure. Fully saturated hues blend towards white instead of clamping at a hue, and all colors in over-exposure take a path towards white which follows a warm hue shift. So red won't go to pink then white (which looks rather unnatural), but rather red to orange to yellow to white. This test program also adds 50% saturation to really stress the hard exposure cases.
The result looks awesome on my wife's MacPowerBook, and on my home Linux laptop using either a CRT or laptop LCD. However at work on a Win7 box with two Dell monitors and on a co-worker's personal Win7 laptop, the result looks like garbage. Specifically some hues on their route towards white have non-contiguous gradient which looks like some color management mapping operation failed.
I tried lots of different things to adjust color management in Win7 to fix it, but was unable. Was certain this had to be a problem in Win7 because two different Win7 machines with completely different displays all had the same problem. Then another co-worker running Win8 with a set of two different Dell monitors (also color calibrated) got a different result: one display had the same problem, the other looked good.
So not a Windows problem, but rather some new problem I had never seen before on any display I personally have owned: LCD displays with really bad distortion on saturated hues. And apparently it is common enough such that 4 of 5 different displays in the office had the problem. 4 of these displays were color calibrated via the GPU's LUT per color channel, but calibrated to maintain maximum brightness. Resetting that LUT made no difference. However changing some of the color temp settings on the display itself did reduce the distortion on one monitor (only tested one monitor).
Maybe the source of the problem is that the display manufactures decided that the "brightness" and "contrast" numbers were more important, so they overdrive the display by default to the point where they have bad distortion? Changing the color temp would reduce the maximum output of one or two channels. Not at work right now, so not able to continue to test the theory, but guessing the solution is to re-calibrate the display at a lower brightness.
Still have some challenges to iron out. The only VGA connector I had was a little thick, so had to open up TC1600 box and add more clearance to the VGA connector cutout (no problem). First attempt would not maintain sync, ended up just trying the second jumper setting, and everything worked enough to get a signal. EDIT, tried manual tuning, not able to resolve a problem where image brightness (input signal) effects h-sync (bright lines have different h-offset?) and what looks like a few random red or blue horizontal streaks in dark but not black regions. Otherwise the signal works well enough to try a bunch of things, but I do not have the tools to track down and resolve the remaining problems.
Impressions vs Memory
While I'm able to also run an interlaced signal with double the vertical resolution, this is effectively useless due to visual artifacts (as expected). However the NTSC TV looks better at 60Hz interlaced than the VGA CRT at 180Hz interlaced. The combined higher persistence and more diffuse beam makes a big difference here. The progressive 60Hz output on the higher persistence TV does visibly flicker a lot more than I remember thanks to viewing the TV a foot away with a white background. Old TVs managed to get away with this probably because content was almost never white, and the TV was far enough away to not trigger the peripheral vision as much.
Believe the crappy TV I have is actually low-pass filtering the component chroma input to match s-video or even horrible composite. Not sure, but single red pixels look very bad. Have a bit of work here to get the vintage arcade feel (probably need a better TV).
EXT_post_depth_coverage + NV_framebuffer_mixed_samples
Going to enable really fast good transparency blending with MSAA. Render opaque with standard MSAA. Then attach a no-MSAA color buffer to use with the MSAA depth buffer to composite transparent content. When blending, use "post depth coverage" to get the percentage of the pixel which passes the depth test, which is then used to reduce transparency. Render via front-first alpha blending (where at the end, alpha keeps amount of transparency left). Finally during a custom MSAA resolve, composite the final transparency over the resolved MSAA.
Getting disassembly of code generated at runtime to verify dynamic code generation can be a slow process.
On Linux, I just use gdb.
Using gdb requires setting the breakpoint for run-time generated code
after that code is generated by the application
(if there is an easier way, I don't know it).
Here is a quick example,
Find the entry point of the ELF: readelf -a filename
Look for "Entry point address:" and write down address
Start gdb: gdb filename
Insert breakpoint, this example using 0x2c3 as entry point: break *0x2c3
Start program which will break at above entry point: run
Show disassembly with raw bytes example: disassemble /r 0x2c3,0x300
The "=>" shows the instruction pointer
Set another breakpoint to something after run-time generated code
Use the "continue" command to continue to that breakpoint
Show disassembly of the run-time generated code
After hearing Carmack's talk at the Oculus keynote, wanted to see first hand if interlacing artifacts are still visible at high rates on low persistence displays. Tried 180Hz interlaced on my CRT (get a pair of even-line and odd-line frames every 90 times per second). Still see massive artifacts. For instance when on-screen vertical motion is some multiple of scan-out, it is possible to see the black gaps between lines. The visible size of the gap is a function of the velocity of motion. Horizontal motion also does not look good. Probably has a lot to do with resolution (running 640x480 on a CRT which can do 1600x1200). Coarse shadow mask interlaced NTSC TVs did not look as bad at low frame rates, probably because the even and odd lines would partly illuminate the same bits of the screen (or maybe they had higher persistence phosphors)? If I use the display's Vertical Moire control to align the even and odd scan-lines to the same position in space, then the output looks much better even with constant black gaps between the lines. To me this whole experiment suggests that some kind of motion adaptive de-interlacing logic is needed at high frame rates (best case) or at a minimum the OLED display controller needs to fill in the missing lines in any interlaced mode using some spatial filter.
GTX 980 vs GTX 680: Perf/Area
GTX 980: 11.6 Mflop/ms/mm2, 0.56 MB/ms/mm2, 0.36 Mtex/ms/mm2, 0.18 Mpix/ms/mm2
GTX 680: 10.5 Mflop/ms/mm2, 0.65 MB/ms/mm2, 0.44 Mtex/ms/mm2, 0.11 Mpix/ms/mm2
Performance per total chip area goes down a little for bandwidth and texture, but up for ALU and ROP.
GTX 980 vs GTX 680: Perf/TDP
GTX 980: 28.0 Mflop/ms/W, 1.36 MB/ms/W, 0.87 Mtex/ms/W, 0.44 Mpix/ms/W
GTX 680: 15.8 Mflop/ms/W, 0.99 MB/ms/W, 0.66 Mtex/ms/W, 0.17 Mpix/ms/W
Performance per total chip power goes way up for ALU and ROP, a little less for bandwidth and even less for texture.
For framebuffer compression, looks like for 32-bit/pixel a block of 16x16 fragments gets compressed in 3 steps. Implies that samples for 8xMSAA are stored in 2x4 or 4x2 fragment blocks (depending on if the image or text description is correct), and samples for 4xMSAA stored in 2x2 fragment blocks. The three steps: (1.) 8:1 compression if the fragment values are all uniform for 2x4? or 4x2? blocks, (2.) 4:1 compression if fragment values are all uniform for 2x2 blocks, or lastly (3.) 2:1 compression if delta compression works out (stores deltas between fragments for less bits). Looks like Maxwell adds some more options for delta compression. Sounds like 8xMSAA shaded with a forced 2 samples per pixel would nicely fallback to the 4:1 compression case for blocks in triangle interiors. This compression ability is one of the major reasons MSAA is ultra fast on NVIDIA hardware.
Maxwell in the 980/970 has viewport multicast from VS without the need for GS, conservative raster, programmable sample positions for a repeating block of samples, and ability to stall the pixel shader to implement in-raster-order shader processing for a given pixel (could use for programmable blending for example).