20150423

Source-Less Programming : 4

Still attempting to fully vet the design before the bootstrap reboot...

DAT words in the edit image need to maintain their source address in the live image. This way on reload the live data can be copied over, and persistent data gets saved to disk. DAT annotations no longer have 30 bits of free space; instead they have a live address. When the live address is zero, DAT words won't maintain live data. This way read-only data can be self-repairing (as long as the annotations don't get modified). Going to use a different color for read-only DAT words. New persistent data DAT words will reference their edit-image hex value before reload (then get updated to the live address).

REL words always get changed on reload (self repairing). No need to keep the live address. REL is only used for relative branching x86 opcodes. Don't expect to have any run-time (non-edit-time) self-modifying of relative branch addresses. Given that branching to a relative branch opcode immediate is not useful, the LABEL of a REL word is only useful as a comment.

GET words also get changed on reload (self repairing). GET is only designed for opcodes and labeled constants. GET words will often be LABELed as a named branch/call target. Been thinking about removing GET, and instead making a new self-annotating word (display searches for a LABELed DAT word with the same image value, then displays the LABEL instead of HEX). This opens up the implicit possibility of mis-annotations. Would be rare for opcodes given they are large 32-bit values. But for annotating things like data structure immediate offsets, this will be a problem (4 is the second word offset in any structure).

ABS words always get changed on reload (self repairing). ABS words are targets for self-modifying code/data, so they also need LABELs. Reset on reload presents a problem in that ABS cannot be used to set up persistent data unless that persistent data is constant or only built/changed in the editor. But this limitation makes sense in the context that ABS addresses in live data structures can get invalidated by moving stuff around in memory. The purpose of ABS is edit-time relinking.
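
The reload rules for the four word types can be sketched in C roughly as follows. This is only an illustration under assumptions that are not in the post: the name reload_relink, the parameter layout, treating the 30-bit annotation field as a word index relative to the start of the image, the two-pass order (DAT first so GET can pull refreshed values), and computing REL relative to the word after the immediate (the usual x86 rel32 convention when the immediate ends the instruction).

```c
#include <stdint.h>
#include <stddef.h>

enum { TAG_DAT = 0, TAG_GET = 1, TAG_ABS = 2, TAG_REL = 3 };

/* Hypothetical reload pass (all names are assumptions).
   edit[]  : edit image, one 32-bit word per entry
   anno[]  : tagged annotation word for each image word (A in bits 2..31)
   live[]  : running image the persistent data is copied from
   base    : byte address the image runs at */
static void reload_relink(uint32_t *edit, const uint32_t *anno,
                          const uint32_t *live, size_t words, uint32_t base)
{
    /* Pass 1: DAT words with a nonzero live address take the live value.
       A zero live address means the edit-image value is kept. */
    for (size_t i = 0; i < words; i++) {
        uint32_t a = anno[i] >> 2;
        if ((anno[i] & 3u) == TAG_DAT && a != 0 && a < words)
            edit[i] = live[a];
    }
    /* Pass 2: GET, ABS and REL are always recomputed (self repairing). */
    for (size_t i = 0; i < words; i++) {
        uint32_t a = anno[i] >> 2;
        if (a >= words)
            continue; /* ignore out-of-range references in this sketch */
        switch (anno[i] & 3u) {
        case TAG_GET: edit[i] = edit[a]; break;         /* copy word at A*4 */
        case TAG_ABS: edit[i] = base + a * 4u; break;   /* absolute address */
        case TAG_REL:                                   /* target - next word */
            edit[i] = (a - ((uint32_t)i + 1u)) * 4u;
            break;
        default: break;
        }
    }
}
```

Because only DAT words ever read from the live image, anything the running code wrote into a GET, ABS, or REL word is discarded on reload, which matches the limitation on ABS described above.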

Source-Less Programming : 3

Annotation Encoding
Refined from last post, two 32-bit annotation words per binary image word,

FEDCBA9876543210FEDCBA9876543210
================================
00EEEEEEDDDDDDCCCCCCBBBBBBAAAAAA - LABEL : 5 6-bit chr string ABCDE

FEDCBA9876543210FEDCBA9876543210
================================
..............................00 - DAT : hex data
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA01 - GET : get word from address A*4
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA02 - ABS : absolute address to A*4
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA03 - REL : relative address to A*4
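
A minimal C sketch of unpacking the two annotation words laid out above. The function and enum names are made up for illustration, and the sketch stops at raw 6-bit character codes because the mapping from codes to glyphs is not given here.

```c
#include <stdint.h>

/* Tag stored in the low 2 bits of the second annotation word. */
enum tag { TAG_DAT = 0, TAG_GET = 1, TAG_ABS = 2, TAG_REL = 3 };

/* Extract the 2-bit tag. */
static enum tag anno_tag(uint32_t anno)
{
    return (enum tag)(anno & 3u);
}

/* The upper 30 bits hold A and the referenced byte address is A*4,
   which equals the annotation word with its tag bits cleared. */
static uint32_t anno_byte_address(uint32_t anno)
{
    return anno & ~3u;
}

/* Unpack a LABEL word into five 6-bit character codes.
   Character A is in bits 0..5, B in 6..11, up to E in 24..29. */
static void label_unpack(uint32_t label, uint8_t out[5])
{
    for (int i = 0; i < 5; i++)
        out[i] = (uint8_t)((label >> (6 * i)) & 63u);
}

/* Pack five 6-bit character codes into a LABEL word (top 2 bits zero). */
static uint32_t label_pack(const uint8_t in[5])
{
    uint32_t label = 0;
    for (int i = 0; i < 5; i++)
        label |= (uint32_t)(in[i] & 63u) << (6 * i);
    return label;
}
```

Keeping the tag in the low two bits means a 32-bit aligned byte address drops out of the annotation with a single mask, no shift needed.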

Going to switch to just 2 lines per word displayed in the editor. Only DAT annotations show the hex value; other types show the LABEL of the referenced address in place of the hex value. So no need for an extra note. In practice will be using some amount of binary image memory to build up a dictionary of DAT words representing all the common, somewhat Forth-like opcodes, then GET words in the editor to build up source.

Need to redo the bootloader from floppy to hard drive usage, and switch even the bootloader's 16-bit x86 code to 32-bit aligned LABEL'ed stuff so the final editor can edit the bootloader. Prior was avoiding manually assembling the 16-bit x86 code in the bootloader, but might as well ditch NASM and use something else to bootstrap everything.

20150422

Source-Less Programming : 2

Continuing with what will either be an insanely great or amazingly stupid project...

Making slow progress with bits of free time after work, far enough thinking through the full editor design to continue building. Decided to ditch 64-bit long mode for 32-bit protected mode. Not planning on using the CPU for much other than driving more parallel friendly hardware... so this is mostly a question of limiting complexity. Don't need 16 registers and the REX prefix is too ugly for me to waste time on any more. The 32-bit mode uses much more friendly mov reg,[imm32] absolute addressing, also with ability to use [EBP+imm32] without an SIB byte (another thing I mostly avoid). Unfortunately still need relative addresses for branching. 32-bit protected mode thankfully doesn't require page tables, unlike 64-bit long mode. Can still pad out instructions to 32 bits via redundant segment override prefixes.
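
To make the padding idea concrete, here is a small C sketch that left-pads the opcode portion of an instruction with prefix bytes until it fills a whole number of 32-bit words, so that a following imm32 lands 32-bit aligned. The helper name pad_opcode and the choice of the DS override byte (0x3E) as filler are assumptions for illustration; the override is only truly redundant where DS is already the default segment or all segments are flat.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PREFIX_DS 0x3Eu /* DS segment-override prefix byte */

/* Hypothetical helper: copy the opcode bytes (everything before the imm32)
   into out, preceded by enough prefix bytes to make the total a multiple
   of 4. out must have room for len + 3 bytes. Returns bytes written. */
static size_t pad_opcode(const uint8_t *op, size_t len, uint8_t *out)
{
    size_t pad = (4u - (len & 3u)) & 3u; /* 0..3 filler bytes */
    memset(out, PREFIX_DS, pad);
    memcpy(out + pad, op, len);
    return pad + len;
}
```

For example, mov eax,[imm32] has the one-byte opcode A1, so it would become 3E 3E 3E A1 followed by the aligned imm32 word. mov ebx,[imm32] is 8B 1D plus imm32, so it would become 3E 3E 8B 1D followed by the imm32 word.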

Source-Less Analog to Compile-Time Math?
Compile-time math is mostly for the purpose of self-documenting code: "const uint32_t willForgetHowICameUpWithThisNumber = startHere + 16 * sizeof(lemons);". The source-less analog is to write out the instructions to compute the value, execute that code at edit time, then have annotations for 32-bit data words which automatically pull from the result when building 32-bit words for opcode immediates for the new binary image.

Reduced Register Usage Via Self Modifying Code
Sure, kills the trace cache in two ways, what do I care. Sometimes the easiest way to do something complex is to just modify the opcode immediates before calling into the function...

What Will Annotations Look Like?
The plan so far is for the editor to display a grid of 8x8 32-bit words. Each word is colored according to a tag annotation {data, absolute address, relative address, pull value}. Each word has two extra associated annotations {LABEL, NOTE}. Both are 5 6-bit character strings. Words in grid get drawn showing {LABEL, HEX VALUE, NOTE} as follows,

LABEL
00000000
NOTE

The LABEL provides a name for an address in memory (data or branch address). Words tagged with absolute or relative addresses or pull value show in the NOTE field the LABEL of the memory address they reference. Words tagged with data use NOTE to describe the opcode, or the immediate value. Editor when inserting a NOTE can grab the data value from other words with the same NOTE (so only need to manually assemble an opcode once). Edit-time insert new words, delete words, and move blocks of words, all just relink the entire edit copy of the binary image. ESC key updates a version number in the edit copy, which the executing copy sees, triggering it to replace itself with the edit copy.

Boot Strapping
I'm bootstrapping the editor in NASM in a way that I'll be able to see and edit later at run-time. This is a time consuming process to get started because instead of using NASM to assemble code, I need to manually write the machine code to get the 32-bit padded opcodes. Once enough of the editor is ready, I need a very tiny IDE/PATA driver to be able to store to the disk image. Then I can finish the rest of the editor in the editor. Then I'll also be self hosted outside the emulator and running directly on an old PC with a non-USB keyboard, but with a proper PCIe slot...

Look No Triangles : Scatter vs Gather

There are a bunch of people working on and succeeding in non-triangle rendering. With GPU perf still climbing, IMO it is possible to return to the golden age of a different kind of software rendering: the kind done in a pipeline built out of compute shaders.

In my sphere tracing of math based SDF fields I was purely ALU bound, tracing to the limit of floating point precision. The largest performance win was found by doing a many-level hierarchical trace (starting with very coarse grain empty space skipping). But the limit of all this is just a log reduction of the number of steps in the search, still requires many search steps per pixel. And when doing a memory based trace (instead of a math based trace) the search is just a very long latency chain with divergent access patterns. Tracing via searching on the GPU hits a wall. To make matters worse when tracing, the ALU units are loaded up with work involved in tracing, instead of something useful.
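
For readers who have not seen sphere tracing, here is a minimal CPU-side C sketch of the search loop being described. The scene (a single ground plane at y = 0), the function names, and the step limits are illustrative assumptions, not the tracer discussed above. The step is always the distance returned by the SDF, so a ray that grazes a surface converges slowly: the many dependent search steps per pixel.

```c
/* Signed distance to the ground plane y = 0 (stand-in for a math-based SDF). */
static float sdf_scene(float x, float y, float z)
{
    (void)x;
    (void)z;
    return y;
}

/* March from origin o along unit direction d. Each step advances by the
   current distance to the nearest surface. Returns the distance travelled
   when within eps of a surface, or -1.0f on a miss. steps_used receives the
   number of advances taken. */
static float sphere_trace(const float o[3], const float d[3], float eps,
                          float t_max, int max_steps, int *steps_used)
{
    float t = 0.0f;
    int i;
    for (i = 0; i < max_steps; i++) {
        float dist = sdf_scene(o[0] + d[0] * t, o[1] + d[1] * t, o[2] + d[2] * t);
        if (dist < eps) {
            *steps_used = i;
            return t; /* hit */
        }
        t += dist; /* safe step: nothing is closer than dist */
        if (t > t_max)
            break; /* left the scene */
    }
    *steps_used = i;
    return -1.0f;
}
```

A ray starting one unit above the plane and heading down at a slant, such as direction (0, -0.6, 0.8), shrinks its remaining height by a constant factor per step, so it takes several iterations to reach eps. A hierarchical trace reduces the step count but does not remove the dependent chain.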

The alternative to this is to switch to a mostly scatter based design. A large amount of the tree structure traversed each frame in a gather based approach is similar across frames. Why not just have the tree stored mostly expanded in memory based on the needs of the view? Then expand or collapse the tree based on the new visibility needs of the next frame. Rendering is then a mostly scatter process which reads leaves in the tree once. Reads of memory can now be coherent, and ALU can be used for things more interesting than search. Scatter will be somewhat divergent, but that cost can be managed by loading up enough useful ALU work in parallel. There are a lot of ways to skin this. Nodes of the tree can be bricks. Bricks can be converted into little view based depth sprites, then binned into tiles and composited. Seems as if bricks converted into triangle meshes and rasterized is the popular path now, but still using the CPU to feed everything. This could get much more interesting when the GPU is generating the cached geometry bricks: artistically controlled procedural volume generation...

20150421

From Scratch Bug 2 : Source-Less Programming

This is a disruptive idea which comes back periodically: source-less programming. Is it possible to efficiently program at a level even lower than an assembler?

The general idea is that the editor is now something similar to an enhanced hex editor which edits a binary image directly. Lowest memory is split into three parts {{running image, annotations for edit image}, edit image}. The {running image, annotations for edit image} is the boot image. The {edit image} is working space which gets snapshot replacement for {running image} on a safe transition zone. The "annotation" section is what enables human understanding of the binary image.

Words

20150415

Pixel Art and Slot Mask Pitch

This and the prior post are all shots from the same late model Arcade CRT, a 29" SVGA Makvision which can scan 30-40KHz and 47-90Hz. I'm cheating somewhat in taking a Metal Slug screen shot and displaying it on a non-15KHz monitor. Metal Slug was roughly 304x224 if I'm remembering right, so ultra low resolution to enable a 60Hz scan-out on CGA CRTs.

Arcade titles over the years with CRTs had a range of monitors and resolutions. Displays would provide a different look depending on the Slot Mask Pitch (effectively the number of dots for a given scanline). In this next shot I'm driving the monitor near its lowest resolution (at roughly 312 lines), then using H-size and V-size control to enlarge the screen shot as much as possible (showing maybe 250 lines on a 600 line display, so higher slots/line count than the Metal Slug titles). The 29" Makvision is a flat screen and thus suffers from moire patterns more than a curved display. In order to get the classic scan-line look (which is caused by scanning only half the display's lines to get double the frame rate), this shot has the moire reduction turned off (which keeps the beam from having vertical line jitter, which would otherwise cause lines to blend together).

Alternatively I can drive this monitor at 800x600 and then set the moire reduction to blend scan lines. This is to simulate various Arcade games which displayed a relatively higher resolution compared to the display slot mask pitch (lower slots/line count, the other extreme). The prior post's image was somewhere in-between these two examples.

Indie vs Real Slug Fest

If you see squares you are doing it wrong. The classic pixel art masters never intended for it to look as ugly as exact square pixels.

Shot from Metal Slug. The shot on the right is from a photo of an arcade CRT monitor.

20150414

From Scratch Bug

Inspired by Jaymin's JayStation2 effort and remembering a past life building custom OSs for early x86 machines, haven't been able to avoid the custom OS bug any longer. It starts easy with a harmless QEMU install, followed by a 512-byte bootloader switching to 80x50 text mode and installing a custom 48 character Forth font, then bring up of a Forth assembler/editor, then on to the pain of modern PCI and USB driver bring-up... with the eventual goal of a tiny bootable USB thumb system.

Amazingly refreshing to not have the OS telling you NO. Or the API telling you NO. Modern systems are all about the NO. Systems I grew up on were all about the YES.

Reworking my language from scratch, trying something new, replacing the Forth data stack with a new concept, but maintaining zero operand opcodes. Not sure if the idea will pan out. Dropping everything but 32-bit word support from the language, no need to interop with other software. No more 8/16/64-bit loads or stores (can still just inline machine code if required). Still running in x86-64 64-bit mode, so return stack PUSH/POP/CALL/RET is still a 64-bit stack operation, just don't need that 64-bit address space or 64-bit pointers anywhere else. Trying padding out all x86 opcodes to 32-bit alignment. This makes the 32-bit immediate 32-bit aligned. Wastes space, gives up some perf? Why would I care when most of the CPU side of the system fits in the L1 cache. Dropping paging, dropping interrupts, dropping everything, none of that stuff is needed.

Reworking an editor and binary source encoding. Switching to 32-bit tokens with 5 character max strings. 48 character character-set. Doing something horrible with font design: 1=I, 0=O, 2=Z, 5=S, etc. All caps font with no non-vertical or non-horizontal lines. Actually looks awesome. When you don't have to interop with the NO machine, long symbol names are not required. Color Forth like editors have almost no state. It is magical how they function simultaneously as an editor/assembler/console/debugger/UI/etc. Take the idea of "editor-time-words", words embedded in the source code which are evaluated when the block of source is drawn to the screen. Becomes possible to build out UI tools in the source. Can have an editor-time word read system data and draw in real-time updates in the source code itself. Editor-time words are just like any other word in the system, just color tagged to only be evaluated at draw time.

Minimal systems are a blessing, more so when you have only minimal free time to work on them.

20150406

End of an Era

A followup with stills from the "Why I'm Using Fedex From Now On" post...
The 29" Makvision/Wei-ya 30K-50KHZ XGA monitor. Actually the one I found looks like an early 50khz model in the original box, never used, perfect condition!!! Unable to find any of the 50khz models, was super thrilled to finally find one in this condition,

Then UPS Killed It
The last and only "new" 50khz model I could find, destroyed by UPS. Looks like it was dropped on the corner, or rammed by a forklift, causing the tube to implode. Would have been a huge bang, someone probably got a good laugh, then sent the corpse to its new owner, me. The loss of a CRT is very sad. No one will ever again manufacture them, they are a superior technology forgotten by the world, far better than even the best low persistence flat panels. To avoid another tragic loss like this, do the world a favor and ship with Fedex instead.

20150403

Why I'm Using Fedex From Now On

Finally got the package today, many days late on Monday. As they wheeled it to my door, it sounded like broken glass was banging around in the package, and after taking a look inside, it was totally destroyed. Brand new (from 2006) never been opened 29" VGA arcade monitor of specs which are impossible to find anywhere (special 800x600 @ 90Hz tube). Picture tube totaled. Would have made some serious noise when the vacuum imploded, something very obvious. Instead of sending it back to the shipper after they destroyed it, they decided it was better to give me the problem and drop it on my doorstep.

20150327

Other CRT Options

29" Makvision CRT SVGA Arcade Monitor
Link: XGaming has these for roughly $500 and around $60 shipping to where I live.
Uses VGA input. Looks like there are three kHz spec ranges depending on version of display: {90 MHz, 15-40 kHz or 30-40 kHz or 30-50 kHz (model C2929D1), 47-90 Hz, 800x600 max}. The peak kHz model might have capacity for 800x600 @ 80 Hz. Wonder what kind of persistence this display has.

Sony GDM-FW900 24" Widescreen CRT Monitor
Possible to find on ebay. Does {30-121 kHz, 48-160 Hz, 2304x1440 max}. Seems possible to do 960x600 at 160 Hz, and peak resolution around 80 Hz.
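
The refresh estimates above follow from dividing the horizontal scan rate by the total lines per frame. A rough C sketch of that arithmetic follows; the function name and the 5% vertical blanking overhead used in the examples are assumptions, since real timings vary by mode.

```c
/* Estimated maximum vertical refresh in Hz.
   h_khz         : maximum horizontal scan rate in kHz
   visible_lines : visible scan lines in the mode
   blank_frac    : vertical blanking overhead as a fraction (assumed ~0.05)
   v_max_hz      : the monitor's own vertical refresh limit in Hz */
static double max_refresh_hz(double h_khz, int visible_lines,
                             double blank_frac, double v_max_hz)
{
    double total_lines = visible_lines * (1.0 + blank_frac);
    double hz = h_khz * 1000.0 / total_lines;
    return hz < v_max_hz ? hz : v_max_hz; /* clamp to the vertical limit */
}
```

With these assumptions, max_refresh_hz(50.0, 600, 0.05, 90.0) lands a little under 80 Hz for the 50 kHz Makvision at 800x600. max_refresh_hz(121.0, 600, 0.05, 160.0) is clamped to the FW900's 160 Hz limit, and max_refresh_hz(121.0, 1440, 0.05, 160.0) comes out at roughly 80 Hz for 2304x1440.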