
Edited: Oct 02, 2018

Character sets

** Update Oct 1, 2018

I spent some time with my new character editor (available on GitHub as part of the upcoming SDK; look for an announcement soon).

The image on the right is RC2 of what I've decided to call the NuPET ASCII character set. This character set is 100% compatible with 7-bit ASCII and largely compatible with PETSCII, with the shifting of certain character blocks.

This will be the default character set of the Foenix and the Nu256 virtual machine.

Notable changes are:

1. Shifted graphic characters have been removed from the lower 7 bits. Add 96 to PETSCII characters in the 96-127 range to get the NuPET version of the symbol.

2. The vertical bar (|), tilde (~), underscore (_), braces ({ and }), and backslash (\) replace the British Pound (£) and CBM-only glyphs in the lower half of the character set. This makes NuPET ASCII 100% compatible with standard 7-bit ASCII.

3. PETSCII does not include half of the symbols needed for bar graph drawing. Those are added in the $90 row.

4. PETSCII does not have double-bar box graphics. Those are added in the $E0 row with single-double intersections in $8A-$8F.

5. Other assorted symbols missing from the default PETSCII set are added.

6. I'm still toying with making letters 5 rows to give lower case letters proper descenders. This changes the look of the character set by quite a bit, though, and some people might find it less pleasing than the standard CBM character set. I'd like to get your input on this. (The letters in the above image have been converted. The numbers have not.)
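As a rough illustration of item 1, here is a minimal Python sketch of the "add 96" rule. The function name is made up for this example, and leaving every other code untouched is an assumption: the post says other character blocks shift too, but does not list them.

```python
def petscii_to_nupet(code: int) -> int:
    """Apply the 'add 96' rule from item 1 to a single PETSCII code."""
    if not 0 <= code <= 255:
        raise ValueError("code must fit in one byte")
    # PETSCII 96-127 holds shifted graphics; NuPET moves them up by 96
    # (to 192-223) so that 96-127 can follow standard 7-bit ASCII.
    if 96 <= code <= 127:
        return code + 96
    # Assumption: all other codes pass through unchanged in this sketch.
    return code
```

For example, petscii_to_nupet(0x60) gives 0xC0.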

The characters included are done with an eye toward building text-based GUI screens. All standard GUI elements should be included with this iteration of the character set. If you find a necessary element is missing, please let me know.

It's worth noting that some graphic characters overlap the ranges used by PETSCII control characters.

To print NuPET graphic symbols, print $0F (15, or Shift In). This will display the NuPET symbol for everything except CR, LF, and ESC.
To use CBM control characters, print $0E (14, Shift Out). All CBM control characters (cursor controls, colors, etc.) will be interpreted the same way as on a Commodore computer.
In all modes, sending the ESC character will prefix an ANSI control sequence. For example, {ESC}[2J will clear the screen.
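Here is a small Python sketch of the byte sequences described above, as a program might send them to the terminal. The helper names are invented for this example; only the byte values ($0F, $0E, ESC, and the [2J sequence) come from the text.

```python
SHIFT_IN = b"\x0f"   # 15: display NuPET graphic symbols
SHIFT_OUT = b"\x0e"  # 14: interpret CBM control characters
ESC = b"\x1b"        # 27: prefixes an ANSI control sequence in all modes


def nupet_symbols(data: bytes) -> bytes:
    """Prefix data with Shift In so its bytes are shown as NuPET symbols."""
    return SHIFT_IN + data


def cbm_controls(data: bytes) -> bytes:
    """Prefix data with Shift Out so CBM control codes are honored."""
    return SHIFT_OUT + data


def clear_screen() -> bytes:
    """Build the {ESC}[2J sequence mentioned in the post."""
    return ESC + b"[2J"
```

A program would write these bytes to whatever output stream the system provides; that part is left out here because the post does not describe it.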

Original Post

So I've been digging through some prior work I did, especially related to character sets.

As all you Commodore aficionados know, the character set introduced in the Commodore PET computer is not actually ASCII; it is mostly ASCII, but with some choices that make interacting with ASCII systems sometimes challenging. We will call this character set PETSCII, to make things simple.

The ASCII grave (`), tilde (~), backslash (\), pipe (|), and underline (_) symbols are not present on PETSCII.
Many glyphs are present in the upper-case/graphics set that are not present in ASCII (all the shifted letters).
In upper/lower case mode, the case of the text is reversed: upper-case letters in ASCII are lower case letters in PETSCII and vice-versa.
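The case reversal in the last point is the part that usually bites when exchanging text. A simplified Python sketch of the swap follows. It only exchanges the two 7-bit letter ranges and ignores the missing symbols and graphic glyphs, so it is an illustration, not a complete ASCII/PETSCII converter. The function name is hypothetical.

```python
def swap_letter_case(data: bytes) -> bytes:
    """Swap upper and lower case for the 7-bit letters, leave the rest alone."""
    out = bytearray()
    for b in data:
        if 0x41 <= b <= 0x5A:      # A-Z becomes a-z
            out.append(b + 0x20)
        elif 0x61 <= b <= 0x7A:    # a-z becomes A-Z
            out.append(b - 0x20)
        else:                      # digits, punctuation, everything else
            out.append(b)
    return bytes(out)
```

Applying the function twice returns the original bytes, so the same routine works in both directions.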

I'm going to propose that we reconcile the PETSCII and ASCII character sets with a new way of handling text. In addition to the existing upper-case/graphics and lower-case text modes, I'm proposing adding an additional 128 glyphs that cover the 1982 IBM PC character set (also known as Code Page 437), several glyphs that fix shortcomings in the PETSCII set, and a character set mapping feature that allows several predefined maps, as well as custom maps, in the VICKY graphics processor.

It would work like this: the built-in character ROM would contain room for 1024 character slots. When VICKY renders the screen, she can select a character from any of the character slots, based on the currently selected character map. This is similar to the way the VIC chips handle character mapping, but with four banks instead of two. In addition, VICKY should be able to select a character map from VRAM as well, like the VIC chips.
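To make the slot idea concrete, here is a Python sketch that treats a character map as a 256-entry table of slot numbers. The table layout and function names are assumptions made for illustration; the post only specifies 1024 slots arranged as four banks.

```python
SLOTS = 1024
CODES_PER_MAP = 256
BANKS = SLOTS // CODES_PER_MAP  # four banks of 256 glyphs


def bank_map(bank: int) -> list[int]:
    """Predefined map: character code n selects slot n of the given bank."""
    if not 0 <= bank < BANKS:
        raise ValueError("bank out of range")
    return [bank * CODES_PER_MAP + code for code in range(CODES_PER_MAP)]


def resolve(char_map: list[int], code: int) -> int:
    """Return the ROM slot a screen code selects under char_map."""
    return char_map[code]


# A custom map can mix banks: start from bank 0, then borrow the
# upper 128 glyphs from bank 2 (hypothetical example).
custom = bank_map(0)
custom[128:] = bank_map(2)[128:]
```

With this layout, a custom map is just another table, which is why it could live in VRAM as easily as in ROM.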

The predefined maps would include:

PETSCII upper case/graphics (A, ♠, £)
PETSCII upper case/lower case (a,A, £)
IBM ASCII: a, A, \, |, ~, `, _, and all glyphs in ANSI Code Page 437.
PET/ASCII: codes 32-127, standard ASCII characters, including `,~,\,|,_. 160-255 contains all PET graphic glyphs and 32 new glyphs to round out the block drawing set, plus some bullets and other assorted symbols.

Switching between character sets is currently done by printing CHR$(14) (text mode) or CHR$(142) (upper/graphics mode). I propose adding CHR$(15) and CHR$(143) as IBM ASCII and PET/ASCII, respectively, with PET/ASCII being the default character mode on bootup. I suggest extending the Shift/C= keyboard command so that it cycles through all 4 character sets.
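The four switching codes and the Shift/C= cycling could be modeled like this in Python. The cycle order is an assumption (the post does not give one), and the names are invented for this sketch.

```python
# Control code -> character set, as proposed in the original post.
CHARSET_CODES = {
    14: "PETSCII upper/lower case",
    142: "PETSCII upper case/graphics",
    15: "IBM ASCII",
    143: "PET/ASCII",
}

DEFAULT_CHARSET = 143  # PET/ASCII is the proposed boot-up default

# Assumed cycle order for Shift/C=, starting from the default.
CYCLE = [143, 142, 14, 15]


def next_charset(current: int) -> int:
    """Return the control code Shift/C= would switch to next."""
    position = CYCLE.index(current)
    return CYCLE[(position + 1) % len(CYCLE)]
```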

I've attached the PET/ASCII layout, to show you the proposed PET/ASCII character set. The left image is the original PETSCII character set, with redundant characters removed. The right image is my proposed character set. Note the extra glyphs starting at $E0.

Finally... a justification for these changes:

An ASCII compatible character set is necessary for data interchange with other computers. PETSCII's lack of compatibility with the PC has long been a thorn in users' sides. This creates a standard that is easier for developers to use, without the need to resort to uploading custom character sets. Without a standard, every person who writes a text editor or terminal program will end up rolling his own, which will cause undesired fragmentation in the software community.
PETSCII is still desirable because it's unique and a bit of a calling card for Commodore users. Its inclusion is necessary if we are to support CBM BASIC.
The PET/ASCII mode lets a user include all of the original Commodore glyphs, including some not originally included (but which should have been.) This is, I believe, the ultimate evolution of the 8-bit Commodore character set.

33 comments


Jim W4JBM

Feb 04, 2019

I like the Borland style menus that were used in various DOS products also. Having the character set to implement that style would be useful.

The ability to define a custom character set for a particular application would also be nice. If there was a way to develop a font / character set, save it to a standard format disk file, and then have any program that wanted to use it "load" that file to enable the characters, I think it might encourage some creativity around fonts for particular types of applications.

Character graphics were pretty integral to BASIC games back then because of speed and even then you often ended up needing machine language routines to call if you were going to reverse scroll or scroll sideways. (Effectively, scrolling with character graphics was usually just a block memory move.)

On the ZX Spectrum, there were actually just 16 characters that served as User Defined Graphics (UDG). For simple things like an invaders style game, a simple "scrolling" racer game, or other applications, it doesn't necessarily take an entire new character set.

https://zxspectrumastronomy.wordpress.com/tag/user-defined-graphics/

Digging into the hardware to better understand how they did this has been on my to-do list for a while. But in an FPGA implementation, it seems like having "registers" that allowed loading a replacement set of bitmaps for a few characters would be manageable. You basically would have a flag that told you whether to use the characters in the external character ROM or render them with the loaded replacement character. Limiting it to 16 characters also means you are less likely to end up with a totally trashed screen if you somehow revert between modes unexpectedly.
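A software model of that idea might look like the Python sketch below: a ROM font, up to 16 replacement bitmaps, and one flag that decides which source is used. The class and its names are hypothetical, and real hardware would do this lookup in logic, not code.

```python
MAX_REPLACEMENTS = 16


class GlyphSource:
    """ROM font plus a small set of replaceable glyphs (UDG-style)."""

    def __init__(self, rom: bytes, rows: int = 8) -> None:
        self.rom = rom
        self.rows = rows
        self.replacements = {}         # character code -> bitmap bytes
        self.use_replacements = False  # the mode flag described above

    def define(self, code: int, bitmap: bytes) -> None:
        """Load a replacement bitmap for one character code."""
        if len(bitmap) != self.rows:
            raise ValueError("bitmap must have one byte per raster row")
        if code not in self.replacements and len(self.replacements) >= MAX_REPLACEMENTS:
            raise ValueError("only 16 characters can be replaced")
        self.replacements[code] = bytes(bitmap)

    def glyph(self, code: int) -> bytes:
        """Return the bitmap to render for a character code."""
        if self.use_replacements and code in self.replacements:
            return self.replacements[code]
        start = code * self.rows
        return self.rom[start:start + self.rows]
```

Because the ROM is never modified, clearing the flag restores the stock glyphs immediately.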

Thanks,

Jim W4JBM

eacontrerasd

Dec 05, 2018

Adding support for this type of menu is very thoughtful. I used all the Borland products and loved their menus; it also suits the interfaces on computers in the early 90s well! Great work!

Tom

Oct 02, 2018

Here is what a window might look like, using NuPET symbols. Everything in this image is a character from the NuPET character set, although I might have missed the 8-pixel alignment (I used Paint.NET to draw this).

Tom

Oct 01, 2018

Just a note; I've added some info to the first post.

I'm hoping to get a chance to put together some demo screens showing text GUIs soon. I'll probably make further changes as I explore the possibilities at a later date. Expect changes in the $00-$20 block as I play with GUI designs and APIs in the future.

wavestarinteractive

Aug 02, 2018

Absolutely.

Tom

Aug 01, 2018

Yes, I'm thinking of mono fonts, with a foreground and background color for each character cell.

I never even considered multicolor characters; that sounds fascinating, but it makes sense as a way to do modular game maps. A person could use a block of 4 or 6 characters to define a background tile, which is easier than drawing a bitmap every time the screen updates.

And yes, I'm definitely up for getting a board. I'm itching to write some real code. The WDC simulator is one thing, but it's no substitute for hardware.

stef

Aug 01, 2018

Tom,

Sounds great to me. Color-wise, I was thinking about one text mode that would be rudimentary, with a monochrome (1 color) font, mainly used for text, coding, etc... The system you are actually talking about is about monochrome chars, right?

This monochrome mode could either be 16 hard-coded colors or from a 24-bit palette. A second monochrome mode could possibly be extended to 8 bits (256 different colors) (4 Planes), still out of a 24-bit palette. Although, if in the first mode you can define the 16 colors, then there might not be much need for more.

Then, there would be a Color FONT mode, where you can actually define the color of each pixel of the font, that mode of course would be used for gaming. Does that sound right?

Listen, what you are proposing to me sounds great, this is where I will start when I will tackle Vicky's FPGA code, probably early on, so I can have a way to see what is going on in the system.

On a different note, I ought to finish the PCB tonight and possibly send it to manufacturing as well... Either tonight or tomorrow. So, this weekend I will have time to start doing some memory mapping and FPGA work.

Are you still up to get a board before CRX?

Cheers

Tom

Aug 01, 2018

Stefany, I've been playing with my terminal/simulator, and I think a 3-page system is going to be the best way to go, at this point. (Yes, I've got some redundant info in this post. It's just easier to summarize, at this point.)

Page 1: Character code. This is the ASCII value of the character. (Codes 1-32 will be blank or filled with a representation of the command code's function, such as a down-and-left arrow for Return).

Page 2: Color Map. Each byte will contain the foreground and background color for the character cell. The upper 4 bits are the background, and the lower 4 bits are the foreground.

Page 3: Attribute Map. The upper 4 bits select a character map. The lower 4 bits select the rendering mode for the character - Bold, Italic, Underline, Reverse.

I know you also want to do 8-bit color modes, so if you support 8-bit text color, this would mean adding one more page, for 4 pages total.
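Here is a Python sketch of how the color and attribute bytes from pages 2 and 3 could be packed. The nibble layout follows the description above; which bit stands for Bold, Italic, Underline, or Reverse is not stated, so the flag values below are assumptions, as are the function names.

```python
# Assumed bit assignment for the low nibble of the attribute byte.
BOLD, ITALIC, UNDERLINE, REVERSE = 0x1, 0x2, 0x4, 0x8


def pack_color(fg: int, bg: int) -> int:
    """Color byte: background in the upper nibble, foreground in the lower."""
    if not (0 <= fg <= 15 and 0 <= bg <= 15):
        raise ValueError("colors are 4-bit values")
    return (bg << 4) | fg


def unpack_color(value: int) -> tuple[int, int]:
    """Return (foreground, background) from a color byte."""
    return value & 0x0F, (value >> 4) & 0x0F


def pack_attr(char_map: int, flags: int) -> int:
    """Attribute byte: character map in the upper nibble, flags in the lower."""
    if not (0 <= char_map <= 15 and 0 <= flags <= 15):
        raise ValueError("map and flags are 4-bit values")
    return (char_map << 4) | flags
```

For example, white-on-blue with palette indexes fg=1 and bg=6 packs to $61, and map 2 with bold and reverse set packs to $29 under the assumed flag bits.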

In order to support multiple code pages or character sets, we should have 16 registers that set the address of the font data. Fonts will always use 4K or 8KB of memory, and so we can assume fonts will always be loaded on a 4K boundary and will reside in VRAM during system operation.
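One consequence of the 4K alignment is that a font register never needs the low 12 bits of the address. The Python sketch below shows that, plus a glyph address calculation. Storing only the upper bits in the register and laying glyphs out back to back are both assumptions for illustration; the post states neither.

```python
FONT_ALIGNMENT = 4096  # fonts sit on a 4K boundary
FONT_REGISTERS = 16


def font_register_value(base_address: int) -> int:
    """Value a font register could hold: the address with the low 12 bits dropped."""
    if base_address % FONT_ALIGNMENT != 0:
        raise ValueError("font must start on a 4K boundary")
    return base_address >> 12


def glyph_address(base_address: int, code: int, rows: int) -> int:
    """Address of a glyph's first raster byte, assuming consecutive glyphs."""
    return base_address + code * rows


# Hypothetical register file: one base address per selectable font.
registers = [0] * FONT_REGISTERS
registers[0] = font_register_value(0x8000)
```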

Fonts are always stored with 1 byte per raster line. Each bit represents one pixel in the row. So the letter A, for example, would look something like this:

bits        Pixels   Value in decimal
00011000  |   **   | 24
00100100  |  *  *  | 36
01000010  | *    * | 66
01111110  | ****** | 126
01000010  | *    * | 66
01000010  | *    * | 66
01000010  | *    * | 66
00000000  |        | 0

(I know you know how this works; I'm spelling this out for folks who may never have designed a custom font.)
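For anyone who wants to try this, the table above can be reproduced with a few lines of Python. The byte values are the ones from the table; the function names are made up for this sketch.

```python
# The letter A from the table above, one byte per raster line.
LETTER_A = [24, 36, 66, 126, 66, 66, 66, 0]


def render_row(value: int) -> str:
    """Turn one font byte into 8 characters, most significant bit first."""
    return "".join("*" if value & (0x80 >> bit) else " " for bit in range(8))


def render_glyph(rows: list[int]) -> list[str]:
    """Render every raster line of a glyph."""
    return [render_row(value) for value in rows]


if __name__ == "__main__":
    for value, pixels in zip(LETTER_A, render_glyph(LETTER_A)):
        # Same three columns as the table: bits, pixels, decimal value.
        print(f"{value:08b}  |{pixels}| {value}")
```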

An 8x16 font would be similar, just with more rows and with the character image extended further vertically. I figure on using 10 rows for upper case characters, 5 or so for lower case characters, and 4-5 rows for descenders (the bottom part of j, g, etc.). That will leave 2-3 rows for line spacing.

If you use any exotic rendering modes, like 7x14, the rasterizer should crop an 8x16 font. So the first line of a font will always fall on an 8-byte boundary and will always be 8 bits wide.

I will include a total of 8 fonts in the system, an 8x8 and 8x16 version of the following fonts:

PETSCII Upper Case - this is the default font on a Commodore 64, 128, and VIC-20

PETSCII Lower Case - this is what you get when you press C= + Shift on a Commodore

PET ASCII - ASCII-ordered font with PET symbols in the upper 128 positions.

IBM ASCII - ASCII-ordered font with IBM graphic symbols in the upper 128 positions.

This will use a total of 48K of ROM: 16K for the 8x8 fonts and 32K for the 8x16 fonts.
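The ROM budget is easy to check. The Python sketch below uses the per-font sizes given earlier in this post (4K for an 8x8 font, 8K for an 8x16 font). The glyph count it derives follows purely from those sizes at one byte per raster line.

```python
KB = 1024
FONT_NAMES = ["PETSCII Upper Case", "PETSCII Lower Case", "PET ASCII", "IBM ASCII"]

SIZE_8X8 = 4 * KB    # bytes per 8x8 font
SIZE_8X16 = 8 * KB   # bytes per 8x16 font

rom_8x8 = len(FONT_NAMES) * SIZE_8X8     # 16K
rom_8x16 = len(FONT_NAMES) * SIZE_8X16   # 32K
rom_total = rom_8x8 + rom_8x16           # 48K

# At one byte per raster line, each font has room for this many glyphs.
glyphs_per_8x8_font = SIZE_8X8 // 8
glyphs_per_8x16_font = SIZE_8X16 // 16
```

Both sizes work out to 512 glyph slots per font.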

wavestarinteractive

Jul 20, 2018

So I've been digging through some prior work I did, especially related to character sets.

Tom, I apologize for explaining PETSCII to you. When I read "We will call this character set PETSCII", it read like you're naming it PETSCII when it had been named that for three to four decades. I read too much into those words.

I would have said: "This character set introduced with the PET was called PETSCII among the Commodore community including Commodore."

Commodore had officially referenced it as PETSCII in a modem manual around the time of the C128, but it's based on 1963 ASCII, not the later 1967 ASCII. Anyway, by 1985, it was already common practice, especially with all the magazines like Compute! / Compute! Gazette, Transactor, and many others, to call things PETSCII to distinguish it from ASCII as it was commonly used. This was also a point made not so much to you, Tom, but to the forum, for those who may not have the history with the Commodore platform which I have had for THREE DECADES or you.

wavestarinteractive

Jul 20, 2018

If this is something you feel strongly about, feel free to write a TrueType font renderer.

I could use this: http://www.angelcode.com/products/bmfont/

Then take the bitmap and scale proportionally to an 8x8 matrix and an 8x16 matrix and even 16x16 matrix.

Truetype is a vector font system and the rasterizer would probably suck up CPU performance of a 14 MHz 65c816. Enough said on that.

Seriously, I am more interested in ASCII/ANSI and PETSCII for terminal emulation/BBS applications. As for Unicode or other stuff, it's more an icing on the cake but it could be implemented by third-party. The trick there is to use "code pages". A GEOS-like OS could implement something like Unicode. It should be noted that ASCII and PETSCII tended to have the upper and lower case reversed. What is lower case in PETSCII is upper case in ASCII. I've written an ASCII / PETSCII converter on the C64 back in the late 1980s.

ANSI would be a little more compelling but then it is really just a replacing of the fonts mapped to the character value.

wavestarinteractive

Jul 20, 2018

http://wavestar.x10host.com/indexbbs.html#

or with your terminal emulator set to 80x50.....

wavestar.ddns.net : Port#: 23 TELNET

It is set up for the time being.

I have a BBS to test what an 8x8 font at 80x50 would look like, but an 80x60 screen should look similar in size on a full screen. It should appear larger on a 1920 x 1080 screen than it appears via the web page. As you may notice, it is still legible. It should give a sense of what it should look like in a quick and dirty fashion. On a modern monitor, the vertical and horizontal dimensions of the screen should be larger and hence the font should be legible.

I'm confident that it'll still look fine. I've only mentioned it because it might be something to use as a gauge to approximate the size of the letters. I might be able to tune the window to 80x60 but then I would have to tweak some other things in the BBS.

On my screen: horizontal active area: 20.92" and vertical active area: 11.77"

If your screen area is larger, it should be fairly legible. It'll be smaller than 80x25 looked on the average-size monitors of 1987. I personally think it'll display just fine.

wavestarinteractive

Jul 20, 2018

I'm good for 80x60 screen mode. Even for a terminal emulator that uses an 80x50 character mode, I can still frame an 80x50 window for the terminal emulation and have 10 rows of characters for the terminal emulator menu and whatnot.

wavestarinteractive

Jul 20, 2018

" Personally, I just don't care, and you're hijacking my thread again. This thread is about porting PETSCII and integrating it with ASCII. If you want to explore Unicode support, be my guest - in a new thread, please. "

Okay. If anything, I'd suggest full ANSI ASCII and PETSCII. Very usable for dial-up BBS with ANSI, ASCII, and PETSCII. Only requires 256 characters for ANSI/ASCII and 256 character set for PETSCII. The thread title is "character sets" btw. I was thinking about terminal emulators and BBSs where having the characters would be useful.

Tom

Jul 20, 2018

Stefany... if your display chip can push 1920x1080, then you can line-double the 640x480 image (making it 960 rows) and still have some room for TV overscan (aka the borders). When the system is running at 1920x1080, you could just center the 4:3 image. That's going to be the most widely supported resolution, with most people hooking this system up to their TVs and HD computer monitors. (I'm actually going to be using a 32" 4K screen.)
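The border math for that case can be sketched in Python. This assumes the image is doubled horizontally as well as vertically (1280x960), which keeps the 4:3 shape; the function name is made up.

```python
def centered_borders(screen_w: int, screen_h: int, image_w: int, image_h: int) -> tuple[int, int]:
    """Return (left/right border, top/bottom border) for a centered image."""
    if image_w > screen_w or image_h > screen_h:
        raise ValueError("image does not fit on the screen")
    return (screen_w - image_w) // 2, (screen_h - image_h) // 2


# 640x480 doubled in both directions (the horizontal doubling is an assumption).
doubled_w, doubled_h = 640 * 2, 480 * 2
side_border, top_border = centered_borders(1920, 1080, doubled_w, doubled_h)
```

On a 1920x1080 display this leaves 320 pixels on each side and 60 rows at the top and bottom.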

I'll do the math on other resolutions, too, and see what I can come up with for different high-res modes.

Tom

Jul 20, 2018

Personally, I just don't care about Unicode right now. Put it in a new thread if you want to talk about it.

wavestarinteractive

Jul 20, 2018

Unicode existed in 1987 and roots back to XCCS in 1980. Also, 4 MB wasn't a specification for a C= 8 bit product or similar priced computers in the under $1,000 range until 1994 or so. RAM wasn't cheap enough then. It is an interesting thing but Tom's response was interesting so it was brought up to be a mechanism to facilitate while the routine can initially be JSL-ing to an RTL but I'll get to that in a moment about paging in the Unicode set instead of mapping it all into RAM or it can be on "ROM" not mapped onto the RAM itself. I could still implement it even if not in Kernal.

Tom

Jul 19, 2018

:) Actually, my favorite text mode on PC is the 80x43 mode (actually, it might have been 80x50. I can't remember any more.) I used that exclusively when programming, once I got a VGA graphics card. That's why I figured we'd want the 8x8 and 8x16 fonts as separate banks. I'll whip up the line-doubled font tonight; it's just a matter of writing each byte twice (for now.) Later, I'll have to build a font editor. It's not hard work, but it takes time.

Actually, if someone wants to take that on, I can send them what I've already got; my viewer/converter is written in C# and isn't big or complicated.
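The line-doubling step mentioned above really is just "write each byte twice". Here is a Python sketch, assuming the 8x8 font is a flat byte string with one byte per raster line (the function name is made up):

```python
def line_double(font_8x8: bytes) -> bytes:
    """Turn 8x8 font data into 8x16 data by repeating every raster line."""
    out = bytearray()
    for row in font_8x8:
        out.append(row)  # original line
        out.append(row)  # duplicate directly below it
    return bytes(out)
```

The output is exactly twice the size of the input, so a 256-glyph 8x8 font (2048 bytes) becomes 4096 bytes.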

stef

Jul 19, 2018

Tom,

Fundamentally, in monochrome text mode, which is the simplest and the one at boot-up, the resolution will be, yes, 640x480.

Considering that there will be borders, the dot clock will be slightly faster than 25.175 MHz, so the full screen might end up being 800x600 or 720x560 to cover the borders.

8x16 sounds great, but I would like to keep the original 8x8 matrix in a different bank. If it turns out to be too small, we can remove it later. I like the possibility of having an 80x60 matrix: more info at the same time. Although, if I can't see it, it is not helpful either! ;o)

Tom

Jul 19, 2018

Stefany, I have another question about the video system.

Are you planning on rendering the text modes in 640x480 resolution? If so, I'd like to give you an 8x16 character set. For now, it'll just be a line-doubled version of this one. Later, I'd like to build a new, cleaner font with a higher baseline, so we get proper descenders (the bottom part of g, j, etc.)

Here's an example of the letter g without and with a descender:

I think the one on the right looks better and is more readable. If you like this look, I can either tackle it later on, or maybe you can get an actual artist to draw up an 8x16 character set; then I can just convert the bitmap like I did with the ROM I sent you.

stef

Jul 19, 2018

I sometimes wonder why we are having this kind of discussion... I think I've repeated myself a gazillion times that this is a 1987 computer... I don't see the Amiga 500 having Unicode support... What the fuck!

Tom, what you have proposed makes total sense to me. The Fonts need to be a certain size and the memory is limited, so... NO.

At this point, if the posts could stay within the scope of what the project is about and its philosophy it would be great!

Thank you!

Stefany

Tom

Jul 19, 2018

I'm not sure why you felt the need to explain PETSCII to me, when I just described it in the first post. I think I explained the concepts clearly in the first post, so I won't belabor the point.

Unicode... I've considered that. The problem with Unicode is its sheer size; Unicode allows for over 1 million characters, about 10% of which have already been defined. If we use 8x8 character cells, we'd need 800KB just to store the whole thing. An 8x16 character set (for 640x480 text mode) would need 1.6MB. I think any Unicode rendering will have to be done in bitmap mode, and the application will have to provide its own Unicode support.
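The storage figures can be reproduced with a quick Python calculation. The glyph count of 100,000 is a round number standing in for "about 10% of over 1 million"; it is not an exact count of defined Unicode characters.

```python
DEFINED_GLYPHS = 100_000  # round figure, not an exact count


def font_bytes(glyphs: int, rows_per_glyph: int) -> int:
    """Bytes needed for a mono font at one byte per raster line."""
    return glyphs * rows_per_glyph


size_8x8 = font_bytes(DEFINED_GLYPHS, 8)     # 800,000 bytes, about 800KB
size_8x16 = font_bytes(DEFINED_GLYPHS, 16)   # 1,600,000 bytes, about 1.6MB
```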

If this is something you feel strongly about, feel free to write a TrueType font renderer; that would make it much easier to import Unicode fonts from other platforms.

For text modes, one possibility is a 3-layer screen. The first layer contains the character to be displayed. The second layer contains the color for that cell. (4 bits for foreground, 4 bits for the background). The third layer contains attributes (underline, italic, bold, blink) and selects the character bank to be displayed. With 4 bits available to select the character bank, you could load 16 character sets, for a total of 4096 glyphs, at once.

Of course, I don't know the first thing about FPGA programming, or how Stefany intends to design the display engine. At this point, I'm just throwing out ideas that I can't actually do much about.

wavestarinteractive

Jul 18, 2018

Tom, it's called PETSCII or CBM ASCII. PETSCII has been commonly used for 35+ years for the Commodore character set.

We should have BOTH ASCII and PETSCII character set. Maybe even Unicode. Something we can swap in or use as we need or want it. I'd support PETSCII, ASCII.... or more specifically the extended ASCII set known as ANSI character set, and Unicode as four selectable or switchable character sets.

Don't forget that we also have many tools for programmable character fonts that we can employ into this. I would support the ability to enter a full Unicode set or at least UTF-8 and/or UTF-16. It would be more practical to support UTF-8 as we have a mechanism for dynamically mapping in Code Pages at a time from external storage without having them in RAM all the time. It is something to consider at some point.

Tom

Jul 12, 2018

I don't, yet. I drew the grid on the left using VICE, then copied and pasted in PAINT.NET to get the one on the right. Because the Commodore ROMs are stored in a different order, it'll be necessary to move them around to get them into PETSCII or ASCII order. I'll put together a sample and a preview bitmap. It'll probably be next week; this weekend is gonna be nuts.

stef

Jul 12, 2018

Hey Tom,

Sounds very reasonable... it makes sense.

Do you happen to have the actual Binary file for all those different sets? or a link?

Please advise...

Awesome post!

Cheers