Just my 2 cents - I think the jumplist of the kernel should be copied to RAM during the initialization, and the official way to call the kernel should be to use precisely this RAM copy of the jump list (think of AmigaOS library base structure). Same for the BASIC jumplist. This way it would be easier to extend the OS from the 'usermode', and the GUI could easily provide the 'terminal emulation' - imagine the C256 BASIC within the GEOS application window.
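To make the RAM-copy idea concrete, here is a minimal Python model of it (not 65c816 code). The routine addresses, the entry index, and the helper names are made up for illustration; nothing here is an agreed C256 layout.

```python
# Hypothetical ROM routine addresses (24-bit); illustrative values only.
ROM_JUMP_TABLE = (0xF01000, 0xF01200, 0xF01400)

def init_ram_table(rom_table):
    """Boot-time copy: the RAM table starts out identical to the ROM table."""
    return list(rom_table)

def patch(ram_table, index, new_target):
    """Redirect one entry and return the old target so a hook can chain to it."""
    old_target = ram_table[index]
    ram_table[index] = new_target
    return old_target

ram_table = init_ram_table(ROM_JUMP_TABLE)
# A user-mode extension (say, a GUI terminal emulator) hooks entry 1.
old_target = patch(ram_table, 1, 0x020000)
```

Callers that always go through the RAM table pick up the hook automatically, while the ROM table stays untouched and the saved old target lets the hook fall through to the original routine.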
"@Tom, I agree with your strategy. For me, I just want to be sure that whatever function I call in the Jump Vectors list, the PC can find its way back to the original caller. If I am not mistaken, the early Mac with the 68K had a similar call list (it was huge, actually).
This will give us the freedom to upgrade or modify the Kernel/Monitor/Text Editor/BASIC without having to worry about compatibility issues."
On the C64, we JSR the jump vector. We can use the 65c816 JSL instruction instead, since JSL can reach anywhere in the 65c816's 24-bit address space, and have the kernal routines end with RTL (instead of RTS) so they can return to anywhere in that address space. So even with 16 MEGABYTES.... we can JSL to the jump vector and RTL back from the kernal routines, wherever they are. RTL does what RTS does, except that it also restores the program bank, so it is bound by the 16 MB address space rather than by a single 64KB bank. Granted, there may or may not be 16 MB on-board, but the 65c816 can do that.
The Fortran implementation would have POKE and PEEK, in my opinion. They're not normally part of Fortran, but they would be important for ease of use and powerful functionality.
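A small Python model of the return-address handling may help show why RTL gets back to the caller and RTS does not. The stack is just a Python list here and the addresses are arbitrary examples; only the push/pull order is meant to mirror the CPU.

```python
def jsl(stack, pc, target):
    """JSL at 24-bit address pc: push the program bank, then the high and low
    bytes of the address of the instruction's last byte (pc + 3)."""
    ret = ((pc & 0xFFFF) + 3) & 0xFFFF
    stack.append(pc >> 16)      # program bank of the caller
    stack.append(ret >> 8)      # return address, high byte
    stack.append(ret & 0xFF)    # return address, low byte
    return target               # new 24-bit PC

def rtl(stack):
    """RTL: pull the 16-bit address, add 1, then pull the program bank."""
    lo = stack.pop()
    hi = stack.pop()
    bank = stack.pop()
    return (bank << 16) | ((((hi << 8) | lo) + 1) & 0xFFFF)

def rts(stack, current_bank):
    """RTS: pull only 16 bits; execution stays in the current bank."""
    lo = stack.pop()
    hi = stack.pop()
    return (current_bank << 16) | ((((hi << 8) | lo) + 1) & 0xFFFF)

stack = []
pc = jsl(stack, 0x123456, 0xF0FC00)   # user code in bank $12 calls the kernal
pushed = list(stack)
back = rtl(stack)                     # lands on the byte after the 4-byte JSL
```

An RTS in the same situation would pull two bytes and resume inside the kernal's own bank, which is why every routine reached through JSL has to end in RTL.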
Stef, alright. I have no problems with that. That's why we have an expansion port if we want a SUPER-FAST 65c816 (100 MHz 65c816 in an FPGA implementation... WDC proved that already... thank you Bill Mensch and WDC Team).
14 MHz is plenty fast, so a couple more clock cycles to do full-range addressing isn't a problem in my opinion.
Whatever we do in the Kernal jump table (and the documentation) would be useful in everything from BASIC, the ML Monitor, a FORTRAN DEVELOPMENT KIT (you heard that right !!!! FORTRAN !!!! The BIG BROTHER OF BASIC, which is FULLY STRUCTURED PROGRAMMING AND WOULD MAKE SENSE AS A SOURCE CODE IDE + COMPILER APPROACH), and other languages.... if I am devious, Ada. Believe it or not, there was an implementation of Ada for the C64.
Now, imagine this for the C256, because after all, we've got a tad more resources. These would be additional programs that could be made available at some point in time.
"The number of clock cycles won't be a big issue when you are operating at, say, 20 MHz to 21 MHz (the 14 MHz 65c816 could be driven to about 25 MHz without any real problem and within Western Design Center's design tolerance, as long as you keep the CPU cooled, say with a heatsink)."
Unfortunately, you won't be able to overclock the system, since 14.318 MHz is the base clock that everything is derived from. If you change the clock, nothing will work, since most of the chipset will be synchronous. It is not like there is one clock for the CPU and another for everything else. At least the first C256 Foenix will be locked.
Secondly, the CPU is powered with 3.3V as opposed to 5V, which greatly limits the overclocking possibilities... I know some people will try and they could actually succeed, but this will be their burden, not mine. 14 MHz in 1987 would have been considered quite fast. The Amiga 500 was 7 MHz, right? Besides, if they really want to speed things up, they might as well go and fetch a Mega65.
The C64 was 1.022 MHz and it did pretty well, so you know I am not worried if we lose a couple of clock cycles here and there. Trust me, the rest of the system will compensate.
@Tom, I agree with your strategy. For me, I just want to be sure that whatever function I call in the Jump Vectors list, the PC can find its way back to the original caller. If I am not mistaken, the early Mac with the 68K had a similar call list (it was huge, actually).
This will give us the freedom to upgrade or modify the Kernel/Monitor/Text Editor/BASIC without having to worry about compatibility issues.
Yes, that's more or less what I'm thinking. Yes, I'm aware of the CPU's addressing modes. All those weird modes are going to be key in doing things like allowing relocatable user resident routines...
Not a problem when you're at several MHz. I'll add more a little later. I'd suggest using JSL (Jump to Subroutine Long) to call the Jump Vectors, because it can cross 64KB bank boundaries. All routines should return with RTL. Each vector may use a JMP to the actual routine using opcode $5C (the hex value of the opcode). The routines would invoke an RTL to return to the calling ML program, including the ML Monitor.
The ML Monitor and ML programs could call a kernal routine at $F0:FCxx using JSL $F0FCxx, and the Jump Vector would then hold $5C, xxyyzz (where xxyyzz is the hexadecimal value of the 24-bit memory address). $5C is the opcode value for a JMP with a 24-bit absolute "long" address. The 24-bit address would be the actual starting address of a kernal routine (or any subroutine). The last instruction of the subroutine shall be RTL (long return from subroutine). Remember, you're invoking a long jump to begin with. Doing this will allow you to cross the 64KB boundary using absolute addressing. The 65c816 actually has such addressing modes. They are documented here: https://wiki.nesdev.com/w/images/7/76/Programmanual.pdf
The number of clock cycles won't be a big issue when you are operating at, say, 20 MHz to 21 MHz (the 14 MHz 65c816 could be driven to about 25 MHz without any real problem and within Western Design Center's design tolerance, as long as you keep the CPU cooled, say with a heatsink).
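For anyone who wants to see the bytes, here is a short Python sketch that encodes one vector entry and one call site. The helper names and the example addresses are hypothetical; the opcodes ($22 for JSL, $5C for JMP long) and the low/high/bank operand order come from the 65c816 instruction set.

```python
JSL_OPCODE = 0x22   # JSL, absolute long
JML_OPCODE = 0x5C   # JMP, absolute long

def long_operand(addr):
    """24-bit operand as stored in memory: low byte, high byte, bank byte."""
    return bytes([addr & 0xFF, (addr >> 8) & 0xFF, (addr >> 16) & 0xFF])

def vector_entry(routine_addr):
    """One 4-byte jump vector: JMP long to the real routine."""
    return bytes([JML_OPCODE]) + long_operand(routine_addr)

def call_site(vector_addr):
    """The 4 bytes a caller assembles: JSL to the vector."""
    return bytes([JSL_OPCODE]) + long_operand(vector_addr)

entry = vector_entry(0xF12345)   # hypothetical routine at $F1:2345
call = call_site(0xF0FC04)       # hypothetical vector at $F0:FC04
```

So the vector at $F0:FC04 would contain 5C 45 23 F1, and the caller would contain 22 04 FC F0. Note that the address in the entry is stored low byte first, so "xxyyzz" in the text above appears in memory as zz, yy, xx.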
So the next question is: how do we manage kernel calls? Classically, Commodore has used a vector table: that is, a series of addresses intended to be accessed by an indirect JMP call, such as JMP ($5597).
With a vector, the actual routine isn't hosted at $5597. Instead, the bytes at $5597 contain another address in memory, where the actual routine is located. Vector tables are very efficient, since they only require one execution step, but the indirect JMP opcode takes 5 clock cycles, compared to the 3 cycles used by a regular absolute JMP. However, operating system calls should actually be initiated by a JSR, or Jump to Subroutine, which doesn't have an indirect addressing mode on the 6502 (the 65c816 adds only an indexed indirect form, JSR (addr,X)).
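To put numbers on the cycle question, here is a rough Python calculation comparing a 16-bit call through a jump table (JSR, JMP, RTS) with the 24-bit equivalent (JSL, JMP long, RTL). The cycle counts are the base figures as I understand them from the WDC documentation and should be double-checked there; the 14.318 MHz clock is the figure quoted elsewhere in this thread.

```python
CLOCK_HZ = 14_318_000   # base clock mentioned in the thread

# Base cycle counts per instruction (assumed; verify against the WDC data sheet).
CYCLES = {
    "JSR abs": 6, "JMP abs": 3, "JMP (abs)": 5, "RTS": 6,
    "JSL long": 8, "JMP long": 4, "RTL": 6,
}

SHORT_CALL = CYCLES["JSR abs"] + CYCLES["JMP abs"] + CYCLES["RTS"]
LONG_CALL = CYCLES["JSL long"] + CYCLES["JMP long"] + CYCLES["RTL"]

def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to nanoseconds at the given clock."""
    return cycles * 1e9 / clock_hz

extra_cycles = LONG_CALL - SHORT_CALL
extra_ns = cycles_to_ns(extra_cycles)
print(f"long call costs {extra_cycles} extra cycles, about {extra_ns:.0f} ns")
```

Under these assumptions the full 24-bit call path costs 18 cycles against 15 for the 16-bit path, so the penalty per kernel call is on the order of a fifth of a microsecond at the stock clock.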
Since there are going to be four different subsystems living in ROM (the kernel, BASIC, the monitor, and the editor), I'm going to suggest creating four jump tables.
The first jump table would live at the end of the first bank of kernel ROM, backfilled from $F0:FFFF. The boot code for BASIC, the monitor, and the editor would each live in the subsequent banks, with their jump tables at the end of each bank. If we set aside 1K for each set of jump tables (at 4 bytes per entry, that gives us 256 entries for each API), we would get:
kernel: $F0:FC00
BASIC: $F1:FC00
monitor: $F2:FC00
editor: $F3:FC00
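As a quick check of the arithmetic, this Python sketch computes where a given entry lands in each table. The 4-byte entry size assumes each entry is a JMP long ($5C plus a 24-bit address), and the function name is only illustrative.

```python
TABLE_SIZE = 0x400    # 1K reserved per jump table
ENTRY_SIZE = 4        # assumed: one JMP long per entry

TABLE_BASES = {
    "kernel": 0xF0FC00,
    "BASIC": 0xF1FC00,
    "monitor": 0xF2FC00,
    "editor": 0xF3FC00,
}

def entry_address(subsystem, index):
    """24-bit address of entry number `index` in a subsystem's jump table."""
    if not 0 <= index < TABLE_SIZE // ENTRY_SIZE:
        raise IndexError("jump table index out of range")
    return TABLE_BASES[subsystem] + index * ENTRY_SIZE

first_kernel = entry_address("kernel", 0)
last_editor = entry_address("editor", 255)
print(f"${first_kernel:06X} ${last_editor:06X}")
```

The last entry of each table starts at $xx:FFFC and ends exactly at $xx:FFFF, so the tables fill the top of their banks with no slack.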
It's entirely possible that BASIC and the kernel will take more than one bank of memory. This is perfectly acceptable, since the '816 processor works in 64K banks. Each bank of code would simply need to be managed as a separate source file in source control, with a master file to link them all together in the right order.
So digging some more into the 816's architecture, it looks like you really want to keep the first 64K free for Zero Page (now called Direct Page) operations. I also like CP/M and DOS's model of keeping applications as low in RAM as possible, so application memory can start immediately after Direct Page space.
So here's a simplified memory map based on that idea:
$00:0000-00:00FF: System reserved Direct Page memory.
$00:0100-00:FEFF: Application Direct Page, stack, RS-232, IEC, and USB buffers.
$00:FF00-00:FFFF: Interrupt vectors. These are hard-wired into the CPU, so they have to live in these addresses.
$01:0000-DF:FFFF: Application memory; uses available RAM.
$E0:0000-EF:FFFF: VRAM (1MB, as per initial specs)
$F0:0000-FF:FFFF: Kernel, Monitor, BASIC, Text Editor
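The map can be sanity-checked with a few lines of Python. This sketch assumes application memory stops just below VRAM at $DF:FFFF, so that no two regions overlap; the region names and helper functions are only for illustration.

```python
# (start, end, description), inclusive 24-bit addresses.
MEMORY_MAP = [
    (0x000000, 0x0000FF, "system direct page"),
    (0x000100, 0x00FEFF, "application direct page, stack, I/O buffers"),
    (0x00FF00, 0x00FFFF, "interrupt vectors"),
    (0x010000, 0xDFFFFF, "application memory"),   # assumed to end below VRAM
    (0xE00000, 0xEFFFFF, "VRAM"),
    (0xF00000, 0xFFFFFF, "kernel, monitor, BASIC, text editor"),
]

def region(addr):
    """Name of the region that contains a 24-bit address."""
    for start, end, name in MEMORY_MAP:
        if start <= addr <= end:
            return name
    raise ValueError("address outside the 24-bit space")

def is_gapless(memory_map):
    """True if the regions tile all 16 MB with no gaps or overlaps."""
    expected = 0
    for start, end, _ in memory_map:
        if start != expected:
            return False
        expected = end + 1
    return expected == 0x1000000

app_start, app_end, _ = MEMORY_MAP[3]
app_bytes = app_end - app_start + 1
print(f"application memory: {app_bytes / 2**20:.2f} MB")
```

With those boundaries the application region is 223 banks of 64K, a little under 14 MB.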
I'm not sure yet where the I/O address space will be. Since the memory management unit, GAVIN, will be an FPGA core, I have a feeling we can design this however we want. Provisionally, I'd expect I/O to either use space at the top of the VRAM banks or to sit just below VRAM and be backfilled to the top of installed memory.
Given the choice, I'd rather see something like 64K of address space taken out of the video address space, as this potentially leaves 14MB for user RAM, before doing bank swapping or other tricks.
This sounds like an exciting project. It's been so long since I've written 6502 assembly that I'm basically starting over again.
Do you have a system architecture in mind? Are you going to try to follow the same basic outline as the Commodore 64 or 128, or are you going to start from scratch?
I have a few thoughts on the topic... basically the same things I came up with when pondering how I'd design a 16-bit computer.
Looking at something like CP/M, I like the way it loads the OS into the upper part of memory. This allows user programs to always start at a consistent address, with the first page being reserved for vectoring operating system calls. However, due to the critical nature of Page 0 in the 6502 ISA, I would reserve Page 0 for transient data, instead using a block higher in memory for OS calls.
If I understand correctly, your intent is to use a CPU with a 24-bit address bus, which should give you 16MB of address space. So I'm thinking that the top 1MB of address space could be reserved for the operating system, I/O registers, I/O buffers, and VRAM. This reserves, theoretically, 15MB for user data.
I also noticed that you want to go with smart peripherals (and even included a Commodore-compatible IEC bus). I fully support this idea; making the devices smarter leaves core memory and clock cycles free for running the user's programs, rather than controlling the disk drive. Instead, the CPU could deposit data in a buffer, issue a "write to file" command, then let the I/O controller and disk drive work to get the data out to the drive.
What are you planning on doing for a user-facing OS? Are you going to go with something like a modified BASIC? If so, are you considering including a structured variant (like Microsoft QuickBASIC)? As much as I love BASIC, I have always hated line numbers, so including a text editor and a QuickBASIC-style language would definitely bring the system into the 21st century while also keeping the retro flavor.
Likewise, it's going to need a full machine monitor and debug environment. The debugger doesn't need to be huge, and I can start a separate thread for that (probably for BASIC, too.)