Stack register

A stack register is a central processing unit (CPU) register whose purpose is to keep track of a call stack. On an accumulator-based architecture, this may be a dedicated register. On a machine with multiple general-purpose registers, it may be a register that is reserved by convention, as on the IBM System/360 through z/Architecture line and on RISC architectures, or it may be a register that procedure call and return instructions are hardwired to use, as on the PDP-11, VAX, and Intel x86 architectures. Some designs, such as the Data General Eclipse, had no dedicated register and instead used a reserved hardware memory address for this function.

Machines before the late 1960s, such as the PDP-8 and HP 2100, did not have compilers that supported recursion. Their subroutine instructions typically saved the return address in the word at the jump target and then set the program counter to the following address.^[1] While this is simpler than maintaining a stack, each subroutine has only one return location, so recursion is not possible without considerable effort on the part of the programmer.
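The limitation can be illustrated with a minimal Python sketch of this linkage scheme. It is a toy model rather than an emulation of any particular machine; the names jms and ret, the dictionary used as memory, and all addresses are assumptions chosen for illustration.

```python
# Toy model of "return address in the first word of the subroutine" linkage.
# All names and addresses here are hypothetical.
memory = {}  # word-addressed memory: address -> stored value


def jms(target, return_address):
    """Call: store the return address in the subroutine's first word,
    then continue execution at the following word."""
    memory[target] = return_address
    return target + 1  # new program counter


def ret(target):
    """Return: jump indirectly through the subroutine's first word."""
    return memory[target]


SUB = 100  # assumed address of a subroutine

# Outer call made from address 10, so execution should resume at 11.
pc = jms(SUB, return_address=11)

# Before returning, the subroutine calls itself from address 105.
pc = jms(SUB, return_address=106)  # overwrites the saved value 11

inner_return = ret(SUB)  # the inner call returns to 106 as expected
outer_return = ret(SUB)  # the outer call also sees 106; 11 has been lost
```

Because the single storage slot is overwritten by the nested call, the outer invocation can no longer return to its caller, which is the problem a call stack avoids.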

A stack machine has two or more stack registers: one of them keeps track of a call stack, and the others keep track of other stacks.

Stack registers in x86

On the 8086, the main stack register is called the stack pointer (SP). The stack segment register (SS) usually identifies the memory segment that holds the call stack of the currently executing program. SP points to the current top of the stack. By default, the stack grows downward in memory, so newer values are placed at lower memory addresses. The PUSH instruction saves a value to the stack, and the POP instruction retrieves one.

Example: Assume that SS = 0x1000 and SP = 0xF820. The current top of the stack is then the physical address 0x1F820, because 8086 memory segmentation computes the physical address as SS × 16 + SP = 0x10000 + 0xF820. The next two machine instructions of the program are:

PUSH AX
PUSH BX

The first instruction pushes the value stored in AX (a 16-bit register) onto the stack. The CPU first subtracts 2 (the operand size in bytes) from SP.
The new value of SP becomes 0xF81E. The CPU then copies the value of AX to the memory word whose physical address is 0x1F81E.
When "PUSH BX" is executed, SP is set to 0xF81C and BX is copied to 0x1F81C.^[2]
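The address arithmetic in this example can be checked with a short Python sketch. It models only SS, SP and a dictionary of 16-bit words keyed by physical address; the function names and the values placed in AX and BX are assumptions made for illustration, not part of the original example.

```python
def physical_address(segment, offset):
    """8086 real-mode address: segment * 16 + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def push16(memory, regs, value):
    """Subtract 2 from SP (wrapping at 16 bits), then store the word at SS:SP."""
    regs["SP"] = (regs["SP"] - 2) & 0xFFFF
    addr = physical_address(regs["SS"], regs["SP"])
    memory[addr] = value & 0xFFFF  # memory modelled as address -> 16-bit word
    return addr


# SS and SP come from the example; the AX and BX contents are made up.
regs = {"SS": 0x1000, "SP": 0xF820, "AX": 0x1234, "BX": 0xABCD}
memory = {}

top_before = physical_address(regs["SS"], regs["SP"])
addr_ax = push16(memory, regs, regs["AX"])  # PUSH AX
addr_bx = push16(memory, regs, regs["BX"])  # PUSH BX

print(hex(top_before), hex(addr_ax), hex(addr_bx), hex(regs["SP"]))
# prints: 0x1f820 0x1f81e 0x1f81c 0xf81c
```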

This illustrates how PUSH works. Typically, a running program pushes registers onto the stack so that the registers can be used for other purposes, for example before calling a routine that may change their values. To restore the values saved on the stack, the program contains machine instructions like these:

POP BX
POP AX

POP BX copies the word at 0x1F81C (which is the old value of BX) to BX, then increases SP by 2. SP now is 0xF81E.
POP AX copies the word at 0x1F81E to AX, then sets SP to 0xF820.^{[nb 1]}^{[nb 2]}
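The round trip, and the reason the pop order must mirror the push order, can be sketched in Python as follows. The class Stack8086 and the register contents 0x1111 and 0x2222 are hypothetical; only the SS and SP values come from the example.

```python
class Stack8086:
    """Toy model of the 8086 stack: SS, SP and a dict of 16-bit words."""

    def __init__(self, ss, sp):
        self.ss, self.sp = ss, sp
        self.words = {}  # physical address -> 16-bit word

    def _top(self):
        return ((self.ss << 4) + self.sp) & 0xFFFFF

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF  # SP moves down first
        self.words[self._top()] = value & 0xFFFF

    def pop(self):
        value = self.words[self._top()]   # read the word at the top
        self.sp = (self.sp + 2) & 0xFFFF  # then SP moves back up
        return value


stack = Stack8086(ss=0x1000, sp=0xF820)
ax, bx = 0x1111, 0x2222  # made-up register contents

stack.push(ax)  # PUSH AX
stack.push(bx)  # PUSH BX
ax = bx = 0     # stand-in for a routine that overwrites both registers

bx = stack.pop()               # POP BX
sp_after_first_pop = stack.sp  # 0xF81E
ax = stack.pop()               # POP AX, SP is back at 0xF820

# Popping in the same order as the pushes would exchange the two values.
stack.push(0x1111)
stack.push(0x2222)
swapped_ax = stack.pop()  # receives 0x2222, the value pushed last
swapped_bx = stack.pop()  # receives 0x1111
```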

Stack engine

Simpler processors store the stack pointer in a regular hardware register and use the arithmetic logic unit (ALU) to manipulate its value. Typically, push and pop are each translated into multiple micro-ops: one to add to or subtract from the stack pointer, and one to perform the load or store in memory.^[3]

Newer processors contain a dedicated stack engine to optimize stack operations. The Pentium M was the first x86 processor to include one. In its implementation, the stack pointer is split between two registers: ESP_O, a 32-bit register, and ESP_d, an 8-bit delta value that is updated directly by stack operations. The PUSH, POP, CALL and RET opcodes operate directly on the ESP_d register. If ESP_d is near overflow, or if another instruction references the ESP register while ESP_d ≠ 0, a synchronization micro-op is inserted that updates ESP_O using the ALU and resets ESP_d to 0. This design has remained largely unmodified in later Intel processors, although ESP_O has been expanded to 64 bits.^[4]
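The bookkeeping described above can be approximated with a small Python model. This is a simplified sketch, not a description of the actual hardware: it assumes that ESP_d is a signed 8-bit value, that "near overflow" means the next adjustment would leave that range, and that every push or pop moves the pointer by 4 bytes. The class and method names are hypothetical.

```python
class StackEngine:
    """Toy model: ESP_O changes only in sync micro-ops; ESP_d absorbs pushes and pops."""

    DELTA_MIN, DELTA_MAX = -128, 127  # assumed range of a signed 8-bit delta

    def __init__(self, esp):
        self.esp_o = esp    # full-width stack pointer
        self.esp_d = 0      # small delta maintained by the stack engine
        self.sync_uops = 0  # number of synchronization micro-ops inserted

    def _sync(self):
        """Fold the delta into ESP_O (the step that needs the ALU)."""
        self.esp_o = (self.esp_o + self.esp_d) & 0xFFFFFFFF
        self.esp_d = 0
        self.sync_uops += 1

    def _adjust(self, amount):
        # Synchronize first if the delta would leave its 8-bit range.
        if not (self.DELTA_MIN <= self.esp_d + amount <= self.DELTA_MAX):
            self._sync()
        self.esp_d += amount

    def push(self):
        self._adjust(-4)

    def pop(self):
        self._adjust(+4)

    def read_esp(self):
        """An ordinary instruction that references ESP directly."""
        if self.esp_d != 0:
            self._sync()
        return self.esp_o


engine = StackEngine(esp=0x0012FF00)  # arbitrary starting value
for _ in range(3):
    engine.push()
syncs_after_pushes = engine.sync_uops  # 0: only the delta has changed

esp_seen = engine.read_esp()           # forces one synchronization
engine.read_esp()                      # delta is already 0, no further sync
syncs_after_reads = engine.sync_uops   # 1

for _ in range(40):                    # a long run of pushes overflows the delta once
    engine.push()
```

In this model a run of pushes and pops costs no ALU work until either the delta runs out of range or some other instruction needs the real value of ESP.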

A stack engine similar to Intel's was also adopted in the AMD K8 microarchitecture. In Bulldozer, the need for synchronization micro-ops was removed, but the internal design of the stack engine is not known.^[4]

Notes

^ The program above pops BX first because it was pushed last.
^ On the 8086, the PUSH and POP instructions work only with 16-bit operands.

References

^ Salomon, David (February 1993) [1992]. Written at California State University, Northridge, California, USA. Chivers, Ian D. (ed.). Assemblers and Loaders (PDF). Ellis Horwood Series In Computers And Their Applications (1 ed.). Chichester, West Sussex, UK: Ellis Horwood Limited / Simon & Schuster International Group. ISBN 0-13-052564-2. Archived (PDF) from the original on 2020-03-23. Retrieved 2008-10-01. Most computers save the return address in either the stack, in one of the registers, or in the first word of the procedure (in which case the first executable instruction of the procedure should be stored in the second word). If the latter method is used, a return from the procedure is a jump to the memory location whose address is contained in the first word of the procedure. (xiv+294+4 pages)
^ Howard, Brian. "Assembly Tutorial - Instructions". Computer Science Department, DePauw University. Retrieved 2013-07-19.
^ Stokes, Jon "Hannibal" (2004-02-25). "A Look at Centrino's Core: The Pentium M". archive.arstechnica.com. p. 5.
^ ^a ^b Fog, Agner. "The microarchitecture of Intel, AMD and VIA CPUs" (PDF). Technical University of Denmark.
