6502

The 6502 CPU

The MOS Technology 6502 is an 8-bit microprocessor designed by Chuck Peddle in 1975 for MOS Technology (later purchased by Commodore). When it was introduced, it was by far the least expensive full-featured CPU on the market, at about 1/6th the price, or less, of competing designs from larger companies such as Motorola and Intel. It was nevertheless faster than most of them and, along with the Zilog Z80, sparked a series of computer projects that would eventually result in the home computer revolution of the 1980s. The 6502 design was originally second-sourced by Rockwell and Synertek and later licensed to a number of companies; it is still made for embedded systems.

Originally, the CPC was meant to be designed around the 6502 processor. But when Amstrad approached Locomotive Software to develop a BASIC for it on a very tight deadline, Locomotive, who already had a Z80 BASIC in the works, convinced Amstrad to switch to the Z80.

Description

The 6502 microprocessor is an 8-bit CPU with an 8-bit ALU and a 16-bit address bus capable of direct access to 64KB of memory space. Like the Z80, the 6502 is a little-endian CPU, meaning it stores 16-bit values with the least significant byte first, followed by the most significant byte. The 6502 has 151 documented opcodes, which are composed of 56 distinct instructions across various addressing modes.
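As a minimal sketch of the little-endian layout described above, the following Python snippet stores and reads back a 16-bit value in a simulated 64KB memory. The helper names read_word and write_word are hypothetical and only serve this illustration.

```python
def read_word(memory, address):
    """Read a 16-bit little-endian value: low byte first, then high byte."""
    low = memory[address & 0xFFFF]
    high = memory[(address + 1) & 0xFFFF]
    return low | (high << 8)


def write_word(memory, address, value):
    """Store a 16-bit value with the least significant byte first."""
    memory[address & 0xFFFF] = value & 0xFF
    memory[(address + 1) & 0xFFFF] = (value >> 8) & 0xFF


memory = bytearray(0x10000)          # simulated 64KB address space
write_word(memory, 0x0314, 0xEA31)
print(hex(memory[0x0314]), hex(memory[0x0315]))  # 0x31 0xea
print(hex(read_word(memory, 0x0314)))            # 0xea31
```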

Although it lacks the raw processing power of processors like the Intel 80x86 or the Motorola 68000 series, the 6502 was known for its efficiency and affordability, making it a popular choice for embedded systems and early home computers. Its simple design contributed to lower manufacturing costs and simplified integration.

The 6502 chip is made up of 4528 transistors (3510 enhancement transistors and 1018 depletion pullup transistors). Despite having so few transistors, it is generally viewed as being twice as fast as the Z80 at the same clock speed. Two reasons explain this:

The 6502 has an 8-bit ALU while the Z80 has a 4-bit ALU
The 6502 has a clock doubler circuit built inside the chip

The 6502 comes in a 40-pin DIP package. It has been produced by various manufacturers and used in a wide range of applications, from gaming consoles like the Atari VCS, Atari Lynx, Nintendo Entertainment System and PC-Engine to personal computers like the Apple II, BBC Micro, Oric, VIC20 and Commodore 64.

Registers

Register	Size	Description	Notes
A (Accumulator)	8-bit	Main register for arithmetic, logic, and data transfer	Most operations use this register
X (Index Register X)	8-bit	Used for indexing memory and loop counters	Can be used for addressing modes like Indexed Indirect, Zero Page Indexed, and Absolute Indexed
Y (Index Register Y)	8-bit	Used for indexing memory and loop counters	Often used in Absolute and Zero Page Indexed addressing
P (Processor Status)	8-bit	bit7 - NF - Negative Flag: 1 when result is negative bit6 - VF - Overflow Flag: 1 on signed overflow bit5 - Unused: always set to 1 bit4 - BF - Break Flag: 1 when pushed by instructions (BRK / PHP) and 0 when pushed by interrupts (NMI / IRQ) bit3 - DF - Decimal Mode Flag: 1 when CPU is in Decimal Mode bit2 - IF - Interrupt Disable Flag: when 1, no interrupt will occur (except BRK and NMI) bit1 - ZF - Zero Flag: 1 when all bits of a result are 0 bit0 - CF - Carry Flag: 1 on unsigned overflow	Flags are affected by most operations. BF is not a physical flag implemented in a register. It only appears on the stack when the P register is pushed to it. PHP (Push Processor Status) and PLP (Pull Processor Status) can be used to set or retrieve P directly via the stack. Interrupts (BRK / NMI / IRQ) implicitly push P to the stack. Interrupts returning with RTI will implicitly pull P from the stack. The effect of toggling the IF flag is delayed by 1 instruction when caused by SEI, CLI, or PLP.
S (Stack Pointer)	8-bit	Points to the current location in the stack	Stack is located in page 1 ($0100-$01FF), 8-bit S register is offset to this base
PC (Program Counter)	16-bit	Points to the next instruction to be executed	Automatically increments as instructions are executed
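
The P register layout in the table above can be illustrated with a short Python sketch. It decodes the individual flag bits and builds the byte that lands on the stack, where bit 5 always reads as 1 and BF depends on what caused the push. The names FLAG_BITS, decode_status and pushed_status are hypothetical.

```python
# Bit positions of the flags in P, as listed in the register table.
FLAG_BITS = {"N": 7, "V": 6, "B": 4, "D": 3, "I": 2, "Z": 1, "C": 0}


def decode_status(p):
    """Return a dict mapping each flag name to 0 or 1."""
    return {name: (p >> bit) & 1 for name, bit in FLAG_BITS.items()}


def pushed_status(p, by_instruction):
    """Byte written to the stack when P is pushed.

    Bit 5 is always 1. Bit 4 (BF) is 1 when pushed by BRK / PHP
    and 0 when pushed by NMI / IRQ.
    """
    value = (p | 0x20) & 0xEF
    if by_instruction:
        value |= 0x10
    return value


print(decode_status(0x83))              # N, Z and C set
print(hex(pushed_status(0x00, True)))   # 0x30 (BRK / PHP)
print(hex(pushed_status(0x00, False)))  # 0x20 (NMI / IRQ)
```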

Memory Access

The address space that the 6502 uses is split into pages. There are 256 pages and each page is 256 bytes in size, ranging from page 0 to page 255.
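
To make the page arithmetic concrete, here is a small Python sketch: the high byte of an address is its page number and the low byte is the offset within that page. The function names are hypothetical.

```python
def page_of(address):
    """Page number (0-255): the high byte of a 16-bit address."""
    return (address >> 8) & 0xFF


def offset_in_page(address):
    """Offset within the page (0-255): the low byte of the address."""
    return address & 0xFF


def crosses_page(base, index):
    """True if base + index lands in a different page than base."""
    return page_of(base) != page_of(base + index)


print(page_of(0x0314), offset_in_page(0x0314))  # 3 20
print(crosses_page(0x01FF, 1))                  # True
print(crosses_page(0x0314, 0x10))               # False
```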

In order to make up for the lack of registers, the 6502 includes a zero page addressing mode ($0000-$00FF) that uses only 1 address byte in the instruction instead of the 2 that are needed to address the full 64 KB of memory. This provides fast access to the first 256 bytes of RAM by using shorter instructions.

The stack is permanently located in page 1 ($0100-$01FF) and managed by the 8-bit stack pointer (S), which is typically initialized to $FF. It grows downward as data is pushed onto the stack. The stack is limited to 256 bytes: if more is pushed, the stack pointer wraps around within page 1 and older entries are silently overwritten, so stack depth must be managed by the program.

Instructions PHA and PHP push the accumulator and processor status onto the stack, while PLA and PLP pull them back. Subroutine calls with JSR store the return address on the stack, and RTS retrieves it to continue execution. Similarly, interrupts (BRK) push the program counter and status, while RTI restores them.
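
The stack behaviour described above can be sketched in Python as follows. This is a simplified model, not a cycle-accurate one: the Stack class and the jsr and rts helpers are hypothetical names, and the return address handling follows the instruction table (JSR pushes PC + 2, RTS pulls it and adds 1).

```python
class Stack:
    """Page 1 stack addressed through an 8-bit stack pointer."""

    def __init__(self, memory):
        self.memory = memory
        self.s = 0xFF

    def push(self, value):
        self.memory[0x0100 + self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF   # wraps around inside page 1

    def pull(self):
        self.s = (self.s + 1) & 0xFF
        return self.memory[0x0100 + self.s]


def jsr(stack, pc, target):
    """pc is the address of the JSR opcode; pushes pc + 2, high byte first."""
    ret = (pc + 2) & 0xFFFF
    stack.push(ret >> 8)
    stack.push(ret & 0xFF)
    return target


def rts(stack):
    """Pull the low byte, then the high byte, and add 1 to resume execution."""
    low = stack.pull()
    high = stack.pull()
    return (((high << 8) | low) + 1) & 0xFFFF


memory = bytearray(0x10000)
stack = Stack(memory)
pc = jsr(stack, 0xC000, 0xD000)
print(hex(pc), hex(stack.s))   # 0xd000 0xfd
print(hex(rts(stack)))         # 0xc003
```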

All I/O operations are memory-mapped. There are no port-based I/O instructions.

Memory-mapped ports often have different properties than normal RAM:

A read-only port is what it sounds like. Attempting to write to this address will not affect the contents. It is also possible that reading a port will alter its contents, or alter the contents of other related ports.

A write-only port can be written to, but reading it will result in undefined behavior. The value read from the address is not necessarily what was last stored in it.

Interrupts

6502 machines use the last 6 bytes of their address space to hold a vector table containing (in order) the addresses of the NMI routine, the program's start, and the IRQ routine.

On a RESET, the CPU loads the vector from $FFFC/$FFFD into the program counter and continues fetching instructions from there.

On an NMI, the CPU pushes the low byte and the high byte of the program counter as well as the processor status onto the stack, disables interrupts and loads the vector from $FFFA/$FFFB into the program counter and continues fetching instructions from there.

On an IRQ, the CPU does the same as in the NMI case, but uses the vector at $FFFE/$FFFF.

On a BRK instruction, the CPU does the same as in the IRQ case, but sets BF in the copy of the status register that is saved on the stack.
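
The interrupt entry sequence described above can be modelled with a short Python sketch. It is a simplification under stated assumptions: timing and interrupt hijacking are ignored, RESET is only modelled as a vector fetch, and the names VECTORS, read_vector and enter_interrupt are hypothetical.

```python
VECTORS = {"NMI": 0xFFFA, "RESET": 0xFFFC, "IRQ": 0xFFFE, "BRK": 0xFFFE}


def read_vector(memory, kind):
    """Fetch the little-endian handler address for the given interrupt."""
    address = VECTORS[kind]
    return memory[address] | (memory[address + 1] << 8)


def enter_interrupt(memory, kind, pc, p, s):
    """Model NMI, IRQ or BRK entry. Returns (new_pc, new_p, new_s).

    pc is the return address to save (for BRK, the address of the
    BRK opcode + 2, as given in the instruction table).
    """
    pushed = (p | 0x20) & 0xEF        # bit 5 set, BF clear
    if kind == "BRK":
        pushed |= 0x10                # BF set only in the stacked copy
    for value in (pc >> 8, pc & 0xFF, pushed):
        memory[0x0100 + s] = value & 0xFF
        s = (s - 1) & 0xFF
    return read_vector(memory, kind), p | 0x04, s   # IF is set on entry


memory = bytearray(0x10000)
memory[0xFFFE] = 0x00
memory[0xFFFF] = 0x90
new_pc, new_p, new_s = enter_interrupt(memory, "IRQ", 0x1234, 0x00, 0xFF)
print(hex(new_pc), hex(new_p), hex(new_s))   # 0x9000 0x4 0xfc
```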

The priority sequence for interrupts, from top priority to bottom, is as follows: RESET, BRK, NMI, IRQ. Source at chapter 7.19

Interrupt hijacking

On NMOS, if NMI is asserted during the first 4 ticks of a BRK instruction, the BRK instruction will execute normally at first (PC increments will occur and P will be pushed with BF set to 1), but execution will branch to the NMI vector instead of the IRQ/BRK vector. On CMOS, this situation is correctly handled by executing BRK and then servicing the interrupt.

An IRQ can also hijack a BRK, though it won't be as visible since they use the same interrupt vector.

Similarly, an NMI can hijack an IRQ. But this is not usually a problem because the IRQ will normally still be asserted when the NMI returns and generate a new interrupt.

Branch instructions and Interrupts

The branch instructions have subtle interrupt polling behavior. When executing a branch instruction, the 6502 checks for interrupts before fetching the operand (cycle 2).

If the branch is taken (i.e., the CPU decides to jump), it does not check for interrupts again before proceeding unless the branch crosses a page boundary (like moving from memory address $01FF to $0200).

If the branch crosses a page boundary, the CPU checks for interrupts once more before fixing the program counter.

If an interrupt is detected at any of these points (before the operand fetch or the page boundary fixup), the CPU will handle the interrupt immediately, interrupting the branch execution.

Decimal Mode

BCD operations are limited to addition and subtraction using the ADC and SBC instructions.

On NMOS, when Decimal Mode is on, the ADC and SBC instructions update NF, VF and ZF based on the binary result before the decimal correction is applied. Only CF is updated correctly. On CMOS, all the flags are updated correctly, at the cost of 1 additional cycle.

On NMOS, DF is not defined after RESET. On CMOS, DF is automatically cleared on RESET.

On NMOS, DF is unchanged when entering an interrupt of any kind. This can cause unexpected bugs in the interrupt handler if Decimal Mode is on when an interrupt occurs. On CMOS, DF is automatically cleared on interrupt. Upon returning from an interrupt, the processor restores the status register from the stack, including DF.
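
As a rough illustration of Decimal Mode addition, the Python sketch below adds two packed BCD bytes and also shows how the NMOS zero flag, which is derived from the binary sum, can disagree with the decimal result. It assumes both operands are valid BCD; the function names are hypothetical and invalid BCD inputs are not modelled.

```python
def adc_decimal(a, m, carry):
    """Add two packed BCD bytes plus carry. Returns (result, carry_out).

    Assumes both operands hold valid BCD digits (0-9 in each nibble).
    """
    low = (a & 0x0F) + (m & 0x0F) + carry
    high = (a >> 4) + (m >> 4)
    if low > 9:
        low -= 10
        high += 1
    carry_out = 0
    if high > 9:
        high -= 10
        carry_out = 1
    return (high << 4) | low, carry_out


def nmos_zero_flag(a, m, carry):
    """ZF as set on NMOS: based on the binary sum, before decimal correction."""
    return 1 if ((a + m + carry) & 0xFF) == 0 else 0


print(adc_decimal(0x19, 0x28, 0))     # (71, 0), i.e. $47: 19 + 28 = 47
print(adc_decimal(0x99, 0x01, 0))     # (0, 1), i.e. $00 with carry: 99 + 1 = 100
print(nmos_zero_flag(0x99, 0x01, 0))  # 0, although the decimal result is $00
```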

Half Cycles

The 6502 divides each clock cycle into two phases (ϕ1 and ϕ2):

During the ϕ1 half-cycle, no bus access occurs. This phase is dedicated to internal CPU operations.
During the ϕ2 half-cycle, the CPU accesses the external bus for memory reads/writes or I/O operations.

The use of half-cycles ensures that memory and I/O devices have predictable timing windows when the CPU will access the bus, while still allowing the CPU to perform internal operations in parallel.

Unlike most microprocessors, the 6502 does not make memory accesses on an "as needed" basis. It always does a fetch or store on every single clock cycle. When there isn't anything to be fetched or stored, a "garbage" fetch or store occurs. This is mainly of importance with the memory-mapped I/O devices:

On NMOS, when adding a carry to the MSB of an address, a fetch occurs at a garbage address. On CMOS, the last byte of the instruction is refetched.
On NMOS, when doing a fetch-modify-store instruction (INC, DEC, ASL, LSR, ROL, ROR), garbage is stored into the location during the "modify" cycle... followed by the "real" store cycle which stores the correct data. On CMOS, a second fetch is performed instead of a garbage store.

Pipelining

The 6502 CPU uses a limited form of pipelining. If an instruction does not store data in memory on its last cycle, the processor can fetch the opcode of the next instruction while executing the last cycle. This is very primitive, as the 6502 has neither an instruction cache nor even a prefetch queue. It relies on RAM to hold all program information.

As an example, the instruction EOR #$FF truly takes 3 cycles:

On the first cycle, the opcode $49 will be fetched
During the second cycle the processor decodes the opcode and fetches the parameter #$FF
On the third cycle, the processor will perform the operation and store the result in register A, but simultaneously it fetches the opcode for the next instruction

This is why the EOR instruction effectively takes only 2 cycles.

However, this pipelining only makes sense when looking at full cycles. If we break it down into half-cycles, there's no actual overlap. In fact, it's the other way around. If the previous instruction ends with a memory write, the CPU has to wait for a half-cycle before fetching the next instruction on the next ϕ2 half-cycle.

Addressing Modes

Addressing Mode	Example	Operation
Immediate	LDA #$EA	A ← $EA
Absolute	LDA $0314	A ← M($0314)
Absolute,X	LDA $0314,X	A ← M($0314+X)
Absolute,Y	LDA $0314,Y	A ← M($0314+Y)
Zeropage	LDA $02	A ← M($02)
Zeropage,X	LDA $02,X	A ← M($02+X)
Zeropage,Y	LDA $02,Y	A ← M($02+Y)
(Zeropage,X)	LDA ($02,X)	A ← M(PTR($02+X))
(Zeropage),Y	LDA ($02),Y	A ← M(PTR($02)+Y)
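
The operations in the table above can be sketched as an effective-address calculation in Python. This is an illustrative model only: the mode strings and function names are hypothetical, and it assumes the usual 6502 behaviour that zero page indexing and zero page pointer fetches wrap around within page 0.

```python
def read_pointer_zp(memory, zp):
    """Read a 16-bit pointer from zero page, wrapping inside page 0."""
    low = memory[zp & 0xFF]
    high = memory[(zp + 1) & 0xFF]
    return low | (high << 8)


def effective_address(memory, mode, operand, x=0, y=0):
    """Return the address that the instruction reads from or writes to."""
    if mode == "absolute":
        return operand & 0xFFFF
    if mode == "absolute,x":
        return (operand + x) & 0xFFFF
    if mode == "absolute,y":
        return (operand + y) & 0xFFFF
    if mode == "zeropage":
        return operand & 0xFF
    if mode == "zeropage,x":
        return (operand + x) & 0xFF
    if mode == "zeropage,y":
        return (operand + y) & 0xFF
    if mode == "(zeropage,x)":
        return read_pointer_zp(memory, operand + x)
    if mode == "(zeropage),y":
        return (read_pointer_zp(memory, operand) + y) & 0xFFFF
    raise ValueError(f"unknown addressing mode: {mode}")


memory = bytearray(0x10000)
memory[0x02] = 0x00   # pointer at $02/$03 holds $8000
memory[0x03] = 0x80
print(hex(effective_address(memory, "absolute,x", 0x0314, x=4)))     # 0x318
print(hex(effective_address(memory, "zeropage,x", 0xF0, x=0x20)))    # 0x10
print(hex(effective_address(memory, "(zeropage,x)", 0x00, x=2)))     # 0x8000
print(hex(effective_address(memory, "(zeropage),y", 0x02, y=0x10)))  # 0x8010
```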

NMOS 6502 Instruction Set

Standard instructions

Cycles are shown in parentheses for each opcode. p=1 if a page boundary is crossed. t=1 if the branch is taken.

Mnemonic	Addressing Modes													Flags							Operation	Description
Mnemonic	No arg	A	#$nn	$nnnn	$nnnn,X	$nnnn,Y	($nnnn)	$nn	$nn,X	$nn,Y	($nn,X)	($nn),Y	rel	N	V	B	D	I	Z	C	Operation	Description
ADC			69 (2)	6D (4)	7D (4+p)	79 (4+p)		65 (3)	75 (4)		61 (6)	71 (5+p)		N	V	-	-	-	Z	C	A + M + CF → A, CF	Add Memory to Accumulator with Carry
AND			29 (2)	2D (4)	3D (4+p)	39 (4+p)		25 (3)	35 (4)		21 (6)	31 (5+p)		N	-	-	-	-	Z	-	A ∧ M → A	"AND" Memory with Accumulator
ASL		0A (2)		0E (6)	1E (7)			06 (5)	16 (6)					N	-	-	-	-	Z	C	CF ← /M₇...M₀/ ← 0	Arithmetic Shift Left
BCC													90 (2+t+p)	-	-	-	-	-	-	-	Branch on CF = 0	Branch on Carry Clear
BCS													B0 (2+t+p)	-	-	-	-	-	-	-	Branch on CF = 1	Branch on Carry Set
BEQ													F0 (2+t+p)	-	-	-	-	-	-	-	Branch on ZF = 1	Branch on Result Zero
BIT				2C (4)				24 (3)						N	V	-	-	-	Z	-	A ∧ M, M₇ → NF, M₆ → VF	Test Bits in Memory with Accumulator
BMI													30 (2+t+p)	-	-	-	-	-	-	-	Branch on NF = 1	Branch on Result Minus
BNE													D0 (2+t+p)	-	-	-	-	-	-	-	Branch on ZF = 0	Branch on Result Not Zero
BPL													10 (2+t+p)	-	-	-	-	-	-	-	Branch on NF = 0	Branch on Result Plus
BRK	00 (7)													-	-	1	-	1	-	-	PC + 2↓, [FFFE] → PCL, [FFFF] → PCH	Force Interrupt
BVC													50 (2+t+p)	-	-	-	-	-	-	-	Branch on VF = 0	Branch on Overflow Clear
BVS													70 (2+t+p)	-	-	-	-	-	-	-	Branch on VF = 1	Branch on Overflow Set
CLC	18 (2)													-	-	-	-	-	-	0	0 → CF	Clear Carry Flag
CLD	D8 (2)													-	-	-	0	-	-	-	0 → DF	Clear Decimal Mode
CLI	58 (2)													-	-	-	-	0	-	-	0 → IF	Clear Interrupt Disable
CLV	B8 (2)													-	0	-	-	-	-	-	0 → VF	Clear Overflow Flag
CMP			C9 (2)	CD (4)	DD (4+p)	D9 (4+p)		C5 (3)	D5 (4)		C1 (6)	D1 (5+p)		N	-	-	-	-	Z	C	A - M	Compare Memory and Accumulator
CPX			E0 (2)	EC (4)				E4 (3)						N	-	-	-	-	Z	C	X - M	Compare Index Register X To Memory
CPY			C0 (2)	CC (4)				C4 (3)						N	-	-	-	-	Z	C	Y - M	Compare Index Register Y To Memory
DEC				CE (6)	DE (7)			C6 (5)	D6 (6)					N	-	-	-	-	Z	-	M - 1 → M	Decrement Memory By One
DEX	CA (2)													N	-	-	-	-	Z	-	X - 1 → X	Decrement Index Register X By One
DEY	88 (2)													N	-	-	-	-	Z	-	Y - 1 → Y	Decrement Index Register Y By One
EOR			49 (2)	4D (4)	5D (4+p)	59 (4+p)		45 (3)	55 (4)		41 (6)	51 (5+p)		N	-	-	-	-	Z	-	A ⊻ M → A	"Exclusive OR" Memory with Accumulator
INC				EE (6)	FE (7)			E6 (5)	F6 (6)					N	-	-	-	-	Z	-	M + 1 → M	Increment Memory By One
INX	E8 (2)													N	-	-	-	-	Z	-	X + 1 → X	Increment Index Register X By One
INY	C8 (2)													N	-	-	-	-	Z	-	Y + 1 → Y	Increment Index Register Y By One
JMP				4C (3)			6C (5)							-	-	-	-	-	-	-	[PC + 1] → PCL, [PC + 2] → PCH	Jump
JSR				20 (6)										-	-	-	-	-	-	-	PC + 2↓, [PC + 1] → PCL, [PC + 2] → PCH	Jump To Subroutine
LDA			A9 (2)	AD (4)	BD (4+p)	B9 (4+p)		A5 (3)	B5 (4)		A1 (6)	B1 (5+p)		N	-	-	-	-	Z	-	M → A	Load Accumulator with Memory
LDX			A2 (2)	AE (4)		BE (4+p)		A6 (3)		B6 (4)				N	-	-	-	-	Z	-	M → X	Load Index Register X From Memory
LDY			A0 (2)	AC (4)	BC (4+p)			A4 (3)	B4 (4)					N	-	-	-	-	Z	-	M → Y	Load Index Register Y From Memory
LSR		4A (2)		4E (6)	5E (7)			46 (5)	56 (6)					0	-	-	-	-	Z	C	0 → /M₇...M₀/ → CF	Logical Shift Right
NOP	EA (2)													-	-	-	-	-	-	-	No operation	No Operation
ORA			09 (2)	0D (4)	1D (4+p)	19 (4+p)		05 (3)	15 (4)		01 (6)	11 (5+p)		N	-	-	-	-	Z	-	A ∨ M → A	"OR" Memory with Accumulator
PHA	48 (3)													-	-	-	-	-	-	-	A↓	Push Accumulator On Stack
PHP	08 (3)													-	-	1	-	-	-	-	P↓	Push Processor Status on Stack
PLA	68 (4)													N	-	-	-	-	Z	-	(S)↑ → A	Pull Accumulator From Stack
PLP	28 (4)													N	V	-	D	I	Z	C	(S)↑ → P	Pull Processor Status From Stack
ROL		2A (2)		2E (6)	3E (7)			26 (5)	36 (6)					N	-	-	-	-	Z	C	CF ← /M₇...M₀/ ← CF	Rotate One Bit Left (Memory or Accumulator)
ROR		6A (2)		6E (6)	7E (7)			66 (5)	76 (6)					N	-	-	-	-	Z	C	CF → /M₇...M₀/ → CF	Rotate One Bit Right (Memory or Accumulator)
RTI	40 (6)													N	V	-	D	I	Z	C	(S)↑ → P, (S)↑ → PCL, (S)↑ → PCH	Return From Interrupt
RTS	60 (6)													-	-	-	-	-	-	-	(S)↑ → PCL, (S)↑ → PCH, PC + 1 → PC	Return From Subroutine
SBC			E9 (2)	ED (4)	FD (4+p)	F9 (4+p)		E5 (3)	F5 (4)		E1 (6)	F1 (5+p)		N	V	-	-	-	Z	C	A - M - (1 - CF) → A	Subtract Memory from Accumulator with Borrow
SEC	38 (2)													-	-	-	-	-	-	1	1 → CF	Set Carry Flag
SED	F8 (2)													-	-	-	1	-	-	-	1 → DF	Set Decimal Mode
SEI	78 (2)													-	-	-	-	1	-	-	1 → IF	Set Interrupt Disable
STA				8D (4)	9D (5)	99 (5)		85 (3)	95 (4)		81 (6)	91 (6)		-	-	-	-	-	-	-	A → M	Store Accumulator in Memory
STX				8E (4)				86 (3)		96 (4)				-	-	-	-	-	-	-	X → M	Store Index X in Memory
STY				8C (4)				84 (3)	94 (4)					-	-	-	-	-	-	-	Y → M	Store Index Y in Memory
TAX	AA (2)													N	-	-	-	-	Z	-	A → X	Transfer Accumulator to Index X
TAY	A8 (2)													N	-	-	-	-	Z	-	A → Y	Transfer Accumulator to Index Y
TSX	BA (2)													N	-	-	-	-	Z	-	S → X	Transfer Stack Pointer to Index X
TXA	8A (2)													N	-	-	-	-	Z	-	X → A	Transfer Index X to Accumulator
TXS	9A (2)													-	-	-	-	-	-	-	X → S	Transfer Index X to Stack Pointer
TYA	98 (2)													N	-	-	-	-	Z	-	Y → A	Transfer Index Y to Accumulator
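
The 2+t+p cycle count listed for the branch instructions can be worked through with a small Python sketch. It assumes that the page crossing is measured between the address of the instruction following the branch and the branch target; the function name branch is hypothetical.

```python
def branch(pc, offset, taken):
    """Model a relative branch located at address pc.

    offset is the raw operand byte (signed 8-bit). Returns (next_pc, cycles).
    """
    next_pc = (pc + 2) & 0xFFFF            # branches are 2 bytes long
    if not taken:
        return next_pc, 2                  # t = 0
    signed = offset - 0x100 if offset & 0x80 else offset
    target = (next_pc + signed) & 0xFFFF
    p = 1 if (target >> 8) != (next_pc >> 8) else 0
    return target, 2 + 1 + p               # t = 1, plus p on page crossing


print(branch(0x1000, 0xFE, False))  # (4098, 2): not taken, falls through to $1002
print(branch(0x1000, 0xFE, True))   # (4096, 3): branches back to $1000, same page
print(branch(0x10FD, 0x02, True))   # (4353, 4): $10FF to $1101 crosses a page
```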

Illegal instructions

Opcodes in red are unstable. Only 2 of those 7 opcodes ($8B, $AB) are actually unstable in the sense that they may produce a truly unpredictable result. The other 5 opcodes actually produce predictable results – but the conditions under which they do that and the produced results are a bit unexpected.

Mnemonic	Combines	Addressing Modes										Flags							Operation	Description
Mnemonic	Combines	No arg	#$nn	$nnnn	$nnnn,X	$nnnn,Y	$nn	$nn,X	$nn,Y	($nn,X)	($nn),Y	N	V	B	D	I	Z	C	Operation	Description
ANC (ANC2)	AND + ASL/ROL		0B, 2B (2)									N	-	-	-	-	Z	C	A ∧ M → A, NF → CF	"AND" Memory with Accumulator then Move Negative Flag to Carry Flag
ARR	AND + ROR		6B (2)									N	V	-	-	-	Z	C	(A ∧ M) / 2 → A	"AND" Accumulator then Rotate Right
ASR (ALR)	AND + LSR		4B (2)									0	-	-	-	-	Z	C	(A ∧ M) / 2 → A	"AND" then Logical Shift Right
DCP (DCM)	DEC + CMP			CF (6)	DF (7)	DB (7)	C7 (5)	D7 (6)		C3 (8)	D3 (8)	N	-	-	-	-	Z	C	M - 1 → M, A - M	Decrement Memory By One then Compare with Accumulator
ISC (ISB, INS)	INC + SBC			EF (6)	FF (7)	FB (7)	E7 (5)	F7 (6)		E3 (8)	F3 (8)	N	V	-	-	-	Z	C	M + 1 → M, A - M - ~CF → A	Increment Memory By One then Subtract Memory from Accumulator with Borrow
JAM (KIL, HLT)		02, 12, 22, 32, 42, 52, 62, 72, 92, B2, D2, F2 (X)										-	-	-	-	-	-	-	Stop execution	Halt the CPU
LAS (LAR)	STA/TXS + LDA/STX					BB (4+p)						N	-	-	-	-	Z	-	M ∧ S → A, X, S	"AND" Memory with Stack Pointer
LAX (LXA)	LDA + LDX		AB (2)	AF (4)		BF (4+p)	A7 (3)		B7 (4)	A3 (6)	B3 (5+p)	N	-	-	-	-	Z	-	M → A, X	Load Accumulator and Index Register X From Memory
NOP (DOP, TOP)		1A, 3A, 5A, 7A, DA, FA (2)	80, 82, 89, C2, E2 (2)	0C (4)	1C, 3C, 5C, 7C, DC, FC (4+p)		04, 44, 64 (3)	14, 34, 54, 74, D4, F4 (4)				-	-	-	-	-	-	-	No operation	No Operation
RLA	ROL + AND			2F (6)	3F (7)	3B (7)	27 (5)	37 (6)		23 (8)	33 (8)	N	-	-	-	-	Z	C	CF ← /M7...M0/ ← CF, A ∧ M → A	Rotate Left then "AND" with Accumulator
RRA	ROR + ADC			6F (6)	7F (7)	7B (7)	67 (5)	77 (6)		63 (8)	73 (8)	N	V	-	-	-	Z	C	CF → /M7...M0/ → CF, A + M + CF → A	Rotate Right and Add Memory to Accumulator
SAX (AXS, AAX)	STA + STX			8F (4)			87 (3)		97 (4)	83 (6)		-	-	-	-	-	-	-	A ∧ X → M	Store Accumulator "AND" Index Register X in Memory
SBC (USBC)	SBC + NOP		EB (2)									N	V	-	-	-	Z	C	A - M - ~CF → A	Subtract Memory from Accumulator with Borrow
SBX (AXS, SAX)	CMP + DEX		CB (2)									N	-	-	-	-	Z	C	(A ∧ X) - M → X	Subtract Memory from Accumulator "AND" Index Register X
SHA (AHX, AXA)	STA/STX/STY					9F (5)					93 (6)	-	-	-	-	-	-	-	A ∧ X ∧ V → M	Store Accumulator "AND" Index Register X "AND" Value
SHS (TAS, XAS)	STA/TXS + LDA/TSX					9B (5)						-	-	-	-	-	-	-	A ∧ X → S, S ∧ (H + 1) → M	Transfer Accumulator "AND" Index Register X to Stack Pointer then Store Stack Pointer "AND" Hi-Byte In Memory
SHX (SXA, XAS)	STA/STX/STY					9E (5)						-	-	-	-	-	-	-	X ∧ (H + 1) → M	Store Index Register X "AND" Value
SHY (SYA, SAY)	STA/STX/STY				9C (5)							-	-	-	-	-	-	-	Y ∧ (H + 1) → M	Store Index Register Y "AND" Value
SLO (ASO)	ASL + ORA			0F (6)	1F (7)	1B (7)	07 (5)	17 (6)		03 (8)	13 (8)	N	-	-	-	-	Z	C	M * 2 → M, A ∨ M → A	Arithmetic Shift Left then "OR" Memory with Accumulator
SRE (LSE)	LSR + EOR			4F (6)	5F (7)	5B (7)	47 (5)	57 (6)		43 (8)	53 (8)	N	-	-	-	-	Z	C	M / 2 → M, A ⊻ M → A	Logical Shift Right then "Exclusive OR" Memory with Accumulator
XAA (ANE)	TXA + AND		8B (2)									N	-	-	-	-	Z	-	(A ∨ V) ∧ X ∧ M → A	Non-deterministic Operation of Accumulator, Index Register X, Memory and Bus Contents

Opcodes

The 6502 follows a 3-3-2 opcode bit pattern. If we arrange the opcode table in a slightly different way than it is usually done, we can observe some interesting symmetries:

Opc	Mnemonic
00	BRK
04	NOP zpg
08	PHP
0C	NOP abs
10	BPL rel
14	*NOP zpg,X*
18	CLC
1C	*NOP abs,X*

Opc	Mnemonic
20	JSR abs
24	BIT zpg
28	PLP
2C	BIT abs
30	BMI rel
34	*NOP zpg,X*
38	SEC
3C	*NOP abs,X*

Opc	Mnemonic
40	RTI
44	NOP zpg
48	PHA
4C	JMP abs
50	BVC rel
54	*NOP zpg,X*
58	CLI
5C	*NOP abs,X*

Opc	Mnemonic
60	RTS
64	NOP zpg
68	PLA
6C	JMP (abs)
70	BVS rel
74	*NOP zpg,X*
78	SEI
7C	*NOP abs,X*

Opc	Mnemonic
80	NOP #
84	STY zpg
88	DEY
8C	STY abs
90	BCC rel
94	STY zpg,X
98	TYA
9C	*SHY abs,X*

Opc	Mnemonic
A0	LDY #
A4	LDY zpg
A8	TAY
AC	LDY abs
B0	BCS rel
B4	LDY zpg,X
B8	CLV
BC	LDY abs,X

Opc	Mnemonic
C0	CPY #
C4	CPY zpg
C8	INY
CC	CPY abs
D0	BNE rel
D4	*NOP zpg,X*
D8	CLD
DC	*NOP abs,X*

Opc	Mnemonic
E0	CPX #
E4	CPX zpg
E8	INX
EC	CPX abs
F0	BEQ rel
F4	*NOP zpg,X*
F8	SED
FC	*NOP abs,X*

Opc	Mnemonic
01	ORA (zpg,X)
05	ORA zpg
09	ORA #
0D	ORA abs
11	ORA (zpg),Y
15	ORA zpg,X
19	ORA abs,Y
1D	ORA abs,X

Opc	Mnemonic
21	AND (zpg,X)
25	AND zpg
29	AND #
2D	AND abs
31	AND (zpg),Y
35	AND zpg,X
39	AND abs,Y
3D	AND abs,X

Opc	Mnemonic
41	EOR (zpg,X)
45	EOR zpg
49	EOR #
4D	EOR abs
51	EOR (zpg),Y
55	EOR zpg,X
59	EOR abs,Y
5D	EOR abs,X

Opc	Mnemonic
61	ADC (zpg,X)
65	ADC zpg
69	ADC #
6D	ADC abs
71	ADC (zpg),Y
75	ADC zpg,X
79	ADC abs,Y
7D	ADC abs,X

Opc	Mnemonic
81	STA (zpg,X)
85	STA zpg
89	NOP #
8D	STA abs
91	STA (zpg),Y
95	STA zpg,X
99	STA abs,Y
9D	STA abs,X

Opc	Mnemonic
A1	LDA (zpg,X)
A5	LDA zpg
A9	LDA #
AD	LDA abs
B1	LDA (zpg),Y
B5	LDA zpg,X
B9	LDA abs,Y
BD	LDA abs,X

Opc	Mnemonic
C1	CMP (zpg,X)
C5	CMP zpg
C9	CMP #
CD	CMP abs
D1	CMP (zpg),Y
D5	CMP zpg,X
D9	CMP abs,Y
DD	CMP abs,X

Opc	Mnemonic
E1	SBC (zpg,X)
E5	SBC zpg
E9	SBC #
ED	SBC abs
F1	SBC (zpg),Y
F5	SBC zpg,X
F9	SBC abs,Y
FD	SBC abs,X

Opc	Mnemonic
02	JAM
06	ASL zpg
0A	ASL A
0E	ASL abs
12	JAM
16	ASL zpg,X
1A	NOP
1E	ASL abs,X

Opc	Mnemonic
22	JAM
26	ROL zpg
2A	ROL A
2E	ROL abs
32	JAM
36	ROL zpg,X
3A	NOP
3E	ROL abs,X

Opc	Mnemonic
42	JAM
46	LSR zpg
4A	LSR A
4E	LSR abs
52	JAM
56	LSR zpg,X
5A	NOP
5E	LSR abs,X

Opc	Mnemonic
62	JAM
66	ROR zpg
6A	ROR A
6E	ROR abs
72	JAM
76	ROR zpg,X
7A	NOP
7E	ROR abs,X

Opc	Mnemonic
82	NOP #
86	STX zpg
8A	TXA
8E	STX abs
92	JAM
96	STX zpg,Y
9A	TXS
9E	*SHX abs,Y*

Opc	Mnemonic
A2	LDX #
A6	LDX zpg
AA	TAX
AE	LDX abs
B2	JAM
B6	LDX zpg,Y
BA	TSX
BE	LDX abs,Y

Opc	Mnemonic
C2	NOP #
C6	DEC zpg
CA	DEX
CE	DEC abs
D2	JAM
D6	DEC zpg,X
DA	NOP
DE	DEC abs,X

Opc	Mnemonic
E2	NOP #
E6	INC zpg
EA	NOP
EE	INC abs
F2	JAM
F6	INC zpg,X
FA	NOP
FE	INC abs,X

Opc	Mnemonic
03	*SLO (zpg,X)*
07	SLO zpg
0B	ANC #
0F	SLO abs
13	*SLO (zpg),Y*
17	*SLO zpg,X*
1B	*SLO abs,Y*
1F	*SLO abs,X*

Opc	Mnemonic
23	*RLA (zpg,X)*
27	RLA zpg
2B	ANC #
2F	RLA abs
33	*RLA (zpg),Y*
37	*RLA zpg,X*
3B	*RLA abs,Y*
3F	*RLA abs,X*

Opc	Mnemonic
43	*SRE (zpg,X)*
47	SRE zpg
4B	ASR #
4F	SRE abs
53	*SRE (zpg),Y*
57	*SRE zpg,X*
5B	*SRE abs,Y*
5F	*SRE abs,X*

Opc	Mnemonic
63	*RRA (zpg,X)*
67	RRA zpg
6B	ARR #
6F	RRA abs
73	*RRA (zpg),Y*
77	*RRA zpg,X*
7B	*RRA abs,Y*
7F	*RRA abs,X*

Opc	Mnemonic
83	*SAX (zpg,X)*
87	SAX zpg
8B	XAA #
8F	SAX abs
93	*SHA (zpg),Y*
97	*SAX zpg,Y*
9B	*SHS abs,Y*
9F	*SHA abs,Y*

Opc	Mnemonic
A3	*LAX (zpg,X)*
A7	LAX zpg
AB	LXA #
AF	LAX abs
B3	*LAX (zpg),Y*
B7	*LAX zpg,Y*
BB	*LAS abs,Y*
BF	*LAX abs,Y*

Opc	Mnemonic
C3	*DCP (zpg,X)*
C7	DCP zpg
CB	SBX #
CF	DCP abs
D3	*DCP (zpg),Y*
D7	*DCP zpg,X*
DB	*DCP abs,Y*
DF	*DCP abs,X*

Opc	Mnemonic
E3	*ISC (zpg,X)*
E7	ISC zpg
EB	SBC #
EF	ISC abs
F3	*ISC (zpg),Y*
F7	*ISC zpg,X*
FB	*ISC abs,Y*
FF	*ISC abs,X*

Opcodes in bold are illegal. Opcodes in red are unstable.
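
The 3-3-2 bit pattern behind the tables above can be sketched in Python: an opcode splits into fields aaa (bits 7-5), bbb (bits 4-2) and cc (bits 1-0). The example decodes only the cc = 01 group, using the order of operations and addressing modes that can be read off the tables; the names split_opcode, decode_alu, ALU_OPS and ALU_MODES are hypothetical.

```python
# Order taken from the cc = 01 tables (opcodes $01-$FD).
ALU_OPS = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]
ALU_MODES = ["(zpg,X)", "zpg", "#", "abs", "(zpg),Y", "zpg,X", "abs,Y", "abs,X"]


def split_opcode(opcode):
    """Split an opcode byte into its (aaa, bbb, cc) fields."""
    return (opcode >> 5) & 0b111, (opcode >> 2) & 0b111, opcode & 0b11


def decode_alu(opcode):
    """Decode an opcode of the cc = 01 group, or return None for other groups."""
    aaa, bbb, cc = split_opcode(opcode)
    if cc != 0b01:
        return None
    if opcode == 0x89:          # the "STA #" slot is an illegal NOP
        return "NOP #"
    return f"{ALU_OPS[aaa]} {ALU_MODES[bbb]}"


print(split_opcode(0xA9))   # (5, 2, 1)
print(decode_alu(0xA9))     # LDA #
print(decode_alu(0x91))     # STA (zpg),Y
print(decode_alu(0x4C))     # None (JMP abs belongs to the cc = 00 group)
```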

Oddities

On NMOS, an indirect JMP will behave unexpectedly when the indirect address crosses a page boundary, because the 6502 does not add the carry to calculate the address of the high byte. For example, JMP ($19FF) will use the contents of $19FF and $1900 for the JMP address. On CMOS, this issue was fixed, at the cost of 1 additional cycle. In our example, JMP ($19FF) will use the contents of $19FF and $1A00 for the JMP address.
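
The indirect JMP behaviour can be sketched in Python as follows. On NMOS only the low byte of the pointer is incremented, so the high byte of the target is fetched from the start of the same page; on CMOS the full 16-bit pointer is incremented. The function name and the cmos parameter are hypothetical.

```python
def jmp_indirect_target(memory, pointer, cmos=False):
    """Return the address that JMP (pointer) jumps to."""
    low = memory[pointer & 0xFFFF]
    if cmos:
        high_address = (pointer + 1) & 0xFFFF
    else:
        # NMOS: no carry into the high byte of the pointer
        high_address = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    return low | (memory[high_address] << 8)


memory = bytearray(0x10000)
memory[0x19FF] = 0x34
memory[0x1900] = 0x12   # read by NMOS parts
memory[0x1A00] = 0x56   # read by CMOS parts
print(hex(jmp_indirect_target(memory, 0x19FF)))             # 0x1234
print(hex(jmp_indirect_target(memory, 0x19FF, cmos=True)))  # 0x5634
```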

Some instructions, particularly those involving branches or indexed addressing modes, incur an extra cycle if the processor has to cross a memory page boundary. This is problematic for time-sensitive code.

Conditional branches are only 8-bit relative, and unconditional jumps are only 16-bit absolute.

ADC is the only command for addition. To perform an addition without carry, the carry flag must be cleared manually first. Same with SBC for subtract.
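
A minimal Python sketch of binary-mode ADC and SBC shows why the carry flag has to be prepared first: clear it (CLC) before an addition, set it (SEC) before a subtraction, and let it propagate between bytes in a multi-byte operation. The function names are hypothetical; SBC is modelled as ADC of the inverted operand, which matches A - M - (1 - CF) in binary mode.

```python
def adc(a, m, carry):
    """Binary-mode ADC. Returns (result, flags)."""
    total = a + m + carry
    result = total & 0xFF
    flags = {
        "C": 1 if total > 0xFF else 0,
        "Z": 1 if result == 0 else 0,
        "N": result >> 7,
        # Signed overflow: operands share a sign that differs from the result's
        "V": 1 if (~(a ^ m) & (a ^ result) & 0x80) else 0,
    }
    return result, flags


def sbc(a, m, carry):
    """Binary-mode SBC: A - M - (1 - CF), done as ADC of the inverted operand."""
    return adc(a, m ^ 0xFF, carry)


def add16(x, y):
    """16-bit addition: CLC, add the low bytes, then add the high bytes with carry."""
    low, flags = adc(x & 0xFF, y & 0xFF, 0)        # CLC before the first ADC
    high, flags = adc(x >> 8, y >> 8, flags["C"])  # carry propagates
    return (high << 8) | low, flags["C"]


print(adc(0x50, 0x50, 0))            # (160, {'C': 0, 'Z': 0, 'N': 1, 'V': 1})
print(sbc(0x05, 0x03, 1)[0])         # 2 (carry set: no borrow)
print(sbc(0x05, 0x03, 0)[0])         # 1 (carry clear: one extra is subtracted)
print(hex(add16(0x12FF, 0x0001)[0])) # 0x1300
```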

The CLV (Clear Overflow Flag) instruction exists, but there is no corresponding SEV (Set Overflow Flag) instruction.

The NOP instruction takes 2 full cycles. This is the minimum number of cycles an instruction can take.

The alternate NOPs are not created equal. Some have one- or two-byte operands (which they don't do anything with), and they take different amounts of time to execute.

Block Diagrams

Simple view

Detailed view

CPU Pinout

Notes:

SYNC is an output signal. It is high at T0.
S.O. is an input signal, which stands for Set Overflow. It allows the hardware to affect VF independently of the software.
Some pins are modified in CPU variants.

Chip Variants

The ROR instruction didn't exist in the very earliest (pre-1977) chips.

The 6502 core used inside the NES is missing the Decimal Mode feature.

The 6507 CPU, used in the Atari VCS, has only 13 address lines. So it can only address 8KB instead of 64KB. It also lacks the IRQ and NMI interrupt lines.
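
The arithmetic behind the 8KB limit can be sketched in Python: with 13 address lines, only the low 13 bits of an address reach the bus, so 2^13 = 8192 distinct locations are visible. The names below are hypothetical, and the sketch assumes the upper three address bits are simply not connected.

```python
ADDRESS_LINES_6507 = 13
ADDRESS_MASK_6507 = (1 << ADDRESS_LINES_6507) - 1   # $1FFF


def bus_address_6507(cpu_address):
    """Address seen on the 6507 bus for a 16-bit internal address."""
    return cpu_address & ADDRESS_MASK_6507


print(ADDRESS_MASK_6507 + 1)           # 8192 bytes = 8KB
print(hex(bus_address_6507(0xFFFC)))   # 0x1ffc
print(hex(bus_address_6507(0xF000)))   # 0x1000
```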

The 6510 CPU, used in the Commodore 64, is a 6502 with an additional AEC pin that puts the bus in high impedance mode. It also includes a 6-bit I/O port that occupies addresses 0 and 1.

The disk drive of the Commodore 64 has its own 6502 processor that acts as a floppy disk controller (FDC) and as a disk operating system (DOS) processor. It can also be used as a general coprocessor for the main system.

The 6502C, used in the Atari 8-bit computer range, adds an additional HALT pin. The 6502C is otherwise a regular NMOS 6502, not to be confused with the CMOS 65C02.

The CMOS 65C02 fixed multiple bugs of the original NMOS 6502, but also removed access to all illegal instructions. Some cycle counts have been modified and some extra instructions have been added. In fact, there are multiple implementations of the 65C02 (WDC 65C02, WDC 65C02S, Rockwell R65C02, CSG 65CE02, ...), each with its own variant of the instruction set.

The HuC6280, used in the PC-Engine gaming console, is an improved version of the CMOS 65C02.

The 65C816, used in the SNES and the Apple IIGS, is a 16-bit version of the 65C02. It contains a compatibility mode, enabled by default upon reset, that makes it behave like a regular 65C02.

The Sony SPC700 sound CPU used inside the SNES also behaves similarly to a 6502 with some extensions.