MIPS32® 1004K™

The MIPS32® 1004K™ Coherent Processing System (CPS) is the industry's first multi-threaded multiprocessor IP core. Each core in this coherent multi-core architecture is itself multi-threaded. This lets the 1004K™ multiprocessor surpass the performance of multi-core systems built from single-threaded processor cores. The performance boost is essentially "free" in both hardware and software. The logic added for the extra hardware threads is small compared with a typical SoC design. Multi-threading also uses the same Symmetric Multiprocessing (SMP) operating systems and software programming models as coherent multi-core platforms.
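An SMP operating system usually presents each hardware thread as one more logical CPU. SMP software can therefore size its work the same way on a multi-threaded system as on a single-threaded multi-core system. The following is a minimal Python sketch that assumes the OS reports one CPU per hardware thread. The helper names `logical_cpus`, `partition` and `default_workers` are hypothetical and were chosen only for illustration.

```python
import os

def logical_cpus(cores, threads_per_core):
    """Logical CPUs an SMP OS would schedule on, assuming one per hardware thread."""
    return cores * threads_per_core

def partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, nearly equal chunks."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers each take one additional item.
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

def default_workers():
    """Worker count taken from whatever the OS reports (at least 1)."""
    return os.cpu_count() or 1
```

With four cores of two threads each, `logical_cpus(4, 2)` returns 8. `partition(10, 4)` yields chunks of 3, 3, 2 and 2 items. The partitioning code does not distinguish cores from hardware threads, which is the sense in which the SMP programming model carries over unchanged.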

The 1004K coherent processing system consists of one to four multi-threaded cores. A coherence management unit connects the cores and keeps the L1 caches of the CPUs coherent. An optional block provides coherency for data transfers from I/O peripherals. It improves performance by offloading I/O coherency work that the operating system would otherwise do in software.
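Without hardware I/O coherency, software must typically write back or invalidate every L1 data-cache line that overlaps a DMA buffer, once per line. The sketch below counts those per-line operations for a buffer. The function name is hypothetical, and the default 32-byte line size is an assumption made for illustration, not a figure from this page.

```python
def lines_to_maintain(addr, length, line_size=32):
    """Number of cache lines overlapping the byte range [addr, addr + length).

    Software-managed I/O coherency issues one write-back or invalidate
    operation per such line; a hardware I/O coherence unit removes that loop.
    line_size=32 is an assumed value for illustration.
    """
    if length <= 0:
        return 0
    first = addr // line_size
    last = (addr + length - 1) // line_size
    return last - first + 1
```

Under these assumptions, a line-aligned 4 KB buffer costs 128 maintenance operations per transfer. A 32-byte buffer that starts mid-line straddles two lines and so costs two.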

The coherent processing system also includes a global interrupt controller. It accepts up to 256 interrupts and distributes them to the cores, or even to individual hardware threads within each core. The whole system can be used with the MIPS® L2 cache controller (available separately). The controller connects to the coherence management unit through an extended 256-bit wide interface, which optimizes throughput between the coherent system and the L2 cache. An EJTAG block and a "coherence-aware" PDtrace (program and data trace) block round out the system. Together they give development tools synchronized visibility into each CPU core and into the coherency units.

Initially, the 1004K CPS is available in two versions: the 1004Kc™, with integer-only cores, and the 1004Kf™, with a floating point unit in each core.


  • A coherent multiprocessor system using multi-threading to extend performance beyond traditional multiprocessor solutions
    • Up to four multi-threaded CPU cores, with two hardware threads/core
    • Multi-threading complements multi-core – it leverages SMP operating systems and programming models at minimal added silicon cost
  • Hardware I/O coherency – offloads CPU software I/O coherency overhead
  • Configurability and scalability at the core and system levels, addressing a broad range of price/performance points
  • Licensable IP core – enables broad industry adoption

A complete system for coherent multiprocessing, including:

  • One to four 1004K multi-threaded “base” cores (up to 8 hardware threads)
  • Coherence Management (CM) unit – the system “glue” for managing coherent operation between cores and I/O
  • I/O Coherence Unit (IOCU) – hardware block for offloading I/O coherence from software implementation on CPUs
  • Global Interrupt Controller (GIC) – system and inter-processor interrupt controller
  • High-bandwidth extended 256-bit read and write data buses from the 1004K multi-core system to the L2 cache controller, and from the L2 cache to the system
  • EJTAG/PDtrace™ block for advanced debug/trace of the complete coherent system

1004K Base Core

  • 9-stage pipeline delivering more than 2.9 Coremark/MHz and nearly 1.6 DMIPS/MHz per core
  • Supports single- or dual-threaded operation per core
  • Uses Virtual Processing Elements (VPEs) for hardware multi-threading
  • Integer (1004Kc™) and floating point (1004Kf™) versions
  • Support for Revision 1 of the MIPS32 DSP ASE
  • Coherency port has duplicate data cache tags for background coherency checks
  • Design-time configurability for inclusion and sizing of instruction and data TLBs, caches, scratchpad RAM and other options

Floating Point Unit (FPU)

  • IEEE 754-compliant FPU, conforming to the MIPS® 64-bit FPU architecture (1004Kf version only)
  • Supports single- and double-precision data types
  • Separate in-order, dual-issue pipeline decoupled from integer pipeline

Coherence Management (CM) Unit

  • Manages coherency using the MESI protocol
  • Operates at the same clock rate as the CPUs (1:1) for maximum performance
  • 256-bit extended interface for maximum throughput to (optional) L2 cache controller
  • Supports performance enhancements via L1 cache-to-cache transfers, speculative reads to external memory, and globalized cache operations
  • Global Configuration Registers (GCRs) for configuring and controlling the CM
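MESI keeps each L1 line in one of four states: Modified, Exclusive, Shared or Invalid. The following is a minimal, textbook-style sketch of the per-line transitions. It does not model the CM's actual implementation, which also performs the cache-to-cache transfers and speculative reads listed above. The function names are hypothetical.

```python
from enum import Enum

class State(Enum):
    """The four MESI states of an L1 cache line."""
    MODIFIED = "M"   # only valid copy, dirty with respect to memory
    EXCLUSIVE = "E"  # only cached copy, clean
    SHARED = "S"     # clean copy that other L1 caches may also hold
    INVALID = "I"    # no valid data

def on_local_read(state, others_have_copy):
    """Next state when this core reads the line."""
    if state is State.INVALID:
        # Miss: the line is filled as Exclusive only if no other L1 holds it.
        return State.SHARED if others_have_copy else State.EXCLUSIVE
    return state  # read hits in M, E or S leave the state unchanged

def on_local_write(state):
    """Return (next_state, needs_bus_request) when this core writes the line."""
    # M and E already own the line, so the write is silent; from S or I the
    # core must first have the other copies invalidated.
    return State.MODIFIED, state in (State.SHARED, State.INVALID)

def on_snoop_read(state):
    """Return (next_state, supplies_dirty_data) when another core reads the line."""
    if state is State.MODIFIED:
        return State.SHARED, True  # dirty data must be supplied or written back
    if state is State.EXCLUSIVE:
        return State.SHARED, False
    return state, False

def on_snoop_write(state):
    """Return (next_state, supplies_dirty_data) when another core claims the line to write it."""
    return State.INVALID, state is State.MODIFIED
```

As an example trace, core 0 reads a line that no other core holds (I to E) and then writes it (E to M) without any bus request. Core 1 then reads the same line, so core 0 supplies the dirty data and drops to S. The Exclusive state is what makes the silent write possible.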

I/O Coherence Unit (IOCU) – optional use

  • Bridges non-coherent transfers from I/O peripherals into the coherent system, making those transactions coherent
  • Supports per-transaction attributes for snooping L1 caches, L1+L2 caches, or non-coherent transactions, plus I/O prioritization

Global Interrupt Controller (GIC) – optional use

  • Supports system-level and inter-processor interrupts
  • Routes interrupts to a particular core or VPE
  • Configurable number of system interrupts (up to 256)
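Routing each interrupt to a particular core or VPE can be pictured as a per-interrupt table of targets. The following is a hypothetical software model of such a table. It is sized from the limits quoted on this page: up to 256 system interrupts, four cores and two VPEs per core. It does not reproduce the GIC's actual register layout or programming interface.

```python
class InterruptRouter:
    """Hypothetical model: each system interrupt maps to one (core, vpe) target."""

    def __init__(self, num_interrupts=256, num_cores=4, vpes_per_core=2):
        if not 1 <= num_interrupts <= 256:
            raise ValueError("the GIC supports up to 256 system interrupts")
        self.num_interrupts = num_interrupts
        self.num_cores = num_cores
        self.vpes_per_core = vpes_per_core
        self._routes = {}  # irq -> (core, vpe); unrouted interrupts are absent

    def route(self, irq, core, vpe=0):
        """Direct interrupt `irq` to hardware thread `vpe` on `core`."""
        if not 0 <= irq < self.num_interrupts:
            raise ValueError(f"irq {irq} out of range")
        if not 0 <= core < self.num_cores or not 0 <= vpe < self.vpes_per_core:
            raise ValueError(f"no such target: core {core}, vpe {vpe}")
        self._routes[irq] = (core, vpe)

    def target(self, irq):
        """Return the (core, vpe) for `irq`, or None if it has not been routed."""
        return self._routes.get(irq)
```

For example, `route(42, core=2, vpe=1)` sends interrupt 42 to the second hardware thread of the third core. Rejecting out-of-range targets mirrors the configuration limits of up to four cores and two VPEs per core.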

Development Tools

  • MIPS Navigator™ ICS - IDE, software toolkit, MIPSsim™, EJTAG and PDtrace probes
  • CodeSourcery - SG++ toolchains for MIPS

1004Kf Tri-core (with FPUs) Performance, Power, and Area
                                            TSMC 40G              TSMC 40G
Process type / nominal voltage              Performance, 0.9 V    Performance, 0.9 V
Optimization method                         Speed                 Speed
Standard cell library                       TSMC 9-track SVt      TSMC 12-track SVt
Performance
  Total Coremark                            7880                  > 9600
  Total DMIPS                               > 4200                > 5100
Frequency (wc/ss corner)¹                   900 MHz               1.1 GHz
Energy efficiency
  Coremark/mW²                              19.5                  17.4
  Total dynamic power² @ target frequency   400 mW                550 mW
Silicon area³                               < 4.0 mm²             < 4.7 mm²

Worst case, slow/slow corner with production margins of 10% OCV and 25ps clock jitter

The numbers quoted above are illustrative of synthesized cores using general-purpose process technologies, TSMC standard cell libraries, and Dolphin RAMs. No voltage overdrive was used.

1 Worst case, slow-slow corner with 10% OCV and 50ps production margins.

2 Power measured at the typical corner for the cores plus cache memory, running the Dhrystone benchmark.

3 Area for the tri-core multiprocessor implementations above includes the Coherence Manager logic, GIC, Cluster Power Controller (CPC), IO Coherence Unit, L2 cache controller logic, and supporting logic. Each core is configured with a hardware FPU, the DSP ASE, a 64-entry TLB, a 32KB I-cache, a 32KB D-cache, and 2 VPEs (hardware threads).

4 For reference purposes, the implementation is estimated to achieve a frequency above 2 GHz with typical silicon and voltage overdrive.
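The per-core figures quoted for the base core can be cross-checked against the tri-core table by dividing each total by the core count and the clock frequency. Where the table gives a lower bound (">"), the derived ratio is also a lower bound. The helper names below are illustrative only.

```python
def per_core_per_mhz(total_score, cores, freq_mhz):
    """Benchmark score per core per MHz, e.g. Coremark/MHz/core."""
    return total_score / (cores * freq_mhz)

def score_per_mw(total_score, power_mw):
    """Energy efficiency as benchmark score per milliwatt of dynamic power."""
    return total_score / power_mw

# Tri-core figures from the table above (9-track and 12-track libraries).
configs = {
    "9-track": {"coremark": 7880, "dmips": 4200, "mhz": 900, "mw": 400},
    "12-track": {"coremark": 9600, "dmips": 5100, "mhz": 1100, "mw": 550},
}

for name, c in configs.items():
    print(name,
          round(per_core_per_mhz(c["coremark"], 3, c["mhz"]), 2),
          round(per_core_per_mhz(c["dmips"], 3, c["mhz"]), 2),
          round(score_per_mw(c["coremark"], c["mw"]), 1))
```

Both configurations work out to about 2.9 Coremark/MHz and 1.55 DMIPS/MHz per core. This agrees with the "more than 2.9 Coremark/MHz and nearly 1.6 DMIPS/MHz" quoted for the base core. Dividing total Coremark by dynamic power gives about 19.7 and 17.5 Coremark/mW, close to the table's efficiency row.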


[Figure: MIPS32® 1004K™ Core - Simplified Overview]