| ARM family | ARM architecture | ARM core | Feature | Cache (I/D), MMU | Typical MIPS @ MHz | Reference |
|---|---|---|---|---|---|---|
| ARM1 | ARMv1 | ARM1 | First implementation | None | | |
| ARM2 | ARMv2 | ARM2 | ARMv2 added the MUL (multiply) instruction | None | 4 MIPS @ 8 MHz, 0.33 DMIPS/MHz | |
| ARM2 | ARMv2a | ARM250 | Integrated MEMC (MMU), graphics and I/O processor. ARMv2a added the SWP and SWPB (swap) instructions | None, MEMC1a | 7 MIPS @ 12 MHz | |
| ARM3 | ARMv2a | ARM3 | First integrated memory cache | 4 KB unified | 12 MIPS @ 25 MHz, 0.50 DMIPS/MHz | |
| ARM6 | ARMv3 | ARM60 | ARMv3 first to support 32-bit memory address space (previously 26-bit). ARMv3M first added long multiply instructions (32x32=64). | None | 10 MIPS @ 12 MHz | |
| ARM6 | ARMv3 | ARM600 | As ARM60, cache and coprocessor bus (for FPA10 floating-point unit) | 4 KB unified | 28 MIPS @ 33 MHz | |
| ARM6 | ARMv3 | ARM610 | As ARM60, cache, no coprocessor bus | 4 KB unified | 17 MIPS @ 20 MHz, 0.65 DMIPS/MHz | [4] |
| ARM7 | ARMv3 | ARM700 | | 8 KB unified | 40 MHz | |
| ARM7 | ARMv3 | ARM710 | As ARM700, no coprocessor bus | 8 KB unified | 40 MHz | [5] |
| ARM7 | ARMv3 | ARM710a | As ARM710 | 8 KB unified | 40 MHz, 0.68 DMIPS/MHz | |
| ARM7T | ARMv4T | ARM7TDMI(-S) | 3-stage pipeline, Thumb, ARMv4 first to drop legacy ARM 26-bit addressing | None | 15 MIPS @ 16.8 MHz, 63 DMIPS @ 70 MHz | |
| ARM7T | ARMv4T | ARM710T | As ARM7TDMI, cache | 8 KB unified, MMU | 36 MIPS @ 40 MHz | |
| ARM7T | ARMv4T | ARM720T | As ARM7TDMI, cache | 8 KB unified, MMU with FCSE (Fast Context Switch Extension) | 60 MIPS @ 59.8 MHz | |
| ARM7T | ARMv4T | ARM740T | As ARM7TDMI, cache | MPU | | |
| ARM7EJ | ARMv5TEJ | ARM7EJ-S | 5-stage pipeline, Thumb, Jazelle DBX, enhanced DSP instructions | None | | |
| ARM8 | ARMv4 | ARM810 | 5-stage pipeline, static branch prediction, double-bandwidth memory | 8 KB unified, MMU | 84 MIPS @ 72 MHz, 1.16 DMIPS/MHz | [6][7] |
| ARM9T | ARMv4T | ARM9TDMI | 5-stage pipeline, Thumb | None | | |
| ARM9T | ARMv4T | ARM920T | As ARM9TDMI, cache | 16 KB / 16 KB, MMU with FCSE (Fast Context Switch Extension) | 200 MIPS @ 180 MHz | [8] |
| ARM9T | ARMv4T | ARM922T | As ARM9TDMI, caches | 8 KB / 8 KB, MMU | | |
| ARM9T | ARMv4T | ARM940T | As ARM9TDMI, caches | 4 KB / 4 KB, MPU | | |
| ARM9E | ARMv5TE | ARM946E-S | Thumb, enhanced DSP instructions, caches | Variable, tightly coupled memories, MPU | | |
| ARM9E | ARMv5TE | ARM966E-S | Thumb, enhanced DSP instructions | No cache, TCMs | | |
| ARM9E | ARMv5TE | ARM968E-S | As ARM966E-S | No cache, TCMs | | |
| ARM9E | ARMv5TEJ | ARM926EJ-S | Thumb, Jazelle DBX, enhanced DSP instructions | Variable, TCMs, MMU | 220 MIPS @ 200 MHz | |
| ARM9E | ARMv5TE | ARM996HS | Clockless processor, as ARM966E-S | No caches, TCMs, MPU | | |
| ARM10E | ARMv5TE | ARM1020E | 6-stage pipeline, Thumb, enhanced DSP instructions, (VFP) | 32 KB / 32 KB, MMU | | |
| ARM10E | ARMv5TE | ARM1022E | As ARM1020E | 16 KB / 16 KB, MMU | | |
| ARM10E | ARMv5TEJ | ARM1026EJ-S | Thumb, Jazelle DBX, enhanced DSP instructions, (VFP) | Variable, MMU or MPU | | |
| ARM11 | ARMv6 | ARM1136J(F)-S | 8-stage pipeline, SIMD, Thumb, Jazelle DBX, (VFP), enhanced DSP instructions, unaligned memory access | Variable, MMU | 740 @ 532–665 MHz (i.MX31 SoC), 400–528 MHz | [9] |
| ARM11 | ARMv6T2 | ARM1156T2(F)-S | 9-stage pipeline, SIMD, Thumb-2, (VFP), enhanced DSP instructions | Variable, MPU | | [10] |
| ARM11 | ARMv6Z | ARM1176JZ(F)-S | As ARM1136EJ(F)-S | Variable, MMU + TrustZone | 965 DMIPS @ 772 MHz, up to 2,600 DMIPS with four processors | [11] |
| ARM11 | ARMv6K | ARM11MPCore | As ARM1136EJ(F)-S, 1–4 core SMP | Variable, MMU | | |
| SecurCore | ARMv6-M | SC000 | | | 0.9 DMIPS/MHz | |
| SecurCore | ARMv4T | SC100 | | | | |
| SecurCore | ARMv7-M | SC300 | | | 1.25 DMIPS/MHz | |
| Cortex-M | ARMv6-M | Cortex-M0[12] | Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory | Optional cache, no TCM, no MPU | 0.84 DMIPS/MHz | |
| Cortex-M | ARMv6-M | Cortex-M0+[14] | Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 0.93 DMIPS/MHz | |
| Cortex-M | ARMv6-M | Cortex-M1[15] | Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), OS option adds SVC / banked stack pointer, optional system timer, no bit-banding memory | Optional cache, 0–1024 KB I-TCM, 0–1024 KB D-TCM, no MPU | 136 DMIPS @ 170 MHz,[16] (0.8 DMIPS/MHz FPGA-dependent) | [17] |
| Cortex-M | ARMv7-M | Cortex-M3[18] | Microcontroller profile, Thumb / Thumb-2, hardware multiply and divide instructions, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 1.25 DMIPS/MHz | |
| Cortex-M | ARMv7E-M | Cortex-M4[19] | Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv4-SP single-precision FPU, hardware multiply and divide instructions, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 1.25 DMIPS/MHz (1.27 w/FPU) | |
| Cortex-M | ARMv7E-M | Cortex-M7[20] | Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv5 single and double precision FPU, hardware multiply and divide instructions | 0–64 KB I-cache, 0–64 KB D-cache, 0–16 MB I-TCM, 0–16 MB D-TCM (all these w/optional ECC), optional MPU with 8 or 16 regions | 2.14 DMIPS/MHz | |
| Cortex-M | ARMv8-M | Cortex-M23[21] | Microcontroller profile, Thumb-1 (most), Thumb-2 (some), Divide, TrustZone | Optional cache, no TCM, optional MPU with 16 regions | 0.99 DMIPS/MHz | |
| Cortex-M | ARMv8-M | Cortex-M33[22] | Microcontroller profile, Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor | Optional cache, no TCM, optional MPU with 16 regions | 1.50 DMIPS/MHz | |
| Cortex-M | ARMv8-M | Cortex-M35P[23] | Microcontroller profile, Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor | Built-in cache (with option 2–16 KB), I-cache, no TCM, optional MPU with 16 regions | 1.50 DMIPS/MHz | |
| Cortex-R | ARMv7-R | Cortex-R4[24] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lockstep with fault logic | 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt. MPU with 8/12 regions | 1.67 DMIPS/MHz | [25] |
| Cortex-R | ARMv7-R | Cortex-R5[26] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lock-step with fault logic / optional as 2 independent cores, low-latency peripheral port (LLPP), accelerator coherency port (ACP)[27] | 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt. MPU with 12/16 regions | 1.67 DMIPS/MHz | [25] |
| Cortex-R | ARMv7-R | Cortex-R7[28] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 11-stage pipeline dual-core running lock-step with fault logic / out-of-order execution / dynamic register renaming / optional as 2 independent cores, low-latency peripheral port (LLPP), ACP[27] | 0–64 KB / 0–64 KB, ? of 0–128 KB TCM, opt. MPU with 16 regions | 2.50 DMIPS/MHz | [25] |
| Cortex-R | ARMv7-R | Cortex-R8[29] | TBD | TBD | 2.50 DMIPS/MHz | [25] |
| Cortex-R | ARMv8-R | Cortex-R52[30] | TBD | TBD | 2.16 DMIPS/MHz | [31] |
| Cortex-A (32-bit) | ARMv7-A | Cortex-A5[32] | Application profile, ARM / Thumb / Thumb-2 / DSP / SIMD / Optional VFPv4-D16 FPU / Optional NEON / Jazelle RCT and DBX, 1–4 cores / optional MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 4–64 KB / 4–64 KB L1, MMU + TrustZone | 1.57 DMIPS/MHz per core | |
| Cortex-A (32-bit) | ARMv7-A | Cortex-A7[33] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / Jazelle RCT and DBX / Hardware virtualization, in-order execution, superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), architecture and feature set are identical to A15, 8–10 stage pipeline, low-power design[34] | 8–64 KB / 8–64 KB L1, 0–1 MB L2, MMU + TrustZone | 1.9 DMIPS/MHz per core | |
| Cortex-A (32-bit) | ARMv7-A | Cortex-A8[35] | Application profile, ARM / Thumb / Thumb-2 / VFPv3 FPU / NEON / Jazelle RCT and DAC, 13-stage superscalar pipeline | 16–32 KB / 16–32 KB L1, 0–1 MB L2 opt. ECC, MMU + TrustZone | Up to 2000 (2.0 DMIPS/MHz in speed from 600 MHz to greater than 1 GHz) | |
| Cortex-A (32-bit) | ARMv7-A | Cortex-A9[36] | Application profile, ARM / Thumb / Thumb-2 / DSP / Optional VFPv3 FPU / Optional NEON / Jazelle RCT and DBX, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 16–64 KB / 16–64 KB L1, 0–8 MB L2 opt. parity, MMU + TrustZone | 2.5 DMIPS/MHz per core, 10,000 DMIPS @ 2 GHz on Performance Optimized TSMC 40G (dual-core) | |
| Cortex-A (32-bit) | ARMv7-A | Cortex-A12[37] | Application profile, ARM / Thumb-2 / DSP / VFPv4 FPU / NEON / Hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 32–64 KB | 3.0 DMIPS/MHz per core | |
| Cortex-A (32-bit) | ARMv7-A | Cortex-A15[38] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP, 15–24 stage pipeline[34] | 32 KB w/parity / 32 KB w/ECC L1, 0–4 MB L2, L2 has ECC, MMU + TrustZone | At least 3.5 DMIPS/MHz per core (up to 4.01 DMIPS/MHz depending on implementation) | [39] |
| Cortex-A (32-bit) | ARMv7-A | Cortex-A17[40] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP | 32 KB L1, 256 KB–8 MB L2 w/optional ECC | 2.8 DMIPS/MHz | |
| Cortex-A (32-bit) | ARMv8-A | Cortex-A32[41] | Application profile, AArch32, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, dual issue, in-order pipeline | 8–64 KB w/optional parity / 8–64 KB w/optional ECC L1 per core, 128 KB–1 MB L2 w/optional ECC shared | | |
| Cortex-A (64-bit) | ARMv8-A | ARM Cortex-A34[42] | Application profile, AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline | 8–64 KB w/parity / 8–64 KB w/ECC L1 per core, 128 KB–1 MB L2 shared, 40-bit physical addresses | | |
| Cortex-A (64-bit) | ARMv8-A | Cortex-A35[43] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline | 8–64 KB w/parity / 8–64 KB w/ECC L1 per core, 128 KB–1 MB L2 shared, 40-bit physical addresses | 1.78 DMIPS/MHz | |
| Cortex-A (64-bit) | ARMv8-A | Cortex-A53[44] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline | 8–64 KB w/parity / 8–64 KB w/ECC L1 per core, 128 KB–2 MB L2 shared, 40-bit physical addresses | 2.3 DMIPS/MHz | |
| Cortex-A (64-bit) | ARMv8-A | Cortex-A57[45] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width decode superscalar, deeply out-of-order pipeline | 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses | 4.1–4.5 DMIPS/MHz | [46][47] |
| Cortex-A (64-bit) | ARMv8-A | Cortex-A72[48] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width superscalar, deeply out-of-order pipeline | 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses | 4.7 DMIPS/MHz | |
| Cortex-A (64-bit) | ARMv8-A | Cortex-A73[49] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width superscalar, deeply out-of-order pipeline | 64 KB / 32–64 KB L1 per core, 256 KB–8 MB L2 shared w/ optional ECC, 44-bit physical addresses | 4.8 DMIPS/MHz | [50] |
| Cortex-A (64-bit) | ARMv8.2-A | Cortex-A55[51] | Application profile, AArch32 and AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline[52] | 16–64 KB / 16–64 KB L1, 256 KB L2 per core, 4 MB L3 shared | | |
| Cortex-A (64-bit) | ARMv8.2-A | Arm Cortex-A65AE[53] | Application profile, AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-wide decode superscalar, 3-width issue, out-of-order pipeline, SMT | 64 / 64 KB L1, 256 KB L2 per core, 4 MB L3 shared | | |
| Cortex-A (64-bit) | ARMv8.2-A | Cortex-A75[54] | Application profile, AArch32 and AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width decode superscalar, deeply out-of-order pipeline[55] | 64 / 64 KB L1, 512 KB L2 per core, 4 MB L3 shared | | |
| Cortex-A (64-bit) | ARMv8.2-A | Cortex-A76[56] | Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 8-way issue, 13 stage pipeline, deeply out-of-order pipeline[57] | 64 / 64 KB L1, 256–512 KB L2 per core, 512 KB–4 MB L3 shared | | |
| Cortex-A (64-bit) | ARMv8.2-A | Cortex-A77[58] | Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 6-width instruction fetch, 12-way issue, 13 stage pipeline, deeply out-of-order pipeline[57] | 1.5K L0 MOPs cache, 64 / 64 KB L1, 256–512 KB L2 per core, 512 KB–4 MB L3 shared | | |
| Neoverse | | Neoverse N1[59] | Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 8-way dispatch/issue, 13 stage pipeline, deeply out-of-order pipeline[57] | 64 / 64 KB L1, 512–1024 KB L2 per core, 2–128 MB L3 shared, 128 MB system level cache | | |
| Neoverse | | Neoverse E1 | Application profile, AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-wide decode superscalar, 3-width issue, 10 stage pipeline, out-of-order pipeline, SMT | 32–64 KB / 32–64 KB L1, 256 KB L2 per core, 4 MB L3 shared | | |
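
The performance column mixes absolute figures (DMIPS at a given clock) with clock-normalised ones (DMIPS/MHz). The sketch below shows how the two relate, using three entries from the table. The function names are hypothetical and chosen only for illustration. The multi-core estimate assumes ideal linear scaling across cores, which is a simplification and not something the table claims in general.

```python
def dmips_per_mhz(dmips: float, clock_mhz: float) -> float:
    """Normalise an absolute DMIPS figure by the clock it was quoted at."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return dmips / clock_mhz


def aggregate_dmips(per_mhz: float, clock_mhz: float, cores: int = 1) -> float:
    """Estimate total DMIPS from a per-core DMIPS/MHz rating.

    Assumption: throughput scales linearly with clock and core count.
    """
    if clock_mhz <= 0 or cores < 1:
        raise ValueError("clock_mhz must be positive and cores >= 1")
    return per_mhz * clock_mhz * cores


if __name__ == "__main__":
    # ARM1176JZ(F)-S: 965 DMIPS @ 772 MHz
    print(dmips_per_mhz(965, 772))
    # Cortex-M1: 136 DMIPS @ 170 MHz
    print(dmips_per_mhz(136, 170))
    # Cortex-A9: 2.5 DMIPS/MHz per core, dual-core at 2 GHz (2000 MHz)
    print(aggregate_dmips(2.5, 2000, cores=2))
```

The Cortex-A9 case reproduces the 10,000 DMIPS dual-core figure quoted in its row, and the Cortex-M1 case matches the 0.8 DMIPS/MHz shown next to its absolute figure. MIPS and DMIPS are different metrics, so a row such as ARM2 (4 MIPS @ 8 MHz, 0.33 DMIPS/MHz) cannot be cross-checked this way.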