
The ARM Processor: A Technical Transformation


Introduction

The ARM processor began life in 1983 as a low cost microprocessor for an Acorn personal computer, specifically targeted at the UK education market. In fact, the name ARM originally stood for the "Acorn RISC Machine". Due to the cost constraints, the original ARM designers, Sophie Wilson and Steve Furber, designed a CPU that was far simpler than other processors of the time. They drew on their design knowledge from the BBC microcomputer (specifically the 6502 8-bit microprocessor it used) to implement a completely new microprocessor design, which went into volume production as the ARM2 in 1987. The ARM2 processor differed from all other RISC CPUs in that it was designed to run directly from 32 bit wide DRAM memory, without the use of any expensive on-chip cache memory systems. David Flynn worked from 1984 to 1988 at Acorn Computers Ltd on the ARM chipset and subsystem development under both Wilson and Furber. Later, Alasdair Thomas designed an on-chip cache memory for ARM2 to boost performance, and this processor went into volume production as the ARM3 in 1990. Together ARM2 and ARM3 shipped approximately 200,000 units. What follows are the key engineering milestones that transformed the original ARM processor into a global phenomenon and the most prolific microprocessor on the planet: by 2018, some 27 years later, 150 billion ARM processors had shipped, with another 100 billion projected in the next 5 years.

ARM Limited

The ARM company, or Advanced RISC Machines, was formed in late 1990, largely as a joint venture between Acorn and Apple. Acorn would continue their line of education workstations, and Apple were developing a PDA device called the Newton. Robin Saxby was hired to head the new company. Saxby's background was at Motorola and ES2, and his goal from the start was to drive the ARM processor into new markets as a global standard, especially embedded control and ASIC methodology. Sophie Wilson remained at Acorn, and Steve Furber had taken a chair at Manchester University. Alasdair Thomas was tasked with making the changes Apple requested of the architecture, which became the ARM6 family of processors.

Dave Jaggar joins ARM

In 1990 Dave Jaggar received an MSc with 1st class honours in Computer Science from the University of Canterbury in Christchurch, New Zealand. His thesis was titled "A Performance Study of the Acorn RISC Machine (ARM)". Specifically, he wrote an instruction set simulator for the ARM3 processor and ported a C compiler to create a custom development environment for the ARM (before ARM hardware was commercially available). This allowed Jaggar to port Unix to the ARM, evaluate its performance, and suggest several changes to the architecture. He sent his Master's thesis to Acorn in late October 1990 (about 4 weeks before the ARM company was formed), hoping to get a job. Over the next few months, as the ARM company gained momentum, they recognised the need for an instruction set simulator. Jaggar was given a telephone interview in May 1991 and was subsequently hired.

Jaggar's Master's thesis identified 4 main areas that were of interest to ARM:
1) The ARM3 did not contain a memory write buffer. When Apple invested in the ARM company, one of their requests was to add a write buffer, and the performance improvement of the resulting ARM600 processor over the ARM3 due to this small addition was precisely the 40% Jaggar's thesis predicted.
2) The ARM3 Memory Management Unit was very simple, and inefficient for running a full virtual memory Operating System like Unix. Jaggar had grafted a more sophisticated VAX style MMU onto his ARM simulator, and it achieved almost double the system performance of the ARM3 systems. Again, Apple requested a similar MMU for the first chips from the ARM company.
3) Acorn's Floating Point system was based upon a Western Electric 32206 processor, which was very complex and not in line with contemporary CPU design principles. Jaggar created a new Floating Point architecture, influenced by the MIPS R3010 FP chip.
4) There was no concept of a background debugging mode, which was very common for embedded control. Jaggar's instruction set simulator implemented hardware breakpoints that could pass IO requests to a host computer to aid in software development.

Jaggar was hired first to incorporate his ARM simulator, called ARMulator, into the ARM software development tools. This would allow end customers to evaluate the ARM processor before silicon was available, and in fact to model their hardware to make design trade-offs.

David Flynn joins ARM

David Flynn joined ARM in October 1991, less than a year after the company was set up. He gained a BSc with 1st class honours in Computer Science in 1984, then worked at Acorn for 4 years as part of the team that developed the first ARM microprocessor chip-set and reference designs. He then spent 3 years as the ASIC lead engineer at Active Book Company Ltd, an early ARM processor customer.

Flynn's first job at ARM was to develop an evaluation board for the ARM6 family, called the Platform Independent Evaluation card (PIE), drawing on his vast experience of ARM hardware and interfacing. Jaggar developed a portable low level debug monitor, called DEMON, for the PIE and other customer evaluation/development boards. DEMON incorporated the background debug mode from his thesis. Flynn and Jaggar also developed the first Verilog and VHDL models of the ARM by adding a "wrapper layer" to ARMulator, which allowed ARM to ship functional models to customers without exposing any of its core intellectual property.

ARM7

As the author of ARMulator and DEMON, Dave Jaggar became the technical port of call for a great deal of ARM's benchmarking and competitive analysis effort. David Flynn, with his detailed and extensive knowledge of ARM interfacing and board design, was intimately involved with sales support and field application engineering. Together Jaggar and Flynn were the founders of ARM's Technical Marketing department under Mike Muller. However, within a month of that move, Alasdair Thomas, ARM's head of CPU development, died suddenly, and Jaggar took over the development of the ARM processor itself, starting with the ARM7, the follow-on processor from the ARM6 family that had been developed for Apple. David Flynn was also to be a key contributor to the ARM7 family. Jaggar also chaired ARM's patent review committee, which was tasked with starting ARM's patent portfolio, as Acorn had not filed any patents on the original ARM processors.

While Flynn was overseeing production of the PIE card, Jaggar first validated the processor for 3.3 volt operation and architected the same background debug mode interface from his thesis work, working with Flynn to ensure it could be implemented in hardware, so that any ARM7 CPU could be debugged even when buried deep inside a customer specific integrated circuit. The same debug interface therefore existed in the software simulator, development cards and production hardware, greatly simplifying the development and debugging of ARM based systems, and thereby enabling the extensive 3rd party ecosystem of development tools for ARM. Secondly, responding to customer feedback, Jaggar architected a faster multiplier for ARM7, so that ARM could start to perform the sort of digital signal processing that was becoming common as systems moved from 8 bit and 16 bit microcontrollers to high performance 32 bit microprocessors.

AMBA on-chip bus standardization

Flynn worked closely with ARM's early licensees to develop a modular bus interface which satisfied the many constraints that the different companies and end customers placed on the ARM. The Advanced Microcontroller Bus Architecture (AMBA) interconnect standard was driven by the need to consolidate ad-hoc SoC design and verification, and involved addressing 1) base-line memory-mapped protocols, interworking with many legacy (8, 16 and 32-bit) buses, 2) partitioning for low power (high-performance pipelined system buses with bridges to un-pipelined low-activity peripheral buses) and 3) the challenge of test access to 32-bit IP cores. The resultant specification proved a firm foundation for the more recent higher-performance interconnect derivatives for multi-core SoC designs. Learning the art of engineering compromise was important in negotiating with multiple licensee companies and refining specifications to get to a practical standard in a tight timeframe. The AMBA bus interface has become a de-facto industry standard and a primary IP reuse standard (used by many Synopsys customers as DesignWare DW_AMBA).

ARM7TDMI

The ARM7DM, with its enhanced debug and multiplier, was shown to several key customers, including Nokia, Nintendo and Seagate. As more and more code was benchmarked by Jaggar, one problem stood out as insurmountable: ARM programs were about 50% larger than the 8 and 16 bit code of the older processors that ARM was trying to replace. For a customer to upgrade to an ARM processor, they would also need to significantly increase the amount of memory their products contained. Furthermore, because the ARM had originally been designed to run directly from 32 bit wide memory, the memory system of the processors had to double or even quadruple in chip count to deliver full ARM performance.
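
The memory-width penalty described above can be put in rough numbers. The following is a minimal sketch under simplifying assumptions, not a model of any real ARM memory system: it assumes each bus transfer moves exactly one bus-width of data, that memory devices are placed side by side to fill the bus, and it ignores wait states, caches and data accesses. The function names are hypothetical, and a 16 bit instruction is included only for comparison.

```python
import math


def fetches_per_instruction(instr_bits: int, bus_bits: int) -> int:
    """Bus transfers needed to fetch one instruction over a bus of the given width."""
    return math.ceil(instr_bits / bus_bits)


def chips_for_bus(bus_bits: int, chip_bits: int) -> int:
    """Memory devices needed side by side to fill the full bus width."""
    return math.ceil(bus_bits / chip_bits)


if __name__ == "__main__":
    # Compare a 32 bit instruction with a 16 bit one on narrow and wide buses.
    for bus in (8, 16, 32):
        print(
            f"{bus:2d} bit bus: "
            f"32 bit instruction = {fetches_per_instruction(32, bus)} fetches, "
            f"16 bit instruction = {fetches_per_instruction(16, bus)} fetches"
        )
    # A 32 bit wide bus built from 8 or 16 bit wide memory devices.
    print("8 bit chips for a 32 bit bus: ", chips_for_bus(32, 8))
    print("16 bit chips for a 32 bit bus:", chips_for_bus(32, 16))
```

Under these assumptions, a 32 bit instruction costs two transfers on a 16 bit bus and four on an 8 bit bus, while widening the bus to 32 bits takes two 16 bit devices or four 8 bit devices, which matches the doubling or quadrupling of chip count mentioned in the text.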

The solution to this problem very much changed the fortunes of the ARM microprocessor, and is widely regarded as the key architectural feature that made the ARM processor successful. Jaggar added a completely new 16 bit instruction set to the ARM7DM, which he named Thumb to signify the "useful bit on the end of an ARM". By carefully examining the ARM instruction set, Jaggar realized it was possible to encode the most common ARM instructions in just 16 bits. Even better, it was possible to transform the 16 bit Thumb instructions back into 32 bit ARM instructions as they were executed without any cycle time penalty. Programs compiled with the Thumb instruction set were roughly 70% of the size of the equivalent ARM programs, and ran approximately 50% faster from 8 or 16 bit wide memory, so Thumb offered more performance, lower cost and lower power consumption from the memory systems typical of embedded control systems ... a win-win-win. The original 32 bit instruction set was left in to support migration to the new processor, and to allow fast on chip 32 bit memory to be used for speed critical routines, although it turned out that executing 16 bit Thumb code was often a better use of precious on chip memory. Halving the size of the instructions to 16 bits also increased the performance of on-chip instruction caches. It is a common misunderstanding that Thumb is merely a mixed 16/32 bit instruction set; Thumb enabled processors have two separate instruction sets, which allow the processor to vary its operation across a broad price/performance/power consumption range to suit the application, simply by migrating code between off-chip and on-chip memory. Such tradeoffs are simply not possible using a normal CPU with a single instruction set.
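
The expansion of 16 bit Thumb instructions into 32 bit ARM instructions described above can be illustrated with a minimal sketch. It covers only one Thumb instruction format (move, compare, add and subtract with an 8 bit immediate) and is a software illustration of the mapping, not a description of the decompression hardware in the ARM7TDMI. The function name `expand_thumb_imm8` and the table `ARM_TEMPLATES` are hypothetical names chosen for this example.

```python
# 32 bit ARM data-processing templates (condition = always, immediate operand,
# flags updated), indexed by the 2 bit opcode of the Thumb "op Rd, #imm8" format.
ARM_TEMPLATES = {
    0b00: 0xE3B00000,  # MOVS Rd, #imm8
    0b01: 0xE3500000,  # CMP  Rd, #imm8
    0b10: 0xE2900000,  # ADDS Rd, Rd, #imm8
    0b11: 0xE2500000,  # SUBS Rd, Rd, #imm8
}


def expand_thumb_imm8(thumb: int) -> int:
    """Expand one 16 bit Thumb 'op Rd, #imm8' instruction to its 32 bit ARM equivalent."""
    if not 0 <= thumb <= 0xFFFF:
        raise ValueError("a Thumb instruction must fit in 16 bits")
    if thumb >> 13 != 0b001:
        raise ValueError("not a Thumb move/compare/add/subtract immediate instruction")

    op = (thumb >> 11) & 0b11   # which of the four operations
    rd = (thumb >> 8) & 0b111   # low register R0..R7
    imm8 = thumb & 0xFF         # unsigned 8 bit immediate

    arm = ARM_TEMPLATES[op] | imm8
    if op != 0b00:              # CMP/ADD/SUB read Rd as first operand (ARM bits 19:16)
        arm |= rd << 16
    if op != 0b01:              # MOV/ADD/SUB write the result to Rd (ARM bits 15:12)
        arm |= rd << 12
    return arm


if __name__ == "__main__":
    for thumb in (0x2001, 0x2A00, 0x3105, 0x3B01):
        print(f"{thumb:04X} -> {expand_thumb_imm8(thumb):08X}")
    # 2001 -> E3B00001   (MOV R0, #1)
    # 2A00 -> E3520000   (CMP R2, #0)
    # 3105 -> E2911005   (ADD R1, #5)
    # 3B01 -> E2533001   (SUB R3, #1)
```

Because each Thumb field maps onto a fixed position in the ARM encoding, the expansion is pure bit rearrangement with no arithmetic, which is consistent with the text's point that it can be done as instructions are executed.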

Thumb also solved another enormous problem for ARM. While ARM inherited intellectual property from Acorn, none of it was protected by patents, making it effectively valueless to a company whose business model was selling IP. The new Thumb instruction set, however, had broad patent cover, allowing ARM to properly protect its intellectual property. The Thumb architecture has proved to be very successful: ARM7TDMI and its derivatives (the Cortex-M family) are by far ARM's most successful product family. Jaggar originally envisaged CPUs without the ARM instruction set at all, a concept he named TOM, for Thumb Only Machine (and as a pun on the small Tom Thumb character). The TOM concept was indeed implemented in the form of the smallest members of the Cortex-M family to minimize the system cost for Internet of Things (IoT) devices. Subsequently IBM, Motorola and MIPS, and more recently RISC-V, added 16 bit instructions to their original 32 bit instruction sets to compete with the ARM7TDMI, which illustrates how important the Thumb concept was. The processor reasonably deserved its Advanced RISC Machine name from the company, as it offered far superior efficiency for emerging applications in the digital domain.

StrongARM

In late 1994, after the success of ARM7TDMI, Dave Jaggar was seconded to work with Digital Equipment Corp on the StrongARM processor in Austin, Texas. The Digital team had implemented a very high performance processor for Apple, and Jaggar was tasked with ensuring its compatibility with existing ARM processors. Digital had realized that the 32 bit ARM instruction set had no patent protection, so had practically finished the design by the time they contacted ARM. It was at this time that Jaggar took the time to properly define the ARM architecture, and he subsequently wrote the ARM Architecture Reference Manual (ARM ARM). During his time with Digital he also learned their methodology for defining high performance chips, and brought that methodology back into ARM in 1995 as the specification for ARM9TDMI, a faster, Harvard architecture version of the ARM7TDMI.

ARM7TDMI-S

Meanwhile David Flynn was tasked with producing a synthesizable version of the ARM to satisfy strong customer demand. The original CPU products were implemented with lovingly hand-crafted two-phase latch-based macro-cells (pre-verified IP components with technology characterized performance/power/area), but the demand was for more configurable CPU cores that customers could implement and verify themselves, conforming to user-friendly rising-edge Register-Transfer-Level design amenable to Static Timing Analysis and Automatic Test Pattern Generation tools. Flynn was given a time-frame of only a few months to develop a cycle-based specification, with synchronized serial debug, and a strong backwards compatibility story to allow retrofitting to existing designs based around the ‘flag-ship’ ARM7TDMI® processor.

ARM Austin

Dave Jaggar moved to Austin full time in 1996 to do a collaborative design with Digital. StrongARM2 and ARM10 were to be the same processor, with Digital taking the lead role in the design of the integer unit, and ARM staffing a small team to architect, design and implement a new Floating Point addition to the ARM architecture. However, Intel acquired Digital in 1997 and the StrongARM team resigned en masse rather than work for Intel. Jaggar had no option but to start a new design center from scratch and to deliver the new integer unit, the new floating point unit, and a long overdue total revamp of the system control and debug mechanisms. The ARM Austin Design Center delivered fully functional silicon two years later in 1999, complete with the high speed Vector Floating Point (VFP), which used almost exactly the same instruction set as the one defined by Jaggar for his Master's thesis ten years earlier. Jaggar also rewrote the ARMulator code for ARM10 and used it to predict the performance of Linux systems, thereby also realising the Unix performance increases his thesis had predicted for ARM.

Clean Room Synthesizable ARM

Flynn spent 1997 and 1998 overseeing the “clean-room” development of the synthesizable CPU core with the Synopsys Inc Design Reuse Group in Mountain View. This processor re-architecture became the foundation for all subsequent ARM CPU product developments and for the ‘Soft-IP’ model that enabled the proliferation of products across a wide range of technology process nodes and semiconductor foundries (which the hard-IP CPU business model could not sustain). The AMBA AHB-Lite interface designed for this project and product became the basis of the “multi-layer” AMBA systems that are predominantly used today (rather than the form of interconnect in which multiple masters share a single back-plane bus).

Dave Jaggar Retires

At the start of the new millennium Dave Jaggar was promoted to the new position of Fellow within ARM. However, his wife was becoming increasingly unwell, and he returned to New Zealand in June 2000 to spend more time with his family, 9 years after starting in Cambridge. He retired fully from ARM in 2002. The Austin design center continues to thrive and to be responsible for many of ARM's high performance cores.

Low Power ARMs

Low power and energy efficiency were becoming increasingly important to ARM’s customers as process technologies and voltage scaling headroom shrank while leakage currents and power density challenges grew. Having also been appointed in 2000 as one of the first ARM Fellows in the light of his AMBA and synthesizable IP contributions, David Flynn was given responsibility for developing the low power technology roadmap in the newly established Research and Development group. This has been the primary focus of his work at ARM and in university research supervision ever since.

Much of Flynn's work in ARM was and is of a commercially confidential nature, making it hard to get permission to publish details of the work, but the opportunity to study part-time for an Engineering Doctorate (with Loughborough University, awarded 2007) was key to enabling him to work through a series of canonical designs and start to develop best-practice low power design examples free of customer and licensee IP and data. Building collaborative projects with Synopsys Inc., foundry partnerships with TSMC Inc. and UMC Inc., and power regulator companies such as National Semiconductor Inc., Flynn designed a series of technology demonstrators to drive EDA tool enhancements, methodologies and flows through to characterized silicon and demonstrator boards that showcase dynamic and standby power management (DVFS, Power Gating, State Retention) together with the OS and architectural control interfaces. This work contributed to him being the primary author of the Low Power Methodology Manual ("LPMM"), collaborating with Mike Keating, as a follow-up to Keating's acclaimed Reuse Methodology Manual ("RMM"). The practical and proven approaches led to wide-spread industry take-up and usage, and to a series of requests for talks, tutorials and industry papers ever since.

In 2004 Flynn was invited to serve on the Technical Advisory Board for ArchPro DA Inc. at a pivotal stage in the development of commercial multi-voltage tools support. While mainstream EDA companies were retrofitting voltage supply domain support to tools and databases, ArchPro pioneered the approach of a 7-rail model that also included wells, power gated virtual rails and retention supplies, and addressed both UPF and CPF interoperability. Flynn provided a number of DVFS and SRPG test cases and supported the technology development up until the time that the company was acquired by Synopsys Inc. in 2007.

Subsequently David Flynn has dedicated considerable time in the USA to integrating technology and staff from acquired companies in both the Physical IP area (Artisan Inc in Sunnyvale, California) and low-power Personal Area Network Wireless technology (Boca Raton, Florida).

Summary

David Flynn's primary focus in power management has been on techniques and IP-encapsulation that enable designers to implement and validate low-power SoC designs without needing to resort to ‘expert’ full-custom circuit design techniques, so the contributions are strongly “practitioner” focussed rather than “academic”. Working with EDA tools companies and customers to see designs through to billions of product shipments has been the driver, and the methodologies developed are appropriate to wider usage than just ARM’s commercial exploitation. Dave Jaggar was focussed on transforming the original ARM2 and ARM3 processors into high performance embedded controllers, by replacing the entire instruction set to provide industry leading code density, highly efficient Floating Point, and sophisticated debug and system architectures that scale from tiny power efficient micro-controllers for IoT to high performance central processors for devices such as cell phones. Subsequently, ARM7TDMI and its derivatives are by far the world's most prolific microprocessors, with tens of billions in annual shipments.