Don’t Let Your NOR Flash Be “The Herbie”

Early in my career I worked for a start-up software company, Maxager, which was trying to change the profitability of asset-intensive manufacturers by analyzing profit in a new way.  We didn’t determine profitability based on GAAP-based profit per unit, but on profit per minute.  We discovered that products once thought to be stars based on standard gross margin were actually dogs because they consumed valuable time on the production floor.  The software had its roots in the Theory of Constraints, a concept introduced in the book The Goal by Eliyahu Goldratt.  So who’s Herbie, and what does he have to do with NOR Flash?

The Goal is a fictional account of a plant manager, Alex, who is tasked with making his factory profitable and efficient.  After many starts and stops and a very interesting analysis of a Boy Scout hike featuring Herbie, the slowest scout, Alex comes to realize that the final throughput of the factory is determined by the rate of the slowest operation in the sequence (“the Herbie”).

You Have to Program the NOR Flash At Least Once

So what does this have to do with NOR Flash? NOR Flash, whether parallel or serial, stores the software that runs an embedded system.  At some point in the manufacturing process it will be programmed at least once, possibly several times, and perhaps again after the final product is put into service.  So the speed at which you can program the Flash memory is an important factor when selecting a device and ultimately contributes to the profitability of your products.  You need to optimize the programming step in the manufacturing process to ensure that its capacity keeps pace with the required demand.

Notice that the flow is balanced with demand, not capacity.  If the programming step becomes the bottleneck in the manufacturing process, an hour lost there is an hour lost by the entire system.  It will be the place where valuable inventory builds up and, if the manufacturing line is not optimized, will cause excessive inventory in other locations of the factory until programmed chips can be completed.
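The bottleneck idea can be put into a few lines of Python.  This is only a minimal sketch: the station names and hourly rates are hypothetical numbers chosen for illustration, not figures from any real production line.

```python
def line_throughput(rates_per_hour):
    """Units per hour the whole line can sustain: the rate of its slowest step."""
    return min(rates_per_hour.values())


def bottleneck(rates_per_hour):
    """Name of the slowest step, 'the Herbie' of the line."""
    return min(rates_per_hour, key=rates_per_hour.get)


# Hypothetical station rates in units per hour (assumed for illustration).
rates = {
    "placement": 1200,
    "reflow": 1000,
    "flash_programming": 400,
    "test": 900,
}

print(bottleneck(rates), line_throughput(rates))
```

Speeding up any step other than the slowest one leaves the result of line_throughput unchanged, which is the point of the Boy Scout hike.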

3X Faster Programming Speeds

Programming speed is critical to addressing this concern. For example, Spansion® FL-S serial Flash memory and Spansion GL-S parallel Flash memory have a programming speed of 1.5 MB/s, which is 3X faster than the closest competing devices in the industry.

Because they need as little as one third of the programming time of their competitors, the Spansion FL-S and GL-S Flash memories are much less likely to become “the Herbie” in the manufacturing process.  They can enhance throughput in a balanced manufacturing environment and/or reduce the cost of the programming step by requiring fewer programming stations.
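To see how programming speed translates into programming stations, here is a back-of-the-envelope sketch.  The 16 MB image size and the demand of 1,000 units per hour are assumptions made for illustration, the 0.5 MB/s figure is simply one third of the 1.5 MB/s quoted above, and handling time per unit is ignored unless you pass it in.

```python
import math


def program_seconds(image_mbytes, speed_mbytes_per_s):
    """Time to program one device, ignoring erase and verify overhead."""
    return image_mbytes / speed_mbytes_per_s


def stations_needed(units_per_hour, image_mbytes, speed_mbytes_per_s, handling_s=0.0):
    """Programming stations required to keep pace with demand."""
    cycle_s = program_seconds(image_mbytes, speed_mbytes_per_s) + handling_s
    units_per_station_hour = 3600.0 / cycle_s
    return math.ceil(units_per_hour / units_per_station_hour)


# Assumed values: 16 MB image, demand of 1000 units per hour.
fast = stations_needed(1000, 16, 1.5)   # 1.5 MB/s device
slow = stations_needed(1000, 16, 0.5)   # hypothetical device at one third the speed
print(fast, slow)
```

Under these assumptions the slower device needs three times as many stations to meet the same demand.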

While pin count, performance and capacity may be on the top of your mind when selecting a NOR Flash product to put into your embedded system, don’t forget that the product needs to be built with profit in mind and faster programming speeds can also help reduce the indirect costs of making the product.

It is interesting to find myself working for another company that’s innovating in a way that will help boost our customers’ profitability.

XiP on NOR Flash: Meet Your Microcontroller’s Main Memory

With the rising complexity of today’s mobile devices and embedded systems, developers are facing an increasingly challenging task to design efficient memory subsystems that maximize system performance.  NOR Flash often contains the boot code, operating system kernel, device drivers, middleware and other application-specific software and can result in megabytes of programs stored in non-volatile Flash memory. 

Where performance is essential, these programs are moved from non-volatile memory to faster RAM for execution.  However, where device size and cost are critical, an alternative approach known as Execute-in-Place (XiP), in which programs are executed directly from non-volatile memory, is becoming increasingly popular.

Executing Code from NOR Flash

When using XiP, the non-volatile memory subsystem is constantly being accessed to retrieve program code and may potentially introduce memory bottlenecks into the primary execution path. Understanding the system architecture is critical to identifying any factors that affect memory performance and the resulting system performance.

System performance is often measured as the number of instructions per cycle (IPC).  A CPU that requires 4 cycles to execute an instruction has an ideal IPC of 0.25, but many factors influence the actual IPC, and one in particular, the cache miss, is critical for XiP.  A cache miss stalls the system while an instruction is fetched from memory, resulting in a lower IPC. Fortunately, thanks to “locality of reference,” systems with level 1 and level 2 caches can achieve cache hit rates over 99%.

Because system performance is affected by the ability of the memory subsystem to fill the cache when there is a cache miss, there are several factors to consider:

  • Read Bandwidth: A high bandwidth bus is needed to minimize the overall read latency even though only a single cache line of memory is being read (typically 32 bytes).  In addition, the nature of application programs requires the ability to make small, fast memory accesses throughout the entire code region with minimum latency.  Read bandwidth performance varies across bus interfaces and operating frequencies and must be balanced against pin count. Consider the performance of a low-pin count SPI-DDR NOR with an initial access time of 120ns.  It significantly outperforms Async Parallel NOR and is comparable to Page Mode NOR.  While Burst Mode Parallel NOR has the highest bandwidth, its advantage over SPI-DDR is minimized in a cache-based system.
  • Controller Latency: Initiating a read command incurs controller latency when dealing with address and protocol overhead, measured from the time the command is sent to the controller to when the controller returns the first byte of data.  Controller latency is higher for SPI-DDR NOR, primarily due to the serialization of the command and address information required at the beginning of an SPI transaction.  This gap in performance closes significantly as the memory bus frequency is increased.  In many mobile and embedded systems a sub 200ns controller latency would provide adequate performance and allow SPI-DDR to be considered as a viable alternative to Parallel NOR.
  • Instant and Average CPU Stall Times: When the next instruction to execute is not available in the cache, it must be loaded from memory. The impact on system responsiveness from instant delay depends upon how often the cache misses; if the miss rate is very low, the system can usually tolerate a relatively higher instant delay.  The impact of stall time on system performance depends upon the CPU clock frequency. For CPU operating frequencies from 100 MHz to 166 MHz, SPI-DDR also provides an acceptable stall response when compared with both Burst and Page NOR. When SPI-DDR is compared to Burst Mode devices, a system developer will need to consider whether the additional pins (30+) required for the higher performance Burst Mode interface are a desirable tradeoff.
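The three factors above can be combined into a rough estimate of what one cache miss costs.  The sketch below assumes a simple additive model (controller latency plus the time to transfer one cache line) and uses illustrative inputs: a 200ns controller latency, a 32-byte line, a 66 MB/s read bandwidth and a 166 MHz CPU.  Real controllers overlap and pipeline these steps, so treat the result as an order-of-magnitude figure only.

```python
def line_fill_ns(controller_latency_ns, line_bytes, bandwidth_mbytes_per_s):
    """Time to fetch one cache line: fixed latency plus transfer time."""
    # bytes / (MB/s) gives microseconds; multiply by 1000 for nanoseconds.
    transfer_ns = line_bytes / bandwidth_mbytes_per_s * 1000.0
    return controller_latency_ns + transfer_ns


def stall_cycles(fill_ns, cpu_mhz):
    """CPU cycles lost while waiting for the line fill to complete."""
    return fill_ns * cpu_mhz / 1000.0


# Illustrative inputs (assumptions, not measured values).
fill = line_fill_ns(controller_latency_ns=200, line_bytes=32, bandwidth_mbytes_per_s=66)
print(round(fill, 1), "ns per miss,", round(stall_cycles(fill, 166), 1), "CPU cycles")
```

The same fill time costs more cycles on a faster CPU, which is why the stall impact depends on the CPU clock frequency.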

So what is the overall effect these factors have on a system’s IPC?

A typical mobile or embedded system has a cache miss rate of less than 1%. In a system with a CPU operating at 166 MHz, a 66 MHz memory bus and a cache miss rate of 0.5%, both Burst Parallel NOR and SPI-DDR NOR have a minimal impact on IPC of 1 to 2%.  With a higher cache miss rate of 1%, Burst Parallel NOR provides an advantage by impacting the IPC by only 6% compared to 12% for SPI-DDR NOR.
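The relationship between miss rate, stall time and IPC can be sketched with a textbook-style model in which every miss adds a fixed number of stall cycles.  This is an assumption-laden simplification: the miss rate is taken per instruction, the 20-cycle stall is a hypothetical value, and the model is not meant to reproduce the percentages quoted above, which depend on the actual system.

```python
def effective_ipc(ideal_ipc, miss_rate, stall_cycles_per_miss):
    """IPC after adding the average stall cycles per instruction."""
    ideal_cpi = 1.0 / ideal_ipc
    return 1.0 / (ideal_cpi + miss_rate * stall_cycles_per_miss)


def ipc_loss_percent(ideal_ipc, miss_rate, stall_cycles_per_miss):
    """Relative IPC reduction caused by cache misses, in percent."""
    actual = effective_ipc(ideal_ipc, miss_rate, stall_cycles_per_miss)
    return (1.0 - actual / ideal_ipc) * 100.0


# Hypothetical inputs: ideal IPC of 0.25 and 20 stall cycles per miss.
for miss_rate in (0.005, 0.01):
    print(miss_rate, round(ipc_loss_percent(0.25, miss_rate, 20), 2))
```

Even in this simple model, the loss grows with both the miss rate and the stall time per miss, so the gap between two memory interfaces widens as the miss rate rises.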

In high-performance systems, Burst Parallel NOR will continue to be the preferred solution; however for slightly lower performance systems, SPI-DDR provides an attractive, low pin count alternative.

Technology is Racing Inside the Car

Time flies.  I can’t believe it is almost the end of the year already.  For many, that means the shock of holiday shopping starts to set in, but for me, it is time to start watching the parade of announcements from the automakers as they roll out new models and announce their latest technology breakthroughs.  TV commercials, magazines, social media and, of course, the car shows will all be abuzz about the newest must-have features and 2012 models.

I work closely with many of the major auto companies in my role at Spansion so I get to see what’s coming before most.  Some of the trends that you will hear a lot more about over the coming months are advanced applications of Bluetooth, cloud-connected electronics and advanced safety systems.

I personally am most excited about what’s happening behind the steering wheel.  The dashboard is undergoing a major upgrade.  The classic speedometer, odometer and warning signals, together known as the instrument cluster, are moving to TFT (thin film transistor) displays.  The mechanical gauges are going digital.  We’ve already seen the change in some luxury cars, but it is making its way into mainstream vehicles.

Riding the Smartphone and Tablet Wave

Thanks to the proliferation of smartphones and tablets, the price of TFT displays has dropped significantly over the last couple of years.  The affordability of displays has allowed the automakers to use them more widely within the car and throughout their product line.  It is no longer a luxury for the auto elite.

I’m excited about TFTs, not because of the “cool factor,” but for safety.  The screen will provide important data to enable safer and smarter driving.  It will show the most relevant information for the driving environment or the health status of the car.  It will be a smart display that knows what to tell you and when.  It will have to support rich graphics, 3D imagery, multiple languages, and even high resolution video.

Bringing High Technology Mainstream

Advances in automotive electronics will make the roads a safer place, but only to the degree the technology proliferates throughout all car segments.  Collectively, the industry needs to optimize systems to make the technology affordable.

Spansion is doing its part by focusing on the memory subsystem with our chipset partners.  Our newly announced Spansion® FL-S family is a great example of this.  We are using the low pin count serial interface to strip out complexity in the printed circuit board, simplifying the connections to the microcontroller.  And by delivering a high-performance double data rate serial Flash memory that is capable of 66 MB/s reads, we enable automotive designers to simplify designs further by removing DRAM altogether in the TFT display and executing and rendering graphics directly from the Spansion FL-S memory.

It is innovation like this that is needed to advance the state of the art, affordably, so it can reach the masses.  I’m very excited to see the new crop of cars that are coming in 2012 and the coming years.

The LA Auto Show is right around the corner, November 18-27.  All of the latest advancements and a look into the future will be on display.  I’ll be watching all the excitement and will share my thoughts with you after the event.

Set Top Box: The Center of Your Viewing World

The Set Top Box (STB) is the center of your viewing world, delivering premium content and services to you through a variety of broadcasting mediums (Satellite, Cable, Terrestrial, IP).   Consumers are demanding media-rich home entertainment, which requires STBs to evolve to include more advanced features and services and drives an architecture built for faster performance, scalability and security.

More than just ABC/CBS/NBC with a Remote

In a Digital TV (DTV) system, a Set Top Box receives, filters and processes all the content and services accessed by the TV viewer.  Having evolved from a simple Standard Definition device that offered basic Electronic Program Guide (EPG) information and access to a limited set of services and content, newer STBs support:

  • High definition channels
  • Multi-tuner capabilities (enabling simultaneous viewing and recording)
  • IPTV in addition to broadcast TV
  • Pay per view/video on demand services
  • Internet interactivity

STBs are also increasingly assuming the functions of home gateway or home server devices that can store as well as distribute content to many TVs, PCs and portable devices throughout different rooms inside the home.

Give me Access to that Content

Increasingly, consumers want access to the content on their terms: on-demand and on this device. While everyone would love to get the content for free, normally access comes at a cost.  Consequently, both the consumer and the pay-TV operator must rely on the STB to be the platform for secure transactions and content protection. 

For consumers wanting privacy and content providers wanting to protect their revenue stream, security within the STB is critical and falls in two areas:

  • Conditional access systems (CAS) – Securing the content as it is delivered from the operator to the STB by ensuring that consumers can only play content to which they are entitled.   The overall user entitlement process is handled by the conditional access kernel (CAK) and a conditional access module, often contained in a SmartCard, inside the STB.
  • DRM (digital rights management) – Protecting the content as it is stored in the STB or shared with other devices and users through a home network such as WiFi.  The DRM technology is typically handled by the middleware software and therefore needs to be secured from hacking as well.

NOR Flash Plays a Key Role

NOR Flash contains the boot code, CA kernel, operating system kernel, device drivers, middleware and the EPG.   The boot code and the operating system code require high random read performance and data retention, making NOR Flash the ideal non-volatile memory solution.  It directly impacts the viewing experience by providing a near “Instant On” experience, so viewers can access their desired programs quickly.  Unlike other consumer electronic systems, TV operators actively manage the STB once it has been supplied to the consumer.  Higher-density NOR Flash memory can be provisioned for future software additions or upgrades.

NOR Flash is particularly important in providing security in the STB. To secure the CA kernel, the OS and the middleware code, the NOR Flash contains security features such as a permanently lockable region to protect against write or erase access by hackers and pirates and a One Time Programmable (OTP) region for sensitive data such as encryption keys and unique IDs.

Spansion’s 65nm NOR product families offer features that address the STB’s performance, scalability and total cost of ownership.   The Spansion® GL-S family is the latest generation of Parallel NOR products that have been well established in the STB market for many years, and Spansion Serial NOR products have ramped up quickly in STB applications since 2010 because of their reduced pin count, which simplifies board layout, lowers system costs and reduces the form factor of many embedded designs.  The latest generation of Serial Flash, the Spansion FL-S family, brings high-performance benefits to a simplified memory interface.

Introducing Spansion® FL-S SPI family: Your New Four-Lane Superhighway

You’ll get no arguments from me.  I love driving my car on an open, six-lane superhighway. With speed limits in the US of upwards of 80MPH, you can quickly get from point A to point B. However, that huge superhighway comes with a cost; it takes up a lot of real estate, it is costly to build, and it is sometimes overkill for the job at hand.  Sometimes, a smaller superhighway is simply perfect as the solution.

In the world of NOR Flash memory, Parallel NOR has quite effectively provided the “six-lane superhighway” for embedded system applications, with quick data delivery to and from the memory and the microcontroller (MCU). In contrast, NOR solutions using the Serial Peripheral Interface (SPI) have been more akin to a dirt road. However, with new innovations in DDR SPI, Spansion has transformed SPI NOR into a four-lane superhighway and increased the speed limit substantially.

Parallel vs. Serial Bus Architectures

System developers must make design choices between parallel bus architecture for more bandwidth and serial bus architecture for lower pin count. While the performance of Parallel NOR is still required for many applications, each I/O pin at the CPU and memory interface adds to the overall system cost. As a result, some hardware system designs are migrating to alternate complex code execution models with optimized pin count solutions.

Regardless of Parallel or Serial, the ultimate NOR advantage is real-time code execution for an enhanced end-user experience. This implies a suitability toward high read bandwidth and low data access latency.  One of the increasing advantages of SPI is the growing number of chipsets and microcontrollers that support the interface.  Recently the market has moved to multi-input output (MIO) functionality to create x2 and x4 interfaces that provide high performance at an optimal system overhead.

Introducing Spansion® FL-S SPI family – 66MB/s, 1Gb of storage and only a 16-pin SOIC package

Our newly released Spansion FL-S family leads the industry in performance with a 20% increase in read speed combined with a 3X boost in programming speed. Moreover, the product line supports up to 1Gb for greater applicability in automotive instrument clusters, digital TVs, set-top boxes and industrial designs.

By implementing an innovative Double Data Rate (DDR) bus on a Serial interface, the SPI NOR read performance is increased up to 66 MB/s for faster execute in place (XiP) operation.  All this functionality sits in an industry-standard 16-pin SO package with an active signal count of 6 pins.  This provides a throughput of 11 MB/s per active signal pin.
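The arithmetic behind these figures can be sketched as follows.  The 66 MB/s read rate and the 6 active signal pins come from the text; the combination of four I/O lines and a 66 MHz clock is an assumption that happens to yield that rate, and command and address overhead is ignored.

```python
def spi_read_mbytes_per_s(clock_mhz, io_lines, ddr):
    """Peak read bandwidth of an SPI bus, ignoring command/address overhead."""
    bits_per_clock = io_lines * (2 if ddr else 1)  # DDR moves data on both clock edges
    return clock_mhz * bits_per_clock / 8.0


def mbytes_per_s_per_pin(bandwidth_mbytes_per_s, active_pins):
    """Bandwidth delivered per active signal pin."""
    return bandwidth_mbytes_per_s / active_pins


# Assumed configuration: 66 MHz clock, four I/O lines, double data rate.
quad_ddr = spi_read_mbytes_per_s(66, io_lines=4, ddr=True)
single_sdr = spi_read_mbytes_per_s(66, io_lines=1, ddr=False)
print(quad_ddr, single_sdr, mbytes_per_s_per_pin(quad_ddr, 6))
```

At the same clock, the assumed quad DDR configuration moves eight times as many bits per cycle as a classic single-bit SPI bus.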

Speed improvements also translate into operational cost savings.  A 1.5 MB/s programming speed reduces system cost and delivers 3X the programming throughput of the nearest competing solution. Lastly, the new family has expanded security options to protect customer IP through a 1kB one-time programmable (OTP) region, individual sector protection, and advanced data protection.

Rich User Experiences with More Affordable Designs

By offering high performance and high density, SPI Flash can become the mainstream choice for low-power MCUs. Bypassing the need for any additional memory within the MCU, a system designer can expose the SPI Flash in the main memory map and treat it like on-chip Flash, maximizing performance and throughput.

Additionally, performance-intensive LCD images can stream from the Flash and even raster directly, all from a very pin-efficient footprint provided by SPI NOR.  And with a storage density of 1Gb, the system can scale to handle more comprehensive vector engines for realistic graphics.

With the competing pressures of increased functionality and performance at a lower system cost driving requirements in the embedded application space, architects and designers are forced to choose and establish trade-offs.  Now, with the DDR SPI performance and 1Gb density scalability announced with the newly released Spansion FL-S family, SPI Flash may have just become your new four-lane superhighway.

File Systems on Flash

For all new technology devices, whether it is the latest consumer device or a new driver-assist feature in an automobile or a next-generation telecommunications component, the end-user will always ask, “What is in it for me?”  More often than not, that means innovative software that unleashes the potential, raw power of the hardware.

Applications Driving Component Designs

Today’s designers are faced with the ongoing challenge of creating more complex designs in less time without sacrificing performance or increasing costs. They are looking to their suppliers to wrap value-added software around their hardware to help them meet that challenge. In fact, applications are truly driving the entire ecosystem.

To meet this goal, we introduced Spansion® FFS™ Flash File System software, customized to support both parallel and serial flash memories.  With this flexible software solution, you can rapidly create a full-featured data storage subsystem where a universal interface of a block driver isolates the command interface of the Flash memory from your software application.

Removing Complexity From Your Design

At the heart of the Spansion FFS package is the Spansion Block Driver (BD) and the Low Level Driver (LLD).  The Spansion BD maps logical blocks to physical blocks for you, automatically managing dirty space cleanup, wear leveling, and power failure recovery.  Supporting both serial and parallel interfaces, the Low Level Driver contains all of the device-specific logic to manage the Flash command presentation and Flash status.
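To make the block driver’s role more concrete, here is a toy model of logical-to-physical block mapping with dirty space cleanup and a naive form of wear leveling.  It is purely illustrative: the class and method names are hypothetical, it is not the Spansion BD API or algorithm, and it leaves out power failure recovery entirely.

```python
class ToyBlockDriver:
    """Toy logical-to-physical block map (illustration only, not Spansion BD)."""

    def __init__(self, physical_blocks):
        self.erase_counts = [0] * physical_blocks
        self.data = [None] * physical_blocks      # None means the block is erased
        self.free = set(range(physical_blocks))   # erased blocks ready for writing
        self.dirty = set()                        # stale blocks awaiting cleanup
        self.l2p = {}                             # logical block -> physical block

    def _cleanup(self):
        """Erase every dirty block so it can be written again."""
        for block in self.dirty:
            self.data[block] = None
            self.erase_counts[block] += 1
            self.free.add(block)
        self.dirty.clear()

    def write(self, logical, payload):
        if not self.free:
            self._cleanup()
        if not self.free:
            raise RuntimeError("no free blocks left")
        # Naive wear leveling: write to the least-erased free block.
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        self.data[target] = payload
        old = self.l2p.get(logical)
        self.l2p[logical] = target                # remap after the new copy exists
        if old is not None:
            self.dirty.add(old)                   # the old copy is now stale

    def read(self, logical):
        return self.data[self.l2p[logical]]


bd = ToyBlockDriver(physical_blocks=4)
for value in range(10):
    bd.write(0, value)                            # rewrite the same logical block
print(bd.read(0), bd.erase_counts)
```

The application only ever sees logical block 0, while the repeated writes are spread across all four physical blocks rather than wearing out a single one.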

The Spansion FFS package completes the data storage abstraction by also including the Spansion File System (FS) – useful if your system has no disk file system or if you want to integrate your application directly with Spansion FS.  Additionally, OS Bindings are provided for Linux and Windows CE, enabling rapid integration of Spansion BD into your preferred OS, so your applications can continue to use the file system interface provided by your OS.

And we have made it easy to license and procure.  The Spansion FFS has a click-thru license that enables easy evaluation and acceptance and is available at no cost to all Spansion customers.  You receive full source code, user guide and porting guide.

Committed to Meeting Your Needs

Spansion recognizes that the challenges of embedded systems design are changing.  We are committed to meeting your needs by delivering not only the most powerful hardware solutions, but also the software to unleash their full potential. We are dedicated to providing a complete Flash solution to manage your changing design needs, leveraging our full roadmap, now and into the future.