When Apple, Microsoft, or Google unveils a new smartphone or laptop, it typically arrives with a slew of attractive but incremental upgrades: a slightly faster CPU, a few more megapixels, a bit more battery life. But what if one of those must-have devices could run twice as fast while consuming just half the power? That is a technological development worth paying attention to.
Those are exactly the gains promised by a new processing approach built on a straightforward software technique called “simultaneous and heterogeneous multithreading” (SHMT). In essence, SHMT makes better use of the different processors already inside your laptop or smartphone, including graphics processing units (GPUs), central processing units (CPUs), and even the relatively new hardware accelerators for AI (called tensor processing units, or TPUs).
The IT world has hailed SHMT as “ground-breaking” because it aims to eliminate data-flow bottlenecks and let multiple processing units cooperate smoothly.
Breaking Down the Limitations of Existing Processors
Before exploring the full potential of simultaneous and heterogeneous multithreading, let’s first examine the constraints of existing computing systems.
In most devices, the central processing unit (CPU), graphics processing unit (GPU), and tensor processing unit (TPU), among other components, process information independently. Moving data between these units frequently creates “bottlenecks” that drag down overall system performance.
Traditional programming paradigms make this worse by assigning each task to a single processor type, leaving other resources underused or idle. As Kuan-Chieh Hsu and Hung-Wei Tseng put it in their research paper “Simultaneous and Heterogeneous Multi-threading,” ingrained programming paradigms underutilize the processing capability of heterogeneous computers by focusing solely on the most effective processing unit for each code section.
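To picture that traditional pattern, here is a minimal, hypothetical Python sketch; the run_on_gpu helper is invented for illustration and stands in for a real offload API, it is not an actual library call. An entire code region is handed to one device while the rest of the machine contributes nothing.

```python
# Hypothetical sketch of the traditional "one device per code region" model.
# run_on_gpu() is a stand-in for a real offload API (e.g., a CUDA or OpenCL
# kernel launch); it is an assumption for illustration only.

def run_on_gpu(data):
    # Pretend this ships `data` to the GPU, runs a kernel, and copies results back.
    return [x * 2 for x in data]

def process(data):
    # The whole region is delegated exclusively to the GPU.
    # While this runs, the CPU cores and any AI accelerator sit idle.
    return run_on_gpu(data)

if __name__ == "__main__":
    print(process(list(range(8))))
```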
SHMT breaks from this approach by exploiting the variety of components inside a computer system, an idea known as “heterogeneity.” It enables true parallel processing by dividing computing tasks into smaller pieces and distributing them across all available processing units.
Splitting work among several units in this way makes the most of the resources at hand, improving efficiency and reducing energy consumption. The paper underscores the drawback of conventional models by noting that they “can only delegate a code region exclusively to one kind of processor, leaving other computing resources idle without contributing to the current function.”
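To make the contrast concrete, here is a minimal Python sketch of the SHMT idea under some loose assumptions: the cpu_worker, gpu_worker, and tpu_worker functions are placeholders rather than real device APIs, and the actual system schedules work inside a runtime rather than through Python threads. The sketch simply chops one task into chunks and hands the chunks to all three backends at once.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder "backends" -- in a real SHMT runtime these would be a CPU core,
# a GPU kernel launch, and a TPU/accelerator call, not plain Python functions.
def cpu_worker(chunk):
    return [x * 2 for x in chunk]

def gpu_worker(chunk):
    return [x * 2 for x in chunk]

def tpu_worker(chunk):
    return [x * 2 for x in chunk]

BACKENDS = [cpu_worker, gpu_worker, tpu_worker]

def shmt_style_run(data, chunk_size=4):
    # 1. Break the task into smaller pieces.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # 2. Hand the pieces to all available processing units at once,
    #    round-robining chunks across the heterogeneous backends.
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        futures = [pool.submit(BACKENDS[i % len(BACKENDS)], c)
                   for i, c in enumerate(chunks)]
        results = [f.result() for f in futures]
    # 3. Stitch the partial results back together.
    return [x for part in results for x in part]

if __name__ == "__main__":
    print(shmt_style_run(list(range(12))))
```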
SHMT, by contrast, overcomes these limitations by combining the distinct strengths of each processing unit and letting them cooperate on the same code region. The authors also note that modern computing is inherently heterogeneous: virtually every system contains a mix of CPUs and hardware accelerators. A programming model must therefore be able to harness these disparate components efficiently, which is exactly what SHMT sets out to do.
By clearing those obstacles out of the way, SHMT opens the door to faster and more efficient computing.
How Simultaneous and Heterogeneous Multi-Threading Technology Works
Many devices already use one multithreading approach: simultaneous multithreading (SMT), which lets a single processor core run two hardware threads for more efficient use of that core. SHMT, by contrast, spans a CPU, a GPU, and at least one AI-driven accelerator. The goal is to have every processing unit working on a different piece of the job at the same time, and even to split GPU and AI-accelerator resources across several tasks.
Hung-Wei Tseng and Kuan-Chieh Hsu’s research claims that SHMT can lower power consumption by 51% and boost performance by a factor of 1.95. These results were recorded on an Nvidia Jetson Nano, which pairs a quad-core Arm Cortex-A57 CPU and 4GB of LPDDR4 memory with a 128-core Maxwell-architecture GPU. And since the Jetson includes an M.2 slot, the researchers added a Google Edge TPU there to serve as the AI accelerator.
The researchers achieved this result by developing a quality-aware work-stealing (QAWS) scheduler. The scheduler is tuned to keep error rates in check and to spread the workload evenly across all components. Under QAWS policies, tasks that demand high precision and accuracy won’t be assigned to AI accelerators, which occasionally introduce errors, and if a component isn’t performing up to par, its tasks will be dynamically redistributed to other components.
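The paper doesn’t publish its scheduler code, so what follows is only a loose, hypothetical Python sketch of the quality-aware idea: precision-critical tasks are kept off the error-prone accelerator, and work is “stolen” from a unit that falls behind. All task names, worker properties, and queue layouts here are invented for illustration.

```python
from collections import deque

# Invented task descriptions: (name, needs_high_precision)
TASKS = deque([("matmul", False), ("checksum", True), ("filter", False),
               ("reduce", False), ("verify", True), ("blur", False)])

# Invented worker model: the TPU-like accelerator is fast but approximate.
WORKERS = {
    "cpu": {"precise": True,  "queue": deque()},
    "gpu": {"precise": True,  "queue": deque()},
    "tpu": {"precise": False, "queue": deque()},
}

def assign(tasks):
    """Quality-aware assignment: precision-critical work never goes to the
    approximate accelerator; everything else is spread round-robin."""
    names = list(WORKERS)
    i = 0
    while tasks:
        name, needs_precision = tasks.popleft()
        target = names[i % len(names)]
        if needs_precision and not WORKERS[target]["precise"]:
            target = "cpu"  # fall back to a precise unit
        WORKERS[target]["queue"].append(name)
        i += 1

def steal(slow, fast):
    """Work stealing: if `slow` falls behind, `fast` takes a task from it."""
    if WORKERS[slow]["queue"]:
        WORKERS[fast]["queue"].append(WORKERS[slow]["queue"].pop())

if __name__ == "__main__":
    assign(TASKS)
    steal("gpu", "cpu")  # pretend the GPU is lagging
    for name, info in WORKERS.items():
        print(name, list(info["queue"]))
```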
So What’s the Catch?
You may be wondering what the catch is: twice the performance at roughly half the power works out to nearly four times the energy efficiency. According to the study, the main constraint of SHMT is not the model itself but the programmer’s ability to restructure the algorithm so that it exposes the kind of parallelism SHMT can exploit. In other words, not every piece of software can take full advantage of SHMT; it has to be built with it in mind.
Rewriting software is notoriously difficult; Apple, for example, had plenty of legwork to do when it moved Mac computers from Intel chips to its own Arm CPUs. Multithreading in particular takes developers time to adopt: software took many years to adapt to multi-core CPUs, and it could take just as long before developers routinely use several different components to share the same work.
The paper also notes that SHMT’s performance uplift depends on the size of the problem. The 1.95x figure comes from the largest problem size the authors evaluated; smaller problem sizes show only modest gains. At the smallest problem size there was virtually no improvement, because small workloads simply don’t offer enough opportunities for all of the components to operate in parallel.
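As a back-of-the-envelope illustration (the chunk size and problem sizes below are invented, not figures from the paper): if work is carved into fixed-size chunks across three units, a small input may not yield enough chunks to keep every unit busy.

```python
# Invented illustration of why small problems limit parallelism under SHMT.
# Chunk size and problem sizes are arbitrary, not taken from the paper.
DEVICES = 3        # e.g., CPU + GPU + TPU
CHUNK = 256        # elements handled per chunk

for problem_size in (256, 1024, 65536):
    chunks = max(1, problem_size // CHUNK)
    busy = min(DEVICES, chunks)
    print(f"problem={problem_size:6d}  chunks={chunks:3d}  units kept busy={busy}")
```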
With more and more computers shipping with multiple computing devices, such as dedicated AI processors, developers will inevitably look for ways to leverage that extra hardware to increase processing speed. If and when SHMT or a similar technology gains mainstream momentum, it could still benefit PCs and smartphones, even if it falls short of the best-case scenario outlined in the paper.