For the last few years, the Architecture & Innovation group at Kinaxis has been investigating hardware acceleration. In this blog post, we’ll take a moment to distill the lessons learned and explain the rationale for our ongoing experimentation with hardware acceleration.
What is hardware acceleration?
Hardware acceleration is the use of specialized processing hardware to speed up the execution of an algorithm. These chips usually speed up a specific type of computation, at the expense of making other computations slower or dropping support for more general computation altogether.
Here is a rough summary of the main hardware options available:
This list is sorted in order of cost of implementation. For example, the last entry in the list is the ASIC, which would require a team of hardware designers and software developers. Because the algorithm is implemented in physical hardware, it costs millions to manufacture a single batch, and the chip cannot be upgraded after production.
As an interesting note, Kinaxis actually made its debut decades ago by building an ASIC-based product for production planning at the factory level. The significant cost of developing and maintaining such a product prompted a switch to a software-based product in the nineties.
Why do we care about hardware acceleration?
Much has been written about the end of Moore’s law. To summarize in a picture:
(Image from the blog post "A Decade of Accelerated Computing Augurs Well for GPUs", nextplatform.com)
For many decades, CPU performance improved by a significant margin year over year. Comparing Intel’s cutting-edge 33 MHz processor from 1985 with their 5.5 GHz processors of 2022 shows an improvement of more than 150x in clock speed over 37 years. Much more than clock speed has improved since 1985: your average smartphone today is comparable to the cutting-edge supercomputers of 1985.
So why are we still sitting around for minutes waiting for seemingly simple calculations to finish?
With every advance in hardware or software performance, we also increase the amount of data and computation that is required of our systems. Our largest customers have some of the biggest supply chains in the world with truly massive requirements for data and computation.
We ruled out FPGAs early in the investigation as being prohibitively expensive and focused exclusively on GPUs. Although there has been a lot of investment in improving FPGA developer efficiency, through tools such as Xilinx Vitis and Intel oneAPI, the cost is still too high for a pure software company like Kinaxis.
GPU acceleration at Kinaxis
Our investigation started with an enumeration of candidate use cases throughout our product that might benefit from hardware acceleration. We estimated the following criteria for each use case:
- Total cost of ownership
- User visible performance impact
The top two candidates for hardware acceleration were database queries and our Multi-Level Search algorithm for supply planning.
First prototype – GPU accelerated database queries
Since the heart of our product is our custom-built versioned database, the idea of GPU-accelerated database queries fits naturally into the Kinaxis philosophy and approach.
The two database use cases we developed GPU prototypes for were version resolution and hash joins.
Every query is associated with one or more scenarios, and very often the query will need to scan through every record in one or more database tables in order to resolve which records are visible to each scenario.
Algorithms like version resolution, which can be split naturally across the data and processed in parallel, are ideal for GPU acceleration. The prototype obtained a significant speedup over the equivalent CPU logic, but it was only faster than the CPU when the data was already pre-loaded into GPU memory.
Pre-loading the data onto the GPU, however, is not practical: the GPU does not have enough memory to store all the tables, and we would not be able to use it to accelerate other query features.
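To illustrate why a scan like this splits so naturally, here is a minimal CPU-side sketch in Python. It is not the Kinaxis implementation: the data model (each record tagged with the scenario that wrote it, each scenario with a set of scenarios it inherits from) and the names ANCESTORS, RECORDS, resolve_chunk and resolve are all assumptions made for illustration. The point is only that each chunk of the table can be checked without looking at any other chunk, which is the property a GPU exploits.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data model (not from the post): each record stores the id of the
# scenario that wrote it, and each scenario lists the scenarios it inherits from.
ANCESTORS = {
    "base": {"base"},
    "what_if_a": {"base", "what_if_a"},
    "what_if_b": {"base", "what_if_b"},
}

RECORDS = [
    ("part-1", "base"),
    ("part-2", "what_if_a"),
    ("part-3", "what_if_b"),
    ("part-4", "base"),
]


def resolve_chunk(chunk, scenario):
    """Return the records in one chunk that are visible to `scenario`."""
    visible_from = ANCESTORS[scenario]
    return [rec for rec in chunk if rec[1] in visible_from]


def resolve(records, scenario, workers=2):
    """Scan the whole table as independent chunks, one task per chunk."""
    size = max(1, (len(records) + workers - 1) // workers)
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so the output keeps table order.
        parts = pool.map(resolve_chunk, chunks, [scenario] * len(chunks))
    return [rec for part in parts for rec in part]
```

Calling resolve(RECORDS, "what_if_a") keeps the two base records and the one written by what_if_a, and drops the record that belongs to what_if_b. A real versioned database also has to handle records that override one another, which this sketch deliberately leaves out.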
The second database use case was the hash join. This use case is also very well suited to GPU acceleration, and we obtained an even bigger speedup than for version resolution. It also proved much more practical than version resolution, since it needs only temporary use of GPU memory to deliver a massive performance improvement.
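For readers unfamiliar with the technique, a hash join can be sketched in a few lines of Python. This is the textbook build-and-probe form running on the CPU, not the GPU prototype described here; the function name hash_join and the sample PARTS and ORDERS tables are hypothetical. It shows why the memory use is temporary (the hash table exists only for the duration of the join) and why the probe phase parallelizes well (every probe row is looked up independently).

```python
from collections import defaultdict

# Hypothetical sample tables, for illustration only.
PARTS = [
    {"part_id": 1, "name": "bolt"},
    {"part_id": 2, "name": "nut"},
]
ORDERS = [
    {"order_id": 10, "part_id": 1},
    {"order_id": 11, "part_id": 3},
    {"order_id": 12, "part_id": 1},
]


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner equi-join of two lists of dicts using a build and a probe phase."""
    # Build phase: index the (ideally smaller) input by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: each probe row is looked up independently of the others.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined
```

With the sample data, hash_join(PARTS, ORDERS, "part_id", "part_id") returns two rows, one for each order of part 1; order 11 has no matching part and is dropped, as expected for an inner join.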
Outcome of database GPU acceleration investigation
In the end, the cost of attaching a GPU to our database servers would not be worth the limited benefits it would bring. A feature like hash join or version resolution may have a big impact on 5% or 10% of user queries, but because our customers have such a diverse set of queries, no single GPU-accelerated feature would justify the cost of developing and deploying GPUs with our database.
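One way to put rough numbers on this trade-off is Amdahl's law. The sketch below is an illustration, not an analysis from the investigation: it assumes the 5% to 10% figure can be read as the fraction of total query time that a feature touches, which the post does not state, and the function name overall_speedup and the 100x feature speedup are hypothetical.

```python
def overall_speedup(fraction_accelerated, feature_speedup):
    """Amdahl's law: whole-workload speedup when only part of it gets faster.

    fraction_accelerated: share of total time spent in the accelerated
        feature (assumed here to match the share of affected queries).
    feature_speedup: how many times faster that feature becomes.
    """
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if feature_speedup <= 0:
        raise ValueError("feature_speedup must be positive")
    remaining = 1.0 - fraction_accelerated
    return 1.0 / (remaining + fraction_accelerated / feature_speedup)
```

Under these assumptions, overall_speedup(0.10, 100) comes out at about 1.11: even a hypothetical 100x faster feature improves the whole workload by only about 11% when it covers a tenth of the time, and no feature speedup can push that figure past 1 / 0.9.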
Second prototype – GPU accelerated supply planning heuristics
After a couple of easy technical wins showed that GPU acceleration has significant potential for Kinaxis, the investigation shifted to use cases that are harder to implement but more practical to deploy, namely those which:
1. Can be deployed as a multi-tenant service
2. Can improve the entire use case with a single algorithm, not just a fraction of it
The best use case we identified was our core supply planning algorithms and their use by Supply.AI, which optimizes supply plans through a blend of exact and heuristic algorithms.
Our early prototyping in this area has demonstrated that hardware accelerated meta-heuristics have the potential to bring significant value to our customers. We’re quite excited to see where this technology will take us next.