Hardware acceleration at Kinaxis

For the last few years, the Architecture & Innovation group at Kinaxis has been investigating hardware acceleration. In this blog post, we’ll take a moment to distill the lessons learned and explain the rationale for our ongoing experimentation with hardware acceleration.

What is hardware acceleration?

Hardware acceleration is the use of specialized processing hardware to speed up the execution of an algorithm. These chips usually speed up a specific type of computation, at the expense of making other computations slower or of dropping support for more general computation altogether.

Here is a rough summary of the main hardware options available:

Table of available hardware options

This list is sorted by cost of implementation. For example, the last entry in the list is the ASIC, which requires a team of hardware designers and software developers. Because the algorithm is implemented in physical hardware, it costs millions to manufacture a single batch, and the chip cannot be upgraded after production.

As an interesting note, Kinaxis actually made its debut decades ago by building an ASIC-based product for production planning at the factory level. The significant cost of developing and maintaining such a product prompted a switch to a software-based product in the nineties.

Why do we care about hardware acceleration?

Much has been written about the end of Moore’s law. To summarize in a picture:

Diagram of peak memory bandwidth and peak double precision for CPUs and GPUs
(Image from the blog post "A Decade of Accelerated Computing Augurs Well for GPUs", nextplatform.com)

For many decades, CPU performance improved by a significant margin year over year. Comparing Intel’s cutting-edge 33 MHz processor from 1985 with its 5.5 GHz processors of 2022 shows a clock-speed improvement of more than 150x over 37 years. Much more than clock speed has improved since 1985: your average smartphone today is comparable to the cutting-edge supercomputers of 1985.

So why are we still sitting around for minutes waiting for seemingly simple calculations to finish?

With every advance in hardware or software performance, we also increase the amount of data and computation that is required of our systems. Our largest customers have some of the biggest supply chains in the world with truly massive requirements for data and computation.

We ruled out FPGAs early in the investigation as being prohibitively expensive and focused exclusively on GPUs. Although there has been a lot of investment in improving FPGA developer efficiency, through tools such as Xilinx Vitis and Intel oneAPI, the cost is still too high for a pure software company like Kinaxis.

GPU acceleration at Kinaxis

Our investigation started with an enumeration of candidate use cases throughout our product that might benefit from hardware acceleration. We estimated the following criteria for each use case:

Total cost of ownership
User visible performance impact

The top two candidates for hardware acceleration were database queries and our Multi-Level Search algorithm for supply planning.

First prototype – GPU accelerated database queries

Since the heart of our product is our custom-built versioned database, the idea of GPU-accelerated database queries fits naturally into the Kinaxis philosophy and approach.

The two database use cases we developed GPU prototypes for were version resolution and hash joins.

Version resolution

Every query is associated with one or more scenarios, and very often the query will need to scan through every record in one or more database tables in order to resolve which records are visible to each scenario.

Algorithms like version resolution that can be naturally split across the data and processed in parallel are ideal for GPU acceleration. The prototype obtained a significant speedup over the equivalent CPU logic, but it only beat the CPU when the data had been pre-loaded into GPU memory.

Pre-loading the data onto the GPU, however, is not practical: the GPU does not have enough memory to store all the tables, and dedicating its memory to them would leave us unable to use the GPU to accelerate other query features.
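To illustrate why version resolution splits so naturally across the data, here is a minimal sketch in plain Python. It is an illustration only and not the actual Kinaxis implementation: the data model (a `Row` with `created_in` and `deleted_in` fields, and a `parent_of` map describing which scenario inherits from which) is a hypothetical simplification invented for this example. The point to notice is that `is_visible` looks at a single row plus a small, precomputed set of scenarios, so every row can be checked independently, which is the shape of work a GPU handles well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    # Hypothetical record layout, assumed for illustration only.
    key: str
    created_in: str            # scenario that wrote this row version
    deleted_in: Optional[str]  # scenario that deleted or replaced it, if any


def lineage_of(scenario, parent_of):
    """Return the scenario together with every scenario it inherits from."""
    lineage = set()
    while scenario is not None:
        lineage.add(scenario)
        scenario = parent_of.get(scenario)
    return lineage


def is_visible(row, lineage):
    # Depends only on this one row and the shared lineage set,
    # so each row could be evaluated by its own GPU thread.
    return row.created_in in lineage and row.deleted_in not in lineage


def resolve(rows, scenario, parent_of):
    """Scan every row and keep the ones visible to the given scenario."""
    lineage = lineage_of(scenario, parent_of)
    return [row for row in rows if is_visible(row, lineage)]


if __name__ == "__main__":
    parent_of = {"baseline": None, "what_if": "baseline"}
    rows = [
        Row("part-1", "baseline", None),
        Row("part-2", "baseline", "what_if"),  # replaced in the what-if scenario
        Row("part-2", "what_if", None),
    ]
    for scenario in ("baseline", "what_if"):
        print(scenario, [(r.key, r.created_in) for r in resolve(rows, scenario, parent_of)])
```

The sketch also hints at the cost described above: the scan touches every row of the table, so the work per row is tiny compared with the cost of moving that row to the GPU in the first place.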

Hash join

The second database use case was hash join. This use case is also very well suited to GPU acceleration, and we were able to obtain an even bigger speedup than with version resolution. It also proved much more practical than version resolution, since it needs only temporary use of GPU memory to deliver a massive performance improvement.
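For readers unfamiliar with the technique, here is a minimal sketch of a textbook hash join in plain Python. It is not the prototype described above, and the function and column names (`hash_join`, `part_id`, and so on) are assumptions made up for the example. The join has two phases: a build phase that hashes the smaller table, and a probe phase that looks up each row of the larger table. Each probe is independent of the others, which is what makes the algorithm a good fit for parallel hardware, and the hash table is only needed for the duration of the join.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner-join two lists of dict rows on build_key == probe_key."""
    # Build phase: index the (ideally smaller) table by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: every probe row is looked up independently of the others,
    # so on a GPU each lookup could be assigned to its own thread.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            joined.append({**match, **row})
    return joined


if __name__ == "__main__":
    parts = [{"part_id": 1, "name": "bolt"}, {"part_id": 2, "name": "nut"}]
    orders = [
        {"order_id": 10, "part_id": 1},
        {"order_id": 11, "part_id": 1},
        {"order_id": 12, "part_id": 3},  # no matching part, dropped by the inner join
    ]
    for row in hash_join(parts, orders, "part_id", "part_id"):
        print(row)
```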

Outcome of database GPU acceleration investigation

In the end, the cost of attaching a GPU to our database servers would not be worth the limited benefits it would bring. A feature like hash join or version resolution may have a big impact on 5% or 10% of user queries, but because our customers have such a diverse set of queries, no single GPU-accelerated feature would justify the cost of developing and deploying GPUs with our database.
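The reasoning above can be made concrete with Amdahl's law. The sketch below assumes, purely for illustration, that the 5% or 10% figure can be read as a share of total query time, and it uses a hypothetical 20x speedup for the accelerated feature; neither number is a measurement from our prototypes.

```python
def overall_speedup(accelerated_fraction, feature_speedup):
    """Amdahl's law: whole-workload speedup when only a fraction gets faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if feature_speedup <= 0:
        raise ValueError("feature_speedup must be positive")
    # Time that is left: the untouched part plus the accelerated part.
    remaining_time = (1.0 - accelerated_fraction) + accelerated_fraction / feature_speedup
    return 1.0 / remaining_time


if __name__ == "__main__":
    # Hypothetical 20x speedup applied to 5% and 10% of the workload.
    for fraction in (0.05, 0.10):
        print(f"{fraction:.0%} accelerated 20x -> {overall_speedup(fraction, 20.0):.2f}x overall")
```

Under these assumptions the overall gain is about 1.05x and 1.10x respectively, and even an infinitely fast feature covering 10% of the workload could never deliver more than roughly 1.11x overall.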

Second prototype – GPU accelerated supply planning heuristics

After a couple of easy technical wins showed that GPU acceleration has significant potential for Kinaxis, the investigation turned to use cases that are harder to implement but more practical to deploy, namely those which:
1. Can be deployed as a multi-tenant service
2. Can improve the entire use case with a single algorithm, not just a fraction of it

The best use case we identified was our core supply planning algorithms and their use by Supply.AI, which optimizes supply plans through a blend of exact and heuristic algorithms.

Our early prototyping in this area has demonstrated that hardware accelerated meta-heuristics have the potential to bring significant value to our customers. We’re quite excited to see where this technology will take us next.

Read more of our Kinaxis Engineering expert blog posts here!
