Blackwell is arguably NVIDIA’s most significant GPU architecture change in recent years. It introduces neural shaders, a landmark addition that enables faster and more realistic rendering in games.
The design of the Blackwell architecture focuses on four areas: optimizing for new neural network workloads, reducing memory consumption, introducing new quality-of-service features, and improving energy efficiency.
Let’s look at the cores first. In the Ada architecture’s SM (Streaming Multiprocessor), the shader cores (that is, the CUDA cores) were split into two halves: one half was dedicated to FP32 (single-precision floating point) and the other half could switch between FP32 and INT32 (32-bit integers) as needed. In the Blackwell SM, every shader core can handle either FP32 or INT32 dynamically on demand.
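The effect of this change can be illustrated with a toy throughput model. This is only a sketch under simplifying assumptions, not a description of NVIDIA’s scheduler: it assumes 128 shader cores per SM, one operation per core per cycle, and perfect scheduling. The function names and the sample workloads are hypothetical.

```python
import math

def cycles_ada_style(fp32_ops, int32_ops, cores=128):
    """Half the cores run FP32 only; the other half run FP32 or INT32."""
    flexible = cores // 2
    # All work is spread over every core, but INT32 work is limited
    # to the flexible half, so whichever bound is larger wins.
    return max(math.ceil((fp32_ops + int32_ops) / cores),
               math.ceil(int32_ops / flexible))

def cycles_blackwell_style(fp32_ops, int32_ops, cores=128):
    """Every core can run either FP32 or INT32."""
    return math.ceil((fp32_ops + int32_ops) / cores)

# Hypothetical workloads: (FP32 ops, INT32 ops)
for fp32, int32 in [(1024, 0), (768, 256), (256, 768), (0, 1024)]:
    print(fp32, int32,
          cycles_ada_style(fp32, int32),
          cycles_blackwell_style(fp32, int32))
```

In this model the two layouts tie on FP32-heavy work, and the fully flexible layout pulls ahead only once INT32 operations make up more than half of the mix.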
In the past, the shading workload was handled only by these shader cores. With neural shading, the Blackwell architecture’s 5th-generation Tensor cores now share that workload.
In addition, whereas Shader Execution Reordering (SER) in the Ada architecture focused on reordering ray tracing workloads, the Blackwell architecture also sorts neural shading workloads. More traditional shading work is assigned to the CUDA cores, work that requires neural network operations is assigned to the Tensor cores, and both kinds of core can run at the same time, doubling overall efficiency.
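The idea behind reordering can be sketched conceptually. The example below is not how SER is implemented in hardware; it simply assumes work items are executed in groups (warps) of 32 and counts how many groups contain a mix of work types before and after sorting. The work-type labels and the function name are hypothetical.

```python
def divergent_warps(work_items, warp_size=32):
    """Count warps whose threads do not all run the same kind of work."""
    warps = [work_items[i:i + warp_size]
             for i in range(0, len(work_items), warp_size)]
    return sum(1 for warp in warps if len(set(warp)) > 1)

# Hypothetical stream in which the kinds of work arrive interleaved.
kinds = ["traditional", "neural", "ray_hit"]
stream = [kinds[i % 3] for i in range(384)]

# Reordering: group identical kinds of work together.
reordered = sorted(stream)

print(divergent_warps(stream), divergent_warps(reordered))
```

In this contrived stream every warp is mixed before sorting and none is mixed afterwards, which is the kind of coherence that reordering aims for.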
The 5th-generation Tensor cores accelerate models that use FP4 precision. Compared with the FP8 precision supported by the 4th-generation Tensor cores of the Ada architecture, data throughput is doubled, which is equivalent to 32 times the throughput of Pascal-generation cores, so it can meet the demands of DLSS 4’s Multi Frame Generation.
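The arithmetic behind the precision change is straightforward: halving the bits per value doubles the number of values that fit in the same memory or bandwidth. The sketch below shows only that bookkeeping. It ignores scaling factors, activations, and anything hardware-specific, and the 12-billion-parameter model is a hypothetical example.

```python
BITS = {"FP32": 32, "FP16": 16, "FP8": 8, "FP4": 4}

def weight_footprint_gib(num_params, fmt):
    """Memory needed for the weights alone, in GiB."""
    return num_params * BITS[fmt] / 8 / 2**30

def relative_values_per_byte(fmt, baseline="FP8"):
    """How many times more values fit per byte than in the baseline format."""
    return BITS[baseline] / BITS[fmt]

params = 12_000_000_000  # hypothetical model size
for fmt in BITS:
    print(fmt,
          round(weight_footprint_gib(params, fmt), 2),
          relative_values_per_byte(fmt))
```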
In addition, compared with the third-generation ray tracing cores of the Ada architecture, the fourth-generation ray tracing cores of the Blackwell architecture retain the Box Intersection Engine and Opacity Micromap Engine, expand the Triangle Intersection Engine into the Triangle Cluster Intersection Engine, and add the Triangle Cluster Decompression Engine and Linear Swept Spheres.
Taken together, the polygon intersection rate of Blackwell ray tracing can reach 2 times that of the Ada architecture, while memory consumption is only 75% of Ada’s.
Through more advanced resource scheduling, the Blackwell architecture allows multiple internal cores to simultaneously handle loads such as shading, ray tracing, and AI computing, significantly shortening rendering response times and thereby improving overall performance.
In terms of memory, the GDDR6X used by the previous two generations, Ampere and Ada, relies on PAM4 signaling. The GDDR7 used by Blackwell GPUs switches to PAM3 signaling, which has less noise and distortion and a cleaner signal. Combined with a higher operating frequency and a lower voltage, the data rate can reach 2 times that of GDDR6, power consumption per bit is close to half that of GDDR6, and energy efficiency is 2 times that of GDDR6.
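The trade-off between the two signaling schemes can be made concrete with a little arithmetic. The sketch below computes the ideal number of bits per symbol for a given number of voltage levels, and the peak bandwidth that follows from a per-pin data rate and bus width. The function names are hypothetical, the 28 Gbps and 512-bit inputs are illustrative values rather than figures from this article, and the 3-bits-in-2-symbols packing is how GDDR7 is commonly described, treated here as an assumption.

```python
import math

def ideal_bits_per_symbol(levels):
    """Upper bound on bits carried by one symbol with `levels` voltage levels."""
    return math.log2(levels)

def bandwidth_gb_per_s(pin_rate_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s from per-pin data rate and bus width."""
    return pin_rate_gbps * bus_width_bits / 8

print(ideal_bits_per_symbol(4))            # PAM4 upper bound
print(round(ideal_bits_per_symbol(3), 3))  # PAM3 upper bound
print(3 / 2)                               # assumed GDDR7 packing: 3 bits in 2 symbols
print(bandwidth_gb_per_s(28, 512))         # illustrative pin rate and bus width
```

PAM3 carries fewer bits per symbol than PAM4, but its three levels are spaced further apart, which is where the improved signal quality comes from.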
The Blackwell architecture has a more advanced energy-saving design. Compared with the Ada architecture, Blackwell delivers higher performance, so it can finish a workload sooner and enter a low-power state. With new clock and power-delivery controls, the low-power states are more efficient and the latency of entering deep sleep is lower. In total, the Blackwell architecture can save up to 50% of power consumption compared with the Ada architecture when entering the energy-saving state.
The Blackwell architecture also has a faster clock control mechanism that can dynamically adjust the shader clock according to workload requirements, which uses energy more efficiently.
The display engine and the video encoding and decoding cores have also been significantly updated. First, Blackwell finally supports DisplayPort 2.1 UHBR20, with video output bandwidth of up to 20 Gbps per lane (80 Gbps across four lanes). It also introduces high-speed hardware Flip Metering to make frame pacing more stable.
Blackwell’s 9th-generation encoder and 6th-generation decoder add AV1 UHQ (Ultra High Quality AV1) and MV-HEVC (Multi-View HEVC) support, double the H.264 decoding capability, and add YUV 4:2:2 encoding and decoding.
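To put the link rate in context, UHBR20 runs at 20 Gbps per lane, a full DisplayPort link has four lanes, and DisplayPort 2.x uses 128b/132b channel coding. The sketch below estimates the usable payload and compares it with the uncompressed data rate of a display mode. It is only a rough check: blanking intervals and protocol overhead are ignored, and the function names and the 4K 240 Hz example mode are hypothetical.

```python
def uhbr_payload_gbps(lane_rate_gbps=20, lanes=4, encoding=128 / 132):
    """Raw link rate multiplied by the 128b/132b coding efficiency."""
    return lane_rate_gbps * lanes * encoding

def video_rate_gbps(width, height, refresh_hz, bits_per_pixel):
    """Uncompressed active-pixel data rate; blanking is ignored."""
    return width * height * refresh_hz * bits_per_pixel / 1e9

link = uhbr_payload_gbps()
mode = video_rate_gbps(3840, 2160, 240, 30)  # 4K, 240 Hz, 10 bits per RGB channel
print(round(link, 2), round(mode, 2), mode <= link)
```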
For more information about GeForce RTX 50 series GPUs, stay tuned for follow-up coverage on this site.