The GTX 1070, the second commercially available card to use the Pascal architecture
Fabrication process | 16 nm |
---|---|
History | |
Predecessor | Maxwell |
Successor | Volta |
Pascal is the codename for a GPU microarchitecture developed by Nvidia as the successor to the Maxwell microarchitecture. The Pascal microarchitecture was introduced in April 2016 with the GP100 chip. The architecture is named after Blaise Pascal, the 17th-century mathematician.
On May 27, 2016, the GP104 chip was released. It is found on the GeForce GTX 10XX branded graphics cards, which are part of the GeForce 10 series.
In March 2014, Nvidia announced that the successor to Maxwell would be the Pascal microarchitecture. The first GP104-based GeForce cards were announced on May 6, 2016 and released on May 27, 2016. The Tesla P100 (GP100 chip) uses a different version of the Pascal architecture than the GTX GPUs (GP104 chip). The shader units in GP104 have a design much closer to Maxwell's.
Architectural improvements of the GP100 architecture include the following:
Architectural improvements of the GP104 architecture include the following:
A chip is partitioned into Graphics Processor Clusters (GPCs). On the GP104 chips, a GPC contains 5 SMs.
A "Streaming Multiprocessor" (SM) corresponds to AMD's Compute Unit. An SM encompasses 128 single-precision ALUs ("CUDA cores") on GP104 chips and 64 single-precision ALUs on GP100 chips.
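The per-GPC figures above can be combined with a little arithmetic. The following is a minimal sketch, not an official tool: the function names are hypothetical, and the GPC count of 4 used in the last line is an illustrative assumption that the text does not state.

```python
# Figures taken from the text above (GP104): 5 SMs per GPC, 128 FP32 ALUs per SM.
SMS_PER_GPC_GP104 = 5
FP32_ALUS_PER_SM_GP104 = 128


def fp32_alus_per_gpc(sms_per_gpc: int, alus_per_sm: int) -> int:
    """Single-precision ALUs ("CUDA cores") contained in one GPC."""
    return sms_per_gpc * alus_per_sm


def fp32_alus_per_chip(gpc_count: int, sms_per_gpc: int, alus_per_sm: int) -> int:
    """Total for a chip with `gpc_count` fully enabled GPCs."""
    return gpc_count * fp32_alus_per_gpc(sms_per_gpc, alus_per_sm)


per_gpc = fp32_alus_per_gpc(SMS_PER_GPC_GP104, FP32_ALUS_PER_SM_GP104)
print(per_gpc)  # 640

# ASSUMPTION: 4 GPCs is an illustrative value, not stated in the text.
print(fp32_alus_per_chip(4, SMS_PER_GPC_GP104, FP32_ALUS_PER_SM_GP104))  # 2560
```

Keeping the GPC count as a parameter lets the same helper cover parts with some GPCs or SMs disabled, which is how lower-tier cards built on the same chip typically differ.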
What AMD calls a CU (compute unit) can be compared to what Nvidia calls an SM (streaming multiprocessor). While all CU versions consist of 64 shader processors (4 SIMD vector units, each 16 lanes wide: 4 × 16 = 64), Nvidia, which usually calls shader processors "CUDA cores", has experimented with very different numbers per SM. On Pascal alone, an SM has 128 CUDA cores on GP104 and 64 on GP100.
The Polymorph Engine version 4.0 is the unit responsible for tessellation. It corresponds functionally to AMD's Geometric Processor. It has been moved from the shader module to the TPC (Texture Processing Cluster) so that one Polymorph Engine can feed multiple SMs within the TPC.
On the GP104 chip, an SM consists of 128 single-precision ALUs ("CUDA cores"); on the GP100, it consists of 64. Because the chips are organized differently, notably in the number of double-precision ALUs, the theoretical double-precision throughput of the GP100 is half its theoretical single-precision throughput, while on the GP104 the ratio is 1/32.
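The ratios above imply how many double-precision units each SM would need. This is a rough sketch under one assumption: a double-precision unit retires operations at the same per-cycle rate as a single-precision ALU. The table and function names are hypothetical, and the clock and ALU count in the usage lines are placeholder values, not specifications from the text.

```python
from fractions import Fraction

# From the text: FP32 ALUs per SM and the FP64:FP32 throughput ratio per chip.
CHIPS = {
    "GP100": {"fp32_per_sm": 64, "fp64_ratio": Fraction(1, 2)},
    "GP104": {"fp32_per_sm": 128, "fp64_ratio": Fraction(1, 32)},
}


def fp64_units_per_sm(chip: str) -> int:
    """FP64 units per SM implied by the ratio (assumes equal per-unit rate)."""
    spec = CHIPS[chip]
    units = spec["fp32_per_sm"] * spec["fp64_ratio"]
    if units.denominator != 1:
        raise ValueError("ratio does not yield a whole number of units")
    return int(units)


def peak_gflops(alus: int, clock_ghz: float) -> float:
    """Peak GFLOPS, counting one fused multiply-add per ALU per cycle as 2 ops."""
    return 2 * alus * clock_ghz


print(fp64_units_per_sm("GP100"))  # 32
print(fp64_units_per_sm("GP104"))  # 4

# ASSUMPTION: 2560 ALUs and 1.5 GHz are placeholder values for illustration only.
fp32_peak = peak_gflops(2560, 1.5)
fp64_peak = fp32_peak * float(CHIPS["GP104"]["fp64_ratio"])
print(fp32_peak, fp64_peak)  # 7680.0 240.0
```

Using `Fraction` keeps the 1/2 and 1/32 ratios exact, so the implied unit counts come out as whole numbers rather than rounded floats.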