
FCLC , @fclc@mast.hpc.social

@hikari yes, many times over!

The per-thread addressable register file per core is:
AVX512(F) vector RF 32x512b -> 2 KiB
AVX512(BW) mask RF 8x64b -> 64 B (0.0625 KiB)
AMX matrix RF 8x8192b -> 8 KiB
APX GPR 32x64b -> 256 B (0.25 KiB)
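
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The names `REGISTER_FILES`, `rf_bytes` and `total_bytes` are made up for illustration, and it assumes an AMX tile is 8192 bits (16 rows x 64 bytes), so that eight tiles come to 8 KiB.

```python
# (name, register count, bits per register), taken from the list above.
# Assumption: one AMX tile is 16 rows x 64 bytes = 8192 bits.
REGISTER_FILES = [
    ("AVX512(F) vector", 32, 512),
    ("AVX512(BW) mask", 8, 64),
    ("AMX tile", 8, 8192),
    ("APX GPR", 32, 64),
]


def rf_bytes(count, bits):
    """Size in bytes of a register file with `count` registers of `bits` bits."""
    return count * bits // 8


def total_bytes(files=REGISTER_FILES):
    """Sum of all software-addressable register file sizes, in bytes."""
    return sum(rf_bytes(count, bits) for _, count, bits in files)


if __name__ == "__main__":
    for name, count, bits in REGISTER_FILES:
        size = rf_bytes(count, bits)
        print(f"{name}: {size} B ({size / 1024:g} KiB)")
    print(f"total: {total_bytes()} B ({total_bytes() / 1024:g} KiB)")
```

Under those assumptions the four files add up to a bit over 10 KiB of software-addressable state per thread, almost all of it in the AMX tiles and the vector registers.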

That ignores that current physical register files tend to be 2-3x the size of the software-addressable RF, which helps performance when context switching because not all of the register state has to be spilled to cache.
