Libmklccgdll New 🚀 📢

| Workload | Old libmklccg | New libmklccg | Improvement |
|----------|---------------|---------------|-------------|
| 3D FFT (2048³, 64 nodes) | 2.4 sec | 1.7 sec | 29% |
| ScaLAPACK PDGESV (50k x 50k) | 320 sec | 240 sec | 25% |
| Cluster FFT + MPI all-to-all | 180 GB/s | 245 GB/s | 36% |
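
The Improvement column can be reproduced from the two measurement columns. The sketch below assumes the percentages are relative to the old value: a reduction for the two timing rows (lower is better) and a gain for the throughput row (higher is better). The helper name `improvement_pct` is hypothetical and is not part of the library.

```python
def improvement_pct(old: float, new: float, higher_is_better: bool = False) -> float:
    """Relative improvement over the old value, in percent.

    Assumption: timings improve when they shrink, throughput improves
    when it grows; both are normalized by the old measurement.
    """
    gain = (new - old) if higher_is_better else (old - new)
    return 100.0 * gain / old


# (workload, old value, new value, higher_is_better), copied from the table
rows = [
    ("3D FFT (2048^3, 64 nodes)", 2.4, 1.7, False),        # seconds
    ("ScaLAPACK PDGESV (50k x 50k)", 320.0, 240.0, False),  # seconds
    ("Cluster FFT + MPI all-to-all", 180.0, 245.0, True),   # GB/s
]

for name, old, new, higher_is_better in rows:
    pct = improvement_pct(old, new, higher_is_better)
    print(f"{name}: {pct:.0f}%")
```

Under that assumption the three rows round to 29%, 25%, and 36%, matching the table.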