Microsoft's Planet-Scale AI Infrastructure Could Include Up to 100K GPUs


According to the latest reports, Microsoft's planet-scale AI infrastructure is speculated to include up to 100,000 GPUs, with the test servers described carrying 692GB of RAM each. This should improve the overall infrastructure and bring a range of new features.

Microsoft has recently disclosed that it runs a global distributed scheduling service for AI workloads that it humbly refers to as “Singularity.”

According to a preprint paper about Singularity, co-authored by 26 Microsoft employees, the goal is to help the software giant control costs by achieving high utilisation for deep learning workloads.

Singularity achieves this using what the paper describes as a "novel workload-aware scheduler that can transparently preempt and elastically scale deep learning workloads to drive high utilisation across a global fleet of AI accelerators (such as GPUs, FPGAs), without impacting their correctness or performance."

The paper devotes more of its content to the scheduler than to Singularity itself, but it does include some diagrams of the system's architecture. Its performance evaluation describes tests on Nvidia DGX-2 servers using a Xeon Platinum 8168 processor with two sockets of 20 cores each, eight V100 GPUs per server, 692GB of RAM, and InfiniBand networking.

Given that the Singularity fleet contains hundreds of thousands of GPUs, in addition to FPGAs and probably other accelerators, Microsoft likely has at least tens of thousands of servers that fit this description.

Singularity: Microsoft's scheduling system for AI workloads

The paper focuses on Singularity's scaling technology and schedulers, which Microsoft presents as its "secret sauce" because they reduce costs and improve reliability.

Because the software automatically decouples jobs from accelerator resources, scaling a job up or down requires no changes to the job itself. As the paper puts it, "we simply change the number of devices the workers are mapped to: this is completely transparent to the user, as the world-size (i.e. total number of workers) of the job remains the same regardless of the number of physical devices running the job."
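To make the idea of a fixed world size concrete, here is a minimal Python sketch. It assumes a simple round-robin placement of worker ranks onto devices; the function name `map_workers_to_devices` and the placement policy are illustrative assumptions, not Singularity's actual code. Scaling the job only changes `num_devices`, while the set of worker ranks stays the same.

```python
from collections import defaultdict


def map_workers_to_devices(world_size: int, num_devices: int) -> dict[int, list[int]]:
    """Spread a fixed number of logical workers over a variable number of devices.

    Round-robin placement is an assumption made for this sketch; it is not
    taken from the Singularity paper.
    """
    if not 1 <= num_devices <= world_size:
        raise ValueError("num_devices must be between 1 and world_size")
    placement: dict[int, list[int]] = defaultdict(list)
    for rank in range(world_size):
        # The worker's rank (its identity within the job) never changes;
        # only the physical device it is mapped to does.
        placement[rank % num_devices].append(rank)
    return dict(placement)


# The same 8-worker job, first on 8 devices, then scaled down to 2.
print(map_workers_to_devices(world_size=8, num_devices=8))
print(map_workers_to_devices(world_size=8, num_devices=2))  # {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}
```

From the job's point of view nothing changes between the two calls: there are still eight workers with ranks 0 to 7, which is what lets the scaling stay transparent to the user.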

This is made feasible by “a unique approach called replica splicing that makes it possible to time-slice several workers on the same device with low overhead, while at the same time permitting each worker to utilise the complete device memory.”

To make this happen, Singularity relies on what the authors call a "device proxy": a process that "runs in its own address space" and has a one-to-one correspondence with a physical accelerator device. When a job's worker calls device APIs, those calls are intercepted and forwarded through shared memory to the device proxy process, which runs in a separate address space and whose lifetime is independent of the worker process.
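The device-proxy pattern can be sketched in Python as follows. This is an illustration under stated assumptions: the names `DeviceProxy` and `InterceptedDeviceAPI` are hypothetical, a background thread and a queue stand in for the separate proxy process and the shared-memory channel, and "device memory" is just a dictionary. What it does show is the shape of the idea: the worker never touches the device directly, and because the proxy owns the device state, a worker can go away and a new one can pick up where it left off.

```python
import queue
import threading
from typing import Any, Dict


class DeviceProxy:
    """Owns one (simulated) accelerator and executes device calls for workers.

    Sketch assumption: a thread plus a queue stands in for Singularity's
    separate proxy process and shared-memory channel.
    """

    def __init__(self, device_id: int) -> None:
        self.device_id = device_id
        self.memory: Dict[str, Any] = {}  # simulated device memory; outlives any worker
        self.requests: queue.Queue = queue.Queue()
        self._handlers = {"write": self._write, "read": self._read}
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _write(self, key: str, value: Any) -> None:
        self.memory[key] = value

    def _read(self, key: str) -> Any:
        return self.memory[key]

    def _serve(self) -> None:
        while True:
            op, args, reply = self.requests.get()
            if op == "shutdown":
                return
            try:
                reply.put(("ok", self._handlers[op](*args)))
            except Exception as exc:  # send device-side errors back to the caller
                reply.put(("err", exc))

    def shutdown(self) -> None:
        self.requests.put(("shutdown", (), None))
        self._thread.join()


class InterceptedDeviceAPI:
    """The worker's view of the device: every call is forwarded to the proxy."""

    def __init__(self, proxy: DeviceProxy) -> None:
        self._proxy = proxy

    def _forward(self, op: str, *args: Any) -> Any:
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._proxy.requests.put((op, args, reply))
        status, result = reply.get()
        if status == "err":
            raise result
        return result

    def write(self, key: str, value: Any) -> None:
        self._forward("write", key, value)

    def read(self, key: str) -> Any:
        return self._forward("read", key)


proxy = DeviceProxy(device_id=0)
worker = InterceptedDeviceAPI(proxy)
worker.write("weights", [0.1, 0.2])
del worker  # the worker goes away (say, it is preempted)...
restarted = InterceptedDeviceAPI(proxy)
print(restarted.read("weights"))  # [0.1, 0.2]: device state survived in the proxy
proxy.shutdown()
```

Keeping the device state on the proxy side, rather than in the worker, is what makes the proxy's lifetime independent of the worker's in this sketch.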

As a result, Singularity can schedule more jobs efficiently, keeping thousands of servers in productive use for longer. It also allows jobs to scale up or down quickly without disruption.

The paper concludes that "Singularity achieves a significant breakthrough in scheduling deep learning workloads," turning previously niche features such as elasticity into mainstream, always-on features that the scheduler can rely on to implement stringent service level agreements (SLAs).

Unfortunately, the paper does not make any of Microsoft's research or techniques freely available to the public; nevertheless, it does shed light on the company's AI operations.

That's all on Microsoft's planet-scale AI infrastructure, its GPUs and its RAM. If you would like to know more, comment below, and don't forget to share your thoughts on the Singularity project on social media!