OSG rides a Comet

Last week at the Supercomputing 2015 conference (SC15) in Austin, TX, the San Diego Supercomputer Center (SDSC) presented details about the US National Science Foundation’s (NSF) newest supercomputer, Comet. Among its many features, Comet boasts high-performance virtualization clusters, a unique attribute that secured inclusion in the Open Science Grid’s (OSG) global infrastructure.

Comet, the result of an NSF grant worth almost $24 million (€22.7 million) including hardware and operating funds, is the first eXtreme Science and Engineering Discovery Environment (XSEDE) production system to support high-performance virtualization at the multi-node cluster level. The cluster’s use of Single Root I/O Virtualization (SR-IOV) means researchers can use their own software environment, as they do with cloud computing, but achieve the high performance they expect from a supercomputer.

“Together, we're creating a seamless interface between the nation's two leading open scientific computing infrastructures – OSG and XSEDE.” ~ Michael Norman. 

“Scientists at campuses across the nation will be able to transparently compute from their desktops, labs, and campus infrastructures onto Comet, significantly expanding the reach of our new cluster toward what’s called the ‘long tail’ of science, or the idea that the large number of modestly-sized computationally-based research projects represents, in aggregate, a tremendous amount of research that can yield scientific discovery,” says SDSC director Michael Norman. “We’re already seeing interest in Comet’s virtual clusters from other institutions, and expect that additional projects will enter production with them in the coming months.”

Beginning with the next XSEDE allocation review in December, it will be possible to request allocations transparently across Comet and OSG.


Blazing Comet

  • Comet is a Dell-integrated cluster using Intel’s Xeon® Processor E5-2600 v3 family, with two processors per node and 12 cores per processor running at 2.5 GHz.
  • Each compute node has 128 GB (gigabytes) of traditional DRAM and 320 GB of local flash memory. There are 27 racks of these compute nodes, totaling 1,944 nodes or 46,656 cores.
  • Comet has four large-memory nodes, each with four 16-core processors and 1.5 TB of memory, as well as 36 GPU nodes, each with four NVIDIA GPUs (graphics processing units).
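
For readers who want to check the arithmetic behind these figures, the following is a minimal Python sketch. It uses only the numbers quoted in the list above for the standard compute nodes. The function name `summarize_compute_nodes` and the aggregate DRAM and flash totals are illustrative derivations that do not appear in SDSC's announcement, and they assume decimal units (1 TB = 1,000 GB).

```python
# Figures quoted in the article for Comet's standard compute nodes.
RACKS = 27
NODES = 1_944
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 12
DRAM_GB_PER_NODE = 128
FLASH_GB_PER_NODE = 320


def summarize_compute_nodes():
    """Derive aggregate figures from the per-node numbers (illustrative only)."""
    cores_per_node = PROCESSORS_PER_NODE * CORES_PER_PROCESSOR
    return {
        "cores_per_node": cores_per_node,
        "total_cores": NODES * cores_per_node,
        "nodes_per_rack": NODES // RACKS,
        # Assumption: decimal terabytes (1 TB = 1,000 GB).
        "total_dram_tb": NODES * DRAM_GB_PER_NODE / 1000,
        "total_flash_tb": NODES * FLASH_GB_PER_NODE / 1000,
    }


if __name__ == "__main__":
    for name, value in summarize_compute_nodes().items():
        print(f"{name}: {value}")
```

The core count works out to the 46,656 cores stated above, and the node count divides evenly across the 27 racks. The large-memory and GPU nodes are not included in these totals.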

Comet’s ‘bare metal’-like approach means that a virtual cluster looks, feels, and performs almost exactly like the physical hardware. This enabled OSG to dynamically turn servers provisioned by Comet into an HTCondor pool and add new capability with very little additional overhead and significantly reduced administrative burden.

The integration of Comet into the OSG provisioning system was led by a team including UC San Diego professor Frank Würthwein, an expert in experimental particle physics and advanced computation. “Everybody wins in this collaboration, as OSG members are already conducting scientific research on this expanded infrastructure,” says Würthwein. “OSG’s user community across physics, chemistry, biology, mathematics, and the social sciences gains transparent access to new capabilities, and neither SDSC nor OSG system engineers need to maintain a large new set of services that they wouldn’t be supporting anyway.”

Würthwein joined SDSC, an Organized Research Unit of UC San Diego, in January 2015 to help implement a high-capacity data cyberinfrastructure across all UC campuses, as well as connect to key cyberinfrastructure organizations such as OSG. Würthwein was OSG’s founding executive during 2005, and has again served as its executive director since February 2015.

“We are pioneering the area of virtualized clusters, specifically with SR-IOV,” said Philip Papadopoulos, SDSC’s chief technical officer. “This will allow virtual sub-clusters to run applications over InfiniBand at near-native speeds – and that marks a huge step forward in HPC virtualization. In fact, a key part of this is virtualization for customized software stacks, which will lower the entry barrier for a wide range of researchers by letting them project an environment they already know onto Comet.”