[FLUKA-TRIUMF] Computing update: job scheduler now available, direct login restricted on some nodes
Camille Belanger-Champagne
cbchampagne at triumf.ca
Thu Jul 18 12:20:36 PDT 2024
Dear FLUKA-users,
All the former FLUKA working nodes have now been incorporated into the computing cluster (see the earlier announcement about the formation of the computing cluster below). Only fluka00 and fluka03 remain available for direct login, and they are designated for specific purposes. DO NOT RUN LONG SIMULATION JOBS ON THESE NODES. Such jobs will be killed without warning.
There are now a total of 720 computing cores available in the cluster.
Please refer to the Sharepoint site to learn how to use HTCondor: https://triumfoffice365.sharepoint.com/sites/FLUKA/SitePages/FLUKA-resources-at-TRIUMF.aspx
Best regards,
Camille
From: Camille Belanger-Champagne <cbchampagne at triumf.ca>
Date: Monday, May 6, 2024 at 10:06
To: fluka-users at lists.triumf.ca <fluka-users at lists.triumf.ca>
Subject: Computing update: job scheduler now available, direct login restricted on some nodes
Dear FLUKA-users,
Please note the following changes to the FLUKA computing resources available at TRIUMF. Half the computing nodes (fluka31-44) are no longer available for direct login. Instead, they have been bundled into a real computing cluster, and they now accept jobs through a job scheduler called HTCondor.
The user site fluka.triumf.ca has been updated with this information as well as instructions and examples on how to use HTCondor: https://triumfoffice365.sharepoint.com/sites/FLUKA/SitePages/FLUKA-resources-at-TRIUMF.aspx
In addition to those written resources, I will host an in-person “HTCondor with Flair and Fluka” tutorial session on Wednesday, May 8th at 1pm in the MOB auditorium. Please bring your computer so you can give it a try live!
The other nodes (fluka12-24) are still available for direct login at the moment, but as the wrinkles get ironed out of the cluster system, expect the number of direct login nodes to decrease until most of the resources are available only via HTCondor. This system is primarily being implemented to make it easier to run the very large job sets that are needed for the significant safety and design studies necessary to support TRIUMF projects like ARIEL and IAMI, but it should benefit all users who run FLUKA simulations at scale by maximizing the use of our computing resources.
Best regards,
Camille