[Cfat] system management for computing clusters at TRIUMF

Isabel Trigger itrigger at triumf.ca
Fri Aug 6 10:17:08 PDT 2010


Dear Colleagues,

(I know CFAT doesn't exist anymore, but it's still a useful list.)

I have been struggling for a while with the problem of system management 
for the ATLAS Tier 3 and local desktop cluster at TRIUMF.  Basically 
this is a NIS cluster which allows a number of machines to access ATLAS 
analysis software and lets users log in to all machines with the same 
home directory and so forth.

The problem is that I am effectively the main system manager for the 
cluster.  It isn't a TRIUMF central system, so CCN can provide advice 
and hardware help with the machines that live in the main server room, 
but they can't do day-to-day system management.  Similarly, it is not 
part of the ATLAS Tier 1 centre, so the Tier 1 personnel can provide 
advice and tools, and help with problems, but cannot do basic system 
management for us.

Our group (ATLAS) is now large enough that it is a moderately big job to 
keep all machines up to date (OS upgrades, account creation, monitoring 
and balancing of resource usage, etc.).  It seems inappropriate for only 
one faculty member to have root access to all of the machines in the 
cluster, but inefficient and rather insecure to have multiple faculty 
members as superusers.  It is not at all obvious that this is an 
appropriate task for a physics post-doc either.

I was wondering whether other groups with private clusters have similar 
problems with system management.  T2K and Theory come to mind in the 
Science Division, but I know there are groups like ISAC Controls which 
employ people to do system administration... probably there are many 
such groups scattered about?  If enough groups need a part-time 
professional sys-admin, one possibility would be to try to get funding, 
either through Discovery and Project grants or perhaps through something 
like an MRS application, to cover all or part of the salary of someone 
who would support system administration for group clusters at TRIUMF.

Please let me know your thoughts on this matter.

Isabel