[Cfat] system management for computing clusters at TRIUMF
Isabel Trigger
itrigger at triumf.ca
Fri Aug 6 10:17:08 PDT 2010
Dear Colleagues,
(I know CFAT doesn't exist anymore, but it's still a useful list.)
I have been struggling for a while with the problem of system management
for the ATLAS Tier 3 and local desktop cluster at TRIUMF. Basically
this is a NIS cluster which allows a number of machines to access ATLAS
analysis software and lets users log in to all machines with the same
home directory and so forth.
The problem is that I am effectively the main system manager for the
cluster. It isn't a TRIUMF central system, so CCN can provide advice and
hardware help with the machines that live in the main server room, but
they can't do day-to-day system management. Similarly, it is not part of
the ATLAS Tier 1 centre, so the Tier 1 personnel can provide advice,
tools, and help with problems, but not do basic system management for us.
Our group (ATLAS) is now large enough that it is a moderately big job to
keep all machines up to date (OS upgrades, account creation, monitoring
and balancing of resource usage, etc.). It seems inappropriate for only
one faculty member to have root access to all of the machines in the
cluster, but inefficient and rather insecure to have multiple faculty
members as superusers. It is not at all obvious that this is an
appropriate task for a physics post-doc either.
I was wondering whether other groups with private clusters had similar
problems with system management. T2K and Theory came to mind in the
Science Division, but I know there are groups like ISAC Controls which
employ people to do system administration, and there are probably many
such groups scattered about. One possibility would be that, if there are enough
groups needing a part-time professional sys-admin, we could try to get
funding, either through Discovery and Project grants, or perhaps through
something like an MRS application, to cover all or part of a salary for
someone to support system administration for group clusters at TRIUMF.
Please let me know your thoughts on this matter.
Isabel