[Cfat] Re: system management for computing clusters at TRIUMF

Isabel Trigger itrigger at triumf.ca
Fri Aug 6 12:07:16 PDT 2010


Hi Sonia,

Thanks.  I will wait for some more feedback and then try to put together 
something a bit more coherent to ask Gordon and Nigel.

Unfortunately it looks as if the MRS program 
http://www.nserc-crsng.gc.ca/Professors-Professeurs/RTII-OIRI/index_eng.asp 
may have been rescoped to the point where it can't be used to support 
sys admins.  It's not off-limits to TRIUMF, but it is hard to argue that 
local sys admins provide a unique facility.

I know that Theory grants do not have large fractions of a person's 
salary floating around in the spare change.  Actually nor do subatomic 
Project grants - the total may be larger, but it is committed to 
specific things and can't just be spent on whatever we want.  Supporting 
local computing is always a very touchy topic in our split meetings - 
"Surely your university pays for THAT?"

Akira: It is rare to find a post-doc who has the required competencies, 
is completely reliable, and is willing to spend a large fraction of 
his/her time on system administration.  Building a large computing 
centre is an exciting challenge and looks good on a CV; maintaining a 
local cluster is frankly not that thrilling.

cheers,
Isabel

On 08/06/2010 11:35 AM, Sonia Bacca wrote:
> Hi Isabel et al,
>
> I also think that we would need more help in managing the computers at
> TRIUMF.
> Contrary to what Akira said, I had heard that additional personnel for
> the CCN group WAS recommended by the review committee. (?)
> I think the science division head knows about the problem, but it would
> be good to make the case again, all together.
>
> Concerning NSERC money, I think that we theorists have such small grants
> that it would not be possible to take money from them for this. If you
> are thinking about putting in a new proposal all together, then I am for
> it, and I could present the perspective of the Theory Cluster Cougar
> (even though it has only been operating for a few months).
> Richard will write more about all the other computing help needs in the
> theory group.
>
> What is this MRS? If one can apply only from the University, then it
> does not sound very likely that we would get help there...
> Maybe the best would be to make the case to the science division and the
> director.
>
> Cheers,
> Sonia
>
>
> Isabel Trigger wrote:
>> Dear Colleagues,
>>
>> (I know CFAT doesn't exist anymore, but it's still a useful list.)
>>
>> I have been struggling for a while with the problem of system
>> management for the ATLAS Tier 3 and local desktop cluster at TRIUMF.
>> Basically this is a NIS cluster which allows a number of machines to
>> access ATLAS analysis software and lets users log in to all machines
>> with the same home directory and so forth.
>>
>> The problem is that I am effectively the main system manager for the
>> cluster - it isn't a TRIUMF central system, so CCN can provide advice,
>> and hardware help with the machines that live in the main server room,
>> but they can't do day-to-day system management; similarly it is not
>> part of the ATLAS Tier 1 centre so the Tier 1 personnel can provide
>> advice and tools, and help with problems, but not do basic system
>> management for us.
>>
>> Our group (ATLAS) is now large enough that it is a moderately big job
>> to keep all machines up to date (OS upgrades, account creation,
>> monitoring and balancing of resource usage, etc.). It seems
>> inappropriate for only one faculty member to have root access to all
>> of the machines in the cluster, but inefficient and rather insecure to
>> have multiple faculty members as superusers. It is not at all obvious
>> that this is an appropriate task for a physics post-doc either.
>>
>> I was wondering whether other groups with private clusters had similar
>> problems with system management. T2K and Theory came to mind in the
>> Science Division, but I know there are groups like ISAC Controls which
>> employ people to do sys administration... probably there are many
>> scattered about? One possibility would be that if there are enough
>> groups needing a part-time professional sys-admin, we could try to get
>> funding, either through Discovery and Project grants, or perhaps
>> through something like an MRS application, to cover all or part of a
>> salary for someone to support system administration for group clusters
>> at TRIUMF.
>>
>> Please let me know your thoughts on this matter.
>>
>> Isabel
>
>


