From itrigger at triumf.ca Fri Aug 6 10:17:08 2010
From: itrigger at triumf.ca (Isabel Trigger)
Date: Fri Aug 6 10:17:14 2010
Subject: [Cfat] system management for computing clusters at TRIUMF
Message-ID: <4C5C4394.5000203@triumf.ca>

Dear Colleagues,

(I know CFAT doesn't exist anymore, but it's still a useful list.)

I have been struggling for a while with the problem of system
management for the ATLAS Tier 3 and local desktop cluster at TRIUMF.
Basically this is a NIS cluster which allows a number of machines to
access ATLAS analysis software and lets users log in to all machines
with the same home directory and so forth.

The problem is that I am effectively the main system manager for the
cluster - it isn't a TRIUMF central system, so CCN can provide advice,
and hardware help with the machines that live in the main server room,
but they can't do day-to-day system management; similarly it is not
part of the ATLAS Tier 1 centre so the Tier 1 personnel can provide
advice and tools, and help with problems, but not do basic system
management for us.

Our group (ATLAS) is now large enough that it is a moderately big job
to keep all machines up to date (OS upgrades, account creation,
monitoring and balancing of resource usage, etc.). It seems
inappropriate for only one faculty member to have root access to all
of the machines in the cluster, but inefficient and rather insecure to
have multiple faculty members as superusers. It is not at all obvious
that this is an appropriate task for a physics post-doc either.

I was wondering whether other groups with private clusters had similar
problems with system management. T2K and Theory came to mind in the
Science Division, but I know there are groups like ISAC Controls which
employ people to do sys administration... probably there are many
scattered about?
One possibility would be that if there are enough groups needing a
part-time professional sys-admin, we could try to get funding, either
through Discovery and Project grants, or perhaps through something
like an MRS application, to cover all or part of a salary for someone
to support system administration for group clusters at TRIUMF.

Please let me know your thoughts on this matter.

Isabel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: itrigger.vcf
Type: text/x-vcard
Size: 414 bytes
Desc: not available
Url : http://lists.triumf.ca/pipermail/cfat/attachments/20100806/0d7e11f0/itrigger.vcf

From itrigger at triumf.ca Fri Aug 6 12:07:16 2010
From: itrigger at triumf.ca (Isabel Trigger)
Date: Fri Aug 6 12:07:27 2010
Subject: [Cfat] Re: system management for computing clusters at TRIUMF
In-Reply-To: <4C5C5602.4080807@triumf.ca>
References: <4C5C4394.5000203@triumf.ca> <4C5C5602.4080807@triumf.ca>
Message-ID: <4C5C5D64.1090202@triumf.ca>

Hi Sonia,

Thanks. I will wait for some more feedback and then try to put
together something a bit more coherent to ask Gordon and Nigel.

Unfortunately it looks as if the MRS program
http://www.nserc-crsng.gc.ca/Professors-Professeurs/RTII-OIRI/index_eng.asp
may have been rescoped to the point where it can't be used to support
sys admins. It's not off-limits to TRIUMF, but it is hard to argue
that local sys admins provide a unique facility.

I know that Theory grants do not have large fractions of a person's
salary floating around in the spare change. Actually nor do subatomic
Project grants - the total may be larger, but it is committed to
specific things and can't just be spent on whatever we want.
Supporting local computing is always a very touchy topic in our split
meetings - "Surely your university pays for THAT?"

Akira: It is rare to find a post-doc who has the required
competencies, is completely reliable, and is willing to spend a large
fraction of his/her time on system administration.
Building a large computing centre is an exciting challenge and looks
good on a CV; maintaining a local cluster is frankly not that
thrilling.

cheers,
Isabel

On 08/06/2010 11:35 AM, Sonia Bacca wrote:
> Hi Isabel et al,
>
> I also think that we would need more help in managing the computers
> at TRIUMF. Contrary to what Akira said, I had heard that additional
> personnel for the CCN group WAS recommended by the review
> committee. (?) I think the science division head knows about the
> problem, but it would be good to make the case again all together.
>
> Concerning NSERC money, I think that we theorists have such small
> grants that it would not be possible to take money from there for
> this. If you are thinking about putting in a new proposal all
> together, then I am for it and I could present the perspective of
> the Theory Cluster Cougar (even though it has been operating for
> only a few months). Richard will write more about all other
> computing help needs in the theory group.
>
> What is this MRS? If one can apply only from the University then it
> does not sound very likely that we would get help there...
> Maybe the best thing would be to make the case to the science
> division and the director.
>
> Cheers,
> Sonia
>
>
> Isabel Trigger wrote:
>> Dear Colleagues,
>>
>> (I know CFAT doesn't exist anymore, but it's still a useful list.)
>>
>> I have been struggling for a while with the problem of system
>> management for the ATLAS Tier 3 and local desktop cluster at TRIUMF.
>> Basically this is a NIS cluster which allows a number of machines to
>> access ATLAS analysis software and lets users log in to all machines
>> with the same home directory and so forth.
>>
>> The problem is that I am effectively the main system manager for the
>> cluster - it isn't a TRIUMF central system, so CCN can provide advice,
>> and hardware help with the machines that live in the main server room,
>> but they can't do day-to-day system management; similarly it is not
>> part of the ATLAS Tier 1 centre so the Tier 1 personnel can provide
>> advice and tools, and help with problems, but not do basic system
>> management for us.
>>
>> Our group (ATLAS) is now large enough that it is a moderately big job
>> to keep all machines up to date (OS upgrades, account creation,
>> monitoring and balancing of resource usage, etc.). It seems
>> inappropriate for only one faculty member to have root access to all
>> of the machines in the cluster, but inefficient and rather insecure to
>> have multiple faculty members as superusers. It is not at all obvious
>> that this is an appropriate task for a physics post-doc either.
>>
>> I was wondering whether other groups with private clusters had similar
>> problems with system management. T2K and Theory came to mind in the
>> Science Division, but I know there are groups like ISAC Controls which
>> employ people to do sys administration... probably there are many
>> scattered about? One possibility would be that if there are enough
>> groups needing a part-time professional sys-admin, we could try to get
>> funding, either through Discovery and Project grants, or perhaps
>> through something like an MRS application, to cover all or part of a
>> salary for someone to support system administration for group clusters
>> at TRIUMF.
>>
>> Please let me know your thoughts on this matter.
>>
>> Isabel
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: itrigger.vcf
Type: text/x-vcard
Size: 414 bytes
Desc: not available
Url : http://lists.triumf.ca/pipermail/cfat/attachments/20100806/b3db464c/itrigger.vcf