SL5 Experience May 2009
Hi all,
In response to the various requests to have a look at the glite 3.2 / SL5 WN install, we've done just that, and after a few hitches (more below) the resulting sixteen core system is available on the grid via our latest computing element, t2ce05.physics.ox.ac.uk. At the moment, according to the information system[1] it's the only SL5 service on the production grid in the UK, and one of only a handful on EGEE as a whole. Yay us :-)
At the moment it has no VO software installed, and is only accepting jobs for ops and dteam, but it should otherwise be fully capable, and we'd hope (subject to some discussion) to open it up to real VO work fairly shortly, and then we can see about moving more worker nodes from the SL4 system across to the SL5 one in due course. In the meantime, please send it any test jobs you like.
The hitches we've hit so far are:
- Installation:
The installation guide still refers to installing the glite-WN metapackage, but it doesn't exist any more. This is a very bad thing, but the short version is that you need to 'yum groupinstall glite-WN' instead. AIUI this is the same as the 64 bit SL4 WN installation.
- Test failures:
Initially the new nodes failed two SAM tests:
- the CE-sft-lcg-rm-free test initially went into a warning state[1] because of a lack of the ldapsearch tool on the WN. On a glite 3.1/SL4 node this is included in the vdt_globus_info_essentials-VDT1.6.1x86_rhas_4-7 package, but the corresponding ones on the glite 3.2/SL5 WN don't have it. I fixed it by simply installing the openldap-clients package from SL.
- the CE-sft-brokerinfo test showed an 'error' state since it involves running the glite-brokerinfo command on the worker node. This is installed, but is linked against libclassad_ns.so.0. That's also installed, but not in a directory in the linker's search path. Adding:
gridpath_prepend "LD_LIBRARY_PATH" "/opt/classads/lib64/"
to /etc/profile.d/grid-env.sh fixes it.
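For anyone wanting to check a freshly installed WN for the same two problems before SAM finds them, something along these lines should do. It's only a rough sketch: the function names are my own invention rather than anything shipped with glite, and it assumes the usual 'ldd' output format, where an unresolved library shows up as 'name => not found'.

```shell
#!/bin/sh
# Hypothetical helper functions; none of these names come from glite.

# Succeeds if the named tool can be found on the current PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Reads ldd output on stdin and prints each library the linker
# could not resolve (lines of the form "libfoo.so.0 => not found").
unresolved_from_ldd() {
    awk '/=> not found/ { print $1 }'
}

# Prints the unresolved libraries for one installed command.
# Empty output means everything resolved.
unresolved_libs() {
    ldd "$(command -v "$1")" 2>/dev/null | unresolved_from_ldd
}

# Runs both checks; returns non-zero if either one fails.
check_wn() {
    status=0
    if ! have_tool ldapsearch; then
        echo "ldapsearch missing: install openldap-clients"
        status=1
    fi
    if ! have_tool glite-brokerinfo; then
        echo "glite-brokerinfo not on PATH"
        status=1
    else
        missing=$(unresolved_libs glite-brokerinfo)
        if [ -n "$missing" ]; then
            echo "glite-brokerinfo cannot resolve: $missing"
            status=1
        fi
    fi
    return $status
}
```

Source the file and run 'check_wn' from a login shell on the worker node, so that /etc/profile.d/grid-env.sh has already been read and LD_LIBRARY_PATH is what a job would actually see.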
- Gstat:
Gstat (http://gstat.gridops.org/gstat/UKI-SOUTHGRID-OX-HEP/) currently shows all three CEs as working, but the site overall in an error state. This is because it doesn't recognise SL5.3 as a valid operating system. It's just comparing against an out-of-date static list though; according to the documentation at:
http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_the_OS_name
(which gstat refers to) the publishing's actually fine.
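If you want to see what a site is actually publishing for the OS, you can ask its BDII directly. This is a sketch under some assumptions: the BDII host name is whatever your site uses (I've left it as an argument), port 2170 and the 'mds-vo-name=SITE,o=grid' base are the usual site BDII conventions, and the three attributes are the GLUE 1.3 operating system fields. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper functions for inspecting the published OS values.

# Query a site BDII for the OS attributes of its subclusters.
#   $1 = BDII host name (site specific, not given here)
#   $2 = site name as used in mds-vo-name
query_os() {
    ldapsearch -x -LLL -H "ldap://$1:2170" \
        -b "mds-vo-name=$2,o=grid" \
        '(objectClass=GlueSubCluster)' \
        GlueHostOperatingSystemName \
        GlueHostOperatingSystemRelease \
        GlueHostOperatingSystemVersion
}

# Filter LDIF on stdin down to just the three OS attribute lines.
os_fields() {
    grep -E '^GlueHostOperatingSystem(Name|Release|Version): '
}
```

Usage would be something like 'query_os your.site.bdii UKI-SOUTHGRID-OX-HEP | os_fields', and the values can then be compared by eye with the ones on the wiki page above.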
I'll expand on the many excellent reasons why the glite-WN metapackage shouldn't have been replaced with a yum group in another email, but for now I have some questions:
- There are lists of the RPMs that the VOs want to be installed on worker nodes; are there separate lists for SL5 workers? Given the failures on two basic SAM tests with the 'out-of-the-box' configuration I suspect that there will be other things that are not immediately available on a basic SL5 WN that are assumed on SL4.
- Are the VOs ready enough for SL5 that I should be able to open this system up to general use without it taking and killing jobs that expect but don't request an SL4 environment? If only some of the VOs, which ones?
- I've given the SL5 system a new VO software area; at the moment it's empty. Does anyone think it would be a good idea to 'seed' it with a copy of the SL4 area? So far, I'm mainly thinking 'no'.
Ewan