dCache FAQ
This FAQ documents the experiences of deploying dCache at the UK institutions involved in the LCG. A lot of this information can be found by trawling through the GRIDPP-STORAGE list archives. Clearly, dCache is developing software, as is the YAIM method of installing it for LCG sites, so the issues highlighted below may be fixed in later releases. The current releases are dCache 1.6.5 and LCG 2.6.0.
Contents
- 1 How do I install a dCache pool node that is not running SL3 (or the LCG middleware stack)?
- 2 How do you prevent srm-advisory-delete from hanging forever?
- 3 How can you view the contents of the files in dCache via the /pnfs namespace?
- 4 How does dCache manage pools?
- 5 How do you add pools to an existing setup?
- 6 How do I create a gridftp door on a pool node?
- 7 How do I record the DN of a user who is uploading files?
- 8 How do I get the SRM cell online after installation?
- 9 My site uses a non-standard Globus TCP port range. How do I configure dCache to use this?
- 10 How do I publish information about my SRM?
- 11 How do I add a VO to an existing setup? (PNFS databases)
- 12 How do I map a VO to a pool?
- 13 How do I flush and then remove a pool?
- 14 How do I start and stop the postmaster process?
- 15 How do I use dCache with dual homed machines?
- 16 How do I migrate the pnfs database to a new machine?
- 17 How do I NFS mount storage onto a pool node?
- 18 How do I use ssh keys to interface with the admin module?
- 19 How do I list the pnfs directory tags?
- 20 Which filesystem should the OS use on the pool nodes?
- 21 Which ports does dCache use?
- 22 What happens if a pool's local filesystem fills up?
- 23 I cannot get some dCache components (e.g. doors) to start up, why?
- 24 What do I do after upgrading my CA certificates?
- 25 What does it mean if a file is locked or bad?
- 26 Configuring a GridFTP door not to be used by the SRM
- 27 What does the output of rep ls mean?
- 28 What does java.io.EOFException: EOF on input socket (fillBuffer) in the log files mean?
How do I install a dCache pool node that is not running SL3 (or the LCG middleware stack)?
A problem some sites have is that they want to operate dCache pool nodes on disk servers which do not run SL3 and do not have the LCG middleware stack installed. In this case, sites can still set up dCache on these pool nodes by manually installing the dCache server and client rpms and configuring the software by hand (e.g. adding entries to $poolname.poollist). Running YAIM is not required. Refer to the dCache documentation packaged with the rpms for further instructions on installing them. Also, see this FAQ entry.
How do you prevent srm-advisory-delete from hanging forever?
Add -connect_to_wsdl=true to the command line.
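For example (a sketch; the SURL is hypothetical):
# srm-advisory-delete -connect_to_wsdl=true srm://your.srm.ac.uk:8443/pnfs/your.domain.ac.uk/data/dteam/test-file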
How can you view the contents of the files in dCache via the /pnfs namespace?
Data files do not actually reside in the /pnfs namespace and errors will occur if you try to read or write the files contained there. Only non-I/O UNIX commands (e.g. ls, pwd) can normally be issued in the /pnfs namespace. To read files from pnfs directly using standard UNIX commands, the system needs to override the normal POSIX interface. This is known as preloading. To override the system, the environment variable LD_PRELOAD must be set.
# export LD_PRELOAD=/opt/d-cache/dcap/lib/libpdcap.so
This instructs the library linker to link libpdcap.so in preference to the system libraries and so allows unmodified UNIX applications to read from pnfs. NOTE: if you export this environment variable, you will not be able to modify the contents of the /pnfs/fs/admin/ directory tree or even view the contents of /pnfs/fs/README. An error such as:
# cat /pnfs/fs/README
Command failed!
Server error message for [1]: "Couldn't determine hsmType" (errno 37).
Failed open file in the dCache.
cat: README: Input/output error
will be generated.
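For example (a sketch; the file path is only illustrative), once the variable is exported an ordinary UNIX tool can read a data file out of pnfs:
# export LD_PRELOAD=/opt/d-cache/dcap/lib/libpdcap.so
# cat /pnfs/epcc.ed.ac.uk/data/dteam/test-file > /tmp/test-file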
How does dCache manage pools?
One dCache pool process manages all the dCache pools on a node. Check the
/opt/d-cache/config/`hostname`.poollist
file, it should have one line for each pool. So for a host called "srm" with a pool path of /export/data??, the poollist takes the form:
srm_1 /export/data01/pool sticky=allowed recover-space recover-control recover-anyway lfs=precious tag.hostname=srm.epcc.ed.ac.uk
srm_2 /export/data02/pool sticky=allowed recover-space recover-control recover-anyway lfs=precious tag.hostname=srm.epcc.ed.ac.uk
The dCache pool process can be started/stopped using
# service dcache-pool start|stop
How do you add pools to an existing setup?
To add a pool to a node by hand:
- Ensure the d-cache-core rpm is installed.
- Stop dcache-pool services.
- Create the pool directories:
# mkdir -p /path/to/pool/control
# mkdir -p /path/to/pool/data
- Create the pool setup file. The easiest way is to copy the setup file from another pool to /path/to/new/pool/setup and fix the disk space. Ensure that the following are set:
set max diskspace [Size of pool in GB]g
mover set max active 100
- Add the new pool to the file /opt/d-cache/config/[FQDN of pool node].poollist, using the format:
[FQDN of pool node]_[next integer in sequence] /path/to/new/pool sticky=allowed recover-space recover-control \ recover-anyway lfs=precious tag.hostname=[HOSTNAME]
- Start dcache-pool services.
- After a few minutes check the http and admin interfaces to see if the new pool has registered itself.
How do I create a gridftp door on a pool node?
How do I mount pnfs on another node
Creating doors on the pool nodes takes the burden away from the admin node and makes the dCache system a more scalable SRM solution as the number of pool nodes increases. To create a GridFTP door, follow this procedure. pnfs does not need to be installed on the pool node: there should only be one pnfs instance running in your entire dCache instance. However, the pnfs service from the admin node must be NFS mounted on the pool node. On the admin node, go to /pnfs/fs/admin/etc/exports and copy the file 127.0.0.1
to a file that has the IP address of your pool node as the name, then cd into the trusted directory and do the same. Then on the pool node add something like:
srm.epcc.ed.ac.uk:/fs /pnfs/fs nfs hard,intr,rw,noac,auto 0 0
to your /etc/fstab, create the /pnfs/fs directory, mount it and create the symlink.
- NOTE 1: it is {hostname}:/fs and not {hostname}:/pnfs/fs that you need to use here.
- NOTE 2: nfs must not be running on the admin node.
- NOTE 3: noac should definitely be used, otherwise you may run into synchronisation problems.
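A sketch of the corresponding commands (192.168.0.33 stands in for your pool node's IP address; the mount assumes the fstab entry shown above):

# On the admin node:
cd /pnfs/fs/admin/etc/exports
cp 127.0.0.1 192.168.0.33
cd trusted
cp 127.0.0.1 192.168.0.33

# On the pool node:
mkdir -p /pnfs/fs
mount /pnfs/fs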
Then, to run a gridftp door on your pool node: Set the /opt/d-cache/etc/door_config file on the pool node to have the following entries:
GSIDCAP   no
GRIDFTP   yes
SRM       no
and then run /opt/d-cache/install/install_doors.sh
and then start dcache-opt. dCache identifies components by name, so if you have two GridFTP doors both named GFTP, only the one that started last will be used. When the doors have distinct names, dCache picks one of the open GridFTP doors at random for each transfer; you can see which one was used by looking at the TURL returned for the transfer.
Changing the web monitoring page to report the status of the new door
Once the new door has been added, its status is not displayed on the web monitoring page by default. This is true whenever the install_doors.sh script is executed: the script changes the door name to something more unique (basically GFTP-`hostname -s`) so that you do not get name collisions. You can change the names yourself in the /opt/d-cache/config/*.batch files. For the new pool door, it is necessary to add a new entry at the bottom of /opt/d-cache/config/httpd.batch. If the pool node is called dcache.epcc.ed.ac.uk, then the default behaviour of install_doors.sh is to create a new GridFTP door called GFTP-dcache. In that case, you should add a corresponding line in the httpd.batch file:
#
create diskCacheV111.cells.WebCollectorV3 collector \
       "PnfsManager \
        PoolManager \
        GFTP \
        SRM \
        DCap-gsi \
        GFTP-dcache \
        -replyObject"
#
and then stop and start the httpd service:
/opt/d-cache/jobs/httpd stop
/opt/d-cache/jobs/httpd -logfile=/opt/d-cache/log/http.log start
If the dcache-opt services are running on the pool node, the new door will appear online in the web page after a few minutes.
Non-SL3/Non-LCG Pool node
If your pool node is not running the LCG middleware stack, then you will have had to install the dCache rpms by hand. If you subsequently add a gridftp door to this pool node then you will find that attempting to srmcp or globus-url-copy into the dCache using this door will fail with an error like:
535 Authentication failed: GSSException: Failure unspecified at GSS-API level [Caused by: Unknown CA]
This is because the pool node does not have any of the LCG security components installed (e.g. the list of CAs and the grid-mapfile/dcache.kpwd file). The following steps should be executed to rectify this situation and allow the door to operate properly (thanks to Kostas Georgiou). Download and install the following edg component (or the most recent version of it):
# wget http://grid-deployment.web.cern.ch/grid-deployment/gis/apt/LCG-2_6_0/sl3/ \
    en/i386/RPMS.lcg_sl3/edg-utils-system-1.7.0-1.noarch.rpm
# rpm -Uvh edg-utils-system-1.7.0-1.noarch.rpm
# echo "37 4,10,16,22 * * * root /opt/edg/etc/cron/edg-fetch-crl-cron >> /var/log/edg-fetch-crl-cron.log 2>&1" > /etc/cron.d/edg-fetch-crl
# chmod 755 /etc/cron.d/edg-fetch-crl
Then install the list of CAs using yum:
# echo "yum lcg2_CA http://grid-deployment.web.cern.ch/grid-deployment/gis/apt/LCG_CA/en/i386/RPMS.lcg/" >> /etc/sysconfig/rhn/sources # up2date-nox -u --nosig lcg-CA
or apt-get:
# echo "rpm http://grid-deployment.web.cern.ch/grid-deployment/gis apt/LCG_CA/en/i386 lcg/" >> /etc/apt/sources.list.d/lcg-ca.list # apt-get install lcg-CA
Then, if you do not want to install the mkgridmap software, you should use the following script (or one tailored to your system) that will set up a cron job to copy the dcache.kpwd file from the admin node.
# cat <<'EOM' > /etc/cron.hourly/getkpwd.sh
#!/bin/bash
# Quick hack to get dcache.kpwd files from the admin node
KPWDDIR="/opt/d-cache/etc"
KPWD="$KPWDDIR/dcache.kpwd"
KPWDOLD="$KPWDDIR/dcache.kpwd-old"
KPWDTMP=`mktemp $KPWDDIR/dcache.kpwd-XXXXXX`
ADMINNODE="dcacheadmin.hep.ph.ic.ac.uk"

# Bail out if mktemp failed to create a writable temporary file
[ -z "$KPWDTMP" -o ! -w "$KPWDTMP" ] && exit 1

# Fetch the kpwd file from the admin node (the forced command in the
# admin node's authorized_keys cats the file for us)
ssh -o PasswordAuthentication=no -n2akxe none -i /root/.ssh/id_dsa_dcache "$ADMINNODE" 2>/dev/null \
    | egrep -v "(edginfo|$ADMINNODE)" > $KPWDTMP 2>/dev/null

if [ -f "$KPWDTMP" -a -s "$KPWDTMP" ]; then
    # Need to check that the file is sane
    /bin/mv -f "$KPWD" "$KPWDOLD"
    /bin/mv -f "$KPWDTMP" "$KPWD"
else
    # Failed
    /bin/rm -f "$KPWDTMP"
fi
EOM
# chmod 755 /etc/cron.hourly/getkpwd.sh
See the bash man page if you are unsure what parts of this script are doing. Create the public-private key pair on the pool node:
ssh-keygen -t dsa -f /root/.ssh/id_dsa_dcache -N ""
and copy the public part into .ssh/authorized_keys on the admin node, along with the following options:
# cat .ssh/authorized_keys
from="apool.hep.ph.ic.ac.uk,anotherpool.hep.ph.ic.ac.uk",command="/bin/cat /opt/d-cache/etc/dcache.kpwd",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-dss ............
The dcache.kpwd file should now be updated by the cron job, and srmcp and globus-url-copy transfers will work through the new pool node door.
Also, for versions of dCache > 1.6.6, you will need to ensure that the following symbolic link exists in /pnfs on the pool node, otherwise the GridFTP door will not work and will return a "Path does not exist" error:
[root@dcache pnfs]# ln -s /pnfs/fs /pnfs/ftpBase
How do I record the DN of a user who is uploading files?
For GridFTP: in the /opt/d-cache/config/gridftpdoor.batch file, change the first line from:
set printout 2
to
set printout 3
and then restart the dcache-opt service. This increases the verbosity of the GridFTP door so that the user's DN is included in the output.
How do I get the SRM cell online after installation?
After a dCache install you will often find (upon checking the web interface) that the SRM cell is offline. This appears to be due to a conflict with the pool services; in fact, the problem only seems to occur on systems with fast hardware (initially it could not be replicated with a dCache install on machines with PIII processors). To rectify the problem, turn the dCache pool services off (on the pool node, and on the admin node if you have a pool there) using:
service dcache-pool stop
Then restart the dCache services using:
service dcache-opt restart
and then start dcache-pool again. After waiting for approximately 3 minutes, the web interface should show that the SRM is online.
My site uses a non-standard Globus TCP port range. How do I configure dCache to use this?
The YAIM install method for LCG sites uses the site-info.def file to define site-specific parameters. One of the key-value pairs specifies the GLOBUS_TCP_PORT_RANGE. The default value is "20000 25000", as this is what a large number of sites use. If your site uses something different (e.g. "50000 52000") then this range must be changed before installing the LCG software (otherwise you will need to change it by hand after installation). The site-info.def file also contains the lines:
#DCACHE_PORT_RANGE="20000,25000"
#DPM_PORT_RANGE="20000,25000"
As you can see, these are commented out, so by default dCache uses the 20000-25000 port range. It is possible that during the setup of dCache some components will attempt to communicate using the correct GridFTP port range for the site (e.g. 50000-52000) while others will not, leading to "Handle request errors" during GridFTP transfers. If this is the case, check the /opt/d-cache/config/dCacheSetup file. Check that the java_options line near the top uses the right ports, e.g.
java_options="-server -Xmx512m -XX:MaxDirectMemorySize=512m -Dorg.globus.tcp.port.range=50000,52000"
Further down, there's also
clientDataPortRange=50000:52000
Once changes have been made to this file, the dcache-opt services should be restarted. You should also check that /etc/sysconfig/globus contains the correct port range. You MUST ensure that a colon (:), and not a comma (,), is used when specifying the clientDataPortRange above, otherwise you will experience GridFTP problems.
How do I publish information about my SRM?
Fuller documentation on the LCG Generic Information Provider is available here.
The grid information system works roughly as follows. Each node runs a Grid Resource Information Service (GRIS) which collects local information (e.g. the name of the machine, the amount of available space, etc.). This information is collected together by the Grid Index Information Service (GIIS) that runs on a designated CE at each site. The BDII is a particular implementation of a GIIS: it consists of two or more standard Lightweight Directory Access Protocol (LDAP) databases that are populated by an update process. A cron job runs on your system every few minutes to keep the published information up-to-date. This information is stored in .ldif (LDAP Data Interchange Format) files. The LDIF file(s) should conform to a certain schema (a template for the information) as used by the Generic Information Provider (GIP), a highly configurable framework for creating LDIF files. It splits the information into static and dynamic parts since, in general, only a few attributes of any system need to be determined dynamically; most stay the same. For LCG, the GLUE schema is used to describe the information.
At the time of writing, there are bugs in the information system associated with the publishing of storage information for sites with an SRM. These are described below along with how to resolve them.
Each day, the grid-mapfile is generated by running /etc/cron.d/edg-mkgridmap, and each hour the dCache grid mapfile (which is in a different format) is updated by running /etc/cron.hourly/grid-mapfile2dcache-kpwd. This is a perl script that uses openssl to manipulate the host certificate. Currently, there are two versions of openssl on host machines: one in /usr/bin/ and one in /opt/globus/bin/. The problem is that these are different versions and produce different formats of DN when run on host certificates: one version produces DNs containing emailAddress= and the other produces E=. Clearly, this leads to conflicts when trying to compare DNs from the edg-gridmapfile with those in the dCache file. To resolve this issue, you need to ensure that the same version of openssl is used consistently. To do this, add
/opt/globus/bin
to the start of the PATH in the following files on the admin node:
/etc/cron.d/edg-mkgridmap
/etc/crontab
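For example (a sketch; the remainder of the PATH will vary between systems), the PATH line at the top of these files would become something like:
PATH=/opt/globus/bin:/sbin:/bin:/usr/sbin:/usr/bin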
If the incorrect version of openssl is used, the dynamic information about your site's storage space will not be picked up, and an ldapsearch of your GIIS will report the default values that appear in the config_gip file. You also have to ensure that you are publishing the correct endpoint for your SRM storage. This requires customising your site-info.def file, making sure that the path points to the correct location for the storage. For example, in Edinburgh we have:
CE_CLOSE_SE1_ACCESS_POINT=/pnfs/epcc.ed.ac.uk/data
At the time of writing, we had both a Classic SE and an SRM (dCache). When site-info.def is changed, you need to run the following:
# /opt/lcg/yaim/scripts/run_function /opt/lcg/yaim/examples/site-info.def config_gip
If you are running LCG 2.6.0 you should make sure that you have the latest version of the run_function script so that it properly deals with the different allowed SE types. It can be downloaded by running:
wget http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi/*checkout*/lcg-scripts/yaim/scripts/run_function
Executing run_function will generate a new version of /opt/lcg/var/gip/lcg-info-generic.conf. Whenever /opt/lcg/var/gip/lcg-info-generic.conf is changed, a new /opt/lcg/var/gip/lcg-info-static.ldif needs to be generated by running:
# /opt/lcg/sbin/lcg-info-generic-config /opt/lcg/var/gip/lcg-info-generic.conf
and also:
# su - edginfo -c '/opt/lcg/libexec/lcg-info-wrapper'
to start the script hierarchy that also deals with the dynamic part of the information. Once this is done, other sites will be able to find the path to your SRM and start copying files into and out of it. If your site is still not publishing correctly, try restarting globus-mds:
# service globus-mds restart
on both the CE and SRM box.
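To check what is actually being published, an ldapsearch along the following lines can be used (a sketch; the hostname and filter are examples, and 2135 is the standard GRIS port):
# ldapsearch -x -H ldap://your.ce.ac.uk:2135 -b "mds-vo-name=local,o=grid" "(objectClass=GlueSE)"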
What do I do if my site is displaying WARN status in GStat?
Often after setting up an SRM (either dCache or DPM) you will find that the GStat page for your site shows WARN status because it fails a GIIS sanity check. Often this check returns the error:
Missing DN and Attributes:
==============================================
DN: 'dn: GlueSEUniqueID=your.srm.ac.uk'
or something similar. If this is the case, you will need to modify your lcg-info-generic.conf file by hand to include the fields:
dn: GlueServiceUniqueID=httpg://your.srm.ac.uk:8443/srm/managerv1,mds-vo-name=local,o=grid
GlueServiceAccessPointURL: gsiftp://your.srm.ac.uk:2811/
GlueServiceURI: httpg://your.srm.ac.uk:8443/srm/managerv1
GlueServiceEndpoint: httpg://your.srm.ac.uk:8443/srm/managerv1
Also ensure that you have this line (Mds-Vo should have capitals):
dn: GlueSEUniqueID=your.srm.ed.ac.uk,Mds-Vo-name=local,o=grid
and not only:
GlueSEUniqueID: your.srm.ed.ac.uk
Once you have done this, re-run the scripts that were referred to above to get this information into the system.
How do I add a VO to an existing setup? (PNFS databases)
The LCG 2.6.0 YAIM install of dCache sets up two PNFS databases, admin and data1. It does not create databases for each of the VOs. However, the YAIM install does create the VO directories in /pnfs/`hostname -d`/data/ and links them to the data1 database. This means that, by default, all of the VOs share the same database. This is not acceptable for a production system, since the database could become large and would become a bottleneck when there are a large number of queries. YAIM needs to be modified to allow for the creation of a single database per VO.
Before this occurs, it is recommended that the system admin manually creates a separate PNFS database for each VO. Up to and including release 1.6.5 of dCache, gdbm was used for the PNFS databases, meaning that they had an upper size limit of 2GB. This could be a problem as it is not possible to split a database that is in use. From 1.6.6 onwards, gdbm will be replaced with postgreSQL, removing this 2GB limit. Tools will be provided to migrate to the new database format.
For both postgreSQL and gdbm backends, use the PNFS tools to create the databases:
. /usr/etc/pnfsSetup
PATH=$pnfs/tools:$PATH
for vo in dteam atlas ....
do
    mdb create ${vo} /opt/pnfsdb/pnfs/databases/${vo}
    touch /opt/pnfsdb/pnfs/databases/${vo}
done
The reason that touch is required is that the PostgreSQL version of the pnfs server only uses the file as reference and stores the actual data in the PostgreSQL server while the gdbm version actually stores data in the file.
Force the database daemon to see the new databases:
mdb update
Then create the PNFS directories that will hold the data and link them to the databases that have just been created.
mkdir -p "/pnfs/pp.rl.ac.uk/.(${ID})(${VO})" chown ${VO}001:${VO} ${VO}
where ${ID} is the ID from the command:
mdb show
It is recommended to attach the VO directory directly below /pnfs/domain.name.ac.uk/ and not below /pnfs/domain.name.ac.uk/data/, because data is already a separate database and it is best to keep the databases as decoupled as possible. Now try copying some files into the dCache. To check that everything has worked properly, use the ssh admin interface:
(local) admin > cd PnfsManager
(PnfsManager) admin > storageinfoof -v /pnfs/gridpp.rl.ac.uk/dteam/fts_test/fts_test-9
       Local Path : /pnfs/fs/usr/dteam/fts_test/fts_test-9
    Resolved Path : /pnfs/fs/usr/dteam/fts_test/fts_test-9
           PnfsId : 00020000000000000002A678
     Storage Info : size=1055162368;new=false;stored=false;sClass=dteam:dteam;cClass=-;hsm=osm;StoreName=dteam;store=dteam;group=dteam;bfid=<Unknown>;
The first 4 digits of the PnfsId are the ID (in hex) of the pnfs database that the file entry is in. If this number does not correspond to the number of the database that the file should be in, then something has gone wrong with your setup. Also note the sClass=dteam:dteam in the Storage Info.
How do I map a VO to a pool?
To map a VO to a pool, you first have to tag the directory in the pnfs filesystem that the VO will use. The tags will be inherited by any directory created under the tagged directory after it has been tagged. To tag a directory, change into it and run the following commands:
# echo "StoreName ${vo}" >".(tag)(OSMTemplate)" # echo ${vo} > ".(tag)(sGroup)"
where ${vo} is the name of the VO, e.g. dteam. You must use sGroup in the second tag. It is not necessary to use the same VO name in both of these tags: for instance, the Tier-1 has a dteam directory where the .(tag)(sGroup) contains the word tape, and this is used to map to a separate set of pools for access to the Atlas DataStore. Once tags have been created in a directory, follow the instructions here to view them and their contents. The second part of configuring mappings between VOs and pools involves the PoolManager. If your dCache instance is halted then you can add the mappings to /opt/d-cache/config/PoolManager.conf on the admin node; otherwise they should be entered into the PoolManager module of the admin interface, remembering to finish with save
to write the configuration to disk.
psu create pgroup ${vo}-pgroup
psu create unit -store ${vo}:${vo}@osm
psu create ugroup ${vo}
psu addto ugroup ${vo} ${vo}:${vo}@osm
psu create link ${vo}-link world-net ${vo}
psu add link ${vo}-link ${vo}-pgroup
psu set link ${vo}-link -readpref=10 -writepref=10 -cachepref=10
Note that most of the names used in the above commands are convention; there is no requirement to follow this scheme. The first command creates a pool group, which is exactly what it sounds like: a group of pools. The second command defines a unit; this is something that matches against a property of the incoming request, in this case the storage information of where the file should be written. The names in this command do matter: they should match those used to tag the directory earlier, with the name used in the .(tag)(OSMTemplate) coming first. The third command creates a unit group, which is just a group of units. The fourth command adds the unit just created to the new unit group. The fifth command creates a link, which is the mapping between incoming requests and destination pools, and adds two unit groups to it: world-net is an existing unit group that matches requests coming from any IP address, and the second unit group is the one just created. The sixth command adds the pool group created earlier to the new link. The seventh command sets various properties of the link. Once all those commands are done, run:
psu addto pgroup ${vo}-pgroup ${poolname}
to add a pool to the pool group. If this pool is not for all VOs to access, you may wish to remove it from the default pool group with:
psu removefrom pgroup default ${poolname}
to ensure that files from other VOs cannot get written to that pool. Note that a pool can belong to more than one pool group, so it is perfectly possible to have two VOs writing to the same pool, however there is no way to stop one VO using all of the space in the pool.
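As a concrete sketch, for a VO called dteam and a pool named srm_1 (the pool name is taken from the poollist example earlier and is only illustrative), the same commands from this section become:
psu create pgroup dteam-pgroup
psu create unit -store dteam:dteam@osm
psu create ugroup dteam
psu addto ugroup dteam dteam:dteam@osm
psu create link dteam-link world-net dteam
psu add link dteam-link dteam-pgroup
psu set link dteam-link -readpref=10 -writepref=10 -cachepref=10
psu addto pgroup dteam-pgroup srm_1
psu removefrom pgroup default srm_1
save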
How do I flush and then remove a pool?
If you want to remove a pool from your dCache instance (for example because the disk the pool is on needs replacing), it is necessary to first stop any new data being copied into that pool and then transfer all the data currently on the pool to other disk pools (i.e. you need to flush the pool). In what follows, the pool to be removed is called pool_1 and the destination pool pool_2.
- In the PoolManager admin module, disable the pool with:
psu set disabled pool_1
- From the PoolManager, change to the destination pool admin module:
cd ..
cd pool_2
and run:
pp get file PNFS_ID pool_1
and
rep set precious PNFS_ID -force
for every file in the source pool, where PNFS_ID is the pnfs ID of each file.
- In the source pool admin module, do:
pnfs unregister
- Stop the dcache-pool service on the appropriate system, remove the source pool's line from the /opt/d-cache/config/`hostname`.poollist file and then restart the dcache-pool service.
Clearly, it is best to script the above steps as pools can contain thousands of files.
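A rough sketch of scripting this over the ssh admin interface is shown below. It assumes the passwordless admin-interface keys and ~/.ssh/config entry described in the ssh-keys entry later in this FAQ; the pool and host names are only examples, and you should sanity-check the output before relying on it.

#!/bin/bash
# Sketch: replicate every file from pool_1 onto pool_2 and mark it precious there.
ADMIN="admin@your.admin.node.ac.uk"

# Collect the pnfs IDs of all replicas held by the source pool.
IDS=$(echo -e "cd pool_1\nrep ls\n..\nlogoff" | ssh "$ADMIN" | grep -E '^[0-9A-F]{24}' | awk '{print $1}')

# Pull each file onto the destination pool and force it precious.
for id in $IDS; do
    echo -e "cd pool_2\npp get file $id pool_1\nrep set precious $id -force\n..\nlogoff" | ssh "$ADMIN"
done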
Update
dCache can now use the CopyManager to copy the contents of one pool to another (assuming the destination has sufficient space). This functionality is documented in the dCache Book, here.
How do I start and stop the postmaster process?
The postmaster process must be running for dCache to operate properly. It should be started at boot time, but there may be cases where it is necessary to restart the entire dCache instance without rebooting the machine. In this case it is useful to know how to go about starting and stopping the postmaster.
To start:
# su postgres
# postmaster -i -D /var/lib/pgsql/data/ > /tmp/logfile 2>&1 &
and to stop:
# su postgres
# pg_ctl stop -D /var/lib/pgsql/data
How do I use dCache with dual homed machines?
It is possible to use dCache with dual-homed machines (i.e. machines with two network connections). This allows internal dCache communication to take place over a private network, leaving the public network to deal only with data transfers into and out of the dCache. Dual homing has been attempted within the UK and shown to be possible. What follows is a summary of this experience (from Jiri Mencak):
Scenario
Admin node: dual-homed box with a /pool on the same box (3 dual-homed boxes would be better, with no pool on the admin node, but this should do as a proof of concept).
Pool node: dual-homed box with a /pool.
Installation
- Installed SL 3.0.4 and grid certificates
- Made sure `hostname` returns the FQDN associated with E0a and E0p, in other words the public FQDN.
- To make internal dCache communication pass through the private interfaces, I set up an internal DNS server to fool the admin and pool nodes into thinking admin.public.ac.uk is 192.168.0.32 and pool.public.ac.uk is 192.168.0.33.
- Made sure
`hostname -d` = `grep ^search /etc/resolv.conf | awk '{print $2}'`
- Set up site-info.def:
MY_DOMAIN=`hostname -d`
DCACHE_ADMIN=
DCACHE_POOLS="`hostname -f`:2:/pool"
- Installed dCache using GridPP storage dependency RPMs.
Testing
globus-url-copy and dCache SRM copies worked fine, including third-party copying (get) _from_ dual-homed boxes. Third-party (put) _to_ dual-homed boxes worked after a reboot.
SRM returns IP address of the gridftp door
For example, consider a dual-homed dCache installation with both internal and external IPs. The SRM, when asked for a file, returns gsiftp://150.242.10.11:2811/pnfs, which is the external IP of the pool with the GridFTP door. From a machine with an external IP, copies work fine. But when a job runs on a worker node and tries to access the dCache via the internal network, it fails, even though the external IP is reachable through a NAT. This is the log:
------------------------
+ lcg-cr -v --vo dteam -d grid002.ft.uam.es -l lfn:/grid/dteam/rep-man-test-grid2n2f3.ft.uam.es-0707171552 file:///home/dte049/globus-tmp.grid2n2f3.31056.0/WMS_grid2n2f3_031526_https_3a_2f_2fgridrb01.ft.uam.es_3a9000_2fv8REKBZeRIPZRZ2IlNG-zQ/testFile.grid2n2f3.ft.uam.es-0707171552.txt
globus_ftp_control_connect: globus_libc_gethostbyaddr_r failed
java.rmi.RemoteException: srm advisoryDelete failed; nested exception is:
    java.lang.RuntimeException: advisoryDelete(User [name=dte001, uid=5006, gid=5005, root=/],pnfs/ft.uam.es/data/dteam/generated/2007-07-17/fileb9d79cca-e884-450d-bb29-2cb66ad9c179) Error file does not exist, cannot delete
lcg_cr: Transport endpoint is not connected
+ result=1
--------------------------
BNL had a similar problem and contributed an alternative way of resolving IP addresses into domain names. To activate their code, you need to specify the following in dCacheSetup (available only from release 1.7.0-38):
srmCustomGetHostByAddr=true
and restart the SRM.
How do I migrate the pnfs database to a new machine?
You may experience the following scenario: your admin node needs to be reinstalled with a new OS, or you have to upgrade to a completely new admin node. Clearly, you would like to retain all of the data that you currently have in your dCache instance, so it is necessary to copy the pnfs database over to your new installation in order to keep track of all the file mappings. Anyone attempting this should read the available documentation at /opt/pnfs.3.1.10/pnfs/docs/html/movedb.html.
Before wiping the existing admin node, take a backup of the /opt/pnfsdb directory tree and the file /usr/etc/pnfsSetup (not the same as /opt/d-cache/config/pnfsSetup) and place them on a different machine. On the new machine (or the newly rebuilt machine), place the backups into the same locations in the filesystem. Now install and configure dCache (preferably using YAIM). The pnfs database should not be overwritten, presumably due to 'PNFS_OVERWRITE = no' in /opt/pnfs.3.1.10/pnfs/etc/pnfs_config. It is now necessary to create the following log files:
# ls /var/log/pnfsd.log/
dbserver.log  pmountd.log  pnfsd.log
by touching them (a minimal sketch is given below). It should now be possible to start the pnfs and dCache services. NOTE: make sure you have your firewall set up properly.
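A minimal sketch of creating these empty log files (the filenames are taken from the listing above; the mkdir is only needed if the directory does not already exist):
# mkdir -p /var/log/pnfsd.log
# touch /var/log/pnfsd.log/dbserver.log /var/log/pnfsd.log/pmountd.log /var/log/pnfsd.log/pnfsd.log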
How do I NFS mount storage onto a pool node?
Use TCP, not UDP, otherwise the network may become overloaded.
For example, /etc/fstab on the pool node should contain entries like:
sannas-srif:/scotgrid1 /san-storage/scotgrid1/pool/data nfs proto=tcp 0 0
sannas-srif:/scotgrid2 /san-storage/scotgrid2/pool/data nfs proto=tcp 0 0
where sannas-srif is the name of the NFS server. Notice that only the pool's data directory is NFS mounted; the pool's control directory and setup file remain on the pool node itself.
Initial testing of SAN mounted volumes has shown data transfer rates to be approximately 2-3 times slower than with non-NFS mounted storage.
How do I use ssh keys to interface with the admin module?
This feature is useful as it allows passwordless login to the admin interface, thereby facilitating the use of scripts to perform repetitive admin tasks (possibly even using cron jobs).
On the machine you will run scripts from, generate an RSA1 key pair (the admin interface uses SSH protocol 1):
# ssh-keygen -t rsa1 -b 1024
Inspect the public key that was generated:
# vi .ssh/identity.pub
On the admin node, append the public key to the admin interface's authorized_keys file:
# cat .ssh/identity.pub >> /opt/d-cache/config/authorized_keys
Then configure ssh to use protocol 1, the blowfish cipher and port 22223 for the admin node:
# cat .ssh/config
Host your.admin.node.ac.uk
    Port 22223
    User admin
    IdentityFile .ssh/identity
    Protocol 1
    Cipher blowfish
# ssh admin@your.admin.node.ac.uk
The authenticity of host 'dev01.gridpp.rl.ac.uk (130.246.184.124)' can't be established.
RSA1 key fingerprint is 93:7a:52:c0:44:1e:95:9b:02:52:f2:d1:a5:5e:32:4a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'dev01.gridpp.rl.ac.uk,130.246.184.124' (RSA1) to the list of known hosts.
dCache Admin (VII) (user=admin)
How do I list the pnfs directory tags?
The tags contain metadata about where dCache should store data.
$ cd /pnfs/epcc.ed.ac.uk/data/
$ cat '.(tags)()'
.(tag)(OSMTemplate)
.(tag)(sGroup)
You can inspect the contents of each tag using:
$ cat '.(tag)(OSMTemplate)'
StoreName myStore
All tags and their contents can be displayed using:
$ grep "" $(cat ".(tags)()") .(tag)(OSMTemplate):StoreName myStore .(tag)(sGroup):STRING
Which filesystem should the OS use on the pool nodes?
dCache should work with any filesystem. However, many large Tier-1 sites have seen performance enhancements in using XFS or GPFS rather than ext3.
Which ports does dCache use?
- 2811 : GridFTP
- 8443 : SRM (v1)
- 2288 : dCache web interface
- 22223 : dCache admin interface
- 22125 : dCache Dcap protocol
- 22128 : dCache GSI enabled Dcap protocol
- 32768 : NFS layer within dCache (based upon rpc)
There may be additional ports used for internal dCache communication, but only ports 2811, 8443 and the Globus TCP port range need to be open in your site firewall in order to achieve a working dCache. If you require GSIdcap, then you will also need to open up 22128. It is useful to open up 2288 in order that you can use the web monitoring.
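As a hedged sketch of the corresponding firewall rules (the iptables lines are illustrative, and the 20000-25000 range assumes the default Globus TCP port range discussed earlier):
# Allow the externally required dCache ports
iptables -A INPUT -p tcp --dport 2811 -j ACCEPT            # GridFTP control
iptables -A INPUT -p tcp --dport 8443 -j ACCEPT            # SRM
iptables -A INPUT -p tcp --dport 20000:25000 -j ACCEPT     # Globus TCP port range (data channels)
iptables -A INPUT -p tcp --dport 2288 -j ACCEPT            # web monitoring (optional)
iptables -A INPUT -p tcp --dport 22128 -j ACCEPT           # GSIdcap (only if required)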
As an example of which ports should be listening on a dCache admin node:
[gcowan@srm gcowan]$ netstat -tlp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address   State    PID/Program name
tcp        0      0 *:32768                     *:*               LISTEN   -
tcp        0      0 localhost.localdomain:32772 *:*               LISTEN   -
tcp        0      0 *:32774                     *:*               LISTEN   -
tcp        0      0 *:32777                     *:*               LISTEN   -
tcp        0      0 *:8649                      *:*               LISTEN   -
tcp        0      0 *:22125                     *:*               LISTEN   -
tcp        0      0 *:22223                     *:*               LISTEN   -
tcp        0      0 *:sunrpc                    *:*               LISTEN   -
tcp        0      0 *:22128                     *:*               LISTEN   -
tcp        0      0 *:2288                      *:*               LISTEN   -
tcp        0      0 *:ssh                       *:*               LISTEN   -
tcp        0      0 srm.epcc.ed.ac.uk:2135      *:*               LISTEN   -
tcp        0      0 localhost.localdomain:ipp   *:*               LISTEN   -
tcp        0      0 *:postgres                  *:*               LISTEN   -
tcp        0      0 localhost.localdomain:smtp  *:*               LISTEN   -
tcp        0      0 localhost.lo:x11-ssh-offset *:*               LISTEN   -
tcp        0      0 *:8443                      *:*               LISTEN   -
tcp        0      0 *:2811                      *:*               LISTEN   -
To be more specific to dCache, we can look at the output of the command:
[root@wn4 gcowan]# netstat -lntp | grep java
tcp        0      0 0.0.0.0:22125   0.0.0.0:*   LISTEN   8949/java
tcp        0      0 0.0.0.0:22223   0.0.0.0:*   LISTEN   9042/java
tcp        0      0 0.0.0.0:22128   0.0.0.0:*   LISTEN   10195/java
tcp        0      0 0.0.0.0:2288    0.0.0.0:*   LISTEN   9217/java
tcp        0      0 0.0.0.0:8443    0.0.0.0:*   LISTEN   10286/java
tcp        0      0 0.0.0.0:2811    0.0.0.0:*   LISTEN   10110/java
tcp        0      0 0.0.0.0:33276   0.0.0.0:*   LISTEN   8776/java
What happens if a pool's local filesystem fills up?
Experience at Edinburgh was that the local disk (12GB) on our single pool node filled up because dCache created very large dcacheDomain.log log files (5.8GB and 1.2GB). As soon as the space ran out, file transfers into (and presumably out of) the dCache failed with errors stating that there was not enough space left on the device. This had nothing to do with the amount of space in the disk pools; it was the local disk that was full. Once the offending log files were compressed, space was released. Transfers still failed after this and only resumed after a restart of all dCache services on the pool and admin nodes. The reason for the bloated log files is being investigated. I have now started rotating the dCache log files daily instead of weekly.
This bug was supposed to have been fixed in version 1.6.5 of dCache.
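A hedged sketch of a daily logrotate configuration for the dCache logs (the path matches the log location used elsewhere in this FAQ; the options are examples and should be adjusted to your layout):
# /etc/logrotate.d/dcache
/opt/d-cache/log/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}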
I cannot get some dCache components (e.g. doors) to start up, why?
If you are having trouble starting some dCache services, one possible cause is stale java processes left on the node. Before you start dCache, make sure all dCache-related java processes have been killed off.
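For example (a cautious sketch; check what each process is before killing anything):
# ps aux | grep [j]ava      # list running java processes and identify any stale dCache ones
# kill <PID>                # kill each stale dCache-related process by its PID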
Also, see the dCache user-forum for a discussion about this.
What do I do after upgrading my CA certificates?
The dCache SRM caches the CA certificates (/etc/grid-security/certificates) in memory, so it must be restarted after they have been upgraded in order to pick up any changes.
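For example, assuming the SRM runs as part of the dcache-opt services (as elsewhere in this FAQ):
# service dcache-opt restart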
What does it mean if a file is locked or bad?
A locked file is currently in use by a client. For example, a file that is currently being transferred into the dCache.
A bad file has an inconsistency between PNFS and the pool node's own database: for example, the file is lost but still has an entry in PNFS, or the file exists on a pool but is not referenced in PNFS. Please see this wiki entry for more information about finding such orphaned files.
Configuring a GridFTP door not to be used by the SRM
If you remove the line:
-loginBroker=LoginBroker \
from gridftpdoor.batch, the door will not register with the LoginBroker and so will not be used by the SRM.
What does the output of rep ls mean?
If you do rep ls inside the dCache pool cells, you get output like:
00000000000000000177EA28 <C----------(0)[0]> 94863451 si={zeus:mdst-05}
This can be explained by:
PNFSID <MODE-BITS(LOCK-TIME)[OPEN-COUNT]> SIZE si={STORAGE-CLASS}
PNFSID        : obvious
MODE-BITS     : <CPCScsRDXE>
                 ||||||||||
                 |||||||||+--- (E) File is in error state
                 ||||||||+---- (X) File is pinned
                 |||||||+----- (D) File is in process of being destroyed
                 ||||||+------ (R) File is in process of being removed
                 |||||+------- (s) File sends data to back end store
                 ||||+-------- (c) File sends data to client (dcap,ftp...)
                 |||+--------- (S) File receives data from back end store
                 ||+---------- (C) File receives data from client (dcap,ftp)
                 |+----------- (P) File is precious
                 +------------ (C) File is cached
LOCK-TIME     : The number of milli-seconds this file will still be locked.
                Please note that this is an internal lock and not the pin-time (SRM).
OPEN-COUNT    : Number of clients currently reading this file.
SIZE          : File size
STORAGE-CLASS : The storage class of this file.
What does java.io.EOFException: EOF on input socket (fillBuffer) in the log files mean?
The billing database (and log files) may contain messages like this:
Unexpected Exception : java.io.EOFException: EOF on input socket (fillBuffer)
I think this is due to applications which are accessing files via dcap not closing the file handles after they have finished with the files.