DPM-admin-tools
Contents
- 1 GridPP DPM administration toolkit
- 1.1 Update
- 1.2 Update 2
- 1.3 Installation
- 1.4 Environment
- 1.5 Available tools
- 1.5.1 dpm-disk-to-dpns
- 1.5.2 dpm-dpns-to-disk
- 1.5.3 dpns-du
- 1.5.4 dpns-find
- 1.5.5 dpm-list-disk
- 1.5.6 dpm-sql-spacetoken-list-files
- 1.5.7 dpm-sql-spacetoken-usage
- 1.5.8 dpm-sql-usage-by-vo-user
- 1.5.9 dpm-sql-list-hotfiles
- 1.5.10 dpm-sql-spacetoken-replicate-hotfiles
- 1.5.11 dpm-sql-pfn-to-dpns
- 1.5.12 dpm-sql-files-by-vo-user
- 1.5.13 dpm-sql-diskfs-to-dpns-chk
- 1.5.14 dpm-sql-dpns-to-diskfs-chk
- 1.6 Discontinued tools
- 1.7 Bugs and support
- 1.8 Announcements and updates
- 1.9 Acknowledgments
GridPP DPM administration toolkit
GridPP have put together a collection of handy utilities for easing the management of DPM. This toolkit should help sites running a DPM to manage the installation and to manage (or recover from) common problems such as disk failures and pool draining. The tools are written using the DPM python API, provided by the DPM-interfaces package. Each tool is focussed on performing a single task, so to get the result you want you may need to use them in conjunction with the standard DPM command line utilities or standard shell tools (which I think is the best approach to use).
Author: Greig A Cowan, University of Edinburgh
Date: May 2008
Amendments: Sam Skipsey, University of Glasgow. Wahid Bhimji, University of Edinburgh
License: EGEE
As of 12 August 2010, the license for the toolkit will be the ISC ("NetBSD") License, which is compatible with the EGEE license (in both extant forms), but less ambiguously stated.
Update
The release of v2 of the toolkit introduced a new naming convention for the tools (gridpp_* -> dpm-*, dpns-*) and the tools now appear in /opt/lcg/bin rather than /usr/bin. This places them in the same location as the other native DPM client tools. This version of the toolkit also cleans out some existing tools that are now supported by the native DPM client.
Update 2
Due to packaging changes within DPM itself, the rpms for release 2.6.5 have two versions. Suffix DPM173 is rpm-dependency compatible with DPM versions <= 1.7.3, whilst suffix DPM174 is compatible with DPM 1.7.4 and above. There is no loss of data in replacing one with the other, and subsequent releases will only be rpm-dependency compatible with DPM 1.7.4 and above.
Installation
The tools are probably best installed on the DPM head node, but should work on a grid UI with the DPM-interfaces package installed. You need to add this yum repository to your configuration:
[sys-man]
name=Systems Manager Storage repository
baseurl=http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage
gpgcheck=0
enabled=1
And then install the package via:
yum install gridpp-dpm-tools
The tools will be installed in /opt/lcg/bin (or in /usr/bin for releases before v2 of the toolkit). We will soon provide an rpm containing the above repository:
rpm -ivh http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage/sys-man-repo-1.0.0.rpm
The SQL-based tools (those whose names start with dpm-sql) also require the MySQL-python rpm to be installed. They attempt to locate the DPM MySQL instance by parsing /opt/lcg/etc/DPMINFO, which will probably not be available by default outside of the SE.
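To give a feel for what locating the database involves, here is a minimal sketch of parsing a one-line connection string. It assumes that DPMINFO holds a single entry of the form user/password@host, optionally followed by /database. Both that layout and the function name parse_dpminfo are assumptions made for illustration; they are not taken from the toolkit source.

```python
def parse_dpminfo(line):
    """Split a 'user/password@host[/database]' string into its parts.

    The layout is an assumption for illustration, not the toolkit's parser.
    """
    # Split on the last '@' so that a password containing '@' still works.
    credentials, _, location = line.strip().rpartition("@")
    if not credentials or not location:
        raise ValueError("expected user/password@host")
    user, sep, password = credentials.partition("/")
    if not sep:
        raise ValueError("expected user/password before '@'")
    # Anything after the first '/' in the location is taken as a database name.
    host, _, database = location.partition("/")
    return {
        "user": user,
        "password": password,
        "host": host,
        "database": database or None,
    }
```

On a node without the file, the SQL-based tools have nothing to parse, which is why they are best run on the head node.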
User Accounting
User-level accounting is now possible using the toolset. Versions of the toolkit >=2.3.9 copy two helper files to /opt/lcg/etc which you will need to use to set up your system to enable this.
/opt/lcg/etc/accountingdb.sql
is a set of SQL commands which should be run against your MySQL instance on your head node (or wherever your DPM database is) to create a new database for accounting purposes. The final line, which grants access to the dpminfo user, should be altered to whatever user is listed in your /opt/lcg/etc/DPMINFO file. (NOTE: There is a typo in the first line of the script - you need to alter the current version to make a Primary Key on entry_date, gid, uid, rather than date, uid, gid.)
/opt/lcg/etc/usage_accounting
is a cron job specification, which should be copied into /etc/cron.d/ . It calls the /opt/lcg/bin/dpm-sql-usage-by-vo-user command with the semi-secret "--es" option to write daily logs of the usage of the DPM, by user and group, into the database you just created.
An extension to the DPM_Monitoring tool exists to allow plotting of useful information from this database, and is documented on that page.
Environment
Since the tools use the dpm python module, it is essential that you have the correct PYTHONPATH:
export PYTHONPATH=$PYTHONPATH:/opt/lcg/lib/python
On a 64-bit machine this will be:
export PYTHONPATH=$PYTHONPATH:/opt/lcg/lib64/python
You also need to ensure that the following variables are set:
export DPM_HOST=dpm-head-node.domain.ac.uk
export DPNS_HOST=dpm-head-node.domain.ac.uk
Mixing 32-bit python with the 64-bit DPM-interfaces rpm, or vice versa, will result in python being unable to load the (compiled C) _dpm.so library. In a future release, this case will be detected and result in graceful failure with an actual useful error message. Until then, one can work around the issue by forcing the correct python to be used to run the script, either by calling the tool with the right python:
python32 this_dpm_script
or by editing the header of the script to explicitly call the correct python.
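The environment requirements above can be checked before running any of the tools. The following is only a sketch: the helper names are hypothetical, and it assumes the library directories are exactly the two paths quoted in this section.

```python
import os
import struct


def interpreter_bits():
    # The size of a C pointer in bytes, times 8, is 32 or 64 for this python.
    return struct.calcsize("P") * 8


def dpm_python_dir(bits):
    # Paths as quoted in the Environment section of this page.
    return "/opt/lcg/lib64/python" if bits == 64 else "/opt/lcg/lib/python"


def missing_settings(environ):
    """Return a list of human-readable problems found in the environment."""
    problems = []
    for name in ("DPM_HOST", "DPNS_HOST"):
        if not environ.get(name):
            problems.append(name + " is not set")
    wanted = dpm_python_dir(interpreter_bits())
    if wanted not in environ.get("PYTHONPATH", "").split(os.pathsep):
        problems.append("PYTHONPATH does not include " + wanted)
    return problems


if __name__ == "__main__":
    for problem in missing_settings(os.environ):
        print(problem)
```

Run with the same python that will execute the tools, this reports whether that interpreter's bitness matches the directory on PYTHONPATH.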
Available tools
dpm-disk-to-dpns
usage: dpm-disk-to-dpns [options]

Find the mappings between the files on a pool and the LFN in the DPNS namespace. If you want to analyse all server:filesystems on a pool, you can use the -p option, e.g.

$ dpm-disk-to-dpns -p poolname

To restrict to a particular server:filesystem combination, use the -s option, e.g.

$ dpm-disk-to-dpns -s pool1.glite.ecdf.ed.ac.uk:/grid01

options:
  -h, --help            show this help message and exit
  -d, --debug           Use debug flag only for testing.
  -sSERVERFS, --serverfs=SERVERFS
                        Specify which server:filesystem to be analysed.
  -pPOOL, --pool=POOL   Specify which pool to be analysed.
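The -s option takes a combined server:filesystem value. A small sketch of how such a value can be split and sanity-checked is given here; split_serverfs is a hypothetical helper, not part of the toolkit, and the requirement that the filesystem be an absolute path is an assumption.

```python
def split_serverfs(spec):
    """Split 'server:/filesystem' into a (server, filesystem) tuple."""
    # Split on the first ':' only; the filesystem part follows it.
    server, sep, filesystem = spec.partition(":")
    if not sep or not server or not filesystem.startswith("/"):
        raise ValueError("expected server:/filesystem, got %r" % spec)
    return server, filesystem
```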
dpm-dpns-to-disk
usage: dpm-dpns-to-disk /dpm/path/to/file [-d DIRECTORY][-vz]

options:
  -h, --help       show this help message and exit
  -d, --directory  Analyse files in this directory
  -v, --verbose    See information about namespace entries without replicas
  -z, --zero       Only print out files with zero size.
dpns-du
usage: dpns-du /dpm/path/to/directory

options:
  -h, --help     show this help message and exit
  -si            Print with decimal, not binary prefixes
  -x, --exclude  Ignore this directory
  -z, --zero     Only print out files in DPNS that have zero size.
  -s, --summary  Print a summary for each argument
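The difference between decimal and binary prefixes is easiest to see with a worked example. The sketch below is illustrative only: format_size is a hypothetical helper, and its output layout is not claimed to match what dpns-du prints.

```python
def format_size(num_bytes, si=False):
    """Render a byte count with decimal (si=True) or binary prefixes."""
    base = 1000.0 if si else 1024.0
    if si:
        units = ["B", "kB", "MB", "GB", "TB", "PB"]
    else:
        units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    # Divide by the base until the value fits under the current unit.
    for unit in units[:-1]:
        if value < base:
            return "%.1f %s" % (value, unit)
        value /= base
    return "%.1f %s" % (value, units[-1])
```

For a directory holding 5,000,000,000 bytes this gives "5.0 GB" with decimal prefixes and "4.7 GiB" with binary ones, which is worth remembering when comparing against figures quoted by disk vendors or VOs.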
dpns-find
This tool does not attempt to emulate everything that UNIX find can do. It is just a simple tool to help people find the paths of the files they are interested in.
usage: dpns-find /dpm/path/dir filename

options:
  -h, --help            show this help message and exit
  -xDIRECTORY, --exclude=DIRECTORY
                        exclude all files in this dir.
dpm-list-disk
usage: dpm-list-disk [options]

This allows you to list the replicas on disk from the DPM head node without having to log onto the pool nodes. You can use the command line options to pick out the filesystem you are interested in.

options:
  -h, --help            show this help message and exit
  -fFS, --fs=FS         Specify filesystem of interest.
  -sSERVER, --server=SERVER
                        Specify server of interest.
  -pPOOL, --pool=POOL   Specify pool of interest.
dpm-sql-spacetoken-list-files
usage: dpm-sql-spacetoken-list-files [options]

This allows you to list the files in a given spacetoken. For performance, it does this by performing SQL queries against the dpm_db database.

options:
  --st  specify a spacetoken
dpm-sql-spacetoken-usage
usage: dpm-sql-spacetoken-usage [options]

This allows you to list spacetokens and their usage. For performance, it does this by performing SQL queries against the dpm_db database.
dpm-sql-usage-by-vo-user
usage: dpm-sql-usage-by-vo-user [options]

This allows you to list the usage of the DPM broken down by user (DN) and VO. For performance it does this by performing SQL queries against the cns_db database.

options:
  --vo      specify a VO to limit the query to
  -s, --si  Use powers of 1000 not powers of 1024
  --es      Update records to MySQL database for user accounting
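The tool does its aggregation in SQL, but the breakdown it reports can be pictured with a few lines of Python. This is a sketch under stated assumptions: the (vo, dn, size) record layout and the function name are made up for illustration and do not reflect the cns_db schema.

```python
from collections import defaultdict


def usage_by_vo_user(records, vo=None):
    """Sum file sizes per (VO, user DN) pair, optionally for one VO only."""
    totals = defaultdict(int)
    for record_vo, dn, size in records:
        # Mirror the effect of --vo: skip records belonging to other VOs.
        if vo is not None and record_vo != vo:
            continue
        totals[(record_vo, dn)] += size
    return dict(totals)
```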
dpm-sql-list-hotfiles
usage: dpm-sql-list-hotfiles --days N --num M [--implicit-suffix K][--surls]

This allows you to list the M most "hot" files, sampled over the last N days of requests to the DPM. This involves a slightly intensive SQL query against the dpm_db and cns_db databases, the latter to retrieve file sizes for files still on the DPM.

options:
  --days               Number of days before the present to sample for.
  --num                Length of list to return.
  --implicit-suffix K  Use 'K' as the implicit SI suffix for filesize output (this should be an upper-case letter corresponding to the standard SI symbol)
  --surls              Output the surl for the file, rather than the pfn (that is, the name of the file in the DPM namespace, rather than the "real" filename on the pool node)
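The idea of the M hottest files over the last N days can be sketched in memory as follows. The real tool works through SQL queries; the (timestamp, path) record layout and the function name here are assumptions for illustration only.

```python
from collections import Counter
from datetime import datetime, timedelta


def hottest_files(requests, days, num, now=None):
    """Return the `num` most requested paths within the last `days` days.

    `requests` is an iterable of (timestamp, path) pairs.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    # Count only the requests that fall inside the sampling window.
    counts = Counter(path for when, path in requests if when >= cutoff)
    return counts.most_common(num)
```

A short window highlights files that are hot right now, whereas a long one favours files with steady demand, so the choice of --days changes which files are worth replicating.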
dpm-sql-spacetoken-replicate-hotfiles
usage: dpm-sql-spacetoken-replicate-hotfiles --st SPACETOKEN --nreps N(=2)

This allows you to replicate files in a given spacetoken.

options:
  -h, --help     show this help message and exit
  --st=ST        Specify a space token description
  --nreps=NREPS  Specify the number of copies required. Default 2.
  --del          Delete excess replicas (above amount specified in nreps)
  --list         Just list replicas. No action taken.
  --verbose      Print more output
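How --nreps and --del interact can be illustrated with a small planning function. This is a sketch, not the tool's logic: replica_plan and its input (a mapping from file name to current replica count) are hypothetical.

```python
def replica_plan(replica_counts, nreps=2, delete_excess=False):
    """Work out what to do per file to reach `nreps` replicas.

    Returns {path: ("replicate" | "delete", how_many)} for files needing action.
    """
    plan = {}
    for path, count in replica_counts.items():
        if count < nreps:
            plan[path] = ("replicate", nreps - count)
        elif count > nreps and delete_excess:
            # Excess copies are only touched when deletion is requested.
            plan[path] = ("delete", count - nreps)
    return plan
```

Listing the plan first and acting on it afterwards corresponds to running the tool with --list before running it for real.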
dpm-sql-pfn-to-dpns
usage: ./dpm-sql-pfn-to-dpns server:filepath [server:filepath]

Gives DPNS name for a single physical filename

options:
  -h, --help  show this help message and exit
dpm-sql-files-by-vo-user
usage: ./dpm-sql-files-by-vo-user [--vo VO]

Gives a list of all SURLS on the DPM (for a particular VO if specified)
dpm-sql-diskfs-to-dpns-chk
This tool performs a reverse lookup for the contents of a disk or filesystem against the DPNS. Optionally does adler32 checksumming. This tool flags up files on disk which are not in the DPNS ("dark data").
dpm-sql-dpns-to-diskfs-chk
Perform a consistency check between the DPNS records for a given disk / filesystem and the actual resulting records on disk. Optionally, checksums can be calculated for files present, using adler32. This tool will flag files in the DPNS which do not exist on disk.
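The two checks above are mirror images of each other: one looks for files on disk that are unknown to the DPNS, the other for DPNS entries with nothing on disk. A minimal sketch of both comparisons, together with an adler32 checksum computed using Python's zlib module, is given below. The function names and the use of plain lists of file names as input are assumptions for illustration; the tools themselves work from the databases and the filesystem.

```python
import zlib


def adler32_of(path, chunk_size=1024 * 1024):
    """Return the adler32 checksum of a file as 8 lower-case hex digits."""
    checksum = 1  # Adler-32 is defined to start from 1.
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return "%08x" % (checksum & 0xFFFFFFFF)


def compare_listings(on_disk, in_dpns):
    """Compare file names found on disk with those recorded in the DPNS."""
    on_disk, in_dpns = set(on_disk), set(in_dpns)
    return {
        "dark_data": sorted(on_disk - in_dpns),   # on disk, not in DPNS
        "lost_files": sorted(in_dpns - on_disk),  # in DPNS, not on disk
    }
```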
Discontinued tools
dpm-listspaces
This tool is discontinued in the dpm-tools package because a native version is available in DPM itself.
usage: dpm-listspaces [options]

options:
  -h, --help            show this help message and exit
  -dDPM_DOMAIN, --domain=DPM_DOMAIN
                        Set DPM domain (default: local domain)
  -g, --gip             Use as a GIP provider and produce Glue LDIF output
  -L, --legacy          Build a Glue 1.2 compatible SA in addition to standard ones (requires --gip)
  -l, --long            Detailed information on pools and reservations
  -pPOOLS, --pool=POOLS
                        Pool to display
  -rRESERVATIONS, --reservation=RESERVATIONS
                        Reservation to display
  -v, --debug           Increase verbosity level for debugging (on stderr)
gridpp_dpm_find_dpns_zero_size_files
This tool has been superseded by gridpp_dpm_dpns_to_disk with the -z option.
usage: gridpp_dpm_find_dpns_zero_size_files dpns-listing

The dpns-listing should be a text file containing the output of a dpns-ls command, e.g.

$ dpns-ls -lR /dpm/ecdf.ed.ac.uk/home/lhcb/ > /tmp/dpns.txt
$ gridpp_dpm_find_dpns_zero_size_files /tmp/dpns.txt

options:
  -h, --help  show this help message and exit
gridpp_dpm_get_group_map
usage: gridpp_dpm_get_group_map

List all groups known to the DPM and their corresponding virtual gids. Requires DPM >= 1.6.10.

$ gridpp_dpm_get_group_map

options:
  -h, --help  show this help message and exit
gridpp_dpm_get_user_map
usage: gridpp_dpm_get_user_map

List all users known to the DPM and their corresponding virtual uids. Requires DPM >= 1.6.10.

$ gridpp_dpm_get_user_map

options:
  -h, --help  show this help message and exit
gridpp_dpm_list_space_tokens
usage: gridpp_dpm_list_space_tokens [options]

List all defined space tokens in the DPM. If you want to limit the search, please specify a regular expression, e.g.

$ gridpp_dpm_list_space_tokens -r ATLAS

options:
  -h, --help            show this help message and exit
  -rREGEXP, --regexp=REGEXP
                        If required, you can specify a regular expression for the token desc.
Bugs and support
Please submit bugs to:
http://savannah.cern.ch/projects/srmsupportuk/
Questions can always be asked on:
gridpp-storage AT jiscmail.ac.uk
dpm-users-forum AT cern.ch
Announcements and updates
Updates and changes will be announced via the blog and the above mailing lists.
Acknowledgments
- Remi Mollon, Jean-Philippe Baud, Lana Abadie (CERN) for help with the DPM API.
- Ewan McMahon (University of Oxford) for writing the rpm spec file.
Other contributions always welcome!