wiki:developers-meeting-20110406

[part of a series of meetings]

Participants

Patrick, Tanja, Tigran, Paul, Antje, Christian, Gerd, Dmitry.

Agenda

[see box on the right-hand side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

No questions until we get through everyone :)

Dmitry: On vacation, lots of emails. Working on the PIC-reported problem: kernel panics ("Self-destruct in five minutes" error message). In the past, this was mitigated by downgrading the encp version. There is an indication that RedHat are trying to fix this problem. Must have support for PNFS and Chimera concurrently. Dmitry has his hands full.

Gerd: gPlazma plugins + discussions with Paul about gPlazma and the usual things.

Christian: fighting gLite to release 1.9.5-25; not yet released but looking promising. A couple of meta-packages needed for EMI + some documentation. Investigating the ticket Dima reported about the admin door crashing due to too many open file-handles.

Antje: Documentation

Paul: Death by XML

Tigran: 1.9.12 release; presentation for kids; Kerberos privacy and integrity modes now supported (code to appear soon in ReviewBoard). Linux client supports both of these already (for ages).

Tanja: Tickets, HSM wiki page, reading the NFS spec on named attributes and working on how to implement this in dCache. We want to turn dot command information into extended attributes.

Patrick: Started working on prep. for Lund (EMI all hands meeting) and for EU review. Work started on the next EMI deliverable. Presentation on NFS for today's CMS meeting, given by Adrea S. This was very well received with lots of questions. Tanja (one of our secretaries) has found rooms for our developers meeting.

Dmitry to chase Patrick's email requesting information about

Q: Tigran

Do you have a stack-trace from the kernel panic? No.

Machine just freezes

Could the Magic SysRq key be used to try to get the stack trace? Don't know: I'm not running these machines.

If you can find a stack-trace then Tigran can send it to Linux kernel developers.

Plans for patch-releases

Should we make a new patch release?

Some merge requests for 1.9.12.

Probably have another 1.9.10 release soon.

But, nothing really urgent (AFAIK).

DESY are running 1.9.12 in

NDGF and Swegrid are also running 1.9.12. Once 1.9.12-2 is out, Gerd will ask Finland

Trunk activity

Progress with new features...

Single JAR file

Tigran: would like to merge everything into a single jar file: all external dependencies and all our code. This is motivated by:

  1. be sure you're running what you shipped.
  2. to be able to run dCache by running "java -jar dcache.jar".

Gerd: putting them in one directory might be OK, but need to think about packaging into a single jar.

Patrick: 2. would be good for Windows.

Gerd: To support Windows, would look at a Java native daemon wrapper: a native JNI library that exposes Java as a native service.

Gerd: If you're clever, you can write a single, long java command-line and start dCache. However, you would lose the functionality you get from the scripts.

Would be good to look at current best practice: for example, what about OSGi?

gPlazma 2

Gerd: submitted two patches. Works inside gPlazma, but haven't tested transfers yet.

Two things that are outstanding with the StorageAuthDB-parser:

  • Support for lists of groups: a comma-separated list of group IDs.
  • Rather than "authorized ..", one writes a "dynamic .." mapping that looks like it's doing a call-out to something dynamic. It's not clear if this is something we want in gPlazma2 (as it supports plugins), but it's ostensibly missing functionality in the new gPlazma2 parser.

Issues from yesterday's Tier-1 meeting

Two Tier-1 centres (Triumf, NDGF) called in, neither had problems. Two Tier-1 centres (GridKa and PIC) were unable to make the meeting but sent reports via email.

GridKa

A problem with a pool that may have caused a ripple-effect on other components. This is their old PNFS-based instance; the two Chimera instances (CMS and ATLAS) are running fine.

PIC

Gerard reported an Enstore issue with their pool servers that produced kernel panics (!) and a BerkeleyDB corruption in a pool a couple of weeks ago. In contact with Jon Bakken.

Gerard reports that they will upgrade from 1.9.10 to 1.9.12 on their Tier3 instance in a few weeks, subject to a maintenance window.

Issues from EMI

No real issues.

Tigran: Can we build in EMI? Yes.

Do we know for certain that we don't need Sun JDK at run time? "If you want support, run Sun". The classes are mostly the same, but the JVM is a bit different.

Christian: Do I put a run-time dependency on Sun JDK? yes.

We can change our test-system to run OpenJDK.

Does the "security stuff" needed for SRM work with OpenJDK? It didn't work with IBM's JRE.

With OpenJDK you cannot attach a management console (jconsole / visualvm) unless there is an explicit command-line option; not that we use this feature frequently.

StoRM and Java

StoRM are planning to switch to Java. Features like WebDAV are coming after the move. Patrick to find out more.

Patrick: with SRM as a front-end, it's not sufficient to just run another server (FTP / WebDAV).

Chimera performance issues

An issue NDGF had some weeks ago

During bulk deletion, problems with Chimera and throughput blocking other activity.

ATLAS issued O(100,000) delete requests as a single SRM request.

During bulk deletion, SRM chops up the request into requests to PnfsManager, each deleting 50 files at a time. Inside PnfsManager, these get scheduled to the different queues (due to hashing of the complete file path), spreading the requests over all available queues. But in Chimera the same parent inode is being updated, so the 20 queues effectively become sequential. This blocks other activity, e.g., operations needed for file transfers.

Work-around: single delete at a time in SRM, but this makes bulk deletes very slow.

Another possible work-around: change PnfsManager's queue selection algorithm to hash based on parent directory path.
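A minimal sketch of the two queue-selection strategies discussed above; this is not PnfsManager's actual code. The function names, the use of CRC32 as the hash, and the example paths are assumptions made for illustration. Only the queue count (20) and the chunk size (50) come from the discussion.

```python
import posixpath
import zlib

NUM_QUEUES = 20  # queue count mentioned in the discussion


def queue_by_full_path(path: str, num_queues: int = NUM_QUEUES) -> int:
    """Pick a queue from a hash of the complete file path (current behaviour as described)."""
    return zlib.crc32(path.encode("utf-8")) % num_queues


def queue_by_parent(path: str, num_queues: int = NUM_QUEUES) -> int:
    """Pick a queue from a hash of the parent directory path (proposed work-around)."""
    parent = posixpath.dirname(path)
    return zlib.crc32(parent.encode("utf-8")) % num_queues


# One chunk of a bulk delete: 50 files that all live in the same directory.
# The path is a made-up example.
files = [f"/pnfs/example.org/data/atlas/run1/file{i:03d}" for i in range(50)]

queues_full = {queue_by_full_path(p) for p in files}
queues_parent = {queue_by_parent(p) for p in files}

print("queues used when hashing the full path:  ", len(queues_full))
print("queues used when hashing the parent path:", len(queues_parent))
```

Hashing on the parent path sends every delete for one directory to a single queue, so updates to the shared parent inode serialise in that queue instead of tying up all of them. The trade-off is that a large delete in one directory is then processed by one queue only.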

If we altered how Chimera updates the parent inode's nfiles so that it does this later on, then the t_inodes table would be locked for a shorter time.

We could also add support in Chimera for a bulk remove operation.

Outstanding RT Tickets

[This is an auto-generated item. Don't add items here directly]

 RT 6107: SRM transfer failure with Israel CA

Requested their CA certs and CRLs. These have now been installed on Fermi test-stand. Now need end-user to do operations against test-stand.

 RT 6167: bringonline failed due to pinning failed:finding read pool failed

Gerd: when working on the Pin-manager

bring-online when the file is already online. Select read-pool message to PoolManager, PoolManager,

The cost of a pool goes up very quickly due to bulk activity, until the cost-cut starts failing transfers.

This means that bulk pinning can fail if a certain threshold is reached.

Would like to move the pool-selection algorithm into pin-manager, which will solve this problem; but what do we do for the golden releases?

We add a flag to the message to indicate that the pool costs should not be increased.

At least for 1.9.12, it would be good to remove the "magic" option. It should always be switched on, or dCache will not work correctly.

In all communication between pool and pool-manager, always send the cost.

Slightly more randomised writes.

PoolManager could send a request to stage a file, but only if the pool cost is below a certain threshold. A pool can reject a request if it is overloaded.
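A rough sketch of the threshold idea above, under stated assumptions: the `Pool` class, its `max_cost` limit, and `select_pool_for_stage` are hypothetical names invented for this example and do not correspond to dCache classes or messages.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    cost: float
    max_cost: float = 1.0  # hypothetical limit above which the pool calls itself overloaded

    def accept_stage(self) -> bool:
        """The pool itself may reject a request if it is overloaded."""
        return self.cost < self.max_cost


def select_pool_for_stage(pools, threshold):
    """Return the cheapest pool below the threshold that accepts the request, or None."""
    for pool in sorted(pools, key=lambda p: p.cost):
        if pool.cost >= threshold:
            break  # every remaining pool is at least this expensive
        if pool.accept_stage():
            return pool
    return None


pools = [
    Pool("pool_a", cost=0.9),
    Pool("pool_b", cost=0.4),
    Pool("pool_c", cost=0.3, max_cost=0.2),  # cheap on paper, but rejects: overloaded
]
chosen = select_pool_for_stage(pools, threshold=0.8)
print(chosen.name if chosen else "no pool available, retry later")
```

Returning `None` rather than raising leaves the caller free to queue the stage request and retry, which is one way to avoid failing bulk pinning outright when costs spike.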

 RT 6197: Feature Request: move dCache files from disk to tape and viceversa

Ticket has been accepted as an excellent idea and will be moved to the feature request queue.

 RT 6227: different files/one pnfsid

Do we want to do something inside dCache to support 'exclusive create'? The scenario: two clients attempt, at the same time, to write data into the same empty file in dCache.

We have explicit support for overwriting an empty file; we believe this was due to some requirement, perhaps preload library support.

Shouldn't we be able to prevent the upload of different file content? One mover will win and upload its checksum before the other; the second checksum upload should then fail.

Was checksum enabled in the pool? If not then there's no protection.
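A minimal sketch of the "first checksum wins" idea, assuming a single place where checksums are recorded per file ID. `ChecksumRegistry` and `ChecksumConflict` are hypothetical names for this example, Adler-32 is used only as a convenient checksum, and the ID is made up. It does not describe how PnfsManager or Chimera behave today.

```python
import threading
import zlib


class ChecksumConflict(Exception):
    """Raised when a second, different checksum is registered for the same file."""


class ChecksumRegistry:
    def __init__(self):
        self._lock = threading.Lock()
        self._checksums = {}

    def register(self, file_id: str, checksum: int) -> None:
        """Record the checksum for file_id; fail if a different one is already stored."""
        with self._lock:
            existing = self._checksums.setdefault(file_id, checksum)
            if existing != checksum:
                raise ChecksumConflict(
                    f"{file_id}: stored {existing:#010x}, got {checksum:#010x}"
                )


registry = ChecksumRegistry()
file_id = "0000EXAMPLE"  # made-up ID shared by both uploads

# Two movers finish writing different content for the same ID.
registry.register(file_id, zlib.adler32(b"content from job A"))
try:
    registry.register(file_id, zlib.adler32(b"content from job B"))
    outcome = "second upload accepted"
except ChecksumConflict:
    outcome = "second upload rejected"
print(outcome)
```

Registering the same checksum twice is harmless in this sketch, so two jobs that upload identical content would both succeed. If the pool never computes a checksum, `register` is never called and nothing is caught, which matches the point above about there being no protection.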

The problem may be triggered more often if jobs upload their file logs (e.g., a job is started twice).

We allow overwriting a file if you completely rewrite it. This uses a namespace trick, using a temporary filename like: '_' <filename> ';' '_' <uid> '_' <pid>

Need to check what happens inside PnfsManager, ChimeraNamespaceProvider and Chimera itself under these circumstances.

Review of RB requests

Tigran wants 3161 reviewed.

DTNM

Same time, next week, subject to enthusiasm.