wiki:developers-meeting-20091021
Last modified on 10/28/09 12:17:05

[part of a series of meetings]

Participants

Gerd; Tigran, Owen, Paul, Irina, Tanya; Timur, Vladimir, Vijay, Micheal, Dmitry.

Agenda

[see box to the side]

Commands needed by encp/enstore

Micheal is joining our meeting to participate in this part of the discussion, so it's the first item.

Tigran: Places where enstore server-part was relying on PNFS-specific features.

.access support in Chimera only becomes an issue if a user removes or renames the file before it goes to tape.

Adding support for the .access file isn't a problem, but some other dot-commands are problematic since Chimera has no equivalent concepts.

The actual problem is not the .access files, but that dCache has to get the information from level-4 to get the file from tape.

Probably need a trigger to convert level-4 information to the t_location table.

We don't want another custom extractor.

level-4 has multiple pieces of information; the enstore

Try to get .access files to allow writing and reading files

Use-case: write with enstore, read with dCache.

Right now, we can only do this with levels.

Tigran: Yes, we can do it, but how much work do we want?

Try to get it to work reading/writing with dCache then look at write with enstore and read with dCache.

Vijay: try to get something working with the .access commands and look at fixing this in the future.

Tigran: at least by supporting the .access command, then we can exercise with real data.

Test environment that emulates production environment 100% (robots and everything).

OK.

Status of PNFS release

Owen: checked; the version in SVN doesn't work.

Patch to fix this in SVN:

Owen: send me an email when it's ready to be reviewed.

Status of work for 1.9.5

A (quick?) review of activity needed for the 1.9.5 release

Is it stable?

Depends who you ask.

PIC are currently unhappy due to lots of directories owned by root.

Tigran: We don't know if this is because dCache is much faster, so it could be we hit PNFS much harder now.

Gerd: this is the old problem where the second getFileMetadata throws a FileNotFoundException.

Released 1.9.5-4 just now.

New info provider with space reservations and mega table

Version committed; likely needs a small amount of follow-up work.

HSM cleaner for Chimera

Tigran knows the current status.

There is a patch in review board, but Tigran thinks it should be split into 5-6 patches.

Some aspects are simply spelling fixes.

Tigran to help Irina in splitting patches into bite-size chunks. Should be ready by end of week.

Status of work for 1.9.6

A (quick?) review of activity needed for the 1.9.6 release

ACL checks for SRM

(For 1.9.5 or for 1.9.6?)

Needs a developer.

Gerd: we probably need help with this; it is hard in the dCache-specific part of SRM.

Timur: I've just submitted a set of patches that alter how the scheduler works in SRM. Once these are committed I can postpone further work on Terracotta and work on improving this.

Two different ways:

1) Delegate the check to PnfsManager. This has performance benefits and, once implemented, gives a full permission check on the directory path. Unfortunately, this is the more complex of the two to implement.

2) Fetch the file attributes from PnfsManager and do the check locally; this is used elsewhere in SRM.
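
The difference between the two options can be pictured with a small sketch. It is written in Python for brevity rather than in dCache's Java, and it uses plain POSIX mode bits instead of real ACLs; FileAttributes, may_access and may_traverse are hypothetical names, not dCache or PnfsManager API. The first function is a local check on attributes that have already been fetched (option 2); the second shows what a full check along the directory path adds (option 1).

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1  # POSIX permission bits


@dataclass(frozen=True)
class FileAttributes:
    """Hypothetical stand-in for attributes fetched from the namespace."""
    owner: int
    group: int
    mode: int  # e.g. 0o750


def may_access(attrs, uid, gids, wanted):
    """Local check: select the owner, group or other bits and test them."""
    if uid == attrs.owner:
        bits = (attrs.mode >> 6) & 7
    elif attrs.group in gids:
        bits = (attrs.mode >> 3) & 7
    else:
        bits = attrs.mode & 7
    return (bits & wanted) == wanted


def may_traverse(parents, uid, gids):
    """Path check: every parent directory must grant search (execute)."""
    return all(may_access(d, uid, gids, EXECUTE) for d in parents)
```

A local check on the target alone needs one set of attributes; the path check needs the attributes of every parent, which is presumably why doing it inside the namespace service is the more attractive, if more complex, route.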

Webdav support

Needs a bit of polish for error handling and configuration; it can then be submitted as a replacement for the http door.

We don't know how scalable the webdav library will be. The library works fine when used synchronously; to achieve better performance we need to use it asynchronously.

Fast list for SRM

This is in progress and we know what needs to be done.

Gerd to submit a proposal to Dmitry very soon. The plugin part is updated. The generic SRM then needs to be updated to use the new support.

Terracotta for SRM

The batch of patches just submitted will (finally :-) allow SRM to work with Terracotta.

Testing: 1,000 commands of every type in two parallel sessions. Haven't tested the performance of that yet.

Now just waiting for the code to get through the review process.

Gerd: email about PureTLS: offload to native code. Not sure how much it offloads the ASN.1 stuff.

PureTLS is the cryptographic library used by Globus.

Tigran: if we have native code, why not have an Apache front-end?

Gerd: We don't have to ship with the library, but we could document how to set it up.

Timur: Benefit of this approach is we don't have to change any code in dCache.

Timur: the report at CHEP from the StoRM developers described the issues with their SRM, which is implemented with Globus. Their limit on the number of concurrent clients was similar.

Gerd: surprised at this, as pure SSL can have.

Timur: to run tests against this, but not sure when.

Single port xrootd mover

This needs development, but Gerd knows what to do.

Haven't started coding this yet.

Tigran: do you have ideas on how to do this?

Gerd: what we need to do is link requests from the door with the connection we get at the pool. Originally hoped to use the token feature of the xrootd client. Unfortunately, the client has a bug.

Instead use the open request.

Mover wrapping a single server running in the pool.

Tigran: want to get rid of the thread that waits for the door to tell it that the mover needs to be cancelled.

Tigran: add a challenge to the path
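
One way to picture linking a door request with the connection that later arrives at the pool is sketched below: the door embeds a random challenge in the path it hands to the client, and the single server in the pool looks that challenge up when the open request comes in. This is a Python sketch only, not the xrootd protocol or dCache code; PendingTransfers, the "?challenge=" path syntax and the method names are all assumptions made for illustration.

```python
import secrets


class PendingTransfers:
    """Hypothetical pool-side registry of transfers announced by the door."""

    MARK = "?challenge="  # assumed way of carrying the challenge in the path

    def __init__(self):
        self._by_challenge = {}

    def announce(self, path):
        """Door side: register a transfer; return the path for the client."""
        challenge = secrets.token_hex(16)
        self._by_challenge[challenge] = path
        return f"{path}{self.MARK}{challenge}"

    def claim(self, opened_path):
        """Pool side: match an incoming open request to an announced transfer.

        Returns the real path, or None if the challenge is missing, unknown,
        already used, or was issued for a different path.
        """
        path, sep, challenge = opened_path.partition(self.MARK)
        if not sep or self._by_challenge.get(challenge) != path:
            return None
        del self._by_challenge[challenge]  # a challenge is valid only once
        return path

    def cancel(self, redirect_path):
        """Door side: withdraw a transfer; True if it was still pending."""
        _, _, challenge = redirect_path.partition(self.MARK)
        return self._by_challenge.pop(challenge, None) is not None
```

In this sketch a cancellation is just the removal of a registry entry, so nothing has to sit in a thread waiting for the door; whether that maps onto the real mover is an open question.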

Easy domain composition

First step is done; second step needs development.

Big patch committed, the rest postponed until 1.9.7.

Tigran: if nothing is defined, use localhost as the lm. Nearly at the stage where dCache

Need dummy tags to get dCache to work. Nearly have a fix for this. Want to fix dCache so it works when storageInfo is missing.

Issues from yesterday's Tier-1 meeting

root-owned directories

http://www.dcache.org/rt/Ticket/Display.html?id=5184

They upgraded from 1.9.2-5, which has the 1 second sleep as a work-around for this issue. This was removed because Jon complained about the performance problems.

Catalin still running 1.9.2-nn with parallel database to implement checksums.

Gerd: how do we handle this problem?

Tigran: think this is simply because we can fire more requests into PNFS.

Gerd: Can we fix this somehow, e.g. ignore file-not-found?

Tigran: I will send Paco or Gerard some tcpdump commands to capture NFS

The problem with caching is that the Linux client still does some caching even with the noac NFS option.
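
The notes leave open how a spurious file-not-found should be handled. Purely as an illustration, and not something that was agreed at the meeting, a bounded retry sits between the removed 1 second sleep and ignoring the error outright. The Python sketch below assumes a hypothetical lookup callable standing in for the metadata call; the attempt count and delay are arbitrary.

```python
import time


def lookup_with_retry(lookup, path, attempts=3, delay=0.1, sleep=time.sleep):
    """Call lookup(path); retry a few times if it raises FileNotFoundError.

    'lookup' is a hypothetical stand-in for the metadata call; 'attempts'
    and 'delay' are illustrative values, not tuned settings.
    """
    for attempt in range(1, attempts + 1):
        try:
            return lookup(path)
        except FileNotFoundError:
            if attempt == attempts:
                raise  # still missing: report it rather than hide it
            sleep(delay)
```

Only requests that actually hit the error pay the delay, which is the point of preferring this over an unconditional sleep.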

BNL: gPlazma with SAML or XACML

Pedro reports that BNL cannot use gPlazma with SAML or XACML in 1.9.4-3.

This leads to very frequent SRM_FAILURE errors due to permission denied. If we use a short
lifetime for the cache we also notice a very high load on the SRM machine. Currently using
grid VO role mapping but we still see some SRM_FAILURE errors every once in a while. Is the
problem understood? Is there a time line for a fix?

PIC: problems with SRM async ls

Gerard reported that they tried to deploy this but it didn't work.

Owen: bugs with FTS (specifically with async ls) and maybe lcg-utils.

Paul: to ask Gerard to submit the issue to support@…

Outstanding RT Tickets

Tickets waiting for an answer …

1) http://www.dcache.org/rt/Ticket/Display.html?id=5157 Timur and Dmitry will take care of this ticket.

2) #5150 solved.

3) #5115 closed.

4) http://www.dcache.org/rt/index.html?q=5073 closed

Feature request:

o have a check-list that should include setting ulimit

o try to detect when we'll run out of file handles and not start a mover.

Added to trac: http://trac.dcache.org/projects/dcache/ticket/266

5) #5069 closed.

6) http://www.dcache.org/rt/Ticket/Display.html?id=5164 OPEN

7) http://www.dcache.org/rt/Ticket/Display.html?id=5149 OPEN
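
The feature request under item 4 (detect when we'll run out of file handles and not start a mover) could be approached along the lines of the sketch below. It is Python rather than dCache's Java, it is Unix-only, and the function names and the needed and reserve values are illustrative assumptions, not dCache settings.

```python
import os
import resource


def open_fd_count():
    """Approximate count of this process's open descriptors.

    Uses /proc/self/fd on Linux, else /dev/fd; the listing itself briefly
    holds one descriptor, so the result may be one too high.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise OSError("cannot enumerate open file descriptors on this platform")


def can_start_mover(needed=4, reserve=16, open_fds=None, soft_limit=None):
    """Return True if 'needed' more descriptors fit under the soft limit.

    'reserve' keeps some headroom for the rest of the process.
    """
    if open_fds is None:
        open_fds = open_fd_count()
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return open_fds + needed + reserve <= soft_limit
```

The same soft limit is what ulimit -n reports, so a start-up check-list entry and a run-time guard would be looking at the same number.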

Review of RB requests

DTNM

Proposed: same time, next week.