wiki:developers-meeting-20100922
Last modified on 09/22/10 17:57:43

[part of a series of meetings]

Participants

Antje, Jan, Paul, Owen, Christian, Patrick, Tigran; Gerd

Agenda

[see box on the right-hand side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

Tigran: whole week in meetings .. busy merging into the 1.9.10 branch. Two tests fail; will probably bisect commits to find the one that breaks them. The two failing tests are removeDuringPut and removeBeingPutOverwrite (basically the same test). Getting closer with the thread-less movers in the pool.

Gerd: Working on release notes (with Paul's help); usual place for download. Hopes Tigran can add the last few patches before the release is made. Cleaning up the URL handling in SRM; this is now done in the dCache-specific part. Upgraded to 1.9.10 this week. Today also upgraded a couple of pools to 1.9.10.

Antje: also testing; tried to install 1.9.10 (didn't work, but should be fixed now). EMI meetings.

Jan: Trying to get web admin ready for 1.9.10, with input from Gerd and Dima. EMI meetings.

Paul: Attending EGI and giving two talks + dcap work.

Owen: EMI + virtual machine infrastructure. Made a release of dcap .. with bug-fixes for ATLAS' broken clients. Looking for bugs in dcache-configure before merging it into trunk (+ back-porting).

Christian: Looking into the dCache book and trying to find errors (or useful material). sshd package (from Apache) .. looking at integrating this into dCache.

Patrick: EGI technical forum. Milestone 30 for EMI. Abstract for presentation for LBNL; abstract for HEPiX. Going through the book and installing dCache within Amazon EC2 .. S3.

No questions until we get through everyone :)
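The commit bisection Tigran mentions for the two failing tests can be sketched as a binary search over an ordered history. This is only an illustration: the commit names and the `is_good` check are hypothetical, and it assumes the oldest commit passes, the newest fails, and every commit after the breaking one fails too.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first commit for which is_good() is False.

    Assumes commits are ordered oldest to newest, the first one is known
    good and the last one is known bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # breakage happened later
        else:
            hi = mid  # breakage happened here or earlier
    return commits[hi]


# Hypothetical history in which commit "c5" introduced the failure.
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
culprit = first_bad_commit(history, lambda c: int(c[1:]) < 5)
print(culprit)
```

Each step halves the range, so eight commits need only three test runs.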

Plans for patch-releases

Should we make a new patch release?

No plans.

1.9.10-1 due out 1st of October.

SRM tests that fail

Have to find out why: why now and not in 1.9.9.

The salient point is knowing whether a GridFTP transfer has started.

A lot of entries that don't have a name and have a strange file-type. ... 4,000 files like that. Looks like a bug in Chimera or PnfsManager.

Exceptions from commands

When a command throws an Exception, the reply will now be serialised as an Exception object instead of a string.

We will need to update web-admin to handle errors that propagate as an Exception object.

The help command must also be updated to Command line needs to

Typing "rep ls foo" will return an Exception now.
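What this change means for a receiver such as web-admin can be sketched as follows. This is a language-neutral illustration in Python, not dCache's Java code: `execute`, `handle_reply` and `failing_command` are hypothetical names, and `pickle` merely stands in for whatever serialisation the real messages use.

```python
import pickle


def execute(command) -> bytes:
    """Run a command and serialise its reply, as the sender would."""
    try:
        reply = command()
    except Exception as exc:
        reply = exc  # ship the exception object itself, not str(exc)
    return pickle.dumps(reply)


def handle_reply(data: bytes) -> str:
    """Receiver side: a reply may now be an exception object."""
    reply = pickle.loads(data)
    if isinstance(reply, Exception):
        return f"ERROR ({type(reply).__name__}): {reply}"
    return str(reply)


def failing_command():
    # Hypothetical stand-in for a command given a bad argument.
    raise ValueError("hypothetical failure")


print(handle_reply(execute(lambda: "done")))
print(handle_reply(execute(failing_command)))
```

The point is the `isinstance` check: a receiver that assumes every reply is a string has to be taught to recognise the error case.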

Trunk activity

New plotting code

Gerd noticed the release is 10 MB bigger because it includes the whole Batik library.

Does anyone know the status of this? Nope.

What do we do?

We could try to package this separately and allow people to download it separately. Perhaps we don't need to package separately as the size increase isn't that much.

We need to know how to switch it on .. Paul to ask Dmitry to investigate.

New dCache-configure

Early warning: Owen will likely push new dCache-configure into ReviewBoard.

Issues from yesterday's Tier-1 meeting

xrootd mover

Original motivation: two files on the same connection.

Gerard trying to use xrdcp with multistream settings, which also fails (but unrelated).

Would be nice if someone at DESY ..

If ROOT is using multiple streams

Could be that the second open is being sent to the pool, rather than the door.

Tigran to look into it.

NFS 4.1

Upgrade their pp to 1.9.10 and can get

Problem with Java signed/unsigned. Not worth patching 1.9.5 as we will be releasing 1.9.10 soon anyway.

Doris dcap

Problem looks like poor error recovery in dcap client.

Client is active, tries to connect to pool and fails.

Suggest changing mode so pool connects to client; but this doesn't help.

Work-around was to set an environment variable. This doesn't seem to help but perhaps ATLAS is overwriting this? Nope; not in root, but perhaps ATLAS framework is (un-)setting this value.

If the client cannot connect to the pool then there's a problem in their networking.

However, if pool cannot connect to client it must tell door (which tells client) that this connection has failed. This also isn't happening.

If we do not connect but fail, that is also an OK solution.

We should set a more conservative timeout (e.g., 2 minutes) and simply fail if this doesn't work.
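The "wait a bounded time, then simply fail" behaviour can be sketched with plain sockets. This is a minimal sketch, not the dcap or pool code: `wait_for_peer`, `make_listener` and `TransferFailed` are hypothetical names, the default of 120 seconds mirrors the 2 minutes suggested above, and reporting the failure on to the door is left out.

```python
import socket


class TransferFailed(Exception):
    """Hypothetical error reported when no connection arrives in time."""


def wait_for_peer(listener: socket.socket, timeout_s: float = 120.0) -> socket.socket:
    """Wait up to timeout_s for an incoming connection, else fail cleanly."""
    listener.settimeout(timeout_s)
    try:
        conn, _addr = listener.accept()
    except socket.timeout:
        raise TransferFailed(f"no connection within {timeout_s} s") from None
    return conn


def make_listener() -> socket.socket:
    """Listen on an ephemeral localhost port (for this demonstration only)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    return listener


listener = make_listener()
try:
    # Nobody connects in this demo, so a short timeout is used.
    wait_for_peer(listener, timeout_s=0.1)
    outcome = "connected"
except TransferFailed as err:
    outcome = f"failed: {err}"
finally:
    listener.close()
print(outcome)
```

Turning the timeout into an explicit error gives the caller something definite to report, rather than leaving the client hanging.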

Troubleshooting guide

Paul: please read the email.

Bug tracker

It seems we are going to have to provide metrics for the EMI project:

  • how long it takes for us to process a ticket,
  • who identified the bug,

We might be able to use the CERN infrastructure.

"High priority" tickets need a daily report:

  • how many bugs in daily,
  • who identified them,
  • ID of bugs,

(basically, a dump of current status).

"High priority" is anything we choose it to mean.

We need a weekly report on all tickets.
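The metrics discussed above (time to process a ticket, who identified it, a daily dump of high-priority tickets) could be derived from ticket records roughly like this. The `Ticket` fields and the sample records are made up for illustration and do not reflect RT, JIRA or any EMI schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Ticket:
    ticket_id: int
    reporter: str
    opened: datetime
    closed: Optional[datetime] = None
    high_priority: bool = False


def processing_time(ticket: Ticket) -> Optional[timedelta]:
    """How long a ticket took to process, or None if it is still open."""
    if ticket.closed is None:
        return None
    return ticket.closed - ticket.opened


def daily_report(tickets: List[Ticket]) -> str:
    """Dump the current status of open high-priority tickets."""
    open_hp = [t for t in tickets if t.high_priority and t.closed is None]
    lines = [f"open high-priority tickets: {len(open_hp)}"]
    lines += [f"  #{t.ticket_id} identified by {t.reporter}" for t in open_hp]
    return "\n".join(lines)


# Made-up records purely for illustration.
tickets = [
    Ticket(1, "site-a", datetime(2010, 9, 1), datetime(2010, 9, 4)),
    Ticket(2, "site-b", datetime(2010, 9, 20), high_priority=True),
]
print(processing_time(tickets[0]))
print(daily_report(tickets))
```

The same records would serve for a weekly report over all tickets by dropping the high-priority filter.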

Propose that all sites (using the EMI release) send their bugs to EMI bug trackers.

Patrick says he wants to see the progress of bugs being processed.

Not every bug in RT is a bug from EMI.

Options: create our own infrastructure (RT + separate queue, trac, ...), or use EMI infrastructure (JIRA instance + GGUS + Savannah).

In favour of our own infrastructure: Patrick would like the same metrics for all dCache change-requests. This would save duplicated work, and we would have some control over what is reported.

In favour of using EMI infrastructure: there will be no issues with reporting metrics. We don't have to maintain an extra service. It isolates our (internal) bug-tracker from the bug system used by external people.

We defer this to Patrick.

Outstanding RT Tickets

[This is an auto-generated item. Don't add items here directly]

RT 5572: Re: SRM dCache needs restart after lcg-CA update?!?

It does re-read in 1.9.10. Don't need to discuss this ticket any more? No.

RT 5756: Problem of locality

No need to discuss: needs 1.9.5 patch from Dmitry.

RT 5824: One hot file access case

Tigran didn't remember the details, but he thinks he has an idea where there's a race condition and wanted to know whether we could come up with a nice solution.

Don't need to discuss, unless Tigran finds something interesting.

Review of RB requests

DTNM

Same time, next week.