wiki:developers-meeting-20100127
Last modified on 01/27/10 17:37:28

[part of a series of meetings]

Participants

Agenda

[see box to the side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

No questions until we get through everyone :)

Irina: Investigated the Cleaner issue in #5418 (could not be reproduced on a test system); tested the HSM cleaner to be ready for 1.9.6-3; managed RT tickets. Next week: add the main Cleaner commands to the dCache Book, Chapter 28; set up a multi-node dCache on 64-bit machines with SL5; reinvestigate the problem with 'ls dcap' reported in #5291.

Status of work for 1.9.5

A (quick?) review of activity needed for the 1.9.5 release

Timur: Had a private discussion with Gerd about start-up scripts. Will implement his suggestions to support other door implementations (e.g. the Kerberos FTP door); adding a new service isn't sufficient.

Feb 18th is next update point for Fermi.

(Partial) reversal of the sync failure handling. Release -13 as soon as possible.

Gerd: Timur, can you get the script ready today? Yes.

A couple of security configuration options need to be updated and some other changes, so there may be some discussion needed.

Makes sense to have per-Domain JAVA_OPTS.

Gerd: this is planned for 1.9.7

Pool-list file to be for any service; and domain list to contain options.

Jon may be able to do one more upgrade of pools to latest dCache before going into production for a very long time. Interactive command to change the log level.

Gerd: It's on my todo list, but haven't had a chance to look into it yet.

Restarts of pools that crash shouldn't involve admin intervention. This should already be the case with the pools he is already rolling out.

Status of work for 1.9.6

A (quick?) review of activity needed for the 1.9.6 release

Old bug that needs to be fixed in Jetty.

Handful of patches in RB for SRM; they're ACK-ed by Tigran, but it would be useful if Timur could have a look.

The SRM fixes are deadlock fixes. Do a release of 1.9.6 as soon as these are in.

Status of work for Trunk (a.k.a future 1.9.7)

A (quick?) review of activity needed for the 1.9.7 release

How is getting everything working with the message queue going? The only broken thing is the topo cell.

Switch to stop automatic restarts: suppress restart. Gerd: it's on my list.

Connection reset by peer.

Issues from yesterday's Tier-1 meeting

srm client interoperability

According to a message in team from JPB, the srm client should change its behaviour to do one of the following:

  1. check the list of methods supported by the FTP server and not use checksum if not supported by the server,
  2. or try with checksum and, if rejected by the server, retry internally in the client without checksum.

At least while waiting for a proper fix, there is a workaround with this option: -send_cksm=false. I'm more worried by the problem with srm-setpermissions, which does not work with DPM, as already reported in Savannah: there is currently no workaround.

JPB

Timur: we can easily accommodate the request.
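The behaviour JPB asks for could be sketched roughly as follows. This is only an illustration in C, not the srm client's actual code: the callback type, the result codes, and the fake server used to exercise the logic are all hypothetical names invented for this sketch. It combines both suggestions (skip the checksum when the server does not advertise it, otherwise try with it and retry once without it if rejected), although the message presents them as alternatives.

```c
#include <stdbool.h>

/* Hypothetical result codes for this sketch (not taken from the srm client). */
enum { XFER_OK = 0, XFER_CKSM_REJECTED = 1, XFER_FAILED = 2 };

/* Hypothetical transfer callback: performs one attempt, with or without checksum. */
typedef int (*transfer_fn)(bool send_checksum, void *ctx);

/* Try the transfer, falling back to "no checksum" when the server cannot take one. */
int transfer_with_fallback(bool server_supports_cksm, transfer_fn do_transfer, void *ctx)
{
    int rc;

    /* Suggestion 1: the server did not list checksum support, so do not send one. */
    if (!server_supports_cksm)
        return do_transfer(false, ctx);

    rc = do_transfer(true, ctx);

    /* Suggestion 2: the server rejected the checksum, so retry once without it. */
    if (rc == XFER_CKSM_REJECTED)
        rc = do_transfer(false, ctx);

    return rc;
}

/* Fake server used only to exercise the logic above. */
struct fake_server {
    bool accepts_cksm;  /* does this server accept a checksum command? */
    int attempts;       /* number of transfer attempts seen so far */
};

int fake_transfer(bool send_checksum, void *ctx)
{
    struct fake_server *s = (struct fake_server *)ctx;

    s->attempts++;
    if (send_checksum && !s->accepts_cksm)
        return XFER_CKSM_REJECTED;
    return XFER_OK;
}
```

With a fake server that rejects checksums, transfer_with_fallback(true, fake_transfer, &server) succeeds on the second attempt; this is what the -send_cksm=false workaround currently has to achieve by hand.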

Some discussion that one should do a srm-getPermission first to find the group.

It would be nice to get this out "reasonably quickly": within a calendar month.

dccp and errors

  1. In many cases the dccp library hides system errors and translates them into its own errors, which makes it hard to understand the root cause of a problem. The situation is similar to catching Java exceptions that should be propagated.

For example, CMS had to change the line

dcap.c:                        dc_debug(DC_ERROR, "Accept failed.");

to

dcap.c:                        dc_debug(DC_ERROR, "Accept failed (err=%d - %s).", errno, strerror(errno));

In order to understand the problem.

Would it be possible to always print the errno and associated error string for system errors? This would of course require scanning and modifying all of the dcap code.
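One way to avoid editing every message by hand would be a small helper that appends the errno and its text in a uniform way. The sketch below is an assumption about how that might look, not existing dcap code: format_sys_error() and report_failed_open() are hypothetical names, and the output format simply mirrors the line CMS patched.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of dcap): writes "what (err=N - text)." into buf.
 * The errno value is passed in explicitly so that later library calls cannot
 * overwrite it before the message is built.
 */
int format_sys_error(char *buf, size_t size, const char *what, int err)
{
    return snprintf(buf, size, "%s (err=%d - %s).", what, err, strerror(err));
}

/* Example caller: reports a failed fopen() through the helper. */
int report_failed_open(const char *path, char *buf, size_t size)
{
    FILE *f = fopen(path, "r");

    if (f == NULL) {
        int saved = errno;  /* capture errno immediately after the failure */
        format_sys_error(buf, size, "Open failed", saved);
        return saved;
    }
    fclose(f);
    return 0;
}
```

The resulting string could then be handed to dc_debug(DC_ERROR, ...) in place of the bare message; the exact text after the dash depends on the platform's strerror().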

library call: dc_perror(). This has the errno set by libc

Is there a dccp command-line option to expose these failures? dccp should relay this information by calling dc_perror() and printing an error message.

Timur: to obtain a detailed example of when it fails.

64-bit SL5 gsi dcap

I could not find a gsi dcap version that works with the globus included with the vdt that installs on 64-bit SL5. I also do not know which version of globus is needed to get gsi dcap working there. I need this for automatic dCache testing.

Timur and Owen to discuss this off-line.

Fermi upgrade

Upgrading public dCache to new PNFS. rpm

Outstanding RT Tickets

[This is an auto-generated item. Don't add items here directly]

RT 5424: Setting Java options for dCache SRM

Note that this is about the SRM clients. Just honour JAVA_OPTIONS environment variable.

Perhaps have a separate

SRMCLIENT_JAVA_OPTIONS

but fall back, if not set, to

JAVA_OPTIONS

Timur: we'll think about it.
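The proposed lookup order can be illustrated with a short sketch. It is written in C only to match the other code on this page; the srm clients would do the equivalent in their own launcher. The function name srm_client_java_options() is hypothetical, and treating a variable that is set but empty as "set" is an assumption of this sketch.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: returns the Java options the srm client should use.
 * SRMCLIENT_JAVA_OPTIONS wins when set; otherwise fall back to JAVA_OPTIONS;
 * otherwise return an empty string.
 */
const char *srm_client_java_options(void)
{
    const char *opts = getenv("SRMCLIENT_JAVA_OPTIONS");

    if (opts != NULL)
        return opts;  /* client-specific setting takes precedence */

    opts = getenv("JAVA_OPTIONS");
    return opts != NULL ? opts : "";
}
```

With only JAVA_OPTIONS set, that value is returned; once SRMCLIENT_JAVA_OPTIONS is also set, it overrides the generic one.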

RT 5398: srm: Illegal State Transition : g illegal state transition from Done to Failed

This is triggered by the client calling cancel on a transfer that is done.

Tigran: these show up in S2 tests.
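To make the failure mode concrete: a cancel that arrives after a request has finished should not attempt a further state change. The sketch below shows one possible guard. It is not the SRM server's code and not a fix agreed in the meeting; the state names and cancel_request() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical request states for this sketch. */
enum req_state { REQ_PENDING, REQ_RUNNING, REQ_DONE, REQ_FAILED, REQ_CANCELED };

/* A terminal state must never be left again. */
bool is_terminal(enum req_state s)
{
    return s == REQ_DONE || s == REQ_FAILED || s == REQ_CANCELED;
}

/*
 * Cancel a request. Returns true if the state changed, false if the request
 * had already finished, in which case the cancel is ignored instead of
 * forcing a transition such as Done -> Failed.
 */
bool cancel_request(enum req_state *state)
{
    if (is_terminal(*state))
        return false;

    *state = REQ_CANCELED;
    return true;
}
```

Calling cancel_request() on a request in REQ_DONE leaves it in REQ_DONE and returns false, which is the case the S2 tests appear to exercise.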

DTNM

Same time, next week.