wiki:developers-meeting-20100106
Last modified on 01/07/10 20:57:40

[part of a series of meetings]

Participants

Agenda

[see box to the side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

No questions until we get through everyone :)

Status of work for 1.9.5

A (quick?) review of activity needed for the 1.9.5 release

There are the issues that Timur and Jon have reported. These should be fixed.

A problem with gsi-dcap has been observed. Timur to ask Tanya for a site in the US that is using gsi-dcap.

Status of work for 1.9.6

A (quick?) review of activity needed for the 1.9.6 release

People have downloaded 1.9.6, so we cannot break compatibility with configuration files.

There was some discussion about the release process: the release of 1.9.6-1 was delayed, but how to unblock the release wasn't well understood. This should be fixed in the future.

It is a big deal to Timur that Fermilab was not notified about the release despite the agreement to delay it. As a result, he needs to roll back the changes that we badly wanted to be included in the first release of 1.9.6.

What about config. files? Currently marked as (conf)

As soon as the SRM/Terracotta issue is resolved, we should do a new release of 1.9.6. This is to fix the xrootd corruption issue.

High probability that there will be at least one instance updated to 1.9.6 ("resilient").

CSS patch for WebDAV should be in 1.9.6-2 :-)

Dmitry would like to fix something with SRM ls (making it serialised so the counting is handled correctly).

For an SRM ls with offset and count and multiple SURLs, the order isn't guaranteed by the server: the files could be returned in the wrong order.
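The ordering problem can be sketched as follows. This is only an illustration, not Dmitry's patch: it swaps in a plain sort to fix the order before applying offset and count, and the function name `list_page` and the file paths are hypothetical.

```python
def list_page(entries, offset, count):
    """Return one page of entries, sorting first so that pages are stable."""
    ordered = sorted(entries)  # fix a deterministic order before slicing
    return ordered[offset:offset + count]


# The same four files, as a server without a guaranteed order might return
# them on two successive requests.
first_reply = ["/data/c", "/data/a", "/data/b", "/data/d"]
second_reply = ["/data/b", "/data/d", "/data/a", "/data/c"]

# Slicing the raw replies repeats some files and never returns others.
naive = first_reply[0:2] + second_reply[2:4]

# Sorting before slicing gives every file exactly once across the two pages.
page1 = list_page(first_reply, 0, 2)
page2 = list_page(second_reply, 2, 2)
```

Without a fixed order, a client paging through the listing with offset and count can see duplicates and miss entries, which is why the counting has to be handled on a well-defined sequence.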

Some discussion about limited output count if user supplies too many input files.

Status of work for Trunk (a.k.a. 1.9.7)

A (quick?) review of activity needed for Trunk.

SRM ls issues

Issues from yesterday's Tier-1 meeting

Spin off from [www.dcache.org #5332] jdbcTaskQueue is too long : Gerd wants to ask Dmitry if a particular SQL query is correct.

Dmitry has submitted a patch to fix the issue. Further investigation is needed.

This could have resulted in transfers failing with the "already exists" error, although Dmitry hasn't seen this in his test stand.

Reported that the query is slow. This may be related to how PostgreSQL plans the query. Depending on load patterns, the distribution of distinct values changes, which can result in PostgreSQL not using the index. May wish to add a combined index on (state, PNFS-ID).

Tried it with PostgreSQL 8.4 and it didn't use the index on PNFS-ID and state in the query.
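For reference, a minimal sketch of what a combined index looks like and how to ask the database which plan it chose. It uses SQLite from the Python standard library purely as a stand-in: SQLite's planner is not PostgreSQL's, so this does not reproduce the behaviour seen with 8.4. The table name `jobs`, its columns and the index name are hypothetical, not the real SRM schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table standing in for the real one.
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, state INTEGER, pnfsid TEXT)"
)
# One index covering both columns used in the WHERE clause.
conn.execute("CREATE INDEX jobs_state_pnfsid_idx ON jobs (state, pnfsid)")

conn.executemany(
    "INSERT INTO jobs (state, pnfsid) VALUES (?, ?)",
    [(i % 5, "id%04d" % i) for i in range(1000)],
)

# Ask the planner how it would run the query; the last column of each
# row holds the human-readable plan step.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM jobs WHERE state = ? AND pnfsid = ?",
    (3, "id0003"),
).fetchall()
plan_text = " ".join(row[3] for row in plan_rows)
print(plan_text)
```

On PostgreSQL the equivalent check would be to run the real query under EXPLAIN and see whether the index appears in the plan.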

Can we remove the Vacuum calls?

Still seems to be some

Can make it disabled by default.

Timur to try to find out which version of PostgreSQL is used by the user who reported some success from SRM's explicit calls to VACUUM.

No changes for the time being, but look at fixing/adding documentation on how to disable / enable vacuuming.

RegExp in LinkGroupAuthFile

See potential use-cases that would require regular expressions in auth. file.

The concern is that people will use wildcards and we will never be able to drop the feature, so we would have to support it indefinitely.

Gerd wishes to review the patch.

Jetty SRM as default

Tigran ran this over Christmas on a test machine without any problems (until the client machine died).

1.9.7: make Jetty the default. 1.9.8: drop Tomcat support.

Do we have a mechanism to test it under high load?

Gerd repeated the ping-test against Jetty. Performance was more-or-less the same and didn't crash (300 msg per second).

NDGF will deploy this on Swedish T2 tomorrow, T1 next week if that's OK.

Timur: we also need to figure out how to deploy Terracotta with Jetty-based SRM.

We'll also need to figure out whether the monitoring scripts still work.

Did we test 3rd party copies? Ran the S2 tests every 30 minutes for 2 weeks. ... but S2 doesn't have a 3rd-party copy test.

Tigran wants to fix dCache so a single RPM can be deployed without a configuration script.

Gerd has started to look into writing an async. GSI connector. With HTTP read-ahead, there's a 1:1 mapping between network connection and network thread.

Issues from CMS / Fermi upgrade to 1.9.5

There were two lists sent out: Timur's and Jon's

Two big concerns:

  • increased memory usage of the pool. This is not necessarily during the pool's start-up; sometimes it happens some hours after the start-up. Could it be that some of the movers are leaking memory? Seems to happen more when the files.

Do these pools do third-party copies? Yes, but this isn't related, as pools that don't do this show the same symptoms.

This is an upgrade from 1.7 to 1.9.5, so there may be more than one issue here, as Jon is upgrading from 1.9.1 to 1.9.5.

Can we deploy an intermediate version; e.g., 1.9.3 ? Not in production, but it should be possible to set this up in the test-stand.

Bug in the recovery logic resulted in the wrong exception being thrown. The pool shutdown was correct, under the circumstances.

Issue with the batch files is understood and Gerd will submit a patch for this.

Restricted dcap door not available as a standard option.

Gerd to update patch to include the line number of the batch file in the exception message.

The dccp command with pnfs:// protocol is very useful. Could this be documented?

Jon doesn't run the space manager: he manually removes it from the batch file.

poolCollector: Gerd to propose a patch to make it always available.

alm: Patrick to delegate someone to look into this.

GridFTP cell not responsive to interactive commands during transfers. Almost everything is done using a dedicated thread that responds to admin commands.

TransferManager hangs. Timur: we're working on this issue.

Global changes of parameters not possible with GUI any more.

With the GUI he can send a command to

Jon needs a command to set the log level of a particular cell (a pool). He would also like this for doors and other cells.

At Fermi, all pools are in their own domain, so just changing the System's log level is sufficient. We could add an extra command (or commands) in all cells.

Would be nice to add a single command that sets a logger to a certain threshold and also changes the appender.
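The idea of one operation that sets both the threshold and the appender can be sketched with Python's standard `logging` module, where an appender is called a handler. This is only an analogy for the proposed cell command, not dCache code; the function name `set_logger` and the logger names `pool_1` and `door_1` are hypothetical.

```python
import io
import logging


def set_logger(name, level, handler):
    """Set the threshold of one named logger and replace its handlers."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    for old in list(logger.handlers):
        logger.removeHandler(old)  # drop whatever was attached before
    logger.addHandler(handler)
    logger.propagate = False  # keep output on the chosen handler only
    return logger


buffer = io.StringIO()

# One call changes the level and the destination for a single "cell".
pool_log = set_logger("pool_1", logging.DEBUG, logging.StreamHandler(buffer))
pool_log.debug("mover started")

# Other loggers are untouched and stay at the default threshold.
logging.getLogger("door_1").debug("not recorded")
```

Because the change applies to one named logger, it would work whether or not each pool runs in its own domain.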

Hidden parameters in JARs

The Spring XML files were previously in the file-system but are now in the JAR file. This prevents site-admins from editing the contents.

Is there an existing toolkit to support

Before every update Jon runs diffs on the default values in the batch files. If he sees the defaults changing then he wants to understand what the consequences of these changes are for the Fermi/USCMS instance.

He found some instances where the commented-out value isn't the default value.
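The kind of check Jon does by hand could be sketched like this. It assumes a simple `name=value` layout with an optional leading `#` for commented-out defaults, which is a simplification and not the actual batch-file syntax; the option names are hypothetical.

```python
def parse_defaults(text):
    """Parse 'name=value' lines, accepting a leading '#' on commented-out defaults."""
    defaults = {}
    for line in text.splitlines():
        line = line.strip().lstrip("#").strip()
        if "=" in line:
            name, value = line.split("=", 1)
            defaults[name.strip()] = value.strip()
    return defaults


def changed_defaults(old_text, new_text):
    """Map each option whose default differs to its (old, new) pair."""
    old, new = parse_defaults(old_text), parse_defaults(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Hypothetical excerpts from two releases of the same file.
old_file = "#poolIoQueue=default\n#maxMovers=100\n"
new_file = "#poolIoQueue=default\n#maxMovers=200\n#newOption=on\n"

changes = changed_defaults(old_file, new_file)
```

The same comparison could be run between the commented-out values in a file and the defaults actually in effect, to catch the mismatches mentioned above.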

JDBC monitoring options not present in batch file.

Huh? Timur to look into this.

Pnfs directories set to some restrictive permission were changed to a less restrictive one.

Not sure where this is coming from. This might be the SRM change permission function.

Outstanding RT Tickets

[This is an auto-generated item. Don't add items here directly]

Review of RB requests

DTNM

Proposed: same time, next week.