wiki:developers-meeting-20091104
Last modified on 11/04/09 18:15:50

[part of a series of meetings]

Participants

Gerd; Irina, Owen, Tigran, Paul, Jan, Tanya; Timur, Vladimir, Vijay, Gene

Agenda

[see box to the side]

Status of PNFS release

Passed all internal testing for SL5 64-bit and SL4 32-bit.

Need to test the use case where people are upgrading from 32-bit to 64-bit.

Expect to be released in the next few days.

Berkeley DB

Vladimir is nearly ready to send a patch that adds support for compiling PNFS with Berkeley DB.

Status of work for 1.9.5

A (quick?) review of activity needed for the 1.9.5 release

New info provider with space reservations and mega table

Any other issues?

When will be the next release?

gPlazma performance reasons

external call out will see a big performance improvement.

When do you expect to have those committed?

Should be part of 1.9.5-6 release.

List bug.

We have a request for a 1.9.4-5 release.

---

There was a message from

Q: 1.9.5-6

It seems that, in the change log, the last commit was 12745. Change 12727 wasn't included in the change log.

12735 has the version number increase.

Probably just missing from the release notes.

Releasing 1.9.5-7

Monday would be OK but Owen would prefer this to happen on Friday.

Status of work for 1.9.6

A (quick?) review of activity needed for the 1.9.6 release

ACL checks for SRM

Timur: ready to start working on this, this week.

Webdav support

Gerd: first version submitted to Review Board.

Fast list for SRM

Timur: we aim to have something ready for testing by November 15th.

Waiting on patch 952? Gerd: no, this is unrelated. Started working on this for the SRM, but realised that this wasn't the correct approach.

For first version use the synchronise

Does this remove the need to mount? Yes.

Owen reported a bug: default install of dCache, restarting the system, the scripts mount PNFS read-only.

Terracotta for SRM

Timur: pushed the changes I wanted into review board. Time to do the extensive testing and see what is still missing.

What are the deployment plans? Is Jon planning on using it?

Jon is talking about redundancy: improving up-time. But, too early to say for sure how this will be deployed.

When SRM receives a URL it checks whether the URL hostname matches its own. This check now needs to be more complex.
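One way the check might become more complex, for example if several SRM instances sit behind one service name, is to compare the URL's host against a set of names rather than a single one. A minimal Python sketch under that assumption; the function name `is_local_srm_url`, the host names and the alias set are all hypothetical and are not dCache code:

```python
from urllib.parse import urlparse

def is_local_srm_url(url, local_hosts):
    """Return True if the URL's hostname is one of this service's names.

    `local_hosts` is a hypothetical set of names (own host plus any
    aliases or peer instances); the comparison is case-insensitive.
    """
    host = urlparse(url).hostname  # None if the URL has no host part
    if host is None:
        return False
    return host.lower() in {h.lower() for h in local_hosts}

# Hypothetical names for two SRM nodes and a shared alias.
hosts = {"srm1.example.org", "srm2.example.org", "srm.example.org"}
```

A URL naming any of the listed hosts would be treated as local; anything else, including a URL without a host part, would not.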

Single port xrootd mover

Gerd: no progress. Too busy fixing bugs.

Easy domain composition

Committed in the version that's good for 1.9.6

Refactoring of batch files.

Tigran reported that he's working on a version of dCache that works without config files.

Problem shutting down due to script looking for PID files.

HSM cleaner for Chimera

No progress yet.

Any other issues?

Nothing else: Tigran might do more changes on NFS stuff.

Issues from yesterday's Tier-1 meeting

Yesterday's meeting was pretty smooth; no urgent problems.

Doris and the black hole

Doris reported that they had a pool act as a black-hole for all pool-to-pool transfers. Restarting the pool fixed it.

Black hole: appeared as a problem with PoolManager; the pool-to-pool transfers failed because the port range was exhausted. The current limit is 30 ports. If all pools are in different domains then each has a different port range. They are running 11 ports; the NFS mover is allocating a port (11 ports).

Error message improvement. Once the p2p client fails to start then it just got stuck. This was fixed in 1.9.5-5, but Doris was using 1.9.5-4.
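The failure mode described above (a fixed port range running out, and the client getting stuck rather than reporting it) can be illustrated with a small allocator that fails loudly. This is a sketch only: the class names and the starting port 20000 are hypothetical, the figure of 30 ports is the limit quoted in the notes, and none of this is the dCache implementation.

```python
class PortRangeExhausted(RuntimeError):
    """Raised when no port in the configured range is free."""

class PortRange:
    def __init__(self, first, count):
        self._free = list(range(first, first + count))
        self._used = set()

    def allocate(self):
        # Fail with an explicit error instead of waiting forever.
        if not self._free:
            raise PortRangeExhausted(
                f"all {len(self._used)} ports in the range are in use")
        port = self._free.pop(0)
        self._used.add(port)
        return port

    def release(self, port):
        # Return a port to the pool; unknown ports are ignored.
        if port in self._used:
            self._used.remove(port)
            self._free.append(port)
```

With `PortRange(20000, 30)`, the 31st `allocate()` without an intervening `release()` raises `PortRangeExhausted`, which a caller can log and turn into a failed transfer rather than a hung one.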

SRM 3rd-party transfers

Jon reported that Fermi are currently unable to achieve 3rd party transfers with SRM. The FTS / GridFTP trick works, though.

Particular channel in MIT. Still needs to be investigated.

Timur will investigate.

Managing schema migration

Database schemata change over time. How do we manage this change?

Tigran: lots of tools that allow you to migrate the schema; Tigran to send a list of tools.

Gerd: doesn't the IBM article

Question: do we want the software to do it out of the box, or do we want to run some scripts?

Dmitry has worked on a similar issue in enstore.

Automatically generate schema-diff and apply

Gerd: Is the XML language powerful enough to express triggers and stored procedures?

Dmitry: Not yet; triggers and stored procedures are expressed as normal SQL with CREATE OR REPLACE ...

Desirable to have no hard-coded SQL in the Java, but

Do we want it inside or outside of the code?

We should have different definition files outside of code.

How triggered?

  1. from inside dCache?
  2. externally triggered?

Prefer 1.

Problem with updating a live system:

Not sharing any tables.

Owen: don't like giving a running application schema update rights.

Can have both worlds: Tier-1 with locked-down schema and dCache without schema update rights, or fully automated system where dCache has schema update rights.
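The "both worlds" idea can be sketched as a versioned migration runner with a switch for locked-down sites. This is an illustration in Python using the standard-library `sqlite3` module, not a proposal for the actual tool: the `schema_version` table, the `MIGRATIONS` list, the `pins` table and the `allow_update` flag are all hypothetical names, and real definitions would live in files outside the code.

```python
import sqlite3

# Hypothetical migration definitions, ordered by version number.
MIGRATIONS = [
    (1, "CREATE TABLE pins (id INTEGER PRIMARY KEY, pnfsid TEXT)"),
    (2, "ALTER TABLE pins ADD COLUMN expires INTEGER"),
]

def current_version(conn):
    """Return the highest applied version, or 0 for an untouched database."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master "
        "WHERE type='table' AND name='schema_version'").fetchone()
    if not exists:
        return 0
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0

def migrate(conn, migrations, allow_update=True):
    """Apply pending migrations in order; return the versions applied.

    With allow_update=False the database is left untouched and pending
    migrations raise an error, modelling a site with a locked-down schema.
    """
    done = current_version(conn)
    pending = [(v, sql) for v, sql in sorted(migrations) if v > done]
    if pending and not allow_update:
        raise RuntimeError(
            f"schema is at version {done}; "
            f"{len(pending)} migration(s) must be applied externally")
    if pending:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    for version, sql in pending:
        conn.execute(sql)
        conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
        conn.commit()
    return [v for v, _ in pending]
```

A fully automated site would call `migrate(conn, MIGRATIONS)` at start-up; a Tier-1 with a locked-down schema would call it with `allow_update=False` and apply the same definitions with external scripts when the error is reported.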

Outstanding RT Tickets

[This is an auto-generated item]

RT 5189: Fwd: Lost files

Skip.

RT 5196: bug: migration module cannot handle pin manager failures

Is this the sticky bit created by the Pin Manager? Gerd: pretty sure: name starts with "pinmanager" then migration module sends the request to pin manager

Timur to investigate.

RT 5230: Hung transfers in production

It may be a design flaw that the door doesn't notice that the mover has gone.

Or it can be a feature: the door can temporarily lose connection without affecting the transfer.

Gerd: to take the ticket and ask questions (whether the problem was a one-off)

Client closed the data-channel.

Transfer finished message was lost.

Pool was, for a short time,

A normal restart of the pool shouldn't be too bad a problem: all the movers are stopped.

If it was a reboot of a machine, the door won't get any transfer-finished messages.

Q: was the problem coincident with network problems?

RT 5235: PinManager/dCacheDomain problem

Problem due to stage protection being enabled.

If the protection point is the doors then the doors must read the file. PinManager is a door, so it has this.

Tigran: problem included being unable to type commands in the PinManager admin interface.

Q: to what extent is the policy enforcement point the doors?

1.9.4-4, so it cannot be the PoolManager

Need to discover if the problem is with PinManager really hanging or just not servicing requests.

He has the tape protection enabled, deliberately.

Irina to follow up on this.

RT 5234: dcache admin shell communication cuts-Connection reset by peer

Gerd to handle it.

RT 5229: open and read file using dcap

Irina to interrogate end-user.

RT 5223: dCache fails with RFC VOMS proxy certificates

Proxy certificates come in two styles: Globus-style and RFC-style.

An option to grid-proxy-init or voms-... "-rfc" that results in RFC-compliant proxy certs being created. This is not accepted by dCache. It's a long-standing problem that ARC developers have complained about for a while.

No full stack-trace; Gerd can modify the code to generate it.

Gerd: to generate stack trace.

Timur: to try voms-proxy-init -rfc

RT 5227: Questions about Database schema (SRM, gPlazma)

Timur to reply.

Paul to build template in DocBook.

RT 4552: SpaceAuthorizationException: User is not authorized to release this space

Looks like a problem with implicit space reservation.

Dmitry to investigate.

RT 5072: dcap movers hanging

Can we close this ticket (as Doris mentioned yesterday that the current version is 1.9.5-5)?

Looks like problem persists. Tigran will investigate.

http://www.dcache.org/rt/Ticket/Display.html?id=5075

Gerd to look into this.

Information.

Next week on Wednesday, 11th of November.
Fokke Dijkstra is going to perform an upgrade from dCache 1.7 (on SL3) to 1.9.4-4 using Chimera (on CentOS5).

Should we propose that he takes 1.9.5 (waiting until 1.9.5-7)? http://www.dcache.org/rt/index.html?q=5221

Yes, they should take 1.9.5-7.

Fermi has dCache instances running dCache v1.7 (or 1.6) so would be interested in the results of the migration.

JMS support in dCache

Gerd will update the topo cell for 1.9.6 or 1.9.7

Moving to JMS as a default involves changing a single line in the batch file.

Now that SAM is moving to ActiveMQ.

Transition plan:

  1. field testing without JMS as the default
  2. field testing with JMS as the default
  3. make it a requirement.

Review of RB requests

DTNM

Proposed: same time, next week.