wiki:developers-meeting-20141105
Last modified on 11/06/14 10:18:31

[part of a series of meetings]

Participants

Karsten, Christian, Paul, Tigran, Patrick, Al, Dmitry, Gerd

Agenda

[see box on the right-hand side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

No questions until we get through everyone :)

Karsten:

  • Patched pom.xml for updated URLs
  • Patched WebDAV to behave correctly on USERINFO requests (e.g., by davfs2)
  • Tickets, reviews

Christian:

got pCells to work, unmuted exceptions

Paul:

  • Java 8 upgrade
  • WebAdmin
  • EGI, AAI, meetings, ...

Tigran:

  • NFS, multiple receiver processes
  • Investigated heap dump from DESY CMS instance after upgrade
  • Worked on LDAP plugin

Al:

  • RM, rewrite, had to revisit some code
  • Working on problem with automated tests, needs more work
  • Will look into Gerd's patch

Dmitry:

  • Investigated user problems with NFS, mostly. So far it looks like one of the clients causes the problems
  • Pool Queues, Sweeper
  • Reviewing some patches and updating patches, will make merge requests

Gerd:

  • Session strings in dCache (shortened them); this was caused by some xrootd issue
  • Committed first Java 8 patches
  • Worked on replacement for Java keystores
  • Refactoring
  • Finished upgrade guide
  • Vector reads through HTTP, but Milton misbehaves (copies, keeps temp files)

Patrick:

  • Visited DFN to arrange meeting in December (concerning Shibboleth)
  • Staff appraisals
  • dCORE + MOU negotiation with legal department concerning licences
  • Jana finished her thesis

Special topics

NFS

Dmitry:

Was the provided information helpful, or do we need more?

Tigran:

Need to run script, will try to reproduce.

Dmitry:

There is a ticket with NPE concerning NFS

Tigran:

That was caused by a mount kept across the upgrade with old file handles. Will fix the NPE, though.

Dmitry:

How to debug?

Tigran:

Use Wireshark and observe the traffic.

-> Will try to reproduce next

Paul:

It would be nice to have something for the testsuite. Can we add it to NFS test suite?

Dmitry:

This is rather a debugging script. Currently our guess is that writing causes problems on ls.

pCells

Christian:

How should we release pCells? Separate from dCache? Or bound to a dCache version?

  • Is there any problem with taking out all pCells related classes of the dCache repo?

-> Suggestion: Make a release for every dCache branch. Users can then drop in other dCache version jars.

Paul:

Concerned about mismatches between released versions and the versions in Nexus. Should we push our JAR files as part of the release files?

Gerd:

Ideally pCells would download the JARs automatically from Nexus. Maybe we can refactor to have a separate API package. That would solve the dependency issue.

Al:

dcache-common should be used more consistently.

Gerd:

An API package would solve that problem.

Tigran:

How about the client pulling the JARs from the server? For now, it would be okay to keep copying the JARs into the pCells directory.

Christian:

Could package all dependencies as before, but the goal was to have a clean build.

Tigran:

You would copy the Java files and compile them as part of pCells. This would allow a smaller package and would only require a small number of files.

Christian:

There are actually many cases where we get objects instead ...

Tigran:

Then we can have pCells as a module of dCache and release it like srmClient. Or you install dCache into the local Maven repository and build pCells against it.

Paul:

Suggest to either copy Java files or to put pCells in dCache. This would be consistent with the srmClient.

Christian:

...

Pool Queues and Sweeper

Dmitry:

We have a system that has multiple storage groups that don't play well together. Therefore we have a large pool group for many experiments. This sometimes pushes out other experiments. It would be good to have a mechanism to have a fair share for pushing out files based on storage groups. Opinions?

Gerd:

It would be nice to have pool and transfer virtualisation, but that is a long term plan. You could implement a custom sweeper, but the queueing is hard coded.

Dmitry:

PartitionManager would allow custom algorithms.
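To make the fair-share idea concrete, here is a minimal sketch of how eviction could prefer the storage group that is furthest over its share. It is purely illustrative and not dCache code: the function name `pick_victims`, the tuple layout of `files`, and the `shares` mapping are all assumptions made for this example.

```python
from collections import defaultdict


def pick_victims(files, shares, capacity, bytes_needed):
    """Choose files to evict, taking from the most over-share group first.

    files:        list of (name, storage_group, size, last_access) tuples (hypothetical layout)
    shares:       storage_group -> fraction of the pool it is entitled to (hypothetical)
    capacity:     pool size in bytes
    bytes_needed: how much space must be freed
    """
    usage = defaultdict(int)
    by_group = defaultdict(list)
    # Sort newest first so that list.pop() returns the oldest file of a group.
    for name, group, size, last_access in sorted(
        files, key=lambda f: f[3], reverse=True
    ):
        usage[group] += size
        by_group[group].append((name, size))

    victims = []
    freed = 0
    while freed < bytes_needed:
        candidates = [g for g in by_group if by_group[g]]
        if not candidates:
            break  # nothing left to evict
        # The group exceeding its entitlement by the most bytes loses a file.
        worst = max(
            candidates, key=lambda g: usage[g] - shares.get(g, 0.0) * capacity
        )
        name, size = by_group[worst].pop()
        usage[worst] -= size
        freed += size
        victims.append(name)
    return victims
```

With two groups entitled to half of the pool each, a group that occupies 80% of the pool would lose its least recently used files before the smaller group loses anything.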

dCache User Workshop

D+A: We are afraid that politics will probably block all our plans.

Easter wouldn't work either.

Christian:

We want to have it close to HEPiX. The week before is blocked; the week after might work.

Patrick:

We can check with people whether they want to go to HEPiX anyway; otherwise we can take the week before.

-> Christian to send another mail with all dates and suggestions.

HTTP Get

Gerd:

What can we do about Milton keeping temporary files?

The current version misses some interface to read multiple ranges which causes the problems.

The most recent Milton version is a major update and we should probably not put it in 2.6.
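For reference, reading multiple ranges over HTTP means a single GET carrying a `Range` header with several byte ranges, which the server typically answers with a `multipart/byteranges` response. The sketch below only shows how such a header is formed and how ranges could be merged beforehand; the helper names `build_range_header` and `coalesce` are made up for this example and belong to neither Milton nor dCache.

```python
def build_range_header(ranges):
    """Format inclusive (first, last) byte pairs as an HTTP Range header value."""
    if not ranges:
        raise ValueError("at least one range is required")
    parts = []
    for first, last in ranges:
        if first < 0 or last < first:
            raise ValueError(f"invalid range: {first}-{last}")
        parts.append(f"{first}-{last}")
    return "bytes=" + ",".join(parts)


def coalesce(ranges):
    """Merge overlapping or adjacent ranges so that fewer reads are needed."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            # Extend the previous range instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged
```

A client asking for bytes 0-99 and 200-299 would send `Range: bytes=0-99,200-299`.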

Tigran:

People could solve this by putting a proxy in front of dCache.

-> We should solve the temporary files issue, but suggest to use a proxy to avoid additional namespace lookups.

Paul:

I am worried that relying on such a solution for good performance would still cause inexperienced users to rely on the dCache implementation of proxying and get bad performance. We might want to disable proxying in dCache altogether.

Gerd:

This might make some rather simple use cases too difficult to deploy.

Paul:

Would it be possible to use NFS's proxy support for other protocols?

-: Rather not.

Patrick:

Why is it so slow?

-: If you have multiple redirects that could take up a lot of the time.

Paul:

Could the problem be caused by the client caching the redirected location, while the location becomes invalid on close? There seems to be a conflict between the HTTP way and the dCache way of caching. We could only offer this if we can guarantee that a mover would still be valid for as long as the client expects the cached redirect to exist. There might be a sweet spot with regard to the caching time.

Would we really gain anything with NGinx?

-: NGinx has a much better cache

Paul:

Concerned that NGinx might not be that great a performance benefit. Maybe make the NFS proxy more generic.

Should we really invest time in the NGinx solution, or rather suggest that people upgrade?

-> Upgrade Milton and backport it to 2.10, and create a demo with NGinx as an example.

-> Implement multiple vector reads in dCache and possibly caching.

Gerd to look into the possible solutions.

Globus and SSL3

There is a timeline on how Globus wants to move from SSL to TLS. We can switch it off independently. Need to make some patches.

Trunk activity

Progress with new features...

Tigran: patch; Gerd and Paul to comment.

Issues from [FIXME: Add link to yesterday's Tier-1 meeting]

Paul:

Thursday: -

Tuesday:

KIT:

LHCb opened a ticket about SRM closing connections. The ticket is already closed, but Tigran guessed that it could have been a switch treating the connection as inactive before the request finished.

-> It is fine for dCache to close idle connections. Maybe FTS3 has a bug and assumes the connection has to stay open.

The error could have many causes. Needs more investigation if it happens too often.

Paul to ask what the error message in the FTS log exactly means.

NDGF:

Some problems with ALICE recreating tape data. Some requests fail with "No such file or directory".

Gerd is investigating

NFS41:

Problem with queuing movers.

Tigran to try to reproduce.

Plans for patch-releases

Should we make a new patch release?

Wanted to release 2.11 this week, but it might take until next week. Other branches later.

Outstanding Documentation

Outstanding RT Tickets

[This is an auto-generated item. Don't add items here directly]

8518:

  • Ask for the latest version in which this worked.
  • A read on the mounted file system works with uid:gid, but the stage goes through the door as nobody.
  • At least previously we allowed staging if the pnfsid is known.
  • It could be that the door is sending the stage request as anonymous instead of root.
  • The message is only created in chimera-namespace/pnfsmanager.

-> Ask them to set the log level to debug, repeat the operation and send the log.

Review of RB requests

New, noteworthy and other business

DTNM

Proposed: same time, next week.