wiki:developers-meeting-20141112
Last modified on 11/12/14 17:43:00

[part of a series of meetings]

Participants

Al, Christian, Paul, Tigran, Karsten, Gerd, Dmitry, Patrick

Agenda

[see box on the right-hand side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

No questions until we get through everyone :)

Karsten:

  • Preparing presentation on Atlassian Suite
  • Reviews

Christian:

  • release process for pCells. Some things can still be improved. Builds on Jenkins. 2.6 and 2.10 work fine.
  • Currently only DER keys are supported, but will extend to others
  • tickets
  • looking for dates for dCache workshop

Patrick:

  • dCore meeting about merging dCache and their cloud product
  • preparation of vacation

Dmitry:

  • Looked at patches
  • CMS has problems with NFS 4.1: mounts are getting stuck.
  • Public dCache got a bit better after fixes in 2.6
  • any "full" ls will hang
  • 3rd-party transfers: it looks like a relative path does not get appended to the home dir; will investigate more

Al:

  • Many small patches, cleaning up
  • Alarms
  • Talked to Paul about fixing login page
  • Still thinking about optimal solutions

Gerd:

  • Reviewing
  • Worked on xrootd mover to propagate errors
  • The problem of Alice not being able to remove files turned out to be caused by Alice themselves

Paul:

  • 2.11.0 is out
  • dCore meeting
  • WebAdmin with Al
  • Worked on automating release process
  • many meetings (EGI fedcloud)
  • talked to GLUE people to get some changes

Tigran:

  • dCore meeting
  • NFS problems
    • DESY Cloud, fixed something, but still having problems
  • worked on billing cell to make it really pluggable

Special topics

ls over NFS

Dmitry: It looks like even a simple ls does a full request on the DB. Are there any plans to change that?

Tigran: We expect clients to always request attributes, so we always do a full join.

Dmitry: The request takes about 10x the time.

Dmitry: Planning on moving DB to SSDs

Try with "ls -1"

Dmitry: Client with NFS 4.1 sees slow ls

Tigran: Need a tcpdump to debug

Tigran: better to use NFS 3 than 4.0
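
To put a number on the "about 10x" observation from the client side, one could time a names-only listing against a listing that also fetches attributes for every entry. The following is a minimal sketch, not something discussed in the meeting: the function names are made up for illustration, and the mount path is whatever NFS mount is being tested. Client-side attribute caching can blur the difference, and since the server is said to do a full join in either case, the two timings may turn out similar.

```python
import os
import time


def list_names(path):
    """Return entry names only (roughly what 'ls -1' needs)."""
    return sorted(os.listdir(path))


def list_with_attributes(path):
    """Return (name, size, mtime) tuples; asks for attributes of every entry."""
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            st = entry.stat(follow_symlinks=False)
            entries.append((entry.name, st.st_size, st.st_mtime))
    return sorted(entries)


def timed(func, path):
    """Run func(path) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


def compare(path):
    """Time both listing styles on the same directory."""
    names, t_names = timed(list_names, path)
    _, t_full = timed(list_with_attributes, path)
    return {
        "entries": len(names),
        "names_only_s": t_names,
        "with_attributes_s": t_full,
    }
```

Calling `compare()` on a directory inside the NFS mount, ideally right after a fresh mount so that caches are cold, gives the two timings side by side. A tcpdump taken during the run would still be needed to see what the client actually requests.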

Alarms embedding

Al: Gerd discovered that the CellNucleus tries to reject some classes.

This causes a problem with having the XML datasource in the same domain as the transfer manager. -> Suggest to either remove XML or to file a bug report against DataNucleus.

-> Will file a bug report.

-> We should make people aware of this problem with XML and the transfer manager.

2.11

Some monitoring tools fail against 2.11 when verifying certificate ages, while they work with 2.10. There are a couple of combinations of monitoring and dCache that fail for reasons still unknown.

dCache as object store

Paul: We have some inconsistencies in how we handle pnfsid and access rights. In NFS we expose the pnfsid, but in dcap knowing the pnfsid is enough to access the file. If we want to move towards an object store, we need to be consistent throughout our protocols.

What should be the security model for dCache as an object store?

Tigran: We could have a switch for file system or object store

Patrick: We have to be both, CDMI is both, too

-> It might be useful to assign ACLs to IDs

Trunk activity

Progress with new features...

Issues from [FIXME: Add link to yesterday's Tier-1 meeting]

Paul:

Thursday: cancelled

Tuesday:

  • KIT: would like to have "rep ls" filtering by storage class

-> We definitely want that and even more filters! And a grep!

  • Problem with Atlas instance using a lot of CPU and then crashing with OOM.
  • Deployed NFS for non-WLCG. They had a problem with too few nfs movers.
  • They are looking through the guide for upgrading to 2.10
    • Question: How is it possible to find out if a client is banning a pool?
  • Interested in the health-script.
  • They are even considering upgrading to 2.11 after moving to 2.10 for the HSM improvements.
  • Lhcb SRM issue getting a "closed connection", probably due to a certificate issue during the handshake. Gerd: 2.6 is more aggressive in timing out handshakes; the timeout is hardcoded to 10s. -> We can check for the logged IOException on our side.
  • Support for 2.10 on the monitoring plugin of Ilya

-> We need to understand what he hopes to achieve with the upgrade to 2.11 for HSM

Plans for patch-releases

Paul:

2.11.1, 2.10.10, 2.9.14, 2.8.23, 2.6.38 were just released.

There are some pull-requests by Al fixing garbled characters in alarms properties file.

  • UMD repos contain 2.6.35 and 2.2.27 of the srm client. Can we build recent versions of the srm client with Java 6? -> Christian to try to build with Java 6. Watch out for dependencies!
  • Do we feel like we can drop support on a supported branch early? If nobody is using those branches (e.g. 2.7)? -> Will keep the current system.
  • Working to improve releases and Jenkins testing.

Outstanding Documentation

Outstanding RT Tickets

Please have a look at your open tickets!

Review of RB requests

New, noteworthy and other business

DTNM

Proposed: same time, next week.