wiki:developers-meeting-20141203
Last modified on 12/04/14 12:50:30

[part of a series of meetings]

Participants

Karsten, Christian, Paul, Tigran, Gerd, Dmitry

Agenda

[see box on the right-hand side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

No questions until we get through everyone :)

Karsten:

  • Created build and deploy plans on DESY Bamboo for CMS-TFC plugin
  • Added build targets to book and created Jenkins job for auto-deployment
  • More discussion on how to use SmallFiles for dphep
  • still busy with Java 8 updates - patch

Christian:

  • more tickets
  • pcells, support normal PAM style keys
  • will release pcells after some more tests
  • reviewing

Paul:

  • gave talk about dCache at Helmholtz meeting at DESY, ++
  • releases

Dmitry:

  • deployed 2.2
  • helped users
  • system runs fine, users seem happy
  • they use noack option

Gerd:

  • logging for Jetty handshake
  • FTP over IPv6: relaxed behaviour for passive mode -> improves compatibility with FTS3
  • debugged some issues with alarms preventing clean shutdown
    • it seems to log too many messages (i.e. everything)
  • 2.10.13 on production
  • worked on porting xrootd4j to Netty 4

Tigran:

  • committed many patches
  • debugged CMS instance (5000 cores on NFS)
    • deployed updated libraries to fix
  • ELK testing, experimenting
  • dCache on HDD

Special topics

Trunk activity

Progress with new features...

Tigran:

  • ELK: make billing data directly injectable using JSON
  • Should we have an option to configure output to either a file or Elasticsearch?

Gerd:

  • Currently we don't have a use for it, but others might.
  • Using Elasticsearch for storing billing data takes up too much space.

Tigran:

  • depending on (index-)configuration you can optimize it for different access patterns.
    • e.g. use one index per day rather than per month
  • we have 400G for 9 months
  • ...
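A minimal sketch of the per-day versus per-month index idea, assuming billing records are plain dictionaries with a Unix `timestamp` field. The field names, the `billing` prefix and both helper functions are hypothetical and not taken from dCache or Elasticsearch; the sketch only shows how a record could be serialised as JSON and mapped to an index name.

```python
import json
from datetime import datetime, timezone


def index_for(record, prefix="billing", per_day=True):
    """Return the (hypothetical) index name a billing record would go to."""
    ts = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
    # One index per day keeps each index small; one per month means fewer indices.
    pattern = "%Y.%m.%d" if per_day else "%Y.%m"
    return f"{prefix}-{ts.strftime(pattern)}"


def to_json_line(record):
    """Serialise a billing record as a single JSON document."""
    return json.dumps(record, sort_keys=True)


# Illustrative record; the field names are assumptions, not the billing schema.
record = {"timestamp": 1417608000, "type": "transfer", "size": 1048576}
daily = index_for(record)
monthly = index_for(record, per_day=False)
line = to_json_line(record)
```

Writing to a file or to Elasticsearch would then only differ in where `line` is sent.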

Alarms

Dmitry:

  • Alarms records everything, additionally taking special action on log messages marked as alarms. We should probably add logic to filter on log level before storing.

Gerd:

  • Wasn't the alarms service supposed to show only alarms, instead of all logs?

Dmitry:

  • We can add an option to store only by alarm tag.

Gerd:

  • suggest to store only tagged alarms as default

Paul:

  • what if the alarms service stores the last x messages per service allowing the admins to mark messages as alarms while not storing everything?

Dmitry:

  • for admins logging everything might be useful to trace back problems

Gerd:

  • For that we already have the central logback service

-> talk about it next week and probably only log alarms by default
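A rough sketch of the two storage policies discussed above, assuming events arrive as (service, level, message) plus an alarm flag. The `AlarmStore` class and its parameters are hypothetical and do not describe the actual alarms service: by default only tagged alarms are stored, and an optional bounded buffer keeps the last x untagged messages per service.

```python
from collections import defaultdict, deque


class AlarmStore:
    """Hypothetical store: tagged alarms are kept, other messages are bounded."""

    def __init__(self, recent_per_service=0):
        self.recent_per_service = recent_per_service
        self.alarms = []
        # Each service gets a ring buffer holding at most the last x messages.
        self.recent = defaultdict(lambda: deque(maxlen=recent_per_service))

    def handle(self, service, level, message, is_alarm=False):
        event = (service, level, message)
        if is_alarm:
            self.alarms.append(event)
        elif self.recent_per_service > 0:
            self.recent[service].append(event)
        # Otherwise the message is dropped (default: store only tagged alarms).


store = AlarmStore(recent_per_service=2)
store.handle("pool1", "ERROR", "disk failed", is_alarm=True)
for text in ("a", "b", "c"):
    store.handle("pool1", "INFO", text)

default_store = AlarmStore()
default_store.handle("pool1", "INFO", "a")
```

With `recent_per_service=2` only messages "b" and "c" remain in the buffer, while the tagged alarm is stored permanently.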

dcap vector reads

CMS still uses an ancient version of dcap, which causes jobs to fail (ticket 8553)

There are several ways to solve this:

  • increase size of vector
  • update clients

How can we solve this without having it look like it's dCache's fault?

Paul:

  • dCache should protect itself against such clients

Tigran:

  • that still doesn't fix the problem of files not being readable

Paul:

  • if CMS does not care enough to upgrade the clients it should be their responsibility

Tigran:

  • DESY moved 50% of worker nodes to use NFS
  • other sites will probably follow

Dmitry:

  • since the server can detect the client version could we just reject CMS requests?

Tigran:

  • only some reads fail (vector reads with more than 8k).

-> maybe we can return an error and have the client fall back to serial reads.
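The fallback idea could look roughly like this. It is an illustration only: the function names, the exception and the in-memory "server" are all hypothetical and are not the dcap protocol or client API. Reading "8k" as a cap on the number of chunks in one vector is also an assumption; the notes do not say what the limit counts.

```python
class VectorTooLargeError(Exception):
    """Hypothetical error the server returns for an oversized vector read."""


def server_readv(data, chunks, limit):
    """Serve a vector read, rejecting vectors longer than the limit."""
    if len(chunks) > limit:
        raise VectorTooLargeError(f"{len(chunks)} chunks exceeds limit {limit}")
    return [data[offset:offset + length] for offset, length in chunks]


def server_read(data, offset, length):
    """Serve a single plain read."""
    return data[offset:offset + length]


def client_readv(data, chunks, limit=8192):
    """Try one vector read; on rejection fall back to serial reads."""
    try:
        return server_readv(data, chunks, limit)
    except VectorTooLargeError:
        # Slower, but the job still gets its data instead of failing.
        return [server_read(data, offset, length) for offset, length in chunks]


data = bytes(range(256)) * 4
chunks = [(i, 1) for i in range(10)]
small = client_readv(data, chunks[:3], limit=4)   # served as a vector read
large = client_readv(data, chunks, limit=4)       # falls back to serial reads
```

Either path returns the same bytes, so the fallback is invisible to the job apart from the extra round trips.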

Issues from [FIXME: Add link to yesterday's Tier-1 meeting]

KIT: Running fine

  • CMS benchmark caused problems; probably a problem with the benchmark
  • asked about FAX plugin
  • IPv6 + FTS3 caused problems initiating transfers -> probably a problem with FTS

FERMI:

  • OOM
  • NFS very slow and previous mountpoints getting stuck
    • many processes ran "rm" on a non-existing file

Dmitry:

If you clear those processes, the system recovers.

Paul:

Would attribute caching help for NFSv3?

Tuesday:

PIC: Looks okay, tested 2.10, will upgrade on 15th Dec --> in http://rt.dcache.org/Ticket/Display.html?id=8459 Russian T1 states that they will also upgrade to 2.10

NDGF: After upgrade on Atlas VO, everything is working, will upgrade to 2.11.x

SL5, SSLv3

There are still SL5 machines, which may in the best case disappear in the next 6 months. They will use the SSLv3 info to push for upgrades.

Plans for patch-releases

Should we make a new patch release?

Patches are out, and we are back on schedule.

Paul:

How can we get FERMI dCache versions back onto our normal version?

Dmitry:

We could already go back to normal versions, but we need the caching. -> Will test!

Tigran:

There are some patches on RB for that. Backporting to older branches does not make sense.

Paul:

Will FERMI upgrade public dCache to 2.10?

Dmitry:

Probably yes, have to fix things, then within 3 months.

Paul:

Suggest to use 2.11 instead of 2.10

-> Tigran to look into backporting things.

Outstanding RT Tickets

[This is an auto-generated item. Don't add items here directly]

Review of RB requests

Gerd:

  • Java 8 patch
  • 7638 -> Dmitry

Karsten to revert hostname2fqdn patch

New, noteworthy and other business

Paul:

FTS limits requests to an upper limit of 2k files, then waits for some time and submits more requests based on the number of completed requests.
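A toy model of that behaviour, assuming a fixed number of requests complete between polls. This is not FTS code; the function name and parameters are hypothetical and only illustrate the windowing: at most `limit` requests are outstanding, and each round submits as many new requests as have completed.

```python
def submit_in_batches(pending, completed_per_poll, limit=2000):
    """Return the size of each submitted batch under an in-flight limit."""
    if completed_per_poll <= 0:
        raise ValueError("completed_per_poll must be positive")
    in_flight = 0
    remaining = pending
    batches = []
    while remaining > 0 or in_flight > 0:
        # Top up to the limit with whatever is still waiting.
        batch = min(limit - in_flight, remaining)
        if batch > 0:
            batches.append(batch)
            remaining -= batch
            in_flight += batch
        # Wait for some time: a fixed number of requests complete.
        in_flight -= min(completed_per_poll, in_flight)
    return batches


batches = submit_in_batches(pending=5000, completed_per_poll=500)
```

For 5000 files this gives one initial batch of 2000 followed by batches of 500 as earlier requests complete.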

DTNM

Proposed: same time, next week.