Last modified on 05/14/09 14:49:13

dCache Developers meeting, Wednesday, May 13, 2009

Participants: Tigran, Gerd, Paul, Timur, Dmitry, Vladimir, Ted, Irina

Agenda

  • Plots of SRM requests,
  • dCache CMS system
    • short term solution,
    • medium to long term solution,
  • News about the agenda for next week,
  • Interfaces: touching base,
  • Point-of-order: meeting announcements being sent,
  • GDB: what did we learn?
  • UberFTP client
  • Release notes
  • 1.9.3 release process: where are we?
  • Outstanding patches

Plots of SRM requests

Discussion centred on Dmitry's SRM plots and whether they would be useful for next week's meeting.

Timur asked whether they would be useful.

Tigran pointed to the talk by Giuseppe in the GDB. This included pie charts showing which SRM SOAP requests were most frequent; the data behind them was collected over 2 months. Would it be possible or useful to generate similar pie charts?

Timur pointed out that, given the available time, we would not be able to contribute plots based on such a long observation period.

Paul suggested that gathered data would be useful, even if collected over a shorter (<< 2 months) time-frame. Giuseppe's data was taken from a single Castor SRM endpoint: by showing that data from dCache is not wildly different, we could show that broadly similar client activity is observed elsewhere and against a different implementation of SRM.

Gerd offered to do this for NDGF.
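
As a rough illustration of the kind of summary discussed above, the sketch below counts how often each SRM operation appears and turns the counts into fractions suitable for a pie chart. The record format (a timestamp followed by the operation name) and the function name request_shares are assumptions made for this example; they do not describe the actual dCache or Castor log formats, which would need their own parsing.

```python
from collections import Counter


def request_shares(records):
    """Return {operation: fraction of all requests}, most frequent first.

    Each record is assumed (hypothetically) to look like
    "<timestamp> <operation>"; real SRM logs will differ.
    """
    counts = Counter()
    for record in records:
        fields = record.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[fields[1]] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    return {op: n / total for op, n in counts.most_common()}


if __name__ == "__main__":
    # Made-up sample records, only to show the calling convention.
    sample = [
        "2009-05-13T10:00:01 srmLs",
        "2009-05-13T10:00:02 srmPrepareToGet",
        "2009-05-13T10:00:03 srmLs",
        "2009-05-13T10:00:04 srmStatusOfGetRequest",
    ]
    for op, share in request_shares(sample).items():
        print(f"{op:24s} {share:6.1%}")
```

Because the result is a set of fractions rather than raw counts, figures gathered over a shorter period at one site could still be set beside the two-month figures from another endpoint.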

News about the agenda for next week

No news.

dCache CMS system

The issue is with storing the AL&RP (access latency and retention policy) information on file creation, rather than only when a file is flushed to tape. The additional load causes problems in PNFS, resulting in a performance regression.

short term solution

Timur thanked Tigran for a rapid response. The solution was to supply a switch that instructs PnfsManager not to record AL&RP information when creating a file. Jon has now deployed the fixed version and has not yet observed the problem recurring.

medium to long term solution

Tigran reported that he considered the initial plan to fix this (moving the responsibility for storing the AL&RP information to the pool) to be architecturally bad, and that it might not solve the problem. However, there may be other fixes (e.g., storing the checksum at the same time) that would lead to an increase in performance equal to that lost by storing the AL&RP. This may be sufficient to remove the performance regression.

Interfaces: touching base

Timur asked what was happening.

Paul reported that he and Tigran had worked on this on the plane back, but apologised that he hadn't finished off the interfaces yet. There have been a few other things (e.g., CHEP paper deadline) that needed finishing.

Point-of-order: meeting announcements being sent

Gerd mentioned that, because the announcement wasn't sent, he had difficulty in connecting. Also, the agenda wasn't (at that time) available.

Agreed to send the announcements.

GDB: what did we learn?

Tigran reported that LHCb were running extensive analysis tests and claimed that all LHCb Tier-1 sites had been informed. Yet FZK seemed unaware of this activity since, when faced with a problem due to a large number of SRM interactions, they didn't suspect the LHCb analysis as a cause.

UberFTP client

It seems that UberFTP is becoming a more popular GridFTP client. Because of that, it would be good if it worked well with dCache and continued to do so.

Paul said he'd contact the UberFTP people and attempt to establish an agreement about testing UberFTP and dCache.

Release notes

Gerd forwarded a concern voiced during the Tier-1 support meeting: that, although the release notes were good for him, he would like something that he could point end-users to that would justify the down-time necessary for an upgrade.

Discussion centred on whether we should change the change-log format: have a single page for a branch (e.g., 1.9.2), with changes as sections within this page.

Gerd and Paul to look at providing a single page for 1.9.2, using this as a template for the forthcoming 1.9.3 release.

1.9.3 release process: where are we?

Timur asked whether the PNFS/PnfsManager performance regression should be considered release critical (RC), thereby delaying the 1.9.3-1 release until it is fixed.

Timur was in favour of considering it RC; Gerd was mildly in favour of not considering it RC (as the performance regression already exists in 1.9.2) but admitted that it seems silly to release with a known regression. The others didn't express an opinion. We agreed that it was a cause for concern but delayed making a decision on whether to consider the problem RC until 1.9.3 is closer to being releasable.

Outstanding issues are:

  • port allocation for NFS,
  • interface to permission handler,
  • (Maybe RC) File creation PNFS/PnfsManager performance regression.

Outstanding patches

42: Patch to disable suspend functionality in RequestContainerV5

Believe this may now be redundant because Gerd has fixed a bug elsewhere. Gerd (? or Tigran) would test whether the fix satisfies Brian's use-case. If so, Timur will contact Brian for him to test that the patch works for him.

64: Berkeley DB implementation of PNFS.

With Vladimir: working on an update to the patch.

71: Postgres Continuous Archiving and Point-In-Time Recovery (PITR) Scripts

[Ship it]

With Timur: need to chase this one up.

129: I forgot to remove unused datamember from SrmLs

[Ship it!]

With Dmitry?

131: Refactors the dcache script

With Owen.

Paul to kick Owen.

145: Fix ChimeraInfoExtractor so it is a replacement for GenericInfoExtractor

[Marked Ship-It]

With Paul: investigate why the patch isn't committed yet.

168: Allow "pool create" to work on non-existing directories.

Gerd's waiting on Patrick to OK the new error message since he first complained about it.

With Patrick: Paul & Tigran to kick Patrick

182: A new implementation of Trash interface using NFS

[Ship-it!]

???

193: Add support for an option to tell comparator to continue when faced with an inconsistency

With Paul: need to upload a patch that is viewable.

208: Fixes Ticket 4450 and other bugs in PartitionManager

Tigran/Irina? waiting on confirmation that this wouldn't break Jon's system. Consensus was this wouldn't because Jon doesn't use "save" in the admin interface.

With Tigran: waiting for Timur et al to send a copy of Jon's configuration for a final check.

212: automatic logging of counters in RRD Databases and generation of the history pages

Ongoing work between Tigran, Timur and Paul.

With Paul: need to finalise interfaces.

We stopped reviewing progress here due to lack of time.

DTNM

Same time, next week.


Last Modified by patrick @ Sun Mar 7 00:28:19 2021