wiki:developers-meeting-20100210
Last modified on 02/11/10 16:08:11

[part of a series of meetings]

Participants

Gerd; Timur, Dmitry, Gene; Tigran, Jan, Paul, Owen, Patrick, Irina, Tanya

Agenda

[see box to the side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

Gerd: new configuration system, scripts; very nice.

Timur: working on preparation for the upgrade; preparing a patch to get support for kerb. door and multiple grid. Also working on a bulk unpin command for the pin-manager (symmetric one with pin-manager). Another patch for 1.9.6. Power crash yesterday, dCache still recovering. While everything else comes up OK, the SRM doesn't come up.

Tigran: did two releases in two days. Most working with Gerd on getting xrootd working and NFS stuff.

Jan: progress in getting dependencies of metadata into ETICS. Working on getting security with new web admin: looking at wicket.

Paul: stuff.

Owen: Mostly ETICS ping-pong and doing stuff with Jan. Version chasing ...

Patrick: been busy.

Irina: working on the issue of checking all of a user's FQANs (and not only a single FQAN of the proxy) to decide about the user's permission to stage files (Tape Protection issue, mentioned in tickets #5419, gus#55121). This turns out to be caused by bugs in gPlazma that need to be fixed first. Committed the final version of the HSM Chimera Cleaner so that it will be available in the upcoming release 1.9.6-3.

Tanya: working on Pool Manager Adapter and reviewed all tickets for the last two months.

No questions until we get through everyone :)

Status of work for 1.9.5

A (quick?) review of activity needed for the 1.9.5 release

Making changes to provide native dCache support for kerberised dcap doors and multiple doors on the same node.

Another issue: memory usage of the pools. Talk about this in a bit.

Has there been any progress on the DPM setuid/gid set permissions and "gridftp issue" for srm-client?

Timur: it's on the TODO list.

Dmitry: have two issues: passing VM issues and the set-permissions issue.

Tigran: has a problem with empty URI. SARA can't flush files to tape any more. The flush script was returning something that was a valid URI, but wasn't something we were expecting.

The error message said ...

One fix is to remove all empty strings, but there is also a fix to remove any invalid URI.
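The two fixes can be sketched roughly as follows. This is only an illustration, not the code that was committed: the class and method names are hypothetical, and what counts as "invalid" here (a string that does not parse, or one that parses but has no scheme) is an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the actual dCache code.
final class FlushReplyFilter {
    private FlushReplyFilter() {}

    /** Returns the usable URIs from the lines a flush script printed. */
    static List<URI> usableLocations(List<String> lines) {
        List<URI> result = new ArrayList<>();
        for (String line : lines) {
            String candidate = line.trim();
            if (candidate.isEmpty()) {
                continue; // first fix: drop empty strings
            }
            try {
                URI uri = new URI(candidate);
                // second fix (assumed meaning of "invalid"): keep only URIs with a scheme
                if (uri.isAbsolute()) {
                    result.add(uri);
                }
            } catch (URISyntaxException e) {
                // the line is not a URI at all; skip it
            }
        }
        return result;
    }
}
```

A stricter check, for instance on the expected scheme, could be added in the same place if "valid but not what we were expecting" replies should be rejected as well.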

Pool memory usage (see RT 5461)

Any progress on adding the ability to change the log levels in any cell? Yes, in any cell you can now change the log-level.

Status of work for 1.9.6

A (quick?) review of activity needed for the 1.9.6 release

Need a new release at some point: OK.

The webdav stylesheet committed? Yes.

Status of work for Trunk (a.k.a future 1.9.7)

A (quick?) review of activity needed for the 1.9.7 release

Initial switch to SLF4J allows migration to Logback.

Issues from yesterday's Tier-1 meeting

Triumf

SRM certification problem

Can we ticket Globus? No, we couldn't provide any useful information.

It would be very nice to have NDGF experience replicated elsewhere. Don't want to push Tier-1s towards Jetty.

Timur: any chance we can get the logging from Tomcat? If this could be pushed to sites then this might let us know what's going wrong.

Most likely the problem is related to CRL updates? Well, maybe, but Triumf experience suggests this might not be the case.

Need more logging to discover what's going wrong.

NDGF

SRM stalling (for ~20-minutes, seems to be a one-off)

No news.

pool load (20-minutes of high load, every hour)

Current theory is that this is the Java GC working close to the limit. This would explain why it gets worse over time.

Planning on upgrading tomorrow. Increasing the heap size to see if the problem goes away.
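As a rough way to check the "GC close to the limit" theory, the heap headroom of a JVM can be read from inside the process. This is a minimal sketch using the standard java.lang.management API; the class name is hypothetical and it is not part of dCache.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical diagnostic helper, not part of dCache.
final class HeapHeadroom {
    private HeapHeadroom() {}

    /** Fraction of the maximum heap currently in use, or -1 if the JVM reports no maximum. */
    static double usedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        return max <= 0 ? -1.0 : (double) heap.getUsed() / max;
    }
}
```

A value that stays close to 1.0 even after collections would be consistent with the theory; a larger heap (the JVM's -Xmx option) should then lower it.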

Pedro's email

Pedro has sent an email to user-forum asking about SRM and ACLs. Could you look into it?

København trip

Hotels

Gerd suggested a few: one hotel is ~500 Danish kroner (~75 euro). The advantage is that it would be in the city-centre.

Patrick to send Timur the details about the hotel.

One small complication: if they offer group discount then it's a group reservation with group pre-payment.

Gerd to call the hotel to investigate how to optimise our hotel usage.

Timur sending travel details to team@…

Plans

The concept of user-identity. How do we treat users, permissions, and stuff like that.

Everyone agreed that this is a good topic to work on.

Many inconsistencies in how we handle multiple identities / roles.

How do we define permissions on newly created files?

There are also some questions (not for WLCG) about how we can handle ACLs on a per-user basis.

For EMI, it is required to interface our gPlazma with the Argus system. Once we get EMI funding, we'll have to attack this problem. The code

Currently supported: GSI, Kerberos, Username/Password.

A number of places where external user-ID information is carried across into internal components; e.g., staging protection and dcap.

Authorisation: not just one step.

Get together a document that captures the issues we should address.

Gerd: the idea for the methodology of the meeting is that we will be working in small groups and then meeting up.

Going to ask one of Gerd's colleagues to act as ring-master. Gerd suggested H. T. as someone for this role: he has lots of experience.

Gerd's stuff

Unfortunately, for several weeks Gerd has been saying that this and that problem will be solved by this work.

Working on how we do configuration, defaults, init scripts and boot-strapping a dCache instance.

Some success: much more DRY, and it uses the standard Java property file format.

Should have something ready for review at some point next week.

Unfortunately, dCache Configure script needs rewriting at that point.

The next golden version (1.9.15, say) ..

In 12 + x months dCache will decide to make another Golden release.

Patrick: there must be a nice way for them to upgrade.

Gerd: it's still the same kind of thing people are configuring, just how you describe what

Tigran: if they're upgrading once a year then they should simple

Patrick: we need a clear migration path; we must not require that people read all the intermediate release notes.

Gerd: we need a migration guide from one golden release to the next.

Owen: I have documented the migration path from 1.9.1 to 1.9.5. It was very short, but I needed to read through a large amount of release notes to build the document.

How we manage the configuration file and defaults: there will be two files, one that is immutable (not user-editable) and a dCache setup file. Variables are defined in a declarative rather than an imperative style. This has the advantage that the Java options can be declared.
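The two-file idea maps naturally onto java.util.Properties, which can fall back to a second Properties object for missing keys. The following is only a sketch of that layering, assuming the immutable file carries the defaults and the setup file carries site overrides; the class name and the property names used in the usage note are hypothetical and not the proposed dCache ones.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration of "immutable defaults plus setup file".
final class LayeredConfig {
    private LayeredConfig() {}

    /** Parses both texts in Java property file format; setup values win over defaults. */
    static Properties load(String defaultsText, String setupText) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultsText));

            // Lookups that miss in 'effective' fall through to 'defaults'.
            Properties effective = new Properties(defaults);
            effective.load(new StringReader(setupText));
            return effective;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, with defaults containing `java.memory.heap=512m` and a setup text containing `java.memory.heap=2048m`, `getProperty("java.memory.heap")` on the result returns the setup value, while keys only present in the defaults are still resolved.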

Per-domain configuration. Directory etc/domains. One file per domain: gridftp, pool, etc. One line per service: name followed by a list of key-value pairs for overriding values.

Also, an optional file listing which domains to start, along with an overriding list of key-value pairs.
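To make the "name followed by key-value pairs" line format concrete, here is a small parsing sketch. The exact syntax was not fixed in the meeting, so whitespace-separated `key=value` tokens are an assumption, and the class name, service name and keys are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for one line of a per-domain file; the syntax is assumed.
final class ServiceLine {
    final String name;
    final Map<String, String> overrides;

    private ServiceLine(String name, Map<String, String> overrides) {
        this.name = name;
        this.overrides = Collections.unmodifiableMap(overrides);
    }

    /** Parses e.g. "gridftp port=2811 logLevel=debug". */
    static ServiceLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("empty service line");
        }
        Map<String, String> overrides = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            int eq = tokens[i].indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + tokens[i]);
            }
            overrides.put(tokens[i].substring(0, eq), tokens[i].substring(eq + 1));
        }
        return new ServiceLine(tokens[0], overrides);
    }
}
```

The overrides from such a line would then be applied on top of the setup file and the immutable defaults when the service is started.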

New bootloader class in dCache that parses these files.

The scripts call the bootloader to ask for the values.

Is it possible to use the bootloader as a library for querying configuration?

Interface to the configuration: an abstraction. We could then add new configuration options.

Not changing how parameters are injected into dCache. That continues to be via arguments to the cell's constructor.

We should discuss where we want to go with the configuration system.

Might need to change only the second chapter in the dCache book (how to configure a new dCache system).

Chimera support in enstore

Timur: all of Vijay's patches to support Chimera are in the enstore repository. Imminent release of enstore that can run on PNFS or Chimera. All regression tests now pass.

Do you have a complete dCache + enstore test system?

Automatically detects which name-space is being used.

The enstore server doesn't care: it can support multiple (mounted?) namespaces.

What does the encp script return to the pool?

If enstore is using level-1 and level-4 and not URI then many improvements are not available.

Someone (Patrick / Tigran) will send the link to the documentation about the URI.

Outstanding RT Tickets

[This is an auto-generated item. Don't add items here directly]

RT 5424: Setting Java options for dCache SRM

RT 5461: pool startup problem - out of memory

Review of RB requests

DTNM

Proposed: same time, next week.