wiki:developers-meeting-20100707
Last modified on 07/07/10 18:10:53

[part of a series of meetings]

Participants

Jan, Paul, Tigran, Owen, Tanja; Timur, Dmitry; Thomas

Agenda

[see box on the right-hand side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

No questions until we get through everyone :)

Owen: mostly dCacheConfig2 (the next generation)

Tanja: playing with Hudson, learning JMX, tickets

Tao: last week worked on the plotting module. We are going to use some kind of pipeline format, for expandability; decided on the architecture.

Timur: I continued working on gPlazma. Added new functionality to Paul's patch and added door-specific. Also worked on a couple of bugs: deadlocks in SRM (caused by simultaneous locking at different levels) and recursion on constructors in SRM ls operations.

Dmitry: article, from 2008, "if you use emacs you must stop it!" Playing with an IDE (NetBeans?). A couple of patches in space manager: using cell-stub and other purely decorative changes. Tickets. Updated gPlazma status page.

Thomas: continuing to work on Big Xrootd Refactoring. Took a while due to how xrootd protocol provides vector reads. Small change to Chimera to implement recursive mkdir.

Jan: did two new pages. Poolgroup page in web-admin, currently read-only but one could easily add support for configuring poolgroups. Also added the possibility to download info XML.

Q: what to do with SRM-Watch? Not sure how to integrate this at the moment.

SRM-Watch uses the JAIDA graphics package, part of JAS (Java Analysis Studio).

Also looking at profiling, using HProf as an extra JVM option.

Paul: talk at XFEL, usual stuff.

Tigran: Releases: 1.9.5-20, -21, 1.9.9-1. We are going to have 1.9.7-3 and 1.9.8-2. Some stuff is merged, but we need to check what's missing. Also transferring knowledge of the release process to Tanja and Antje. Still investigating performance issues with NFS. Also wrote some perf. stats classes that publish info via JMX.
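As an illustration of publishing statistics via JMX, here is a minimal sketch using a Standard MBean registered with the platform MBean server. The class names, attribute names and the ObjectName are assumptions made up for this example; they are not the classes Tigran wrote.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical names throughout; not the classes mentioned in the minutes.
class JmxStatsDemo {

    // Standard MBean contract: the interface must be public and be named
    // <ImplementationClass>MBean. Its getters become JMX attributes.
    public interface RequestStatsMBean {
        long getRequestCount();
        double getAverageMillis();
    }

    public static class RequestStats implements RequestStatsMBean {
        private final AtomicLong count = new AtomicLong();
        private final AtomicLong totalNanos = new AtomicLong();

        // Called by the measured code after each request.
        public void record(long nanos) {
            count.incrementAndGet();
            totalNanos.addAndGet(nanos);
        }

        @Override public long getRequestCount() { return count.get(); }

        @Override public double getAverageMillis() {
            long n = count.get();
            return n == 0 ? 0.0 : totalNanos.get() / 1e6 / n;
        }
    }

    // Registers a fresh stats object under the given ObjectName string.
    static RequestStats register(String objectName) {
        try {
            RequestStats stats = new RequestStats();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(stats, new ObjectName(objectName));
            return stats;
        } catch (Exception e) {
            throw new IllegalStateException("could not register " + objectName, e);
        }
    }

    // Reads an attribute back through JMX, as a monitoring client would.
    static Object read(String objectName, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer()
                    .getAttribute(new ObjectName(objectName), attribute);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = "example.stats:type=RequestStats,name=demo";
        RequestStats stats = register(name);
        stats.record(2_000_000L);
        stats.record(4_000_000L);
        System.out.println(read(name, "RequestCount"));
        System.out.println(read(name, "AverageMillis"));
    }
}
```

Once registered, the attributes are visible in any JMX console (e.g. jconsole) attached to the JVM.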

Cell tunnels are independent of the transport, so we can change from TCP to JMS by swapping the tunnel: the new tunnel will use JMS instead of TCP. Each domain would have exactly one tunnel connecting to the JMS "cloud".

Next release (1.9.10) will use JMS tunnels only.

After 1.9.10 is released, make cells aware of JMS features (e.g., pub/sub).
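The "one tunnel per domain, all connected to a central cloud" topology can be sketched as follows. This is only an in-memory model: Broker stands in for the JMS "cloud", and Tunnel, Broker and Domain are hypothetical names, not dCache classes and not the JMS API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; none of these are dCache classes.
class TunnelDemo {

    // What a cell sees: a way to send to a named domain, whatever the transport.
    interface Tunnel {
        void send(String toDomain, String message);
    }

    // In-memory stand-in for the JMS "cloud": routes by destination domain.
    static class Broker {
        private final Map<String, Domain> domains = new HashMap<>();

        void attach(Domain domain) {
            domains.put(domain.name, domain);
        }

        void route(String toDomain, String message) {
            Domain target = domains.get(toDomain);
            if (target == null) {
                throw new IllegalArgumentException("unknown domain: " + toDomain);
            }
            target.inbox.add(message);
        }
    }

    // A domain owns exactly one tunnel, and that tunnel points at the broker.
    static class Domain {
        final String name;
        final List<String> inbox = new ArrayList<>();
        final Tunnel tunnel;

        Domain(String name, Broker broker) {
            this.name = name;
            this.tunnel = broker::route; // the single connection to the "cloud"
            broker.attach(this);
        }
    }

    public static void main(String[] args) {
        Broker cloud = new Broker();
        Domain door = new Domain("doorDomain", cloud);
        Domain pool = new Domain("poolDomain", cloud);
        door.tunnel.send("poolDomain", "hello pool");
        System.out.println(pool.inbox);
    }
}
```

Because cells only depend on the Tunnel interface, a TCP-based implementation could be replaced without touching the sending code.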

Plans for patch-releases

Should we make a new patch release?

Patch from Timur ... the problem that Gerd discovered affects only dCache instances where asynchronous SRM ls is configured. We don't know if any sites are using this; but it's unlikely as lcg-utils doesn't support async. SRM ls.

Is it only related to SRM ls? Yes, only the execution of SRM ls. Tigran says "OK".

Should we stage it through the branches? The patch is in the release 1.9.9-1.

Let's keep it on hold until someone complains. The fix is in 1.9.9 and trunk so all future releases will have the patch.

Trunk activity

Progress with new features...

gPlazma

Implemented changes related to strategies. The patch avoids hard-coding behaviours ..

After this, will implement a LoginStrategy that will use the GPlazma class to do the login process.

Hope that Jan will be able to test it with his web-admin.

Timur to ping Jan when the new GPlazma2LoginStrategy (sp?) is ready for testing.

Who will implement plugins

Tigran has a NIS plugin that he'll tidy up and put on RB.

Timur to write gPlazma plugin to wrap old-gPlazma; then need to work on CDF upgrade.

We will ship with old gPlazma as the default for next couple of releases.

We could use the wrapper plugin to allow us to ship with the old gPlazma as default.
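The wrapper idea is essentially an adapter: the old service is exposed through the new plugin contract. A minimal sketch, assuming a made-up plugin interface (LoginPlugin) and a made-up legacy entry point (LegacyGPlazma); the real gPlazma interfaces are not described in these minutes and will differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical interfaces; the real gPlazma plugin API is not shown here.
class WrapperPluginDemo {

    // Stand-in for the new plugin contract.
    interface LoginPlugin {
        // Returns the mapped user name, or empty if the login is rejected.
        Optional<String> login(String principal);
    }

    // Stand-in for the old gPlazma entry point, with a different call shape.
    static class LegacyGPlazma {
        private final Map<String, String> mappings = new HashMap<>();

        LegacyGPlazma map(String dn, String user) {
            mappings.put(dn, user);
            return this;
        }

        // Old style: returns null when authorisation fails.
        String authorize(String dn) {
            return mappings.get(dn);
        }
    }

    // Adapter: exposes the old service through the new plugin contract.
    static class LegacyWrapperPlugin implements LoginPlugin {
        private final LegacyGPlazma legacy;

        LegacyWrapperPlugin(LegacyGPlazma legacy) {
            this.legacy = legacy;
        }

        @Override
        public Optional<String> login(String principal) {
            return Optional.ofNullable(legacy.authorize(principal));
        }
    }

    public static void main(String[] args) {
        LegacyGPlazma old = new LegacyGPlazma().map("/DC=org/CN=alice", "alice");
        LoginPlugin plugin = new LegacyWrapperPlugin(old);
        System.out.println(plugin.login("/DC=org/CN=alice"));
        System.out.println(plugin.login("/DC=org/CN=bob"));
    }
}
```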

xrootd stuff

Tigran: What is the primary goal of the new mover?

xrootd mover opened a new port.

Goal is to share the same port.

Tanja and Tigran working on changing how ports are handled in movers.

Need to find the right mover.

Opaque information sent to client by door and relayed to the pool. Same information sent to pool by door as part of the ProtocolInfo.

We rely on the (xrootd) client passing the information.

Thomas is not working on the pool code that creates a new mover.

Tigran: with the new pool code, the pool decides whether an I/O request is allowed (or not). Depending on mover-type, there might be a mover per thread or a single component handling all requests.

Another goal is that a mover doesn't consume a thread.

Tigran: this is similar to what Tanja is working on.

Thomas: focus on the single-port part of the activity; later on, work on allowing the xrootd mover to use a single thread or a threadpool.

Issues from yesterday's Tier-1 meeting

Did Simon ever recover the lost SRM space manager entries?

CDF dcap clients

They have a client that is ~1 year old. Is there any reason to upgrade their version?

We released (about a year ago) a client that fixed an issue with the read-ahead. Other than that, nothing changed in the dcap functionality.

Tigran could have a look for the RT ticket about this.

If the CDF client is older than a year then they may want to upgrade.

There may be an issue with the OpenSSL library: people can have a custom SSL library and link it against libdcap. This only concerns the gsidcap dcap plugin/tunnel.

Outstanding RT Tickets

[This is an auto-generated item. Don't add items here directly]

RT 5696: gPlazma issue with SCAS, dcache version 1.9.5-17

Timur to look into it. He'll ask Ted.

RT 5699: How to test for sanity of dcache databases

Dmitry: I sent an email in Dec. 2009 describing the enstore "xml2ddl" tool. This allows one to extract/reverse-engineer the database schema from a live database. This can be diff-ed against the result from a "golden" master database to discover any inconsistencies.

It doesn't support checking functions/triggers/..; in enstore, these are always inserted afresh on an upgrade. This means one cannot run the tool on a live system.
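The "diff against a golden master" step can be sketched as a comparison of two schema descriptions. This is not the xml2ddl tool: it assumes, for illustration, that each schema has already been extracted into a map of table name to column names, and the table names used are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only; schemas are modelled as table name -> column names.
class SchemaDiffDemo {

    static Set<String> cols(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // Lists every difference between the golden schema and the live one,
    // in a stable (sorted) order.
    static List<String> diff(Map<String, Set<String>> golden,
                             Map<String, Set<String>> live) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(golden).entrySet()) {
            String table = e.getKey();
            Set<String> liveCols = live.get(table);
            if (liveCols == null) {
                problems.add("missing table: " + table);
                continue;
            }
            for (String col : new TreeSet<>(e.getValue())) {
                if (!liveCols.contains(col)) {
                    problems.add("missing column: " + table + "." + col);
                }
            }
            for (String col : new TreeSet<>(liveCols)) {
                if (!e.getValue().contains(col)) {
                    problems.add("unexpected column: " + table + "." + col);
                }
            }
        }
        for (String table : new TreeSet<>(live.keySet())) {
            if (!golden.containsKey(table)) {
                problems.add("unexpected table: " + table);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> golden = new HashMap<>();
        golden.put("files", cols("id", "size"));
        golden.put("dirs", cols("id"));
        Map<String, Set<String>> live = new HashMap<>();
        live.put("files", cols("id"));
        live.put("tmp", cols("x"));
        diff(golden, live).forEach(System.out::println);
    }
}
```

As noted above, a comparison like this covers only tables and columns; functions and triggers would need separate handling.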

Is "sanity" just the schema, or is it also data consistency?

Timur: Neha is the OSG contact person for dCache support. When someone says "my dCache doesn't work", she doesn't know how to diagnose the problem.

Owen: just checking whether the required databases exist would be a good start.

We could implement a pre-flight check: dCache would refuse to start if databases are in an "insane" state.

This would require parsing the configuration file, to support remote databases.
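A pre-flight check of this kind might look roughly like the sketch below: a list of named checks is run at start-up and the service refuses to start if any fails. The class and method names are hypothetical, and the checks are plain lambdas here; in practice each one would presumably connect to the database named in the parsed configuration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; not an existing dCache component.
class PreflightDemo {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    // Registers a named check; a check passes when it returns true.
    PreflightDemo require(String description, BooleanSupplier check) {
        checks.put(description, check);
        return this;
    }

    // Runs every check and throws if any failed, naming all failures at once
    // so the admin sees the full picture rather than one error per restart.
    void verifyOrRefuseToStart() {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> e : checks.entrySet()) {
            boolean ok;
            try {
                ok = e.getValue().getAsBoolean();
            } catch (RuntimeException ex) {
                ok = false; // a check that blows up counts as a failure
            }
            if (!ok) {
                failed.add(e.getKey());
            }
        }
        if (!failed.isEmpty()) {
            throw new IllegalStateException("refusing to start: " + failed);
        }
    }

    public static void main(String[] args) {
        new PreflightDemo()
                .require("database 'exampledb' exists", () -> true)
                .verifyOrRefuseToStart();
        System.out.println("pre-flight checks passed");
    }
}
```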

Components should get a "data source" ...

Maybe Neha has some specific requirements or ideas about what kind of checks would be appropriate, as a prioritised wish-list.

You see tons of SQL exceptions, which trains our users to ignore exceptions. Perhaps these checks could reduce the number of exceptions?

Toplink still tries to create the tables, resulting in error messages in the log files.

RT 5704: Deadlock (sort of) during SRM restart

Agreed not to merge, unless people complain.

RT 5711: filling database from billing log files

Timur: would be nice to have, but no one has the time to implement it.

We will talk to people and see what they're doing and what we need to do.

Review of RB requests

DTNM

Proposed: same time, next week.