wiki:developers-meeting-20101208
Last modified on 12/08/10 18:13:10

[part of a series of meetings]

Participants

Karsten, Antje, Paul, Tigran, Christian, Tanja, Gerd, Dmitry, Gene, Thomas.

Agenda

[see box on the right-hand side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

No questions until we get through everyone :)

Tanja: completed patch for NFS data-server. If the door is restarted and the client doesn't come back, this is how we clean up the movers (still needs verification). Does the timeout manager still work for NFS? Yes (probably). Tickets; WebDAV with different clients (problems with clients?).

Christian: fixed password issue in ssh server; sshv2 in RB. EMI 100% build in ETICS. Prepared some EMI web-pages. List how well we follow the EMI "guidelines". A lot of discussion at the moment. Plus other EMI stuff.

Tigran: Chimera (finally) merged into dCache code. Debugging doors to find out why they keep dying. Originally after 4 days; now dying a couple of times per day. Started moving xrootd mover to threadless mover (exercise to verify the framework is OK). Patch so pool sends record so we don't need to send one from the door.

Fermi: can we package Chimera as an RPM without dCache?

Two options: we can repackage so we have Chimera only, or use Tigran's playground.

Paul: EMI work.

Antje: Documentation.

Karsten: working on the VO-role mapping for gPlazma2. Looking good so far. Some work towards EMI all-hands meeting. Probably have to postpone hosting this meeting.

Thomas: work on some smaller dCache stuff. File some issues with reading and writing in the xrootd pool. Fix gsi-auth patch in RB. Try to do some reviews myself.

Dmitry: trying to ignore tickets. Working on ENStore-dCache integration. New patch on ReviewBoard to update info-extractor. This is against 1.9.10. Would like to backport this to 1.9.5 as there is a switch (default off) to enable the behaviour.

Gene: we have an opening for a Timur replacement.

Gerd: we reorganised priorities .. user-mapping is delayed as now 50% on non-WLCG activities. Working on pin-manager (due to issues with ATLAS transfers). Working on a major restructuring of the components. Last week was NDGF all-hands meeting.

Dying GridFTP doors

Far too many threads in the door. It looks as if there is too much thread-local memory, but this isn't clear.

The Java NIO library throws OutOfMemoryError if it runs out of DirectBuffer memory. This doesn't trigger the auto-stacktrace (cmd-line option). Heap is 8GB.

Looked at the JDK code in java.nio.Bits. If there's not enough memory then it calls System.gc(). This behaviour is present in at least 1.6.0_16.

If this fails then OutOfMemoryError is thrown.
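Since direct buffers are allocated outside the Java heap, heap statistics alone won't show how close the door is to the limit. The following is a minimal sketch of how direct-buffer usage could be watched from inside the JVM. It assumes a Java 7 or later runtime (BufferPoolMXBean is not available in 1.6), and DirectMemoryProbe is a hypothetical helper name, not part of dCache.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper (not dCache code): reports usage of the JVM's
// "direct" buffer pool through the standard BufferPoolMXBean.
final class DirectMemoryProbe {

    // Total capacity in bytes of all live direct buffers, or -1 if the
    // JVM exposes no buffer pool named "direct".
    static long directCapacity() {
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getTotalCapacity();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directCapacity();
        // Direct buffers live outside the Java heap, so this allocation
        // counts against the direct-memory limit, not against -Xmx.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
        long after = directCapacity();
        System.out.println("direct capacity grew by " + (after - before)
                + " bytes for a buffer of " + buffer.capacity() + " bytes");
    }
}
```

Logging this value periodically in a door would show whether direct memory climbs steadily before the OutOfMemoryError. The limit itself can be set with the -XX:MaxDirectMemorySize JVM option.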

It's always ModeEProxy that triggers the problem; but that could just be because that mode is more popular.

Still trying to find out whether the JVM is buggy or whether something has changed.

400 threads sounds like a bit too many. This could be with 20 streams; but it only uses a thread per door (it uses a selector), so this isn't likely to be the cause.

This could be because of increased FTP load (CMS are pushing lots of data to DESY).

db schema synchronization in stable branches

If we change schema in 1.9.11 .. 12 .. 13 that we ..

Why do this? The upgrade to liquibase has to happen at some time. Why do this in the stable branch?

Look at Gerd's patch for pin-manager. This isn't so hard.

Tigran: we should exercise this. What happens if you have an existing database and start liquibase? Liquibase can have preconditions to check whether a change-set has already been applied (i.e., it's an existing database).

Changes outside liquibase can be a problem.

Suppose we have a 1:1 correspondence between class and table and we want to add a new column. In a new version that

Can we validate the database schema? Not really .. partly because files are executable.

For Chimera we can break the changes into small bite-size chunks and put guards to check if the change-set has already been applied.

Utility commands that know about the different database schemas and can report if the schema is up-to-date. We can then use this to validate database schema, list schema versions, update them, etc.
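As a rough illustration of the "report if the schema is up-to-date" idea, here is a minimal in-memory sketch. It is not Liquibase's API and not dCache code: SchemaStatus, its methods and the change-set ids are all hypothetical, and a real utility would read the applied ids from the database rather than take them as a parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: compares the change-sets a release knows about
// with the ones already recorded as applied in a database.
final class SchemaStatus {

    // Change-sets that are known to this release but not yet applied,
    // in the order in which they would have to be run.
    static List<String> pending(List<String> known, Set<String> applied) {
        List<String> result = new ArrayList<String>();
        for (String id : known) {
            if (!applied.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    // The schema is up-to-date when nothing is pending.
    static boolean upToDate(List<String> known, Set<String> applied) {
        return pending(known, applied).isEmpty();
    }

    public static void main(String[] args) {
        // Illustrative ids only; these are not real dCache change-sets.
        List<String> known = Arrays.asList(
                "pinmanager-1", "pinmanager-2", "pinmanager-3");
        Set<String> applied = new HashSet<String>(
                Arrays.asList("pinmanager-1"));
        System.out.println("up to date: " + upToDate(known, applied));
        System.out.println("pending: " + pending(known, applied));
    }
}
```

A report-only command of this kind never modifies the database, so a site-admin could run it before deciding whether to let an update go ahead.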

Should be safe to run, provided the database is transactional.

But we would like to protect against a site-admin installing the wrong version and the database schema being updated automatically.

Schemas in special place for pin-manager, or all schemas together?

Would be nice to have component-specific XML files; but having some different hierarchy for these.

Liquibase's best-practice is to have a change-set file for each release of dCache.

Plans for patch-releases

Should we make a new patch release?

There is a lot of stuff for 1.9.5, so we should release something, either at the end of this week or the beginning of next week.

Backport of enstore: ability to change permissions on the tags is reserved so only root can do this. It is fixed in Chimera, but the fix is not available in 1.9.5. This looks like it isn't a problem. This feature may be used by user-base who are using enstore directly, but isn't needed for

Trunk activity

Progress with new features...

fall-out from merging Chimera

The property-files change means the command-line tools for Chimera don't work. They are expecting to see an XML configuration file, which doesn't exist.

Common start to scripts?

Paul to look into this; inject code as part of build process.

Chimera config

Patch into

Fermi migration

Jon is planning to move to Chimera in Fall 2011; and have all the nice improvements to SRM (making it more scalable) in a separate 1.9.5-like branch.

Isn't PNFS going to be the bottleneck?

Additional benefits include resilience against single-point-of-failure.

But, we need to check this. It could be that, with PNFS backend, there's no benefit from multiple PNFS instances. The previous testing didn't test this specific aspect.

The results may cause Jon to adjust his plans.

May not be so easy to back-port SRM changes to 1.9.5

billing plots

Last week we deployed Tao's billing plot software.

The first problem is that there's a link to "billing plots". The second is that the release notes say "new feature: billing plots".

Given current situation at Fermi, perhaps we should remove the link.

Product is rather "raw", so we shouldn't advertise it.

Problems include:

  • two Decembers in 2009
  • data misaligned against time-base
  • y-scale was unlikely.

For the accounting system used in the Nordics, Protovis is used to generate graphs.

GWT: Javascript is autogenerated.

Graphical visualisation

It makes the RPM 15 MB bigger with no functionality, because of the dependencies (Batik, etc.). Removing it also helps remove our dependency on the Sun JDK.

Tigran to submit patch to remove it.

Dmitry

Sync enstore work for 1.9.10 back to 1.9.5.

Web stuff

Web-admin: who's maintaining it? Karsten.

Tigran: within

Why are we here? Is wicket the right choice of framework?

Grails runs inside the JVM and is the equivalent of Ruby on Rails (i.e., Groovy on Rails). It was bought by SpringSource, so it's now a Spring product.

One page seems a lot of code in wicket.

The time spent on the admin functionality and the result don't match.

What should we do?

Heavy weight comes from doing so much inside Java, not from a separation of concerns.

Perhaps we should step back and re-evaluate wicket against other solutions (e.g., grails).

If a site has requests for information and this takes half a year then something's wrong.

Some things in the admin interface are awkward (for finding out some information). Fermi are mostly using Jython scripts.

Different audiences: admin interface is good for some people; web-interface is useful for the more casual sys-admin/users. Don't see web as a replacement for admin-interface.

SRM watch

Would it be worth investing effort in fixing this? Yes.

This happened after the security problems. Dmitry to have a look at fixing this.

Issues from yesterday's Tier-1 meeting

Issues from EMI

Outstanding RT Tickets

[This is an auto-generated item. Don't add items here directly]

Caused by IN2P3 having half their FTP doors replying on the GEANT network, even when the incoming connection was on the OPN. FZK filters incoming packets on the GEANT interface.

Likely due to the asymmetry at IN2P3: incoming connections on GEANT could be OK.

Review of RB requests

DTNM

Same time, next week.