wiki:developers-meeting-20111012

[part of a series of meetings]

Participants

Tanja, Karsten, Antje, Paul; Dmitry, Al

Agenda

[see box on the right-hand side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

No questions until we get through everyone :)

Tigran: Moving NFS code to Grizzly 2; installed at DESY. ~60% better performance than the old code under heavy load (HammerCloud). Beyond 32 clients/jobs there's a 60% performance boost with new license. Suspicion is that the new Grizzly code sets the number of threads to the number of cores of the machine.
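To make the suspicion about thread counts concrete, here is a minimal sketch of a worker pool sized to the number of cores. It is not Grizzly's code, and the class name CoreSizedPool is made up for illustration; it only shows the standard JDK call that reports the core count.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: not taken from Grizzly or dCache.
class CoreSizedPool {
    // Number of workers if the pool is sized to the machine's cores.
    static int workerCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) throws InterruptedException {
        int workers = workerCount();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        // Submit more jobs than workers; the surplus waits in the queue.
        for (int i = 0; i < workers * 4; i++) {
            final int job = i;
            pool.submit(() -> System.out.println(
                    "job " + job + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

With such a pool, the number of requests served in parallel is capped by the core count, whatever the number of clients.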

Christian: SSH-2 admin code ~80% done. Nagios probes are now running + others.

Karsten: Posted patches remotely related to IPv6 work. Now fixing comments.

Antje: Documentation.

Paul: enstore

Patrick: preparing Padova meeting.

Dmitry: continued to test the Chimera backend; trying to generate load. Only unknown: schema tool. Tested pnfsDump level-4 support: it works. Continued end-to-end Enstore+dCache+Chimera testing; going well under high load. Making the case to upgrade public dCache to Chimera. Al has some questions about the monitoring table for plots.

Al: packaged up the implementation of the billing database and submitted it for review. Now trying to understand if we can use ??? to aggregate data using cascading SQL. Some of the code doesn't seem to be generating all the required tables (need input from Vladimir).

Tanja: tickets for people with tickets; talk with Dmitry about feature requests. Talk with Tigran .. implementing a patch.

Gerd: just came back from vacation (Sat. morning). Spent first two days on Geante workshop; upgrading jline v1.0. Lots of new features.

Plans for patch-releases

Should we make a new patch release?

Trunk activity

Progress with new features...

Schema tool, but not Liquibase. Start up dCache with DataNucleus schema creation switched on: one thread updates the schema and another thread uses it. We see two threads trying to get an exclusive lock. The "Jobs" table is where all requests go. This table has ~40 columns. There is no need to search all tables to find a request by ID.

If an installation doesn't have a billing database then one must be created.

Tanja + Tigran had a discussion about the admin interface. Use case: running cacheinfoof to get the list of pools with the file; then

Created a single command in admin cell. repinfoof command gets the list of pools

It makes sense to move commands into the general / overall shell. Moving admin commands into the

Better to have these in one place.

Think about the interaction patterns: that's abstracting over the smaller parts that we have in dCache.

Get away from the Cell-centric and move towards high-level view.

The problem with current implementation is hopping through cells to get

Maybe going through all the existing cmds and hand-picking, think about what high-level

Another idea is to navigate the namespace after logging into the admin cell.

cd into a namespace and operate on files.

These would be high-level commands such as:

Transfers, Files (namespace), Space.

Like /proc on a Linux filesystem.

Single package FHS? We would change the name of package to be "dcache" rather than "dcache-server".

Could we just add "fhs" to the name? Yup, that should work for people who install manually.

Christian to find out whether we can change the name.

We can release just the Debian DEB that's FHS-compliant.

We would need to make the next Golden release in both /opt and FHS layouts.

Paul: we don't need to rename the RPM to protect from upgrading to the wrong layout.

Gerd: still prefer renaming the RPM.

What to do with the tarball? Suggest packaging it as a single directory named dCache-nnn, which is position independent.

Sounds good.

Debian

Debian version on our webpage?

Yes, probably.

Karsten to look into this; Paul to help.

Issues from yesterday's Tier-1 meeting

When reloading the metadata: strings are not interned. Except this doesn't happen when starting a pool. Pool runs fine, but when restarting it uses the memory.
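As a reminder of what interning buys, here is a minimal sketch. It is not dCache's metadata code; the class name InternDemo and the sample value "custodial" are chosen purely for illustration. Strings built from stored data are distinct objects even when their contents are equal, and String.intern() collapses them to one shared instance.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;

// Hypothetical illustration only: not taken from the dCache pool code.
class InternDemo {
    // Simulate reading the same value from storage many times: every read
    // produces a fresh String object with identical contents.
    static List<String> load(int entries, boolean intern) {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < entries; i++) {
            String fresh = new String("custodial".toCharArray());
            values.add(intern ? fresh.intern() : fresh);
        }
        return values;
    }

    // Count distinct String objects by identity, not by equals().
    static int distinctObjects(List<String> values) {
        IdentityHashMap<String, Boolean> seen = new IdentityHashMap<>();
        for (String v : values) {
            seen.put(v, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("without intern: " + distinctObjects(load(1000, false)));
        System.out.println("with intern:    " + distinctObjects(load(1000, true)));
    }
}
```

With many entries repeating a handful of values, the non-interned variant keeps one copy per entry on the heap.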

We're doing cleaning of metadata during startup. Lists allocated and diffs

Due to backwards compatibility, files are created custodial+nearline. Files are marked precious. If you suppress

Should check ls queue

Don't have any errors? No: not HSM-backed pool.

Should get back in touch with Nicolas to check defaults.

LFS-mode then it doesn't matter.

The point is consistency between different systems.

Is LFN set to something

Are there files in the flush queue?

Out of memory exception --> heap dump. Can we get this?

Run jmap to see how much memory is in use; use jconsole to trigger a GC and jstat to watch the GC runs, and see what happens.
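For reference, heap usage and collector run counts can also be read from inside a JVM through the standard java.lang.management beans. The sketch below is only an in-process analogue of the external tools named above, not a replacement for them; the class name HeapSnapshot is made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical illustration only: not part of dCache.
class HeapSnapshot {
    // Bytes of heap currently in use, as reported by the memory bean.
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Total collections over all collectors since JVM start.
    static long totalGcRuns() {
        long runs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when undefined
            if (count > 0) {
                runs += count;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        long usedBefore = usedHeapBytes();
        long runsBefore = totalGcRuns();
        System.gc(); // a request only; the JVM may ignore it
        System.out.println("used heap before GC request: " + usedBefore + " bytes");
        System.out.println("used heap after GC request:  " + usedHeapBytes() + " bytes");
        System.out.println("collector runs in between:   " + (totalGcRuns() - runsBefore));
    }
}
```

If used heap stays high after a collection, the memory is still referenced rather than garbage waiting to be collected.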

Gerd to repeat an exercise

Could be stale metadata that is cleaned up.

The uid+gid is added to hash-map. The example from IN2P3 shows this info is simply missing.

Are they using a custom info-extractor? No, they dropped it.

storageinfoof in pnfs-manager.

How are they using the hopping manager?

PIC

Run with JMX

Enable sampling: snapshot every n milliseconds to see which thread is burning CPU.
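The snapshot-and-compare idea can be sketched with the standard ThreadMXBean: take per-thread CPU times twice, n milliseconds apart, and report the thread whose CPU time grew most. This is a minimal sketch, not the tooling discussed in the meeting; the class name CpuSampler and the 500 ms interval are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not part of dCache.
class CpuSampler {
    // CPU time in nanoseconds for each live thread id.
    static Map<Long, Long> snapshot(ThreadMXBean threads) {
        Map<Long, Long> cpu = new HashMap<>();
        for (long id : threads.getAllThreadIds()) {
            long nanos = threads.getThreadCpuTime(id); // -1 if dead or disabled
            if (nanos >= 0) {
                cpu.put(id, nanos);
            }
        }
        return cpu;
    }

    // Id of the thread whose CPU time grew most between two snapshots,
    // or -1 if no thread used any CPU in between.
    static long busiest(Map<Long, Long> before, Map<Long, Long> after) {
        long bestId = -1;
        long bestDelta = 0;
        for (Map.Entry<Long, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (delta > bestDelta) {
                bestDelta = delta;
                bestId = e.getKey();
            }
        }
        return bestId;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported()) {
            System.out.println("thread CPU time not supported on this JVM");
            return;
        }
        long intervalMillis = 500; // the "n" from the notes; value is an assumption
        Map<Long, Long> before = snapshot(threads);
        Thread.sleep(intervalMillis);
        Map<Long, Long> after = snapshot(threads);
        long id = busiest(before, after);
        ThreadInfo info = id < 0 ? null : threads.getThreadInfo(id);
        System.out.println(info == null
                ? "no thread used CPU"
                : "busiest thread: " + info.getThreadName());
    }
}
```

Run against a remote JVM, the same bean is reachable over JMX, which matches the "Run with JMX" suggestion.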

[aside: Problem with CLOSE_WAIT on pools due to port-scanner]

Triumf

Pools available for read if inventory isn't ready; however, the disk IO is so bad that you can't start a single mover.

To exclude this, ask Simon what the webpage says about the pool.

Checksum

Request for checksum from end-users. Look at adding a new dot file.

Issues from EMI

Version number

The 2.2.x-series will be the next Golden release for EMI-2.

Outstanding RT Tickets

 RT 6449: identifying "used else" files for linkgroups

Dmitry already replied on this ticket.

 RT 6561: dCache 1.9.12-8: WebDav door with constant CPU activity despite doing nothing

Paul to chat with Gerd about this.

Tigran to try to reproduce the effect.

 RT 6688: feature request: collection of files

Postponed for a chat with Gerd.

 RT 6700: FHS and deb packages

Already discussed.

Review of RB requests

DTNM

Proposed: same time, next week.