wiki:developers-meeting-20110209
Last modified on 02/09/11 18:26:31

[part of a series of meetings]

Participants

Christian, Tanja, Antje, Paul, Tigran, Karsten, Patrick/2, Dmitry

Agenda

[see box on the right-hand side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

No questions until we get through everyone :)

Paul: Lots of things.

Tigran: merging patches for a release, helping with ETICS, and helping Karsten. Fixed Chimera CLI.

Karsten: cleaning up gPlazma stuff. References with authentication classes. Preparing for the DESY WLCG meeting.

Patrick: mostly organising Goettingen and WLCG workshops.

Dmitry: Testing scalable SRM, and switched to the new pin-manager. It does fail in some cases: it waits for a connection from the connection-pool and, if this times out, there's a huge stack-trace. Got a steady rate of 150 Hz on a quad-SRM instance with Chimera. Some work on tidying up Catalin's

Preparing for interview: the new Timur. Putting together a list of questions for him.

Operational issues with CDF dCache. Updated to 1.9.5 in Oct 2010. Over the last month, kerberised dcap doors have been running out of memory about once per week. Have 256 MB allocated for kerberised dcap doors, so maybe this is natural, although it's suspicious.

Christian: doing test-bed stuff: trying to figure out how to organise things. Started writing the test and validation plan for EMI, using their template. Helping with the Xen machine migration.

Tanja: tickets, NFS stuff, discussion with Tigran.

Antje: working on the book: first set of comments using the new comment system, which have now been fixed. Testing Chimera CLI.

Plans for patch-releases

Should we make a new patch release?

Planning to make a release, but the Xen infrastructure has not been stable enough.

Promised to move from iSCSI to FibreChannel within a couple of weeks.

We can teach Hudson to stop and start machines as needed, but the problem is that we need a period of stability to make the tests meaningful.

We can check availability using a simple ping and monitor this with Nagios (but not the night-alarm). Operators can be provided with a set of procedures to recover from a reboot.
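A minimal sketch of how such a ping check might be wrapped as a Nagios-style plugin. The function names and the host name are hypothetical, and the `ping -c 1 -W 2` invocation assumes a Linux ping; the exit codes (0 for OK, 2 for CRITICAL) follow the usual Nagios plugin convention.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def is_reachable(host, timeout_s=2):
    """Send one ICMP echo request; assumes a Linux 'ping' (-c count, -W timeout)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The ping binary is missing or could not be started.
        return False
    return result.returncode == 0


def nagios_result(host, reachable):
    """Map reachability to an (exit code, status line) pair for Nagios."""
    if reachable:
        return OK, f"PING OK - {host} is reachable"
    return CRITICAL, f"PING CRITICAL - {host} is not reachable"
```

A plugin script would call `nagios_result(host, is_reachable(host))`, print the status line and exit with the returned code.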

No releases until we have stable machines with which to test.

Tigran: I've tested 1.9.11 for a few days, but it isn't clear whether we have resources to do the release without the build machine(s) dying. To release, one needs to run three builds (Linux-RPM, Solaris-pkg, Linux-tgz) at the same time. If one platform crashes then that build is useless.

Tests are S2 and G2 (wrapped).

Trunk activity

Progress with new features...

Tigran: trying to make ACLs work with NFS. Allowing people to set ACLs through NFS-mounted filesystem. Not sure how much this will depend on gPlazma .. will definitely hit it at some point.

Issues from yesterday's Tier-1 meeting

FZK

After changing the hardware, they didn't migrate the database-schema but the indexing wasn't there.

After startup, the space-manager didn't bring up the database as Karlsruhe expected. So there may be a missing index (or foreign-key) in a table.

We should ask Gerd what he did to fix the problem.

If you have a database and you want to move it, you do a pg_dump/restore. You don't start SRM and migrate the data: this is inefficient.

There are a variety of tables like *Request. Those are transient and the data isn't so critical.

A site can put different services in different db, but if not then the simplest thing is to dump/restore the whole db.
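A minimal sketch of the dump/restore route, assuming PostgreSQL's `pg_dump` and `pg_restore` command-line tools. The helper names, the database name and the file name are hypothetical; the functions only build the command lines and do not run anything.

```python
def dump_command(dbname, dump_file):
    """Build a pg_dump call: custom archive format (-Fc), written to dump_file (-f)."""
    return ["pg_dump", "-Fc", "-f", dump_file, dbname]


def restore_command(dbname, dump_file):
    """Build a pg_restore call that loads dump_file into an existing database (-d)."""
    return ["pg_restore", "-d", dbname, dump_file]


# Hypothetical names, for illustration only.
print(" ".join(dump_command("dcache", "dcache.dump")))
print(" ".join(restore_command("dcache", "dcache.dump")))
```

The first command would be run on the old host and the second on the new one, for example via `subprocess.run(cmd, check=True)`, before any dCache service is started against the new database.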

Ask Gerd to open a ticket to describe what he did to fix FZK: (with a nice specific title :-)

How to deal with pools that are gone?

If someone steals a whole pool, how does a sys-admin decommission the pool?

Also, if this happens, we need to get a list of files that are gone. This is for the EMI tool that registers that these files have been lost in the experiment catalogues.
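A minimal sketch of how such a list could be derived, assuming the file locations are available as a mapping from file ID to the set of pools holding a replica. The function name, file IDs and pool names are hypothetical; how the mapping is obtained from the namespace is not covered here.

```python
def files_lost_with_pool(locations, lost_pool):
    """Return the sorted IDs of files whose only replica was on lost_pool.

    locations maps a file ID to the set of pool names holding a replica.
    """
    return sorted(
        file_id
        for file_id, pools in locations.items()
        if pools == {lost_pool}
    )


locations = {
    "file-A": {"pool1"},           # only copy is on pool1
    "file-B": {"pool1", "pool2"},  # a second replica survives on pool2
    "file-C": {"pool2"},           # not on pool1 at all
}
print(files_lost_with_pool(locations, "pool1"))  # ['file-A']
```

Files with a surviving replica elsewhere are deliberately excluded, since only truly lost files need to be reported to the experiment catalogues.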

Problem with RT

If an RT ticket is in the "bug" or "request" queue then no emails are sent.

Tanja has gone through all affected tickets and sent emails explaining the problem.

Do we want this for all queues? Yes: commenting (so that the ticket submitter is contacted) should result in an email being sent.

"feature" and "back-ports" are still affected by the problem, but we just need to tell Bjoern that we want these queues fixed, too.

Hero's problem

RT ticket #6107.

Dmitry to chase this up. Suggestions are: upgrade the CRL, and try with the gridftp commands (as this may give more detailed/useful error messages).

Issues from EMI

the GGUS ticket

  • GGUS tickets will be evaluated and assigned by the 2nd level SU Deployed Middleware Support Unit (DMSU), so we will not be able to directly assign tickets to other Support Units (SU) --> ticket GGUS #63769

They are asking us to close this ticket when we understand this.

What do we do if the ticket is misallocated to us? (e.g., if a ticket is really a DPM problem).

Christian to investigate.

Tanja noticed that this ticket didn't trigger a new ticket in our RT system. This could be a problem.

Tanja to contact Sven to investigate this.

Who to contact about questions with the GGUS? Try EMI's EMT.

RT ticket system

Reminder: DO NOT use the states 'closed', 'not-cert.', 'certified'.

When finished, a developer puts the ticket into 'resolved': this is still the last state from the developers' point of view.

See this page for details.

Remember to put RT ticket links in the SVN commit !!

ETICS

  • Still problems building dCache server in ETICS (maven related)

Test machines

  • SL 5, 6, Debian --> EMI early adaptation

Need four machines.

Moving everything into "/usr". Gerd is looking into this, reviewing what Thomas has done for Debian.

EPEL --> "Extra Packages for Enterprise Linux". They have a rule that you cannot provide newer versions of what EL already provides: no upstream packages.

Managing databases

Dmitry has sent an email with his view on the subject. The tool that manages the database schema should take a DDL definition and generate the diffs.
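A toy sketch of what "take a definition and generate the diffs" means, assuming each schema has already been reduced to a mapping from table name to a set of column names. The function, table and column names are hypothetical; a real tool would parse DDL and also handle types, indexes and constraints.

```python
def schema_diff(current, desired):
    """Compare two schemas given as {table: set of column names}.

    Returns a list of human-readable change descriptions.
    """
    changes = []
    for table in sorted(desired.keys() - current.keys()):
        changes.append(f"create table {table}")
    for table in sorted(current.keys() - desired.keys()):
        changes.append(f"drop table {table}")
    for table in sorted(current.keys() & desired.keys()):
        for column in sorted(desired[table] - current[table]):
            changes.append(f"add column {table}.{column}")
        for column in sorted(current[table] - desired[table]):
            changes.append(f"drop column {table}.{column}")
    return changes


current = {"pins": {"id", "pnfsid"}}
desired = {"pins": {"id", "pnfsid", "expires"}, "owners": {"id"}}
print(schema_diff(current, desired))
# ['create table owners', 'add column pins.expires']
```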

Tigran: I'm not pushing liquibase: I used it and

Liquibase can do reverse-engineering on existing databases. There are plugins for Eclipse and IntelliJ that allow you to create these XML files. I like that it works. It's in the NFS. Every time starting Chimera under test conditions it successfully created the db.

Gerd wants a unified way of managing databases in dCache.

Dmitry says he's happy with liquibase tools.

Tigran: the nice thing is you're not bound to liquibase. Your code doesn't need to know about it.

We can enable auto-create data-nucleus then

We can create a liquibase changelog that can recover from a table being deleted, by making the changesets fine-grained enough.

Tigran: don't want us to bind to a particular tool inside the code. Dmitry & Paul agree.

Dmitry: I use proper tools for creating database (Umbrello / Eclipse plugin). Can use other tools, but we should use something that has good tool support. Any code that is database-specific should be removed from dCache.

If everyone used different tools for db-schema management then our ability to provide support is compromised.

Shall we choose liquibase as a standard tool? Yes.

Do we need to make plans how we migrate to using liquibase?

Do we have requirements (from EMI) to run on any other database? No.

Dmitry: As soon as we come up with a new schema for storing SRM requests then that would be a natural time to migrate to using liquibase. However, it's a waste of time to work on existing code and migrate that to use liquibase.

gPlazma: keep gPlazma-1 as-is (using top-link), but gPlazma-2 should use liquibase.

Chimera: is currently not using liquibase.

Transfer-manager: (was jpox, but this became data-nucleus) uses this for logging (picked up by srm-watch).

Do we want schema modification to happen automatically, or do we allow the sys-admin to do this manually?

The use-case for doing schema migration externally comes from "properly" done database security. The sys-admin will have three database-roles: one that is allowed to select, a role that is allowed to select, insert and update, and a role that can upgrade the schema.
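A minimal sketch of the three-role model, written as a plain lookup table rather than actual database grants. The role names and operation labels are hypothetical; the point is only that a service running under the read/write role cannot upgrade the schema itself, which is why the manual path is needed.

```python
# Hypothetical role names; the operations follow the three roles described above.
ROLE_PRIVILEGES = {
    "reader": {"select"},
    "writer": {"select", "insert", "update"},
    "schema_upgrader": {"upgrade_schema"},
}


def is_allowed(role, operation):
    """Return True if the given role may perform the given operation."""
    return operation in ROLE_PRIVILEGES.get(role, set())


def can_auto_upgrade(service_role):
    """Automatic schema upgrade only works if the service's role may upgrade."""
    return is_allowed(service_role, "upgrade_schema")


print(can_auto_upgrade("writer"))           # False
print(can_auto_upgrade("schema_upgrader"))  # True
```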

Do we keep with the option: auto or manual upgrade? Yes.

Outstanding RT Tickets

[This is an auto-generated item. Don't add items here directly]

PinManager

About a month ago we wanted to decide whether we should make new pin-manager the default.

Dmitry: let's test it first!

The tests were successful. New pin-manager survived at 150 Hz srmGet requests. The bottleneck is the database; not just for the pin-manager but for SRM, too. This isn't a worry, as technologies exist to allow horizontal scaling of the database.

Review of RB requests

DTNM

Same time, next week.