wiki:developers-meeting-20090729
Last modified on 07/29/09 17:50:04

[part of a series of meetings]

Participants

Paul, Gerd, Macro, Timur, Patrick, Owen, Dmitry, Alex, Vladimir

Agenda

[see box to the side]

Very high memory consumption of the 1.9.4-2 release

The problem is high memory consumption relative to v1.9.4-1. Gerd had to downgrade again.

After the upgrade from 1.9.4-1 to 1.9.4-2 it died almost immediately with an out-of-memory exception. With the heap increased it ran for 4 hours, then hit OOM again. There were peaks in CPU usage with no corresponding increase in IO rate.

Found 35,000 entries for Kors in the credentials stored by the SRM and pin manager.

Should store 1 entry for 1 DN + Role.
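The intended behaviour, one stored entry per DN + Role however many requests arrive, can be sketched as a store keyed on that pair. This is only an illustration in Python, not the SRM code; the class names, field names and the example DN are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    dn: str
    role: str
    payload: str  # stands in for the delegated credential itself


class CredentialStore:
    """Hypothetical store that keeps at most one credential per (DN, role)."""

    def __init__(self):
        self._entries = {}

    def save(self, credential):
        # Overwrite the existing entry for the same DN + role
        # instead of adding a new one on every request.
        self._entries[(credential.dn, credential.role)] = credential

    def __len__(self):
        return len(self._entries)


store = CredentialStore()
for request in range(35000):
    store.save(Credential("/DC=example/CN=Some User", "production", f"proxy-{request}"))
```

With this keying, 35,000 requests from the same DN + Role leave a single entry in the store rather than 35,000.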

Does it affect other branches? Don't know: nobody has reported anything.

Zuiten noticed that during Step'09 they had to restart the SRM every hour.

In the SRM database, the index of the table is an ID number.

Patrick to forward details on how to check this problem to colleagues at Zuiten.
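One possible way to check for the problem is to count rows per DN + Role and list any pair that appears more than once. The sketch below uses Python's built-in sqlite3 as a stand-in for the real database; the table name `credentials` and the columns `id`, `dn` and `role` are assumptions made for illustration, not the actual SRM schema.

```python
import sqlite3

# In-memory stand-in for the credential table; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credentials (id INTEGER PRIMARY KEY, dn TEXT, role TEXT)"
)

# Made-up sample data: one DN + role stored five times, another stored once.
rows = [("/CN=User A", "production")] * 5 + [("/CN=User B", "")]
conn.executemany("INSERT INTO credentials (dn, role) VALUES (?, ?)", rows)

# Any (dn, role) pair with more than one row points at the problem.
duplicates = conn.execute(
    "SELECT dn, role, COUNT(*) AS n FROM credentials "
    "GROUP BY dn, role HAVING COUNT(*) > 1 ORDER BY n DESC"
).fetchall()

for dn, role, n in duplicates:
    print(f"{n} entries for {dn} (role: {role or 'none'})")
```

The same `GROUP BY ... HAVING` query should carry over to other SQL databases once the real table and column names are substituted.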

Gerd is using a single database for pin manager and SRM.

Is this related to the memory used?

Does this affect only the SRM domain? Yes, but we didn't update the other nodes.

The changes have been propagated

Gerd has *not* checked that every SRM transaction results in a new entry in the table, only that there are a large number of entries.

Gerd to look into

Are the entries removed after a transaction has completed? No: it's a cache.

If people aren't removed from this cache, how is revocation handled?

The revocation lists should have this information; if someone is banned then their authentication will fail.

Kors Bos entries in the database:

Oldest: April 2008; newest: July 29th 2009; average: July 25th 2009.

Gerd to send Dmitry a list of Kors Bos entries.

Can you supply a pg_dump of the table? Gerd to email this.

After some more investigation, Gerd noticed that the large number of entries started a week and a day ago, so with the 1.9.4-1 update.

Update on security issue

Four customers

Tier-1s

Seem to understand that they have to upgrade.

Some sites are still running 1.9.0 (Triumf; BNL) Triumf will update SRM to 1.9.1. BNL are not planning to upgrade.

EGEE and OSG

With gLite people, we agreed security issue is "low" and, as soon as gLite comes out with a fix, then they're

Problem from OSG. Now they are complaining that we have been reporting this issue.

Timur said we explained to Ruth that there was no point in making it public before we had a patch available.

Tier 2s

Informed via user-forum; updates are available.

Certified

Owen reported that new release is certified; on

Issues from Tier-1 meeting

Solaris dcap client

Owen has a Solaris machine to build on. Update on Monday.

Outstanding RT tickets

These ones in particular:

RT #4684

Geuianni and Owen discovered that they couldn't certify srmcp due to the client hanging. The client could write but couldn't read files. Setting the port range allowed the operation to succeed.

The GridFTP that allows parallelism doesn't allow passive or active.

Owen feels that a warning should be emitted when incorrect values are used.

If both the number of streams and passive are specified, then throw an error.
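The proposed check could look something like the following Python sketch. It is one reading of the rule, not srmcp code: the function name and parameters are hypothetical, and "number of streams specified" is modelled as a value other than `None`.

```python
from typing import Optional


def check_transfer_options(streams: Optional[int], passive: bool):
    """Reject a stream count combined with passive mode (hypothetical helper)."""
    if streams is not None and streams < 1:
        raise ValueError("number of streams must be at least 1")
    if streams is not None and passive:
        # Fail early instead of letting the client hang during the transfer.
        raise ValueError("number of streams cannot be combined with passive mode")
    return streams, passive
```

Calling `check_transfer_options(4, True)` raises `ValueError`, while `check_transfer_options(4, False)` and `check_transfer_options(None, True)` pass through unchanged.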

Default is currently documented as "passive", which

Owen talked with Gerd about GridFTP 2.

All the clients need to be updated: srmcp, lcg-cp, ...

Server supports Mode-X but our client (srmcp) doesn't support Mode-X.

Review of review requests

  • RB#305 improvements for 1.9.3 concerning 'chimera' database

ACLs and PNFS-migrated Chimera

Release notes

Some sentence problems with Replica Manager fixed. What was this?

The same issue we had some months ago with the PoolManager: if reply-required not set then PoolManager

Replica manager now works: fix was in Cells Msg constructor.

DTNM

Proposed: same time, next week.