wiki:developers-meeting-20110112
Last modified on 01/12/11 17:45:35

[part of a series of meetings]

Participants

Agenda

[see box on the right-hand side]

Postcards

Christian: Tracking the stability issues in our testbed, which we also provide to EMI (asked EMI questions concerning BDII). Adapting the Bugmapping code to satisfy requirements. Working on changing the VOMS server entries in our virtual machines (http://rt.dcache.org/Ticket/Display.html?id=6005); some problems restarting the machines after the adaptation.

Dmitry:

  • continuing srm scalability tests
  • produced Enstore HSM Storage info extractor for 1.9.5
  • fixing "Record Already Exists" error reported by UC
  • talking to CMS about parallel processing and pNFS

Tanja: bothering people w/ tickets; viewing patches; gPlazma patch (needs testing)

Tigran: mostly fighting with virtual machines. Tried kerberised NFS (force is strong with this one), poolmanager code .. trying to understand what it's doing; have gained partial understanding.

Karsten: submitted Argus plugin; started working on VO-role plugin.

Paul: info-provider, review patches, support tickets

Gerd: xrootd paper is looming, pin manager (now getting close to being a drop-in replacement).

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

No questions until we get through everyone :)

Plans for patch-releases

Should we make a new patch release?

1.9.11 is now in build. Afterwards, we will release 1.9.5. Removing cache locations on pool restart.

BNL will be upgrading four days from now: 16--17 January.

Backporting space-manager pool-manager msg patch

See RT #5518, ...

People are generally in favour of back-porting the extra msg.

Two linkgroups may be authorised, and both can serve the transfer; choosing between them is an arbitrary decision. We could use the primary FQAN as a guide.
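The idea of using the primary FQAN as a guide could look roughly like the sketch below. This is only an illustration under assumptions: the class, the method and the linkgroup names are made up and are not the space-manager API. It assumes the first FQAN in the list is the primary one and falls back to the first linkgroup that any of the FQANs may use.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; the types and names are not the space-manager API.
class LinkGroupChoiceSketch {
    // fqans: the user's FQANs in order; the first one is assumed to be the primary FQAN.
    // authorised: linkgroup name -> FQANs allowed to use it, iterated in insertion order.
    static Optional<String> choose(List<String> fqans, Map<String, List<String>> authorised) {
        if (fqans.isEmpty()) {
            return Optional.empty();
        }
        String primary = fqans.get(0);
        String fallback = null;
        for (Map.Entry<String, List<String>> e : authorised.entrySet()) {
            if (e.getValue().contains(primary)) {
                return Optional.of(e.getKey());      // the primary FQAN decides
            }
            if (fallback == null && fqans.stream().anyMatch(e.getValue()::contains)) {
                fallback = e.getKey();               // first linkgroup usable by any FQAN
            }
        }
        return Optional.ofNullable(fallback);
    }

    public static void main(String[] args) {
        Map<String, List<String>> authorised = new LinkedHashMap<>();
        authorised.put("lg-default", Arrays.asList("/vo"));
        authorised.put("lg-production", Arrays.asList("/vo/Role=production"));
        // Both linkgroups could serve this user; the primary FQAN picks one of them.
        System.out.println(choose(Arrays.asList("/vo/Role=production", "/vo"), authorised));
    }
}
```

The fallback keeps the current behaviour (any authorised linkgroup will do) when the primary FQAN matches nothing.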

With implicit space reservation we have the IP, as the message comes from the door.

Can we really backport the patch? In 1.9.5, we still have the code that iterates over identities.

No, this is only for creating the namespace entry. The space-manager should only have the FQAN.

If you have FQANs that allow you to use a space token that is not bound to a path, then you have a problem.

Is FTP sending only one FQAN to space-manager in 1.9.5?

At Copenhagen, we agreed to send everything, so removed the iteration.

It is in the SRM, before the FTP transfer, that this happens.

Implicit reservation: create a reservation as part of the SRM operation.

With spaceReservationForNonSrmTransfers=true, the space manager will create the reservation.

Check whether Ron is using plain GridFTP or LCG-cr / -cp (i.e. with SRM).

Trunk activity

Progress with new features...

Dmitry's xrootd results

Tigran asked Gerd: do you understand why the performance of xrootd is lower than dcap?

Gerd believes this is due to the async behaviour of xrootd: it uses asynchronous reads.

dcap uses blocking IO, with one thread per connection.

Having one thread per client is known to give better performance than the async model for a low number of clients; however, it doesn't scale to large numbers of clients.

CSC Helsinki: under high load (300--400 jobs) some (or all) transfers .. adding a dcap transfer increases the overall throughput but adding an xrootd transfer doesn't. This may be because they reached the ThreadPool limits.

SocketIO 5, Disk-IO 20.

Xrootd: Have a handler that pushes the task onto a separate queue of disk activity.
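The handler arrangement noted for xrootd, where the network side pushes work onto a separate queue for disk activity, can be sketched roughly as follows. This is an illustration only, not the dCache or xrootd code: the class, the method names and the use of java.util.concurrent executors are assumptions, and the pool sizes simply reuse the figures 5 and 20 from the notes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names are illustrative, not dCache classes.
class TwoStagePoolSketch {
    // Pool sizes reuse the figures from the notes (SocketIO 5, Disk-IO 20).
    private final ExecutorService socketPool = namedPool(5, "socket-io");
    private final ExecutorService diskPool = namedPool(20, "disk-io");

    // Fixed-size pool whose threads carry a recognisable name prefix.
    private static ExecutorService namedPool(int size, String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return Executors.newFixedThreadPool(size,
                r -> new Thread(r, prefix + "-" + counter.incrementAndGet()));
    }

    // Stage 1 runs on a socket thread; stage 2 is queued on the disk pool.
    CompletableFuture<String> handle(String request) {
        return CompletableFuture
                .supplyAsync(() -> request + " decoded on "
                        + Thread.currentThread().getName(), socketPool)
                .thenApplyAsync(decoded -> decoded + ", served on "
                        + Thread.currentThread().getName(), diskPool);
    }

    void shutdown() {
        socketPool.shutdown();
        diskPool.shutdown();
    }

    public static void main(String[] args) {
        TwoStagePoolSketch sketch = new TwoStagePoolSketch();
        // "/some/file" is a placeholder path.
        System.out.println(sketch.handle("read /some/file").join());
        sketch.shutdown();
    }
}
```

With this split, once all disk threads are busy further requests wait in the disk queue, which would be one way to hit the kind of limit suspected at CSC Helsinki.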

On a pool, every request is an IO request, so separating them may not be necessary.

NFS: believe that, by default, O_NODELAY on socket is switched off.

NFS creates a pool of DirectBuffer objects; this allows reuse of these objects. Doing this gave a noticeable performance boost.
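A minimal sketch of this kind of buffer reuse is shown below. It is not the NFS implementation: the class name, the queue-based design and the buffer size are assumptions chosen for illustration. Only ByteBuffer.allocateDirect and the standard concurrent queue are real APIs.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of direct buffer reuse; not the dCache NFS code.
class DirectBufferPoolSketch {
    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();
    private final int bufferSize;

    DirectBufferPoolSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Reuse a released buffer if one is available; otherwise allocate a new direct buffer.
    ByteBuffer acquire() {
        ByteBuffer buffer = free.poll();
        return (buffer != null) ? buffer : ByteBuffer.allocateDirect(bufferSize);
    }

    // Reset position and limit, then make the buffer available for reuse.
    void release(ByteBuffer buffer) {
        buffer.clear();
        free.offer(buffer);
    }

    public static void main(String[] args) {
        DirectBufferPoolSketch pool = new DirectBufferPoolSketch(8192);
        ByteBuffer first = pool.acquire();
        first.put((byte) 1);
        pool.release(first);
        ByteBuffer second = pool.acquire();
        System.out.println("reused: " + (first == second));
    }
}
```

Direct buffers are comparatively expensive to allocate, so avoiding one allocation per request is a plausible source of the boost.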

What to do about the background image?

Batik and OpenJDK. See the Batik ticket.

--> I have removed the Batik directory in the ETICS build; it builds successfully in EMI-0 (server version 1.9.5-22).

Tigran found another project like Batik: SVG Salamander, but it crashes / doesn't work correctly with our SVG.

Fix our SVG so it doesn't su

Use SVG file directly. Do we know how many browsers support this?

Firefox, IE, Safari,

Not showing bg image is OK, but must not crash.

Remove version number from background image.

Use inkscape instead.

Investigate browser support for SVG background images. Investigate fixing our SVG or Salamander so that it can be used instead of Batik.

Pin manager

Gerd: there are two design decisions on which he would like feedback.

Not collapsing pin requests

Old pin-manager had concepts of pin-requests and pins. Multiple pin-requests could be collapsed into a single pin.

PoolManager can collapse stage requests, can merge read requests. Pool can collapse sticky flags.

New pin-manager doesn't collapse multiple pin-requests into a single pin.

Two tables: pin-request table and a pin table.

If you pin files twice, then you have two pins.
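The non-collapsing behaviour can be pictured with the small in-memory model below. It is a sketch under assumptions, not the pin manager's schema or API: the class, its methods and the file ID are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model; not the actual pin manager tables or API.
class UncollapsedPinsSketch {
    static final class Pin {
        final long id;
        final String fileId;

        Pin(long id, String fileId) {
            this.id = id;
            this.fileId = fileId;
        }
    }

    private final List<Pin> pins = new ArrayList<>();
    private long nextId = 1;

    // Every request creates its own pin, even if the file is already pinned.
    long pin(String fileId) {
        Pin pin = new Pin(nextId++, fileId);
        pins.add(pin);
        return pin.id;
    }

    // Removing one pin leaves any other pins on the same file untouched.
    boolean unpin(long pinId) {
        return pins.removeIf(p -> p.id == pinId);
    }

    long pinCount(String fileId) {
        return pins.stream().filter(p -> p.fileId.equals(fileId)).count();
    }

    boolean isPinned(String fileId) {
        return pinCount(fileId) > 0;
    }

    public static void main(String[] args) {
        UncollapsedPinsSketch manager = new UncollapsedPinsSketch();
        long first = manager.pin("file-1");   // "file-1" is a made-up file ID
        manager.pin("file-1");
        System.out.println("pins: " + manager.pinCount("file-1"));
        manager.unpin(first);
        System.out.println("still pinned: " + manager.isPinned("file-1"));
    }
}
```

In this model the file stays pinned until the last of its pins is removed.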

If something goes wrong, it doesn't matter how many times the client retries: the retries only create new pin-requests and do not directly affect the pin.

If you have online files then avoid pinning them. This should only apply to tape files.

Pin-manager does other things than staging files:

  • checking whether a file exists,
  • triggering p2p to a client-readable pool.

Aspects

AspectJ is introduced with this patch. It's really convenient: just a @transaction annotation. The only (small) issue is the singleton aspect, which is configured through the Spring context, so we can't have two transactional cells running in the same domain. This can be fixed by having two class loaders.

Issues from yesterday's Tier-1 meeting

Issues from EMI

  • Could you please have a look at this page: http://trac.dcache.org/projects/dcache/wiki/Bugmapping and say if you see any problems?
  • Platforms to be supported by EMI: there was a survey asking future EMI users what platforms they need. The question of providing the results was brought up with EMI SA2; it seems the survey was not specific enough.
  • I have talked to Bjoern about adapting our request tracker in accordance with the requirements set by the EMI guidelines. All changes are shown on http://trac.dcache.org/projects/dcache/wiki/Bugmapping. These changes will not affect developers in any way; they are just needed to provide metrics to EMI.
  • Testbed instability ticket: https://gus.fzk.de/ws/ticket_info.php?ticket=65512. Trying to track down the cause of the instability. Looked at BDII, which might be the source of the problem; asked EMI about the BDII structure (lxbra2506v7 is set up as top-level BDII) and whether there is some validation of the data that is pulled from our site-level BDII (waterford.desy.de).

Outstanding RT Tickets

[This is an auto-generated item. Don't add items here directly]

RT 5847: Migration remarks(problems)

Review of RB requests

DTNM

Same time, next week.