Last modified on 10/30/07 15:05:47

Tuesday meetings with BNL (last conference call : Oct 23)

Running Issues

PoolManager or Prestager (DummyStager) gets stuck under high load after the upgrade to 1.7.0-41

It seems we are a step further with this. BNL folks found out that some users submit prestage requests for the same file on the order of 1000 times. This overloads the Stager, because it is not smart enough to notice that these are prestage requests for the same file, and would explain why it runs out of memory after a while. BNL folks will send log files to confirm this theory. In the meantime dCache folks will fix the Stager.
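
The idea of recognising repeated requests for the same file can be illustrated with a minimal sketch. It assumes that a prestage request is identified by a file id string and that duplicate requests can share one stage operation. The class name PrestageCoalescer and its methods are hypothetical; they are not taken from the dCache Stager or from ForwardStagerV1 and only show the general technique.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not actual dCache code: coalesces prestage requests
 * by file id so that each file is staged at most once while it is pending.
 */
class PrestageCoalescer {
    // file id -> requesters waiting for that file to be staged
    private final Map<String, List<String>> waiting =
            new HashMap<String, List<String>>();

    /**
     * Registers a request. Returns true only for the first pending request
     * for this file, i.e. when a stage operation really has to be started.
     */
    synchronized boolean submit(String fileId, String requester) {
        List<String> requesters = waiting.get(fileId);
        boolean first = (requesters == null);
        if (first) {
            requesters = new ArrayList<String>();
            waiting.put(fileId, requesters);
        }
        requesters.add(requester);
        return first;
    }

    /** Called when staging has finished; returns everyone to be notified. */
    synchronized List<String> complete(String fileId) {
        List<String> requesters = waiting.remove(fileId);
        return requesters == null ? new ArrayList<String>() : requesters;
    }

    /** Number of distinct files with a stage operation pending. */
    synchronized int pendingFiles() {
        return waiting.size();
    }
}
```

With such a scheme 1000 requests for one file would start a single stage operation. The list of requesters still grows with every request, so a real implementation would probably also have to bound or de-duplicate it.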

The Prestager is fixed in release 1.7.0-47. There is a new prestager called ForwardStagerV1 (ForwardStagerV1.java). The batch file has to be modified as follows :

 create diskCacheV111.hsmControl.ForwardStagerV1 Prestager "${config}"   

Pools randomly report as OFFLINE

This problem is not understood at all. BNL folks will do the following :

  • Find out whether there were network problems at the time the pool disconnected.
  • Find the messages in the dCacheDomain log for the time the disconnect happened.

The dCache people will in return try to find out why the system doesn't reconnect after the network is back.

Pools are reported as offline on the Cell Web Page. The messages in the pool log look as follows :

  10/15 11:14:37 Cell(c-100@acas0443Domain) : runIO : java.io.EOFException
  10/15 11:14:37 Cell(c-100@acas0443Domain) : java.io.EOFException
  10/15 11:14:37 Cell(c-100@acas0443Domain) : java.io.EOFException
  10/15 11:14:37 Cell(c-100@acas0443Domain) :     at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2498)
  10/15 11:14:37 Cell(c-100@acas0443Domain) :     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1273)
  10/15 11:14:37 Cell(c-100@acas0443Domain) :     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
  10/15 11:14:37 Cell(c-100@acas0443Domain) :     at dmg.cells.network.LocationMgrTunnel.runIo(LocationMgrTunnel.java:283)
  10/15 11:14:37 Cell(c-100@acas0443Domain) :     at dmg.cells.network.LocationMgrTunnel.connectionThread(LocationMgrTunnel.java:202)
  10/15 11:14:37 Cell(c-100@acas0443Domain) :     at dmg.cells.network.LocationMgrTunnel.run(LocationMgrTunnel.java:347)
  10/15 11:14:37 Cell(c-100@acas0443Domain) :     at java.lang.Thread.run(Thread.java:595)
  10/15 11:14:45 Cell(lm@acas0443Domain) : Problem in 'whereIs' request : java.lang.Exception: Request timed out
  10/15 11:14:46 Cell(c-100@acas0443Domain) : No (valid) reply from lm
  10/15 11:15:00 Cell(lm@acas0443Domain) : Problem in 'whereIs' request : java.lang.Exception: Request timed out
  10/15 11:15:01 Cell(c-100@acas0443Domain) : No (valid) reply from lm
  10/15 11:15:15 Cell(lm@acas0443Domain) : Problem in 'whereIs' request : java.lang.Exception: Request timed out
  10/15 11:15:16 Cell(c-100@acas0443Domain) : No (valid) reply from lm
  10/15 11:15:30 Cell(lm@acas0443Domain) : Problem in 'whereIs' request : java.lang.Exception: Request timed out
  10/15 11:15:31 Cell(c-100@acas0443Domain) : No (valid) reply from lm
  10/15 11:15:45 Cell(lm@acas0443Domain) : Problem in 'whereIs' request : java.lang.Exception: Request timed out
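
To compare such messages with the time of a suspected network problem, it may help to extract the timestamps of the failed 'whereIs' lookups from the pool log. The following is a rough sketch that assumes log lines of the form shown above ("MM/DD hh:mm:ss Cell(name@domain) : message"). The class WhereIsTimeouts and its methods are hypothetical helpers, not part of dCache.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of dCache: finds lookup failures in a pool log. */
class WhereIsTimeouts {
    // Matches e.g.: 10/15 11:14:45 Cell(lm@acas0443Domain) : Problem in 'whereIs' request :
    // Only the time of day is captured; the date is ignored in this sketch.
    private static final Pattern LINE = Pattern.compile(
            "^\\s*\\d{2}/\\d{2} (\\d{2}:\\d{2}:\\d{2}) Cell\\([^)]+\\) : "
            + "Problem in 'whereIs' request.*$");

    /** Returns the time of day of every 'whereIs' failure, in log order. */
    static List<LocalTime> failureTimes(List<String> logLines) {
        List<LocalTime> times = new ArrayList<LocalTime>();
        for (String line : logLines) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                times.add(LocalTime.parse(m.group(1)));
            }
        }
        return times;
    }

    /** Returns the gaps in seconds between consecutive failures (same day assumed). */
    static List<Long> gapsInSeconds(List<LocalTime> times) {
        List<Long> gaps = new ArrayList<Long>();
        for (int i = 1; i < times.size(); i++) {
            gaps.add(Duration.between(times.get(i - 1), times.get(i)).getSeconds());
        }
        return gaps;
    }
}
```

In the excerpt above the failures are logged at 11:14:45, 11:15:00, 11:15:15, 11:15:30 and 11:15:45, i.e. 15 seconds apart, which looks like a periodic retry rather than isolated errors.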


Not all GridFTP doors are used

We need to know :

  • the network configuration
  • the output of the LoginBroker 'ls -l' command.
  • whether all doors appear in the output, and whether the reported IP numbers match the expected IP numbers of the pools.
  • NOTE : BNL is using the DNS trick to tackle the inside/outside door mechanism.
  • Iris will investigate.

Transfers missing in Billing file

It seems that there are regular (failing) dcap transfers missing from the billing file. It may well be that those transfers failed because the pool went down. Billing cannot report on those failed transfers because a pool sends the billing information at the end of the transfer, which it cannot do if the pool goes down before then.

BNL has trouble with a DN with some weird E-mail address

We will consult Ted on this issue.


Last modified by Patrick @ Sat Feb 27 16:43:09 2021