lebowski
26th March 2010, 11:55
Hello everybody,

I hope someone has a hint for me with this problem I am facing:

I have a cluster and want to switch from one node to the other.
After Oracle and LN got switched to node 2, everything seems to work fine: you can log in, many sessions work without any problems.
However some sessions start running and get error 505 after a while:
"No server specified in tabledef or server cannot be started".
After switching to node 1 again, the error is gone.

In log.bshell I find, besides error 505, an entry with error 2 (No such file or directory): the bshell is looking for $BSE/lib/user/r<username>.
But this user is not a remote user, and when LN is running on node 1 the bshell doesn't look for the r-file.
The filesystem on which $BSE resides is switched over as well, so no files can be missing.

Any ideas ?

Best regards,

lebowski

Han Brinkman
26th March 2010, 13:29
That happens if the BSE_REM variable is set; that's why it's looking for the r<username> file.

Regards,
Han

lebowski
26th March 2010, 15:40
Thank you, Han. If I run "env" (it's Unix, I forgot to say that), there is no BSE_REM entry.
It cannot be set in the LN resource files either, because $BSE is the same on both machines.
:confused:

Han Brinkman
26th March 2010, 16:09
And it's not defined in bse_vars either? Are the entries in tabledef6.2 the same? Can you connect to Oracle with sqlplus using the settings in tabledef6.2?

Regards,
Han

lebowski
26th March 2010, 16:33
Hello again Han,

Correct, it is not defined in bse_vars. The whole filesystem on which $BSE is installed is unmounted from server 1 and mounted on server 2 after the switch, so tabledef6.2 is identical as well.
sqlplus is worth a try, but I cannot test that without switching again...
The users can log in and work; error 505 appears only in some sessions (unfortunately the important ones).
Thank you for your ideas.

dave_23
26th March 2010, 16:48
Which sessions? Is it consistent?

Dave

lebowski
26th March 2010, 17:23
2010-03-24[19:49:42(UTC-01:00)]:E:user005: user_type N language 3 user_name user005 tty ote locale ISO88591/NULL
2010-03-24[19:49:42(UTC-01:00)]:E:user005: session: "whinh2120m000";object: "whinh2120m000"; company number: 100
2010-03-24[19:49:42(UTC-01:00)]:E:user005: query: "select whinh204.orno: order.number
2010-03-24[19:49:42(UTC-01:00)]:E:user005: from whinh204
2010-03-24[19:49:42(UTC-01:00)]:E:user005: where whinh204._index1 = {:i.order.origin,
2010-03-24[19:49:42(UTC-01:00)]:E:user005: :i.order,
2010-03-24[19:49:42(UTC-01:00)]:E:user005: :i.order.set}
2010-03-24[19:49:42(UTC-01:00)]:E:user005: and whinh204.acti = :i.activity
2010-03-24[19:49:42(UTC-01:00)]:E:user005: and whinh204.appl = tcyesno.yes
2010-03-24[19:49:42(UTC-01:00)]:E:user005: as set with 1 rows
2010-03-24[19:49:42(UTC-01:00)]:E:user005: "
2010-03-24[19:49:42(UTC-01:00)]:E:user005: Errno 0 bdb_errno 505 (No server specified in tabledef or server cannot be started)
2010-03-24[19:49:42(UTC-01:00)]:E:user005: user_type N language 3 user_name user005 tty ote locale ISO88591/NULL
2010-03-24[19:49:42(UTC-01:00)]:E:user005: session: "whinh2120m000";object: "whinh2120m000"; company number: 100
2010-03-24[19:49:42(UTC-01:00)]:E:user005: query: "select whinh204.orno: order.number
2010-03-24[19:49:42(UTC-01:00)]:E:user005: from whinh204
2010-03-24[19:49:42(UTC-01:00)]:E:user005: where whinh204._index1 = {:i.order.origin,
2010-03-24[19:49:42(UTC-01:00)]:E:user005: :i.order,
2010-03-24[19:49:42(UTC-01:00)]:E:user005: :i.order.set}
2010-03-24[19:49:42(UTC-01:00)]:E:user005: and whinh204.acti = :i.activity
2010-03-24[19:49:42(UTC-01:00)]:E:user005: and whinh204.appl = tcyesno.yes
2010-03-24[19:49:42(UTC-01:00)]:E:user005: as set with 1 rows
2010-03-24[19:49:42(UTC-01:00)]:E:user005: "
2010-03-24[19:49:42(UTC-01:00)]:E:user005: Errno 2 (No such file or directory) bdb_errno 0
2010-03-24[19:49:42(UTC-01:00)]:E:user005: Log_mesg: Error 2: cannot stat '/baan/bse/lib/user/ruser005'.


Hello Dave,

Mainly this session, but also tisfc0120s000 with a select on tcibd000.
Could this be SLM related? The error doesn't sound like it, but it has to be something that is locally installed on server 2 (and /usr/slm is).
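To see which sessions are affected and whether that is consistent, the log.bshell entries can be grouped by session. The following is only a rough sketch: it assumes the line layout shown in the excerpt above, the function name count_db_errors is made up for illustration, and it simply attributes each bdb_errno line to the most recent "session:" line, which would be inaccurate if entries of several users are interleaved.

```python
import re
from collections import Counter

# Patterns taken from the log excerpt above; other log layouts are not handled.
SESSION_RE = re.compile(r'session: "([^"]+)"')
BDB_RE = re.compile(r"bdb_errno (\d+)")


def count_db_errors(log_text):
    """Count non-zero bdb_errno values per (session, bdb_errno) pair."""
    counts = Counter()
    current_session = None
    for line in log_text.splitlines():
        session_match = SESSION_RE.search(line)
        if session_match:
            current_session = session_match.group(1)
            continue
        bdb_match = BDB_RE.search(line)
        if bdb_match and bdb_match.group(1) != "0":
            counts[(current_session, int(bdb_match.group(1)))] += 1
    return counts


# Three lines copied from the excerpt above.
sample = (
    '2010-03-24[19:49:42(UTC-01:00)]:E:user005: session: "whinh2120m000";'
    'object: "whinh2120m000"; company number: 100\n'
    "2010-03-24[19:49:42(UTC-01:00)]:E:user005: Errno 0 bdb_errno 505 "
    "(No server specified in tabledef or server cannot be started)\n"
    "2010-03-24[19:49:42(UTC-01:00)]:E:user005: Errno 2 "
    "(No such file or directory) bdb_errno 0\n"
)

error_counts = count_db_errors(sample)
for (session, errno), count in sorted(error_counts.items()):
    print(f"{session}: bdb_errno {errno} x{count}")
```

Running it over the complete log.bshell from a failover window would show whether the same few sessions fail every time.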

Regards,

lebowski

dave_23
26th March 2010, 17:27
Hmm, that error certainly doesn't make it look SLM related.

Can you access that table via GTM when you're failed over?

Can you open ottstpshell when failed over and when not, do a "set" and an "env" in each, and compare the output between the two servers?
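If the output of "env" is saved to a text file on each node, the comparison can be done mechanically. This is a minimal sketch, assuming plain NAME=value lines; the function names and the sample values below (including the BSE_REM value) are invented for illustration and are not taken from the actual systems.

```python
def parse_env(text):
    """Parse NAME=value lines (as printed by env) into a dict."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            result[name.strip()] = value
    return result


def diff_env(node1, node2):
    """Return {name: (value_on_node1, value_on_node2)} for differing variables.

    A value of None means the variable is not set on that node.
    """
    names = sorted(set(node1) | set(node2))
    return {
        name: (node1.get(name), node2.get(name))
        for name in names
        if node1.get(name) != node2.get(name)
    }


# Hypothetical dumps; real ones would be saved from each node.
node1_dump = "BSE=/baan/bse\nHOSTNAME=node1\n"
node2_dump = "BSE=/baan/bse\nHOSTNAME=node2\nBSE_REM=node1\n"

differences = diff_env(parse_env(node1_dump), parse_env(node2_dump))
for name, (value1, value2) in differences.items():
    print(f"{name}: node1={value1!r} node2={value2!r}")
```

Variables that legitimately differ per host (such as the hostname) will show up too, so the result still needs a manual look.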

Also, DBSLOG might be helpful

-- -set DBSLOG=01570 whinh2120m000

Dave

lebowski
26th March 2010, 17:53
Thanks for your opinion and giving me material to work with.


lebowski

ungerm
13th May 2010, 16:24
Hello,

How did you resolve the problem? I have the same one... I would appreciate any advice.

Thank you

Martin

lebowski
13th May 2010, 20:40
Hello Martin,

the error was caused by an entry in $BSE/lib/audit_hosts.
Here the hostname of node 1 was present.

Solution: Create an empty audit_hosts file.
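Before emptying the file, it may help to confirm that a node name really is listed in it. The sketch below only searches the file text for given hostnames; the exact format of audit_hosts is not described in this thread, so splitting on whitespace and a few punctuation characters is an assumption, and the function name and the host names node1 and node2 are placeholders.

```python
import re


def hosts_mentioned(file_text, candidate_hosts):
    """Return the candidate hostnames that occur as tokens in file_text.

    Assumption: hostnames appear as separate tokens delimited by
    whitespace, commas, colons or semicolons.
    """
    tokens = set(re.split(r"[\s,:;]+", file_text.lower()))
    return [host for host in candidate_hosts if host.lower() in tokens]


# Hypothetical file content; in practice, read $BSE/lib/audit_hosts instead.
sample_content = "node1\n"

found = hosts_mentioned(sample_content, ["node1", "node2"])
print("Cluster nodes listed in audit_hosts:", found)
```

With an empty file the function returns an empty list, which matches the state described in the solution above.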

Regards,

lebowski