
Wednesday, November 7, 2012

My Experience with WebLogic Server Startup Problems


Here is a list of common error messages I have faced during WebLogic Server startup, together with the solutions that worked for me.

Note: Please don't treat these as final solutions. They come from my own experience and from the people I have worked with. Each one worked in the environment where I hit the problem, but that does not necessarily mean it will solve the problem you are facing. Treat them as guidelines.

Also, if it is a production system, take a backup of any file before you modify it.

1         ERR: transport error 202: bind failed: Address already in use


Starting weblogic with Java version:
ERROR: transport error 202: bind failed: Address already in use
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:690]

FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)



1.1       Root cause

This error is a port conflict. If the OS has not yet reclaimed a port that WebLogic Server was using when it was stopped or killed, and the user issues the start command again, the server tries to bind a port that another process is still holding, and startup fails. The JDWP lines in the log show that the port in conflict here is the Java debug port.
 

1.2       Solution:


This error mostly appears after the server has been killed forcefully, so before restarting, check the status of every port WebLogic uses, e.g. the HTTP port, ALSB_DEBUG_PORT and DEBUG_PORT, with this command -
netstat -an | grep <port no> # to see whether this port is active or not
e.g. netstat -an | grep 9001
If the command returns a result, wait and give the OS some time to reclaim the port. In most scenarios the port is freed after a while and you can restart the servers.
If the port is not released, some process is still holding it. That process could be the old WebLogic PID or some daemon process. Engage your Unix administrator to find that process and kill it.
We can also bounce the whole server to release this port.
If the problem is with ALSB_DEBUG_PORT or DEBUG_PORT, you can also change the port number, since these ports are internal to WebLogic and are not used for communication with any external system.
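
One catch with a plain grep on the port number is that "grep 9001" also matches 19001 or 90010. Below is a small sketch of a stricter check. The function name port_lines is my own, and it assumes Linux-style netstat output where the local address is the 4th column, so adjust the column for other platforms.

```shell
#!/bin/sh
# port_lines PORT
# Reads `netstat -an` output on stdin and prints only the lines whose
# local address ends in exactly PORT.
# Assumption: the local address is the 4th column (Linux-style output).
port_lines() {
    awk -v port="$1" '
        {
            # Split "0.0.0.0:9001" or ":::9001" on dots and colons;
            # the last piece is the port number.
            n = split($4, parts, "[.:]")
            if (n > 0 && parts[n] == port) print
        }'
}
```

Usage would be something like: netstat -an | port_lines 9001. No output means nothing is bound to that port.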

2         Err: Error occurred during initialization of VM

2.1       Root cause:

You would get better help for this on a WebLogic Server or Java forum. The problem is a fairly typical JVM and operating system interop issue. On startup, the JVM tries to reserve a contiguous block of virtual memory for the heap and permgen. If it is unable to do so, it fails to start. On a 32-bit operating system, the virtual memory address space is 4GB, but drivers and operating system processes can break up the available space into smaller chunks so that Java cannot allocate the contiguous block it requires. Any number of things could have caused this to start happening all of a sudden, anything from a Windows update to installing new hardware or software.


2.2       Solution:

Several possible solutions...

  1. Try to reduce the memory allocated to the WLS Java process. Move down in small increments until the server starts, e.g. to 2gb/1gb/512m. Perhaps you can get your app to run with reduced memory.
  2. Switch to a 64-bit operating system and a 64-bit JVM. Note that you don't need more than 4GB of actual RAM to get a benefit from a 64-bit environment in this case. It is the size of the virtual memory address space that counts. You will need to seek advice on a WebLogic Server forum regarding configuring WLS to run with a 64-bit JVM.
  3. Switch to running WLS using JRockit. I do not believe JRockit has the contiguous memory requirement.
  4. Do some low-level debugging to identify which drivers or DLLs are loaded where in memory and uninstall the offenders or attempt to move them. You can find information on this via Google, but I don't really recommend doing this unless you enjoy low-level debugging and have some experience with it.
  5. If the error comes because too little memory is allocated while your JVM needs more, try increasing the memory instead, again in increments. If your server has enough memory, increase the min and max heap size of the JVM. As a best practice, min and max should always be equal so that the JVM reserves that much memory during initialization itself.
In the setDomainEnv.sh file, find the property ‘EXTRA_JAVA_PROPERTIES’ and add “-Xms3072m -Xmx3072m” or “-Xms3g -Xmx3g”.
Or in the WebLogic Admin Server console, click on the server, go to the “Server Start” tab and specify the JVM parameters “-Xms3072m -Xmx3072m” or “-Xms3g -Xmx3g”.
Check the .out file to verify that the change has taken effect.
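
To check the .out file quickly, something like the sketch below can pull out the heap flags. It assumes the startup script echoes the JVM arguments into the .out file (in my environments it prints a "JAVA Memory arguments" line, yours may differ). The function name heap_flags is just my own helper name.

```shell
#!/bin/sh
# heap_flags
# Reads a server .out file on stdin and prints the distinct -Xms/-Xmx
# flags found in it, one per line.
# Assumption: the JVM arguments appear in the file as space-separated words.
heap_flags() {
    # Put every word on its own line, then keep only the heap size flags
    tr -s ' \t' '\n\n' | grep -E '^-Xm[sx][0-9]+[kKmMgG]?$' | sort -u
}
```

For example: heap_flags < AdminServer.out. If the same flag shows up with two different values, an older setting is probably still being passed somewhere else in the startup scripts.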


3       Err: Could not obtain an exclusive lock to the embedded LDAP data files directory



<31/05/2012 4:32:08 PM EST> <Error> <EmbeddedLDAP> <BEA-171519> <Could not obtain an exclusive lock to the embedded LDAP data files directory: /hta/home/fusion/osb_home/osb_install1/user_projects/domains/vhaosb-dev2/servers/AdminServer/data/ldap/ldapfiles because another WebLogic Server is already using this directory. Ensure that the first WebLogic Server is completely shutdown and restart the server.>
<31/05/2012 4:32:16 PM EST> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to FAILED>
<31/05/2012 4:32:16 PM EST> <Error> <WebLogicServer> <BEA-000383> <A critical service failed. The server will shut itself down>
<31/05/2012 4:32:16 PM EST> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to FORCE_SHUTTING_DOWN>

3.1       Root Cause:

Sometimes, even after a proper or forceful shutdown, the .lok file is not removed automatically. When the user then restarts the server, the file is still present and the server fails to start.

3.2       Solution:

Navigate to the ldapfiles location under the server and delete the “EmbeddedLDAP.lok” file from there. The location would be e.g. /hta/home/fusion/osb_home/osb_install1/user_projects/domains/vhaosb-dev2/servers/AdminServer/data/ldap/ldapfiles
Then restart the WebLogic Server.
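
Since the error message itself asks to make sure the first server is completely shut down, it is worth checking for a leftover process before deleting the lock. Here is a rough sketch. The function name server_pids is mine, and it assumes the `ps -ef` column layout (PID in the 2nd column) and that the server was started with the usual -Dweblogic.Name=<server> argument and the weblogic.Server main class.

```shell
#!/bin/sh
# server_pids NAME
# Reads `ps -ef` output on stdin and prints the PID of every process that
# looks like the WebLogic server called NAME.
# Assumptions: PID is the 2nd column, and the command line contains
# -Dweblogic.Name=NAME and weblogic.Server as separate words.
server_pids() {
    awk -v name="$1" '
        {
            named = 0; wls = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "-Dweblogic.Name=" name) named = 1
                if ($i == "weblogic.Server") wls = 1
            }
            if (named && wls) print $2
        }'
}
```

Usage: ps -ef | server_pids AdminServer. Only delete the .lok file when this prints nothing. On some platforms ps truncates long command lines, so the check may miss a process there.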

4         Err: The persistent store "_WLS_AdminServer" could not be deployed:


<01/06/2012 10:56:47 AM EST> <Error> <Store> <BEA-280061> <The persistent store "_WLS_AdminServer" could not be deployed: weblogic.store.PersistentStoreException: [Store:280105]The persistent file store "_WLS_AdminServer" cannot open file _WLS_ADMINSERVER000000.DAT.
weblogic.store.PersistentStoreException: [Store:280105]The persistent file store "_WLS_AdminServer" cannot open file _WLS_ADMINSERVER000000.DAT.
                at weblogic.store.io.file.Heap.open(Heap.java:312)
                at weblogic.store.io.file.FileStoreIO.open(FileStoreIO.java:104)
                at weblogic.store.internal.PersistentStoreImpl.recoverStoreConnections(PersistentStoreImpl.java:413)
                at weblogic.store.internal.PersistentStoreImpl.open(PersistentStoreImpl.java:404)
                at weblogic.store.admin.AdminHandler.activate(AdminHandler.java:126)
                Truncated. see log file for complete stacktrace

<01/06/2012 10:56:47 AM EST> <Critical> <WebLogicServer> <BEA-000362> <Server failed. Reason:

4.1       Root cause:

This error mostly comes when WebLogic is not able to read the file store .DAT file, which WebLogic uses for its internal working. There are 7 WebLogic subsystems that write information to this file: Diagnostic Service, JMS Messages, JTA Transaction Log (TLOG), Web Services, EJB Timer Services, Store-and-Forward (SAF) Service Agents and Path Service.
The file can also get corrupted during a forceful shutdown, for example when users use the kill -9 command, or for other reasons.

4.2       Solution:


1. cd $DomainHome/servers/AdminServer/data/store
2. find . -name "*.DAT"
3. Verify that the file name in the result matches the one in the error message.
4. Rename this file and move it from this directory to some other directory.
5. Find "EmbeddedLDAP.lok" and "AdminServer.lok" as well and remove them.
6. Check the port using netstat -an | grep <Weblogic server port>; there should not be any open connection to this port.
7. Start your WebLogic server either via the WebLogic script or Node Manager.
Note: The .DAT file is a very important file and contains business data as well in a Production system. Please take a backup of this file before doing any operation on it, so that it can be analysed later to complete those transactions.
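
For step 3, the file name can be pulled straight out of the log instead of reading it by eye. A small sketch, assuming the log line has the same "cannot open file <name>.DAT" wording as the error above (the function name store_files_in_log is my own):

```shell
#!/bin/sh
# store_files_in_log
# Reads a server log on stdin and prints each distinct .DAT file name that
# appears after the words "cannot open file".
# Assumption: the message wording matches the BEA-280061 example above.
store_files_in_log() {
    sed -n 's/.*cannot open file \([^ ]*\.DAT\).*/\1/p' | sort -u
}
```

For example: store_files_in_log < AdminServer.log, then compare the printed name with the result of the find command in step 2.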

5         ERR: The loading of OPSS java security policy provider failed due to exception


<Jun 5, 2012 1:37:53 PM EST> <Critical> <WebLogicServer> <BEA-000386> <Server subsystem failed. Reason: weblogic.security.SecurityInitializationException: The loading of OPSS java security policy provider failed due to exception, see the exception stack trace or the server log file for root cause. If still see no obvious cause, enable the debug flag -Djava.security.debug=jpspolicy to get more information. Error message: JPS-02592: Failed to push ldap config data to libOvd for service instance "idstore.ldap" in JPS context "default", cause: org.xml.sax.SAXException: Error Parsing at line #1: 1.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; <Line 1, Column 1>: XML-20108: (Fatal Error) Start of root element expected.

5.1       Root cause:

I have not yet found the exact root cause of this problem, so I am not giving details of why this error occurs. I am only recording the solution that I followed -

Note: There is a post about a similar error. Please refer to it as well, it might fix your problem -
https://ali.education/tag/jps-02592/

5.2       Solution


Step1: Delete the “tmp” folder from the location $Domain/servers/AdminServer.
Step2: Find all the .lok and .DAT files and delete them.
find <location up to $Domain/servers/AdminServer> -name "*.lok"
find <location up to $Domain/servers/AdminServer> -name "*.DAT"
Use the “rm” command to delete the files returned by the find commands above.
Step3: Restart the WebLogic server.
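
The two find commands can be combined into one listing that you review before running rm. This is only a sketch. The function name list_state_files is mine, and the directory you pass in is whatever your $Domain/servers/AdminServer path is.

```shell
#!/bin/sh
# list_state_files DIR
# Prints every .lok and .DAT file under DIR, sorted, so the list can be
# reviewed (and backed up) before anything is deleted.
list_state_files() {
    # The patterns are quoted so that the shell hands them to find as-is
    find "$1" -type f \( -name '*.lok' -o -name '*.DAT' \) | sort
}
```

Keeping the listing separate from the delete makes it easier to copy the .DAT files somewhere safe first, which matters if they hold business data.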

6         Err: Boot identity not valid

<Jun 7, 2012 8:21:37 PM EST> <Critical> <WebLogicServer> <BEA-000386> <Server subsystem failed. Reason: weblogic.security.SecurityInitializationException: Authentication denied: Boot identity not valid; The user name and/or password from the boot identity file (boot.properties) is not valid. The boot identity may have been changed since the boot identity file was created. Please edit and update the boot identity file with the proper values of username and password. The first time the updated boot identity file is used to start the server, these new values are encrypted.

6.1       Root cause:

The boot.properties file contains the username and password in encrypted format.
There could be various reasons for the above error -
1)      The WebLogic user password was changed.
2)      UNIX user access was changed for the folder where boot.properties is located.
3)      The boot.properties file got corrupted.
4)      Someone modified this file by mistake.
5)      I have seen scenarios where, while starting WebLogic using a WLST command or a script e.g. startWebLogic.sh, passing the wrong password for the weblogic user left the boot.properties file corrupted. Maybe it was a temporary bug that has been resolved by now.
6)      The nm_password.properties file might have got corrupted. It is normally located at $Domain/config/nodemanager/

6.2       Resolution:


The usual solution to this problem is to rename the existing boot.properties file, create a new file with the same name, provide the WebLogic username and password in clear text, and restart the server. If the server starts successfully, the content of this file will be encrypted.

Most of the time the server restarts without a problem, but sometimes I have noticed that even when you give the correct WebLogic username and password in clear text, WebLogic does not pick up that information from this file.

If that is the case, delete this file and start WebLogic using the startWebLogic.sh script without nohup. It will ask for the WebLogic password during startup. Provide the password, and it will create the boot.properties file in the background and the server will come up. Once the server is up, you can shut it down and start it again using the “nohup” command as a background process.

Note: Sometimes boot.properties is not created even when running in Development mode. However, once I changed from Production mode to Development mode with the config.xml entry "<production-mode-enabled>false</production-mode-enabled>" and then created the boot.properties file manually with the username and password in the correct format, it worked for me.

e.g.
username=weblogic
password=Welcome1

Also, if you are trying to start a WebLogic Managed server using Node Manager via either the Admin Console or a WLST script, then depending on your Node Manager configuration, your nm_password.properties file might also have got corrupted. This file has exactly the same format as the boot.properties file and also needs an update. So simply rename the old nm_password.properties file, create a new one with the same name, and provide the WebLogic username and password in clear text, the same as in the boot.properties file. It should work.
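
The rename-and-recreate step is the same for both files, so it can be sketched as one small helper. The function name write_identity_file and the ".old" suffix are my own choices, and the file path is whatever location your boot.properties or nm_password.properties lives in.

```shell
#!/bin/sh
# write_identity_file FILE USER PASS
# Renames an existing FILE to FILE.old and writes a fresh clear-text
# identity file in the username=/password= format shown above.
write_identity_file() {
    file=$1
    user=$2
    pass=$3
    # Keep the old file for comparison instead of deleting it
    if [ -f "$file" ]; then
        mv "$file" "$file.old" || return 1
    fi
    # umask 077 in a subshell: the new file is readable by its owner only
    ( umask 077 && printf 'username=%s\npassword=%s\n' "$user" "$pass" > "$file" )
}
```

Note that a password given on the command line can show up in the shell history and in ps output, so on a shared box it may be safer to type the two lines into the file with an editor.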

7         Err: Could not start JTAMT on local server because

weblogic.cluster.migration.MigrationException: Could not start JTAMT on local server because it could not be deactivated on the current host.
                at weblogic.transaction.internal.TransactionRecoveryService.failbackIfNeeded(TransactionRecoveryService.java:571)
                at weblogic.transaction.internal.TransactionRecoveryFailBackService.start(TransactionRecoveryFailBackService.java:23)
                at weblogic.t3.srvr.SubsystemRequest.run(SubsystemRequest.java:64)
                at weblogic.work.ExecuteThread.execute(ExecuteThread.java:252)
                at weblogic.work.ExecuteThread.run(ExecuteThread.java:221)
Caused by: java.lang.SecurityException: Method 'deactivateJTA' cannot be invoked without administrator access
                at weblogic.rjvm.ResponseImpl.unmarshalReturn(ResponseImpl.java:237)
                at weblogic.rmi.internal.BasicRemoteRef.invoke(BasicRemoteRef.java:223)
                at weblogic.cluster.migration.RemoteMigratableServiceCoordinatorImpl_1036_WLStub.deactivateJTA(Unknown Source)
                at weblogic.cluster.migration.JTAMigrationHandler.deactivateJTA(JTAMigrationHandler.java:82)
                at weblogic.cluster.migration.JTAMigrationHandler.deactivateJTA(JTAMigrationHandler.java:99)
                at weblogic.transaction.internal.TransactionRecoveryService.failbackIfNeeded(TransactionRecoveryService.java:557)
                ... 4 more
Caused by: java.lang.SecurityException: Method 'deactivateJTA' cannot be invoked without administrator access

7.1       Root cause:

I have not yet found the exact root cause of this problem, so I am not giving details of why this error occurs. I am only recording the solution that I followed -

7.2       Resolution:

I was getting this error during restart of a Managed server.
I searched for the above error and found a couple of forums, and most of them said to delete the *.lok file and restart the servers. I searched for *.lok files under aserver/mserver but didn't find any. Then I stopped the Admin Server as well and tried to restart it, and it failed with an error related to disk space: there was no space left on the disk.
Then I deleted lots of *.log and *.out files etc. to create some free space. I used the “df -h” and “du -sh” commands to find the locations taking the most space, deleted the unused files, and restarted both the Admin and Managed servers. The above error “Could not start JTAMT on local server” went away.
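
Since the real culprit here was a full disk, a quick check before restarting can save some searching. Below is a sketch that filters df output. The function name full_filesystems is mine, and it assumes the portable `df -P` layout (usage percentage in the 5th column, mount point in the 6th, no spaces in the mount point).

```shell
#!/bin/sh
# full_filesystems LIMIT
# Reads `df -P` output on stdin and prints "mountpoint usage%" for every
# file system whose usage is at or above LIMIT percent.
# Assumption: POSIX df -P columns, mount points without spaces.
full_filesystems() {
    awk -v limit="$1" '
        NR > 1 {
            used = $5
            sub(/%/, "", used)      # "99%" -> "99"
            if (used + 0 >= limit + 0) print $6, $5
        }'
}
```

Usage: df -P | full_filesystems 90. After that, du -sh on the reported mount point helps to find which logs are taking the space.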

8         ERR: EmbeddedLDAP  java.lang.ArrayIndexOutOfBoundsException

During Admin server restart we were getting the error below continuously and the Admin server was not coming up -
####<Nov 5, 2012 4:47:05 AM EST> <Critical> <EmbeddedLDAP> <vans075007> <AdminServer> <VDE Replication Thread> <<anonymous>> <> <> <1352051225634> <BEA-000000> <java.lang.ArrayIndexOutOfBoundsException: 0

8.1       Root cause:


This error comes when the changelog.data and changelog.index files, located in servers/AdminServer/data/ldap/ldapfiles, get corrupted.
Below is a list of scenarios in which these files can get corrupted.
  • While the admin server was writing an LDAP entry to the changelog, it was interrupted by a forced shutdown, which made the changelog partially updated.
  • When the admin server rebooted, it attempted to process the changelog (i.e., send the entries to the managed servers), but encountered the partially updated changelog.
  • The partial update was an entry that had been assigned a change number, but there was no data for the entry.
  • When the change log writer is interrupted between the index update and the data update and this update is in a synchronized method.

8.2       Resolution


Take a backup of the existing “ldap” folder, delete these two files (changelog.data and changelog.index in servers/AdminServer/data/ldap/ldapfiles), and restart the Admin server.
Both files will be created again and the server will start successfully.
The changelog.data file is used in WebLogic Server (WLS) to store LDAP information regarding users, groups, roles and policies. The EmbeddedLDAP server has an index file and a data file. Each entry in the data file is pointed to by an index file entry; the index file entry is identified by an integer.
For more details about the issue and solution, please refer to Oracle Note DOC ID: 1325978.1
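
The backup-then-delete step can be sketched as below, to be run only while the Admin server is down. The function name reset_changelog and the two arguments are my own. Pass your own servers/AdminServer/data/ldap path and a backup location that does not exist yet.

```shell
#!/bin/sh
# reset_changelog LDAP_DIR BACKUP_DIR
# Copies the whole ldap folder to BACKUP_DIR, then removes the two
# changelog files so that the server recreates them on the next start.
reset_changelog() {
    ldap_dir=$1
    backup_dir=$2
    [ -d "$ldap_dir/ldapfiles" ] || { echo "no ldapfiles under $ldap_dir" >&2; return 1; }
    # Refuse to overwrite an earlier backup
    [ ! -e "$backup_dir" ] || { echo "$backup_dir already exists" >&2; return 1; }
    # Stop here if the backup copy fails; nothing has been deleted yet
    cp -R "$ldap_dir" "$backup_dir" || return 1
    rm -f "$ldap_dir/ldapfiles/changelog.data" "$ldap_dir/ldapfiles/changelog.index"
}
```

The order matters: the files are removed only after the copy has succeeded, so a failed backup leaves everything untouched.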


Common resolution

Following is a list of general actions which we can perform in a Test environment to fix server startup problems -
Step1: Try to delete all the .lok files.
Step2: Try to delete the .DAT file inside $Domain/servers/AdminServer/data/store.
Step3: Take a backup of the "tmp" folder inside the servers folder and delete the “tmp” folder.
Step4: Take a backup of the "data" folder inside the servers folder and delete the “data” folder.
Step5: Change the port number if some port is already in use.