Wednesday, November 7, 2012

My Experience with WebLogic Server Startup Problems


Here is a list of common error messages I have faced during WebLogic Server start-up, together with the solutions that worked for me.

Note: Please don't treat these as final solutions. They come from my own experience and from the people I have worked with. Each one worked in the environment where I was facing the problem, but that does not necessarily mean it will solve the problem you are facing. Treat the given solutions as guidelines.

Also, if it is a production system, take a backup of every file you are going to modify before making any changes.

1         ERR: transport error 202: bind failed: Address already in use


Starting weblogic with Java version:
ERROR: transport error 202: bind failed: Address already in use
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:690]

FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)



1.1       Root cause

This error is a port usage problem. If the OS has not yet reclaimed a port that the WebLogic Server was using when it was stopped or killed, and the user issues the start command again, the server tries to bind to a port that another process still holds, and WebLogic Server fails to start. The JDWP lines in the log above show that here the port in question is the Java debug port.
 

1.2       Solution:


This error mostly appears after a user has forcefully killed the server. Before restarting, check the status of every port WebLogic uses (e.g. the HTTP port, ALSB_DEBUG_PORT and DEBUG_PORT) with this command:
netstat -an | grep <port no>    # shows whether this port is in use
e.g. netstat -an | grep 9001
If the command returns a result, wait and give the OS some time to reclaim the port. In most scenarios the port is freed after a while and you can restart the server.
If the port is not released, some process is still holding it. That could be the old WebLogic process or some daemon process. Engage your Unix administrator to find that process and kill it.
Rebooting the whole machine will also release the port.
If the problem is with ALSB_DEBUG_PORT or DEBUG_PORT, you can also simply change the port number, since these ports are internal to WebLogic and are not used for communication with any external system.
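
A plain grep for the port number also matches longer numbers that merely contain it (19001 when you are looking for 9001, for example). Below is a minimal sketch of a stricter filter. The function name lines_for_port is made up for illustration, and it assumes the address:port or address.port layout that netstat prints on common Unix systems.

```shell
# Hypothetical helper (not part of WebLogic): reads `netstat -an` output on
# stdin and prints only the lines that mention exactly the given port number.
lines_for_port() {
    port="$1"
    # The port must directly follow ':' or '.' and be followed by whitespace
    # or end of line, so 9001 does not match 19001 or 90010.
    grep -E "[:.]${port}([[:space:]]|\$)"
}

# Usage: netstat -an | lines_for_port 9001
```

No output means nothing on the machine is currently using that port.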

2         Err: Error occurred during initialization of VM

2.1       Root cause:

This is a fairly typical JVM and operating system interop issue, and a WebLogic Server or Java forum is a good place to get more help with it. On startup the JVM tries to reserve a contiguous block of virtual memory for the heap and permgen. If it is unable to do so, it fails to start. On a 32-bit operating system the virtual address space is 4GB, but drivers and operating system processes can break up the available space into smaller chunks, so that Java is not able to allocate the contiguous block it requires. Any number of things could have caused this to start happening all of a sudden, anything from a Windows update to installing new hardware or software.


2.2       Solution:

Several possible solutions...

  1. Try reducing the memory allocated to the WLS Java process. Move down in small increments until the server starts, e.g. to 2g/1g/512m. Perhaps you can get your app to run with reduced memory.
  2. Switch to a 64-bit operating system and a 64-bit JVM. Note that you don't need more than 4GB of actual RAM to benefit from a 64-bit environment in this case. It is the size of the virtual address space that counts. You will need to seek advice on a WebLogic Server forum about configuring WLS to run with a 64-bit JVM.
  3. Switch to running WLS on JRockit. I do not believe JRockit has the contiguous memory requirement.
  4. Do some low-level debugging to identify which drivers or DLLs are loaded where in memory, and uninstall the offenders or attempt to move them. You can find information on this via Google, but I don't really recommend doing this unless you enjoy low-level debugging and have some experience with it.
  5. If the error comes from too little memory being allocated while your JVM needs more, try increasing the memory instead, again in increments. If your server has enough memory, increase the min and max heap size of the JVM. As a best practice, the min and max size should always be equal so that the JVM reserves that much memory during initialization itself.
In the setDomainEnv.sh file, find the property "EXTRA_JAVA_PROPERTIES" and add "-Xms3072m -Xmx3072m" or "-Xms3g -Xmx3g" to it.
Or, in the WebLogic Admin Server console, click on the server, go to the "Server Start" tab and specify the JVM parameters "-Xms3072m -Xmx3072m" or "-Xms3g -Xmx3g".
Check the .out file to verify that the change has taken effect.
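
One way to do that last verification is to pull the heap flags out of the server's start-up output or its process command line. The sketch below is only illustrative: heap_flags is a made-up helper name, and it relies on grep -o, which GNU and BSD grep provide but POSIX does not require.

```shell
# Hypothetical helper: prints every -Xms/-Xmx flag found in the text on stdin,
# one per line, so you can confirm that both are present and equal.
heap_flags() {
    grep -oE -e '-Xm[sx][0-9]+[kKmMgG]?'
}

# Usage (the .out file name is an example; use your own server's file):
#   heap_flags < AdminServer.out
#   ps -ef | grep '[w]eblogic.Server' | heap_flags
```

If the same flag shows up more than once with different values, it is worth checking which script sets the later one.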


3       Err: Could not obtain an exclusive lock to the embedded LDAP data files directory



<31/05/2012 4:32:08 PM EST> <Error> <EmbeddedLDAP> <BEA-171519> <Could not obtain an exclusive lock to the embedded LDAP data files directory: /hta/home/fusion/osb_home/osb_install1/user_projects/domains/vhaosb-dev2/servers/AdminServer/data/ldap/ldapfiles because another WebLogic Server is already using this directory. Ensure that the first WebLogic Server is completely shutdown and restart the server.>
<31/05/2012 4:32:16 PM EST> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to FAILED>
<31/05/2012 4:32:16 PM EST> <Error> <WebLogicServer> <BEA-000383> <A critical service failed. The server will shut itself down>
<31/05/2012 4:32:16 PM EST> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to FORCE_SHUTTING_DOWN>

3.1       Root Cause:

Sometimes the .lok file is not removed automatically, even after a proper or a forceful shutdown. When the user then tries to restart the server, the file is still present and the server fails to start.

3.2       Solution:

Navigate to the ldapfiles location under the server and delete the "EmbeddedLDAP.lok" file from there. The location would be e.g. /hta/home/fusion/osb_home/osb_install1/user_projects/domains/vhaosb-dev2/servers/AdminServer/data/ldap/ldapfiles
Then restart the WebLogic Server.
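
The clean-up above can be sketched as a small shell function. remove_ldap_lock is a hypothetical name, the server directory is passed in as an argument rather than assumed, and the sketch presumes you have already confirmed that no WebLogic process for this server is still running.

```shell
# Hypothetical helper: deletes a stale EmbeddedLDAP.lok for one server.
# $1 = server directory, e.g. <domain>/servers/AdminServer
# Only run this once you are sure the server process is really gone.
remove_ldap_lock() {
    lock="$1/data/ldap/ldapfiles/EmbeddedLDAP.lok"
    if [ -f "$lock" ]; then
        rm -f "$lock" && echo "removed $lock"
    else
        echo "no lock file at $lock"
    fi
}

# Usage: remove_ldap_lock /path/to/domain/servers/AdminServer
```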

4         Err: The persistent store "_WLS_AdminServer" could not be deployed:


<01/06/2012 10:56:47 AM EST> <Error> <Store> <BEA-280061> <The persistent store "_WLS_AdminServer" could not be deployed: weblogic.store.PersistentStoreException: [Store:280105]The persistent file store "_WLS_AdminServer" cannot open file _WLS_ADMINSERVER000000.DAT.
weblogic.store.PersistentStoreException: [Store:280105]The persistent file store "_WLS_AdminServer" cannot open file _WLS_ADMINSERVER000000.DAT.
                at weblogic.store.io.file.Heap.open(Heap.java:312)
                at weblogic.store.io.file.FileStoreIO.open(FileStoreIO.java:104)
                at weblogic.store.internal.PersistentStoreImpl.recoverStoreConnections(PersistentStoreImpl.java:413)
                at weblogic.store.internal.PersistentStoreImpl.open(PersistentStoreImpl.java:404)
                at weblogic.store.admin.AdminHandler.activate(AdminHandler.java:126)
                Truncated. see log file for complete stacktrace

<01/06/2012 10:56:47 AM EST> <Critical> <WebLogicServer> <BEA-000362> <Server failed. Reason:

4.1       Root cause:

This error mostly comes up when WebLogic is not able to read the file store .DAT file, which it uses for its internal working. Seven WebLogic subsystems write information to this file: Diagnostic Service, JMS Messages, JTA Transaction Log (TLOG), Web Services, EJB Timer Services, Store-and-Forward (SAF) Service Agents and Path Service.
The file can also get corrupted during a forceful shutdown when users use the kill -9 command, or for some other reason.

4.2       Solution:


1. cd $DomainHome/servers/AdminServer/data/store
2. find . -name "*.DAT"
3. Verify that the file name in the result matches the one in the error message.
4. Rename this file and move it from this directory to some other directory.
5. Find "EmbeddedLDAP.lok" and "AdminServer.lok" as well and remove them.
6. Check the port using netstat -an | grep <WebLogic server port>. There should not be any open connection to this port.
7. Start your WebLogic Server either via the WebLogic script or via Node Manager.
Note: The .DAT file is a very important file and, on a production system, contains business data as well. Please take a backup of this file before doing any operation on it, so that it can be analysed later to complete those transactions.
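
Since the .DAT file must be kept, moving it aside is safer than deleting it. Here is a rough sketch of steps 2 to 4 as one function. The name move_store_aside and the backup folder naming are my own assumptions, and the backup location should sit outside the store directory.

```shell
# Hypothetical helper: moves every .DAT file out of a store directory into a
# timestamped backup directory instead of deleting it.
# $1 = store directory, e.g. $DomainHome/servers/AdminServer/data/store
# $2 = directory (outside the store) under which the backup folder is created
move_store_aside() {
    store_dir="$1"
    backup_dir="$2/store-backup-$(date +%Y%m%d%H%M%S)"
    mkdir -p "$backup_dir" || return 1
    # Quote the pattern so the shell does not expand it before find sees it.
    find "$store_dir" -type f -name '*.DAT' -exec mv {} "$backup_dir"/ \;
    # Print where the files went so they can be analysed later.
    echo "$backup_dir"
}

# Usage: move_store_aside "$DomainHome/servers/AdminServer/data/store" /some/backup/area
```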

5         ERR: The loading of OPSS java security policy provider failed due to exception


<Jun 5, 2012 1:37:53 PM EST> <Critical> <WebLogicServer> <BEA-000386> <Server subsystem failed. Reason: weblogic.security.SecurityInitializationException: The loading of OPSS java security policy provider failed due to exception, see the exception stack trace or the server log file for root cause. If still see no obvious cause, enable the debug flag -Djava.security.debug=jpspolicy to get more information. Error message: JPS-02592: Failed to push ldap config data to libOvd for service instance "idstore.ldap" in JPS context "default", cause: org.xml.sax.SAXException: Error Parsing at line #1: 1.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; <Line 1, Column 1>: XML-20108: (Fatal Error) Start of root element expected.

5.1       Root cause:

I still need to find the exact root cause of this problem. For now I am not giving any details about why this error occurs, only the solution I followed:

Note: There is similar error post, please refer this as well, it might fix your problem -
https://ali.education/tag/jps-02592/

5.2       Solution


Step1: Delete the "tmp" folder from the location $Domain/servers/AdminServer.
Step2: Find all the .lok and .DAT files and delete them.
find <location up to $Domain/servers/AdminServer> -name "*.lok"
find <location up to $Domain/servers/AdminServer> -name "*.DAT"
Use the "rm" command to delete the files returned by the find commands above.
Step3: Restart the WebLogic Server.
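
Before running rm it helps to see exactly what would be deleted. The two find commands can be combined into a single listing, as in this small sketch (list_lok_and_dat is a name I made up for illustration):

```shell
# Hypothetical helper: lists all .lok and .DAT files below a directory,
# sorted, without deleting anything.
# $1 = directory to search, e.g. $Domain/servers/AdminServer
list_lok_and_dat() {
    find "$1" -type f \( -name '*.lok' -o -name '*.DAT' \) | sort
}

# Usage: list_lok_and_dat "$Domain/servers/AdminServer"
```

Review the list, back up anything you may need, and only then remove the files.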

6         Err: Boot identity not valid

<Jun 7, 2012 8:21:37 PM EST> <Critical> <WebLogicServer> <BEA-000386> <Server subsystem failed. Reason: weblogic.security.SecurityInitializationException: Authentication denied: Boot identity not valid; The user name and/or password from the boot identity file (boot.properties) is not valid. The boot identity may have been changed since the boot identity file was created. Please edit and update the boot identity file with the proper values of username and password. The first time the updated boot identity file is used to start the server, these new values are encrypted.

6.1       Root cause:

The boot.properties file contains the username and password in encrypted format.
There can be various reasons for the above error:
1)      The WebLogic user password was changed.
2)      UNIX user access was changed for the folder where boot.properties is located.
3)      The boot.properties file got corrupted.
4)      Someone modified this file by mistake.
5)      I have seen scenarios where, when starting WebLogic from the command line (WLST or e.g. startWebLogic.sh), passing a wrong password for the weblogic user left boot.properties corrupted. It may have been a temporary bug that is resolved by now.
6)      The nm_password.properties file might have got corrupted. It is normally located at $Domain/config/nodemanager/

6.2       Resolution:


The usual solution is to rename the existing boot.properties file, create a new file with the same name, provide the WebLogic username and password in clear text, and restart the server. If the server starts successfully, the content of this file will be encrypted.

Most of the time the server restarts without a problem, but sometimes I have noticed that WebLogic does not pick up the information from this file even when the username and password in it are correct.

If that is the case, delete this file and start WebLogic using the ./startWebLogic.sh script without nohup. It will ask for the WebLogic password during startup. Provide the password, and it will create the boot.properties file in the background and the server will come up. Once the server is up, you can shut it down and start it again as a background process using the "nohup" command.

Note: Sometimes boot.properties is not created even when running in Development mode. In my case, after changing from Production mode to Development mode (the config.xml entry "<production-mode-enabled>false</production-mode-enabled>"), creating boot.properties manually with the username and password in the correct format worked for me.

e.g.
username=weblogic
password=Welcome1

Also, if you are starting a WebLogic Managed Server through Node Manager, via either the Admin Console or a WLST script, then depending on your Node Manager configuration your nm_password.properties file might have got corrupted too. This file has exactly the same format as boot.properties and needs the same update. Simply rename the old nm_password.properties file, create a new one with the same name, and provide the WebLogic username and password in clear text, just as for boot.properties. It should work.
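
The rename-and-recreate step can be scripted so that no stray blank lines or trailing spaces end up in the file. This is a sketch under my own assumptions: write_boot_properties is a hypothetical name, and the directory that holds boot.properties is passed in rather than assumed.

```shell
# Hypothetical helper: writes a fresh clear-text boot.properties and keeps the
# old file as boot.properties.bak.
# $1 = directory that holds boot.properties, $2 = username, $3 = password
write_boot_properties() {
    file="$1/boot.properties"
    mkdir -p "$1" || return 1
    if [ -f "$file" ]; then
        mv "$file" "$file.bak"
    fi
    # printf writes exactly two lines, with no trailing spaces or blank lines.
    printf 'username=%s\npassword=%s\n' "$2" "$3" > "$file"
    # The password is in clear text until the server encrypts it, so restrict access.
    chmod 600 "$file"
}

# Usage: write_boot_properties /path/to/the/boot/properties/folder weblogic Welcome1
```

Keep in mind that a password typed on the command line can end up in your shell history.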

7         Err: Could not start JTAMT on local server because

weblogic.cluster.migration.MigrationException: Could not start JTAMT on local server because it could not be deactivated on the current host.
                at weblogic.transaction.internal.TransactionRecoveryService.failbackIfNeeded(TransactionRecoveryService.java:571)
                at weblogic.transaction.internal.TransactionRecoveryFailBackService.start(TransactionRecoveryFailBackService.java:23)
                at weblogic.t3.srvr.SubsystemRequest.run(SubsystemRequest.java:64)
                at weblogic.work.ExecuteThread.execute(ExecuteThread.java:252)
                at weblogic.work.ExecuteThread.run(ExecuteThread.java:221)
Caused by: java.lang.SecurityException: Method 'deactivateJTA' cannot be invoked without administrator access
                at weblogic.rjvm.ResponseImpl.unmarshalReturn(ResponseImpl.java:237)
                at weblogic.rmi.internal.BasicRemoteRef.invoke(BasicRemoteRef.java:223)
                at weblogic.cluster.migration.RemoteMigratableServiceCoordinatorImpl_1036_WLStub.deactivateJTA(Unknown Source)
                at weblogic.cluster.migration.JTAMigrationHandler.deactivateJTA(JTAMigrationHandler.java:82)
                at weblogic.cluster.migration.JTAMigrationHandler.deactivateJTA(JTAMigrationHandler.java:99)
                at weblogic.transaction.internal.TransactionRecoveryService.failbackIfNeeded(TransactionRecoveryService.java:557)
                ... 4 more
Caused by: java.lang.SecurityException: Method 'deactivateJTA' cannot be invoked without administrator access

7.1       Root cause:

I still need to find the exact root cause of this problem. For now I am not giving any details about why this error occurs, only the solution I followed:

7.2       Resolution:

I was getting this error while restarting a Managed Server.
I searched for the error and found a couple of forums, most of which said to delete the *.lok files and restart the servers. I searched for *.lok files under aserver/mserver but didn't find any. Then I stopped the Admin Server as well and tried to restart it, and that failed too, with an error related to disk space: there was no space left on the disk.
So I deleted lots of *.log and *.out files etc. to free some space. I used the "df -h" and "du -sh" commands to find the locations taking up the most space, deleted the unused files and restarted both the Admin and Managed servers, and the "Could not start JTAMT on local server" error went away.
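
The du hunt for what is eating the disk can be wrapped in a small function, sketched below. The name largest_entries is hypothetical, and it uses du -sk rather than du -sh so that the sizes sort numerically.

```shell
# Hypothetical helper: lists the N largest entries directly under a directory,
# biggest first, with sizes in kilobytes.
# $1 = directory to inspect, $2 = how many entries to show (default 10)
largest_entries() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Usage: largest_entries /path/to/domain/servers 5
```

Run it on the top-level directory first, then again on whichever entry comes out largest, until you reach the files that can safely be removed.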

8         ERR: EmbeddedLDAP  java.lang.ArrayIndexOutOfBoundsException

During Admin Server restart we were getting the error below continuously and the Admin Server was not coming up:
####<Nov 5, 2012 4:47:05 AM EST> <Critical> <EmbeddedLDAP> <vans075007> <AdminServer> <VDE Replication Thread> <<anonymous>> <> <> <1352051225634> <BEA-000000> <java.lang.ArrayIndexOutOfBoundsException: 0

8.1       Root cause:


This error occurs when the changelog.data and changelog.index files, located in servers/AdminServer/data/ldap/ldapfiles, get corrupted.
Below is a list of scenarios in which these files can get corrupted:
  • While the Admin Server was writing an LDAP entry to the changelog, it was interrupted by a forced shutdown, which left the changelog partially updated.
  • When the Admin Server rebooted, it attempted to process the changelog (i.e., send the entries to the managed servers), but encountered the partially updated changelog.
  • The partial update was an entry that had been assigned a change number, but there was no data for the entry.
  • The changelog writer was interrupted between the index update and the data update, while this update was in a synchronized method.

8.2       Resolution


Take a backup of the existing "ldap" folder, delete the two files changelog.data and changelog.index from servers/AdminServer/data/ldap/ldapfiles, and restart the Admin Server.
Both files will be created again and the server will start successfully.
The changelog.data file is used in WebLogic Server (WLS) to store LDAP information regarding users, groups, roles and policies. The embedded LDAP server has an index file and a data file. Each entry in the data file is pointed to by an index file entry, and each index file entry is keyed by an integer that identifies the entry.
For more details about the issue and the solution, please refer to Oracle Note DOC ID: 1325978.1
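
The backup-then-delete sequence can be sketched as a function that refuses to delete anything if the backup fails. The name reset_ldap_changelog and the backup folder naming are assumptions of mine, not anything WebLogic provides.

```shell
# Hypothetical helper: backs up the ldap folder, then removes the two
# changelog files so the Admin Server can recreate them on the next start.
# $1 = server directory, e.g. <domain>/servers/AdminServer
reset_ldap_changelog() {
    ldap_dir="$1/data/ldap"
    backup="$1/data/ldap.bak.$(date +%Y%m%d%H%M%S)"
    # Stop here if the copy fails; never delete without a backup.
    cp -R "$ldap_dir" "$backup" || return 1
    rm -f "$ldap_dir/ldapfiles/changelog.data" "$ldap_dir/ldapfiles/changelog.index"
    # Print the backup location for later reference.
    echo "$backup"
}

# Usage (with the Admin Server stopped): reset_ldap_changelog /path/to/domain/servers/AdminServer
```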


Common resolution

Following is a list of general actions we can perform in a test environment to fix server start-up problems:
Step1: Try deleting all the .lok files.
Step2: Try deleting the .DAT file inside $Domain/servers/AdminServer.
Step3: Take a backup of the "tmp" folder inside the servers folder and delete the "tmp" folder.
Step4: Take a backup of the "data" folder inside the servers folder and delete the "data" folder.
Step5: Change the port number if some port is already in use.


17 comments:

  1. Hi manish ,

    That's a good job. i have the same issue as mentioned above ie 3 one..but i did have any file as mentioned there still getting the same. please give me the solution for it ..

  2. hi manish,
    while restarting weblogic i am not able to see the output message as Running mode in nohup.out but when i grep to weblogic i can see that weblogic is running ..

  3. This comment has been removed by a blog administrator.

  4. Hi Manish,

    I am getting this error while starting weblogic 10.3.5

    "Either the server process could not be started or it terminated abruptly. Check the start script."

    How to resolve this? Please provide solution if you know.

    Thanks

  5. Hi,
    I am getting below error while setting the WLST environment, could you please suggest me.

    @soahost1vhn0 bin]$ java weblogic.WLST
    Exception in thread "main" java.lang.NoClassDefFoundError: weblogic/WLST
    Caused by: java.lang.ClassNotFoundException: weblogic.WLST
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:264)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:332)
    Could not find the main class: weblogic.WLST. Program will exit.

  6. while doing nmConnect i was get the below error could you suggest please.

    Traceback (innermost last):
    File "", line 1, in ?
    File "", line 1408, in nmConnect
    File "", line 1847, in raiseWLSTException
    WLSTException: Error occured while performing nmConnect : Cannot connect to Node Manager. : Connection refused. Could not connect to NodeManager. Check that it is running at localhost:5556.
    Use dumpStack() to view the full stacktrace

  7. Hi Manish,

    My server is throwing error as below:

    Jun 24, 2015 7:12:09 AM EDT>


    I had removed the credentials from embedded ldap section but not its throwing above error and asking me to provide embedded LDAP server credentials , I don't know where to provide these, please help me

    Replies
    1. Hi Karanveer, you said above error? which error you are referring?


  8. Hi,

    After changing the weblogic admin password, I'm getting error while starting Admin server.. The error is Err: Boot identity not valid
    <Server subsystem failed. Reason: weblogic.security.SecurityInitializationException: Authentication denied: Boot identity not valid; The user name and/or password from the boot identity file (boot.properties) is not valid. The boot identity may have been changed since the boot identity file was created. Please edit and update the boot identity file with the proper values of username and password. The first time the updated boot identity file is used to start the server, these new values are encrypted...Please help.

  9. OK, this often happens because the boot.properties file is very sensitive. Delete the existing boot.properties file after taking a backup.
    Next, create a new boot.properties file using the vi editor and put the password in clear text
    -
    username=weblogic
    password=

    Make sure there are no empty lines and no spaces at the end of a line, save the file and try to restart the Admin Server.

    Hopefully it should start. If it still does not start, delete the file again and start the Admin Server in PuTTY without running it as a background process. While the Admin Server is starting it will ask you to enter the username and password. Enter them and it should create a boot.properties file in the background under the security folder.

    Try the above two methods and let me know how it goes. If it works, please post your feedback here.

    Regards..
    Manish

  10. Sir, Regarding
    5 ERR: The loading of OPSS java security policy provider failed due to exception

    Reason and solution can be found @ https://ali.education/tag/jps-02592/

    please update your blog as it will be easy for other users..
    Thanks,
    Amit

  11. weblogic.store.PersistentStoreFatalException: [Store:280041]An record was found in the file store with an invalid version of 5

    Sir, please tell me how to resolve this issue

  12. Hello Sachin, it is hard to comment without the full details of the issue, but the problem is related to the file system. If you are sure there are no pending transactions, you can delete the file store and restart the WLS server, as it will recreate the file store while booting. There is no straightforward tool to read a .dat file store file and delete the affected transactions.

  13. Hi Manish,
    This is Vinay Kumar.
    when iam giving correct username and password, server is not comming to running mode. It is showing forcefully shutting down.What would be the reason.

    Replies
    1. Hi,

      Where are you giving the correct username and password? Is it on the console while the server is starting, or via the boot.properties file?

      Whatever the case, to find out what the error is, check the adminserver.out file and also increase the logging level to "TRACE". You can modify config.xml directly, without the Admin Console, to do that: find the AdminServer entry in config.xml and change its logging level, and then you should be able to see what the error is.

      rds.
      Manish

  14. Nice blog Manish!
    My name is kumar and facing some issues with my middleware environment for Oracle SOA & BPM Suite. Do you have any idea about below issues-
    1. Abrupt shutdown of Admin Server
    2. Abruptly restart of SOA Server
    3. SOA server consuming a lot of memory(sometimes 100% RAM consumption)
    Any idea or suggestions to fix this?

    Replies
    1. Hi, it must be related to stuck threads. Take a thread dump and use a thread dump analysis tool to investigate which threads are getting blocked, then look for a solution related to those threads. Also raise a ticket with Oracle Support. They have comprehensive tools to analyze thread dumps and can advise you on the root cause of the server failure. I have used ThreadLogic earlier, and it is a good tool for analyzing thread dumps. You can google how to take a thread dump, it is very easy. Also enable gc.log using a JVM parameter and make sure GC is working and cleaning up your dead objects.
