Errata/Troubleshooting for NetSolve, version 1.4
History of NetSolve releases
Version 1.0 : ??
Version 1.1 : January, 1998
Version 1.2 : February 15, 1999
Version 1.3beta1-6: sporadically in 2000
Version 1.4 : July 31, 2001
To better address the needs of NetSolve users, we are creating this
Errata/Troubleshooting webpage. This file explains the likely causes
of specific NetSolve run-time error messages, lists known deficiencies
in the NetSolve system, and provides up-to-date information on reported
bugs and on how to download patches to NetSolve.
This file contains:
- Errata for NetSolve Users' Guide
- Errata for NetSolve, Version 1.4
- Bug Report Checklist
- Troubleshooting Run-Time Error Messages for NetSolve, Version 1.4
- Known Deficiencies in NetSolve, Version 1.4
NetSolve has been tested on the following architectures:
- Pentium Linux 2.2.15
- Solaris 2.7 and 2.5
- AIX 4.3.3.0
- Tru64/OSF1 V5.1
- Alpha Linux 2.2.14-6.0
- IRIX 6.5
- Windows 2000 (client interface)
In addition, testing was performed using Mathematica version 4.0 for
Linux and MathLink version 3.8, Matlab Version 6.0.0.88 Release 12
(Unix and Windows versions), NWS release 2.0, IBP version 1.0.1,
PETSc 2.0.29, Aztec version 2.1, SuperLU version 1.1, ScaLAPACK
version 1.6, and Java 1.2.
Errata in NetSolve Users' Guide
No known errata at this time.
Errata in NetSolve, version 1.4
No known errata at this time.
Bug Report Checklist
When reporting a suspected bug to the netsolve mailing alias, please
supply the following information. These are the first questions that
we will ask.
- On what type of machine did you install NetSolve (OS and compiler)?
- What is the exact configure line used to configure NetSolve (config.status)?
- Did you compile client only or client/agent/server?
- Did you send us the cut-and-paste of the error message encountered?
- If the error occurred at runtime, did you consult the "Troubleshooting" section of this Errata file?
- If the error occurred at runtime, did you check for more information
in the nsagent.log and nsserver.log files? What was the text found
in these log files?
Troubleshooting Run-Time Error Messages in NetSolve, version 1.4
If an error occurs during the invocation of NetSolve, a diagnostic
run-time error message is issued. Calls made through the C or
Fortran interfaces also return an error code.
The error codes and run-time error messages are listed in
Chapter 24 of the NetSolve Users' Guide. A given message may have
several possible causes.
If one of these error messages occurs,
the user should first check the agent and server log files,
$NETSOLVE_ROOT/nsagent.log and
$NETSOLVE_ROOT/nsserver.log, respectively.
These files may contain more information to clarify the reason
for the error message.
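As a convenience, the last few lines of both log files can be pulled out with a short script. The following is a minimal sketch in Python, not part of NetSolve; the helper names `tail_lines` and `recent_log_lines` are hypothetical, and the only details taken from this file are the log file names and the NETSOLVE_ROOT variable.

```python
import os
from collections import deque
from pathlib import Path

# Log file names as given in this file; both are expected under $NETSOLVE_ROOT.
LOG_NAMES = ("nsagent.log", "nsserver.log")


def tail_lines(path, count=20):
    """Return the last `count` lines of a text file, or [] if it is missing."""
    path = Path(path)
    if not path.is_file():
        return []
    with path.open("r", errors="replace") as handle:
        # A bounded deque keeps only the most recent `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


def recent_log_lines(root, count=20):
    """Map each log file name to its last `count` lines under `root`."""
    return {name: tail_lines(Path(root) / name, count) for name in LOG_NAMES}


if __name__ == "__main__":
    # Falls back to the current directory when NETSOLVE_ROOT is not set.
    root = os.environ.get("NETSOLVE_ROOT", ".")
    for name, lines in recent_log_lines(root).items():
        print(f"== {name} ==")
        for line in lines:
            print(line)
```

The output of such a script is also a reasonable thing to paste into a bug report, since the log text is one of the first things asked for.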
- NS: unknown problem
  Possible causes:
  - The user has requested a problem that is not serviced by
    any of the available servers. To check for this possibility,
    the user can invoke the NS_problems command and check whether
    the requested problem is included in the list of available
    services. To expand a server's capabilities, the user should
    refer to Chapter 13 of the NetSolve Users' Guide.
- NS: no available server
  Possible causes:
  - A service zombie, i.e., a service process that has gone awry.
    This can occur if a service hangs or is abnormally terminated.
    The process can be seen using ps -ef or ps -augx and must be
    killed using kill -9 pid.
  - The user could have requested a problem that is not serviced by
    any of the available servers. To check for this possibility,
    the user can invoke the NS_problems command and check whether
    the requested problem is included in the list of available
    services.
- NS: impossible to bind to port
  Possible causes:
  - This error usually occurs when the user tries to start an
    agent on a machine on which an agent is already running. The
    existing process could be owned by the user or by another user.
  - Alternatively, another user may be running a process on the
    port requested for the agent process.
- NS: Cannot contact agent
  Possible causes:
  - This error occurs if the agent specified by the NETSOLVE_AGENT
    environment variable conflicts with the @AGENT specified in
    the $NETSOLVE_ROOT/server_config file.
  - Alternatively, the agent may not be responding (for whatever
    reason). The user can run the NS_config command to list the
    reachable agents/servers in the NetSolve configuration, or
    simply issue the NS_killall command to kill the agent and
    server and then restart the processes.
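Two of the causes listed above can be checked before restarting anything: a port that is already taken, and a NETSOLVE_AGENT value that disagrees with the @AGENT entry in server_config. The sketch below is illustrative Python, not a NetSolve tool. The function names are hypothetical, and it assumes that an agent entry in server_config is a line starting with `@AGENT` followed by a colon or whitespace and a host name. The agent port is not given in this file, so it must be supplied on the command line.

```python
import os
import re
import socket
import sys
from pathlib import Path

# Assumed layout of an agent entry: "@AGENT:host" or "@AGENT host".
AGENT_LINE = re.compile(r"^\s*@AGENT\s*[:\s]\s*(\S+)")


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to `port` right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use" or "permission denied".
            return False
    return True


def configured_agents(config_text):
    """Return the host names found on @AGENT lines of a server_config text."""
    hosts = []
    for line in config_text.splitlines():
        match = AGENT_LINE.match(line)
        if match:
            hosts.append(match.group(1))
    return hosts


def agent_conflict(env_agent, config_text):
    """Return True when NETSOLVE_AGENT is set and matches no @AGENT entry."""
    hosts = [host.lower() for host in configured_agents(config_text)]
    return bool(env_agent) and bool(hosts) and env_agent.lower() not in hosts


if __name__ == "__main__":
    config = Path(os.environ.get("NETSOLVE_ROOT", ".")) / "server_config"
    text = config.read_text(errors="replace") if config.is_file() else ""
    env_agent = os.environ.get("NETSOLVE_AGENT", "")
    if agent_conflict(env_agent, text):
        print(f"NETSOLVE_AGENT={env_agent} differs from @AGENT {configured_agents(text)}")
    # Optional first argument: the agent port to probe.
    if len(sys.argv) > 1 and not port_is_free(int(sys.argv[1])):
        print(f"port {sys.argv[1]} is already in use")
```

The comparison is a plain case-insensitive string match, so two different names for the same machine (for example a short name and a fully qualified name) would be reported as a conflict.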
Known Deficiencies in NetSolve, version 1.4
The following caveats exist in the NetSolve code, and will
be fixed in an upcoming release.
- Assumes $NWS_DIR/bin/ARCH/ is in your path if you enable
NWS (configure --with-nwsdir=NWS_DIR) in NetSolve.
- Requires PETSc, Aztec, and ITPACK to all be installed
in order to use the sparse_iterative_solve PDF.
Likewise, requires MA28 and SuperLU to both be installed
in order to use the sparse_direct_solve PDF.
The sparse wrappers need to be modified so that each
PDF can be enabled if only one of the libraries is installed.
- Error messages are printed inconsistently across the C, Fortran,
Matlab, Mathematica, and Windows client interfaces.
Some messages are missing the "NS:" prefix, and messages from the
Windows client interface are still prefixed with "NetSolve:".
- Missing run-time error message for NetSolveUnknownHandle (-40)
error in src/CoreFunctions/netsolveerror.c.
- Mathematica ScaLAPACK interface fails when RHS > 1.
The transpose routine is suspect when the matrix is not square.
- "Invalid argument" message sent to stderr (nsserver.log)
when invoking 'sparse_iterative_solve', 'ITPACK', ...
coming from SSORI from ITPACK. Needs further investigation.
- When running multiple servers within the same tree,
if a log file isn't explicitly chosen, the newest server
will take over the log file and you won't get logs
of messages from other servers. You should explicitly
direct the log of each server to a unique file.
Open question: should all server log information be combined
into one log file, or should we maintain separate logs for
each server?
- There is currently no limit on the size of the nsserver.log
and nsagent.log files. We should incorporate some mechanism
to limit the size of those files, and have it start overwriting
the file at a certain point.
- benchmarking anomaly. NetSolve/src/Server/kflops.c.
- check_server timing bug.
- unexplained anomalous behavior with Workload reporting.
- pdfgui requires Java 1.2 or later.
- clean up compiler warning messages.
- memory leaks.
- Job submission is not case-insensitive for the names 'PETSC',
'AZTEC', 'ITPACK', 'SUPERLU', and 'MA28'. Just need to do a
strcasecmp() in NetSolve/problems/sparse_iterative_solve
and sparse_direct_solve.
- NetSolve/src/Examples/sparse_testers/itpack_tester/
is referencing the old interface to 'itpack_solve'
and the ../itpack_tester/Makefile is hardwired
for gcc.
- The size of the problem_init.o grows with the number of
pdf services enabled. Depending upon the amount of
memory available on a given architecture, it may be
possible that not all pdfs can be enabled.
- ARPACK pdf was not tested for this release. ARPACK enablement
requires Chao Yang's SPEIG distribution to be included with
the standard ARPACK distribution.
- @COMP limited functionality in PDFs. Its functionality
needs to be expanded.
- @COMPLEXITY expression is too limited. We need to be able
to express fractions (e.g., 2/3 n^3), and thus need 3 integers
to be specified (numerator, denominator, and exponent) instead
of the existing 2 integers. It would also be helpful to have
a "memory" complexity as well as the "flop" complexity to more
easily refine the scheduling process.
- IRIX WORKLOAD SET ARBITRARILY HIGH:
The server workload numbers reported to the agent for use in scheduling are
statically set to 58 for any IRIX platform. This high value ensures that
any IRIX machine in a NetSolve grid will rarely be assigned to service a
request, unless it is the only server configured to service a particular
problem. And obviously, the agent has no true notion of the load on the
machine. All IRIX boxes would look alike to the agent.
- IRIX Matlab: "make matlab" does not build on IRIX. The compile
reports duplicate symbol warnings and then fails. This
is under investigation.
- IBP enablement within NetSolve was tested with IBP v1.0.1,
which is no longer available on the IBP website. NetSolve should work
fine with IBP v1.0.2, but we have not yet tested with that
version.
- Windows client software currently only works with Windows 2000.
It will not work on Windows 98 or earlier.
- The port used by the NetSolve Agent is currently hardwired to
that specified in NetSolve/include/general.h. In a future release,
the port number for the agent will be configurable when the agent
is started.
- If more than one agent is running in a NetSolve pool, the
remaining agents do not properly update their information when an
agent is taken down, so reporting becomes inconsistent.
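
Two of the deficiencies above are easy to picture in code: the three-integer @COMPLEXITY form (numerator, denominator, exponent) and the case-insensitive match on solver names. The following is a rough Python sketch of the intended behavior only. NetSolve itself is written in C, where the name match would be the strcasecmp() call mentioned above. The function names `flop_estimate` and `match_solver` are hypothetical and do not exist in NetSolve.

```python
from fractions import Fraction

# Solver names as listed in the case-insensitivity item above.
KNOWN_SOLVERS = ("PETSC", "AZTEC", "ITPACK", "SUPERLU", "MA28")


def flop_estimate(numerator, denominator, exponent, n):
    """Evaluate (numerator/denominator) * n**exponent exactly."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return Fraction(numerator, denominator) * n ** exponent


def match_solver(name):
    """Return the canonical solver name for `name`, ignoring case, or None."""
    wanted = name.upper()
    return wanted if wanted in KNOWN_SOLVERS else None


if __name__ == "__main__":
    # The 2/3 n^3 example from the @COMPLEXITY item, for n = 300.
    print(flop_estimate(2, 3, 3, 300))
    print(match_solver("SuperLU"))
```

Using an exact fraction keeps a cost such as 2/3 n^3 from being rounded before it is compared against other estimates; a real implementation would more likely compute it in floating point.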