OpenOffice.org is a multithreaded program. The threading model is
preemptive. This brings both benefits and drawbacks. One
benefit is concurrency within one process without cooperation. For
example, the user interface stays active during document loading if
loading is done in a separate thread. Another benefit is a possible
speedup on multiprocessor systems.
On the other hand, we need synchronization if we
run multithreaded. The drawbacks are that synchronization is
time-consuming and error-prone (deadlocks, inconsistencies and races).
One major problem is old code that is not thread-safe.
I would like to make a few statements about inconsistency. Most calls to
an API may result in an inconsistent state. This seems strange,
because it doesn't occur often. The reason is that an API normally
specifies only the changes to the given parameters, the result, and so
on. The API doesn't specify the side effects of the call (e.g. memory
usage). In fact, an API is used explicitly to abstract from those side
effects, because it shouldn't restrict the implementations. The result
is that the implementation is free to do anything in addition to what
is specified. One consequence might be that the values passed to the
API are changed (of course they have to be reference arguments). This
normally results in unspecified behavior. Memory management is a good
example of side effects. If the process isn't able to allocate more
memory, it could call a callback to request all caches to release
their memory. This side effect may occur in every memory allocation
call. Think about the problems which may occur if you try to put a new
entry into the cache and the cache is deleted because of such a memory
request.
So you
may be inconsistent under many circumstances. The reason why we
don't have much trouble is that these kinds of side effects are
rare. Normally it is an implementation goal to avoid side effects.
In 1989 we started the StarWriter development. At that time multithreading wasn't popular, and Unix, Mac and MS Windows didn't support it. So we started with a single-threaded application. Years later the internet became more important and we decided to load documents from the internet. The problem arose that we needed to stay active during the document loading phase. To do this we created a form of cooperative multithreading. Many commands are dispatched through a queue. When necessary, we called a dispatch function to execute the next command from the queue. We called this function during document loading. Of course it wasn't real cooperative multithreading, because we had only one stack. After doing this we got a lot of problems, because the number of side effects had increased. I don't know when we decided that preemptive multithreading was the right way to do things in parallel, but from then on we needed a synchronization mechanism to guard all our old code. First we put a callback command in the command queue, and the callback was only called from the queue dispatcher. The result was that only one thread was active in the old, non-thread-safe code. Unfortunately this was too long-winded. Then the SolarMutex was born. This mutex has to be locked if a thread wants to call into non-thread-safe code. This works fine (apart from MS Windows apartment problems) and was very simple.
All UNO components can be accessed from different threads at the same time.
The component has to ensure that its state is consistent. Please keep in mind that this applies only to the component itself, not to the combined state of several components. For example, suppose one component contains the properties FirstName and LastName. One thread sets FirstName to Markus and LastName to Meyer, and a second thread sets FirstName to Jörg and LastName to Budischewski. Then four results are possible: (Markus; Meyer), (Jörg; Budischewski), (Markus; Budischewski), (Jörg; Meyer). It is a bug if the UNO component may contain a mix within one name (Jörkus; Medischewski). It is also a bug if the component crashes.
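The FirstName/LastName example can be sketched in C++ as follows. This is only an illustration under assumptions: NameComponent, its methods and setNamesFromTwoThreads are hypothetical names, not UNO API, and a plain std::mutex stands in for whatever guard a real component would use. Each property is guarded on its own, so any of the four combinations may result, but never a mixed string.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical component; class and method names are illustrative only.
class NameComponent {
public:
    void setFirstName(const std::string& value) {
        std::lock_guard<std::mutex> guard(mutex_);
        firstName_ = value;
    }
    void setLastName(const std::string& value) {
        std::lock_guard<std::mutex> guard(mutex_);
        lastName_ = value;
    }
    // Returns copies, so a caller never reads a string while it is written.
    std::pair<std::string, std::string> getName() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return {firstName_, lastName_};
    }

private:
    mutable std::mutex mutex_;
    std::string firstName_;
    std::string lastName_;
};

// Two threads set both properties; the interleaving is not controlled.
std::pair<std::string, std::string> setNamesFromTwoThreads() {
    NameComponent component;
    std::thread first([&] {
        component.setFirstName("Markus");
        component.setLastName("Meyer");
    });
    std::thread second([&] {
        component.setFirstName("Jörg");
        component.setLastName("Budischewski");
    });
    first.join();
    second.join();
    std::pair<std::string, std::string> name = component.getName();
    assert(!name.first.empty() && !name.second.empty());  // both setters ran
    return name;
}
```

Calling setNamesFromTwoThreads() returns one of the four valid pairs. Which one depends on scheduling, and that is exactly the freedom the component-level consistency rule leaves open.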
A component must not be the cause of deadlocks or race conditions.
Unfortunately, the requirements to be consistent and to have no deadlocks or race conditions conflict. If you guard anything and do not release the guard when you communicate with other objects, you have a possible deadlock. The simplest example is a component A with its own thread calling component B, while component B with its own thread calls component A. If neither releases its guard, you have a deadlock.
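The A-calls-B, B-calls-A situation can be reproduced in a small C++ sketch. To keep the example from actually hanging, try_lock stands in for the blocking call into the other component, and two counters act as a tiny barrier so that both threads hold their own guard before either tries to enter the other. Component, waitFor, callOtherWhileGuarded and countSuccessfulCalls are hypothetical names chosen for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical component that owns exactly one guard.
struct Component {
    std::mutex guard;
};

// Spin until `counter` has reached `target` (a tiny barrier for the sketch).
void waitFor(const std::atomic<int>& counter, int target) {
    while (counter.load() < target) {
        std::this_thread::yield();
    }
}

// Keeps its own guard while trying to enter the other component.
// Returns true if the other component could be entered.
bool callOtherWhileGuarded(Component& own, Component& other,
                           std::atomic<int>& locked, std::atomic<int>& tried) {
    std::lock_guard<std::mutex> ownGuard(own.guard);
    ++locked;
    waitFor(locked, 2);  // now both components hold their own guard
    const bool entered = other.guard.try_lock();
    if (entered) {
        other.guard.unlock();
    }
    ++tried;
    waitFor(tried, 2);   // keep the own guard until both attempts are done
    return entered;
}

int countSuccessfulCalls() {
    Component a;
    Component b;
    std::atomic<int> locked(0);
    std::atomic<int> tried(0);
    bool aEnteredB = true;
    bool bEnteredA = true;
    std::thread threadA([&] { aEnteredB = callOtherWhileGuarded(a, b, locked, tried); });
    std::thread threadB([&] { bEnteredA = callOtherWhileGuarded(b, a, locked, tried); });
    threadA.join();
    threadB.join();
    assert(locked == 2 && tried == 2);
    return (aEnteredB ? 1 : 0) + (bEnteredA ? 1 : 0);
}
```

Neither call gets through, so countSuccessfulCalls() returns 0. With a blocking lock() in place of try_lock, both threads would wait for each other forever.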
In normal programming there are no generally usable mechanisms to
recover from deadlocks or to avoid them. Help exists only for special
cases (e.g. databases) or in some programming languages. An important
special case is knowing how a component synchronizes and
communicates (e.g. a component which does all its operations internally
and does not communicate with other components).
As a result, the most
flexible part of the problem is consistency. But what is
consistency? In general it is a specification (e.g. an invariant)
which defines the relation between objects, values, and so on. For
example, the relation is a * b = c (e.g. 4 * 5 = 20).
A solution for the a * b = c
example could be implemented
in the following way: Component A contains the properties a and b and
sets the result of the calculation (*) as property c in component C.
The properties a and b are set by the calculate operation. An
external observer requests the properties a and b from component A and
c from component C. The observer runs in its own thread, independently
of the calculate call. Component A has four different possibilities
to synchronize. You can see that none of them is a complete
solution.
No synchronization: If multiple threads call the calculate method, then a * b = c isn't true under all conditions.
Component A synchronizes the setting of properties a and b. Then a and b have the values given in one calculate call, but c has to be changed after the synchronization. So a * b = c isn't true under all conditions.
Component C synchronizes the setting of c. This guarantees that c holds the result of one previous operation and not a merged result of two or more operations, but a * b = c isn't true under all conditions.
Component A synchronizes itself and then Component C, then sets the properties a and b, calculates and sets c, and after that releases the synchronization. Then a * b = c is true under all conditions, but we might get a deadlock.
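The fourth possibility can be sketched in C++ for the special case where one piece of code has access to both guards. Under that assumption, std::lock acquires both mutexes with a deadlock-avoidance algorithm, so the lock order used by different callers does not matter. This only works if every party that needs both guards goes through std::lock; it does not help when component C is a foreign component with an unknown synchronization style. The struct layout and the function names are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical components with public guards, for illustration only.
struct ComponentA {
    std::mutex guard;
    int a = 0;
    int b = 0;
};
struct ComponentC {
    std::mutex guard;
    int c = 0;
};

// Holds both guards while a, b and c change.
void calculate(ComponentA& compA, ComponentC& compC, int a, int b) {
    std::unique_lock<std::mutex> lockA(compA.guard, std::defer_lock);
    std::unique_lock<std::mutex> lockC(compC.guard, std::defer_lock);
    std::lock(lockA, lockC);  // acquires both without risking a deadlock
    compA.a = a;
    compA.b = b;
    compC.c = a * b;
    assert(compA.a * compA.b == compC.c);  // invariant holds under both guards
}

// The observer deliberately names the guards in the opposite order.
bool observe(ComponentA& compA, ComponentC& compC) {
    std::unique_lock<std::mutex> lockC(compC.guard, std::defer_lock);
    std::unique_lock<std::mutex> lockA(compA.guard, std::defer_lock);
    std::lock(lockC, lockA);
    return compA.a * compA.b == compC.c;
}

bool invariantHoldsUnderLoad() {
    ComponentA compA;
    ComponentC compC;
    bool alwaysConsistent = true;  // written by the observer thread only
    std::thread writer1([&] {
        for (int i = 1; i <= 1000; ++i) calculate(compA, compC, i, 2);
    });
    std::thread writer2([&] {
        for (int i = 1; i <= 1000; ++i) calculate(compA, compC, 3, i);
    });
    std::thread observer([&] {
        for (int i = 0; i < 1000; ++i) {
            if (!observe(compA, compC)) alwaysConsistent = false;
        }
    });
    writer1.join();
    writer2.join();
    observer.join();
    return alwaysConsistent;
}
```

The observer sees a * b = c on every read because it takes both guards as well. An observer that reads A and C separately, as in the text, would still see inconsistent values.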
If we cannot get more control over synchronization (synchronizing A and C without a deadlock), the only way out is to change the specification or the structure. At this point you have different possibilities:
You change the structure and remove Component C. The specification holds and you calculate the result each time it is requested. This is the preferred way if the operation is cheap or rarely called. In the example, component C could be removed and the result could be returned by the calculate method.
You could specify that C contains the result of the
operation, without saying when this happens. You could
make the specification a little stronger by saying that the value is
set when the calling thread returns. It is recommended that you
provide the broadcast listener mechanism at least for the
result.
The consequence is that you cannot read a, b and c and
rely on the * relation between them being true (e.g. you might read
3 * 4 = 20). If you adopt the slightly stronger "set when the call
returns" specification AND component C is private to you, then you can
read the values and the relation is true. This is the benefit of the
small extra specification.
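The first option, removing Component C and deriving the result on request, could look like the following C++ sketch. Only one guard is left, so there is nothing to deadlock against, and the relation a * b = c holds for every read because c is never stored. Snapshot, read and demoCalculate are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical snapshot type: a, b and the derived result, read under one guard.
struct Snapshot {
    int a;
    int b;
    int c;
};

class ComponentA {
public:
    // Returns the result directly instead of storing it in a second component.
    int calculate(int a, int b) {
        std::lock_guard<std::mutex> guard(mutex_);
        a_ = a;
        b_ = b;
        return a_ * b_;
    }

    // c is recomputed on every request, so it cannot go stale.
    Snapshot read() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return Snapshot{a_, b_, a_ * b_};
    }

private:
    mutable std::mutex mutex_;
    int a_ = 0;
    int b_ = 0;
};

int demoCalculate() {
    ComponentA component;
    const int result = component.calculate(4, 5);
    const Snapshot s = component.read();
    assert(s.a * s.b == s.c);
    return result;
}
```

The price is that the multiplication runs on every read, which is why this option suits operations that are cheap or rarely called.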
Unfortunately only some hints and strategies are possible. I recommend reading additional books and doing some experiments with synchronization and threading.
Use reference counting or auto pointers instead of explicitly deleting an object. This prevents crashes, because objects which are still in use on the stack cannot be deleted. One rule is that the caller must keep the object alive until the call returns.
Avoid redundancy.
Write components which do not communicate with other ones.
Avoid unnecessary synchronous consistency specifications.
Do not unnecessarily block a calling thread and switch to another one. This increases the probability of deadlocks and you may run into problems with the dynamic context.
Synchronize only the necessary parts.
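The reference counting strategy from the list above can be illustrated with std::shared_ptr. This is a minimal single-threaded sketch with hypothetical names (Document, Registry, titleAfterClose); the Registry itself is not guarded, and OpenOffice.org code uses its own reference types rather than std::shared_ptr. The point shown is only the rule that the caller keeps the object alive until the call returns.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Document {
    std::string title;
};

// Hypothetical owner that may drop its reference at any time.
class Registry {
public:
    void open(const std::string& title) {
        current_ = std::make_shared<Document>(Document{title});
    }
    void close() { current_.reset(); }
    std::shared_ptr<Document> current() const { return current_; }

private:
    std::shared_ptr<Document> current_;
};

std::string titleAfterClose(Registry& registry) {
    // The caller takes its own reference before doing anything else.
    std::shared_ptr<Document> doc = registry.current();
    if (!doc) {
        return std::string();
    }
    registry.close();  // the owner drops its reference during the call
    assert(doc.use_count() == 1);  // in this sketch only the caller's reference is left
    return doc->title;  // still valid: the object lives until doc goes out of scope
}
```

With an explicit delete in close() and a raw pointer in the caller, the last line would read freed memory.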
Most of the OpenOffice.org code isn't really thread-safe. To avoid inconsistencies we use the SolarMutex to synchronize. If you look at the strategies, the only one we follow is that we do not block threads unnecessarily. We don't use reference counting in all cases, we don't know where the redundancies are, the components communicate with other ones in unknown ways, and we synchronize all parts.
If you think about avoiding deadlocks, you will see that one bad guy does not cause a deadlock if all other components are well implemented. This is the starting point for integrating the old code into the multithreading environment. The rules for the SolarMutex:
If you are synchronized with the SolarMutex, you normally
needn't release it. In some special cases (e.g. Visual Class
Library, SystemIntegration) you have to understand how the
SolarMutex is used and how to handle it if you call system
functions. To be a better player in the game, it is recommended that
you release the SolarMutex where possible. Remember that this is very
difficult to do, because you have to check all your usage
conditions. If the calling component also uses the SolarMutex, then
you release its synchronization too.
In the VCL library we
release the SolarMutex at least in the Reschedule call. I'm sure
that we are not safe in those calls under all conditions, but the
holes in the synchronization are opened in fewer cases and at known
points, where we did some investigation. The result was a lower
crash risk, and this is what a user sees.
If you write a component which blocks the thread, then it must not be called with a locked SolarMutex. Otherwise you have to ensure that no callback into SolarMutex-guarded code occurs.
If you call another component and you don't know the synchronization style of that component, you must release your own guard. Please keep the consequences for consistency in mind.
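The last rule, releasing your own guard before calling a component of unknown synchronization style, can be sketched as follows. A plain std::mutex stands in for the component's guard (this is not the SolarMutex API), and Counter, Listener and incrementWithReentrantListener are hypothetical names. The component copies what the outgoing call needs while it is guarded, leaves the guarded scope, and only then calls out.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

class Counter {
public:
    // Hypothetical listener type standing in for "another component".
    using Listener = std::function<void(int)>;

    void setListener(Listener listener) {
        std::lock_guard<std::mutex> guard(mutex_);
        listener_ = std::move(listener);
    }

    void increment() {
        Listener listener;
        int newValue = 0;
        {
            std::lock_guard<std::mutex> guard(mutex_);
            newValue = ++value_;
            listener = listener_;  // copy what the outgoing call needs
        }                          // own guard is released here
        if (listener) {
            listener(newValue);    // the callee may safely call back into Counter
        }
    }

    int value() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;
    int value_ = 0;
    Listener listener_;
};

int incrementWithReentrantListener() {
    Counter counter;
    int seen = 0;
    // The listener calls back into the component that notified it.
    counter.setListener([&](int newValue) { seen = newValue + counter.value(); });
    counter.increment();
    return seen;
}
```

If increment() kept its guard during the callback, the call to counter.value() inside the listener would try to lock a mutex its own thread already holds. The consequence for consistency is visible too: once the guard is released, another thread may change the value before the listener runs, so the listener must not assume that newValue is still current.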
- What about race conditions?
- Example for the general solution points
- UNO support
Author: Markus Meyer. Copyright 2001 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, CA 94303 USA.