OpenOffice.org UNO Synchronization and Threading Draft


Introduction

OpenOffice.org is a multithreaded program, and its threading model is preemptive. This brings both benefits and drawbacks. One benefit is concurrency within one process without explicit cooperation. For example, the user interface stays responsive during document loading if the loading is done in a separate thread. Another benefit is a possible speedup on multiprocessor systems.
On the other hand, running multithreaded requires synchronization. Its drawbacks are that it is time-consuming and error-prone (deadlocks, inconsistencies and races). One major problem is old code that is not thread-safe.

Inconsistency

I would like to make a few statements about inconsistency. Almost any call to an API may result in an inconsistent state. This seems strange, because it doesn't happen often. The reason is that an API normally specifies what happens to the given parameters and what the result is. It doesn't specify the side effects of the call (e.g. memory usage). In fact an API is explicitly used to abstract from those side effects, because it shouldn't restrict the implementations. As a result, the implementation is free to do anything in addition to what is specified. One possible consequence is that the values passed to the API are changed (of course they have to be reference arguments), which normally results in unspecified behavior. Memory management is a good example of side effects. If the process can't allocate more memory, it could call a callback that asks all caches to release their memory. This side effect may occur in every memory allocation call. Think about the problems that may occur if you try to put a new entry into a cache and the cache is emptied by a low-memory request in the middle of the insertion.
So you can end up inconsistent under many circumstances. The reason we don't have more trouble is that such side effects are rare. Normally, avoiding side effects is an implementation goal.

History

In 1989 we started the StarWriter development. At that time multithreading wasn't popular, and Unix, Mac and MS Windows didn't support it, so we started with a single-threaded application. Years later the Internet became more important, and we decided to load documents from the Internet. This raised the problem that the application had to stay active during the document loading phase. To achieve this we created a kind of cooperative multithreading: many commands are dispatched through a queue, and when necessary we call a dispatch function to execute the next command from the queue. We called this function during document loading. Of course this wasn't real cooperative multithreading, because we had only one stack. After we did this we got a lot of problems, because the number of side effects had increased. I don't know exactly when we decided that preemptive multithreading is the right way to do things in parallel, but from then on we needed a synchronization mechanism to guard all our old code. First we put a callback command into the command queue, and the callback was only called from the queue dispatcher. As a result, only one thread was active in the old non-thread-safe code. Unfortunately this was too long-winded. Then the SolarMutex was born. This mutex has to be locked whenever a thread calls into non-thread-safe code. This works fine (apart from MS Windows apartment problems) and was very simple.

UNO Threading Specification

  1. All UNO components can be accessed from different threads at the same time.

  2. The component has to ensure that its state is consistent. Please keep in mind that this applies only to the component itself, not to the combined state of several components. For example, suppose one component contains the properties "FirstName" and "LastName". One thread sets "FirstName" to "Markus" and "LastName" to "Meyer", and a second thread sets "FirstName" to "Jörg" and "LastName" to "Budischewski". Then four results are possible: ("Markus"; "Meyer"), ("Jörg"; "Budischewski"), ("Markus"; "Budischewski"), ("Jörg"; "Meyer"). It is a bug if the UNO component ends up with a mix within one name ("Jörkus"; "Medischewski"). It is also a bug if the component crashes.

  3. A component must not cause deadlocks or race conditions.

General Problem

Unfortunately, the requirement to be consistent conflicts with the requirement to have no deadlocks or race conditions. If you hold a guard and do not release it while you communicate with other objects, a deadlock is possible. The simplest example: component A, in its own thread, calls component B, while component B, in its own thread, calls component A. If neither releases its guard, you have a deadlock.

In normal programming there are no generally usable mechanisms to recover from deadlocks or to avoid them. Help exists only for special cases (e.g. databases) or in some programming languages. An important special case is knowing how a component synchronizes and communicates (e.g. a component that does all operations internally and does not communicate with other components).
As a result, the most flexible part of the problem is consistency. But what is consistency? In general it is a specification (e.g. an invariant) that defines the relation between objects, values, and so on. For example, the relation may be a * b = c (e.g. 4 * 5 = 20).
A solution for the a * b = c example could be implemented as follows. Component A contains the properties a and b and sets the result of the calculation a * b as property c in component C. The properties a and b are set by the calculate operation. An external observer requests the properties a and b from component A and c from component C. The observer runs in its own thread, independently of the calculate call. Component A has four different ways to synchronize, and none of them is a solution:

  1. No synchronization: if multiple threads call the calculate method, a * b = c does not hold under all conditions.

  2. Component A synchronizes the setting of properties a and b. Then a and b have the values given in one calculate call, but c has to be changed after the synchronization is released. So a * b = c does not hold under all conditions.

  3. Component C synchronizes the setting of c. This guarantees that c holds the result of one previous operation and not a mix of two or more results, but a * b = c still does not hold under all conditions.

  4. Component A synchronizes itself and then component C, sets the properties a and b, calculates and sets c, and then releases both guards. Now a * b = c holds under all conditions, but we might get a deadlock.

If we cannot get more control over synchronization (synchronizing A and C without a deadlock), the only way out is to change the specification or the structure. At this point you have different options:

  1. Change the structure and remove component C. The specification holds, and you calculate the result each time it is requested. This is the preferred way if the operation is cheap or seldom called. In the example, component C could be removed and the result returned by the calculate method.

  2. Specify that C contains the result of the operation, but don't say when this happens. You could make the specification a little stronger by saying that the value is set by the time the calling thread returns. It is recommended to provide a broadcast listener mechanism, at least for the result.
    The consequence is that if you read a, b and c, the * relation between them does not necessarily hold (e.g. you might read 3, 4 and 20). If you add the small "after returning..." specification AND component C is private to you, then you can read the values and the relation holds. This is the benefit of the small extra specification.

General Solution Strategies

Unfortunately, only some hints and strategies can be given. I recommend reading further books and doing some experiments with synchronization and threading.

  1. Use reference counting or smart pointers instead of explicitly deleting objects. This prevents crashes, because an object that is still in use on some thread's stack cannot be deleted. One rule is that the caller must keep the object alive until the call returns.

  2. Avoid redundancy.

  3. Write components that do not communicate with other components.

  4. Avoid unnecessary synchronous consistency specifications.

  5. Do not unnecessarily block a calling thread and switch to another thread. This increases the probability of deadlocks, and you may run into problems with the dynamic context.

  6. Synchronize only the necessary parts.

Specific OpenOffice.org Problems

Most of the OpenOffice.org code isn't really thread-safe. To avoid inconsistencies we use the SolarMutex to synchronize. Compared with the strategies above, the only one we follow is not blocking threads unnecessarily. We don't use reference counting everywhere, we don't know where the redundancies are, the components communicate with other ones in unknown ways, and we synchronize all parts.

If you think about avoiding deadlocks, you will see that one bad guy does not cause a deadlock if all other components are well implemented. This is the starting point for integrating the old code into the multithreading environment. The rules for the SolarMutex:

  1. If you are synchronized with the SolarMutex, you normally needn't release it. In some special cases (e.g. the Visual Class Library, system integration) you have to understand how the SolarMutex is used and how to handle it when you call system functions. To be a better player in the game, it is recommended to release the SolarMutex whenever possible. Remember that this is very difficult, because you have to check all your usage conditions. If the calling component also uses the SolarMutex, you release its synchronization too.
    In the VCL library we release the SolarMutex at least in the Reschedule call. I'm sure we are not safe in those calls under all conditions, but the holes in the synchronization are opened in fewer cases and at known points, which we investigated. The result was a lower crash risk, and this is what a user sees.

  2. If you write a component that blocks the thread, it must not be called with the SolarMutex locked. Otherwise you have to ensure that no callback into SolarMutex-guarded code occurs.

  3. If you call another component and don't know that component's synchronization style, you must release your own guard. Please keep the consequences for consistency in mind.

Document ToDos

- What about race conditions?
- Examples for the general solution strategies

ToDos

- UNO support


Author: Markus Meyer. Copyright 2001 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, CA 94303 USA.