Language Specification of the COM Bridge

OpenOffice Draft 0.1

Contents

  1. About this Document
  2. COM - UNO Mapping
  3. UNO - COM Mapping

1 About this Document

This is a draft describing how COM and UNO types are mapped to the types of the other environment. The term mapping refers to how COM type library information is represented by UNO type library information and vice versa. Technically, then, this document is about converting type library data.

2 COM - UNO Mapping

2.1 Restrictions and Conventions

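COM components are usually distributed with a type library rather than with IDL files. The TLB (type library) does not contain all the information that is provided by IDL files. Among the missing information are the attributes iid_is, size_is, length_is, first_is, last_is, and switch_is, which are discussed in the following sections.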

To generate UNO type library information, one has to start from the COM TLB and therefore has to deal with ambiguities caused by the missing information. These ambiguities mainly affect pointer parameters. To solve this problem, all of those types are mapped to one special UNO type that carries additional information, which has to be supplied by the programmer.

COM uses Unicode strings: LPCOLESTR, LPOLESTR, OLECHAR*, LPCWSTR, LPWSTR, and BSTR. However, the TLB can also contain ASCII strings, and COM programmers might choose to use arrays such as char[], small[], or short[] as strings. This specification only takes into account strings that the TLB declares as strings, that is, BSTRs and strings declared with the string attribute (see 2.5).

COM allows the partial transmission of array data. The COM to UNO mapping does not take this into account; instead, arrays are always transmitted as a whole.

The bridge uses the UNO type library to marshal calls from UNO to COM. Therefore, the COM interfaces in use must be represented in that library. To resolve ambiguities and to make programming easier, the UNO TLB contains custom type information, for example about pointer parameters and SAFEARRAYs (mapped to any).

 

2.2 Predefined and Base Types

MIDL                    UNO IDL
(unsigned) boolean      boolean
byte                    unsigned byte
signed char             byte
(unsigned) char         unsigned byte
double                  double
float                   float
(signed) hyper          hyper
unsigned hyper          unsigned hyper
signed __int32          long
unsigned __int32        unsigned long
(signed) __int3264      long, hyper (platform dependent)
unsigned __int3264      unsigned long, unsigned hyper (platform dependent)
signed __int64          hyper
unsigned __int64        unsigned hyper
(signed) long           long
unsigned long           unsigned long
(signed) short          short
unsigned short          unsigned short
(signed) small          byte
unsigned small          unsigned byte
void                    [1]
(unsigned) wchar_t      char

[1] void is only allowed in combination with iid_is; void* then represents an interface pointer. There is no similar UNO IDL type. See chapter 2.3.
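A COM2UNO converter could encode the table above as a simple lookup. The following C++ sketch is only an illustration: the function name mapBaseType and the spellings chosen for the MIDL keys (for example "signed long" for "(signed) long") are assumptions. void and __int3264 are deliberately left out, because void only appears together with iid_is and __int3264 depends on the target platform.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical helper: maps a MIDL base type spelling to its UNO IDL name,
// following the table in 2.2. Returns std::nullopt for types that need
// special handling (void, __int3264) or are not base types at all.
std::optional<std::string> mapBaseType(const std::string& midl)
{
    static const std::map<std::string, std::string> table = {
        {"boolean", "boolean"},          {"unsigned boolean", "boolean"},
        {"byte", "unsigned byte"},       {"signed char", "byte"},
        {"char", "unsigned byte"},       {"unsigned char", "unsigned byte"},
        {"double", "double"},            {"float", "float"},
        {"hyper", "hyper"},              {"signed hyper", "hyper"},
        {"unsigned hyper", "unsigned hyper"},
        {"signed __int32", "long"},      {"unsigned __int32", "unsigned long"},
        {"signed __int64", "hyper"},     {"unsigned __int64", "unsigned hyper"},
        {"long", "long"},                {"signed long", "long"},
        {"unsigned long", "unsigned long"},
        {"short", "short"},              {"signed short", "short"},
        {"unsigned short", "unsigned short"},
        {"small", "byte"},               {"signed small", "byte"},
        {"unsigned small", "unsigned byte"},
        {"wchar_t", "char"},             {"unsigned wchar_t", "char"},
    };
    auto it = table.find(midl);
    if (it == table.end())
        return std::nullopt;  // not covered by this lookup
    return it->second;
}
```

Keeping the table as data rather than as a chain of conditions makes it easy to compare the converter against this specification line by line.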

 

2.3 void and the iid_is attribute

void* parameters are used together with the iid_is attribute when interface pointers are passed whose type is not known until runtime, for example:
 
HRESULT IActiveScript::GetScriptSite(
                [in]                REFIID riid,
                [out, iid_is(riid)] void **ppvObject);

When this method is called from UNO, the caller knows which interface is to be returned. The bridge can determine the interface from the parameter riid, but it must be told which parameter holds this information. Unfortunately, the COM type library does not contain the iid_is information, so it is up to the UNO programmer to provide it.
Even if the bridge knows which parameter contains the IID, there is still the problem of mapping the IID to the UNO interface name. There is currently no way to obtain interface information from the ID alone.

If one generates UNO type information from a COM TLB, then the void* must be replaced by an UNO type. Because void* implies that iid_is is used, the COM2UNO type converter could create a function like this:

// UNO IDL: identifier in, interface out
void GetScriptSite( [in] uik riid, [out] COM_UNKNOWN_TYPE ppvObject);
...
struct COM_UNKNOWN_TYPE
{
	short posId; // identifies the parameter that contains the uik
	any value;
};

In MIDL, it is also possible to apply the iid_is attribute to in parameters; in that case, the bridge does not need to know which parameter holds the IID, because the type is implicitly given.

// UNO IDL: IID in, interface in
void func( [in] uik riid, [in] any pInterface);

In this example, the any contains the interface along with its type description.
The IID can also be an out parameter:

// MIDL
HRESULT func([out] IID* refid, [out, iid_is( refid)]void**); 

In this case, the bridge also needs to know what parameter contains the IID, so it can map the interface after return.
There is also the possible case where both parameters are in/out parameters:

//MIDL
HRESULT func([in,out] IID* refid, [in,out, iid_is(refid)]void**);

This function can be treated like the case where both parameters are out parameters.

Here is a summary of reasonable combinations:

IID       Interface   Information necessary   Possible UNO type
in        out         IID parameter           struct
in        in          no                      any
out       out         IID parameter           struct
in,out    in,out      IID parameter           struct
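The summary table can be read as a small decision function. The sketch below is hypothetical (Dir, IidIsMapping, and mapIidIs are not part of the bridge). It returns whether the converter must record the position of the IID parameter and which UNO type replaces the interface parameter, using the COM_UNKNOWN_TYPE struct from above for the struct case.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Direction of a parameter, as given by PARAMFLAG_FIN / PARAMFLAG_FOUT.
enum class Dir { In, Out, InOut };

struct IidIsMapping {
    bool needsIidPosition;  // must the bridge be told which parameter holds the IID?
    std::string unoType;    // UNO type used for the interface parameter
};

// Hypothetical helper covering only the "reasonable combinations" of the
// table; any other combination yields std::nullopt.
std::optional<IidIsMapping> mapIidIs(Dir iid, Dir itf)
{
    if (iid == Dir::In && itf == Dir::In)
        return IidIsMapping{false, "any"};  // the any carries its own type
    if ((iid == Dir::In && itf == Dir::Out) ||
        (iid == Dir::Out && itf == Dir::Out) ||
        (iid == Dir::InOut && itf == Dir::InOut))
        return IidIsMapping{true, "COM_UNKNOWN_TYPE"};  // struct with posId
    return std::nullopt;
}
```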

Unfortunately, the iid_is attribute is used not only with void pointers, but also with well-known types such as base interfaces, for example, IUnknown. The following examples can be found in the system IDL files:

HRESULT ITypeFactory::CreateFromTypeInfo(
                [in] ITypeInfo *pTypeInfo,
                [in] REFIID riid,
                [out, iid_is(riid)] IUnknown **ppv
            );
HRESULT IOleInPlaceActiveObject::RemoteResizeBorder
(
        [in] LPCRECT prcBorder,
        [in] REFIID riid,
        [in, unique, iid_is(riid)] IOleInPlaceUIWindow *pUIWindow,
        [in] BOOL fFrameWindow
);
	

Because of this, pointers are ambiguous: they could be normal pointers or pointers attributed with iid_is. Therefore, one has to use an UNO type that can contain both normal pointers and void*. The information about which parameter contains the IID has to be merged into the struct used for pointers. See paragraph 2.16.

When a COM interface containing iid_is attributes is mapped to UNO, implemented in an event sink object, and then called from COM, the bridge cannot convert the pointer to the corresponding UNO type. See paragraphs 2.16 and 2.17.
The bridge could create the proxy on the UNO side only if the pointer is not void and is an in parameter.

2.4 Arrays

COM uses a variety of arrays: fixed, varying, conformant, and conformant varying arrays, as well as SAFEARRAYs and sized pointers.

Fixed Arrays

Fixed arrays have a predefined size and are used as in C.

[
    /*Attributes. */
]
interface MyInterface
{
    const long ARRAY_SIZE = 1000;

    void ARemoteFunc(char achArray[ARRAY_SIZE]);

    /* Other interface functions. */
}

Varying Arrays

Varying arrays have a fixed size; however, depending on additional arguments and attributes, only a part of the array is transmitted.

[
    /*Attributes*/
]
interface MyInterface
{
    const long ARRAY_SIZE = 1000;

    void ARemoteFunc(
        [in] long lFirstElement,
        [in] long lBlockSize,
        [in, first_is(lFirstElement), 
          length_is(lBlockSize)] char achArray[ARRAY_SIZE]
    );

    /* Other interface functions */
};

Conformant Arrays

Conformant arrays can vary in size; the size is specified by a separate argument.


[
    /*Attributes are defined here. */
]
interface MyInterface
{
    void ARemoteProc(
         long lArraySize,
         [size_is(lArraySize)] char achArray[*]
    );

    /* Other interface procedures are defined here. */
};

Sized Pointers


HRESULT Func1(
    [in] long n,
    [size_is(n)] long * plong); /* Specifies a pointer
                                  to an n-sized block of longs */
HRESULT Func2(
    [in] long n,
    [size_is( , n)] long ** pplong); /* Specifies a pointer 
                                       to a pointer to an n-sized 
                                       block of longs */
HRESULT Func3(
    [in] long n,
    [size_is(n ,)] long ** pplong); /* Specifies an
                                      n-sized block of pointers 
                                      to longs */
HRESULT Func4(
    [in] long m,
    [in] long n,
    [size_is(m,n)] long ** pplong); /* Specifies a pointer to an 
                                      m-sized block of pointers, each 
                                      of which points to an n-sized 
                                      block of longs */
HRESULT Func5(
    [out] long  * pSize,
    [out, size_is( , *pSize)] my_type ** ppMyType); /* Specifies a pointer 
                                              to a sized pointer, 
                                              which points to a block 
                                              of my_types, whose size is
                                              unknown when the stub 
                                              calls the server. */

Other Arrays

Arrays can be both conformant and varying.
OLE Automation provides the SAFEARRAY type.

Multidimensional Arrays

COM arrays can have multiple dimensions, for example:

HRESULT Proc2(  [in] short m,
    [in, size_is(m)] short b[][20]);  // If m = 10, b[10][20]

Arrays and the COM TLB

The array attributes that specify the portion of an array to be transmitted (length_is, first_is, and last_is) cause the MIDL compiler to generate a TLB whose parameter description contains the VARTYPE VT_USERDEFINED. The parameter description neither indicates that the parameter is an array nor gives its element type. If those attributes are used along with an array "[]", then the TLB does not even contain the correct information; instead, it contains a type with the name of the interface itself. This is not an issue in practice, because the system interfaces use those attributes only with sized pointers.

If the array has an attribute that specifies its size (size_is, min_is, or max_is), then the parameter description in the TLB contains the VARTYPE VT_PTR and the respective element type. The type information gives no clue as to whether the parameter is an array.

Arrays can also be represented by pointers as in C, and the MIDL array attributes can be applied as well. The parameter description contains VT_PTR and the element types, but does not indicate an array.

When a SAFEARRAY is used, the TLB records that fact. Unfortunately, it does not record how many dimensions are used. That prevents a tool from generating types like sequence<sequence<xxx>> when a SAFEARRAY has two dimensions.

Multidimensional arrays whose dimensions all have a fixed size are contained in the TLB as multidimensional arrays with the respective dimensions and sizes. If the size of a dimension has been omitted in MIDL then the TLB contains a pointer instead, for example:

//MIDL
HRESULT func( [in]long _size, [in, size_is(_size)]long ar[][10]);
// TLB generated function
HRESULT func( long _size, long ** ar);

Only the use of fixed arrays results in a proper parameter description. The parameter description then contains the type VT_CARRAY, the element type, the number of dimensions, and their bounds.

Mapping

It would be desirable to map MIDL arrays to UNO IDL sequences, because a sequence already contains its length; the additional parameter or member referenced by the size_is attribute could then be saved. It could also be specified that the whole array is always transmitted, which would make the parameters referenced by the length_is, first_is, and last_is attributes redundant. For example, a COM interface defines the function:

HRESULT func( [in]short _length, [in]short _first, [in]short _last,  
              [in, size_is(_length),  first_is( _first), last_is( _last)] char ar[]);
The generated UNO type information could look like this (UNO IDL):

void func( [in] sequence<byte> ar) raises (SomeException);
If this method is called in UNO, then the bridge must provide all additional parameters for the COM function. The size and length could be derived from the length of the sequence. Additionally, the bridge needs to know which parameter receives which value. Unfortunately, that information is not available, because those array attributes are not contained in the COM type library.

If an array is not fixed, then the parameter or member is described as a pointer by the TLB. Because of the resulting ambiguities, pointers are mapped to a special UNO type that can contain all those ambiguous types; see 2.11 Pointers for a description of that type. The function in the example above would be mapped to this UNO function:
void func( [in]short _length, [in]short _first, [in]short _last, 
           [in] COM_Ptr ar) raises ( SomeException);
Although the COM TLB identifies SAFEARRAYs, it does not tell how many dimensions they have or what size the dimensions are; therefore, SAFEARRAYs cannot be mapped to sequences and are mapped to anys instead. The bridge can infer from the type description provided by the any that it has to convert the data to a SAFEARRAY.

Fixed arrays are identified as such by the COM TLB. They are mapped to UNO arrays of a fixed size.

Arrays        Possible Mappings
pointer       COM_Ptr
fixed array   array
SAFEARRAY     any

Examples:
//MIDL
HRESULT func( [in] long ar[10], [in]short ar2[10][10]);
//UNO
void func( [in] long ar[10], [in]short ar2[10][10]);

//MIDL
HRESULT func( [in] long _size, [out, size_is( _size)]long* ar);
//UNO
void func( [in]long _size, [out] COM_Ptr ar);

//MIDL
HRESULT func( [in] SAFEARRAY(long) ar);
//UNO
void func( [in] any ar);
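The mapping table and the examples can be summarized in a small sketch. It is an illustration under assumptions: VarKind only mirrors the names of the VARENUM constants VT_PTR, VT_CARRAY, and VT_SAFEARRAY instead of using the real Windows header, and ParamDesc and mapArrayParam are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Local stand-ins for VT_PTR, VT_CARRAY, and VT_SAFEARRAY; a real converter
// would read the VARTYPE from the parameter's TYPEDESC in the TLB.
enum class VarKind { Ptr, CArray, SafeArray, Other };

struct ParamDesc {
    VarKind kind;
    std::string elementUnoType;    // element type, already mapped to UNO
    std::vector<unsigned> bounds;  // one entry per dimension (VT_CARRAY only)
};

// Hypothetical helper implementing the "Arrays / Possible Mappings" table.
std::string mapArrayParam(const ParamDesc& p)
{
    switch (p.kind) {
    case VarKind::Ptr:
        return "COM_Ptr";  // pointer, sized pointer, or open array
    case VarKind::SafeArray:
        return "any";      // dimensions are unknown from the TLB
    case VarKind::CArray: {
        std::string t = p.elementUnoType;
        for (unsigned b : p.bounds)  // a fixed array keeps all its dimensions
            t += "[" + std::to_string(b) + "]";
        return t;
    }
    default:
        return p.elementUnoType;
    }
}
```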

 

2.5 String

Strings can occur in various ways in COM interfaces. This specification only takes into account strings that are identified as strings by the COM TLB. That excludes strings that are actually arrays, as in counted strings:
//Counted Strings
HRESULT func(
    [in,  length_is(count), size_is(STRSIZE)]    char  ar[],
    [in]                                        long  count);

/* counted string */ 
typedef struct 
{ 
    unsigned short size; 
    unsigned short length; 
    [size_is(size), length_is(length)] char string[*]; 
 } COUNTED_STRING; 
 
/* counted string with a fixed maximum length */ 
typedef struct 
{ 
    unsigned short length; 
    [length_is(length)] char string[100]; 
} COUNTED_STRING2; 
Strings are described as such only when they are either BSTRs or they have the string attribute set in the IDL description.

Strings can also be specified with the string attribute. When it is applied to one-dimensional arrays of char, byte, or wchar_t, these arrays are treated as strings. The proxy code generated by the MIDL compiler determines the length of the array automatically. When using C, the arrays have to be terminated by a null character.

Mapping

ASCII and Unicode strings are mapped to UNO strings only when they were declared with the string attribute, that is, when the COM TLB contains the information that they are in fact strings.
BSTRs are always identified as strings in the TLB and can therefore be mapped.

MIDL                                     UNO IDL
ASCII string (char*, byte*, small*)      string
wide character string (wchar_t*)         string
BSTR                                     string

 

2.6 typedef

Typedefs are very similar to C typedefs. The MIDL compiler generates headers that use the defined types and include the typedef statement.

The generated TLB only contains information about typedefs if the typedef'ed type itself can be represented by an ITypeInfo interface. Which types those are is stipulated by TYPEATTR::typekind:
typedef enum tagTYPEKIND {
      TKIND_ENUM = 0
    , TKIND_RECORD
    , TKIND_MODULE
    , TKIND_INTERFACE
    , TKIND_DISPATCH
    , TKIND_COCLASS
    , TKIND_ALIAS
    , TKIND_UNION
    , TKIND_MAX
} TYPEKIND;

A type is a typedef whenever the member TYPEATTR::typekind has the value TKIND_ALIAS. Then the TYPEATTR struct and ITypeInfo interface provide a means to obtain a type description of the original type.

Whenever the COM TLB contains a typedef (TKIND_ALIAS), it is an alias for a type that is usually not typedef'ed in UNO IDL. This is because MIDL uses C syntax and hence typedefs structs, enums, and unions. Those typedefs are not mapped.

2.7 Union

MIDL knows two union types, encapsulated unions and nonencapsulated unions. The MIDL compiler generates a struct from an encapsulated union that contains a discriminant and the union. Nonencapsulated unions need an additional parameter or member that contains the discriminant. The discriminant is referred to by the switch_is attribute that is applied to the union.


//Microsoft IDL
//encapsulated union
typedef union _My_UNION switch (long l) Type
{
case 1: char a;
case 2: short b;
case 4: long c;
default: foo obj;
}My_UNION;

// nonencapsulated union
typedef [switch_type(short)] union _UNION_TYPE 
{ 
    [case(24)] 
        float _a; 
    [case(25)] 
        double _b; 
    [default] 
        ; 
} UNION_TYPE; 

typedef struct _SOME_TYPE 
{ 
    [switch_is(aNumber)] UNION_TYPE w; 
    short aNumber; 
} SOME_TYPE; 

The COM TLB does not contain the switch_is attribute.

Mapping

Encapsulated unions are contained in the COM TLB as a struct with a member that acts as the discriminant and another member that is a union. Because a union always needs a discriminant so that MIDL can produce proper marshaling code, one can infer that a struct containing a union and only one additional member is an encapsulated union.

// UNO representation of the MIDL example above:
//UNO 
union My_UNION switch (long) 
{
  case 1: char a;
  case 2: short b;
  case 4: long c;
  default: foo obj;
};
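The inference described above could be sketched as follows. Member, MemberKind, and looksLikeEncapsulatedUnion are hypothetical names; a real converter would obtain the member list from the TLB (for example through ITypeInfo::GetVarDesc).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified view of a struct member as read from the TLB.
enum class MemberKind { Union, Other };

struct Member {
    std::string name;
    MemberKind kind;
};

// Hypothetical check for the heuristic: exactly two members, exactly one of
// them a union. On success, *discriminant receives the index of the
// non-union member, which is taken to be the discriminant.
bool looksLikeEncapsulatedUnion(const std::vector<Member>& members,
                                std::size_t* discriminant)
{
    if (members.size() != 2)
        return false;
    const bool firstIsUnion = members[0].kind == MemberKind::Union;
    const bool secondIsUnion = members[1].kind == MemberKind::Union;
    if (firstIsUnion == secondIsUnion)
        return false;  // no union, or two unions
    if (discriminant != nullptr)
        *discriminant = firstIsUnion ? 1 : 0;
    return true;
}
```

Note that the hand-written SOME_TYPE struct in the nonencapsulated example has the same shape (one union member and one short), so a converter using only this check would also classify it as an encapsulated union.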

A nonencapsulated union is contained in the type library as a union. The TLB does not contain the information of what member or parameter contains the discriminant (switch_is attribute). Therefore, the COM2UNO converter cannot omit the discriminant member or parameter. The union itself is mapped to an any. The bridge knows when a parameter or member is a union (custom type information), and can map the any appropriately. Example:
//MIDL 
typedef [switch_type(long)] union _A_UNION
{
  [case(1)]
     float a;
  [case(2)]
     double b;
} A_UNION;


HRESULT func( [in]long _type,[in, switch_is( _type)] A_UNION u);

// UNO
void func( [in] long _type, [in] any u);

Mapping:
MIDL UNO IDL
encapsulated union union
nonencapsulated union

any

 

2.8 Constants

The COM TLB does not contain information about const values. Instead, those values are used directly wherever they are applied in MIDL, for example, as the size of a fixed array. Therefore, no mapping is required.

2.9 Enumerators

MIDL enumerations are much like UNO enumerations. Because MIDL uses a C-like syntax, the enums are typedef'ed:

typedef enum{...} ENUM_TYPE; or
typedef enum _tagXXX{...} ENUM_TYPE;

Mapping:

//MIDL
typedef enum _MyEnum{a,b,c} MyEnum;
// or
typedef enum {a,b,c} MyEnum;

are mapped to 
//UNO IDL
enum MyEnum
{ 
  a,
  b, 
  c
};

2.10 Struct

MIDL structures can be mapped to UNO IDL structures. The members are mapped according to the rules for the mapping of those members.

Structs are often declared within a typedef statement.
typedef struct{ ....} MYSTRUCT;
typedef struct _MYSTRUCT{...} MYSTRUCT;
A COM TLB only contains a typedef if the first statement is used.

Mapping:

// Microsoft IDL
typedef struct
{
  T0 m0;
  T1 m1;
  ...
  TN mN;
} STRUCTURE;

typedef struct _tagSTRUCTURE
{ 
  T0 m0;
  T1 m1;
  ...
  TN mN;
} STRUCTURE;

struct STRUCTURE
{
  T0 m0;
  T1 m1;
  ...
  TN mN;
};


//UNO
struct STRUCTURE
{
  T0 m0;
  T1 m1;
  ...
  TN mN;
};

2.11 Pointers

MIDL uses three different pointer types: reference pointers, unique pointers, and full pointers, which can be used with in, out, and in/out parameters. See 2.15 for more details.
Because of the ambiguity between pointers and arrays, in parameters cannot be simplified for UNO. Say there is a COM function:

HRESULT func( [in] long * l);
The optimum UNO mapping would be:

void func( [in] long l);
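Instead, one needs to replace long with a type that can hold all the types that the COM TLB describes as pointers.

void func( [in] COM_Ptr l);
The COM_Ptr type contains flags that indicate the meaning of the data. For example, a long** could mean a pointer to a pointer to a long, a pointer to a pointer to an n-sized block of longs, an n-sized block of pointers to longs, or an m-sized block of pointers to n-sized blocks of longs (compare Func2 to Func4 in 2.4).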

One certainly does not have to supply all the information, because the bridge could use some logic and inference rules to take some of the load off the programmer.

As mentioned, COM uses three different pointer types with different characteristics. The ptr pointer (ptr is the attribute used in MIDL) comes closest to the normal C pointer: it can be null, its value can change during the call, it can have aliases, and so on. Aliasing is the most important property for the bridge, because the bridge has to preserve this behavior even after converting the parameters. Let us assume there is a COM function:

HRESULT func( [in] BSTR aString, [in,ptr] wchar_t* pCurrentPosition);
The pCurrentPosition pointer would point to a position within the string. The corresponding UNO function would look like this:

void func( [in] string aString, [in] COM_Ptr pCurrentPosition);
A COM_Ptr would possibly contain an any that holds all the different types. It does not need a flag indicating that the bridge has to interpret the data as a pointer, because the bridge knows that from the custom UNO type information and from the fact that a COM_Ptr type has been used.
When the function is called, the program would set the any so that it points to a certain position within the string. When the bridge receives the call, it would usually convert the UNO string into a BSTR and the COM_Ptr value into a pointer. If the bridge does not know about the correlation between those parameters, then the converted pointer would no longer point into the string: the BSTR is allocated by the system, so its string data resides at a completely different address.

The bridge therefore needs to be informed about a possible correlation between parameters. The programmer could put additional data into the COM_Ptr specifying which other parameters are affected. This has two drawbacks: the programmer has to take care of yet another detail, and the programmer-supplied information is not available when a COM interface is implemented in UNO and called from COM. The better solution, therefore, is for the bridge to examine the parameters and find those that are referenced by the pointer. During the parameter conversion, the bridge can then take this into account.
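A minimal sketch of that examination, assuming the bridge already knows the address range of the original string data and of its converted copy: any pointer argument that falls inside the original range is moved to the same offset in the copy. The function rebaseAlias is hypothetical, and std::u16string merely stands in for both the UNO string and the BSTR buffer (a real bridge would allocate the BSTR with SysAllocString).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: if p points into the original buffer
// [srcBegin, srcBegin + srcLen] (the end position, i.e. the terminating
// null, is included), return the pointer at the same offset in the
// converted buffer dstBegin; otherwise return nullptr.
const char16_t* rebaseAlias(const char16_t* p,
                            const char16_t* srcBegin, std::size_t srcLen,
                            const char16_t* dstBegin)
{
    std::less<const char16_t*> before;  // total order, also for unrelated pointers
    if (before(p, srcBegin) || before(srcBegin + srcLen, p))
        return nullptr;                 // p does not alias this parameter
    return dstBegin + (p - srcBegin);
}
```

The check is done per pointer parameter against every string or array parameter of the same call, which is cheap because the number of parameters is small.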

 

2.12 OLE Automation Types

MIDL               UNO IDL
boolean            boolean
unsigned char      unsigned byte
double             double
float              float
int                long, hyper (system dependent)
long               long
short              short
BSTR               string
CY                 hyper
DATE               double
SCODE              long
enum               long, hyper (system dependent)
IDispatch*         COM_IDispatch
IUnknown*          COM_IUnknown
SAFEARRAY(type)    any
VARIANT            any
VARIANT_BOOL       boolean

 

2.13 GUID

The GUID has a similar construct in UNO, where it is named Uik and has a slightly different layout. In both environments, the IDs are unique, and their creation is even based on the same tool. The mapping is shown below:
typedef struct _GUID
{
    DWORD Data1;
    WORD  Data2;
    WORD  Data3;
    BYTE  Data4[8];
} GUID;


struct Uik
{
	unsigned long Data1; 
	unsigned short Data2;
	unsigned short Data3; 
	unsigned long Data4; 
	unsigned long Data5; 
}; 
The fields GUID::Data1 ... GUID::Data3 have direct counterparts: Uik::Data1 ... Uik::Data3. The first four elements of GUID::Data4 are mapped to Uik::Data4, and the remaining four bytes, GUID::Data4[4] to GUID::Data4[7], are mapped to Uik::Data5.
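The conversion can be sketched in C++. The specification does not say in which order the four bytes are packed into Uik::Data4 and Uik::Data5; the sketch assumes the most significant byte comes first, so that GUID::Data4[0] ends up in the high byte of Uik::Data4. ComGuid is a local copy of the GUID layout, renamed to avoid a clash with the Windows header, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Layout of the COM GUID (renamed to avoid clashing with <guiddef.h>).
struct ComGuid {
    std::uint32_t Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint8_t  Data4[8];
};

// Layout of the UNO Uik as given above.
struct Uik {
    std::uint32_t Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint32_t Data4;
    std::uint32_t Data5;
};

// Assumption: bytes are packed most significant first.
static std::uint32_t pack4(const std::uint8_t* b)
{
    return (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
           (std::uint32_t{b[2]} << 8) | std::uint32_t{b[3]};
}

static void unpack4(std::uint32_t v, std::uint8_t* b)
{
    for (int i = 0; i < 4; ++i)
        b[i] = static_cast<std::uint8_t>(v >> (24 - 8 * i));
}

Uik guidToUik(const ComGuid& g)
{
    return Uik{g.Data1, g.Data2, g.Data3, pack4(g.Data4), pack4(g.Data4 + 4)};
}

ComGuid uikToGuid(const Uik& u)
{
    ComGuid g{u.Data1, u.Data2, u.Data3, {}};
    unpack4(u.Data4, g.Data4);      // GUID::Data4[0..3]
    unpack4(u.Data5, g.Data4 + 4);  // GUID::Data4[4..7]
    return g;
}
```

Whatever byte order is chosen, it has to be the same in both directions so that a GUID survives the round trip through UNO unchanged.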

2.14 Interface

Naming Conventions

Mapped COM interfaces are decorated with a COM_ prefix, for example, ISomeInterface -> COM_ISomeInterface.
Function and parameter names remain the same.

Functions

Mapped COM functions have the same name and parameter names; however, the return type changes, and the parameter types may change as well.

COM functions return an HRESULT. This type contains error and success codes for a variety of scenarios. Because an HRESULT does not only contain error codes, it cannot be mapped to exceptions.
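The following sketch illustrates why. An HRESULT is a 32-bit signed value whose sign bit marks failure, so S_FALSE is a success code that nevertheless differs from S_OK; a general rule that raised an exception for every value other than S_OK would turn such a legitimate result into an error. The constant names are local stand-ins for the Windows SDK macros; only their values are taken from the SDK.

```cpp
#include <cassert>
#include <cstdint>

// HRESULT is a 32-bit signed value; the sign bit is the severity bit.
// Prefixed names avoid clashing with the macros from <winerror.h>.
using HResult = std::int32_t;
constexpr HResult kS_OK = 0x00000000;
constexpr HResult kS_FALSE = 0x00000001;
constexpr HResult kE_NOINTERFACE = static_cast<HResult>(0x80004002u);

// Same test as the SUCCEEDED macro: every non-negative HRESULT is a success.
constexpr bool succeeded(HResult hr) { return hr >= 0; }
```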

Often a parameter has the retval attribute, which means that the parameter is the actual return value. If the function is used from Visual Basic or through some kind of wrapper, such as the one generated by the #import directive, then that special parameter does in fact act as the return value.

The parameter types are mapped to UNO types according to the specification.

COM interface functions are called with the __stdcall calling convention. The mapped UNO function, however, is called with the __cdecl calling convention. The bridge is in charge of realizing the "mapping" of the calling convention at runtime.

The Base Interface

A mapped COM interface must have all the characteristics of an UNO interface. This means that it has to inherit from XInterface, which provides the same functionality as IUnknown and therefore acts as a substitute for it.
// COM
interface IUnknown
{
    typedef [unique] IUnknown *LPUNKNOWN;
    HRESULT QueryInterface(
        [in] REFIID riid,
        [out, iid_is(riid)] void **ppvObject);
    ULONG AddRef();
    ULONG Release();
}
When an UNO client calls XInterface::queryInterface, the bridge converts the Type parameter into the IID. The IIDs are kept in the UNO type library as part of the interface description. That IID is basically the same ID as the one obtained from the COM TLB.

On return, the out parameter ppvObject is mapped to the corresponding UNO interface. If the HRESULT is anything other than S_OK, then XInterface::queryInterface raises a com::sun::star::uno::RuntimeException.
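A hypothetical sketch of this sequence of steps: the IID stored with the UNO type description is looked up, QueryInterface is called, and anything but S_OK is turned into an exception. FakeComObject, bridgeQueryInterface, and the string IIDs are illustrative assumptions; std::runtime_error stands in for com::sun::star::uno::RuntimeException, and returning the raw pointer stands in for creating the UNO proxy.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

using HResult = std::int32_t;
constexpr HResult kS_OK = 0;
constexpr HResult kE_NOINTERFACE = static_cast<HResult>(0x80004002u);

// Simplified COM object: maps IIDs (plain strings here) to interface pointers.
struct FakeComObject {
    std::map<std::string, void*> interfaces;

    HResult QueryInterface(const std::string& iid, void** ppv)
    {
        auto it = interfaces.find(iid);
        if (it == interfaces.end()) {
            *ppv = nullptr;
            return kE_NOINTERFACE;
        }
        *ppv = it->second;
        return kS_OK;
    }
};

// Hypothetical bridge step behind XInterface::queryInterface.
// iidOfType models the IIDs kept in the UNO type library.
void* bridgeQueryInterface(FakeComObject& obj,
                           const std::map<std::string, std::string>& iidOfType,
                           const std::string& unoTypeName)
{
    auto it = iidOfType.find(unoTypeName);
    if (it == iidOfType.end())
        throw std::runtime_error("no IID known for " + unoTypeName);
    void* pv = nullptr;
    const HResult hr = obj.QueryInterface(it->second, &pv);
    if (hr != kS_OK)  // anything other than S_OK raises
        throw std::runtime_error("QueryInterface failed for " + unoTypeName);
    return pv;  // a real bridge would wrap this in a UNO proxy
}
```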

IUnknown::AddRef and IUnknown::Release are mapped to XInterface::acquire and XInterface::release. When acquire or release is called, the bridge ignores the return values of the corresponding AddRef and Release functions.

Inheritance

Both UNO and COM interfaces support single inheritance. Whenever a COM interface inherits from another interface, the mapped UNO interface does the same. The inherited interface must be mapped as well.

IDispatch

IDispatch is a very special interface in that it realizes dynamic invocation, in other words, scripting. UNO offers a similar interface, XInvocation. One possible mapping would be to map IDispatch directly to XInvocation, as is already done by the OLE bridge. But then the question arises how dual interfaces are handled and whether they have to inherit XInvocation. Seen from the UNO perspective, this is not necessary, because UNO provides an invocation service that automatically creates an invocation object from any given UNO object.

If mapped COM interfaces are used in event sink implementations, that is, if they are called from COM, then they must have a mapping of IDispatch, because many COM components require their source interfaces (the callback interfaces) to be dispatch interfaces.

The table summarizes the importance of a mapping of IDispatch:

Scenario                      Mapping required
pure Automation component     yes
dual interfaces (no source)   no
source interface              yes

This matter is still open for discussion. For now, IDispatch is mapped as it is, according to this specification.

 

2.15 In, Out, In / Out Parameters

This chapter shows what information is needed for the bridge to do a proper conversion of all the different parameter types.

In Parameters

COM in parameters can be recognized by the PARAMFLAG_FIN flag in the TLB. The major difference from UNO in parameters is that pointers can be used. Those pointers can act as sized pointers, in other words, as arrays. In parameters that are not pointers can be mapped as specified for the respective COM types.

The bridge converts UNO parameters and creates pointers to the converted values as necessary. The level of indirection can be obtained from the COM TLB (custom type information). Because of the ambiguity between pointers and arrays, one has to pass a special UNO construct that can act as either type. That construct could be, for example, an any, so that the type (array or not) is implicitly given. There is no need to pass information about whether the value is a pointer, because the bridge knows that from the COM TLB. The bridge might still need additional information, for example, whether the pointer is a ptr pointer. Then one needs to define a struct:

struct COM_Type
{
	long flags; //contains pointer type (ref, unique, ptr)
	any value;
};
All in parameters except pointers and arrays are mapped as specified. Pointers and arrays have to be mapped to the same UNO type (COM_Ptr) because of the ambiguities inherent to the COM TLB.

In/Out Parameters

In/out parameters have the flags PARAMFLAG_FIN and PARAMFLAG_FOUT set. The bridge has to convert these parameters in both directions, so the considerations for in parameters as well as for out parameters apply. Unlike pure out parameters, in/out parameters can be caller allocated even if the pointer has two or more levels of indirection.
HRESULT func8([in]long _size, [in, out, size_is(_size)]long ** _ar);

 

Level of indirection   Caller allocated   Callee allocated   Additional information necessary
1                      yes                no                 no (a sequence must have allocated memory)
2 and more             yes                yes                position of the size parameter when the type is an array;
                                                             if callee allocated, the caller frees the memory


The datatype for in/out parameters could look like this:

// example for an eligible structure
struct COM_Type
{
	long flags; // for example, FREE_MEM 
	long posSize; // identifies the parameter that contains the size of 
						// the array, default is 0
	any value;
};

Example:
1. Caller allocated
// MIDL 
HRESULT func( [in]long _size, [in,out]long** _ar );
// COM
long arLong[10];
long* arPLong[10];
for( int i= 0; i < 10; i++)
{
	arPLong[i]= &arLong[i];
	arLong[i]= 0;
}
hr= object->func( 10, (long**)arPLong);

// UNO
COM_Type outparam1;
outparam1.flags= 0; // means caller allocated
outparam1.posSize= 1; 	// left-most parameter is #1
Sequence<sal_Int32> ar( 10); //reallocated by bridge
// fill the sequence with long values; the bridge has to create an array of long*[10]
...
outparam1.value= makeAny( ar);
object->func( 10, outparam1);
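The comment in the UNO example ("the bridge has to create an array of long*[10]") can be sketched as follows. This is a hypothetical illustration: InOutPointerArray is not a bridge class, std::vector<std::int32_t> stands in for Sequence<sal_Int32>, and the sketch does not free memory that a callee might substitute for the caller-allocated pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: turns the flat values of a sequence into the
// caller-allocated long*[n] block expected by an [in,out] long** parameter.
class InOutPointerArray {
public:
    explicit InOutPointerArray(const std::vector<std::int32_t>& seq)
        : values_(seq), pointers_(values_.size())
    {
        for (std::size_t i = 0; i < values_.size(); ++i)
            pointers_[i] = &values_[i];  // element i points at its value
    }
    // Copying would leave pointers_ referring into the old values_.
    InOutPointerArray(const InOutPointerArray&) = delete;
    InOutPointerArray& operator=(const InOutPointerArray&) = delete;

    // Argument to pass to the COM function.
    std::int32_t** arg() { return pointers_.data(); }

    // After the call: read the values the pointers now refer to; this is
    // what the bridge copies back into the sequence.
    std::vector<std::int32_t> result() const
    {
        std::vector<std::int32_t> out;
        out.reserve(pointers_.size());
        for (const std::int32_t* p : pointers_)
            out.push_back(*p);
        return out;
    }

private:
    std::vector<std::int32_t> values_;     // storage for the longs
    std::vector<std::int32_t*> pointers_;  // the long*[n] block
};
```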

Out Parameters

The COM specification says: out parameters are allocated by the callee and freed by the caller. The bridge then acts as the caller and thus has to free the results. This statement is quite confusing, because many COM interfaces, such as those in the include directory of Visual Studio, pass arrays as out parameters and have the function fill in the data. That filling in could hardly be called allocating data. An example:
HRESULT ITypeInfo::GetIDsOfNames(
             [in] REFIID riid,
             [in, size_is(cNames)] LPOLESTR * rgszNames,
             [in] UINT cNames,
             [in] LCID lcid,
             [out, size_is(cNames)] DISPID * rgDispId);

Out parameters are marked with the PARAMFLAG_FOUT flag. That flag allows us to produce appropriate out parameters in UNO. Depending on the specification of the COM interface, the bridge has to consider releasing the out parameters. The bridge either allocates memory for the type and passes a pointer to the COM function, or it allocates memory for a pointer to the type and passes a pointer to that pointer to the function. If a high-level pointer is required, then the bridge always allocates memory for the type and passes a pointer.
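A minimal sketch of the two allocation strategies, assuming std::malloc and std::free in place of CoTaskMemAlloc and CoTaskMemFree so that it runs outside Windows. The callee functions are hypothetical stand-ins for COM methods with [out] long* and [out] long** parameters, and their int return value stands in for an HRESULT.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Stand-in for a COM method with an [out] long* parameter.
int calleeLevel1(std::int32_t* l)
{
    *l = 42;
    return 0;
}

// Stand-in for a COM method with an [out] long** parameter: the callee
// allocates the long (a real component would use CoTaskMemAlloc).
int calleeLevel2(std::int32_t** l)
{
    *l = static_cast<std::int32_t*>(std::malloc(sizeof(std::int32_t)));
    if (*l == nullptr)
        return -1;
    **l = 7;
    return 0;
}

// Level 1: the bridge allocates memory for the value itself and passes a
// pointer to it.
std::int32_t callOutLevel1()
{
    std::int32_t value = 0;
    calleeLevel1(&value);
    return value;
}

// Level 2: the bridge only allocates memory for the pointer; as the caller,
// it must free what the callee allocated.
std::int32_t callOutLevel2()
{
    std::int32_t* p = nullptr;
    if (calleeLevel2(&p) != 0 || p == nullptr)
        return 0;
    std::int32_t value = *p;
    std::free(p);  // CoTaskMemFree in a real bridge
    return value;
}
```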

Another issue is arrays, which are allocated by the callee and whose size is passed as an out parameter; that is, the caller does not know the size until return of the function:
HRESULT func( [out]long* _size, [out, size_is( ,*_size)]long** ar); 
The bridge cannot figure out what parameter contains the size, because this information cannot be obtained from COM TLBs; therefore, the bridge must be explicitly told what parameter contains the size of the array.

If a pointer of level two or higher is used, then the bridge cannot recognize whether it has to supply the memory or only a pointer; therefore, we need an UNO construct that is used as an out parameter and contains the information.

We could use a structure with a member that indicates who is responsible for freeing the memory:
struct COM_Type
{
	long flags; // for example, FREE_MEM
	long posSize; // identifies the parameter that contains the size of 
						// the array, default is 0
	any value;
};
COM_Type::value holds the actual out parameter. The programmer provides it, and the bridge assigns the out value to it after the call.
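How the bridge might evaluate COM_Type::posSize and the FREE_MEM flag for a callee-allocated array (the situation of example 2b) can be sketched in standard C++. This is a simplified model under stated assumptions: `std::vector<std::int32_t>` stands in for the any holding a `Sequence<sal_Int32>`, the value 0x2 for FREE_MEM is borrowed from COM_Ptr_Flags in section 2.16, `func` and `bridgeCall` are hypothetical, and `std::malloc`/`std::free` replace `CoTaskMemAlloc`/`CoTaskMemFree`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Assumed flag value, borrowed from COM_Ptr_Flags (section 2.16).
constexpr std::int32_t FREE_MEM = 0x2;

// Simplified model of COM_Type: the vector stands in for an any that holds
// a Sequence<sal_Int32> (a single sal_Int32 is kept as a one-element vector).
struct COM_Type
{
    std::int32_t flags = 0;
    std::int32_t posSize = 0; // 1-based position of the size parameter, 0 = none
    std::vector<std::int32_t> value;
};

// Hypothetical COM method:
//   HRESULT func([out] long* _size, [out, size_is(, *_size)] long** ar);
// std::malloc stands in for CoTaskMemAlloc.
long func(std::int32_t* size, std::int32_t** ar)
{
    const std::int32_t n = 3;
    *ar = static_cast<std::int32_t*>(std::malloc(n * sizeof(std::int32_t)));
    if (*ar == nullptr)
        return -1; // stands in for E_OUTOFMEMORY
    for (std::int32_t i = 0; i < n; ++i)
        (*ar)[i] = i * 10;
    *size = n;
    return 0; // S_OK
}

// Bridge side: call func, take the array length from the parameter named by
// posSize, copy the callee's buffer into the sequence and free the buffer.
long bridgeCall(COM_Type& sizeParam, COM_Type& arrayParam)
{
    std::int32_t size = 0;
    std::int32_t* ar = nullptr;
    const long hr = func(&size, &ar);
    if (hr != 0)
        return hr;
    // In this signature only parameter #1 can carry the length.
    assert(arrayParam.posSize == 1);
    // The array is callee-allocated, so the caller (the bridge) must free it.
    assert((arrayParam.flags & FREE_MEM) != 0);
    sizeParam.value.assign(1, size);
    arrayParam.value.assign(ar, ar + size);
    std::free(ar); // stands in for CoTaskMemFree
    return hr;
}
```

The programmer's only job is to set posSize and flags; without posSize the bridge would have no way to know how many elements to copy.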

If an array is passed by reference, the sequence must have the proper length. The bridge uses this length to determine how much memory has to be copied from the out parameter into the sequence.
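For a caller-allocated array passed by reference, as in example 1b, the length of the supplied sequence is the only size information the bridge has. A minimal sketch, assuming a hypothetical COM method `func`, a hypothetical helper `bridgeCall`, and `std::vector` standing in for `Sequence<sal_Int32>`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical COM method:
//   HRESULT func([in] long _size, [out, size_is(_size)] long* ar);
// The callee only fills the caller's buffer.
long func(std::int32_t size, std::int32_t* ar)
{
    for (std::int32_t i = 0; i < size; ++i)
        ar[i] = i + 1;
    return 0; // S_OK
}

// Bridge side: the sequence length determines how much memory is allocated
// for the C array and how many elements are copied back into the sequence.
long bridgeCall(std::int32_t size, std::vector<std::int32_t>& seq)
{
    // The buffer must be large enough for the callee to write size elements.
    assert(static_cast<std::int32_t>(seq.size()) >= size);
    std::vector<std::int32_t> buffer(seq.size());
    const long hr = func(size, buffer.data());
    if (hr == 0)
        std::copy(buffer.begin(), buffer.end(), seq.begin());
    return hr;
}
```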

Examples:
1. out parameter is *
1a)

// MIDL
HRESULT func( [out]long* l);

// COM usage
long l;
object->func( &l);

// UNO usage
COM_Type outparam;
sal_Int32 lvalue= 0;
outparam.flags= 0;
outparam.posSize= 0; 
outparam.value= makeAny( lvalue);
object->func( outparam);

1b) long (*)[], caller allocated. The bridge needs to know the size, therefore the Sequence's length must be set.
//MIDL
HRESULT func( [in]long _size, [out, size_is(_size)]long * ar);
//COM
long ar[10];
object->func( 10, (long*) ar);
// UNO
COM_Type outparam;
Sequence<sal_Int32> seq(10);
outparam.posSize= 0;
outparam.value= makeAny( seq);
object->func( 10, outparam);

2. out parameter is **
2a) long, callee allocated; the caller (the bridge) must free the memory.
// MIDL
HRESULT func( [out]long** l);
// COM 
long* pLong= NULL;
object->func( &pLong);
CoTaskMemFree( pLong);

// UNO
COM_Type outparam;
outparam.flags= FREE_MEM;
outparam.posSize= 0;
sal_Int32 lvalue= 0;
outparam.value= makeAny( lvalue);
object->func( outparam);

2b) long[], callee allocated; the caller must free the memory, and the size of the array is not known before the call.
//MIDL
HRESULT func( [out]long* _size, [out, size_is( , *_size)]long** par);
//COM
long _size=0;
long * ar= NULL;
object->func( &_size, &ar);
CoTaskMemFree( ar);

//UNO
COM_Type outparam1;
outparam1.flags=0;
outparam1.posSize=0;
sal_Int32 lsize= 0;
outparam1.value= makeAny( lsize);

COM_Type outparam2;
outparam2.flags= FREE_MEM;
outparam2.posSize= 1; 	// leftmost parameter is #1
Sequence<sal_Int32> ar; //reallocated by bridge
outparam2.value= makeAny( ar);
object->func( outparam1, outparam2);

2c) long[][], caller allocated multidimensional array
//MIDL
HRESULT func( [in] long _size, [out, size_is(_size)]long ar[][10]);

//COM
long ar[5][10];
object->func( 5, ar); 

//UNO
COM_Type outparam;
outparam.flags=0;
outparam.posSize=0;
Sequence< Sequence<sal_Int32> > ar(5);
for (sal_Int32 i= 0; i < ar.getLength(); i++)
	ar[i].realloc(10); // each inner sequence must have the proper size
outparam.value= makeAny( ar);
object->func( 5, outparam);


The following table shows, depending on the level of indirection of the pointer as described in the COM TLB, whether a value may be caller or callee allocated.

Level of indirection | Caller allocated | Callee allocated | Additional information necessary
1                    | yes              | no               | No (a sequence must have allocated memory)
2 and more           | yes              | yes              | Position of the size parameter, when the type is an array and callee-allocated

 

2.16 Mapping for Pointers

This mapping covers pointers that are used as in, out, and in/out parameters. The UNO type they are mapped to must contain the following information:


// UNO IDL
struct COM_Ptr
{
	long flags;
	short attrPos; // position of size_is or iid_is attribute 
	any value;
};


constants COM_Ptr_Flags
{
	const long PTR_POINTER= 0x1;
	const long FREE_MEM= 	0x2;
	const long IID=	0x4;
	const long SIZE= 0x8;
};

COM_Ptr is used for C arrays of varying length and pointers.

The meaning of the COM_Ptr_Flags is the following:

The members of COM_Ptr:

When size_is is used, the COM_Ptr acts as an in/out or out parameter for an array that is allocated by the callee.
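A sketch of how the bridge might interpret COM_Ptr::flags. The section does not spell out what each flag means, so the following is an assumption: SIZE is taken to mean that attrPos names the size_is parameter, IID that it names the iid_is parameter, and FREE_MEM that the bridge frees the memory after the call. `COM_Ptr_Model`, `attributeKind`, and `bridgeFreesMemory` are hypothetical names; the model omits the any member.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the COM_Ptr_Flags constants.
constexpr std::int32_t PTR_POINTER = 0x1;
constexpr std::int32_t FREE_MEM    = 0x2;
constexpr std::int32_t IID         = 0x4;
constexpr std::int32_t SIZE        = 0x8;

// Hypothetical C++ mirror of COM_Ptr, without the any member.
struct COM_Ptr_Model
{
    std::int32_t flags;
    std::int16_t attrPos; // position of the size_is or iid_is parameter
};

enum class AttrKind { None, SizeIs, IidIs };

// Assumption: SIZE and IID select which attribute attrPos refers to,
// so at most one of them may be set.
AttrKind attributeKind(const COM_Ptr_Model& p)
{
    const bool size = (p.flags & SIZE) != 0;
    const bool iid = (p.flags & IID) != 0;
    assert(!(size && iid));
    if (size)
        return AttrKind::SizeIs;
    if (iid)
        return AttrKind::IidIs;
    return AttrKind::None;
}

// Assumption: FREE_MEM asks the bridge to free the memory after the call.
bool bridgeFreesMemory(const COM_Ptr_Model& p)
{
    return (p.flags & FREE_MEM) != 0;
}
```

Keeping the flags as independent bits lets a single COM_Ptr combine, for example, SIZE with FREE_MEM for a callee-allocated array.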

2.17 Restrictions with COM interfaces implemented in UNO

COM methods can have parameters that are only meaningful together with another parameter; that is, they carry information about that other parameter which is only known at run time. An example is a C array whose size is not known at compile time. To map such functions to UNO, one has to employ special UNO types that carry additional information.

//MIDL
HRESULT func( [out]long *_size, [out, size_is( , *_size)]long** _p);
Because the dependency declared by the size_is attribute is not contained in the COM TLB, one cannot combine both parameters into a single sequence in UNO (a sequence provides its length itself). The corresponding UNO function would look like this:

void func( [out] COM_Ptr _size, [out] COM_Ptr _p);
The COM_Ptr type is a struct with additional information for the bridge: _p would contain the information that parameter one represents the size of the array. The point is that the UNO programmer has to supply this information, and there is no other way for the bridge to obtain it (short of parsing the MIDL file).

Now suppose that a COM object is accessed by a UNO client and that the object supports events. In other words, the client could register an event sink with the object. The sink object needs to implement the COM interface that is described by the COM TLB (the source attribute in the coclass section). The client provides the sink implementation, which implements the COM interface as a mapped UNO interface; the sink object is therefore pure UNO. After registration, the COM object can fire events, which results in calls to the UNO event sink interface. If such an event passes an array (long**), the bridge converts it to a COM_Ptr, but it does not know the size of the array, because it cannot know which parameter contains the size.

COM event interfaces rarely use out parameters, but they may do so.

The following types do not work with event interfaces:
MIDL                                           | UNO
void**                                         | COM_Ptr
type** (callee-allocated array, out parameter) | COM_Ptr

 

3 UNO - COM

3.1 Conventions

A COM programmer usually needs header files with type declarations to program a component. The header files are generated either by the MIDL compiler or through the #import statement. To get headers from UNO types, one could either create them directly from the UNO type library or first create a COM TLB. A COM TLB would allow the UNO components to be used from several programming languages, including the .NET languages; therefore, this specification first defines mappings from the UNO TLB to the COM TLB.

 

3.2 Base Types

UNO IDL        | MIDL
char           | wchar_t
boolean        | boolean
byte           | char
short          | short
unsigned short | unsigned short
long           | long
unsigned long  | unsigned long
hyper          | hyper
unsigned hyper | unsigned hyper
float          | float
double         | double

 

3.3 Constants

UNO constants cannot be mapped to COM constants by generating a COM TLB.

MIDL allows one to declare static or const members of interfaces, and the generated header does contain a static variable or a define, but the TLB does not contain that information. The only way to declare constants would be to use a module, but that does not match how modules are used together with COM components. Moreover, the #import statement apparently only evaluates the coclass entry, which cannot contain a reference to a module.

 

3.4 Enumerators

UNO enumerations are mapped to MIDL enumerations. The namespace is merged into the enumeration name.
//UNO
module com {  module sun {  module star {  module text {  
enum WrapTextMode
{ 
	NONE, 
	THROUGHT, 
	PARALLEL, 
	DYNAMIC, 
	LEFT,  
	RIGHT 
}; 
}; }; }; };  

// MIDL
typedef enum _com_sun_star_text_WrapTextMode
{ 
	NONE, 
	THROUGHT, 
	PARALLEL, 
	DYNAMIC, 
	LEFT,  
	RIGHT 
}com_sun_star_text_WrapTextMode;
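The name construction can be sketched as a simple string transformation. The rule is inferred from the WrapTextMode example above and the struct example in section 3.6; `midlTypeName` and `midlTagName` are hypothetical helpers, not part of any existing tool. Dots in the fully qualified UNO name become underscores, and the tag of the typedef gets one extra leading underscore.

```cpp
#include <cassert>
#include <string>

// "com.sun.star.text.WrapTextMode" -> "com_sun_star_text_WrapTextMode"
std::string midlTypeName(const std::string& unoName)
{
    std::string result = unoName;
    for (char& c : result)
        if (c == '.')
            c = '_';
    return result;
}

// The tag of the typedef'ed enum or struct carries one more leading underscore.
std::string midlTagName(const std::string& unoName)
{
    return "_" + midlTypeName(unoName);
}
```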

3.5 String

A UNO string is mapped to a BSTR.

3.6 Struct

Structs are mapped to MIDL structs, with each member mapped according to its type. The name of the MIDL struct contains the namespace.

//MIDL 
typedef struct _com_sun_star_uno_SomeStruct
{
	long a;
	double b;
} com_sun_star_uno_SomeStruct;

3.7 Union

UNO unions are mapped to encapsulated unions. (UNO unions are currently not specified, but may be in the near future.)

//UNO
module com {  module sun {  module star {  module uno { 
union SomeUnion switch (long)
{
	case 1: long a;
	case 2: double b;
	default: byte c[8];
};
}; }; }; };  


//MIDL
typedef union _com_sun_star_uno_SomeUnion switch (long discriminant)
{
	case 1: long a;
	case 2: double b;
	default: char c[8];
}com_sun_star_uno_SomeUnion;

3.8 Sequence

Sequences are mapped to SAFEARRAYs. A sequence that contains another sequence is mapped to a SAFEARRAY with two dimensions; a sequence that contains sequences which in turn contain sequences is mapped to a SAFEARRAY of three dimensions; and so on.
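The rule that each level of sequence nesting adds one dimension can be expressed as a small compile-time computation. In this sketch `std::vector` stands in for a UNO sequence, and `SafeArrayDims` is a hypothetical helper. Inner sequences may differ in length, whereas a SAFEARRAY has one fixed bound per dimension; the sketch only counts dimensions and leaves that question open.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of SAFEARRAY dimensions for a type; std::vector models a UNO sequence.
template <typename T>
struct SafeArrayDims
{
    static constexpr unsigned value = 0; // not a sequence
};

template <typename T>
struct SafeArrayDims<std::vector<T>>
{
    // Each nesting level adds one dimension.
    static constexpr unsigned value = 1 + SafeArrayDims<T>::value;
};
```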

3.9 Arrays

Arrays are mapped to SAFEARRAYs. Multidimensional arrays are mapped to SAFEARRAYs with the same number of dimensions.

3.10 Any

Anys are mapped to VARIANTs.

3.11 Uik

Uiks are mapped to GUIDs.
struct Uik
{
	unsigned long Data1; 
	unsigned short Data2;
	unsigned short Data3; 
	unsigned long Data4; 
	unsigned long Data5; 
}; 

typedef struct _GUID
{
    DWORD Data1;
    WORD  Data2;
    WORD  Data3;
    BYTE  Data4[8];
} GUID;
The members Uik::Data1 ... Uik::Data3 are mapped to GUID::Data1 ... GUID::Data3, and the members Uik::Data4 and Uik::Data5 are mapped to GUID::Data4.
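Packing Uik::Data4 and Uik::Data5 into GUID::Data4 can be sketched as follows. The byte order is an assumption: each 32-bit member is written most significant byte first, so that the Uik E227A391-33D6-11D1-AABE00A0-249D5590 of XInterface reads as the GUID {E227A391-33D6-11D1-AABE-00A0249D5590}. `Guid` is a portable mirror of the Windows GUID structure, fixed-width types stand in for unsigned long, DWORD, WORD, and BYTE, and `uikToGuid` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-width mirror of the UNO Uik struct.
struct Uik
{
    std::uint32_t Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint32_t Data4;
    std::uint32_t Data5;
};

// Portable mirror of the Windows GUID struct.
struct Guid
{
    std::uint32_t Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint8_t Data4[8];
};

Guid uikToGuid(const Uik& uik)
{
    Guid guid{};
    guid.Data1 = uik.Data1;
    guid.Data2 = uik.Data2;
    guid.Data3 = uik.Data3;
    // Assumed byte order: most significant byte first, matching the text form.
    for (int i = 0; i < 4; ++i)
    {
        guid.Data4[i] = static_cast<std::uint8_t>(uik.Data4 >> (24 - 8 * i));
        guid.Data4[i + 4] = static_cast<std::uint8_t>(uik.Data5 >> (24 - 8 * i));
    }
    return guid;
}
```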

3.12 Typedef

Only typedefs for

can be mapped to COM TLB typedefs. All other typedefs must be converted so that the new type is substituted by its original type.

In fact, structs, enumerations, and unions should always be typedef'ed in the COM TLB, because the #import statement creates C code as well.

3.13 Exceptions

COM offers two ways of reporting errors: HRESULTs and error handling interfaces. One is free to define one's own HRESULTs, but that bears the risk of clashes when code from different contributors is mixed.

The error handling interfaces, instead, provide interface-based error information. To support this error reporting mechanism, the bridge must provide implementations of ISupportErrorInfo and IErrorInfo. Whenever a mapped UNO interface is queried for ISupportErrorInfo, the bridge has to return that interface.

3.14 Interfaces

Naming Convention

A mapped UNO interface keeps its name, except that the leading "X" is replaced by an "I". The namespace is not reflected in the name. That poses no risk of ambiguity, because COM interfaces are identified by their GUIDs. The GUID is created from the Uik of the interface.
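The naming convention can be sketched as a string transformation. `comInterfaceName` is a hypothetical helper; how names without a leading "X" should be treated is not specified here, so the sketch assumes they are kept unchanged.

```cpp
#include <cassert>
#include <string>

// Maps a fully qualified UNO interface name to its COM name:
// the namespace is dropped and a leading 'X' becomes 'I'.
std::string comInterfaceName(const std::string& unoName)
{
    const std::string::size_type dot = unoName.rfind('.');
    std::string name = (dot == std::string::npos) ? unoName : unoName.substr(dot + 1);
    // Assumption: names without a leading 'X' are kept unchanged.
    if (!name.empty() && name[0] == 'X')
        name[0] = 'I';
    return name;
}
```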

Functions

When an interface is mapped, its function names, as well as the parameter names, remain the same.

The return value is mapped to an out parameter, and exceptions are mapped to error codes or, if there is no error code specified, they are mapped to E_FAIL.
//UNO
long func( [in]short a, [out] string s ) throw ( SomeException);

//MIDL
HRESULT func( [in]short a, [out] BSTR* s, [out, retval] long* ret);
The retval attribute indicates that the parameter is the return value. The other parameters are mapped according to their respective specification.
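The exception-to-error-code rule can be sketched as a table lookup with an E_FAIL fallback. The table is hypothetical; this section does not say where it comes from. `makeHResult` reproduces the bit layout of the Windows MAKE_HRESULT macro (severity in bit 31, facility from bit 16, code in the low-order bits), and the example code is built in FACILITY_ITF (4). The constants are copied here so that the sketch compiles without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Real HRESULTs are signed 32-bit values; unsigned keeps the bit arithmetic simple.
using HResult = std::uint32_t;

constexpr HResult kEFail = 0x80004005u;   // E_FAIL in winerror.h
constexpr std::uint32_t kFacilityItf = 4; // FACILITY_ITF

// Same layout as MAKE_HRESULT: severity bit 31, facility at bit 16, code below.
constexpr HResult makeHResult(std::uint32_t severity, std::uint32_t facility, std::uint32_t code)
{
    return (severity << 31) | (facility << 16) | code;
}

// Hypothetical table: fully qualified UNO exception name -> error code.
using ExceptionTable = std::map<std::string, HResult>;

// Returns the error code for an exception, or E_FAIL if none is specified.
HResult hresultForException(const ExceptionTable& table, const std::string& exceptionName)
{
    const auto it = table.find(exceptionName);
    return it != table.end() ? it->second : kEFail;
}
```

Codes in FACILITY_ITF are only meaningful relative to the interface that returns them, so two contributors may still pick the same value; this is the clash risk mentioned in section 3.13.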

During run time, the bridge has to map the calling convention from __stdcall to __cdecl when a call from COM to UNO is made.

Base Interface

A mapped UNO interface inherits IUnknown, as does every other COM interface. XInterface is not mapped; instead, IUnknown takes over all the functionality that XInterface provides.
// UNO
[ uik(E227A391-33D6-11D1-AABE00A0-249D5590), ident( "XInterface", 1.0 ) ]
interface XInterface
{
		any queryInterface( [in] type aType ); 
 		[oneway] void acquire(); 
 		[oneway] void release(); 
}; 
The bridge maps calls on IUnknown::QueryInterface to XInterface::queryInterface, IUnknown::AddRef to XInterface::acquire, and IUnknown::Release to XInterface::release. While the latter two mappings are straightforward, the mapping of QueryInterface requires converting the COM IID (a GUID) to a Type. Although every UNO interface has a Uik that equals the IID of the mapped interface, there is currently no way to obtain type information from the Uik alone. The tool that creates the COM TLBs from the UNO type library therefore needs to create additional information that the bridge can use to map GUIDs to interface names, which can then be used to obtain type information.

On return, the any returned by queryInterface is mapped to the interface that has been queried for. If the any is of TypeClass_VOID, QueryInterface returns E_NOINTERFACE.
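The QueryInterface path can be sketched as a lookup in the GUID-to-name table produced by the TLB tool, followed by a call to queryInterface. This is a model under stated assumptions: IIDs are kept in their text form, the table layout is hypothetical, an empty `std::optional` stands in for an any of TypeClass_VOID, and an interface name stands in for the returned interface pointer. The value of E_NOINTERFACE is copied from winerror.h.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>

using HResult = std::uint32_t;
constexpr HResult kSOk = 0;                    // S_OK
constexpr HResult kENoInterface = 0x80004002u; // E_NOINTERFACE in winerror.h

// Hypothetical table written by the TLB generator: IID (as text) -> UNO interface name.
using IidTable = std::map<std::string, std::string>;

// Stand-in for XInterface::queryInterface: an empty optional models an any of
// TypeClass_VOID; otherwise the name of the interface found is returned.
using QueryFn = std::function<std::optional<std::string>(const std::string& unoTypeName)>;

// Bridge side of IUnknown::QueryInterface.
HResult queryInterface(const IidTable& table, const QueryFn& query,
                       const std::string& iid, std::string& found)
{
    const auto it = table.find(iid);
    if (it == table.end())
        return kENoInterface; // no type information for this IID
    const std::optional<std::string> result = query(it->second);
    if (!result)
        return kENoInterface; // queryInterface returned a void any
    found = *result;
    return kSOk;
}
```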

Inheritance

COM interfaces support single inheritance. When a UNO interface that inherits from another interface is mapped, the mapped interface inherits from the mapped base interface.

//UNO 
interface XAnUnoInterface: XOtherUnoInterface
{
...
};

//MIDL
interface IAnUnoInterface: IOtherUnoInterface
{
...
};

XInvocation

XInvocation is special in that it is used for scripting. It could be mapped directly to IDispatch, as the OLE bridge does. This needs further evaluation; for now, the interface is mapped as it is.

3.15 In, Out, In / Out Parameters

In parameters are mapped so that they are passed by value.

Out and In/Out parameters are passed by reference, for example:
//UNO
void func( [in] char a, [in,out] double b, [out] short c);

//MIDL
HRESULT func([in]wchar_t a, [in,out] double* b, [out]short* c);
However, there are some COM types which can only be allocated by the callee when used as out parameters:

Example:
//UNO 
XSomeInterface func( [in,out] string s, [out] sequence<long> seq); 

//MIDL 
HRESULT func( [in,out]BSTR* s, [out]SAFEARRAY(long)* seq, [out,retval] 
              ISomeInterface** ret); 

// C++ usage 
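BSTR s= SysAllocString(L"example");
SAFEARRAY* ar= NULL;
ISomeInterface* iface= NULL;
HRESULT hr= object->func( &s, &ar, &iface);
if (SUCCEEDED( hr))
{
	// freeing the callee-allocated out values
	SafeArrayDestroy( ar);
	iface->Release();
}
SysFreeString( s); // the in/out string is freed by the caller in either case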
Currently, it is still unclear how UNO arrays are handled when used as an out parameter.


Author: Joachim Lingner ($Date: 2002/01/30 09:08:36 $)
Copyright 2001 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, CA 94303 USA.