Ruby

World Wide Web Consortium Working Draft 22-March-1999

This version: http://www.w3.org/TR/1999/WD-ruby-19990322
Latest version: http://www.w3.org/TR/WD-ruby
Also available for local browsing as a zipped archive
Previous version: http://www.w3.org/TR/1998/WD-ruby-19981221
Editor: Marcin Sawicki (Microsoft)

Additional contributors: Martin Dürst (W3C); Masayasu Ishikawa (W3C); Chris Wilson (Microsoft)

Status of This Document

This is a W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. The W3C will not allow early implementation to constrain its ability to make changes to this specification prior to final release. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C Working Drafts can be found at http://www.w3.org/TR.

This W3C Working Draft is published by the Internationalization Working Group (members only). In a future version, this work is intended to be submitted to the HTML Working Group (members only) for inclusion in the next version of HTML.

There are a number of areas in this document, such as ruby tag naming and detailed ruby structuring, that are currently under investigation and will be discussed during the next Working Group meeting. As is characteristic of a W3C Working Draft, all the proposed tag names and structures in this document are subject to change.

Please send comments and questions regarding this document to i18n-editor@w3.org (archived for W3C members). Comments in languages other than English, in particular Japanese, are also welcome.

Abstract

The HyperText Markup Language (HTML) is a simple markup language used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of applications. The following specification extends HTML to support ruby text typically used in East Asian documents. Familiarity with both HTML 4.0 [HTML4] and XML-ized HTML [XHTML] is assumed.

1. Introduction

East Asian typography contains structural elements that are not yet exposed in HTML and are thus impossible to achieve on the Web without special workarounds or graphics. One such element is ruby text. "Ruby" is the commonly used name for a run of text that appears in the immediate vicinity of another run of text, referred to as the "base", and serves as an annotation or a pronunciation guide associated with that run of text. Ruby, as used in Japanese, is described in JIS-X-4051 [JIS].

The font size of ruby text is normally half the font size of the base. The name "ruby" in fact originated from the name of the 5.5pt font size in British printing, which is about half the 10pt font size commonly used for normal text.

There are several positions where the ruby text can appear relative to its base.

Ruby text normally appears alongside the base. It is most frequently placed above the base in horizontal layout.

Figure 1.1.1: Top ruby in horizontal Japanese

Sometimes, especially in educational texts, ruby may appear below the base.

Figure 1.1.2: Bottom ruby in horizontal layout applied to Japanese text

In vertical layout, ruby appears on the right side of the vertical column of text if it appears on top in horizontal layout. The layout flow of the ruby text is the same as that of its base: vertical if the base is vertical, horizontal if the base is horizontal.

Figure 1.1.3: Top ruby applied to vertical Japanese

Ruby text appears on the left side of the base in vertical layout if it appears below it in horizontal layout. This setting is mainly used in traditional texts.

Figure 1.1.4: Bottom ruby in vertical-ideographic layout applied to Japanese text.

Sometimes, two runs of ruby text may be applied on the same base: one below and one above.

Figure 1.1.5: Ruby applied below and above a line of Japanese text

In certain scenarios, however, ruby text may appear as inline text immediately following the base. The inline form serves only as a fallback for limited resolution display media or older software.

When the ruby text appears inline, it is enclosed within parentheses. Parentheses are used only in the inline form; there are none around ruby text that runs alongside the base.

Figure 1.1.6: Inline ruby applied to horizontal Japanese

Note, however, that using parentheses for the fallback may lead to confusion between runs of text intended to be ruby text and others that happen to be enclosed within parentheses. The author should be aware of the potential for that confusion and, if this is a concern, is advised to choose an unambiguous delimiter for the fallback.

This document introduces a ruby model in HTML using the new ruby element, which serves as the container for the rb, rt and rp elements. The rb element contains the base characters of the ruby. The rt element contains the ruby text, and the rp elements contain the parenthesis characters used in the fallback case. The rb element may contain another ruby element. The author can achieve the case of ruby text appearing both below and above the same base by nesting a ruby element inside the rb element of another ruby.

Only the structure of the ruby element is discussed in this document. For example, the following ruby:

Figure 1.1.7: Top ruby applied to horizontal English

can be represented using the following markup in XML-ized HTML [XHTML], which is the full form:

<ruby><rb>WWW</rb><rp>(</rp><rt>World Wide Web</rt><rp>)</rp></ruby>

or using the following markup in SGML HTML, where it is possible to omit the end tags of rb, rt and rp:

<ruby>WWW<rp>(<rt>World Wide Web<rp>)</ruby>

If the author is not concerned about displaying ruby inside parentheses in older UAs, or is working in a generic XML environment that supports CSS2 style sheets, the rp element may be omitted altogether:

<ruby><rb>WWW</rb><rt>World Wide Web</rt></ruby>

The parentheses in such a case can be generated using the 'content' property of the :before and :after pseudo-elements [CSS2].
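Where neither CSS-generated parentheses nor hand-written rp elements are available, the fallback parentheses could also be added mechanically before publishing. The following Python sketch is an illustration only and not part of this proposal; the function name add_rp and its parameters are hypothetical, and it assumes the XML form with a single rt child.

```python
import xml.etree.ElementTree as ET

def add_rp(markup, open_paren="(", close_paren=")"):
    """Wrap the rt child of a ruby element in rp elements (hypothetical helper)."""
    ruby = ET.fromstring(markup)
    children = list(ruby)
    # Leave markup that already carries fallback parentheses untouched.
    if any(child.tag == "rp" for child in children):
        return markup
    index = next(i for i, child in enumerate(children) if child.tag == "rt")
    before = ET.Element("rp")
    before.text = open_paren
    after = ET.Element("rp")
    after.text = close_paren
    ruby.insert(index, before)      # rp immediately before rt
    ruby.insert(index + 2, after)   # rp immediately after rt
    return ET.tostring(ruby, encoding="unicode")
```

Applied to the markup without rp shown above, this sketch yields the full form given for Figure 1.1.7.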

This document only defines ruby markup. Formatting properties for styling ruby are being developed for CSS/XSL rather than for HTML/XML. See [I18N-FORMAT].

This proposal does not include markup that would allow the base text and the ruby text to be distributed intelligently over two lines when the whole construct would need to be hyphenated. Authors are advised to limit the length of the base text where possible. The problem of whether and how to provide additional structure for advanced formatting needs is currently under investigation.

2. Ruby elements

This section contains the ruby DTD and the specification of the functionality of the ruby elements. Two DTD versions are given for each element. The first one is in SGML [HTML4]. The second one is in XML-ized HTML [XHTML]. Note that in the SGML DTD, elements and attributes are intended to be case-insensitive, whereas in the XML DTD they are case-sensitive. In addition, start and end tags are always required in XML.

For convenience, the following parameter entities are used:

<!-- %Inline; covers inline or "text-level" elements -->
<!ENTITY % Inline "(#PCDATA | %inline; | %misc; | ruby)*">

<!-- %ruby.content; is %Inline; without ruby -->
<!ENTITY % ruby.content "(#PCDATA | %inline; | %misc;)*">

<!ENTITY % attrs "%coreattrs; %i18n; %events;">

Further definitions can be found in [XHTML].

2.1 The `ruby` element

<!-- SGML DTD: container for ruby elements -->
<!ELEMENT ruby - - (rb, rp?, rt, rp?)>
<!ATTLIST ruby %attrs; >

Start tag: required, End tag: required

<!-- XML DTD: container for ruby elements -->
<!ELEMENT ruby (rb, rp?, rt, rp?)>
<!ATTLIST ruby %attrs; >

The ruby element serves as the container for the rb, rp and rt elements only. It provides the structural association between the ruby base and its ruby text.

The ruby element does not accept any attributes other than the standard ones, such as id, class or style.

In this simplest example, ruby "aaa" is associated with base "AA":

<ruby>AA<rt>aaa</ruby>

Figure 2.1.1: SGML usage of the ruby element

In XML-ized HTML, the above example would be:

<ruby><rb>AA</rb><rt>aaa</rt></ruby>

Figure 2.1.2: XML usage of the ruby element
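The content model (rb, rp?, rt, rp?) can be checked mechanically. The following Python sketch is an illustration only; the function name is_valid_ruby is hypothetical, it handles the XML form only, and it uses the standard xml.etree.ElementTree module rather than a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Child sequence allowed by the XML DTD: (rb, rp?, rt, rp?)
_RUBY_MODEL = re.compile(r"rb (rp )?rt (rp )?")

def is_valid_ruby(markup):
    """Return True if the children of ruby match (rb, rp?, rt, rp?)."""
    ruby = ET.fromstring(markup)
    if ruby.tag != "ruby":
        return False
    # In the XML form, character data may not appear directly inside ruby.
    if (ruby.text or "").strip():
        return False
    if any((child.tail or "").strip() for child in ruby):
        return False
    # Encode the child tags as a space-terminated sequence and match it.
    sequence = "".join(child.tag + " " for child in ruby)
    return _RUBY_MODEL.fullmatch(sequence) is not None
```

The sketch accepts the markup of Figure 2.1.2 and rejects a ruby element whose rt precedes its rb.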

2.2 The `rb` element

<!-- SGML DTD: container for ruby base -->
<!ELEMENT rb O O %Inline; >
<!ATTLIST rb %attrs; >

Start tag: optional, End tag: optional

In SGML, neither the opening nor the closing tag is required, which means that any text in the ruby element that is not enclosed within an rt or an rp element belongs to the rb element. The rb element is automatically closed by an rt or an rp element.

<!-- XML DTD: container for ruby base -->
<!ELEMENT rb %Inline; >
<!ATTLIST rb %attrs; >

The rb element is the container for the text of the ruby base. Any inline content, including other "nested" ruby elements, is valid inside of rb.

For example, the following markup, utilizing CSS from "International Layout in CSS" [I18N-FORMAT], may be used to associate two ruby texts with the same ruby base:

<ruby STYLE="ruby-position: above">
  <rb>
    <ruby STYLE="ruby-position: below">
      <rb>KANJI</rb>
      <rt>kana-bottom</rt>
    </ruby>
  </rb>
  <rt>kana-top</rt>
</ruby>

Figure 2.2.1: Ruby markup to achieve both top and bottom ruby on the same base.

The markup above would be rendered as:

 kana-top
   KANJI
kana-bottom

Figure 2.2.2: The result of nested ruby markup.
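To show how a processor might recover the single base and both ruby texts from nested markup, here is a minimal Python sketch. It is an illustration only: the function name base_and_annotations is hypothetical, and it assumes the XML form with the nested ruby as a direct child of rb.

```python
import xml.etree.ElementTree as ET

def base_and_annotations(ruby):
    """Return (base text, [ruby texts, innermost first]) for a ruby element."""
    rb = ruby.find("rb")
    rt = ruby.find("rt")
    inner = rb.find("ruby")
    if inner is not None:
        # The base is itself a ruby: descend to find the innermost base.
        base, texts = base_and_annotations(inner)
    else:
        base, texts = "".join(rb.itertext()).strip(), []
    texts.append("".join(rt.itertext()).strip())
    return base, texts
```

For the markup of Figure 2.2.1, parsed with ET.fromstring, the sketch returns the base KANJI together with the two ruby texts kana-bottom and kana-top, in that order.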

2.3 The `rt` element

<!-- SGML DTD: container for ruby text -->
<!ELEMENT rt - O %ruby.content; >
<!ATTLIST rt %attrs; >

Start tag: required, End tag: optional

In SGML, only the opening tag is required. The rt element is automatically closed by an opening rp tag or a closing ruby tag.

<!-- XML DTD: container for ruby text -->
<!ELEMENT rt %ruby.content; >
<!ATTLIST rt %attrs; >

The rt element is the container for the ruby text. The rt element does not allow other nested ruby elements inside of it.

2.4 The `rp` element

<!-- SGML DTD: container for parenthesis characters -->
<!ELEMENT rp - O (#PCDATA)>
<!ATTLIST rp %attrs; >

Start tag: required, End tag: optional

In SGML, the rp element is automatically closed by an opening rt tag or a closing ruby tag.
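The SGML tag omission rules for rb, rt and rp can be summarized as a small expansion procedure. The Python sketch below is an illustration only and not an SGML parser; the function name expand_short_form is hypothetical, and it assumes a flat (non-nested) ruby with lowercase tags, no attributes and no white space between tags.

```python
import re

# Split the markup into ruby-related tags and the character data between them.
_TOKEN = re.compile(r"(</?(?:ruby|rb|rt|rp)>)")

def expand_short_form(markup):
    """Insert the rb, rt and rp tags that SGML allows to be omitted."""
    out = []
    open_elem = None  # name of the currently open rb, rt or rp, if any
    for token in _TOKEN.split(markup):
        if not token:
            continue
        if token == "<ruby>":
            out.append(token)
        elif token in ("<rb>", "<rt>", "<rp>"):
            if open_elem:  # a start tag closes the element that is still open
                out.append("</" + open_elem + ">")
            open_elem = token[1:-1]
            out.append(token)
        elif token in ("</rb>", "</rt>", "</rp>"):
            out.append(token)
            open_elem = None
        elif token == "</ruby>":
            if open_elem:  # the closing ruby tag closes the open element
                out.append("</" + open_elem + ">")
                open_elem = None
            out.append(token)
        else:
            if open_elem is None:  # text outside rt and rp belongs to rb
                out.append("<rb>")
                open_elem = "rb"
            out.append(token)
    return "".join(out)
```

Under these assumptions, the SGML markup of Figure 2.1.1 expands to the XML markup of Figure 2.1.2, and markup that is already in full form passes through unchanged.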

<!-- XML DTD: container for parenthesis characters -->
<!ELEMENT rp (#PCDATA)>
<!ATTLIST rp %attrs; >

This element is intended to contain parenthesis characters. Parentheses are necessary for the ruby to be rendered correctly when it is inline. The rp element is especially necessary for UAs that are unable to render ruby text above the ruby base. That way, any ruby will degrade to no worse than a properly formed inline ruby in non-supporting UAs.

Consider the following markup, specifying a top (default) ruby:

<ruby><rb>A</rb><rp>(</rp><rt>aaa</rt><rp>)</rp></ruby>

Figure 2.4.1: Ruby markup using rp elements

A user agent that supports top ruby would render it as:


aaa
 A

Figure 2.4.2: Top ruby rendered by a supporting UA (note the parentheses are not visible)

However, a UA that is unable to render top ruby, or that does not support ruby markup, would still correctly show:


A(aaa)

Figure 2.4.3: Top ruby rendered by a non-supporting UA (note the parentheses are visible)
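The two renderings differ only in whether the content of the rp elements is used. The Python sketch below illustrates this under the assumption that a non-supporting UA ignores the unknown tags but keeps their character data; the function names fallback_text and annotated_parts are hypothetical, and the XML form is assumed.

```python
import xml.etree.ElementTree as ET

def fallback_text(markup):
    """Text left when the ruby, rb, rp and rt tags are ignored but content is kept."""
    return "".join(ET.fromstring(markup).itertext())

def annotated_parts(markup):
    """(base, ruby text) as a supporting UA might use them; rp content is dropped."""
    ruby = ET.fromstring(markup)
    base = "".join(ruby.find("rb").itertext())
    text = "".join(ruby.find("rt").itertext())
    return base, text
```

For the markup of Figure 2.4.1, fallback_text gives the inline string of Figure 2.4.3, while annotated_parts gives only the base and the ruby text, without parentheses.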

3. Glossary

Ruby base: Run of text that has a ruby text associated with it.
Ruby text: Run of text that appears in the immediate vicinity of another run of text ("ruby base") and serves as an annotation or a pronunciation guide associated with the base.

Acknowledgements

The model presented in this specification is largely inspired by the work done by Martin Dürst [DUR97].

This specification would also not have been possible without help from:

Laurie Anna Edlund, Arye Gittelman, Koji Ishii, Eric LeVine, Chris Lilley, Chris Pratley, Rahul Sonnad, Michel Suignard, Takao Suzuki, Chris Thrasher.

References

[CSS2]: "Cascading Style Sheets, level 2 (CSS2) Specification", W3C Recommendation; Bert Bos, Håkon Wium Lie, Chris Lilley and Ian Jacobs, 12 May 1998
Available at: http://www.w3.org/TR/REC-CSS2
[DUR97]: "Ruby in the Hypertext Markup Language", Internet Draft; Martin Dürst, 28 February 1997, expired
Available at: http://www.w3.org/International/draft-duerst-ruby-01
[HTML4]: "HTML 4.0 Specification", W3C Recommendation; Dave Raggett, Arnaud Le Hors and Ian Jacobs, 18 December 1997, revised 24 April 1998
Available at: http://www.w3.org/TR/REC-html40
[XHTML]: "XHTML™ 1.0: The Extensible HyperText Markup Language — A Reformulation of HTML 4.0 in XML ", W3C Working Draft; Steve Pemberton et al., 4 March 1999
Available at: http://www.w3.org/TR/WD-html-in-xml
[I18N-FORMAT]: "International Layout in CSS", W3C Working Draft; Marcin Sawicki, 22 March 1999
Available at: http://www.w3.org/TR/WD-i18n-format
[JIS]: "Line composition rules for Japanese documents"; JIS X 4051-1995, Japanese Standards Association, 1995 (in Japanese)

Changes from Previous Public Working Draft

Section	Change
Status of This Document	mentioned ruby naming and structure control as issues to be discussed at the next WG meeting
1. Introduction	removed a mention of formatting; added a description and diagrams of ruby text that appears below the base or on the left (if vertical); covered the case of ruby text appearing below and above; clarified the role of the inline form of ruby text as a fallback; additional clarifications and minor rewordings; corrected figure 1.1.5; more markup examples for figure 1.1.7
2.2 The rb element	allowed nesting of ruby elements inside of rb; added examples of this functionality; changed %ruby.content to %Inline
2.4 The rp element	removed a mention of specific CSS properties
3. Ruby Box Model	moved the whole section to i18n-format
References	added a reference to i18n-format; updated the reference from HTML-XML to XHTML
Changes from Previous Public Working Draft	created new section