Versioning Structured Technical Documentation

Henning Moeller
FU Berlin
c/o Siemens AG
Zentrale Forschung und Entwicklung
ZFE BT SE 45
Tel +89-636-41323
Fax +89-636-45111
henning.moeller@zfe.siemens.de

1. Introduction

The field of technical documentation is a major application area for Hypertext systems. Complex technical systems require documentation that consists of several thousand pages of operation and maintenance manuals, training documents, or sales documentation. Most of this documentation has an implicit structure and is highly interrelated. For several reasons it has become increasingly important to make the structure of the documentation and the relations between its parts explicit.

Structuring and interlinking the documentation with standardized approaches, such as the markup language SGML and the Hypertext extensions provided by HyTime, is the basis for more efficient access to relevant information and therefore improves the quality and user-friendliness of the documentation.

An important advantage of structuring documents is that the usual file-based structure can be replaced by a semantics-oriented one: the smallest object that can be handled is no longer the document as a whole; instead, the document consists of several fragments, each dealing with one specific information item. These items, marked up using SGML, can be mapped onto Hypertext nodes, and the relations between them, expressed by HyTime constructs, form the basis for Hypertext links.

The document therefore loses its monolithic character and can be seen as a configuration created from a pool of document fragments. These fragments can often be reused in different places of the technical documentation, so structuring the documentation also improves its consistency: with such a polyhierarchy, error-prone redundancies can be avoided. Reusing fragments also has the advantage of cutting the production costs of a new document.
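The pool-and-configuration view can be illustrated with a minimal Python sketch. The fragment ids, texts, and the helpers `render` and `manuals_using` are hypothetical and not part of any SGML or HyTime tool; the sketch only shows that a shared fragment is stored once, so a single edit reaches every manual that uses it.

```python
# Pool of document fragments, each stored exactly once (ids and texts are hypothetical).
pool: dict[str, str] = {
    "safety": "Disconnect the unit from the mains before opening it.",
    "install": "Mount the unit on a vertical wall.",
    "features": "The unit supports remote diagnosis.",
}

# Each manual is a configuration: an ordered selection of fragment ids.
manuals: dict[str, list[str]] = {
    "operation": ["safety", "install"],
    "sales": ["features", "safety"],
}


def render(manual: str) -> str:
    """Assemble a manual by resolving its fragment ids against the pool."""
    return "\n".join(pool[fid] for fid in manuals[manual])


def manuals_using(fid: str) -> list[str]:
    """Return the manuals that share a given fragment (the polyhierarchy)."""
    return sorted(name for name, fids in manuals.items() if fid in fids)


# A single edit to the shared fragment is visible in every manual that uses it.
pool["safety"] = "Switch off and disconnect the unit before opening it."
```

`manuals_using("safety")` returns both manuals, which is the information needed whenever a change to a shared fragment has to be propagated or locked.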

The growing flexibility to offer client-specific configurations of technical systems, together with shorter release cycles, calls for efficient handling of versions and variants of the documentation. As with traditional documentation structured on a file basis, the versioning of technical documentation structured on a content basis will still essentially consist of recording meaningful stages.

The main problems in versioning structured technical documentation are, first, preserving its consistency after changes and, second, synchronizing parallel editing processes, whether or not they run under the control of a common system, a typical situation in long design tasks.

2. Versioning technical documentation

Consider technical documentation as a pool of fragments carrying the content, with each manual belonging to the documentation as a configuration of selected fragments that imposes a specific structure on the set of fragments. Then the concept of versions as successive instances of the same logical entity, each created as the result of a specific task [Webe91], must be worked out in detail. The entities that may be versioned are the fragments, the configured manuals, other configurations such as chapters of manuals, and the links. Which of these entities are actually subject to versioning depends on the purpose.

In the field of technical documentation, the primary task of versioning is to archive meaningful stages of one or more, possibly overlapping, manuals that will be or have been delivered to clients or published as drafts for review. A technical writer, for example, may be responsible for a complete manual. After each update cycle he freezes the whole manual and from then on works only on a new version. Old versions must remain accessible throughout this process so that existing parts can be reused.

Versioning in technical documentation therefore amounts to creating snapshots of the state of a manual at particular moments. All versions then have to be arranged in historical order to document the development process.
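A snapshot-based version history might look like the following sketch, assuming a pool that maps fragment ids to their current content. The class `Manual` and its methods are hypothetical; a frozen version records both the structure (the ordered fragment ids) and the content at that moment, and versions are kept in historical order.

```python
from dataclasses import dataclass, field

# A frozen snapshot: the manual's ordered fragment ids together with their content
# at the moment of freezing. Tuples make the snapshot immutable.
Snapshot = tuple[tuple[str, str], ...]


@dataclass
class Manual:
    name: str
    fragment_ids: list[str]
    # Versions in historical order, oldest first: (version label, snapshot).
    history: list[tuple[str, Snapshot]] = field(default_factory=list)

    def freeze(self, label: str, pool: dict[str, str]) -> Snapshot:
        """Record the current structure and content of the manual as a new version."""
        snapshot = tuple((fid, pool[fid]) for fid in self.fragment_ids)
        self.history.append((label, snapshot))
        return snapshot

    def version(self, label: str) -> Snapshot:
        """Look up an archived version, e.g. to reuse parts of it."""
        for recorded_label, snapshot in self.history:
            if recorded_label == label:
                return snapshot
        raise KeyError(label)
```

Because each snapshot copies the content into an immutable tuple, later edits to the pool do not alter archived versions, while `version` keeps old parts available for reuse.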

3. Problems in versioning structured documents

The underlying assumption of such a versioning approach, which basically consists of archiving meaningful snapshots, is that there is only one valid and consistent state of the information at any time during the development process. This is indeed the case if only one person works on the documentation, but in reality the editing of large amounts of structured documents cannot be managed by a single technical writer.

In a multiuser environment, the parallel processes of editing structured documents must be synchronized by the system, for example by locking every single element or structured document that is being worked on. Problems arise because of the possible polyhierarchies in structured documents: does locking a configured document that uses shared fragments imply locking the shared objects in the other configurations as well [Rehm88, Shah94]? Or does the system have to create a copy before locking? Similarly, does a link connecting the locked document with another document have to be locked or not?
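The first question can be made concrete with a sketch of one possible answer, namely that a lock on a configuration propagates to its fragments and thereby blocks every other configuration sharing them. `LockManager` and its methods are hypothetical names; copying before locking and the treatment of links are not modeled.

```python
class LockManager:
    """Locks fragments on behalf of configurations (a hypothetical design).

    Locking a configuration locks all of its fragments, so every other
    configuration that shares one of them is implicitly locked as well.
    """

    def __init__(self, configurations: dict[str, list[str]]) -> None:
        self.configurations = configurations
        self.owner: dict[str, str] = {}  # fragment id -> user holding the lock

    def lock(self, configuration: str, user: str) -> bool:
        """All-or-nothing: fail if any fragment is already held by another user."""
        fragments = self.configurations[configuration]
        if any(self.owner.get(fid, user) != user for fid in fragments):
            return False
        for fid in fragments:
            self.owner[fid] = user
        return True

    def unlock(self, configuration: str, user: str) -> None:
        """Release the fragments of a configuration that `user` holds."""
        for fid in self.configurations[configuration]:
            if self.owner.get(fid) == user:
                del self.owner[fid]

    def affected(self, user: str) -> list[str]:
        """Configurations touched by the locks that `user` currently holds."""
        return sorted(
            name
            for name, fragments in self.configurations.items()
            if any(self.owner.get(fid) == user for fid in fragments)
        )
```

With this design, locking the operation manual also blocks a service manual that shares its safety fragment, which is exactly the side effect the question above asks about.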

Worse: once a copy of the structured document is no longer under the control of the system, the locking mechanism cannot be provided at all. This is the case, for example, when a supplier delivers SGML-encoded documents that have to be incorporated into the manufacturer's existing Hypertext-based product documentation. Each time the supplier ships an updated version of its documentation, the manufacturer risks losing either its own changes or the supplier's. A similar situation arises if the communication breaks down during cooperative editing of a document, or if the same documentation has to be edited at two workplaces with different working hours, such as one in Europe and the other in the US. In each case you end up with two variants of one original document, and you do not want to lose the changes made in either editing process.

Finally, there is the well-known problem of preserving the consistency of documents that have been changed. This is a serious problem, especially for file-oriented documents, and may be alleviated by taking advantage of a content-oriented structure of the documentation, in which the semantic dependencies between related fragments can be represented explicitly. There are two cases: either a fragment is reused completely unchanged in another context, in which case consistency is preserved by the polyhierarchy, which prevents redundancies; or there are more subtle dependencies between fragments that cannot be identified computationally. Examples are relations like "fragment A is the English translation of B" or "A describes the service procedure for a device and B its construction". Changing A will then have an impact on the content of B. If such dependencies are represented, a system can bring the resulting change requirements to the user's attention.
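Representing such dependencies explicitly could look like this minimal sketch. The `Dependency` record, the relation wording, and the choice to treat every dependency as symmetric are assumptions; the system only lists candidates, and the user decides whether anything actually has to change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """An explicit semantic dependency between two fragments: `a` <relation> `b`."""
    a: str
    relation: str
    b: str


def notify(changed: str, dependencies: list[Dependency]) -> list[str]:
    """List the fragments a user should re-check after `changed` was edited.

    Dependencies are treated as symmetric here: a change at either end
    flags the other end. Whether the change actually matters is left to the user.
    """
    messages = []
    for dep in dependencies:
        if changed in (dep.a, dep.b):
            other = dep.b if changed == dep.a else dep.a
            messages.append(f"check {other}: {dep.a} {dep.relation} {dep.b}")
    return messages
```

The content of the fragments is never inspected; the notification rests entirely on the explicitly recorded relations, in line with the observation that such dependencies cannot be identified computationally.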

4. Consistency in structured documents

The vision of structured documentation as a collection of configurations, each imposing its structure on the set of selected fragments that carry the content, has much in common with the concept of databases: you can think of fragments as the database objects, of configured documents as database views, and of the document schema defined in an SGML Document Type Definition (DTD) as the database schema. The main difference between traditional databases and structured document databases lies in the nature of the content: the content of database objects consists of computable values, whereas the content of a fragment cannot be interpreted by the system.

The consistency of documentation has four aspects:

  1. Schema-specific structural consistency: The document has to comply with the generic structure of the document type defined in the DTD.
  2. Instance-specific structural consistency: Part of the structure of a document depends on its content. The order of the steps in a procedure to be carried out in response to an alarm is vital, as is the destination of a Hypertext link that points to this procedure.
  3. Context-dependent content consistency: The content of a fragment may depend on its context, i.e. on related documents, chapters, and the like. This is the case for the translations mentioned above. Another example is the documentation of a device from different perspectives, such as its functional description or the operations needed for its maintenance.
  4. Context-independent content consistency: Is the content of a fragment consistent and clearly formulated, and does it communicate what it should? The only way to ensure this type of consistency is to let a user approve the content.

The same holds for the consistency aspects described in 2. and 3.: only a person can make the final decision whether consistency is preserved after a change. What a system can do is support this decision by notifying the user of those fragments, configurations, or links that may be affected by the changes. Note again that such support requires an explicit description of the dependencies.
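By contrast, the first aspect can be checked mechanically. The following sketch assumes a hypothetical content model for a `procedure` element and approximates it with a regular expression over the sequence of child element names; a real system would use a validating SGML parser, which also handles features such as the `&` connector and inclusions that this approximation ignores.

```python
import re

# Hypothetical content model for a "procedure" element, corresponding roughly
# to the SGML declaration  <!ELEMENT procedure - - (title, para*, step+)>
# Each child element name is followed by a space so that names cannot run together.
PROCEDURE_MODEL = re.compile(r"(title )(para )*(step )+")


def conforms(children: list[str]) -> bool:
    """Check a procedure's sequence of child element names against the content model."""
    return PROCEDURE_MODEL.fullmatch("".join(name + " " for name in children)) is not None
```

Swapping two `step` elements still conforms, which illustrates why instance-specific structural consistency (aspect 2) cannot be left to the parser.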

5. Merging variants

The other problem mentioned above is merging two variants of one original document. Integrating the changes of both editing processes (the one carried out by the supplier and the one by the manufacturer) into one single merged document is far from trivial: today it is done manually, if at all, and computers can at best provide some support.

There are two ways to merge the variants. The first consists of comparing only the resulting variants. As long as no structure has been changed, merging is rather easy: either the content has been modified in only one variant, in which case the changed content can be taken for the merged version, or the same content has been edited differently in both variants, in which case a user has to decide which change to take. This approach, however, cannot cope with structural changes to the document instance. Moving a fragment to another location in the structure, for example, will always appear as the deletion of the fragment and the creation of a new one somewhere else, so the information about the origin of the new fragment is lost. The same holds for wrapping existing sections into a new one.
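The first, state-based approach can be sketched as follows, assuming each variant is flattened into a mapping from structural position to content. The rule "modified in only one variant" implicitly compares against the original, so `merge_contents` is written as a three-way merge; the position notation and the function names are hypothetical.

```python
Variant = dict[str, str]  # structural position (e.g. "2/1" = chapter 2, section 1) -> content


def compare(original: Variant, variant: Variant) -> list[tuple[str, str]]:
    """Describe a variant relative to the original purely by position."""
    changes = []
    for pos in sorted(original.keys() | variant.keys()):
        if pos not in variant:
            changes.append(("delete", pos))
        elif pos not in original:
            changes.append(("create", pos))
        elif original[pos] != variant[pos]:
            changes.append(("change", pos))
    return changes


def merge_contents(original: Variant, v1: Variant, v2: Variant) -> tuple[Variant, list[str]]:
    """Three-way merge of content, valid only if neither variant changed the structure.

    Returns the merged variant and the positions a user has to decide on.
    """
    merged, conflicts = {}, []
    for pos, base in original.items():
        a, b = v1[pos], v2[pos]
        if a == b or b == base:
            merged[pos] = a  # unchanged, same edit in both, or only v1 changed it
        elif a == base:
            merged[pos] = b  # only v2 changed it
        else:
            merged[pos] = base  # edited differently in both: keep base until a user decides
            conflicts.append(pos)
    return merged, conflicts
```

Comparing a variant in which the fragment at position 1/2 was moved to 2/1 yields a `delete` and a `create` rather than a move, which is precisely the loss of origin information described above.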

This problem is due to the difference between editing content and editing structure. You can only change an element that is clearly identified, and identifying the element that is subject to change is exactly the purpose of structural information. Editing a structure therefore means editing this identification itself, so you need some other, unchangeable identification to describe structural changes. One possibility is a unique and permanent ID (UID), under the control of the system, for each edited element. But if UIDs cannot be provided, as in the case of two copies edited without common control, the only way to describe the editing process is to record the changes.
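If the system does assign permanent UIDs, structural edits become recognizable from the two states alone. In this sketch each element is assumed to be stored as `uid -> (parent uid, content)`; the function `classify` is hypothetical and reuses the operation names Create, Delete, Move, and ChangeContent. Sibling order is not modeled, and a Copy would show up as a Create because the copy receives a new UID.

```python
# uid -> (parent uid, content); the uids are assumed to be assigned and kept stable by the system.
Elements = dict[str, tuple[str, str]]


def classify(original: Elements, variant: Elements) -> list[tuple[str, str]]:
    """Derive editing operations from two states identified by permanent UIDs."""
    ops = []
    for uid in sorted(original.keys() | variant.keys()):
        if uid not in variant:
            ops.append(("Delete", uid))
        elif uid not in original:
            ops.append(("Create", uid))
        else:
            (old_parent, old_content), (new_parent, new_content) = original[uid], variant[uid]
            if old_parent != new_parent:
                ops.append(("Move", uid))
            if old_content != new_content:
                ops.append(("ChangeContent", uid))
    return ops
```

Wrapping two sections into a new one then shows up as one Create plus two Moves instead of a cascade of deletions and creations.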

The second way to merge two variants of the same original document is therefore based on recording the editing in both processes, which makes any structural changes explicit. This approach requires an appropriate notation for arbitrary edits. Then the set of potentially conflicting changes has to be identified and resolved by a user before the two editing processes can be merged into a single one that transforms the original document into the desired merged version.
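A minimal sketch of such a recorded edit, and of the first step of identifying potential conflicts, is shown below. The `Edit` record format is an assumption, not a standard notation; pairing is done by shared target ids, assuming `targets` captures every object an operation touches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One recorded editing step (a hypothetical log format)."""
    op: str                     # e.g. "Create", "Move", "ChangeContent", "Link"
    targets: frozenset[str]     # ids of the objects the operation manipulates
    args: tuple[str, ...] = ()  # operation parameters, e.g. the new parent of a Move


def candidate_conflicts(log1: list[Edit], log2: list[Edit]) -> list[tuple[Edit, Edit]]:
    """Pair up edits from the two processes that manipulate at least one common object.

    Edits on disjoint sets of objects are assumed not to interfere and are skipped.
    """
    return [(e1, e2) for e1 in log1 for e2 in log2 if e1.targets & e2.targets]
```

Only the returned pairs need further examination, either by a commutativity check or by a user.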

The editing operations that must be recorded are Create, Delete, Move, Copy, and ChangeContent for the fragments and configurations, and Link, Unlink, and ChangeLink for the links. Conflicts arise if two operations op1 and op2 that manipulate a set A of at least one common object are not commutative, i.e. if op2(op1(A)) ≠ op1(op2(A)). For example, combining any operation with Delete creates a conflict, whereas Link can always be combined with any other non-deleting operation. A more precise definition of the set A and a detailed discussion of the required transaction model are given in [Shah94].
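The commutativity criterion can be tested directly by applying two operations in both orders to a small document model. This Python sketch assumes a state consisting of fragments (`id -> (parent, content)`) and a set of links, covers only ChangeContent, Move, Delete, and Link, and treats an operation whose target no longer exists as failing; under that assumption every pairing with Delete on a common object is reported as a conflict. All names are hypothetical.

```python
import copy
from typing import Callable, Optional

# Document state: {"frags": {id: (parent id, content)}, "links": {(source, target), ...}}
State = dict
Op = Callable[[State], Optional[State]]  # returns None if the operation cannot be applied


def change_content(fid: str, text: str) -> Op:
    def op(s: State) -> Optional[State]:
        if fid not in s["frags"]:
            return None
        t = copy.deepcopy(s)
        t["frags"][fid] = (t["frags"][fid][0], text)
        return t
    return op


def move(fid: str, new_parent: str) -> Op:
    def op(s: State) -> Optional[State]:
        if fid not in s["frags"]:
            return None
        t = copy.deepcopy(s)
        t["frags"][fid] = (new_parent, t["frags"][fid][1])
        return t
    return op


def delete(fid: str) -> Op:
    def op(s: State) -> Optional[State]:
        if fid not in s["frags"]:
            return None
        t = copy.deepcopy(s)
        del t["frags"][fid]
        t["links"] = {pair for pair in t["links"] if fid not in pair}  # drop dangling links
        return t
    return op


def link(src: str, dst: str) -> Op:
    def op(s: State) -> Optional[State]:
        if src not in s["frags"] or dst not in s["frags"]:
            return None
        t = copy.deepcopy(s)
        t["links"].add((src, dst))
        return t
    return op


def apply_all(s: Optional[State], *ops: Op) -> Optional[State]:
    """Apply operations in sequence; None means some operation could not be applied."""
    for op in ops:
        if s is None:
            return None
        s = op(s)
    return s


def commutes(op1: Op, op2: Op, s: State) -> bool:
    """True iff op1 then op2 yields the same state as op2 then op1, and both orders succeed."""
    first = apply_all(s, op1, op2)
    second = apply_all(s, op2, op1)
    return first is not None and first == second
```

Under this model, `commutes(change_content("A", "v1"), move("A", "ch3"), doc)` is true, even though such a pair may still not make sense semantically.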

Note that not every possible combination of two operations on the same object makes sense: changing the content of a fragment in one editing process and moving its counterpart to another place in the document in the other process may or may not be sensible. This decision, though, can only be made by a user.

6. Summary

Hypertext and technical documentation share common ground in the vision of structured documents. Structured documents are becoming increasingly similar to Hypertext with the growing use of SGML for the content-oriented structuring of documents and the increasing popularity of HyTime as a standardized way to represent links.

As in the field of Hypertext, versioning is considered a major problem in technical documentation if structured documents are to be managed effectively. This paper stated two problems: the need to preserve the consistency of structured documents and the need to merge the results of two parallel editing processes.

Four aspects of the consistency of structured documents were identified. Whereas schema-specific structural consistency can be checked by a parser, the other three (instance-specific structural consistency and context-dependent and context-independent content consistency) can only be checked by a user. If the structured documents are additionally enriched with explicit information about semantic dependencies, the system can notify the user of the implications of a change.

If non-trivial, structural changes are allowed, merging two variants requires recording the editing operations carried out in each process. There are three possible situations when synchronizing the processes. First, the element under consideration has been edited in only one process. The change can then be adopted for the integrated version without any conflict: it was already checked for context-independent content consistency when it was made. Second, the element has been edited in both processes and the operations are commutative. Then a user only has to check them for possible semantic conflicts (is the context-dependent content consistency or the instance-specific structural consistency violated?) before both are taken into the integrated version. Third, the operations of the two processes are not commutative; then a user must decide which of the operations is taken.

The value of system support for merging parallel changes to a structured document in a consistent way can hardly be overestimated. Even if the final decision whether a change preserves the consistency of a document must be made by a user, support in finding all the parts affected by a change would by itself greatly ease the maintenance of structured documents.

References

[Corn93] Cornish, J.M., Mills, Z., Paul, R.J.: Living Databases: Concepts and Objectives, in: ITI 93, Proc. of the 15th Int. Conf. on Information Technology Interfaces, Pula, 15-18.6.93

[Ditt92] Dittrich, J., Wolisz, A.: Toward Cooperative Use of shared Data in Open Distributed Systems, in: Open Distributed Processing, Proc. of the IFIP TC6/WG6.4 Int. Workshop, Berlin, 8-11.10.91, Meer, J. de et al. ed., Elsevier Science Publishers B.V., North Holland, 1992

[Gold90] Goldfarb, C.F.: The SGML Handbook, Clarendon Press, Oxford, 1990

[Haak92] Haake, A.: CoVer: A Conceptual Version Server for Hypertext Applications, in: Proceedings of the ACM Conference on Hypertext, Milano, Nov 30-Dec 4, 1992

[Jark92] Jarke, M., Maltzahn, C., Rose, T.: Sharing Processes: Team Coordination in Design Repositories, in: International Journal of Intelligent and Cooperative Information Systems, Vol. 4.1(1), pp.145-167, 1992

[Lock87] Lockemann, P.C., Schmidt, J.W., eds.: Datenbank-Handbuch, Springer-Verlag, Berlin, Heidelberg, 1987

[Rehm88] Rehm, S., Raupp, T., Ranft, M., Längle, R., Härtig, M., Gotthard, W., Dittrich, K.R., Abramowicz, K.: Support for Design Processes in a Structurally Object-Oriented Database System, in: Advances in Object-Oriented Database Systems, LNCS Vol. 334, Springer, Berlin, Heidelberg, 1988

[Shah94] Shah, P., Wong, J.: Transaction Management in an Object-Oriented Data Base System, in: J. Systems Software, Vol. 24, pp.115-124, 1994

[Webe91] Weber, A.: Publishing Tools Need Both: State-Oriented and Task-Oriented Version Support, Arbeitspapiere der GMD 526, April 1991