Rivista "Informatica e diritto"
Fascicolo No. 1, 1982
THES/BID: A Computer-based
Thesaurus of Terminology in Computers and the Law
INTRODUCTION
THES-BID is a structured thesaurus of descriptors (keywords and
phrases) used in the «International Bibliography on Computers and Law»
(BID) and its computerized data base for the indexing and retrieving of
bibliographical material relating to computer science, its application
to the law and to computer law problems. It is the first attempt at
giving a general structural systemization to recurrent concepts within
legal informatics and computer law. As such it is brought to the
attention not only of documentation experts, the users of the
Bibliography and the BID data base but also of all those who are in any
way interested in this new discipline. THES-BID was compiled with the
aid of a computer by means of a specially prepared automated checking
and printing procedure designed to facilitate the authors' work.
The description of the procedures and of the programmes - which were
designed, using a new and original methodology, at the Istituto per la
Documentazione Giuridica - will be the subject of a complete and
articulated study to be published in a forthcoming issue of the journal
«Informatica e diritto». The authors, in fact, believe that the
research experience gained in the building of THES-BID could be used
profitably in the building of other thesauri in various different
disciplines.
In the first phase of the work, about 2,000 descriptors were chosen
from the technical literature on the basis of their frequency of
use. Once normalized, they were numbered progressively and then
classified with a code number taken from a special scheme, the
Classification Table, also used for the classification of the
bibliographical units in BID.
This data was collected together in an electronic file called the
MASTER file, whilst non-preferred terms, as they were singled out,
along with particular annotations (SCOPE NOTES) relating to the use of
the descriptors, were registered in a separate file called the NOTE
file.
In the second phase of the work, the most important relations of a
vertical kind (conceptual hierarchy) and of a horizontal kind
(synonymy, quasi-synonymy and other associative-type relations),
existing between the various descriptors selected, were identified.
Moreover, the instructions to go from non-preferred terms to preferred
terms within the Thesaurus and the connection between some particular
annotations in the NOTE List and the descriptors to which they referred
were decided upon. The relations between the descriptors (including the
instructions and connectors), after being codified, were registered in
an electronic file called the RELAT file. The computer, employing a set
of specially designed programmes, carried out the first series of
checks on these three files (MASTER, NOTE, RELAT), verifying the formal
exactness of the classification and codification and eliminating
any duplications in the strings.
Then, on the basis of the properties of the relational operators,
defined in advance, the computer expanded the network of relations
registered in the RELAT file. Expanding some operators by reciprocity,
symmetry and transitivity, on the one hand, and having the computer
check irreflexivity, on the other, considerably enlarged the number of
originally defined relations while purging the Thesaurus of inevitable
errors. This automated expansion of relations has been particularly
useful. The nearly 4,500 relations initially registered in the RELAT
file almost tripled with the help of the computer, and the data
obtained in this way was organized in a new file, called the RELAT-ESP
file.
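The expansion by reciprocity and the irreflexivity check described
above can be sketched roughly as follows. This is only an illustration
in Python, not the programmes developed at the Istituto: the
representation of relations as triples, the function name
expand_reciprocals and the sample terms are all assumptions made for
the example.

```python
# Hypothetical sketch: a relation is a (left term, operator, right term) triple.
# The reciprocal pairs US/UF and BT/NT are the ones named in the text.
RECIPROCAL = {"US": "UF", "UF": "US", "BT": "NT", "NT": "BT"}


def expand_reciprocals(relations):
    """Return the relations plus the reciprocal of every US/UF/BT/NT relation.

    Raises ValueError when a relation links a term to itself, since all
    the operators are defined as irreflexive.
    """
    expanded = set()
    for left, op, right in relations:
        if left == right:
            raise ValueError(f"irreflexivity violated: {left} {op} {right}")
        expanded.add((left, op, right))
        if op in RECIPROCAL:
            expanded.add((right, RECIPROCAL[op], left))
    return expanded


# Sample terms invented for the example, not taken from THES-BID.
relat = [
    ("DATA BANK", "US", "DATA BASE"),
    ("LEGAL DATA BASE", "BT", "DATA BASE"),
]
relat_esp = expand_reciprocals(relat)
```

Here each registered relation yields one further relation, which gives
an idea of how the expanded file can grow well beyond the original one.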
The vertical hierarchic relations between the Terms (identified by the
operator BT, Broader Term and its reciprocal NT, Narrower Term) have
been formulated as algebraic tree-structures, whose formal properties
have been expanded and checked automatically. A tree-structure with
only one root and numerous intermediate and terminal nodes has been
constructed for every set of hierarchic chains having the same
conceptual origin. Then, with the aid of the computer, each
tree-structure was checked to make sure there were no «jumps» in the
hierarchy during the passage from one node to another (in the BT/NT
chain) and that each node has only one hierarchic superior
(monohierarchic tree).
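A check of this kind might look like the sketch below. It assumes that
the hierarchy is available as pairs (term, broader term); the function
name check_monohierarchy is hypothetical. Under this representation a
«jump» (a term linked both to its immediate superior and to a more
distant one) shows up as a term with two superiors.

```python
def check_monohierarchy(bt_pairs):
    """Check BT pairs (term, broader_term); return (roots, errors)."""
    parent = {}
    errors = []
    terms = set()
    for term, broader in bt_pairs:
        terms.update((term, broader))
        if term in parent and parent[term] != broader:
            errors.append(f"{term} has two superiors: {parent[term]}, {broader}")
        else:
            parent[term] = broader
    # Follow each BT chain upwards; meeting a term twice means a cycle.
    for term in sorted(terms):
        seen = set()
        node = term
        while node in parent:
            if node in seen:
                errors.append(f"cycle in BT chain starting at {term}")
                break
            seen.add(node)
            node = parent[node]
    # A root is a term that has no hierarchic superior.
    roots = sorted(t for t in terms if t not in parent)
    return roots, errors


roots, errors = check_monohierarchy([("B", "A"), ("C", "B")])
```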
The horizontal relations of synonymy, quasi-synonymy and association
between the terms (identified by the operator RT, Related Term, and
expanded automatically according to the properties of symmetry and
directed transitivity) have, instead, been designed as non-cyclic
algebraic lattice structures. They were also checked automatically
in order to avoid logical errors and redundancy and, again with the
aid of the computer, observance of the rule of incompatibility
between the BT and RT relations for the same pair of Terms was
verified.
(x BT y ^ x RT y -> «error of direct incompatibility»;
x BT y ^ y BT z ^ x RT z -> «error of indirect incompatibility»).
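The two incompatibility rules can be expressed as a short check. The
sketch below assumes that the BT and RT relations are held as sets of
pairs and that the RT pairs have already been expanded by symmetry;
the function name incompatibility_errors is hypothetical.

```python
def incompatibility_errors(bt_pairs, rt_pairs):
    """Return (kind, x, z) tuples for BT/RT incompatibilities.

    bt_pairs: pairs (x, y) meaning x BT y.
    rt_pairs: pairs (x, y) meaning x RT y.
    """
    bt = set(bt_pairs)
    rt = set(rt_pairs)
    errors = []
    for x, y in sorted(bt):
        # x BT y ^ x RT y
        if (x, y) in rt:
            errors.append(("direct", x, y))
        # x BT y ^ y BT z ^ x RT z
        for y2, z in sorted(bt):
            if y2 == y and (x, z) in rt:
                errors.append(("indirect", x, z))
    return errors


bt = {("C", "B"), ("B", "A")}
direct = incompatibility_errors(bt, {("C", "B")})
indirect = incompatibility_errors(bt, {("C", "A")})
```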
Finally, the computer was also used to organize graphically the
printing of the numerous indices and lists which make up this edition
of the Thesaurus.
KWOC INDEX OF PERMITTED TERMS
The KWOC Index of Permitted Terms allows us to go from the single words
making up part of the phrases found in the Thesaurus to the complete
phrases. It is an alphabetical index built up according to the KWOC
index technique: in the column on the left, the single terms are listed
in alphabetical order as an index; in the column on the right the
phrases from which the terms have been taken appear.
Consultation of the KWOC Index is the key to accessing the Structured
Alphabetic List for anyone who does not already know the order of the
single words in the complex phrases found in the Thesaurus.
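The construction of a KWOC (Key Word Out of Context) index can be
illustrated with a few lines of Python. The stop list, the function
name kwoc_index and the sample phrases are assumptions made for the
example; the source does not say which words, if any, were excluded.

```python
from collections import defaultdict

# Assumed stop list of words that are not worth indexing.
STOPWORDS = {"AND", "OF", "THE", "IN", "FOR", "TO"}


def kwoc_index(phrases):
    """Map each significant word to the sorted phrases that contain it."""
    index = defaultdict(set)
    for phrase in phrases:
        for word in phrase.upper().split():
            if word not in STOPWORDS:
                index[word].add(phrase)
    # Keys in alphabetical order, as in the left-hand column of the index.
    return {word: sorted(found) for word, found in sorted(index.items())}


# Sample phrases invented for the example.
index = kwoc_index(["LEGAL INFORMATION RETRIEVAL", "PROTECTION OF LEGAL DATA"])
for word, found in index.items():
    for phrase in found:
        # Single word on the left, complete phrase on the right.
        print(f"{word:<12} {phrase}")
```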
THE STRUCTURED ALPHABETIC LIST
The Structured Alphabetic List is the main index of the Thesaurus
because it contains, in alphabetical order, all the keywords and
phrases examined (including those non-preferred) with an indication of
the different types of relations between them. The descriptors are
followed, in brackets, by their respective codes of classification (on
the basis of which they are set out in the Class List) and the
relations between them are indicated using relational operators
codified, according to common usage, as follows:
US = Use
A US B: the term A is a non-preferred term in the Thesaurus. The term B
is suggested in its place. The US relation - defined (and checked by
the computer) as irreflexive, asymmetrical and intransitive - has as
its inverse (or reciprocal) relation UF, which has the same properties
as US.
UF = Used for
B UF A: the term B has been used instead of the term A, which is a
non-preferred term. The UF relation is expanded automatically by its
reciprocity to US (x US y -> y UF x)
TT = Top Term
A TT: the term A is defined as a Top Term whenever it is at the vertex
of a conceptual pyramid or, in other words, whenever it is at the root
of an algebraic tree-structure. The computer has permitted us to verify
that only descriptors situated at the root of the trees are defined TT
and, equally, that none of those descriptors are left without their
appropriate definition as TT's. The TT relation is monadic having only
one term on its left.
HT = Heading Term
B HT: the term B is defined as a Nodal Heading on the condition that it
is a conceptually important node to which a substantial group of other
terms (not less than 5) are joined. Like the TT relation, the HT
relation is monadic.
SC = Scope Note
A SC n: A is followed by a note (numbered progressively in the NOTE
file and referred to by the number «n» made up of 5 figures on the
right of SC) which clarifies its meaning and use. The SC relation is
dyadic, irreflexive, asymmetrical and intransitive.
BT = Broader Term
A BT B: the term A has as its hierarchically superior concept the term
B. The properties of the BT relation, which is also checked
automatically, are irreflexivity, asymmetry and intransitivity. The
inverse (or reciprocal) relation is the NT relation which has the same
properties as the BT.
NT = Narrower Term
B NT A: the term B has as its hierarchic inferior the term A. The NT
relation is expanded automatically by reciprocity of the BT relation (x
BT y -> y NT x)
RT = Related Term
A RT B: the term A is associated with the term B (and symmetrically B
is associated with A) because the two terms are in a relation of
synonymy, quasi-synonymy or of generic association. The properties of
symmetry and directed transitivity are applied automatically to the RT
relation which is also irreflexive.
(symmetry)
x RT y -> x RT y, y RT x
(directed transitivity)
x RT y ^ y RT z -> x RT y, y RT x, y RT z, z RT y, x RT z, z RT x
(but)
x RT y ^ z RT y -> x RT y, y RT x, z RT y, y RT z
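One way to read these rules is that transitivity is applied only along
the direction in which the RT pairs were registered, and symmetry is
added afterwards. The sketch below follows that reading; it is an
interpretation, not the original programme, and it assumes that chains
longer than two pairs are followed to the end. The function name
expand_rt is hypothetical.

```python
def expand_rt(registered):
    """Expand registered (x, y) RT pairs by directed transitivity, then symmetry."""
    closure = set(registered)
    changed = True
    while changed:
        changed = False
        for x, y in list(closure):
            for y2, z in list(closure):
                # x RT y ^ y RT z -> x RT z (x != z keeps RT irreflexive)
                if y2 == y and x != z and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    # Symmetry: every pair also holds in the opposite direction.
    return closure | {(y, x) for x, y in closure}


chain = expand_rt({("x", "y"), ("y", "z")})
converging = expand_rt({("x", "y"), ("z", "y")})
```

In the first case the pair x RT z (with its symmetric z RT x) is
produced; in the second, where both registered pairs point to y, no
relation between x and z is generated.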
Finally, for the editing of the list under examination, a further
operator called DL (Delete) was used. This allowed us to eliminate
those relations of the RT type which, although produced by automatic
expansion, were nevertheless judged as conceptually unacceptable. The
DL relation whose properties are irreflexivity, symmetry and
intransitivity, has always been registered in the RELAT file, but,
being useful only in the compilation phase of the list, has, for
obvious reasons, not been printed in the Thesaurus.
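The effect of the DL operator on the expanded RT relations can be
pictured as a simple filter. The sketch assumes that both kinds of
relation are held as sets of pairs; the function name apply_deletions
is hypothetical.

```python
def apply_deletions(rt_pairs, dl_pairs):
    """Drop every RT pair listed under DL, in either direction (DL is symmetric)."""
    blocked = set(dl_pairs) | {(y, x) for x, y in dl_pairs}
    return {pair for pair in rt_pairs if pair not in blocked}


rt = {("a", "b"), ("b", "a"), ("a", "c"), ("c", "a")}
kept = apply_deletions(rt, {("c", "a")})
```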
INDEX TO THE HIERARCHY
In the Index to the Hierarchy, the single descriptors (keywords and
phrases) of the Thesaurus are set out in alphabetical order. They are
accompanied by their appropriate codes of classification. The root of
the conceptual tree (or trees) where it can be found is indicated for
each descriptor. Therefore consultation of the Index to the Hierarchy
permits the user to move up from a single descriptor to the Top Term in
the hierarchic chain where the descriptor belongs.
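Moving up from a descriptor to its Top Term amounts to following the
BT links until a term without a superior is reached. A minimal sketch,
assuming each descriptor has a single superior stored in a dictionary;
the function name top_term and the sample terms are invented for the
example.

```python
def top_term(term, parent):
    """Follow BT links from term up to the root of its tree."""
    seen = {term}
    while term in parent:
        term = parent[term]
        if term in seen:
            raise ValueError("cyclic BT chain")
        seen.add(term)
    return term


# Sample hierarchy, not taken from THES-BID: key BT value.
parent = {
    "SOFTWARE PROTECTION": "COMPUTER LAW",
    "COPYRIGHT OF PROGRAMS": "SOFTWARE PROTECTION",
}
root = top_term("COPYRIGHT OF PROGRAMS", parent)
```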
LIST OF TOP TERMS (TT)
The List of Top Terms contains the list of descriptors which are at the
root of the 52 conceptual trees making up the structure of the
Thesaurus.
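The consistency between the roots of the trees and the descriptors
defined as TT can be verified by comparing two sets. The sketch below
assumes the hierarchy is given as pairs (term, broader term); the
function name check_top_terms is hypothetical.

```python
def check_top_terms(bt_pairs, declared_tt):
    """Compare the roots of the BT trees with the descriptors declared TT.

    Returns (roots lacking the TT definition, TT descriptors not at a root).
    """
    terms = {t for pair in bt_pairs for t in pair}
    # A root is a term that never appears on the left of a BT relation.
    roots = terms - {term for term, _ in bt_pairs}
    declared = set(declared_tt)
    return sorted(roots - declared), sorted(declared - roots)


pairs = [("B", "A"), ("C", "B")]
ok = check_top_terms(pairs, {"A"})
bad = check_top_terms(pairs, {"B"})
```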
LIST OF HEADING TERMS (HT)
The List of Heading Terms (HT) is a list of the 156 descriptors which,
in the structuring of the trees, represent conceptually important
nodes by virtue of the number of relations dependent on them (not
less than 5).
The list is made up of two parts, the first listing the descriptors in
alphabetical order and the second dividing the descriptors into groups
on the basis of the number of relations connected with them. Within
each group, they are in alphabetical order.
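The two parts of the list could be derived as in the sketch below. It
assumes that the «relations dependent on» a descriptor are its direct
narrower terms and that the groups are printed from the largest count
downwards; neither detail is stated in the text. The function name
heading_terms is hypothetical.

```python
from collections import Counter, defaultdict


def heading_terms(bt_pairs, threshold=5):
    """Return (alphabetical list, groups by count) of the heading terms.

    bt_pairs: pairs (term, broader_term). A heading term is taken here
    to be one with at least `threshold` direct narrower terms.
    """
    counts = Counter(broader for _, broader in bt_pairs)
    headings = {term: n for term, n in counts.items() if n >= threshold}
    alphabetical = sorted(headings)
    by_count = defaultdict(list)
    for term in alphabetical:  # keeps each group in alphabetical order
        by_count[headings[term]].append(term)
    return alphabetical, dict(sorted(by_count.items(), reverse=True))


pairs = [(f"N{i}", "A") for i in range(5)]
pairs += [(f"M{i}", "B") for i in range(6)]
pairs += [("X", "C")]
alphabetical, grouped = heading_terms(pairs)
```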
HIERARCHY LIST
The Hierarchy List contains the description of the general hierarchic
structure of this edition of the Thesaurus. In this list all the
lexical material is grouped together and organized in 52 conceptual
trees having as roots the same number of descriptors, considered of
general semantic importance (see List of Top Terms).
The hierarchic level is graphically represented by a different number
of indicator-dots preceding the word or the phrase inside each tree.
Therefore, in the search for the hierarchic superior or inferior of a
particular concept it will be necessary to look up or down the list
until reaching the row singled out by a number of indicator-dots -
either one more or one less - than the concept started from. On the
basis of the same criterion, the descriptors preceded by the same
number of indicator-dots are to be considered on the same level in the
conceptual hierarchy.
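The dotted layout and the search for a hierarchic superior can be
sketched as follows. The example assumes that a root is printed
without dots and that each further level adds one dot; the function
names and the sample tree are invented for the illustration.

```python
def hierarchy_lines(children, root, level=0):
    """Return the printed lines of one tree, one indicator-dot per level."""
    lines = ["." * level + root]
    for child in sorted(children.get(root, [])):
        lines.extend(hierarchy_lines(children, child, level + 1))
    return lines


def depth(line):
    """Number of indicator-dots at the start of a printed line."""
    return len(line) - len(line.lstrip("."))


def superior(lines, index):
    """Look up the list for the nearest line with one dot fewer."""
    wanted = depth(lines[index]) - 1
    for line in reversed(lines[:index]):
        if depth(line) == wanted:
            return line.lstrip(".")
    return None  # the line at `index` is a root


children = {"A": ["B", "C"], "B": ["D"]}
lines = hierarchy_lines(children, "A")
```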
By consulting the Hierarchy List, the documentalist who must index a
given bibliographical unit and the user of the Bibliography and the
«BID» data base are given a sort of map of the field of knowledge in
which the subject matter under examination or research is located.
It is therefore possible, according to one's needs and using the
hierarchy of concepts indicated in the list, to broaden or narrow
one's analysis in a systematic manner, effectively testing the choice
of terms for indexing or for retrieving information.
CLASSIFICATION TABLE
The Classification Table is a decimal classification scheme which
represents a systematic frame of reference for subject matter of
importance to legal informatics and computer law. Each of the nine
general classes (numbered from 0 to 8) making it up is internally
structured in subclasses, in turn hierarchically organized according to
their gradually increasing level of specificity. A code number of
one or more figures has been given to each through a process of
systematic interpretation joining the heading taken into consideration
with those which are gradually more general than it. The constitutive
elements of the Classification Table (the codes of classification and
their corresponding headings) when related to a specific documentary
unit permit the documentalist to arrange its contents within an
organized system of knowledge.
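If, as is usual in decimal schemes, each further figure of a code
denotes a subclass of the code before it, the chain of gradually more
general headings can be read off the prefixes of the code. The sketch
below rests on that assumption; the function name heading_path and the
placeholder headings are invented and do not reproduce the actual
Classification Table.

```python
def heading_path(code, table):
    """Return the headings for every prefix of code, most general first."""
    prefixes = (code[:n] for n in range(1, len(code) + 1))
    return [table[prefix] for prefix in prefixes if prefix in table]


# Placeholder headings, not the real entries of the Classification Table.
table = {
    "3": "GENERAL CLASS 3",
    "31": "SUBCLASS 31",
    "312": "SUBCLASS 312",
}
path = heading_path("312", table)
```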
Equally, insofar as this system of classification has also been
linked to the single descriptors of the Thesaurus by affixing the
relevant codes to them, a one-to-one (biunivocal) cross-reference
between keywords and the codes of classification, and an analytical
breakdown of information in both the indexing and the retrieval
phases, have been made possible.
CLASS LIST
In the Class List the headings of the Classification Table are to be
found. The terms of the Thesaurus relating to each heading are listed
under it, followed in brackets, if appropriate, by an indication of the
other codes of classification where they are grouped together in the
same manner. The terms are listed in two distinct groups, each list
being in alphabetical order. The first group of terms includes
descriptors taken from the Classification Table whilst the second group
contains all the words and phrases registered in the Thesaurus which do
not come directly from the Classification Table but directly from the
technical literature examined.
GEOGRAPHIC LIST
The Geographic List is an alphabetic listing of the names of countries
(or politically important groups of countries) used in the examined
documentation to identify geographic areas where research or computer
applications relevant to the Bibliography are being carried out. These
names have been presented in a normalized form which is therefore
binding on the indexer and on the researcher.
When the name of a country is followed by that of a state, province or
region belonging to it, the two names are separated by a dash whilst
the names of the cities are indicated in brackets after the name of the
country or the state, province or region to which they refer.
ACRONYM LIST (1 and 2)
The Acronym List is made up of abbreviations which frequently appear
within the area of legal informatics or computer law and of the
expressions from which these abbreviations are taken. Of the two lists
in the Acronym List the first is based on the acronyms in alphabetical
order and the second on the expressions relative to them.
The abbreviations contained in brackets when following expressions
corresponding to institutes, public bodies, associations etc. indicate
the country (according to international standards) in which their head
office is situated or, where appropriate, their international
character (INT.). When, instead, we are dealing with the titles of
journals, guides, inventories, and the like, they are indicated as
publications (PUB.).