Universal Standard Book Code

The increasing use of computers to handle bibliographic data, and the accumulation of collections running into millions of items, mean that humans will be less and less involved in processes such as manual key allocation and quality control. This trend is now established, at least among computer professionals, and it is widely accepted as axiomatic that the more human involvement is eliminated from the internal retrieval mechanisms of an information system, the more successful and error-free the system will be. Our interest here is the automatic control of large collections of database records, with particular emphasis on unique identification and quality control.

Today, the identification and control of bibliographic items is based primarily on an arbitrarily allocated key which accompanies the corresponding record throughout its processing history. Typical keys are the ISBN (International Standard Book Number) and the ISSN (International Standard Serial Number). By contrast, the USBC (Universal Standard Book Code) is generated automatically from pertinent bibliographic data elements, independently of centralised bodies such as the SBN (Standard Book Number) agency.

USBC Criteria

The USBC is an alphanumeric code produced by an algorithm that requires no a priori information about the bibliographic item. The universality of the code implies that it can be regenerated at any time and anywhere in the world by means of an algorithm which conforms to the following criteria:

# Unique items receive unique codes.
# The algorithm is independent of source input.
# The code is as short as possible.
# The algorithm is easy to implement.
# The code is regenerable so that the same code is derived for the same item at different times.
# The code can be fixed or variable in length, depending on the operational requirements for record identification.
# It is possible to verify the code manually.

Theoretical Basis

The derivation of the code rests on well-established information theory. More specifically, a principle of information theory states that the entropy of a set of symbols is maximised when all symbols are equally probable. The USBC algorithm uses this principle to construct codes (keys) from pertinent fields in order to locate and retrieve unique records, as well as clusters of records with lexically homogeneous information. The codes derived offer a very high discriminating strength of over 98% using only 7 bytes per code, where each byte is selected from the least frequent characters found in pertinent bibliographic fields.
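The entropy principle and the idea of selecting rare characters can be illustrated with a minimal Python sketch. This is not the published USBC algorithm: the function names <code>entropy</code> and <code>toy_key</code>, the tie-breaking rule, the padding character and the two sample records are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(text):
    """Shannon entropy, in bits per symbol, of the characters in text."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def toy_key(fields, corpus_counts, length=7):
    """Hypothetical key built from the rarest characters of a record.

    Illustration only; this is NOT the actual USBC algorithm.
    """
    text = "".join(fields).upper()
    chars = {c for c in text if c.isalnum()}
    # Rank characters by corpus frequency, rarest first. Ties are broken
    # alphabetically so the same record always regenerates the same key.
    ranked = sorted(chars, key=lambda c: (corpus_counts[c], c))
    # Pad with "0" (an arbitrary choice) to obtain a fixed-length key.
    return "".join(ranked[:length]).ljust(length, "0")


# Entropy is highest when every symbol is equally likely.
uniform = entropy("ABCD")   # four symbols, each with probability 1/4
skewed = entropy("AAAB")    # same length, unequal probabilities

# Two invented records, each given as [title, author].
records = [
    ["Information Retrieval", "Smith"],
    ["Information Theory", "Jones"],
]
corpus_counts = Counter(
    c for record in records for field in record
    for c in field.upper() if c.isalnum()
)
keys = [toy_key(record, corpus_counts) for record in records]
```

In this sketch the two records share the common word "Information", yet their keys differ because the rarest characters come from the parts of the records that distinguish them. Calling <code>toy_key</code> again on the same record with the same frequency table yields the same key, which mirrors the regenerability criterion.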

Research Information

The original research was carried out by Professor [http://www.aueb.gr/Users/yannakoudakis/english/index.htm E. J. Yannakoudakis] at the Postgraduate School of Computer Science, University of Bradford, West Yorkshire, England, between 1975 and 1978. The project has received funding from the British Library, the Ministry of Education, the European Union and several other organisations.


