Classification and the librarian's problem

[Leibniz rejected Locke's tripartite division of knowledge.] ...
"A truly memorable story might deserve a place in the annals of universal
history; yet might equally deserve a place in the history of a particular
country or even of a particular individual. A librarian is often undecided
over the section in which a particular book needs to be catalogued (cf.
Serres 1968: 22-3)" (Eco 1993:1995, p.278)
"Theoretically, a classification system should be so organised that material
on any one subject can be found in only one place. Some subjects, however,
have so many aspects, so many phases, so many contributing factors that it
may not be possible to place all material relating to such a subject in only
one class. ... It is important to remember that even though books are
classified according to the subject which is given the greatest emphasis,
they may, to some extent, treat other subjects."

Categories in Leibniz' private collection of books.

- A. Jurisprudence
- B. Medicine and science
- C. Literature
- D. Philosophy
- E. Applied mathematics
- F. Mathematics
- G. Linguistics
- H. Theology
- J. Periodicals
- L. Misc.
- M. History
- N. Misc.
- O. Misc.

Consider a set of books, **B**. The librarian's problem is to order these
books in such a way that all books about a similar subject are shelved
together.

Two complementary approaches to this problem both involve the definition of a
set of subjects, **S**.

**Books as primitives**

The first approach starts from the books. By analysing the contents of each
book, a number of a posteriori subjects can be assigned to them.

The i-th book, b(i), may be concerned with m(i) a posteriori subjects, s':

**S'** AND b(i) = { s'(i,1), ..., s'(i,m(i)) }

(Let us assume that m(i) >= 1 for every book, i.e. every book is about at least one subject.)

The set of a posteriori subjects, **S'**, is then built from these
assigned subjects. For j books, it is not simply the concatenated list

{ s'(1,1), ..., s'(1,m(1)), ..., s'(i,1), ..., s'(i,m(i)), ..., s'(j,1), ..., s'(j,m(j)) }

because two assigned subjects s' may share the same interpretation, even when they were assigned to different books.

To get around this problem, we must consider the set of unique
interpretations, **C**.

If we assume that this assignment of subjects is valid, then the number of
unique interpretations, n(**C**), must be less than or equal to the number of
assigned subjects, n(**S'**).

n(**C**) <= n(**S'**)

where n(**S'**) = m(1) + m(2) + ... + m(j)
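The counting argument above can be checked on a small example. This is only a sketch: the book names and subject labels are invented, and two labels are treated as the same interpretation only when their strings are identical, which is a simplifying assumption.

```python
# Hypothetical data: each book b(i) maps to the a posteriori subjects
# s'(i,1), ..., s'(i,m(i)) that an analyst assigned to it.
assigned = {
    "b1": ["ecology", "economics"],
    "b2": ["economics"],
    "b3": ["ecology", "governance", "economics"],
}

# n(S'): the total number of assignments, m(1) + m(2) + ... + m(j).
n_assigned = sum(len(subjects) for subjects in assigned.values())

# C: the set of unique interpretations. Assumption: identical strings
# stand for identical interpretations.
unique = {s for subjects in assigned.values() for s in subjects}
n_unique = len(unique)

# The inequality n(C) <= n(S') holds because every element of C was
# assigned at least once.
assert n_unique <= n_assigned
print(n_assigned, n_unique)  # 6 3
```

Here six assignments collapse to three unique interpretations.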

**Subjects as primitives**

The second approach starts with a set of subjects, **S**. The assignment
of each book then takes the form

{ s(a(i,1)), ..., s(a(i,m(i))) }

where, if there are k subjects s(1), ..., s(k), then each assignment relation, a(x,y), is an integer between 1 and k.

Note: I call this computational set theory because I first created this representation with dBase 3+ records. I converted records containing one or more keywords into a table of {record, keyword} pairs.
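The conversion from records to pairs might look something like the following sketch. The subject list, the book names, and the function name `to_pairs` are all hypothetical; the layout of the original dBase records is not given here, so a plain dictionary stands in for them.

```python
# Hypothetical set of subjects S = s(1), ..., s(k), with k = 3.
subjects = ["ecology", "economics", "governance"]

# Hypothetical assignments: for each book b(i), the integers
# a(i,1), ..., a(i,m(i)), each between 1 and k (1-based, as in the text).
assignment = {"b1": [1, 2], "b2": [2], "b3": [1, 3]}


def to_pairs(assignment, subjects):
    """Flatten book -> subject-index records into (book, subject) pairs."""
    k = len(subjects)
    pairs = []
    for book, indices in assignment.items():
        for a in indices:
            if not 1 <= a <= k:
                raise ValueError(f"a = {a} is outside 1..{k} for {book}")
            # Convert the 1-based index a into Python's 0-based list index.
            pairs.append((book, subjects[a - 1]))
    return pairs


pairs = to_pairs(assignment, subjects)
print(pairs)
```

Each pair holds one book and one subject, so a book with m(i) subjects contributes m(i) rows.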

From the above definitions, it is clear that the relations between the set of books and the set of subjects can also be represented as a table of pairs, {b,s}:

{b(1),s(a(1,1))}

...

{b(1),s(a(1,m(1)))}

...

{b(i),s(a(i,1))}

...

{b(i),s(a(i,m(i)))}

...

{b(j),s(a(j,1))}

...

{b(j),s(a(j,m(j)))}

**Table 1.1 - relations between books and subjects sorted in book order.**

This table can also be sorted on subject:

{s(1),b()}

...

{s(1),b()}

...

{s(k),b()}

**Table 1.2 - relations between books and subjects sorted in subject order.**
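The two sort orders can be sketched in a few lines. The pairs below are invented for illustration, and the names `table_1_1`, `table_1_2`, and `by_subject` are my own labels rather than part of the original representation.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (book, subject) pairs, deliberately out of order.
pairs = [
    ("b2", "economics"),
    ("b1", "economics"),
    ("b1", "ecology"),
    ("b3", "ecology"),
]

# Table 1.1: sorted in book order (ties broken by subject).
table_1_1 = sorted(pairs)

# Table 1.2: the same relations with each pair swapped, sorted in subject order.
table_1_2 = sorted((subject, book) for book, subject in pairs)

# Grouping Table 1.2 by subject lists the books for each subject.
# groupby only merges adjacent rows, so it relies on the sort above.
by_subject = {
    subject: [book for _, book in rows]
    for subject, rows in groupby(table_1_2, key=itemgetter(0))
}

print(table_1_1)
print(table_1_2)
print(by_subject)
```

Both tables hold exactly the same relations; only the sort key differs.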

Note: If we replace the interpretation 'book' with 'word or phrase' and the interpretation 'subject' with 'class', these two collections together form one representation of the structure of Roget's Thesaurus. Table 1.2 forms the main section of the thesaurus, table 1.1 forms the index.

The representation given in tables 1.1 and 1.2 has some limitations.

- The relative importance of different subjects to a specific book
is not taken into consideration. If we decide to assign only one
subject to each book then we will be able to shelve the books. This
is part of the method used by librarians.
However, if we are looking for multiple connections between subjects, we cannot limit our database in this way. Librarians get around this limitation by providing a subject cross-reference for the shelving system. The books can be physically shelved in one order as long as the catalogue retains the other connections.

Note: This was something that we observed in the Ecosystems Analysis and Governance Group at Warwick University - we were finding ecological books in every section of the University's main library.

- Subjects may be nested in some way, i.e. all the books that are about subject s(u) might also be about s(v). This would be interpreted as meaning that s(u) is a sub-class of s(v). If it were also the case that all the books about s(v) are about s(u), then s(u) and s(v) would be synonyms and the two subjects could be merged. This may help to reduce the number of connections and allow shelving to be achieved.
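Both limitations can be explored with a small sketch. Everything in it is an assumption made for illustration: the numeric emphasis weights are not part of the representation above, and the book names, subject labels, and function names are hypothetical.

```python
# Hypothetical weighted assignments: book -> {subject: emphasis weight}.
weighted = {
    "b1": {"ecology": 0.6, "biology": 0.2, "life science": 0.2},
    "b2": {"biology": 0.6, "life science": 0.4},
    "b3": {"ecology": 0.5, "biology": 0.3, "life science": 0.2},
}


def shelve(weighted):
    """Shelve each book under its most emphasised subject and keep the
    remaining subjects as (subject, book) cross-references."""
    shelf, cross_refs = {}, []
    for book, weights in weighted.items():
        primary = max(weights, key=weights.get)
        shelf[book] = primary
        cross_refs += [(s, book) for s in weights if s != primary]
    return shelf, sorted(cross_refs)


def books_by_subject(weighted):
    """Invert the assignments: subject -> set of books about it."""
    index = {}
    for book, weights in weighted.items():
        for subject in weights:
            index.setdefault(subject, set()).add(book)
    return index


def nesting(index):
    """Return (sub_classes, synonyms) as lists of subject pairs.
    (u, v) is a sub-class pair when the books about u are a proper
    subset of the books about v; a synonym pair when the sets are equal."""
    sub_classes, synonyms = [], []
    for u in sorted(index):
        for v in sorted(index):
            if u == v:
                continue
            if index[u] == index[v]:
                if u < v:  # report each synonym pair once
                    synonyms.append((u, v))
            elif index[u] < index[v]:  # proper subset of books
                sub_classes.append((u, v))
    return sub_classes, synonyms


shelf, cross_refs = shelve(weighted)
sub_classes, synonyms = nesting(books_by_subject(weighted))
print(shelf)
print(cross_refs)
print(sub_classes, synonyms)
```

With only three books the detected nesting reflects the sample rather than the subjects themselves, so results like these would need checking against a larger collection.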

A book may also be a collection of separately authored essays, for example:

van Dieren, W., ed. (1995) "Taking Nature into Account", Springer-Verlag, New York.

In this case, do we consider the individual essays that make up the book to be books in their own right (as they have their own authors), or as another set of objects separate from books?

To get around this problem we can consider a more general case of the two tables given above.

Using the structural properties of tables 1.1 and 1.2, we can create a general table that represents relationships between the members of a single set of general objects, v:

{v(1),v(a(1,1))}

...

{v(1),v(a(1,m(1)))}

...

{v(i),v(a(i,1))}

...

{v(i),v(a(i,m(i)))}

...

{v(n),v(a(n,1))}

...

{v(n),v(a(n,m(n)))}

**Table 2 - relations between general objects.**

This is the general structure for a dictionary or an encyclopedia.

Using this general structure it is possible to produce a graph theory version of the librarian's problem.

Note: 12/11/99

An exploration of graph theory has found that this structure is similar to
an adjacency list.
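As a minimal sketch of that correspondence, the pairs of Table 2 can be collected into an adjacency list. The objects below are invented, the function name `adjacency_list` is hypothetical, and treating each pair as a directed edge from the first object to the second is my assumption.

```python
# Hypothetical pairs {v(i), v(a(i,y))} between general objects. Books and
# subjects are no longer distinguished: both are simply vertices.
pairs = [
    ("b1", "ecology"),
    ("b1", "economics"),
    ("b2", "economics"),
    ("ecology", "biology"),
]


def adjacency_list(pairs):
    """Build vertex -> list of related vertices from a table of pairs."""
    adj = {}
    for v, w in pairs:
        adj.setdefault(v, []).append(w)
        adj.setdefault(w, [])  # make sure every object appears as a vertex
    return adj


adj = adjacency_list(pairs)
for vertex, neighbours in adj.items():
    print(vertex, "->", neighbours)
```

Each key plays the role of v(i) and its list holds v(a(i,1)), ..., v(a(i,m(i))). Objects that only appear on the right-hand side of a pair get an empty list.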

- Encyclopedia structure
- Graph theory
- Classification by gene function. Librarian's problem - genetic version.
- Sociological approaches to ecological uncertainty (version 2)
- Translations between biology and sociology - formalism: a linking theme between mathematics, computation, and cultural theory
- Trees and graphs: ordered narratives and intertextual webs.

Converting the librarian's problem into graph theory. This may allow translations between the formalist results of mathematics, computation, and semiotics.

- Gottfried Leibniz (1646-1716)
- Genotype and Phenotype


Created 3/6/99

Last modified 12/11/99