This does not mean that any classification system is as good as any other. Venter's classification is partly based on the a posteriori classifications given to the proteins by the researchers who study them, researchers who know some of the pathways in which these proteins are involved within the cellular context. The evidence does suggest that both hierarchies and webs are involved in cellular organisation. In this way, ecological approaches to multi-cellularity might ask informative questions.
Translating this problem back to the genotype-phenotype question, we can ask: how is our set of protein forms related to our set of protein functions? If we take as given that genes determine protein form, can we also assume that genes determine all protein functions? To answer this question we must assume that protein function is independent of protein form, and then see how the two relate. The number of genes in the human genome makes computational methods necessary.
In computational approaches we must ensure that the assumption of form-function independence is preserved, so that the results are a property of the data and not an artefact of the computation. This can therefore be expressed as a problem in bioinformatics: how can we classify a database so that the functional properties we wish to study are preserved? In this way, computational and linguistic approaches to multi-cellularity might ask informative questions.
For a current classification: EGAD Cellular Roles
The notes to the table include the following:
"Average percentage of genes per role do not add to 100% because some genes appear in more than one role or subrole."
This is another version of the librarian's problem: genes with more than one role take the place of books with more than one subject. If the total were significantly over 100%, it would imply that a significant proportion of genes have more than one function. This does not appear to be the case.
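The arithmetic behind this test can be made explicit. If every gene has at least one role, summing the per-role percentages gives 100 times the mean number of roles per gene, so any excess over 100% directly measures multiple assignment. The following Python sketch assumes a hypothetical mapping from gene names to sets of role labels; the toy genes and roles are invented for illustration and are not EGAD annotations.

```python
from collections import Counter

def role_percentages(gene_roles: dict[str, set[str]]) -> dict[str, float]:
    """Percentage of genes assigned to each role; a gene may count towards several roles."""
    n_genes = len(gene_roles)
    counts = Counter(role for roles in gene_roles.values() for role in roles)
    return {role: 100.0 * count / n_genes for role, count in counts.items()}

def mean_roles_per_gene(gene_roles: dict[str, set[str]]) -> float:
    """Average number of role assignments per gene."""
    return sum(len(roles) for roles in gene_roles.values()) / len(gene_roles)

# Hypothetical toy data for illustration only (not EGAD annotations).
toy = {
    "geneA": {"metabolism"},
    "geneB": {"metabolism", "cell signalling"},
    "geneC": {"gene/protein expression"},
    "geneD": {"cell signalling"},
}

pct = role_percentages(toy)
total = sum(pct.values())
# The total of role percentages equals 100 x the mean number of roles per gene.
print(f"total of role percentages: {total:.1f}%")
print(f"mean roles per gene: {mean_roles_per_gene(toy):.2f}")
```

Here one gene out of four carries two roles, so the role percentages sum to 125%, corresponding to a mean of 1.25 roles per gene.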
The table also analyses cellular processes against tissue type for 37 tissues, but it does not record total percentages for these tissues, so these have to be calculated. (Calculations for each tissue type yield totals of 99.9-100.1%. As the percentages are given to one decimal place, it would appear that multi-functional genes do not comprise a significant proportion of human genes.)
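Rounding sets a limit on how far this inference can go. If a tissue's total is the sum of n role percentages, each rounded to one decimal place, each rounded value can differ from the true value by up to 0.05 points, so the true total can differ from the reported one by up to 0.05n points. The Python sketch below computes that bound. The number of roles per tissue is not given here, so `n_roles` is a parameter, and the value 12 in the example is purely a hypothetical assumption.

```python
def true_total_bounds(observed_total: float, n_roles: int, decimals: int = 1) -> tuple[float, float]:
    """Range of true totals consistent with n_roles percentages, each rounded to `decimals` places."""
    half_step = 0.5 * 10 ** (-decimals)   # maximum rounding error of a single percentage
    slack = n_roles * half_step           # in the worst case the errors all point the same way
    return observed_total - slack, observed_total + slack

def max_excess_over_100(observed_total: float, n_roles: int, decimals: int = 1) -> float:
    """Upper bound (in percentage points) on how far the true total could exceed 100%."""
    _, upper = true_total_bounds(observed_total, n_roles, decimals)
    return max(0.0, upper - 100.0)

# Hypothetical example: 12 roles per tissue is an assumption, not a figure from the table.
low, high = true_total_bounds(100.1, n_roles=12)
print(f"true total between {low:.2f}% and {high:.2f}%")
print(f"excess over 100% at most {max_excess_over_100(100.1, n_roles=12):.2f} points")
```

Under that assumption, a reported total of 100.1% is consistent with a true total of up to about 100.7%. If every gene has at least one role, this would mean that at most about 0.7% of genes carry a second role, which still supports the conclusion that multi-functional genes are a small proportion.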
However, these assignments were based on database annotations and so may not cover all the roles of a specific gene.
The above table gives some idea of the balance between different cellular processes in humans. Of course, these percentages are averaged across both time and space. Temporally, cell cycle processes such as DNA replication are tied to the time scale of cell division. Spatially, different tissues (or different cells of the same tissue type) can express different proteins. Even so, these percentages can give us some useful information on how organisational processes interact within the cell.
For example, both genetics and metabolism are essential for growth and reproduction. The metabolism supplies the ATP that drives DNA, RNA and protein synthesis, while the genetics supplies the proteins and lipids that enable metabolism to take place. In this way, it seems impossible to say which process is more fundamental to life (Maynard Smith & Szathmáry, 1995, pp. 17-18).
Starting from this interaction, a natural question is: are there novel processes that arise from the interaction of other processes, say, between cell signalling and gene/protein expression?
One limitation of this classification scheme is that it does not distinguish between the genes that we share with our unicellular ancestors and the novel genes that we have evolved (or acquired) for multicellular interactions. This division would seem to cut across the process classification.
(See also: A classification of the processes involved in gas nets)