GUHA Book

Internet edition:

guhabook.pdf [1613 kB]
guhabook.ps.zip [1015 kB]

From the preface to the web publication (by P. Hajek)

In 1966 the GUHA principle was formulated in a paper by Hajek, Havel and Chytil. (GUHA is the acronym for General Unary Hypotheses Automaton; only much later did we realize that GUHA is a frequent Indian surname.) The principle means using the computer to generate systematically all hypotheses interesting with respect to the given data (hypotheses describing relations among properties of objects). A milestone in the theoretical and practical development of this principle (the GUHA method) was the book by me and T. Havranek, Mechanizing Hypothesis Formation (mathematical foundations for a general theory), published by Springer-Verlag in 1978 (ISBN 3-540-08738-9, 0-387-08738-9). Since then many things have changed: two of the pioneers of the GUHA method, Tomas Havranek and Ivan Havel, have died. Computers underwent tremendous evolution. Various implementations of the GUHA method based on the book were produced, and theoretical development continued. There were several practical applications in various domains. I mention two special volumes of the International Journal of Man-Machine Studies devoted to the GUHA method. But it must be said that the GUHA method has never gained broad recognition. Citations of the book are counted in tens, not hundreds (some citations being rather prestigious, e.g. by R. Fagin and others in relation to generalized quantifiers in finite model theory).

When the terms "data mining" and "knowledge discovery in databases" emerged, their close relation to the (much older) GUHA principle seemed absolutely clear to us. The book, if sufficiently known, could contribute well to their logical and statistical foundations. There have been some papers trying to call the attention of the DM and KDD community to the GUHA method and theory, but the response was not as expected. One reason for this is undoubtedly the fact that the book by me and Havranek has become more and more difficult to get.

This is why I decided to ask Springer-Verlag for permission to put a version of the book on the Internet for free copying. I am extremely grateful to the representatives of Springer-Verlag for their explicit confirmation that Springer reverts the copyright to the authors. The result is what you can see: a re-edition of the book as a technical report of the Institute of Computer Science, whose electronic version is free for downloading and printing. My very sincere thanks go to Mrs. Hana Bilkova for retyping the whole book in LaTeX (the original book was printed from typewritten pages with hand-written mathematical symbols). The web publication of the book was partially supported by the COST Action 274 (TARSKI). The book remains unchanged (except for a few corrected misprints) and therefore does not contain any reference to later developments. The reader will have to judge how far the formulations presented in it are valuable for contemporary data mining and KDD. I shall be grateful for comments of any kind.