Site about text research by Francesc Hervada-Sala.
Articulation vs. Partition
Articulation and partition are two entirely different ways to divide a whole. To partition is to divide into parts, which are distinct elements. Everything that is in the whole is in exactly one of the parts, never in more than one. The lexeme ”part“ is included significantly in the word ”departure“, which means ”to go away“. In contrast, the verb ”to articulate“ means ”to divide into segments“, each of which is a conjunct element. The verb is related to the noun ”article“, which indicates that each segment reveals an aspect of the whole or specifies a point. Articulated segments, unlike parts, are not mutually independent and do not substitute for the whole. Interestingly, ”to articulate“ also means ”to utter or to give shape or expression“, which we do by means of language. Sentences do not reflect parts of the thought or feeling we are expressing but articulate it in its entirety. Furthermore, the meanings of sentences are not independent of each other but intertwined with each other and with the sense of the whole.
Articulation and partition lead to different results and should, therefore, be used appropriately. Unfortunately, it is common to apply partitioning when one should apply articulation. This happens most notably in software development. Confronted with the enormous complexity of software, we struggle to control it through partitioning. We strive to build stand-alone software parts that can be reused, and we build ”frameworks“ and ”libraries,“ but that does not solve the problem, and software development is just as far from sound engineering as it was forty years ago. The same error underlies the scientific method. We have partitioned scientific knowledge into ”disciplines,“ with the result that they evolve independently, use different languages, and produce a fragmented landscape that is, at present, completely out of control.
Instead, software should consist of articulated software units, and scientific knowledge should consist of articulated scientific disciplines. This can be achieved by using a text layer as a foundation, similar to how we use language to articulate ourselves. Software developers should consider a programming language not as a programming system (implementation) but as a resource pool for building software units (specification). Software development should consist not of combining executable modules but of combining the linguistic expressions that make up the source code, just as scientists should be aware that they are building a description of reality. Each discipline should not own a part of the whole description but improve it in some way, either by adding new sentences (as they do today) or by enhancing (e.g. specifying, simplifying, or cross-referencing) those that already exist.
At present, we are good at partitioning but not much good at articulating. The key to articulating is to use text as an intermediary; that is, to be aware that the text exists and to apply it proficiently.
My concept of a text as an articulated symbolic figure represents both a generalisation and a specialisation of the common concept of text as an oral or written language production.
Text is less than language, because words also carry associations, values, desires, fears and more that do not belong to the text. The ideas we want to express with texts and the ideas with which we interpret texts are not part of the respective text itself. The textual part corresponds to the bare bones of the body created by language.
Yet text is also more than language in the respect that a text or some part of it can be created by non-linguistic means. For example, the table of contents of a book can be a hierarchy of parts and chapters that are represented graphically with different font sizes and begin on a new page, without using words such as ”chapter“ or ”part“ explicitly. In this case, the table of contents is not a linguistic production (it is not English or French), although it obviously pertains to the text of the book.
The Next Document System
In this post I want to imagine some aspects of a desirable future computer-assisted environment for reading and writing, following the inspiring blog post ”the ultimate knowledge environment“ by Frode Hegland.
Classic documents are the first step. Each classic document is an isolated static text that must be consumed by the reader as it was written by the author. The future document system is a text database that can be consumed by the reader in a variety of selections, combinations and presentation modes between which the reader can switch in real time at will. The larger the text database the better. The best would be a single world-wide (most-of-?)all-embracing text database.
Documents are richly connected, as a whole as well as paragraph-, sentence- or word-wise. The connections can be set by the author or by readers, at once or retroactively over the years (and centuries). The system maintains the reciprocal connections automatically (all items that link to the current one). The connections are themselves documents and have, like any other document, an author, a publisher, a publishing date and other describing meta-data. Thus the connections can be filtered according to these criteria. For example, when reading an original document I can select which types of connected documents I want to be aware of, say from a particular author or group (”my acquaintances“, ”great philosophers“, ”seasoned statesmen“), from a particular publisher or type of publisher (academic, journalism, specialist, self-published), or those that appeared in a selected time range (”present“, ”17th century“, ”contemporary to the respective reference document“). The user interface can show connected documents in many ways: the user can optionally get the related passages, for example, as footnotes, superimposed on mouse hover, in separate columns, or in a 3D view with coloured transclusions such as in Xanadu. The user can change the view at any time.
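The idea of connections that are themselves documents, carrying meta-data and filterable by it, can be sketched in a few lines. This is a hypothetical illustration; the field names and the `connections_of` filter are my own assumptions, not part of any existing system.

```python
from dataclasses import dataclass

@dataclass
class Connection:
    # A connection is itself a document, so it carries the same
    # meta-data (author, publisher, date) as any other document.
    source: str      # id of the document or passage linked from
    target: str      # id of the document or passage linked to
    author: str
    publisher: str
    year: int

def connections_of(doc_id, connections, author=None, year_range=None):
    """Reciprocal lookup with meta-data filters: return every connection
    touching doc_id, optionally restricted by author or time range."""
    result = []
    for c in connections:
        if doc_id not in (c.source, c.target):
            continue
        if author is not None and c.author != author:
            continue
        if year_range is not None and not (year_range[0] <= c.year <= year_range[1]):
            continue
        result.append(c)
    return result
```

Because a connection is an ordinary document, the same filter could equally select connections by publisher type or restrict them to a trusted set.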
The future document system includes a lexical subsystem (”a dictionary“). Each word can be looked up to obtain information about it (meaning, grammar, morphology, spelling, pronunciation, lexical and semantic field, usage, translations), both in general as well as in the particular context. From each word one has immediate access to every document that uses it, too. All lexical information is exposed by regular documents in the document system and can be filtered, queried and presented using the system's regular configuration capabilities.
The future document system includes a reflexive subsystem that consists of documents that evaluate, rank, contrast, summarise and integrate other documents. It represents a sort of digestive system that assists the reader in exploiting the huge amount of available material.
A particular kind of relationship between documents that is part of the reflexive subsystem is the version. A document is called a version of another document if it reproduces the whole original document. For example, a versioned document can be an original French document translated into English, a scientific article or legal text expressed in common language, or an old book printed with current orthography or vocabulary. When I am reading news about a political agreement between countries, I can jump to its wording. As a layman I don't understand much of it, so I change to a simplified version, choosing one made by a particular journal I like that is shortened to be read in five minutes. When I have a question about a particular point, I can fade in the original sentence or other versions.
Also part of the reflexive subsystem is the ”trust profile“. I can record how much I trust particular authors or publishers I come across and store this in a profile. Later on, I can filter views to include just the sources I trust most, or to show these first. Trust profiles can be chained transitively, so that the system can infer trust from my trusted sources. A trust profile can be published as a document and thus be used by others, commented on, enhanced, versioned, etc. I am of course not restricted to a single trust profile: I can manage more than one (”private“, ”job“, ”hobby“). When reading, I can select which trust profiles to apply (my own, publicly available ones, or those shared with me by acquaintances) and switch in real time.
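The transitive chaining of trust profiles amounts to a bounded graph traversal. The sketch below is a minimal illustration under an assumed representation (each person maps to the set of sources they trust directly); a real system would also need weights and conflict rules.

```python
def infer_trust(profiles, me, max_hops=2):
    """Infer trust transitively: I trust my directly trusted sources,
    plus whoever they trust, up to max_hops links away.
    `profiles` maps each person to the set of sources they trust."""
    trusted = set()
    frontier = {me}
    for _ in range(max_hops):
        # Follow one more link of the trust chain per iteration.
        frontier = {t for s in frontier
                      for t in profiles.get(s, set())} - trusted - {me}
        trusted |= frontier
    return trusted
```

With `max_hops=1` this reduces to the plain profile; larger values let the system surface sources recommended by my trusted sources.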
These were just some points for a brainstorming towards a future document system.
Common Concept of Scientific Knowledge
It is common to consider that science produces knowledge. This seems to be an assertion that does not deserve further consideration, when in fact it does, because the concept of knowledge is highly problematic.
What does science know? There are a number of theses that are currently accepted, such as some laws in physics. But what exactly is a law such as Newton's second law of motion? It looks like a sentence, and it is often represented mathematically as ”F = ma“. The pure mathematical statement is not enough; an explanation in natural language is required in order to define ”force“, ”mass“ and ”acceleration“. That is, what science knows in this regard is a mathematical sentence, which is stable, well-known and formalised, plus some considerations about its meaning that are not well-documented. There is, strictly speaking, no book or article to which we could point and say: this is the scientific knowledge about the laws of motion.
If one looks at the whole scientific production, one observes that scientific knowledge is itself unknown. We do not know what we know. We assume we know some things, but we do not know what that knowledge consists of exactly. This is not just a question of precision, as if we knew what science knows imperfectly or only to some extent; it is a basic question. For example, we know that past theories have been shown to be wrong, but we still assume that current theories are correct. This is nonsense. We can actually be sure that current theories are also wrong. Our knowledge is always wrong, but it can become better.
To summarise, science ought to deal with the question of scientific knowledge. It ought to establish a terminology of scientific research and define what its product is.
Common Concept of Text
The definition of text does not present any difficulties at first. The word ”text“ is used to refer to the written word and is sometimes generalised to include other productions of natural language such as the spoken word or the lyrics of a song.
Yet obscurity arises on closer inspection. All could agree that Hamlet is a text. But what exactly is Hamlet? Is it a particular copy of the printed book? Of course not: every copy of the book is just an instance of the same work. Furthermore, there are many versions of Hamlet that differ in wording or orthography. In fact, there are also versions of it translated into other languages, which are obviously very different productions, but they can still be considered to be Hamlet. Basic questions also arise about whether a text is a syntactical or a semantic product. What is Hamlet: a particular list of words, or a particular set of fictional characters and a particular succession of events?
The word ”text“ is commonly used with very different meanings that are not accurately specified. This is acceptable in common language, but unacceptable in science. In science we must strive for a clear-cut, precise, stable concept of text.
The human sciences still do not deserve to be considered scientific. There is no corpus of verified knowledge; there are only concurring narratives. As both mathematics in ancient Greece and physics in modern Europe have shown, a science emerges when a discipline abandons prose and begins using a formal language. Narrative, which creates meaning, does not produce scientific knowledge. Only a formal language production, that is, a text, can be intersubjectively verified. Human sciences will come into existence when a corpus of formal knowledge consolidates.
To become a science, human disciplines primarily need to be slimmed down. They should limit themselves to the things that can be formally expressed and work towards expanding the existing boundaries. They should abandon individual conviction and accept only intersubjective and intercultural acknowledgeability as proof.
That aside, a new area of public discussion should arise simultaneously: a space for interchanging ideas, a place where narratives could be cultivated and grow. There is currently no such place. Ideas are published indistinguishably from other content (sometimes inside science), with the consequence that they are not pertinently treated and are destroyed prematurely. A public arena for ideas should neither be regulated nor have entry barriers; it should be a free space in which each voice could speak and be heard. A critique of an opinion would not be perceived as faulting or even annihilating it, but simply as adding a new opinion. It would be the intrinsic nature of this arena to be subjective and cultural. It would be characterised by a liberal atmosphere in which every nascent idea is welcome and encouraged to grow.
Knowledge is text and as such is susceptible to being interpreted, thus producing meaning. Science (including human sciences) should concentrate on the text structure and let its meaning be interpreted in the arena for ideas. Both approaches are important and should be pursued, but separately. In doing so, both areas would grow and enrich each other.
Language and Text
Analysis of natural language reveals its underlying text. Each oral language expression consists of a series of audible signs. A particular expression can be pronounced differently by different people, for example with different accents, and, as far as the linguistic content is concerned, all utterances will be considered to be the same. Therefore, an oral expression is a symbolic expression or, more precisely, since the order of the symbols is relevant, an oral expression is a text.
The phonetic signs that make up an oral expression form higher order symbolic units, the words. Analysis of the phonetic text reveals a series of words that in turn can be analysed syntactically to unveil the structure of sentences. Similarly, a written expression consists of a series of visible signs that are interpreted symbolically. These signs are grouped into series of words and sentences.
A linguistic expression is therefore a multilayered text. The first layer is physical, either phonemic or graphic; the layers above consist of pure text and build words and sentences. Sentences consist of connected text units. There is a finite number of text units, the lexemes, and a finite number of rules of syntax and morphology used to combine them.
Analysis of the meaning of sentences reveals the semantic layer. This layer is not linguistic in the narrow sense; it concerns the cultural sphere of the imaginable.
Expressions in natural language carry deep, dense texts below the surface of sentences. Sentences are light, varied and original; the text beneath them is robust, regular and predictable.
Language is, as a fact, the common structure of many existing texts and, as a competence, a pool of resources available for text production.
Physical Basis of Computing
While humans work with text by understanding it, computers are material devices and therefore act exclusively on a physical basis. There is a variety of computer architectures, from smartphones to database servers and supercomputers, but all of them share a common basis that was first formulated by John von Neumann and others while constructing the first digital computers after World War II.
Von Neumann identified four units in a general-purpose computer: the arithmetic, memory, control and interface unit. A computer is an electronic device that can perform arithmetic and logical operations, keep record of data and retrieve it, and interact with other devices. The key point is the existence of a control unit that can perform operations on data depending on other data. Von Neumann introduced the notion of the ”stored-program computer“, in which it is not hardware (physical connections) but software (data) that determines which operations must be performed. That is what makes a computer a general-purpose machine and distinguishes it from other, special-purpose electronic devices.
Arithmetic and logical operations are possible through so-called gates, which are electric circuits that receive an electrical signal as input and send an electrical signal as output. Yet the output does not depend continuously on the input, but in a stepped way. For example, if the input is 0 to 0.5 volt, the output will be 1 volt, and if the input voltage is larger than 0.5 volt, the output will be 0 volt. The performed operation is a symbolic one; the result does not depend on the exact voltage of the physical input signal, but on whether the input signal represents one of two symbols: ”low“ or ”high“, ”off“ or ”on“; in other words, 0 or 1.
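The stepped behaviour described above can be sketched directly, using the example voltages from the paragraph. The numbers are purely illustrative, not taken from any real circuit family; the gate shown happens to be an inverter.

```python
def not_gate(volts_in):
    """Idealised inverter with a stepped response: any input up to
    0.5 V counts as the symbol 0 and yields 1 V; any input above
    0.5 V counts as the symbol 1 and yields 0 V."""
    return 1.0 if volts_in <= 0.5 else 0.0

# The result depends on which symbol the voltage represents,
# not on the exact physical value of the input:
assert not_gate(0.1) == not_gate(0.4) == 1.0   # both inputs mean 0
assert not_gate(0.7) == not_gate(3.3) == 0.0   # both inputs mean 1
```

The two assertions make the point of the paragraph concrete: physically different inputs that carry the same symbol produce the same output.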
The main unit of a computer is a processor that can perform operations on input electrical signals to output electrical signals. Processors can perform a variety of operations. The input signal provides both the command and the data; it determines which operation has to be applied as well as which data the operation must be applied to.
Memory circuits provide an internal state that causes the processor to not always produce the same output for the same symbolic input. The order in which the symbols appear is relevant to the output. That is, not just a symbolic processing, but a text processing takes place.
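A minimal sketch of this point: a circuit with a single bit of internal state produces different outputs for identical inputs, so the order of the input symbols matters. The toggle circuit below is my own simplified illustration, not a model of any particular processor.

```python
class ToggleCircuit:
    """One bit of memory: input 1 flips the internal state,
    input 0 leaves it unchanged. The output is the current state."""
    def __init__(self):
        self.state = 0

    def step(self, bit):
        if bit == 1:
            self.state ^= 1   # flip the stored bit
        return self.state

c = ToggleCircuit()
outputs = [c.step(b) for b in [1, 0, 1, 0]]
# The same input symbol 0 yields 1 the first time and 0 the second:
# outputs == [1, 1, 0, 0]
```

Because the output depends on the whole sequence of inputs so far, the device processes not isolated symbols but an ordered series of them, which is to say a text.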
Apart from that, a processor has a circuit that establishes the rhythm at which processing takes place. The so-called ”clock“ generates an electrical signal at regular intervals and, on each beat, all other circuits operate on their current input to produce an output.
Therefore, what makes computers possible is the existence of electronic devices that process electrical signals symbolically, have an internal symbolic state, use part of the symbolic input to control themselves, and generate symbolic output step-wise. That is, computers are devices that are driven by text, receive text as input and produce text as output.
The fact that present-day computers are electronic devices is a technological contingency. In the future, a technology could exist that would not be based on electrons, but, for instance, on photons or other subatomic particles. As long as these devices would process text and be text-controlled, they would be computers in the same sense as they are today.
Text-Oriented Operating System
From the insight that software actually does nothing except handle text, the idea of a text-oriented operating system naturally emerges.
The current paradigm of software consists of applications that store their data in files and an operating system that holds the files in a file system. The file is the exchange unit between applications and the operating system, and their respective limit, too. Each application defines the file formats it uses and manages reading from and writing to files. Data is therefore application-owned and not shared. However, users constantly need to share data between applications, and so they must spend a lot of time providing bridges between applications: copying and pasting data or exporting and importing data files. Additionally, many programmers spend a great deal of time working on interfaces between applications and programming export and import functions.
A better design would be a software system in which the operating system takes care of text storage and the applications provide specialised ways to gather, manipulate and represent text parts. In such a system, an application such as a word processing program would not store documents in separate files, but would cause the operating system to add some nodes representing the document's content to the system-wide text. Note that in such a software system there would be no need for a file system. Data would not be stored in files, but much more finely grained in text nodes such as the particular chapters and paragraphs. If the user then wanted to retrieve, for example, statistical information about chapters and paragraphs, he or she could invoke the database application, which in turn would fetch those nodes from the operating system. Note that this way, queries could be run upon the actual document contents in real time and not on a redundant copy created by an export-import mechanism. The user could add some information to the document in the database application, and that information would be immediately available inside the word processing application. There would be no more need for bridges between applications.
Yet a proper text-oriented operating system would have not only a central text storage but also a text-engine in its kernel. A text-engine is a software unit that, apart from storing and retrieving text parts, can also run queries on them and transform them. An application can run a query against the text-engine to get or update the data it needs. The user can also run queries; for example, to know what text units were updated last week or what references to a particular person are recorded.
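Since no such operating system exists, the following is a purely hypothetical sketch of what a text-engine interface might look like; every name and method here is an assumption of mine, meant only to make the idea of querying system-wide text nodes concrete.

```python
from datetime import date

class TextEngine:
    """Hypothetical kernel text-engine: one system-wide store of
    text nodes that any application can query."""
    def __init__(self):
        self.nodes = {}   # node id -> (kind, content, last updated)

    def store(self, node_id, kind, content, updated):
        self.nodes[node_id] = (kind, content, updated)

    def query(self, predicate):
        # A query runs over the actual contents, not over an
        # exported, redundant copy of them.
        return sorted(nid for nid, n in self.nodes.items() if predicate(n))

engine = TextEngine()
engine.store("report/ch1", "chapter", "Introduction", date(2024, 5, 2))
engine.store("report/ch1/p1", "paragraph", "Text is ...", date(2024, 5, 9))

# Example user query: which text units were updated since 5 May?
recent = engine.query(lambda n: n[2] >= date(2024, 5, 5))
```

A word processor would call `store`, a database application would call `query`, and both would see the same nodes with no export-import bridge in between.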
A text-oriented operating system manages the general, unique text structure and applications need only care about their respective goals. The user can combine applications at will, because they interoperate both spontaneously and smoothly with one another.
It is wrong to refer to research about subatomic particles as ”basic research“. It should more modestly and more properly be called ”basic physical research“. A matter-centred philosophical outlook on the world is surely common, but it is simply incorrect. Matter is not the basis of reality. Just consider how many things there are that are not made up of atoms. What atoms does a Tuesday consist of? How many atoms does one dollar, the Universal Declaration of Human Rights, or the theory underlying basic physical research contain? None. You can surely limit your knowledge to matter and energy, but then you are narrowing your field of experience drastically and will grasp just a few facts, leaving out most of reality.
It is not matter but text that is the really basic phenomenon. The basic phenomenon must apply directly or indirectly to everything. Matter does not, yet text does, since it applies to everything that can be described and thus to everything that science can research.
Apart from that, text is a reflexive concept, that is, the definition of text is a text that also applies to itself. This way, the concept of text resolves the gap in science between theory and reality. How can theory conform to reality? There must be a correspondence between both in order for the theory to be provable. Yet how could they correspond, if they are ontologically different? If both are perceived as having text as the common root, correspondence is possible and can be verified.
Truly basic research can only be text research.
Speech, Writing, Computing
Speech is spoken, natural language. Each utterance is a sequence of symbols, symbols being phonemes and other phonemic features such as intonation. Speech is a text representation in which symbols are audible and arranged as a one-dimensional array in time.
Writing is a text representation with symbols that are visible shapes arranged in a two-dimensional space. Writings include one-dimensional structures (paragraphs) but are not restricted to them. While an oral expression is restricted to a single series, a written expression can consist of many series that relate to each other in different ways, such as columns, footnotes, tables or hierarchical trees. The characters used for writing are either alphabetical, which are related to phonemes, or ideograms, which are not, such as Arabic numerals and punctuation marks. Some ideograms have an equivalent in speech; those that represent words, for example, although some of course do not. For these reasons, any spoken text can be written, but not the other way around.
Unlike speech, writing cannot be produced by humans naturally; it requires tools. That is a significant hurdle: indeed, mankind existed for many thousands of years without knowing how to write, and at present, schooling and personal effort are required for every child to learn it.
Through computing, a new form of text representation has entered the arena. Digital electronic devices operate with sequences of symbols; these symbols are neither phonemes nor characters but bits. Further research should clarify how computers represent text. The arrangement of bits is neither one-dimensional in time nor bi-dimensional in space. It is a sort of operational arrangement that has to do with processing and internal states (as described by the theory of finite automata). Digital devices can represent all texts that can be represented orally and in writing, but they add new kinds of texts that were unrepresentable before. While writing produces static texts, computer texts are dynamic. They contain not only the description of a certain state but also the specification of the rules that lead to the next one.
Speech, writing and computing are text technologies. They allow text to be represented. Since text must be represented in order to be created, transmitted and preserved, the technology used is a key factor that determines which texts are feasible, and thus which texts exist. Oral language gave a preponderant role to the human species and made possible the creation of cities and work specialisation. Writing led to advanced civilisations first and, later on, to modern science and open, law-governed societies. Where computing will take us remains to be seen.
Text as Experience
How do we experience text? We use it all the time: by means of text we send and receive information, control processes, and regulate our coexistence. However, we are mostly unaware of it because we master the language used and don't have to worry about it. We are more likely to be thinking about the things we are doing instead; we are busy with the information we are exchanging, the processes we are controlling, or the rules we are establishing.
We begin to experience text as such when we encounter its difficulties. For example, when we express ourselves in a foreign language we begin to learn, or when we handle a complex question. It is then that we feel the difficulty of making text, and this difficulty becomes stronger the longer and the more dense the text is. In fact, text production is a hard activity. Writers know how laborious book-writing is, programmers know how challenging the creation of software is, physicists have been struggling for decades to find an all-embracing formula, and logicians have to deal with what Bertrand Russell called ”logic's hell“. Text is a sound structure built of straight segments you cannot bend; it has an inexorable authority.
Yet text is also beautiful. You marvel at a short sentence that describes intricate facts with ease. You look at a good design and admire its construction, the economic use of resources, how it gets to the heart of the matter. Mathematicians praise general, tight theorems, and call them ”elegant“. Google currently lists 45 million hits for the ”beauty of logic“.
Additionally, text produces concordance. Factual language helps mediate disputes; argumentation eases coexistence. Two people can have very different views about everything, but when it comes to clear, fact-based language, they will acknowledge it. Every human being understands text exactly; we share it, and that in turn produces agreement and harmony between us.
We experience text as hard, beautiful and unifying.
Science is Text
What is science? There are many sciences, such as natural, social and formal sciences; they study general or singular phenomena (e.g. physics and history, respectively), the outside world, or the human experience. Yet beneath these multiple faces is a common basis. In short: science creates knowledge, and knowledge is text. Let us consider this.
Science describes researched phenomena either in prose, mathematical language or, more recently, in form of computer algorithms. Obviously, scientific prose writings are texts. What makes up scientific knowledge, however, are not these texts as expression, but their semantic content. This is best seen in scientific handbooks that summarise the knowledge systematically. The more mature a discipline is, the more encyclopaedic handbooks of it have been published and the more mutually concordant these handbooks are.
A science that employs mathematical language uses a particularly sound language, but it does essentially the same thing as the sciences that use prose. Note that the label ”formal language“ applies both to a style of natural language and to mathematics; that is not a casual analogy but points to a real common ground.
Algorithmic descriptions are, like mathematical ones, an alternative way to express semantic contents. Every algorithm can be translated into plain English and has exactly the same meaning. No matter the language used to describe it, scientific knowledge can be reduced to text, because natural language, mathematical language and software can all be reduced to text.
Computer = Text Machine
Given that computers are capable of many different things, it is easy to gain the impression that they are universal machines. Yet this cliché is absolutely wrong. Let us take a closer look at what computers really are.
Software systems are typically divided into applications. An application is used for word processing, another for email, and others to respectively manage a spreadsheet, browse the Web, or edit a photo.
A word processing application allows you to edit documents. Under my definition of text, the text of the document contains not only the visible character strings being edited, but also some hidden data that contributes to the text's representation or behaviour; for example, section heading marks or cross-references.
A spreadsheet application is actually a text editor, too. However, the edited text structure is not prose as in word processing, but a table instead. While prose is a hierarchy of headings containing lists of paragraphs, a table is a bi-dimensional array of cells. While headings and paragraphs are character strings, spreadsheet cells are scalar values (either constant or calculated values).
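The distinction between constant and calculated cells can be sketched as follows. The representation is an illustrative assumption of mine, not how any real spreadsheet application is implemented: a cell is either a plain value or a formula, here modelled as a function of a cell-lookup callback.

```python
# A sheet as a table of cells addressed by (column, row).
sheet = {
    ("A", 1): 10,
    ("A", 2): 32,
    ("A", 3): lambda get: get("A", 1) + get("A", 2),   # like =A1+A2
}

def value(sheet, col, row):
    """Resolve a cell, evaluating formulas recursively."""
    cell = sheet[(col, row)]
    if callable(cell):
        return cell(lambda c, r: value(sheet, c, r))
    return cell

assert value(sheet, "A", 3) == 42
```

Whether constant or calculated, every cell resolves to a scalar value, so the whole sheet remains a bi-dimensional array of values, that is, a text.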
Each document of a word processing or spreadsheet application is stored as a separate file. The operating system provides a file system - a structure of directories or folders in which files can be saved. A file system is also a text: it is a hierarchy of nodes, each of which has a name (string of characters) and binary data (string of bytes).
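The claim that a file system is itself a text can be made concrete with a small sketch: a hierarchy of nodes, each carrying a name (a string of characters) and binary data (a string of bytes). The class and helper below are illustrative, not an actual operating-system interface.

```python
class FsNode:
    """A file-system node: a character-string name and byte-string
    data, with child nodes when the node is a directory."""
    def __init__(self, name, data=b"", children=()):
        self.name = name
        self.data = data
        self.children = list(children)

def paths(node, prefix=""):
    """Flatten the hierarchy into path strings, as a shell would list them."""
    path = prefix + "/" + node.name
    yield path
    for child in node.children:
        yield from paths(child, path)

root = FsNode("home", children=[FsNode("notes.txt", b"hello"), FsNode("img")])
# list(paths(root)) == ["/home", "/home/notes.txt", "/home/img"]
```

The familiar path notation is just one linearised representation of this underlying text structure.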
We have seen that files edited by word processing and spreadsheet applications are text and that they are stored by the operating system in a more comprehensive text. This applies not only to these particular applications, but to all of them.
For example, let us consider a photo editor. A photo is not perceived as text, yet software handles it as one. For bitmap graphics, for instance, an image file keeps record of a bi-dimensional array of pixels and stores the colour and transparency of each of them. This is a text: a table in which each cell has two values. The computer, with help from the graphics card, transforms this text into an image on the screen. The image you see is a text representation.
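A minimal sketch of a bitmap as text: a table whose cells each record a colour and a transparency (alpha) value. The 2x2 image below is an invented example, not any real file format.

```python
RED, BLUE = (255, 0, 0), (0, 0, 255)

# A 2x2 bitmap: each cell holds two values, colour and alpha.
image = [
    [(RED, 255), (BLUE, 255)],
    [(BLUE, 128), (RED, 255)],
]

# Software manipulates the image purely symbolically, e.g. by making
# every pixel fully opaque; the graphics hardware later turns the
# resulting text into light on the screen.
opaque = [[(colour, 255) for colour, _alpha in row] for row in image]
```

Editing the photo means rewriting cells of this table; displaying it means representing the table as light.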
To sum up, a software system is a system of text, not because everything spontaneously looks like text, but because everything has text inside. This insight illuminates the essence of computers. Computers are machines that can represent text in many different ways; they can create, manipulate, store and retrieve it. Computers are indeed text machines.
Definition of Text
Let us now approach the definition of text. What is text made of? A text is made of symbols. A symbol is, generally speaking, the unity of a manifoldness. When a symbol occurs in an expression, it refers to many different things that, in a particular sense, are the same. For example, every common noun is a symbol: the noun ”dog“ embraces many living beings and considers them to be, in a given sense, the same. Note that the two aspects of a symbol, the manifoldness and the unity, are entirely different. The manifoldness can be real; it can be something material, such as the many existing dogs. The unity cannot: the fact that each and every individual dog is ”a dog“ can only be thought; it is a purely mental occurrence.
A text consists of symbols, but a set of symbols by itself does not make a text. For a text to be, the symbols must be articulated as a whole; that is, there must be relationships between them. For example, the array of symbols ”I,“ ”you,“ and ”love“ is not a text, but the sentence ”I love you“ is. To find the relationships between the symbols in that sentence, the reader performs a syntactic analysis. The sentence contains a subject and a predicate, which in turn consists of a transitive verb and a direct object. The symbol ”love“ applies to the relationship between the symbol ”I“ and the symbol ”you.“
I propose the following definition of text: a text is a symbolic expression in which symbols refer to symbols as symbols. To put it algebraically:
T: A -C-> B
The symbol A relates to the symbol B as the symbol C; we refer to this fact as the symbol T. Each of the symbols A, B, and C can in turn be divided into further symbols tied together by the same formula, making the text T more complex.
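The formula T: A -C-> B can be sketched as a recursive data structure. This is my own minimal rendering of the definition, with invented names: a text is either an atomic symbol or a relation in which one symbol refers to another as a third symbol, and each of the three positions may itself hold a relation.

```python
from dataclasses import dataclass
from typing import Union

Symbol = str

@dataclass
class Relation:
    a: "Text"  # A: the referring symbol
    c: "Text"  # C: the symbol as which A refers to B
    b: "Text"  # B: the referred-to symbol

# A text is an atomic symbol or a relation; relations nest, making
# the text more complex.
Text = Union[Symbol, Relation]

def render(t):
    """Write a text in the notation T: A -C-> B."""
    if isinstance(t, Relation):
        return f"{render(t.a)} -{render(t.c)}-> {render(t.b)}"
    return t

# The sentence ”I love you“: the symbol ”love“ applies to the
# relationship between the symbol ”I“ and the symbol ”you.“
t = Relation(a="I", c="love", b="you")
print(render(t))  # I -love-> you
```

The nesting of relations is what lets the same three-place formula describe arbitrarily complex texts.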
I argue for the thesis that every text can be reduced to a symbolic expression in these terms and that such an expression fully describes it. Furthermore, the fact that things as different as natural language (through syntactic analysis), mathematics and logic (through semantic analysis), and software (through source code parsing) can all be reduced to the formula above shows that they are founded on a common basis, namely text.
History of Text
A comprehensive history of text is research that has yet to be carried out. There is, of course, some historical knowledge about phenomena related to text, but there is no history of text as a fundamental concept as I have defined it. Some special fields have already been cultivated, to mention a few: the history of writing; the history of the book, the book trade, and libraries; the history of literary genres and text forms; the history of rhetoric; the history of information technology; the history of numbers, arithmetic, and numeric computation; the history of algebra and algorithms; the history of geometry; and the history of logic. These special fields centre on particular human experiences and products, but they all share textual resources, because the same few forms appear everywhere. One can explore which textual resources were used, in which regions of the earth and in which centuries they appeared, and how they were later reused in other fields. The key to this approach is to abstract from all aspects of the text representation and, more generally, of the human experience, and to focus on the text structure.
The book ”Information Ages: Literacy, Numeracy, and the Computer Revolution“ (Hobart and Schiffman, 2000) is worth mentioning because it takes the right direction in considering literacy, numeracy, and computation to be different aspects of a single basic phenomenon. This is the direction to pursue, to extend, and to deepen.
Text Experience and Text Reality
We have experiences when we read and write, but in every single case we should ask whether the object of our experience is the text as such or the text representation. As a rule, any visual or aural experience is made with a text representation. For example, it is easier to read a contemporary paper, with spaces between words, paragraphs, and punctuation marks, than a Roman scroll with an uninterrupted flow of capital letters. However, exactly the same text could be represented in both forms. If we want to progress, it is important to find superior text representations, but this is not the only thing to improve. We should also become better at understanding text, creating it, and manipulating it. I do not mean understanding, creating, and manipulating character strings, since these are mere representation, but the underlying, purely logical text structure.
The Book of Nature
I define text reduction as the process of extracting a text from a phenomenon. If you are walking by a lake and utter ”the moon is shining over the water,“ then you have extracted a text from something you have seen; that is, you have reduced an experience to text. Essentially the same thing happens in science. For example, Newton's laws of motion are a text reduction that applies to any material object in the universe.
Text reduction is the opposite process of text representation: you either derive a text from reality or depict a text as a real object. Together, reduction and representation build a mapping between textual expressions and phenomena.
Information: a Bad Metaphor
It is common to refer to text as information. Of course, one can encode information as text, but that is not the only thing a text can be used for. A text can encode such different realities as scientific knowledge, a law, a question, or a command. Consider, for example, a command. It can be given to a person or to a machine (through a computer). It can also be given by a person or by a machine. One can perceive a coded command given by a machine to a machine as information, but this is an artificial, academic exercise that has nothing to do with what actually takes place between the machines. Machines simply cannot perceive information and hence cannot act upon it; only humans can. Information is not a phenomenon; it is an interpretation.
An important attribute of text is that it can be represented in many ways. You can write a particular word in a variety of ways: with different materials, on different surfaces, in different dimensions and typefaces. Every person who knows the language the word is written in will agree that those different objects represent the same word, even though they are materially quite different.
Therefore, text is not a visual thing. What you can see is a text representation, not the text as such. That is why not everyone sees words; only those who understand the word's language do. If I see a word written in an unknown alphabet, I see not a word but a drawing. Text is not on the paper; it is in your mind.