26 October 2013

Verb valency and its consequences in formal grammar

When I first went to university, I was studying computer science and artificial intelligence (although I switched to pure CS in third year), and I was introduced to formal grammars in elementary natural language processing (NLP -- also known as computational linguistics). When I later studied languages with the Open University, I again went through an introduction to formal grammars.

Both of these basic introductions were based very heavily on the idea of context-free grammars (Wikipedia link for the curious). First time round, I found it difficult to get my head round these CFGs, but as I hadn't done any serious study of language (French and Italian in high school doesn't count), I couldn't put my finger on what bothered me... somehow they just felt wrong.

The only objection that I could form clearly was that language is so intrinsically tied to context that it's meaningless without it. For CFGs to be a valid model of language would imply that grammar has at most only a very minor role in the formation of meaning -- an idea that I personally am opposed to. This was indeed one of the claims of Noam Chomsky (WP), the man credited with first formalising CFGs. Chomsky's claim was that our awareness of grammaticality was not tied into our understanding of meaning. He demonstrated this idea with nonsense sentences that were superficially grammatically correct, the most famous of which is "colorless green ideas sleep furiously".

When I came back to formal grammars, it was in the context of a course on English grammar as part of my Open University studies, and comparing the results and consequences of CFGs to the reality of complex structures in English started to make my objections to CFGs much clearer.

My beef was that CFGs typically broke a sentence into a noun phrase and a verb phrase, with the verb phrase then containing everything but the subject. This rule was typically formalised as:
S -> NP VP
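
To make that concrete, here's a minimal sketch in Python (hand-rolled, not any particular NLP library) of a tiny grammar built around that rule. Nothing in the rules knows what the words mean, which is exactly why a CFG will happily produce Chomsky's famous nonsense:

import random

# A toy context-free grammar: each non-terminal maps to its possible
# expansions. The rules are purely structural -- no rule knows or
# cares what any of the words actually mean.
grammar = {
    "S":   [["NP", "VP"]],
    "NP":  [["Adj", "Adj", "N"]],
    "VP":  [["V", "Adv"]],
    "Adj": [["colorless"], ["green"]],
    "N":   [["ideas"]],
    "V":   [["sleep"]],
    "Adv": [["furiously"]],
}

def generate(symbol):
    """Expand a symbol by recursively picking expansions at random."""
    if symbol not in grammar:            # a terminal: an actual word
        return [symbol]
    words = []
    for part in random.choice(grammar[symbol]):
        words.extend(generate(part))
    return words

print(" ".join(generate("S")))  # e.g. "colorless green ideas sleep furiously"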

The first problem I had with this was to do with the passive and active voice, and the role of the grammatical subject of a sentence.

For example, this splits "the man ate the snake" into NP "the man" and VP "ate the snake", but "the man was eaten by the snake" also gets given an NP of "the man", even though the relationship between "the man" and the verb "eat" is pretty much diametrically opposite in the two examples.
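
To put that in code terms (a deliberately crude sketch, with the phrases just written out as strings), the top-level split gives both sentences exactly the same shape, even though the roles have flipped:

# Both sentences get the same S -> NP VP split; structurally they are
# twins, even though "the man" is the eater in one and the eaten in
# the other.
active  = ("S", ("NP", "the man"), ("VP", "ate the snake"))
passive = ("S", ("NP", "the man"), ("VP", "was eaten by the snake"))

for _, np, vp in (active, passive):
    print(f"NP = {np[1]!r:30} VP = {vp[1]!r}")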

With some verbs, this complication is even clearer. Consider the verb close.

The company closed the factory.
The factory was closed by the company.
The factory was closed.
The factory closed.

Notice that in all the different permutations, there is one constant: the thing being closed. We don't need an agent, but we must always have an affected party. Traditional grammatical terminology tells us that close is transitive, because it has to take a direct object in normal use, but there's more to it than that, because we can remove the idea of an agent altogether. There's a small set of verbs that behave this way in English, but that set is growing. These verbs are referred to as ergative, even though there are differences between their behaviour and the handling of verbs in an ergative-absolutive language such as Basque. (WP article on English ergative verbs.)
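
A rough way to see the pattern is to tabulate who is doing what in each of the four forms (a sketch of my own; the role labels "agent" and "affected" are mine, not the output of any parser):

# Four surface forms of "close" and the participants each one expresses.
# The affected entity appears in every single form; the agent is optional.
sentences = {
    "The company closed the factory.":        {"agent": "the company", "affected": "the factory"},
    "The factory was closed by the company.": {"agent": "the company", "affected": "the factory"},
    "The factory was closed.":                {"agent": None,          "affected": "the factory"},
    "The factory closed.":                    {"agent": None,          "affected": "the factory"},
}

for sentence, roles in sentences.items():
    print(f"{sentence:42} affected={roles['affected']!r:15} agent={roles['agent']!r}")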

If we split our sentence as S -> NP VP, then we're not considering what role the verb inside the VP assigns to that NP, and without knowing what we've already got, how can we identify what is still needed? This is how Chomsky's model allows us to generate such utter nonsense.

The other thing that bugged me about Chomsky's model was how the trees we were dealing with were so Anglo-centric. Now, we were always given the caveat that the rules were different for different languages, and while that in principle seems fair enough, the differences in structure between CFGs for different languages are pretty huge.

Consider the effect of word order. Languages are often classified by the order of the subject (S), verb (V) and object (O). English is typically SVO: I (S) shot (V) him (O), and in S -> NP VP our subject S is the NP and our object O becomes part of the verb phrase VP. Most of the Romance languages follow the same SVO pattern, and even though German verbs can be fairly weird from an Anglo-centric viewpoint, most of the time German sentences start with the subject, so Sentence -> NP VP handles most of Western Europe.

In fact, the rule even survives into the extremities of the Indo-European family: the Indic languages (Hindi/Urdu, Punjabi, Bengali etc.) are SOV, so the only difference is that you now have a VP of OV instead of VO, which is a pretty trivial difference. Even Japanese has a tendency to be SOV.

Even a VOS language would be OK, because now you just flip the rule to Sentence -> VP NP, with your VP as verb-object and your NP as the subject. It's a trivial reordering.
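
Writing the top-level rules out as data makes the point (a sketch; the NP_subj/NP_obj labels are just mine, for readability):

# The top-level rules for the three "easy" word orders. Each one is
# still just a subject phrase plus a verb phrase (or its mirror image),
# so the S -> NP VP idea survives with a trivial reordering.
word_order_rules = {
    "SVO (English, Romance)":     {"S": ["NP_subj", "VP"], "VP": ["V", "NP_obj"]},
    "SOV (Hindi/Urdu, Japanese)": {"S": ["NP_subj", "VP"], "VP": ["NP_obj", "V"]},
    "VOS":                        {"S": ["VP", "NP_subj"], "VP": ["V", "NP_obj"]},
}

for order, rules in word_order_rules.items():
    print(f"{order:28} S -> {' '.join(rules['S'])},  VP -> {' '.join(rules['VP'])}")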

But what about the Celtic languages in Europe?

They're mostly VSO (Breton's changed a bit under influence from Latin and then French), and with the subject trapped between the verb and the object, any rule you make now is fundamentally different from the old S -> NP VP of English.

First up, if you define it as S -> VP NP, your NP now represents the object, not the subject, which is a non-trivial difference. Secondly, it's just plain wrong, because adverbials go after the object (normally; exceptions apply) and you can't make the adverbials part of the object, because they are tied to the verb (hence the name).
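
Here's a sketch of the problem, using an English gloss of a made-up VSO clause (the gloss and the bracketings are purely illustrative):

# A VSO clause laid out flat, glossed into English words:
# "closed the company the factory yesterday"
#  verb    subject     object      adverbial
clause = ["closed", "the company", "the factory", "yesterday"]

# Every possible single binary cut either strands the verb from the
# object and the adverbial it licenses, or buries the subject inside
# the supposed "verb phrase".
for cut in range(1, len(clause)):
    left, right = clause[:cut], clause[cut:]
    print(f"[{' '.join(left)}] [{' '.join(right)}]")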

Now it should be noted that Noam Chomsky's other big idea is that of universal grammar. Chomsky proposes that all language is just combinations and permutations of an evolved grammatical system that occurs in the brain as a sort of "language acquisition device", making every language a subset of a single genetically-determined "universal grammar". It is quite ironic, then, that it was Chomsky who ended up defining a system that exaggerates the differences between languages rather than highlighting the fundamental similarities between them.

And even if you don't believe in universal grammar, this hiding of similarity is still a problem: conceptually, the Indic languages are very similar to the Celtic languages when word order is ignored (because they are still surprisingly closely related despite thousands of years of independent evolution), but superficially, a CFG for Hindi looks more like one for Japanese than one for Welsh.

Which brings us to valency. This term was proposed by Lucien Tesnière in his 1959 book Éléments de syntaxe structurale, as an analogy to valency in chemistry -- the idea that an atom bonds to other atoms through a limited number of connections, with certain restrictions on how they combine.

Looking back at the earlier example of close, the verb has a mandatory bond to an affected entity (the factory, in the example), and an optional bond to an active agent (the company).

Basically, all noun phrases are subordinate to the verb, or as I like to say, the clause "pivots" on its verb.
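
Here's a sketch of what a verb-centred description might look like for close (an improvised representation, not Tesnière's notation or any real parser's output): the verb carries its own requirements, so the question "what is still needed?" can actually be answered.

# A valency frame for "close": the verb is the pivot and declares its
# own bonds -- one obligatory affected argument, one optional agent.
# The same frame covers all four of the earlier example sentences.
close_frame = {
    "verb": "close",
    "arguments": {
        "affected": {"required": True},
        "agent":    {"required": False},
    },
}

def missing_arguments(frame, filled):
    """Return the required arguments that a clause has not supplied."""
    return [name for name, spec in frame["arguments"].items()
            if spec["required"] and name not in filled]

print(missing_arguments(close_frame, {"affected": "the factory", "agent": "the company"}))  # []
print(missing_arguments(close_frame, {"affected": "the factory"}))                          # []
print(missing_arguments(close_frame, {"agent": "the company"}))                             # ['affected']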

Tesnière's valency should have changed the way we defined formal grammars, because on paper it's a very small change to the model, and yet 40 years after Tesnière's book was published, my CS/AI study of formal grammars started with a near-identical model to Chomsky's. 47 years after Tesnière, the OU was still using the same models to teach language students. And nearly half a century after the book, Daniel Jurafsky and James H. Martin published the second edition of Speech and Language Processing, and Chomskyan CFGs are still the sine qua non of formal grammars, rather than a historical footnote. Any course in NLP or computational linguistics is layered on top of CFGs, and ends up piling hack upon hack on top of them to overcome the deep-seated flaws in the paradigm, rather than simply building a paradigm which works.

So why didn't CFGs die?


CFGs, while they were designed to describe human language, are closely coupled to the development of the computer. The parse trees behind CFGs model structures developed early on in the history of computer science, and it is therefore evident that the cognitive scientists involved in their development were of the school that believed the human brain works along the same lines as an electronic computer. (This, by the way, is patent nonsense. If anyone built a computer that worked like a brain, it would be impossible to program, and would be as trustworthy as a single human being in its conclusions -- i.e. completely untrustworthy.)

As it turns out, CFGs are pretty efficient as a model for computers to use to identify parts of speech in a sentence, and it was all the computers had the power to do in the early days. It gave us a good approximation, and something to work with.

But while an approximation based on fundamentally flawed models may get us close, if it gets us close in the wrong way, improving the approximation may be intrinsically too difficult.

The planet Earth, for example, is over forty thousand kilometres in circumference at the equator. To be able to pinpoint a spot to within 30 km would therefore seem to be a pretty damn good approximation. However, the Grand Canyon is 29 km wide at its widest, and the amount of work, time and effort it would take to get from one rim to the other would be phenomenal.

So, just as this is a numerically "good" approximation but in practice a really bad one, CFGs are superficially good but store up problems for later.

This all goes back to what I was saying last time about meaning existing on multiple levels of abstraction, and as Chomsky had decided that there was some conceptual gap between meaning and grammar, his trees make no attempt to model meaning.
