Saturday, February 27, 2010

tibudopabikugolatudaropi

In Cracking the speech code: How infants learn language [Acoust. Sci. & Tech. 28, 2 (2007)], Patricia K. Kuhl says:

Studies show that 8-month-old infants can learn word like units on the basis of transitional probabilities. Saffran, Aslin and Newport [26] played two-minute strings of computer synthesized speech (e.g., tibudopabikugolatudaropi) to infants that contained no breaks, pauses, stress differences, or intonation contours. The transitional probabilities were 1.0 among the syllables contained in four pseudo-words that made up the string, tibudo, pabiku, golatu, and daropi, and 0.30 between other adjacent syllables. After exposure, infants were tested for listening preference with two of the original words, and two part words formed by combining syllables that crossed word boundaries (for example, tudaro—the last syllable of golatu and the first two of daropi). The results show that infants learned the original pseudo-words.

I didn't understand what "0.30 between other adjacent syllables" meant, since the probabilities of everything that can follow a given syllable must add up to 1. So I went to the original article by J. Saffran, R. Aslin and E. Newport, Statistical learning by 8-month-old infants [Science, 274, 1926–1928 (1996)]. Indeed, in that article, we find:

The only cues to word boundaries were the transitional probabilities between syllable pairs, which were higher within words (1.0 in all cases, for example, bida) than between words (0.33 in all cases, for example, kupa).

So that leads me to write the following stochastic grammar:

Word => 0.25 ti 1.0 bu 1.0 do
Word => 0.25 pa 1.0 bi 1.0 ku
Word => 0.25 go 1.0 la 1.0 tu
Word => 0.25 da 1.0 ro 1.0 pi
Start => 1.0 Sequence
Sequence => 0.99 Word 0.33 Sequence (see note)
Sequence => 0.01 End

Note: Actually, this stochastic grammar is not correct, because it doesn't reflect the fact that, in the experiment, the same word never appears twice in a row. That constraint is why the between-word probability is 0.33 (one of the three remaining words) and not 0.25 (one of four). Reflecting it requires a different stochastic grammar; see the end of this post.
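To make the 0.25 versus 0.33 point concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not the procedure used in the experiment: the function names, the stream length and the random seed are all hypothetical. It builds a syllable stream from the four pseudo-words, with or without immediate repetition of a word, and estimates the transitional probability of syllable y given syllable x as count(xy) / count(x).

```python
import random
from collections import Counter

WORDS = ["tibudo", "pabiku", "golatu", "daropi"]


def syllables(word):
    # Every syllable in these pseudo-words is exactly two letters long.
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def make_stream(n_words, allow_repeat, seed=0):
    """Concatenate n_words randomly chosen words into one syllable list.

    When allow_repeat is False, a word is never followed by itself.
    """
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        choices = [w for w in WORDS if allow_repeat or w != previous]
        previous = rng.choice(choices)
        stream.extend(syllables(previous))
    return stream


def transitional_probabilities(stream):
    """Estimate P(y | x) = count(x followed by y) / count(x)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


if __name__ == "__main__":
    for allow_repeat in (True, False):
        tp = transitional_probabilities(make_stream(20000, allow_repeat))
        print("repeats allowed:", allow_repeat,
              "| within word ti->bu:", round(tp[("ti", "bu")], 2),
              "| between words do->pa:", round(tp[("do", "pa")], 2))
```

With repetition allowed, the between-word estimate should come out near 0.25; with repetition forbidden, it should come out near 0.33, while the within-word value stays at 1.0 in both cases.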

The next question is: how does the child derive that grammar from observing the input? As the article says, a two-minute sequence is enough for the child to learn. How do they do it? While I know of universal grammar selection theory (e.g. Evolution of Universal Grammar, Martin A. Nowak, Natalia L. Komarova, and Partha Niyogi, Science 5 January 2001 291: 114-118), it seems that we are not at that level here, and that we should rather look at songbird learning, e.g. Recursive syntactic pattern learning by songbirds, Timothy Q. Gentner, Kimberly M. Fenn, Daniel Margoliash & Howard C. Nusbaum [Nature 440, 1204-1207 (27 April 2006)], which would address another aspect of Patricia Kuhl's discussion in her article cited above:

In other species, such as songbirds, communicative learning is also enhanced by social contact. Young zebra finches need visual interaction with a tutor bird to learn song in the laboratory [32], and their innate preference for conspecific song can be overridden by a Bengalese finch foster father who feeds them, even when adult zebra finch males can be heard nearby [33].

Note that the songbird article uses a regular grammar, not a stochastic grammar, and doesn't attempt to specify how the grammar is acquired. What the article does beautifully is demonstrate that an actual grammar is acquired by the birds, i.e., that concepts are built and organized sequentially and recursively.

Well, the problem I want to address is now clearly stated. If stochastic grammars are as general as I believe, I now need to think hard. There are two concepts, Word and Sequence, in the stochastic grammar I proposed above.

1. Are they the right concepts, in other words, is it the right stochastic grammar?
2. Whether they are the right concepts or not, how do such concepts emerge? Or don't they? Does answering question 2 also answer question 1?

This is somewhat related to the binding problem, i.e. understanding how perceptual entities are actually linked to concepts in the brain.
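As a rough illustration of question 2, here is one naive way something like the Word concept could fall out of the input alone. This is a sketch under my own assumptions, not a claim about what infants or birds actually do: cut the stream wherever the estimated transitional probability drops below a threshold, and count the resulting chunks. The threshold of 0.5, the stream length and the function names are all hypothetical choices.

```python
import random
from collections import Counter

WORDS = ["tibudo", "pabiku", "golatu", "daropi"]


def make_stream(n_words, seed=0):
    """Syllable stream in which no word is repeated twice in a row."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        previous = rng.choice([w for w in WORDS if w != previous])
        # Each pseudo-word is three two-letter syllables.
        stream.extend(previous[i:i + 2] for i in range(0, 6, 2))
    return stream


def segment(stream, threshold=0.5):
    """Split the stream at every low-probability transition; count the chunks."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    chunks, current = Counter(), [stream[0]]
    for x, y in zip(stream, stream[1:]):
        if pair_counts[(x, y)] / first_counts[x] < threshold:
            # Low transitional probability: treat this as a boundary.
            chunks["".join(current)] += 1
            current = []
        current.append(y)
    chunks["".join(current)] += 1  # flush the last chunk
    return chunks


if __name__ == "__main__":
    print(segment(make_stream(500)))
```

On a stream like this one, where within-word transitions sit at 1.0 and between-word transitions near 0.33, the chunks recovered should be the four pseudo-words. The sketch says nothing about where a concept like Sequence would come from, nor about how the threshold itself would be found.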

Now, as I said earlier, the stochastic grammar I wrote above is not the right one anyway. It's time to stop this post and move on to a correct (if lengthier) explanation that I'll undertake in my Stochastic Consciousness knol. I'll see you there.

Bertrand du Castel

