
IPO ANNUAL PROGRESS REPORT

Nr.10 1975

Editor: A.J. Breimer

Typist: Helena Koning

INSTITUTE FOR PERCEPTION RESEARCH - INSTITUUT VOOR PERCEPTIE ONDERZOEK

P.O. BOX 513 EINDHOVEN HOLLAND

TELEPHONE

NATIONAL (040) 756605

INTERNATIONAL +31 40 756605


ORGANIZATION I.P.O.

supervisory board

(31.12.1975)

Ir. K. Kooij (chairman) - Eindhoven
Dr. J.H. Bannier - 's-Gravenhage
Prof. Dr. W.A.T. Meuwese - Eindhoven
Prof. Dr. J.F. Schouten - Eindhoven
Dr. Ir. K. Teer - Eindhoven

scientific board

(31.12.1975)

Prof. Dr. H.B.G. Casimir (chairman) - Heeze
Prof. Ir. R.G. Boiten - Delft
Prof. Dr. Ir. P. Eijkhoff - Eindhoven
Prof. Dr. H.E. Henkes - Rotterdam
Prof. Dr. S.L. Kwee - Eindhoven
Prof. Dr. W.J.M. Levelt - Nijmegen
Prof. Dr. Ir. R. Plomp - Soesterberg
Prof. Ir. O. Rademaker - Eindhoven
Prof. Dr. R.J. Ritsma - Groningen
Prof. Dr. P.C. Veenstra - Eindhoven
Prof. Dr. C.J.D.M. Verhagen - Delft
Dr. Ir. P.L. Walraven - Soesterberg
Prof. Dr. P.J. Willems - Tilburg
Prof. Dr. Ir. A. van Wijngaarden - Amsterdam

director Dr. H. Bouma (from 15-9-1975)

Prof. Dr. C.A.A.J. Greebe (till 15-9-1975)

adviser Prof. Dr. A. Cohen

research associates

Ing. D.J.H. Admiraal
Ing. J.J. Andriessen
Ing. H.J. Bleileven
Ir. F.J.J. Blommaert
Drs. D.G. Bouwhuis
Ir. A.J. Breimer
Ir. J.P.L. Brokx
Drs. B.L. Cardozo
Dr. Ir. H. Duifhuis
Dr. J.P.M. Eggermont
Ir. F.L. Engel
J. 't Hart
Ing. Th.A. de Jong
Dr. A.F.V. van Katwijk
Dr. Ch.P. Legein (part-time)
Ing. F.F. Leopold
Ing. G.J.J. Moonen

IPO annual progress report 10 1975

H.F. Muller
Dr. Ir. F.L. van Nes
Dr. Ir. L.P.A.S. van Noorden (Z.W.O.*)
Dr. S.G. Nooteboom
Drs. J.J. de Rooij (Z.W.O.*)
Dr. Ir. J.A.J. Roufs
Drs. C.W.J. Schiepers (Z.W.O.*)
I.H. Slis
Ing. J.C. Valbracht
Ir. L.L.M. Vogten
Ing. J. Vredenbregt
Ir. L.F. Willems

research staff

Ing. M.A. Alewijnse
Th.M. Bos
Ing. E. de Braal
G.J.N. Doodeman
Ing. J.C. Jacobs
C.A. Lammers
Ing. G.H. van Leeuwen
A.W.J.J. Melchers
A.C. van Nes
W.H. Noordermeer
Ing. J.A. Pellegrino van Stuyvenberg
Ing. J. Polstra
A.L.M. van Rens
K.G. van der Veen
Ing. H. de Vries

secretaries

Mrs. J.A.C.E. van Esch-van der Vleuten
Mrs. C.J. Mennen-Senkeldam
Mrs. L.J. Savenije-Clignett
Mrs. P. Thiele
Mrs. J.W. Tielemans
Mrs. J.C.G.M. Verbruggen-Jansen

library

Mrs. J.M. Hoogervorst
Miss C.W. Koning

workshop

M.A. van den Ban
C.G. Basten
J.H. Bolkestein
P.A.N. Broekmans
C.Th.P. Godschalx
H.E.M. Melotte
D.J. van der Wees

* Netherlands Organization for the Advancement of Pure Research


This report or any part thereof may not be reproduced in any form without the

written permission of the Institute for Perception Research. Reprints of the

separate contributions are available. Illustrations may be reproduced only

with explicit mentioning of source; copies will be appreciated.


INTRODUCTION

As new director of the Institute for Perception Research IPO, I gladly take the

opportunity to greet our colleagues and all those interested in the IPO through

the medium of the 10th issue of the IPO Annual Progress Report.

Four members of the scientific board left in 1975, viz. Professor L.B.W. Jongkees, Professor J. Koekebakker, Professor H. Mol, Professor A.J.B.N. Reichling and Professor A.J.H. Vendrik, and I should like to express our thanks to them for their help in improving the quality of our research and in maintaining effective relationships with other research groups.

For IPO, the year 1975 saw a number of changes, the main one being that Dr. C.A.A.J. Greebe, who had succeeded Dr. J.F. Schouten in 1972 as director of IPO, himself left on September 15th. He served IPO, among other things, by his concentrated efforts in formulating explicit lines of research in close consultation with the IPO staff. In this programme, an increased effort is foreseen in cognitive aspects of perception. Our Institute is greatly indebted to Dr. Greebe, and I consider myself fortunate in continuing along the research lines drawn up by him.

Other members of IPO who left us this year are Linda Savenije-Clignett, Jos Tielemans and Messrs Alewijnse, Andriessen, v.d. Ban, Eggermont, Engel, van Noorden, Noordermeer, Schiepers, Slis and Vredenbregt. IPO was fortunate in having them as colleagues and we are grateful for their contributions both to the work and to the comradely atmosphere of the Institute. We are glad to say that we can maintain close relationships with some of them at least.

As will be seen from the contents of this Progress Report, we have all done our

best to limit the impact of these changes on the work. Our research potential will

soon be strengthened by the advent of new members of staff. The absence of reports

on ergonomic subjects is purely accidental.

In the scientific field, I wish to draw attention in particular to two papers,

published this year (see Publications list on p. 91). Dr. van Noorden, in his

dissertation, worked out an original approach to the audition of perceptually

ambiguous tone sequences. Mr. 't Hart and Dr. Collier have published their inspiring

continuation of the early work on intonation by Cohen and 't Hart, based on the

melodic line of spoken sentences, which has now led to a grammar of Dutch intonation

in which a number of hierarchical levels are distinguished. Furthermore, I should

like to make mention of the "Dynamic Aspects of Speech Perception" conference on

which occasion IPO was happy to provide a meeting place for linguists, physicists

and psychologists, all interested in the perception of running speech. A general

survey of the symposium is given in a separate contribution by Dr. Cohen and

Dr. Nooteboom, who have assumed responsibility for the Proceedings which have

already appeared.

Both in pure and in applied science in the field of human perception, we wish to contribute to basic understanding of the processes involved. Close contacts with our colleagues, both abroad and in the Netherlands, are essential for this purpose and, indeed, a source of encouragement to ourselves.

H. Bouma


CONTENTS (page)

organization .... II
introduction .... V
contents .... VI

1 auditory perception

B.L. Cardozo, H. Duifhuis, G.H. van Leeuwen, L.P.A.S. van Noorden and L.L.M. Vogten
Auditory Research in 1975 .... 2

L.P.A.S. van Noorden
Temporal Coherence and the Perception of Temporal Position in Tone Sequences .... 4

H. Duifhuis
Psychophysical Two-Tone Suppression .... 19

2 speech

S.G. Nooteboom, J.P. Brokx, G.J.N. Doodeman, Th.A. de Jong, J. 't Hart, A.F.V. van Katwijk, J.J. de Rooij, I.H. Slis and L.F. Willems
Research on Speech Perception in the IPO 1975 .... 25

J. 't Hart
The Location of the Non-Final Fall in Pitch Contours in Dutch .... 27

J.J. de Rooij
Prosody and the Perception of Syntactic Boundaries .... 36

A.F.V. van Katwijk
Accent Patterns in Number Name Sequences .... 40

A. Cohen and S.G. Nooteboom
A Symposium on Dynamic Aspects of Speech Perception, IPO, August 4-6, 1975 .... 45

3 visual perception

J.A.J. Roufs, J.J. Andriessen, Th.M. Bos, H. Bouma, F.L. Engel, Ch.P. Legein, J.A. Pellegrino van Stuyvenberg, A.L.M. van Rens and C.W.J. Schiepers
Research on Vision 1975 .... 49

H. Bouma and D.G. Bouwhuis
Word Recognition and Letter Recognition .... 53

J.A.J. Roufs and F.J.J. Blommaert
Pulse and Step Response of the Visual System .... 60

F.L. Engel and Th.M. Bos
Small Involuntary Eye Movements .... 68

H. Bouma, Ch.P. Legein and A.L.M. van Rens
Visual Recognition by Dyslectic Children .... 72

4 instrumentation

D.J.H. Admiraal
IPO Instrumentation 1957-1975 .... 80

A.C. van Nes
A Speech Spectrum Rotator .... 83

G.H. van Leeuwen
Electronic Ear Trumpet .... 86

J. Vredenbregt and J.H.M. van der Straaten
A Miniature EMG Device .... 88

5 i.p.o. publications 1975 .... 91

1 auditory perception


AUDITORY RESEARCH IN 1975

B.L. Cardozo, H. Duifhuis, G.H. van Leeuwen, L.P.A.S. van Noorden and L.L.M. Vogten

This introduction to the auditory section is intended as a background to research in the field. The objectives are at three levels. The basic one is to obtain knowledge of "how the ear works". A second level may be briefly described as research intended as support to speech research. A third level of research objectives aims at capitalising on the first and second levels and is involved in projects of a more applied nature.

It has been stated on other occasions, that from the viewpoint of basic research,

knowledge of the functioning of hearing should acknowledge physiological findings.

In this area there have been important developments in recent years. Physiologists

have supplied us with important new data both on the hydromechanics of the cochlea

and on the physiology of the more peripheral stages of the auditory nervous system.

Although these data refer to animals, mostly under anaesthesia, and one is not,

as a rule, in a position to measure activity in more than one single neuron at a

time, these data represent important reductions in the set of possible models of

auditory information processing. We want to stress that these reductions are based

on what we do know about auditory physiology. On the other hand, the map of the

ear's physiology shows a vast terra incognita and it would be unfortunate indeed

to be obliged to wait until this area were fully explored before any elaborate

model of the hearing mechanism could be ventured.

Having indicated the restrictions of auditory physiology, it is but fair to recall

the limitations of psychoacoustics, in which most experimentation is performed

with relatively simple stimuli in relatively simple, threshold-like paradigms that

allow relatively firm criteria to be used by listeners. We must bear in mind that

the relatively well-defined, more or less quantitative models based on psychoacoustic

data depict only a very narrow rim of human audition.

In terms of the ultimate objective of basic research on audition, progress is slow. Some new results are presented in a paper by Duifhuis in the present issue. Vogten has been investigating the consequences of his interpretation (Vogten, 1974) of the low-level Maximum Masking Frequency shift (MMF). Pilot measurements of the pulsation threshold (Houtgast, 1973) have been carried out at low stimulus levels in order to check the interpretation of MMF at low levels in terms of two-tone suppression. Unfortunately, at these low levels the reproducibility of the measurements is so poor that it is very difficult to draw conclusions.

On a second level of research, we aim at obtaining a more qualitative description of auditory phenomena less amenable to threshold-like paradigms. This type of investigation can still be regarded as psychoacoustic, but the criteria for the listener are less firm and the experimental data tend to show greater variance, so that interpretation in terms of auditory models is difficult indeed. The study of temporal coherence in tone sequences (van Noorden, 1975 and this issue) is a good example. In this domain, there are many problems that could be tackled, e.g. the perception of accent in tone sequences. This work is primarily descriptive. The important thing is to analyse perception and try to find elementary percepts and categories of perception, in the hope of bridging the gap between psychoacoustics and perceptual phonetics.

The third level of research comprises a multitude of practical problems, such as

the perception and perceptual evaluation of noises, factors contributing to the

intelligibility of speech and the development of an acoustic amplifier in an attempt

to optimize certain factors (e.g. signal-to-background ratio) for the benefit of

the mildly hard of hearing (van Leeuwen, this issue).

references

Houtgast, T. (1973) Psychophysical Experiments on "tuning curves" and "two-tone Inhibition", Acustica 29, p. 168-179.

Vogten, L.L.M. (1974) Low-level pure-tone masking and two-tone suppression, IPO Annual Progress Report 9, p. 22-31.

TEMPORAL COHERENCE AND THE PERCEPTION OF TEMPORAL POSITION IN TONE SEQUENCES

L.P.A.S. van Noorden

introduction

It has been shown by a number of authors (Bregman and Campbell (1971), Schouten

(1962), Thomas and Fitzgibbons (1971) and Wilcox (1972)) that it is difficult to

report the temporal order of the elements of a cyclic, repeated tone sequence if

both the rate of presentation and the frequency intervals between the successive

tones have high values. It has been proposed that this is due to the fact that

the successive tones are not perceived as one coherent whole, but that they fall

into groups according to their frequency region.

In Van Noorden (1975) it has been shown that a largest frequency interval can be found between the tones A and B of the alternating tone sequence ABAB.. where this tone sequence can still be perceived as one coherent whole (i.e. the temporal coherence boundary). This boundary was found to depend on the tone repetition time T in such a way that with increasing value of T the value of the boundary increases. When the tone sequence is perceived as split up into two strings A.A. and B.B., the observer has the impression that he cannot tell the precise position of the tones B relative to the tones A.

In this article some experiments are presented with the aim of discovering whether the loss of temporal acuity in perception of tone sequences is caused by the loss of temporal coherence. We shall not study the perception of temporal order but the perception of the precise temporal relations between the tones, as this can be done in the simple tone sequence ABAB.. of which we have precise measurements of the temporal coherence boundary. Furthermore, we shall determine the temporal acuity in a) sequences of only 2 or 3 tones, in which we have shown that the temporal coherence boundary lies at much larger frequency intervals, and b) dichotically alternating tone sequences.

the alternating tone sequence ABAB..

method

We used a tracking method to measure how precisely the temporal position of the tones B with respect to the tones A can be observed in the tone sequence ABAB.. Starting with the continuous tone sequence ABAB.. described in Van Noorden (1975), we made the tone repetition time of the tones B (TB) about 1% smaller or larger than that of tones A (TA); as a result, the tones B will gradually shift from the midpoint between the tones A. The observer is now asked to depress a push button as soon as he perceives that the tones B are no longer precisely half-way between the tones A. At this moment the relative positions of tones A and B are recorded and TB is changed from TA + 1% to TA - 1% or from TA - 1% to TA + 1%, so that the tones B move back through the middle and so on. In other words, the tones B oscillate continuously about the midpoint between the tones A during the experiment (see Fig. 1).
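As a rough numerical illustration of the stimulus described above, the following Python sketch computes how far the tones B have drifted from the midpoint after a given number of cycles. It is not a description of the apparatus used: the function name, the assumption that "1%" means 1% of TA, and the example value T = 100 ms are illustrative choices only.

```python
def displacement_after_cycles(T_ms, n_cycles, detune=0.01):
    """Displacement (ms) of tone B from the midpoint between tones A.

    Assumes tones A repeat every TA = 2*T and tones B every
    TB = TA * (1 + detune), with B starting exactly at the midpoint.
    Each cycle shifts B by TB - TA, so the drift is linear in n_cycles.
    A negative detune gives a drift in the opposite direction.
    """
    TA = 2.0 * T_ms
    TB = TA * (1.0 + detune)
    return n_cycles * (TB - TA)


if __name__ == "__main__":
    T = 100.0  # tone repetition time in ms (illustrative value)
    for n in (1, 5, 10):
        shift = displacement_after_cycles(T, n)
        print(n, "cycles:", round(shift, 3), "ms")
```

Under these assumptions the drift is 2 ms per cycle at T = 100 ms, so a displacement of the order of 10 ms is reached after about five repetitions of the pair AB.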


Fig. 2. Set-up for the tracking experiment.

Fig. 1. Form of the stimulus used for investigation of the perception of the temporal position of the tones B in the continuous tone sequence ABAB.. by a tracking method. The repetition time of the tones A (TA) is equal to 2T. The repetition time of the tones B (TB) is equal to 2T - 1% or 2T + 1%. The sign changes after each response of the observer. ΔT' and ΔT'' represent the just noticeable displacement of the tones B from the midpoints between the tones A.


This measuring method resembles that used by Von Bekesy for the semi-automatic tracking of the auditory threshold, but the criterion that the observer has to use in our experiments is more difficult than the criterion of whether or not one can hear a tone. Moreover, in the auditory threshold measurements the mean of the two reversal points is taken as a measure of the threshold, while in our measurements we are interested in the distance between the reversal points. For these reasons, among others, completely unbiased results cannot be expected from these measurements; however, the speed and directness of this method counted as advantages in this exploratory investigation.

The distance 2ΔT = ΔT' + ΔT'' between the reversal points on the two sides of the half-way position can be taken as a measure of the accuracy with which the observer can perceive the position of tones B with respect to tones A. The measuring set-up is shown in Fig. 2.
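The measure defined above can be sketched in a few lines of Python. The reversal positions in the example are hypothetical numbers, not data from the experiment, and the convention that a "forward" displacement means tone B arriving earlier than the midpoint is an assumption made for this illustration.

```python
from statistics import mean


def tracking_accuracy(forward_ms, backward_ms, midpoint_ms):
    """Return (dT_forward, dT_backward, distance) in ms.

    forward_ms / backward_ms: positions of tone B (delay after the
    preceding tone A) at the reversal points on either side of the
    midpoint. Assumed convention: forward means earlier than the midpoint.
    distance is 2*dT = dT' + dT'', the spacing between the mean
    reversal points on the two sides.
    """
    dT_forward = midpoint_ms - mean(forward_ms)
    dT_backward = mean(backward_ms) - midpoint_ms
    return dT_forward, dT_backward, dT_forward + dT_backward


if __name__ == "__main__":
    # Hypothetical reversal positions for T = 100 ms (midpoint at 100 ms).
    print(tracking_accuracy([89, 91], [110, 114], 100.0))
```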

[Figure 1: tones A and B plotted as FREQUENCY against TIME, with TA = 2T, TB = 2T + 1% or TB = 2T - 1%, the displacements ΔT' and ΔT'', and the observer's responses marked.]

observations

Before discussing the measurements, we shall describe what the observer can hear

when he listens to the tone sequence in which the tones B are shifted with respect

to the tones A. We used a tone duration of 40 ms, trapezoidal burst envelopes and

a level of about 35 dB SL, fB = 1000 Hz, fA and T variable.

In slow tone sequences (T = 400 ms) in which the temporal coherence of the tones A and B can always be heard, we hear a rhythmic change in the tone sequence (comparable with the structure of iambic or trochaic verse). The size of the tone interval between A and B does not change this effect essentially. The situation is different with slightly faster tone sequences (T about 160 ms). Fission now occurs when the tone interval is large enough. In this case it is more difficult to perceive the position of the tones B, but with large shifts away from the half-way position we perceive a sequence of groups AB AB instead of two separate strings A.A. and B.B.. The separation by pitch is, as it were, replaced by temporal separation into groups of two tones. When the tone interval is smaller, the transition from isochronous sequence to trochee or iamb can still be clearly heard.


In very fast tone sequences (T about 100 ms), we hear that B is no longer in the

half-way position, not so much on the basis of temporal differences but rather as

a consequence of subtle differences in the tone bursts themselves: the starts of

the tones seem to be either more or less gradual or staccato, or to differ in

level. These differences are more marked at small tone intervals. At large tone

intervals, the phenomenon of separation into groups AB AB can be observed.

measurements

We used the above-mentioned tracking method to measure the just perceptible displacement of the tones B from half-way between the tones A in the continuous tone sequences ABAB.., for a number of values of the tone interval I and the tone repetition time T. The tone duration was always 40 ms and the tone bursts had trapezoidal envelopes with rise and fall times of 5 ms; fB was 1 kHz and fA was variable. The level of both tones was the same and was chosen so as to give about 35 dB SL for the isochronous tone sequence with T = 100 ms, D = 40 ms and fA = fB = 1 kHz.

Values of 62, 82, 101, 120, 158, 202, 278 and 398 ms were taken for the tone repetition time, and of 0, 1, 2, 3, 5, 7, 10, 14, 19 and 25 semitones for the tone interval I. About 8 reversal points on each side of the middle position were determined in succession for each tone sequence with given values of T and I. With a given value of T, all values of I were dealt with in a random order; this took about 20 minutes. The various values of T were also dealt with in a random order. In order to test the reproducibility, this whole set of measurements was repeated. In all, about 5000 reversal points were determined in the course of a week. All these measurements were carried out by one observer, the author. However, pilot experiments indicated that different observers get comparable results.
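To give a feeling for the stimulus set, the following Python sketch lists the T and I values quoted above and converts a tone interval in semitones into a frequency for tone A. The text does not state the tuning or on which side of fB the tone A was placed; the sketch assumes equal-tempered semitones (frequency ratio 2^(1/12)) and fA above fB, and the function name is hypothetical.

```python
# Values quoted in the text.
REPETITION_TIMES_MS = [62, 82, 101, 120, 158, 202, 278, 398]
INTERVALS_SEMITONES = [0, 1, 2, 3, 5, 7, 10, 14, 19, 25]


def tone_a_frequency(fB_hz, interval_semitones):
    """Frequency of tone A lying interval_semitones above fB.

    Assumes equal-tempered semitones, i.e. a ratio of 2**(1/12)
    per semitone (an assumption, not stated in the text).
    """
    return fB_hz * 2.0 ** (interval_semitones / 12.0)


if __name__ == "__main__":
    n_conditions = len(REPETITION_TIMES_MS) * len(INTERVALS_SEMITONES)
    print("number of (T, I) combinations:", n_conditions)
    for interval in INTERVALS_SEMITONES:
        print(interval, "semitones:", round(tone_a_frequency(1000.0, interval), 1), "Hz")
```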

results

The results of the measurements are presented in Fig. 3. The differences between the first and second sets of measurements were slight; we therefore used their average. It may be seen from Fig. 3 that the distance between the reversal points, 2ΔT, increases with increasing frequency interval I; however, the extent of this increase depends strongly on the value of T. Regression lines have been drawn through the mean reversal points as functions of the tone interval. The intercepts of these lines on the T-axis and their slopes, together with the corresponding correlation coefficients, are given in Table 1. In most cases the correlation coefficient is greater than 0.9.
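The three quantities reported for each regression line (intercept, slope and correlation coefficient) can be obtained with an ordinary least-squares fit, sketched below in Python. The fitting procedure actually used is not described in the text, so this is only an assumed standard method; the data points are hypothetical and were placed exactly on a line so that the result is easy to check.

```python
import math


def linear_regression(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (intercept, slope, r), where r is the Pearson
    correlation coefficient between x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


# Tone intervals used in the experiment (semitones).
intervals = [0, 1, 2, 3, 5, 7, 10, 14, 19, 25]
# Hypothetical reversal positions (ms): NOT measured data, just points
# lying exactly on a line with intercept 112 ms and slope 0.71 ms/semitone.
positions = [112.0 + 0.71 * i for i in intervals]

intercept, slope, r = linear_regression(intervals, positions)

if __name__ == "__main__":
    print(round(intercept, 2), round(slope, 2), round(r, 3))
```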

The strongest dependence of the value of 2ΔT on the tone interval was found at T = 120 ms, the weakest at T = 398 ms. It is further striking that when T is less than or equal to 120 ms, a systematic difference is found between the just observable shifts forward and backward from the middle. The spread in the measured values increases slightly with 2ΔT and varies roughly between 2 and 15 ms.

Fig. 3. The just noticeable forward displacement ΔT' and backward displacement ΔT'' of the tones B from the midpoints between the tones A in the tone sequence ABAB.. as functions of the tone interval I with tone repetition time T as parameter (observer LvN, fB = 1 kHz). The experimental points are the mean of 15 determinations. At I = 0 and I = 25 semitones the spread of the observations (± standard deviation) is indicated by horizontal bars. The lines through the experimental points correspond to the regression data of Table 1. As can be seen, at large values of T the values of ΔT' and ΔT'' do not depend on I, as they do at smaller values of T. For T < 120 ms, there is considerable asymmetry between ΔT' and ΔT''.

Table 1

1) correlation coefficient
2) intercept of regression line (I = 0), in ms
3) slope of regression line, in ms per semitone
F: T1 < T; B: T1 > T

T (ms)    1) F     1) B     2) F    2) B    3) F     3) B
 62.4    -0.95     0.96      56      66    -1.39     0.86
 81.6    -0.99     0.96      70      90    -1.70     0.79
100.8    -0.99     0.97      89     112    -1.92     0.71
120.0    -0.97     0.94     105     135    -2.50     1.26
158.4    -0.96     0.87     145     171    -1.98     1.45
201.6    -0.94     0.94     188     216    -1.43     1.65
278.4    -0.64     0.91     260     295    -0.37     0.73
398.4    -0.54     0.50     379     421    -0.29     0.21

discussion

In order to compare these results with those obtained for the temporal coherence boundary, we have presented them in a different way in Fig. 4. In this figure the relative just perceptible shift, ΔT/T, is plotted as a function of T with the frequency interval I as parameter. The mean of the forward and backward shifts from the middle (ΔT = (ΔT' + ΔT'')/2) is taken.

We can read this graph as follows. The longest time intervals give the sharpest perception of shifts from the middle, which does not much depend on the tone interval. As T is reduced, ΔT/T increases, the more so as I is larger. This increase in ΔT/T continues until T = 120 ms, when a constant value is reached. The results for T < 120 ms can be approximated by the expression ΔT/T = a + b.I, where a = 0.11 and b = 0.016 per semitone. These values were found with the aid of a chi-square grid method. With these values the linear correlation coefficient of the measured points and the predicted points amounts to 0.969, which means that 94% of the variation in the experimental points can be explained with this model.
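The linear approximation and the step from a correlation coefficient of 0.969 to "94% of the variation" can be checked with the short Python sketch below. The constants a and b are those quoted in the text; the function names are illustrative, and the sketch does not reproduce the chi-square grid search itself.

```python
A = 0.11    # intercept a from the text (dimensionless)
B = 0.016   # slope b from the text (per semitone)


def relative_jnd(interval_semitones, a=A, b=B):
    """Relative just noticeable displacement dT/T = a + b*I (for T < 120 ms)."""
    return a + b * interval_semitones


def explained_variance(r):
    """Fraction of variance explained by a linear fit with correlation r."""
    return r * r


if __name__ == "__main__":
    for interval in (0, 3, 10, 25):
        print(interval, "semitones:", round(100 * relative_jnd(interval), 1), "%")
    print("r = 0.969 ->", round(100 * explained_variance(0.969), 1), "% explained")
```

With these constants the model gives ΔT/T = 0.11 at I = 0 and 0.51 at I = 25 semitones, and the squared correlation coefficient is about 0.939, which rounds to the 94% quoted.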

In order to show the connection with the temporal coherence boundary (Van Noorden, 1975), we have indicated in Fig. 4 the value of T at which the temporal coherence boundary is found at the value of I in question. It will be seen that for frequency intervals of ~ 3 semitones ΔT/T increases, passing the temporal coherence boundary. At the coherence boundary ΔT/T has a value of about 20%; this is roughly three to four times the smallest values found (5 to 7% at T = 400 ms). The gradual manner in which ΔT/T increases as T decreases is in agreement with the gradual nature of the temporal coherence boundary.

Fig. 4. The mean relative just noticeable displacement ΔT/T = (ΔT' + ΔT'')/2T of the tones B in the tone sequence ABAB.. as a function of the tone repetition time T with I as a parameter. These data are derived from those of Figure 3. The set of curves is constructed by using the formula ΔT/T = a + bI with a = 0.11 and b = 0.016 per semitone for T < 120 ms, and by drawing the best smooth curve through the experimental points by eye for larger values of T. The dotted area indicates the values of T where at a certain value of I the temporal coherence boundary is found. As can be seen, ΔT/T increases at the temporal coherence boundary when we follow the curve of a certain value of I from large to small values of T.

dichotic tone sequences

In Van Noorden (1975) it was shown that fission occurs in dichotic tone sequences with fA = fB and T less than about 150 ms. The experiments described below were performed in order to see whether the perception of the relative temporal position also deteriorated under these conditions. We used the tracking method of the first section, except that only the forward shift from the middle was determined. The direction of shift of the tones B with respect to tones A was reversed automatically as soon as the middle was reached. The experiments were performed on the tone sequences ABAB.. and ABA ABA...

The results are plotted in Figure 5. The experimental points are the medians of ten reversal points. It will be seen that the perception of relative temporal position is less accurate with dichotic presentation than with diotic. However, this difference is less at large T than at small T. The results with dichotic presentation deteriorate compared with those for diotic presentation between 160 and 140 ms. This is in good agreement with the value of 150 ms of Van Noorden (op. cit.) for the loss of temporal coherence with dichotic presentation, and with the minimum repetition time of 172 ± 25 ms for the ability to follow apparent movement of sounds from left to right and vice versa found by Blauert (1970).

Here again, loss of temporal coherence is accompanied by deterioration in the

accuracy of perception of relative temporal position. It may further be noted that

perception of the relative temporal position is more difficult in the tone sequence

ABA ABA ... than in the sequence ABAB ..

Fig. 5. The just noticeable relative forward displacement ΔT/T of the tone B in the tone sequences ABAB.. and ABA ABA... in dichotic and diotic presentation, plotted against the tone repetition time T (ms) for observer LvN with fA = fB. The dotted area indicates the values of T at which temporal coherence is lost in dichotic sequences.

Huggins (1974) has measured the perceived rate of dichotically alternating sequences of clicks compared with the perceived rate of sequences of binaural clicks. He finds values for the interpulse intervals between 70 and 100 ms below which the rate of the dichotic sequence seems to be half that of the binaural sequence. The disagreement with the results mentioned above is perhaps due to the difference in task.

short tone sequences

In the initial phase of our investigation many of the measurements (van Noorden,

1971a, 1971b; Linssen, 1973; Augustus and Nederhand, 1973) were devoted to the

perception of the relative temporal position of the tones in short tone sequences.

It seemed at first as if the results of the measurements were hardly compatible

with the observer's impression that he could not perceive the relative temporal

position of the tones well in cases of fission.

However, later experiments (Van Noorden, 1974) showed that temporal coherence can be perceived over larger tone intervals in short tone sequences than in continuous ones. We shall now discuss briefly a couple of these measurements.

The just noticeable displacement of the second tone in a two-tone sequence with a tone interval between the tones was determined with the aid of a binary-choice method. The observer was presented with two tone pairs. In the second pair, the time interval between the tones was either the same as that for the first pair (T), or shorter (T - ΔT) (see Fig. 6). The observer now had to say whether there was a displacement in the second pair. We determined the value of ΔT at which 75% of the responses were correct with a sequential up-and-down method such as can be performed with the aid of the IPO threshold tester (Cardozo and de Jong, 1971).
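The text does not describe the rule implemented in the IPO threshold tester. Purely as an illustration of how a sequential up-and-down track can settle at 75% correct, the Python sketch below uses a weighted up-down rule in which the step after an incorrect response is three times the step after a correct one. This rule, the step sizes and the function names are assumptions made for the example and should not be read as the procedure actually used.

```python
def equilibrium_probability(step_down, step_up):
    """Proportion correct at which a weighted up-down track is in balance.

    At balance, p * step_down == (1 - p) * step_up,
    so p = step_up / (step_up + step_down).
    """
    return step_up / (step_up + step_down)


def next_displacement(delta_ms, correct, step_down=1.0, step_up=3.0):
    """Displacement (ms) to present on the next trial.

    A correct response makes the task harder (smaller displacement);
    an incorrect response makes it easier (larger displacement).
    The displacement is never allowed to fall below zero.
    """
    if correct:
        return max(0.0, delta_ms - step_down)
    return delta_ms + step_up


if __name__ == "__main__":
    print(equilibrium_probability(1.0, 3.0))  # balance point of the track
    delta = 20.0
    for response in (True, True, False, True):  # hypothetical responses
        delta = next_displacement(delta, response)
        print(delta)
```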

Fig. 6. Form of the stimulus of the forced-choice experiment for measuring the just noticeable displacement in time of the second tone in a two-tone sequence. In the second pair the forward displacement with respect to the first pair is zero or ΔT, both with equal probability.

We chose T = 100 ms, D = 50 ms, fB = 500, 1000 or 2000 Hz, while fA was given various values between fB/4 and 4fB. The threshold was determined in duplicate for a number of different ratios fA/fB, by trained observers. In the successive threshold determinations at a given value of fB, the values of fA were varied throughout the range of interest, once upwards and once downwards. The level of the tone bursts was 35 dB SL at 1 kHz.

The results of the measurements for fB = 1000 Hz are plotted in Fig. 7; the results for fB = 2000 Hz were basically the same.

It follows from these measurements that the accuracy with which the relative temporal position can be determined depends on the tone interval. The best results are obtained when fA = fB; the just noticeable displacement is then 5 ms. As the tone interval increases, the just noticeable difference increases gradually to about 15 ms at a tone interval of 2 octaves. These results seem to agree with the findings in Van Noorden (1975), that temporal coherence can be observed over a much greater tone interval in two-tone sequences than in continuous sequences. The value of 15 ms mentioned above is much smaller than the 50 ms found for the same values of T and I in continuous tone sequences; however, we cannot place too much reliance on the difference because of the differences in the measuring methods used in the two cases.

Fig. 7. The just noticeable displacement ΔT/T of the second tone of a two-tone sequence as a function of the tone interval I (semitones), for T = 100 ms. The mean of the cases fA < fB and fA > fB has been taken. The results for the just noticeable displacement of the tones B in a continuous tone sequence ABAB at T = 100 ms, taken from Figure 4, are included for the sake of comparison. As can be seen, in the two-tone sequence the displacement can be perceived more readily than in a continuous tone sequence.

In order to permit better comparison of the accuracy of observation in short and continuous tone sequences, we have determined both with the tracking method referred to in the first section.

It was convenient to perform these measurements with the tone sequence ABA ABA... The just noticeable shift of the tones B with respect to the tones A in this sequence can then be compared with that found in the tone sequence obtained by omitting every other group ABA. The remaining ABA groups are then so far apart that they are perceived as separate short tone sequences.

The advantage of this method is that we can now use the same measuring conditions

for both tone sequences. If we shift only tone B in every other group ABA in the

continuous sequence, this means that the tone B takes just as long to move a given

distance from the middle in the two sequences; see Fig. 8. We only measured the

just noticeable shift of B in the forward direction.

[Figure 8: timing diagram of the two stimuli on a common TIME axis, with the intervals marked in multiples of T.]

Fig. 8. Form of the stimulus used to measure the just noticeable displacement of the tones B in three-tone sequences and continuous sequences. The time intervals are so dimensioned that in both cases an equal length of time is needed to reach a certain displacement from the middle.

Another point in which these measurements differ from those of the first section
is that in these measurements fA is slowly swept from a frequency lower than fB to
one higher than fB. The disadvantage of this method is that the range of frequencies
covered must be limited in order not to make the whole cycle of measurements
too long, which would tire the observer. Further, T = 100 ms, D = 40 ms, fB = 1 kHz
and LA = LB = 35 dB SL at 1 kHz.

The results of the measurements performed by the author are plotted in Fig. 9.
It is clearly seen from this graph that the just noticeable shift of B from the
half-way point between the tones A in the continuous sequence depends much more
strongly on the frequency interval than it does in the 3-tone sequences. As we have
seen in Van Noorden (1974) that temporal coherence can be perceived over larger
frequency intervals in short tone sequences, these observations provide further
support for the view that the temporal position can only be perceived properly
when there is temporal coherence.

[Figure 9: ΔT (ms) plotted against I (semitones).]

Fig. 9. Just noticeable displacement ΔT of the tones B in a three-tone (o) and a continuous (x) tone sequence. The experimental points indicate individual reversal points in the tracking experiment.

discussion I: perception of displacement compared with perception of order

Summarizing, we may state that we have shown that the accuracy with which the
temporal position can be determined deteriorates when there is no temporal coherence
between the tones whose temporal position is to be studied. We have shown this in
both dichotically and diotically presented continuous tone sequences.

The question is now to what extent these results can be compared with those of
investigators who have shown that the order in tone sequences or other auditory
patterns can no longer be observed if elements which differ too much from one another
are presented in close succession in these sequences (Schouten, 1962; Norman, 1967;
Bregman and Campbell, 1971; Neisser et al., 1974; Warren, 1974; Thomas and Fitzgibbons,
1971). There is certainly qualitative agreement. Most of the investigators ascribe
the failure to observe the order to the loss of temporal coherence. (Bregman (1971)
uses the concept "streams" and Thomas and Fitzgibbons that of "perceptual classes".)
Some of them also observed that the order can be perceived better in short tone
sequences than in continuous ones (Warren et al., 1969).

It should be possible to derive a measure of the inaccuracy of the relative temporal

position from these order-perception experiments. If we assume that the interchange

of two successive elements is the most common error here, we might conclude that

the inaccuracy is of the same order of magnitude as the tone repetition time.

The tone repetition times at which difficulties arise in the perception of order

are those ranging from 250 to 500 ms. These values are several times greater than

the largest inaccuracies observed in our experiments. The following facts should

be taken into consideration when assessing this discrepancy:

1. The tone sequences used in the order experiments were more complex, consisting

as they did of more than two different tones.

2. In these order experiments, there was generally no silent interval at all between

successive tones. As we saw in Van Noorden (1974), fission is more likely to occur

under these conditions.

3. In point of fact, the shift experiments are concerned with the detection of
differences. Wilcox et al. (1972) and Warren (1974) have shown that differences in
order can be detected in tone sequences which are so fast that the order itself
can no longer be identified.

4. In our experiments, the shift from the middle causes a change from fission on

the basis of pitch to a grouping in time: in fact, we get a rapid succession of

two-tone sequences, where the perception of temporal position is easier than in a

continuous sequence.

Although the loss of the ability to perceive order can lead to more spectacular

results, we preferred to use shift measurements because this made it possible to

use the same tone sequences as for the determination of the temporal coherence

boundary; the results of the two investigations are thus directly comparable.

Moreover, this shift method as such gives interesting results which throw light on

the question of the timing required in playing music.


discussion II: the discrimination of time intervals

So far we have only shown that the perception of temporal position and that of

temporal coherence are closely related. However, the results of our experiments

can also be compared with published data on the discrimination of time intervals.

The measurements described in the literature were all performed with different

methods and stimuli, so that the values found for the just noticeable difference

in time interval show quite a considerable spread; this naturally makes comparison

difficult.

The oldest measurements were concerned with the discrimination of the duration
of tones or noise bursts; they generally gave a ΔT of 10-20 ms at a reference
time interval of 100 ms (Burghardt, 1971; Creelman, 1962; Chistovich, 1959; Henry,
1948; Stott, 1935; Small and Campbell, 1962). However, our experiments were
concerned with the discrimination of time intervals between two tone bursts. Abel (1972)
has shown that the time interval between two noise or tone bursts cannot be
discriminated as well as the duration of a tone burst. She found a just noticeable difference
for the interval between two pulses, ΔT, of about 40 ms at a reference time interval
of about 100 ms. There is a big discrepancy between this value and that of 5 ms
which we found in our experiments with two-tone sequences with fA = fB at 100 ms.

One possible explanation for this difference is that the reference value of the

interval from presentation to presentation varied over a wide range (0.63 < T < 630 ms)

in Abel's experiments, while in our experiment we always worked at T = 100 ms,

permitting much better training for this specific interval.

The experiments with the sweep method on continuous sequences ABAB ... can best
be compared with measurements on tempo or rhythm discrimination. Michon (1964)
has shown that the tempo of a sequence of clicks where the interval between
successive clicks is 100 ms can be distinguished from that of a sequence where the
interval between the clicks is 0.8% longer or shorter. The difference between his
experiments and ours is that he studied the discrimination between sequences with
different tempos, while our observers had to compare temporal intervals within
one sequence. Lunney (1974) recently described an experiment which was more
comparable with ours. His observers had to adjust the temporal position of every
fourth pulse in a sequence of isochronous metronome pulses so that they could
just detect an irregularity in the rhythm. At a pulse repetition time of 100 ms,
the shift set amounted to 3-4 ms. Bearing in mind the difference in stimulus and
measuring method, we may consider this result to be in fair agreement with the
value of 8 ms we found at T = 100 ms in the first section. The T dependence of
ΔT in his experiments also agreed well with ours.

So far, we have only compared our results for I = 0 with published data. Little

is known about how the discrimination of time intervals depends on the frequency

interval between the two tone bursts defining the time interval. However, we may

conclude from the results of Williams and Perrott (1971), Perrott and Williams

(1970) and Divenyi and Hirsh (1972) that the time discrimination deteriorates

with increasing frequency interval. There is little point in carrying the comparison

further, in view of the great differences in measuring method and stimulus involved.


It would be good to be able to form a picture of the mechanism by which time

intervals are discriminated in perception. It might be thought that up to tone

repetition times of about 200 ms, peripheral processes such as masking or adaptation

could provide the information needed for discrimination of the relative temporal

position (see Appendix). A coupling with the peripheral excitation pattern could
also offer an explanation of the deterioration in time discrimination with
increasing frequency interval, and perhaps also of the asymmetry found in the sweep
experiments with the tone sequence ABAB ... at T < 120 ms. However, since we still

have to find an explanation for the fact that time intervals can be distinguished

better in short tone sequences than in long continuous ones, and since we can only

guess at the mechanism responsible for subjective time measurements at longer

time intervals, it would seem to be better to gather more experimental data before

trying to reach definite conclusions.

summary

In tone sequences of two alternating tones ABAB, with a large frequency interval
between the tones and a fast rate, the observer does not hear the string ABAB ...;
instead the strings AA ... and BB ... are formed in the perception.

In this paper we investigated whether this "loss of temporal coherence" also

implies that the observer can no longer hear the temporal relations between the

tones A and B. A measurement was carried out in which we slowly shifted the tones

B out of the temporal midpoint between the tones A. The observer had to detect

whether the tones B were no longer in the middle. The tone repetition time and the

frequency interval were systematically varied. The results indicated that the

relative just-noticeable-shift increased with decreasing tone repetition time

and with increasing frequency interval. Similar experiments were carried out in

sequences of only two and three tones and in dichotic sequences. In all cases the

relative just noticeable shift reflected the loss of temporal coherence.

appendix

a simple model for the discrimination of the relative temporal position of the tones A and B

in the tone sequence ABAB..

The phenomena of forward masking and loudness integration indicate that the

perception of a tone lasts longer than the physical tone burst itself. The tone is,

as it were, spread out in time in the hearing. As a result, interaction can be

produced between successive tones, and will be stronger when the tones are closer

together in time. A keen observer will pay attention to this interaction if it

enables him to discriminate between successive tones more accurately than would

be possible with his time interval measuring mechanism alone.

In order to gain insight into how this temporal "blurring" could permit better

discrimination of the time interval between successive tones, we have worked out

a simple model for this mechanism. For the sake of concreteness let us consider

an electrical circuit.


Its input voltage can be compared to the amplitude of the tone bursts, while the

output voltage is comparable with the temporally blurred excitation produced by

the tone bursts somewhere in the auditory system. For the sake of simplicity, we

will assume that the incoming tone bursts have the form of a square wave. The
simplest circuit which can produce the kind of temporal spread we are interested
in is a first-order RC integrator circuit (leaky integration), which is
characterized by its time constant τ; see Fig. A-1. As long as the time between two
successive voltage pulses is large compared with τ, the maximum output voltage at the
end of both pulses will be the same. However, as soon as the time interval becomes
small compared with τ, the maximum output voltage at the end of pulse B will be
higher than that at the end of pulse A.

We now assume that it is possible to compare the maximum output voltage at the end

of the successive pulses; the relative temporal position of the two pulses can

then be derived from the result of this comparison (see Fig. A-2).
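The behaviour of such a circuit can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function name `peaks_at_pulse_ends` is hypothetical, the input consists of unit-amplitude square pulses of 40 ms (the tone duration D used in the experiments), and the time constant is set to the 63 ms obtained further on in this appendix.

```python
import math

def peaks_at_pulse_ends(onsets, duration, tau, amplitude=1.0):
    """Output of a first-order RC integrator at the end of each square pulse.

    Between events the input is constant, so the exact solution
    V(t + h) = u + (V(t) - u) * exp(-h / tau) is applied segment by segment.
    Pulses are assumed not to overlap; all times are in ms.
    """
    v, t, peaks = 0.0, 0.0, []
    for onset in onsets:
        v *= math.exp(-(onset - t) / tau)                            # silent gap: free decay
        v = amplitude + (v - amplitude) * math.exp(-duration / tau)  # pulse: charging
        t = onset + duration
        peaks.append(v)
    return peaks

TAU = 63.0  # ms, time constant (value fitted later in the appendix)
D = 40.0    # ms, pulse duration

far = peaks_at_pulse_ends([0.0, 1000.0], D, TAU)   # interval large compared with tau
near = peaks_at_pulse_ends([0.0, 60.0], D, TAU)    # interval comparable to tau

print("far apart :", far)
print("close     :", near)
```

With the pulses far apart the two maxima coincide; with the pulses close together the second maximum exceeds the first, which is the cue the model relies on.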

[Figures A-1 and A-2: INPUT and OUTPUT waveforms of the integrator circuit.]

Fig. A-1 (top) and A-2 (bottom).

Let us now consider a continuous sequence

of pulses analogous with the tone sequence

ABAB .. used in the paper. It will be clear

that, apart from transient effects, the

output voltage at the end of the pulses

B (UB) will not be equal to that at the end

of the pulses A (UA) when B is not situated

half-way between the two As.

It follows from the differential equation for this circuit, with the boundary
condition that the situation is stationary in time (V(t) = V(t + 2T)), that

\[ \frac{U_A}{U_B} = \frac{1 + \exp\left(-\frac{T + \Delta T}{\tau}\right)}{1 + \exp\left(-\frac{T - \Delta T}{\tau}\right)} \tag{1} \]

where ΔT is the displacement of B from the midpoint between the two As. It will
be noticed that the tone duration does not appear in (1), as long as it is the same
for the tones A and B, and the pulses do not overlap at the input.
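The stationary ratio can be verified numerically. The sketch below assumes that relation (1) reads U_A/U_B = [1 + exp(-(T + ΔT)/τ)] / [1 + exp(-(T - ΔT)/τ)], i.e. that ΔT is counted positive when B moves towards the preceding A; this sign convention is an assumption. The function names are hypothetical. A leaky integrator is driven with the sequence ABAB... and the ratio read off after the transient is compared with the closed form.

```python
import math

def ratio_closed_form(T, dT, tau):
    """Stationary U_A / U_B, assuming B lies dT before the midpoint (times in ms)."""
    return (1 + math.exp(-(T + dT) / tau)) / (1 + math.exp(-(T - dT) / tau))

def ratio_simulated(T, dT, tau, D=40.0, cycles=50):
    """Drive a leaky integrator with the sequence ABAB... and read U_A / U_B.

    The input is piecewise constant, so each segment is advanced with the
    exact solution of tau * dV/dt = u - V.
    """
    v, t = 0.0, 0.0
    ends = {"A": 0.0, "B": 0.0}
    for k in range(cycles):
        for name, onset in (("A", 2 * T * k), ("B", 2 * T * k + T - dT)):
            v *= math.exp(-(onset - t) / tau)         # decay during the silent gap
            v = 1.0 + (v - 1.0) * math.exp(-D / tau)  # charging during the pulse
            t = onset + D
            ends[name] = v                            # output at the end of the pulse
    return ends["A"] / ends["B"]

T, dT, tau = 100.0, 10.0, 63.0
print(ratio_closed_form(T, dT, tau), ratio_simulated(T, dT, tau))
```

Changing the pulse duration D in `ratio_simulated` leaves the ratio unchanged as long as the pulses do not overlap, in line with the remark that the tone duration does not appear in (1).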

Now what we want to do is to detect whether ΔT ≠ 0, i.e. whether UA/UB ≠ 1. We
assume that U = |UB/UA - 1| must exceed a certain value (the detection threshold)
for this to be possible. When we know the height of the detection threshold, the
just noticeable value ΔT follows from (1), that is

(2)

We have now found a relation between ΔT and T in which two parameters (the time
constant τ and the detection threshold U) play a role.
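The step from a detection threshold to a just noticeable ΔT can also be carried out numerically. The following is a minimal sketch, not the author's expression (2): it assumes the stationary ratio U_B/U_A = [1 + exp(-(T - ΔT)/τ)] / [1 + exp(-(T + ΔT)/τ)] (B displaced towards the preceding A) and finds by bisection the smallest ΔT at which U = |U_B/U_A - 1| reaches the threshold. The function names are hypothetical; the values τ = 63 ms and U = 0.0585 are those reported further on in this appendix for I = 0.

```python
import math

def detection_variable(T, dT, tau):
    """U = |U_B/U_A - 1| for B lying dT before the midpoint (times in ms)."""
    ub_over_ua = (1 + math.exp(-(T - dT) / tau)) / (1 + math.exp(-(T + dT) / tau))
    return abs(ub_over_ua - 1)

def just_noticeable_shift(T, tau, threshold, tol=1e-9):
    """Smallest dT in [0, T] at which U reaches the threshold (bisection).

    U grows monotonically with dT on this interval, so bisection is safe.
    """
    lo, hi = 0.0, T
    if detection_variable(T, hi, tau) < threshold:
        raise ValueError("threshold is not reached for any shift up to T")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if detection_variable(T, mid, tau) < threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for T in (60.0, 80.0, 100.0, 120.0):
    print(T, just_noticeable_shift(T, tau=63.0, threshold=0.0585))
```

Under these assumptions the computed ΔT grows with T, which is the property of relation (2) used in the fit described below.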


This model can be compared with the tone sequence with I = 0. However, we have
seen before that ΔT increases with increasing I. This is not unexpected on the
basis of the assumption that the discrimination of the temporal position is
realized with the aid of an interaction such as forward masking or loudness
summation, since interactions of this type decrease when the tone interval between the
successive tones increases. This will clearly make the discrimination of temporal
position more difficult in some way or another. We can simulate this effect in
our model by letting the detection threshold rise with increasing I. In a first
approximation, we can write

\[ U = a + bI \tag{3} \]

In so doing, we have added one parameter to the description of the system.

Although equation (2) was derived on the basis of a very simple model, we have
investigated how well (2) and (3) can describe the measured values of ΔT found.
The relation between ΔT and T in (2) is such that the former increases with
increasing T. This is the case for the measured values of ΔT in the range T < 120 ms.
We therefore determined the values of the parameters τ, a and b, giving the
smallest differences between the measured values and those calculated from (2) and (3),
for the experimental points for 60, 80, 100 and 120 ms and values of I from
0 to 25 semitones.

Using the minimum value of chi squared as criterion in a grid method, we found
τ = 63 ms, a = 0.0585 and b = 0.0105 per semitone. The linear correlation
coefficient between the measured values and the values calculated with these parameter
values was 0.9765, which means that 95% of the variation in the experimental points
can be explained on the basis of this simple model. This is roughly the same
explained variation as we found in the first section with the model

\[ \Delta T / T = c + dI \tag{4} \]

While equation (4) has one parameter less, the derivation via equation (2) has the
advantage that it provides physical insight into the probable operative mechanism,
while (4) is merely a simple empirical expression.
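A grid method of this kind can be sketched as follows. Everything here is illustrative: the data are SYNTHETIC, generated from the model itself rather than taken from the measurements; the closed-form `predicted_shift` is derived under the assumption that the stationary ratio is U_B/U_A = [1 + exp(-(T - ΔT)/τ)] / [1 + exp(-(T + ΔT)/τ)]; and the Pearson-type weighting in `chi_squared` is an assumption, since the weighting actually used is not stated. All function names are hypothetical.

```python
import math

def predicted_shift(T, I, tau, a, b):
    """Just noticeable dT (ms) from U = a + b*I and the assumed stationary ratio."""
    U = a + b * I
    q = math.exp(-T / tau)
    # Solving U = (1 + q*x) / (1 + q/x) - 1 for x = exp(dT/tau) gives the
    # quadratic q*x**2 - U*x - q*(1 + U) = 0; the positive root is used.
    x = (U + math.sqrt(U * U + 4 * q * q * (1 + U))) / (2 * q)
    return tau * math.log(x)

def chi_squared(data, tau, a, b):
    """Pearson-type sum of (observed - expected)**2 / expected."""
    total = 0.0
    for T, I, observed in data:
        expected = predicted_shift(T, I, tau, a, b)
        total += (observed - expected) ** 2 / expected
    return total

# SYNTHETIC stand-in data generated from the model; not measurements.
true_params = (63.0, 0.0585, 0.0105)
data = [(T, I, predicted_shift(T, I, *true_params))
        for T in (60.0, 80.0, 100.0, 120.0) for I in range(0, 26, 5)]

# Exhaustive grid over tau (ms), a and b (per semitone).
grid = ((float(tau), i / 10000, j / 10000)
        for tau in range(50, 81)
        for i in range(500, 701, 5)
        for j in range(80, 131, 5))
best = min(grid, key=lambda p: chi_squared(data, *p))
print(best)

r = 0.9765
print(round(r * r, 3))  # share of variance explained by a correlation of 0.9765
```

Because the synthetic data are noise-free, the search simply recovers the parameters used to generate them; with real data the minimum would of course not be zero. The last two lines show the arithmetic behind the 95% figure: the explained variation is the square of the correlation coefficient.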

The fact that equations (2) and (3) allow reasonable prediction of the measured

values does not necessarily mean, of course, that this simple model provides an

adequate description of the processes occurring in the auditory system. In
particular, we do not yet know whether the amplitude of the voltage pulses at the input
of the RC network should be regarded as a function of the amplitude of the tone
bursts. It is possible that the pulses are first standardized (as Burghardt (1972)
assumed in a similar model which he used to predict the subjective duration of
tone bursts), thus making the time-interval discrimination independent of the
amplitude.

If there is a monotonic relation between the amplitudes of the tone bursts and the
voltage pulses, the values found for τ could perhaps be related to the long time
constant found for forward masking (75 ms; Duifhuis, 1972) and the just noticeable
difference in loudness (~ dB; Domburg, 1966). However, before we can form an opinion
about this we would have to repeat the measurements of short tone sequences with
amplitude differences between the tones.

references

Abel, S.M. (1972) Discrimination of Temporal Gaps, J.Acoust.Soc.Amer. ~, p. 519-524.

Augustus, B. and Nederhand, B. (1973) De Discriminatie van Tijdintervallen tussen Tonen van Verschillende Frequentie in Toonreeksen van Variabele Lengte, Unpublished Report.

Blauert, J. (1970) Zur Trägheit des Richtungshörens bei Laufzeit- und Intensitätsstereophonie, Acustica ~, p. 287-293.

Bregman, A.S. and Campbell, J. (1971) Primary Auditory Stream Segregation and Perception of Order in Rapid Sequences of Tones, J.Exp.Psychol. 89, p. 244-249.

Burghardt, H. (1971) Subjective Duration of Sinusoidal Tones, Proc. 7th Int.Congr. on Acoustics l, 20 H1, p. 353-356.

Burghardt, H. (1972) Einfaches Funktionsschema zur Beschreibung der subjektiven Dauer von Schallimpulsen und Schallpausen, Kybernetik ~, p. 21-29.

Cardozo, B.L. and Jong, Th.A. de (1971) A Note on a Sequential Up-and-Down Method of Threshold Finding, IPO Annual Progress Report ~, p. 125-127.

Chistovich, L.A. (1959) Discrimination of the Time Intervals between Two Short Acoustic Pulses, Akusticheskii Zhurnal 5, p. 480-484. Translation: Soviet Phys.Acoust. ~ (1960), p. 493-497.

Creelman, C.D. (1962) Human Discrimination of Auditory Duration, J.Acoust.Soc.Amer. l.!, p. 582-593.

Divenyi, P.L. and Hirsh, I.J. (1972) Discrimination of the Silent Gap in Two-Tone Sequences of Different Frequencies, J.Acoust.Soc.Amer. ~, p. 166(A).

Domburg, G. (1966) The Just Noticeable Difference for Loudness, IPO Annual Progress Report l, p. 8-11.

Duifhuis, H. (1973) Consequences of Peripheral Frequency Selectivity for Nonsimultaneous Masking, J.Acoust.Soc.Amer. 2i, p. 1471-1488.

Henry, F.M. (1948) Discrimination of the Duration of Sound, J.Exp.Psychol. ~, p. 734-743.

Huggins, A.F.W. (1974) On Perceptual Integration of Dichotically Alternated Pulse Trains, J.Acoust.Soc.Amer. ~, p. 939-943.

Linssen, M.R. (1973) De Discriminatie van Tijdintervallen tussen Toonstootjes van Verschillende Frequentie, Unpublished IPO Report No. 252.

Lunney, H.W.M. (1974) Time as Heard in Speech and Music, Nature 249, p. 592.

Michon, J.A. (1964) Studies on Subjective Duration I. Differential Sensitivity on the Perception of Repeated Temporal Intervals, Acta Psych. ~, p. 441-450.

Neisser, U. (1972) On the Perception of Auditory Sequences, Paper presented at the American Psychological Association, Honolulu 1972.

Neisser, U. and Hirst, W. (1974) Effect of Practice on the Identification of Auditory Sequences, Perc. & Psychophysics ~, p. 391-398.

Noorden, L.P.A.S. van (1971a) Discrimination of Time Intervals Bounded by Tones of Different Frequencies, IPO Annual Progress Report ~, p. 12-15.

Noorden, L.P.A.S. van (1971b) Een Inleidend Onderzoek naar het Waarnemen van de Tijdstructuur van Trillerachtige Auditieve Patronen, Unpublished IPO Report No. 214.

Noorden, L.P.A.S. van (1972) De Discriminatie van Tijdintervallen tussen Toontjes van Verschillende Frequentie, Unpublished IPO Report No. 228.

Noorden, L.P.A.S. van (1974) Temporal Coherence in Random Tone Sequences, IPO Annual Progress Report ~, p. 4-21.

Noorden, L.P.A.S. van (1975) Temporal Coherence in the Perception of Tone Sequences, Thesis Eindhoven University of Technology.


Norman, D.A. (1967) Temporal Confusions and Limited Capacity Processors, Attention & Performance I, A.F. Sanders, Ed. (North Holland Publ. Comp., Amsterdam).

Perrott, D.R. and Williams, K.H. (1970) Auditory Temporal Resolution: Gap Detection as a Function of Interpulse Frequency Disparity, Psychonomic Science ~, p. 73-74.

Schouten, J.F. (1962) On the Perception of Sound and Speech; Subjective Time Analysis, 4th Intern. Congress on Acoustics, Copenhagen, Congress Report II, p. 201-203.

Small, A.M. and Campbell, R.A. (1962) Temporal Differential Sensitivity for Auditory Stimuli, Amer.J.Psych. ~, p. 401-410.

Stott, L.H. (1935) Time-Order Errors in the Discrimination of Short Tonal Durations, J.Exp.Psychol. li, p. 741-766.

Thomas, I.B. and Fitzgibbons, P.J. (1971) Temporal Order and Perceptual Classes, Paper of the 81st Meeting of the Acoust.Soc.Amer.

Warren, R.M. (1974) Auditory Temporal Discrimination by Trained Listeners, Cognitive Psychology Q, p. 237-256.

Wilcox, G.W. (1972) Temporal Coherence of Tone Sequences, Paper given in a symposium entitled "Perception of Temporal Order in Hearing: Old Pattern-Recognition Problems in a New Guise" at the American Psychological Association meeting, Honolulu, Hawaii, September 4, 1972.

Wilcox, G.W., Neisser, U. and Roberts, J. (1972) Recognition of Auditory Temporal Order, Abstract of Paper presented at Eastern Psychological Association Meeting, Boston, Mass., April 28, 1972.

Williams, K.N. and Perrott, D.R. (1971) Temporal Resolution of Tonal Pulses, J.Acoust.Soc.Amer. ~, p. 644-647.

This article covers the sixth chapter and Appendix B of the doctoral thesis

"Temporal Coherence in the Perception of Tone Sequences" by L.P.A.S. van Noorden,

submitted in February 1975 to the Eindhoven University of Technology.


PSYCHOPHYSICAL TWO-TONE SUPPRESSION

H. Duifhuis

introduction and experimental design

We have recently developed a physiologically specified theory on cochlear nonlinearity
and second filter (Duifhuis, 1974 a,b, 1976), which has predictive value as
regards two-tone suppression. This theory specifies an average effective stimulating

waveform E(x), for the hair cell at point x, in response to an arbitrary stimulus.

Furthermore, a monotonic relation is assumed between E(x) and the average firing

rate, fr(x), in fibres innervating that hair cell.

A prediction relevant to two-tone suppression is given in Fig. 1 (for details see

Duifhuis, 1976). This figure shows the average of the effective stimulating waveform

in response to two tones, labeled P (for probe), and M (for masker), as a function
of masker level, as expected to be observable in a channel tuned to the probe. The
interesting features of the figure are that:
1) point AM1 is a measure of the amplitude characteristic of the first filter,
2) the ratio AM2/AM1 is a measure of the second filter and
3) the slopes (ν-1) and ν are determined by the nonlinearity, which was described
with a νth law device (we use A for signal amplitude in this paper).

Undoubtedly, this figure contains much interesting information. This led us to the

question of how to measure such a function psychophysically (we are not in the

position to carry out the alternative neurophysiological experiment).

[Figure 1: E plotted against MASKER AMPLITUDE (dB).]

Fig. 1. The average effective stimulating waveform, E, in response to a probe + masker stimulus, E(P+M), and in response to masker alone, E(M), for two masker frequencies, as expected in a channel tuned to the probe frequency. The probe amplitude is fixed. Masker amplitude is the independent variable. Suppression occurs between AM1 and AM2, in which region E(P+M) < E(M). Suppression is predicted only if probe and masker frequency are sufficiently different.

If, in line with Houtgast's (1974) pioneering work on the pulsation threshold, we
assume that the criterion for continuity at the pulsation threshold is that the
activity in the probe channel remains constant, we may have the necessary tool
enabling us to measure the activity in response to a two-tone complex in terms of
a single-tone response. The theory predicts E for a single tone to grow as E ∝ A^ν,
so that, using the pulsation threshold, we would be able to measure AM1 and AM2
and the slope 1 - 1/ν (Fig. 2). Thus, we would be able to determine the parameters
mentioned above.
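The change of slopes implied by this argument is a one-line computation. The sketch below assumes log-log (dB against dB) axes and a single-tone growth E ∝ A^ν; the function name and the numerical value of ν are hypothetical and serve only as an illustration.

```python
def pulsation_threshold_slopes(nu):
    """Map slopes of E versus masker level onto slopes of the pulsation threshold.

    If a single tone gives E proportional to A**nu, a change of x dB in E is
    matched by a change of x / nu dB in the level of the pulsation probe.
    """
    slopes_of_e = (nu - 1.0, nu)   # suppression branch, masker-dominated branch
    return tuple(s / nu for s in slopes_of_e)

nu = 0.6  # hypothetical compressive exponent, for illustration only
descending, ascending = pulsation_threshold_slopes(nu)
print(descending, ascending)
```

For any compressive exponent (ν < 1) the first slope equals 1 - 1/ν and is negative, while the second is exactly 1, so that a measured descending slope would in principle yield ν directly.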


[Figure 2: pulsation threshold App plotted against masker amplitude.]

Fig. 2. The expected result of scanning the probe + masker response E by using the pulsation threshold technique. The abscissa is the same as in Fig. 4; the ordinate now gives the pulsation threshold. This results in a change of slopes, the transition points remaining fixed at the masker amplitude scale. For an increase of (signal-) probe level from SP1 to SP2, we expect a shift of the curve at 45°, as indicated.

The choice of the νth law device to describe the compressive nonlinearity in the
cochlea implies that the shape of the curve in Fig. 2 is independent of the probe
level. A change in probe level would lead to a 45° shift of the curve, as indicated
by the dotted line.
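An idealised version of the expected curve can be written down as three straight segments in dB coordinates. This is a sketch only: the asymptotes are joined without any smoothing, the horizontal part is placed at the signal-probe level, and all numerical values (ν, probe level, AM1, AM2) are hypothetical. The 45° shift is represented by moving the probe level and both transition points by the same number of dB.

```python
def predicted_threshold(masker_db, probe_db, am1_db, am2_db, nu):
    """Idealised pulsation threshold (dB) as a function of masker level (dB)."""
    slope = 1.0 - 1.0 / nu                       # descending branch
    if masker_db <= am1_db:
        return probe_db                          # horizontal part
    if masker_db <= am2_db:
        return probe_db + slope * (masker_db - am1_db)
    # beyond AM2 the masker dominates and the threshold rises with slope 1
    return probe_db + slope * (am2_db - am1_db) + (masker_db - am2_db)

# Hypothetical values, chosen only to draw the shape of the curve.
NU, PROBE, AM1, AM2 = 0.6, 40.0, 50.0, 70.0

def shifted(masker_db, shift_db):
    """Same curve after raising the probe level by shift_db (45-degree shift)."""
    return predicted_threshold(masker_db, PROBE + shift_db,
                               AM1 + shift_db, AM2 + shift_db, NU)

for m in range(20, 101, 10):
    print(m, round(predicted_threshold(m, PROBE, AM1, AM2, NU), 1))
```

Evaluating `shifted(m + d, d)` returns the original curve raised by d dB, i.e. the curve keeps its shape and moves along the diagonal.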

The stimulus to be used in such an experiment is depicted schematically in Fig. 3.

[Figure 3: alternating M + SP and PP bursts on a time axis from 0 to 400 ms.]

Fig. 3. Schematic temporal organization of the stimulus. The masker + signal probe signal (M + SP) is interleaved with the scanning pulsation probe signal (PP). Masker frequency is variable, probe frequency fixed at 1 kHz (same for SP and PP). Signal transients are cosine-shaped with a duration of 20 ms.

We presented trains of 10 masker + signal probe (SP) stimuli, interleaved with
9 pulsation probes (PP). The bursts were each 120 ms in duration (in a few cases
125 ms) measured at half-amplitude points of the envelopes, and onset and offset
were cosine-shaped with a duration of 20 ms. These values closely match those
which turned out to be useful in Houtgast's (1974) experiments. Since we were
interested in using the technique rather than investigating the continuity effect as such,
we simply adhered to these values. For a number of fixed levels of signal probe SP
the subjects had to adjust the pulsation threshold (App) as a function of the
masker amplitude. The frequency of SP and PP was always 1 kHz in this experiment; the
masker frequency, fM, was a second parameter. Care was taken to keep SP and PP
in phase, so that addition of the two at equal amplitudes would leave no detectable
transients.
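The timing of such a stimulus can be sketched as follows. Two points are assumptions rather than statements from the text: the cosine-shaped transients are taken to be raised-cosine ramps running from 0 to full amplitude in 20 ms, and successive bursts are taken to start every 120 ms, so that the offset of one burst overlaps the onset of the next. All names are hypothetical.

```python
import math

RAMP_MS = 20.0    # duration of the cosine-shaped onset and offset
BURST_MS = 120.0  # burst duration between half-amplitude points
N_BURSTS = 19     # 10 masker + signal probe bursts interleaved with 9 pulsation probes

def envelope(t_ms, start_ms):
    """Amplitude envelope of one burst whose onset ramp begins at start_ms."""
    t = t_ms - start_ms
    total = BURST_MS + RAMP_MS          # from start of onset to end of offset
    if t <= 0.0 or t >= total:
        return 0.0
    if t < RAMP_MS:                     # raised-cosine onset
        return 0.5 * (1.0 - math.cos(math.pi * t / RAMP_MS))
    if t > BURST_MS:                    # raised-cosine offset
        return 0.5 * (1.0 + math.cos(math.pi * (t - BURST_MS) / RAMP_MS))
    return 1.0

def burst_kind(k):
    """Even-numbered bursts carry masker + SP, odd-numbered bursts carry PP."""
    return "M+SP" if k % 2 == 0 else "PP"

def probe_channel_sum(t_ms):
    """Summed envelope of the 1 kHz components (SP and PP) at equal amplitude."""
    return sum(envelope(t_ms, k * BURST_MS) for k in range(N_BURSTS))

print([burst_kind(k) for k in range(4)])
print(probe_channel_sum(125.0), probe_channel_sum(300.0))
```

Under these assumptions the half-amplitude points of each burst lie 120 ms apart, and the complementary ramps of SP and PP add up to a constant envelope, which is one way to read the requirement that adding the two in phase at equal amplitudes leaves no detectable transients.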

Stimuli were presented monaurally (right ear) through KOSS PRO/600 aa headphones.

Subjects were seated in a sound-treated booth.

results

The results of a number of measuring series are presented in Figs. 4 to 7 inclusive.

All amplitudes are expressed in terms of sound pressure level.

[Figure 4: App (dB SPL) plotted against AM (dB SPL); panel a: S. HvC, fM = 200 Hz; panel b: S. DB.]

Fig. 4. Experimental results for a 200 Hz masker for S. HvC in Fig. 4a and S. DB in Fig. 4b. In the interesting right hand part of the figure we note marked quantitative differences between subjects as well as between observed data and the expected results indicated in Fig. 2.

[Figure 5: App (dB SPL) plotted against AM (dB SPL) for several values of Asp; panel a: S. HvC, fM = 600 Hz; panel b: S. DB, fM = 600 Hz.]

Fig. 5. Similar to Fig. 4 for a 600 Hz masker.

[Figure 6: pulsation threshold plotted against masker level (dB SPL); panel a: S. HvC, fM = 1200 Hz; panel b: S. HvL, fM = 1200 Hz.]

Fig. 6. Experimental results for a 1200 Hz masker for S. HvC in Fig. 6a and S. HvL in Fig. 6b. Note that App decreases monotonically with increasing masker amplitude.

[Figure 7: App (dB SPL) plotted against AM (dB SPL); S. HvC.]

Fig. 7. Data for S. HvC for different masker frequencies above 1 kHz. The slope of the descending branch is found to decrease with increasing masker frequency.

Several features are noteworthy:

1) differences between data from different subjects are relatively large, especially

in so far as the amount of suppression (i.e. the size of the dips) is concerned;

2) except for points on the "trivial" horizontal part of the predicted curve
(representing intensity discrimination), the data points are accompanied by
considerable variability (from repeated measurements, with the same subject), as
indicated in the upper curve of Fig. 4b;

for masker frequencies above 1 kHz:

3) the ascending branch of the curve (right of AM2) is, in general, not reached for
the masker amplitudes that were used; in some cases the descending branch appears
to level off;

4) the slope of the descending branch decreases with increasing masker frequency;

5) the intersection points AM1 tend to shift approximately linearly with SP-amplitude;

for masker frequencies below 1 kHz:

6) the data points qualitatively follow the predicted curves of Fig. 2;

7) the observed slope of the right-hand asymptote tends to be greater than 1;

8) the slope of the line connecting the AM1 points for different SP amplitudes is

considerably steeper than 1, and is found to be significantly steeper than the slope

referred to in point 7.

discussion

Qualitatively, the data show the suppression effect similar to the neurophysiological

two-tone "inhibition" (e.g. Sachs, 1969), and as predicted by our theory (Duifhuis,

1976). Quantitatively, however, points 4, 7 and 8 mentioned above cannot be

accounted for by this theory. We may phrase the difference as: the data are found

to exhibit more nonlinearity than the model attributes to the system. This result

was not entirely unexpected. Points 5 and 8 are roughly in line with Shannon's

(1975) results from a comparable forward masking experiment. Point 4 is consistent

with a very similar finding obtained neurophysiologically by Sachs (1969). His data,

likewise, show a decrease of the suppression slope as the suppressor frequency

increases. Points 7 and 8 are apparently related to the nonlinearity of pure-tone

masking, as reported already by Wegel and Lane (1924). The deviation from unity

slope in our data is in the same direction as in these classical data.

At this point several ways are open for an approach towards better understanding of

the data and of cochlear nonlinearity. We mention the possibility of considering a

nonlinearity in the basilar membrane excitation and mechanical hair cell excitation.

Another possibility is that the compressive nonlinearity at the hair cell, as
characterized by the model parameter ν, behaves in a much more complicated way than
was assumed. It is, e.g., conceivable that ν would depend on spatial angle as well as
on excitation level. Another point is that, because of the nonlinearities, the
single-channel hypothesis (see Duifhuis, 1976) is in fact no longer a justifiable
simplification (cf. Verschuure, 1974).

For the psychophysical data pertinent to the above issues we face the problem

that the data result from an active interaction in the nonlinear system, and

are therefore "contaminated". Unless we have sufficient understanding of the system,

it will be impossible to localize and quantify the properties of parts of the system.

Our present aim is to investigate further and evaluate, experimentally as well

as theoretically, several of the possibilities mentioned above, in order to achieve

a better understanding of cochlear nonlinearity and two-tone suppression.

summary

A theoretical model of cochlear nonlinearity and second filter, which we have developed

in recent years, has direct predictive value as regards two-tone suppression. In

this paper we present data from an experiment set up to psychophysically determine

certain parameters of the model, such as the nonlinearity and second-filter characteristics.

It is found that the experimental data, which are consistent with literature data,

qualitatively agree with the theory. However, the data cannot be fully accounted

for quantitatively by a single adjustment of model parameters. This is felt to

imply that the system contains other nonlinearities than the one specified in the

theory.

acknowledgements

The author is indebted to H. van Cuyck and D. Bol, students of the Eindhoven

University of Technology (Physics dept.), who collected most of the data presented

in this paper.

references

Duifhuis, H. (1974a) An Alternative Approach to the Second Filter, in: Facts and Models in Hearing, E. Zwicker and E. Terhardt, Eds. (Springer, Berlin).

Duifhuis, H. (1974b) The Auditory Second Filter, IPO Annual Progress Report, p. 32-37.

Duifhuis, H. (1976) Cochlear Nonlinearity and Second Filter; Possible Mechanism and Implications, J.Acoust.Soc.Amer. (in press).

Houtgast, T. (1974) Lateral Suppression in Hearing, doctorate dissertation, Free University, Amsterdam.

Kim, D.O., Molnar, C.E. and Pfeiffer, R.R. (1973) A System of Nonlinear Differential Equations Modelling Basilar-Membrane Motion, J.Acoust.Soc.Amer., p. 1517-1529.

Sachs, M.B. (1969) Stimulus-Response Relation for Auditory-Nerve Fibers: Two-Tone Stimuli, J.Acoust.Soc.Amer., p. 1025-1036.

Schroeder, M.R. (1975) Amplitude Behavior of the Cubic Difference Tone, J.Acoust.Soc.Amer., p. 728-732.

Shannon, R.V. (1975) Suppression of Forward Masking, doctorate dissertation, University of California at San Diego.

Verschuure, J., Rodenburg, M. and Maas, A.J.J. (1974) Frequency Selectivity and Temporal Effects of the Pulsation Threshold Method, Proc. 8th Int. Congr. Acoust., London, Vol. I, p. 131.

Wegel, R.L. and Lane, C.E. (1924) The Auditory Masking of one Pure Tone by Another and its Probable Relation to the Dynamics of the Inner Ear, Phys.Rev., p. 266-285.

RESEARCH ON SPEECH PERCEPTION IN THE I.P.O. 1975

S.G. Nooteboom, J.P. Brokx, G.J.N. Doodeman, Th.A. de Jong, J. 't Hart, A.F.V. van Katwijk,

J.J. de Rooij, I.H. Slis and L.F. Willems

introduction

A number of research projects on aspects of speech perception are at present

being carried out in our institute. The perception of speech, and particularly

connected speech, may be seen as a highly complex and flexible processing of

auditory information, the details of which are largely unknown. We hope and expect

research in this field not only to lead to making explicit the relation between

the structure of acoustic stimuli and the responses of listeners in well defined

experimental tasks, but also to the discovery of mental structures and processes

involved in the auditory perception of speech in particular, and of complex information

processing in humans in general.

The main effort in our research is directed towards the perceptual structure and

functioning of speech prosody. Pitch contours and temporal structures show a high

degree of organization over utterances of phrase length. Controlled studies of

how these structures may function seem a promising way of gaining access to perceptual

processes and strategies dealing with the non-simultaneity of acoustic

cues relevant to the decoding of linguistic messages.

prosody and syntax

Earlier research at IPO has shown that the structure of pitch contours in longer

utterances in Dutch may be described in terms of substructures (intonational

blocks) strung together. In the past year a few experiments have been run to test

the hypothesis that the boundaries between such intonational blocks can be related

to major syntactic boundaries, and may help listeners to detect the syntactic

structures of messages. Some results are reported in this issue ('t Hart).

In the above experiments the only variable was the pitch contour. Another research

project is at present under way, with the aim of assessing the relative contribution

of pitch contours and temporal structures to the detection of major syntactic

boundaries. A pilot experiment is reported in this issue (De Rooij).

discriminability of size of pitch movements

The experiments on discriminability of the size of pitch movements in speech and

piano tone sequences, reported on in our Progress Report 1974, have been extended.

It is found that the distribution of thresholds over subjects is bi-modal, both

for rises and for falls, in speech as well as in piano tone sequences. Whether the

bi-modal distributions of differential thresholds reflect inherent differences

between classes of subjects, or are caused by different listening strategies, is

at present unknown ('t Hart).

perceptual interaction between prosodic and inherent vowel duration

A number of perceptual measurements are being carried out in which the categorization

of identical sets of vowel durations into phonemically long and short vowels

is studied as a function of position within a phrase. The results obtained so far

can be interpreted in terms of an effect of expected prosodic duration on the

criterion for long vowel/short vowel perception. Accuracy of perception, as indicated

by the slopes of the identification functions, remains surprisingly good, also

in longer phrases (Nooteboom, Doodeman).

perception of syllable boundaries

In a perceptual experiment with synthetically generated sequences of identical

nonsense syllables, the perception of syllable boundaries has been studied as a

function of both the detailed temporal structure of the sequences of vowel and

consonant segments, and what was expected from the particular phonemic segment

that a stimulus sequence started with. It was found that what was expected generally

over-ruled the subtle differences in durational structure (Slis).

sequential effects in production and perception of accent patterns in number names

In reading aloud compound number names such as those for 562, 563, etc. versus

491, 591, etc., accent patterns are often determined by the preceding number names.

The grammatical components which determine correspondence and non-correspondence

leading to such sequential effects in accentuation are at present being investigated.

Some experiments have been run to find out what factors influence the

retention of preceding number names in the mind of readers, and what information

on sequential position listeners can derive from the accent patterns. Some results

are reported in this issue (Van Katwijk).

auditory coherence of connected speech

If connected speech is assembled from prerecorded words or syllables the result

may be unnatural and unintelligible due to a lack of auditory coherence, which

in normal speech is supplied by the integrity over time of production processes.

A research project has been set up to study the relative importance of several

types of acoustic discontinuities to intelligibility and naturalness (Brokx).

research facilities

In our last year's Progress Report a system was described for manipulating pitch

contours and durational structures of speech. The system was based on a channel

vocoder. With a view to improving the quality of vocoderized speech we are in the

course of developing an analysis-synthesis system based on an LPC vocoder (Willems).

The speech editing system, also described in the Progress Report 1974, is being

improved on, mainly with respect to ease of operation and sound quality. The

segmentation component of the system is provided with variable rise and decay

times (De Jong).

THE LOCATION OF THE NON-FINAL FALL IN PITCH CONTOURS IN DUTCH

J. 't Hart

introduction

Earlier reports (Collier & 't Hart, 1971; 't Hart & Cohen, 1973; 't Hart & Collier,

1975) have described how we have tried to give an account of regularities in contours

of speech pitch in Dutch in terms of rules and, ultimately, in the form of a grammar

of Dutch intonation. A particular property of these rules, and of the grammar, is

that they do not accept input conditions as to semantic or syntactic aspects of

the word content - with one exception: if the utterance in question is split into

a number of fairly independent parts, continuations will appear at the boundaries

of these parts; a continuation, formerly called "caesura", is characterised by a

non-prominence-lending rise, very late in the last syllable before the boundary

immediately followed by an inaudible fall back to the original low level of pitch

before the rise.

Continuations come up for discussion again later in this paper. The main issue,

however, is the problem of the location of another non-prominence-lending movement,

viz. the non-final fall. We have recently gained better insight into this problem.-

The non-final fall is a perceptually relevant pitch movement of Dutch intonation,

which has to occur between two successive prominence-lending rises to provide

the necessary pitch "reset" after the first rise. Unlike the final fall, it does

not lend prominence; accordingly, its position with respect to the vowel onset of

the syllable is different from that of the final fall, viz. rather early in an

inconspicuous syllable or indeed so early that it can be considered to fall in

between two syllables ('t Hart & Cohen, 1973).

With this definition, nothing has yet been said about the issue as to which syllable

(or syllables) of the utterance we are dealing with. This question seems to have

been answered in Cohen & 't Hart (1967), where it is stated that the non-final

fall occurs immediately after the word which contains the first prominence-lending

rise, i.e., the remaining syllables of that word retain a high tone, and the non-final

fall is located between the last syllable of that word and the first one of

the next (Fig. 1, A).

Later investigations, however, showed that this simple rule was violated too often

to be retained. According to a new rule, formulated in 't Hart & Cohen (1973), the

non-final fall may occur anywhere in between the two successive rises, provided

that it does not cause spurious prominence by being located too close to the lexically

stressed syllable of a non-dominant, but nevertheless not totally unimportant

word (Fig. 1, B).

In general, upon inspection of more extended material than that on which this new

rule was based, there was little reason to curtail the liberty obtained by its

very generous formulation. In particular, there was found to be one position of the

non-final fall, frequently used in spontaneous speech, that could always be taken

- The clarification of this problem was achieved in cooperation with Dr. R. Collier, whose contribution to the design of the experiments and the evaluation of their results is gratefully acknowledged.


instead of the one provided by the original rule, viz. a position immediately

following the first prominence-lending rise, so to speak on the same syllable.

Both positions were equally acceptable (Fig. 1, C).

However, quite a number of cases were observed in which the non-final fall seemed

to be deliberately "postponed", i.e., it occurred not between the last syllable

of the accentuated word and the first one of the next word, but later. And, typically

in these cases, this postponed position could not be freely altered into the

one provided by the original rule: in that position it sounded fully unacceptable

(Fig. 1, D). (Acceptability can be judged by means of Intonator (Willems, 1966)

stylizations which enable us to locate a certain pitch movement at any position in

the pitch contour).

Fig. 1. Different observed locations of the non-final fall (stylized pitch contours A to D, plotted against word and syllable number).

A. Original rule: prominence-lending rises on syllable 2 of word 1 and on syllable 3 of word 4; syllables 3 and 4 of word 1 remain high; non-final fall immediately after word 1.

B. Second rule: almost anywhere. Suppose word 3 not unimportant but not dominant either, with lexical stress on syllable 2, then avoid non-final fall on that syllable (!).

C. Frequently occurring location: immediately after first rise.

D. Postponed positions. According to original rule unacceptable (:).

These observations bring the realization that the second rule (of 1973) is not

valid either: where the original rule was too rigid, this one is too tolerant. It

was not clear how the rule about the position of the non-final fall should be reformulated

so as to have it account for the seemingly "anomalous" cases of the postponed

non-final fall. Were the anomalous cases seldom enough to consider them

marginal, and would it hence be sufficient to extend the rule by a number of

exceptions? Or should its reformulation be more fundamental, for instance by virtue

of the introduction of some more sophisticated dimension?

It seems as if an answer to these questions has been given by the outcome of two

experiments, as reported on extensively in Collier & 't Hart (1975). In the next

two sections, we will briefly deal with these experiments. In the second experiment,

not only non-final falls, but also continuations were involved. Therefore, a

subsequent section deals with possible consequences of the distinction made

between non-final falls and continuations in the description.

experiment 1

The Intonator was set to produce "hummed contours" of ten different kinds, in two

groups: in the first group of five contours, there were fifteen "syllables" (of

uniform duration and with uniform time intervals) and a pitch contour that provided

pitch accents on the second, eighth and thirteenth syllables, with a non-final fall

at five different positions between the first and the second pitch accent; in the

second group, there were thirteen syllables, pitch accents on syllables nos.

two, five and eleven, a fixed non-final fall immediately after the first pitch

accent and a second, variable one at five different positions between the second

and the third pitch accent. (See Fig. 2.) The main considerations in the choice of

the stimulus material were that there should be enough room to manipulate the location

of the non-final fall and that too simple contours should be avoided in order

to ensure a fair amount of variability in the reactions of the listeners.
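
The stimulus set described above can be written out explicitly. The following Python sketch is an illustration only: the names `HummedContour` and `build_stimuli` are hypothetical, and it assumes that the five positions of the variable non-final fall lie 0 to 4 syllables after the preceding pitch accent. That assumption matches the row labels of Table I but is not stated in the stimulus description itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HummedContour:
    """One stimulus of experiment 1 (hypothetical representation)."""
    group: int                     # 1 = fifteen syllables, 2 = thirteen syllables
    n_syllables: int
    accents: Tuple[int, int, int]  # 1-based syllable numbers carrying a pitch accent
    fixed_falls: Tuple[int, ...]   # syllables immediately followed by a fixed non-final fall
    variable_fall_after: int       # syllable after which the variable non-final fall occurs


def build_stimuli(n_positions: int = 5) -> List[HummedContour]:
    """Enumerate the ten contours: five fall positions in each of two groups."""
    stimuli = []
    for distance in range(n_positions):  # ASSUMPTION: distances of 0..4 syllables
        # Group 1: variable fall between the first (2) and second (8) pitch accent.
        stimuli.append(HummedContour(1, 15, (2, 8, 13), (), 2 + distance))
        # Group 2: fixed fall right after accent 2, variable fall between accents 5 and 11.
        stimuli.append(HummedContour(2, 13, (2, 5, 11), (2,), 5 + distance))
    return stimuli


if __name__ == "__main__":
    for stimulus in build_stimuli():
        print(stimulus)
```

Under this assumption every variable fall stays strictly before the next pitch accent, which is the "room to manipulate" that the choice of stimulus material was meant to guarantee.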

Fig. 2. Stylized representation of the stimuli of the first experiment. Panel (1): 15 "syllables"; panel (2): 13 "syllables". Each panel shows the amplitude envelope (time marker: 250 ms) and the pitch contour, with the five positions of the variable non-final fall labelled A to E.

Five subjects were asked to think of sentences for which the hummed contours would

constitute suitable intonations. They were instructed to take care to put accentuated

syllables in the proper places, but they were not explicitly told about the particular

positions of the non-final falls. The sentences should be grammatically

correct, but there would be no objections to semantic nonsense. 144 sentences were

produced in all to go with the ten different hummed contours.

The sentences were tested with the following questions:

1) Given a particular hummed contour, can the subjects think of sentences with a

different word content?

The answer is clearly affirmative.

2) Given that sentences of various word content have been written down to go

with a particular stimulus (hummed contour), do they possess similar syntactic

structures?

This time the answer is negative: at comparable places of different sentences

with one particular stimulus, practically all kinds of syntactic elements are

found, subjects, objects, adverbial phrases, verb phrases, coordinative and

subordinative clauses.

3) Given these variegated syntactic structures, do they nevertheless share some

common property in the case of one particular hummed contour, which is systematically

different in the case of another stimulus?

As will be demonstrated below, such a common property can indeed be found.

The sentences were analysed in terms of syntactic constituents on the level of

phrase markers. An example is shown below.

S( S1( (wat aardig) (van je) )S1 S2( (dat) (je) (gisteren) (al weer) (opbelde) )S2 )S

(how kind of you that you yesterday again called)
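
The hierarchy implied by such a bracketing can be made explicit by noting, for every pair of adjacent words, the shallowest nesting level reached between them. The sketch below is a minimal illustration, not the procedure used in the analysis: the function name `boundary_depths` is hypothetical, the constituent labels (S, S1, S2) are left out of the input string, and the numeric depth is merely a convenient way of ranking boundaries.

```python
from typing import List, Tuple


def boundary_depths(bracketing: str) -> List[Tuple[str, str, int]]:
    """For each pair of adjacent words, return the shallowest bracket depth
    reached between them (a smaller number means a more major boundary)."""
    boundaries = []
    previous = None   # last completed word
    depth = 0         # current nesting depth
    min_depth = 0     # shallowest depth seen since the previous word ended
    token = ""
    for ch in bracketing + " ":  # trailing space flushes a final word
        if ch in "()" or ch.isspace():
            if token:
                if previous is not None:
                    boundaries.append((previous, token, min_depth))
                previous, token = token, ""
                min_depth = depth
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    raise ValueError("unbalanced bracketing")
                min_depth = min(min_depth, depth)
        else:
            token += ch
    if depth != 0:
        raise ValueError("unbalanced bracketing")
    return boundaries


if __name__ == "__main__":
    example = "(((wat aardig)(van je))((dat)(je)(gisteren)(al weer)(opbelde)))"
    for left, right, level in boundary_depths(example):
        print(f"{left} | {right}: depth {level}")
```

For the example sentence the boundary between "je" and "dat" is the only one at depth 1 (between the two clauses); boundaries between phrases come out at depth 2, and word boundaries inside a phrase at depth 3.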

A confrontation between the thus analysed sentences and the corresponding hummed

contours yielded a highly frequent coincidence of non-final falls with boundaries

of the syntactic constituents. See Fig. 3, in which, for instance, three sentences

are given that had been produced to go with one contour, and another three to go

with a different contour.

((het strijk-orkest-je) (kent wel) (ze-ven) (stukken) (van Beethoven))

(((of Pe-ter nog komt)) ((weet) (geen) (mens) (totnogtoe) (met ze-kerheid)))

(((wat aardig van je)) ((dat) (je) (gis-te-ren) (al weer) (opbelde)))

(((toen) (dacht) (hij) (waarschijnlijk)) ((dat) (ik Jewel) (voor hem) (zou uit-wij-ken)))

(((van-mor-gen) (dacht ik al)) ((er) (moet) (toch) (wat op te vin-den zijn)))

(((be-loof me) (nou eerst eens)) ((dat) (jij) (er) (wel toe in staat zou zijn)))

Translations in same word order, respectively: The string-band knows seven pieces of Beethoven; if Peter still comes knows no one as yet with certainty; how kind of you that you yesterday again called; then thought he probably that I for him would pull out; this morning thought I already there must be something to be done about it; promise me first that you would be able to do it.

Fig. 3. Examples of sentences produced to go with two different kinds of hummed contours (the first three sentences with one contour, the last three with another).

In fact, the extent to which the coincidence mentioned has been observed in the

material can be read off from Table I. Rows represent the number of syllables

between the prominence-lending rise and the non-final fall of the hummed contour

which served as the stimulus; columns refer to the number of written syllables

between the - obligatory - accentuated syllable and the syntactic boundary. Figures

on the junctures are frequencies of occurrence in percentages of the total

number of sentences produced for one stimulus. Corresponding stimuli of the two

groups are taken together.

Table I. Frequency of occurrence of sentences with a given distance (in syllables) between accentuated syllable and syntactic boundary, produced on hummed contours with a given distance between prominence-lending rise and non-final fall.

Columns: number of written syllables between accentuated syllable and syntactic boundary.

Rows: number of syllables between prominence-lending rise and non-final fall.

          0     1     2     3     4

    0    20    35    28    10     7

    1     7    93     0     0     0

    2     0     7    93     0     0

    3     4     8     4    84     0

    4     8     8     4     8    72

The main diagonal shows the essence of the outcome. Point 0 - 0 does not seem to

follow this general trend; we will deal with the top row later. Firstly, we note

that the lower-left triangle shows the few cases in which the syntactic boundary

occurred prior to the non-final fall, whereas the higher-right triangle shows that,

generally, no syntactic boundaries have been generated beyond the non-final fall

- with the exceptions given in the top row.
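
The pattern just described can be checked directly on the figures. The sketch below is an illustration under a stated assumption: the matrix `TABLE_I` is a reading of the flattened Table I in which each row (one fall position) lists its five column percentages, and the helper names are hypothetical. The assignment of numbers to cells is supported by the fact that every row then sums to 100 per cent, as it should for percentages per stimulus.

```python
# ASSUMPTION: cell assignment reconstructed from the flattened table.
# Rows: syllables between prominence-lending rise and non-final fall (0..4).
# Columns: syllables between accentuated syllable and syntactic boundary (0..4).
TABLE_I = [
    [20, 35, 28, 10, 7],
    [7, 93, 0, 0, 0],
    [0, 7, 93, 0, 0],
    [4, 8, 4, 84, 0],
    [8, 8, 4, 8, 72],
]


def row_sums(table):
    return [sum(row) for row in table]


def diagonal(table):
    """Cases in which the boundary coincides with the non-final fall."""
    return [table[i][i] for i in range(len(table))]


def lower_left(table):
    """Cases in which the boundary occurred prior to the non-final fall."""
    return [table[i][j] for i in range(len(table)) for j in range(i)]


def upper_right(table, skip_top_row=True):
    """Cases in which the boundary lies beyond the non-final fall."""
    start = 1 if skip_top_row else 0
    n = len(table)
    return [table[i][j] for i in range(start, n) for j in range(i + 1, n)]


if __name__ == "__main__":
    print("row sums:            ", row_sums(TABLE_I))
    print("main diagonal:       ", diagonal(TABLE_I))
    print("lower-left triangle: ", lower_left(TABLE_I))
    print("upper-right triangle:", upper_right(TABLE_I))
```

With this reading, the diagonal cells of rows 1 to 4 hold between 72 and 93 per cent of the sentences, the upper-right triangle below the top row is empty, and no lower-left cell exceeds 8 per cent.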

The top row represents stimuli in which the non-final fall was located immediately

after the prominence-lending rise. The fact that with such stimuli sentences are

produced with syntactic boundaries at rather arbitrary distances from the accentuated

syllable is in agreement with the observation mentioned in the Introduction,

that there seemed to be one position of the non-final fall that could always be

taken, viz. immediately following the first rise. This means that the speaker is

free to choose this position of the non-final fall, irrespective of the location

of the syntactic boundary: the resulting pitch contour is fully acceptable.

If, however, the speaker does not choose this particular position, he is obliged

to mark the syntactic boundary. If he fails to mark it, there are two possibilities:

either the syntactic boundary precedes the non-final fall; from the lower left

part of Table I we may expect that in such cases a certain number of listeners

will not notice this non-coincidence; or the speaker produces a non-final fall

prior to the syntactic boundary (although not on the accentuated syllable); in

that case, it will very readily be noticed by the listeners.
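
The case distinction made in the last two paragraphs can be summarised as a small decision function. This is a sketch of the reasoning only, not a rule taken from our grammar: the function name `classify_fall` and the wording of the returned labels are hypothetical, and both distances are assumed to be counted in syllables from the accentuated syllable, as in Table I.

```python
def classify_fall(fall_distance: int, boundary_distance: int) -> str:
    """Classify the location of a non-final fall relative to the major
    syntactic boundary; both distances are in syllables (assumed >= 0)."""
    if fall_distance < 0 or boundary_distance < 0:
        raise ValueError("distances must be non-negative")
    if fall_distance == 0:
        # Fall immediately after the rise: always available to the speaker.
        return "free position, acceptable irrespective of the boundary"
    if fall_distance == boundary_distance:
        return "boundary marked by the non-final fall"
    if boundary_distance < fall_distance:
        # Boundary precedes the fall: lower-left part of Table I.
        return "boundary precedes fall, may go unnoticed by some listeners"
    # Fall precedes the boundary, though not on the accentuated syllable.
    return "fall precedes boundary, very readily noticed"


if __name__ == "__main__":
    for fall, boundary in [(0, 3), (2, 2), (3, 1), (1, 3)]:
        print(fall, boundary, "->", classify_fall(fall, boundary))
```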

The outcome of this experiment can be interpreted to the effect that the non-final

fall, instead of being a mere "reset" of pitch in preparation for the next

rise, may by its very position constitute an important intonational cue to the

listener for the segmentation of the surface structure of the utterance into units

that are suitable candidates for processing as a whole.

The following two additional remarks deal with the nature of the syntactic constituent

boundaries under consideration.

First, they appear to be major syntactic boundaries (MSB) of two hierarchically

distinguishable kinds, viz. those between clauses (S1 - S2) on the one hand, and

those between phrases (NP - VP - AdvP and the like) on the other. There were no

minor boundaries involved, such as between adjective and substantive, or auxiliary

and main verb.

Moreover, in the case of two successive syntactic constituents of which the second

lacks a pitch accent, there is still the choice of having the non-final fall mark

the boundary before the second constituent or the one after it. The experiment

gave strong evidence in favour of the hypothesis that in such cases the hierarchically

dominant boundary is marked.

experiment 2

The purpose of the second experiment was to see if, and to what extent, speakers

were willing and able to provide listeners with the cues mentioned above.

Ten subjects were asked to read aloud 30 sentences, arbitrarily chosen from the

material produced in experiment 1. Each sentence was read twice, so that 600 sentences

were available for analysis. However, not all of this material was expected

to be suitable for our particular interest, that is if the MSBs were intonationally

marked. Obviously, there is freedom to use the variant in which the non-final fall

is attached to the preceding rise, and although among such cases there could be

instances in which the MSB does occur immediately after the accentuated syllable,

we cannot generally use them for our purpose. Furthermore, the subjects cannot be

forced to use the same kind of contours as had been applied in the first experiment.

A number of alternative ways of acceptably intonating the given sentences will no

doubt exist, and some of these may not contain non-final falls, or comparable phenomena,

at the places of interest.

Indeed, 240 utterances, or 40% of the material, had to be considered not suitable

for further analysis, for either of these reasons.

Of the remaining 360 utterances, in 204 cases non-final falls were produced in pitch

contours that were fully comparable to those of the first experiment.

In 192 cases, the position of the non-final fall coincided with the MSB.

In 156 utterances, use was made of the alternative way of breaking up the intonational

continuum, that is a continuation. As in the case of our rule about the use

of continuations mentioned in the Introduction, it has long been assumed that this

feature would be capable of marking syntactic boundaries. And indeed, in 150 instances,

it coincided with the MSB.

This outcome amply confirms expectations as to what the speaker does, as formulated

above on the basis of the results of experiment 1.
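
The counts reported for experiment 2 can be turned into proportions with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the dictionary keys and the helper `percentage` are names chosen for this illustration.

```python
# Counts as reported for experiment 2 (10 subjects x 30 sentences x 2 readings).
COUNTS = {
    "utterances": 600,
    "unsuitable": 240,
    "non_final_falls": 204,
    "falls_at_msb": 192,
    "continuations": 156,
    "continuations_at_msb": 150,
}


def percentage(part: int, whole: int) -> float:
    return 100.0 * part / whole


if __name__ == "__main__":
    suitable = COUNTS["utterances"] - COUNTS["unsuitable"]
    # The two kinds of boundary marker together account for all suitable utterances.
    assert suitable == COUNTS["non_final_falls"] + COUNTS["continuations"]
    print("unsuitable:            %.1f %%" % percentage(COUNTS["unsuitable"], COUNTS["utterances"]))
    print("falls at MSB:          %.1f %%" % percentage(COUNTS["falls_at_msb"], COUNTS["non_final_falls"]))
    print("continuations at MSB:  %.1f %%" % percentage(COUNTS["continuations_at_msb"], COUNTS["continuations"]))
```

In proportion, about 94 per cent of the non-final falls and about 96 per cent of the continuations coincided with the MSB.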

discussion: non-final falls versus continuations

The by-product of the second experiment confirming that continuations are at least

equally strongly connected to syntactic boundaries as are non-final falls, gives

rise to yet another question:

Why is it that sentences, for which contours only having non-final falls as boundary

markers were considered to be suitable intonations, are often reproduced with

continuations?

It seems as if the answer can be found with the aid of what is stated in 't Hart & Cohen (1973) about an "optional alternative rule" for continuations.

The optional alternative rule was an addition to the original rule for continuations,

as stated in the Introduction. Apart from the continuation rise and the immediately

following inaudible fall, a characteristic feature of continuations according to

the original rule is that the last pitch accent before the MSB takes the form of a

rise-plus-fall or a mere fall (Fig. 4, A). According to the optional alternative

rule, the last pitch accent should be realised by a rise, the remaining syllables

(before the MSB) being kept high and, as is the case with the original rule, there

is an inaudible fall immediately after the last syllable before the MSB (Fig. 4, B).

Fig. 4. Two different shapes of a continuation, drawn relative to the MSB. A: according to the original rule; B: optional alternative.

This may answer the question as to the difference between the shapes of the boundary

markers in the stimuli of experiment 1 and of those in the response material

produced in experiment 2: listeners in experiment 1 may have interpreted the intonation

feature of the hummed contour as the optional alternative form of a continuation

and may have produced a sentence in which the MSB could appropriately be

marked by means of a continuation. Accordingly, in the second experiment the

subjects have in part produced continuations of the original kind and partly

of the alternative kind.

But why did the listeners interpret the non-final fall as if it were the "resumption

of low pitch" which is so typical of continuations? The answer is that the

shape of the alternative kind of continuation is, on purely melodic grounds, not

distinguishable from the "anomalous" cases in which the non-final fall was typically

"postponed". Yet, as is reflected e.g. in our grammar of Dutch intonation ('t Hart

& Collier, 1975), they have been kept apart for some reason or other. The main

"reason" for their distinction might have been the fact that the interpretation of

the empirical, and experimentally tested second rule, according to which the non-final

fall could be located anywhere between two successive rises, was still

implicitly biased by the first rule, to the effect that "anywhere" would not mean

"later than immediately after the accentuated word", so that "postponed" was

"anomalous".

The experiments mentioned in this paper have put an end to this biased interpretation.

But this does not necessarily imply that there would no longer be any reason

to distinguish the non-final fall from the resumption of low pitch in the alternative

form of the continuation. The experiments are not decisive in this respect.

Namely, we do not know whether the MSBs produced by listeners who have interpreted

the non-final fall as alternative continuation have some common property (which the

other MSBs lack) that makes them candidates to be read, in experiment 2, with the

intonational features of a continuation (of either kind).

It is clear that, already on melodic grounds, a continuation in its original shape

should be distinguished from a non-final fall. The actual problem is, however,

whether the alternative continuation, by virtue of being an alternative form of a

"genuine" continuation, should likewise be distinguished from a non-final fall, or

due to its being melodically identical to the non-final fall, should consequently

be held as distinct from a genuine continuation.

It might be doubtful whether this question should be answered at all. In any case,

we should not expect to find the answer in the material gathered in these experiments;

rather, we should turn to the natural situation of spontaneous conversation.

Thus the by-product of the second experiment, apart from confirming an intuitive

supposition, has yielded the new problem of how the various boundary markers

are most realistically distinguished and best described. The main product of the

two experiments is that we have found a frame into which both "regularities" and

"irregularities" of the location of the non-final fall can be made to fit.

Finally, it should cause no surprise that our initial, strictly melodic approach,

in which the non-final fall could only be considered to be a reset to low pitch

between successive rises, was not capable of specifying its location in more detail.

Once we take into consideration its functioning with respect to the syntactic

structure marking, we can fully understand the exact location of the non-final fall

in the pitch contour.

summary

In the course of our attempts to develop rules for the construction of well-formed

pitch contours for Dutch utterances, several propositions have been made with

respect to the location of the so-called non-final fall, each of which, however, led

to a different set of seemingly incoherent cases of exception.

The outcome of two new experiments has now enabled us to reformulate the rule so

as to account for formerly "regular" as well as "irregular" cases. To achieve this

it was necessary to introduce as a new dimension the syntactic structure of the

utterance.

Namely, it was demonstrated that, unless the speaker applies a strategy - which

he is free to choose - by which the non-final fall is located immediately after

the preceding prominence-lending rise, he must make the non-final fall coincide

with a major syntactic boundary (separating syntactic units such as Noun Phrase,

Verb Phrase, Adverbial Phrase from each other).

We have interpreted this outcome to the effect that the non-final fall may constitute

a cue for the segmentation of the speech continuum into units suitable for

processing as a whole. Furthermore, these findings give support to the notion of

the intonational block, which has been introduced, partly for reasons of descriptive

convenience, in establishing our grammar of Dutch intonation.

Melodically, the shape of the non-final fall is similar to that of one particular

type of continuation. These experiments have suggested that there is no difference

in their functional aspects either. The final part of this paper, therefore, deals

with the question of whether or not we can decide to stop keeping them apart in

our description.

references

Cohen, A. and 't Hart, J. (1967) On the anatomy of intonation, Lingua, no. 2, p. 177-192.

Collier, R. and 't Hart, J. (1975) The role of intonation in speech perception, Structure and process in speech perception, Proc. of the Symposium on Speech Perception, A. Cohen and S.G. Nooteboom, Eds., Springer, Heidelberg.

't Hart, J. and Cohen, A. (1973) Intonation by rule: a perceptual quest, J. of Phonetics 1, p. 309-327.

't Hart, J. and Collier, R. (1975) Integrating different levels of intonation analysis, J. of Phonetics 3, p. 235-255.

Willems, L.F. (1966) The Intonator, IPO Annual Progress Report 1, p. 123-125.

PROSODY AND THE PERCEPTION OF SYNTACTIC BOUNDARIES

J.J. de Rooij

introduction

A speaker's message, acoustically coded, shows a number of prosodic regularities.

The speech signal, for instance, is prosodically related to the syntactic structure

of the sentence in that major syntactic boundaries (MSBs) are often marked by

segmental lengthening (Huggins, 1974; Klatt, 1975) and specific pitch movements,

such as the continuation rise (J. 't Hart and A. Cohen, 1973).

The question arises as to the part, if any, which these prosodic regularities

play in the recovery of syntactic structure.

In the context of a research project set up to investigate the relative contributions

of temporal structures and pitch contours to the recovery of MSBs in the

perception of speech, a preliminary experiment has been carried through, designed

to find out whether naive subjects are able to use prosodic cues in the perception

of MSBs at all.

On the linguistic level of the decoding process the listener brings the whole of

his linguistic knowledge to bear. In order to isolate prosodic information from

the segmental, syntactic and semantic content and thus to be able to assess its

contribution to the perceptual process, use can be made of nonsense imitations

(Kozhevnikov and Chistovich, 1965; Carlson et al., 1972), hummed imitations

(Svensson, 1974) or spectrally rotated speech (Blesser, 1969).

In the present experiment normally spoken sentences were made unintelligible by

electronically distorting the segmental information (see Method); the prosodic

structure, however, remained unaffected in this process. The location of prosodic

markers of MSBs was varied systematically and its effect on perceptual

interpretations of the stimuli was investigated.

method

Stimuli were prepared from 5 normally spoken Dutch sentences consisting of 7

syllables and 6 or 7 words. Each sentence had an MSB, viz. the boundary between

the main clause and the subordinate, object clause. An English example of this

type of sentence might be: The major said "that is wrong". This particular

structure was chosen because it contains a major syntactic break which is often

prosodically conspicuous. The location of the MSB was varied systematically.

The sentences were read aloud without pauses at the MSBs by a trained speaker.

Artificial, stylized contours were superimposed on the signal in order to keep the

intonational cues equivalent and well specified for all sentences. This was done

by using the channel vocoder as an Intonator (L.F. Willems and Th.A. de Jong, 1974).

The positions of the various pitch movements on the time axis were adjusted until

a satisfactory perceptual equivalent of the original contour was found.

There were 3 pitch accents, the first being a rise and a fall, the second a rise

and the third a fall. The syllable before the boundary was given a continuation rise,

which is a rise coming rather late in the syllable, not lending prominence and often


marking MSBs (J. 't Hart and A. Cohen, 1973).

The 5 contours were as follows:

[Figure: stylized pitch contours of the 5 stimulus sentences, plotted per sentence number against syllable number (1 to 7).]

In this figure the declination line is omitted. The numbers in the figure refer to the pitch movements: 1 is a rise and a fall, 2 a continuation rise, 3 a rise and 4 a fall.

In resynthesizing the utterances, spectral scrambling was applied by sending the

permuted output of the 15 analysis filters to the 15 synthesis filters of the

vocoder. By so doing, segmental information was distorted so as to make the speech

unintelligible. Prosodic information, however, was preserved.
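The channel permutation described above can be illustrated with a minimal sketch. It assumes that the analysis stage of the vocoder delivers one envelope value per channel for each time frame; the function names, the fixed random seed and the toy frame are illustrative assumptions, not details of the equipment actually used.

```python
import random

N_CHANNELS = 15  # number of analysis and synthesis filters given in the text


def make_permutation(n_channels=N_CHANNELS, seed=0):
    """Return a fixed random reordering of the channel indices (hypothetical helper)."""
    rng = random.Random(seed)
    order = list(range(n_channels))
    rng.shuffle(order)
    return order


def scramble_frame(envelopes, order):
    """Route the output of analysis channel order[i] to synthesis channel i."""
    return [envelopes[src] for src in order]


order = make_permutation()
frame = [float(i) for i in range(N_CHANNELS)]  # toy envelope values for one frame
scrambled = scramble_frame(frame, order)
```

Because the same permutation is applied to every frame, the timing of the utterance and the summed channel output per frame are unchanged; only the distribution of energy over the channels is rearranged.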

The distorted utterances were recorded on Language Master Cards.

In a preliminary test of the stimuli, colleagues were asked to listen to the

spectrally scrambled sentences and make up sentences fitting the prosodic patterns

they heard.

This made the disadvantages of the scrambling technique apparent. These experienced

listeners succeeded in recognizing some of the original words. They often felt

hampered by the new spectral information. Some short vowels, lengthened at the

MSB, were interpreted as unlengthened long vowels. For investigations into the

role of temporal factors in speech perception, the scrambled signal is perhaps

not optimal. Nevertheless, subjects fairly often recovered the correct MSBs, without

recognizing the original words.

In view of these considerations, I decided to provide subjects with typed-out

response sentences. As each stimulus had a different stress pattern, it was given

a set of response sentences of its own. This set consisted of 6 sentences, in 5

of which the MSBs were varied in accordance with those of the stimulus sentences;

the 6th had no MSB of the type shown. An example of such a sentence in English

might be: My dear aunt lives in London.

experiment and results

21 naive subjects (students at the Eindhoven University of Technology) listened

to the stimuli through headphones, in individual sessions. They could listen to a

stimulus as often as they wished. The response sentences were typewritten. The

5 stimuli and the 6 possible response sentences for each stimulus were randomized.

The subjects were asked to concentrate on the rhythm and the melody of the utterance and,

from the list provided, to choose the best fitting response, disregarding

conflicting information on individual speech sounds. They gave their response by writing

down the number of the response sentence. The whole task took about 8 minutes on

the average.

An example of our results is shown in table I.

S1 behaves differently from the other stimuli

because the MSB is perceived more often 2 than 1

syllable away from its real position. For S3, R6

has a syntactic boundary between subject and

predicate, and for S4, R6 has a boundary between verb

and adverbial adjunct, corresponding to the

prosodic boundary marker in the stimulus.

The scores on the diagonal are correct scores in

that the MSB in the response sentence is located

in the same position as the prosodic boundary

marker in the stimulus.

A repetition of this experiment two weeks later,

with different scrambling to avoid recognition

of the stimulus material, with 7 of the original

21 subjects, gave essentially the same results,

with a slight improvement in performance. A second

repetition with the same 7 subjects and monotonized

stimuli, gave somewhat lower scores on the diagonal, but still showed a high

correlation between stimulus structure and the perception of MSBs.

Table I. Row number refers to the position of prosodic boundary markers in the stimulus, column number to the corresponding position of MSBs in the response sentences; column number 6 refers to the response sentence without MSB of a similar type.
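The way the diagonal of such a stimulus-by-response table is read as a score can be sketched as follows. The counts below are invented purely for illustration (5 stimuli, 6 response sentences, 21 responses per stimulus) and are not the values of Table I; the function name is likewise an assumption.

```python
def diagonal_score(matrix):
    """Proportion of responses whose MSB position matches the prosodic marker."""
    total = sum(sum(row) for row in matrix)
    correct = sum(row[i] for i, row in enumerate(matrix))
    return correct / total


# Invented counts, NOT the data of Table I: rows are stimuli S1..S5,
# columns are response sentences R1..R6, 21 responses per row.
example = [
    [10, 4, 5, 1, 0, 1],
    [2, 15, 3, 0, 0, 1],
    [0, 2, 16, 2, 0, 1],
    [0, 0, 2, 17, 1, 1],
    [0, 0, 1, 3, 16, 1],
]
score = diagonal_score(example)
```

With 6 response alternatives, a score well above 1/6 would indicate that the prosodic marker constrains the choice of response sentence.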

discussion

It is encouraging to see that naive subjects have no real difficulty in using prosodic

boundary markers in assigning syntactic structures to speech signals. This leads

us to believe that it is worthwhile to pursue this line further in attempting to

assess the relative contribution of temporal and intonational cues to the

perception of syntactic boundaries.

The fact that our subjects performed less well, but still consistently and

meaningfully, with monotonized stimuli, is itself an indication that both pitch contours

and temporal cues (in our case segment durations but no pauses) are valuable aids

in the perception of potential syntactic boundaries. Using scrambled speech was found

to be a severe drawback as segment durations were not always easy to interpret

prosodically, because our subjects did not know which phonemes were realized in the

acoustic segments concerned. Obviously, in normal speech there is a high degree of

interaction between phonemic and prosodic factors affecting speech sound durations

(Klatt, 1973; Lindblom and Rapp, 1973; Nooteboom, 1972).

If the listener is denied access to one level of processing, in our case the

phonemic one, he can in no way take the interaction between the two levels into account,

and may come up with the wrong conclusions.

We therefore conclude that, if we wish to study the relevance of prosodic segment

durations to the perception of MSBs in more detail, we should rather turn to

another type of stimulus material, e.g. nonsense strings with a simple and

irrelevant phonemic make-up.

summary

In a listening experiment subjects assigned typewritten response sentences to

unintelligible, but prosodically intact utterances. Each utterance had a prosodic

boundary marker and a set of 6 possible responses.

In each set there was one response with a major syntactic break whose location

corresponded to that of the boundary marker in the utterance.

Results indicate that prosodic information delimits possible responses and may

therefore be a valuable aid in recovering syntactic structures.

references

Blesser, B. (1969) Perception of Spectrally Rotated Speech, Dissertation, M.I.T., Cambridge, Massachusetts.

Carlson, R., Granstrom, B., Lindblom, B. and Rapp, K. (1972) Some timing and fundamental frequency characteristics of Swedish sentences: data, rules and a perceptual evaluation, Speech Transmission Laboratory, KTH, Stockholm QPSR ~.

't Hart, J. and Cohen, A. (1973) Intonation by rule: a perceptual quest, Journal of Phonetics 1, p. 309-327.

Huggins, A. (1974) An effect of Syntax on syllable timing, Q.P.R. 114, M.I.T., Massachusetts.

Klatt, D. (1973) Interaction between two factors that influence vowel duration, J. Acoust. Soc. Amer. ~, p. 1102-1104.

Klatt, D. (1975) Vowel lengthening is syntactically determined in a connected discourse, Journal of Phonetics 3, p. 129-140.

Kozhevnikov, V. and Chistovich, L. (1965) Speech: Articulation and Perception, (JPRS 30), Washington D.C., (Moscow-Leningrad).

Lindblom, B. and Rapp, K. (1973) Some temporal regularities of spoken Swedish, Papers from the Institute of Linguistics, University of Stockholm, Publication 21.

Nooteboom, S. (1972) The interaction of some intra-syllable and extra-syllable factors acting on syllable nucleus durations, IPO Annual Progress Report 7.

Svensson, S. (1974) Prosody and Grammar in Speech perception, Dissertation, MILUS 2 (monographs from the Institute of Linguistics, University of Stockholm).

Willems, L. and de Jong, Th. (1974) Research tools for speech perception studies, IPO Annual Progress Report 9.

ACCENT PATTERNS IN NUMBER NAME SEQUENCES

A.F.V. van Katwijk

introduction

It is now possible to synthesize accented syllables and provide recorded speech

with F0 contours in such a way that the syllables we choose are accented. A

rise-plus-fall in the F0 contour is one of the means of giving accents to syllables.

Given a text, what we do not know is which syllables should be or may be accented.

Syntactic information, which can be made explicit, is not enough. Situational and

contextual information is generally felt to be indispensable for the prediction of

accents. This kind of information, however, is still largely beyond the scope of

explicit rules and descriptions.

In this paper we have worked with a simple instance of accent prediction. Having

found that there are variable accent patterns in number names, we have arrived by

analysis at the explicit factors which might be found in number name sequences

to account for the observed variants of accent patterns, and we have looked into

what readers actually do, i.e., what accent patterns they produce when the distance

(in terms of the number of intervening items) between contextual accentuation

factor and target is varied.

If sequences of numbers are read aloud, the accent patterns of the number names

usually bear the marks of the sequential structures of the sequences. Thus, the

sequence 24 25 26 etc. and the sequence 31 41 51 etc. get different accent

patterns. In the first named sequence one should expect the units to be accented

and in the latter sequence the tens to be accented. The principle of accentuation,

loosely formulated, would be: the constant parts of a sequence are left unaccented,

the varying parts are accented. That this principle is not restricted to number

name sequences, could be inferred from the accentuation in utterances such as (1),

he put a blue box on a red box (1)

where at least the second "box" goes unaccented.

Starting from simple material (number names) and a simple accentuation principle

(constancy and variation) we cannot hope to throw much light on many complex

problems involved in accent prediction. Our limited goal, however, is to see how

a tangible part of the accentuation problem can be specified and be made to play

a role in an experiment on accent strategies in the production and perception of

accent patterns.

accents in number names

The compound number names that will be discussed have the structure H E T where

H counts the hundreds, E the units, and T the tens, as may be seen in

negenhonderdzesenvijftig (2)

H E T

(956: "nine hundred six and fifty")

The parts H, E, and T are the potential accent carriers of number names of this

type.
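The H E T structure of these number names can be made concrete with a small sketch that splits a three-digit number into its parts and composes the Dutch name. The function names and the restriction to numbers with tens of 20 or more and non-zero units are assumptions made for the illustration; the word lists are ordinary Dutch number words.

```python
UNITS = ["", "een", "twee", "drie", "vier", "vijf", "zes", "zeven", "acht", "negen"]
TENS = ["", "", "twintig", "dertig", "veertig", "vijftig",
        "zestig", "zeventig", "tachtig", "negentig"]


def split_het(number):
    """Return (H, E, T) for a three-digit number of the H E T type."""
    if not 100 <= number <= 999:
        raise ValueError("expected a three-digit number")
    h, rest = divmod(number, 100)
    t, e = divmod(rest, 10)
    if t < 2 or e == 0:
        raise ValueError("not of the H E T type assumed in this sketch")
    return h, e, t


def dutch_name(number):
    """Compose the name in the order hundreds, units, 'en', tens."""
    h, e, t = split_het(number)
    hundreds = "honderd" if h == 1 else UNITS[h] + "honderd"
    # Dutch spelling writes a dieresis when the unit word ends in 'e' (twee, drie).
    link = "ën" if UNITS[e].endswith("e") else "en"
    return hundreds + UNITS[e] + link + TENS[t]
```

For example, `dutch_name(956)` composes the name given in (2), with the units spoken before the tens.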

A number of observations have been made on subjects reading aloud all kinds of

lists with numbers, the most important of which are:

(a) An isolated number name of the type H E T can be accented with three pitch

accents (on H, E, and T), with two pitch accents (either on H and T or on E and T)

or with one pitch accent (on T). Being isolated number names the accent patterns

of these items might be defined as "neutral" accent patterns.

(b) Sequentially marked accentuation occurred with distinct clarity in

non-isolated number names, and notably when a pair of numbers had constant and varying

elements (e.g., 365 366, or 528 628, or 734 744). If the units E or the hundreds

H were the varying elements, the absence of pitch accents on the constant tens (T)

was a striking feature, next to the pitch accents on the varying elements E or H

in the second number name of each pair. This is mentioned because an isolated

number name may have a neutral accent pattern with one pitch accent on T. A

pitch accent on T, therefore, is ambiguous in principle. We will show in the next

section how this ambiguity played a part in the experiment.
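The accentuation principle (constant parts unaccented, varying parts accented) can be sketched for pairs of H E T numbers as follows. This is only an illustration of the principle as loosely formulated above; the function names are assumptions, and the sketch says nothing about how readers actually realize the accent.

```python
def split_het(number):
    """Split a three-digit number into hundreds (H), units (E) and tens (T)."""
    h, rest = divmod(number, 100)
    t, e = divmod(rest, 10)
    return {"H": h, "E": e, "T": t}


def varying_elements(first, second):
    """List the elements that differ between two number names."""
    a, b = split_het(first), split_het(second)
    return [key for key in ("H", "E", "T") if a[key] != b[key]]


def expected_accent(first, second):
    """Element expected to be accented in the second number name.

    Returns None when the pair does not have exactly one varying element,
    i.e. when it is not a related pair in the sense used here.
    """
    varying = varying_elements(first, second)
    return varying[0] if len(varying) == 1 else None
```

Applied to the pairs mentioned under (b), the sketch returns E for 365 366, H for 528 628 and T for 734 744. The last case is the ambiguous one, since an accent on T alone also occurs as a neutral pattern.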

In order to test the perceptual strength of different accent types as cues of

sequential structure, a listening session was prepared for a panel of phonetically

non-naive subjects, who had to judge whether or not auditorily presented pairs

of number names had accent patterns expressing the sequential structure of the

numerical values. An example might illustrate the experiment: Among the 140 pairs

of numbers (all of the H E T type) that the listeners saw on a list, one pair

might have read, e.g., 273 274. What they then would hear was, e.g.,

tweehonderddrieënzeventig tweehonderdvierenzeventig, with pitch accents on the underlined

elements. The sequentially appropriate accentuation would have to have (at least)

a pitch accent on -vier- in the second number-name. Absence of this would probably

be interpreted by the listeners as a case of inappropriate accentuation.

All pairs of number names had artificial F0 contours, prepared by means of the

INTONATOR (Willems, 1966). The pitch accents were of a rise-plus-fall type, imposed

on the syllable to be accented. Each number name had only one such rise-plus-fall,

on H, or on E, or on T. There was a gradually declining pitch base line for greater

naturalness. Table I shows the percentages of judgments by seven listeners. Note

that each pair had one varying element (H, E, or T), and two constant elements, and

the pitch accents could occur (a) neither on the first varying element nor on

the second, (b) only on the varying element of the first number name, (c) only on

the varying element of the second number name, and (d) on the varying elements in

both first and second number name.

From Table I two main conclusions may be drawn:

(a) The perceptual strength of the accent pattern as a marker of the sequential

numerical structure is considerable, for, in spite of the rather vague task,

the judgments are practically unanimous.

(b) The accentuation of the first number name does not influence the judgments

appreciably. It is the accent on the varying element in the second number name

of a pair that really counts.

Accent placement                                        Percentage   Number of judgments

not on varying elements (1st or 2nd number name)        3%           252

only on varying element of 1st number name              1'           126

only on varying element of 2nd number name              83%          126

on varying elements of both 1st and 2nd number name     94%          63

Table I. Percentages of judgments that accent patterns of number names correctly expressed the sequential numerical structures of pairs of numbers. Seven listeners, 140 pairs.

The observations reported so far indicate that accentuation and sequential

structure in number names are closely linked. Readers as a rule choose the syllables

for accentuation in such a way that the sequential structures, if any, are marked,

and listeners as a rule are capable of perceiving such accentual marking of

sequential structure. A second point that should be made is that the lack of importance

of the first number name as far as its accentuation is concerned, suggests that in

reading and in perceiving the first item of a related pair, the relationship with

the second is as a rule not anticipated.

coding and retention

In this section an experiment will be discussed on reading aloud number names of

the H E T type, where related items occurred in succession or separated by 1, 2,

or 3 intervening unrelated number names. It stands to reason that the accent pattern

of the second item of a related pair will have a probability of carrying accentual

sequence marking that should be a declining function of the number of intervening

non-related number names. Before going into the description of the experiment let

me suggest that it is not the passage of time which mainly interacts with the

relation: Five subjects who were confronted with a sentence like (3) produced a

second number name clearly expressing the numerical relationship with the first,

in spite of the large number of intervening words.

482 is a number, and of all the numbers one could mention -

there being infinitely many -, one could also mention 483. (3)

The interesting point seems to be the kinds of interfering factors that have

knock-out effects (Crowder, 1970) on the retention of the coded representation of

an earlier number name.

Three subjects had been asked to read aloud numbers from 120 cards. On each card

there were 8 numbers of the H E T type, having varying units (E), or varying

tens (T) or varying hundreds (H). One example:

772 439 866 593 227 327 954 681 (4)

In (4) the fifth and sixth numbers are the related pair.

The positions of the experimental numbers were varied, and the distance between

the related items was also varied. A second example:

476 833 567 294 921 788 667 345 (5)

In (5) the third and seventh items form the related pair.

The first and last positions were not used for experimental items. Every

distance was represented 10 times in the material.
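How the related pair and its distance follow from a card can be sketched as below. The sketch assumes that a related pair is the only pair on the card that differs in exactly one of the elements H, E, T; the function names are illustrative and were not part of the experimental procedure.

```python
def het(number):
    """Return the digits of a three-digit number in the order H, E, T."""
    return (number // 100, number % 10, (number // 10) % 10)


def find_related_pair(card):
    """Find the first pair of numbers that differ in exactly one element.

    Returns (position of first item, position of second item, varying
    element, number of intervening items), with positions counted from 1,
    or None if the card contains no such pair.
    """
    for i in range(len(card)):
        for j in range(i + 1, len(card)):
            diffs = [name for name, a, b in zip("HET", het(card[i]), het(card[j]))
                     if a != b]
            if len(diffs) == 1:
                return i + 1, j + 1, diffs[0], j - i - 1
    return None


card_4 = [772, 439, 866, 593, 227, 327, 954, 681]
card_5 = [476, 833, 567, 294, 921, 788, 667, 345]
```

For the two example cards this gives positions 5 and 6 with no intervening items for (4), and positions 3 and 7 with three intervening items for (5); in both cases the hundreds are the varying element.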

From the recorded performances the related pairs were rerecorded in

temporal contiguity without the extraneous number names. Panels of

three listeners judged whether the accent patterns expressed the

numerical relationships. Fig. 1 shows the results.

Fig. 1. Proportion of sequential accent patterns in pairs of related number names as a function of the number of intervening number names. There were three reading subjects (producing 120 times 8 number names with the accent patterns) and three listeners (judging whether or not the recorded pairs of related number names had the relevant sequential accent patterns) and there were three types of relationships between input numbers: H, as between ... 478 ... 578 ...; E, as between ... 523 ... 524 ...; and T, as between ... 936 ... 946 ... Every experimental point in the graphs represents 10 observations.

The following observations can be made from the data in Fig. 1:

(a) Non-related number names intervening between related items have the effect

in all three subjects of interfering with the retention of the earlier of a pair

of number names. This knock-out effect is not equally strong among the subjects.

(b) In subjects 2 and 3 the ambiguous accentuation with pitch accents on the

T elements has more than once led the panel to erroneously interpret the accent

pattern as sequentially marked. Their judgments are far from unanimous and show

a large variability. As indicated in section 2, the ambiguity of a pitch accent

on T is such that it may either be a neutral accent pattern or a relation-marking

pattern. Subject 1, however, produced neutral accent patterns with pitch accents

on all three elements H, E and T, so that a pitch accent on T alone could

correctly be interpreted by the panel as the pattern of relationship.

The experiment as a whole enables us to think about coding processes in the

minds of language users: an earlier number name is found to be retained in the mind if

not displaced by comparably structured number names. It is retained and used as a

component in a relation with a second number name if this second item has both

identical and different elements in its structure. Without actually being aware of

the process, the reader appears to organize a sequence of input items such that

larger structural units emerge.

summary

Subjects reading aloud sequences of numbers as a rule produce accent patterns that

reflect perceived structural relationships, if any, between the number names. The

accentuation principle - loosely formulated - is that constant elements are left

unaccented whereas varying elements are accented. By changing the distance between

two related number names and introducing intervening number names, we investigated,

by looking at the accent patterns, when the subjects did or no longer did pick up

the relationships. The accent patterns turn out to be natural and easily detectable

indicators of specific mental organisation processes.

references

Crowder, R. (1970) The Role of One's Voice in Immediate Memory, Cognitive Psychology 1, p. 157-178.

Willems, L.F. (1966) The Intonator, IPO Annual Progress Report 1, p. 123-125.

A SYMPOSIUM ON DYNAMIC ASPECTS OF SPEECH PERCEPTION

I.P.O., AUGUST 4-6, 1975

A. Cohen and S.G. Nooteboom

On August 4-6, 1975 we organized a small symposium on Dynamic Aspects of Speech

Perception. The idea for this symposium arose during the second Speech

Communication Seminar in Stockholm a year earlier, on which occasion people agreed that

there was a need for a conference where proper attention could be paid to new

developments in the field of speech perception. A Planning Committee was formed

by Mark Haggard, David Pisoni, Sven Ohman and the authors of the present paper, the

latter acting as an Organizing Committee. Hospitality and sponsoring were provided

by IPO. The symposium was held under the auspices of The Royal Netherlands

Academy of Arts and Sciences and The Netherlands Organization for the Advancement

of Pure Research.

As it was agreed that the field of speech perception would benefit from greater

attention being paid to the perception of connected speech, moving away from too

great a concentration on perceptual studies of phonemes and syllables, it was

decided to bring the perception of connected speech into focus by limiting

contributions as much as possible to this area.

Apart from members of our own institute, eventually some 25 participants, coming

from Belgium, Canada, Denmark, Great Britain, Holland, Japan, Sweden and the

United States gathered in Eindhoven in the first week of August. The contributions,

20 in number, were distributed in advance, which allowed us to limit verbal

presentations to about 10 minutes each and concentrate mainly on the discussion.

This scheme worked quite well and resulted in lively and often thorough discussions

both of specific points and of topics of more general interest to the field.

It became quite clear during the symposium that participants were in general

empirically oriented. There were, however, a few interesting attempts, notably

by Pisoni and Massaro, to relate different levels of processing to each other

within larger frameworks, but on the whole there was little theorizing. It was

also evident that most of the work was concerned with structures, i.e. the effect

of stimulus structures on perception, or the effect of syntactic and/or semantic

structures on perception, or even the relation between speech perception and

particular physiological structures of the brain. Very little work was concerned

with perceptual processes. It was noted by the psychologists Levelt and Flores

d'Arcais that in this respect the field of speech perception seems to be lagging

behind much contemporary work in psycholinguistics, notably of the information

processing kind. They also observed that there was too little concern with

task-dependent aspects of the subject's decision procedures in experimental tasks.

One outcome of the symposium was that people seem to be less prepared to look for

"the perceptual unit" of speech, than they were, let us say, ten years ago. There

was a general awareness that there may be a number of different perceptual units

at different processing levels. To us it seems that in looking for "perceptual

units" or "decision units" one should not too readily postulate a one-to-one

relationship between such units and the units of linguistic descriptions.


It might be worthwhile to switch from the time-honoured question of perceptual

units to a search for the scanning processes in auditory perception per se, instead

of taking a lead from linguistics. Huggins took the position that there are no

such things as perceptual units: speech perception may be a continuous process.

As to the question of how linguistic expectancies affect the decoding process in

the perceptual system, there appeared to be a recognition that bottom-up and

top-down processes are required, interesting questions being, what is the lowest

level where these levels meet, how can we study this, and what is the general

nature of the confluence of the two? It was noted that, by the implied serial

processing in most tentative outlines for models of speech perception in the

literature, very early semantic and syntactic processing appears to be excluded.

Parallel processing schemes might be more realistic, but the right experimental

paradigms do not yet appear to be available to allow us to deal with such questions.

Of the actual research reported in the symposium, a great part was devoted to the

study of the role of prosody in speech perception. Some of these investigations

were concerned with the effect of prosodic structures on phoneme identifications,

others with the effect of position within phrase on preferred duration and duration

discriminability of phoneme-size segments, others again with the effect of prosody

on intelligibility, immediate recall or reaction times, on the perception of

phrase boundaries, sentence position within a paragraph, or the location of

switching the speech signal from one ear to the other.

On the whole we feel that this type of perceptual study is still too much concerned

with the demonstration of certain effects and too little with plotting functional

relationships which might lead to a better understanding of the underlying processes.

Obviously this is due to the fact that the problems concerned with the perceptual

functioning of speech prosody have only recently come to the attention of speech

researchers and there is still a lack of experimental paradigms tailored to these

problems.

This criticism applies much less to those investigations concerned with very

short-term auditory storage of speech, which has been studied by means of backward

recognition masking and the perception of speech with periodically inserted silent

intervals. Here we find functional relationships apparently related to some decay

function of auditory storage, although a proper model explaining the results is

still lacking. A similar observation holds good for short-term memory for

individual properties of speech.

A few papers, not directly related to the perception of connected speech, dealt

with the perception of time-varying formant frequencies, coarticulation and

lip rounding, developmental aspects of speech in very young children, and dichotic

speech mode listening.

What seems to emerge from this picture is that the study of the perception of

connected speech is still a young, but promising field of research. We hope that

this symposium, and its Proceedings (Cohen and Nooteboom, 1975), will contribute

to finding new and interesting ways of studying the complex information processing

taking place in the perceptual decoding of connected speech. We are grateful to all

those, participants and IPO colleagues, who helped to make the symposium a success.

reference

Cohen, A. and Nooteboom, S.G., Eds. (1975) Structure and Process in Speech Perception; Proceedings of the Symposium on Dynamic Aspects of Speech Perception, Springer Verlag, Heidelberg.

3 visual perception

RESEARCH ON VISION 1975

J.A.J. Roufs, J.J. Andriessen, Th.M. Bos, H. Bouma, F.L. Engel, Ch.P. Legein,

J.A. Pellegrino van Stuyvenberg, A.L.M. van Rens and C.W.J. Schiepers

In 1975, a major part of the effort again went into research on relatively central

processing of visual information, including its cognitive aspects.

Activities were devoted to three main fields of research, viz.

a. Visual processes in reading,

b. Conspicuity of visual objects,

c. Dynamic processing of simple visual signals.

Some work on earlier projects concerning aids for the visually handicapped was

followed up. In these projects again our workshop was deeply involved

(Mr. H.E.M. Melotte).

visual processes in reading

A theory on word recognition has been developed for three-letter words. It predicts

both correct and incorrect recognition scores for words from (a) correct and

incorrect letter recognition in unpronounceable letter strings and (b) a list of

all existing Dutch three-letter words. Predictions have been compared with

experimental scores for four different retinal eccentricities of the stimuli, as measured

earlier (Bouma, 1973). As to the average correct word scores, theory and experiment

are in reasonable agreement.

As to average correct letter scores in word responses, theory and experiment are

in quantitative agreement, for all four retinal eccentricities and three letter

positions in the word. A separate contribution can be found in this issue (Bouma

and Bouwhuis, 1975).

Concerning the perception of contours of Dutch words of three letters, experiments

have been carried out on the implication of succession of ascending letters,

descending letters, and letters without extension. Characteristic confusions are found

to exist. The letter position farthest from the fovea is favoured in perception.

Recognition experiments on word stimuli with one letter added or one letter missing

have supplied supporting evidence for the role of word length in word recognition.

With respect to reading difficulties a follow-up study on twenty dyslectic children

and twenty control subjects has been undertaken. The measurement of response

latency was included. On two groups of 4 subjects a number of pilot experiments

have been carried out. It was found that in search tests for letters in words and

in letter strings, the scores for dyslectic children were lower than for the average

readers. Part of the difficulty seems to arise from the fact that the dyslectic

children skip lines in carrying out the task, and target letters towards the end

of the word tend to be missed by dyslectic children. A separate contribution on

this subject can be found in this issue (Bouma, Legein and van Rens, 1975).


A new start has been made with research on the control characteristics of eye

movements in reading. Five subjects each silently read seven texts of a rather

difficult grade while their eye movements were being recorded. The texts to be read

were political comments in Dutch newspapers. The eye movements were measured and

recorded by means of an apparatus based on the limbus reflection technique.

The X- and Y-coordinates in a plane of 15° x 10° of visual angle were fed off-line

into a P 9200 computer. A biting board was used to reduce the head movements.

The absolute accuracy of position in the X-direction was about 0.5°, the relative

precision being 0.25° within the range of 15°. The individual letters subtended

0.25° of visual angle, meaning that the size of saccades was accurate to within two

letters. The sequence of the successive saccades and successive fixation courses

have been determined for each individual subject and text. From this sequence

histograms of forward and regressive saccades and fixation pauses have been

calculated. The analysis of data is now directed towards the possible correlation, for

instance, between successive saccades or successive fixation pauses in order to

find characteristics of the underlying process of eye movement control in reading.
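
One simple measure of such sequential dependence is the lag-1 correlation between each saccade and its successor. The sketch below is only an illustration of that idea: the function name `lag1_correlation` and the sample saccade sizes are hypothetical and are not taken from the recordings described here.

```python
from math import sqrt


def lag1_correlation(values):
    """Pearson correlation between each value and the one that follows it."""
    x = values[:-1]   # every element except the last
    y = values[1:]    # the successor of each element in x
    n = len(x)
    if n < 2:
        raise ValueError("need at least three values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical saccade sizes in letter positions (negative = regressive saccade).
saccades = [8, 6, 9, -3, 7, 8, -2, 9, 7, 6]
print(round(lag1_correlation(saccades), 3))
```

The same function can be applied to a sequence of fixation durations; a value near zero would suggest that successive events are controlled independently.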

conspicuity of visual objects

In the previous issue (Engel and Bos, 1974) an experiment was described in which

the subject's dual task was to search for a given target object and avoid looking

at a certain non-target, both objects being simultaneously presented with unknown

location in a complex background. The size of the relevant conspicuity area

(visual field in which the object concerned is capable of being noticed in a single

fixation) was related to the probability of target discovery as well as the

probability of involuntary fixation on a non-target. Assuming a random pattern of

eye movements during performance of the task, both probabilities were expressed

in an "effective" size of the conspicuity area concerned. These two relations have

now been established more fully, as illustrated in figure 1.

Fig. 1. For two observers, the relation between the size R50 of the conspicuity area (determined directly by means of brief stimulus presentation allowing for one eye fixation only) and the effective size p of the conspicuity area (derived from the dual task data) is shown. Fig. 1a demonstrates it for targets to be searched and Fig. 1b for non-targets, involuntarily fixated.


A new phenomenon was observed during direct determination of the conspicuity area.

Although strict fixation of the display centre was required, small involuntary

eye movements in the direction of the target were discovered. Occurrence and

delay depended on both target eccentricity and size of the relevant conspicuity

area. This phenomenon is described more fully in a separate contribution in this

issue (Engel and Bos, 1975).

dynamic processing of simple visual stimuli

A perturbation technique has been developed to measure directly any kind of responses

at threshold level, using only two system postulates. These postulates are: i) a

linear processing of small signals and ii) top detection. Pulse-, step-, and

frequency responses have been measured successfully. The linearity assumption has

been validated, for instance by comparing pulse- and step responses, see separate

contribution (Roufs and Blommaert, 1975). The band pass filter type of transfer

with respect to fast changes (see Roufs, 1974) has been established beyond

doubt for all three types of responses. This is new circumstantial evidence

pertaining to the concept of at least two output variables of the visual system.

One of these was thought to be processed in a low-pass type of transfer and be

connected with brightness variations, the other was assumed to be processed in a

linear band-pass type of transfer, sensitive to fast changes in the visual field

and connected with the "agitation" percept.

aids for the visually handicapped

The embossed drawing set has been given a new design based on injection moulding

technique. This design is better suited for larger-scale production, which will be

put into effect in 1976.

The T.V. magnifier, Fig. 2, has been considerably improved. Owing to the

availability of a new small camera, the design was made more compact (table model) as well

as less expensive, a number of ergonomic improvements having been included as well.

The magnification range has been extended to 3-24 x, the smaller value ensuring

easy surveying (search) and the possibility of viewing large photographs and

diagrams. The apparatus has been commercially available since December 1975 (Philips

Nederland B.V.).

The reading desk, presented in the previous issue, has been tested in several

practical situations. For the visually handicapped the magnification (about 2x)

has generally been found too small to be of real help. It is interesting to note

that people with normal vision found it very useful in reading text of low print

quality.

The procedure for training a blind subject to learn to read with the Optacon

(Engel, 1974) has been completed. A rate of 27 words per minute for oral reading

and 54 words per minute for silent reading has finally been achieved.


Fig. 2. T.V. magnifier, type 3-24 M.

references

Bouma, H. (1973) Visual Interference in the Parafoveal Recognition of Initial and Final Letters of Words, Vision Res. 13, p. 767-782.

Bouma, H. and Bouwhuis, D.G. (1975) Word Recognition and Letter Recognition: Toward a Quantitative Theory for the Recognition of Words of Three Letters, IPO Annual Progress Report, this issue.

Bouma, H., Legein, Ch.P. and van Rens, A.L.M. (1975) Visual Recognition by Dyslectic Children, IPO Annual Progress Report, this issue.

Engel, F.L. (1974) Learning to Read with the Optacon, IPO Annual Progress Report, 9, p. 110-113.

Engel, F.L. and Bos, Th.M. (1974) Visual Conspicuity and Eye Movements, IPO Annual Progress Report, 9, p. 94-98.

Engel, F.L. and Bos, Th.M. (1975) Small Involuntary Eye Movements, IPO Annual Progress Report, this issue.

Roufs, J.A.J. (1974) Dynamic Properties of Vision-IV. Thresholds of Decremental Flashes and Doublets in Relation to Flicker Fusion, Vision Res. 14, p. 831-851.

Roufs, J.A.J. and Blommaert, F.J.J. (1975) Pulse- and Step responses of the Visual System Measured by a Perturbation Technique, IPO Annual Progress Report, this issue.


WORD RECOGNITION AND LETTER RECOGNITION:

toward a quantitative theory for the recognition of words of three letters

H. Bouma and D.G. Bouwhuis

introduction

This paper deals with the understanding of processes involved in visual word

recognition. Earlier experiments on letter and word recognition in eccentric vision

(Bouma, 1973) had indicated certain correspondences between letter recognitions

from unpronounceable strings and letter counts from recognized words.

Starting from these indications, we made two assumptions about processes which might

lead from recognition of letters to recognition of words, including both correct

and incorrect recognitions.

Here we present not only some of the earlier evidence, but also the two basic

assumptions and compare certain theoretical predictions with the outcome of the

earlier experiment on word recognition. The theory can be seen as an addition to

Morton's logogen model (Morton, 1969) and as a restricted version of Rumelhart

and Siple's (1974) theory in a somewhat more natural test situation.

earlier relations between letter scores in words and in strings

The earlier experiments (Bouma, 1973) related to two types of recognition in

eccentric vision: word recognition and letter recognition.

Dutch words (3-6 letters) were presented for 100 ms at a number of retinal

eccentricities. Average fractions of correct and incorrect recognitions as well as

responses "illegible" are indicated in Fig. 1.

Fig. 1. Average response scores (correct, incorrect, "illegible") for Dutch word stimuli as a function of retinal eccentricity (eccentric stimulus position in degrees). Word lengths 3-6 letters, presentation time 100 msec. Note the high fraction of incorrect words, mostly existing Dutch words, and the higher correct scores in the right visual field. From Bouma, 1973.

Note the well-known advantage of words in the right visual field and the high

fraction of incorrect responses. The great majority of incorrect responses were

existing Dutch words. At the time we were particularly interested in the role of

the initial and the final letters of these words. Therefore, in all responded

words, we counted the correctly reported initial and final letters. It cannot, of

course, be inferred that these correctly reported letters had been correctly

perceived, since their presence can have been derived from other information in the

stimulus words. We therefore carried out another type of experiment, in which


subjects recognized initial and final letters from unpronounceable letter strings,

in which lengths and letter distributions were similar to those in the word

recognition experiment. It was assumed that the letter string experiment gives us the

perception proper of the particular letters. In Fig. 2 it can be seen that letter

counts in word recognition are indeed higher than letter scores from strings.

Fig. 2. Average correct scores of initial and final letters as a function of retinal eccentricity (eccentric stimulus position in degrees). Large symbols: letter counts in word responses to word stimuli; small symbols: counts of letter responses to unpronounceable strings. Note the higher counts from word responses and the similar trends for the two types of experiment. From Bouma, 1973.

The difference between the two was called "completion" because it is a measure of

the contribution of the other word properties to the correct reporting of particular

letters in words. Recently, we added the recognition of the central letter to

the letter recognition experiments from unpronounceable strings of three letters.

In the results, a number of correspondences turned up between letter counts in

word responses and letter recognitions from strings: (1) Both left and right of

fixation, outward letters farthest from the fovea had higher scores than inward

letters closest to the fovea. (2) Differences between left and right field were

similar in the two counts (Fig. 2). Possible reasons for these somewhat peculiar

results, which also hold good for each of the four stimulus lengths separately,

have been advanced in Bouma (1973). (3) Correct scores for individual letters in

the two experiments showed a clear relation (Fig. 3, upper data), (4) as did the

number of times that a letter was incorrectly responded (Fig. 3, lower data). Thus,

certain phenomena observed in letters in word responses already existed when letter

responses to unpronounceable strings were considered.

Fig. 3. Scatter diagram relating, for individual initial and final letters, scores in word recognition to scores from unpronounceable letter strings. Upper right: correct letter scores. Lower left: sum of incorrect reports of a letter. Averages over four retinal positions.


The experimental correspondences are sufficiently striking to provide a basis for

the hypothesis that the recognition of letters proper is a direct contributory factor to

word recognition and that higher letter scores in words are due to redundancy effects.

We shall now work out such a possible relationship quantitatively, confining both

theory and data to words and strings of three letters.

theoretical model

Correct and incorrect letter scores for individual letters from unpronounceable

strings, for initial, middle and final letter of the string separately, formed the

data base of the model. Thus, for each of the four retinal positions considered,

there were three different confusion matrices of letters. Given a certain stimulus

of three letters, say lap, we selected in the above matrices the rows (correct +

incorrect response fractions) for l--, -a-, and --p.

The next step was to develop combination rules for arriving at probabilities of

all possible response combinations. The first assumption was that the responses for

each of the three letter positions combined independently and that the probability of

a certain response combination was the product of the respective entries in the

matrices; for example, the probability of the response combination iag is

$p(\texttt{iag} \mid \texttt{lap}) = p(\texttt{i--} \mid \texttt{l--}) \cdot p(\texttt{-a-} \mid \texttt{-a-}) \cdot p(\texttt{--g} \mid \texttt{--p})$. Since many entries in the matrices will be zero,

the total number of response combinations will be far less than the maximum value

of $26^3$. Response combinations will to some extent be words, but mostly pronounceable

or unpronounceable non-words, and their probabilities add up to a value of 1.0.

Fig. 4 gives a simplified scheme, in which only two possible responses were assumed

for each stimulus letter, giving $2^3 = 8$ response combinations.

The second assumption makes a sharp distinction between words and non-words, such

that all non-word letter combinations are excluded by means of a weighting factor

zero, whereas all words are retained, their probabilities being allotted a common

weighting factor. This transformation factor is not a free parameter, since it is

determined by the requirement that, for each stimulus, word-response probabilities

add up to 1.0. The assumption of a common weighting factor for a selection (i.e.

words) of all possible responses (i.e. all letter combinations), is in fact the

application of the Constant Ratio Rule (Clarke, 1957; Luce, 1959).
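
Written out, the two assumptions take the following form. The notation is introduced here for illustration only and does not appear in the original text: $s_i$ and $c_i$ denote the stimulus and response letter at position $i$, $p_i$ the corresponding entry of the confusion matrix for that position, and $W$ the set of three-letter words.

```latex
% Assumption 1: independent combination of the three letter positions
p(c_1 c_2 c_3 \mid s_1 s_2 s_3) = \prod_{i=1}^{3} p_i(c_i \mid s_i)

% Assumption 2: Constant Ratio Rule restricted to the word set W
P(w \mid s) = \frac{p(w \mid s)}{\sum_{w' \in W} p(w' \mid s)}, \qquad w \in W
```

The denominator is the total probability falling on words, so its reciprocal is the common weighting (transformation) factor, which is why no free parameter is left.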

In order to carry out the corresponding calculations, we composed a list of existing

Dutch words of three letters out of two existing counts, supplemented by our

linguistic intuition. We arbitrarily excluded names and abbreviations, ending up with

519 existing words.

The result of the calculation, given a particular stimulus word, was a list of

possible word responses each with its response probability.

Thus the word recognition experiment was predicted fully and quantitatively on the

basis of the letter string experiment and two basic assumptions.

Assumptions and calculations were taken to reflect processes in an observer who

forms word responses by independently combining perceived letter characteristics,

and has only existing Dutch words available as responses.


stimulus: lap

DATA BASE (recognition of letters from unpronounceable strings)

stimulus    response    fraction
l--         l--         0.60
            i--         0.40
-a-         -a-         0.50
            -e-         0.50
--p         --p         0.80
            --g         0.20

CALCULATIONS

response       product            fraction    weighting    word response
combination                                   factor       fraction
lap            .6 x .5 x .8       0.24        1.92         0.46
lag            .6 x .5 x .2       0.06        1.92         0.12
lep*           .6 x .5 x .8       0.24        0
leg            .6 x .5 x .2       0.06        1.92         0.12
iap*           .4 x .5 x .8       0.16        0
iag*           .4 x .5 x .2       0.04        0
iep            .4 x .5 x .8       0.16        1.92         0.30
ieg*           .4 x .5 x .2       0.04        0
sum                               1.00                     1.00

*non-word
words = 0.52; transformation factor = 1/0.52 = 1.92

PREDICTIONS

response words: lap, lag, leg, iep

response letter    fraction
l--                0.70
i--                0.30
-a-                0.58
-e-                0.42
--p                0.76
--g                0.24

Fig. 4. Calculation scheme of the theory. Schematic example in which only two possible responses are assumed for each stimulus letter.
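
The scheme of Fig. 4 can be retraced in a few lines of code. The following is a minimal sketch, assuming the letter-response fractions and the four admissible words of the schematic example; the function names `predict_word_responses` and `letter_fraction` are illustrative and not part of the original work.

```python
from itertools import product


def predict_word_responses(letter_rows, lexicon):
    """Turn per-position letter-response fractions into word-response probabilities.

    letter_rows: one dict per letter position, mapping response letter -> fraction.
    lexicon:     set of admissible response words.
    Returns (dict of word -> probability, transformation factor).
    """
    combos = {}
    for choice in product(*(row.items() for row in letter_rows)):
        string = "".join(letter for letter, _ in choice)
        prob = 1.0
        for _, fraction in choice:
            prob *= fraction              # assumption 1: independent combination
        combos[string] = prob
    word_mass = sum(p for s, p in combos.items() if s in lexicon)
    if word_mass == 0:
        raise ValueError("no response combination is a word")
    factor = 1.0 / word_mass              # assumption 2: common weighting factor
    words = {s: p * factor for s, p in combos.items() if s in lexicon}
    return words, factor


def letter_fraction(words, position, letter):
    """Predicted fraction of word responses carrying `letter` at `position`."""
    return sum(p for w, p in words.items() if w[position] == letter)


# Values of the schematic example in Fig. 4 (stimulus "lap").
rows = [{"l": 0.60, "i": 0.40}, {"a": 0.50, "e": 0.50}, {"p": 0.80, "g": 0.20}]
lexicon = {"lap", "lag", "leg", "iep"}
words, factor = predict_word_responses(rows, lexicon)
print(round(factor, 2), {w: round(p, 2) for w, p in sorted(words.items())})
```

Because the code carries full precision, letter fractions computed this way can differ in the second decimal from the figure, where rounded word fractions were summed.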

prediction of and experiment on word recognition compared

Since the model concerned word responses of three letters only, we leave out of

account the 81% of responded words of different length.

Fig. 5 compares prediction and experiment as to average fractions of correct words

(50 words, 11 subjects) at four retinal positions. Predicted values were of the

right order of magnitude, but systematically lower by about 0.08. The left-right

differences were predicted correctly.

Average fractions of correct letters in correct + incorrect word responses are

compared in Fig. 6, completion turning up as the difference between correct fractions

from strings (Y) and correct fractions in word responses (predictions o,

experiments •). Trends were successfully predicted, predictions falling short by only

0.02 on the average.

Finally, distributions of correct and incorrect words were considered. 7~& of

experimentally responded words have a probability less by $10^{-4}$ than predicted. As to

the other responses, the aim was to find out how the experimental word-response

distribution related to the set of predicted responses. To this end, we assigned

response words to classes, according to predicted probability, each class comprising

a factor of 110 in predicted probability.


Fig. 5. Comparison of experimental and calculated fractions of correct word responses. Horizontal axis: eccentric stimulus position (degrees).

Fig. 6. Comparison of experimental and calculated fractions of correct letters in correct + incorrect words. Y indicate experimental letter scores from unpronounceable strings, serving as input to the model. Horizontal axis: eccentric stimulus position (degrees).

For one eccentricity, Fig. 7 compares prediction to experiments with respect to

summed numbers within these classes. The fit was satisfactory, except perhaps for

the highest class, which consisted, almost exclusively, of correct response words.

discussion

Despite the simplicity of the model, predictions were sufficiently close to make

discussion useful.

As to correct word scores, predictions fell short of results by some 0.08. We would

suggest that this is due in part to the primitive way we accounted for differences

in response availability. If a smaller set of words were used in the calculations,

higher correct scores would be predicted, provided the stimulus words belonged to

this vocabulary. For a further investigation of this factor, a general frequency

count would probably reflect response availability inadequately and a direct

experimental access to the vocabularies of the subjects would be required.

Left-right differences in correct scores were predicted closely. The implication

would seem to be that this traces back to a better recognition of the component

letters in the right visual field. Knowledge of words, as well as their availabilities,

could then be assumed to be equal in the portions of the brain serving the

right and the left visual fields. The advantage of the left cerebral hemisphere,

to which the word advantage of the right visual field is usually ascribed, should

then be located at or before the level of letter recognition, which is lower than

commonly assumed.

Fig. 7. Comparison of experiment and theory as to division of response words over classes of predicted response probability. Summated response frequencies within each class are plotted for four eccentricities (-1.75°, -2.75°, +1.75°, +2.75°). Horizontal axis: observed frequencies.

Predictions of correct letter fractions in correct + incorrect words were close to

the experimental values, thus adding strength to the proposed explanation of the

completion effect. This is true, despite the fact that correct word fractions were

predicted too low, incorrect word responses apparently making up for it. Possibly,

regularities of letter distributions in Dutch words, which are implicit in the word

list, played a part here.

The comparison between predictions and experiments with respect to individual

correct and incorrect response words can be considered as the most critical test

of the theory. The comparison which we can offer here is inadequate for two

reasons at least, due to the fact that the experiments had been set up for a

different purpose.

First, the data base suffered from not having enough definitely incorrect responses.

Second, the stimulus words were presented once only to eleven subjects, which made

us use pooled data rather than individual response words. An experiment directed

towards testing the theory more extensively is in preparation.

Finally, we would like to emphasize the role played by global features in word

recognition. Although the model was based on letters, global factors were involved

in two different ways. First, correct perception of letter position within the word

or string was tacitly assumed, letter position being a global factor. Second, the


model took for granted all interactions between letters preceding the level of

letter recognition.

As a final remark, extending the model to longer words would require a more

comprehensive consideration of global factors.

summary

A simple theory is described that makes quantitative predictions about correct and incorrect

word recognitions (three letters only) based on (a) correct and incorrect letter

recognitions and (b) a list of (Dutch) words of three letters. The theory has no

free parameters. Predictions are compared with earlier experimental data on word

recognitions at two eccentricities right and left of fixation.

references

Bouma, H. (1973) Visual interference in the parafoveal recognition of initial and final letters of words, Vis. Res. 13, p. 767-782.

Clarke, F.R. (1957) Constant Ratio Rule for Confusion Matrices in Speech Communication, J.Acoust.Soc.Amer., 29, p. 715-720.

Luce, R.D. (1959) Individual Choice Behavior, A Theoretical Analysis, New York, Wiley.

Morton, J. (1969) Interaction of information in word recognition, Psych.Rev., 76, p. 165-178.

Rumelhart, D.E. and Siple, P. (1974) Process of recognizing tachistoscopically presented words, Psych.Rev., 81, p. 99-118.


PULSE AND STEP RESPONSE OF THE VISUAL SYSTEM

measured by means of a perturbation technique

J.A.J. Roufs and F.J.J. Blommaert*

introduction

The motive in developing the perturbation technique described in this paper was

the need to evaluate an earlier developed model for the dynamic visual processing

of transients. In this model (Roufs, 1974 IV) it is assumed that the system reacts

with two dissimilar output variables to small temporal changes in luminance.

These variables are connected with two types of psychological attributes: gradual

changes in brightness, caused by gradual changes in the stimulus luminance and

"agitation" (a percept which is difficult to describe), caused by transients

(or fast changes in the luminance).

An arbitrary stimulus time function gives rise in general to both output variables.

Each output variable is assumed to have a critical value at which the percept is

seen in 50% of the cases. In detecting this stimulus, threshold intensity is thought

to be determined by the strongest variable compared with the said critical value.

For example, in the case of sinusoidal modulation of the luminance (De Lange Curve),

brightness variation (swell) at low frequencies is the perceptual attribute

determining the threshold of the first variable mentioned. At high frequencies only

"agitation" is seen, thus determining the threshold of the second variable. The

model mentioned above was constructed to explain small peculiarities in the shape

of the threshold curves of rectangular incremental flashes, not expected on the

basis of an earlier low-pass-filter model, the transmission of which was fitted

to the experimental De Lange curves. The model also explains quantitative properties

of threshold curves of doublets, consisting of pairs comprising one incremental and

one identical decremental flash, which are particularly sensitive to the low

frequency behaviour of the system.

In order to be able to calculate these threshold curves, three system properties

were postulated: 1) the signal, proportional to the stimulus intensity, is processed

quasi-linearly, 2) the signal is detected if its amplitude exceeds a certain deviation

"d" from the stationary state, 3) the transfer function of the second output

variable has the minimum-phase property.

The modulus of the transfer function is fitted to the top and the high-frequency

side of the De Lange curve. These parts of the De Lange curve have the percept

"agitation" in common.

The conclusion drawn was that the transfer function fitted in this way, and

at the same time explaining all the given experimental data on flashes, had to be

of a band-pass-filter type. Consequently the transmission at the low-frequency

side is lower than the De Lange curve.

It seemed desirable to cross-check this non-trivial band-pass-filter character

by using another method, viz. perturbation of the subliminal system responses.

* Student at the Eindhoven University of Technology.



Some of the results obtained by this method will be described briefly in this

article. Pulse- and step responses obtained in this way will be shown for one

subject. These enable us to check the method and the postulates involved. Prediction

is verified by comparing the experimental thresholds of rectangular incremental

flashes as a function of flash duration with the data calculated from the measured

pulse-response by convolution.

a perturbation technique

The method is based principally on changes in the threshold values of a sensor

flash due to perturbation of its response caused by the response of a small

test flash. For the present purpose we use fast-changing stimuli relatively

favouring the variable connected with the percept "agitation" (see Fig. 1).

Fig. 1. Illustration of the hypothetical mechanism for detecting fast- and slowly changing stimulus luminance. The upper branch depicts the processing of the signal which gives rise to brightness variations. The lower one shows the underlying mechanism for the detection of "agitation".

Only two basic postulates are required:

i) the signal, which is proportional to the stimulus-intensity, is processed

by a quasi-linear system b(E), whose parameter values are dependent on the

background intensity E.

ii) A change in the stimulus luminance causing a transient response at the output

of the system b is seen only if this response exceeds a certain deviation "d"

from the stationary state.

Let us, for the moment, assume that one of the phases is dominant and that this

phase is positive as sketched in Fig. 2. (This dominance is, for instance, consistent

with doublet threshold curves within the linearity assumption.) The amplitude of the

sensor flash-response needed in order to be detected, can be changed by superimposing

it on a test-flash-response. By varying the time shift or the amplitude of the

test-flash the sum response is changed. In this way the amplitude of the sensor

response and thus its intensity threshold is changed observably. In order to

avoid misinterpretation of these changes the amplitude of the test flash has to

be small compared to that of the sensor flash. This is ensured by pre-setting

the amplitude ratios of test- and sensor flash. In this respect the method differs

from general subliminal summation.

For the sake of simplicity we take sensor- and test flash as being of equal duration.


Fig. 2. A drawing showing the principles of the perturbation technique. In the upper row two stimulus conditions are shown. In this case they differ only in the delay of the small test flash compared to the sensor flash, the ratio q of the flash intensities being constant. The duration of the flashes is equal and short compared to the time-constants of the system. The drawn lines in the lower row are the sensor- and test-flash responses. The dashed curves are the sum responses which have just reached the required value "d". The change in amplitude, and thus the change in flash intensity in order to detect the combination, is demonstrated.

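The threshold condition of the combination can be formulated as follows:

$$\underset{t}{\operatorname{extr}} \left\{ \varepsilon_c \, \vartheta \left[ U_\delta(t) + q \, U_\delta(t - \tau) \right] \right\} = +d \qquad (1)$$

with:

$\varepsilon_c$ = 50% threshold intensity of the strongest flash in the combination used

$\vartheta$ = duration of the flashes

$U_\delta$ = pulse response of system b

$q$ = pre-set ratio of test- and sensor-flash intensities

$d$ = required deviation of response at 50% threshold

If $q$ is sufficiently small and the dominant phase of $U_\delta$ is positive, then eq. (1)

can be written as

$$\varepsilon_c \, \vartheta \left[ U_\delta(t_{ex}) + q \, U_\delta(t_{ex} - \tau) \right] = d \qquad (2)$$

where: $t_{ex}$ = time at which $U_\delta$ is extreme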


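By measuring $\varepsilon_c$ as a function of $\tau$, the pulse response expressed in d units,

$U_\delta(t_{ex} - \tau)/d$, can be found to be

$$\frac{U_\delta(t_{ex} - \tau)}{d} = \frac{1}{q \, \vartheta} \left[ \frac{1}{\varepsilon_c} - \frac{\vartheta \, U_\delta(t_{ex})}{d} \right] \qquad (3)$$

In plotting $1/\varepsilon_c$ as a function of $\tau$, $U_\delta(t_{ex} - \tau)/d$ is found scaled up by a factor $\vartheta q$

and shifted by $1/\varepsilon_1$ (where $\varepsilon_1$ is the threshold intensity of the sensor flash

alone, see eq. (4)).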

However, in order to increase measuring precision, the influence of the

non-stationary stochastic behaviour of d has to be decreased. Therefore a reference stimulus

is used. (Details and underlying principles of this procedure will be published

elsewhere).

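The threshold condition for the sensor flash alone is

$$\varepsilon_1 \, \vartheta \, U_\delta(t_{ex}) = d \qquad (4)$$

Combining eqs. (2) and (4) one obtains

$$\frac{U_\delta(t_{ex} - \tau)}{d} = \frac{1}{q \, \vartheta} \left( \frac{1}{\varepsilon_c} - \frac{1}{\varepsilon_1} \right) \qquad (5)$$

For reasons of convenience we usually reduce $U_\delta$ by dividing by the extreme value

$$\frac{U_\delta(t_{ex} - \tau)}{U_\delta(t_{ex})} = \frac{1}{q} \left( \frac{\varepsilon_1}{\varepsilon_c} - 1 \right) \qquad (6)$$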

It can easily be shown that the effective threshold variation can be enlarged by

a factor of 2 by using a combination of sensor flash and a decremental test flash

instead of the single-flash reference stimulus. (In fact this was the procedure

used in measuring the results shown in Fig. 3.)

This method does not provide information on the sign of the dominant phase of the

pulse response and on the relative position of the stimulus and response on the

time axis. Application of a reference stimulus consequently causes an increase by a

factor of 2 in spread due to the stationary stochastic fluctuations with the same

number of trials. Yet it is beneficial, since the large non-stationary effects

are neutralized largely by measuring with respect to the reference stimulus.

methods

The stimulus is a "white" circular, centrally fixated field of 1°, having a dark

surround, the luminance of which is varied around a constant background value of


E = 1200 td. A 2 mm artificial pupil with an entoptic guiding system was used.

The light is generated by a linearised glow-modulator. The desired time functions

of the stimulus and its amplitude are controlled electronically around the working

point which corresponds to the background intensity. The subject has one knob to

release the stimulus, which is delayed a convenient preset time interval. Three

knobs enable him to react with "yes", "no", or "rejection". All intensity

thresholds are 50% probability values obtained by a modified "method of constant

stimuli".

results

In Fig. 3a the reduced pulse response of subject F.B. is shown. The absolute response,

expressed in "d" units, can be found by multiplying the reduced values by the

extreme value given in the figure under the code "norm.constant". This "norm.constant"

is obtained by averaging the sensor-flash intensity $\varepsilon_1$ over all 10 sessions and

applying eq. (4). Per session, 2-3 points of the curve were measured, each point

being calculated from the average of 5 threshold differences according to eq. (6).

In fact 600 trials per point were necessary. The intervals between the small

horizontal lines stand for 2 times the experimentally determined standard deviation

of the mean. The order of measurement along the $\tau$-axis is chosen at random.

The step response of the same subject, obtained by the same technique, but using

slightly modified formulae, is shown in Fig. 3b. In this case the averages were

obtained in a slightly different manner, that is by measuring all points on the

time axis every session, and averaging the results of the 5 sessions held. All

other things were equal. The dashed curves were obtained by first averaging over

the coefficients of the Fourier transform of the pulse response and the coefficients

of the pulse response calculated from the step response coefficients, second an

inverse transform applied to these averages.

In Fig. 4 experimentally obtained 50%-intensity-thresholds of rectangular incremental

flashes are given as a function of the duration. All durations were presented

at one session. The circles are the average results of two sessions. Thresholds

in the first session were measured in random order compared to duration, the order

of the second session being reversed.

The drawn line is the set of predicted thresholds calculated from the convoluted

pulse response in Fig. 3a and using an equation analogous to eq. (4) for rectangular

pulses.
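The prediction step can be sketched numerically, assuming only what the model postulates: a linear filter followed by peak detection. The response to a rectangular flash is then the pulse response convolved with a rectangle, and the threshold amplitude is the criterion d divided by the extreme value of that response. Eq. (4) itself is not reproduced in this section, so the function `rect_threshold`, the exponential pulse response and all numbers below are illustrative assumptions, not the measured data of Fig. 3a.

```python
import math

def rect_threshold(h, dt, duration, d=1.0):
    """Threshold amplitude of a rectangular flash (linear filter + peak detection).

    h        -- sampled pulse response (hypothetical data)
    dt       -- sampling step in ms
    duration -- flash duration in ms
    d        -- detection criterion, in the units of the response
    """
    n = max(1, round(duration / dt))      # flash length in samples
    cum = [0.0]
    for value in h:                       # running integral of h
        cum.append(cum[-1] + value * dt)
    peak = 0.0
    # The response at sample k is the integral of h over the last n samples.
    for k in range(1, len(h) + n):
        upper = min(k, len(h))
        lower = max(0, k - n)
        peak = max(peak, abs(cum[upper] - cum[lower]))
    return d / peak

# Hypothetical monophasic pulse response, time constant 20 ms, sampled every 1 ms.
tau = 20.0
h = [math.exp(-t / tau) / tau for t in range(400)]
thresholds = {T: rect_threshold(h, 1.0, T) for T in (2, 20, 200, 1000)}
```

For short flashes the predicted threshold falls with duration, and for long flashes it levels off at d divided by the integral of the pulse response. A band-pass pulse response would be passed to the same routine as a different `h`.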

In Fig. 5 the reciprocals of the 50% thresholds of the amplitude of the sinusoidal

modulation at the given frequencies are shown for the same subject. The sinusoidal

modulation was restricted in duration by a gate, which exposed 15 peaks to the

subject. The beginning and end of the train are smoothed in order to avoid transients

(Roufs, 1974 VI). The order of measurement compared to the frequency is again

randomized and reversed in a second session.

[Figure 3. Panel a: reduced pulse response against τ (ms); subject F.B., E = 1200 td, 2 ms pulse, norm. const. = 1.2. Panel b: reduced step response against τ (ms); subject F.B., E = 1200 td, φ = 1°, norm. const. = 0.15.]

Fig. 3. Figure 3a shows the measured pulse response. The dots are the mean values obtained after reduction by dividing the response by its extreme value. This value is given in the legend at norm. const. but for the sign. Figure 3b is the reduced step response. The dashed curves are simultaneous fittings, the pulse response being the derivative of the step response.

discussion

It is clearly seen from Fig. 3 that the measured responses are sufficiently large

in comparison with the spread. The pulse response, within measuring accuracy, is

equal to the derivative of the step response, as demonstrated by the dashed curves

which satisfy this property exactly. This is what a linear system should do.

It is also consistent with the postulated peak detection.
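The linearity check described here, that the pulse response equals the derivative of the step response, can be illustrated with a short numerical sketch. The first-order system below is a hypothetical stand-in for the measured curves of Fig. 3; the function name `derivative` and all parameter values are assumptions made for the example.

```python
import math

def derivative(samples, dt):
    """Central-difference derivative at the interior sample points."""
    return [(samples[i + 1] - samples[i - 1]) / (2.0 * dt)
            for i in range(1, len(samples) - 1)]

# Hypothetical first-order system: step response and its analytic pulse response.
tau, dt = 20.0, 1.0
times = [i * dt for i in range(200)]
step = [1.0 - math.exp(-t / tau) for t in times]
pulse = [math.exp(-t / tau) / tau for t in times]

# Differentiate the step response and compare with the pulse response
# at the same (interior) time points.
estimated = derivative(step, dt)
max_error = max(abs(e - p) for e, p in zip(estimated, pulse[1:-1]))
```

With measured data the comparison would be made against the experimental spread rather than against a fixed tolerance.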

The absolute value of the Fourier transform of the pulse response is shown in
Fig. 5. The transmission definitely has a band-pass-filter character. The position
with respect to a measured De Lange curve, which reflects at low frequencies the
transfer to the upper branch of Fig. 1 and at middle and high frequencies that of
the lower branch, is instructive. First, it shows that the frequency at maximum
transmission is equal to that found by the De Lange characteristic. Second, the
shift towards lower transmission values is consistent with a calculated shift
of about 0.2 log units as a result of increased probability of seeing caused
by the repeated equal peaks (Roufs, 1974 VI). The closeness of the agreement
between the predicted threshold curves and the actual measured values shown in
Fig. 4, having possible daily variation in mind, must be a coincidence.

[Figure 4: log threshold intensity against log duration; subject F.B., E = 1200 td, φ = 1°.]

Fig. 4. Thresholds of incremental rectangular flashes as a function of the duration of the flashes (circles). The solid line is calculated from the measured pulse response. The dashed line is a correction for the effect of increased detection probability, caused by transients at long durations.

[Figure 5: log amplitude sensitivity against log frequency; subject F.B., E = 1200 td, φ = 1°.]

Fig. 5. The absolute value of the Fourier transform of the measured pulse response is shown by the dashed line. The measured De Lange curve for the same subject is also indicated. A gated sinusoid exposing 15 fully fledged peaks was used.

conclusion

It is obvious that the Fourier transform of the pulse response also provides
information about the phase of the transfer function.

In conclusion we feel justified in saying that the perturbation method yields a
viable technique for gaining information about the visual system. The results confirm
the linearity and peak detection postulated for the model. The prediction of
threshold curves of rectangular flashes from the pulse responses is excellent.

The findings are in agreement with the band-pass nature of the transfer with respect

to "agitation", based on another type of approach (Roufs, 1974 IV).

summary

A perturbation technique is described for the measurement of subliminal responses

in the visual system. The method is based mainly on quasi-linearity and top

detection. Pulse and step responses have been measured and are found to be consistent

with each other and with the measured threshold of rectangular flashes.

references

Roufs, J.A.J. (1974) Dynamic Properties of Vision-IV. Thresholds of Decremental Flashes, Incremental Flashes and Doublets in Relation to Flicker Fusion, Vision Res. 14, p. 831-851.

Roufs, J.A.J. (1974) Dynamic Properties of Vision-VI. Stochastic Threshold Fluctuations and their Effect on Flash-to-Flicker Sensitivity Ratio, Vision Res. 14, p. 871-888.


SMALL INVOLUNTARY EYE MOVEMENTS

F.L. Engel and Th.M. Bos

introduction

This paper reports observations on the occurrence of small involuntary eye movements

during the determination of the conspicuity area, that is the retinal field in

which the relevant test object can be discovered from its background in a single

fixation. The conspicuity areas are determined by having the observer fixate the

display centre, the stimulus pattern with the test object being presented at an

eccentric location unknown to him before-hand.

During the brief (80 msec) stimulus exposures in earlier conspicuity area determinations

(Engel, 1971, 1974) practically no eye movements occurred, this being

conceivable for a number of reasons including the longer ocular reaction times.

However, during the longer exposures used this time (1 sec), a small to-and-fro

eye movement was frequently observed in the direction of the test object discovered,

some 400 msec after onset of the stimulus pattern.

As will be shown, occurrence as well as delay of these movements depended on the

size of the conspicuity area and on the eccentricity of presentation.

The experimental results are discussed briefly, and will be dealt with more fully

elsewhere (Engel, 1976).

experiments

The stimulus consisted of a random disk pattern as background and a dissimilar disk

as test object. The extent of the conspicuity area is determined from the experimentally

determined probability of discovering the test object as a function of its

distance to the fixation point.

A series of 50 stimulus patterns was prerecorded on video tape, each 1 sec in duration

and separated by a 1 sec plain rest field. The stimulus patterns were shifted

versions of the same original, so that the location of the test object differed

during each presentation with respect to the fixation centre, but not to the background

configuration. Each series of stimulus patterns was presented 4 times to the

observer.

The observer was instructed to keep his eyes fixated on a small continuously visible

cross in the centre of the screen and on appearance of the stimulus pattern, to

indicate with a push-button switch if he discovered the test object (target).

Eye fixation was monitored during the experiments by means of the cornea reflection

technique.

results and discussion

Although the observers did their utmost to maintain fixation on the marked centre of

the screen, a small involuntary eye saccade occurred rather frequently, some

400 msec after onset of the stimulus pattern, mostly in the target direction.

Usually this saccade was followed about 200 msec later by a second small saccade


back to the fixation centre. The movements in the direction of the target were

generally too short to reach the target, the average length being about 0.7°

of visual angle. Nearly all these movements were followed roughly 300 msec later

by the push-button signal, indicating the discovery of the target.

Mostly the observers were not aware of their eye movements; if they were, they

felt that the eye movement was made before they realized it.

occurrence

The observed eye movements have been put into two categories according to their

direction: movements deviating less than 18° from the true target direction were

considered to be "target eye movements", the others were taken to be "non-target

eye movements". The value of ± 18° corresponds to the difference between the possible

target directions in the stimulus material.
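The classification rule can be written down compactly. The sketch below is a minimal Python illustration, assuming that directions are expressed as angles in degrees and that deviations are taken as the smallest angle between the two directions; the function names and the example angles are hypothetical.

```python
def angular_deviation(a_deg, b_deg):
    """Smallest angle in degrees between two directions (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)

def classify(movement_deg, target_deg, limit_deg=18.0):
    """Label a saccade as a 'target' or 'non-target' eye movement.

    A movement counts as a target eye movement when it deviates less
    than limit_deg from the true target direction.
    """
    if angular_deviation(movement_deg, target_deg) < limit_deg:
        return "target"
    return "non-target"

# Hypothetical saccade directions for a target presented at 5 degrees.
labels = [classify(direction, 5.0) for direction in (350.0, 23.0, 40.0)]
```

Taking the smallest angle matters near the 0°/360° boundary: a movement at 350° deviates only 15° from a target at 5°.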

The proportions of the eye movements of one observer classified in this way are

shown in Fig. 1 as a function of the target eccentricity. The 100% level corresponds

to 40 stimulus presentations at each eccentricity concerned.

[Figure 1: proportion P (0 to 100%) against target eccentricity R (0 to 7.5 degrees of visual angle).]

Fig. 1. Proportion (P) of involuntary eye movements as a percentage of the total number (40) of targets presented at the indicated eccentricity R. Pt refers to target eye movements while Pnt refers to non-target eye movements. For comparison purposes, the proportion of discovered targets has also been plotted.

In Fig. 1 the proportion of target discoveries is plotted as well (from these data

we determine the conspicuity area of the test object). A relation between the

discovery of the target and the target eye movement is apparent from this plot.

In fact, on 90% of the occasions that such a saccade occurred, the target was

discovered consciously. In about 35% of the total number of target discoveries

there occurred a target eye movement.

The proportion of occurrence of the non-target movements is found to be rather low

and independent of the eccentricity of test object presentation. These movements

might perhaps be connected with certain background objects, which were sometimes

quite close to the fixation centre. No systematic origin could be found, however,

except that they were related in time to the onset of the stimulus pattern.

delays

In Fig. 2 the delay between the stimulus onset and the target eye movement, and
also the delay between the stimulus onset and the discovery of the target (push-button
signal), are plotted as a function of target eccentricity. Both these delays
increase with target eccentricity, the difference between them being almost constant
(300 msec for the observer whose data are presented in Fig. 2). With respect to
these data it should be remarked that the indicated delay times at R = 6.0° and
R = 7.5° of visual angle are less reliable because of the small number of events
(see Fig. 1) at these eccentricities.

[Figure 2: delay time (0 to 1000 ms) against retinal eccentricity R (degrees of visual angle).]

Fig. 2. The delay time ΔT_t between stimulus onset and target eye movement and the delay time ΔT_R between stimulus onset and the push-button signal are shown against the retinal eccentricity (R) of the target presentation.

As shown in Fig. 3 it is possible to relate these delays to the eccentricity of
test object presentation and to the size of the relevant conspicuity area.

[Figure 3. Panel a: delay time ΔT (0 to 1000 ms) against normalized eccentricity R* (0 to 2.0), observer F.E. Panel b: delay time (0 to 600 ms) against R50, observers T.B. and F.E.]

Fig. 3. (a) The delay time ΔT_t before occurrence of a target eye movement, and the delay time ΔT_R before occurrence of the observer's response, plotted against the normalized eccentricity R* = R/R50, where R50 is the size of the conspicuity area at 50% threshold level. Measurements for 4 different diameters of the test object are shown (the diameter of the background disk was 0.55° of visual angle).

Fig. 3. (b) The arithmetic mean of the delay times before occurrence of a target eye movement (mean ΔT_t), against the size (R50) of the corresponding conspicuity area.
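The normalization used in Fig. 3a, R* = R/R50, can be illustrated with a short sketch. It assumes that R50 is read from the measured discovery proportions by linear interpolation at the 50% level; the text does not state how R50 was actually determined, and the function name `r50` and the data are hypothetical.

```python
def r50(eccentricities, p_discovery):
    """Eccentricity at which the discovery proportion falls to 0.5.

    Assumption: linear interpolation between the two eccentricities
    that bracket the 50% level.
    """
    pairs = sorted(zip(eccentricities, p_discovery))
    for (r0, p0), (r1, p1) in zip(pairs, pairs[1:]):
        if p0 >= 0.5 >= p1 and p0 > p1:
            return r0 + (p0 - 0.5) * (r1 - r0) / (p0 - p1)
    raise ValueError("discovery proportions do not cross 0.5")

# Hypothetical data: eccentricity in degrees, proportion of targets discovered.
R = [0.0, 2.5, 5.0, 6.0, 7.5]
P = [1.0, 0.9, 0.6, 0.3, 0.1]
R50 = r50(R, P)
R_star = [r / R50 for r in R]   # normalized eccentricity R* = R / R50
```

Expressing eccentricity in units of R50 allows delays measured with test objects of different conspicuity to be plotted on a common axis.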

The increase in delay time, amounting to about 100 msec/degree of visual angle in

the transition region, compares favourably with the value determined by Schiepers

(1974), who found 150 msec/degree of visual angle for the increase in vocal response

latency against eccentricity of word presentation.

summary

Experiments are described in which observers made involuntary eye movements in the

direction of a test object to be discovered, strict fixation of the display centre

being required.

The occurrence of these eye movements was related to the conscious discovery of

the test object. The delay time between the stimulus onset and involuntary eye

movement is shown to depend on target eccentricity and on the size of the relevant

conspicuity area.

references

Engel, F.L. (1971) Visual Conspicuity, Directed Attention and Retinal Locus, Vision Res. 11, p. 563-576.

Engel, F.L. (1974) Visual Conspicuity and Selective Background Interference in Eccentric Vision, Vision Res. 14, p. 459-471.

Engel, F.L. (1976) Visual Conspicuity, Visual Search and Fixation Tendencies of the Eye, to be published.

Schiepers, C.W.J. (1974) Response Latencies in Parafoveal Word Recognition, IPO Annual Progress Report 9, p. 99-103.


VISUAL RECOGNITION BY DYSLECTIC CHILDREN

further exploration of letter, word and number recognition in 4 weak and 4 normal readers

H. Bouma, Ch.P. Legein and A.L.M. van Rens

introduction

The results obtained last year from the examination of 20 dyslectic children

and 20 who read normally (Bouma, Legein and v. Rens, 1974), encouraged us to

continue this study in two directions. First, a follow-up study of all 40 subjects

to get an idea of the course of these perceptual processes. Second, further exploration

of possible defective processes in visual recognition. This paper will

report both on the follow-up study and on the further exploration in four dyslectic

children and four normal readers selected from the above-mentioned groups.

Partly summarizing last year's results we found:

1. backwardness in reading-level in the case of all dyslectic children of at least

two years

2. in the tachistoscopic experiments a significantly lower recognition score for the

dyslectic group with reference to embedded letters and words, especially in

parafoveal presentation. There was also evidence of better recognition in the

right visual half field.

The tachistoscopic testing programme used this year was the same as last year's

but was extended so that the same stimuli were presented on both sides in parafoveal

recognition experiments. Thus more data were obtained and a more firmly based

conclusion could be drawn as to a possible left-right difference. As ceiling

effects influenced last year's results on foveal recognition of words of up to five

letters we planned recognition experiments on words of greater lengths (ℓ = 6, 7,

8 letters).

As we concluded that there were more visual interferences in the dyslectic group,

such an effect could also be expected with numbers, leading perhaps to difficulties

in arithmetic. Since numbers are not redundant, as opposed to words, this also

provides an opportunity for checking the influence of word knowledge on recognition.

As the rehearsal memory could be a limiting factor in reporting long numbers, experiments

were carried out in which the presentation time was prolonged. This function

was also tested by auditory presentation of digit strings.

To explore whether large printing or more widely spaced printing could improve

the recognition of parafoveal words a tachistoscopic recognition test was done.

Finally a letter-search test was tried out to investigate the extent to which interferences

between adjacent letters would influence the marking of target letters

within words or letter strings. If successful, such a test could perhaps be of help

in early detection, in the classroom, of reading difficulties.



methods

Four dyslectic boys - ages 12-13 years - with different low reading levels (Fig. 1)

were selected and compared with normal readers of the same ages.

[Figure 1: reading level (grade) against age (5 to 14 yrs.).]

Fig. 1. Follow-up of reading level 1974 (small symbols) - 1975 (large symbols)

The follow-up examination consisted of the somewhat extended tachistoscopic testing

programme of isolated letters, embedded letters and of words (length ℓ = 3, 4, 5

letters), both in foveal and parafoveal vision (φ = 1°). Again an assessment of their

reading level was made using the Tanghe test. Vocal latencies were also measured

but will not be reported on here.

Foveal recognition of longer words (ℓ = 6, 7, 8 letters) was explored, as were

numbers varying in length, at a normal (100 msec.) and a prolonged (500 msec.)

exposure time. The parafoveal recognition of numbers (ℓ = 2) was also tested.

Apart from visual presentation, auditory presentation of digit strings was also done

to test the short-term memory capacity.

Twelve commonly used words (ℓ = 5) were presented parafoveally three times in different

modes: normal printing, double-magnified printing and extra-spaced printing. The

letter closest to fixation was at φ = 1°. In order to avoid immediate word repetition,

presentations of the same word in different modes were spread over three

parts of the session with other tests in between. The order of printing modes was

balanced over the three parts.

The letter search test consists of two pages; one page with 240 8-letter words and

one page with 240 unpronounceable 8-letter strings, each 32 lines with normal

spacing. 80 words and 80 strings contained the target letter e. The target letter

positions were equally divided over all letter positions. The subject was asked

to mark these target letters in pencil.


results

follow-up

In this part of the examination there was hardly any change in the high foveal

recognition scores of both groups (Table 1). In parafoveal scores, however, there

was a definite improvement in particular for words in the dyslectic group. As in

this follow-up study all parafoveal stimuli were presented on both sides of the

fixation point the better scores in the right visual field could be demonstrated

especially in the dyslectic group (Table 2). Tables 1 and 2 should not be compared

in detail because Table 2 relates to twice as many presentations.

Table 1. 1975 average correct recognition scores of single letters /a/, embedded letters /xax/ and words /wrd/. In parentheses (small numbers in the original): 1974.

              foveal                  parafoveal
          dysl.      contr.       dysl.      contr.
/a/       96 (96)    94 (99)      91 (81)    98 (98)
/xax/     72 (68)    94 (95)      28 (19)    53 (54)
/wrd/     78 (73)   100 (100)     54 (38)    71 (59)

Table 2. Average correct scores in left (L) and right (R) visual field (φ = 1°).

                parafoveal
          dysl.         contr.
          L     R       L     R
/a/       90    94      97    98
/xax/     22    45      51    59
/wrd/     39    73      74    80

As to the reading level of the eight subjects (Fig. 1) we conclude that the dyslectic

children especially made progress and for them a fair correlation between improved

reading level and better parafoveal recognition of words is shown (Fig. 2).

[Figure 2: parafoveal word recognition score (0 to 1.0) against reading level (Tanghe, grades 1 to 7), dyslectic and control groups.]

Fig. 2. Follow-up of parafoveal word recognition and of reading level 1974 (small symbols) - 1975 (large symbols).


words (ℓ = 6, 7, 8 letters)

Further exploration was done on foveal recognition of common words (freq ~ 10⁻⁴)
of greater length (ℓ = 6, 7 and 8). Six words of each length were presented (exposure
time 100 msec). With the dyslectic group the scores for these longer words
are definitely lower than for the shorter words, whereas for the control group the
scores remained perfect or nearly so (Table 3). The scores for word lengths ℓ = 3-5
of these subjects have also been included.

Table 3. Correct foveal word recognition scores for various word lengths (ℓ).

          dysl.   contr.
ℓ = 3      77      100
ℓ = 4      77      100
ℓ = 5      80      100
ℓ = 6      42      100
ℓ = 7      38      100
ℓ = 8      33       96

visual and auditory numbers

Forty randomly composed digit strings (length ℓ = 1, 2, 3, 4) were foveally presented
(100 msec. exposure time). Table 4 gives the results for both groups and it is
obvious that, as from ℓ = 3, dyslectics have far more difficulties than children
who read normally. We tested two-digit numbers parafoveally (φ = 1°) and found
that both groups had rather high scores: dyslectics 82% and controls 94%.

A foveal-recognition experiment of numbers (ℓ = 3, 4, 5) was also done using a
prolonged presentation time (500 msec.). Table 4 shows an increase in correct scores.
This probably indicates that the short-term memory capacity is not an essential
limiting factor in these experiments. Nevertheless the short-term memory function
(rehearsal) seems worse in the dyslectic group, as is indicated by testing this
function by auditory presentation of digit strings (ℓ = 3, 4, 5) as well, at a
pronunciation speed of 2 digits per second. Table 5 shows the results and indicates
a low score for the dyslectic group in repeating long digit strings (ℓ = 5). In

conclusion, the lower scores obtained in visual presentation of numbers (ℓ ≤ 5)

then seem due to perceptual factors rather than to short-term memory dysfunction.

Table 4. Correct foveal number scores for various lengths (ℓ) at two stimulus durations.

              dysl.              contr.
          100 ms   500 ms    100 ms   500 ms
ℓ = 1       95       -        100       -
ℓ = 2       95       -        100       -
ℓ = 3       68       91        95      100
ℓ = 4       38       70        98       92
ℓ = 5       -        30        -        65

Table 5. Auditory presentation of digit strings. Correct scores for various lengths (ℓ).

          dysl.   contr.
ℓ = 3      100     100
ℓ = 4       80      88
ℓ = 5       42      80


large and spaced words

In this experiment - see methods - a positive effect on the recognition scores

could not be demonstrated in the dyslectic group (Table 6).

Table 6. Correct parafoveal word scores printed in three modes.

          dysl.   contr.
normal     44      79
spaced     49      86
large      49      98

letter search test

For the purpose of bridging the gap between tachistoscopic recognition and ordinary

reading we developed a search experiment in which the letter e (target letter)

within words or letter strings of 8 letters had to be marked.

As to the number of errors, the dyslectic group missed twice as many target letters

as the control group and both groups missed more target letters in the words than

in the letter strings (Table 7).

Table 7. Error scores in letter search test: total errors, with errors after correction for skipped lines in parentheses (large and small print in the original).

           dysl.      contr.
words      36 (23)    14 (13)
strings    26 (15)    12 (9)

Fig. 3 (black and hatched) shows error position histograms for words and strings

of both groups. When scoring the errors it was striking to see in the dyslectic

group how many lines occurred in which not a single target letter was marked.

This could be an indication that they just skipped full lines probably due to an

insufficient control of the eye movement towards the new line. We made a correction

for these, probably not inspected, lines, so that the solid columns are taken to

indicate the errors in inspected lines.
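The correction for skipped lines can be sketched as follows. The sketch assumes that a printed line counts as "skipped" when it contains target letters but none of them was marked, and that all misses in such lines are removed from the error total; the exact rule used in scoring is not spelled out in the text, and the function name and data are hypothetical.

```python
def error_counts(lines):
    """Total and corrected numbers of missed target letters.

    lines -- sequence of (targets_present, targets_marked), one per printed line.
    Returns (total_errors, errors_in_inspected_lines).
    Assumption: a line with targets but no marks at all was not inspected.
    """
    total = sum(present - marked for present, marked in lines)
    skipped = sum(present for present, marked in lines
                  if present > 0 and marked == 0)
    return total, total - skipped

# Hypothetical page of four lines with five target letters each.
page = [(5, 5), (5, 3), (5, 0), (5, 4)]
total_errors, corrected_errors = error_counts(page)
```

The difference between the two counts corresponds to the gap between the overall and the solid columns described for Fig. 3.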

The general tendency is that more target letters are missed in the second part of

the words and letter strings, and that the first, fourth and fifth letter positions

are at an advantage. Many errors are made at the last few letter positions in the

letter strings by both groups, and also in the words by the dyslectic group.

discussion

When comparing the present results with those reported last year (Bouma, Legein,

v. Rens, 1974) it should be borne in mind that the present data refer to two groups

of four subjects as compared with the two groups of twenty subjects examined last

year. Although the averages will therefore show greater variability the general

trends are quite similar. Compared with 1974 the dyslectic group has improved more

than the control group. The dramatic difference in recognition scores for longer

words in foveal presentation (Table 3) may be partly due to increased interferences,
but it seems likely that the dyslectic children also have insufficient knowledge
of the word forms of these longer words. Of course, when reading text dyslectic
children quite obviously experience greater difficulties with longer words.

[Figure 3: errors (0 to 70%) against position of the target letter (1 to 8), for words and strings, dyslexic and control groups.]

Fig. 3. Error histograms in letter search test. Overall: total errors. Black columns: after correction for skipped lines.

All four dyslectic children have definitely higher recognition scores right of

fixation compared to left, which is in line with results reported in the literature

(Mishkin and Forgays, 1952; Bouma, 1973; McKeever and Huling, 1970). This indicates

that the basis of dyslexia is not a general inability of the language-specialized

left cerebral hemisphere. It is of some interest that perception of digit

strings causes difficulties for the dyslectic subjects quite similar to that of

perception of embedded letters. The two components in such a task are a) visual

perception of the strings,b) rehearsal of the strings until the moment of report.

The positive influence of a prolonged presentation time points towards perceptual

difficulties. Also, the higher scores in the auditory mode, which makes use of the

same rehearsal process, seem to indicate that the capacity of the rehearsal memory

is not responsible for the poor performance in the visual presentations.

As to the letter search test, the conclusion that both groups made more errors

in the word test than in the string test is indicative of them using knowledge of

word forms to a certain extent. The specificity of this knowledge has not yet been

investigated. In conclusion, this test draws attention to a probably insufficient

eye control on the part of dyslectic children which, together with strong parafoveal

interference effects, influences their scores. A new version of the test might separate

these effects. As to the research on dyslexia, the conclusion then seems to

be that eye control, perception and recognition of letters and words, and storage

processes should be studied not just in isolation but also in mutual dependence

on one another. This conclusion links up with the notion that dyslexia stems from


many different adverse factors (Malmquist, 1958; Valtin, 1970; Vernon, 1971; Klasen,

1972).

But it is not just the causative factors which are found to be manifold; the resulting

difficulties are moreover not confined to the reading of text, but are clearly

present in the recognition of numbers. Indeed, dyslexia is a syndrome. We have hope

that understanding of the underlying phenomena of dyslexia may proceed as fast as

understanding of normal reading processes, which, in literature as well as here at

IPO, is a subject of renewed interest.

summary

Four dyslectic and four average readers recognized letters and words, corroborating

last year's results indicating greater interference effects in dyslectics and a

superiority of the right visual field.

Reading-level and word scores had clearly improved since 1974. New explorations

indicated lower scores in dyslectics, too, for number recognitions and, particularly,

for longer words in foveal vision. A new letter search test has been tried out.

references

Bouma, H., Legein, Ch.P. and van Rens, A.L.M. (1974) Visual Recognition by Dyslectic Children, IPO Annual Progress Report 9, p. 104-109.

Bouma, H. (1973) Visual Interference in the Parafoveal Recognition of Initial and Final Letters of Words, Vision Res. 13, p. 767-782.

Klasen, E. (1972) The Syndrome of Specific Dyslexia, University Park Press, Baltimore.

Malmquist, E. (1958) Factors related to Reading Disabilities in the First Grade of the Elementary School, Almqvist and Wiksell, Stockholm.

McKeever, W.F. and Huling, M.D. (1970) Lateral Dominance in Tachistoscopic Word Recognitions of Children at two Levels of Ability, Quart.J.Exp.Psychol., 22, p. 600-604.

Mishkin, M. and Forgays, D.G. (1952) Word Recognition as a Function of Retinal Locus, J.Exp.Psychol., 43, p. 43-48.

Valtin, R. (1970) Legasthenie - Theorien und Untersuchungen, Beltz Verlag, Basel.

Vernon, M.D. (1971) Reading and its Difficulties, Cambridge University Press.


4 instrumentation


I.P.O. INSTRUMENTATION 1957 - 1975

D.J.H. Admiraal

history

Since the foundation of the I.P.O. in 1957 by Prof. Dr. J.F. Schouten, the "General

Instrumentation" group has played an important role in supporting the research program

of the Institute. It was to this group that the first employee was appointed.

The ratio between the number of people working in the instrumentation group to

those in the research groups has to date fluctuated with slight variations around

the average of 1 : 3~.

The very presence of the Instrumentation Group at this Institute is due to the fact that
the often highly specialized apparatus required in research is mostly not commercially
available. Admittedly, this makes the design and construction of instruments
expensive, obliging researchers to consider carefully the possibilities offered
by commercially available or other existing apparatus when designing their experiments.
The skills of the Instrumentation Group have benefited not only the research
workers at the I.P.O.; instruments have also been made at the request of, for instance,
Philips Biometric Centre, the Evoluon and Philips Phonographic Industries. For
the last-named, an annoyance-measuring device was designed for testing magnetic
tapes (Admiraal, 1968).

Due to the growing complexity and specificity of the instruments and the need for

their development concurrently with the research projects, especially in the

Phonetics Group, it was decided in 1964 to design instruments of this kind under

the sole responsibility of the said group. This made the contribution of the

instrument designers to the research projects more active. A speech synthesizer

(Willems, 1966) is an example of equipment so developed.

The apparatus usually provides only a partial solution to a particular problem.

For example where a television system is used, performance is determined not only

by the electronics behind the camera, but also by the characteristics in front of

it, such as the illumination of the scene. The measurement of the diameter of the

pupil of the eye by a television scanning technique (Admiraal and Alewijnse, 1966)

can be taken as an example. This instrument worked well electronically, but results

remained poor because adequate illumination of the eye was not immediately

feasible. After some months of experimentation the solution was found in the use of

half a ping-pong ball, with which adequate illumination of the eye was obtained.

This demonstrates that an active contribution on the part of the researchers to the

work of the instrument designers helped to solve the problem.

Since the inception of I.P.O., people have been interested in the generation and

measurement of time intervals. One example is a reaction-time-measuring instrument

(Moonen, 1967) which measures reaction times of one subject with a choice of 15

responses out of 15 stimuli.

In 1967 a modular mechanical system (19") was introduced at I.P.O., which allowed

a flexible system to be created for housing instruments. The modular system consists



of individual modules, which can be assembled into cabinets (with a maximum of

6 modules each). The first use of the modular building system was made for a time

generator (Valbracht, 1968). Since then hundreds of modules have been constructed,

such as a two-quadrant multiplier (Noordermeer and Moons, 1970) and time-measuring

devices (Lammers and Moonen, 1970). The large-scale integration technique enables

us to combine a number of functions in one module. Limits to such a miniaturization

are set, however, by the ergonomic demands of essential controls located on the

front panel of the module.

design and development

The development of an apparatus can be initiated by researchers on the basis of

a demand specification. After the development of a prototype which meets the

requirements, a definitive version is made in the workshop. In the final design

of a device ergonomic criteria are also taken into account.

Practice in this laboratory has shown, however, that it is not always possible to

specify the demands completely in advance. This is partly due to the uncertainty

of research workers as to the line their experiments are likely to follow; on the

other hand, experiments are largely dominated by the restrictions set by the

possibilities of the apparatus. This situation was exemplified in the request by the

Visual Research Group for a light source with a high intensity, capable of being

modulated over a wide frequency range and of such geometrical proportions that at

an eye distance of 25 cm a visual angle of at least 25° could be obtained. These

primary conditions were met by the application of a T.V. projection tube combined

with a spiral-scanned frame (Alewijnse, 1969). Only after this choice had been

made could the complete specifications be set up.

An important adjunct to the apparatus developed is the relevant documentation. A

number of possibilities are open: a) the I.P.O. report, which discusses the

principles of the apparatus and compares it with other possible approaches;

b) the draft specification, which gives complete documentation of a particular

apparatus, including circuit diagrams, adjustments to be made, etc.; c) the manual,

intended mainly for the user, describing the functioning of an apparatus, its

technical specifications and its control functions. As regards the modular system,

a somewhat shorter version of this manual is also published by the Phonetics

Group. As an example of the amount of documentation provided on a particular

device, that on the Fourier synthesizer (Admiraal, 1969) can be taken. The report

discussing the principles covered 27 pages, whereas the draft specification took

53 pages.

present and future

At I.P.O. the computer age started in July 1970 with the installation of a Philips

P 9202 (Muller, 1970). The installation of a computer at this laboratory was

motivated by a) the need to be able to control experiments b) the desirability

of simulation studies and c) the need to carry out computations. Quite apart from

the work it had to do for us, the computer also set us work to do: much specialized

hardware had to be developed, especially all kinds of interfaces. Connecting the

computer up with all sorts of experiments located at various places in the laboratory

which required on-line use necessitated the development of matching units (Moonen

and de Jong, 1973). The recent introduction of a new computer system (P857), which

has a different organization of the input and output, now makes it necessary to

duplicate this development work.

Some examples of the present development of apparatus are described elsewhere in

this issue. The development of measuring and recording apparatus for eye movements,

as reported on in the previous issue, together with the processing of the data in the

computer, has resulted in an increasing commitment with the Visual Research Group.

It is thus to be expected that the Instrumentation Group will in the future

contribute to the research program of the Institute more directly than hitherto.

references

Admiraal, D.J.H. and Alewijnse, M.A. (1966) An Infrared Pupillometer Based on the Television Scanning Technique, IPO Annual Progress Report l, p. 126-134.

Admiraal, D.J.H. (1968) An Annoyance Measuring Instrument to Check Magnetic Tapes on Drop-outs, IPO Annual Progress Report l, p. 108-112.

Admiraal, D.J.H. (1969) The "Pan Pipes", a Fourier Synthesizer, IPO Annual Progress Report ±, p. 131-139.

Alewijnse, M.A. (1969) The Time-Place Generator, a Modulatable Light-Source, IPO Annual Progress Report ±, p. 152-159.

Lammers, C.A. and Moonen, G.J.J. (1970) The Modular Time Measuring Equipment (MTM), IPO Annual Progress Report ~, p. 205-208.

Moonen, G.J.J. (1967) EVA. A Singular Unpaced Reaction Measuring Device, IPO Annual Progress Report ~, p. 181-183.

Moonen, G.J.J. and de Jong, Th.A. (1973) MARIE Interface between Computer and Experiment, IPO Annual Progress Report ~, p. 54-56.

Muller, H.F. (1970) Computer Installation, IPO Annual Progress Report ~, p. 230-231.

Noordermeer, W.H. and Moons, C. (1970) The Vario-S-Gate. A Two-Quadrant Analogue Multiplier, IPO Annual Progress Report ~, p. 223-226.

Valbracht, J.C. (1968) MTG A Modular Time Source, IPO Annual Progress Report l, p. 113-114.

Willems, L.F. (1966) IPOVOX II. A Speech Synthesizer, IPO Annual Progress Report l, p. 120-123.

A SPEECH SPECTRUM ROTATOR

A.C. van Nes

introduction

The contribution of temporal speech structures to the perceptual process has been

investigated in a number of research projects carried out in our Institute. In

order to isolate these structures from the syntactic and semantic content of

the speech signal, this signal is made unintelligible. An apparatus for obtaining

such a signal is described in this report. The speech spectrum is rotated, which

makes the speech unintelligible, without altering the temporal structure. It is

also possible to shift the resulting spectrum by a certain amount.

principle

The rotated spectrum is obtained by a well-known modulation technique, i.e.

single-sideband modulation with suppressed carrier. Fig. 1a shows the original

speech spectrum, whereas Fig. 1b shows the spectrum after modulation with a carrier

frequency f_c. This latter spectrum contains the desired rotated (mirrored against f_c)

spectrum as the lower sideband. General methods are available for isolating this

lower sideband of the compound spectrum. A simple one is shown in Fig. 2, using a

filter to suppress the carrier and the upper sideband. This method has the serious

disadvantage that a rather sharp filter is needed, which causes a great deal of

flexibility to be lost.

Fig. 1. The speech spectrum (a) and the spectrum obtained after modulation (b).

Fig. 2. Filter realization of a single-sideband system.


In the present design another method is chosen, in which the suppression of the upper

sideband is obtained by a 180° phase difference for the upper sidebands in the

output of a double balanced modulator (Fig. 3). Owing to the absence of a filter, the

resulting spectrum can now also be easily shifted by changing the carrier frequency.

[Block diagram: the speech signal passes through a wide-band 90° network giving cos ωt and sin ωt; the carrier passes through a wide-band 90° network giving cos αt and sin αt. One balanced modulator delivers ½ cos(α-ω)t + ½ cos(α+ω)t, the other ½ cos(α-ω)t - ½ cos(α+ω)t.]

Fig. 3. Double balanced modulator design for a single-sideband system.
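
The phasing method of Fig. 3 can be illustrated numerically. The following Python sketch is not the circuit described in this report: it assumes NumPy, replaces the analogue wide-band 90° network by an ideal FFT-based phase shifter, and uses hypothetical names (`quadrature`, `rotate_spectrum`) and an arbitrary sampling rate. It adds the products cos ωt · cos αt and sin ωt · sin αt, so that only the lower sideband cos(α-ω)t remains.

```python
import numpy as np

def quadrature(x):
    """Return x shifted by 90 degrees at all frequencies (ideal Hilbert transform).

    Stand-in for the wide-band 90-degree network; assumes x is real.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep DC
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist bin
        gain[1:n // 2] = 2.0           # double positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * gain)
    return np.imag(analytic)

def rotate_spectrum(x, fs, fc):
    """Mirror the spectrum of x against the carrier fc (lower sideband only)."""
    t = np.arange(len(x)) / fs
    in_phase = x * np.cos(2 * np.pi * fc * t)                # cos(wt) * cos(at)
    shifted = quadrature(x) * np.sin(2 * np.pi * fc * t)     # sin(wt) * sin(at)
    return in_phase + shifted                                # sum keeps cos((a-w)t)

if __name__ == "__main__":
    fs = 32000.0                                             # assumed sampling rate
    t = np.arange(3200) / fs
    tone = np.cos(2 * np.pi * 1000.0 * t)                    # 1 kHz test component
    rotated = rotate_spectrum(tone, fs, fc=4000.0)
    freqs = np.fft.rfftfreq(len(rotated), 1.0 / fs)
    peak = freqs[np.argmax(np.abs(np.fft.rfft(rotated)))]
    print(f"1 kHz component moved to {peak:.0f} Hz")
```

With a 4 kHz carrier, a 1 kHz component ends up at 3 kHz and a 3 kHz component at 1 kHz, so the spectrum is mirrored; raising the carrier frequency shifts the mirrored spectrum upward, which is the flexibility gained by avoiding the filter.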

realization

The complete diagram of the spectrum rotator is presented in Fig. 4. The main

components are two identical 90° wideband phase shifters and two identical balanced

modulators. The phase shifters have two outputs, with a phase difference of

90° ± 0.75° in the frequency range of 100 Hz - 10 kHz. The components for this part

of the circuit have to be selected very carefully in order to keep the phase

difference as close to 90° as possible in the frequency range indicated.

The suppression of the unwanted sideband decreases rapidly when the phase deviates

from 90°; we measured a 15 dB decrease in suppression for a 1° phase error.
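
How quickly the suppression collapses can be estimated from a standard idealisation, which is an assumption of this sketch rather than a result of the measurements reported here: with perfectly balanced amplitudes and a single phase error δ in one 90° network, the unwanted and wanted sidebands have amplitudes proportional to sin(δ/2) and cos(δ/2). The hypothetical helper below turns that ratio into decibels.

```python
import math

def ideal_sideband_suppression_db(phase_error_deg):
    """Suppression (dB) of the unwanted sideband for a given phase error.

    Idealised phasing method: equal amplitudes in both branches and a
    single phase error; amplitude ratio unwanted/wanted = tan(error / 2).
    """
    half_error = math.radians(phase_error_deg) / 2.0
    return -20.0 * math.log10(math.tan(half_error))

if __name__ == "__main__":
    for error in (0.1, 0.75, 1.0, 5.0):
        value = ideal_sideband_suppression_db(error)
        print(f"{error:5.2f} deg -> {value:5.1f} dB")
```

In this model a 1° error limits the suppression to roughly 41 dB, which illustrates why the components of the phase shifters have to be selected so carefully. Amplitude imbalance, which is ignored here, would degrade a real circuit further.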

The balanced modulators (TCA 240 or µA 796) suppress the carrier frequency; the suppression

can be trimmed with potentiometers P1 and P2 and checked at

points TPA and TPB. With the aid of operational amplifier C the signals of the

two modulators are subtracted so that the upper sideband is present at the amplifier

output. With operational amplifier D the signals of the two modulators are added,

so that the lower sideband is present at the output. The suppression of the

undesired sideband and the carrier is 60 dB.

Fig. 4. Complete diagram of the spectrum rotator.

ELECTRONIC EAR TRUMPET

G.H. van Leeuwen

Before the advent of the electronic hearing aid, the hard-of-hearing would sometimes

use ear trumpets in an attempt to improve perception of speech. At present, too,

remarks have been made to the effect that ear trumpets have certain assets (Groen,

1968). Attention has been drawn on many occasions to their directional sensitivity,

which improves the signal-to-disturbance ratio. Mention is made, too, of the

acoustical gain that goes with such horns.

Before going into the topic of this paper, it is worth while mentioning some

measurements we made of the gain of horns (Van Leeuwen, 1975). It is found to

be roughly equal to the ratio of the areas of the mouth and the throat. With an

external meatus area of 1 cm², a 20 dB gain can be achieved given a mouth area of

100 cm², measured perpendicular to the direction of the sound source. From Olson

(1957) it is clear that good directivity also requires rather a big horn.
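
The rule of thumb above can be written out as a small calculation. The sketch assumes that the quoted gain is the mouth-to-throat area ratio expressed as 10 log10 of that ratio, which reproduces the 20 dB figure in the text; the function names are illustrative only.

```python
import math

def horn_gain_db(mouth_area_cm2, throat_area_cm2):
    """Approximate horn gain in dB from the mouth-to-throat area ratio."""
    return 10.0 * math.log10(mouth_area_cm2 / throat_area_cm2)

def mouth_area_for_gain(gain_db, throat_area_cm2):
    """Mouth area (cm^2) needed for a given gain under the same approximation."""
    return throat_area_cm2 * 10.0 ** (gain_db / 10.0)

if __name__ == "__main__":
    # Example from the text: 100 cm^2 mouth, 1 cm^2 external meatus.
    print(horn_gain_db(100.0, 1.0))
    # Mouth area that the same rule would require for a 10 dB gain.
    print(mouth_area_for_gain(10.0, 1.0))
```

Under this approximation every additional 10 dB of acoustical gain costs a tenfold increase in mouth area, which is one reason why an electronic device can be so much smaller than a horn.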

However, the frequency characteristics of horns are far from ideal, showing a

multitude of resonances and anti-resonances. These characteristics are responsible

for the "metallic" sound quality. We set out to design an electronic device

combining the advantages of the ear trumpet with a flat frequency characteristic

and smaller dimensions. The size should not be too small, however, since, if a

good directional pattern is to be achieved with a high quality microphone element,

dimensions should be of the order of 5 cm. Ease in handling is also an indication

for avoiding extremely small dimensions. Finally, we think it worth while to aim

at a device which can be applied or put away like the trumpet without fumbling.

This, in turn, sets a limit to the permissible amplification at any frequency and

constitutes an additional argument in favour of the directivity of the microphone.

The first prototype, shown in Fig. 1, was made from the left-hand shell of a

Koss Pro-4AA headphone with the dynamic loudspeaker and the liquid-filled

circumaural cushion included. A tube, 6 cm long and 1.6 cm wide, mounted on the shell,

contains a cardioid electret microphone cartridge, Philips LBC 1060/01. This tube

preserves the 20 dB or more front-to-rear ratio of the microphone element in the

bandwidth employed. The electrical amplification is achieved by means of a TAA 370

integrated amplifier with slightly modified additional circuitry (Peeters, 1969).

The low supply voltage, together with the low efficiency of the loudspeaker, limits

the maximal output sound pressure level to about 90 dB. Complete specifications

are listed in Table I. From this the feasibility of the idea is clear, although

improvements should be possible.

Subjective impressions reveal the usefulness of the directivity of the device.

Positive reactions were also obtained from a hard-of-hearing subject. The ear

trumpet will be especially suited to the mildly hard-of-hearing, since amplification

can reach 26 dB at most. Future development will concentrate on ergonomic and sound

quality factors.


Fig. 1. Electronic ear trumpet.

Table I

Microphone input sensitivity at 1 kHz        300 µV/µbar
Microphone front-to-rear ratio at 1 kHz      30 dB
Loudspeaker efficiency for 1 mW at 1 kHz     94.5 dB SPL
Maximum(x) input at 1 kHz                    65 dB SPL
Maximum(x) output at 1 kHz                   90 dB SPL
Supply voltage                               1.5 V
Supply current                               less than 3 mA

(x) Distortion 3 %

references

Groen, J.J. (1968) Slechthorendheid en Hoortoestellen, Stafleu's wetensch. Uitg. Mij., Leiden, p. 64-65.

Leeuwen, G.H. van (1975) Enige metingen aan hoorns, IPO report 274.

Olson, H.F. (1957) Acoustical Engineering, D. van Nostrand Comp. Inc., Princeton, New Jersey, p. 47/108.

Peeters, A.M. (1969) Monolithic Integrated Hearing-Aid Circuit TAA 370, Philips Appl. Information 137.

A MINIATURE EMG DEVICE

J. Vredenbregt and J.H.M. van der Straaten*

Although the small EMG device described by Vredenbregt and Basten (1971) proved to

be very useful in studying the human motor system, we found that its relatively high

weight (45 grammes) was still a disadvantage. During measurements of movements,

artefacts appeared in the EMG signals due to very fast changes in acceleration, as

encountered e.g. in gait studies. These fast changes, together with the relatively

high inertia of the device, resulted in small movements of the device and hence of

the electrodes with respect to the skin. This caused irrelevant low-frequency

components in the EMG signal.

This disadvantage, together with the fact that the high input impedances now

possible contribute to better reproducibility of the recorded electrical activity,

prompted further miniaturisation.

Based on the existing electronic concept, an operational amplifier was made using

the thick-film technique and suitable miniature amplifiers (LM 308, N.S.C.) and

components.

The resulting amplifier fulfils nearly all the electric requirements and is smaller

than a normal dual in-line package.

This part of the set-up was made by the Electronic Department in cooperation with

the Mathematics Department of the University of Nijmegen, The Netherlands.

The small dimensions made it possible to fit and screen the amplifier between the

surface electrodes. The amplifier and the electrodes were put in one rubber suction cup.

The device is shown in its final design in Fig. 1. The dimensions are 45 x 15 x 8 mm

and the weight is 8 grammes. The electrodes are 28 mm apart.

Fig. 1. The miniature EMG device.

* J.H.M. van der Straaten, M.D., Department of Anatomy, University of Nijmegen.


The device can be very easily and quickly placed on the skin by vacuum or with an

elastic band. Intensive preparation of the skin is not required as any change in

skin-electrode impedance is negligibly small compared to the input impedance of the

device. Needle electrodes can be applied instead of surface electrodes if desired.

The characteristics are shown in Table I.

voltage gain (fixed value)                   100
frequency response                           flat within 1% between 18 and 800 Hz
input impedance                              greater than 400 Mohm
output impedance (minimum)                   15 ohm
common mode rejection                        better than 70 dB at 50 Hz
noise level at input side
 - input short-circuited                     5 µV rms
 - input closed with 0.5 Mohm to common      8 µV rms
supply voltage range                         between ± 4 V and ± 15 V dc
maximum output voltage                       29 V
power dissipation                            40 mW (unloaded)

Table I.

The device can be short-circuited without damage. Likewise a high voltage (220 V)

at the electrodes will cause no damage to the circuit.
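
The remark that intensive skin preparation is not required follows from simple voltage division between the skin-electrode impedance and the input impedance of the amplifier. The sketch below treats both as plain resistances and uses 0.1 and 0.5 Mohm as purely illustrative electrode impedances (0.5 Mohm is the source resistance of the noise test in Table I, not a measured skin impedance); 400 Mohm is the lower bound on the input impedance given in Table I.

```python
def signal_loss_percent(electrode_ohm, input_ohm=400e6):
    """Percentage of the EMG voltage lost across the electrode impedance.

    Simple resistive divider: loss = Zs / (Zs + Zin).
    """
    return 100.0 * electrode_ohm / (electrode_ohm + input_ohm)

if __name__ == "__main__":
    for electrode_ohm in (0.1e6, 0.5e6):   # illustrative values only
        loss = signal_loss_percent(electrode_ohm)
        print(f"{electrode_ohm / 1e6:.1f} Mohm -> {loss:.3f} % loss")
```

Under these assumptions even a fivefold change in electrode impedance alters the recorded amplitude by only about 0.1 %, well below the 1 % flatness quoted for the frequency response.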

In Figures 2a and b the frequency response and the phase shift relation are presented.

It may be noted that the latter is less than 20 degrees over the whole flat frequency

range, which is an additional improvement compared to the previous device. Finally,

the device proved to be mechanically very reliable.

Fig. 2. The frequency response (Fig. 2a) and the phase shift (Fig. 2b), plotted against frequency.

reference

Vredenbregt, J. and Basten, C.G. (1971) A Modified Small Electromyograph, IPO Annual Progress Report §., p. 130-131.

5 I.P.O. publications

I.P.O. PUBLICATIONS 1975

P 277 G. Rau en J. Vredenbregt

Het afleiden van het elektromyogram en de toepassing voor kwantificering van de spieractiviteit.

Sport, lichamelijke vorming en wetenschap: een overzicht van het wetenschappelijk onderzoek in Nederland op het gebied van de sport en lichamelijke vorming. Eds. J.E. Hueting en R.A. Binkhorst, Leiden: Meander 1972, p. 83-94.

P 278 B.L. Cardozo

Some notes on frequency discrimination and masking.

Acustica, 1974, ll, p. 330-336.

In order to study frequency discrimination near and at the threshold of masking, a paradigm was used which permitted frequency discrimination and detection of noise-embedded sinusoids to be measured in one experiment. The just noticeable difference in frequency at the threshold of detection was found to be Δf ≈ 16 Hz with durations of the sinusoid of Δt = 64 ms and with Δt = 256 ms. With Δt = 16 ms, Δf ≈ 64 Hz. The frequency of the sinusoids was 1000 Hz in all experiments. The possible implications for a place theory and a periodicity theory of pitch perception are mentioned.

P 279 G. Rau and J. Vredenbregt

Mechanical and electromyographic phenomena during normal finger tremoroscillations.

Paper presented to the 6th Congreso Internacional de Medicina Fisica, Barcelona, 1972. Volume II of the Proceedings, p. 316-336.

P 280 L.P.A.S. van Noorden

Temporal coherence in the perception of tone sequences.

Doctoral Thesis, Eindhoven University of Technology, February 1975.

Whether a tone sequence is perceived as temporally coherent (like a melody) or rather as split up in parts (a phenomenon which we call fission here) depends to a certain extent on the physical properties of the sequence. This is the main theme of the present exploratory study, introduced in chapter 1 along with literature in this field (that can be characterized as a branch of auditory pattern recognition or as auditory Gestalt psychology).

Chapter 2 is devoted mainly to the effect of the pitch interval I between successive tones and the tone repetition time T (the reciprocal of the tempo) in sequences with a simple structure, like ABAB... or ABA.ABA... Different psychoacoustic methods are used to determine the domain of existence of temporal coherence and its boundaries in the I - T plane. With small pitch intervals I, there is always temporal coherence. When I is increased at a given value of T the fission boundary will be reached at a given moment. Beyond this boundary there is a large region in which the way the listener directs his attention determines whether he will hear fission or temporal coherence. When I is increased further, the temporal coherence boundary is eventually crossed and the observer perceives fission irrespective of his attentional set. The temporal coherence boundary for fast tone sequences is situated at smaller tone intervals than for slow sequences.

Chapter 3 describes qualitative experiments showing contiguity of the excitation sites on the basilar membrane to be a necessary condition for temporal coherence. However, this condition is not a sufficient one, as evidenced by the loss of temporal coherence in monotone sequences with sufficiently large loudness differences between successive tones. Qualitative experiments also make it clear that temporal coherence may exist between pure tones and one of the components of a complex tone. This effect might be used to form the basis for investigation of auditory frequency resolution; however, this falls outside the scope of the present investigation. Appendix A shows the feasibility of determining the subjective partial loudness of a component of a complex tone.

The effect of alternating loudness of consecutive tones is further explored in chapter 4, dealing with such problems as the relation between temporal coherence and the continuity effect. In this context we hit upon a phenomenon that, to our knowledge, has not been mentioned in the literature before: when listening to the weak tones in fast sequences of tones with alternating loudness, the observer hears these weak tones with twice their actual tempo. This phenomenon is called the "roll effect".

Chapter 5 deals with less simple tone sequences in which the pitch intervals were chosen at random from a certain set of intervals. The results suggest that anticipation has little or no influence on the ability to hear temporal coherence in fast sequences. The hypothesis is advanced that pitch tracking - which seems to be controlled by the stimulus rather than by the observer - is performed by the auditory system by means of pitch motion detectors. There is no way to test this hypothesis experimentally at present, but results from the study of the analogous field of visual movement which are presented and discussed do make it plausible.

Before returning to the starting points, music and speech, a number of experiments are described in chapter 6 showing that the temporal structure of a tone sequence gets more blurred as the temporal coherence becomes less marked. A possible relation with forward masking is sketched in Appendix B.

In chapter 7 we have tried to fit the present findings into the framework of musical theory, with particular reference to counterpoint rules. Phenomena related to fission that are found to occur in technical speech processing are also briefly discussed.

A retrospect (chapter 8), a glossary and a gramophone record with explanatory comment complete the present study.

P 281 B. Shackel and F.L. van Nes

Ergonomics in Brazil.

Applied Ergonomics, 1975, ~, p. 43-44.

P 282 J.A.J. Roufs

The standard observer: a controversial subject.

Ophtalmologica, 1975, l2l, p. 43-44.

P 283 F.L. Engel

Visibility, conspicuousness and attention.

Ophtalmologica, 1975, 12l, p. 41-42.

P 284 H. Duifhuis

A theory on cochlear nonlinearity and second filter.

Poster presented to the Fifth International Biophysics Congress of the International Union for Pure and Applied Biophysics, Copenhagen, 1975. Poster Abstract 57.

P 285 I.H. Slis

Spraaksynthese door regels.

Nederlands Akoestisch Genootschap, 1975, publ.nr. 32, p. 1-7.

A computer programme is described which contains a system used for the synthesis of reasonably intelligible Dutch. The input of the programme is given in terms of a string of symbols for phonemes and conditions under which the phonemes have to be synthesized (e.g. stress). The output is a set of parameter values for a hardware speech synthesizer (Rockland) which needs information for every period of the fundamental frequency.

P 286 J. 't Hart and R. Collier

Integrating different levels of intonation analysis.

J.Phonetics, 1975, ~, p. 235-255.

This paper deals with a partly experimental approach to the complex relationship that exists between the abstract, global structures of intonation, and the concrete, atomistic features of the course of fundamental frequency. More specifically, we have introduced three levels of description and have attempted to establish links between these: a concrete and atomistic level of the perceptually relevant pitch movements, a concrete and global level of the audible pitch contours and the measurable fundamental frequency curves, and finally, an abstract and global level of the intonation patterns. In the corresponding three main parts of the paper it will be shown (1) how pitch movements can perceptually segment the fundamental frequency continuum; (2) how a "grammar" can be designed that is capable of generating all and only the acceptable combinations of pitch movements, i.e. pitch contours; (3) how listeners categorize different pitch contours into meaningful classes, i.e. intonation patterns. The investigation was based on Dutch utterances.


papers accepted for publication

MS 240 S.G. Nooteboom

On the Internal Auditory Representation of Syllable Nucleus Durations.

To appear in: Auditory Analysis and Perception of Speech.

This paper will report on some perceptual experiments in which subjects are asked to adjust the durations of syllable nuclei in synthesized words according to some internal criterion. The results indicate that the internal, auditory representation of syllable nucleus durations may be more accurate than spectrographic measurements. The internal representation of how words should sound appears to be governed by rather strict timing rules, in which phonological vowel quantity, stress and position in foot and word are major factors. The role of the resulting timing patterns in the auditory processing of speech will be discussed.

MS 241 I.H. Slis

Consequences of Articulatory Effort on Articulatory Timing.

To appear in: Auditory Analysis and Perception of Speech.

Four different effort oppositions have been studied on labial plosives, viz.:

(1) the voiceless-voiced (tense-lax) opposition in /p/ vs. /b/,
(2) initial /b/ before long (tense) and short (lax) vowels,
(3) lip closing of /p/ after short ("scharf geschnittene") and long ("weich geschnittene") vowels, and
(4) stress vs. non-stress in intervocalic /p/.

Lip closing activity was measured on the orbicularis oris muscle and closure duration was measured by means of lip contacts. More effort in the oppositions between voiceless and voiced plosives, lip closing after short and long vowels, and stress vs. non-stress, results in higher closing activity and longer closure duration of the lips. In the fourth opposition, /b/ before long and short vowels, no difference in EMG activity was found with more effort. These results were interpreted as an advancement of the commands with more effort compared to those with less effort.

MS 258 J.J. Andriessen and H. Bouma

Eccentric vision: Adverse interactions between line segments.

To appear in: Vision Research.

The paper deals with adverse interactions between line stimuli in eccentric vision. Both contrast threshold and just noticeable difference of slant have been measured for a test line as a function of the distance from a number of surrounding lines. Test lines were either parallel or perpendicular to the surrounding lines.

It turns out that the interference affects both contrast threshold and j.n.d. of slant with a clear-cut orientational specificity. The surprising result is the extensive spatial range of the interference: between parallel lines it operates over retinal distances of about 0.4 φ_t degrees, where φ_t is the eccentricity of the test line. Large-distance interference limits eccentric spatial vision in daily life much more than classic visual acuity limits would indicate, and makes eccentric vision probably quite different from "unfocussed" foveal vision.

MS 261 F.L. van Nes

Analysis of Keying Errors.

To appear in: Ergonomics.

The performance of keyboard operators can be expressed in terms of keying time and errors; this paper deals with errors. If the causes of errors were known, it might be possible to reduce the percentage of wrong keystrokes. Therefore, an attempt was made to identify these causes by classifying 293 errors, collected in a field study, into seven categories. About 25% of the errors were due to the operator misinterpreting input data; better data presentation may decrease this percentage. At least 40% of the keying errors could be traced to underlying errors in finger movement control, and would not seem amenable to direct error decreasing measures. Automatic punching of repetitive information brings about numerous repetitive errors as well; improved instructions on the use of programmed punching facilities may reduce these errors.

MS 263 H. Bouma

Auditieve Funkties.

To appear in: Nederlands Handboek voor de Psychonomie, Chapter 8.

MS 267 H. Duifhuis

Cochlear nonlinearity and second filter; possible mechanism and implications.

To appear in: J.Acoust.Soc.Amer.

We indicate that the directional sensitivity of the hair cell together with a directional distribution of frequency over the hair cells comprise a possible physiological basis for the second filter. Tuning disparity of first and second filter denotes the difference in tuning frequency; at a given position x the tuning frequency of the first filter is αCF, of the second CF, with α>1. This accounts for the asymmetry in location of two-tone suppression areas. The compressive nonlinearity is described by a νth law device with ν<1. We analyse implications of this model for two-tone suppression, sharpening, pure-tone masking, and combination tone generation. Basic features of these phenomena are described adequately. For combination tones the propagation problem needs further study. On the basis of a comparison of literature data and theoretical predictions we estimate α=1.2, and ν=0.6. Regarding accurate shape of first and second filter, the discussed data provide means for a qualitative evaluation only. Possibilities for a quantitative analysis are indicated.

MS 270 H. Bouma and Ch.P. Legein

Foveal and Parafoveal recognition of letters and words by dyslexics and by average readers.

To appear in: Neuropsychologica.

In adult readers, parafoveal recognition of words is limited by strong interferences between letters.

In the present study subjects were twenty dyslexic children and twenty average readers (9-14 yrs.). Recognition scores of isolated letters, of embedded letters and of words were compared both in foveal and in parafoveal vision.

The groups did equally well on isolated letters whereas the dyslexics generally stayed behind on embedded letters and on words. Individual scores of embedded letters and of words were moderately correlated as were word score and reading level.

It is advocated that research on dyslexia is directed at possible deficits in reading processes such as eye control, word recognition, and storage not only as separate factors but rather in their intimate relationships.

MS 272 S.G. Nooteboom and A. Cohen

Anticipation in speech production and its implications for perception.

To appear in: Structure and Process in Speech Perception; Proceedings of the Symposium on Dynamic Aspects of Speech Perception.

In speech production we find a number of examples indicating that a speaker is anticipating on what is yet to be produced. As such anticipatory behaviour is often reflected in the acoustic structure of speech, it seems reasonable to expect that studies of anticipatory behaviour in speech production may provide clues to the dynamic organisation of speech perception.

In this paper we will discuss some examples of empirical evidence for anticipation in speech production, and their possible implications for speech perception. The examples are taken from three, rather divergent, areas, viz. slips of the tongue, intonational patterning, and temporal organization of speech.

It will be made plausible that the size of chunks of speech material over which anticipation may take place depends on the linguistic level of organization, and that correspondingly, the size of decision units or chunks in speech perception varies with the level of processing.

Some testable hypotheses will be derived as to the perceptual implications of acoustic cues stemming from anticipatory phenomena in speech production. With respect to anticipatory effects in the temporal organization of speech the hypotheses are put to the test. Experimental evidence shows that listeners have an implicit knowledge of such effects, and may actually use this knowledge in resolving perceptual ambiguities.


MS 273 J. 't Hart and R. Collier

The role of intonation in speech perception.

To appear in: Structure and Process in Speech Perception; Proceedings of the Symposium on Dynamic Aspects of Speech Perception.

Recent research of prosodic phenomena has shown a growing interest in the importance of intonational cues as mediators in speech processing.

Our earlier study on systematically occurring pitch events in speech has led to the development of an adequate descriptive system of (Dutch) intonation. In developing a "grammar of intonation" as a device that generates contours for entire utterances as composed of perceptually relevant pitch movements, the need was felt to introduce the "intonational block" as an intermediate unit between pitch movement and contour. The internal structure of the blocks is subject to rather stringent limitations, but their external coupling appears to be almost free. These properties of the blocks suggest that they are more than arbitrary clusters of pitch movements.

In this paper we will concentrate on the possible role of the blocks in the perception of speech. We will try to show that the block boundaries can stand in some relation to the boundaries of syntactic and/or semantic units.

Consequently, the block boundaries may constitute cues for the listener about how to apply a first, rough segmentation of the speech continuum into units that are the most suitable candidates for being processed as a whole.

MS 274 I.H. Slis

Rules for the synthesis of Speech.

Paper presented to the 8th International Congress of Phonetic Sciences.

MS 275 J. 't Hart

Discriminability of magnitude of pitch movements in speech-like signals.


MS 278 S.G. Nooteboom

Context effects in the perception of phonemic vowel length.


MS 279 A. van Katwijk

The Role of Respiratory Effort in Accentuation.



reprints and preprints of i.p.o. publications

Requests for reprints or preprints of publications listed on pages 91 - 96 can be

stated in the form appearing below.

Back numbers and reprints or preprints may be obtained from:

Library
Institute for Perception Research
P.O. Box 513
EINDHOVEN 4502
The Netherlands

type-print or capitals please:

NAME:

FUNCTION and/or

INSTITUTE:

ADDRESS:

CITY:

(STATE/AREA CODE)

COUNTRY:

would appreciate receiving a reprint or preprint of the following publications:

title and author(s):

Date:                    Signature:
