IPO ANNUAL PROGRESS REPORT
Nr.10 1975
Editor: A.J. Breimer
Typist: Helena Koning
INSTITUTE FOR PERCEPTION RESEARCH - INSTITUUT VOOR PERCEPTIE ONDERZOEK
P.O. BOX 513 EINDHOVEN HOLLAND
TELEPHONE: NATIONAL (040) 756605, INTERNATIONAL +31 40 756605
ORGANIZATION I.P.O.
supervisory board (31.12.1975)
Ir. K. Kooij (chairman) - Eindhoven
Dr. J.H. Bannier - 's-Gravenhage
Prof. Dr. W.A.T. Meuwese - Eindhoven
Prof. Dr. J.F. Schouten - Eindhoven
Dr. Ir. K. Teer - Eindhoven
scientific board (31.12.1975)
Prof. Dr. H.B.G. Casimir (chairman) - Heeze
Prof. Ir. R.G. Boiten - Delft
Prof. Dr. Ir. P. Eijkhoff - Eindhoven
Prof. Dr. H.E. Henkes - Rotterdam
Prof. Dr. S.L. Kwee - Eindhoven
Prof. Dr. W.J.M. Levelt - Nijmegen
Prof. Dr. Ir. R. Plomp - Soesterberg
Prof. Ir. O. Rademaker - Eindhoven
Prof. Dr. R.J. Ritsma - Groningen
Prof. Dr. P.C. Veenstra - Eindhoven
Prof. Dr. C.J.D.M. Verhagen - Delft
Dr. Ir. P.L. Walraven - Soesterberg
Prof. Dr. P.J. Willems - Tilburg
Prof. Dr. Ir. A. van Wijngaarden - Amsterdam
director Dr. H. Bouma (from 15-9-1975)
Prof. Dr. C.A.A.J. Greebe (till 15-9-1975)
adviser Prof. Dr. A. Cohen
research associates Ing. D.J.H. Admiraal
Ing. J.J. Andriessen
Ing. H.J. Bleileven
Ir. F.J.J. Blommaert
Drs. D.G. Bouwhuis
Ir. A.J. Breimer
Ir. J.P.L. Brokx
Drs. B.L. Cardozo
Dr. Ir. H. Duifhuis
Dr. J.P.M. Eggermont
Ir. F.L. Engel
J. 't Hart
Ing. Th.A. de Jong
Dr. A.F.V. van Katwijk
Dr. Ch.P. Legein (part-time)
Ing. F.F. Leopold
Ing. G.J.J. Moonen
H.F. Muller
Dr. Ir. F.L. van Nes
Dr. Ir. L.P.A.S. van Noorden (Z.W.O.*)
Dr. S.G. Nooteboom
Drs. J.J. de Rooij (Z.W.O.*)
Dr. Ir. J.A.J. Roufs
Drs. C.W.J. Schiepers (Z.W.O.*)
J.H. Slis
Ing. J.C. Valbracht
Ir. L.L.M. Vogten
Ing. J. Vredenbregt
Ir. L.F. Willems
research staff Ing. M.A. Alewijnse
Th.M. Bos
Ing. E. de Braal
G.J.N. Doodeman
Ing. J.C. Jacobs
C.A. Lammers
Ing. G.H. van Leeuwen
A.W.J.J. Melchers
A.C. van Nes
W.H. Noordermeer
Ing. J.A. Pellegrino van Stuyvenberg
Ing. J. Polstra
A.L.M. van Rens
K.G. van der Veen
Ing. H. de Vries
secretaries Mrs. J.A.C.E. van Esch-van der Vleuten
Mrs. C.J. Mennen-Senkeldam
Mrs. L.J. Savenije-Clignett
Mrs. P. Thiele
Mrs. J.W. Tielemans
Mrs. J.C.G.M. Verbruggen-Jansen
library Mrs. J.M. Hoogervorst
Miss C.W. Koning
workshop M.A. van den Ban
C.G. Basten
J.H. Bolkestein
P.A.N. Broekmans
C.Th.P. Godschalx
H.E.M. Melotte
D.J. van der Wees
* Netherlands Organization for the Advancement of Pure Research (Z.W.O.)
This report or any part thereof may not be reproduced in any form without the
written permission of the Institute for Perception Research. Reprints of the
separate contributions are available. Illustrations may be reproduced only
with explicit mentioning of source; copies will be appreciated.
INTRODUCTION
As new director of the Institute for Perception Research IPO, I gladly take the
opportunity to greet our colleagues and all those interested in the IPO through
the medium of the 10th issue of the IPO Annual Progress Report.
Five members of the scientific board left in 1975, viz. Professor L.B.W. Jongkees,
Professor J. Koekebakker, Professor H. Mol, Professor A.J.B.N. Reichling and
Professor A.J.H. Vendrik, and I should like to express our thanks to them for their
help in improving the quality of our research and in maintaining effective
relationships with other research groups.
For IPO, the year 1975 saw a number of changes, the main one being that
Dr. C.A.A.J. Greebe, who had succeeded Dr. J.F. Schouten in 1972 as director of
IPO, himself left on September 15th. He served IPO, among other things, by his
concentrated efforts in formulating explicit lines of research in close consultation
with the IPO staff. In this programme, an increased effort is foreseen in
cognitive aspects of perception. Our Institute is greatly indebted to Dr. Greebe,
and I consider myself fortunate in continuing along the research lines drawn up
by him.
Other members of IPO who left us this year are Linda Savenije-Clignett, Jos Tielemans
and Messrs Alewijnse, Andriessen, v.d. Ban, Eggermont, Engel, van Noorden, Noordermeer,
Schiepers, Slis and Vredenbregt. IPO was fortunate in having them as colleagues and
we are grateful for their contributions both to the work and to the comradely
atmosphere of the Institute. We are glad to say that we can maintain close
relationships with some of them at least.
As will be seen from the contents of this Progress Report, we have all done our
best to limit the impact of these changes on the work. Our research potential will
soon be strengthened by the advent of new members of staff. The absence of reports
on ergonomic subjects is purely accidental.
In the scientific field, I wish to draw attention in particular to two papers,
published this year (see Publications list on p. 91). Dr. van Noorden, in his
dissertation, worked out an original approach to the audition of perceptually
ambiguous tone sequences. Mr. 't Hart and Dr. Collier have published their inspiring
continuation of the early work on intonation by Cohen and 't Hart, based on the
melodic line of spoken sentences, which has now led to a grammar of Dutch intonation
in which a number of hierarchical levels are distinguished. Furthermore, I should
like to mention the "Dynamic Aspects of Speech Perception" conference, on
which occasion IPO was happy to provide a meeting place for linguists, physicists
and psychologists, all interested in the perception of running speech. A general
survey of the symposium is given in a separate contribution by Dr. Cohen and
Dr. Nooteboom, who have assumed responsibility for the Proceedings which have
already appeared.
Both in pure and in applied science in the field of human perception, we wish to
contribute to a basic understanding of the processes involved. Close contacts, both
with our colleagues abroad and with those in the Netherlands, are essential for this
purpose and, indeed, a source of encouragement to ourselves.
H. Bouma
CONTENTS                                                            page
organization                                                          II
introduction                                                           V
contents                                                              VI
1 auditory perception
B.L. Cardozo, H. Duifhuis, G.H. van Leeuwen,
L.P.A.S. van Noorden and L.L.M. Vogten
Auditory Research in 1975                                              2
L.P.A.S. van Noorden
Temporal Coherence and the Perception of
Temporal Position in Tone Sequences                                    4
H. Duifhuis
Psychophysical Two-Tone Suppression                                   19
2 speech
S.G. Nooteboom, J.P. Brokx, G.J.N. Doodeman,
Th.A. de Jong, J. 't Hart, A.F.V. van Katwijk,
J.J. de Rooij, J.H. Slis and L.F. Willems
Research on Speech Perception in the IPO 1975                         25
J. 't Hart
The Location of the Non-Final Fall in Pitch
Contours in Dutch                                                     27
J.J. de Rooij
Prosody and the Perception of Syntactic
Boundaries                                                            36
A.F.V. van Katwijk
Accent Patterns in Number Name Sequences                              40
A. Cohen and S.G. Nooteboom
A Symposium on Dynamic Aspects of Speech Perception,
IPO, August 4-6, 1975                                                 45
3 visual perception
J.A.J. Roufs, J.J. Andriessen, Th.M. Bos, H. Bouma,
F.L. Engel, Ch.P. Legein, J.A. Pellegrino van Stuyvenberg,
A.L.M. van Rens and C.W.J. Schiepers
Research on Vision 1975                                               49
H. Bouma and D.G. Bouwhuis
Word Recognition and Letter Recognition                               53
J.A.J. Roufs and F.J.J. Blommaert
Pulse and Step Response of the Visual System                          60
F.L. Engel and Th.M. Bos
Small Involuntary Eye Movements                                       68
H. Bouma, Ch.P. Legein and A.L.M. van Rens
Visual Recognition by Dyslectic Children                              72
4 instrumentation
D.J.H. Admiraal
IPO Instrumentation 1957-1975                                         80
A.C. van Nes
A Speech Spectrum Rotator                                             83
G.H. van Leeuwen
Electronic Ear Trumpet                                                86
J. Vredenbregt and J.H.M. van der Straaten
A Miniature EMG Device                                                88
5 i.p.o. publications 1975                                            91
1 auditory perception
AUDITORY RESEARCH IN 1975
B.L. Cardozo, H. Duifhuis, G.H. van Leeuwen, L.P.A.S. van Noorden and L.L.M. Vogten
This introduction to the auditory section is intended as a background to research
in the field. The objectives are at three levels. The basic one is to obtain
knowledge of "how the ear works". A second level may be briefly described as
research intended as support to speech research. A third level of research objectives
aims at capitalising on the first and second levels and involves projects of a more
applied nature.
It has been stated on other occasions that, from the viewpoint of basic research,
knowledge of the functioning of hearing should acknowledge physiological findings.
In this area there have been important developments in recent years. Physiologists
have supplied us with important new data both on the hydromechanics of the cochlea
and on the physiology of the more peripheral stages of the auditory nervous system.
Although these data refer to animals, mostly under anaesthesia, and one is not,
as a rule, in a position to measure activity in more than one single neuron at a
time, these data represent important reductions in the set of possible models of
auditory information processing. We want to stress that these reductions are based
on what we do know about auditory physiology. On the other hand, the map of the
ear's physiology shows a vast terra incognita and it would be unfortunate indeed
to be obliged to wait until this area had been fully explored before any elaborate
model of the hearing mechanism could be ventured.
Having indicated the restrictions of auditory physiology, it is but fair to recall
the limitations of psychoacoustics, in which most experimentation is performed
with relatively simple stimuli in relatively simple, threshold-like paradigms that
allow relatively firm criteria to be used by listeners. We must bear in mind that
the relatively well-defined, more or less quantitative models based on psychoacoustic
data depict only a very narrow rim of human audition.
In terms of the ultimate objective of basic research on audition, progress is slow.
Some new results are presented in a paper by Duifhuis in the present issue. Vogten
has been investigating the consequences of his interpretation (Vogten, 1974) of
the low-level Maximum Masking Frequency shift (MMF). Pilot measurements of the
pulsation threshold (Houtgast, 1973) have been carried out at low stimulus levels
in order to check the interpretation of MMF at low levels in terms of two-tone
suppression. Unfortunately, at these low levels the reproducibility of the
measurements is so poor that it is very difficult to draw conclusions.
On a second level of research, we aim at obtaining a more qualitative description
of auditory phenomena less amenable to threshold-like paradigms. This type of
investigation can still be regarded as psychoacoustic, but the criteria for the
listener are less firm and the experimental data tend to show greater variance, so
that interpretation in terms of auditory models is difficult indeed. The study of
temporal coherence in tone sequences (van Noorden, 1975 and this issue) is a good
example. In this domain, there are many problems that could be tackled, e.g. the
perception of accent in tone sequences. This work is primarily descriptive. The
important thing is to analyse perception and try to find elementary percepts and
categories of perception, in the hope of bridging the gap between psychoacoustics
and perceptual phonetics.
The third level of research comprises a multitude of practical problems, such as
the perception and perceptual evaluation of noises, factors contributing to the
intelligibility of speech and the development of an acoustic amplifier in an attempt
to optimize certain factors (e.g. signal-to-background ratio) for the benefit of
the mildly hard of hearing (van Leeuwen, this issue).
references
Houtgast, T. (1973) Psychophysical experiments on "tuning curves" and "two-tone inhibition", Acustica, p. 168-179.
Vogten, L.L.M. (1974) Low-level pure-tone masking and two-tone suppression, IPO Annual Progress Report 9, p. 22-31.
TEMPORAL COHERENCE AND THE PERCEPTION OF TEMPORAL POSITION IN TONE SEQUENCES
L.P.A.S. van Noorden
introduction
It has been shown by a number of authors (Bregman and Campbell (1971), Schouten
(1962), Thomas and Fitzgibbons (1971) and Wilcox (1972)) that it is difficult to
report the temporal order of the elements of a cyclic, repeated tone sequence if
both the rate of presentation and the frequency intervals between the successive
tones have high values. It has been proposed that this is due to the fact that
the successive tones are not perceived as one coherent whole, but that they fall
into groups according to their frequency region.
In Van Noorden (1975) it has been shown that a maximum frequency interval can be
found between the tones A and B of the alternating tone sequence ABAB.. at which this
tone sequence can still be perceived as one coherent whole (i.e. the temporal
coherence boundary). This boundary was found to depend on the tone repetition
time T in such a way that the boundary increases with increasing T. When the tone
sequence is perceived as split up into two strings A.A.. and B.B.., the observer has
the impression that he cannot tell the precise position of the tones B relative
to the tones A.
In this article some experiments are presented with the aim of discovering whether
the loss of temporal acuity in the perception of tone sequences is caused by the loss
of temporal coherence. We shall not study the perception of temporal order but
the perception of the precise temporal relations between the tones, as these can
be measured in the simple tone sequence ABAB.., for which we have precise measurements
of the temporal coherence boundary. Furthermore, we shall determine the temporal
acuity in a) sequences of only 2 or 3 tones, in which we have shown that the temporal
coherence boundary lies at much larger frequency intervals, and b) dichotically
alternating tone sequences.
the alternating tone sequence ABAB..
method
We used a tracking method to measure how precisely the temporal position of the
tones B with respect to the tones A can be observed in the tone sequence ABAB ..
Starting with the continuous tone sequence ABAB.. described in Van Noorden (1975),
we made the tone repetition time of the tones B (TB) about 1% smaller or larger
than that of the tones A (TA); as a result, the tones B gradually shift away from
the midpoint between the tones A. The observer is now asked to depress a push
button as soon as he perceives that the tones B are no longer precisely half-way
between the tones A. At this moment the relative positions of tones A and B are
recorded and TB is changed from TA + 1% to TA - 1% or from TA - 1% to TA + 1%,
so that the tones B move back through the middle and so on. In other words, the
tones B oscillate continuously about the midpoint between the tones A during the
experiment (see Fig. 1).
Fig. 2. Set-up for the tracking experiment.
Fig. 1. Form of the stimulus used for investigation of the perception of the temporal position of the tones B in the continuous tone sequence ABAB.. by a tracking method. The repetition time of the tones A (TA) is equal to 2T. The repetition time of the tones B (TB) is equal to 2T - 1% or 2T + 1%. The sign changes after each response of the observer. ΔT' and ΔT'' represent the just noticeable displacement of the tones B from the midpoints between the tones A.
This measuring method resembles that used by Von Békésy for the semi-automatic
tracking of the auditory threshold, but the criterion that the observer has to use
in our experiments is more difficult than the criterion of whether or not one can
hear a tone. Moreover, in the auditory threshold measurements the mean of the two
reversal points is taken as a measure of the threshold, while in our measurements
we are interested in the distance between the reversal points. For these reasons,
among others, completely unbiased results cannot be expected from these
measurements; however, the speed and directness of this method counted as
advantages in this exploratory investigation.
The distance 2ΔT = ΔT' + ΔT'' between the reversal points on the two sides of the
half-way position can be taken as a measure of the accuracy with which the observer
can perceive the position of tones B with respect to tones A. The measuring set-up
is shown in Fig. 2.
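The tracking procedure and the measure 2ΔT can be illustrated with a small simulation. This is a minimal sketch, not the original apparatus: it assumes an idealized, noise-free observer who responds the moment the shift of B reaches a fixed threshold, and it ignores response latency. The function name simulate_tracking and the thresholds jnd_fwd and jnd_bwd are hypothetical. Per cycle of the A tones, the tones B drift by 1% of TA = 2T, i.e. by 0.02T.

```python
def simulate_tracking(T, jnd_fwd, jnd_bwd, n_reversals=16, drift_pct=1.0):
    """Deterministic sketch of the tracking procedure for ABAB...

    T       -- tone repetition time (ms); the A tones repeat every TA = 2T
    jnd_fwd -- hypothetical observer's threshold for a forward (earlier) shift, ms
    jnd_bwd -- hypothetical observer's threshold for a backward (later) shift, ms
    Returns the mean reversal distances dT', dT'' and their sum 2dT.
    """
    TA = 2 * T
    step = TA * drift_pct / 100   # drift of B per A cycle when TB = TA -/+ 1 %
    shift = 0.0                   # displacement of B from the midpoint; < 0 = forward
    direction = -1                # start with TB = TA - 1 %: B drifts forward
    reversals = []
    while len(reversals) < n_reversals:
        shift += direction * step
        # The idealized observer presses the button as soon as the shift
        # reaches the threshold on the side towards which B is moving.
        if direction < 0 and -shift >= jnd_fwd:
            reversals.append(shift)
            direction = +1        # TB switched to TA + 1 %
        elif direction > 0 and shift >= jnd_bwd:
            reversals.append(shift)
            direction = -1        # TB switched back to TA - 1 %
    fwd = [-s for s in reversals if s < 0]
    bwd = [s for s in reversals if s > 0]
    dT_fwd = sum(fwd) / len(fwd)
    dT_bwd = sum(bwd) / len(bwd)
    return dT_fwd, dT_bwd, dT_fwd + dT_bwd

print(simulate_tracking(T=100, jnd_fwd=10, jnd_bwd=14))   # -> (10.0, 14.0, 24.0)
```

With T = 100 ms the drift is 2 ms per A cycle, so in this sketch reversals can only fall on multiples of 2 ms: a threshold of 11 ms is recorded as 12 ms. A real observer's reaction time would add further overshoot, since B keeps drifting until the button is pressed.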
[Fig. 1 diagram: tones A and B along a TIME axis, with TA = 2T and TB = 2T + 1% or 2T - 1%; the observer's responses mark the reversal points at ΔT' and ΔT''.]
observations
Before discussing the measurements, we shall describe what the observer can hear
when he listens to the tone sequence in which the tones B are shifted with respect
to the tones A. We used a tone duration of 40 ms, trapezoidal burst envelopes and
a level of about 35 dB SL, with fB = 1000 Hz and fA and T variable.
In slow tone sequences (T = 400 ms) in which the temporal coherence of the tones
A and B can always be heard, we hear a rhythmic change in the tone sequence
(comparable with the structure of iambic or trochaic verse). The size of the tone
interval between A and B does not change this effect essentially. The situation
is different with slightly faster tone sequences (T about 160 ms). Fission now
occurs when the tone interval is large enough. In this case it is more difficult
to perceive the position of the tones B, but with large shifts away from the
half-way position we perceive a sequence of groups AB AB instead of two separate
strings A.A.. and B.B... The separation by pitch is, as it were, replaced by temporal
separation into groups of two tones. When the tone interval is smaller, the
transition from isochronous sequence to trochee or iamb can still be clearly heard.
In very fast tone sequences (T about 100 ms), we hear that B is no longer in the
half-way position, not so much on the basis of temporal differences but rather as
a consequence of subtle differences in the tone bursts themselves: the starts of
the tones seem to be either more or less gradual or staccato, or to differ in
level. These differences are more marked at small tone intervals. At large tone
intervals, the phenomenon of separation into groups AB AB can be observed.
measurements
We used the above-mentioned tracking method to measure the just perceptible
displacement of the tones B from half-way between the tones A in the continuous tone
sequences ABAB.., for a number of values of the tone interval I and the tone
repetition time T. The tone duration was always 40 ms and the tone bursts had
trapezoidal envelopes with rise and fall times of 5 ms; fB was 1 kHz and fA was variable.
The level of both tones was the same and was chosen so as to give about 35 dB SL
for the isochronous tone sequence with T = 100 ms, D = 40 ms and fA = fB = 1 kHz.
Values of 62, 82, 101, 120, 158, 202, 278 and 398 ms were taken for the tone
repetition time, and of 0, 1, 2, 3, 5, 7, 10, 14, 19 and 25 semitones for the tone
interval I. About 8 reversal points on each side of the middle position were
determined in succession for each tone sequence with given values of T and I. With a
given value of T, all values of I were dealt with in a random order; this took
about 20 minutes. The various values of T were also dealt with in a random order.
In order to test the reproducibility, this whole set of measurements was repeated.
In all, about 5000 reversal points were determined in the course of a week. All
these measurements were carried out by one observer, the author. However, pilot
experiments indicated that different observers get comparable results.
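The randomized measuring order described above can be sketched as follows. The sketch assumes, since the text does not say, that the order of T and of I was reshuffled for the repeated set; the function name make_schedule and the seed are illustrative only.

```python
import random

T_VALUES = [62, 82, 101, 120, 158, 202, 278, 398]   # tone repetition times T (ms)
I_VALUES = [0, 1, 2, 3, 5, 7, 10, 14, 19, 25]        # tone intervals I (semitones)

def make_schedule(n_sets=2, seed=None):
    """Run order: the values of T in random order and, within each T, all I in random order.

    The complete set is run n_sets times (twice in the text) to test reproducibility.
    Reshuffling T and I for the repeated set is an assumption of this sketch.
    """
    rng = random.Random(seed)
    schedule = []
    for set_no in range(n_sets):
        for T in rng.sample(T_VALUES, len(T_VALUES)):        # random order of T
            for I in rng.sample(I_VALUES, len(I_VALUES)):    # random order of I within T
                schedule.append((set_no, T, I))
    return schedule

schedule = make_schedule(seed=1975)
print(len(schedule), "runs, first three:", schedule[:3])
```

Keeping all values of I together within one value of T matches the text, where one value of T took about 20 minutes to complete.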
results
The results of the measurements are presented in Fig. 3. The differences between the
first and second sets of measurements were slight; we therefore used their average.
It may be seen from Fig. 3 that the distance between the reversal points 2ΔT
increases with increasing frequency interval I; however, the extent of this increase
depends strongly on the value of T. Regression lines have been drawn through the
mean reversal points as functions of the tone interval. The intercepts of these
lines on the T-axis and their slopes, together with the corresponding correlation
coefficients, are given in Table 1. In most cases the absolute value of the
correlation coefficient is greater than 0.9.
The strongest dependence of the value of 2ΔT on the tone interval was found at
T = 120 ms, the weakest at T = 398 ms. It is further striking that when T is less
than or equal to 120 ms, a systematic difference is found between the just
observable shifts forward and backward from the middle. The spread in the measured
values increases slightly with 2ΔT and varies roughly between 2 and 15 ms.
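The forward and backward displacements can be read off the regression lines of Table 1. The following sketch assumes that the intercepts and slopes describe the A-to-B interval T1 at the reversal points as a linear function of I (consistent with the table footnote F: T1 < T, B: T1 > T), so that ΔT' = T - T1(F) and ΔT'' = T1(B) - T. It only evaluates the fitted lines; it does not reproduce the measured points. The names TABLE1 and displacements are illustrative.

```python
# Regression lines of Table 1: T (ms) -> (intercept F, slope F, intercept B, slope B).
# Intercepts in ms at I = 0, slopes in ms per semitone; F: T1 < T, B: T1 > T.
TABLE1 = {
    62.4: (56, -1.39, 66, 0.86),
    81.6: (70, -1.70, 90, 0.79),
    100.8: (89, -1.92, 112, 0.71),
    120.0: (105, -2.50, 135, 1.26),
    158.4: (145, -1.98, 171, 1.45),
    201.6: (188, -1.43, 216, 1.65),
    278.4: (260, -0.37, 295, 0.73),
    398.4: (379, -0.29, 421, 0.21),
}

def displacements(T, I):
    """Forward (dT') and backward (dT'') displacements given by the Table 1 lines."""
    iF, sF, iB, sB = TABLE1[T]
    dT_fwd = T - (iF + sF * I)   # forward reversal: T1 lies below T
    dT_bwd = (iB + sB * I) - T   # backward reversal: T1 lies above T
    return dT_fwd, dT_bwd

for T in TABLE1:
    f, b = displacements(T, 25)
    print(f"T = {T:5.1f} ms: dT' = {f:5.1f} ms, dT'' = {b:5.1f} ms, difference {f - b:5.1f} ms")
```

At T = 120 ms and I = 25 semitones, for example, the lines give ΔT' = 120 - (105 - 2.50 × 25) = 77.5 ms and ΔT'' = 135 + 1.26 × 25 - 120 = 46.5 ms, which illustrates the forward/backward asymmetry noted for T ≤ 120 ms.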
Fig. 3. The just noticeable forward displacement ΔT' and backward displacement ΔT'' of the tones B from the midpoints between the tones A in the tone sequence ABAB.. as functions of the tone interval I, with tone repetition time T as parameter. The experimental points are the mean of 15 determinations. At I = 0 and I = 25 semitones the spread of the observations (± standard deviation) is indicated by horizontal bars. The lines through the experimental points correspond to the regression data of Table 1. As can be seen, at large values of T the values of ΔT' and ΔT'' do not depend on I, as they do at smaller values of T. For T ≤ 120 ms, there is considerable asymmetry between ΔT' and ΔT''.
Table 1
                r 1)           intercept 2) (ms)     slope 3) (ms/semitone)
T (ms)       F       B          F       B              F        B
 62.4      -.95     .96         56      66           -1.39     0.86
 81.6      -.99     .96         70      90           -1.70     0.79
100.8      -.99     .97         89     112           -1.92     0.71
120.0      -.97     .94        105     135           -2.50     1.26
158.4      -.96     .87        145     171           -1.98     1.45
201.6      -.94     .94        188     216           -1.43     1.65
278.4      -.64     .91        260     295           -0.37     0.73
398.4      -.54     .50        379     421           -0.29     0.21
1) correlation coefficient
2) intercept of regression line (I = 0)
3) slope of regression line
F: T1 < T; B: T1 > T
discussion
In order to compare these results with those obtained for the temporal coherence
boundary, we have presented them in a different way in Fig. 4. In this figure the
relative just perceptible shift ΔT/T is plotted as a function of T with the
frequency interval I as parameter. The mean of the forward and backward shifts from
the middle, ΔT = (ΔT' + ΔT'')/2, is taken.
We can read this graph as follows. The longest time intervals give the sharpest
perception of shifts from the middle, which does not depend much on the tone
interval. As T is reduced, ΔT/T increases, the more so the larger I is. This
increase in ΔT/T continues until T = 120 ms, when a constant value is reached. The
results for T < 120 ms can be approximated by the expression ΔT/T = a + b·I,
where a = 0.11 and b = 0.016 per semitone. These values were found with the aid
of a chi-square grid method. With these values the linear correlation coefficient
between the measured points and the predicted points amounts to 0.969, which means
that 94% (0.969² ≈ 0.94) of the variation in the experimental points can be
explained with this model.
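A chi-square grid method can be sketched as a brute-force search over a grid of (a, b) values for ΔT/T = a + b·I. The measured points of Fig. 3 are not tabulated, so, purely for illustration, this sketch uses points reconstructed from the Table 1 regression lines for T = 62.4 to 120 ms, with unit weights (the weights used originally are not stated). It assumes the same reading of Table 1 as above (ΔT' = T - T1(F), ΔT'' = T1(B) - T). With such data the search should land near, but not necessarily on, the published a = 0.11 and b = 0.016 per semitone. The last lines check the arithmetic behind the 94% figure.

```python
import itertools

# Table 1 lines for the faster sequences:
# T (ms) -> (intercept F, slope F, intercept B, slope B); ms and ms/semitone.
TABLE1_FAST = {
    62.4: (56, -1.39, 66, 0.86),
    81.6: (70, -1.70, 90, 0.79),
    100.8: (89, -1.92, 112, 0.71),
    120.0: (105, -2.50, 135, 1.26),
}
INTERVALS = [0, 1, 2, 3, 5, 7, 10, 14, 19, 25]   # tone intervals I (semitones)

def rel_jnd(T, I):
    """Mean relative displacement dT/T = (dT' + dT'')/(2T) on the Table 1 lines."""
    iF, sF, iB, sB = TABLE1_FAST[T]
    dT_fwd = T - (iF + sF * I)
    dT_bwd = (iB + sB * I) - T
    return (dT_fwd + dT_bwd) / (2 * T)

def chi_square(a, b, points):
    """Unweighted chi-square (sum of squared residuals) for dT/T = a + b*I."""
    return sum((y - (a + b * I)) ** 2 for I, y in points)

def grid_fit(points, a_grid, b_grid):
    """Return the grid point (a, b) with the smallest chi-square."""
    return min(itertools.product(a_grid, b_grid),
               key=lambda ab: chi_square(ab[0], ab[1], points))

points = [(I, rel_jnd(T, I)) for T in TABLE1_FAST for I in INTERVALS]
a_grid = [k / 1000 for k in range(50, 201)]   # a = 0.050 ... 0.200
b_grid = [k / 2000 for k in range(20, 51)]    # b = 0.0100 ... 0.0250 per semitone
a, b = grid_fit(points, a_grid, b_grid)
print(f"grid fit: a = {a:.3f}, b = {b:.4f} per semitone")

r_squared = 0.969 ** 2   # fraction of variance explained for r = 0.969
print(f"r^2 = {r_squared:.3f}")
```

A finer grid, or a local refinement around the best grid point, would sharpen the estimate; the grid search has the advantage of needing no derivatives and of showing the whole chi-square landscape if it is tabulated.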
In order to show the connection with the temporal coherence boundary (Van Noorden,
1975), we have indicated in Fig. 4 the value of T at which the temporal coherence
boundary is found for the value of I in question. It will be seen that for frequency
intervals of ≥ 3 semitones ΔT/T increases on passing the temporal coherence
boundary. At the coherence boundary ΔT/T has a value of about 20%; this is roughly
three to four times the smallest values found (5 to 7% at T = 400 ms). The gradual
manner in which ΔT/T increases as T decreases is in agreement with the gradual
nature of the temporal coherence boundary.
Fig. 4. The mean relative just noticeable displacement ΔT/T = (ΔT' + ΔT'')/2T of the tones B in the tone sequence ABAB.. as a function of the tone repetition time T with I as a parameter. These data are derived from those of Figure 3. The set of curves is constructed by using the formula ΔT/T = a + bI with a = 0.11 and b = 0.016 per semitone for T < 120 ms, and by drawing the best smooth curve through the experimental points by eye for larger values of T. The dotted area indicates the values of T at which, for a given value of I, the temporal coherence boundary is found. As can be seen, ΔT/T increases at the temporal coherence boundary when we follow the curve for a given value of I from large to small values of T.
dichotic tone sequences
In Van Noorden (1975) it was shown that fission occurs in dichotic tone sequences
with fA = fB and T less than about 150 ms. The experiments described below were
performed in order to see whether the perception of the relative temporal position
also deteriorated under these conditions. We used the tracking method of the first
section, except that only the forward shift from the middle was determined. The
direction of shift of the tones B with respect to the tones A was reversed
automatically as soon as the middle was reached. The experiments were performed on
the tone sequences ABAB.. and ABA ABA....
The results are plotted in Figure 5. The experimental points are the medians of ten
reversal points. It will be seen that the perception of relative temporal position
is less accurate with dichotic presentation than with diotic. However, this difference
is smaller at large T than at small T. The results with dichotic presentation
deteriorate compared with those for diotic presentation between 160 and 140 ms. This is in
good agreement with the value of 150 ms of Van Noorden (op. cit.) for the loss of
temporal coherence with dichotic presentation, and with the minimum repetition time
of 172 ± 25 ms for the ability to follow apparent movement of sounds from left to
right and vice versa found by Blauert (1970).
Here again, loss of temporal coherence is accompanied by deterioration in the
accuracy of perception of relative temporal position. It may further be noted that
perception of the relative temporal position is more difficult in the tone sequence
ABA ABA... than in the sequence ABAB...
Fig. 5. The just noticeable relative forward displacement ΔT/T of the tone B in the tone sequences ABAB.. and ABA ABA... in dichotic and diotic presentation. The dotted area indicates the values of T at which temporal coherence is lost in dichotic sequences.
[Fig. 5 plot: two panels, ABAB.. and ABA ABA..., fA = fB; ΔT/T for dichotic and diotic presentation versus TONE REPETITION TIME T (ms), 0 to 250 ms.]
Huggins (1974) has measured the perceived rate of dichotically alternating sequences
of clicks compared with the perceived rate of sequences of binaural clicks. He
finds values for the interpulse intervals between 70 and 100 ms below which the rate
of the dichotic sequence seems to be half that of the binaural sequence. The
disagreement with the results mentioned above is perhaps due to the difference in task.
short tone sequences
In the initial phase of our investigation many of the measurements (van Noorden,
1971a, 1971b; Linssen, 1973; Augustus and Nederhand, 1973) were devoted to the
perception of the relative temporal position of the tones in short tone sequences.
It seemed at first as if the results of the measurements were hardly compatible
with the observer's impression that he could not perceive the relative temporal
position of the tones well in cases of fission.
However, later experiments (Van Noorden, 1974) showed that temporal coherence can
be perceived over larger tone intervals in short tone sequences than in continuous
ones. We shall now discuss briefly a couple of these measurements.
The just noticeable displacement of the second tone in a two-tone sequence with
a tone interval between the tones was determined with the aid of a binary-choice
method. The observer was presented with two tone pairs. In the second pair, the
time interval between the tones was either the same as that for the first pair (T)
or shorter (T - ΔT) (see Fig. 6). The observer now had to say whether there was a
displacement in the second pair. We determined the value of ΔT at which 75% of
the responses were correct with a sequential up-and-down method such as can be
performed with the aid of the IPO threshold tester (Cardozo and de Jong, 1971).
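A sequential up-and-down rule converging on 75% correct can be sketched as follows. The exact rule of the IPO threshold tester is not given here, so the sketch substitutes a weighted up-down rule: after a correct answer ΔT is decreased by one step, after an error it is increased by three steps, which makes the track settle where the probability of a correct answer is 0.75. The simulated observer, its psychometric function (50% guessing rising to 100%) and all parameter values are hypothetical.

```python
import math
import random

def p_correct(dT, alpha=5.0, beta=2.0):
    """Hypothetical psychometric function: 50 % at dT = 0, rising towards 100 %."""
    return 0.5 + 0.5 * (1.0 - math.exp(-(dT / alpha) ** beta))

def weighted_up_down(n_trials=3000, start=15.0, down=0.25, seed=0):
    """Weighted up-down track for dT (ms), converging on 75 % correct.

    The expected step is down * (3 - 4p), which is zero at p = 0.75.
    """
    rng = random.Random(seed)
    up = 3 * down
    dT = start
    levels = []
    for _ in range(n_trials):
        levels.append(dT)
        correct = rng.random() < p_correct(dT)
        dT = max(0.0, dT - down) if correct else dT + up
    tail = levels[n_trials // 2:]          # discard the initial approach
    return sum(tail) / len(tail)

estimate = weighted_up_down()
true_75 = 5.0 * math.sqrt(math.log(2.0))  # where p_correct = 0.75 (about 4.16 ms)
print(f"staircase estimate {estimate:.2f} ms, true 75 % point {true_75:.2f} ms")
```

The ratio of the up and down steps fixes the target percentage: for a target p the up step must be p/(1 - p) times the down step, so 3 for 75%.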
Fig. 6. Form of the stimulus of the forced-choice experiment for measuring the just noticeable displacement in time of the second tone in a two-tone sequence. In the second pair the forward displacement with respect to the first pair is zero or ΔT, both with equal probability.
We chose T = 100 ms, D = 50 ms and fB = 500, 1000 or 2000 Hz, while fA was given
various values between fB/4 and 4fB. The threshold was determined in duplicate for
a number of different ratios fA/fB by trained observers. In the successive threshold
determinations at a given value of fB, the values of fA were varied throughout
the range of interest, once upwards and once downwards. The level of the tone
bursts was 35 dB SL at 1 kHz.
The results of the measurements for fB = 1000 Hz are plotted in Fig. 7; the results
for fB = 2000 Hz were basically the same.
It follows from these measurements that the accuracy with which the relative tempo
ral position can be determined depends on the tone interval. The best results are
obtained when fA = fB; the just noticeable displacement is then 5 ms. As the tone
interval increases, the just noticeable difference increases gradually to about
15 ms at a tone interval of 2 octaves. These results seem to agree with the findings
in Van Noorden (1975), that temporal coherence can be observed over a much greater
tone interval in two-tone sequences than in continuous sequences. The value of
15 ms mentioned above is much smaller than the 50 ms found for the same values of
T and I in continuous tone sequences; however, we cannot place too much reliance on
the difference because of the differences in the measuring methods used in the
two cases.
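The comparison of 15 ms with about 50 ms can be checked against the expression for continuous sequences, ΔT/T = a + b·I with a = 0.11 and b = 0.016 per semitone, which holds for T < 120 ms and hence at T = 100 ms. Two octaves correspond to I = 24 semitones. The sketch below only does this arithmetic; the 15 ms value is taken from the text and the function name continuous_jnd_ms is illustrative.

```python
A = 0.11     # constant term of dT/T = a + b*I (continuous ABAB.., T < 120 ms)
B = 0.016    # slope per semitone

def continuous_jnd_ms(T_ms, I_semitones):
    """Mean just noticeable displacement dT (ms) in ABAB.. from dT/T = a + b*I."""
    return T_ms * (A + B * I_semitones)

I_two_octaves = 24            # 2 octaves = 24 semitones
two_tone_jnd_ms = 15.0        # two-tone sequence at 2 octaves, as stated in the text
cont_jnd_ms = continuous_jnd_ms(100, I_two_octaves)
ratio = cont_jnd_ms / two_tone_jnd_ms
print(f"continuous ABAB..: {cont_jnd_ms:.1f} ms; two-tone: {two_tone_jnd_ms:.0f} ms; "
      f"ratio {ratio:.1f}")
```

This gives ΔT ≈ 49 ms for the continuous sequence, roughly three times the 15 ms of the two-tone sequence; as noted above, the different measuring methods limit how much weight this ratio can bear.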
[Fig. 7 plot: ΔT/T versus I (semitones) at T = 100 ms, for the two-tone sequence AB and the continuous sequence ABAB.]
Fig. 7. The just noticeable displacement ΔT/T of the second tone of a two-tone sequence as a function of the tone interval I. The mean of the cases fA < fB and fA > fB has been taken. The results for the just noticeable displacement of the tones B in a continuous tone sequence ABAB at T = 100 ms, taken from Figure 4, are included for the sake of comparison. As can be seen, in the two-tone sequence the displacement can be perceived more readily than in a continuous tone sequence.
In order to permit better comparison of the accuracy of observation in short and
continuous tone sequences, we have determined both with the tracking method referred
to in the first section.
It was convenient to perform these measurements with the tone sequence ABA ABA ...
The just noticeable shift of the tones B with respect to the tones A in this
sequence can then be compared with that found in the tone sequence obtained by
omitting every other group ABA. The remaining ABA groups are then so far apart that
they are perceived as separate short tone sequences.
The advantage of this method is that we can now use the same measuring conditions
for both tone sequences. If we shift only tone B in every other group ABA in the
continuous sequence, this means that the tone B takes just as long to move a given
distance from the middle in the two sequences; see Fig. 8. We only measured the
just noticeable shift of B in the forward direction.
[Fig. 8: stimulus diagram along a TIME axis, with intervals marked 2T, 4T, 6T and 8T.]
Fig. 8. Form of the stimulus used to measure the just noticeable displacement of the tones B in three-tone sequences and continuous sequences. The time intervals are so dimensioned that in both cases an equal length of time is needed to reach a certain displacement from the middle.
Another point in which these measurements differ from those of the first section is that in these measurements fA is slowly swept from a frequency lower than fB to one higher than fB. The disadvantage of this method is that the range of frequencies covered must be limited in order not to make the whole cycle of measurements too long, which would tire the observer. Further, T = 100 ms, D = 40 ms, fB = 1 kHz and LA = LB = 35 dB SL at 1 kHz.
The results of the measurements performed by the author are plotted in Fig. 9. It is clearly seen from this graph that the just noticeable shift of B from the half-way point between the tones A in the continuous sequence depends much more strongly on the frequency interval than it does in the 3-tone sequences. Since we saw in Van Noorden (1974) that temporal coherence can be perceived over larger frequency intervals in short tone sequences, these observations provide further support for the view that the temporal position can only be perceived properly when there is temporal coherence.
[Fig. 9: ΔT (ms) versus I (semitones).]
Fig. 9. Just noticeable displacement ΔT of the tones B in a three-tone (o) and a continuous (x) tone sequence. The experimental points indicate individual reversal points in the tracking experiment.
discussion I: perception of displacement compared with perception of order
In summary, we have shown that the accuracy with which the temporal position can be determined deteriorates when there is no temporal coherence between the tones whose temporal position is to be studied. We have shown this for both dichotically and diotically presented continuous tone sequences.
The question is now to what extent these results can be compared with those of investigators who have shown that the order in tone sequences or other auditory patterns can no longer be observed if elements which differ too much from one another are presented in close succession (Schouten, 1962; Norman, 1967; Bregman and Campbell, 1971; Neisser and Hirst, 1974; Warren, 1974; Thomas and Fitzgibbons, 1971). There is certainly qualitative agreement. Most of these investigators ascribe the failure to observe the order to the loss of temporal coherence (Bregman and Campbell (1971) use the concept of "streams", and Thomas and Fitzgibbons that of "perceptual classes"). Some of them also observed that the order can be perceived better in short tone sequences than in continuous ones (Warren et al., 1969).
It should be possible to derive a measure of the inaccuracy of the relative temporal
position from these order-perception experiments. If we assume that the interchange
of two successive elements is the most common error here, we might conclude that
the inaccuracy is of the same order of magnitude as the tone repetition time.
The tone repetition times at which difficulties arise in the perception of order
are those ranging from 250 to 500 ms. These values are several times greater than
the largest inaccuracies observed in our experiments. The following facts should
be taken into consideration when assessing this discrepancy:
1. The tone sequences used in the order experiments were more complex, consisting
as they did of more than two different tones.
2. In these order experiments, there was generally no silent interval at all between
successive tones. As we saw in Van Noorden (1974), fission is more likely to occur
under these conditions.
3. In point of fact, the shift experiments are concerned with the detection of differences. Wilcox et al. (1972) and Warren (1974) have shown that differences in order can be detected in tone sequences which are so fast that the order itself can no longer be identified.
4. In our experiments, the shift from the middle causes a change from fission on
the basis of pitch to a grouping in time: in fact, we get a rapid succession of
two-tone sequences, where the perception of temporal position is easier than in a
continuous sequence.
Although the loss of the ability to perceive order can lead to more spectacular
results, we preferred to use shift measurements because this made it possible to
use the same tone sequences as for the determination of the temporal coherence
boundary; the results of the two investigations are thus directly comparable.
Moreover, this shift method as such gives interesting results which throw light on
the question of the timing required in playing music.
discussion II: the discrimination of time intervals
So far we have only shown that the perception of temporal position and that of
temporal coherence are closely related. However, the results of our experiments
can also be compared with published data on the discrimination of time intervals.
The measurements described in the literature were all performed with different
methods and stimuli, so that the values found for the just noticeable difference
in time interval show quite a considerable spread; this naturally makes comparison difficult.
The oldest measurements were concerned with the discrimination of the duration of tones or noise bursts; they generally gave a ΔT of 10-20 ms at a reference time interval of 100 ms (Burghardt, 1971; Creelman, 1962; Chistovich, 1959; Henry, 1948; Stott, 1935; Small and Campbell, 1962). However, our experiments were concerned with the discrimination of time intervals between two tone bursts. Abel (1972) has shown that the time interval between two noise or tone bursts cannot be discriminated as well as the duration of a tone burst. She found a just noticeable difference for the interval between two pulses, ΔT, of about 40 ms at a reference time interval of about 100 ms. There is a big discrepancy between this value and the value of 5 ms which we found in our experiments with two-tone sequences with fA = fB at 100 ms.
One possible explanation for this difference is that the reference value of the
interval from presentation to presentation varied over a wide range (0.63 < T < 630 ms)
in Abel's experiments, while in our experiment we always worked at T = 100 ms,
permitting much better training for this specific interval.
The experiments with the sweep method on continuous sequences ABAB ... can best be compared with measurements of tempo or rhythm discrimination. Michon (1964) has shown that the tempo of a sequence of clicks in which the interval between successive clicks is 100 ms can be distinguished from that of a sequence in which the interval between the clicks is 0.8% longer or shorter. The difference between his experiments and ours is that he studied the discrimination between sequences with different tempos, while our observers had to compare temporal intervals within one sequence. Lunney (1974) recently described an experiment which was more comparable with ours. His observers had to adjust the temporal position of every fourth pulse in a sequence of isochronous metronome pulses so that they could just detect an irregularity in the rhythm. At a pulse repetition time of 100 ms, the shift set amounted to 3-4 ms. Bearing in mind the differences in stimulus and measuring method, we may consider this result to be in fair agreement with the value of 8 ms we found at T = 100 ms in the first section. The T dependence of ΔT in his experiments also agreed well with ours.
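The figures compared above become easier to line up when each is expressed as a relative just noticeable difference ΔT/T at the common reference interval of about 100 ms. The sketch below does only that arithmetic on the values quoted in this section; the labels are descriptive, and the comparison deliberately ignores the differences in method and stimulus stressed in the text.

```python
REFERENCE_MS = 100.0  # all values below refer to an interval of about 100 ms

# Just noticeable differences (ms) as quoted in the text; labels are descriptive only.
jnd_ms = {
    "Abel (1972), gap between two bursts": 40.0,
    "continuous ABAB, first section": 8.0,
    "two-tone sequence, fA = fB": 5.0,
    "Lunney (1974), upper end of 3-4 ms": 4.0,
    "Lunney (1974), lower end of 3-4 ms": 3.0,
    "Michon (1964), 0.8% tempo change": 0.8,
}


def weber_fraction(delta_t_ms: float, t_ms: float = REFERENCE_MS) -> float:
    """Relative just noticeable difference, delta_T / T."""
    return delta_t_ms / t_ms


for label, dt in jnd_ms.items():
    print(f"{label:40s} {100 * weber_fraction(dt):5.1f} %")
```

Put this way, the quoted values span a factor of about fifty (0.8% to 40%), which illustrates how strongly the outcome depends on the task.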
So far, we have only compared our results for I = 0 with published data. Little
is known about how the discrimination of time intervals depends on the frequency
interval between the two tone bursts defining the time interval. However, we may
conclude from the results of Williams and Perrott (1971), Perrott and Williams
(1970) and Divenyi and Hirsh (1972) that the time discrimination deteriorates
with increasing frequency interval. There is little point in carrying the comparison
further, in view of the great differences in measuring method and stimulus involved.
It would be good to be able to form a picture of the mechanism by which time
intervals are discriminated in perception. It might be thought that up to tone
repetition times of about 200 ms, peripheral processes such as masking or adaptation
could provide the information needed for discrimination of the relative temporal
position (see Appendix). A coupling with the peripheral excitation pattern could
also offer an explanation of the deterioration in time discrimination with increasing frequency interval, and perhaps also of the asymmetry found in the sweep experiments with the tone sequence ABAB ... at T < 120 ms. However, since we still
have to find an explanation for the fact that time intervals can be distinguished
better in short tone sequences than in long continuous ones, and since we can only
guess at the mechanism responsible for subjective time measurements at longer
time intervals, it would seem to be better to gather more experimental data before
trying to reach definite conclusions.
summary
In tone sequences of two alternating tones ABAB ..., with a large frequency interval between the tones and a fast rate, the observer does not hear the string ABAB ...; instead, the strings AA ... and BB ... are formed in perception.
In this paper we investigated whether this "loss of temporal coherence" also
implies that the observer can no longer hear the temporal relations between the
tones A and B. A measurement was carried out in which we slowly shifted the tones
B out of the temporal midpoint between the tones A. The observer had to detect
whether the tones B were no longer in the middle. The tone repetition time and the
frequency interval were systematically varied. The results indicated that the
relative just-noticeable-shift increased with decreasing tone repetition time
and with increasing frequency interval. Similar experiments were carried out in
sequences of only two and three tones and in dichotic sequences. In all cases the
relative just noticeable shift reflected the loss of temporal coherence.
appendix
a simple model for the discrimination of the relative temporal position of the tones A and B in the tone sequence ABAB ...
The phenomena of forward masking and loudness integration indicate that the perception of a tone lasts longer than the physical tone burst itself. The tone is,
as it were, spread out in time in the hearing. As a result, interaction can be
produced between successive tones, and will be stronger when the tones are closer
together in time. A keen observer will pay attention to this interaction if it
enables him to discriminate between successive tones more accurately than would
be possible with his time interval measuring mechanism alone.
In order to gain insight into how this temporal "blurring" could permit better
discrimination of the time interval between successive tones, we have worked out
a simple model for this mechanism. For the sake of concreteness let us consider
an electrical circuit.
Its input voltage can be compared to the amplitude of the tone bursts, while the
output voltage is comparable with the temporally blurred excitation produced by
the tone bursts somewhere in the auditory system. For the sake of simplicity, we
will assume that the incoming tone bursts have the form of a square wave. The
simplest circuit which can produce the kind of temporal spread we are interested in is a first-order RC integrator circuit (leaky integration), which is characterized by its time constant τ; see Fig. A-1. As long as the time between two successive voltage pulses is large compared with τ, the maximum output voltage at the end of both pulses will be the same. However, as soon as the time interval becomes small compared with τ, the maximum output voltage at the end of pulse B will be higher than that at the end of pulse A.
We now assume that it is possible to compare the maximum output voltage at the end
of the successive pulses; the relative temporal position of the two pulses can
then be derived from the result of this comparison (see Fig. A-2).
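A minimal sketch of the two-pulse case, assuming unit-amplitude rectangular input pulses of equal duration d and an ideal first-order RC integrator with time constant τ, as in the text. Between pulse edges the exact exponential solution is used, so no time step is involved. The helper names and the numerical values (d = 40 ms, τ = 60 ms) are illustrative choices, not parameters from the report.

```python
import math


def rc_charge(v0: float, duration: float, tau: float, level: float = 1.0) -> float:
    """Output of a first-order RC integrator after `duration` with constant input `level`."""
    return level + (v0 - level) * math.exp(-duration / tau)


def end_of_pulse_outputs(onset_gap: float, d: float, tau: float) -> tuple[float, float]:
    """Return (U_A, U_B), the outputs at the ends of pulse A and pulse B.

    Pulse A starts at t = 0 and pulse B at t = onset_gap (onset-to-onset
    interval); both last d and have unit amplitude. The output starts at 0.
    """
    if onset_gap < d:
        raise ValueError("pulses must not overlap")
    u_a = rc_charge(0.0, d, tau)                          # charging during A
    v_before_b = rc_charge(u_a, onset_gap - d, tau, 0.0)  # leaking in the gap
    u_b = rc_charge(v_before_b, d, tau)                   # charging during B
    return u_a, u_b


# Interval long compared with tau: the two maxima are practically equal.
print(end_of_pulse_outputs(onset_gap=1000.0, d=40.0, tau=60.0))
# Interval comparable with tau: the residue of A raises the maximum after B.
print(end_of_pulse_outputs(onset_gap=100.0, d=40.0, tau=60.0))
```

Comparing the two returned values is the "comparison of the maximum output voltages" assumed above; the closer B follows A, the larger the difference.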
[Fig. A-1 (top) and Fig. A-2 (bottom); panels labelled INPUT and OUTPUT.]
Let us now consider a continuous sequence of pulses analogous with the tone sequence ABAB ... used in the paper. It will be clear that, apart from transient effects, the output voltage at the end of the pulses B (UB) will not be equal to that at the end of the pulses A (UA) when B is not situated half-way between the two As.
It follows from the differential equation for this circuit, with the boundary condition that the situation is stationary in time (V(t) = V(t + 2T)), that
$$\frac{U_B}{U_A} = \frac{1 + \exp[-(T + \Delta T)/\tau]}{1 + \exp[-(T - \Delta T)/\tau]} \qquad (1)$$
where ΔT is the displacement of B from the midpoint between the two As (taken positive when B comes later than the midpoint). It will be noticed that the tone duration does not appear in (1), as long as it is the same for the tones A and B and the pulses do not overlap at the input.
Now what we want to do is to detect whether ΔT ≠ 0, i.e. whether UA/UB ≠ 1. We assume that U = |UB/UA − 1| must exceed a certain value (the detection threshold) for this to be possible. When we know the height of the detection threshold, the just noticeable value ΔT follows from (1), that is
$$\Delta T = \tau \ln \frac{U + \sqrt{U^2 + 4(1 - U)\,e^{-2T/\tau}}}{2(1 - U)\,e^{-T/\tau}} \qquad (2)$$
We have now found a relation between ΔT and T in which two parameters (the time constant τ and the detection threshold U) play a role.
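The step from the circuit to a closed-form ratio and then to a just noticeable shift can be checked numerically. The sketch below assumes unit rectangular pulses, A every 2T and B displaced by ΔT from the midpoint (ΔT > 0 meaning B is late). It drives an ideal RC integrator with a long periodic train, so that transients die out, and compares the result with the ratio U_B/U_A = (1 + e^{−(T+ΔT)/τ})/(1 + e^{−(T−ΔT)/τ}). It then solves U_B/U_A = 1 − U for ΔT. With a = e^{−T/τ} and z = e^{ΔT/τ}, this becomes the quadratic (1 − U)a z² − U z − a = 0, whose positive root gives ΔT = τ ln z. The printed examples use τ = 63 ms and U = 0.0585, the values this appendix reports for I = 0. All function names are our own.

```python
import math


def ratio_closed_form(t: float, dt: float, tau: float) -> float:
    """U_B / U_A for the stationary sequence; dt > 0 means B is late."""
    return (1 + math.exp(-(t + dt) / tau)) / (1 + math.exp(-(t - dt) / tau))


def ratio_simulated(t: float, dt: float, tau: float, d: float, periods: int = 200) -> float:
    """Drive an RC integrator with unit pulses A at 2kT and B at 2kT + T + dt.

    Uses the exact exponential solution between pulse edges and returns
    U_B / U_A after `periods` cycles, by which time transients have decayed.
    """
    if d > t - abs(dt):
        raise ValueError("pulses must not overlap")
    charge = math.exp(-d / tau)              # decay factor during a pulse
    v = u_a = u_b = 0.0
    for _ in range(periods):
        v = 1 + (v - 1) * charge             # end of pulse A
        u_a = v
        v *= math.exp(-(t + dt - d) / tau)   # gap from A to B
        v = 1 + (v - 1) * charge             # end of pulse B
        u_b = v
        v *= math.exp(-(t - dt - d) / tau)   # gap from B to the next A
    return u_b / u_a


def just_noticeable_shift(t: float, tau: float, u: float) -> float:
    """Solve U_B/U_A = 1 - u for dt > 0, u being the detection threshold."""
    if not 0.0 <= u < 1.0:
        raise ValueError("threshold must lie in [0, 1)")
    a = math.exp(-t / tau)
    # Positive root of (1 - u) a z^2 - u z - a = 0, with z = exp(dt / tau).
    z = (u + math.sqrt(u * u + 4 * (1 - u) * a * a)) / (2 * (1 - u) * a)
    return tau * math.log(z)


print(ratio_simulated(100.0, 10.0, 63.0, 40.0), ratio_closed_form(100.0, 10.0, 63.0))
for t in (60.0, 80.0, 100.0, 120.0):
    print(t, round(just_noticeable_shift(t, 63.0, 0.0585), 2))
```

The simulated ratio does not change when the pulse duration d is varied. This is the numerical counterpart of the remark that the tone duration drops out of the ratio. The computed shift also grows with T, which is the property used when fitting the model.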
This model can be compared with the tone sequence with I = 0. However, we have seen before that ΔT increases with increasing I. This is not unexpected on the basis of the assumption that the discrimination of the temporal position is realized with the aid of an interaction such as forward masking or loudness summation, since interactions of this type decrease when the tone interval between the successive tones increases. This will clearly make the discrimination of temporal position more difficult in some way or another. We can simulate this effect in our model by letting the detection threshold rise with increasing I. In a first approximation, we can write
$$U = a + bI \qquad (3)$$
In so doing, we have added one parameter to the description of the system.
Although equation (2) was derived on the basis of a very simple model, we have investigated how well (2) and (3) can describe the measured values of ΔT. The relation between ΔT and T in (2) is such that the former increases with increasing T. This is the case for the measured values of ΔT in the range T < 120 ms. We therefore determined the values of the parameters τ, a and b giving the smallest differences between the measured values and those calculated from (2) and (3), for the experimental points at T = 60, 80, 100 and 120 ms and values of I from 0 to 25 semitones.
Using the minimum value of chi squared as criterion in a grid method, we found τ = 63 ms, a = 0.0585 and b = 0.0105 per semitone. The linear correlation coefficient between the measured values and the values calculated with these parameter values was 0.9765, which means that 95% of the variation in the experimental points can be explained on the basis of this simple model. This is roughly the same explained variation as we found in the first section with the model
$$\Delta T / T = c + dI \qquad (4)$$
While equation (4) has one parameter less, the derivation via equation (2) has the advantage that it provides physical insight into the probable operative mechanism, while (4) is merely a simple empirical expression.
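A grid-method chi-squared fit of the kind described above can be sketched as follows. The data here are hypothetical. Because the measured points are not reproduced in this report, the sketch generates noiseless synthetic ΔT values from the model with the reported parameters (τ = 63 ms, a = 0.0585, b = 0.0105 per semitone) at T = 60, 80, 100 and 120 ms and I = 0 to 25 semitones. It assumes the same standard deviation σ for every point and checks that the grid search recovers those parameters. The grid ranges and step sizes are our own choices.

```python
import math
from itertools import product


def delta_t(t: float, i: float, tau: float, a: float, b: float) -> float:
    """Just noticeable shift: positive root of the quadratic, with threshold U = a + b*I."""
    u = a + b * i
    x = math.exp(-t / tau)
    z = (u + math.sqrt(u * u + 4 * (1 - u) * x * x)) / (2 * (1 - u) * x)
    return tau * math.log(z)


def chi_squared(points, tau, a, b, sigma=1.0):
    """Sum of squared residuals in units of an assumed common sigma (ms)."""
    return sum(((dt - delta_t(t, i, tau, a, b)) / sigma) ** 2 for t, i, dt in points)


# Hypothetical, noiseless "measurements" generated from the reported fit.
TRUE = (63.0, 0.0585, 0.0105)
points = [(t, i, delta_t(t, i, *TRUE)) for t in (60, 80, 100, 120) for i in range(0, 26, 5)]

# Grid: tau in 1 ms steps, a and b in steps of 0.0005 (integer-based to avoid drift).
taus = [float(k) for k in range(55, 71)]
a_vals = [k / 10000 for k in range(500, 651, 5)]
b_vals = [k / 10000 for k in range(90, 121, 5)]

best = min(product(taus, a_vals, b_vals), key=lambda p: chi_squared(points, *p))
print("best (tau, a, b):", best)
```

With real data, σ would have to be estimated from the spread of the reversal points. The explained variation quoted above is the square of the correlation coefficient: 0.9765² ≈ 0.954, i.e. about 95%.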
The fact that equations (2) and (3) allow reasonable prediction of the measured values does not necessarily mean, of course, that this simple model provides an adequate description of the processes occurring in the auditory system. In particular, we do not yet know whether the amplitude of the voltage pulses at the input of the RC network should be regarded as a function of the amplitude of the tone bursts. It is possible that the pulses are first standardized (as Burghardt (1972) assumed in a similar model which he used to predict the subjective duration of tone bursts), thus making the time-interval discrimination independent of the amplitude.
If there is a monotonic relation between the amplitudes of the tone bursts and the voltage pulses, the values found for τ and U could perhaps be related to the long time constant found for forward masking (75 ms; Duifhuis, 1972) and to the just noticeable difference in loudness (~ dB; Domburg, 1966). However, before we can form an opinion about this we would have to repeat the measurements of short tone sequences with amplitude differences between the tones.
references
Abel, S.M. (1972) Discrimination of Temporal Gaps, J.Acoust.Soc.Amer. ~, p. 519-524.
Augustus, B. and Nederhand, B. (1973) De Discriminatie van Tijdintervallen tussen Tonen van Verschillende Frequentie in Toonreeksen van Variabele Lengte, Unpublished Report.
Blauert, J. (1970) Zur Trägheit des Richtungshörens bei Laufzeit- und Intensitätsstereophonie, Acustica ~, p. 287-293.
Bregman, A.S. and Campbell, J. (1971) Primary Auditory Stream Segregation and Perception of Order in Rapid Sequences of Tones, J.Exp.Psychol. 89, p. 244-249.
Burghardt, H. (1971) Subjective Duration of Sinusoidal Tones, Proc. 7th Int. Congr. on Acoustics l, 20 H1, p. 353-356.
Burghardt, H. (1972) Einfaches Funktionsschema zur Beschreibung der subjektiven Dauer von Schallimpulsen und Schallpausen, Kybernetik ~, p. 21-29.
Cardozo, B.L. and Jong, Th.A. de (1971) A Note on a Sequential Up-and-Down Method of Threshold Finding, IPO Annual Progress Report ~, p. 125-127.
Chistovich, L.A. (1959) Discrimination of the Time Intervals between Two Short Acoustic Pulses, Akusticheskii Zhurnal 5, p. 480-484. Translation: Soviet Phys. Acoust. ~ (1960), p. 493-497.
Creelman, C.D. (1962) Human Discrimination of Auditory Duration, J.Acoust.Soc.Amer. l.!, p. 582-593.
Divenyi, P.L. and Hirsh, I.J. (1972) Discrimination of the Silent Gap in Two-Tone Sequences of Different Frequencies, J.Acoust.Soc.Amer. ~, p. 166(A).
Domburg, G. (1966) The Just Noticeable Difference for Loudness, IPO Annual Progress Report l, p. 8-11.
Duifhuis, H. (1973) Consequences of Peripheral Frequency Selectivity for Nonsimultaneous Masking, J.Acoust.Soc.Amer. 2i, p. 1471-1488.
Henry, F.M. (1948) Discrimination of the Duration of Sound, J.Exp.Psychol. ~, p. 734-743.
Huggins, A.F.W. (1974) On Perceptual Integration of Dichotically Alternated Pulse Trains, J.Acoust.Soc.Amer. ~, p. 939-943.
Linssen, M.R. (1973) De Discriminatie van Tijdintervallen tussen Toonstootjes van Verschillende Frequentie, Unpublished IPO Report No. 252.
Lunney, H.W.M. (1974) Time as Heard in Speech and Music, Nature 249, p. 592.
Michon, J.A. (1964) Studies on Subjective Duration I. Differential Sensitivity on the Perception of Repeated Temporal Intervals, Acta Psych. ~, p. 441-450.
Neisser, U. (1972) On the Perception of Auditory Sequences, Paper presented at the American Psychological Association, Honolulu 1972.
Neisser, U. and Hirst, W. (1974) Effect of Practice on the Identification of Auditory Sequences, Perc. & Psychophysics ~, p. 391-398.
Noorden, L.P.A.S. van (1971a) Discrimination of Time Intervals Bounded by Tones of Different Frequencies, IPO Annual Progress Report ~, p. 12-15.
Noorden, L.P.A.S. van (1971b) Een Inleidend Onderzoek naar het Waarnemen van de Tijdstructuur van Trillerachtige Auditieve Patronen, Unpublished IPO Report No. 214.
Noorden, L.P.A.S. van (1972) De Discriminatie van Tijdintervallen tussen Toontjes van Verschillende Frequentie, Unpublished IPO Report No. 228.
Noorden, L.P.A.S. van (1974) Temporal Coherence in Random Tone Sequences, IPO Annual Progress Report ~, p. 4-21.
Noorden, L.P.A.S. van (1975) Temporal Coherence in the Perception of Tone Sequences, Thesis, Eindhoven University of Technology.
Norman, D.A. (1967) Temporal Confusions and Limited Capacity Processors, Attention & Performance I, A.F. Sanders, Ed. (North Holland Publ. Comp., Amsterdam).
Perrott, D.R. and Williams, K.H. (1970) Auditory Temporal Resolution: Gap Detection as a Function of Interpulse Frequency Disparity, Psychonomic Science ~, p. 73-74.
Schouten, J.F. (1962) On the Perception of Sound and Speech; Subjective Time Analysis, 4th Intern. Congress on Acoustics, Copenhagen, Congress Report II, p. 201-203.
Small, A.M. and Campbell, R.A. (1962) Temporal Differential Sensitivity for Auditory Stimuli, Amer.J.Psych. ~, p. 401-410.
Stott, L.H. (1935) Time-Order Errors in the Discrimination of Short Tonal Durations, J.Exp.Psychol. li, p. 741-766.
Thomas, I.B. and Fitzgibbons, P.J. (1971) Temporal Order and Perceptual Classes, Paper of the 81st Meeting of the Acoust.Soc.Amer.
Warren, R.M. (1974) Auditory Temporal Discrimination by Trained Listeners, Cognitive Psychology Q, p. 237-256.
Wilcox, G.W. (1972) Temporal Coherence of Tone Sequences, Paper given in a symposium entitled "Perception of Temporal Order in Hearing: Old Pattern-Recognition Problems in a New Guise" at the American Psychological Association meeting, Honolulu, Hawaii, September 4, 1972.
Wilcox, G.W., Neisser, U. and Roberts, J. (1972) Recognition of Auditory Temporal Order, Abstract of Paper presented at Eastern Psychological Association Meeting, Boston, Mass., April 28, 1972.
Williams, K.N. and Perrott, D.R. (1971) Temporal Resolution of Tonal Pulses, J.Acoust.Soc.Amer. ~, p. 644-647.
This article covers the sixth chapter and Appendix B of the doctoral thesis
"Temporal Coherence in the Perception of Tone Sequences" by L.P.A.S. van Noorden,
submitted in February 1975 to the Eindhoven University of Technology.
PSYCHOPHYSICAL TWO-TONE SUPPRESSION
H. Duifhuis
introduction and experimental design
We have recently developed a physiologically specified theory on cochlear nonlinearity and second filter (Duifhuis, 1974a,b, 1976), which has predictive value as regards two-tone suppression. This theory specifies an average effective stimulating
waveform E(x), for the hair cell at point x, in response to an arbitrary stimulus.
Furthermore, a monotonic relation is assumed between E(x) and the average firing
rate, fr(x), in fibres innervating that hair cell.
A prediction relevant to two-tone suppression is given in Fig. 1 (for details see Duifhuis, 1976). This figure shows the average of the effective stimulating waveform in response to two tones, labeled P (for probe) and M (for masker), as a function of masker level, as it would be expected to be observed in a channel tuned to the probe. The interesting features of the figure are that:
1) point AM1 is a measure of the amplitude characteristic of the first filter,
2) the ratio AM2 /AM1 is a measure of the second filter and
3) the slopes (v-1) and v are determined by the nonlinearity, which was described
with a vth law device (we use A for signal amplitude in this paper).
Undoubtedly, this figure contains much interesting information. This led us to the question of how to measure such a function psychophysically (we are not in a position to carry out the alternative neurophysiological experiment).
[Fig. 1: E versus MASKER AMPLITUDE (dB).]
Fig. 1. The average effective stimulating waveform, E, in response to a probe + masker stimulus, E(P+M), and in response to masker alone, E(M), for two masker frequencies, as expected in a channel tuned to the probe frequency. The probe amplitude is fixed. Masker amplitude is the independent variable. Suppression occurs between AM1 and AM2, in which region E(P+M) < E(M). Suppression is predicted only if probe and masker frequency are sufficiently different.
If, in line with Houtgast's (1974) pioneering work on the pulsation threshold, we assume that the criterion for continuity at the pulsation threshold is that the activity in the probe channel remains constant, we may have the necessary tool enabling us to measure the activity in response to a two-tone complex in terms of a single-tone response. The theory predicts E for a single tone to grow as E ∝ A^v, so that, using the pulsation threshold, we would be able to measure AM1 and AM2 and the slope 1 − 1/v (Fig. 2). Thus, we would be able to determine the parameters mentioned above.
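The change of slopes between the prediction of Fig. 1 and the expected pulsation-threshold curve of Fig. 2 follows from the two assumptions just stated. What follows is our reading of the argument, not a derivation quoted from the theory papers. Assume that, at the pulsation threshold, the probe-channel response to the pulsation probe alone, k A_PP^v (k an unspecified constant), equals E(P+M). Let s be the slope of 20 log10 E(P+M) against the masker level L_M in dB. Then:

```latex
\begin{aligned}
k\,A_{PP}^{\,v} &= E(P{+}M) \\
\Rightarrow\; 20\log_{10}A_{PP} &= \tfrac{1}{v}\,20\log_{10}E(P{+}M) - \tfrac{1}{v}\,20\log_{10}k \\
\Rightarrow\; \frac{d\,(20\log_{10}A_{PP})}{dL_M} &= \frac{s}{v},
\qquad s = v \mapsto 1, \quad s = v-1 \mapsto 1-\tfrac{1}{v}
\end{aligned}
```

Because the mapping acts only on the ordinate, the transition points AM1 and AM2 stay at the same masker amplitudes.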
Fig. 2. The expected result of scanning the probe + masker response E by using the pulsation threshold technique. The abscissa is the same as in Fig. 4, the ordinate now gives the pulsation threshold. This results in a change of slopes, the transition points remaining fixed at the masker amplitude scale. For an increase of (signal-) probe level from SP1 to SP2, we expect a shift of the curve at 45°, as indicated.
The choice of the vth law device to describe the compressive nonlinearity in the cochlea implies that the shape of the curve in Fig. 2 is independent of the probe level. A change in probe level would lead to a 45° shift of the curve, as indicated by the dotted line.
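The independence of probe level can be seen from a scaling argument. This assumes, as in the model described here, that the vth law device is the only nonlinearity, so that scaling every input amplitude by λ scales E by λ^v. L denotes levels in dB:

```latex
\begin{aligned}
E(\lambda A_P,\lambda A_M) &= \lambda^{v}\,E(A_P,A_M) \\
\Rightarrow\; A_{PP}(\lambda A_P,\lambda A_M) &= \lambda\,A_{PP}(A_P,A_M) \\
\Rightarrow\; L_{PP}(L_P+\Delta,\,L_M+\Delta) &= L_{PP}(L_P,L_M)+\Delta,
\qquad \Delta = 20\log_{10}\lambda
\end{aligned}
```

Raising the probe level by Δ dB therefore moves every point of the curve by Δ dB along both axes. This is a shift along the 45° direction without change of shape. If any other stage were nonlinear, the first line, and with it the pure 45° shift, would no longer hold.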
The stimulus to be used in such an experiment is depicted schematically in Fig. 3.
[Fig. 3: alternating M + SP and PP bursts along a time axis, 0 to 400 ms.]
Fig. 3. Schematic temporal organization of the stimulus. The masker + signal probe signal (M + SP) is interleaved with the scanning pulsation probe signal (PP). Masker frequency is variable, probe frequency fixed at 1 kHz (same for SP and PP). Signal transients are cosine-shaped with a duration of 20 ms.
We presented trains of 10 masker + signal probe (SP) stimuli, interleaved with
9 pulsation probes (PP). The bursts were each 120 ms in duration (in a few cases
125 ms) measured at half-amplitude points of the envelopes, and onset and offset
were cosine-shaped with a duration of 20 ms. These values closely match those
which turned out to be useful in Houtgast's (1974) experiments. Since we were interested in using the technique rather than investigating the continuity effect as such, we simply adhered to these values. For a number of fixed levels of signal probe SP, the subjects had to adjust the pulsation threshold (APP) as a function of the masker amplitude. The frequency of SP and PP was always 1 kHz in this experiment; the masker frequency, fM, was a second parameter. Care was taken to keep SP and PP
in phase, so that addition of the two at equal amplitudes would leave no detectable
transients.
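One way to see why in-phase SP and PP at equal amplitudes can join without detectable transients is to sketch the burst envelopes. This rests on assumptions the text does not spell out. It assumes raised-cosine (sin²) ramps of 20 ms centred on the half-amplitude points, and that each burst begins where the previous one ends (half-amplitude point to half-amplitude point, 120 ms apart). Under these assumptions the offset ramp of one burst and the onset ramp of the next form a cross-fade whose envelopes add to exactly 1. The function names and the slot layout (SP in even slots, PP in odd slots) are illustrative.

```python
import math

RAMP_MS = 20.0        # duration of the cosine-shaped onset/offset (from the text)
HALF_AMP_MS = 120.0   # burst duration between half-amplitude points (from the text)


def burst_envelope(t_ms: float, start_ms: float) -> float:
    """Envelope of one burst whose onset crosses half amplitude at start_ms.

    Assumption: raised-cosine (sin^2) ramps centred on the half-amplitude points.
    """
    on0 = start_ms - RAMP_MS / 2                   # onset ramp begins
    off0 = start_ms + HALF_AMP_MS - RAMP_MS / 2    # offset ramp begins
    if t_ms <= on0 or t_ms >= off0 + RAMP_MS:
        return 0.0
    if t_ms < on0 + RAMP_MS:
        return math.sin(0.5 * math.pi * (t_ms - on0) / RAMP_MS) ** 2
    if t_ms > off0:
        return math.cos(0.5 * math.pi * (t_ms - off0) / RAMP_MS) ** 2
    return 1.0


def burst_starts(n_sp: int = 10, n_pp: int = 9):
    """Half-amplitude onsets (ms) of SP bursts (even slots) and PP bursts (odd slots)."""
    sp = [2 * k * HALF_AMP_MS for k in range(n_sp)]
    pp = [(2 * k + 1) * HALF_AMP_MS for k in range(n_pp)]
    return sp, pp


def combined_envelope(t_ms: float) -> float:
    """Sum of all SP and PP envelopes, taken at equal amplitude and in phase."""
    sp, pp = burst_starts()
    return sum(burst_envelope(t_ms, s) for s in sp + pp)


print(burst_envelope(0.0, 0.0), combined_envelope(120.0), combined_envelope(500.0))
```

When PP is set below SP, the summed envelope dips at every junction instead. Under these assumptions, equal amplitude is the only setting with a perfectly flat envelope.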
Stimuli were presented monaurally (right ear) through KOSS PRO/600 aa headphones.
Subjects were seated in a sound-treated booth.
results
The results of a number of measuring series are presented in Figs. 4 to 7 inclusive.
All amplitudes are expressed in terms of sound pressure level.
[Fig. 4a and 4b: pulsation threshold APP (dB SPL) versus masker amplitude AM (dB SPL), fM = 200 Hz; S. HvC (a), S. DB (b).]
Fig. 4. Experimental results for a 200 Hz masker for S. HvC in Fig. 4a and S. DB in Fig. 4b. In the interesting right-hand part of the figure we note marked quantitative differences between subjects as well as between the observed data and the expected results indicated in Fig. 2.
[Fig. 5a and 5b: APP (dB SPL) versus AM (dB SPL), fM = 600 Hz, with the signal-probe level ASP as parameter (curves marked 39, 49 and 69); S. HvC (a), S. DB (b).]
Fig. 5. Similar to Fig. 4 for a 600 Hz masker.
[Fig. 6a and 6b: APP (dB SPL) versus AM (dB SPL), fM = 1200 Hz; S. HvC (a), S. HvL (b).]
Fig. 6. Experimental results for a 1200 Hz masker for S. HvC in Fig. 6a and S. HvL in Fig. 6b. Note that APP decreases monotonically with increasing masker amplitude.
[Fig. 7: APP (dB SPL) versus AM (dB SPL), S. HvC.]
Fig. 7. Data for S. HvC for different masker frequencies above 1 kHz. The slope of the descending branch is found to decrease with increasing masker frequency.
Several features are noteworthy:
1) differences between data from different subjects are relatively large, especially
in so far as the amount of suppression (i.e. the size of the dips) is concerned;
2) except for points on the "trivial" horizontal part of the predicted curve (representing intensity discrimination), the data points are accompanied by considerable variability (from repeated measurements with the same subject), as indicated in the upper curve of Fig. 4b;
for masker frequencies above 1 kHz:
3) the ascending branch of the curve (right of AM2) is, in general, not reached for the masker amplitudes that were used; in some cases the descending branch appears to level off;
4) the slope of the descending branch decreases with increasing masker frequency;
5) the intersection points AM1 tend to shift approximately linearly with SP amplitude;
for masker frequencies below 1 kHz:
6) the data points qualitatively follow the predicted curves of Fig. 2;
7) the observed slope of the right-hand asymptote tends to be greater than 1;
8) the slope of the line connecting the AM1 points for different SP amplitudes is
considerably steeper than 1, and is found to be significantly steeper than the slope
referred to in point 7.
discussion
Qualitatively, the data show the suppression effect similar to the neurophysiological
two-tone "inhibition" (e.g. Sachs, 1969), and as predicted by our theory (Duifhuis,
1976). Quantitatively, however, points 4, 7 and 8 mentioned above cannot be
accounted for by this theory. We may phrase the difference as: the data are found
to exhibit more nonlinearity than the model attributes to the system. This result
was not entirely unexpected. Points 5 and 8 are roughly in line with Shannon's
(1975) results from a comparable forward masking experiment. Point 4 is consistent
with a very similar finding obtained neurophysiologically by Sachs (1969). His data,
likewise, show a decrease of the suppression slope as the suppressor frequency
increases. Points 7 and 8 are apparently related to the nonlinearity of pure-tone
masking, as reported already by Wegel and Lane (1924). The deviation from unity
slope in our data is in the same direction as in these classical data.
At this point several ways are open for an approach towards better understanding of
the data and of cochlear nonlinearity. We mention the possibility of considering a
nonlinearity in the basilar membrane excitation and mechanical hair cell excitation.
Another possibility is that the compressive nonlinearity at the hair cell, as characterized by the model parameter v, behaves in a much more complicated way than was assumed. It is, e.g., conceivable that v depends on spatial angle as well as on excitation level. Another point is that, because of the nonlinearities, the single-channel hypothesis (see Duifhuis, 1976) is in fact a simplification that is no longer justifiable (cf. Verschuure et al., 1974).
For the psychophysical data pertinent to the above issues we face the problem
that the data result from an active interaction in the nonlinear system, and
are therefore "contaminated". Unless we have sufficient understanding of the system, it will be impossible to localize and quantify the properties of parts of the system.
Our present aim is to investigate further and evaluate, experimentally as well
as theoretically, several of the possibilities mentioned above, in order to achieve
a better understanding of cochlear nonlinearity and two-tone suppression.
summary
A theoretical model on cochlear nonlinearity and second filter, which we have developed in recent years, has direct predictive value as regards two-tone suppression. In this paper we present data from an experiment set up to psychophysically determine certain parameters of the model, such as the nonlinearity and second-filter characteristics.
It is found that the experimental data, which are consistent with literature data, qualitatively agree with the theory. However, the data cannot be fully accounted
for quantitatively by a single adjustment of model parameters. This is felt to
imply that the system contains other nonlinearities than the one specified in the
theory.
acknowledgements
The author is indebted to H. van Cuyck and D. Bol, students of the Eindhoven
University of Technology (Physics dept.), who collected most of the data presented
in this paper.
references
Duifhuis, H. (1974a) An Alternative Approach to the Second Filter, in: Facts and Models in Hearing, E. Zwicker and E. Terhardt, Eds. (Springer, Berlin).
Duifhuis, H. (1974b) The Auditory Second Filter, IPO Annual Progress Report 9, p. 32-37.
Duifhuis, H. (1976) Cochlear Nonlinearity and Second Filter; Possible Mechanism and Implications, J. Acoust. Soc. Amer. (in press).
Houtgast, T. (1974) Lateral Suppression in Hearing, doctoral dissertation, Free University, Amsterdam.
Kim, D.O., Molnar, C.E. and Pfeiffer, R.R. (1973) A System of Nonlinear Differential Equations Modelling Basilar-Membrane Motion, J. Acoust. Soc. Amer. 54, p. 1517-1529.
Sachs, M.B. (1969) Stimulus-Response Relation for Auditory-Nerve Fibers: Two-Tone Stimuli, J. Acoust. Soc. Amer. 45, p. 1025-1036.
Schroeder, M.R. (1975) Amplitude Behavior of the Cubic Difference Tone, J. Acoust. Soc. Amer. 58, p. 728-732.
Shannon, R.V. (1975) Suppression of Forward Masking, doctoral dissertation, University of California at San Diego.
Verschuure, J., Rodenburg, M. and Maas, A.J.J. (1974) Frequency Selectivity and Temporal Effects of the Pulsation Threshold Method, Proc. 8th Int. Congr. Acoust., London, Vol. I, p. 131.
Wegel, R.L. and Lane, C.E. (1924) The Auditory Masking of One Pure Tone by Another and its Probable Relation to the Dynamics of the Inner Ear, Phys. Rev. 23, p. 266-285.
RESEARCH ON SPEECH PERCEPTION IN THE I.P.O. 1975
S.G. Nooteboom, J.P. Brokx, G.J.N. Doodeman, Th.A. de Jong, J. 't Hart, A.F.V. van Katwijk,
J.J. de Rooij, I.H. Slis and L.F. Willems
introduction
A number of research projects on aspects of speech perception are at present
being carried out in our institute. The perception of speech, and particularly
connected speech, may be seen as a highly complex and flexible processing of
auditory information, the details of which are largely unknown. We hope and expect
that research in this field will lead not only to an explicit account of the relation
between the structure of acoustic stimuli and the responses of listeners in well-defined
experimental tasks, but also to the discovery of mental structures and processes
involved in the auditory perception of speech in particular, and of complex information
processing in humans in general.
The main effort in our research is directed towards the perceptual structure and
functioning of speech prosody. Pitch contours and temporal structures show a high
degree of organization over utterances of phrase length. Controlled studies of
how these structures function seem a promising way of gaining access to perceptual
processes and strategies dealing with the non-simultaneity of acoustic
cues relevant to the decoding of linguistic messages.
prosody and syntax
Earlier research at IPO has shown that the structure of pitch contours in longer
utterances in Dutch may be described in terms of substructures (intonational
blocks) strung together. In the past year a few experiments have been run to test
the hypothesis that the boundaries between such intonational blocks can be related
to major syntactic boundaries, and may help listeners to detect the syntactic
structures of messages. Some results are reported in this issue ('t Hart).
In the above experiments the only variable was the pitch contour. Another research
project is at present under way, with the aim of assessing the relative contribution
of pitch contours and temporal structures to the detection of major syntactic
boundaries. A pilot experiment is reported in this issue (De Rooij).
discriminability of size of pitch movements
The experiments on discriminability of the size of pitch movements in speech and
piano tone sequences, reported on in our Progress Report 1974, have been extended.
It is found that the distribution of thresholds over subjects is bi-modal, both
for rises and for falls, in speech as well as in piano tone sequences. Whether the
bi-modal distributions of differential thresholds reflect inherent differences
between classes of subjects, or are caused by different listening strategies, is
at present unknown ('t Hart).
perceptual interaction between prosodic and inherent vowel duration
A number of perceptual measurements are being carried out in which the categorization
of identical sets of vowel durations into phonemically long and short vowels
is studied as a function of position within a phrase. The results obtained so far
can be interpreted in terms of an effect of expected prosodic duration on the
criterion for long vowel/short vowel perception. Accuracy of perception, as indicated
by the slopes of the identification functions, remains surprisingly good, even
in longer phrases (Nooteboom, Doodeman).
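The interpretation above (a criterion that shifts with expected prosodic duration, while the slope stays steep) can be illustrated with a logistic identification function. This is a minimal sketch under stated assumptions: the report does not say which function was fitted, and the logistic form, the parameter names `criterion_ms` and `spread_ms`, and all numeric values are hypothetical.

```python
import math

def p_long(duration_ms: float, criterion_ms: float, spread_ms: float) -> float:
    """Logistic identification function: proportion of 'long vowel' responses.

    criterion_ms is the 50% crossover (the long/short criterion);
    spread_ms sets the steepness (smaller means sharper categorization).
    """
    return 1.0 / (1.0 + math.exp(-(duration_ms - criterion_ms) / spread_ms))

def slope_at_criterion(spread_ms: float) -> float:
    """Slope of the logistic at its 50% point, in proportion per ms."""
    return 1.0 / (4.0 * spread_ms)

# Hypothetical values: a longer expected prosodic duration shifts the
# criterion upward while leaving the steepness (spread_ms) unchanged.
criteria = {"short expected duration": 120.0, "long expected duration": 140.0}
spread = 8.0
for context, criterion in criteria.items():
    print(f"{context}: P(long | 130 ms) = {p_long(130.0, criterion, spread):.2f}, "
          f"slope = {slope_at_criterion(spread):.4f} per ms")
```

In this parameterization, position in the phrase only moves `criterion_ms`; the same 130 ms vowel changes category while the slope, the index of accuracy, is unaffected.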
perception of syllable boundaries
In a perceptual experiment with synthetically generated sequences of identical
nonsense syllables, the perception of syllable boundaries has been studied as a
function of both the detailed temporal structure of the sequences of vowel and
consonant segments, and of what was expected from the particular phonemic segment
with which a stimulus sequence started. It was found that these expectations generally
overruled the subtle differences in durational structure (Slis).
sequential effects in production and perception of accent patterns in number names
In reading aloud compound number names such as those for 562, 563, etc., versus
491, 591, etc., accent patterns are often determined by the preceding number names.
The grammatical components which determine correspondence and non-correspondence,
and which lead to such sequential effects in accentuation, are at present being
investigated. Some experiments have been run to find out what factors influence the
retention of preceding number names in the mind of readers, and what information
on sequential position listeners can derive from the accent patterns. Some results
are reported in this issue (Van Katwijk).
auditory coherence of connected speech
If connected speech is assembled from prerecorded words or syllables the result
may be unnatural and unintelligible due to a lack of auditory coherence, which
in normal speech is supplied by the integrity over time of production processes.
A research project has been set up to study the relative importance of several
types of acoustic discontinuities to intelligibility and naturalness (Brokx).
research facilities
In our last year's Progress Report a system was described for manipulating pitch
contours and durational structures of speech. The system was based on a channel
vocoder. With a view to improving the quality of vocoderized speech we are in the
course of developing an analysis-synthesis system based on an LPC vocoder (Willems).
The speech editing system, also described in the Progress Report 1974, is being
improved on, mainly with respect to ease of operation and sound quality. The
segmentation component of the system is provided with variable rise and decay
times (De Jong).
THE LOCATION OF THE NON-FINAL FALL IN PITCH CONTOURS IN DUTCH
J. 't Hart
introduction
Earlier reports (Collier & 't Hart, 1971; 't Hart & Cohen, 1973; 't Hart & Collier,
1975) have described how we have tried to give an account of regularities in contours
of speech pitch in Dutch in terms of rules and, ultimately, in the form of a grammar
of Dutch intonation. A particular property of these rules, and of the grammar, is
that they do not accept input conditions as to semantic or syntactic aspects of
the word content - with one exception: if the utterance in question is split into
a number of fairly independent parts, continuations will appear at the boundaries
of these parts; a continuation, formerly called "caesura", is characterised by a
non-prominence-lending rise very late in the last syllable before the boundary,
immediately followed by an inaudible fall back to the low pitch level that preceded
the rise.
Continuations come up for discussion again later in this paper. The main issue,
however, is the problem of the location of another non-prominence-lending movement,
viz. the non-final fall. We have recently gained better insight into this problem.*
The non-final fall is a perceptually relevant pitch movement of Dutch intonation,
which has to occur between two successive prominence-lending rises to provide
the necessary pitch "reset" after the first rise. Unlike the final fall, it does
not lend prominence; accordingly, its position with respect to the vowel onset of
the syllable is different from that of the final fall, viz. rather early in an
inconspicuous syllable or indeed so early that it can be considered to fall in
between two syllables ('t Hart & Cohen, 1973).
With this definition, nothing has yet been said about the issue as to which syllable
(or syllables) of the utterance we are dealing with. This question seems to have
been answered in Cohen & 't Hart (1967), where it is stated that the non-final
fall occurs immediately after the word which contains the first prominence-lending
rise, i.e., the remaining syllables of that word retain a high tone, and the non-final
fall is located between the last syllable of that word and the first one of
the next (Fig. 1, A).
Later investigations, however, showed that this simple rule was violated too often
to be retained. According to a new rule, formulated in 't Hart & Cohen (1973), the
non-final fall may occur anywhere in between the two successive rises, provided
that it does not cause spurious prominence by being located too close to the lexically
stressed syllable of a non-dominant, but nevertheless not totally unimportant
word (Fig. 1, B).
In general, upon inspection of more extended material than that on which this new
rule was based, there was little reason to curtail the liberty afforded by its
very generous formulation. In particular, one position of the non-final fall,
frequently used in spontaneous speech, was found always to be usable
instead of the one provided by the original rule, viz. a position immediately
following the first prominence-lending rise, so to speak on the same syllable.
Both positions were equally acceptable (Fig. 1, C).
* The clarification of this problem was achieved in cooperation with Dr. R. Collier, whose contribution to the design of the experiments and the evaluation of their results is gratefully acknowledged.
However, quite a number of cases were observed in which the non-final fall seemed
to be deliberately "postponed", i.e., it occurred not between the last syllable
of the accentuated word and the first one of the next word, but later. And, typically
in these cases, this postponed position could not be freely altered into the
one provided by the original rule: in that position it sounded fully unacceptable
(Fig. 1, D). (Acceptability can be judged by means of Intonator (Willems, 1966)
stylizations which enable us to locate a certain pitch movement at any position in
the pitch contour).
[Figure: schematic pitch contours A-D drawn over words 1-4 and their syllables.]
Fig. 1. Different observed locations of the non-final fall.
A. Original rule: prominence-lending rises on syllable 2 of word 1 and on syllable 3 of word 4; syllables 3 and 4 of word 1 remain high; non-final fall immediately after word 1.
B. Second rule: almost anywhere. Suppose word 3 is not unimportant but not dominant either, with lexical stress on syllable 2; then avoid a non-final fall on that syllable (!).
C. Frequently occurring location: immediately after the first rise.
D. Postponed positions. According to the original rule unacceptable (!).
These observations bring the realization that the second rule (of 1973) is not
valid either: where the original rule was too rigid, this one is too tolerant. It
was not clear how the rule about the position of the non-final fall should be
reformulated so as to account for the seemingly "anomalous" cases of the postponed
non-final fall. Were the anomalous cases rare enough to consider them
marginal, so that it would suffice to extend the rule by a number of
exceptions? Or should its reformulation be more fundamental, for instance by virtue
of the introduction of some more sophisticated dimension?
An answer to these questions seems to have been given by the outcome of two
experiments, reported extensively in Collier & 't Hart (1975). In the next
two sections, we will briefly deal with these experiments. In the second experiment,
not only non-final falls, but also continuations were involved. Therefore, a
subsequent section deals with possible consequences of the distinction made
between non-final falls and continuations in the description.
experiment 1
The Intonator was set to produce "hummed contours" of ten different kinds, in two
groups: in the first group of five contours, there were fifteen "syllables" (of
uniform duration and with uniform time intervals) and a pitch contour that provided
pitch accents on the second, eighth and thirteenth syllables, with a non-final fall
at five different positions between the first and the second pitch accent; in the
second group, there were thirteen syllables, pitch accents on syllables nos.
two, five and eleven, a fixed non-final fall immediately after the first pitch
accent and a second, variable one at five different positions between the second
and the third pitch accent (see Fig. 2). The main considerations in the choice of
the stimulus material were that there should be enough room to manipulate the location
of the non-final fall and that too simple contours should be avoided in order
to ensure a fair amount of variability in the reactions of the listeners.
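The stimulus design just described can be enumerated as ten contour specifications. This is a sketch under assumptions: the class `HummedContour` and its fields are hypothetical, and the five fall positions are encoded as 0 to 4 syllables after the preceding rise, matching the row labels of Table I rather than a layout stated explicitly for the stimuli.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HummedContour:
    group: int
    n_syllables: int
    accents: tuple[int, ...]            # 1-based syllables with a prominence-lending rise
    falls: tuple[tuple[int, int], ...]  # (accented syllable, syllables between its rise and the fall)

def experiment1_stimuli() -> list[HummedContour]:
    """Enumerate the ten hummed contours of experiment 1 (hypothetical encoding)."""
    stimuli = []
    for k in range(5):  # five positions: 0..4 syllables after the rise (assumed)
        # Group 1: 15 syllables, accents on 2, 8, 13; one variable fall after accent 2.
        stimuli.append(HummedContour(1, 15, (2, 8, 13), ((2, k),)))
    for k in range(5):
        # Group 2: 13 syllables, accents on 2, 5, 11; fixed fall right after
        # accent 2, variable fall after accent 5.
        stimuli.append(HummedContour(2, 13, (2, 5, 11), ((2, 0), (5, k))))
    for s in stimuli:
        for accent, offset in s.falls:
            nxt = min(a for a in s.accents if a > accent)
            # The fall must come before the next prominence-lending rise.
            assert accent + offset < nxt, s
    return stimuli

for s in experiment1_stimuli():
    print(s)
```

The internal check makes explicit the constraint implied by the design: every variable fall leaves room before the next accent, which is why the accents were spaced six syllables apart.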
[Figure: amplitude envelope and pitch contour of the stimuli; (1) 15 "syllables", non-final fall at positions A-E; (2) 13 "syllables", idem; time marker 250 ms.]
Fig. 2. Stylized representation of the stimuli of the first experiment.
Five subjects were asked to think of sentences for which the hummed contours would
constitute suitable intonations. They were instructed to take care to put accentuated
syllables in the proper places, but they were not explicitly told about the particular
positions of the non-final falls. The sentences had to be grammatically
correct, but semantic nonsense was not objected to. In all, 144 sentences were
produced to go with the ten different hummed contours.
The sentences were tested with the following questions:
1) Given a particular hummed contour, can the subjects think of sentences with a
different word content?
The answer is clearly affirmative.
2) Given that sentences of various word content have been written down to go
with a particular stimulus (hummed contour), do they possess similar syntactic
structures?
This time the answer is negative: at comparable places of different sentences
with one particular stimulus, practically all kinds of syntactic elements are
found: subjects, objects, adverbial phrases, verb phrases, coordinate and
subordinate clauses.
3) Given these variegated syntactic structures, do they nevertheless share some
common property in the case of one particular hummed contour, which is systematically
different in the case of another stimulus?
As will be demonstrated below, such a common property can indeed be found.
The sentences were analysed in terms of syntactic constituents on the level of
phrase markers. An example is shown below.
S ( S1 ((wat aardig) (van je)) S1  S2 ((dat) (je) (gisteren) (al weer) (opbelde)) S2 ) S
(how kind of you that you yesterday again called)
A confrontation between the thus analysed sentences and the corresponding hummed
contours yielded a highly frequent coincidence of non-final falls with boundaries
of the syntactic constituents. See Fig. 3, in which, for instance, three sentences
are given that had been produced to go with one contour, and another three to go
with a different contour.
((het strijk-orkest-je) (kent wel) (ze-ven) (stukken) (van Beethoven))
(((of Pe-ter nog komt)) ((weet) (geen) (mens) (totnogtoe) (met ze-kerheid)))
(((wat aardig van je)) ((dat) (je) (gis-te-ren) (al weer) (opbelde)))
(((toen) (dacht) (hij) (waarschijnlijk)) ((dat) (ik wel) (voor hem) (zou uit-wij-ken)))
(((van-mor-gen) (dacht ik al)) ((er) (moet) (toch) (wat op te vin-den zijn)))
(((be-loof me) (nou eerst eens)) ((dat) (jij) (er) (wel toe in staat zou zijn)))
Translations in same word order, respectively: The string-band knows seven pieces of Beethoven; if Peter still comes knows no one as yet with certainty; how kind of you that you yesterday again called; then thought he probably that I for him would pull out; this morning thought I already there must be something to be done about it; promise me first that you would be able to do it.
Fig. 3. Examples of sentences produced to go with two different kinds of hummed contours.
In fact, the extent to which this coincidence has been observed in the
material can be read off from Table I. Rows represent the number of syllables
between the prominence-lending rise and the non-final fall of the hummed contour
which served as the stimulus; columns refer to the number of written syllables
between the (obligatory) accentuated syllable and the syntactic boundary. The entries
are frequencies of occurrence, as percentages of the total
number of sentences produced for one stimulus. Corresponding stimuli of the two
groups are taken together.
Table I. Frequency of occurrence (in %) of sentences with a given distance (in syllables) between accentuated syllable and syntactic boundary, produced on hummed contours with a given distance between prominence-lending rise and non-final fall.

                                   Number of written syllables between
                                   accentuated syllable and syntactic boundary
                                       0     1     2     3     4
Number of syllables between      0    20    35    28    10     7
prominence-lending rise and      1     7    93     0     0     0
non-final fall in the hummed     2     0     7    93     0     0
contour                          3     4     8     4    84     0
                                 4     8     8     4     8    72
The main diagonal shows the essence of the outcome. Point 0-0 does not seem to
follow this general trend; we will deal with the top row later. Firstly, we note
that the lower-left triangle shows the few cases in which the syntactic boundary
occurred prior to the non-final fall, whereas the upper-right triangle shows that,
generally, no syntactic boundaries have been generated beyond the non-final fall,
with the exceptions given in the top row.
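The reading of Table I given above can be checked mechanically. Each row of the table sums to 100, as expected if the entries are percentages per stimulus. The sketch below, whose function name `summarize` is an illustrative choice, splits every row into the diagonal (boundary coinciding with the fall), the lower-left part (boundary before the fall) and the upper-right part (boundary beyond the fall).

```python
# Table I, row by row: rows = syllables between rise and non-final fall in the
# hummed contour (0..4); columns = written syllables between the accentuated
# syllable and the syntactic boundary (0..4); entries in % per stimulus row.
TABLE_I = [
    [20, 35, 28, 10, 7],
    [7, 93, 0, 0, 0],
    [0, 7, 93, 0, 0],
    [4, 8, 4, 84, 0],
    [8, 8, 4, 8, 72],
]

def summarize(table: list[list[int]]) -> dict[str, list[int]]:
    n = len(table)
    return {
        "row_sum": [sum(row) for row in table],                   # 100 per stimulus
        "diagonal": [table[i][i] for i in range(n)],               # boundary coincides with fall
        "before_fall": [sum(table[i][:i]) for i in range(n)],      # lower-left triangle
        "beyond_fall": [sum(table[i][i + 1:]) for i in range(n)],  # upper-right triangle
    }

for key, values in summarize(TABLE_I).items():
    print(f"{key:12s} {values}")
```

Run on these entries, the upper-right mass is 80% for the top row and 0% for every other row, while the diagonal drops to 20% only at point 0-0; this is the numerical form of the observation that only the "immediately after the rise" position frees the speaker from the syntactic boundary.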
The top row represents stimuli in which the non-final fall was located immediately
after the prominence-lending rise. The fact that with such stimuli sentences are
produced with syntactic boundaries at rather arbitrary distances from the accen
tuated syllable is in agreement with the observation mentioned in the Introduction,
that there seemed to be one position of the non-final fall that could always be
used, viz. immediately following the first rise. This means that the speaker is
free to choose this position of the non-final fall, irrespective of the location
of the syntactic boundary: the resulting pitch contour is fully acceptable.
If, however, the speaker does not choose this particular position, he is obliged
to mark the syntactic boundary. If he fails to mark it, there are two possibilities:
either the syntactic boundary precedes the non-final fall, in which case, judging
from the lower-left part of Table I, we may expect a certain number of listeners
not to notice this non-coincidence; or the speaker produces a non-final fall
prior to the syntactic boundary (although not on the accentuated syllable), in
which case it will very readily be noticed by the listeners.
The outcome of this experiment can be interpreted to the effect that the non-final
fall, instead of being a mere "reset" of pitch in preparation for the next
rise, may by its very position constitute an important intonational cue to the
listener for the segmentation of the surface structure of the utterance into units
that are suitable candidates for processing as a whole.
The following two additional remarks deal with the nature of the syntactic
constituent boundaries under consideration.
First, they appear to be major syntactic boundaries (MSB) of two hierarchically
distinguishable kinds, viz. those between clauses (S1 - S2) on the one hand, and
those between phrases (NP - VP - AdvP and the like) on the other. There were no
minor boundaries involved, such as between adjective and noun, or auxiliary
and main verb.
Second, in the case of two successive syntactic constituents of which the second
lacks a pitch accent, there is still the choice of having the non-final fall mark
the boundary before the second constituent or the one after it. The experiment
gave strong evidence in favour of the hypothesis that in such cases the hierarchically
dominant boundary is marked.
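The placement rule that emerges from experiment 1, together with the preference for the hierarchically dominant boundary, can be written as a small predicate. This is a hypothetical encoding, not the rule as formalized in the grammar of Dutch intonation: the function names, the offset convention (syllables counted from the accentuated syllable) and the two-level `RANK` table are assumptions made for illustration.

```python
# Hierarchical rank of boundary types (assumption: clause > phrase).
RANK = {"clause": 2, "phrase": 1}

def fall_acceptable(offset: int, msb_offsets: set[int]) -> bool:
    """Hypothetical encoding of the placement rule for the non-final fall.

    offset: syllables between the prominence-lending rise and the fall
            (0 = immediately after the rise, on the accentuated syllable).
    msb_offsets: syllables between the accentuated syllable and each major
                 syntactic boundary before the next accent.
    """
    if offset == 0:
        return True               # free choice, irrespective of syntax
    return offset in msb_offsets  # otherwise the fall must coincide with an MSB

def preferred_offset(candidates: dict[int, str]) -> int:
    """Among candidate MSBs (offset -> 'clause' or 'phrase'), pick the dominant one."""
    return max(candidates, key=lambda off: RANK[candidates[off]])

print(fall_acceptable(0, {3}), fall_acceptable(3, {3}), fall_acceptable(1, {3}))
print(preferred_offset({1: "phrase", 3: "clause"}))
```

The predicate is deliberately strict: it rejects a fall that comes after the boundary as well as one that comes before it, whereas the lower-left part of Table I suggests that the former mismatch often goes unnoticed by listeners.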
experiment 2
The purpose of the second experiment was to see if, and to what extent, speakers
were willing and able to provide listeners with the cues mentioned above.
Ten subjects were asked to read aloud 30 sentences, arbitrarily chosen from the
material produced in experiment 1. Each sentence was read twice, so that 600
utterances were available for analysis. However, not all of this material was expected
to be suitable for our particular question, namely whether the MSBs were intonationally
marked. Obviously, there is freedom to use the variant in which the non-final fall
is attached to the preceding rise, and although among such cases there could be
instances in which the MSB does occur immediately after the accentuated syllable,
we cannot generally use them for our purpose. Furthermore, the subjects cannot be
forced to use the same kind of contours as had been applied in the first experiment.
A number of alternative ways of acceptably intonating the given sentences will no
doubt exist, and some of these may not contain non-final falls, or comparable
phenomena, at the places of interest.
Indeed, 240 utterances, or 40% of the material, had to be considered not suitable
for further analysis, for either of these reasons.
Of the remaining 360 utterances, in 204 cases non-final falls were produced in pitch
contours that were fully comparable to those of the first experiment.
In 192 cases, the position of the non-final fall coincided with the MSB.
In 156 utterances, use was made of the alternative way of breaking up the intonational
continuum, that is, a continuation. As expressed in our rule about the use
of continuations mentioned in the Introduction, it has long been assumed that this
feature would be capable of marking syntactic boundaries. And indeed, in 150
instances, it coincided with the MSB.
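The counts reported for experiment 2 are internally consistent, and the coincidence rates behind the comparison of non-final falls and continuations follow from them directly. The sketch below only redoes that arithmetic; the percentages it prints are derived here and are not quoted in the report.

```python
def experiment2_tally() -> dict[str, float]:
    subjects, sentences, readings = 10, 30, 2
    total = subjects * sentences * readings          # 600 utterances
    unsuitable = 240
    usable = total - unsuitable                      # 360 utterances
    nff, nff_at_msb = 204, 192                       # non-final falls
    cont, cont_at_msb = 156, 150                     # continuations
    assert nff + cont == usable                      # the two marker types exhaust the usable set
    return {
        "unsuitable_share": unsuitable / total,
        "nff_at_msb": nff_at_msb / nff,
        "continuation_at_msb": cont_at_msb / cont,
        "any_marker_at_msb": (nff_at_msb + cont_at_msb) / usable,
    }

for name, share in experiment2_tally().items():
    print(f"{name:20s} {share:.1%}")
```

The rates come out at roughly 94% for non-final falls and 96% for continuations (95% overall), which is the basis for saying that continuations are at least as strongly tied to the MSB as non-final falls.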
This outcome amply confirms expectations as to what the speaker does, as formulated
above on the basis of the results of experiment 1.
discussion: non-final falls versus continuations
The by-product of the second experiment, confirming that continuations are at least
as strongly connected to syntactic boundaries as non-final falls are, gives
rise to yet another question:
Why is it that sentences for which contours having only non-final falls as boundary
markers were considered suitable intonations are often reproduced with
continuations?
The answer seems to lie in what is stated in 't Hart & Cohen (1973) about an "optional alternative rule" for continuations.
The optional alternative rule was an addition to the original rule for continuations,
as stated in the Introduction. Apart from the continuation rise and the immediately
following inaudible fall, a characteristic feature of continuations according to
the original rule is that the last pitch accent before the MSB takes the form of a
rise-plus-fall or a mere fall (Fig. 4, A). According to the optional alternative
rule, the last pitch accent should be realised by a rise, the remaining syllables
(before the MSB) being kept high and, as is the case with the original rule, there
is an inaudible fall immediately after the last syllable before the MSB (Fig. 4, B).
[Figure: two schematic continuation contours around a major syntactic boundary (MSB).]
Fig. 4. Two different shapes of a continuation. A: according to the original rule; B: optional alternative.
This may answer the question as to the difference between the shapes of the boundary
markers in the stimuli of experiment 1 and of those in the response material
produced in experiment 2: listeners in experiment 1 may have interpreted the intonation
feature of the hummed contour as the optional alternative form of a continuation
and may have produced a sentence in which the MSB could appropriately be
marked by means of a continuation. Accordingly, in the second experiment the
subjects produced continuations partly of the original kind and partly
of the alternative kind.
But why did the listeners interpret the non-final fall as if it were the "resumption
of low pitch" which is so typical of continuations? The answer is that the
shape of the alternative kind of continuation is, on purely melodic grounds, not
distinguishable from the "anomalous" cases in which the non-final fall was typically
"postponed". Yet, as is reflected e.g. in our grammar of Dutch intonation ('t Hart
& Collier, 1975), the two have been kept apart for some reason or other. The main
"reason" for their distinction might have been that the interpretation of
the empirical and experimentally tested second rule, according to which the non-final
fall could be located anywhere between two successive rises, was still
implicitly biased by the first rule, to the effect that "anywhere" would not mean
"later than immediately after the accentuated word", so that "postponed" was
"anomalous".
The experiments mentioned in this paper have put an end to this biased interpretation.
But this does not necessarily imply that there would no longer be any reason
to distinguish the non-final fall from the resumption of low pitch in the alternative
form of the continuation. The experiments are not decisive in this respect,
for we do not know whether the MSBs produced by listeners who interpreted
the non-final fall as an alternative continuation have some common property (which the
other MSBs lack) that makes them candidates to be read, in experiment 2, with the
intonational features of a continuation (of either kind).
It is clear that, already on melodic grounds, a continuation in its original shape
should be distinguished from a non-final fall. The actual problem is, however,
whether the alternative continuation, by virtue of being an alternative form of a
"genuine" continuation, should likewise be distinguished from a non-final fall, or
due to its being melodically identical to the non-final fall, should consequently
be held as distinct from a genuine continuation.
It might be doubtful whether this question should be answered at all. In any case,
we should not expect to find the answer in the material gathered in these experiments;
rather, we should turn to the natural situation of spontaneous conversation.
Thus the by-product of the second experiment, apart from confirming an intuitive
supposition, has raised the new problem of how the various boundary markers
are most realistically distinguished and best described. The main product of the
two experiments is that we have found a frame into which both "regularities" and
"irregularities" of the location of the non-final fall can be made to fit.
Finally, it should cause no surprise that our initial, strictly melodic approach,
in which the non-final fall could only be considered to be a reset to low pitch
between successive rises, was not capable of specifying its location in more detail.
Once we take into consideration its function in marking the syntactic
structure, we can fully understand the exact location of the non-final fall
in the pitch contour.
summary
In the course of our attempts to develop rules for the construction of well-formed
pitch contours for Dutch utterances, several propositions have been made with
respect to the location of the so-called non-final fall, each of which, however, led
to a different set of seemingly incoherent cases of exception.
The outcomes of two new experiments have now enabled us to reformulate the rule so
as to account for formerly "regular" as well as "irregular" cases. To achieve this
it was necessary to introduce as a new dimension the syntactic structure of the
utterance.
Specifically, it was demonstrated that, unless the speaker applies a strategy - which
he is free to choose - by which the non-final fall is located immediately after
the preceding prominence-lending rise, he must make the non-final fall coincide
with a major syntactic boundary (separating syntactic units such as Noun Phrase,
Verb Phrase, Adverbial Phrase from each other).
We have interpreted this outcome to the effect that the non-final fall may constitute
a cue for the segmentation of the speech continuum into units suitable for
processing as a whole. Furthermore, these findings give support to the notion of
the intonational block, which has been introduced, partly for reasons of descriptive
convenience, in establishing our grammar of Dutch intonation.
Melodically, the shape of the non-final fall is similar to that of one particular
type of continuation. These experiments have suggested that there is no difference
in their functional aspects either. The final part of this paper, therefore, deals
with the question of whether or not we can decide to stop keeping them apart in
our description.
references
Cohen, A. and 't Hart, J. (1967) On the anatomy of intonation, Lingua, no. 2, p. 177-192.
Collier, R. and 't Hart, J. (1975) The role of intonation in speech perception, in: Structure and Process in Speech Perception, Proc. of the Symposium on Speech Perception, A. Cohen and S.G. Nooteboom, Eds. (Springer, Heidelberg).
't Hart, J. and Cohen, A. (1973) Intonation by rule: a perceptual quest, J. of Phonetics 1, p. 309-327.
't Hart, J. and Collier, R. (1975) Integrating different levels of intonation analysis, J. of Phonetics 3, p. 235-255.
Willems, L.F. (1966) The Intonator, IPO Annual Progress Report 1, p. 123-125.
PROSODY AND THE PERCEPTION OF SYNTACTIC BOUNDARIES
J.J. de Rooij
introduction
A speaker's message, acoustically coded, shows a number of prosodic regularities.
The speech signal, for instance, is prosodically related to the syntactic structure
of the sentence in that major syntactic boundaries (MSBs) are often marked by
segmental lengthening (Huggins, 1974; Klatt, 1975) and specific pitch movements,
such as the continuation rise (J. 't Hart and A. Cohen, 1973).
The question arises as to the part, if any, which these prosodic regularities
play in the recovery of syntactic structure.
In the context of a research project set up to investigate the relative contributions
of temporal structures and pitch contours to the recovery of MSBs in the
perception of speech, a preliminary experiment has been carried out, designed
to find out whether naive subjects are able to use prosodic cues in the perception
of MSBs at all.
On the linguistic level of the decoding process the listener brings the whole of
his linguistic knowledge to bear. In order to isolate prosodic information from
the segmental, syntactic and semantic content and thus to be able to assess its
contribution to the perceptual process, use can be made of nonsense imitations
(Kozhevnikov and Chistovich, 1965; Carlson et al., 1972), hummed imitations
(Svensson, 1974) or spectrally rotated speech (Blesser, 1969).
In the present experiment, normally spoken sentences were made unintelligible by
electronically distorting the segmental information (see Method); the prosodic
structure, however, remained unaffected in this process. The location of prosodic
markers of MSBs was varied systematically, and its effect on the perceptual
interpretation of the stimuli was investigated.
method
Stimuli were prepared from 5 normally spoken Dutch sentences consisting of 7
syllables and 6 or 7 words. Each sentence had an MSB, viz. the boundary between
the main clause and the subordinate object clause. An English example of this
type of sentence might be: The major said "that is wrong". This particular structure
was chosen because it contains a strong syntactic break which is often prosodically
conspicuous. The location of the MSB was varied systematically.
The sentences were read aloud by a trained speaker, without pauses at the MSBs.
Artificial, stylized pitch contours were superimposed on the signal in order to keep
the intonational cues equivalent and well specified for all sentences. This was done
by using the channel vocoder as an Intonator (Willems and de Jong, 1974).
The positions of the various pitch movements on the time axis were adjusted until
a satisfactory perceptual equivalent of the original contour was found.
There were 3 pitch accents, the first being a rise and a fall, the second a rise
and the third a fall. The syllable before the boundary was given a continuation rise,
i.e. a rise occurring rather late in the syllable, which does not lend prominence and
often marks MSBs (J. 't Hart and A. Cohen, 1973).
The 5 contours were as follows:
[Figure: stylized pitch contours of sentences 1-5, plotted against syllable number 1-7.]
In this figure the declination line is omitted. The numbers below the contours refer
to the pitch movements: 1 is a rise and a fall, 2 a continuation rise, 3 a rise and
4 a fall.
In resynthesizing the utterances, spectral scrambling was applied by sending the
permuted outputs of the 15 analysis filters to the 15 synthesis filters of the
vocoder. In this way the segmental information was distorted so as to make the
speech unintelligible, while the prosodic information was preserved.
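The scrambling step can be pictured as a fixed re-routing of channel levels. The sketch below is a minimal illustration under stated assumptions, not a description of the IPO vocoder: each analysis frame is assumed to be represented as a fundamental frequency plus 15 band levels, and the permutation is assumed to have no fixed points (a derangement), a choice of our own, since the text only says the outputs were permuted. The names `make_scramble` and `scramble_utterance` are hypothetical. Because the F0 values and the number and order of frames pass through untouched, pitch and timing survive while the spectral envelope, which carries the segmental information, is rearranged.

```python
import random
from typing import List, Tuple

N_CHANNELS = 15  # number of vocoder filters stated in the text

# One analysis frame: (F0 in Hz, 0.0 when unvoiced; list of band levels).
# This frame representation is an assumption of the sketch.
Frame = Tuple[float, List[float]]


def make_scramble(n_channels: int, seed: int) -> List[int]:
    """Random channel permutation without fixed points (a derangement), so
    that no analysis band drives the synthesis band at its own frequency.
    The derangement requirement is an assumption of this sketch."""
    if n_channels < 2:
        raise ValueError("need at least two channels to scramble")
    rng = random.Random(seed)
    while True:
        perm = list(range(n_channels))
        rng.shuffle(perm)
        if all(dest != src for src, dest in enumerate(perm)):
            return perm


def scramble_utterance(frames: List[Frame], perm: List[int]) -> List[Frame]:
    """Route analysis channel i to synthesis channel perm[i] in every frame.
    F0 and the number and order of frames (hence timing) are left untouched."""
    scrambled: List[Frame] = []
    for f0, levels in frames:
        out = [0.0] * len(levels)
        for src, level in enumerate(levels):
            out[perm[src]] = level
        scrambled.append((f0, out))
    return scrambled
```

A new seed gives a different scrambling of the same material, which is the kind of change the repeated experiment relied on to prevent recognition of the stimuli.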
The distorted utterances were recorded on Language Master Cards.
In a preliminary test of the stimuli, colleagues were asked to listen to the
spectrally scrambled sentences and make up sentences fitting the prosodic patterns
they heard.
This made the disadvantages of the scrambling technique apparent. These experienced
listeners succeeded in recognizing some of the original words. They often felt
hampered by the new spectral information. Some short vowels, lengthened at the
MSB, were interpreted as unlengthened long vowels. For investigations into the
role of temporal factors in speech perception, the scrambled signal is perhaps
not optimal. Nevertheless, subjects fairly often recovered the correct MSBs, without
recognizing the original words.
In view of these considerations, I decided to provide subjects with typewritten
response sentences. As each stimulus had a different stress pattern, it was given
a set of response sentences of its own. This set consisted of 6 sentences; in 5
of them the MSBs were located in accordance with those of the 5 stimulus sentences,
while the 6th had no MSB of this type. An example of such a sentence in English
might be: My dear aunt lives in London.
experiment and results
Twenty-one naive subjects (students at the Eindhoven University of Technology) listened
to the stimuli through headphones in individual sessions. They could listen to a
stimulus as often as they wished. The response sentences were typewritten. The
5 stimuli and the 6 possible response sentences for each stimulus were randomized.
The subjects were asked to concentrate on the rhythm and the melody of the utterance
and to choose from the list provided the best fitting response, disregarding
conflicting information on individual speech sounds. They gave their response by
writing down the number of the response sentence. The whole task took about 8
minutes on average.
An example of our results is shown in Table I.
[Table I: stimuli (rows 1-5) against response sentences (columns 1-6).]
Table I. Row number refers to the position of the prosodic boundary marker in the
stimulus, column number to the corresponding position of the MSB in the response
sentence; column 6 refers to the response sentence without an MSB of this type.
The scores on the diagonal are correct scores, in that the MSB in the response
sentence is located in the same position as the prosodic boundary marker in the
stimulus. Stimulus S1 behaves differently from the other stimuli because its MSB
is more often perceived 2 syllables than 1 syllable away from its real position.
For S3, response sentence R6 has a syntactic boundary between subject and predicate,
and for S4, R6 has a boundary between verb and adverbial adjunct; in both cases this
boundary corresponds to the prosodic boundary marker in the stimulus.
A repetition of this experiment two weeks later with 7 of the original 21 subjects,
using different scrambling to avoid recognition of the stimulus material, gave
essentially the same results, with a slight improvement in performance. A second
repetition with the same 7 subjects and monotonized stimuli gave somewhat lower
scores on the diagonal, but still showed a high correlation between stimulus
structure and the perception of MSBs.
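Scoring a table of this kind amounts to reading its diagonal. As a hedged sketch (the function name `diagonal_scores` and the list-of-lists layout are our own), assume a 5 x 6 matrix of response counts in which row i holds the responses to stimulus i and column j counts choices of response sentence j, the 6th column being the sentence without an MSB. The proportion correct is then the diagonal sum over the total, and the 6th column can never lie on the diagonal.

```python
from typing import List, Tuple


def diagonal_scores(counts: List[List[int]]) -> Tuple[List[float], float]:
    """Per-stimulus and overall proportion of 'correct' responses.

    counts[i][j] is how often response sentence j+1 was chosen for stimulus
    i+1. A response is correct when its MSB position equals that of the
    prosodic boundary marker, i.e. when j == i; the last column (the
    response without an MSB) can therefore never be correct."""
    n_stimuli = len(counts)
    if any(len(row) <= n_stimuli for row in counts):
        raise ValueError("expected one extra column for the no-MSB response")
    per_stimulus: List[float] = []
    correct_total = grand_total = 0
    for i, row in enumerate(counts):
        row_total = sum(row)
        per_stimulus.append(row[i] / row_total if row_total else 0.0)
        correct_total += row[i]
        grand_total += row_total
    overall = correct_total / grand_total if grand_total else 0.0
    return per_stimulus, overall
```

Comparing `overall` between the scrambled and the monotonized runs is one simple way to express the "somewhat lower scores on the diagonal" as a single number.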
discussion
It is encouraging to see that naive subjects have no real difficulty in using prosodic
boundary markers when assigning syntactic structures to speech signals. This leads
us to believe that it is worthwhile to pursue this line further in attempting to
assess the relative contributions of temporal and intonational cues to the perception
of syntactic boundaries.
The fact that our subjects performed less well, but still consistently and meaningfully,
with monotonized stimuli indicates that both pitch contours and temporal cues (in our
case segment durations, but no pauses) are valuable aids in the perception of potential
syntactic boundaries. Using scrambled speech proved a severe drawback, as segment
durations were not always easy to interpret prosodically: our subjects did not know
which phonemes were realized in the acoustic segments concerned. In normal speech
there is a high degree of interaction between phonemic and prosodic factors affecting
speech sound durations (Klatt, 1973; Lindblom and Rapp, 1973; Nooteboom, 1972).
If the listener is denied access to one level of processing, in our case the phonemic
one, he cannot take the interaction between the two levels into account, and may
draw wrong conclusions.
We therefore conclude that, if we wish to study the relevance of prosodic segment
durations to the perception of MSBs in more detail, we should rather turn to
another type of stimulus material, e.g. nonsense strings with a simple and irrelevant
phonemic make-up.
summary
In a listening experiment subjects assigned typewritten response sentences to
unintelligible but prosodically intact utterances. Each utterance had a prosodic
boundary marker and a set of 6 possible responses.
In each set there was one response with a major syntactic break whose location
corresponded to that of the boundary marker in the utterance.
Results indicate that prosodic information delimits possible responses and may
therefore be a valuable aid in recovering syntactic structures.
references
Blesser, B. (1969) Perception of Spectrally Rotated Speech, Dissertation, M.I.T., Cambridge, Massachusetts.
Carlson, R., Granström, B., Lindblom, B. and Rapp, K. (1972) Some timing and fundamental frequency characteristics of Swedish sentences: data, rules and a perceptual evaluation, Speech Transmission Laboratory, KTH, Stockholm, QPSR.
't Hart, J. and Cohen, A. (1973) Intonation by rule: a perceptual quest, Journal of Phonetics 1, p. 309-327.
Huggins, A. (1974) An effect of syntax on syllable timing, Q.P.R. 114, M.I.T., Massachusetts.
Klatt, D. (1973) Interaction between two factors that influence vowel duration, J. Acoust. Soc. Amer. 54, p. 1102-1104.
Klatt, D. (1975) Vowel lengthening is syntactically determined in a connected discourse, Journal of Phonetics 3, p. 129-140.
Kozhevnikov, V. and Chistovich, L. (1965) Speech: Articulation and Perception (JPRS 30), Washington D.C. (Moscow-Leningrad).
Lindblom, B. and Rapp, K. (1973) Some temporal regularities of spoken Swedish,Papers from the Institute of Linguistics, University of Stockholm, Publication 21.
Nooteboom, S. (1972) The interaction of some intra-syllable and extra-syllable factors acting on syllable nucleus durations, IPO Annual Progress Report 7.
Svensson, S. (1974) Prosody and Grammar in Speech Perception, Dissertation, MILUS 2 (Monographs from the Institute of Linguistics, University of Stockholm).
Willems, L. and de Jong, Th. (1974) Research tools for speech perception studies, IPO Annual Progress Report 9.
ACCENT PATTERNS IN NUMBER NAME SEQUENCES
A.F.V. van Katwijk
introduction
It is now possible to synthesize accented syllables and to provide recorded speech
with F0 contours in such a way that the syllables we choose are accented. A
rise-plus-fall in the F0 contour is one of the means of giving accents to syllables.
Given a text, what we do not know is which syllables should be or may be accented.
Syntactic information, which can be made explicit, is not enough. Situational and
contextual information is generally felt to be indispensable for the prediction of
accents. This kind of information, however, is still largely beyond the scope of
explicit rules and descriptions.
In this paper we have worked with a simple instance of accent prediction. Having
found that number names show variable accent patterns, we have analysed which explicit
factors in number name sequences might account for the observed variants, and we
have looked into what readers actually do, i.e., what accent patterns they produce
when the distance (in terms of the number of intervening items) between the contextual
accentuation factor and the target is varied.
If sequences of numbers are read aloud, the accent patterns of the number names
usually bear the marks of the sequential structures of the sequences. Thus, the
sequence 24 25 26 etc. and the sequence 31 41 51 etc. get different accent
patterns. In the first-named sequence one should expect the units to be accented,
and in the latter sequence the tens. The principle of accentuation,
loosely formulated, would be: the constant parts of a sequence are left unaccented,
the varying parts are accented. That this principle is not restricted to number
name sequences can be inferred from the accentuation in utterances such as (1),
he put a blue box on a red box   (1)
where at least the second "box" goes unaccented.
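The constancy/variation principle can be turned into a simple check on written numerals. The sketch below is our own illustration, not a procedure from the study: the hypothetical function `varying_positions` reports which digit positions vary across a sequence (candidates for accent) and which stay constant (left unaccented). Positions refer to written digits; how they map onto the spoken order of the parts is language-specific.

```python
from typing import List, Set


def varying_positions(sequence: List[str]) -> Set[int]:
    """Digit positions (0 = leftmost written digit) whose value changes
    somewhere in the sequence. Under the constancy/variation principle these
    are the parts expected to be accented; constant positions stay unaccented."""
    if len({len(item) for item in sequence}) != 1:
        raise ValueError("this sketch only handles numerals of equal length")
    return {
        pos
        for pos in range(len(sequence[0]))
        if len({item[pos] for item in sequence}) > 1
    }


print(varying_positions(["24", "25", "26"]))  # {1}: the units vary
print(varying_positions(["31", "41", "51"]))  # {0}: the tens vary
```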
Starting from simple material (number names) and a simple accentuation principle
(constancy and variation) we cannot hope to throw much light on many complex
problems involved in accent prediction. Our limited goal, however, is to see how
a tangible part of the accentuation problem can be specified and be made to play
a role in an experiment on accent strategies in the production and perception of
accent patterns.
accents in number names
The compound number names discussed here have the structure H E T, where
H counts the hundreds, E the units and T the tens, as may be seen in (2):
negenhonderdzesenvijftig = negenhonderd (H) + zesen (E) + vijftig (T)   (2)
(956: "nine hundred six and fifty")
The parts H, E, and T are the potential accent carriers of number names of this
type.
A number of observations have been made on subjects reading aloud all kinds of
lists with numbers, the main ones being:
(a) An isolated number name of the type H E T can be spoken with three pitch
accents (on H, E and T), with two pitch accents (either on H and T or on E and T)
or with one pitch accent (on T). Since these are isolated number names, their accent
patterns might be defined as "neutral" accent patterns.
(b) Sequentially marked accentuation occurred with distinct clarity in non-isolated
number names, notably when a pair of numbers had constant and varying
elements (e.g., 365 366, or 528 628, or 734 744). If the units E or the hundreds
H were the varying elements, the absence of pitch accents on the constant tens (T)
was a striking feature, next to the pitch accents on the varying elements E or H
in the second number name of each pair. This is worth mentioning because an isolated
number name may have a neutral accent pattern with one pitch accent on T. A
pitch accent on T is therefore ambiguous in principle. We will show in the next
section how this ambiguity played a part in the experiment.
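For H E T names the written digits have to be reordered, because Dutch speaks the units before the tens. The following sketch (all names and string labels are hypothetical) finds the varying element of a related pair in spoken order and classifies a single pitch accent on the second name. Following observation (a), a lone accent on T is treated as ambiguous even when T is the varying element, since an isolated name may carry a neutral pattern with one accent on T.

```python
from typing import Dict, Optional


def het_parts(number: int) -> Dict[str, int]:
    """Split a three-digit number into the parts of a Dutch H E T name,
    in spoken order, e.g. 956 -> negenhonderd (H=9), zesen (E=6), vijftig (T=5)."""
    if not 100 <= number <= 999:
        raise ValueError("expected a three-digit number")
    return {"H": number // 100, "E": number % 10, "T": number // 10 % 10}


def varying_element(first: int, second: int) -> Optional[str]:
    """Return 'H', 'E' or 'T' if the pair differs in exactly one part."""
    a, b = het_parts(first), het_parts(second)
    diff = [part for part in ("H", "E", "T") if a[part] != b[part]]
    return diff[0] if len(diff) == 1 else None


def reading_of_single_accent(first: int, second: int, accented: str) -> str:
    """Classify one pitch accent on part `accented` of the second name."""
    varying = varying_element(first, second)
    if varying is None or accented != varying:
        return "not sequence-marking"
    if varying == "T":
        return "ambiguous (could be a neutral accent on T)"
    return "sequence-marking"
```

For the pair 273 274, an accent on E ("-vier-") in the second name is classified as sequence-marking, whereas an accent on T alone in the pair 734 744 is classified as ambiguous.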
In order to test the perceptual strength of different accent types as cues to
sequential structure, a listening session was prepared for a panel of phonetically
non-naive subjects, who had to judge whether or not auditorily presented pairs
of number names had accent patterns expressing the sequential structure of the
numerical values. An example may illustrate the experiment: among the 140 pairs
of numbers (all of the H E T type) that the listeners saw on a list, one pair
might read, e.g., 273 274. What they would then hear was, e.g., tweehonderddrieënzeventig
tweehonderdvierenzeventig, with pitch accents on the underlined elements. The
sequentially appropriate accentuation would have (at least) a pitch accent on -vier-
in the second number name. Absence of this accent would probably be interpreted by
the listeners as a case of inappropriate accentuation.
All pairs of number names had artificial F0 contours, prepared by means of the
INTONATOR (Willems, 1966). The pitch accents were of a rise-plus-fall type, imposed
on the syllable to be accented. Each number name had only one such rise-plus-fall,
on H, on E or on T. A gradually declining pitch base line was added for greater
naturalness. Table I shows the percentages of judgments by seven listeners. Note
that each pair had one varying element (H, E or T) and two constant elements, and
that the pitch accents could occur (a) on neither the first nor the second varying
element, (b) only on the varying element of the first number name, (c) only on
the varying element of the second number name, or (d) on the varying elements of
both the first and the second number name.
From Table I two main conclusions may be drawn:
(a) The perceptual strength of the accent pattern as a marker of the sequential
numerical structure is considerable, for, in spite of the rather vague task,
the judgments are practically unanimous.
(b) The accentuation of the first number name does not influence the judgments
appreciably. It is the accent on the varying element in the second number name
of a pair that really counts.
pitch accent                                            judged correct   judgments
not on the varying element of the 1st or 2nd name             3%            252
only on the varying element of the 1st number name        illegible         126
only on the varying element of the 2nd number name           83%            126
on the varying elements of both 1st and 2nd name             94%             63
Table I. Percentages of judgments that the accent patterns of number names correctly expressed the sequential numerical structures of pairs of numbers. Seven listeners, 140 pairs.
The observations reported so far indicate that accentuation and sequential structure
in number names are closely linked. Readers as a rule choose the syllables
for accentuation in such a way that the sequential structures, if any, are marked,
and listeners as a rule are capable of perceiving such accentual marking of sequential
structure. Second, the fact that the accentuation of the first number name hardly
matters suggests that, in reading and in perceiving the first item of a related pair,
the relationship with the second is as a rule not anticipated.
coding and retention
In this section an experiment will be discussed on reading aloud number names of
the H E T type, where related items occurred in succession or separated by 1, 2
or 3 intervening unrelated number names. It stands to reason that the probability
that the second item of a related pair carries accentual sequence marking should
decline with the number of intervening unrelated number names. Before going into
the description of the experiment, let me suggest that it is not mainly the passage
of time that interferes with the relation: five subjects who were confronted with a
sentence like (3) produced a second number name clearly expressing the numerical
relationship with the first, in spite of the large number of intervening words.
482 is a number, and of all the numbers one could mention - there being
infinitely many -, one could also mention 483.   (3)
The interesting point seems to be the kinds of interfering factors that have
knock-out effects (Crowder, 1970) on the retention of the coded representation of
an earlier number name.
Three subjects were asked to read aloud numbers from 120 cards. On each card
there were 8 numbers of the H E T type, among them a related pair with varying
units (E), varying tens (T) or varying hundreds (H). One example:
772 439 866 593 227 327 954 681   (4)
In (4) the fifth and sixth numbers form the related pair.
The positions of the experimental numbers were varied, and so was the distance
between the related items. A second example:
476 833 567 294 921 788 667 345   (5)
In (5) the third and seventh items form the related pair.
The first and last positions were not used for experimental items. Every
distance was represented 10 times in the material.
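The design arithmetic is consistent with the 120 cards: 3 relation types x 4 distances (0-3) x 10 repetitions = 120. A card of this kind could be generated as sketched below. This is a hypothetical reconstruction for illustration only, since the report does not describe how its cards were composed. It assumes H E T numbers have hundreds 1-9, tens 2-9 and units 1-9, builds the related pair by adding 1 to the varying digit, keeps the pair away from the first and last positions, and rejects any filler that would be related to another item on the card.

```python
import random
from typing import List, Optional

CARD_SIZE = 8
STEP = {"H": 100, "T": 10, "E": 1}  # increment that changes only that element


def is_het(n: int) -> bool:
    """Assumed H E T type: hundreds 1-9, tens 2-9 (no teens), units 1-9."""
    h, t, e = n // 100, n // 10 % 10, n % 10
    return 1 <= h <= 9 and 2 <= t <= 9 and 1 <= e <= 9


def related(a: int, b: int) -> bool:
    """Two numbers are related if they differ in exactly one digit position."""
    return sum(x != y for x, y in zip(f"{a:03d}", f"{b:03d}")) == 1


def make_card(kind: str, distance: int, rng: random.Random) -> List[int]:
    """One card: a related pair varying in `kind`, separated by `distance`
    unrelated numbers, never in the first or last position."""
    if not 0 <= distance <= CARD_SIZE - 4:
        raise ValueError("distance does not fit on the card")
    while True:  # pick a pair whose increment causes no carry
        first = rng.randint(100, 999)
        second = first + STEP[kind]
        if is_het(first) and is_het(second):
            break
    start = rng.randint(1, CARD_SIZE - 3 - distance)
    card: List[Optional[int]] = [None] * CARD_SIZE
    card[start], card[start + distance + 1] = first, second
    for i in range(CARD_SIZE):
        while card[i] is None:  # fill with unrelated, distinct fillers
            candidate = rng.randint(100, 999)
            placed = [x for x in card if x is not None]
            if is_het(candidate) and candidate not in placed and not any(
                related(candidate, x) for x in placed
            ):
                card[i] = candidate
    return [x for x in card if x is not None]


def make_design(seed: int = 0) -> List[List[int]]:
    """3 relation types x 4 distances (0-3) x 10 repetitions = 120 cards."""
    rng = random.Random(seed)
    return [make_card(k, d, rng) for k in "HET" for d in range(4) for _ in range(10)]
```

With distance 3 the pair can only occupy the 2nd and 6th or the 3rd and 7th positions, the latter being the layout of example (5).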
From the recorded performances the related pairs were re-recorded in temporal
contiguity, without the extraneous number names. Panels of three listeners judged
whether the accent patterns expressed the numerical relationships. Fig. 1 shows the
results.
Fig. 1. Proportion of sequential accent patterns in pairs of related number names as a function of the number of intervening number names. There were three reading subjects (producing 120 times 8 number names with the accent patterns) and three listeners (judging whether or not the recorded pairs of related number names had the relevant sequential accent patterns), and there were three types of relationship between input numbers: H, as between ... 478 ... 578 ...; E, as between 523 524; and T, as between ... 936 946. Every experimental point in the graphs represents 10 observations.
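The report does not say how the three listeners' judgments were combined into the proportions of Fig. 1. The sketch below is therefore an assumption: it either pools all judgments or first takes a majority vote per pair, and in both cases it reports the proportion of sequentially marked judgments per reader, relation type and distance. The tuple layout and the name `proportion_marked` are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

# One panel judgment: (reader, relation type 'H'/'E'/'T', distance 0-3,
# pair id, listener id, judged sequentially marked?)  (hypothetical layout)
Judgment = Tuple[int, str, int, int, int, bool]
Cell = Tuple[int, str, int]  # (reader, relation type, distance)


def proportion_marked(judgments: Iterable[Judgment], majority: bool = False) -> Dict[Cell, float]:
    """Proportion of sequentially marked judgments per (reader, type, distance).
    With majority=True each pair first gets one verdict by majority vote."""
    votes: DefaultDict[Tuple[int, str, int, int], List[bool]] = defaultdict(list)
    for reader, kind, distance, pair, _listener, marked in judgments:
        votes[(reader, kind, distance, pair)].append(marked)
    cells: DefaultDict[Cell, List[float]] = defaultdict(list)
    for (reader, kind, distance, _pair), marks in votes.items():
        if majority:
            verdict = 1.0 if 2 * sum(marks) > len(marks) else 0.0
            cells[(reader, kind, distance)].append(verdict)
        else:
            cells[(reader, kind, distance)].extend(1.0 if m else 0.0 for m in marks)
    return {cell: sum(vals) / len(vals) for cell, vals in cells.items()}
```

Pooling keeps listener disagreement visible in the proportions, which matters here because the panel's judgments for two of the readers were far from unanimous.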
The following observations can be made from the data in Fig. 1:
(a) In all three subjects, unrelated number names intervening between related items
interfere with the retention of the earlier number name of a pair. This knock-out
effect is not equally strong in all subjects.
(b) In subjects 2 and 3 the ambiguous accentuation with pitch accents on the T
elements more than once led the panel to interpret the accent pattern erroneously
as sequentially marked. The panel's judgments of these readings are far from
unanimous and show a large variability. As indicated in the section on accents in
number names, a pitch accent on T is ambiguous in that it may be either a neutral
accent pattern or a relation-marking pattern. Subject 1, however, produced neutral
accent patterns with pitch accents on all three elements H, E and T, so that the
panel could correctly interpret a pitch accent on T alone as the pattern of
relationship.
The experiment as a whole throws some light on coding processes in the minds of
language users: an earlier number name is retained in the mind unless it is
displaced by comparably structured number names. It is retained and used as a
component in a relation with a second number name if this second item has both
identical and different elements in its structure. Without actually being aware of
the process, the reader appears to organize a sequence of input items such that
larger structural units emerge.
summary
Subjects reading aloud sequences of numbers as a rule produce accent patterns that
reflect perceived structural relationships, if any, between the number names. The
accentuation principle - loosely formulated - is that constant elements are left
unaccented whereas varying elements are accented. By changing the distance between
two related number names, i.e. by introducing intervening number names, we
investigated, by looking at the accent patterns, whether the subjects still picked
up the relationships. The accent patterns turn out to be natural and easily
detectable indicators of specific mental organization processes.
references
Crowder, R. (1970) The Role of One's Voice in Immediate Memory, Cognitive Psychology 1, p. 157-178.
Willems, L.F. (1966) The Intonator, IPO Annual Progress Report 1, p. 123-125.
A SYMPOSIUM ON DYNAMIC ASPECTS OF SPEECH PERCEPTION
I.P.O., AUGUST 4-6, 1975
A. Cohen and S.G. Nooteboom
On August 4-6, 1975 we organized a small symposium on Dynamic Aspects of Speech
Perception. The idea for this symposium arose a year earlier during the second Speech
Communication Seminar in Stockholm, where participants agreed that there was a need
for a conference paying proper attention to new developments in the field of speech
perception. A Planning Committee was formed by Mark Haggard, David Pisoni, Sven Öhman
and the authors of the present paper, the latter acting as an Organizing Committee.
Hospitality and sponsoring were provided by IPO. The symposium was held under the
auspices of The Royal Netherlands Academy of Arts and Sciences and The Netherlands
Organization for the Advancement of Pure Research.
As it was agreed that the field of speech perception would benefit from greater
attention to the perception of connected speech, moving away from too great a
concentration on perceptual studies of phonemes and syllables, it was decided to
bring the perception of connected speech into focus by limiting contributions as much
as possible to this area.
Apart from members of our own institute, some 25 participants from Belgium, Canada,
Denmark, Great Britain, Holland, Japan, Sweden and the United States gathered in
Eindhoven in the first week of August. The contributions, 20 in number, were
distributed in advance, which allowed us to limit verbal presentations to about 10
minutes each and to concentrate mainly on the discussion. This scheme worked quite
well and resulted in lively and often thorough discussions, both of specific points
and of topics of more general interest to the field.
It became quite clear during the symposium that participants were in general
empirically oriented. There were, however, a few interesting attempts, notably
by Pisoni and Massaro, to relate different levels of processing to each other
within larger frameworks, but on the whole there was little theorizing. It was
also evident that most of the work was concerned with structures, i.e. the effect
of stimulus structures on perception, or the effect of syntactic and/or semantic
structures on perception, or even the relation between speech perception and
particular physiological structures of the brain. Very little work was concerned
with perceptual processes. It was noted by the psychologists Levelt and Flores
d'Arcais that in this respect the field of speech perception seems to be lagging
behind much contemporary work in psycholinguistics, notably of the information
processing kind. They also observed that there was too little concern with task-dependent
aspects of the subjects' decision procedures in experimental tasks.
One outcome of the symposium was that people seem to be less prepared to look for
"the perceptual unit" of speech, than they were, let us say, ten years ago. There
was a general awareness that there may be a number of different perceptual units
at different processing levels. To us it seems that in looking for "perceptual
units" or "decision units" one should not too readily postulate a one-to-one
relationship between such units and the units of linguistic descriptions.
It might be worthwhile to switch from the time-honoured question of perceptual
units to a search for the scanning processes in auditory perception per se, instead
of taking a lead from linguistics. Huggins took the position that there are no
such things as perceptual units: speech perception may be a continuous process.
As to the question of how linguistic expectancies affect the decoding process in
the perceptual system, there appeared to be a recognition that both bottom-up and
top-down processes are required. Interesting questions are at what lowest level these
two kinds of processing meet, how this can be studied, and what the general nature of
their confluence is. It was noted that the serial processing implied by most tentative
outlines of models of speech perception in the literature appears to exclude very early
semantic and syntactic processing. Parallel processing schemes might be more realistic,
but the right experimental paradigms for dealing with such questions do not yet appear
to be available.
Of the actual research reported in the symposium, a great part was devoted to the
study of the role of prosody in speech perception. Some of these investigations
were concerned with the effect of prosodic structures on phoneme identifications,
others with the effect of position within phrase on preferred duration and duration
discriminability of phoneme-size segments, others again with the effect of prosody
on intelligibility, immediate recall or reaction times, on the perception of
phrase boundaries, sentence position within a paragraph, or the location of
switching the speech signal from one ear to the other.
On the whole we feel that this type of perceptual study is still too much concerned
with the demonstration of certain effects and too little with plotting functional
relationships which might lead to a better understanding of the underlying processes.
Obviously this is due to the fact that the problems concerned with the perceptual
functioning of speech prosody have only recently come to the attention of speech
researchers and there is still a lack of experimental paradigms tailored to these
problems.
This criticism applies much less to those investigations concerned with very
short-term auditory storage of speech, which has been studied by means of backward
recognition masking and the perception of speech with periodically inserted silent
intervals. Here we find functional relationships apparently related to some decay
function of auditory storage, although a proper model explaining the results is
still lacking. A similar observation holds for short-term memory for individual
properties of speech.
A few papers, not directly related to the perception of connected speech, dealt
with the perception of time-varying formant frequencies, coarticulation and lip
rounding, developmental aspects of speech in very young children, and dichotic
speech mode listening.
What seems to emerge from this picture is that the study of the perception of
connected speech is still a young, but promising field of research. We hope that
this symposium, and its Proceedings (Cohen and Nooteboom, 1975), will contribute
to finding new and interesting ways of studying the complex information processing
taking place in the perceptual decoding of connected speech. We are grateful to all
those, participants and IPO colleagues, who helped to make the symposium a success.
reference
Cohen, A. and Nooteboom, S.G., Eds. (1975) Structure and Process in Speech Perception; Proceedings of the Symposium on Dynamic Aspects of Speech Perception, Springer-Verlag, Heidelberg.
3 visual perception
RESEARCH ON VISION 1975
J.A.J. Roufs, J.J. Andriessen, Th.M. Bos, H. Bouma, F.L. Engel, Ch.P. Legein,
J.A. Pellegrino van Stuyvenberg, A.L.M. van Rens and C.W.J. Schiepers
In 1975, a major part of the effort again went into research on relatively central
processing of visual information, including its cognitive aspects.
Activities were devoted to three main fields of research viz.
a. Visual processes in reading,
b. Conspicuity of visual objects,
c. Dynamic processing of simple visual signals.
Some work on earlier projects concerning aids for the visually handicapped was
followed up. In these projects again our workshop was deeply involved
(Mr. H.E.M. Melotte).
visual processes in reading
A theory of word recognition has been developed for three-letter words. It predicts
both correct and incorrect recognition scores for words from (a) correct and
incorrect letter recognition in unpronounceable letter strings and (b) a list of
all existing Dutch three-letter words. Predictions have been compared with experimental
scores for four different retinal eccentricities of the stimuli, as measured
earlier (Bouma, 1973). As to the average correct word scores, theory and experiment
are in reasonable agreement.
As to the average correct letter scores in word responses, theory and experiment are
in quantitative agreement for all four retinal eccentricities and all three letter
positions in the word. A separate contribution can be found in this issue (Bouma
and Bouwhuis, 1975).
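To make the idea of predicting word responses from letter recognition plus a lexicon concrete, here is a deliberately simple sketch under assumptions of our own. It is not the Bouma and Bouwhuis theory, which is described in their separate contribution. Letter confusions at each position are taken to be independent, each candidate response gets the product of its letter probabilities, and only candidates that exist in the lexicon are kept and renormalized. The data layout and the name `word_response_probs` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set

# confusions[pos][shown][reported] = probability of reporting letter
# `reported` when `shown` is presented at position pos (hypothetical layout,
# e.g. estimated from unpronounceable letter strings).
Confusions = List[Dict[str, Dict[str, float]]]


def word_response_probs(stimulus: str, confusions: Confusions, lexicon: Set[str]) -> Dict[str, float]:
    """Predicted distribution of word responses to a three-letter stimulus,
    assuming independent letter confusions and responses restricted to the
    lexicon (renormalized)."""
    per_position = [confusions[pos][letter] for pos, letter in enumerate(stimulus)]
    scores: Dict[str, float] = {}
    for combo in product(*(dist.items() for dist in per_position)):
        word = "".join(letter for letter, _ in combo)
        if word in lexicon:
            p = 1.0
            for _, prob in combo:
                p *= prob
            scores[word] = scores.get(word, 0.0) + p
    total = sum(scores.values())
    return {w: p / total for w, p in scores.items()} if total else {}
```

Under this sketch the predicted correct word score is the probability assigned to the stimulus itself; the lexical constraint raises it above the raw product of letter scores whenever some letter confusions would produce non-words.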
Concerning the perception of the contours of Dutch three-letter words, experiments
have been carried out on the role of the sequence of ascending letters, descending
letters and letters without extension. Characteristic confusions were found. The
letter position farthest from the fovea is favoured in perception. Recognition
experiments on word stimuli with one letter added or one letter missing have supplied
supporting evidence for the role of word length in word recognition.
With respect to reading difficulties a follow-up study on twenty dyslectic children
and twenty control subjects has been undertaken. The measurement of response
latency was included. On two groups of 4 subjects a number of pilot experiments
have been carried out. It was found that in search tests for letters in words and
in letter strings, the scores of dyslectic children were lower than those of average
readers. Part of the difficulty seems to arise from the fact that the dyslectic
children skip lines in carrying out the task, and target letters towards the end
of the word tend to be missed by dyslectic children. A separate contribution on
this subject can be found in this issue (Bouma, Legein and van Rens, 1975).
A new start has been made with research on the control characteristics of eye
movements in reading. Five subjects each silently read seven texts of a rather
difficult grade while their eye movements were being recorded. The texts to be read
were political comments in Dutch newspapers. The eye movements were measured and
recorded by means of an apparatus based on the limbus reflection technique.
The X- and Y-coordinates in a plane of 15° x 10° visual angle were fed off-line
into a P 9200 computer. A biting board was used to reduce head movements.
The absolute accuracy of position in the X-direction was about 0.5°, the relative
precision being 0.25° within the range of 15°. The individual letters subtended
0.25° of visual angle, so that saccade sizes were accurate to within two letters.
The sequence of successive saccades and successive fixation courses has been
determined for each individual subject and text. From this sequence, histograms of
forward and regressive saccades and of fixation pauses have been calculated. The
analysis of the data is now directed towards possible correlations, for instance
between successive saccades or successive fixation pauses, in order to find
characteristics of the underlying process of eye movement control in reading.
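The accuracy figure follows directly: 0.5° absolute accuracy divided by 0.25° per letter gives a resolution of two letter positions. For the planned correlation analysis, a minimal sketch of a lag-1 (serial) correlation between successive saccade sizes or fixation pauses might look as follows. The function name and the choice of the Pearson coefficient are our assumptions, not the method used in the study.

```python
from math import sqrt
from typing import Sequence


def lag1_correlation(values: Sequence[float]) -> float:
    """Pearson correlation between each value and its successor, e.g.
    saccade n and saccade n+1 (in letter positions) within one text."""
    if len(values) < 3:
        raise ValueError("need at least three values")
    x, y = values[:-1], values[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant series: correlation undefined")
    return sxy / sqrt(sxx * syy)


DEG_PER_LETTER = 0.25  # letter size given in the text
ACCURACY_DEG = 0.5     # absolute X accuracy given in the text
print(ACCURACY_DEG / DEG_PER_LETTER)  # 2.0 letter positions
```

A strongly negative value would indicate that long saccades tend to be followed by short ones (for example, corrective saccades), while values near zero would suggest successive saccades are planned independently.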
conspicuity of visual objects
In the previous issue (Engel and Bos, 1974) an experiment was described in which
the subjects' dual task was to search for a given target object and to avoid looking
at a certain non-target, both objects being presented simultaneously at unknown
locations in a complex background. The size of the relevant conspicuity area
(the visual field in which the object concerned is capable of being noticed in a
single fixation) was related to the probability of target discovery as well as to the
probability of involuntary fixation on the non-target. Assuming a random pattern of
eye movements during performance of the task, both probabilities were expressed
in terms of an "effective" size of the conspicuity area concerned. These two relations
have now been established more fully, as illustrated in Fig. 1.
[Fig. 1: panels a (targets) and b (non-targets), effective size ρ plotted against R50 (0-10°).]
Fig. 1. For two observers, the relation is shown between the size R50 of the conspicuity area (determined directly by means of brief stimulus presentation allowing for one eye fixation only) and the effective size ρ of the conspicuity area (derived from the dual-task data). Fig. 1a shows this relation for targets to be searched for, and Fig. 1b for non-targets that were fixated involuntarily.
A new phenomenon was observed during direct determination of the conspicuity area.
Although strict fixation of the display centre was required, small involuntary
eye movements in the direction of the target were discovered. Occurrence and
delay depended on both target eccentricity and size of the relevant conspicuity
area. This phenomenon is described more fully in a separate contribution in this
issue (Engel and Bos, 1975).
dynamic processing of simple visual stimuli
A perturbation technique has been developed to measure various kinds of response directly at threshold level, using only two system postulates. These postulates are: i) linear processing of small signals and ii) peak detection. Pulse, step and frequency responses have been measured successfully. The linearity assumption has been validated, for instance by comparing pulse and step responses (see the separate contribution by Roufs and Blommaert, 1975). The band-pass-filter type of transfer with respect to fast changes (see Roufs, 1974) has been established beyond doubt for all three types of response. This is new circumstantial evidence pertaining to the concept of at least two output variables of the visual system. One of these was thought to be processed by a low-pass type of transfer and to be connected with brightness variations; the other was assumed to be processed by a linear band-pass type of transfer, sensitive to fast changes in the visual field and connected with the "agitation" percept.
aids for the visually handicapped
The embossed drawing set has been given a new design based on the injection-moulding technique. This design is better suited for larger-scale production, which will be put into effect in 1976.
The T.V. magnifier, Fig. 2, has been considerably improved. Owing to the availability of a new small camera, the design was made more compact (table model) as well as less expensive, and a number of ergonomic improvements have been included. The magnification range has been extended to 3-24x, the smaller value ensuring easy surveying (search) and the possibility of viewing large photographs and diagrams. The apparatus has been commercially available since December 1975 (Philips Nederland B.V.).
The reading desk presented in the previous issue has been tested in several practical situations. For the visually handicapped, the magnification (about 2x) has generally been found too small to be of real help. It is interesting to note that people with normal vision found it very useful in reading text of low print quality.
The procedure for training a blind subject to read with the Optacon (Engel, 1974) has been completed. A rate of 27 words per minute for oral reading and 54 words per minute for silent reading has finally been achieved.
Fig. 2. T.V. magnifier, type 3-24 M.
references
Bouma, H. (1973) Visual Interference in the Parafoveal Recognition of Initial and Final Letters of Words, Vision Res. 13, p. 767-782.
Bouma, H. and Bouwhuis, D.G. (1975) Word Recognition and Letter Recognition: Toward a Quantitative Theory for the Recognition of Words of Three Letters, IPO Annual Progress Report, this issue.
Bouma, H., Legein, Ch.P. and van Rens, A.L.M. (1975) Visual Recognition by Dyslectic Children, IPO Annual Progress Report, this issue.
Engel, F.L. (1974) Learning to Read with the Optacon, IPO Annual Progress Report, 9, p. 110-113.
Engel, F.L. and Bos, Th.M. (1974) Visual Conspicuity and Eye Movements, IPO Annual Progress Report, 9, p. 94-98.
Engel, F.L. and Bos, Th.M. (1975) Small Involuntary Eye Movements, IPO Annual Progress Report, this issue.
Roufs, J.A.J. (1974) Dynamic Properties of Vision-IV. Thresholds of Decremental Flashes and Doublets in Relation to Flicker Fusion, Vision Res. 14, p. 831-851.
Roufs, J.A.J. and Blommaert, F.J.J. (1975) Pulse- and Step Responses of the Visual System Measured by a Perturbation Technique, IPO Annual Progress Report, this issue.
WORD RECOGNITION AND LETTER RECOGNITION:
toward a quantitative theory for the recognition of words of three letters
H. Bouma and D.G. Bouwhuis
introduction
This paper deals with the understanding of processes involved in visual word
recognition. Earlier experiments on letter and word recognition in eccentric vision
(Bouma, 1973) had indicated certain correspondences between letter recognitions
from unpronounceable strings and letter counts from recognized words.
Starting from these indications, we made two assumptions about processes which might
lead from recognition of letters to recognition of words, including both correct
and incorrect recognitions.
Here we present some of the earlier evidence, state the two basic assumptions, and compare certain theoretical predictions with the outcome of the earlier experiment on word recognition. The theory can be seen as an addition to Morton's logogen model (Morton, 1969) and as a restricted version of Rumelhart and Siple's (1974) theory in a somewhat more natural test situation.
earlier relations between letter scores in words and in strings
The earlier experiments (Bouma, 1973) related to two types of recognition in eccentric vision: word recognition and letter recognition.
Dutch words (3-6 letters) were presented for 100 ms at a number of retinal eccentricities. Average fractions of correct and incorrect recognitions as well as "illegible" responses are indicated in Fig. 1.
[Figure 1: response fractions (correct, incorrect, illegible) versus eccentric stimulus position (degrees)]
Fig. 1. Average response scores for Dutch word stimuli as a function of retinal eccentricity. Word lengths 3-6 letters, presentation time 100 msec. Note the high fraction of incorrect words, mostly existing Dutch words, and the higher correct scores in the right visual field. From Bouma, 1973.
Note the well-known advantage of words in the right visual field and the high fraction of incorrect responses. The great majority of incorrect responses were existing Dutch words. At the time we were particularly interested in the role of the initial and the final letters of these words. Therefore, in all response words, we counted the correctly reported initial and final letters. It cannot, of course, be inferred that these correctly reported letters had been correctly perceived, since their presence could have been derived from other information in the stimulus words. We therefore carried out another type of experiment, in which subjects recognized initial and final letters from unpronounceable letter strings whose lengths and letter distributions were similar to those in the word recognition experiment. It was assumed that the letter string experiment gives us the perception proper of the particular letters. In Fig. 2 it can be seen that letter counts in word recognition are indeed higher than letter scores from strings.
[Figure 2: fraction correct of initial and final letters versus eccentric stimulus position (degrees, -5 to +5)]
Fig. 2. Average correct scores of initial and final letters as a function of retinal eccentricity. Large symbols: letter counts in word responses to word stimuli; small symbols: counts of letter responses to unpronounceable strings. Note the higher counts from word responses and the similar trends for the two types of experiment. From Bouma, 1973.
The difference between the two was called "completion" because it is a measure of the contribution of the other word properties to the correct reporting of particular letters in words. Recently, we added the recognition of the central letter to the letter recognition experiments with unpronounceable strings of three letters.
In the results, a number of correspondences turned up between letter counts in word responses and letter recognitions from strings: (1) both left and right of fixation, outward letters (farthest from the fovea) had higher scores than inward letters (closest to the fovea); (2) differences between left and right field were similar in the two counts (Fig. 2); possible reasons for these somewhat peculiar results, which also hold for each of the four stimulus lengths separately, have been advanced in Bouma (1973); (3) correct scores for individual letters in the two experiments showed a clear relation (Fig. 3, upper data), as did (4) the number of times that a letter was given as an incorrect response (Fig. 3, lower data). Thus, certain phenomena observed for letters in word responses were already present in letter responses to unpronounceable strings.
[Figure 3: scatter diagram; abscissa, letter scores from letter strings (0 to 1.0); ordinate, scores of initial and final letters in word responses]
Fig. 3. Scatter diagram relating individual letter scores in word recognition to scores from unpronounceable strings. Upper right: correct letter scores. Lower left: sum of incorrect reports of a letter. Averages over four retinal positions.
The experimental correspondences are sufficiently striking to provide a basis for
the hypothesis that the recognition of letters proper is a direct contributory factor to
word recognition and that higher letter scores in words are due to redundancy effects.
We shall now work out such a possible relationship quantitatively, confining both
theory and data to words and strings of three letters.
theoretical model
Correct and incorrect letter scores for individual letters from unpronounceable strings, separately for the initial, middle and final letter of the string, formed the data base of the model. Thus, for each of the four retinal positions considered, there were three different confusion matrices of letters. Given a certain stimulus of three letters, say lap, we selected in the above matrices the rows (correct + incorrect response fractions) for l--, -a-, and --p.
The next step was to develop combination rules for arriving at the probabilities of all possible response combinations. The first assumption was that the responses for the three letter positions combined independently, so that the probability of a given response combination was the product of the respective entries in the matrices. For example, the probability of the response combination iag is
p(iag | lap) = p(i-- | l--) · p(-a- | -a-) · p(--g | --p).
Since many entries in the matrices will be zero, the total number of response combinations will be far less than the maximum value of 26³. Response combinations will to some extent be words, but mostly pronounceable or unpronounceable non-words; their probabilities add up to 1.0.
Fig. 4 gives a simplified scheme, in which only two possible responses were assumed for each stimulus letter, giving 2³ = 8 response combinations.
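Using the simplified two-response rows of Fig. 4 (stimulus lap) as input, the first assumption can be sketched in Python. The dictionary layout and the function name are illustrative choices, not part of the original work; with real data each row would hold up to 26 entries, one per response letter, and zero entries would simply drop out of the product.

```python
from itertools import product

# Rows of the three position-specific confusion matrices for stimulus "lap",
# copied from the simplified example of Fig. 4 (two responses per letter).
ROWS = [
    {"l": 0.60, "i": 0.40},   # initial position, stimulus letter l
    {"a": 0.50, "e": 0.50},   # middle position, stimulus letter a
    {"p": 0.80, "g": 0.20},   # final position, stimulus letter p
]

def combination_probabilities(rows):
    """First assumption: positions combine independently, so the probability of a
    response combination is the product of the corresponding row entries."""
    probs = {}
    for choice in product(*(row.items() for row in rows)):
        letters = "".join(letter for letter, _ in choice)
        p = 1.0
        for _, fraction in choice:
            p *= fraction
        probs[letters] = p
    return probs

combos = combination_probabilities(ROWS)
for letters, p in sorted(combos.items(), key=lambda item: -item[1]):
    print(letters, round(p, 2))
print("total", round(sum(combos.values()), 2))
```

Because every row sums to 1.0, the eight products also sum to 1.0, which is the property the text states for all response combinations.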
The second assumption makes a sharp distinction between words and non-words: all non-word letter combinations are excluded by means of a weighting factor zero, whereas all words are retained, their probabilities being allotted a common weighting factor. This transformation factor is not a free parameter, since it is determined by the requirement that, for each stimulus, the word-response probabilities add up to 1.0. The assumption of a common weighting factor for a selection (i.e. words) of all possible responses (i.e. all letter combinations) is in fact an application of the Constant Ratio Rule (Clarke, 1957; Luce, 1959).
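The common weighting factor can be illustrated with the numbers of Fig. 4. In this sketch the combination probabilities are taken from that figure, and the "lexicon" is a hypothetical stand-in containing only the four words that occur there, not the 519-word list used in the actual calculations.

```python
# Response-combination probabilities for stimulus "lap" (Fig. 4 example).
COMBOS = {"lap": 0.24, "lag": 0.06, "lep": 0.24, "leg": 0.06,
          "iap": 0.16, "iag": 0.04, "iep": 0.16, "ieg": 0.04}
# Hypothetical stand-in for the word list: only the words appearing in Fig. 4.
WORDS = {"lap", "lag", "leg", "iep"}

def word_response_probabilities(combos, lexicon):
    """Second assumption: non-words get weight 0 and all words one common weight
    (Constant Ratio Rule), fixed by requiring the word probabilities to sum to 1."""
    word_mass = sum(p for w, p in combos.items() if w in lexicon)
    factor = 1.0 / word_mass                       # the transformation factor
    return factor, {w: p * factor for w, p in combos.items() if w in lexicon}

factor, preds = word_response_probabilities(COMBOS, WORDS)
print(round(factor, 2))                            # 1.92
print({w: round(p, 2) for w, p in preds.items()})
```

The ratios between word probabilities are unchanged by the common factor, which is exactly what the Constant Ratio Rule requires. The printed value for iep is 0.31, whereas Fig. 4 lists 0.30; the two differ only in the last rounded digit.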
In order to carry out the corresponding calculations, we composed a list of existing Dutch words of three letters from two existing word counts, supplemented by our linguistic intuition. We arbitrarily excluded names and abbreviations, ending up with 519 existing words.
The result of the calculation, given a particular stimulus word, was a list of possible word responses, each with its response probability.
Thus the word recognition experiment was predicted fully and quantitatively on the
basis of the letter string experiment and two basic assumptions.
Assumptions and calculations were taken to reflect processes in an observer who
forms word responses by independently combining perceived letter characteristics,
and has only existing Dutch words available as responses.
stimulus: lap

DATA BASE: recognition of letters from unpronounceable strings
  stimulus   response   fraction
  l--        l--        0.60
             i--        0.40
  -a-        -a-        0.50
             -e-        0.50
  --p        --p        0.80
             --g        0.20

CALCULATIONS
  response      product              weighting   word-response
  combination   fraction             factor      fraction
  lap           .6 x .5 x .8 = 0.24  1.92        0.46
  lag           .6 x .5 x .2 = 0.06  1.92        0.12
  lep*          .6 x .5 x .8 = 0.24  0           0
  leg           .6 x .5 x .2 = 0.06  1.92        0.12
  iap*          .4 x .5 x .8 = 0.16  0           0
  iag*          .4 x .5 x .2 = 0.04  0           0
  iep           .4 x .5 x .8 = 0.16  1.92        0.30
  ieg*          .4 x .5 x .2 = 0.04  0           0
  total                        1.00              1.00
  * non-word; words = 0.52; transformation factor = 1/0.52 = 1.92

PREDICTIONS: letter fractions in word responses
  l-- 0.70   i-- 0.30
  -a- 0.58   -e- 0.42
  --p 0.76   --g 0.24

Fig. 4. Calculation scheme of the theory. Schematic example in which only two possible responses are assumed for each stimulus letter.
prediction of and experiment on word recognition compared
Since the model concerned word responses of three letters only, we disregard the 81% of response words that had a different length.
Fig. 5 compares prediction and experiment as to average fractions of correct words
(50 words, 11 subjects) at four retinal positions. Predicted values were of the
right order of magnitude, but systematically lower by about 0.08. The left-right
differences were predicted correctly.
Average fractions of correct letters in correct + incorrect word responses are compared in Fig. 6, completion showing up as the difference between correct fractions from strings (y) and correct fractions in word responses (predictions o, experiments •). Trends were successfully predicted, predictions falling short by only 0.02 on average.
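How completion follows from the model can be shown with the Fig. 4 example. This sketch assumes the rounded word-response fractions listed there and computes, for each letter position, the fraction of word responses containing the correct letter; subtracting the string scores gives the completion for this toy case. The function name is illustrative.

```python
# Predicted word-response fractions for stimulus "lap" (Fig. 4 example, rounded).
PREDS = {"lap": 0.46, "lag": 0.12, "leg": 0.12, "iep": 0.30}
# Correct fractions per position from the letter-string data base (Fig. 4).
STRING_SCORES = [0.60, 0.50, 0.80]

def correct_letter_fractions(stimulus, preds):
    """Fraction of word responses that have the correct letter at each position."""
    return [sum(p for w, p in preds.items() if w[i] == stimulus[i])
            for i in range(len(stimulus))]

in_words = correct_letter_fractions("lap", PREDS)
completion = [w - s for w, s in zip(in_words, STRING_SCORES)]
print([round(x, 2) for x in in_words])     # [0.7, 0.58, 0.76]
print([round(x, 2) for x in completion])   # [0.1, 0.08, -0.04]
```

The letter fractions reproduce the PREDICTIONS column of Fig. 4. In this toy example the final letter gets a slightly negative completion, so the model does not force completion to be positive for every position.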
Finally, distributions of correct and incorrect words were considered. 7~& of experimentally responded words had a predicted probability of less than 10⁻⁴. As to the other responses, the aim was to find out how the experimental word-response distribution related to the set of predicted responses. To this end, we assigned response words to classes according to predicted probability, each class spanning a factor of 10 in predicted probability.
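The class assignment can be sketched as follows, assuming classes one decade wide (10^k <= p < 10^(k+1)) and predicted frequencies obtained by multiplying each predicted probability by the number of presentations of a stimulus (eleven subjects, one presentation each). All stimulus-response pairs and counts below are hypothetical, and responses with predicted probability below 10⁻⁴ are assumed to have been set aside beforehand, as in the text.

```python
import math
from collections import defaultdict

N_PRESENTATIONS = 11  # each stimulus word was shown once to each of eleven subjects

def probability_class(p):
    """Index k with 10**k <= p < 10**(k+1), so each class spans a factor of 10."""
    return math.floor(math.log10(p))

def summed_frequencies(predicted, observed):
    """predicted: {(stimulus, response): probability}; observed: {(stimulus, response): count}.
    Returns {class: (summed predicted frequency, summed observed frequency)}."""
    sums = defaultdict(lambda: [0.0, 0])
    for pair, p in predicted.items():
        k = probability_class(p)
        sums[k][0] += p * N_PRESENTATIONS
        sums[k][1] += observed.get(pair, 0)
    return {k: tuple(v) for k, v in sorted(sums.items(), reverse=True)}

# Hypothetical numbers for illustration only (not the experimental data).
predicted = {("lap", "lap"): 0.46, ("lap", "iep"): 0.30,
             ("lap", "lip"): 0.05, ("lap", "lak"): 0.0004}
observed = {("lap", "lap"): 6, ("lap", "iep"): 3, ("lap", "lip"): 1}
print(summed_frequencies(predicted, observed))
```

Pooling within classes sidesteps the problem that individual response words are too rare to compare one by one, which is the motivation given in the discussion.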
[Figure 5: fraction of correct word responses, experiment and prediction, versus eccentric stimulus position (degrees, -5 to +5)]
Fig. 5. Comparison of experimental and calculated fractions of correct word responses.
[Figure 6: fraction of correct letters, experiment, prediction and letter strings, versus eccentric stimulus position (degrees, -5 to +5)]
Fig. 6. Comparison of experimental and calculated fractions of correct letters in correct + incorrect words. The Y symbols indicate experimental letter scores from unpronounceable strings, serving as input to the model.
For one eccentricity, Fig. 7 compares prediction to experiments with respect to
summed numbers within these classes. The fit was satisfactory, except perhaps for
the highest class, which consisted, almost exclusively, of correct response words.
discussion
Despite the simplicity of the model, predictions were sufficiently close to make
discussion useful.
As to correct word scores, predictions fell short of results by some 0.08. We would suggest that this is due in part to the primitive way in which we accounted for differences in response availability. If a smaller set of words were used in the calculations, higher correct scores would be predicted, provided the stimulus words belonged to this vocabulary. For a further investigation of this factor, a general frequency count would probably reflect response availability inadequately, and direct experimental access to the subjects' vocabularies would be required.
Left-right differences in correct scores were predicted closely. The implication would seem to be that these differences trace back to a better recognition of the component letters in the right visual field. Knowledge of words, as well as their availabilities, could then be assumed to be equal in the portions of the brain serving the right and the left visual fields. The advantage of the left cerebral hemisphere, to which the word advantage of the right visual field is usually ascribed, should then be located at or before the level of letter recognition, which is lower than commonly assumed.
[Figure 7: summed predicted versus observed response frequencies per probability class, logarithmic axes; eccentricities -2.75°, -1.75°, +1.75°, +2.75°]
Fig. 7. Comparison of experiment and theory as to the division of response words over classes of predicted response probability. Summed response frequencies within each class are plotted for four eccentricities.
Predictions of correct letter fractions in correct + incorrect words were close to
the experimental values, thus adding strength to the proposed explanation of the
completion effect. This is true, despite the fact that correct word fractions were
predicted too low, incorrect word responses apparently making up for it. Possibly,
regularities of letter distributions in Dutch words, which are implicit in the word
list, played a part here.
The comparison between predictions and experiments with respect to individual correct and incorrect response words can be considered the most critical test of the theory. The comparison which we can offer here is inadequate for at least two reasons, because the experiments had been set up for a different purpose.
First, the data base suffered from not having enough definitely incorrect responses. Second, the stimulus words were presented only once to each of eleven subjects, which obliged us to use pooled data rather than individual response words. An experiment directed towards testing the theory more extensively is in preparation.
Finally, we would like to emphasize the role played by global features in word recognition. Although the model was based on letters, global factors were involved in two different ways. First, correct perception of letter position within the word or string was tacitly assumed, letter position being a global factor. Second, the model took for granted all interactions between letters preceding the level of letter recognition.
As a final remark, extending the model to longer words would require a more comprehensive consideration of global factors.
summary
A simple theory is described that makes quantitative predictions about correct and incorrect word recognitions (three-letter words only) on the basis of (a) correct and incorrect letter recognitions and (b) a list of Dutch words of three letters. The theory has no free parameters. Predictions are compared with earlier experimental data on word recognition at two eccentricities right and left of fixation.
references
Bouma, H. (1973) Visual interference in the parafoveal recognition of initial and final letters of words, Vis. Res. 13, p. 767-782.
Clarke, F.R. (1957) Constant Ratio Rule for Confusion Matrices in Speech Communication, J. Acoust. Soc. Amer. 29, p. 715-720.
Luce, R.D. (1959) Individual Choice Behavior, A Theoretical Analysis, New York, Wiley.
Morton, J. (1969) Interaction of information in word recognition, Psych. Rev. 76, p. 165-178.
Rumelhart, D.E. and Siple, P. (1974) Process of recognizing tachistoscopically presented words, Psych. Rev. 81, p. 99-118.
PULSE AND STEP RESPONSE OF THE VISUAL SYSTEM
measured by means of a perturbation technique
J.A.J. Roufs and F.J.J. Blommaert*
introduction
The motive for developing the perturbation technique described in this paper was the need to evaluate an earlier model for the dynamic visual processing of transients. In this model (Roufs, 1974 IV) it is assumed that the system reacts to small temporal changes in luminance with two dissimilar output variables. These variables are connected with two types of psychological attributes: gradual changes in brightness, caused by gradual changes in the stimulus luminance, and "agitation" (a percept which is difficult to describe), caused by transients (fast changes in the luminance).
An arbitrary stimulus time function generally gives rise to both output variables. Each output variable is assumed to have a critical value at which the percept is seen in 50% of the cases. In detecting the stimulus, the threshold intensity is thought to be determined by whichever variable is strongest relative to its critical value. For example, in the case of sinusoidal modulation of the luminance (De Lange curve), brightness variation (swell) at low frequencies is the perceptual attribute determining the threshold of the first variable mentioned. At high frequencies only "agitation" is seen, which thus determines the threshold of the second variable. The model mentioned above was constructed to explain small peculiarities in the shape of the threshold curves of rectangular incremental flashes that were not expected on the basis of an earlier low-pass-filter model, whose transmission was fitted to the experimental De Lange curves. The model also explains quantitative properties of the threshold curves of doublets, each consisting of an incremental flash and an identical decremental flash, which are particularly sensitive to the low-frequency behaviour of the system.
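The detection rule stated above (threshold set by whichever output variable first reaches its critical value) can be written down in a few lines. This sketch assumes that each variable scales linearly with stimulus intensity and is summarized by its largest deviation per unit intensity; the channel names and all numbers are hypothetical.

```python
def detection_threshold(peak_per_unit, critical):
    """Threshold intensity under the two-variable detection rule.

    peak_per_unit[k]: largest deviation of output variable k per unit stimulus intensity
    critical[k]:      critical value of variable k (50% detection)
    With linear scaling, variable k reaches its critical value at intensity
    critical[k] / |peak_per_unit[k]|; the stimulus is detected by whichever
    variable gets there first, i.e. at the smallest such intensity.
    """
    candidates = {k: critical[k] / abs(r) for k, r in peak_per_unit.items() if r != 0}
    winner = min(candidates, key=candidates.get)
    return candidates[winner], winner

# Hypothetical values: slow modulation mostly drives the brightness variable,
# fast modulation mostly drives the "agitation" variable.
slow = {"brightness": 2.0, "agitation": 0.5}
fast = {"brightness": 0.1, "agitation": 0.8}
crit = {"brightness": 1.0, "agitation": 1.0}
print(detection_threshold(slow, crit))   # (0.5, 'brightness')
print(detection_threshold(fast, crit))   # (1.25, 'agitation')
```

Applied frequency by frequency, a rule of this kind switches from the brightness variable at low frequencies to the agitation variable at high frequencies, as described for the De Lange curve.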
In order to be able to calculate these threshold curves, three system properties were postulated: 1) the signal, proportional to the stimulus intensity, is processed quasi-linearly; 2) the signal is detected if its amplitude exceeds a certain deviation "d" from the stationary state; 3) the transfer function of the second output variable has the minimum-phase property.
The modulus of the transfer function is fitted to the top and the high-frequency
side of the De Lange curve. These parts of the De Lange curve have the percept
"agitation" in common.
The conclusion drawn was that the transfer function fitted in this way, which at the same time explains all the given experimental data on flashes, had to be of a band-pass-filter type. Consequently, its transmission on the low-frequency side is lower than the De Lange curve.
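What a band-pass transfer implies for the pulse response can be illustrated with a hypothetical biphasic pulse response: the difference of two unit-area second-order low-pass responses, weighted 1 and 0.8. Its gain at zero frequency is therefore 1 - 0.8 = 0.2, well below the gain at intermediate frequencies. The time constants and weight are assumptions chosen for illustration, not values fitted to the data.

```python
import numpy as np

DT = 0.5e-3                                # sampling step, s
t = np.arange(0.0, 1.0, DT)                # 1 s of response

def second_order_pulse(t, tau):
    """Unit-area second-order low-pass impulse response t/tau^2 * exp(-t/tau)."""
    return t / tau**2 * np.exp(-t / tau)

# Hypothetical biphasic pulse response: a fast positive lobe minus a slower,
# weaker negative lobe (time constants 10 ms and 20 ms, weight 0.8).
h = second_order_pulse(t, 0.010) - 0.8 * second_order_pulse(t, 0.020)

gain = np.abs(np.fft.rfft(h)) * DT         # approximates |H(f)| of the continuous system
freqs = np.fft.rfftfreq(len(h), DT)        # frequency axis, Hz
peak_hz = freqs[np.argmax(gain)]
print(f"gain at 0 Hz: {gain[0]:.2f}; maximum gain {gain.max():.2f} at {peak_hz:.0f} Hz")
```

A purely positive (monophasic) pulse response would instead have its largest gain at 0 Hz, which is the low-pass case that the earlier model assumed.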
It seemed desirable to cross-check this non-trivial band-pass-filter character
by using another method, viz. perturbation of the subliminal system responses.
* Student at the Eindhoven University of Technology.
Some of the results obtained by this method are described briefly in this article. Pulse and step responses obtained in this way are shown for one subject. These enable us to check the method and the postulates involved. The predictive value of the results is verified by comparing the experimental thresholds of rectangular incremental flashes, as a function of flash duration, with the thresholds calculated from the measured pulse response by convolution.
a perturbation technique
The method is based principally on changes in the threshold values of a sensor
flash due to perturbation of its response caused by the response of a small
test flash. For the present purpose we use fast-changing stimuli relatively
favouring the variable connected with the percept "agitation" (see Fig. 1).
Fig. 1. Illustration of the hypothetical mechanism for detecting fast- and slow-changing stimulus luminance. The upper branch depicts the processing of the signal which gives rise to brightness variations. The lower one shows the underlying mechanism for the detection of "agitation".
Only two basic postulates are required:
i) the signal, which is proportional to the stimulus intensity, is processed by a quasi-linear system b(E), whose parameter values depend on the background intensity E;
ii) a change in the stimulus luminance causing a transient response at the output of system b is seen only if this response exceeds a certain deviation "d" from the stationary state.
Let us, for the moment, assume that one of the phases is dominant and that this phase is positive, as sketched in Fig. 2. (This dominance is, for instance, consistent with doublet threshold curves within the linearity assumption.) The amplitude of the sensor-flash response needed for detection can be changed by superimposing that response on a test-flash response. By varying the time shift or the amplitude of the test flash, the sum response is changed. In this way the amplitude of the sensor response, and thus its intensity threshold, is changed observably. In order to avoid misinterpretation of these changes, the amplitude of the test flash has to be small compared with that of the sensor flash. This is ensured by presetting the amplitude ratio of test and sensor flash. In this respect the method differs from general subliminal summation.
For the sake of simplicity we take sensor- and test flash as being of equal duration.
[Figure 2: upper row, stimulus intensity versus time for two test-flash delays; lower row, sensor-flash, test-flash and sum responses versus time, with the detection level d]
Fig. 2. A drawing showing the principles of the perturbation technique. The upper row shows two stimulus conditions. In this case they differ only in the delay of the small test flash relative to the sensor flash, the ratio q of the flash intensities being constant. The flashes are of equal duration, short compared with the time constants of the system. The drawn lines in the lower row are the sensor- and test-flash responses. The dashed curves are the sum responses, which have just reached the required value "d". The change in amplitude, and thus the change in flash intensity needed to detect the combination, is demonstrated.
The threshold condition of the combination can be formulated as follows:

$$E_c\,\Delta\,\max_t\left[U_\delta(t) + q\,U_\delta(t-\tau)\right] = d \qquad (1)$$

with:
E_c   50% threshold intensity of the strongest flash in the combination used
Δ     duration of the flashes
U_δ   pulse response of system b
q     preset ratio of test- and sensor-flash intensities
τ     delay of the test flash with respect to the sensor flash
d     required deviation of the response at the 50% threshold

If q is sufficiently small and the dominant phase of U_δ is positive, eq. (1) can be written as

$$E_c\,\Delta\left[U_\delta(t_{ex}) + q\,U_\delta(t_{ex}-\tau)\right] = d \qquad (2)$$

where t_ex is the time at which U_δ is extreme.
By measuring E_c as a function of τ, the pulse response expressed in d units is found to be

$$\frac{U_\delta(t_{ex}-\tau)}{d} = \frac{1}{q\,\Delta\,E_c} - \frac{U_\delta(t_{ex})}{q\,d} \qquad (3)$$

In plotting 1/E_c as a function of τ, U_δ(t_ex - τ)/d is thus found scaled up by a factor Δq and shifted by 1/E_1 (where E_1 is the threshold intensity of the sensor flash alone, see eq. (4)).
However, in order to increase measuring precision, the influence of the non-stationary stochastic behaviour of d has to be reduced. Therefore a reference stimulus is used. (Details and underlying principles of this procedure will be published elsewhere.)
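The threshold condition for the sensor flash alone is

$$E_1\,\Delta\,U_\delta(t_{ex}) = d \qquad (4)$$

Combining eqs. (2) and (4) one obtains

$$\frac{U_\delta(t_{ex}-\tau)}{d} = \frac{1}{q\,\Delta}\left(\frac{1}{E_c} - \frac{1}{E_1}\right) \qquad (5)$$

For reasons of convenience we usually reduce U_δ by dividing by its extreme value:

$$\frac{U_\delta(t_{ex}-\tau)}{U_\delta(t_{ex})} = \frac{1}{q}\left(\frac{E_1}{E_c} - 1\right) \qquad (6)$$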
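The recovery of the pulse response from threshold measurements can be simulated numerically. The sketch below is a hypothetical illustration, not the authors' procedure: it invents a biphasic pulse response (pulse_response) with a positive dominant phase, uses the short-flash approximation (the response to a flash of intensity E is E·Δ·U_δ), applies peak detection through eq. (1), and then inverts the simulated thresholds with eq. (6). Comparing two values of q shows why the test flash must be small: eq. (6) becomes exact only as q goes to zero.

```python
import numpy as np

DT = 0.1                                   # time step, ms
t = np.arange(0.0, 400.0, DT)              # time after the sensor flash, ms

def pulse_response(t):
    """Hypothetical biphasic pulse response U_delta(t); its dominant phase is positive."""
    return (t / 10**2) * np.exp(-t / 10) - 0.8 * (t / 20**2) * np.exp(-t / 20)

def delayed(u, k):
    """Samples of u(t - k*DT) on the same grid (k < 0 means an earlier flash)."""
    out = np.zeros_like(u)
    if k >= 0:
        out[k:] = u[:len(u) - k]
    else:
        out[:k] = u[-k:]
    return out

def threshold(u, q, k, delta=2.0, d=1.0):
    """Eq. (1) with peak detection: the E_c for which the peak of
    E_c * delta * [U(t) + q * U(t - tau)] just reaches d."""
    return d / (delta * np.max(u + q * delayed(u, k)))

def reconstruct(q, delays_ms):
    """Simulate thresholds for each delay and invert them with eq. (6)."""
    u = pulse_response(t)
    e1 = threshold(u, 0.0, 0)              # sensor flash alone, eq. (4)
    i_ex = int(np.argmax(u))               # sample index of t_ex
    estimate, truth = [], []
    for tau in delays_ms:
        k = int(round(tau / DT))
        ec = threshold(u, q, k)
        estimate.append((e1 / ec - 1.0) / q)            # eq. (6)
        j = i_ex - k                                     # sample index of t_ex - tau
        truth.append(u[j] / u[i_ex] if 0 <= j < len(u) else 0.0)
    return np.array(estimate), np.array(truth)

delays = np.arange(-60.0, 20.0, 1.0)       # ms; negative: test flash precedes sensor flash
for q in (0.01, 0.3):
    est, true = reconstruct(q, delays)
    print(f"q = {q}: largest error in the reduced response {np.max(np.abs(est - true)):.3f}")
```

With the small ratio the reconstructed curve follows U_δ(t_ex - τ)/U_δ(t_ex) closely; with the large ratio the peak of the sum response shifts away from t_ex and the linearized eq. (2) no longer holds.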
It can easily be shown that the effective threshold variation can be doubled by using, instead of the single sensor flash as reference stimulus, a combination of the sensor flash and a decremental test flash. (This was in fact the procedure used in measuring the results shown in Fig. 3.)
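The factor of 2 can be made explicit with a short derivation. It assumes, as for eq. (2), that q is small, and that the decremental test flash has the same relative magnitude q as the incremental one; E_c^+ and E_c^- are labels introduced here for the two thresholds.

$$E_c^{+}\,\Delta\left[U_\delta(t_{ex}) + q\,U_\delta(t_{ex}-\tau)\right] = d, \qquad E_c^{-}\,\Delta\left[U_\delta(t_{ex}) - q\,U_\delta(t_{ex}-\tau)\right] = d$$

$$\frac{1}{E_c^{+}} - \frac{1}{E_c^{-}} = \frac{2\,q\,\Delta\,U_\delta(t_{ex}-\tau)}{d}$$

By contrast, the single-flash reference of eq. (4) gives, from eqs. (2) and (4), only 1/E_c - 1/E_1 = qΔU_δ(t_ex - τ)/d.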
This method provides no information on the sign of the dominant phase of the pulse response, nor on the relative position of stimulus and response on the time axis. Using a reference stimulus doubles the spread due to the stationary stochastic fluctuations for the same number of trials. Yet it is beneficial, since the large non-stationary effects are largely neutralized by measuring with respect to the reference stimulus.
methods
The stimulus is a "white", circular, centrally fixated field of 1° with a dark surround; its luminance is varied around a constant background value of E = 1200 td. A 2 mm artificial pupil with an entoptic guiding system was used. The light is generated by a linearised glow modulator. The desired time functions of the stimulus and its amplitude are controlled electronically around the working point, which corresponds to the background intensity. The subject has one knob to release the stimulus, which follows after a convenient preset delay. Three knobs enable him to respond with "yes", "no", or "rejection". All intensity thresholds are 50% probability values obtained by a modified "method of constant stimuli".
results
In Fig. 3a the reduced pulse response of subject F.B. is shown. The absolute response, expressed in "d" units, can be found by multiplying the reduced values by the extreme value given in the figure under the code "norm. const.". This "norm. const." is obtained by averaging the sensor-flash threshold intensity E1 over all 10 sessions and applying eq. (4). Per session, 2-3 points of the curve were measured, each point being calculated from the average of 5 threshold differences according to eq. (6). In all, 600 trials per point were necessary. The intervals between the small horizontal lines represent twice the experimentally determined standard deviation of the mean. The order of measurement along the τ-axis was chosen at random.
The step response of the same subject, obtained by the same technique but using slightly modified formulae, is shown in Fig. 3b. In this case the averages were obtained in a slightly different manner, namely by measuring all points on the time axis in every session and averaging the results of the 5 sessions held. All other conditions were equal. The dashed curves were obtained by first averaging the Fourier coefficients of the measured pulse response with those of the pulse response calculated from the step response, and then applying an inverse transform to these averages.
In Fig. 4 experimentally obtained 50% intensity thresholds of rectangular incremental flashes are given as a function of duration. All durations were presented in one session. The circles are the average results of two sessions. Thresholds in the first session were measured in random order of duration, the order in the second session being reversed.
The drawn line is the set of predicted thresholds, calculated by convolving the pulse response of Fig. 3a with the rectangular flash and using an equation analogous to eq. (4) for rectangular pulses.
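The convolution step can be sketched with a hypothetical biphasic pulse response in place of the measured one. Convolving U_δ with a rectangle of duration T equals S(t) - S(t - T), where S is the running integral of U_δ (the step response); the threshold then follows from peak detection in analogy with eq. (4). The pulse shape and the name rect_threshold are illustrative assumptions, and the sketch does not include the correction for increased detection probability at long durations mentioned in the caption of Fig. 4.

```python
import numpy as np

DT = 0.1                                   # time step, ms
t = np.arange(0.0, 2000.0, DT)             # long enough for 1000-ms flashes and their offset

def pulse_response(t):
    """Hypothetical biphasic pulse response U_delta(t), in units of d per (intensity x ms)."""
    return (t / 10**2) * np.exp(-t / 10) - 0.8 * (t / 20**2) * np.exp(-t / 20)

def rect_threshold(duration_ms, d=1.0):
    """Threshold of a rectangular flash predicted from the pulse response.

    The unit-intensity response is S(t) - S(t - T), S being the running integral
    of U_delta. Peak detection: the flash is seen when the largest deviation,
    of either sign, reaches d (analogue of eq. (4))."""
    u = pulse_response(t)
    s = np.cumsum(u) * DT                  # step response S(t)
    k = int(round(duration_ms / DT))
    if not 1 <= k <= len(t):
        raise ValueError("duration outside the simulated range")
    s_late = np.zeros_like(s)
    s_late[k:] = s[:len(s) - k]            # S(t - T)
    return d / np.max(np.abs(s - s_late))

for T in (1, 2, 5, 10, 20, 50, 100, 200, 500, 1000):
    print(f"T = {T:5d} ms   log10 threshold = {np.log10(rect_threshold(T)):+.3f}")
```

For very short flashes the product of threshold and duration stays nearly constant, while for long flashes the threshold levels off at a value set by the overshoot of the step response, because the negative lobe of the pulse response cancels part of the sustained input.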
In Fig. 5 the reciprocals of the 50% thresholds of the amplitude of the sinusoidal modulation at the given frequencies are shown for the same subject. The sinusoidal modulation was restricted in duration by a gate, which exposed 15 peaks to the subject. The beginning and end of the train were smoothed in order to avoid transients (Roufs, 1974 VI). The order of measurement with respect to frequency was again randomized and reversed in a second session.
[Figure 3: (a) reduced pulse response, subject F.B., E = 1200 td, Δ = 2 ms, norm. const. = 1.2; (b) reduced step response, subject F.B., E = 1200 td, field 1°, norm. const. = 0.15; abscissa -τ (ms)]
Fig. 3. Figure 3a shows the measured pulse response. The dots are the mean values obtained after reduction by dividing the response by its extreme value. This value, apart from its sign, is given in the legend as norm. const.
Figure 3b is the reduced step response. The dashed curves are simultaneous fittings, the pulse response being the derivative of the step response.
discussion
It is clearly seen from Fig. 3 that the measured responses are sufficiently large in comparison with the spread. Within measuring accuracy, the pulse response is equal to the derivative of the step response, as demonstrated by the dashed curves, which satisfy this property exactly. This is what is expected of a linear system. It is also consistent with the postulated peak detection.
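The linearity argument can be illustrated numerically. For a linear system the step response is the running integral of the pulse response, so differentiating it recovers the pulse response; a nonlinear stage breaks this relation. The pulse shape and the squaring stage below are hypothetical choices made only for illustration.

```python
import numpy as np

DT = 0.1                                   # time step, ms
t = np.arange(0.0, 400.0, DT)

def pulse_response(t):
    """Hypothetical biphasic pulse response of a linear system."""
    return (t / 10**2) * np.exp(-t / 10) - 0.8 * (t / 20**2) * np.exp(-t / 20)

def integrate(x):
    """Running integral by the trapezoidal rule, starting at zero."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (x[1:] + x[:-1])) * DT))

def max_rel_dev(a, b):
    """Largest absolute difference, relative to the peak magnitude of b."""
    return float(np.max(np.abs(a - b)) / np.max(np.abs(b)))

u = pulse_response(t)
step = integrate(u)                        # linear system: step response = integral of pulse response
linear_dev = max_rel_dev(np.gradient(step, DT), u)

# Hypothetical nonlinear variant: a squaring stage after the same linear filter.
nonlinear_dev = max_rel_dev(np.gradient(step**2, DT), u**2)

print(f"linear: {linear_dev:.4f}   with squaring stage: {nonlinear_dev:.2f}")
```

The small residual deviation in the linear case comes only from the discretization; the squaring stage produces a mismatch many times larger than the response itself, which is why agreement between measured pulse and step responses counts as evidence for linearity.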
The absolute value of the Fourier transform of the pulse response is shown in Fig. 5. The transmission clearly has a band-pass character. Its position relative to a measured De Lange curve, which at low frequencies reflects the transfer to the upper branch of Fig. 1 and at middle and high frequencies that to the lower branch, is instructive. First, it shows that the frequency of maximum transmission is equal to that found from the De Lange characteristic. Second, the shift towards lower transmission values is consistent with a calculated shift of about 0.2 log units resulting from the increased probability of seeing caused by the repeated equal peaks (Roufs, 1974 VI). Bearing in mind possible day-to-day variation, the closeness of the agreement between the predicted threshold curves and the measured values shown in Fig. 4 must be a coincidence.
It is obvious that the Fourier transform of the pulse response also provides information about the phase of the transfer function.

[Fig. 4: subject FBE, 1200 td, φ = 1°; log threshold against log duration.]
Fig. 4. Thresholds of incremental rectangular flashes as a function of the duration of the flashes (circles). The solid line is calculated from the measured pulse response. The dashed line is a correction for the effect of increased detection probability caused by transients at long durations.

[Fig. 5: subject FBE, 1200 td, φ = 1°; abscissa log frequency.]
Fig. 5. The absolute value of the Fourier transform of the measured pulse response is shown by the dashed line. The measured De Lange curve for the same subject is also indicated. A gated sinusoid exposing 15 fully fledged peaks was used.

conclusion
In conclusion, we feel justified in saying that the perturbation method is a viable technique for gaining information about the visual system. The results confirm the linearity and peak detection postulated for the model. The prediction of threshold curves of rectangular flashes from the pulse responses is excellent. The findings are in agreement with the band-pass nature of the transfer with respect to "agitation", based on another type of approach (Roufs, 1974 IV).
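How a transmission curve follows from a pulse response can be sketched with a discrete Fourier transform. The biphasic pulse response below is hypothetical, not the measured curve of Fig. 3a; its negative lobe lowers the low-frequency transmission, which gives the band-pass shape. The phase of the transfer function comes from the same transform.

```python
import numpy as np

def pulse_response(t_ms, tau1=10.0, tau2=20.0, k=0.4):
    """Hypothetical biphasic pulse response; parameters are illustrative assumptions."""
    x1, x2 = t_ms / tau1, t_ms / tau2
    return x1**2 * np.exp(-x1) - k * x2**2 * np.exp(-x2)

dt_ms = 0.5
t = np.arange(0.0, 2000.0, dt_ms)
h = pulse_response(t)

H = np.fft.rfft(h) * dt_ms                        # approximates the continuous Fourier transform
freqs_hz = np.fft.rfftfreq(t.size, d=dt_ms / 1000.0)
gain = np.abs(H)                                  # transmission (modulus)
phase = np.unwrap(np.angle(H))                    # phase of the transfer function (rad)

peak_index = int(np.argmax(gain))
print(f"DC gain {gain[0]:.2f}, maximum {gain[peak_index]:.2f} at {freqs_hz[peak_index]:.1f} Hz")
```

Plotting `log10(gain)` against `log10(freqs_hz)` gives a curve of the same form as the dashed line in Fig. 5, although its position depends entirely on the assumed parameters.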
summary
A perturbation technique is described for measuring subliminal responses in the visual system. The method is based mainly on quasi-linearity and peak detection. Pulse and step responses have been measured and are found to be consistent with each other and with the measured thresholds of rectangular flashes.
references
Roufs, J.A.J. (1974) Dynamic Properties of Vision-IV. Thresholds of Decremental Flashes, Incremental Flashes and Doublets in Relation to Flicker Fusion, Vision Res. 14, p. 831-851.
Roufs, J.A.J. (1974) Dynamic Properties of Vision-VI. Stochastic Threshold Fluctuations and their Effect on Flash-to-Flicker Sensitivity Ratio, Vision Res. 14, p. 871-888.
SMALL INVOLUNTARY EYE MOVEMENTS
F.L. Engel and Th.M. Bos
introduction
This paper reports observations on the occurrence of small involuntary eye movements during the determination of the conspicuity area, i.e. the retinal field within which the relevant test object can be distinguished from its background in a single fixation. The conspicuity areas are determined by having the observer fixate the display centre while the stimulus pattern with the test object is presented at an eccentric location unknown to him beforehand.
During the brief (80 msec) stimulus exposures in earlier conspicuity area determinations (Engel, 1971, 1974), practically no eye movements occurred, which is understandable for a number of reasons, including ocular reaction times that are long compared with the exposure. However, during the longer exposures (1 sec) used this time, a small to-and-fro eye movement in the direction of the discovered test object was frequently observed, some 400 msec after onset of the stimulus pattern.
As will be shown, occurrence as well as delay of these movements depended on the
size of the conspicuity area and on the eccentricity of presentation.
The experimental results are discussed briefly, and will be dealt with more fully
elsewhere (Engel, 1976).
experiments
The stimulus consisted of a random disk pattern as background and a dissimilar disk as test object. The extent of the conspicuity area is derived from the experimentally determined probability of discovering the test object as a function of its distance to the fixation point.
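One simple way to extract the size of the conspicuity area, R50 (the eccentricity at which the probability of discovery falls to 50%, as used in Fig. 3), is linear interpolation between the measured eccentricities. The procedure and the data below are hypothetical illustrations, not the author's analysis or results.

```python
import numpy as np

def conspicuity_radius(eccentricity_deg, p_discovered, level=0.5):
    """Eccentricity (deg) at which the discovery probability falls to `level`.

    Linear interpolation between the first pair of neighbouring
    eccentricities that bracket the criterion (an assumed procedure).
    """
    ecc = np.asarray(eccentricity_deg, dtype=float)
    p = np.asarray(p_discovered, dtype=float)
    for i in range(len(p) - 1):
        if p[i] >= level > p[i + 1]:
            fraction = (p[i] - level) / (p[i] - p[i + 1])
            return ecc[i] + fraction * (ecc[i + 1] - ecc[i])
    raise ValueError("discovery probability never falls below the criterion level")

# Hypothetical data: fraction of presentations in which the target was discovered.
ecc = [1.5, 3.0, 4.5, 6.0, 7.5]
p = [0.95, 0.80, 0.40, 0.10, 0.05]
r50 = conspicuity_radius(ecc, p)
```

With these made-up values the criterion is crossed between 3.0° and 4.5°, giving R50 = 4.125°. A fitted psychometric function would be a smoother alternative to interpolation.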
A series of 50 stimulus patterns, each of 1 sec duration and separated by a 1 sec plain rest field, was prerecorded on video tape. The stimulus patterns were shifted versions of the same original, so that the location of the test object relative to the fixation centre differed from one presentation to the next, while its location relative to the background configuration did not. Each series of stimulus patterns was presented 4 times to the observer.
The observer was instructed to keep his eyes fixated on a small, continuously visible cross in the centre of the screen and, on appearance of the stimulus pattern, to indicate with a push-button switch whether he had discovered the test object (target). Eye fixation was monitored during the experiments by means of the corneal reflection technique.
results and discussion
Although the observers did their utmost to maintain fixation on the marked centre of
the screen, a small involuntary eye saccade occurred rather frequently, some
400 msec after onset of the stimulus pattern, mostly in the target direction.
Usually this saccade was followed about 200 msec later by a second small saccade
back to the fixation centre. The movements in the direction of the target were generally too short to reach the target, their average length being about 0.7° of visual angle. Nearly all these movements were followed roughly 300 msec later by the push-button signal indicating discovery of the target.
Mostly the observers were not aware of their eye movements; when they were, they felt that the eye movement had been made before they realized it.
occurrence
The observed eye movements were divided into two categories according to their direction: movements deviating less than 18° from the true target direction were considered to be "target eye movements", the others were taken to be "non-target eye movements". The value of ±18° corresponds to the difference between the possible target directions in the stimulus material.
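The classification rule can be written down directly. The sketch assumes that each eye displacement is available as horizontal and vertical components (dx, dy) from the corneal-reflection record; the function names and the angle convention (0° = rightward, counter-clockwise positive) are illustrative assumptions.

```python
import math

def movement_direction_deg(dx, dy):
    """Direction of an eye displacement (dx, dy) in degrees, 0 = rightward, counter-clockwise."""
    return math.degrees(math.atan2(dy, dx)) % 360.0

def angular_difference_deg(a, b):
    """Smallest absolute difference between two directions, in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def classify_movement(dx, dy, target_direction_deg, tolerance_deg=18.0):
    """'target' if the movement deviates less than tolerance_deg from the target direction."""
    direction = movement_direction_deg(dx, dy)
    if angular_difference_deg(direction, target_direction_deg) < tolerance_deg:
        return "target"
    return "non-target"
```

Taking the difference modulo 360° matters for targets near 0°: a movement at 354° and a target at 5° differ by about 11°, not 349°.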
The proportions of the eye movements of one observer classified in this way are
shown in Fig. 1 as a function of the target eccentricity. The 100% level corresponds
to 40 stimulus presentations at each eccentricity concerned.
[Fig. 1: proportions P (%) against target eccentricity R (0 to 7.5°); curves Pt, Pnt and the proportion of discovered targets.]
Fig. 1. Proportion (P) of involuntary eye movements as a percentage of the total number (40) of targets presented at the indicated eccentricity R. Pt refers to target eye movements, while Pnt refers to non-target eye movements. For comparison purposes, the proportion of discovered targets has also been plotted.
Fig. 1 also shows the proportion of target discoveries (from these data we determine the conspicuity area of the test object). A relation between discovery of the target and the target eye movement is apparent from this plot. In fact, on 90% of the occasions on which such a saccade occurred, the target was discovered consciously, and a target eye movement occurred in about 35% of all target discoveries.
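The 90% and 35% figures are two different conditional proportions computed from the same set of presentations. A minimal sketch, with hypothetical trial counts chosen only so that the proportions match those quoted:

```python
def conditional_proportions(trials):
    """trials: iterable of (discovered, target_saccade) booleans, one per presentation.

    Returns (P(discovered | target saccade), P(target saccade | discovered)).
    """
    trials = list(trials)
    both = sum(1 for discovered, saccade in trials if discovered and saccade)
    saccades = sum(1 for _, saccade in trials if saccade)
    discoveries = sum(1 for discovered, _ in trials if discovered)
    return both / saccades, both / discoveries

# Hypothetical counts (not the observer's data).
trials = ([(True, True)] * 63 + [(False, True)] * 7
          + [(True, False)] * 117 + [(False, False)] * 13)
p_disc_given_sacc, p_sacc_given_disc = conditional_proportions(trials)
```

The two proportions share the same numerator (presentations with both a saccade and a discovery) but are divided by different totals, which is why they can differ so much.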
The proportion of occurrence of the non-target movements is found to be rather low
and independent of the eccentricity of test object presentation. These movements
might perhaps be connected with certain background objects, which were sometimes
quite close to the fixation centre. No systematic origin could be found, however,
except that they were related in time to the onset of the stimulus pattern.
delays
In Fig. 2 the delay between stimulus onset and the target eye movement, and also the delay between stimulus onset and the discovery of the target (push-button signal), are plotted as a function of target eccentricity. Both delays increase with target eccentricity, the difference between them being almost constant (300 msec for the observer whose data are presented in Fig. 2). With respect to these data it should be remarked that the delay times indicated at R = 6.0° and R = 7.5° of visual angle are less reliable because of the small number of events at these eccentricities (see Fig. 1).

[Fig. 2: delay time (ms, up to 1000) against eccentricity R (degrees).]
Fig. 2. The delay time ΔT_t between stimulus onset and target eye movement and the delay time between stimulus onset and the push-button signal, shown against the retinal eccentricity (R) of the target presentation.
As shown in Fig. 3, these delays can be related to the eccentricity of test object presentation and to the size of the relevant conspicuity area.
[Fig. 3: (a) delay time (ms) against normalized eccentricity R*, observer F.E.; (b) mean delay time (ms) against R50 (degrees), observers F.E. and T.B.]
Fig. 3. (a) The delay time ΔT_t before occurrence of a target eye movement, and the delay time before occurrence of the observer's response, plotted against the normalized eccentricity R* = R/R50, where R50 is the size of the conspicuity area at the 50% threshold level. Measurements for 4 different diameters of the test object are shown (the diameter of the background disks was 0.55° of visual angle). (b) The arithmetic mean of the delay times before occurrence of a target eye movement (ΔT_t), plotted against the size (R50) of the corresponding conspicuity area.
The increase in delay time, amounting to about 100 msec/degree of visual angle in
the transition region, compares favourably with the value determined by Schiepers
(1974), who found 150 msec/degree of visual angle for the increase in vocal response
latency against eccentricity of word presentation.
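A slope expressed in msec per degree of visual angle is obtained from a least-squares straight-line fit of delay against eccentricity over the transition region. The numbers below are hypothetical, not the data of Fig. 2 or Fig. 3.

```python
import numpy as np

# Hypothetical delays (ms) at eccentricities (deg) within the transition region.
eccentricity_deg = np.array([1.5, 3.0, 4.5])
delay_ms = np.array([300.0, 450.0, 600.0])

# np.polyfit with degree 1 returns [slope, intercept] for delay = slope * R + intercept.
slope_ms_per_deg, intercept_ms = np.polyfit(eccentricity_deg, delay_ms, 1)
```

With real data the fit should be restricted to the eccentricities where the delay actually rises, since the delay levels off outside the transition region.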
summary
Experiments are described in which observers made involuntary eye movements in the
direction of a test object to be discovered, strict fixation of the display centre
being required.
The occurrence of these eye movements was related to the conscious discovery of
the test object. The delay time between the stimulus onset and involuntary eye
movement is shown to depend on target eccentricity and on the size of the relevant
conspicuity area.
references
Engel, F.L. (1971) Visual Conspicuity, Directed Attention and Retinal Locus, Vision Res. 11, p. 563-576.
Engel, F.L. (1974) Visual Conspicuity and Selective Background Interference in Eccentric Vision, Vision Res. 14, p. 459-471.
Engel, F.L. (1976) Visual Conspicuity, Visual Search and Fixation Tendencies of the Eye, to be published.
Schiepers, C.W.J. (1974) Response Latencies in Parafoveal Word Recognition, IPO Annual Progress Report 9, p. 99-103.
VISUAL RECOGNITION BY DYSLECTIC CHILDREN
further exploration of letter, word and number recognition in 4 weak and 4 normal readers
H. Bouma, Ch.P. Legein and A.L.M. van Rens
introduction
The results obtained last year from the examination of 20 dyslectic children and 20 normal readers (Bouma, Legein and v. Rens, 1974) encouraged us to continue this study in two directions: first, a follow-up study of all 40 subjects, to get an idea of how these perceptual processes develop; second, further exploration of possibly defective processes in visual recognition. This paper reports both on the follow-up study and on the further exploration in four dyslectic children and four normal readers selected from the above-mentioned groups.
To summarize part of last year's results, we found:
1. a backwardness in reading level of at least two years in all dyslectic children;
2. in the tachistoscopic experiments, a significantly lower recognition score in the dyslectic group for embedded letters and words, especially in parafoveal presentation. There was also evidence of better recognition in the right visual half field.
The tachistoscopic testing programme used this year was the same as last year's, but was extended so that in the parafoveal recognition experiments the same stimuli were presented on both sides of fixation. Thus more data were obtained and a more firmly based conclusion could be drawn as to a possible left-right difference. As ceiling effects had influenced last year's results on foveal recognition of words of up to five letters, we planned recognition experiments on longer words (ℓ = 6, 7, 8 letters).
As we had concluded that there were more visual interferences in the dyslectic group, such an effect could also be expected with numbers, perhaps leading to difficulties in arithmetic. Since, unlike words, digit strings are not redundant, this also provides an opportunity to check the influence of word knowledge on recognition. As rehearsal memory could be a limiting factor in reporting long numbers, experiments were carried out in which the presentation time was prolonged. This function was also tested by auditory presentation of digit strings.
To explore whether large or more widely spaced printing could improve the recognition of parafoveal words, a tachistoscopic recognition test was done.
Finally, a letter-search test was tried out to investigate the extent to which interferences between adjacent letters would influence the marking of target letters within words or letter strings. If successful, such a test could perhaps help in the early detection of reading difficulties in the classroom.
methods
Four dyslectic boys - ages 12-13 years - with different low reading levels (Fig. 1)
were selected and compared with normal readers of the same ages.
[Fig. 1: reading level (grade) against age (5 to 14 years).]
Fig. 1. Follow-up of reading level: 1974 (small symbols) and 1975 (large symbols).
The follow-up examination consisted of the somewhat extended tachistoscopic testing programme of isolated letters, embedded letters and words (length ℓ = 3, 4, 5 letters), both in foveal and in parafoveal vision (φ = 1°). Again their reading level was assessed, using the Tanghe test. Vocal latencies were also measured but will not be reported on here.
Foveal recognition of longer words (ℓ = 6, 7, 8 letters) was explored, as was that of numbers of varying length, at a normal (100 msec) and a prolonged (500 msec) exposure time. The parafoveal recognition of numbers (ℓ = 2) was also tested. In addition to visual presentation, auditory presentation of digit strings was used to test short-term memory capacity.
Twelve commonly used words (ℓ = 5) were presented parafoveally three times, each time in a different mode: normal printing, double-magnified printing and extra-spaced printing. The letter closest to fixation was at φ = 1°. To avoid immediate word repetition, presentations of the same word in different modes were spread over three parts of the session, with other tests in between. The order of printing modes was balanced over the three parts.
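One way to balance the printing modes over the three parts is a cyclic Latin square: every word appears once in each part, in a different mode each time, and each mode occurs equally often in every part. The text does not say which balancing scheme was used, so this is an assumption, and the word labels are placeholders.

```python
MODES = ("normal", "double-magnified", "extra-spaced")

def latin_square_schedule(words, modes=MODES):
    """Assign each word one printing mode per session part, rotating the modes cyclically."""
    k = len(modes)
    schedule = {part: [] for part in range(k)}
    for i, word in enumerate(words):
        for part in range(k):
            schedule[part].append((word, modes[(i + part) % k]))
    return schedule

words = [f"word{i:02d}" for i in range(1, 13)]   # placeholders for the twelve 5-letter words
schedule = latin_square_schedule(words)
```

With twelve words and three modes, each part contains every word once and each mode four times.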
The letter search test consists of two pages: one with 240 8-letter words and one with 240 unpronounceable 8-letter strings, each page comprising 32 lines with normal spacing. Of these, 80 words and 80 strings contained the target letter e. The target letter positions were equally divided over all letter positions. The subject was asked to mark these target letters in pencil.
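A string page with the stated properties (240 items, 80 containing the target e, target positions equally divided over the 8 letter positions) can be generated as sketched below. Approximating "unpronounceable" strings by consonant-only strings is our assumption, as are the seed and the function names; the word page cannot be generated this way.

```python
import random
import string

# Assumption: unpronounceable strings are approximated by consonant-only strings
# (vowels and y excluded), so the target letter e occurs only where it is inserted.
CONSONANTS = [c for c in string.ascii_lowercase if c not in "aeiouy"]

def make_string_sheet(n_items=240, n_targets=80, length=8, target="e", seed=1):
    """Return (string, target_position) pairs; target_position is None for non-target items."""
    if n_targets % length:
        raise ValueError("n_targets must be a multiple of length to balance positions")
    rng = random.Random(seed)
    positions = [pos for pos in range(length) for _ in range(n_targets // length)]
    rng.shuffle(positions)                         # equal number of targets at each position
    target_items = set(rng.sample(range(n_items), n_targets))
    next_position = iter(positions)
    sheet = []
    for i in range(n_items):
        letters = [rng.choice(CONSONANTS) for _ in range(length)]
        position = None
        if i in target_items:
            position = next(next_position)
            letters[position] = target
        sheet.append(("".join(letters), position))
    return sheet

sheet = make_string_sheet()
```

With 80 targets over 8 positions, each letter position carries the target exactly 10 times.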
results
follow-up
In this part of the examination there was hardly any change in the high foveal recognition scores of either group (Table 1). In the parafoveal scores, however, there was a definite improvement, in particular for words in the dyslectic group. As in this follow-up study all parafoveal stimuli were presented on both sides of the fixation point, the better scores in the right visual field could be demonstrated, especially in the dyslectic group (Table 2). Tables 1 and 2 should not be compared in detail, because Table 2 relates to twice as many presentations.
              foveal                  parafoveal
              dysl.      contr.       dysl.      contr.
/a/           96 (96)    94 (99)      91 (81)    98 (98)
/xax/         72 (68)    94 (95)      28 (19)    53 (54)
/wrd/         78 (73)    100 (100)    54 (38)    71 (59)
Table 1. 1975 average correct recognition scores (%) of single letters /a/, embedded letters /xax/ and words /wrd/; 1974 scores in parentheses.

              parafoveal
              dysl.      contr.
              L    R     L    R
/a/           90   94    97   98
/xax/         22   45    51   59
/wrd/         39   73    74   80
Table 2. Average correct scores (%) in the left (L) and right (R) visual field (φ = 1°).
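The right-field advantage can be read off Table 2 as the right-minus-left difference in percentage points. The sketch below only restates the table's numbers; it is not a statistical test.

```python
# Scores (%) from Table 2 as (left, right) pairs.
TABLE_2 = {
    "dyslectic": {"/a/": (90, 94), "/xax/": (22, 45), "/wrd/": (39, 73)},
    "control": {"/a/": (97, 98), "/xax/": (51, 59), "/wrd/": (74, 80)},
}

def right_field_advantage(table):
    """Right-minus-left difference in percentage points, per group and stimulus type."""
    return {group: {stimulus: right - left for stimulus, (left, right) in rows.items()}
            for group, rows in table.items()}

advantage = right_field_advantage(TABLE_2)
```

For words the difference is 34 points in the dyslectic group against 6 points in the control group.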
As to the reading level of the eight subjects (Fig. 1), we conclude that it was especially the dyslectic children who made progress; for them, Fig. 2 shows a fair correlation between improved reading level and better parafoveal word recognition.
[Fig. 2: parafoveal word recognition score p (0 to 1.0) against reading level (grade, Tanghe test), dyslectic and control subjects.]
Fig. 2. Follow-up of parafoveal word recognition and of reading level: 1974 (small symbols) and 1975 (large symbols).
words (ℓ = 6, 7, 8 letters)
Further exploration was done on the foveal recognition of common words (frequency ≥ 10⁻⁴) of greater length (ℓ = 6, 7 and 8). Six words of each length were presented (exposure time 100 msec). For the dyslectic group the scores for these longer words are definitely lower than for the shorter words, whereas for the control group the scores remained practically perfect (Table 3). The scores of these subjects for word lengths ℓ = 3-5 have also been included.
          dysl.   contr.
ℓ = 3     77      100
ℓ = 4     77      100
ℓ = 5     80      100
ℓ = 6     42      100
ℓ = 7     38      100
ℓ = 8     33      96
Table 3. Correct foveal word recognition scores (%) for various word lengths ℓ.
visual and auditory numbers
Forty randomly composed digit strings (length ℓ = 1, 2, 3, 4) were presented foveally (exposure time 100 msec). Table 4 gives the results for both groups; it is obvious that from ℓ = 3 onwards the dyslectics have far more difficulty than the normal readers. Two-digit numbers were also tested parafoveally (φ = 1°), and both groups had rather high scores: dyslectics 82% and controls 94%.
A foveal-recognition experiment on numbers (ℓ = 3, 4, 5) was also done using a prolonged presentation time (500 msec). Table 4 shows an increase in correct scores. This probably indicates that short-term memory capacity is not an essential limiting factor in these experiments. Nevertheless, the short-term memory function (rehearsal) seems worse in the dyslectic group, as indicated by also testing this function with auditory presentation of digit strings (ℓ = 3, 4, 5) at a pronunciation rate of 2 digits per second. Table 5 shows the results and indicates a low score for the dyslectic group in repeating long digit strings (ℓ = 5). In
conclusion, the lower scores obtained in the visual presentation of numbers (ℓ ≤ 5) thus seem to be due to perceptual factors rather than to short-term memory dysfunction.
         dysl.              contr.
ℓ        100 ms   500 ms    100 ms   500 ms
1        95       -         100      -
2        95       -         100      -
3        68       91        95       100
4        38       70        98       92
5        -        30        -        65
Table 4. Correct foveal number scores (%) for various lengths ℓ at two stimulus durations.

         dysl.   contr.
ℓ = 3    100     100
ℓ = 4    80      88
ℓ = 5    42      80
Table 5. Auditory presentation of digit strings: correct scores (%) for various lengths ℓ.
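The two comparisons behind this argument can be made explicit with the dyslectic group's scores from Tables 4 and 5: the gain from prolonging the exposure, and the difference that remains between auditory and prolonged visual presentation. This merely restates the tabulated numbers; it is not a statistical analysis.

```python
# Dyslectic group, correct scores (%) from Tables 4 and 5, keyed by string length.
VISUAL_100MS = {1: 95, 2: 95, 3: 68, 4: 38}
VISUAL_500MS = {3: 91, 4: 70, 5: 30}
AUDITORY = {3: 100, 4: 80, 5: 42}

# Improvement obtained by prolonging the exposure from 100 to 500 msec.
gain_from_longer_exposure = {n: VISUAL_500MS[n] - VISUAL_100MS[n]
                             for n in VISUAL_500MS if n in VISUAL_100MS}

# Remaining advantage of auditory over prolonged visual presentation.
auditory_minus_visual = {n: AUDITORY[n] - VISUAL_500MS[n] for n in AUDITORY}
```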
large and spaced words
In this experiment (see methods), no positive effect of large or spaced printing on the recognition scores could be demonstrated in the dyslectic group (Table 6).
          dysl.   contr.
normal    44      79
spaced    49      86
large     49      98
Table 6. Correct parafoveal word scores (%) for words printed in three modes.
letter search test
To bridge the gap between tachistoscopic recognition and ordinary reading, we developed a search experiment in which the letter e (target letter) had to be marked within words or letter strings of 8 letters.
As to the number of errors, the dyslectic group missed twice as many target letters
as the control group and both groups missed more target letters in the words than
in the letter strings (Table 7).
          dysl.     contr.
words     36 (23)   14 (13)
strings   26 (15)   12 (9)
Table 7. Error scores in the letter search test: total errors, with errors after correction for skipped lines in parentheses.
Fig. 3 (black and hatched columns) shows histograms of error positions for words and strings for both groups. When scoring the errors, it was striking to see in the dyslectic group how many lines there were in which not a single target letter was marked. This may indicate that they simply skipped entire lines, probably owing to insufficient control of the eye movement towards the new line. We corrected for these lines, which were probably not inspected, so that the solid columns indicate the errors in inspected lines.
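The correction for skipped lines amounts to a simple counting rule. The sketch below treats a line as skipped when it contains at least one target letter and none of them was marked, which is our reading of the text; the data layout and the example lines are hypothetical.

```python
def count_misses(lines):
    """Count missed targets, in total and after excluding skipped lines.

    `lines` is a list of lines; each line is a list of (is_target, marked)
    pairs, one per letter. A line counts as skipped (not inspected) when it
    contains at least one target and none of its targets was marked.
    """
    total = corrected = 0
    for line in lines:
        marks = [marked for is_target, marked in line if is_target]
        misses = marks.count(False)
        total += misses
        skipped = bool(marks) and not any(marks)
        if not skipped:
            corrected += misses
    return total, corrected

example = [
    [(True, True), (False, False), (True, False)],  # one target found, one missed
    [(True, False), (True, False)],                 # no target marked: treated as skipped
    [(False, False), (False, False)],               # no targets on this line
]
```

Here three targets are missed in total, but only one of them lies on an inspected line.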
The general tendency is that more target letters are missed in the second part of
the words and letter strings, and that the first, fourth and fifth letter positions
are at an advantage. Many errors are made at the last few letter positions in the
letter strings by both groups, and also in the words by the dyslectic group.
discussion
When comparing the present results with those reported last year (Bouma, Legein,
v. Rens, 1974) it should be borne in mind that the present data refer to two groups
of four subjects as compared with the two groups of twenty subjects examined last
year. Although the averages will therefore show greater variability, the general trends are quite similar. Compared with 1974, the dyslectic group has improved more than the control group. The dramatic difference in recognition scores for longer words in foveal presentation (Table 3) may be partly due to increased interferences, but it seems likely that the dyslectic children also have insufficient knowledge of the word forms of these longer words. Of course, when reading text, dyslectic children quite obviously experience greater difficulties with longer words.

[Fig. 3: percentage of errors (0 to 70%) against target-letter position (1 to 8) for words and strings; dyslectic and control groups.]
Fig. 3. Error histograms in the letter search test. Overall (hatched) columns: total errors; black columns: errors after correction for skipped lines.
All four dyslectic children have definitely higher recognition scores to the right of fixation than to the left, which is in line with results reported in the literature (Mishkin and Forgays, 1952; Bouma, 1973; McKeever and Huling, 1970). Since stimuli to the right of fixation are projected primarily to the left cerebral hemisphere, this indicates that the basis of dyslexia is not a general inability of the language-specialized left hemisphere. It is of some interest that the perception of digit strings causes difficulties for the dyslectic subjects quite similar to those in the perception of embedded letters. Such a task has two components: (a) visual perception of the strings, and (b) rehearsal of the strings until the moment of report. The positive influence of a prolonged presentation time points towards perceptual difficulties. Also, the higher scores in the auditory mode, which makes use of the same rehearsal process, seem to indicate that the capacity of the rehearsal memory is not responsible for the poor performance in the visual presentations.
As to the letter search test, the finding that both groups made more errors in the word test than in the string test indicates that they use knowledge of word forms to a certain extent. The specificity of this knowledge has not yet been investigated. In conclusion, this test draws attention to a probably insufficient eye control on the part of dyslectic children which, together with strong parafoveal interference effects, influences their scores. A new version of the test might separate these effects. As to research on dyslexia, the conclusion then seems to be that eye control, perception and recognition of letters and words, and storage processes should be studied not just in isolation but also in their mutual dependence. This conclusion links up with the notion that dyslexia stems from many different adverse factors (Malmquist, 1958; Valtin, 1970; Vernon, 1971; Klasen, 1972).
But it is not only the causative factors that are found to be manifold: the resulting difficulties are moreover not confined to the reading of text, but are clearly present in the recognition of numbers as well. Indeed, dyslexia is a syndrome. We hope that understanding of the underlying phenomena of dyslexia may proceed as fast as understanding of normal reading processes, which, in the literature as well as here at IPO, is a subject of renewed interest.
summary
Four dyslectic and four average readers recognized letters and words, corroborating
last year's results indicating greater interference effects in dyslectics and a
superiority of the right visual field.
Reading level and word scores had clearly improved since 1974. New explorations indicated lower scores in dyslectics for number recognition as well, and particularly for longer words in foveal vision. A new letter search test has been tried out.
references
Bouma, H., Legein, Ch.P. and van Rens, A.L.M. (1974) Visual Recognition by Dyslectic Children, IPO Annual Progress Report 9, p. 104-109.
Bouma, H. (1973) Visual Interference in the Parafoveal Recognition of Initial and Final Letters of Words, Vision Res. 13, p. 767-782.
Klasen, E. (1972) The Syndrome of Specific Dyslexia, University Park Press, Baltimore.
Malmquist, E. (1958) Factors Related to Reading Disabilities in the First Grade of the Elementary School, Almqvist and Wiksell, Stockholm.
McKeever, W.F. and Huling, M.D. (1970) Lateral Dominance in Tachistoscopic Word Recognitions of Children at Two Levels of Ability, Quart. J. Exp. Psychol. 22, p. 600-604.
Mishkin, M. and Forgays, D.G. (1952) Word Recognition as a Function of Retinal Locus, J. Exp. Psychol. 43, p. 43-48.
Valtin, R. (1970) Legasthenie - Theorien und Untersuchungen, Beltz Verlag, Basel.
Vernon, M.D. (1971) Reading and its Difficulties, Cambridge University Press.
4 instrumentation
I.P.O. INSTRUMENTATION 1957 - 1975
D.J.H. Admiraal
history
Since the foundation of the I.P.O. in 1957 by Prof. Dr. J.F. Schouten, the "General Instrumentation" group has played an important role in supporting the research program of the Institute. It was to this group that the first employee was appointed.
The ratio of the number of people working in the instrumentation group to
those in the research groups has to date fluctuated with slight variations around
the average of 1 : 3~.
The very presence of the Instrumentation Group at this Institute is due to the fact that the often highly specialized apparatus required in research is mostly not commercially available. Admittedly, this makes the design and construction of instruments expensive, obliging researchers to consider carefully the possibilities offered by commercially available or other existing apparatus when designing their experiments. The skills of the Instrumentation Group have benefited not only the research workers at I.P.O.: instruments have also been made at the request of, for instance, the Philips Biometric Centre, the Evoluon and Philips Phonographic Industries. For the last-named, for example, an annoyance-measuring device was designed for testing magnetic tapes for drop-outs (Admiraal, 1968).
Owing to the growing complexity and specificity of the instruments and the need to develop them concurrently with the research projects, especially in the Phonetics Group, it was decided in 1964 to design instruments of this kind under the sole responsibility of that group. This made the contribution of the instrument designers to the research projects more active. A speech synthesizer (Willems, 1966) is an example of equipment developed in this way.
An apparatus usually provides only a partial solution to a particular problem. Where a television system is used, for example, performance is determined not only by the electronics behind the camera but also by conditions in front of it, such as the illumination of the scene. The measurement of the diameter of the pupil of the eye by a television scanning technique (Admiraal and Alewijnse, 1966) can be taken as an example. This instrument worked well electronically, but the results remained poor because adequate illumination of the eye was not immediately feasible. After some months of experimentation the solution was found in the use of half a ping-pong ball, with which adequate illumination of the eye was obtained. This demonstrates how an active contribution on the part of the researchers to the work of the instrument designers helped to solve the problem.
Since the inception of I.P.O., there has been interest in the generation and measurement of time intervals. One example is a reaction-time-measuring instrument (Moonen, 1967) which measures the reaction times of a single subject in a choice task with 15 stimuli and 15 responses.
In 1967 a modular mechanical system (19-inch) was introduced at I.P.O., which allowed a flexible system for housing instruments to be created. The modular system consists of individual modules, which can be assembled into cabinets (with a maximum of 6 modules each). The modular building system was first used for a time generator (Valbracht, 1968). Since then hundreds of modules have been constructed, such as a two-quadrant multiplier (Noordermeer and Moons, 1970) and time-measuring devices (Lammers and Moonen, 1970). Large-scale integration enables us to combine a number of functions in one module. Limits to such miniaturization are set, however, by the ergonomic demands of essential controls located on the front panel of the module.
design and development
The development of an apparatus can be initiated by researchers on the basis of
a demand specification. After the development of a prototype which meets the
requirements, a definitive version is made in the workshop. In the final design
of a device ergonomic criteria are also taken into account.
Practice in this laboratory has shown, however, that it is not always possible to specify the demands completely in advance. This is partly due to the uncertainty of research workers as to the line their experiments are likely to follow, and partly because experiments are largely dominated by the restrictions set by the possibilities of the apparatus. This situation was exemplified by the request of the Visual Research Group for a light source of high intensity, capable of being modulated over a wide frequency range and of such geometrical proportions that a visual angle of at least 25° could be obtained at an eye distance of 25 cm. These primary conditions were met by the application of a T.V. projection tube combined with a spiral-scanned frame (Alewijnse, 1969). Only after this choice had been made could the complete specifications be set up.
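The geometric requirement can be made concrete: a field subtending a visual angle θ at viewing distance d must have a linear extent of 2d·tan(θ/2), assuming the field is centred on the line of sight. For θ = 25° and d = 25 cm this gives about 11.1 cm.

```python
import math

def required_size_cm(visual_angle_deg, distance_cm):
    """Linear extent subtending visual_angle_deg at distance_cm (field centred on the line of sight)."""
    return 2.0 * distance_cm * math.tan(math.radians(visual_angle_deg) / 2.0)

size = required_size_cm(25.0, 25.0)
```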
An important adjunct to the apparatus developed is the relevant documentation. A number of forms are available:
a) the I.P.O. report, which discusses the principles of the apparatus and compares it with other possible approaches;
b) the draft specification, which gives complete documentation of a particular apparatus, including circuit diagrams, adjustments to be made, etc.;
c) the manual, intended mainly for the user, describing the functioning of an apparatus, its technical specifications and its control functions.
As regards the modular system, a somewhat shorter version of this manual is also published by the Phonetics Group. The documentation on the Fourier synthesizer (Admiraal, 1969) may serve as an example of the amount provided for a particular device: the report discussing the principles covered 27 pages, whereas the draft specification took 53 pages.
present and future
At I.P.O. the computer age started in July 1970 with the installation of a Philips P 9202 (Muller, 1970). The installation of a computer at this laboratory was motivated by (a) the need to be able to control experiments, (b) the desirability of simulation studies and (c) the need to carry out computations. Quite apart from the work it had to do for us, the computer also set us work to do: much specialized hardware had to be developed, especially all kinds of interfaces. Connecting the computer to the various experiments located throughout the laboratory that required on-line use necessitated the development of matching units (Moonen and de Jong, 1973). The recent introduction of a new computer system (P857), which has a different organization of input and output, now makes it necessary to repeat this development work.
Some examples of apparatus currently under development are described elsewhere in this issue. The development of apparatus for measuring and recording eye movements, as reported in the previous issue, together with the processing of these data by the computer, has resulted in increasing involvement with the Visual Research Group. It is thus to be expected that in future the Instrumentation Group will contribute to the research program of the Institute more directly than hitherto.
references
Admiraal, D.J.H. and Alewijnse, M.A. (1966) An Infrared Pupillometer Based on the Television Scanning Technique, IPO Annual Progress Report 1, p. 126-134.
Admiraal, D.J.H. (1968) An Annoyance Measuring Instrument to Check Magnetic Tapes on Drop-outs, IPO Annual Progress Report 3, p. 108-112.
Admiraal, D.J.H. (1969) The "Pan Pipes", a Fourier Synthesizer, IPO Annual Progress Report 4, p. 131-139.
Alewijnse, M.A. (1969) The Time-Place Generator, a Modulatable Light-Source, IPO Annual Progress Report 4, p. 152-159.
Lammers, C.A. and Moonen, G.J.J. (1970) The Modular Time Measuring Equipment (MTM), IPO Annual Progress Report 5, p. 205-208.
Moonen, G.J.J. (1967) EVA. A Singular Unpaced Reaction Measuring Device, IPO Annual Progress Report 2, p. 181-183.
Moonen, G.J.J. and de Jong, Th.A. (1973) MARIE Interface between Computer and Experiment, IPO Annual Progress Report 8, p. 54-56.
Muller, H.F. (1970) Computer Installation, IPO Annual Progress Report 5, p. 230-231.
Noordermeer, W.H. and Moons, C. (1970) The Vario-S-Gate. A Two-Quadrant Analogue Multiplier, IPO Annual Progress Report 5, p. 223-226.
Valbracht, J.C. (1968) MTG A Modular Time Source, IPO Annual Progress Report 3, p. 113-114.
Willems, L.F. (1966) IPOVOX II. A Speech Synthesizer, IPO Annual Progress Report 1, p. 120-123.
A SPEECH SPECTRUM ROTATOR
A.C. van Nes
introduction
The contribution of temporal speech structures to the perceptual process has been investigated in a number of research projects carried out in our Institute. In order to isolate these structures from the syntactic and semantic content of the speech signal, the signal is made unintelligible. An apparatus for obtaining such a signal is described in this report. The speech spectrum is rotated, which makes the speech unintelligible without altering its temporal structure. It is also possible to shift the resulting spectrum by a certain amount.
principle
The rotated spectrum is obtained by a well-known modulation technique, i.e. single-sideband modulation with suppressed carrier. Fig. 1a shows the original speech spectrum, extending from f1 to f2, whereas Fig. 1b shows the spectrum after modulation with a carrier frequency fc. This latter spectrum contains the desired rotated spectrum (mirrored about fc) as the lower sideband, extending from fc - f2 to fc - f1.
Fig. 1. The speech spectrum (a) and the spectrum obtained after modulation (b).
General methods are available for isolating this lower sideband of the compound spectrum. A simple one, shown in Fig. 2, uses a filter to suppress the carrier and the upper sideband. This method has the serious disadvantage that a rather sharp filter is needed, which entails a considerable loss of flexibility: the carrier frequency cannot be changed without also changing the filter.
Fig. 2. Filter realization of a single-sideband system.
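The filter method can be illustrated numerically. The sketch below is an idealised digital simulation, not the analogue circuit: the sample rate, the two test tones and the carrier of 4000 Hz are arbitrary choices, and the filter is an infinitely sharp FFT brick wall that no real filter achieves. It shows each input component at f reappearing at fc - f.

```python
import numpy as np

FS = 48_000  # sample rate in Hz (assumed); one second of signal gives 1 Hz bins

def bin_freqs(n: int, fs: int = FS) -> np.ndarray:
    """Frequencies (Hz) of the rfft bins of an n-sample signal."""
    return np.arange(n // 2 + 1) * fs / n

def rotate_with_filter(x: np.ndarray, fc: float, fs: int = FS) -> np.ndarray:
    """Idealised filter method (Fig. 2): double-sideband modulation with
    suppressed carrier, then a brick-wall low-pass at fc that keeps only
    the lower sideband, i.e. the spectrum mirrored about fc."""
    t = np.arange(len(x)) / fs
    dsb = x * np.cos(2 * np.pi * fc * t)        # components at fc - f and fc + f
    spectrum = np.fft.rfft(dsb)
    spectrum[bin_freqs(len(x), fs) >= fc] = 0   # infinitely sharp filter edge
    return np.fft.irfft(spectrum, n=len(x))

def strongest_components(y: np.ndarray, k: int, fs: int = FS) -> list:
    """Frequencies of the k largest spectral peaks, in ascending order."""
    mag = np.abs(np.fft.rfft(y))
    return sorted(bin_freqs(len(y), fs)[np.argsort(mag)[-k:]].tolist())

# Two test tones standing in for speech components (hypothetical values).
t = np.arange(FS) / FS
x = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)
y = rotate_with_filter(x, fc=4000)
print(strongest_components(y, 2))  # [2000.0, 3700.0]: f -> fc - f
```

Moving fc here only requires moving the filter edge in software; in the analogue realization the sharp filter would have to be redesigned, which is the loss of flexibility referred to above.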
In the present design another method is chosen, in which the upper sideband is suppressed by giving the upper sidebands in the outputs of a double balanced modulator a 180° phase difference (Fig. 3). Because no filter is used, the resulting spectrum can now also easily be shifted by changing the carrier frequency.
[Block diagram: the speech signal passes through a wide band 90° network giving cos ωt and sin ωt; the carrier passes through a second wide band 90° network giving cos αt and sin αt. One balanced modulator forms cos ωt · cos αt = ½cos(α-ω)t + ½cos(α+ω)t; the other forms sin ωt · sin αt = ½cos(α-ω)t - ½cos(α+ω)t.]
Fig. 3. Double balanced modulator design for a single-sideband system.
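The phasing principle of Fig. 3 can be checked numerically. The sketch below is an idealised simulation under stated assumptions, not the circuit itself: the wide band 90° network is replaced by an exact FFT-based Hilbert transform, the carrier pair is generated digitally, and the sample rate and test frequencies are arbitrary. Adding the two modulator outputs leaves only fc - f; subtracting them leaves only fc + f.

```python
import numpy as np

FS = 48_000  # sample rate in Hz (assumed)

def quadrature_pair(x: np.ndarray) -> tuple:
    """Idealised wide band 90° network: returns x and its Hilbert transform
    (every component delayed by 90°), computed from the analytic signal."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0      # double the positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1.0          # keep the Nyquist bin once
    analytic = np.fft.ifft(np.fft.fft(x) * h)   # x + j * H{x}
    return analytic.real, analytic.imag

def phasing_ssb(x: np.ndarray, fc: float, fs: int = FS) -> tuple:
    """Two balanced modulators driven in quadrature (Fig. 3).
    Returns (sum, difference) of the modulator outputs: the sum keeps
    the lower sideband fc - f, the difference keeps the upper fc + f."""
    t = np.arange(len(x)) / fs
    in_phase, quadrature = quadrature_pair(x)
    m1 = in_phase * np.cos(2 * np.pi * fc * t)    # ½cos(α-ω)t + ½cos(α+ω)t
    m2 = quadrature * np.sin(2 * np.pi * fc * t)  # ½cos(α-ω)t - ½cos(α+ω)t
    return m1 + m2, m1 - m2

# A 500 Hz test tone and a 3000 Hz carrier (hypothetical values).
t = np.arange(FS) / FS
lower, upper = phasing_ssb(np.cos(2 * np.pi * 500 * t), fc=3000)
for name, y in (("sum", lower), ("difference", upper)):
    spec = np.abs(np.fft.rfft(y))
    print(name, spec.argmax(), "Hz")  # 1 Hz bins: sum -> 2500, difference -> 3500
```

Changing `fc` shifts the rotated spectrum with no filter to retune, which is the flexibility the phasing design gains.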
realization
The complete diagram of the spectrum rotator is presented in Fig. 4. The main components are two identical 90° wideband phase shifters and two identical balanced modulators. Each phase shifter has two outputs with a phase difference of 90° ± 0.75° in the frequency range 100 Hz - 10 kHz. The components for this part of the circuit have to be selected very carefully in order to keep the phase difference as close to 90° as possible over this frequency range.
The suppression of the unwanted sideband deteriorates rapidly when the phase difference deviates from 90°; we measured a 15 dB decrease in suppression for a 1° phase error.
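Why the suppression is so sensitive to phase can be seen from an idealised model. Suppose the audio quadrature branch carries sin(ωt + φ) instead of sin ωt. Then the sum output contains ½[cos(α-ω)t + cos((α-ω)t - φ)] + ½[cos(α+ω)t - cos((α+ω)t + φ)]. The wanted component has amplitude cos(φ/2) and the unwanted one sin(φ/2), so their ratio is tan(φ/2). The sketch assumes perfect amplitude balance and an exact 90° carrier pair. It is an idealised bound, not a model of the measured circuit: it gives about 43.7 dB for the 0.75° tolerance and about 41.2 dB for a 1° error, and each doubling of the error costs roughly 6 dB.

```python
import math

def sideband_suppression_db(phase_error_deg: float) -> float:
    """Ideal unwanted-sideband suppression of a phasing SSB modulator whose
    90° network is off by phase_error_deg. Assumes perfect amplitude balance
    and an exact 90° carrier pair, so unwanted/wanted = tan(error / 2)."""
    phi = math.radians(phase_error_deg)
    return -20 * math.log10(math.tan(phi / 2))

for err in (0.25, 0.75, 1.0, 2.0):
    print(f"{err:4.2f} deg phase error -> {sideband_suppression_db(err):4.1f} dB")
```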
The balanced modulators (TCA 240 or µA 796) suppress the carrier. This suppression can be trimmed with potentiometers P1 and P2 and checked at test points TPA and TPB. Operational amplifier C subtracts the signals of the two modulators, so that the upper sideband is present at its output. Operational amplifier D adds the signals of the two modulators, so that the lower sideband is present at its output. The suppression of the undesired sideband and of the carrier is 60 dB.
ELECTRONIC EAR TRUMPET
G.H. van Leeuwen
Before the advent of the electronic hearing aid, the hard-of-hearing would sometimes use ear trumpets in an attempt to improve their perception of speech. Even today it has been remarked that ear trumpets have certain advantages (Groen, 1968). Attention has been drawn on many occasions to their directional sensitivity, which improves the signal-to-disturbance ratio. Mention is also made of the acoustical gain that such horns provide.
Before going into the topic of this paper, it is worth mentioning some measurements we made of the gain of horns (Van Leeuwen, 1975). The gain is found to be roughly equal to the ratio of the areas of the mouth and the throat. With an external meatus area of 1 cm², a 20 dB gain can be achieved with a mouth area of 100 cm², measured perpendicular to the direction of the sound source. From Olson (1957) it is clear that good directivity also requires a rather large horn.
However, the frequency characteristics of horns are far from ideal, showing a multitude of resonances and anti-resonances. These characteristics are responsible for the "metallic" sound quality. We set out to design an electronic device combining the advantages of the ear trumpet with a flat frequency characteristic and smaller dimensions. The size should not be too small, however: if a good directional pattern is to be achieved with a high-quality microphone element, dimensions should be of the order of 5 cm. Ease of handling is another reason to avoid extremely small dimensions. Finally, we think it worthwhile to aim at a device which, like the trumpet, can be applied or put away without fumbling. This, in turn, sets a limit to the permissible amplification at any frequency and constitutes an additional argument in favour of the directivity of the microphone.
The first prototype, shown in Fig. 1, was made from the left-hand shell of a Koss Pro-4AA headphone, with the dynamic loudspeaker and the liquid-filled circumaural cushion included. A tube, 6 cm long and 1.6 cm wide, mounted on the shell, contains a cardioid electret microphone cartridge, Philips LBC 1060/01. This tube preserves the front-to-rear ratio of 20 dB or more of the microphone element within the bandwidth employed. The electrical amplification is achieved by means of a TAA 370 integrated amplifier with slightly modified additional circuitry (Peeters, 1969). The low supply voltage, together with the low efficiency of the loudspeaker, limits the maximum output sound pressure level to about 90 dB. Complete specifications are listed in Table I. These show that the idea is feasible, although improvements should be possible.
Subjective impressions confirm the usefulness of the directivity of the device. Positive reactions were also obtained from a hard-of-hearing subject. Since the amplification reaches at most 26 dB, the ear trumpet will be suited especially to the mildly hard-of-hearing. Future development will concentrate on ergonomic and sound quality factors.
Table I
Microphone input sensitivity at 1 kHz       300 µV/µbar
Microphone front-to-rear ratio at 1 kHz     30 dB
Loudspeaker efficiency for 1 mW at 1 kHz    94.5 dB SPL
Maximum* input at 1 kHz                     65 dB SPL
Maximum* output at 1 kHz                    90 dB SPL
Supply voltage                              1.5 V
Supply current                              less than 3 mA
* at 3 % distortion
Fig. 1. Electronic ear trumpet.
references
Groen, J.J. (1968) Slechthorendheid en Hoortoestellen [Hearing impairment and hearing aids], Stafleu's Wetensch. Uitg. Mij., Leiden, p. 64-65.
Leeuwen, G.H. van (1975) Enige metingen aan hoorns [Some measurements on horns], IPO report 274.
Olson, H.F. (1957) Acoustical Engineering, D. van Nostrand Comp. Inc., Princeton, New Jersey, p. 47/108.
Peeters, A.M. (1969) Monolithic Integrated Hearing-Aid Circuit TAA 370, Philips Appl. Information 137.
A MINIATURE EMG DEVICE
J. Vredenbregt and J.H.M. van der Straaten*
Although the small EMG device described by Vredenbregt and Basten (1971) proved to be very useful in studying the human motor system, we found its relatively high weight (45 grammes) still to be a disadvantage. During measurements of movements, artefacts appeared in the EMG signals owing to very fast changes in acceleration, such as those encountered in gait studies. These fast changes, together with the relatively high inertia of the device, resulted in small movements of the device, and hence of the electrodes, with respect to the skin. This introduced irrelevant low-frequency components into the EMG signal.
This disadvantage, together with the fact that high input impedances, which contribute to better reproducibility of the recorded electrical activity, are now attainable, prompted further miniaturisation.
Based on the existing electronic concept, an operational amplifier was made using the thick-film technique with suitable miniature amplifiers (LM 308, N.S.C.) and components.
The resulting amplifier fulfils nearly all the electrical requirements and is smaller than a normal dual in-line package.
This part of the set-up was made by the Electronic Department in cooperation with
the Mathematics Department of the University of Nijmegen, The Netherlands.
The small dimensions made it possible to fit and screen the amplifier between the surface electrodes. The amplifier and electrodes were housed in a single rubber suction cup. The device is shown in its final design in Fig. 1. Its dimensions are 45 x 15 x 8 mm and its weight is 8 grammes. The electrodes are 28 mm apart.
Fig. 1. The miniature EMG device.
* J.H.M. van der Straaten, M.D., Department of Anatomy, University of Nijmegen.
The device can be placed on the skin very easily and quickly by suction or with an elastic band. Intensive preparation of the skin is not required, as any change in skin-electrode impedance is negligibly small compared to the input impedance of the device. Needle electrodes can be used instead of surface electrodes if desired.
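The reasoning behind the last point can be made explicit by treating the skin-electrode impedance and the amplifier input impedance as a voltage divider. The sketch below assumes purely resistive impedances, and the skin-electrode values are hypothetical; only the 400 Mohm input impedance comes from Table I. Even a change from 10 kohm to 1 Mohm alters the recorded amplitude by only about 0.25 %.

```python
def recorded_fraction(source_ohm: float, input_ohm: float) -> float:
    """Fraction of the EMG source voltage that reaches the amplifier when the
    skin-electrode impedance and the input impedance form a voltage divider
    (purely resistive model, an assumption)."""
    return input_ohm / (source_ohm + input_ohm)

Z_IN = 400e6  # Table I: input impedance greater than 400 Mohm

# Hypothetical skin-electrode impedances, from prepared to unprepared skin.
fractions = [recorded_fraction(z, Z_IN) for z in (10e3, 100e3, 1e6)]
spread = max(fractions) - min(fractions)
print(f"amplitude varies by {100 * spread:.2f} %")  # about 0.25 %
```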
The characteristics are shown in Table I.
voltage gain (fixed value)                 100
frequency response                         flat within 1% between 18 and 800 Hz
input impedance                            greater than 400 Mohm
output impedance (minimum)                 15 ohm
common mode rejection                      better than 70 dB at 50 Hz
noise level at input side
 - input short-circuited                   5 µV rms
 - input closed with 0.5 Mohm to common    8 µV rms
supply voltage range                       between ±4 V and ±15 V dc
maximum output voltage                     29 V
power dissipation                          40 mW (unloaded)
Table I.
The device can be short-circuited without damage. Likewise a high voltage (220 V)
at the electrodes will cause no damage to the circuit.
In Figures 2a and b the frequency response and the phase shift are presented. It may be noted that the phase shift is less than 20 degrees over the whole flat frequency range, which is a further improvement compared to the previous device. Finally, the device proved to be mechanically very reliable.
Fig. 2. The frequency response (a) and the phase shift (b).
reference
Vredenbregt, J. and Basten, C.G. (1971) A Modified Small Electromyograph, IPO Annual Progress Report 6, p. 130-131.
5 i.p.o. publications
I.P.O. PUBLICATIONS 1975
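P 277 G. Rau and J. Vredenbregt
Het afleiden van het elektromyogram en de toepassing voor kwantificering van de spieractiviteit. [Recording the electromyogram and its application to the quantification of muscle activity.]
Sport, lichamelijke vorming en wetenschap: een overzicht van het wetenschappelijk onderzoek in Nederland op het gebied van de sport en lichamelijke vorming. [Sport, physical education and science: a survey of scientific research in the Netherlands in the field of sport and physical education.] Eds. J.E. Hueting and R.A. Binkhorst, Leiden: Meander 1972, p. 83-94.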
P 278 B.L. Cardozo
Some notes on frequency discrimination and masking.
Acustica, 1974, ll, p. 330-336.
In order to study frequency discrimination near and at the threshold of masking, a paradigm was used which permitted frequency discrimination and detection of noise-embedded sinusoids to be measured in one experiment. The just noticeable difference in frequency at the threshold of detection was found to be Δf ≈ 16 Hz for sinusoid durations of Δt = 64 ms and Δt = 256 ms. With Δt = 16 ms, Δf ≈ 64 Hz. The frequency of the sinusoids was 1000 Hz in all experiments. The possible implications for a place theory and a periodicity theory of pitch perception are mentioned.
P 279 G. Rau and J. Vredenbregt
Mechanical and electromyographic phenomena during normal finger tremor oscillations.
Paper presented to the 6th Congreso Internacional de Medicina Fisica, Barcelona, 1972. Volume II of the Proceedings, p. 316-336.
P 280 L.P.A.S. van Noorden
Temporal coherence in the perception of tone sequences.
Doctoral Thesis, Eindhoven University of Technology, February 1975.
Whether a tone sequence is perceived as temporally coherent (like a melody) or rather as split up into parts (a phenomenon which we call fission here) depends to a certain extent on the physical properties of the sequence. This is the main theme of the present exploratory study, introduced in chapter 1 along with the literature in this field (which can be characterized as a branch of auditory pattern recognition or as auditory Gestalt psychology).
Chapter 2 is devoted mainly to the effect of the pitch interval I between successive tones and the tone repetition time T (the reciprocal of the tempo) in sequences with a simple structure, like ABAB... or ABA.ABA... Different psychoacoustic methods are used to determine the domain of existence of temporal coherence and its boundaries in the I-T plane. With small pitch intervals I, there is always temporal coherence. When I is increased at a given value of T, the fission boundary is eventually reached. Beyond this boundary there is a large region in which the way the listener directs his attention determines whether he will hear fission or temporal coherence. When I is increased further, the temporal coherence boundary is eventually crossed and the observer perceives fission irrespective of his attentional set. The temporal coherence boundary for fast tone sequences is situated at smaller tone intervals than for slow sequences.
Chapter 3 describes qualitative experiments showing contiguity of the excitation sites on the basilar membrane to be a necessary condition for temporal coherence. However, this condition is not a sufficient one, as evidenced by the loss of temporal coherence in monotone sequences with sufficiently large loudness differences between successive tones. Qualitative experiments also make it clear that temporal coherence may exist between pure tones and one of the components of a complex tone. This effect might be used as the basis for investigating auditory frequency resolution; however, this falls outside the scope of the present investigation. Appendix A shows the feasibility of determining the subjective partial loudness of a component of a complex tone.
The effect of alternating loudness of consecutive tones is further explored in chapter 4, dealing with such problems as the relation between temporal coherence and the continuity effect. In this context we hit upon a phenomenon that, to our knowledge, has not been mentioned in the literature before: when listening to the weak tones in fast sequences of tones with alternating loudness, the observer hears these weak tones at twice their actual tempo. This phenomenon is called the "roll effect".
Chapter 5 deals with less simple tone sequences in which the pitch intervals were chosen at random from a certain set of intervals. The results suggest that anticipation has little or no influence on the ability to hear temporal coherence in fast sequences. The hypothesis is advanced that pitch tracking, which seems to be controlled by the stimulus rather than by the observer, is performed by the auditory system by means of pitch motion detectors. There is no way to test this hypothesis experimentally at present, but results from the analogous field of visual movement, which are presented and discussed, do make it plausible.
Before returning to the starting points, music and speech, a number of experiments are described in chapter 6 showing that the temporal structure of a tone sequence becomes more blurred as the temporal coherence becomes less marked. A possible relation with forward masking is sketched in Appendix B. In chapter 7 we have tried to fit the present findings into the framework of musical theory, with particular reference to counterpoint rules. Phenomena related to fission that occur in technical speech processing are also briefly discussed. A retrospect (chapter 8), a glossary and a gramophone record with explanatory comments complete the present study.
P 281 B. Shackel and F.L. van Nes
Ergonomics in Brazil.
Applied Ergonomics, 1975, ~, p. 43-44.
P 282 J.A.J. Roufs
The standard observer: a controversial subject.
Ophthalmologica, 1975, l2l, p. 43-44.
P 283 F.L. Engel
Visibility, conspicuousness and attention.
Ophthalmologica, 1975, 12l, p. 41-42.
P 284 H. Duifhuis
A theory on cochlear nonlinearity and second filter.
Poster presented to the Fifth International Biophysics Congress of the International Union for Pure and Applied Biophysics, Copenhagen, 1975. Poster Abstract 57.
P 285 I.H. Slis
Spraaksynthese door regels. [Speech synthesis by rule.]
Nederlands Akoestisch Genootschap, 1975, publ.nr. 32, p. 1-7.
A computer programme is described which contains a system used for the synthesis of reasonably intelligible Dutch. The input of the programme is given in terms of a string of symbols for phonemes and conditions under which the phonemes have to be synthesized (e.g. stress). The output is a set of parameter values for a hardware speech synthesizer (Rockland) which needs information for every period of the fundamental frequency.
P 286 J. 't Hart and R. Collier
Integrating different levels of intonation analysis.
J.Phonetics, 1975, ~, p. 235-255.
This paper deals with a partly experimental approach to the complex relationship that exists between the abstract, global structures of intonation and the concrete, atomistic features of the course of fundamental frequency. More specifically, we have introduced three levels of description and have attempted to establish links between them: a concrete and atomistic level of the perceptually relevant pitch movements; a concrete and global level of the audible pitch contours and the measurable fundamental frequency curves; and finally, an abstract and global level of the intonation patterns. In the corresponding three main parts of the paper it will be shown (1) how pitch movements can perceptually segment the fundamental frequency continuum; (2) how a "grammar" can be designed that is capable of generating all and only the acceptable combinations of pitch movements, i.e. pitch contours; (3) how listeners categorize different pitch contours into meaningful classes, i.e. intonation patterns. The investigation was based on Dutch utterances.
papers accepted for publication
MS 240 S.G. Nooteboom
On the Internal Auditory Representation of Syllable Nucleus Durations.
To appear in: Auditory Analysis and Perception of Speech.
This paper will report on some perceptual experiments in which subjects are asked to adjust the durations of syllable nuclei in synthesized words according to some internal criterion. The results indicate that the internal, auditory representation of syllable nucleus durations may be more accurate than spectrographic measurements. The internal representation of how words should sound appears to be governed by rather strict timing rules, in which phonological vowel quantity, stress and position in foot and word are major factors. The role of the resulting timing patterns in the auditory processing of speech will be discussed.
MS 241 I.H. Slis
Consequences of Articulatory Effort on Articulatory Timing.
To appear in: Auditory Analysis and Perception of Speech.
Four different effort oppositions have been studied on labial plosives, viz.:
(1) the voiceless-voiced (tense-lax) opposition in /p/ vs. /b/,
(2) initial /b/ before long (tense) and short (lax) vowels,
(3) lip closing of /p/ after short ("scharf geschnittene") and long ("weich geschnittene") vowels, and
(4) stress vs. non-stress in intervocalic /p/.
Lip closing activity was measured on the orbicularis oris muscle, and closure duration was measured by means of lip contacts. More effort in the oppositions between voiceless and voiced plosives, lip closing after short and long vowels, and stress vs. non-stress results in higher closing activity and longer closure duration of the lips. In the second opposition, /b/ before long and short vowels, no difference in EMG activity was found with more effort. These results were interpreted as an advancement of the commands with more effort compared to those with less effort.
MS 258 J.J. Andriessen and H. Bouma
Eccentric vision: Adverse interactions between line segments.
To appear in: Vision Research.
The paper deals with adverse interactions between line stimuli in eccentric vision. Both the contrast threshold and the just noticeable difference of slant have been measured for a test line as a function of the distance from a number of surrounding lines. Test lines were either parallel or perpendicular to the surrounding lines. It turns out that the interference affects both contrast threshold and j.n.d. of slant with a clear-cut orientational specificity. The surprising result is the extensive spatial range of the interference: between parallel lines it operates over retinal distances of about 0.4φ degrees, where φ is the eccentricity of the test line. Large-distance interference limits eccentric spatial vision in daily life much more than classic visual acuity limits would indicate, and probably makes eccentric vision quite different from "unfocussed" foveal vision.
MS 261 F.L. van Nes
Analysis of Keying Errors.
To appear in: Ergonomics.
The performance of keyboard operators can be expressed in terms of keying time and errors; this paper deals with errors. If the causes of errors were known, it might be possible to reduce the percentage of wrong keystrokes. Therefore, an attempt was made to identify these causes by classifying 293 errors, collected in a field study, into seven categories. About 25% of the errors were due to the operator misinterpreting input data; better data presentation may decrease this percentage. At least 40% of the keying errors could be traced to underlying errors in finger movement control, and would not seem amenable to direct error-reducing measures. Automatic punching of repetitive information brings about numerous repetitive errors as well; improved instructions on the use of programmed punching facilities may reduce these errors.
MS 263 H. Bouma
Auditieve Funkties. [Auditory functions.]
To appear in: Nederlands Handboek voor de Psychonomie [Dutch handbook of psychonomics], chapter 8.
MS 267 H. Duifhuis
Cochlear nonlinearity and second filter; possible mechanism and implications.
To appear in: J.Acoust.Soc.Amer.
We indicate that the directional sensitivity of the hair cell, together with a directional distribution of frequency over the hair cells, comprises a possible physiological basis for the second filter. Tuning disparity of first and second filter denotes the difference in tuning frequency; at a given position x the tuning frequency of the first filter is αCF, that of the second CF, with α > 1. This accounts for the asymmetry in location of two-tone suppression areas. The compressive nonlinearity is described by a νth-law device with ν < 1. We analyse implications of this model for two-tone suppression, sharpening, pure-tone masking, and combination tone generation. Basic features of these phenomena are described adequately. For combination tones the propagation problem needs further study. On the basis of a comparison of literature data and theoretical predictions we estimate α = 1.2 and ν = 0.6. Regarding the accurate shape of first and second filter, the discussed data provide means for a qualitative evaluation only. Possibilities for a quantitative analysis are indicated.
MS 270 H. Bouma and Ch.P. Legein
Foveal and parafoveal recognition of letters and words by dyslexics and by average readers.
To appear in: Neuropsychologia.
In adult readers, parafoveal recognition of words is limited by strong interference between letters. In the present study, subjects were twenty dyslexic children and twenty average readers (9-14 yrs.). Recognition scores of isolated letters, of embedded letters and of words were compared in both foveal and parafoveal vision. The groups did equally well on isolated letters, whereas the dyslexics generally lagged behind on embedded letters and on words. Individual scores of embedded letters and of words were moderately correlated, as were word score and reading level. It is advocated that research on dyslexia be directed at possible deficits in reading processes such as eye control, word recognition and storage, not only as separate factors but rather in their intimate relationships.
MS 272 S.G. Nooteboom and A. Cohen
Anticipation in speech production and its implications for perception.
To appear in: Structure and Process in Speech Perception; Proceedings of the Symposium on Dynamic Aspects of Speech Perception.
In speech production we find a number of examples indicating that a speaker anticipates what is yet to be produced. As such anticipatory behaviour is often reflected in the acoustic structure of speech, it seems reasonable to expect that studies of anticipatory behaviour in speech production may provide clues to the dynamic organisation of speech perception. In this paper we will discuss some examples of empirical evidence for anticipation in speech production, and their possible implications for speech perception. The examples are taken from three rather divergent areas, viz. slips of the tongue, intonational patterning, and temporal organization of speech. It will be made plausible that the size of chunks of speech material over which anticipation may take place depends on the linguistic level of organization, and that, correspondingly, the size of decision units or chunks in speech perception varies with the level of processing. Some testable hypotheses will be derived as to the perceptual implications of acoustic cues stemming from anticipatory phenomena in speech production. With respect to anticipatory effects in the temporal organization of speech, the hypotheses are put to the test. Experimental evidence shows that listeners have an implicit knowledge of such effects, and may actually use this knowledge in resolving perceptual ambiguities.
MS 273 J. 't Hart and R. Collier
The role of intonation in speech perception
To appear in: Structure and Process in Speech Perception; Proceedings of the Symposium on Dynamic Aspects of Speech Perception.
Recent research on prosodic phenomena has shown a growing interest in the importance of intonational cues as mediators in speech processing. Our earlier study of systematically occurring pitch events in speech has led to the development of an adequate descriptive system of (Dutch) intonation. In developing a "grammar of intonation" as a device that generates contours for entire utterances as composed of perceptually relevant pitch movements, the need was felt to introduce the "intonational block" as an intermediate unit between pitch movement and contour. The internal structure of the blocks is subject to rather stringent limitations, but their external coupling appears to be almost free. These properties of the blocks suggest that they are more than arbitrary clusters of pitch movements. In this paper we will concentrate on the possible role of the blocks in the perception of speech. We will try to show that the block boundaries can stand in some relation to the boundaries of syntactic and/or semantic units. Consequently, the block boundaries may constitute cues for the listener about how to apply a first, rough segmentation of the speech continuum into units that are the most suitable candidates for being processed as a whole.
MS 274 I.H. Slis
Rules for the synthesis of speech.
Paper presented to the 8th International Congress of Phonetic Sciences.
MS 275 J. 't Hart
Discriminability of magnitude of pitch movements in speech-like signals.
Paper presented to the 8th International Congress of Phonetic Sciences.
MS 278 S.G. Nooteboom
Context effects in the perception of phonemic vowel length.
Paper presented to the 8th International Congress of Phonetic Sciences.
MS 279 A. van Katwijk
The Role of Respiratory Effort in Accentuation.
Paper presented to the 8th International Congress of Phonetic Sciences.
reprints and preprints of i.p.o. publications
Requests for reprints or preprints of publications listed on pages 91 - 96 can be made using the form below.
Back numbers and reprints or preprints may be obtained from:
Library
Institute for Perception Research
P.O. Box 513
EINDHOVEN 4502
The Netherlands
type-print or capitals please:
NAME:
FUNCTION and/or
INSTITUTE:
ADDRESS:
CITY:
(STATE/AREA CODE)
COUNTRY:
I would appreciate receiving a reprint or preprint of the following publications:
title and author(s):
Date: Signature: