THE RELATION BETWEEN SUBJECTIVE AND OBJECTIVE ASSESSMENT OF SPEAKING RATE IN CZECH RADIO NEWSREADERS

This article examined objective and subjective speaking rate and their relation. Read speech of 22 Czech radio newsreaders (13 male and 9 female) constituted the material for the study. The objective rate was measured as speech rate and articulation rate, both expressed in syllables per second. Two domains, a single news report and an intonation phrase, were chosen as the units of observation. A perception test was used to establish the subjective assessment of speaking rate. The test was made up of the full news reports and the subjects were asked to rate their tempo on a scale slow – normal – fast. The median of speech rate and articulation rate for the individual news reports was 5.7 syll/s and 6.2 syll/s, respectively. In general, the listeners rated the stimuli tempo as normal and there was no significant difference between the subjective evaluation of female and male speakers. The possible factors influencing the relation between the subjective and objective rating are discussed; no simple and direct relationship between them was found.


Speaking rate in Czech media speech
Radio and TV news broadcasts are among the read texts that are examined very often, because newsreaders and other media speakers are regarded as promoters of standard speech. Regarding speaking rate, there is objective evidence that speech on Czech radio and TV has accelerated in recent decades. The mean speech rate of radio newsreaders until 1994 was 5.1 syll/s (males) and 4.8 syll/s (females), in 1996 it was 5.3 syll/s (males and females) (all measured by Bartošek, 2000), and in 2002 it was 6.2 syll/s (according to Palková, published in Palková et al., 2003). The mean speech rate of Czech TV weather forecasts is 5.6 syll/s (Balkó, 1999). Researchers have pointed out the fast speaking rate in the mass media and the difficulties in speech intelligibility that it might cause (Bartošek, 1995; Palková, 2004; Havlík et al., 2013). We decided to inquire into this phenomenon and to inspect the relation between the objective and subjective speaking rate in Czech news broadcasts, because previous experiments focused mainly on objective measurements without systematic inspection of the subjective perspective of listeners.
In the experiment, both the speech rate and the articulation rate of Czech radio newsreaders were measured, and a perception test was created to obtain the listeners' assessments. Then the measured (objective) speaking rate was compared with the listeners' ratings, i.e. the perceived (subjective) speaking rate, to establish the degree of correlation.

Variability in speaking rate
Speaking rate varies both between and within speakers; many factors affect it, both extralinguistic and intralinguistic. However, their influence on speaking rate may not be direct; the factors may complement one another, but they may also be in contradiction (cf., e.g., Kohler, 1986). Verhoeven et al. (2004), examining the Dutch corpus of semi-spontaneous speech, found that age, sex and dialect region affect speaking rate. According to their experiment, younger people spoke faster than older people, men spoke faster than women, and speakers from the West region, considered to be the linguistic centre of the Netherlands (sic!), showed a higher speech rate than speakers from dialect regions further from the centre. Quené, analysing the same corpus, claimed that these factors (age, gender and centre/periphery) may play a role; however, he considered the length of the intonation phrase to be the most important factor. He quoted findings of Nooteboom and Lindblom-Rapp from the 1970s and explained that longer phrases containing more syllables showed a tendency towards a faster speaking rate and a shorter syllable duration. According to Quené's (2005) experiment, younger people tended to use longer intonation phrases and this was the cause of their faster speaking rate. However, Quené admitted that it was not possible to provide such a direct explanation for the other factors. There were other aspects, arising from the linguistic structure of texts, that caused changes of speaking rate, such as syllable structure (Pfitzinger, 2006) or phrase-final deceleration (for Czech, Dankovičová, 2001; Volín & Skarnitzl, 2007).

The relation between objective and subjective speaking rate
The variability of speech rate is also reflected in perception. Just as in speech production, many factors affect our perception of speech rate. Kohler (1986), investigating German words and sentences, found overall duration, overall F0 level and F0 movements to be important tempo cues. He not only examined the strength of the single factors, but also pointed out that there was a relation between them and also between the factors and the production patterns. J. Koreman (2006) confirmed that articulation rate played an important role, yet not an exclusive one, in the perception of speech rate; other factors, e.g. pausing, disfluencies and other prosodic properties of speech, also determined the perceived speech rate. In his experiment the slow and fast speakers differed in their objective speech rates; however, the grouping of the speakers according to the perceived speech rate did not match the categorisation based on the measured speech rate.
H. Pfitzinger (1998), who focused on local speech rate, claimed that a linear combination of the syllable rate and the phone rate (see 1.4 below) corresponded best to the perception of speaking rate.

Measurement of speaking rate – methodological notes
Speaking rate tells us, in general, the number of speech units produced by a speaker per unit of time. It is often expressed in syllables per second (syllable rate) or in phones per second (phone rate). It can also be expressed as the average syllable/phone duration. The units can also be defined as vocalic and consonantal intervals (VC intervals) (cf. the BonnTempo-Corpus; Dellwo et al., 2004).
The domain chosen to establish the speech rate crucially influenced the obtained results; it especially affected the degree of speech rate variability (cf. Miller et al., 1984). In the present study two domains were applied: a single item of a news bulletin and a unit we call an intonation phrase. The choice of the intonation phrase was based on the research of Dankovičová (1997, 2001). She examined the variability of articulation rate in Czech, comparing three domains (breath group, syntactic phrase and intonation phrase), and found that only the intonation phrase had regular patterns of articulation rate. This unit was phonetically defined as a group of stress units combined into a compact intonation unit. However, to determine this unit in Czech, the sound properties of its boundaries were crucial (Palková, 1997; Daneš, 1957). Janoušková (2008) formalized and experimentally verified the hierarchical system of sound units in Czech and showed that a pause, a distinct melodic contour and the presence of final lengthening were important boundary markers of an intonation phrase. See the Appendix for an example of stimuli with intonation phrase boundaries marked.
Pauses influence both the production and the perception of speaking rate. They were also taken into account in the procedure of speech rate measurements. In calculating speech rate, pauses were included in the duration of speech; in articulation rate measurements, however, the duration of pauses was excluded. The principal question was how to define the minimal duration for a silent interval to count as an articulatory pause. The usual value ranged between 0.2 and 0.3 s, following the works of Goldman-Eisler (1961). Hieke et al. (1983) recommended a value closer to 0.1 s; a duration of 0.13 s was used by Dankovičová (2001).

Material
For the purpose of the experiment, news broadcasts of the public Czech Radio (CRo) were used. The recordings were obtained from the web archive of CRo using the Cool-Edit96 software (sampling frequency 32 kHz, 16-bit amplitude resolution); all the recordings were taken between February and April 2008. The news bulletins were divided into single news reports; only the main speakers (i.e., the newsreaders) were taken into account. The speech of correspondents, analysts etc. was excluded, as were news reports with backchannel sounds and disfluencies such as slips of the tongue. To eliminate yet another potential variable, the topic of the news was limited to "politics"; sports news and weather forecasts were also omitted. According to these criteria, 22 speakers (13 male and 9 female) and their 22 news reports (1 recording per speaker) formed the corpus for the further procedure.

Perception test
In order to keep the task as natural as possible, the news reports were presented to the listeners in continuous form, as retrieved from the source. The testing was carried out one year after retrieval of the material. Overall, the median length of the stimuli was 26.6 s, with item length ranging between 24 and 32 s. Speech covered over 90.0% of each tested sample, except for one stimulus in which the share of pauses exceeded 10.0%. The stimuli contained 28 intonation phrases and 160 syllables on average, and the number of intonation phrases/syllables was comparable across stimuli. The intonation phrases that consisted of just one stress unit were the most numerous (46.3%). Each stimulus corresponded to one news report and it was played just once. Because of the total duration, the test was divided into two parts; each part lasted approx. 7 min., including 3 stimuli for the training session.
The listeners were native speakers of Czech, students of the Czech language at the Faculty of Arts at Charles University in Prague; their mean age was 21 years. Seventeen listeners participated in both parts I and II (with an interval of 1 week between the two parts); in addition, 8 other listeners participated in part II. All the listeners were female; a sufficient number of males was not available. The perception test was carried out in a sound-treated lecture room. For each stimulus, the listeners judged the speaking rate of the speaker.
The listeners were presented with 3 main categories of speaking rate: slow (-1) – normal (0) – fast (1). They could refine the evaluation using arrows to indicate a finer assessment within the basic category; in essence, it was a 9-point scale. For an illustration of the answer sheet see Tab. 1 (part A). (Part B shows how the judgements were expressed numerically.)

Segmentation of the speech material
First, each news report was transcribed orthographically. To prepare the canonical transcriptions, the software Convertor (Laun, 2001) and the template TRIPAC (Janoušková, 2003) were used, and the output was subsequently corrected manually. The intonation phrases were labelled using the software Praat (Boersma & Weenink, 2002-2010). A pause was classified as an articulation pause if its duration was at least 0.2 s (see 1.4 above). Both the labelling and the listening method yielded the same results regarding the position of pauses.
The segmentation was done according to the rules based on Machač & Skarnitzl (2009). It was necessary to solve special cases where a consonant at the beginning of a word was preceded by a glottal gesture followed by a schwa. Such a preglottalized sequence was regarded as a part of the given word, and was counted as an extra syllable. (Cf. the findings about preglottalization in news reading in Skarnitzl & Machač, 2009.)

Laboratory measurements
Speaking rate was expressed in syllables per second, both for speech rate (SR) and articulation rate (AR). Except for the preglottalization mentioned in 2.3, the canonical number of syllables in words was used to calculate speaking rate; in Czech, syllable compression was not frequent in general and it appeared only once in our material (unlike the elision of consonants).
The entire news report corresponding to one stimulus of the perception test served as the unit of observation, and speaking rate was measured over the whole recording. To obtain SR, the number of syllables was divided by the duration of the item. To obtain AR, the number of syllables was divided by the item duration reduced by the total duration of internal pauses.
Because of the possible variation of speaking rate within the stimuli and its potential influence on the subjective assessment, articulation rate was also calculated in yet another way.
Another value of a speaker's articulation rate (ARIP) was obtained by determining the mean articulation rate of all intonation phrases within each recording (where the intonation phrase AR was calculated as the number of syllables in a single intonation phrase divided by the duration of that phrase).
The values of SR, AR and ARIP for each stimulus (N = 22) were obtained.
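The three measures can be illustrated with a minimal sketch. It assumes that each recording is described by its syllable count, its total duration and a list of silent-interval durations, and that each intonation phrase is given as a pause-free pair of syllable count and duration. The function names, the argument layout and the example numbers are hypothetical and are not taken from the study; only the 0.2 s pause threshold and the syll/s unit follow the text.

```python
def speaking_rates(n_syllables, duration_s, silences_s, min_pause_s=0.2):
    """Return (SR, AR) in syll/s for one recording.

    Only silent intervals of at least min_pause_s count as articulation
    pauses; shorter ones stay inside the articulated time.
    """
    pause_total = sum(p for p in silences_s if p >= min_pause_s)
    sr = n_syllables / duration_s                   # pauses included
    ar = n_syllables / (duration_s - pause_total)   # pauses excluded
    return sr, ar


def arip(phrases):
    """Mean articulation rate over intonation phrases.

    phrases: list of (n_syllables, duration_s) pairs, each pause-free.
    """
    rates = [n / d for n, d in phrases]
    return sum(rates) / len(rates)


# Hypothetical recording: 160 syllables in 28 s; the 0.1 s silence is
# below the threshold and is therefore not treated as a pause.
sr, ar = speaking_rates(160, 28.0, [0.5, 0.3, 0.1, 1.2])
```

Because ARIP averages per-phrase rates, short and long phrases carry equal weight, so it need not equal AR computed over the whole recording.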

Subjective speech rate
The speaking rate of every stimulus, i.e. each news report, was judged (see 2.3 above). Altogether there were 460 assessments available. The distribution of the global evaluation of speaking rate within the perception test was: slow (14.6%), normal (63.3%) and fast (22.2%). It is obvious that the listeners did not use all the categories equally in their assessments. The category normal speaking rate was chosen in almost two thirds of the assessments, while in almost one quarter of the cases the listeners assessed the speaking rate as fast.
Subsequently the assessments of individual stimuli were examined. See Fig. 1 for the categories slow – normal – fast as a whole and Fig. 2 for individual stimuli. The category normal speaking rate was not only chosen by the listeners most often, but it also showed the highest agreement in the rating of single stimuli in comparison with the other categories. The range of agreement in rating the individual stimuli within the category normal was 23.5-88.2%, while the agreement on speaking rate was 0.0-76.5% in the category slow and 0.0-58.3% in the category fast. It can be seen that at least 20% of the listeners agreed that the speaking rate of any given stimulus was normal, whereas the categories slow or fast remained unused in the evaluation of some stimuli. For the following procedure, a recalculation of the subjective judgements was done (see Tab. 1, Part B) and the average evaluation of each stimulus was calculated. The tendencies shown above were confirmed: the speaking rate of the stimuli was evaluated as normal on average.
According to the Mann-Whitney test there was no significant difference between the subjective evaluation of female and male speakers at alpha = 0.05 (the U-value is 30, the critical value of U at p < 0.05 is 28). For each stimulus, we measured speech rate (SR) and articulation rate (AR and ARIP).
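As an illustration of how the agreement percentages for a single stimulus can be derived, the following sketch counts the share of each main category among the judgements. The function name and the example sheet of 17 judgements are hypothetical; the finer 9-point refinements are assumed to have been collapsed into their main category beforehand.

```python
from collections import Counter

CATEGORIES = ("slow", "normal", "fast")


def category_shares(labels):
    """Return the percentage of judgements per main category (1 decimal)."""
    counts = Counter(labels)
    total = len(labels)
    return {c: round(100 * counts[c] / total, 1) for c in CATEGORIES}


# Hypothetical stimulus judged by 17 listeners: 9 fast, 8 normal.
shares = category_shares(["fast"] * 9 + ["normal"] * 8)
```

Applying the same function to every stimulus and taking the minimum and maximum per category gives ranges of the kind reported above.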
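The decision rule behind the Mann-Whitney comparison can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the procedure actually used in the study: the function names are hypothetical, ties receive average ranks, and the critical value is taken from a table, as in the text.

```python
def mann_whitney_u(x, y):
    """Return the smaller of the two U statistics for independent samples."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)


def is_significant(u, critical_u):
    """The difference is significant when U does not exceed the tabled value."""
    return u <= critical_u
```

With the values reported above, `is_significant(30, 28)` is false, which corresponds to the conclusion that the evaluations of female and male speakers did not differ significantly.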

Objective speech and articulation rate
Median values of all the 22 stimuli (in syll/s) were: SR 5.7, AR 6.2, ARIP 6.1. The values for SR were lower than for the articulation rates, and there were visible overlaps between AR and ARIP. See Table 2 and the box plot in Fig. 3. The differences, tested by means of a t-test for correlated measures, were significant at alpha = 0.05 not only for SR and articulation rates, but also for AR and ARIP; SR - AR: t(21) = -17.4; SR - ARIP: t(21) = -10.9; AR - ARIP: t(21) = 4.5. According to the Pearson correlation coefficients, all the correlations (SR - AR, SR - ARIP, AR - ARIP) were very high (r = 0.9 in all three cases). The scatter plots (Fig. 4a-c) show these relations very clearly. The values of SR, AR and ARIP in female speakers were more compact than in males, but there were visible overlaps and, according to the Mann-Whitney test, the differences between males and females in SR, AR and ARIP were not significant at alpha = 0.05. See Table 2 and the box plot in Fig. 5.
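It may seem surprising that two measures can be both highly correlated and significantly different. The following sketch, using only the Python standard library, shows the two statistics side by side; the function names are hypothetical and no data from the study are reproduced.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, df) of a t-test for correlated measures on pairs (a_i, b_i)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def pearson_r(a, b):
    """Return the Pearson product-moment correlation coefficient."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)
```

The t statistic depends on the mean of the pairwise differences, so a consistent offset (for example, SR being lower than AR in every recording) yields a large negative t. The correlation coefficient ignores that offset and only reflects whether recordings that are fast on one measure are also fast on the other.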

The relation between subjective and objective speech and articulation rates
Firstly, the relation between the subjective speaking rate and the measured values for SR, AR and ARIP was examined. Spearman rank correlation coefficients implied low/moderate correlations (rs = 0.5 in all three cases). The scatter plot in Fig. 6 shows the relation between SR and the subjective speaking rate. The influence of pauses, as another potential variable, was also examined. The Spearman rank correlation coefficient implied very low or low correlation between the amount of pauses and the examined parameters (SR: rs = 0.08, AR: rs = 0.36, ARIP: rs = 0.30, subjective speaking rate: rs = 0.05).
It seems that there was no single correlate that would reliably describe the relation between the objective and the subjective evaluation of the tested stimuli in our corpus. This observation can be illustrated by specific examples. Speaker A was one of the objectively fastest speakers in our corpus (SR: 6.2 syll/s, AR: 6.9 syll/s). Subjectively, his speech rate was evaluated as fast by 52.9% of the listeners and as normal by 47.1% of the listeners. Speaker B received a similar subjective evaluation (fast in 58.3% and normal in 37.5%) but had lower objective values (SR: 5.8 syll/s, AR: 6.2 syll/s). On the other hand, the subjective speech rate of speaker C was evaluated as normal in 88.2%, although the objective values were higher than those of speaker B (SR: 6.0 syll/s, AR: 6.6 syll/s).
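For readers who wish to reproduce this kind of comparison, a Spearman coefficient can be computed as the Pearson correlation of the ranks. The sketch below is a plain-Python illustration with hypothetical function names; tied values receive average ranks, and the example values in the usage line are invented.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Return the Spearman rank correlation of two equally long sequences."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical measured rates and mean subjective ratings of five stimuli.
rs = spearman_rs([5.2, 5.6, 5.8, 6.0, 6.3], [0.1, -0.2, 0.4, 0.2, 0.5])
```

A rank-based coefficient suits the averaged subjective ratings because they come from an ordinal scale, so only the ordering of the stimuli is compared.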

Discussion
The speaking rate of read radio bulletins was examined from both the subjective and objective point of view.
Regarding the objective rate, both the speech rate (SR) and the articulation rate (AR and ARIP) were calculated. All of these parameters were highly correlated with each other and at the same time the differences between them were significant. On the other hand, there was no significant difference found between male and female speakers; this finding was in accordance with the results of Veroňková and Janoušková (partially published in Veroňková, 2012), who examined the speaking rate of TV newsreaders.
The values measured in our experiment (median values in syll/s: SR 5.7, AR 6.2, ARIP 6.1) corresponded to the SR of Czech newsreaders obtained by other researchers in their studies from the last decade: 6.2 syll/s measured by Palková (in Palková et al., 2003) or 5.9 syll/s (males) and 6.1 syll/s (females) (Veroňková & Janoušková). According to Palková (2004), the acceptable SR of news in Czech ranged between 5.5 and 5.8 syll/s. As far as our recordings are concerned, the SR of the tested news reports fell mostly at the higher end of this range, and some even exceeded the limit (compare with a similar finding in Shevchenko & Uglova (2006) for TV news in the USA). The speaking rate of news presentation by professional speakers was consistently faster than the SR in other genres or in non-professional speakers: the average SR of read speech produced by Czech university students was 4.7 syll/s (Balkó, 1999) or 4.5 syll/s (Veroňková-Janíková, 2004); the SR of guests performing in radio broadcasts was 4.3 syll/s (Bartošek, 2000); the SR of live sports reports was 5.6 syll/s (ibid.).
The speaking rate of the tested speech samples, i.e. news reports, was subjectively judged as normal on average. The same finding was reported in the above-mentioned study (Veroňková, 2012). Two factors could serve as an explanation for the low/moderate correlations between subjective judgement and objective measures of speech rate. Firstly, the age of the listeners may play a role. In both cases, the subjects were university students, i.e. younger adults. It is possible that the younger generation was more tolerant of higher speaking rates. This opinion can be supported by the findings of Verhoeven et al. (2004) and Quené (2005) (mentioned in 1.1 above) regarding the faster SR of younger people, together with the results of Schwab (2011), who confirmed the influence of the listeners' own objective SR on their subjective judgement of speaking rate. Secondly, the listeners probably took the type of communication and text into account: the SR within the genre, i.e. news reporting, was subjectively rated as normal. An experiment examining the subjective assessment of speech samples from different genres is being prepared.
In our experiment, which was focused on relatively long stimuli, no simple and direct relation between the subjective and objective rating was found. As discussed above, there are many factors that could influence both the perception and the production of speech rate. The longer the speech sample, the harder it is to isolate a single most important factor, as all of them are tightly bound together. There are several factors that deserve special attention in the following analyses of the corpora: F0 contour, pauses and the way of segmentation in general, the precision of articulation and phone rate. In future studies, it would be useful to obtain the subjective speaking rate assessment of single intonation phrases from our corpora and to compare it both with the subjective rating of the given speaker and with the objective measurements.

Translation
The crew of the Endeavour space shuttle has completed the fifth extra-vehicular activity. During the six-hour EVA, they attached a several-meter long extender with a camera. It will serve for space shuttle heat shield tests so that a disaster, similar to the one of space shuttle Columbia from 2003 when a small crack caused its break-up and all seven crew members died, would not happen again.
The astronauts in the Endeavour space shuttle are now resting before they will take off back to the Earth on Tuesday.

Figure 1. Subjective evaluation of speaking rate. The volume of slow – normal – fast ratings (in %). Indicated are the median, quartile range, and range.

Figure 2. Subjective evaluation of speaking rate of individual stimuli. The volume of slow – normal – fast ratings (in %).

Figure 5. Speech rate (SR), articulation rate AR and ARIP of stimuli. Indicated are the median, quartile range, and range, as well as extreme values.

Table 1. Illustration of the answer sheet for the perception test (part A), with the corresponding numerical expression of the judgements (part B).