Experimental ergonomic evaluation with user trials: EEE product development procedures
Chapter 6. Discussion
All of the present substudies utilised user trials for ergonomic evaluation. Broberg (1997) integrated ergonomics into the product development process. His main area of interest was the prevention of occupational health hazards and the promotion of ergonomically sound workplaces during the development of new products. In this study, ergonomics was also integrated into the product development process, but in different ways. The main aim was to find methods to help in decision-making and in developing the product in a user-centred way, and to find simple ways of taking into account both the objective end-user match (including product dimensions vs. user anthropometrics) and the subjective match, i.e. end-users’ opinions of the products.
In this study, it was kept in mind that the trials should not be excessively time-consuming or expensive. Time pressure often detracts from the accuracy and quality of the information gained from the evaluation (Jordan et al. 1996). The challenge is to maximise the speed at which an evaluation is carried out with minimum compromise of the integrity and usefulness of the results. However, engineering is concerned with trade-offs, and good management of trade-offs creates balance (Casaday 1991), which is needed in effective design. A good trade-off that can be achieved co-operatively with users was emphasised in the present study. Balance has recently been attained through multidisciplinary teams in product development. In phone development, for example, multidisciplinary teams have been used by Sony Electronics to apply user-centred design to ensure a highly usable cordless phone (Rutter & Becka 1997) and at the University of Oulu in the development of an automatic phone service for the elderly (Pirinen et al. 1997). In this study, multidisciplinary teams were used in the substudies described in papers I, II, VI and VII.
This study belongs to the tradition of product design research and concentrated on the application of the method in different contexts. The procedure itself was not actually discussed, though some provisions were made. Some aspects of user trials nevertheless emerged. The most important element in a trial is the subject. The subjects must be considered carefully and treated in a friendly manner. Little chats before the test make them feel relaxed, and coffee breaks and chats after the test help to ensure that they are willing to come again and be included in the ”subject pool”, a term used at Philips (de Vries et al. 1994).
The subjects should be given the feeling that they are something special and necessary for the evaluation of the product or system. It must be emphasised that the product is investigated and its faults and usability improved with the results of the tests based on the subjects’ performance and preference. If this is not highlighted, the subjects believe that their performance is tested and observed – and wonder if they pass the test. However, much greater emphasis should be placed on arranging a user trial as if it were a study of its own. Further things to be considered are the order in which the tasks are to be done, the sampling of the subjects (for example with random sampling as in paper VI) and the physical or mental effort needed to maintain concentration during the trial. These are only a few aspects to be considered in more detail.
Below, the seven key terms of evaluation in the user trials presented in Fig. 5 will be discussed: (1) recommendations and legislation, (2) technology and markets, (3) approaches, (4) context, (5) simulation, (6) subjects and (7) methods.
The foundation of this study, and the individual substudies, lies in the ergonomics and usability literature. However, the evaluation was compatible with the ISO 13407 (1999) determination, which also has many roots in the literature. The user trials (with the methods used) provided feedback that can be used to improve design (a), and according to the trials, it will be possible to assess whether the user and organisational objectives have been achieved (b). However, the item long-term use of the product or system (c) was not monitored (cf. Chapters 6.2.2 and 6.5).
The methods of this study can also be applied in the EU product development processes (Building… 1996). In this study context, the methods were used with prototypes and products. The methods can also be applied using demonstrators. However, as Hyppönen (1999) has pointed out: “The legislation, directives, standards, official and unofficial rules, regulations and norms form base line, framework and trend for all development work, but are also influenced by different stakeholders who develop technology.”
The EU product development process (Building… 1996) also highlights the validation of development processes with two stages: a verification and a demonstration stage. This study fulfilled at least the demands for a verification stage: a small but sufficient sample of users in a real-life situation tested the technical feasibility of the product. And it partly also fulfilled the demonstration stage demands: a sufficiently large sample of users in a real-life situation provided information on user-friendliness.
Recently, more and more articles have been published to support the goal setting of the present study, e.g., Pinto et al. (2000) have written about ergonomics, gerontechnology and home environments, O’Brien et al. (1999) about home technology, and Hawthorn (2000) about the implications of ageing for the interface designer. Also, the anthropometrics of elderly people have been studied (Kothiyal & Tettey 2000), which was also done on a small scale here. It is also worth emphasising that the videophone system was studied from the perspectives of two user groups by the study team: the article of Kirvesoja et al. (1999a) deals with the medical staff as users, while paper VII considers patients as users. The use of videophone in the home care of the elderly and the disabled has also been studied by the team (Kirvesoja et al. 1999b).
Knowledge of human capabilities, physical limits, personal habits, cultural characteristics and individual preferences will be an essential part of novel product design (Lee et al. 1998). At the same time, mass customisation, personalised products, customer tastes and tailored services will become dominant in most markets. Developers and marketers should not rely on trials with young users only, because older consumers have different competencies compared to younger ones (Datan et al. 1987). The number of the elderly is increasing and they will have more purchasing power. More attention needs to be paid to individual user abilities during the product development process. Komatsubara (2000) pointed out recently that products with poor usability could prevent Japan from developing further as an ICT-oriented society.
Gerontechnology can be one key factor to help the elderly and the technology companies to find each other. One possibility is to try to follow the principle of ”design for all”, i.e. design for the elderly in such a way that all those who are younger and stronger with a better eyesight, hearing or manipulative skills also find the products usable. In this sense, the motto of the Centre for Applied Gerontology is very interesting: "Design for the young and you exclude the old, design for the old and you include the young" (Haigh 1993).
However, user evaluation can be affected, even more, by the clear differences in some other predictors than usability (paper VI, Fig. 1), such as familiarity with the different concepts, attitudes, and socio-economic factors. Among the elderly, the latter predictors are in favour of traditional concepts. As pointed out in paper VI, in the long run, both usability and things affecting the other predictors need to be developed. Familiarity and attitudes can be affected through, for instance, “pioneer users” and marketing. One possibility is to have nurses and social workers act as “model users” or “lead users” (cf. Deschamps & Nayak 1995). This can be recommended for corresponding future projects.
Three different types of subjective assessment were used in this study. The assessments were done:
by the user himself/herself (papers II, III, IV, V, VI and VII),
by an expert in view of the users’ abilities or needs (e.g. geriatric nurses and gerontechnologists in paper VI, patients’ escorts in paper VII, researchers as pilot users in paper V), and
as specific observation-based professional assessments (paper II).
Although the papers mainly measured subjective psychological preference, they were based on objective participation and perception in real trials. The following objective measurements were made:
observation by experts (papers I and II),
measurement of subjects (papers I, II, III and IV), and
measurement of product dimensions (papers I, II, III and IV).
At the beginning of this study, the aim was to concentrate on task-surface heights and the postures of the users (as in the papers I, II, III, IV). During the years of studies, however, the physical approach changed more and more towards usability engineering (Good et al. 1986, McClelland 1995, Whiteside et al. 1988), as the new projects involved more or less ICT (papers VI and VII). Methods were applied and developed to help in product development, and they were aimed to comprise more cognitive factors but still include also the physical factors of usability. Usability engineering is a process grounded in classical engineering, which amounts to specifying, quantitatively and in advance, what characteristics and to what extent the final product to be engineered is to have. This process is followed by actual construction of the product and a demonstration that it does indeed have the planned-for characteristics (Good et al. 1986). Without measurable usability specifications, there is no way to determine the usability needs of a product or to measure whether or not the finished product fulfils those needs. With the methods used in the present studies, usability was attained in a measurable form, and this study can hence be considered one dealing with usability engineering.
Shackel (1986) has proposed the following five fundamental features of design for usability, and similar emphases for user-centred design have been further developed by Pheasant (1996):
user-centred design: focused from the start on users and tasks,
participative design: users as members of the design team,
experimental design: formal user tests of usability in pilot trials, simulations and full prototype evaluations (measures of the performance and the subjective reactions of the users),
iterative design: design, test & measure, and redesign as a regular cycle, until the results satisfy the usability specification,
user-supportive design: training, selection (when appropriate), manuals, quick reference cards, aid to ’local experts’, ’help’ systems.
This study has concentrated on the first three of these features: user-centred design, participative design and experimental design. No actual iterative design was done. However, iterative design can also be supported by the present methods. The measures taken are calculated as the value of the products, and the trials can be repeated with the new versions and the results compared. Once the values have been derived, goal attainment can be evaluated. However, usability is not the only aspect of product quality. Other important characteristics of the product include reliability, aesthetics and cost (de Vries et al. 1994). The effort spent on testing usability should be in proportion to the relative importance of the usability of the product concerned.
Pahl and Beitz (1988) conclude that, when preparing a detailed specification, it is essential to state whether the individual items are demands or wishes. One should select alternatives that meet at least all the requirements, i.e. the ”must” things. After that, the ”wish” things can be taken into account by weighting and rating the variants. The must things are related to performance, the wishes to preference. In this study, the wishes were made into must things. The adverse consequences (probability and seriousness) should also be considered in the evaluation of the variants.
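The two-step selection described above (screen on demands first, then weight and rate the wishes) can be sketched as follows. This is only an illustration under stated assumptions: the variant names, the demands, the wishes, the 0-10 rating scale and the weights are all hypothetical and do not come from the substudies.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    meets_demand: dict  # demand name -> True/False
    wish_ratings: dict  # wish name -> rating on an assumed 0-10 scale


def rank_variants(variants, wish_weights):
    """Drop variants that fail any demand, then rank the rest by weighted wish score."""
    total_weight = sum(wish_weights.values())
    scored = []
    for variant in variants:
        if not all(variant.meets_demand.values()):
            continue  # a single unmet demand excludes the variant
        score = sum(
            weight * variant.wish_ratings[wish]
            for wish, weight in wish_weights.items()
        ) / total_weight
        scored.append((variant.name, score))
    # Highest weighted wish score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data, for illustration only.
variants = [
    Variant("A", {"stable": True, "fits_doorway": True},
            {"easy_to_clean": 8, "appearance": 4}),
    Variant("B", {"stable": True, "fits_doorway": False},
            {"easy_to_clean": 9, "appearance": 9}),
    Variant("C", {"stable": True, "fits_doorway": True},
            {"easy_to_clean": 6, "appearance": 8}),
]
wish_weights = {"easy_to_clean": 7, "appearance": 3}

ranking = rank_variants(variants, wish_weights)
print(ranking)
```

Variant B has the best wish ratings but is excluded because it fails one demand, which is the point of separating the two kinds of requirement.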
With user trials, both performance and preference can be attained. Performance can be easily measured under controlled conditions, with the expert acting as an observer. The outside factors can be eliminated when using a home simulator. The user preference can also be attained for the subjective assessment of the product or system to be evaluated. All of the present substudies were based on user trials of some kind. Most subjective parts of EEE procedures are like extensions of fitting trials (Pheasant 1996, cf. chapter 6.5) or thermal sensation trials (Helander 1995).
The experiments described in papers II and VI used expert evaluation in connection with user evaluation. In the task-surface experiment (paper II), the expert made an evaluation of the elderly subjects’ performance of different tasks at different heights. The users were also asked to rate the task-surfaces used. In the telephone experiment (paper VI), the aim was to find out if experts, i.e. geriatric nurses and gerontechnology researchers, can evaluate the needs of the elderly. Expert knowledge was also used in the chair experiment (paper V) to get to know the objective tree for a good chair. Experts created the criteria, but the actual weighting was done by users. Experts were also used as pilot subjects to evaluate the different chairs for paired comparison according to the criteria. However, in some of the experiments, users were measured for anthropometrics (papers I and II). In that case, the user is no longer a “measuring device”, i.e. the user does not rate, rank or compare in pairs.
Observable human action derives its meaning from the context in which it occurs. The laboratory context is not the users’ natural work context. However, good operational fidelity (Sanders & McCormick 1993) was achieved within the laboratory context in this study. The essential features of the operational environment could be replicated in the testing environment. When the tests were accomplished in laboratory settings, the actual circumstances of use were simulated as far as possible. For example, the microwave oven work surfaces at different heights were derived by finding out their location at the homes of the elderly, the wall telephones were mounted on the wall, and the sauna benches in the home simulator were constructed to be similar to the sauna benches available on the market.
Without consideration of the context of use, it is impossible to identify factors, including equipment design, that may contribute to error (Bogner 1998). When using a systems approach in examining errors in health care, the contextual factors, such as workload, stress, personal experience levels, and the physical, political, and psychosocial environments, should also be considered. Especially the videophone experiment (paper VII), which focused on the patients visiting a health care centre, is a good example of contextual research. A video consultation system in telemedicine was actually used in the health care organisation. The patients did not need to be involved in artificial situations. They were real patients and evaluated the system according to their opinions in the real context of use.
Real subjects were used in all of the present studies. Actual ageing people were doing real tasks. In the social context, we can mention field contacts during usability trials, discussions with subjects in the field and visits to a hobby and service centre for the elderly (paper VI). Some companies were involved in this study. For example, the phones (paper VI) were obtained from a telephone company and the video consultation equipment (paper VII) from a videophone company. Both organisational (health care centre in paper VII) and individual participation was involved.
In this study, a special mock-up simulator was constructed to simulate more closely the ”home context” (de Vries et al. 1994) in a usability laboratory used for observing the actions and movements of the elderly under controlled, adjustable conditions (paper II). The benefits of the simulator used in the task-surface experiment (paper II) were pointed out in another article by Väyrynen et al. (1996):
The simulator is an adjustable place for user trials.
It can be used as a usability laboratory and in other ways for usability studies.
It allows utilisation of both static and dynamic rapid prototyping (e.g. Cushman & Rosenberg 1991).
An experimental basis can be found for a systematic design method, e.g. for requirement lists and use-value analysis.
Evaluation techniques, such as conjoint analysis, can be used in a more realistic context (paper IV). Better operational fidelity can be achieved.
In order to involve users in a design process, the design ideas need to be expressed in a relevant and understandable form. This “communication” (Ulrich & Eppinger 2000) between the expert and the subject about a concept or product is very important. Especially for elderly subjects, the need for concreteness is evident. All of the study papers used some sort of concreteness, as introduced in chapter 4.2, with a successful outcome. The real work or daily living tasks done in a simulator with concrete products can be considered to represent good communication.
The user interface can be considered to be an essential part of product and subject communication. Evaluation of the effectiveness of a product from a user’s point of view means evaluation of the design of the user interface (McClelland 1995). Säde (1996) has defined user interface as follows: ”There are two sides in a product: the part that the user uses, gives input and gets feedback, and the part to which the user does not have access and which he or she does not have to think of. The assessable side of a product is called the user interface”. All the trials of this study can be considered to involve some sort of user interface, but not all are so evident as a telemedicine system or a telephone, though the more physical items, such as steps and furniture, were also considered user interfaces.
In usability evaluation, the number of test subjects is a major concern. With an excessively large sample, product cost and development time may increase. With too small samples, researchers might fail to detect problems that, uncorrected, would reduce the usability of the product. According to Virzi (1992), (1) 80% of usability problems are detected with four or five subjects, (2) additional subjects are less and less likely to yield new information, and (3) the more severe problems are likely to be detected by the first few subjects. A separate study by Lewis (1994) clearly supports the second claim, partially supports the first, but fails to support the third. The number of subjects used in the experiments was greater than needed for usability assessments, but too small for descriptive methods and evaluative methods.
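The diminishing return from additional subjects can be illustrated with a simple cumulative binomial model, in which the expected proportion of problems found by n subjects is 1 - (1 - p)^n. This is a minimal sketch under assumptions that are not taken from the present data: every problem is assumed to have the same detection probability p for each subject, subjects are assumed independent, and the value p = 0.3 is purely hypothetical. The model says nothing about problem severity.

```python
import math


def proportion_found(subjects, p_detect):
    """Expected share of problems seen at least once by `subjects` users."""
    if not 0.0 < p_detect < 1.0:
        raise ValueError("p_detect must lie strictly between 0 and 1")
    return 1.0 - (1.0 - p_detect) ** subjects


def subjects_needed(target, p_detect):
    """Smallest number of subjects whose expected coverage reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    if not 0.0 < p_detect < 1.0:
        raise ValueError("p_detect must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_detect))


P_DETECT = 0.3  # hypothetical per-subject detection probability

for n in range(1, 9):
    print(n, round(proportion_found(n, P_DETECT), 3))

print("subjects for 80 % coverage:", subjects_needed(0.8, P_DETECT))
```

With this assumed p, each new subject adds less than the previous one, and the 80 % level is reached at five subjects; a lower p would push that number up considerably.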
Conjoint analysis is good in segmentation. Conjoint analysis can be applied to individuals, market segments, or entire markets (Koljonen 1999). At the individual level, for example, a sample size of one would be sufficient to predict which new home a potential home buyer would prefer. On the other hand, Curry (1997) writes that, for market segments, conjoint results tend to stabilise after about 30-50 respondents. In many studies, good results have been obtained with about 30 people (Liukko 1994). In the present conjoint experiments, 41 subjects were involved in paper IV and 30 subjects in paper III. The number of subjects can be considered adequate. In the paired comparison reported in the papers IV and V, the number of subjects was 41, which is much more than considered adequate for evaluative research by Torgenson (1958) and Mitchell (1992).
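As an illustration of individual-level conjoint analysis, the sketch below estimates main-effects part-worths for a single respondent who has rated every profile of a small full factorial design. In such a balanced design the part-worth of a level equals the mean rating of the profiles containing that level minus the grand mean. The attributes, levels and ratings are hypothetical, and the sketch is not the software or design used in papers III and IV.

```python
from collections import defaultdict


def part_worths(profiles, ratings):
    """Main-effects part-worths for a balanced full factorial design."""
    grand_mean = sum(ratings) / len(ratings)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for profile, rating in zip(profiles, ratings):
        for attribute, level in profile.items():
            sums[(attribute, level)] += rating
            counts[(attribute, level)] += 1
    # Level mean minus grand mean.
    return {key: sums[key] / counts[key] - grand_mean for key in sums}


def relative_importance(worths):
    """Share of each attribute's part-worth range in the sum of all ranges."""
    by_attribute = defaultdict(list)
    for (attribute, _level), worth in worths.items():
        by_attribute[attribute].append(worth)
    spans = {a: max(ws) - min(ws) for a, ws in by_attribute.items()}
    total = sum(spans.values())
    return {a: span / total for a, span in spans.items()}


# Hypothetical 2 x 2 design rated by one respondent on an assumed 1-10 scale.
profiles = [
    {"surface_height": "85 cm", "front_rail": "yes"},
    {"surface_height": "85 cm", "front_rail": "no"},
    {"surface_height": "95 cm", "front_rail": "yes"},
    {"surface_height": "95 cm", "front_rail": "no"},
]
ratings = [9, 7, 5, 3]

worths = part_worths(profiles, ratings)
importance = relative_importance(worths)
print(worths)
print(importance)
```

For fractional or unbalanced designs the level-mean shortcut no longer holds, and the part-worths would have to be estimated by regression instead.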
In the present study, the subjects can be considered to fit the user profile and, in that respect, the conclusions are valid. The elderly subjects were chosen by a multidisciplinary team (papers I and II). The subjects for paper VI were chosen randomly from the population. The patients visiting the health care centre (paper VII) were all given a questionnaire, though only one third returned it. If the subject does not fit the intended user profile, the opinions and preferences may not accurately reflect the real situation and the conclusions based on the data may not be valid (Cushman & Rosenberg 1991). Attitude measures may be distorted by biasing factors, such as the ”halo effect”, acquiescence and cognitive dissonance (Rubinstein & Hersh 1984 in Cushman & Rosenberg 1991). Subjects’ preferences are also affected by events in the recent past. Some guidelines for the selection of users are presented by, for example, McClelland (1995).
Traditionally, a product is designed based on the designer’s experience and an inspiration of artistic work, and the final decision is made based on the designer’s intuitive and subjective feelings (Hsiao 1998). Though intuition has led to a large number of good and even excellent solutions, a purely intuitive approach has the following disadvantages (Pahl & Beitz 1988):
the right idea rarely comes at the right moment, since it cannot be elicited at will,
the result depends markedly on individual talent and experience, and
there is the risk that the solutions are delimited by one’s special training and experience.
Both experts and users are needed to make the evaluation more accurate, and several methods should be used simultaneously. If users and professional designers work together, the different perspectives can add to the effectiveness of the design process. In the present studies, the subjects were acting both as subjects and as evaluators (cf. chapter 2.1). For the elderly, it is easier to be an evaluator than a designer, because there is no need to imagine the wanted product, as the knowledge of the real needs is already available in some form. The user, as an evaluator, also helps to improve the process of decision-making by discovering design errors soon after they have been made (Lanning 1991).
Deschamps and Nayak (1995) have mentioned five obstacles to finding out what the customer wants: customers (1) all want different things, (2) do not know what they want or need, (3) do not always buy what they need, (4) do not always buy what they or others think they want, and (5) keep upgrading their expectations.
First, because customers all want different things, there is a need for universal design which can be adjusted for individuals. Even ergonomics-for-one may be profitable. Second, customers or users may wish to include options and functionality in the product that have no importance for the actual tasks. In addition, they may exclude some very important basic issues. Collection of task information can be used to reveal even these tacit needs and to get rid of unnecessary functionality. This provides at least some means for weighting different needs to orient the development work. Third, customers are not always users of the product. Special buying staff or the management may decide about the acquisition of a product or a system. This means that the real end-user requirements expressed to developers may be strongly biased. In order to create successful tools, both the buyers’ and end-users’ points of view need to be examined (Nieminen 1996).
Galer and Page (1996) found only an inconclusive relationship between what people perceive and what they actually do. People prefer, and thus use, products that they find to satisfy their needs, even though the products may not be optimal judged by traditional ergonomic criteria (Meyer & Seagull 1996). It is not wise to ask merely about the subjective preferences of users. Studying target markets with traditional methods, such as questionnaires and surveys or even focus groups, is often inadequate, since users are not very good at explicitly stating what they need with respect to a technology which does not exist yet (Kotler 1997). User trials are therefore needed to allow the subjects to experiment with a prototype of the product, as are expert evaluations of products in use.
One of the key challenges of the present study was communication with the subjects. Product alternatives – especially their concepts – are difficult to represent realistically enough to the users for a usability evaluation. Many communication approaches were found to be applicable to elderly people's user trials. They are therefore believed to be feasible for other user segments as well, and some solutions to this acute problem in design research could be given.
The methods used in the study were an appropriate basis for the EEE procedures formulated here. The methods were simple and easy to use. Different methods appropriate in different contexts were utilised. The simplest method, i.e. ranking, does not give any detailed knowledge about the differences. Although ranking is basically simple, proper communication of concepts is important and often difficult in connection with ranking, too. The alternatives can be almost equally good. Ranking merely indicates which of the products is the best, which is the second best, etc.
Rating also gives only a rough estimation of the order of the products/issues that are evaluated. However, an average can be calculated from the ratings. Besides the rating scales used in this study, the study team have also used a 5-point rating scale. Hospital workers rated each of six implementation criteria of the telemedicine system on a scale from not at all important (1) to extremely important (5) (Kirvesoja et al. 1999a). In the same study, Mitchell's method was also used to assess how the criteria were weighted. Both methods pointed out that training was the most important criterion in implementation.
The more complicated methods, conjoint and Mitchell analyses, are multivariate methods, as is also the use-value analysis applied in the multi-criteria method (paper V). Multivariate methods are especially useful for capturing and interpreting the complex data generated by human subjects. Not only do these techniques improve the quality of data interpretation, but they can frequently also reduce the amount of data that needs to be collected, which, in turn, reduces the time and money needed for data collection as well as the respondents’ fatigue – leading to a better quality of data (Miller et al. 1996).
Mitchell and conjoint analyses are optimal in different phases of product development (as already discussed in paper IV). If a new product is developed and its features are under elaboration as far as user needs and preferences are concerned, conjoint analysis is appropriate. If the product has already been specified and conceptualised, the conjoint method is no longer applicable. In conjoint analysis, the evaluation is done as a whole, as also in ranking and rating. Paired comparison by Mitchell and use-value analysis divide the product into features and the weighting is done according to the features, after which calculations can be done and values derived for products. Conjoint analysis (papers III and IV) utilises the same background theory of experimental design as the famous Taguchi approach in TQM (Logothetis 1992).
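The idea of deriving feature weights from paired comparisons can be sketched generically: each pair of criteria is judged once per respondent, the preferred criterion scores a point, and the weights are the shares of points. This is an assumed, simplified scheme shown for illustration only; it is not claimed to reproduce the exact calculation of Mitchell's method, and the criteria and judgements are hypothetical.

```python
from collections import Counter


def criterion_weights(judgements):
    """Turn (first, second, preferred) judgements into normalised weights."""
    wins = Counter()
    for first, second, preferred in judgements:
        if preferred not in (first, second):
            raise ValueError("preferred must be one of the compared criteria")
        # Register both criteria so that one with no wins still appears.
        wins[first] += 0
        wins[second] += 0
        wins[preferred] += 1
    total = sum(wins.values())
    return {criterion: count / total for criterion, count in wins.items()}


# Hypothetical judgements from one respondent comparing three chair criteria.
judgements = [
    ("comfort", "stability", "stability"),
    ("comfort", "appearance", "comfort"),
    ("stability", "appearance", "stability"),
]

weights = criterion_weights(judgements)
print(weights)
```

A criterion that never wins a comparison receives a weight of zero in this simple scheme, so in practice judgements from several respondents would be pooled, or a small base score added, before the weights are applied to product ratings.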
The number of design alternatives is not so constrained with the multi-criteria procedure (paper V) as with the Mitchell method. One product at a time is evaluated but not compared to the other products. Later, a new alternative or developed version can be evaluated similarly to get a value for the design, which is comparable to the earlier evaluation, provided that the evaluation is done by the same people.
The combination of different methods can be recommended. For example, if many products need to be evaluated, ranking can be used to reduce the number of product alternatives, and the best variants can be then evaluated in more detail using, for example, Mitchell’s method. Taylor et al. (1995) developed a simple decision aid which utilises both the positive and negative data gathered during a comparative test – Residual Acceptability Index (RAI). The RAI index is intended for use with data gathered with rating scales or from ranking exercises. The RAI score may be used as a weighting factor for design solutions, enabling design solutions to be ranked in terms of user preferences. The RAI technique may also be used to analyse the strengths and weaknesses in designs, which might otherwise go unnoticed.
Variants with a high rating but definite weak points (unbalanced value profile) may prove extremely troublesome during subsequent development. In such cases, it is very much less risky to select a variant with a slightly lower rating but a more balanced value profile. Sets of weighted criteria can be presented as value profiles, which easily indicate the weak points, as illustrated in Fig. 17. The lengths of the bars correspond to the values and their thicknesses to the weightings. The areas of the bars then indicate the weighted sub-values and the crosshatched area the overall weighted values of the solution variants. The weaknesses of the favourite variants can often be eliminated by a transfer of better sub-solutions from other variants.
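The value-profile reasoning above can be sketched numerically: each weighted sub-value is the product of a weight and a value (the area of a bar), the overall value is their sum, and a weak point is a criterion whose unweighted value falls below some threshold. The criteria, weights, values and the threshold of 4 are hypothetical and chosen only to show a higher-rated but unbalanced variant next to a balanced one.

```python
def value_profile(values, weights):
    """Weighted sub-value per criterion (bar area: weight times value)."""
    total_weight = sum(weights.values())
    return {c: weights[c] * values[c] / total_weight for c in weights}


def overall_value(values, weights):
    """Overall weighted value of a variant (sum of the bar areas)."""
    return sum(value_profile(values, weights).values())


def weak_points(values, threshold):
    """Criteria whose unweighted value lies below the threshold."""
    return sorted(c for c, v in values.items() if v < threshold)


# Hypothetical weights (in per cent) and values on an assumed 0-10 scale.
weights = {"comfort": 40, "stability": 30, "ease_of_rising": 30}
variant_x = {"comfort": 10, "stability": 9, "ease_of_rising": 3}
variant_y = {"comfort": 7, "stability": 7, "ease_of_rising": 7}
THRESHOLD = 4  # assumed cut-off for a "definite weak point"

for name, values in (("X", variant_x), ("Y", variant_y)):
    print(name, round(overall_value(values, weights), 2),
          weak_points(values, THRESHOLD))
```

Variant X scores higher overall but has a weak point, whereas variant Y scores slightly lower with a balanced profile, which is the situation in which the lower-rated variant is the less risky choice.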