The aim of this lecture is to introduce you to the study of Human-Computer Interaction, so that after studying it you will be able to:
• Understand the DECIDE evaluation framework
30.1 DECIDE: A framework to guide evaluation
Well-planned evaluations are driven by clear goals and appropriate questions (Basili et al., 1994). To guide our evaluations we use the DECIDE framework, which provides the following checklist to help novice evaluators:
1. Determine the overall goals that the evaluation addresses.
2. Explore the specific questions to be answered.
3. Choose the evaluation paradigm and techniques to answer the questions.
4. Identify the practical issues that must be addressed, such as selecting participants.
5. Decide how to deal with the ethical issues.
6. Evaluate, interpret, and present the data.
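If you want to track an evaluation plan against this checklist, the six steps can be held as an ordered list and ticked off as each one is settled. The following is only a minimal Python sketch under our own assumptions: the `DecidePlan` class, its `record` and `outstanding` methods, and the idea of keeping one free-text note per step are hypothetical illustrations, not part of the DECIDE framework itself.

```python
from dataclasses import dataclass, field

# The six DECIDE steps, in the order the framework lists them.
DECIDE_STEPS = (
    "Determine the goals",
    "Explore the questions",
    "Choose the evaluation paradigm and techniques",
    "Identify the practical issues",
    "Decide how to deal with the ethical issues",
    "Evaluate, interpret, and present the data",
)


@dataclass
class DecidePlan:
    """Hypothetical planning record: one free-text note per DECIDE step."""
    notes: dict[str, str] = field(default_factory=dict)

    def record(self, step: str, note: str) -> None:
        # Reject anything that is not one of the six named steps.
        if step not in DECIDE_STEPS:
            raise ValueError(f"unknown DECIDE step: {step!r}")
        self.notes[step] = note

    def outstanding(self) -> list[str]:
        # Steps with no non-blank note yet, in checklist order.
        return [s for s in DECIDE_STEPS if not self.notes.get(s, "").strip()]


plan = DecidePlan()
plan.record("Determine the goals", "Find out why customers avoid e-tickets")
plan.record("Explore the questions", "Do customers trust the e-ticket system?")
print(plan.outstanding())  # the four steps not yet addressed
```

Keeping the steps in a fixed tuple preserves the checklist order, so `outstanding()` always reports gaps in the sequence a novice evaluator would work through them.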
Determine the goals
What are the high-level goals of the evaluation? Who wants it and why? An evaluation to help clarify user needs has different goals from an evaluation to determine the best metaphor for a conceptual design, or to fine-tune an interface, or to examine how technology changes working practices, or to inform how the next version of a product should be changed.
Goals should guide an evaluation, so determining what these goals are is the first step in planning an evaluation. For example, we can restate the general goal statements just mentioned more clearly as:
• Check that the evaluators have understood the users’ needs.
• Identify the metaphor on which to base the design.
• Check to ensure that the final interface is consistent.
• Investigate the degree to which technology influences working practices.
• Identify how the interface of an existing product could be engineered to improve its usability.
These goals influence the evaluation approach, that is, which evaluation paradigm guides the study. For example, engineering a user interface involves a quantitative engineering style of working in which measurements are used to judge the quality of the interface. Hence usability testing would be appropriate. Exploring how children talk together in order to see if an innovative new groupware product would help them to be more engaged would probably be better informed by a field study.
Explore the questions
In order to make goals operational, the questions that must be answered to satisfy them have to be identified. For example, the goal of finding out why many customers prefer to purchase paper airline tickets over the counter rather than e-tickets can be broken down into a number of relevant questions for investigation. What are customers’ attitudes to these new tickets? Perhaps they don't trust the system and are not sure that they will actually get on the flight without a ticket in their hand. Do customers have adequate access to computers to make bookings? Are they concerned about security? Does this electronic system have a bad reputation? Is the user interface to the ticketing system so poor that they can't use it? Maybe very few people manage to complete the transaction.
Questions can be broken down into sub-questions to make the evaluation even more specific. For example, what does it mean to ask "Is the user interface poor?" Is the system difficult to navigate? Is the terminology confusing because it is inconsistent? Is response time too slow? Is the feedback confusing or perhaps insufficient? Sub-questions can, in turn, be decomposed into even finer-grained questions, and so on.
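This progressive decomposition forms a tree: the goal sits at the root, questions hang beneath it, and the finest-grained sub-questions at the leaves are the ones the evaluation actually investigates. The Python sketch below is illustrative only; the `Question` class and its `add`, `leaves`, and `outline` methods are our own hypothetical names, and the questions are taken from the e-ticket example above.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """A question that may be decomposed into finer-grained sub-questions."""
    text: str
    subquestions: list["Question"] = field(default_factory=list)

    def add(self, text: str) -> "Question":
        # Create a child question and return it so it can be decomposed further.
        child = Question(text)
        self.subquestions.append(child)
        return child

    def leaves(self) -> list[str]:
        # The finest-grained questions, i.e. those with no sub-questions.
        if not self.subquestions:
            return [self.text]
        result: list[str] = []
        for q in self.subquestions:
            result.extend(q.leaves())
        return result

    def outline(self, depth: int = 0) -> str:
        # Indented text outline of the whole tree, two spaces per level.
        lines = ["  " * depth + self.text]
        for q in self.subquestions:
            lines.append(q.outline(depth + 1))
        return "\n".join(lines)


goal = Question("Why do customers prefer paper tickets to e-tickets?")
ui = goal.add("Is the user interface poor?")
ui.add("Is the system difficult to navigate?")
ui.add("Is the terminology confusing because it is inconsistent?")
ui.add("Is response time too slow?")
goal.add("Are customers concerned about security?")
print(goal.outline())
```

Calling `goal.leaves()` yields the concrete questions to plan data collection around, while `outline()` gives a readable summary to share with the team.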
Choose the evaluation paradigm and techniques
Having identified the goals and main questions, the next step is to choose the evaluation paradigm and techniques. As discussed in the previous section, the evaluation paradigm determines the kinds of techniques that are used. Practical and ethical issues (discussed next) must also be considered and trade-offs made. For example, what seems to be the most appropriate set of techniques may be too expensive, or may take too long, or may require equipment or expertise that is not available, so compromises are needed.
Identify the practical issues
There are many practical issues to consider when doing any kind of evaluation, and it is important to identify them before starting. Some issues that should be considered include users, facilities and equipment, schedules and budgets, and evaluators' expertise. Depending on the availability of resources, compromises may involve adapting or substituting techniques.
Users
It goes without saying that a key aspect of an evaluation is involving appropriate users. For laboratory studies, users must be found and screened to ensure that they represent the user population to which the product is targeted. For example, usability tests often need to involve users with a particular level of experience, e.g., novices or experts, or users with a range of expertise. The number of men and women within a particular age range, cultural diversity, educational experience, and personality differences may also need to be taken into account, depending on the kind of product being evaluated. In usability tests, participants are typically screened to ensure that they meet some predetermined characteristics. For example, they might be tested to ensure that they have attained a certain skill level or fall within a particular demographic range. Questionnaire surveys require large numbers of participants, so ways of identifying and reaching a representative sample of participants are needed.
For field studies to be successful, an appropriate and accessible site must be found where the evaluator can work with the users in their natural setting.
Another issue to consider is how the users will be involved. The tasks used in a laboratory study should be representative of those for which the product is designed. However, there are no written rules about how long a user should be expected to spend on an evaluation task. Ten minutes is too short for most tasks and two hours is a long time, but what is reasonable? Task times will vary according to the type of evaluation, but when tasks go on for more than 20 minutes, consider offering breaks. It is accepted that people using computers should stop, move around, and change their position after every 20 minutes spent at the keyboard to avoid repetitive strain injury. Evaluators also need to put users at ease so that they are not anxious and will perform normally. Even when users are paid to participate, it is important to treat them courteously. At no time should users be treated condescendingly or made to feel uncomfortable when they make mistakes. Greeting users, explaining that it is the system that is being tested and not them, and planning an activity to familiarize them with the system before starting the task all help to put users at ease.
Facilities and equipment
There are many practical issues concerned with using equipment in an evaluation. For example, when using video you need to think about how you will do the recording: how many cameras will you use, and where will you put them? Some people are disturbed by having a camera pointed at them and will not perform normally, so how can you avoid making them feel uncomfortable? Spare film and batteries may also be needed.
Schedule and budget constraints
Time and budget constraints are important considerations to keep in mind. It might seem ideal to have 20 users test your interface, but if you need to pay them, then it could get costly. Planning evaluations that can be completed on schedule is also important, particularly in commercial settings. There is never enough time to do evaluations as you would ideally like, so you have to compromise and plan to do a good job with the resources and time available.
Expertise
Does the evaluation team have the expertise needed to do the evaluation? For example, if no one has used models to evaluate systems before, then basing an evaluation on this approach is not sensible. It is no use planning to use experts to review an interface if none are available. Similarly, running usability tests requires expertise. Analyzing video can take many hours, so someone with appropriate expertise and equipment must be available to do it. If statistics are to be used, then a statistician should be consulted before starting the evaluation and then again later for analysis, if appropriate.
Decide how to deal with the ethical issues
The Association for Computing Machinery (ACM) and many other professional organizations provide ethical codes that they expect their members to uphold, particularly if their activities involve other human beings. For example, people's privacy should be protected, which means that their names should not be associated with data collected about them or disclosed in written reports (unless they give permission). Personal records containing details about health, employment, education, financial status, and where participants live should be confidential. Similarly, it should not be possible to identify individuals from comments written in reports. For example, if a focus group involves nine men and one woman, the pronoun “she” should not be used in the report because it will be obvious to whom it refers.
Most professional societies, universities, government and other research offices require researchers to provide information about activities in which human participants will be involved. This documentation is reviewed by a panel and the researchers are notified whether their plan of work, particularly the details about how human participants will be treated, is acceptable.
People give their time and their trust when they agree to participate in an evaluation study, and both should be respected. But what does it mean to be respectful to users? What should participants be told about the evaluation? What are participants’ rights? Many institutions and project managers require participants to read and sign an informed consent form. This form explains the aim of the tests or research and promises participants that their personal details and performance will not be made public and will be used only for the purpose stated. It is an agreement between the evaluator and the evaluation participants that helps to confirm the professional relationship that exists between them. If your university or organization does not provide such a form, it is advisable to develop one, partly to protect yourself in the unhappy event of litigation and partly because the act of constructing it will remind you what you should consider.
The following guidelines will help ensure that evaluations are done ethically and that adequate steps to protect users' rights have been taken.
• Tell participants the goals of the study and exactly what they should expect if they participate. The information given to them should include an outline of the process, the approximate amount of time the study will take, the kind of data that will be collected, and how that data will be analyzed. The form of the final report should be described and, if possible, a copy offered to them. Any payment offered should also be clearly stated.
• Be sure to explain that any demographic, financial, health, or other sensitive information that users disclose, or that is discovered from the tests, is confidential. A coding system should be used to record each user and, if a user must be identified for a follow-up interview, the code and the person's demographic details should be stored separately from the data. Anonymity should also be promised if audio and video are used.
• Make sure users know that they are free to stop the evaluation at any time if they feel uncomfortable with the procedure.
• Pay users when possible because this creates a formal relationship in which mutual commitment and responsibility are expected.
• Avoid including quotes or descriptions that inadvertently reveal a person's identity, as in the earlier example of avoiding the pronoun "she" in the focus group report. If quotes need to be reported, e.g., to justify conclusions, then it is conventional to replace words that would reveal the source with representative words in square brackets. Ask users' permission in advance to quote them, promise them anonymity, and offer to show them a copy of the report before it is distributed.
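The coding-system guideline can be sketched in code: study data are stored under an opaque participant code, and the details needed to identify or contact someone are kept in a separate store. This is a minimal Python illustration under our own assumptions; the `ParticipantRegistry` class, its methods, and the `P-` code format are hypothetical, and a real study would keep the two stores in physically separate, access-controlled locations rather than in one object.

```python
import secrets


class ParticipantRegistry:
    """Keeps identifying details apart from study data (hypothetical sketch).

    Session data are keyed only by an opaque code. The code-to-person link
    lives in a separate store, which in practice would be held elsewhere and
    deleted once any follow-up interviews are finished.
    """

    def __init__(self) -> None:
        self.identities: dict[str, dict] = {}  # code -> name, contact, demographics
        self.data: dict[str, list] = {}        # code -> observations, no names

    def enrol(self, details: dict) -> str:
        # A random code reveals nothing about the person it stands for.
        code = "P-" + secrets.token_hex(4)
        while code in self.identities:
            code = "P-" + secrets.token_hex(4)
        self.identities[code] = dict(details)
        self.data[code] = []
        return code

    def log(self, code: str, observation: dict) -> None:
        # Observations are filed under the code only.
        self.data[code].append(observation)

    def details_for_follow_up(self, code: str) -> dict:
        return self.identities[code]

    def forget_identities(self) -> None:
        # Removing the link leaves only coded, anonymous data behind.
        self.identities.clear()


registry = ParticipantRegistry()
code = registry.enrol({"name": "Alex Smith", "contact": "alex@example.org", "age": 34})
registry.log(code, {"task": "book a flight", "seconds": 212, "errors": 1})
```

Using a random code rather than, say, initials or a sequence tied to recruitment order avoids the kind of inadvertent identification the guidelines above warn against.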
The general rule to remember when doing evaluations is: do unto others only what you would not mind being done to you.
The recent explosion in Internet and web usage has resulted in more research on how people use these technologies and their effects on everyday life. Consequently, there are many projects in which developers and researchers are logging users' interactions, analyzing web traffic, or examining conversations in chat rooms, on bulletin boards, or in email. Unlike most previous evaluations in human-computer interaction, these studies can be done without users knowing that they are being studied. This raises ethical concerns, chief among which are issues of privacy, confidentiality, informed consent, and appropriation of others’ personal stories (Sharf, 1999). People often say things online that they would not say face to face. Furthermore, many people are unaware that personal information they share online can be read by someone with technical know-how years later, even after they have deleted it from their personal mailbox (Erickson et al., 1999).
Evaluate, interpret, and present the data
Choosing the evaluation paradigm and techniques to answer the questions that satisfy the evaluation goal is an important step. So is identifying the practical and ethical issues to be resolved. However, decisions are also needed about what data to collect, how to analyze it, and how to present the findings to the development team. To a great extent the technique used determines the type of data collected, but there are still some choices. For example, should the data be treated statistically? If qualitative data is collected, how should it be analyzed and represented? Some general questions also need to be asked (Preece et al., 1994): Is the technique reliable? Will the approach measure what is intended, i.e., what is its validity? Are biases creeping in that will distort the results? Are the results generalizable, i.e., what is their scope? Is the evaluation ecologically valid, or is the fundamental nature of the process being changed by studying it?
Reliability
The reliability or consistency of a technique is how well it produces the same results on separate occasions under the same circumstances. Different evaluation processes have different degrees of reliability. For example, a carefully controlled experiment will have high reliability. Another evaluator or researcher who follows exactly the same procedure should get similar results. In contrast, an informal, unstructured interview will have low reliability: it would be difficult if not impossible to repeat exactly the same discussion.
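One common way to put a number on this kind of consistency, though the text does not prescribe it, is a test-retest correlation: run the same procedure twice and correlate the two sets of measurements. The sketch below computes Pearson's correlation coefficient by hand; the function name `pearson_r` and the task times are hypothetical illustrations, not real results. Values close to 1 indicate that the two runs produced closely matching patterns of results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean completion times (seconds) for the same five tasks,
# obtained in two separate runs of the same experimental procedure.
run_a = [42, 55, 61, 70, 88]
run_b = [40, 57, 60, 73, 85]
print(round(pearson_r(run_a, run_b), 3))
```

A high correlation only shows that the two runs agree on which tasks take longer; it would not reveal a constant offset between them, so it is worth comparing the raw differences as well.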
Validity
Validity is concerned with whether the evaluation technique measures what it is supposed to measure. This encompasses both the technique itself and the way it is performed. If, for example, the goal of an evaluation is to find out how users use a new product in their homes, then it is not appropriate to plan a laboratory experiment. An ethnographic study in users' homes would be more appropriate. If the goal is to find average performance times for completing a task, then counting only the number of user errors would be invalid.
Biases
Bias occurs when the results are distorted. For example, expert evaluators performing a heuristic evaluation may be much more sensitive to certain kinds of design flaws than others. Evaluators collecting observational data may consistently fail to notice certain types of behavior because they do not deem them important. Put another way, they may selectively gather data that they think is important. Interviewers may unconsciously influence responses from interviewees by their tone of voice, their facial expressions, or the way questions are phrased, so it is important to be sensitive to the possibility of biases.
Scope
The scope of an evaluation study refers to how much its findings can be generalized. For example, some modeling techniques, like the keystroke model, have a narrow, precise scope. The model predicts expert, error-free behavior so, for example, the results cannot be used to describe novices learning to use the system.
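To make the scope point concrete: the keystroke model predicts task time by adding up standard times for primitive operators. The sketch below is a minimal illustration, assuming the operator times commonly cited from Card, Moran and Newell's keystroke-level model (K = 0.20 s for an average skilled typist, P = 1.10 s, H = 0.40 s, M = 1.35 s). The function name `predict_seconds` and the operator sequence in the example are our own illustrative choices; in practice the M operators are placed using the model's heuristic rules rather than by hand.

```python
# Commonly cited keystroke-level model operator times, in seconds.
# K assumes an average skilled typist; a mouse-button click is counted as K.
OPERATOR_SECONDS = {
    "K": 0.20,  # press a key or mouse button
    "P": 1.10,  # point at a target with the mouse
    "H": 0.40,  # move a hand between keyboard and mouse
    "M": 1.35,  # mentally prepare for the next step
}


def predict_seconds(sequence: str) -> float:
    """Sum operator times for an expert, error-free operator sequence."""
    unknown = set(sequence) - set(OPERATOR_SECONDS)
    if unknown:
        raise ValueError(f"unknown operators: {sorted(unknown)}")
    return sum(OPERATOR_SECONDS[op] for op in sequence)
```

For instance, `predict_seconds("HMPKHMKKKK")` (reach for the mouse, think, point at a field, click, return to the keyboard, think, type four digits) comes to about 5.6 s. Because every operator time assumes skilled, error-free performance, that figure says nothing about how long a novice would take, which is exactly the narrow scope described above.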
Ecological validity
Ecological validity concerns how the environment in which an evaluation is conducted influences or even distorts the results. For example, laboratory experiments are strongly controlled and are quite different from workplace, home, or leisure environments. Laboratory experiments therefore have low ecological validity because the results are unlikely to represent what happens in the real world. In contrast, ethnographic studies intervene as little as possible in the environment being studied, so they have high ecological validity.
Ecological validity is also affected when participants are aware of being studied. This is sometimes called the Hawthorne effect after a series of experiments at the Western Electric Company's Hawthorne factory in the US in the 1920s and 1930s. The studies investigated changes in the length of the working day, heating, lighting, etc., but eventually it was discovered that the workers were reacting positively to being given special treatment rather than just to the experimental conditions.