Ockey, G. J. (2009). Developments and challenges in the use of computer-based testing for assessing second language ability. Modern Language Journal, 93(Focus Issue), 836-847.
Cummins, P. W., & Davesne, C. L. (2009). Using electronic portfolios for second language assessment. Modern Language Journal, 93(Focus Issue), 848-867.
Ockey argues that computer-based testing has failed to realize its anticipated potential. Describe and discuss his reasons for this view, and explain why you either agree or disagree with him.
Cummins & Davesne offer an alternative to CBT with electronic portfolios. Comment on some of the ideas from this article that you'd be interested in trying out in your own classroom.
Ockey states that computer-based testing (CBT) has failed to realize its anticipated potential due to a number of factors. The key issues involve challenges and limitations in delivery formats, computer-adaptive testing, test security, and the assessment of language skills.
One major concern for CBT is ensuring test security. For large-scale, high-stakes testing, it is difficult and expensive to develop item banks large enough to keep tests secure. Web-based testing (WBT) makes assessment possible from any computer in the world, but it creates challenges for confirming a test taker's identity. The storage and retrieval of assessment instruments are also vulnerable to security breaches by computer hackers, though this is becoming less of a concern with continued improvements in internet technology.
Another area of controversy concerns the assessment of language skills and test/task formatting. Language ability is traditionally divided into four skills: reading and listening (receptive skills) and writing and speaking (productive skills). Initially, test developers focused on assessing these skills independently; more recently, however, CBT has moved toward tasks that integrate receptive and productive skills. Developers must also consider which task types to use and how authentic those tasks should be.
Test scoring has proven to be a major challenge as well, especially in the assessment of writing. Automated essay-scoring (AES) systems draw on corpus linguistics to evaluate text, but critics insist that although computers can assess the mechanics of writing quite accurately, they cannot interpret the feeling and meaning attached to it. So there still appears to be a place for human raters, and computer-human hybrid approaches seem to be a fair compromise.
It is this combination of unresolved problems and related issues that has shaped CBT development and Ockey's view that CBT has failed to reach its potential. He remains optimistic, however, stating that it will be worthwhile to continue addressing these challenges in the future.
The concept of electronic portfolios is a refreshing one. There are situations in which formal testing, as outlined in Ockey's article, is necessary and appropriate; these formal assessments offer a "snapshot" of a person's abilities at a given moment. An electronic portfolio (EP), on the other hand, evaluates language ability and progress over time and incorporates a component of self-assessment. I think self-assessment is crucial in language learning, echoing Alderson's view that "without self-assessment there can be no self-awareness" (p. 851). This self-awareness helps learners set goals and take responsibility for their own learning, and I would like to incorporate this notion of self-assessment into my own classroom.
I think that an electronic portfolio says much more about a person's ability to function in another language than a test score alone does. I thought it was interesting that the Common European Framework of Reference for Languages (CEFR) assessment scale makes an additional distinction between spoken production and spoken interaction, because I think that these two contexts of "speaking" do indeed require different skills.
Electronic portfolios seem better suited to address concerns about authenticity, learning and assessment that reflect "real life" situations, and the assessment of intercultural competence. I think EPs ought to be promoted and integrated into high school and college language curricula, because I see great value in this approach to language assessment. In ESL and foreign language programs, the positive "can do" format of EP assessments has the potential to encourage learner motivation and aid collaboration.
Hi Diana,
Good point that it is only the mechanics of writing that computers are able to measure. For some reason, this is the issue in the article that I have been wondering about most. If it is obvious that they can only measure the mechanics, and not real communication, then why are computers being used to measure writing at all? It makes me think of a mechanical machine with nuts and bolts and gears. A machine can be very beautiful, wonderfully complex, and impressive... and yet completely useless if it does not work!
Your post got me thinking: I have no idea HOW a program can actually measure an essay or spoken language (and I have absolutely no experience in this area), but my guess is that computers might give a more valid assessment of a person's speech than of their writing. What do you think?
For example, it does not take a great deal of training for a person to learn to spell correctly, to string together a few more-or-less grammatically coherent sentences, and to throw in a few low-frequency vocabulary words to impress the computer. Lots of people who have not even learned to speak or listen in the target language can do that, and in some cases you might not even be able to call it true communication.
However, if there are programs that can accurately recognize whether a person is pronouncing words or phrases clearly and coherently, I think that might be a much more valid assessment of a person's actual communicative ability (as opposed to their ability to memorize a range of vocabulary and "good" sentence patterns for essays, etc.). But I suppose you might have to call that making an inference, if you are going on the assumption that pronunciation correlates with general communicative ability. I still don’t think a computer will ever be able to measure whether a person is actually making sense or not. Maybe a person could talk complete nonsense, but still get a high score if their pronunciation is not too bad!
Hi Diana,
I see that you noted that Ockey remains optimistic about CBT because it is still in use and proving its worth in many testing centers.
As for EPs, I agree with you that they can be very valuable for motivating ESL students in their SLA.
Hi Diana, your blog is getting pretty popular, probably because you do such a good job. I like your enthusiasm for EPs; I too feel that they offer a good alternative to standardized tests of someone's language ability.
Scott, it sounds like you've given this quite some thought... I guess I didn't consider it that deeply. I definitely think computers have their limitations in assessing all aspects of writing, and current technology also poses some difficulty in assessing speaking skills; computers might not interpret speech correctly if the learner has a heavy accent, for example. Looking at the distinction made in the Cummins article between spoken production and spoken interaction, I think that computers would do a better job interpreting production than interaction. All in all, that is not a comprehensive view of a learner's language abilities. Thanks for sharing your thoughts ~ I appreciate it!
Haifa and Chris ~ thanks too for your replies! It's nice that someone reads what I write...
Great dialogue here, guys! Scott raises some great points. I am very skeptical about computers assessing speech and writing.