Digitising the New York Times
CAPTCHA stands for Completely Automated Public Turing Test To Tell Computers and Humans Apart. The term was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford of Carnegie Mellon University, who developed the CAPTCHA programme. To us mere mortals it often appears as some hermetic, Arabic-looking script, so heavily distorted that even humans can’t read it. They’ve since gone a step further, though. I may be slow in picking up on this, but I noticed that the ‘easier’ CAPTCHAs look like old printed type, and sure enough, that’s exactly what they are!
A CAPTCHA is a program that can tell whether its user is a human or a computer. CAPTCHAs are used by many websites to prevent abuse from “bots,” or automated programs usually written to generate spam. No computer program can read distorted text as well as humans can, so bots cannot navigate sites protected by CAPTCHAs.
About 200 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into “reading” books.
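As a quick sanity check on that arithmetic, here is a minimal sketch that simply multiplies the two quoted figures. Both inputs are the rough estimates given above and the variable names are my own. Taken at face value they come out well above the quoted 150,000 hours, so "more than" is doing a lot of work in that sentence.

```python
# Figures quoted in the paragraph above; both are rough estimates.
captchas_per_day = 200_000_000
seconds_per_captcha = 10

# Total human time spent per day, converted from seconds to hours.
total_seconds = captchas_per_day * seconds_per_captcha
total_hours = total_seconds / 3600

print(f"{total_hours:,.0f} hours per day")  # prints "555,556 hours per day"
```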
In an attempt to archive human knowledge digitally, multiple projects are currently digitizing physical books that were written before the computer age. The book pages are scanned as images and then transformed into text using “Optical Character Recognition” (OCR). The conversion matters because, whilst page images are readable by humans, their text isn’t searchable and cannot be indexed; images are also much larger than plain text and harder to store.
reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.
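To picture the hand-off described above, here is a rough sketch of how flagged words might be picked out of an OCR result. It assumes the OCR step reports a confidence score per word; the `OcrWord` type, the 0.8 threshold and the sample page are all hypothetical and are not taken from reCAPTCHA or from any real OCR library.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str          # what the OCR program thinks the word says
    confidence: float  # hypothetical score between 0.0 and 1.0


def words_needing_humans(words, threshold=0.8):
    """Return the words the OCR step was unsure about (threshold is an assumption)."""
    return [w for w in words if w.confidence < threshold]


# Made-up OCR output for a few words from a scanned page.
page = [
    OcrWord("morning", 0.97),
    OcrWord("upon", 0.93),
    OcrWord("arn1y", 0.41),  # smudged print, low confidence
]

flagged = words_needing_humans(page)
print([w.text for w in flagged])  # prints "['arn1y']"
```

Only the flagged words would go out as CAPTCHA images; the confident ones stay as they are.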
I was sold by this point and thought it absolutely novel and twee, but I couldn’t help wondering how they know that what we’re entering is correct. The trick is that one of the two words is a control word whose answer is already known, seeded in alongside the unknown word and usually taken from the same source. If you type the control word correctly, the project assumes you have entered the unknown word correctly too and records your answer. Once enough people have entered that word in the same fashion, it is accepted as correct with higher confidence.
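The control-word idea can be sketched in a few lines. This is only my illustration of the scheme as described above, not reCAPTCHA’s actual code: the function names, the word id `w17` and the three-vote threshold are all assumptions.

```python
from collections import Counter, defaultdict

MIN_VOTES = 3  # assumed number of matching answers before a reading is trusted

# For each unknown word id, count how often each reading was typed.
votes = defaultdict(Counter)


def normalise(text):
    """Compare answers case-insensitively and ignore stray spaces."""
    return text.strip().lower()


def submit(control_answer, control_typed, unknown_id, unknown_typed):
    """Return True if the user passes; record their guess for the unknown word."""
    if normalise(control_typed) != normalise(control_answer):
        return False  # failed the known word, so the other guess is not trusted
    votes[unknown_id][normalise(unknown_typed)] += 1
    return True


def accepted_reading(unknown_id):
    """Return the agreed reading once enough people typed the same thing, else None."""
    tally = votes.get(unknown_id)
    if not tally:
        return None
    word, count = tally.most_common(1)[0]
    return word if count >= MIN_VOTES else None


# Three people pass the control word "upon" and agree on the unknown word.
for _ in range(3):
    submit("upon", "upon", "w17", "army")
# One person fails the control word, so their guess is ignored.
submit("upon", "up0n", "w17", "arny")

print(accepted_reading("w17"))  # prints "army"
```

Note that the user is only ever graded on the control word; the unknown word is decided by agreement between many users.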
The only downside to this project is that at present they’re digitizing old editions of the New York Times, which isn’t of much benefit to mankind as a whole IMHO, but such is life. If you’re REALLY bored, you can click here to answer reCAPTCHAs just to contribute to the project.
More by von Ahn
Matchin’ is a covert experiment in artificial intelligence. Every time players agree on a picture, it’s tagged as prettier. Von Ahn, a 28-year-old professor of computer science at Carnegie Mellon, will put the game online this summer, and as thousands of people play it, his database of 100,000 photos will be imbued with something quintessentially human: an aesthetic sensibility, encoded as a ranking of attractiveness.
The game basically tricks humans into teaching computers what constitutes prettiness. If enough people play Matchin’ — and von Ahn’s previous games have garnered millions of play-hours — it could eventually rate the appeal of every image on the Internet. Google could incorporate the ratings into its search engine, so you could search specifically for “beautiful” pictures of houses, people, or landscapes.
“People are good at figuring out what’s attractive, and computers are good at quickly searching and finding,” von Ahn says. “You put them together, and bang!”
This is “human computation,” the art of using massive groups of networked human minds to solve problems that computers cannot. Ask a machine to point to a picture of a bird or pick out a particular voice in a crowd, and it usually fails. But even the most dim-witted human can do this easily. Von Ahn has realized that our normal view of the human-computer relationship can be inverted. Most of us assume computers make people smarter. He sees people as a way to make computers smarter.
Odds are you’ve already benefited from von Ahn’s work. Like when you type in one of those stretched and skewed words before getting access to a Yahoo email account or the Ticketmaster store. That’s a Captcha, which von Ahn developed in 2000 to thwart spambots. Or there’s von Ahn’s picture-labeling games, which have lured thousands of bored Web surfers into tagging 300,000 photos online — doing it so effectively that Google bought his idea last year to improve its Image Search engine.
Above excerpt from Wired Magazine (16.07), “For Certain Tasks, the Cortex Still Beats the CPU” by Clive Thompson.