Captcha-ing More Than You Know

CAPTCHAs: we all know them, we’ve all used them. They’re for security, right? To tell us from the robots? You’d be right to think that they are a security feature used on hundreds of thousands of websites, or at least that was their initial purpose.


Standing for Completely Automated Public Turing test to tell Computers and Humans Apart, CAPTCHAs came to life in 2000 and were the brainchild of Carnegie Mellon University graduate Luis von Ahn. The project was later redesigned to add further layers of distortion on top of the text to beat the hackers, and was renamed reCAPTCHA. But that wasn’t all: the next development was the true genius behind the system. The proof? Google bought it.

Google Acquires reCAPTCHA

Luis realised that more than 100 million CAPTCHAs were being completed every day. That created an opportunity: words tagged as unreadable during the digitising of books and other printed materials could be used as CAPTCHA phrases. In doing this, Luis provided an excellent example of O’Reilly’s first and second core patterns of Web 2.0 working together: Harnessing Collective Intelligence, as discussed last week, and the idea that ‘Data is the next Intel Inside’.

The second pattern refers to the increased importance of, and reliance on, data in Web 2.0 applications. The reCAPTCHA system exhibits this by fulfilling four of the five best practices of the core pattern.

Seek to own a unique, hard to recreate source of data: Luis von Ahn did just this in the creation of reCAPTCHA, as evidenced through Google’s decision to purchase it from him. Seeing the value and complexity of the system, Google capitalised on the opportunities created through the acquisition.

Enhance the core data: Under the reCAPTCHA project, the data was enhanced and re-purposed to provide a dual benefit: security, along with the deciphering of printed texts.

Let users control their own data: Prior to the acquisition by Google, Luis had created the opportunity for users to submit ‘unreadable’ words to the system for deciphering. It is unclear, however, whether Google now keeps this privilege exclusively for itself.

Define a data strategy: There is a very clear data strategy behind reCAPTCHA. It provides a security system to ensure that only humans are accessing or posting data, and it doubles as a way to decipher texts tagged as unreadable by current text recognition software.
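
To make that dual purpose a little more concrete, here is a minimal Python sketch of how a single challenge could serve both goals. It assumes each challenge pairs a control word whose answer is already known with an unreadable scanned word, and that a transcription is accepted once a few humans agree on it. The function names, the in-memory vote store and the three-vote threshold are all my own assumptions for illustration, not reCAPTCHA's actual implementation.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: unknown-word image id -> tally of transcriptions.
votes = defaultdict(Counter)


def normalise(text):
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def check_response(control_answer, control_guess, unknown_id, unknown_guess):
    """Return True if the user reads the control word correctly.

    Only users who pass the control word are treated as human, so only
    their guess for the unreadable word is recorded as a vote.
    """
    passed = normalise(control_guess) == normalise(control_answer)
    if passed:
        votes[unknown_id][normalise(unknown_guess)] += 1
    return passed


def agreed_transcription(unknown_id, min_votes=3):
    """Return the most common transcription once enough humans agree, else None.

    min_votes=3 is an assumed threshold, chosen only for this example.
    """
    tally = votes[unknown_id]
    if not tally:
        return None
    word, count = tally.most_common(1)[0]
    return word if count >= min_votes else None
```

In this sketch the control word does all of the security work, while the unreadable word is a free ride for the digitising effort: showing the same image to several people and counting matching answers is what guards against individual typos.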

Google really is CAPTCHA-ing more than we thought.


Thanks for reading and please feel free to leave feedback!


– Matt


6 responses to “Captcha-ing More Than You Know”

  1. ngjerfen says :

    Hi Matt,

    Very, very interesting and clear post. I wouldn’t have known that this form of data application was owned by Google! I never knew this would be such a great example of ‘data is the next Intel Inside’. Are there any disadvantages to using reCAPTCHA?


    • Matt08H says :

      Hi Jerfen,
      While there are alternative security measures available, the reCAPTCHA system is a stable security system. The only downside I can see is that once the unreadable phrases are inserted into CAPTCHAs, the strength of the security is arguably halved: instead of two known phrases to be matched, there is now only one, alongside the unreadable text. Thanks for the feedback!

      – Matt

  2. monique says :

    Hey Matt,
    This is really interesting – never knew that a company “owned” (re)CAPTCHA, let alone used them to decipher unreadable text. A very clever idea and no wonder Google scooped it up. Wondering how the system knows that the word the user entered was correct, if it was never known to begin with? Or is that part of the CAPTCHA not actually used for security?

    • Matt08H says :

      Hi Monique,
      Exactly my thoughts! The system uses two phrases in the CAPTCHA: one which is known, providing the security, while the other may be unreadable text. If the user is human and can read the known text, then the user’s response can be used for the unreadable text. Of course, human error is still an issue, so the same image of unreadable text will be used in multiple CAPTCHAs to ensure accuracy.

      – Matt

  3. Alicia says :

    Although I love the idea of reCAPTCHA in principle, the reality is less than amazing! I swear they get more indecipherable every day, or it might just be my eyesight going down the pan! I have even started using CAPTCHA bypass software called rumola to take the frustration out of blogging because, when you hit your 10th CAPTCHA of the morning, it’s a bit much! If anyone else has similar issues I would recommend going to and having a go with the free trial. Definitely decreased the likelihood of me chucking my laptop out of the window with rage!

    • Matt08H says :

      Hi Alicia,

      I agree, sometimes they can be a bit difficult to decipher, but I guess that hassle is all part and parcel of security. Thanks for your feedback!

      – Matt
