Thimblescan: A smarter way to read

What is Thimblescan?

The availability of inexpensive devices such as the Arduino and Raspberry Pi, users' growing familiarity with touch interactions on digital devices, and the persistence of printed information on paper in an increasingly digital environment provide an interesting opportunity to explore new interactions. This project describes the construction of a scanner that fits on the index finger, in the form of a thimble, and acts as a miniature scanner for textual information. The scanner can be used to read short words, phrases, or URLs (limited text length) printed on physical paper and transmit them to an attached computer, thereby converting physical text into digital text. While scanning, the user can perform a few familiar touch input actions, such as single taps, double taps, and long presses, to control the scanning of text and to direct different functions to be performed on the scanned text once it is available as digital text on the attached computer.

Our Thimblescan Prototype.

Team and Role: We were a team of two HCI master's students. I took the lead on this project and was mainly responsible for the concept and for outlining the pipeline. In terms of implementation, I worked on the Raspberry Pi hardware interface and form factor, and on the communication between the Pi and the computer, while my teammate worked on stitching and OCR of the captured images.

Skills used: Interaction design, Python, Raspberry Pi prototyping, Bash shell scripting.

Duration of Project: 3 months

Motivation

  • To enable copy-paste from the real, physical world to the digital world
  • To interpret real-world data as seamlessly as data in the virtual world
  • To explore the thimble as a wearable form factor 
    • Capitalize on existing, familiar finger interactions of tapping and touching
    • Use the conditioned "natural" interaction of using a finger to perform a query, as on an iPad or touch-screen surface.

System overview

Symbolic representation of the ThimbleScan concept

Labelled representation of the actual working prototype


 

Overview of the System Architecture and Pipeline:

- ImageMagick: Image processing to remove shadows and clean captured images.

- Hugin: a third-party suite of tools used to stitch the captured images of scanned text into one linear panorama image.

- Tesseract OCR: Optical Character Recognition run over the stitched linear panorama produced by Hugin.

- InputInterpretations.py: our Python script for recognizing the finger gestures.

- gothimble.sh: the master Bash script that ties the whole pipeline together (a sketch of how these steps chain is given below).
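
The pipeline is essentially a chain of external tools. Below is a minimal Python sketch of how those steps could be wired together; the file names, directory layout, and Hugin tool options are illustrative assumptions, not the actual gothimble.sh implementation.

    # pipeline_sketch.py -- illustrative only; file names, directories, and
    # Hugin options are assumptions, not the actual gothimble.sh pipeline.
    import glob
    import os
    import subprocess

    def run(cmd):
        # Run one external tool and stop the pipeline if it fails.
        subprocess.run(cmd, check=True)

    def clean_images(raw_dir="captures", clean_dir="cleaned"):
        # ImageMagick: grayscale and normalize each frame to suppress shadows.
        os.makedirs(clean_dir, exist_ok=True)
        for path in sorted(glob.glob(os.path.join(raw_dir, "*.jpg"))):
            run(["convert", path, "-colorspace", "Gray", "-normalize",
                 os.path.join(clean_dir, os.path.basename(path))])

    def stitch(clean_dir="cleaned", project="scan.pto", prefix="scan"):
        # Hugin command-line tools: create a project, find control points,
        # optimise it, and render one linear panorama of the scanned strip.
        images = sorted(glob.glob(os.path.join(clean_dir, "*.jpg")))
        run(["pto_gen", "-o", project] + images)
        run(["cpfind", "-o", project, project])
        run(["autooptimiser", "-a", "-l", "-s", "-o", project, project])
        run(["hugin_executor", "--stitching", f"--prefix={prefix}", project])
        return prefix + ".tif"

    def ocr(panorama, out_base="scan_text"):
        # Tesseract writes the recognised characters to <out_base>.txt.
        run(["tesseract", panorama, out_base])
        with open(out_base + ".txt") as f:
            return f.read().strip()

    if __name__ == "__main__":
        clean_images()
        print("Recognised:", ocr(stitch()))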

 

 

Demonstration of the working system

Scanning a URL from the physical world and opening it in a browser. The system correctly responds to this command, albeit with some delay.

Copying text from the physical world onto the clipboard. Here we see one of the shortcomings of the system: the OCR incorrectly detects the dollar sign as the numeral '3'.
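
Both demonstrated actions reduce to a small dispatch step on the attached computer: write the OCR output to the clipboard, or hand it to a browser. A minimal sketch of that step, assuming a Linux desktop with xclip installed; the URL heuristic and search URL are illustrative, not the project's exact logic.

    # dispatch_sketch.py -- computer-side actions for the scanned text;
    # assumes a Linux desktop with xclip installed (an illustrative setup).
    import re
    import subprocess
    import webbrowser

    def copy_to_clipboard(text):
        # Pipe the OCR'd text into xclip so it lands on the clipboard.
        subprocess.run(["xclip", "-selection", "clipboard"],
                       input=text.encode(), check=True)

    def open_in_browser(text):
        # If the text looks like a URL, open it directly; otherwise fall
        # back to a web search on the scanned text.
        if re.match(r"^(https?://|www\.)\S+$", text):
            url = text if text.startswith("http") else "http://" + text
        else:
            url = "https://www.google.com/search?q=" + text.replace(" ", "+")
        webbrowser.open(url)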

Interactions

Inputs

  • Long press: signals the system to expect scanned input and to start scanning text from the real world.
  • Single tap: copies the scanned text to the clipboard.
  • Double tap: opens the scanned URL in a browser, or performs a "search" on the scanned text in the browser. (A timing sketch of how these gestures can be told apart follows this list.)
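
Distinguishing these gestures comes down to press duration and the gap between taps. The sketch below shows the idea, assuming the touch contact is read as an active-low input on a single GPIO pin; the pin number and timing thresholds are illustrative assumptions, not the values used in InputInterpretations.py.

    # gesture_sketch.py -- timing-based long-press / single-tap / double-tap
    # detection; the pin number and thresholds are illustrative assumptions.
    import time
    import RPi.GPIO as GPIO

    TOUCH_PIN = 17            # assumed BCM pin wired to the touch contact
    LONG_PRESS_S = 0.8        # hold at least this long to trigger a scan
    DOUBLE_TAP_GAP_S = 0.35   # second tap must arrive within this window

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(TOUCH_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)

    def wait_for(level):
        # Block until the pin reaches the given level, then return the time.
        while GPIO.input(TOUCH_PIN) != level:
            time.sleep(0.01)
        return time.time()

    def read_gesture():
        pressed = wait_for(GPIO.LOW)     # finger down (active-low contact)
        released = wait_for(GPIO.HIGH)   # finger up
        if released - pressed >= LONG_PRESS_S:
            return "long_press"          # start scanning
        deadline = released + DOUBLE_TAP_GAP_S
        while time.time() < deadline:    # short press: look for a second tap
            if GPIO.input(TOUCH_PIN) == GPIO.LOW:
                wait_for(GPIO.HIGH)
                return "double_tap"      # open URL / search in browser
            time.sleep(0.01)
        return "single_tap"              # copy to clipboard

    if __name__ == "__main__":
        try:
            while True:
                print(read_gesture())
        finally:
            GPIO.cleanup()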

Outputs

  • LED feedback for image capture: while a scan is running, the feedback LED glows green (it effectively turns green once a long press is registered), and when the finger is lifted to deactivate the scan, the feedback LED glows red. This lets the user concentrate on the hardware while scanning rather than having to watch the computer for feedback. (A sketch of this feedback logic follows this list.)
  • On-screen output for each gesture: the visual result on the attached computer itself, such as a notification that a clipboard entry has been copied, or the launching of a browser.
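
The LED behaviour described above amounts to switching two GPIO outputs on the scan-state transitions. A minimal sketch, assuming the green and red LEDs sit on separate pins; the pin numbers are illustrative.

    # led_feedback_sketch.py -- scan-state LED feedback; pin numbers are
    # assumptions for illustration.
    import RPi.GPIO as GPIO

    GREEN_PIN, RED_PIN = 23, 24   # assumed BCM pins for the feedback LEDs

    GPIO.setmode(GPIO.BCM)
    GPIO.setup([GREEN_PIN, RED_PIN], GPIO.OUT, initial=GPIO.LOW)

    def scan_started():
        # Long press registered: green on, so the user knows the camera is
        # capturing without having to glance at the computer screen.
        GPIO.output(GREEN_PIN, GPIO.HIGH)
        GPIO.output(RED_PIN, GPIO.LOW)

    def scan_stopped():
        # Finger lifted: red on, confirming the scan has been deactivated.
        GPIO.output(GREEN_PIN, GPIO.LOW)
        GPIO.output(RED_PIN, GPIO.HIGH)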
 

A state diagram for the permitted interactions and their corresponding feedback

 

Applications of such a system

  • Copying details from résumés or visiting cards
  • Looking up links from textbooks
  • Querying words and terms from paper
  • Making quick e-notes on a computer while reading from paper.

Challenges and future work

  • Focal length of camera: for our prototype we used the Pi Camera module that connects to the Raspberry Pi. However, it has a fixed focal distance, which forced us to increase the distance between the camera lens and the text being scanned. This pushed the camera higher up the finger and caused a whole new suite of problems: the field of view became too wide, so for our proof-of-concept prototype we had to custom-print large text on a sheet of paper with no other words around it. Ideally, we would like to mount something like the NanEye camera close to the fingertip so that the user can accurately point at the text of interest while also restricting the FOV.
  • Long runtime: as seen in the video, the runtime for processing text is far too long for practical usage. This is mostly due to the Hugin stitching algorithm that runs over all the captured images. Future work includes optimizing this to run in real time and examining alternatives. 
  • Dimensionality of input: we already use three input gestures; however, the free thumb could also interact with the thimble as a modifier, increasing the input's dimensionality. The thumb's position near the thimble makes it an ideal input location.
  • Limitations of OCR: since we use Tesseract as a third-party OCR tool, we are bound by its limitations and the occasionally incorrect output it produces.

 

*** DETAILED DOCUMENTS AND ACM STYLE PAPERS ON IMPLEMENTATION AND THE SYSTEM ARE AVAILABLE ON REQUEST! ***