Software Projects

Google Goggles isn’t dead

Posted in Uncategorized by rmt on February 4, 2017

Google Goggles is an app that first appeared in 2009 and was ahead of its time, yet it was too unfocused to gain widespread user acceptance despite topping 10 million downloads. Marketed as a general way to digitally identify and make sense of anything you could point your camera at, Google Goggles left many users confused about what exactly to do with it. Now, several years later, the app has been removed from the iOS App Store, and the Android version hasn’t been updated in two and a half years. But despite the fading relevance of the app itself, the technologies that made up Google Goggles are very much alive, having been chopped up and redistributed among Google’s other offerings.

General object identification in images: Cloud Vision API

The Google Cloud Vision API recognizes things like landmarks, artwork, and products. The Goggles app collected data to train the Cloud Vision API models in the same way that the 1-800-GOOG-411 telephone directory assistance service collected voice data to train Google’s speech recognition models. For example, data is gathered in part by Goggles’ “Search from camera” mode, which uploads the photos you take with your phone camera. Now the resulting object recognition capability is available for a fee through the Cloud Vision API, where Google continues to gather data and improve its models. Image-based search is also available through the Google search engine and a shortcut in the Chrome web browser.
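
As an illustration, a Cloud Vision request is a small JSON document pairing base64-encoded image bytes with the feature types you want. A minimal sketch in Python of building one (the feature type names come from the public v1 API; the helper name and the image bytes are placeholders of my own):

```python
import base64
import json

def build_vision_request(image_bytes, feature_type="LABEL_DETECTION", max_results=5):
    # Build the JSON body for a Cloud Vision v1 images:annotate call
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": feature_type, "maxResults": max_results}],
        }]
    }

# For landmark recognition, swap in the LANDMARK_DETECTION feature type
body = build_vision_request(b"<image bytes here>", "LANDMARK_DETECTION")
print(json.dumps(body, indent=2))
```

The same body structure works for the other annotate feature types (logos, text, and so on); only the `type` string changes.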

On-device frame-based processing, text detection and OCR, and barcode scanning: Mobile Vision API

Google’s Mobile Vision API is a separate system from the Cloud Vision API that works offline – that is, it can run on a phone when there’s no network connection available because it functions as a part of Google Play Services. It provides optical character recognition that’s based on Tesseract and currently works for languages with Latin-based alphabets. The Mobile Vision API also provides the same optical flow-based object tracking that the Goggles app used. The Mobile Vision API is extensible and developer-friendly too. If a developer wants to implement a custom image processing system such as, say, overlaying graphics onto faces like the Snapchat dog face filter, they can do that with the Mobile Vision API.
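
The pipeline idea behind the Mobile Vision API (detectors find items in each camera frame, processors consume the results, for example to draw overlays) can be sketched in Python. The class and method names below are illustrative stand-ins of my own, not the real Java interface:

```python
class Detector:
    """Finds items of interest in a single frame."""
    def detect(self, frame):
        raise NotImplementedError

class UppercaseWordDetector(Detector):
    # Toy stand-in for a real text detector: "detects" all-uppercase words
    def detect(self, frame):
        return [w for w in frame.split() if w.isupper()]

class CollectingProcessor:
    """Receives detection results for each frame (e.g. to draw an overlay)."""
    def __init__(self):
        self.results = []
    def receive_detections(self, detections):
        self.results.append(detections)

def run_pipeline(frames, detector, processor):
    # Feed each frame through the detector, then hand the results to the processor
    for frame in frames:
        processor.receive_detections(detector.detect(frame))

processor = CollectingProcessor()
run_pipeline(["STOP sign ahead", "no text"], UppercaseWordDetector(), processor)
print(processor.results)
```

Swapping in a different detector (faces, barcodes, custom graphics overlays) leaves the rest of the pipeline unchanged, which is the extensibility point the API exposes to developers.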

Translation of text visible from the camera: Google Translate

The text translation features originally available in Goggles have been superseded by those now available in the Google Translate app. Google Translate provides a better interface than Goggles because the language is already selected by the user, eliminating the need to identify the language of printed text in a given image and thereby removing a potential source of error. Further, Google Translate’s on-device image processing allows for fast OCR that enables a quick translation at the per-word (but not per-sentence) level.

Exploring your world with a camera: Google Cardboard VR headset

The idea of parsing and augmenting what you see using image processing is incorporated into Google Cardboard. The headset is designed so that the wearer can still receive input from the phone’s camera while wearing it. We may start to see Street View and dashcam-gathered data integrated into this type of augmented reality system.

New hardware-software integration: Pixel handset

Google Goggles uses an image blur detection algorithm to determine when the device camera is out of focus, triggering the camera autofocus cycle in response, and thereby setting up the camera input for optimal scanning. A similar integration of software and hardware is used in the accelerometer-based camera stabilization incorporated into the Pixel’s high-end camera, which provides a smooth and fast camera input even when the user has shaky hands.
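
Goggles’ exact blur metric isn’t public; a common stand-in is the variance of the image’s Laplacian, since a sharp image produces strong second-derivative responses and a defocused one doesn’t. A rough sketch in plain Python (the function names and the threshold are arbitrary placeholders of my own):

```python
def laplacian_variance(img):
    """Variance of the 4-neighbor Laplacian over a grayscale image (list of rows)."""
    vals = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            lap = (img[y - 1][x] + img[y + 1][x] +
                   img[y][x - 1] + img[y][x + 1] - 4 * img[y][x])
            vals.append(lap)
    mean = sum(vals) / float(len(vals))
    return sum((v - mean) ** 2 for v in vals) / float(len(vals))

def needs_refocus(img, threshold=100.0):
    # A low Laplacian variance suggests the frame is out of focus
    return laplacian_variance(img) < threshold

flat = [[128] * 10 for _ in range(10)]  # featureless "blurry" frame
checker = [[255 * ((x + y) % 2) for x in range(10)] for y in range(10)]  # sharp edges
print(needs_refocus(flat), needs_refocus(checker))  # True False
```

In an app, a `True` result from a check like this would be the trigger for requesting an autofocus cycle before attempting a scan.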

Future capabilities

When the Goggles app was split apart, its on-device capabilities ended up in the Mobile Vision API, and its cloud-based capabilities ended up in the Cloud Vision API. Given current trends, the Mobile Vision API is best positioned to gain important capabilities over the coming years. As device CPU speeds increase and multicore handsets proliferate, more and more powerful image processing will be able to run on the device itself, without the slowdown of transmitting images to a cloud-based API. Video input, as an alternative to one-frame-at-a-time image processing, will become more achievable. Developers will have the flexibility to build a variety of apps around Google’s models through these APIs. Users will be able to make sense of image-based data with more speed and clarity. We’ll see more apps serving as a generalized scanner for all camera input, recognizing objects of all types. The incremental improvements spearheaded by Google will continue to power new apps as the company turns its old experiments into new APIs and products.

Parallel Machine Translations as an Aid to Human Translators

Posted in Uncategorized by rmt on October 18, 2015

Anyone who’s used machine translation tools like Google Translate knows that machine translation is an inexact science. Mistakes and mistranslations are common, and the accuracy of machine translations seems to range from “acceptable” to “completely wrong”. Clearly, there is an opportunity for new approaches to help people get better results from machine translation.

Translation inaccuracies arise from imperfect language models and unclear input text. Machine translation systems tend to have problems when translating between languages that are linguistically very different from one another. The format of the text makes a huge difference too: input that is short or idiomatic also leads to inaccuracies.

Even with these inaccuracies, the resulting imperfect translations can leave the user with the gist of the original meaning, or at least a hint or a starting point for a better translation. Human translators and those with some proficiency in the target language will often use more than one system (both Google and Microsoft, for example) to help them compare results and choose between the translations.

Below is a link to an Android app I’ve developed that performs machine translations in parallel using different systems to make this comparison easier. The app shows results on the screen at the same time from different machine translation systems: Google Translate, Microsoft Translator, and Yandex Translate. I’d be interested in getting feedback on whether this approach is helpful for mobile users.
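
The core pattern, querying several translation backends at once and collecting the results side by side, can be sketched with a thread pool. The translator functions below are stubs standing in for the real Google, Microsoft, and Yandex clients (the network calls, API keys, and canned Spanish outputs are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

# Stub backends standing in for real translation API clients
def google_translate(text):
    return "hola mundo"

def microsoft_translate(text):
    return "hola, mundo"

def yandex_translate(text):
    return "hola mundo!"

def translate_in_parallel(text, backends):
    """Query every backend concurrently; return a name -> translation map."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in backends.items()}
        return {name: f.result() for name, f in futures.items()}

results = translate_in_parallel("hello world", {
    "Google Translate": google_translate,
    "Microsoft Translator": microsoft_translate,
    "Yandex Translate": yandex_translate,
})
for name, translation in sorted(results.items()):
    print(name, "->", translation)
```

Running the requests concurrently means the user waits only as long as the slowest backend, rather than the sum of all three round trips.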

Android app on Google Play

Receiving OCR Progress Updates when Using Tesseract on Android

Posted in Uncategorized by rmt on December 17, 2014

The running time required to perform optical character recognition is influenced by the size of the image and the language of the text being recognized. When running OCR on more than a small block of text, or when using a language with many characters (like Chinese), the time delay required to perform OCR can be an annoyance to users.

To provide a better user experience, I’ve added some code written by Renard Wellnitz for Text Fairy to provide a progress callback method to the tess-two API. Objects implementing this method will receive updates during OCR with the percent complete and coordinates of the bounding box around the word that the OCR engine is currently working on.

The progress percentage can be used in a thermometer-style ProgressBar. The bounding boxes can be drawn on top of the display of the input image during recognition.

Implementing this callback requires using an alternate constructor for the TessBaseAPI object and implementation of the ProgressNotifier interface:

// Get the ProgressBar from the layout (the view id here is a placeholder)
ProgressBar progressBar = (ProgressBar) findViewById(R.id.progress_bar);

// Create the TessBaseAPI object, and register to receive OCR progress updates
TessBaseAPI baseApi = new TessBaseAPI(this);

// Called during OCR with the percent complete and the current word’s bounding box
@Override
public void onProgressValues(ProgressValues progressValues) {
    progressBar.setProgress(progressValues.getPercent());
}

Building an Apertium Standalone Language Pair Translation Jar Package for Android on Ubuntu

Posted in Uncategorized by rmt on June 16, 2013

This is a procedure for creating standalone packages that can be bundled with Android apps to support in-app language translation while offline – that is, without a cellular or Wi-Fi data connection.

Requires: Android SDK

Install required packages:
sudo apt-get install subversion libxml2-dev xsltproc flex libpcre3-dev gawk libxml2-utils

Get Apertium repository code:
svn co apertium

Compile and install lttoolbox:
cd apertium/trunk/lttoolbox
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./
sudo make install
sudo ldconfig

Compile and install apertium:
cd ../apertium
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./
sudo make install
sudo ldconfig

Compile and install lttoolbox-java:
cd ../lttoolbox-java
sudo make install

Compile a language pair (for example, English-Spanish):
(Android-related note: If you see ‘you don’t have cg-proc installed’, then this pair requires the constraint grammar package, so it is not Android compatible.)
cd ../apertium-en-es
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./
sudo make install
echo 'test' | apertium en-es

Create a symbolic link in the former location of Android’s ‘dx’:
cd /home/$USER/android-sdk-linux/platform-tools
ln -s ../build-tools/17.0.0/dx dx

Compile the standalone package for a language pair (for example, English-Spanish):
cd apertium/trunk/lttoolbox-java
export LTTOOLBOX_JAVA_PATH='/usr/local/share/apertium/lttoolbox.jar'
export ANDROID_SDK_PATH="/home/$USER/android-sdk-linux"
./apertium-pack-j /usr/local/share/apertium/modes/en-es.mode /usr/local/share/apertium/modes/es-en.mode

At this point apertium-en-es.jar has been created in apertium/trunk/lttoolbox-java.



Looking For Words With an Edit Distance of 1 or 2 From Other Words

Posted in Uncategorized by rmt on November 3, 2012

This code is a modification of Peter Norvig’s spelling corrector that adds the closest_nearby_word() method, which identifies the most-frequently-seen correctly-spelled word that has an edit distance of 1 or 2 from the given correctly-spelled word.

# -*- coding: utf-8 -*-

# An altered version of Peter Norvig's spelling corrector

from collections import Counter
import re

# Get the whitespace-delimited words from a text, minus any punctuation
def words(text): return re.findall('[a-z]+', text.lower())

# Count the frequency with which each word occurs
def train(features):
    model = Counter()
    for f in features:
        model[f] += 1
    return model

# Run training using a book with words we'll consider to be spelled correctly
NWORDS = train(words(file('big.txt').read()))

# Get strings with an edit distance of 1 from the given word
def edits1(word):
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
    replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

# Get known words with an edit distance of 2 from the given word
def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

# Get any known words matching the given mutated words
def known(words):
    return set(w for w in words if w in NWORDS)

# Suggest a correction by mutating a word and choosing the most likely replacement
# based on how often the mutated words appear in the trained model NWORDS
def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)

# Get the likeliest correctly-spelled word within an edit distance of 1 or 2
def closest_nearby_word(word):
    nearby = set()
    for e in known(edits1(word)) | known_edits2(word):
        if e != word:
            nearby.add(e)
    if not nearby: return set()
    return max(nearby, key=NWORDS.get)

#print correct('speling')

# Run some test cases for finding "nearby" words
for w in frozenset(['rational', 'woman', 'rogue', 'effect', 'started', 'rein',
                    'scalded', 'mislead', 'reality', 'whit', 'marshal', 'voila',
                    'aide', 'tiered', 'county', 'fires', 'stated', 'soldier',
                    'beset', 'affect', 'vice', 'wreck', 'spayed', 'complimentary',
                    'their', 'principal', 'moral', 'especially', 'steal',
                    'personal', 'why', 'heroine', 'descendant', 'baited',
                    'interested', 'sole', 'think', 'physics', 'corps', 'discrete']):
    print w, "-", closest_nearby_word(w)


With a better training corpus, this method could possibly be used to identify misspelled words that are overlooked by most spell checkers. But better starting points are available.

think - thing
corps - crops
stated - states
baited - waited
aide - side
beset - best
fires - fire
scalded - scolded
moral - morel
whit - what
principal - principals
wreck - wrack
personal - set([])
heroine - heroin
reality - set([])
their - theirs
interested - set([])
voila - set([])
woman - women
rational - national
started - stated
sole - some
effect - effects
rogue - vogue
affect - effect
why - who
descendant - descendants
county - count
spayed - stayed
especially - specially
vice - voice
physics - set([])
discrete - discreet
tiered - tired
mislead - misled
soldier - soldiers
rein - vein
complimentary - complementary
steal - steel
marshal - marshall

A Continuous Floating-Point Adaptation of Conway’s Game of Life

Posted in Uncategorized by rmt on October 13, 2012

This video shows a continuous, floating-point adaptation of Conway’s Game of Life.

I rendered this video by running Ready using the SmoothLifeL parameter set. World size is 2048×2048. Stephan Rafler’s paper explains the math behind the video. Rendering took eight hours on my video card.

Frames were converted to video using:

ffmpeg -s hd1080 -r 30 -b 9600 -i frame_%06d.png video.mp4

Running the Apertium Translator on Android: Offline Machine Translation on a Mobile Device

Posted in Uncategorized by rmt on September 26, 2012

A few days ago I put together an Android app that runs the Apertium open source rule-based translation system.

For his excellent 2012 Google Summer of Code project, Mikel Artetxe converted several of the Apertium language pairs into standalone, self-contained Jar files that have no outside dependencies. The end result is a set of cool little embeddable modules that do real language translation on mobile devices without requiring a network connection at all.

One of the challenges I ran into while making this app was that the default Proguard settings cause the translation system to crash at runtime. To avoid this, I tried a whole bunch of settings and finally found a set of Proguard directives that preserve the Apertium offline translation, avoiding the crashing:

# Preserve Apertium offline translation
-dontwarn java.awt.**
-dontwarn javax.swing.**
-dontwarn org.apertium.ApertiumGUI
-keepattributes InnerClasses
-keep class org.apertium.**
-keepclassmembers class * {
    public *; protected *; private *;
}

Together, these settings preserve the translation-related classes packaged in the Apertium Jar files, saving them from Proguard’s obfuscation process that leads to crashing. These Proguard settings can be added on top of the default Android Proguard optimizations by adding the following line to the file:


With these settings in place, the translations work correctly, with the optimized translation results being returned after about 5000ms for a short phrase on my phone.

Overall, I’m pleased with the Android offline translation app results, particularly because these translations work without requiring a network connection. Another advantage is that this system can do translations for a few languages that are currently unsupported by Google Translate and Bing Translator.

UPDATE (Jan. 1, 2013)
A new version of the offline translation app is available.

Counting Word Frequency with Python

Posted in Uncategorized by rmt on September 26, 2012

A Python script to crudely tally the number of times a space-delimited word appears in a text file. Uses a Counter object to track the number of occurrences for each word.

# -*- coding: utf-8 -*-
"""
Created on Wed Sep 26 20:04:11 2012

Returns the most-common space-delimited words in a file.

@author: robert
"""
from collections import Counter
import re

def openfile(filename):
    fh = open(filename, "r+")
    text = fh.read()
    fh.close()
    return text

def removegarbage(text):
    # Replace one or more non-word (non-alphanumeric) chars with a space
    text = re.sub(r'\W+', ' ', text)
    text = text.lower()
    return text

def getwordbins(words):
    cnt = Counter()
    for word in words:
        cnt[word] += 1
    return cnt

def main(filename, topwords):
    txt = openfile(filename)
    txt = removegarbage(txt)
    words = txt.split(' ')
    bins = getwordbins(words)
    for key, value in bins.most_common(topwords):
        print key,value

main('speech.txt', 500)

Example output:

the 235
and 161
to 132
of 125
that 101
a 91
in 83
we 70
is 54
our 40
who 39
for 39
people 37
not 36
are 32
on 29
it 28
be 28
their 27
must 26
have 25
those 25
will 25
with 25
as 22
world 21
this 19
all 18
america 18
because 18
from 17
they 17
i 17
an 17


Using an SSH config file to make git push use the right SSH key for multiple domains

Posted in Uncategorized by rmt on August 27, 2012

Use a file called config in your .ssh directory to direct ssh/git to use the correct SSH private key, depending on which domain you connect to.

An example that uses the correct SSH key when using git push to save code to either GitHub or Heroku:

Host github.com
  IdentityFile C:\Users\Robert\Documents\keystore\github-id_rsa

Host heroku.com
  IdentityFile C:\Users\Robert\Documents\keystore\heroku-pc2-id_rsa

Removing a package in Gentoo Linux

Posted in Uncategorized by rmt on August 2, 2012

Ensure there are no reverse dependencies:

equery depends packagename

Preview the unmerge without making any changes:

emerge --unmerge --pretend packagename

Unmerge the package, leaving config files in place:

emerge --unmerge packagename