Software Projects

Parallel Machine Translations as an Aid to Human Translators

Posted in Uncategorized by rmt on October 18, 2015

Anyone who’s used machine translation tools like Google Translate knows that machine translation is an inexact science. Mistakes and mistranslations are common, and the accuracy of machine translations seems to range from “acceptable” to “completely wrong”. Clearly, there is an opportunity for new approaches to help people get better results from machine translation.

Translation inaccuracies arise from imperfect language models and unclear input text. Machine translation systems tend to have problems when translating between languages that are linguistically very different from one another. The format of the text makes a big difference too: input text that is short or idiomatic also leads to inaccuracies.

Even with these inaccuracies, an imperfect translation can leave the user with the gist of the original meaning, or at least a hint or a starting point for a better translation. Human translators and those with some proficiency in the target language will often use more than one system (both Google and Microsoft, for example) to compare results and choose between the translations.

Below is a link to an Android app I’ve developed that performs machine translations in parallel using different systems to make this comparison easier. The app shows results on the screen at the same time from different machine translation systems: Google Translate, Microsoft Translator, and Yandex Translate. I’d be interested in getting feedback on whether this approach is helpful for mobile users.

Android app on Google Play
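
The app's own code is not shown in this post. As a rough illustration of the general idea only, here is a minimal Python sketch that sends the same text to several translators at once and collects the results side by side. The three "translators" are hypothetical stand-ins that make no network calls; a real client for each service would take their place.

```python
from concurrent.futures import ThreadPoolExecutor

def translate_in_parallel(text, translators):
    """Run every translator on the same text at once; return {name: result}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(translators))) as pool:
        # Submit all requests first so they run concurrently
        futures = {name: pool.submit(fn, text) for name, fn in translators.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failing service should not hide the others
                results[name] = "error: %s" % exc
    return results

# Hypothetical stand-ins for real translation clients (assumptions, not real APIs)
def fake_system_a(text):
    return text.upper()

def fake_system_b(text):
    return text[::-1]

def broken_system(text):
    raise RuntimeError("service unavailable")

results = translate_in_parallel(
    "hola", {"a": fake_system_a, "b": fake_system_b, "c": broken_system})
for name in sorted(results):
    print("%s: %s" % (name, results[name]))
```

Collecting errors per system, rather than failing the whole request, keeps the comparison useful when one service is down.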

Receiving OCR Progress Updates when Using Tesseract on Android

Posted in Uncategorized by rmt on December 17, 2014

The running time required to perform optical character recognition is influenced by the size of the image and the language of the text being recognized. When running OCR on more than a small block of text, or when using a language with many characters (like Chinese), the time delay required to perform OCR can be an annoyance to users.

To provide a better user experience, I’ve added some code written by Renard Wellnitz for Text Fairy to provide a progress callback method to the tess-two API. Objects implementing this method will receive updates during OCR with the percent complete and coordinates of the bounding box around the word that the OCR engine is currently working on.

The progress percentage can be used in a thermometer-style ProgressBar. The bounding boxes can be drawn on top of the display of the input image during recognition.
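
Drawing a bounding box over a scaled-down preview needs a small coordinate conversion. The following Python sketch shows only that arithmetic, not the tess-two API: the function names are hypothetical, and it assumes the preview shows the whole image stretched to the view with no letterboxing.

```python
def scale_rect(rect, image_size, view_size):
    """Map (left, top, right, bottom) from image pixels to view pixels."""
    left, top, right, bottom = rect
    sx = view_size[0] / float(image_size[0])  # horizontal scale factor
    sy = view_size[1] / float(image_size[1])  # vertical scale factor
    return (int(round(left * sx)), int(round(top * sy)),
            int(round(right * sx)), int(round(bottom * sy)))

def on_progress(percent, word_rect, image_size, view_size):
    """Hypothetical handler: returns what a UI would display for one update."""
    return {
        "progress": max(0, min(100, percent)),  # clamp to the bar's range
        "box": scale_rect(word_rect, image_size, view_size),
    }

update = on_progress(42, (100, 50, 300, 90), (2000, 1000), (1000, 500))
print(update["progress"], update["box"])
```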

Implementing this callback requires using an alternate constructor for the TessBaseAPI object and implementation of the ProgressNotifier interface:

ProgressBar progressBar = (ProgressBar) findViewById(;

// Create the TessBaseAPI object, and register to receive OCR progress updates
TessBaseAPI baseApi = new TessBaseAPI(this);

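public void onProgressValues(ProgressValues progressValues) {
    // Update the progress bar with the percent complete reported by the OCR engine
    progressBar.setProgress(progressValues.getPercent());
}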

Building an Apertium Standalone Language Pair Translation Jar Package for Android on Ubuntu

Posted in Uncategorized by rmt on June 16, 2013

This is a procedure for creating standalone packages that can be bundled with Android apps to support in-app language translation while offline, that is, without a cellular or wifi data connection.

Requires: Android SDK

Install required packages:
sudo apt-get install subversion libxml2-dev xsltproc flex libpcre3-dev gawk libxml2-utils

Get Apertium repository code:
svn co apertium

Compile and install lttoolbox:
cd apertium/trunk/lttoolbox
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./
sudo make install
sudo ldconfig

Compile and install apertium:
cd ../apertium
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./
sudo make install
sudo ldconfig

Compile and install lttoolbox-java:
cd ../lttoolbox-java
sudo make install

Compile a language pair (for example, English-Spanish):
(Android-related note: If you see ‘you don’t have cg-proc installed’, this pair requires the constraint grammar package and is therefore not Android compatible.)
cd ../apertium-en-es
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./
sudo make install
echo 'test' | apertium en-es

Create a symbolic link in the former location of Android’s ‘dx’:
cd /home/$USER/android-sdk-linux/platform-tools
ln -s ../build-tools/17.0.0/dx dx

Compile the standalone package for a language pair (for example, English-Spanish):
cd apertium/trunk/lttoolbox-java
export LTTOOLBOX_JAVA_PATH='/usr/local/share/apertium/lttoolbox.jar'
export ANDROID_SDK_PATH="/home/$USER/android-sdk-linux"
./apertium-pack-j /usr/local/share/apertium/modes/en-es.mode /usr/local/share/apertium/modes/es-en.mode

At this point apertium-en-es.jar has been created in apertium/trunk/lttoolbox-java.



Looking For Words With an Edit Distance of 1 or 2 From Other Words

Posted in Uncategorized by rmt on November 3, 2012

This code is a modification of Peter Norvig’s spelling corrector that adds the closest_nearby_word() method, which identifies the most-frequently-seen correctly-spelled word that has an edit distance of 1 or 2 from the given correctly-spelled word.

# -*- coding: utf-8 -*-
"""
An altered version of Peter Norvig's spelling corrector
"""

from collections import Counter
import re

# Get the lowercase alphabetic words from a text, minus any punctuation
def words(text): return re.findall('[a-z]+', text.lower())

# Count the frequency with which each word occurs
def train(features):
    model = Counter()
    for f in features:
        model[f] += 1
    return model

# Run training using a book with words we'll consider to be spelled correctly
NWORDS = train(words(open('big.txt').read()))

# Get strings with an edit distance of 1 from the given word
def edits1(word):
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
    replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

# Get known words with an edit distance of 2 from the given word
def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

# Get any known words matching the given mutated words
def known(words):
    return set(w for w in words if w in NWORDS)

# Suggest a correction by mutating a word and choosing the most likely replacement
# based on how often the mutated words appear in the trained model NWORDS
def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    print(candidates)
    return max(candidates, key=NWORDS.get)

# Get the likeliest correctly-spelled word other than the given word itself
def closest_nearby_word(word):
    nearby = set()
    # Consider known words at distance 1; fall back to distance 2 only if none are known
    for e in known(edits1(word)) or known_edits2(word):
        if e != word:
            nearby.add(e)
    if not nearby: return set()
    return max(nearby, key=NWORDS.get)

#print(correct('speling'))

# Run some test cases for finding "nearby" words
for w in frozenset(['rational', 'woman', 'rogue', 'effect', 'started', 'rein',
                    'scalded', 'mislead', 'reality', 'whit', 'marshal', 'voila',
                    'aide', 'tiered', 'county', 'fires', 'stated', 'soldier',
                    'beset', 'affect', 'vice', 'wreck', 'spayed', 'complimentary',
                    'their', 'principal', 'moral', 'especially', 'steal',
                    'personal', 'why', 'heroine', 'descendant', 'baited',
                    'interested', 'sole', 'think', 'physics', 'corps', 'discrete']):
    print("%s - %s" % (w, closest_nearby_word(w)))


With a better training corpus, this method could possibly be used to identify misspellings that most spell checkers overlook because the mistyped word is itself a valid word. But better starting points are available. The test cases produce the following output:

think - thing
corps - crops
stated - states
baited - waited
aide - side
beset - best
fires - fire
scalded - scolded
moral - morel
whit - what
principal - principals
wreck - wrack
personal - set([])
heroine - heroin
reality - set([])
their - theirs
interested - set([])
voila - set([])
woman - women
rational - national
started - stated
sole - some
effect - effects
rogue - vogue
affect - effect
why - who
descendant - descendants
county - count
spayed - stayed
especially - specially
vice - voice
physics - set([])
discrete - discreet
tiered - tired
mislead - misled
soldier - soldiers
rein - vein
complimentary - complementary
steal - steel
marshal - marshall
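
To try the idea without downloading a training corpus, here is a minimal, self-contained Python 3 sketch of the distance-1 lookup. The tiny frequency table is made up for illustration, and unlike the script above this version returns None when no nearby word is found.

```python
from collections import Counter

ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def closest_nearby_word(word, counts):
    """Most frequent known word at edit distance 1, excluding word itself."""
    nearby = {e for e in edits1(word) if e in counts and e != word}
    if not nearby:
        return None
    return max(nearby, key=counts.get)

# Made-up word frequencies standing in for a trained model
TOY_COUNTS = Counter({'affect': 3, 'effect': 9, 'effects': 4,
                      'discrete': 1, 'discreet': 2, 'physics': 5})

print(closest_nearby_word('affect', TOY_COUNTS))    # effect
print(closest_nearby_word('discrete', TOY_COUNTS))  # discreet
print(closest_nearby_word('physics', TOY_COUNTS))   # None
```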

A Continuous Floating-Point Adaptation of Conway’s Game of Life

Posted in Uncategorized by rmt on October 13, 2012

This video shows a continuous, floating-point adaptation of Conway’s Game of Life.

I rendered this video by running Ready using the SmoothLifeL parameter set. World size is 2048×2048. Stephan Rafler’s paper explains the math behind the video. Rendering took eight hours on my video card.

Frames were converted to video using:

ffmpeg -s hd1080 -r 30 -b 9600 -i frame_%06d.png video.mp4

Running the Apertium Translator on Android: Offline Machine Translation on a Mobile Device

Posted in Uncategorized by rmt on September 26, 2012

A few days ago I put together an Android app that runs the Apertium open source rule-based translation system.

For his excellent 2012 Google Summer of Code project, Mikel Artetxe has converted several of the Apertium language pairs into standalone, self-contained Jar files that have no outside dependencies. The end result is a set of cool little embeddable modules that do real language translation on mobile devices without requiring a network connection at all.

One of the challenges I ran into while making this app was that the default Proguard settings cause the translation system to crash at runtime. After trying a whole bunch of settings, I found a set of Proguard directives that preserve the Apertium offline translation classes and avoid the crash:

# Preserve Apertium offline translation
-dontwarn java.awt.**
-dontwarn javax.swing.**
-dontwarn org.apertium.ApertiumGUI
-keepattributes InnerClasses
-keep class org.apertium.**
-keepclassmembers class * {
    public *; protected *; private *;
}

Together, these settings preserve the translation-related classes packaged in the Apertium Jar files, saving them from Proguard’s obfuscation process that leads to crashing. These Proguard settings can be added on top of the default Android Proguard optimizations by adding the following line to the file:


With these settings in place, translations work correctly in the optimized build, with results being returned after about 5000 ms for a short phrase on my phone.

Overall, I’m pleased with the Android offline translation app results, particularly because these translations work without requiring a network connection. Another advantage is that this system can do translations for a few languages that are currently unsupported by Google Translate and Bing Translator.

UPDATE (Jan. 1, 2013)
A new version of the offline translation app is available.

Counting Word Frequency with Python

Posted in Uncategorized by rmt on September 26, 2012

A Python script to crudely tally the number of times a space-delimited word appears in a text file. Uses a Counter object to track the number of occurrences for each word.

# -*- coding: utf-8 -*-
"""
Created on Wed Sep 26 20:04:11 2012

Returns the most-common space-delimited words in a file.

@author: robert
"""
from collections import Counter
import re

def openfile(filename):
    # Read the whole file into a string, closing the file afterwards
    with open(filename, "r") as fh:
        text = fh.read()
    return text

def removegarbage(str):
    # Replace one or more non-word (non-alphanumeric) chars with a space
    str = re.sub(r'\W+', ' ', str)
    str = str.lower()
    return str

def getwordbins(words):
    cnt = Counter()
    for word in words:
        cnt[word] += 1
    return cnt

def main(filename, topwords):
    txt = openfile(filename)
    txt = removegarbage(txt)
    # split() with no argument skips the empty strings left by leading/trailing spaces
    words = txt.split()
    bins = getwordbins(words)
    for key, value in bins.most_common(topwords):
        print("%s %s" % (key, value))

main('speech.txt', 500)

Example output:

the 235
and 161
to 132
of 125
that 101
a 91
in 83
we 70
is 54
our 40
who 39
for 39
people 37
not 36
are 32
on 29
it 28
be 28
their 27
must 26
have 25
those 25
will 25
with 25
as 22
world 21
this 19
all 18
america 18
because 18
from 17
they 17
i 17
an 17


Using an SSH config file so that git push uses the right SSH key for multiple domains

Posted in Uncategorized by rmt on August 27, 2012

Use a file called config in your .ssh directory to direct ssh/git to use the correct SSH key, depending on which domain you connect to.

An example that uses the correct SSH key when using git push to save code to either GitHub or Heroku:

Host github.com
  IdentityFile C:\Users\Robert\Documents\keystore\github-id_rsa
Host heroku.com
  IdentityFile C:\Users\Robert\Documents\keystore\heroku-pc2-id_rsa
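
To make the selection rule concrete, here is a simplified Python sketch of how a Host block decides which IdentityFile applies. It is an illustration only, not how ssh is implemented: it handles plain names and * or ? wildcards, ignores negated patterns, the keyword=value form, and every other option, and the key paths in the example are hypothetical.

```python
from fnmatch import fnmatchcase

def identity_file_for(host, config_text):
    """Return the first IdentityFile whose Host pattern matches, else None."""
    matches = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip()
        if keyword == "host":
            # A Host line starts a new block; it may list several patterns
            matches = any(fnmatchcase(host, p) for p in value.split())
        elif keyword == "identityfile" and matches:
            return value
    return None

# Hypothetical config with made-up key paths
EXAMPLE_CONFIG = """
Host github.com
  IdentityFile ~/.ssh/github-id_rsa
Host heroku.com
  IdentityFile ~/.ssh/heroku-id_rsa
"""

print(identity_file_for("github.com", EXAMPLE_CONFIG))
print(identity_file_for("heroku.com", EXAMPLE_CONFIG))
```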

Removing a package in Gentoo Linux

Posted in Uncategorized by rmt on August 2, 2012

Ensure there are no reverse dependencies:

equery depends packagename

Unmerge the package, leaving config files in place:

emerge --unmerge --pretend packagename

emerge --unmerge packagename

Installing the Android SDK from the Command Line

Posted in Uncategorized by rmt on July 24, 2012

(Ubuntu 12.04)

The following are instructions for setting up the Android SDK from the command line.

Install Eclipse:

sudo apt-get install eclipse

Install Android SDK:

sudo apt-get install ia32-libs
tar xzvf android-sdk_*
echo 'export ANDROID_HOME=/home/$USER/android-sdk-linux' >> /home/$USER/.bashrc
echo 'PATH=$PATH:$ANDROID_HOME/tools' >> /home/$USER/.bashrc
echo 'export PATH=$PATH:$ANDROID_HOME/platform-tools' >> /home/$USER/.bashrc
source ~/.bashrc
android update sdk --no-ui --force

Install Android NDK:

bunzip2 android-ndk-
tar xvf android-ndk-

echo 'export ANDROID_NDK_HOME=/home/$USER/android-ndk-r8' >> /home/$USER/.bashrc
echo 'export PATH=$PATH:$ANDROID_NDK_HOME' >> /home/$USER/.bashrc
source ~/.bashrc

Install Android ADT Plugin:


Help->Install New Software->Add->Developer Tools

Preferences->SDK Location->(Set the SDK Location)

Help->Check for Updates

Create an AVD:

android create avd -n 4.0.3 -t 27 --abi armeabi-v7a

Maven Installation

Install Maven 3:

tar xzvf apache-maven-3.0.4-bin.tar.gz
echo 'export M3_HOME=/home/$USER/apache-maven-3.0.4' >> /home/$USER/.bashrc
echo 'export M3=$M3_HOME/bin' >> /home/$USER/.bashrc
echo 'export PATH=$PATH:$M3' >> /home/$USER/.bashrc
source ~/.bashrc

Install android-maven-plugin:

sudo apt-get install git
git clone
cd maven-android-plugin
mvn clean install

Install maven-android-sdk-deployer:

git clone
cd maven-android-sdk-deployer
mvn install

Install maven-android-plugin-samples:

git clone
cd maven-android-plugin-samples
mvn clean install