Software Projects

Running the Apertium Translator on Android: Offline Machine Translation on a Mobile Device

Posted in Uncategorized by rmtheis on September 26, 2012

A few days ago I put together an Android app that runs the Apertium open source rule-based translation system.

For his excellent 2012 Google Summer of Code project, Mikel Artetxe converted several of the Apertium language pairs into standalone, self-contained Jar files that have no outside dependencies. The end result is a set of cool little embeddable modules that do real language translation on mobile devices without requiring a network connection at all.

One of the challenges I ran into while making this app was that the default Proguard settings cause the translation system to crash at runtime. After trying a whole bunch of settings, I found a set of Proguard directives that keeps the Apertium offline translation code intact and avoids the crash:

https://gist.github.com/3792747

# Preserve Apertium offline translation
-dontwarn com.sun.org.apache.bcel.internal.**
-dontwarn java.awt.**
-dontwarn javax.swing.**
-dontwarn javax.tools.**
-dontwarn javax.xml.stream.**
-dontwarn org.apertium.ApertiumGUI
-keepattributes InnerClasses
-keep class org.apertium.**
-keepclassmembers class * {
    public *; protected *; private *;
}

Together, these settings preserve the translation-related classes packaged in the Apertium Jar files, protecting them from the Proguard obfuscation that causes the crash. To apply them on top of the default Android Proguard optimization settings, save them in proguard.cfg and add the following line to the project.properties file:

proguard.config=${sdk.dir}/tools/proguard/proguard-android-optimize.txt:proguard.cfg

With these settings in place, translation works correctly in the optimized build, returning results after about 5000 ms for a short phrase on my phone.
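To get a feel for which classes a rule such as -keep class org.apertium.** covers, here is a rough Python sketch of Proguard's class-name wildcards (? for one character, * for a run of characters within one package level, ** for a run that may cross package separators). The helper name and the sample class names are only illustrative, and this is not how Proguard itself is implemented.

```python
import re

def proguard_class_pattern_to_regex(pattern):
    """Translate a Proguard class-name pattern into a compiled regex.

    ?  matches one character, but not the package separator
    *  matches any run of characters without the package separator
    ** matches any run of characters, package separators included
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(r".*")
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^.]*")
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^.]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")

keep = proguard_class_pattern_to_regex("org.apertium.**")

# Illustrative class names, not a list taken from the Apertium Jar files
for name in ["org.apertium.Translator",
             "org.apertium.lttoolbox.process.FSTProcessor",
             "com.example.MyActivity"]:
    print(name, bool(keep.match(name)))
```

The double asterisk is what lets a single rule reach classes in nested packages; with a single asterisk only classes directly inside org.apertium would match.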

Overall, I’m pleased with the Android offline translation app results, particularly because these translations work without requiring a network connection. Another advantage is that this system can do translations for a few languages that are currently unsupported by Google Translate and Bing Translator.

Counting Word Frequency with Python

Posted in Uncategorized by rmtheis on September 26, 2012

A Python script to crudely tally the number of times a space-delimited word appears in a text file. Uses a Counter object to track the number of occurrences for each word.

# -*- coding: utf-8 -*-
"""
Created on Wed Sep 26 20:04:11 2012

Returns the most-common space-delimited words in a file.

@author: robert
"""
from collections import Counter
import re

def openfile(filename):
    # Read-only access is enough; the with block closes the file for us
    with open(filename, "r") as fh:
        return fh.read()

def removegarbage(text):
    # Replace one or more non-word (non-alphanumeric) chars with a space
    text = re.sub(r'\W+', ' ', text)
    return text.lower()

def getwordbins(words):
    cnt = Counter()
    for word in words:
        cnt[word] += 1
    return cnt

def main(filename, topwords):
    txt = openfile(filename)
    txt = removegarbage(txt)
    # split() with no argument drops the empty strings that split(' ')
    # would produce at the start or end of the text
    words = txt.split()
    bins = getwordbins(words)
    for key, value in bins.most_common(topwords):
        # This form prints "word count" under both Python 2 and Python 3
        print("%s %d" % (key, value))

main('speech.txt', 500)

Example output:

 
the 235
and 161
to 132
of 125
that 101
a 91
in 83
we 70
is 54
our 40
who 39
for 39
people 37
not 36
are 32
on 29
it 28
be 28
their 27
must 26
have 25
those 25
will 25
with 25
as 22
world 21
this 19
all 18
america 18
because 18
from 17
they 17
i 17
an 17

(…)
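For comparison, here is a more compact Python 3 sketch of the same tally. It assumes that picking out runs of word characters with re.findall is an acceptable substitute for the replace-then-split approach above; the function name top_words and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def top_words(text, n):
    """Return the n most common words in text as (word, count) pairs."""
    # \w+ picks out runs of word characters, so no empty tokens are produced
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(n)

sample = "The cat and the hat. The end!"
for word, count in top_words(sample, 2):
    print(word, count)
```

Passing the whole word list to Counter in one go replaces the explicit counting loop.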

Using an SSH config file to make git push use the right SSH key for multiple domains

Posted in Uncategorized by rmtheis on August 27, 2012

Use a file called config in your .ssh directory to tell ssh (and therefore git) which SSH private key to use, depending on which host you connect to.

An example that selects the correct SSH key when you git push to either GitHub or Heroku:


Host github.com
  IdentityFile C:\Users\Robert\Documents\keystore\github-id_rsa
Host heroku.com
  IdentityFile C:\Users\Robert\Documents\keystore\heroku-pc2-id_rsa
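To illustrate how the Host blocks decide which key applies, here is a deliberately simplified Python sketch of the lookup. It only understands Host and IdentityFile lines, treats Host values as shell-style wildcard patterns, and ignores negated patterns, Match blocks and quoting, so it is a teaching aid rather than a reimplementation of ssh. The function name is hypothetical.

```python
from fnmatch import fnmatchcase

def identity_files_for(config_text, hostname):
    """Collect IdentityFile values from Host blocks whose pattern matches hostname."""
    matches = True  # lines before the first Host line apply to every host
    found = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip()
        if keyword == "host":
            # A Host line may list several whitespace-separated patterns
            matches = any(fnmatchcase(hostname, p) for p in value.split())
        elif keyword == "identityfile" and matches:
            found.append(value)
    return found

CONFIG = r"""
Host github.com
  IdentityFile C:\Users\Robert\Documents\keystore\github-id_rsa
Host heroku.com
  IdentityFile C:\Users\Robert\Documents\keystore\heroku-pc2-id_rsa
"""

print(identity_files_for(CONFIG, "github.com"))
```

When git pushes to a remote such as git@github.com, the host part (github.com) is what gets compared against the Host lines.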

Removing a package in Gentoo Linux

Posted in Uncategorized by rmtheis on August 2, 2012

Ensure there are no reverse dependencies:

equery depends packagename

Unmerge the package, leaving config files in place (the first command is a dry run that only shows what would be removed):

emerge --unmerge --pretend packagename

emerge --unmerge packagename

Installing the Android SDK from the Command Line

Posted in Uncategorized by rmtheis on July 24, 2012

Updated 2019-05-17

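These commands install the Android SDK command-line tools without installing Android Studio. Run them from your home directory, because the ANDROID_HOME setting below expects the SDK in /home/$USER/android-sdk.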

(Ubuntu 18.04)

sudo apt-get install openjdk-8-jre-headless
sudo apt-get install openjdk-8-jdk
wget https://dl.google.com/android/repository/sdk-tools-linux-4333796.zip
mkdir android-sdk
unzip sdk-tools-linux-4333796.zip -d android-sdk
yes | android-sdk/tools/bin/sdkmanager "platform-tools" "platforms;android-28" "build-tools;28.0.3" "extras;android;m2repository" "emulator"

echo 'export ANDROID_HOME=/home/$USER/android-sdk' >> /home/$USER/.bashrc
echo 'export PATH=$PATH:$ANDROID_HOME/emulator' >> /home/$USER/.bashrc
echo 'export PATH=$PATH:$ANDROID_HOME/tools' >> /home/$USER/.bashrc
echo 'export PATH=$PATH:$ANDROID_HOME/tools/bin' >> /home/$USER/.bashrc
echo 'export PATH=$PATH:$ANDROID_HOME/platform-tools' >> /home/$USER/.bashrc
echo 'export PATH=$PATH:$ANDROID_HOME/build-tools/28.0.3' >> /home/$USER/.bashrc

echo 'export TMPDIR=/tmp' >> /home/$USER/.bashrc
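After opening a new shell, a quick sanity check can confirm that the directories added to PATH actually exist. The following Python sketch assumes the layout produced by the commands above; the names SDK_PATH_DIRS and missing_sdk_dirs are made up for this example.

```python
import os

# Sub-directories of ANDROID_HOME that the commands above append to PATH
SDK_PATH_DIRS = ["emulator", "tools", "tools/bin",
                 "platform-tools", "build-tools/28.0.3"]

def missing_sdk_dirs(android_home):
    """Return the expected SDK directories that do not exist under android_home."""
    return [d for d in SDK_PATH_DIRS
            if not os.path.isdir(os.path.join(android_home, d))]

if __name__ == "__main__":
    # Fall back to ~/android-sdk if ANDROID_HOME is not set in this shell
    home = os.environ.get("ANDROID_HOME", os.path.expanduser("~/android-sdk"))
    missing = missing_sdk_dirs(home)
    print("missing:", missing if missing else "none")
```

An empty result means every directory is in place; anything listed usually points to an sdkmanager package that did not install.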

Converting a YYYYMMDD Number to Date Format in Microsoft Excel

Posted in Uncategorized by rmtheis on May 17, 2012

To tell Excel to interpret the number that’s in cell A1 as a date in YYYYMMDD format, enter this formula in a separate cell:

=DATE(LEFT(A1,4),MID(A1,5,2),RIGHT(A1,2))

Then do Format Cells->Date on the cell into which you just entered the formula.
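For anyone doing the same conversion outside Excel, here is a small Python sketch of the same idea: slice the eight digits into year, month and day. The function name is made up for this example. One difference to be aware of is that Excel's DATE rolls invalid months or days over into the next period, whereas Python's date raises a ValueError.

```python
from datetime import date

def yyyymmdd_to_date(number):
    """Convert a number such as 20120517 into a datetime.date."""
    text = str(number)
    if len(text) != 8 or not text.isdigit():
        raise ValueError("expected an 8-digit YYYYMMDD value: %r" % (number,))
    year = int(text[:4])    # same role as LEFT(A1,4)
    month = int(text[4:6])  # same role as MID(A1,5,2)
    day = int(text[6:])     # same role as RIGHT(A1,2)
    return date(year, month, day)

print(yyyymmdd_to_date(20120517))  # prints 2012-05-17
```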

App Videos

Posted in Uncategorized by rmtheis on December 4, 2011


ARB Installation

Posted in Uncategorized by rmtheis on August 13, 2011

sudo apt-get install libmotif4 libpng3 xloadimage gnuplot gv xfig libglew1.5 libmotif-dev libtiff4-dev libx11-dev libxaw7-dev libxext-dev libxml2-utils libxp-dev libxpm-dev libxt-dev lynx x11proto-print-dev xsltproc xutils-dev freeglut3-dev libglew1.5-dev libpng12-dev
mkdir -p /home/share/apps/arb
cd /home/share/apps/arb
wget http://download.arb-home.de/release/arb_5.2/arb.64.ubuntu.OPENGL.tgz
sudo tar xvfz arb.64.ubuntu.OPENGL.tgz
sudo /bin/sh arb_install.sh
sudo ln -s /usr/lib/libXm.so.4.0.3 /usr/lib/libXm.so.3

Using Tesseract Tools for Android to Create a Basic OCR App

Posted in Uncategorized by rmtheis on August 6, 2011

Jan. 24, 2012 UPDATE: This tutorial is out of date. The tesseract-android-tools build files and the Android SDK Tools have both been updated, so the build should now succeed without requiring the modifications shown below. There’s an up-to-date tutorial available here.

I’ve published a project, intended to be easier to build, that combines the tesseract-android-tools project code with the source code for the Tesseract/Leptonica dependencies in a single project. It’s available here.

Some newer projects of mine for Android: Wildfire tracking app, GPS latitude/longitude converter app, “What language is this?” app, Price Scanner app, Declutter your wallet, Harmful Algal Blooms.

Note: The instructions below were written for the Android SDK Tools r12. To compile using r14+, run ndk-build, then rm build.xml, then android update project --path . and finally ant release (without modifying build.xml). Running the test cases on new versions of the SDK Tools will require other modifications.

These instructions assume you have already installed the Android SDK and NDK along with Eclipse and Subversion on Ubuntu.

Overall, what you need to do is to set up the tesseract-android-tools project as a library project in Eclipse, and tell your project to refer to the library project. So you’ll need two projects in Eclipse, whereas for an ordinary app you would have just one.

Step-by-step:

Check out the latest tesseract-android-tools source code using Git (don’t use the outdated code from “Downloads”):

git clone https://code.google.com/p/tesseract-android-tools/

Build the project according to the instructions in the readme file. Make sure that ndk-build successfully creates the .so object files, and that you get “BUILD SUCCESSFUL” when ant finishes. You may need to make three modifications:

Modification 1. Apparently the kernel.org site is unavailable for the libjpeg download, and it’s been pointed out elsewhere that using an alternative repository works, so use the following command instead of the existing git clone command:

git clone git://github.com/android/platform_external_jpeg.git libjpeg

Modification 2. Before running ant, edit the existing build.xml as a workaround for Android bug #13024. Put the following lines immediately before the ending </project> tag:

<!-- beginning of modification -->
  
  
<!-- end of modification -->

Modification 3. Do ant compile instead of ant release.

Create an AVD running Android 2.2 or higher, and with an SD card.

Import the tesseract-android-tools project into Eclipse:

File->Import->Existing Projects Into Workspace->Choose tesseract-android-tools->Finish

If you get an error complaining about a compiler level 5.0 compatibility problem, right-click the tesseract-android-tools project name and go to Properties->Java Compiler. Check “Enable project specific settings”, uncheck “Use default compliance settings”, then set both “Generated .class files compatibility” and “Source compatibility” to 1.5. Answer yes if asked to rebuild.

Add tesseract-android-tools as a library project:

Right-click tesseract-android-tools project name->Properties->Android->check “Is Library”.

[Optional] Install the built-in test case package by importing the tesseract-android-tools-test project:

File->Import->Existing Projects Into Workspace->Choose tesseract-android-tools-test->Finish

[Optional] Start the AVD, wait for it to boot, and install the traineddata file required by the test cases:

wget http://tesseract-ocr.googlecode.com/files/eng.traineddata.gz

gunzip eng.traineddata.gz

adb shell mkdir /mnt/sdcard/tesseract

adb shell mkdir /mnt/sdcard/tesseract/tessdata

adb push eng.traineddata /mnt/sdcard/tesseract/tessdata

[Optional] Run the test cases. They should pass, saying “OK (3 tests)”:

adb install tesseract-android-tools-test/bin/tesseract-android-tools-test.apk

adb shell am instrument -w -e package com.googlecode.tesseract.android.test \
com.googlecode.tesseract.android.test/android.test.InstrumentationTestRunner

Create your new app as a new Android project.

Configure your project to use the tesseract-android-tools project as a library project: Right click your new project name, do Properties->Android->Library->Add, and choose tesseract-android-tools.

You can now create a TessBaseAPI object in your app’s onCreate():

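// Passing null returns the root of the app's external files directory
// (Environment.MEDIA_MOUNTED is a storage-state string, not a directory type)
File myDir = getExternalFilesDir(null);

TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(myDir.toString(), "eng"); // myDir + "/tessdata/eng.traineddata" must be present
baseApi.setImage(myImage); // myImage: the image to recognize, e.g. a Bitmap you supply

String recognizedText = baseApi.getUTF8Text(); // Log or otherwise display this string...
baseApi.end(); // Release the native Tesseract resources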

Run your project on the AVD.
Other basic examples can be found in the TessBaseAPITest.java file in the tesseract-android-tools-test project.
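A common stumbling block is where the traineddata file has to live relative to the directory passed to init(). As a rough illustration in Python (the helper name is hypothetical), the expected location is the data path, then a tessdata sub-directory, then the language code plus the .traineddata extension:

```python
import posixpath

def traineddata_path(datapath, language):
    """Return where the trained data for `language` is expected under datapath."""
    return posixpath.join(datapath, "tessdata", language + ".traineddata")

print(traineddata_path("/mnt/sdcard/tesseract", "eng"))
# prints /mnt/sdcard/tesseract/tessdata/eng.traineddata
```

This matches the adb push destination used for the test cases above, where the data path is /mnt/sdcard/tesseract.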

Using VNC to Connect to an Ubuntu Server with a Full Screen Window Manager

Posted in Uncategorized by rmtheis on July 22, 2011

Do the one-time setup:

Install the VNC server package:

sudo apt-get install vnc4server

Set a strong password for VNC connections:

vncpasswd

Run the following command to create your VNC configuration file:

vncserver :1 -geometry 1366x768 -depth 16

Then kill the session:

vncserver -kill :1

You should now have a startup script at ~/.vnc/xstartup. Edit that file and uncomment the following lines:

unset SESSION_MANAGER

sh /etc/X11/xinit/xinitrc

 

Each time you want to connect:

Connect by ssh to the server, and start your VNC server session on the server:

vncserver :1 -geometry 1366x768 -depth 16

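Run your VNC client on your local machine (Chicken of the VNC for Mac; on Windows, note that PuTTY is an SSH client, so use it for the ssh step and a separate VNC viewer for this one) and connect to the server, using the password you set up above and the display number specified above (“:1”, which corresponds to TCP port 5901).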
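Some VNC clients ask for a TCP port rather than a display number. By convention the port is 5900 plus the display number, which the following Python sketch spells out. The function name is made up for this example.

```python
VNC_BASE_PORT = 5900

def vnc_port(display):
    """Return the TCP port for a VNC display given as ':1' or 1."""
    number = int(str(display).lstrip(":"))
    if number < 0:
        raise ValueError("display number must not be negative")
    return VNC_BASE_PORT + number

print(vnc_port(":1"))  # prints 5901
```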