wayne piekarski Cheap and low power web server with local Wikipedia

 
Blog article posted May 2017

Cheap and low power web server with local Wikipedia

Building a super low power web server and WiFi access point, with a full 59 GB copy of Wikipedia, for when you have no Internet access.

Portable offline Wikipedia

I have always been amazed by the Wikipedia project, and how it has managed to capture so much of the world's information. Around 2005, you could get archive files containing the entire text of Wikipedia that fit onto a 4 GB SD card. These cards were quite expensive at the time, but I thought it was amazing that I could download all of Wikipedia, put it into my Palm Tungsten T3, and then browse it while flying on a plane far away from any Internet access. I had access to the entire world's information, stored locally on my device ... this was basically the first actual implementation of the Hitchhiker's Guide to the Galaxy.

Kiwix

Nowadays, the full Wikipedia archive from Kiwix, including pictures, is 59 GB and growing every day. The text-only version is 19 GB, which is much larger than the 4 GB image I used over 10 years ago. Most phones don't have enough space to store either of these files, and don't have a slot for an SD card. There are nice apps such as Kiwix for Android (open source) that can browse a local ZIM file, but today's phones just don't have the storage space.

LinkIt Smart 7688 Duo

So I thought perhaps a better solution would be to build a tiny web server that can run off a battery, which I could then browse using a phone, laptop, or E-paper device like a Kindle. The Kiwix project supports something called kiwix-serve, which takes a ZIM file and serves it as a web server. However, I needed a device that could run off a battery and was also cheap and very small. A Raspberry Pi 3 would do it, but it is physically large, costs around $50, and can consume lots of power. The Raspberry Pi Zero is smaller and cheaper, but doesn't include WiFi, and I didn't want extra adaptors plugged in via USB. At the time I started this project, the Raspberry Pi Zero W did not exist. However, I found the perfect solution: a LinkIt Smart 7688 Duo, which uses a MediaTek MT7688 MIPS processor at 580 MHz with 128 MB of memory, has a Micro SD slot, and runs OpenWRT Linux. It is the size of my thumb, and has nothing extra that would make it larger than necessary. The only problem is that kiwix-serve is quite a large piece of code, its build system is super complicated, and pre-built binaries are only available for ARM and x86. I was also concerned that kiwix-serve would need more than 128 MB of memory to run, so it might not work even if I ported it.


LinkIt Smart 7688 Duo (SD card slot on back)

zimHttpServer

Instead of trying to use kiwix-serve, I looked for other options. I was hoping for a web server written in an interpreted language that was supported by OpenWRT out of the box. The only solution I found that looked like it would work was zimHttpServer, a really simple test web server written in Perl as part of the Kiwix tools repository. It turned out that ZIM files were a lot less complicated than I thought: a ZIM file is basically an archive containing XZ compressed HTML for each article, and you use a binary search to find the exact entry you want. What impressed me most was that it was only 300 lines of Perl code, and ran on a standard Perl implementation with no extra dependencies! Since it was so small, I figured that it would be straightforward to understand, and worst case I could rewrite it from scratch in another language if I really needed to. Every time I'd ever touched Perl was a horrible experience, but I figured that I should be able to deal with all its craziness for this project. What could possibly go wrong?
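To make the binary search idea concrete, here is a minimal sketch in Python (not the Perl from zimHttpServer). The file layout is a toy one invented for this example: an entry count, a table of sorted 8-byte pointers, then key/value records. The real ZIM layout is more involved, and the names build_archive, read_cstring and lookup are hypothetical helpers.

```python
import io
import struct

def build_archive(entries):
    """Build a toy archive (NOT the real ZIM layout): a 4-byte entry count,
    then 8-byte pointers sorted by key, then 'key\\0value\\0' records."""
    items = sorted((k.encode(), v.encode()) for k, v in entries.items())
    header_size = 4 + 8 * len(items)
    body = b""
    pointers = []
    for key, value in items:
        pointers.append(header_size + len(body))
        body += key + b"\0" + value + b"\0"
    table = b"".join(struct.pack("<Q", p) for p in pointers)
    return struct.pack("<I", len(items)) + table + body

def read_cstring(f):
    """Read bytes up to (not including) the next NUL or end of file."""
    out = bytearray()
    while True:
        c = f.read(1)
        if c in (b"", b"\0"):
            return bytes(out)
        out += c

def lookup(f, wanted):
    """Binary search the pointer table; only a few records are ever read."""
    target = wanted.encode()
    f.seek(0)
    (count,) = struct.unpack("<I", f.read(4))
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        f.seek(4 + 8 * mid)              # jump to the mid-th pointer
        (ptr,) = struct.unpack("<Q", f.read(8))
        f.seek(ptr)                      # jump to the record it points at
        key = read_cstring(f)
        if key == target:
            return read_cstring(f).decode()
        if key < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

archive = io.BytesIO(build_archive({"Perl": "<html>perl</html>",
                                    "Kiwix": "<html>kiwix</html>"}))
print(lookup(archive, "Kiwix"))
```

Because each probe is just two seeks and a small read, the search needs almost no memory, which is what makes this approach workable on a 128 MB device.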

64-bit seek on 32-bit perl

zimHttpServer worked perfectly at serving the 59 GB ZIM file on my x86 Linux and OSX machines, so that was a good start. The code dated from 2012 and required a few cleanups and bug fixes, and I also added some extra debugging so I knew what was going on. However, when I copied it over to my MT7688 device, I ran into a big problem. It turns out the MT7688 is a 32-bit processor, and the Perl on it doesn't support 64-bit integers or 64-bit seek operations. The ZIM header and article lookups use 64-bit offsets, so I changed the parser to read each offset as two separate 32-bit values. What remained was doing a 64-bit seek, but 32-bit Perl only supports 32-bit operations via seek(), and there is no 64-bit llseek() support. So you need to break the seek down into smaller chunks of size 0x40000000 (1 GB), because 0x80000000 (2 GB) is a negative number when treated as a signed 32-bit integer. You do an absolute seek to 0x0, then as many relative seeks of 0x40000000 as needed, and then a final relative seek for the remaining offset. It can take 50 seek() calls to get to the 50th gigabyte in a file, but this does not have a huge impact on the overall speed of fetching an article.

The next problem was that zimHttpServer was using Perl's buffered I/O abstraction, which tries to read back the seek() location and fails. So I needed to convert from open() to sysopen(), which uses raw file I/O and supports passing O_LARGEFILE to the kernel. There were a few places where the code used fancy Perl I/O operations that needed to be rewritten to work with this as well. After making these changes, I was able to get everything working with the large files on the MT7688.
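The offset splitting and chunked seeking can be sketched as follows. This is an illustration in Python rather than the modified Perl, and read_offset_as_words and plan_seeks are hypothetical names. It assumes offsets are stored little-endian, and it only plans the seek steps instead of touching a real 59 GB file.

```python
import struct

CHUNK = 0x40000000  # 1 GB, safely below the signed 32-bit limit 0x7FFFFFFF

def read_offset_as_words(raw8):
    """Split an 8-byte little-endian offset into (high, low) 32-bit words."""
    low, high = struct.unpack("<II", raw8)
    return high, low

def plan_seeks(high, low):
    """Return (whence, amount) steps that reach the 64-bit offset while
    keeping every individual amount within a signed 32-bit integer."""
    steps = [("SEEK_SET", 0)]            # absolute seek to the start
    # Each unit of the high word is 4 GB, i.e. four 1 GB chunks.
    chunks = high * 4 + low // CHUNK
    steps += [("SEEK_CUR", CHUNK)] * chunks
    remainder = low % CHUNK
    if remainder:
        steps.append(("SEEK_CUR", remainder))  # final relative seek
    return steps

# Example: an offset 12345 bytes past the 50 GB mark.
offset = 50 * CHUNK + 12345
high, low = read_offset_as_words(struct.pack("<Q", offset))
steps = plan_seeks(high, low)
print(len(steps), sum(amount for _, amount in steps) == offset)
```

On the device, each planned step would correspond to one raw seek call on the file handle opened with sysopen().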

Extra MIPS binaries from OpenWRT

Since the MT7688 runs a simplified OpenWRT distribution, there were a few things missing. zimHttpServer relies on an external xz executable to do the decompression, and xz was not included. So I found the OpenWRT build system, configured it to build the xz binaries, and copied them over to run on the device.

Indexing

zimHttpServer.pl includes a primitive indexing technique: on first run it traverses the entire ZIM file and generates a single text file with the names of all the articles. However, this single file is hundreds of megabytes and takes a long time to search through. I modified the code so that it generates a separate index file for each letter of the alphabet, with the limitation that you can only search by the first word of a title. This reduces search time to only a few seconds, and doesn't require any more memory.
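A rough sketch of the per-letter index idea, in Python rather than the actual Perl. The file naming scheme (index_p.txt and so on), the "other" bucket, and the function names are assumptions made for this example.

```python
import os
import tempfile

def bucket_name(title):
    """Pick the index file for a title from its first character."""
    first = title[:1].lower()
    return first if first.isascii() and first.isalnum() else "other"

def build_indexes(titles, index_dir):
    """Write each title into the index file for its first character."""
    handles = {}
    try:
        for title in titles:
            name = bucket_name(title)
            if name not in handles:
                path = os.path.join(index_dir, f"index_{name}.txt")
                handles[name] = open(path, "w", encoding="utf-8")
            handles[name].write(title + "\n")
    finally:
        for handle in handles.values():
            handle.close()

def search(prefix, index_dir):
    """Scan only the one index file that can contain the prefix."""
    path = os.path.join(index_dir, f"index_{bucket_name(prefix)}.txt")
    if not os.path.exists(path):
        return []
    wanted = prefix.lower()
    with open(path, encoding="utf-8") as f:
        # Stream line by line so memory use stays flat.
        return [line.rstrip("\n") for line in f
                if line.lower().startswith(wanted)]

with tempfile.TemporaryDirectory() as index_dir:
    build_indexes(["Perl", "Kiwix", "Periodic table", "Palm Tungsten T3"],
                  index_dir)
    print(search("per", index_dir))
```

The trade-off is the one described above: a query has to start with the beginning of the title, but each search reads a small fraction of the full list.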

Web server and WiFi

After making these changes, I now have a version of zimHttpServer.pl that runs on 32-bit machines with only 128 MB of RAM, and can serve the full 59 GB ZIM file from Wikipedia, including all the images! I configured the LinkIt 7688 to start up its own local WiFi network, and configured /etc/hosts so that the name "wikipedia" maps to the local IP address of the device. On your laptop, phone, or e-reader, you can connect to this WiFi network, use any browser to visit http://wikipedia:8080, and it just works!

Packaging and power

The device is tiny: only 63mm long and 26mm wide, and it weighs basically nothing. I didn't want to make it bigger by adding a plastic housing, so I wrapped Kapton tape around it to insulate the electronics. I'm not using it for anything else, so the only port needed is the Micro USB port, where you plug in the power supply. The power usage is incredibly low, peaking at only 300 mA at 5V under heavy I/O. You can run this off an incredibly tiny battery pack, carry it around in your jacket, and broadcast Wikipedia to all! I like the packaging of this device, where the Micro USB port is on the end, so the cables are inline. Everything was really cheap, with the LinkIt 7688 costing only US$15, and a 64 GB SD card costing less than US$20. This will only continue to get cheaper over time.
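To put the power figure in perspective, here is a back-of-the-envelope calculation in Python. The 300 mA at 5V peak is from the measurement above. The battery capacity (a hypothetical 5000 mAh pack at a nominal 3.7 V, about 18.5 Wh) and the 85% conversion efficiency are assumptions for illustration only, not measurements.

```python
def peak_power_watts(current_ma, volts):
    """Power in watts from current in milliamps and voltage in volts."""
    return current_ma / 1000 * volts

def runtime_hours(battery_wh, load_watts, efficiency=0.85):
    """Rough runtime estimate; efficiency is an assumed conversion loss."""
    return battery_wh * efficiency / load_watts

peak = peak_power_watts(300, 5)      # measured peak draw from the article
battery_wh = 5.0 * 3.7               # hypothetical 5000 mAh pack at 3.7 V
print(f"peak draw: {peak:.1f} W")
print(f"runtime at constant peak load: {runtime_hours(battery_wh, peak):.1f} h")
```

Since 300 mA is the peak under heavy I/O, a real runtime would likely be longer than this estimate.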


LinkIt 7688 wrapped in Kapton tape for protection, compared with US 25 cent coin. Micro SD card holder visible on rear side.

Source code

You can download the modified code from my GitHub repository. It is based on the original zimHttpServer.pl code, which was licensed under GPL v3 as part of the Kiwix project.

If you want to see how I created the xz binaries using OpenWRT, I have a GitHub repository with the build, and also a fork of the repository from MediaTek.






Contact Wayne Piekarski via email wayne AT tinmith.net for more information

Last Updated 2017