Wednesday, April 18, 2012

Weekend project

Quick weekend project - a website to search for 'halal' status of products in local market. The data was scraped from JAKIM website. The primary motivation for doing this is to search for halal status from my mobile phone - small feature phone with browser (not android). The JAKIM website can't even being displayed on my phone. It used Django at the backend and the well known Twitter Bootstrap for the frontend page. This is my first use of Bootstrap, it simply work out of the box for mobile browser (mine is the old opera mini, 3.0 something I guess). I try to avoid Django for weekend project but the offer django-haystack has for quick search tool is too tempting so I decided to still use Django for this one. The search backend is Whoosh with the integration mostly done by haystack. Scraping JAKIM website not easy, the data are all in heavily nested html tables with no id or class to identify. I use python lxml lib to parse the html. When user search for keyword, it will look first in the Whoosh index and if none found, try to query JAKIM site directly and then redirect user to the same page with their query parameter. To make the result immediately available, I used the real time index feature of haystack that will automatically update the index once new item inserted into db. This site has one advantage over the JAKIM site. The search on JAKIM site was naively implemented that you can't even search for multiple keywords. For example searching for "shokubutsu original" yield no result from JAKIM while my site return exactly what I want.

No comments:

Post a Comment