Thursday, June 13, 2013

Apache read raw request data

It's not possible to get the whole raw request data from within PHP. You can get the request body using php://input but not the headers. Searching around, I found that you can log the request data from Apache using mod_dumpio. It dumps the incoming request data to error.log. As mentioned in the docs, for Apache versions < 2.4 you have to set LogLevel to debug. One catch is to make sure none of your virtual host configs has a LogLevel higher than debug, otherwise you'll get no output from this module. Also make sure you didn't set LogLevel to debug only to have another LogLevel further down the config set it to something else. That happened to me.
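A minimal config sketch, assuming mod_dumpio is available (the module path is distribution-specific):

```apache
# Load the module (path varies by distribution)
LoadModule dumpio_module modules/mod_dumpio.so

# Apache < 2.4 needs the level at debug for mod_dumpio output;
# on 2.4+ you can scope it instead: LogLevel dumpio:trace7
LogLevel debug

# Dump incoming request data (headers and body) to error.log
DumpIOInput On
# Optionally dump the response as well
DumpIOOutput On
```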

Monday, May 27, 2013

Craft HTTP requests using nc

To do some low level checks on websites, I'd usually use telnet to compose an HTTP request against the server. The main intention is to talk directly to the server port to make sure the problem we have is not caused by some higher level application. For example, to connect to a server and issue a GET request:-

telnet 80
Connected to
Escape character is '^]'.
Connection closed by foreign host.

There's always a problem with telnet. In the above example, I can only issue a GET request without having a chance to add other HTTP headers such as Host before the server closes the connection. Some websites also time out very quickly when they receive no data after the connection is established. And since the above command is an interactive session, it's not repeatable or scriptable. Using nc seems much better.

Specify the virtual host:-

echo -en "HEAD / HTTP/1.1\r\nHOST:\r\n\r\n" | nc 80

You'll get output like:-

HTTP/1.1 200 OK
Content-Type: text/html
Last-Modified: Fri, 12 Apr 2013 23:26:51 GMT
Expires: Sun, 26 May 2013 19:40:06 GMT
Cache-Control: max-age=600
Content-Length: 9991
Accept-Ranges: bytes
Date: Sun, 26 May 2013 19:30:07 GMT
Via: 1.1 varnish
Age: 0
Connection: keep-alive
X-Served-By: cache-s34-SJC2
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1369596606.987305880,VS0,VE145
Vary: Accept-Encoding

Without the virtual host:-

echo -en "HEAD / HTTP/1.1\r\n\r\n" | nc 80

And the output:-

HTTP/1.1 400 Bad Request
Content-Type: text/html
Content-Length: 166
Accept-Ranges: bytes
Date: Sun, 26 May 2013 19:32:16 GMT
Via: 1.1 varnish
Age: 0
Connection: keep-alive
X-Served-By: cache-s35-SJC2
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1369596736.277374744,VS0,VE72
Vary: Accept-Encoding

It allows us to fully compose the request and then send it through the connection nc opens.
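The same idea can be scripted without nc at all. A minimal Python sketch that composes the raw request bytes and pushes them over a plain socket - example.com stands in for the real host, which isn't named here:

```python
import socket

def build_request(method, path, host):
    # Compose a raw HTTP/1.1 request; the blank line ends the headers
    lines = [
        "%s %s HTTP/1.1" % (method, path),
        "Host: %s" % host,
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def send_request(host, port, raw):
    # Open the connection, send the whole request at once,
    # then read until the server closes the connection
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(raw)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

raw = build_request("HEAD", "/", "example.com")
# print(send_request("example.com", 80, raw).decode("latin-1"))
```

Like the nc pipeline, the request is fully composed before anything touches the wire, so it's trivially repeatable.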


Wednesday, April 18, 2012

Weekend project

Quick weekend project - a website to search for the 'halal' status of products in the local market. The data was scraped from the JAKIM website. The primary motivation for doing this is to search for halal status from my mobile phone - a small feature phone with a browser (not Android). The JAKIM website can't even be displayed on my phone. It uses Django on the backend and the well known Twitter Bootstrap for the frontend. This is my first use of Bootstrap, and it simply works out of the box for mobile browsers (mine is the old Opera Mini, 3.0-something I guess).

I try to avoid Django for weekend projects, but what django-haystack offers for a quick search tool was too tempting, so I decided to still use Django for this one. The search backend is Whoosh, with the integration mostly done by haystack. Scraping the JAKIM website was not easy; the data is all in heavily nested HTML tables with no id or class to identify them. I used the Python lxml lib to parse the HTML. When a user searches for a keyword, it looks first in the Whoosh index and, if nothing is found, queries the JAKIM site directly and then redirects the user to the same page with their query parameter. To make the results immediately available, I used the real time index feature of haystack, which automatically updates the index once a new item is inserted into the db.

This site has one advantage over the JAKIM site. The search on the JAKIM site is so naively implemented that you can't even search for multiple keywords. For example, searching for "shokubutsu original" yields no results from JAKIM while my site returns exactly what I want.
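The table-walking part can be sketched roughly like this - the sample markup and the XPath are made up for illustration, the real JAKIM pages differ:

```python
from lxml import html

# Stand-in for the kind of nested, attribute-less tables described above
PAGE = """
<table><tr><td>
  <table>
    <tr><td>Product</td><td>Status</td></tr>
    <tr><td>Shokubutsu Original</td><td>Halal</td></tr>
  </table>
</td></tr></table>
"""

def extract_rows(page):
    tree = html.fromstring(page)
    # No id or class to hook on, so select rows of the inner table
    # purely by nesting position
    rows = tree.xpath("//table//table/tr")
    return [[cell.text_content().strip() for cell in row.xpath("td")]
            for row in rows]
```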

Sunday, April 15, 2012

Django lxml encode error

A Python script using the lxml library that works fine on the console suddenly throws an error when imported from a Django views module.
File "lxml.etree.pyx", line 123, in init lxml.etree (src/lxml/lxml.etree.c:160385) TypeError: encode() argument 1 must be string without null bytes, not unicode
It's unlikely to be a problem with the encoding of the content I want to parse, because it's just importing the module - I'm not calling any function that does the parsing yet. I almost gave up on my search until I found this answer[1] on Stackoverflow. It turns out that on the console I'm using Python 2.6 while mod_wsgi, which runs the Django app, is compiled against Python 2.7.
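A quick way to catch this class of mismatch is to log the interpreter details from inside the WSGI process and compare them with the console:

```python
import sys

def interpreter_info():
    # Run this both from a Django view (or wsgi.py) and from the
    # console; the two should report the same interpreter
    return "%s (%d.%d.%d)" % (
        sys.executable,
        sys.version_info[0],
        sys.version_info[1],
        sys.version_info[2],
    )

print(interpreter_info())
```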

Saturday, February 11, 2012

PostgreSQL dump list file

The error first started with something like 'INVALID COMMAND \N ..'. This is actually not the real error. Use the psql command line switch --set ON_ERROR_STOP=1 to stop immediately on error so we can see the actual error.

When restoring my PostgreSQL dump to the new hosting, I got an error saying the plpgsql language already exists. This is probably because my hosting already sets up plpgsql in all databases by default, while the dump contains a line to create the language extension. Googling around, I found out that pg_restore provides what they call a list file, which lists out which objects should be restored. So the first step is to generate the list file from your database dump:-
pg_restore -l db.dump > db.list
db.dump is your database dump file and, with the -l option, the output is the list of objects to restore. We save that list in the db.list file. Now we can open db.list with any text editor and comment out the line that mentions the creation of plpgsql. The list file should look something like:-
3178; 1262 1525521 DATABASE - cc_live myname
6; 2615 1313721 SCHEMA - audit myname
7; 2615 1313722 SCHEMA - cct myname
3; 2615 2200 SCHEMA - public postgres
3179; 0 0 COMMENT - SCHEMA public postgres
3180; 0 0 ACL - public postgres
;1007; 2612 1313725 PROCEDURAL LANGUAGE - plpgsql myname
639; 1247 1313728 TYPE cct daily_sale myname
641; 1247 1313729 DOMAIN public bigint_unsigned myname
So we comment out the plpgsql object (by putting a semicolon at the beginning of the line). Then, when restoring the dump, we specify the list file to pg_restore:-
pg_restore -x -O -L db.list db.dump | psql new_db 
This is useful if the dump file is very large or uses the archive format, which means you can't edit it directly with an editor. The list file, however, is meant to be editable.
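For a very large list file, even the editing step can be scripted. A small sketch that comments out matching entries (the sample lines are taken from the list file format above):

```python
def comment_out(lines, keyword):
    # pg_restore treats list-file lines starting with ';' as comments
    return [";" + line if keyword in line and not line.startswith(";") else line
            for line in lines]

# Usage sketch, with the filenames from the example above:
# with open("db.list") as f:
#     edited = comment_out(f.read().splitlines(), "PROCEDURAL LANGUAGE - plpgsql")
# with open("db.list", "w") as f:
#     f.write("\n".join(edited) + "\n")
```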

Thursday, January 12, 2012

Test sending email through telnet

This is the minimal telnet session to test sending mail to an email server.
$ telnet localhost 25
Connected to localhost.
Escape character is '^]'.
250 ok
250 ok
354 go ahead
Subject: OTA

ota ajaj.
250 ok 1326364891 qp 1790
The mail server in this case is running qmail. If the email is bounced, the returned email may end up in your spam folder as the sender domain does not match - we sent this from localhost instead of from
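The log above only shows the server's replies; the commands typed on the client side are missing. A sketch of the dialogue those replies correspond to (MAIL FROM, RCPT TO, DATA, end-of-data dot), with placeholder addresses since the original session elides them:

```python
# Client commands for a minimal SMTP session, matching the server
# replies shown above (250 ok, 250 ok, 354 go ahead, 250 ok ...).
# The addresses are placeholders, not the ones from the original session.
def smtp_payload(sender, recipient, subject, body):
    commands = [
        "HELO localhost",
        "MAIL FROM:<%s>" % sender,
        "RCPT TO:<%s>" % recipient,
        "DATA",
        "Subject: %s" % subject,
        "",
        body,
        ".",          # a lone dot ends the DATA section
        "QUIT",
    ]
    return "\r\n".join(commands) + "\r\n"

print(smtp_payload("me@localhost", "you@example.com", "OTA", "ota ajaj."))
```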

Wednesday, November 23, 2011

Using IRC as Time Tracker

I have tried all sorts of time trackers, todo lists and what not, but still don't really get what I want out of those tools. The requirement is very basic and simple: a tool that allows me to track how much time I have spent on certain things throughout the day. While there are lots of great tools with a degree of simplicity, they still have too much friction for me to use them effectively.

Emacs orgmode, I think, is pretty close to what I want. That's just a guess since I've only glanced over its features and what people are saying about it. But to learn Emacs just to use this is a bit too much. I take notes in vim in outline mode - that's how my mental model works. It would be nice if vim could record the time whenever I add a new outline. Most of the todo lists or time trackers out there only allow you to define a simple title and explanation of the task. If it's a web based app, don't even think of having any outlining support.

Then I noticed that it's quite easy for me to rebuild my mental state of what I have done throughout the day by looking at my svn commit log or the Skype chat log with my colleagues. So I stopped looking for tools and just used the history browser of our svn to compose my daily report. It works well, except that not all tasks have a commit log - in other words, something backed by an svn repository. I might be doing some server config or working in some web interface, and none of that is related to any of our svn repos.

The Skype chat log gave me an idea: maybe I can use it to track my time. Each post gets a timestamp, so it's perfect. But with whom should I chat? Set up a new fake account? Possible, but I think there must be a better way. Enter IRC. I use irssi as my IRC client, and since it's a console based app, there's almost no context switch in order to use it. It's just a few SHIFT+Arrow keys away from the current console. Using a public IRC server for this is definitely a no-no. Setting up a whole IRC server seems a bit overkill, although it's much easier nowadays thanks to apt-get.

Searching around, I found hircd - Python code (in 400 lines) implementing a basic IRC server. Simply run the Python code and fire up irssi to connect to localhost. It just works. Automate some tasks such as automatically joining a channel and logging in, and I'm ready to go. If you work on multiple projects and want to split the logs, you can just create a new channel, but for now I stay in a single channel as it's easier to keep track.

1. Put the chat log ($HOME/irclogs) into a SpiderOak synced directory.
2. Integrate with the Pomodoro technique