Then we query the domain to get the 100 item_ids, and load their attributes one by one.
Of course, it is all done on an EC2 instance, to save me some money and to get a faster network.
The goal is to see how many parallel threads we need to achieve the optimum load time.
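To make the setup concrete, here is a minimal sketch of how the item_ids could be split across threads. This is not the actual test code: `fetch_attributes` is a hypothetical stand-in for the real SimpleDB call, and `load_in_parallel` is a name made up for illustration.

```ruby
# Hypothetical stand-in for a SimpleDB attribute lookup; not a real API.
def fetch_attributes(item_id)
  { "id" => item_id }
end

# Split item_ids into roughly equal slices, one slice per thread,
# and load each slice's attributes one by one inside its thread.
def load_in_parallel(item_ids, thread_count)
  return [] if item_ids.empty?

  slice_size = (item_ids.size / thread_count.to_f).ceil
  threads = item_ids.each_slice(slice_size).map do |slice|
    Thread.new { slice.map { |id| fetch_attributes(id) } }
  end
  # Thread#value joins the thread and returns the block's result.
  threads.flat_map(&:value)
end

item_ids = (1..100).map { |i| "item_#{i}" }
results = load_in_parallel(item_ids, 5)
```

With 100 items and 5 threads, each thread loads 20 items; results come back in the original item order because the threads are collected in slice order.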
For statistical purposes, we ran with 1, 2, 5, 7, 10, 12, 15, 17, 20 and 25 threads.
Each thread count was run 10 times and averaged. Then we repeated this 5 times and averaged the averages for each thread count.
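The averaging scheme can be sketched roughly like this. It is an illustration only, assuming a placeholder `workload` (threads that just sleep for a millisecond) instead of the real SimpleDB loads; the helper names are made up.

```ruby
require "benchmark"

THREAD_COUNTS = [1, 2, 5, 7, 10, 12, 15, 17, 20, 25]

def mean(values)
  values.sum.to_f / values.size
end

# Time `runs` calls of the block and return the mean wall-clock seconds.
def average_time(runs)
  mean(Array.new(runs) { Benchmark.realtime { yield } })
end

# Repeat the averaged measurement `rounds` times and average the averages.
def average_of_averages(rounds, runs, &work)
  mean(Array.new(rounds) { average_time(runs, &work) })
end

# Placeholder workload (an assumption): spawn n threads that sleep briefly.
# Thread spawning is inside the timed region, as in the test described here.
workload = ->(n) { Array.new(n) { Thread.new { sleep 0.001 } }.each(&:join) }

timings = THREAD_COUNTS.map do |n|
  [n, average_of_averages(5, 10) { workload.call(n) }]
end.to_h

timings.each { |n, secs| puts format("%2d threads: %.4f s", n, secs) }
```

Swapping the placeholder for a real load routine would give one averaged time per thread count, which is what gets plotted.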
Here's what we got.
Y-axis is time in seconds to fetch the attributes of 100 items.
X-axis is the number of parallel threads.
Notice that a single thread is definitely a bad idea.
Two is better.
Five is even better, but any more than that doesn't make it any faster.
At 20 threads it even gets worse. Maybe Ruby's thread management gets weird at this point?
It is possible that 100 items is too few (after all, at 20 threads each thread is loading only 5 items). Not to mention that the test code includes thread spawning in the timing, which means the cost of spawning 20 threads might not be justified when each thread loads only 5 items.
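One way to check the spawning theory would be to time thread creation on its own. The sketch below is not part of the original test; `spawn_overhead` is a made-up helper that only creates and joins empty threads, so whatever it reports is pure spawn/join cost on the machine it runs on.

```ruby
require "benchmark"

# Time spent only creating and joining `thread_count` empty threads.
def spawn_overhead(thread_count)
  Benchmark.realtime do
    Array.new(thread_count) { Thread.new {} }.each(&:join)
  end
end

[1, 5, 20].each do |n|
  puts format("%2d threads: %.6f s spawn/join overhead", n, spawn_overhead(n))
end
```

If the number for 20 threads turned out to be small compared with the measured load times, spawning alone would not explain the rise.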
So let's try again with more items: this time 250 at once (the maximum number of items that can be queried from SimpleDB). Now, at 20 threads, we are loading about 12 items per thread, slightly more than double the previous test.
Again, Y-axis is time in seconds to load attributes of 250 items.
X-axis is the number of threads.
The two lines are two different runs. This is even more curious because in one of the runs there is no noticeable improvement with more than 2 threads. But in general, anything between 5 and 15 threads is good. Again, there is a rise around 20 threads.
At this point I do not know whether the limitation is in SimpleDB, Ruby, or the network.
The bottom line is that, at 5 or more threads, it takes 25 ms to fetch this data from SimpleDB, where each set of attributes (item) averages 32 bytes. At 1 thread, it takes 46 ms, which is still fairly acceptable for achieving sub-1-second web app response times.