Sunday, June 6, 2010

PHP and Mary - Speed!

As I tease you-my friends-endlessly about. I'm building a high volume-low overhead website like Markus did with Plentyoffish.com called Mary. It is not a dating website but I hope to make Mary into a low workload million dollar website with adsense. To encourage women to go into computer science I give all my projects cute girl names-these serve as the generic free software names. Plus for mine it's a hint on what it does.

I decided at first to use PHP and MySQL because as the most common platforms they would be the best supported and have the most mature tools. That reasoning was very good. I easily setup Ubuntu by downloading the packages-it did all the hard work for me. I was also able to get phpmyadmin working out of the box as well. Got MediaWiki set as well with a bit of configuration.

I ran into database issues when I realize how difficult joins could be to design. I tried comparing other platforms such as noSQL databases. Then I began thinking how much Mary-which right now does nothing more then display a fancy template and connects to a database to do some admin tasks-compares against other. Once again Ubuntu package maintainers did the hard work for me and apt-get installed xcache and xdebug to speed up and profile my applications.

The packages installed but I still have to configure them. I used this tutorial to setup xcache and xdebug.
(1) http://www.ibm.com/developerworks/library/os-php-fastapps1/
(2) http://www.ibm.com/developerworks/opensource/library/os-php-fastapps2/?S_TACT=105AGY82&S_CMP=MAVE

I'm too lazy to post the data. I discovered that my index.php spent most of it's time on the require_once(common.inc); function call.

Okay... why? Why is it important?

I used the ab (apache benchmark) comparison tool to test Mary's index.php to phpMyAdmin index and phpMyAdmin blew it's socks off. That couldn't be possible and I spent many days playing with it to discover why. Including comparing phpMyAdmin to an empty file or no file at all-it still beat it. That stumped me... my best thought was that phpMyAdmin was redirecting ab. (There were no failures.) Mediawiki produced similar results, however when compared to there main page
/mediawiki/en/MainPage.php (I think) they slowed down to a screeching halt-which made sense.

Mary's unmodified time as of 0.3 is 30 seconds for 10,000 requests for the index.php by ab.

Now back to require_once(). I have a topology to my early program as such.
defs.inc
dbmysql.inc template.inc
common.inc
index.php admin.php search.php... etc

common.inc is included in every bottom .php file for my convenience. It was also where the php interpreter was spending most of it's time. Interesting it didn't do that for the require_once of defs.inc or dbmysql.inc nor the template.inc generating function. Using require instead shaved about 30% of the time off that was spent including common.inc. So I just ripped it out and just used the template.inc-the index.php ran a fraction of the time... according to the debugger.

In a deeper issue to expose programming issues. Markus reported that in ASP.net that he everything dynamically generated-switching between html and ASP tags actually slowed down the interpreter in the long run. My template function is very simple-taking the name and the body of the text to display and wrapping the html and css around it-all dynamically parsing strings to be echoed at the end. I thought it might be different in PHP-or at least curious-I rewrote the 30 lines of code to go in and out of PHP tags. ab reported a 20 vs 23 second improvement per 10,000 queries (3 seconds). Since I am running the cache and the debugger at the same time I imagine it was due to the cache handling the html and the broken up echos better then one single echo. I forgot to add a few echos down below to print my name, date and version number of Mary. This is extra work so php should have slowed down... it actually improved to 18 seconds. Once again I think this is because of the echo's being broken up so apache worker threads more efficiently switch between inteperting and printing out html rather then hogging the output and stepping on each other's toes. I plan to stick with Markus's advice because of a fear of premature optimization plus my template function can be broken down into three echo's and have it's performance compared to that (a real application will have database lag to act as a buffer).

ab still reported phpmyadmin beating up Mary's index (15-17 seconds). Comparing phpmyadmin's profile to Mary's index reported similar comparisons in the debugger which was odd. I didn't even come up with an explanation for this... I need to read the source code to find out if it is just redirecting like Mediawiki was.

So I was out looking for the info and came across this.
(3) http://stackoverflow.com/questions/186338/why-is-require-once-so-bad-to-use

"
The other downside is APC doesn't cache include_once and require_once calls IIRC – dcousineau Oct 10 '08 at 7:20
"

Damn.

Further research came up with this.
(4) http://sb2.info/php-opcode-caches-require_once-is-slow/

"

If you got a large OO PHP application, you usually come with many files included using “require_once()”. Different test from different sources shows that using require_once works up to 4 times slower than require. Basically, all known PHP opcode caches suffers from this issue (APC, XCache, ea/mmcache).
There’s patches committed in APC that focus on this problem. They override INCLUDE_ONCE/REQUIRE_ONCE opcode handling whenever possible, although a bit ugly. This isn’t done in XCache yet. Ugly because isn’t not that php-src support caches, but caches support php-src.
Solution: general advice here is replacing all “require_once()” calls with “require()”, it should speed up the application.
"

Great.
Out of habit I just require_once with all my function calls. (3) has a deeper explanation. Includes are very expensive and should be used rarely. As compared to being a C programmer where you include everything at the top of the file and having the program so nicely organized as you know the complier will optimize everything and make your program run so nice.

It's funny how php can remind you of assembler.

Anyway... i'm not going to worry about speed right now and just go back to developing the database functionality and view. The includes are easy enough to clean up for optimization. Plus I have been looking at Facebook's HipHop and I like it. I will use when program actually get's bogged down and compile it.

Ciao

No comments:

Post a Comment