PHP, Zend Framework and Other Crazy Stuff
Archive for September, 2009
The Mysteries Of Asynchronous Processing With PHP - Part 2: Making Zend Framework Applications CLI Accessible
Sep 29th
In Part 1 of this series, we started an exploration of the concept of Asynchronous Processing as it applied to PHP. We covered the benefits it offers, the basic implementation directions often applied, and also discussed how to identify and separate tasks from the main application so they could be made subject to asynchronous processing. It is highly recommended that you read this before continuing with Part 2 so you can follow what I’m up to here.
UPDATE: Modified the bootstrap class and script based ZF runner to reflect some changes needed to support Part 3 of this series. These primarily allow for improved control over command line options.
With the theory-heavy portion of the series out of the way, we can begin to explore the various implementation possibilities. In Part 3, we will examine implementing Asynchronous Processing using a child process, i.e. a separate PHP process we create from our application during a request. We’ll analyse this implementation option before introducing some source code so we may understand its advantages and disadvantages.
While, technically, this series is not Zend Framework specific since the same principles can be applied to any PHP application, I’ll be using the Zend Framework in examples of asynchronous processing from an application. As a result, Part 2 is a tangential detour into how to make a Zend Framework based application accessible from the command line before we delve into examples using this in future parts of the series. If you are not a Zend Framework user, I’m sure you can find relevant material online for your own preferred framework though the ZF pieces may still have some usefulness in understanding the approach from an MVC perspective.
It may surprise some, but the Zend Framework is indeed usable from the command line…with some massaging. I’ve already noted that using a full application framework for a background task comes at a cost, since you are loading a lot of code, not all of which may be strictly necessary. Unless you are willing to invest in a custom framework specifically for such uses, though, your framework of choice is probably the simplest option.
I’m not going to describe setting up a basic application with the Zend Framework. You can do so by following the base application created in my book, Zend Framework: Surviving The Deep End (which is free, online in HTML form, and now duly advertised to you). The relevant chapters are Chapter 5 and Chapter 6. If you want to get started quickly, you can download the example application for the book (in progress) from http://github.com/padraic/ZFBlog
Unfortunately, the ZF is not immediately accessible from the command line. Although it offers classes like Zend_Console_Getopt and Zend_Controller_Response_Cli, the remaining pieces are mysteriously (and conspicuously) missing. They are not difficult to add however, especially if you are using Zend_Application to fuel your bootstrapping.
Adding Custom CLI Support Classes
There are two very obvious problems with calling a Zend Framework application from the command line. First, there is no Request class supporting CLI command line options (though there is a Zend_Controller_Request_Simple). Secondly, the Front Controller always attempts to route the request, and all of the standard Routers assume you will use an HTTP request. This HTTP focus results in an Exception when routing occurs.
To improve this situation, we will implement two very simple custom classes: ZFExt_Controller_Request_Cli and ZFExt_Controller_Router_Cli.
ZFExt_Controller_Request_Cli very simply accepts an instance of Zend_Console_Getopt and attempts to locate a module, controller and action name from the command line options it exposes. If they exist, these are used to set the relevant module, controller and action names for the request (doing this manually negates the need for routing). Here’s the class, stored (if using the example app) at /library/ZFExt/Controller/Request/Cli.php:
[geshi lang=php]
require_once 'Zend/Controller/Request/Abstract.php';

class ZFExt_Controller_Request_Cli extends Zend_Controller_Request_Abstract
{
    protected $_getopt = null;

    public function __construct(Zend_Console_Getopt $getopt)
    {
        $this->_getopt = $getopt;
        $getopt->parse();
        // Copy any MVC names found on the command line onto the request
        if ($getopt->{$this->getModuleKey()}) {
            $this->setModuleName($getopt->{$this->getModuleKey()});
        }
        if ($getopt->{$this->getControllerKey()}) {
            $this->setControllerName($getopt->{$this->getControllerKey()});
        }
        if ($getopt->{$this->getActionKey()}) {
            $this->setActionName($getopt->{$this->getActionKey()});
        }
    }

    public function getCliOptions()
    {
        return $this->_getopt;
    }
}[/geshi]
ZFExt_Controller_Router_Cli is basically a “dumb” router. It implements Zend_Controller_Router_Interface but all of its methods are blank. Since our CLI access does not need to be routed, we’re effectively just plugging the requirement for a Router object with something which is designed to do absolutely…nothing. Here’s the class, stored (if using the example app) at /library/ZFExt/Controller/Router/Cli.php:
[geshi lang=php]
class ZFExt_Controller_Router_Cli implements Zend_Controller_Router_Interface
{
    // Every method is intentionally empty: CLI requests are never routed
    public function route(Zend_Controller_Request_Abstract $dispatcher){}
    public function assemble($userParams, $name = null, $reset = false, $encode = true){}
    public function getFrontController(){}
    public function setFrontController(Zend_Controller_Front $controller){}
    public function setParam($name, $value){}
    public function setParams(array $params){}
    public function getParam($name){}
    public function getParams(){}
    public function clearParams($name = null){}
}[/geshi]
Putting these new classes to use will require manually adding them to the Front Controller we'll use in our application bootstrap. For CLI use, I've elected to implement a new bootstrap which is very similar to the one implemented for the example app at /library/ZFExt/Bootstrap.php. The CLI bootstrap below is stored to /library/ZFExt/BootstrapCli.php:
[geshi lang=php]
class ZFExt_BootstrapCli extends Zend_Application_Bootstrap_Bootstrap
{
    protected $_getopt = null;

    protected $_getOptRules = array(
        'environment|e-w' => 'Application environment switch (optional)',
        'module|m-w' => 'Module name (optional)',
        'controller|c=w' => 'Controller name (required)',
        'action|a=w' => 'Action name (required)'
    );

    protected function _initView()
    {
        // displaces View Resource class to prevent execution
    }

    protected function _initCliFrontController()
    {
        $this->bootstrap('FrontController');
        $front = $this->getResource('FrontController');
        $getopt = new Zend_Console_Getopt($this->getOptionRules(),
            $this->_isolateMvcArgs());
        $request = new ZFExt_Controller_Request_Cli($getopt);
        $front->setResponse(new Zend_Controller_Response_Cli)
            ->setRequest($request)
            ->setRouter(new ZFExt_Controller_Router_Cli)
            ->setParam('noViewRenderer', true);
    }

    // CLI specific methods for option management
    public function setGetOpt(Zend_Console_Getopt $getopt)
    {
        $this->_getopt = $getopt;
    }

    public function getGetOpt()
    {
        if (is_null($this->_getopt)) {
            $this->_getopt = new Zend_Console_Getopt($this->getOptionRules());
        }
        return $this->_getopt;
    }

    public function addOptionRules(array $rules)
    {
        $this->_getOptRules = $this->_getOptRules + $rules;
    }

    public function getOptionRules()
    {
        return $this->_getOptRules;
    }

    // get MVC related args only (allows later uses of Getopt class
    // to be configured for cli arguments)
    protected function _isolateMvcArgs()
    {
        $options = array($_SERVER['argv'][0]);
        foreach ($_SERVER['argv'] as $key => $value) {
            // only copy a flag when a value actually follows it
            if (in_array($value, array(
                '--action', '-a', '--controller', '-c', '--module', '-m', '--environment', '-e'
            )) && isset($_SERVER['argv'][$key+1])) {
                $options[] = $value;
                $options[] = $_SERVER['argv'][$key+1];
            }
        }
        return $options;
    }
}[/geshi]
This new bootstrap class performs two important functions. First, it sets up the application’s Front Controller to use our full set of CLI helper classes including the custom ones we added. Secondly, it allows for the setting of a command line option parser, an instance of Zend_Console_Getopt. The default used within the bootstrap class has a limited set of options, so we could set a replacement parser with an expanded set of command line options available. Unfortunately, we may not simply add new options and reparse due to the limitations of Zend_Console_Getopt but substitution will work just fine for most needs.
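To make the filtering done by _isolateMvcArgs() easier to follow outside the framework, here is a minimal standalone sketch. The function name isolate_mvc_args() and the sample argument list are made up for illustration. It assumes long options use the double-dash form that Zend_Console_Getopt expects, and it skips a flag that has no value following it.

```php
<?php
// Hypothetical standalone version of the filtering idea; not part of ZF.
function isolate_mvc_args(array $argv)
{
    $mvcFlags = array(
        '--action', '-a', '--controller', '-c',
        '--module', '-m', '--environment', '-e',
    );
    // Keep the script name in first position, as the bootstrap method does.
    $options = array($argv[0]);
    foreach ($argv as $key => $value) {
        // Copy a recognised flag together with the value that follows it.
        if (in_array($value, $mvcFlags, true) && isset($argv[$key + 1])) {
            $options[] = $value;
            $options[] = $argv[$key + 1];
        }
    }
    return $options;
}

// '--verbose' is an example of a task-specific flag that gets left out.
$sample = array('zfrun.php', '-c', 'task', '--verbose', '-a', 'echo');
print_r(isolate_mvc_args($sample));
```

Anything that is not an MVC flag is left behind, which is what lets a task parse its own options later with a separately configured parser.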
Adding A Calling Script
All that remains to enable CLI access is to add a calling script to run the application. We’ll start by adding a PHP file at /scripts/zfrun.php. This will be very similar to what a Zend Framework index.php file would look like if using Zend_Application:
[geshi lang=php]
if (!defined('APPLICATION_PATH')) {
    define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));
}
if (!defined('APPLICATION_ROOT')) {
    define('APPLICATION_ROOT', realpath(dirname(__FILE__) . '/..'));
}

set_include_path(
    APPLICATION_ROOT . '/library' . PATH_SEPARATOR
    . APPLICATION_ROOT . '/vendor' . PATH_SEPARATOR
    . get_include_path()
);

require_once 'Zend/Loader/Autoloader.php';
$autoloader = Zend_Loader_Autoloader::getInstance();
$autoloader->setDefaultAutoloader(create_function('$class',
    "include str_replace('_', '/', \$class) . '.php';"
));

// check for app environment setting
$i = array_search('-e', $_SERVER['argv']);
if (!$i) {
    $i = array_search('--environment', $_SERVER['argv']);
}
if ($i) {
    define('APPLICATION_ENV', $_SERVER['argv'][$i+1]);
}
if (!defined('APPLICATION_ENV')) {
    if (getenv('APPLICATION_ENV')) {
        $env = getenv('APPLICATION_ENV');
    } else {
        $env = 'production';
    }
    define('APPLICATION_ENV', $env);
}

$application = new Zend_Application(
    APPLICATION_ENV,
    APPLICATION_ROOT . '/config/cli.ini'
);
$application->bootstrap()->run();[/geshi]
That wasn’t so bad. The script itself merely sets up the typical constants and include path needed for Zend_Application, then checks the command line for an environment option (-e or --environment). If none is given, it falls back to the APPLICATION_ENV environment variable and finally to “production”. The rules for parsing the remaining command line options live in the bootstrap class (ZFExt_BootstrapCli), which builds its own Zend_Console_Getopt instance when the application is bootstrapped and run.
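The order of precedence used by the script can be sketched as a small standalone function. The name resolve_environment() is hypothetical, and this is only an illustration of the lookup order (command line flag, then environment variable, then “production”), not a replacement for the script above.

```php
<?php
// Hypothetical helper mirroring the script's precedence rules.
function resolve_environment(array $argv, $envValue)
{
    // 1. An explicit flag on the command line wins.
    foreach (array('-e', '--environment') as $flag) {
        $i = array_search($flag, $argv, true);
        if ($i !== false && isset($argv[$i + 1])) {
            return $argv[$i + 1];
        }
    }
    // 2. Otherwise use the environment variable; getenv() returns
    //    false when the variable is not set.
    if ($envValue !== false && $envValue !== '') {
        return $envValue;
    }
    // 3. Fall back to the safest default.
    return 'production';
}

echo resolve_environment($_SERVER['argv'], getenv('APPLICATION_ENV')), "\n";
```

Passing the argument list and the environment value in as parameters keeps the function easy to exercise without touching the real command line.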
The final piece of this jigsaw is adding the configuration file, cli.ini, passed to Zend_Application. This is a cut down version of the original application.ini used by the example app stored to /config/cli.ini:
[geshi lang=php]
[production]
; PHP INI Settings
phpSettings.display_startup_errors = 0
phpSettings.display_errors = 0

; Bootstrap Location
bootstrap.path = APPLICATION_ROOT "/library/ZFExt/BootstrapCli.php"
bootstrap.class = "ZFExt_BootstrapCli"

; Standard Resource Options
resources.frontController.controllerDirectory = APPLICATION_PATH "/controllers"
resources.frontController.moduleDirectory = APPLICATION_PATH "/modules"

; Module Options (Required For Mysterious Reasons)
resources.modules[] =

; Autoloader Options
autoloaderNamespaces[] = "ZFExt_"

[staging : production]

[testing : production]
phpSettings.display_startup_errors = 1
phpSettings.display_errors = 1
resources.frontController.throwExceptions = 1

[development : production]
phpSettings.display_startup_errors = 1
phpSettings.display_errors = 1
resources.frontController.throwExceptions = 1[/geshi]
The main difference from the original application.ini is the removal of any settings for a View, since we won’t be rendering any templates for our CLI access. Otherwise, you can retain any other settings for database access, etc. This could also be added as a separate section to application.ini, but I decided a separate CLI settings file was a bit simpler to follow and still allows the usual application environment based sections.
Adding CLI tasks to ZF Applications
We’ll start by adding a TaskController to the application. The name is largely irrelevant so don’t decide you must put all tasks into the same controller! You may also use controllers within a module should they require their own specific tasks or command line needs.
The new controller is added at /application/controllers/TaskController.php:
[geshi lang=php]
class TaskController extends Zend_Controller_Action
{
    public function init()
    {
        // refuse anything that is not a CLI request
        if (!$this->getRequest() instanceof ZFExt_Controller_Request_Cli) {
            exit('TaskController may only be accessed from the command line');
        }
    }

    public function echoAction()
    {
        echo 'Hello, World!', "\n";
        exit(0);
    }
}[/geshi]
While this is a very simple example, echoing a message, the task itself could be as complicated as you wish. We’ve also added a quick check to ensure this controller cannot be accessed from a normal HTTP request - having publicly available tasks is not a good idea after all.
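For scripts that run outside the framework, a similar guard can be sketched with PHP’s built-in PHP_SAPI constant, which holds “cli” when PHP is started from the command line binary. The function name is_command_line() is made up for this example, and the check is an alternative to, not a description of, the request class test used in the controller.

```php
<?php
// Hypothetical guard; PHP_SAPI is 'cli' when run with the command line binary.
function is_command_line()
{
    return PHP_SAPI === 'cli';
}

echo is_command_line()
    ? "Running from the command line\n"
    : "Not a command line request\n";
```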
Using the CLI access from the command line
Use of our newly added CLI access to this Zend Application is very simple. There are four command line options defined. Here’s an example which calls the new task and sets the application environment (used in our configuration) to “development”. Note that if absent, the environment defaults to “production”.
php zfrun.php -c task -a echo -e development
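Which is equivalent to:
php zfrun.php --controller task --action echo --environment development
Using either, once you’ve navigated to the application’s /scripts directory, should echo the message we added to the task. Note that each option is separated from its value by a space, since both the script and the bootstrap look for the option names as standalone arguments.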
Conclusion
In the second part of our look at Asynchronous Processing we’ve investigated how to enable CLI access to a Zend Framework application. In the future, this will allow us to delegate tasks asynchronously using command line calls and framework based tasks.
In Part 3, we’ll return to the Asynchronous Processing topic and put this work to use in explaining a very common implementation strategy for asynchronous tasks.
The Mysteries Of Asynchronous Processing With PHP - Part 1: Asynchronous Benefits, Task Identification and Implementation Methods
Sep 27th
Imagine a world where clients will give up on receiving responses from your application in mere seconds, where failed emails will give rise to complaints and lost business, where there exist tasks that must be performed regularly regardless of how many requests your application receives. This is not a fantasy world, it’s reality. In the real world your application must be responsive, reliable and capable of recovery from errors. These are obvious needs but all too often applications fail to realise them. Sometimes, developers even fail to realise they should even be concerned about them.
To offer an opening real-world example, I’ll borrow from a recent discussion I had concerning the Pubsubhubbub Protocol. If you are unfamiliar with Pubsubhubbub (PuSH), it’s a protocol which implements a publish-subscribe model where the publishers of RSS and Atom feeds can “push” updates to a group of Subscribers. The pushing is handled by an intermediary called a Hub which is pinged by the Publisher when they update a feed, and which then distributes the update to many Subscribers using a Callback URL they each have declared.
In that discussion, the original poster was having a problem. Whenever a Hub sent his Subscriber implementation an update, it seemed to do it repetitively for some mysterious reason. Eventually, the problem was identified. The Hub implements a five second timeout. If, after five seconds, the update request was not completed because the Subscriber failed to send a valid response, it was assumed to have failed. The Hub would then attempt it again, and again, until finally its configured number of retries was used up.
Why was the five second timeout being exceeded by the Subscriber? What was taking it so long in returning a response and finishing the request? You see, the Subscriber was not simply acknowledging the receipt of an update as demanded by the protocol, it was actually processing the entire update for its own use including a number of potentially expensive database operations before it completed the request. This was taking more than five seconds.
Here’s the problem in a nutshell. The Subscriber was performing work that had absolutely nothing to do with returning a response to the Hub and it was having an impact on the time it took to complete the request. The Hub couldn’t care less about the Subscriber’s processing, it was expecting a quick confirmation that the update was received. Instead, the Subscriber was effectively making it wait while it did something completely unrelated to that response. Using Asynchronous Processing, the Subscriber should have offloaded the feed processing elsewhere leaving it free to quickly respond to the Hub.
What is Asynchronous Processing?
Asynchronous processing is a method of performing tasks outside the loop of the current request. Basically, you offload the task to another process, leaving the process serving the request free to respond quickly and without delay. Of course, not all tasks are caused by a request. Some can be performed without a request trigger, like some forms of maintenance or log parsing.
Implementing asynchronous processing can take a few directions:
1. A parent process can spawn a child process to complete a task in the background, allowing the parent process to continue uninterrupted.
2. You could add tasks to a Job Queue (or even Message Queue) relying on a background daemon or scheduled process to perform batch processing of outstanding tasks in the queue.
3. You could simply have a scheduled standalone task without the queue, and which is performed regardless of what requests are received.
There are, I’m sure, many more variations. Most readers will recognise at least one of these (hint: cron). Once you understand the nature of asynchronous processing you can find many uses for it in the most unlikely of places.
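As a rough illustration of the first direction, the following sketch builds a shell command that would start a PHP script in the background on a Unix-like system. The helper name build_background_command(), the script path and the argument are all hypothetical, and the exec() call is left commented out. Redirecting output and appending an ampersand is a common shell idiom for letting the parent continue without waiting, but treat this as an assumption-laden sketch rather than a finished implementation.

```php
<?php
// Hypothetical helper: builds a shell command that runs a PHP script in the
// background on a Unix-like system. The path and arguments are examples only.
function build_background_command($script, array $args = array())
{
    $parts = array('php', '-f', escapeshellarg($script));
    if (count($args) > 0) {
        // Everything after "--" is passed to the script, not to PHP itself.
        $parts[] = '--';
        foreach ($args as $arg) {
            $parts[] = escapeshellarg($arg);
        }
    }
    // Discard output and append "&" so the shell does not wait for the child.
    return implode(' ', $parts) . ' > /dev/null 2>&1 &';
}

$command = build_background_command('/path/to/task.php', array('42'));
echo $command, "\n";
// exec($command); // uncomment to actually start the child process
```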
What Problems Does Asynchronous Processing Solve?
Our example demonstrates that resource intensive tasks can be detrimental to responsiveness, so much so that it can become detrimental in turn to the client, whether it be a machine applying a configured timeout and being forced into retrying the same request over and over, or whether it be an actual person who has to stare at a blank page as the seconds tick by.
Resource intensive tasks are not the only ones worth applying asynchronous processing to, though they are likely the most obvious given their impact on clients. Most tasks worth offloading can be grouped into categories:
1. Tasks which are resource intensive, i.e. needing a lot of CPU cycles or memory to complete which will add to server load and delay client responses.
2. Tasks which are time consuming but not necessarily resource intensive. These may include database operations, HTTP requests, the use of external web services, and other operations which can suffer delays from network latency or external problems out of our control.
3. Tasks which must be completed regardless of errors. For example, sending emails like signup confirmations or order confirmations. If a first attempt fails (for whatever reason), they may need to be attempted many times before either succeeding or being reported or logged for attention. Obviously, attempting these just once within a request cycle is prone to error - if it fails during the request, will it ever be attempted again? What if your mail server is offline for an extended period?
4. Tasks not triggered by requests. If it needs to be performed, but is not triggered by an HTTP request, then it probably needs to be scheduled or manually added to a job queue somewhere.
If you can categorise any task in your application within those loose categories, then you have identified a potential candidate for asynchronous processing. If such tasks are presently performed during an application request, you just need to pass one additional test - the completion of the task should not be required in order to return a response. Sending emails, for example, can be done in the background and will not affect the response, since it doesn’t have an impact on any dynamic data passed to a view or template.
Implementing Asynchronous Processing: Task Identification, Separation and Reusability
So, we’ve worked through the thought process and theory of asynchronous processing. Before we run off and implement some examples, we first need a task! Once it’s identified, we then need to separate it from the application so it can be processed as an independent unit of work. To add to this, we should also make sure it’s reusable, essentially returning to our Object Oriented basics. The task should be implemented as a class, or set of classes, so we can execute it with different parameters as easily as possible. This may not have been its original structure. For example, it may simply have been a big procedural script hiding out in an application controller somewhere (very very common), or even the application’s service layer.
Let’s stick with a prior example, our Pubsubhubbub Subscriber. We’ll assume, for now, that the most appropriate method of asynchronous processing relies on spawning a background PHP process to operate on the feed update, leaving the parent process free to return a response quickly. The task to be made subject to background processing is therefore anything to do with processing the feed update. We can show both alternatives in a simple diagram.
Now that the task is identified, it needs to be separated. This involves taking all steps that the task performs and adding them to an isolated script, effectively a PHP file executed from the command line using the “php -f” command. This does not mean that the task must be procedural! It should remain as object oriented as possible. Here’s a sample PHP file showing a simple task and demonstrating how it’s called from a script.
[geshi lang=php]
myTask::perform();

class myTask {
    public static function perform()
    {
        echo "Performing a task...";
    }
}[/geshi]
Simple really. Once the perform() method is called, you can use Object Oriented Programming as usual.
One final piece to remember is that tasks should be reusable. You may start by calling this in a separate child process, but that may be migrated to a Job Queue or a schedule. The task needs to be agnostic as to its calling method. This means that it should be capable of accepting configuration/parameters from any source. In many cases, you'd simply wrap the task in a supporting framework. Besides configuration options, there is enabling autoloading, bootstrapping required dependencies, etc. In fact, each task would have something akin to a bootstrapping process just like the main application would rely on from whatever framework it depends on.
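To illustrate what being agnostic about the calling method might look like, here is a minimal sketch in which the task only knows about an options array. The class name GreetingTask, its “name” option and the JSON payload are invented for the example. The point is simply that a command line script and a queue worker can both construct the same task.

```php
<?php
// Hypothetical task: it only knows about an options array,
// not where those options came from.
class GreetingTask
{
    protected $_options = array('name' => 'World');

    public function __construct(array $options = array())
    {
        // Caller-supplied options override the defaults.
        $this->_options = array_merge($this->_options, $options);
    }

    public function perform()
    {
        return 'Hello, ' . $this->_options['name'] . '!';
    }
}

// Called from a command line script: options parsed from arguments.
$fromCli = new GreetingTask(array('name' => 'CLI'));
echo $fromCli->perform(), "\n";

// Called by a queue worker: options decoded from a stored job payload.
$payload = json_decode('{"name":"Queue"}', true);
$fromQueue = new GreetingTask($payload);
echo $fromQueue->perform(), "\n";
```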
In a sense therefore, we're comparing tasks to actions on a controller. They are very similar.
Somewhat related to reusability is another concept of breaking down tasks themselves into their most relevant components. For example, let's say your task is described as follows:
When a User’s registration details are stored, attempt to send them an activation email up to five times before delegating any subsequent attempts to a job queue.
To explain the task, activation emails are time sensitive. A user will likely register, and immediately check their email. They may even refresh their inbox a few times. Because it’s time sensitive, we may start by using a child process spawned from the parent to attempt the emailing immediately. After five attempts, the child process aborts the task and perhaps marks it for future processing by a scheduled scripted job queue (activation emails are important enough that we should keep trying to send them until continued failures prove a bigger problem exists).
At first, we might be tempted to add a Task which loops over an email attempt five times. Wrong! The looping is a separate task component. The actual email attempt is the core component. It’s that core component we want to make reusable. The looping may instead be implemented by a Task Manager which will attempt the email task five times. Okay, that might be too simple an example, but it makes the point. The looping and the emailing can be thought of as separate components. In another situation, perhaps the task does two mutually exclusive things. There again, we can break the apparent task into two separately reusable tasks. Just keep thinking in terms of OOP and you won’t go wrong.
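The separation between looping and emailing can be sketched as follows. The class RetryingTaskManager is hypothetical, and the closure standing in for the email task simply fails twice before succeeding, so treat this as an outline of the idea under those assumptions rather than a real mailer.

```php
<?php
// Hypothetical task manager: retries any callable task up to a limit.
class RetryingTaskManager
{
    protected $_maxAttempts;

    public function __construct($maxAttempts = 5)
    {
        $this->_maxAttempts = (int) $maxAttempts;
    }

    // Returns the number of attempts used on success,
    // or false if every attempt failed.
    public function run($task)
    {
        for ($attempt = 1; $attempt <= $this->_maxAttempts; $attempt++) {
            if (call_user_func($task) === true) {
                return $attempt;
            }
        }
        return false;
    }
}

// Stand-in for the email task: fails twice, then succeeds.
$calls = 0;
$sendEmail = function () use (&$calls) {
    $calls++;
    return $calls >= 3;
};

$manager = new RetryingTaskManager(5);
$result = $manager->run($sendEmail);
echo $result === false
    ? "Hand over to the job queue\n"
    : "Sent after $result attempt(s)\n";
```

Because the manager accepts any callable, the same retry logic can wrap other tasks, and the email task itself can be reused by a job queue without carrying the loop along with it.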
Conclusion
In this first part of my series on Asynchronous Processing with PHP, we’ve covered a lot of theory concerning why such processing is needed, how it could be implemented, and how to think of tasks in terms of being separate and reusable. So I guess I’ll let you turn that around in your head for a day or two before I hit you with Part 2.
The main message is important. Asynchronous Processing is one of those fundamental areas of knowledge any programmer, even in PHP, needs to know about. It’s been my experience that developers often see it as some arcane craft practiced by a handful of hardcore PHP developers. This is completely untrue. Asynchronous Processing is actually very easy to understand, and very easy to implement as we’ll see in the next part of this mini-series where we’ll look at an example using the Zend Framework.