Saturday 6 February 2010

PHP and long running processes

It seems this question keeps coming up on the PHP newsgroups and, now that I've plugged into Stack Overflow, I keep seeing it there too:

How do I start a PHP program which takes a long time to complete, and how do I track its progress?

While these questions tend to attract lots of replies, the replies are usually wrong.

The first thing to consider is that you need to separate the thing which takes a long time from its initiation, the ongoing monitoring and whatever final reporting is required.

Since we're talking about PHP, it's fair to assume that in most cases the initiation will be a PHP script running in a webserver. However, this is not a good place to keep a long-running program.

1) webservers are all about turning around requests quickly - indeed most have failsafe mechanisms to prevent one request hanging about too long (for example PHP's max_execution_time and Apache's Timeout).

2) the webserver ties the request to both the execution of the script and to the client socket connection. Typically, not having to keep a browser window open somewhere waiting for the job to complete is an objective of the exercise. Although the dependence on the client connection can be reduced via ignore_user_abort(), that was never its intended purpose.

3) long-running typically means it will have quite different resource requirements from a typical web page script - e.g. lots of file handles being opened and closed, more memory being consumed.

Most commentators come back with the suggestion of spawning a separate thread of execution, either using fork or via the shell. The former obviously does not solve the webserver-related issues if the interpreter is running as a module - you're just going to fork the webserver process. You've not solved any of the web-related issues and you've created a whole lot of new ones.

You need to create a new process certainly.

The obvious type of process to create would be a standalone PHP interpreter to process the long-running job. So is there a standalone interpreter available to the webserver? The prospective implementor would need to check (and also check whether the webserver runs in a chroot). So let's assume there is, and our coder writes:


print shell_exec('/usr/bin/php -q longThing.php &');


A brave attempt. However, they will soon find that this doesn't behave as well as they expected and keeps stopping. Why? Because although the process they created runs concurrently with the PHP which created it, it is still a child of that process. Now this is where it starts to get complicated. In our example above, the webserver process finishes with the user's script immediately after it creates the new process - however, it will probably hang around waiting to be assigned a new request to deal with. At some point the controller for the webserver processes will decide to terminate it - either as a matter of policy because it has dealt with a certain number of requests (for Apache: MaxRequestsPerChild) or because there are too many idle processes (Apache's MaxSpareServers). However, the webserver process should not stop until all its child processes have terminated. How this is dealt with varies by operating system and, of course, webserver. Regardless, the coder has created a situation which should not have arisen.

But on a Unix system there are lots of jobs which run independently for long periods of time. They achieve this by:

1) they are first started, say as pid 1234, and call fork(), creating, say, pid 1235; after the fork, pid 1234 exits
2) pid 1235 will become the daemon - it closes all its open fds, including those for stdin, stdout and stderr
3) pid 1235 now calls setsid(), which dissociates it from the session and controlling terminal of the processes which led to its creation (and, since its original parent has exited, it typically ends up as a child of the 'init' process).
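
The three steps above can be sketched in PHP roughly as follows. This is only a minimal sketch, assuming the CLI interpreter with the pcntl and posix extensions loaded; the function name daemonize() is made up for illustration, and a production daemon would normally do more (for example reopening the standard descriptors on /dev/null).

```php
<?php
// Sketch only: daemonize() is a hypothetical name, and this assumes the
// CLI SAPI with the pcntl and posix extensions available.
function daemonize(): void
{
    // Step 1: fork. The original process (the parent) exits,
    // leaving the child to carry on.
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('fork failed');
    }
    if ($pid > 0) {
        exit(0); // parent
    }

    // Step 2: close the standard file descriptors inherited from the parent.
    fclose(STDIN);
    fclose(STDOUT);
    fclose(STDERR);

    // Step 3: start a new session, detaching from the old one.
    if (posix_setsid() === -1) {
        exit(1);
    }
}
```

A long-running CLI script would call daemonize() once, near the top, before starting its real work. Calling it from a script running inside the webserver would fork the webserver process, which is exactly the problem described earlier.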

You can do all this in a PHP script, assuming you've got the posix and pcntl extensions. However, in my experience it's usually a lot simpler to ask an existing daemon to run the script for you:


print `echo /usr/bin/php -q longThing.php | at now`;


But how do you get progress information? Simple, just get your long running script to report its progress to a file or a database, and use another, web-based script to read the progress / show the final result.
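
As an illustration of the file-based variant, here is a minimal sketch. The helper names, the JSON layout and the idea of writing via a temporary file are all assumptions made for the example, not part of the original recipe; a database table would serve equally well.

```php
<?php
// Hypothetical helpers; the file location and the JSON fields are assumptions.

// Called from the long-running script each time it completes a unit of work.
function writeProgress(string $file, int $done, int $total): void
{
    $tmp = $file . '.tmp';
    file_put_contents($tmp, json_encode([
        'done'    => $done,
        'total'   => $total,
        'updated' => time(),
    ]));
    // On POSIX filesystems rename() replaces the target atomically,
    // so the reader should never see a half-written file.
    rename($tmp, $file);
}

// Called from the web-facing script that displays progress.
function readProgress(string $file): ?array
{
    if (!is_file($file)) {
        return null; // the job has not reported anything yet
    }
    $data = json_decode(file_get_contents($file), true);
    return is_array($data) ? $data : null;
}
```

The web page can then poll a small script which calls readProgress() and prints the result, without ever being tied to the lifetime of the job itself.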

Troubleshooting (updated Sep 2014)

Following on from the feedback I've received, there are a couple of things to check if it doesn't go according to plan.

The atd has its own permissions system implemented via /etc/at.allow and /etc/at.deny - see your man pages for more info.

On Red Hat machines, the apache uid is configured with the shell /sbin/nologin - this will silently discard any jobs submitted to it, hence a more complete solution is:

putenv("SHELL=/bin/bash");
print `echo /usr/bin/php -q longThing.php | at now 2>&1`;
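
Since the 2>&1 captures whatever at prints, the calling script can check whether the job was actually accepted. The following is a rough sketch only: the function names are hypothetical, and it assumes at reports success with a line of the form "job 30 at 2013-07-10 11:45", which may differ between implementations.

```php
<?php
// Hypothetical helpers. Assumes 'at' reports success with a line such as
// "job 30 at 2013-07-10 11:45"; other implementations may word it differently.

// Extract the job number from at's output, or null if none was reported.
function parseAtJobId(string $output): ?int
{
    if (preg_match('/^job (\d+) at /m', $output, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Submit a PHP script to atd and return the job number, or null on failure.
function submitToAt(string $script): ?int
{
    putenv('SHELL=/bin/bash');
    // Quote the script name, then quote the whole command line for echo.
    $command = '/usr/bin/php -q ' . escapeshellarg($script);
    $output = shell_exec('echo ' . escapeshellarg($command) . ' | at now 2>&1');
    return parseAtJobId((string) $output);
}
```

If submitToAt('longThing.php') returns null, the text returned by at (a permissions message, for instance) is the first thing to look at.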

A note about systemd (updated Jun 2016)

The latest "feature" to be announced for systemd is that it will kill a user's processes when they log out. I don't currently have a machine running systemd to test what impact this might have, but since the apache user never logs in, never mind logs out, and I recommend using atd to invoke the process (which is specifically designed to run a program regardless of whether the user is logged in), I don't expect a negative impact on my solution.

31 comments:

  1. I'd like to add that, in my particular case, using the & for background and redirecting the output to /dev/null achieved the desired effect of "daemonizing" the script.

  2. Thanks for this.

    It took me a while to get it to work, mainly because I didn't understand how at works. After reading that at executes the command from stdin, I got how your example worked, and found that I just needed to add quotes around my code to be executed to get it to work, e.g.
    exec('echo \'/usr/bin/php -q longThing.php\' | at now');

  3. I'm wondering if anyone sees a problem with nohup e.g.

    shell_exec('nohup /usr/bin/php -q longThing.php 2>&1 >> somefile.txt &');

    It's running fine, just wondering if there's an issue I may not be seeing that piping to at would take care of.

  4. Both these approaches start the new process in the same process group, as a child of the apache process. The apache process cannot exit cleanly while the child process is still running - so it must hang around not serving requests. That's OK if the apache process runs indefinitely, but you should really configure MaxRequestsPerChild to a finite value to ensure some turnover of apache processes - also using maxspareservers/minspareservers/startservers to ensure that the optimal amount of memory is always available for I/O caching.

  5. This is Anonymous at 8-18-10 14:55

    Assuming Colin McKinnon was responding both to my note and to Rafael Almeida with "Both these approaches...".

    I should have done some more research on my method before posting a question like I did ("does anyone..."). I noted that the PPID for the spawned process using the nohup method is 1, i.e. the init process.

    I haven't used the 'at' command before, but I suspect it and 'nohup' are very similar, the main difference being that at provides options on when to run the disassociated command, whereas nohup will run the command immediately. I don't know about 'at' but nohup also provides options for logging output...however I've always found it easiest to just redirect things instead of using the nohup options. The only thing I think that causes major differences is that 'at' by default will not "hang", whereas if I run nohup w/o the trailing '&', it won't do any of the things mentioned above (e.g. handing the child process off to init).

    At this point I think it's academic, 'at' should work fine as you noted, but does that discussion make sense? My system admin classes have been awhile ago so I thought I'd check my knowledge on this.

  6. "I noted that the PPID for the spawned process using the nohup method is 1" - I wasn't expecting that.

    If that is the case then using nohup might be a valid way to solve the problem (what's the PPID of 'nohup'?). Certainly the POSIX spec requires that nohup only deal with stdio redirection and isolation from SIGHUP (it's notable that POSIX also warns that behaviour is undefined when running some shell built-ins). Sample POSIX code from Darwin is available here: http://src.gnu-darwin.org/src/usr.bin/nohup/nohup.c.html - no mention of setsid - this code will not behave as you describe.

    Like you, it's been some time since I studied these things in any depth - so I went and hit Google. It appears that even versions labelled as POSIX-compliant vary in their behaviour. However one theme emerges consistently - problems dissociating the target process from the parent. There are other command line tools available (setsid, detach, disown) trying to solve the same problem - but availability of these varies between distributions. Why have people felt it necessary to write these programs when nohup is available as a standard command? The at command is almost universally available on Unix/Linux/POSIX systems, with little variation in behaviour and fewer side-effects (e.g. nohup must always try to find a file to associate output with but in the case of at, atd should already have one available).

    If it works for you, great, but do be aware that the solution is probably not portable.

    Replies
    1. nohup is very portable and does not require the 'at' package to be installed. 'at' is not installed by default and may not be available on some hosting providers without some more expensive options. A nohup spawned process by default has the PPID of the calling shell, but automatically gets assigned to '1' when the calling process exits. By adding the & to the end of the line, you are asking for it to be a background only process. Being a background process, the calling process is no longer blocked by the child process and can run exit/return immediately. I believe that is what you are experiencing. Try running nohup on a php script without the & and use another terminal to see the process PPID. Next, kill the parent process, and you will see it revert to 1.

  7. Very interesting article. Thank you for sharing.

  8. Note that on Mac OS X, 'at' is disabled by default. You can enable it (see the man page for 'atrun') but it still only runs periodically (every 30 seconds by default).

    Replies
    1. The 30 seconds is customizable in /System/Library/LaunchDaemons/com.apple.atrun.plist

      To activate it :
      # launchctl load -w /System/Library/LaunchDaemons/com.apple.atrun.plist

  9. You have probably just saved me half a day or a day, while being very close to deadlines. Thank you.

  10. U have made it simple with the best explanation my fren.Thanks for ur best points.
    Cheers !

  11. Ditto...been trying to spawn a process from a web script written in tcl (using tcl's exec), but everything I tried (nohup, redirection etc) halts the Apache thread until the child is finished. Only the 'at' command worked as expected. Thanks

  12. Hi, thanks for posting your solution. is there a windows equivalent? For something running on WAMP.

  13. Thanks for this. Here's a gotcha to watch out for ...

    Make sure /etc/at.deny does not contain your web server user.

    That caught me for a while. On Ubuntu Linux, at.deny contains www-data.

  14. Thanks for the great advice.

    I spent quite some time battling a gotcha as well.

    The "at now" returned e.g. "job 30 at 2013-07-10 11:45" but the job was not executed.

    The problem turned out to be that the default SHELL seen by "at" when run from PHP's exec() was /sbin/nologin (found by simply running: echo exec('echo $SHELL')).

    Assigning a proper shell to "at" solved the issue:
    exec('echo "mv 1.txt 2.txt" | SHELL=/bin/bash at now 2>&1', $out);

    PS: Thanks to http://crashingdaily.wordpress.com/tag/shell/

  15. There's a good point here that I'd omitted from the discussion so far - it's very common for the PHP uid to be set up with no shell. In the example above it was trivial to bypass, but with (for example) selinux, it won't be so easy - and these restrictions should be in place for a good reason. Using 'sudo' isn't going to help here (but will resolve the case where the PHP uid is not allowed to run 'at' jobs).

    If you can't get a shell from php without compromising your security, then you need a different solution (e.g. running your own daemon).

  16. And how to fight with "you do not have permission to use this program" on www user?

  17. http://linux.die.net/man/5/at.allow

    Replies
    1. But note that the targeted policy on RHEL 6 blocks access to this file for system_u. Yes, SELinux is blocking the use of another security enforcement system. http://symcbean.blogspot.co.uk/2016/11/selinux-sucks.html

  18. How to do this if the web server is in a shared hosting ?

  19. I wish I had seen this four years ago when you created it. My solution in the past has been to appeal to a forking language like perl. I would use PHP's system call to call a perl script that forks. In the forked process, I would either just do the work, or use LWP to call the desired php script (for that rare occasion where you need the whole web server infrastructure available).

    Thanks......

    Replies
    1. You're welcome. Note that you'll run into the same problems if you try to call fork() from mod_perl (and the CLI PHP SAPI will safely fork() and setsid())

  20. This works when I execute the command from the command line, but when I try it via a script I get "This account is currently not available."

    There are no users in at.deny

    Is the reason I am getting the message because Apache is not allowed to execute commands ? See passwd file :apache:x:48:48:Apache:/var/www:/sbin/nologin

    I could change that, but is allowing this not a security risk ?

    Can I submit the command using another user ?

    Replies
    1. Sorry - there's no (legitimate) way to get around this - your ISP has configured the webserver uid with a shell which does not allow execution of commands (in this case /sbin/nologin)

  21. I found this to be a good solution, but as the above comment, my hoster doesn't permit me or apache to use at.

    What in your opinion is the next best way to launch a long php process? I'm looking at it and am leaning towards setting a flag which a cron job checks to start the job, then unsetting it when the job is completed. In other words, set flag, have bash check flag every x minutes/hours, then start job if A: flag is set and B: job is not running (easy to determine via ps aux). I'm not fond at all of using & for processes, I've had too many issues with that over the years.

    Replies
    1. Actually you would want to reset the flag as soon as your agent starts processing it (to prevent the next iteration of the agent starting to process the same job). You don't need to explicitly record the fact that an agent has started processing it since this is represented by the agent's thread of execution (OK, so the second agent could check whether an earlier instance was still running - but this is a lot more complex).

      (ultimately this is the same solution as I've described - handing the task over to an existing daemon for action)
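
One way to reset the flag at the moment the agent picks it up is to make claiming the flag a single atomic step. The sketch below is purely illustrative: it assumes the flag is a file created by the web script, and claimJob() is a hypothetical name.

```php
<?php
// Hypothetical helper: $flag is assumed to be a file which the web script
// creates to request a run, and which the cron-driven agent looks for.
function claimJob(string $flag): ?string
{
    $claimed = $flag . '.' . getmypid();
    // rename() within one filesystem is atomic, so if two agents race
    // for the same flag only one rename can succeed.
    if (@rename($flag, $claimed)) {
        return $claimed; // this agent owns the job
    }
    return null; // no flag set, or another agent took it first
}
```

An agent that gets a non-null result does the work and then deletes the claimed file; any later iteration finds no flag and simply exits.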

  22. How does it differ from executing a long running script with cron job?

  23. What will happen if longThing.php takes a long time to execute and crashes?

    Replies
    1. Ah well, that's another discussion altogether - indeed it's actually 2 different discussions - one about limiting the length of time a process runs for and one about handling the output and/or early termination of a process.

  24. I am running MAMP on a Mac. When I try to run the exec command, I get a message saying that at cannot open the .lockfile. The command runs fine from terminal.

    Any ideas why it doesn't work from the exec command?
