ProperBlog

Security for the masses


PDF to JPG Automation

1 August, 2011 (12:18) | LAMP | By: Blogkeep

I need to download a PDF document, convert each page to a JPG and upload the results to a web server on a regular basis, so I automated the task and thought I would share my code. This is a quick and dirty method (very quick and very dirty). There is no sanity checking and the code is very insecure: a perfect example of how *NOT* to write code for deployment on a website. I run this code from the command line, and it is not accessible from the Internet. ImageMagick is a prerequisite, as it does the actual PDF to JPG conversion; the script also calls wget to fetch the PDF and uses PHP's FTP functions for the upload.

<?php
$htmlPage=<<<HTMLHEAD
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head></head>
<body>
  <div>
HTMLHEAD;
$htmlEnd="  </div></body></html>";
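// Each command-line argument is key=value; copy them into $_REQUEST
// ($argv[0] is the script name, so start at 1).
for ($i=1;$i < $argc;$i++) {
  parse_str($argv[$i],$tmp);
  $_REQUEST = array_merge($_REQUEST, $tmp);
}
$sourceserver=$_REQUEST['ss'];
$localfilename=$_REQUEST['sf'];
// rawurlencode() gives %20 for spaces; urlencode() would give "+", which is wrong in a URL path.
$sourcefilename=rawurlencode($localfilename);
$sourceuser=$_REQUEST['suser'];
$sourcepass=$_REQUEST['spass'];
// escapeshellarg() stops spaces or quotes in the values from breaking the command.
// -O saves the PDF under the name the convert step expects and overwrites the
// previous run's copy (otherwise wget saves a repeat download as name.1).
$cmd="wget --user=".escapeshellarg($sourceuser)." --password=".escapeshellarg($sourcepass)
  ." -O ".escapeshellarg($localfilename)." ".escapeshellarg($sourceserver."/".$sourcefilename);
exec($cmd);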
echo "Download complete... Converting\n\n";
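// Empty ./jpgs first (creating it if needed) so pages left over from a longer
// PDF on an earlier run are not published again.
if (!is_dir("jpgs")) mkdir("jpgs");
foreach (glob("jpgs/*.jpg") ?: array() as $old) unlink($old);
// -density only affects how the PDF is rasterised if it comes before the input file;
// the "!" in 570x800! forces that exact size, ignoring the aspect ratio.
$Ccmd="convert -density 72 ".escapeshellarg("./".$localfilename)." -resize '570x800!' -quality 80 -depth 16 -strip ./jpgs/page%d.jpg";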
exec($Ccmd);
echo "conversion complete\n\nCreating html\n\n";
// Collect the names of the generated pages in ./jpgs.
$files=array();
foreach (glob("jpgs/*.jpg") ?: array() as $path) {
  $files[]=basename($path);
}
// natsort() orders page10.jpg after page9.jpg (a plain sort() would not).
natsort($files);
foreach($files as $Ifilename) {
  $htmlPage.="\n<img src='jpgs/".$Ifilename."' alt='' />";
}
$htmlPage.=$htmlEnd;
$htmlFile=fopen("webpage.html",'w') or die("cannot create file");
fwrite($htmlFile,$htmlPage);
fclose($htmlFile);
echo "starting upload...\n\n";
$destserver=$_REQUEST['ds'];
$ftpUser =$_REQUEST['duser'];
$ftpPass =$_REQUEST['dpass'];
$conn_id = ftp_connect($destserver) or die("Error connecting to $destserver");
if(ftp_login($conn_id, $ftpUser, $ftpPass)) {
  echo "login to ".$destserver." successful\n\n";
  // Passive mode is often needed when this machine is behind NAT or a firewall.
  ftp_pasv($conn_id, true);
  ftp_chdir($conn_id, "/public/www");
  ftp_put($conn_id, "webpage.html", "webpage.html", FTP_ASCII);
  ftp_chdir($conn_id, "/public/www/jpgs");
  foreach (glob("./jpgs/*.jpg") as $filename) {
    ftp_put($conn_id, basename($filename), $filename, FTP_BINARY);
  }
  ftp_close($conn_id);
} else {
  echo "login failure\n\n";
  exit(1);
}
echo "Upload complete\n";
exit();
?>
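The page-ordering step is easy to get wrong: convert numbers the pages page0.jpg, page1.jpg and so on, and an ordinary string sort would put page10.jpg before page2.jpg. Below is a minimal sketch of the HTML-building step as a stand-alone function, assuming the image names are already known. buildPage() is a hypothetical name, not part of the script above, and it adds htmlspecialchars() escaping that the original omits.

```php
<?php
// Hypothetical helper: builds the image page from a list of file names.
function buildPage(array $files): string
{
    // natsort() compares embedded numbers numerically, so page2 sorts before page10.
    natsort($files);
    $html = "<html>\n<head></head>\n<body>\n  <div>";
    foreach ($files as $name) {
        // Escape the path in case a file name contains quotes or angle brackets.
        $src = htmlspecialchars('jpgs/' . $name, ENT_QUOTES);
        $html .= "\n<img src='" . $src . "' alt='' />";
    }
    return $html . "\n  </div></body></html>\n";
}

echo buildPage(['page10.jpg', 'page2.jpg', 'page0.jpg', 'page1.jpg']);
```

The images come out as page0, page1, page2, page10, which is the reading order of the PDF.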

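Assuming the PHP code above has been saved as doit.php, it can be run as follows. The ss value is the URL of the directory holding the PDF; the script appends a slash and the sf file name to it.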

php doit.php sf=pdf_source_filename ss=ftp://domain.com/path_to_source_pdf suser=source_server_username spass=source_server_password ds=destinationserver.com duser=destination_user_name dpass=destination_password
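On that command line, each key=value argument is decoded by parse_str() and merged into $_REQUEST. The loop can be tried on its own with a minimal sketch like the one below; parseArgs() is a hypothetical helper, not part of the script. Because parse_str() treats each argument as a URL query string, a + in a password arrives as a space and an & ends the value early, which is worth checking before passing credentials this way.

```php
<?php
// Hypothetical helper mirroring the script's argument loop:
// every "key=value" argument after the script name becomes an array entry.
function parseArgs(array $argv): array
{
    $options = [];
    for ($i = 1; $i < count($argv); $i++) {
        parse_str($argv[$i], $tmp);
        $options = array_merge($options, $tmp);
    }
    return $options;
}

$opts = parseArgs(['doit.php', 'sf=report.pdf', 'ds=destinationserver.com', 'spass=a+b']);
var_dump($opts['sf'], $opts['ds'], $opts['spass']);
```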

I don’t actually run it with command-line options; I have hard-coded the variables, including user names and passwords, into the script itself. This is bad practice, even though I have saved the script so that only root can read and run it.
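One way to reduce the hard-coding problem is to keep credentials out of the script file and off the php command line, where other local users may see them in the process list and where they end up in shell history. A minimal sketch, assuming the credentials are exported as environment variables with hypothetical names such as PDF2JPG_SPASS; setting() is likewise a hypothetical helper that falls back to a key=value option and stops if neither is present.

```php
<?php
// Hypothetical helper: prefer an environment variable, then a parsed
// command-line option, and stop if the setting is missing altogether.
function setting(string $envName, array $options, string $key): string
{
    $value = getenv($envName);
    if ($value !== false && $value !== '') {
        return $value;
    }
    if (isset($options[$key])) {
        return $options[$key];
    }
    fwrite(STDERR, "missing setting: $envName or $key=...\n");
    exit(1);
}

// Example only: the variable names PDF2JPG_SPASS and PDF2JPG_DUSER are assumptions.
putenv('PDF2JPG_SPASS=s3cret');
echo setting('PDF2JPG_SPASS', ['spass' => 'ignored'], 'spass'), "\n";
echo setting('PDF2JPG_DUSER', ['duser' => 'uploader'], 'duser'), "\n";
```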

This is the line that does the actual conversion, for those who don’t need to download the PDF and upload the resulting images. Substituting png for jpg at the end of the command saves the PDF pages as lossless PNG files instead. For JPG output, you can adjust the -quality setting to trade image quality against file size.

convert pdf_file.pdf -resize width_in_pixelsxheight_in_pixels -quality 80 page%d.jpg
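When the file name, size or quality come from variables, it helps to build that command with escapeshellarg() so that a name containing spaces still reaches convert as a single argument. A minimal sketch, assuming a hypothetical convertCommand() helper: the format argument picks the output extension (png for lossless pages), and -quality is only added for jpg.

```php
<?php
// Hypothetical helper: builds the single-file conversion command shown above.
function convertCommand(string $pdf, int $width, int $height, string $format = 'jpg', int $quality = 80): string
{
    $cmd = 'convert ' . escapeshellarg($pdf)
         . ' -resize ' . escapeshellarg($width . 'x' . $height);
    if ($format === 'jpg') {
        // -quality trades JPG file size against image quality.
        $cmd .= ' -quality ' . $quality;
    }
    // convert replaces %d with the page number: page0, page1, ...
    return $cmd . ' ' . escapeshellarg('page%d.' . $format);
}

echo convertCommand('my report.pdf', 570, 800), "\n";
```

On Linux this prints convert 'my report.pdf' -resize '570x800' -quality 80 'page%d.jpg', and the string can then be passed to exec() as in the main script.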
