Bash-Script Printing a PDF to a PDF in Linux

You could try putting your PDF files through Ghostscript. I have found that this is enough to fix many problematic PDFs.

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf input.pdf

(The same command can also be used to merge several PDF files into one, just specify multiple input files.)
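For example, a quick sketch (assuming two hypothetical inputs part1.pdf and part2.pdf) that writes a single merged file:

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=merged.pdf part1.pdf part2.pdf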

Output list of pdf files as one pdf using pdftk bash script

Something like this:

#!/bin/bash

files=()

# wrap each name in literal single quotes; the list is re-parsed later (see Edit 3)
add() {
    files+=("'""$1""'")
}

add "file1.pdf"
#add "file2.pdf"
add "file3.pdf"
add "file with spaces.pdf"

echo "${files[*]}"

Naturally, substitute the proper pdftk command for echo.


Edit 2

This new "version" will work better with filenames containing spaces.


Edit 3

To hand the files over to the command, it seems something like the following will do the trick:

bash -c "stat $(echo "${files[*]}")"
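If the embedded single quotes feel fragile, a simpler sketch (assuming pdftk is the target command) is to store the names unmodified and expand the array with "${files[@]}", which keeps each filename as a single argument even when it contains spaces:

#!/bin/bash
files=()
add() {
    files+=("$1")    # store the name as-is, no extra quoting needed
}
add "file1.pdf"
add "file with spaces.pdf"
# "${files[@]}" expands to one word per element, so spaces survive intact
pdftk "${files[@]}" cat output combined.pdf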

Print contents of a PDF to the command line

In the man page for pdftotext, I found this:

pdftotext [options] [PDF-file [text-file]]

Description
Pdftotext converts Portable Document Format (PDF) files to plain text.

Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to file.txt. If text-file is '-', the text is sent to stdout.

Thus, to send the text to stdout so it can be piped to grep, use this:

pdftotext mydoc.pdf - | grep mysearchterm

How to print out pdf file with script generated highlighted output?

First you have to convert the colored shell output to HTML, then the HTML to PDF.

Use the ansi2html.sh script from here.

Then you can try something like this:

cat myapp_log | ansi2html.sh -p > myapp_log.html
html2any myapp_log.html file.pdf
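If html2any is not available on your system, wkhtmltopdf (assuming it is installed) is one alternative for the HTML-to-PDF step:

cat myapp_log | ansi2html.sh -p > myapp_log.html
wkhtmltopdf myapp_log.html myapp_log.pdf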

Find string inside pdf with shell

As nicely pointed out by Simon, you can simply convert the pdf to plain text using pdftotext, and then just search for what you're looking for.

After conversion, you may use grep, bash regex, or any variation you want:

while read -r line; do
    # matches a YYYY-MM-DD date, e.g. 2014-01-31
    if [[ ${line} =~ [0-9]{4}(-[0-9]{2}){2} ]]; then
        echo ">>> Found date;"
    fi
done < <(pdftotext infile.pdf -)
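If you only need the matching text rather than a per-line action, a grep-only sketch (same date pattern, hypothetical infile.pdf) avoids the loop entirely:

pdftotext infile.pdf - | grep -E -o '[0-9]{4}(-[0-9]{2}){2}'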

How to write shell script for finding number of pages in PDF?

Without any extra package:

strings < file.pdf | sed -n 's|.*/Count -\{0,1\}\([0-9]\{1,\}\).*|\1|p' \
| sort -rn | head -n 1

Using pdfinfo:

pdfinfo file.pdf | awk '/^Pages:/ {print $2}'

Using pdftk:

pdftk file.pdf dump_data | grep NumberOfPages | awk '{print $2}'

You can also recursively sum the total number of pages in all PDFs via pdfinfo as follows:

find . -xdev -type f -name "*.pdf" -exec pdfinfo "{}" ";" | \
awk '/^Pages:/ {n += $2} END {print n}'
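If you also want a per-file breakdown alongside the total, a sketch along these lines (assuming bash and a find that supports -print0) prints each count next to its filename:

find . -xdev -type f -name "*.pdf" -print0 | \
while IFS= read -r -d '' f; do
    printf '%s\t%s\n' "$(pdfinfo "$f" | awk '/^Pages:/ {print $2}')" "$f"
done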

Merge / convert multiple PDF files into one PDF

I'm sorry, I managed to find the answer myself using Google and a bit of luck :)

For those interested:

I installed pdftk (the PDF toolkit) on our Debian server, and with the following command I achieved the desired output:

pdftk file1.pdf file2.pdf cat output output.pdf

OR

gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf file1.pdf file2.pdf file3.pdf ...

The Ghostscript output can in turn be piped directly into pdf2ps.
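To merge every PDF in the current directory in name order (a sketch relying on the shell's default sorted glob expansion), the pdftk form is simply:

pdftk ./*.pdf cat output ../combined.pdf    # write the result outside the glob's reach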

Linux piping (convert - pdf2ps - lp)

convert file1.pdf file2.pdf - | pdf2ps - - | lp -s
should do the job.

You send the output of the convert command to pdf2ps, which in turn feeds its output to lp.


