How to Remove Whitespace on Merge

How To Remove Whitespace on Merge

The following sample tool has been implemented along the ideas of the tool PdfDenseMergeTool from this answer which the OP has commented to be SO close to what [he] NEEDs. Just like PdfDenseMergeTool this tool here is implemented in Java/iText which I'm more at home with than C#/iTextSharp. As the OP has already translated PdfDenseMergeTool to C#/iTextSharp, translating this tool here also should not be too great a problem.

PdfVeryDenseMergeTool

This tool similarly to PdfDenseMergeTool takes the page contents of pages from a number of PdfReader instances and tries to merge them densely, i.e. putting contents of multiple source pages onto a single target page if there is enough free space to do so. In contrast to that earlier tool, this tool even splits source page contents to allow for an even denser merge.

Just like that other tool the PdfVeryDenseMergeTool does not take vector graphics into account because the iText(Sharp) parsing API does only forward text and bitmap images

The PdfVeryDenseMergeTool splits source pages which do not completely fit onto a target page at a horizontal line which is not intersected by the bounding boxes of text glyphs or bitmap graphics.

The tool class:

public class PdfVeryDenseMergeTool
{
    public PdfVeryDenseMergeTool(Rectangle size, float top, float bottom, float gap)
    {
        this.pageSize = size;
        this.topMargin = top;
        this.bottomMargin = bottom;
        this.gap = gap;
    }

    public void merge(OutputStream outputStream, Iterable<PdfReader> inputs) throws DocumentException, IOException
    {
        try
        {
            openDocument(outputStream);
            for (PdfReader reader: inputs)
            {
                merge(reader);
            }
        }
        finally
        {
            closeDocument();
        }
    }

    void openDocument(OutputStream outputStream) throws DocumentException
    {
        final Document document = new Document(pageSize, 36, 36, topMargin, bottomMargin);
        final PdfWriter writer = PdfWriter.getInstance(document, outputStream);
        document.open();
        this.document = document;
        this.writer = writer;
        newPage();
    }

    void closeDocument()
    {
        try
        {
            document.close();
        }
        finally
        {
            this.document = null;
            this.writer = null;
            this.yPosition = 0;
        }
    }

    void newPage()
    {
        document.newPage();
        yPosition = pageSize.getTop(topMargin);
    }

    void merge(PdfReader reader) throws IOException
    {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        for (int page = 1; page <= reader.getNumberOfPages(); page++)
        {
            merge(reader, parser, page);
        }
    }

    void merge(PdfReader reader, PdfReaderContentParser parser, int page) throws IOException
    {
        PdfImportedPage importedPage = writer.getImportedPage(reader, page);
        PdfContentByte directContent = writer.getDirectContent();

        PageVerticalAnalyzer finder = parser.processContent(page, new PageVerticalAnalyzer());
        if (finder.verticalFlips.size() < 2)
            return;
        Rectangle pageSizeToImport = reader.getPageSize(page);

        int startFlip = finder.verticalFlips.size() - 1;
        boolean first = true;
        while (startFlip > 0)
        {
            if (!first)
                newPage();

            float freeSpace = yPosition - pageSize.getBottom(bottomMargin);
            int endFlip = startFlip + 1;
            while ((endFlip > 1) && (finder.verticalFlips.get(startFlip) - finder.verticalFlips.get(endFlip - 2) < freeSpace))
                endFlip -=2;
            if (endFlip < startFlip)
            {
                float height = finder.verticalFlips.get(startFlip) - finder.verticalFlips.get(endFlip);

                directContent.saveState();
                directContent.rectangle(0, yPosition - height, pageSizeToImport.getWidth(), height);
                directContent.clip();
                directContent.newPath();

                writer.getDirectContent().addTemplate(importedPage, 0, yPosition - (finder.verticalFlips.get(startFlip) - pageSizeToImport.getBottom()));

                directContent.restoreState();
                yPosition -= height + gap;
                startFlip = endFlip - 1;
            }
            else if (!first) 
                throw new IllegalArgumentException(String.format("Page %s content sections too large.", page));
            first = false;
        }
    }

    Document document = null;
    PdfWriter writer = null;
    float yPosition = 0; 

    final Rectangle pageSize;
    final float topMargin;
    final float bottomMargin;
    final float gap;
}

(PdfVeryDenseMergeTool.java)

This tool makes use of a custom RenderListener for use with the iText parser API:

public class PageVerticalAnalyzer implements RenderListener
{
    @Override
    public void beginTextBlock() { }
    @Override
    public void endTextBlock() { }

    /*
     * @see RenderListener#renderText(TextRenderInfo)
     */
    @Override
    public void renderText(TextRenderInfo renderInfo)
    {
        LineSegment ascentLine = renderInfo.getAscentLine();
        LineSegment descentLine = renderInfo.getDescentLine();
        float[] yCoords = new float[]{
                ascentLine.getStartPoint().get(Vector.I2),
                ascentLine.getEndPoint().get(Vector.I2),
                descentLine.getStartPoint().get(Vector.I2),
                descentLine.getEndPoint().get(Vector.I2)
        };
        Arrays.sort(yCoords);
        addVerticalUseSection(yCoords[0], yCoords[3]);
    }

    /*
     * @see RenderListener#renderImage(ImageRenderInfo)
     */
    @Override
    public void renderImage(ImageRenderInfo renderInfo)
    {
        Matrix ctm = renderInfo.getImageCTM();
        float[] yCoords = new float[4];
        for (int x=0; x < 2; x++)
            for (int y=0; y < 2; y++)
            {
                Vector corner = new Vector(x, y, 1).cross(ctm);
                yCoords[2*x+y] = corner.get(Vector.I2);
            }
        Arrays.sort(yCoords);
        addVerticalUseSection(yCoords[0], yCoords[3]);
    }

    /**
     * This method marks the given interval as used.
     */
    void addVerticalUseSection(float from, float to)
    {
        if (to < from)
        {
            float temp = to;
            to = from;
            from = temp;
        }

        int i=0, j=0;
        for (; i<verticalFlips.size(); i++)
        {
            float flip = verticalFlips.get(i);
            if (flip < from)
                continue;

            for (j=i; j<verticalFlips.size(); j++)
            {
                flip = verticalFlips.get(j);
                if (flip < to)
                    continue;
                break;
            }
            break;
        }
        boolean fromOutsideInterval = i%2==0;
        boolean toOutsideInterval = j%2==0;

        while (j-- > i)
            verticalFlips.remove(j);
        if (toOutsideInterval)
            verticalFlips.add(i, to);
        if (fromOutsideInterval)
            verticalFlips.add(i, from);
    }

    final List<Float> verticalFlips = new ArrayList<Float>();
}

(PageVerticalAnalyzer.java)

It is used like this:

PdfVeryDenseMergeTool tool = new PdfVeryDenseMergeTool(PageSize.A4, 18, 18, 5);
tool.merge(output, inputs);

(VeryDenseMerging.java)

Applied to the OP's sample documents

Header.pdf

Header.pdf pages

Body.pdf

Body.pdf pages

Footer.pdf

Footer.pdf pages

it generates

A4 very dense merge result

If one defines the target document page size to be A5 landscape:

PdfVeryDenseMergeTool tool = new PdfVeryDenseMergeTool(new RectangleReadOnly(595,421), 18, 18, 5);
tool.merge(output, inputs);

(VeryDenseMerging.java)

it generates this:

A5 very dense merge result

Beware! This is only a proof of concept and it does not consider all possibilities. E.g. the case of source or target pages with a non-trivial Rotate value is not properly handled. Thus, it is not ready for production use yet.

Improvement in current (5.5.6 SNAPSHOT) iText version

The current iText development version towards 5.5.6 enhances the parser functionality to also signal vector graphics. Thus, I extended the PageVerticalAnalyzer to make use of this:

public class PageVerticalAnalyzer implements ExtRenderListener
{
    @Override
    public void beginTextBlock() { }
    @Override
    public void endTextBlock() { }
    @Override
    public void clipPath(int rule) { }
    ...
    static class SubPathSection
    {
        public SubPathSection(float x, float y, Matrix m)
        {
            float effectiveY = getTransformedY(x, y, m);
            pathFromY = effectiveY;
            pathToY = effectiveY;
        }

        void extendTo(float x, float y, Matrix m)
        {
            float effectiveY = getTransformedY(x, y, m);
            if (effectiveY < pathFromY)
                pathFromY = effectiveY;
            else if (effectiveY > pathToY)
                pathToY = effectiveY;
        }

        float getTransformedY(float x, float y, Matrix m)
        {
            return new Vector(x, y, 1).cross(m).get(Vector.I2);
        }

        float getFromY()
        {
            return pathFromY;
        }

        float getToY()
        {
            return pathToY;
        }

        private float pathFromY;
        private float pathToY;
    }

    /*
     * Beware: The implementation is not correct as it includes the control points of curves
     * which may be far outside the actual curve.
     * 
     * @see ExtRenderListener#modifyPath(PathConstructionRenderInfo)
     */
    @Override
    public void modifyPath(PathConstructionRenderInfo renderInfo)
    {
        Matrix ctm = renderInfo.getCtm();
        List<Float> segmentData = renderInfo.getSegmentData();

        switch (renderInfo.getOperation())
        {
        case PathConstructionRenderInfo.MOVETO:
            subPath = null;
        case PathConstructionRenderInfo.LINETO:
        case PathConstructionRenderInfo.CURVE_123:
        case PathConstructionRenderInfo.CURVE_13:
        case PathConstructionRenderInfo.CURVE_23:
            for (int i = 0; i < segmentData.size()-1; i+=2)
            {
                if (subPath == null)
                {
                    subPath = new SubPathSection(segmentData.get(i), segmentData.get(i+1), ctm);
                    path.add(subPath);
                }
                else
                    subPath.extendTo(segmentData.get(i), segmentData.get(i+1), ctm);
            }
            break;
        case PathConstructionRenderInfo.RECT:
            float x = segmentData.get(0);
            float y = segmentData.get(1);
            float w = segmentData.get(2);
            float h = segmentData.get(3);
            SubPathSection section = new SubPathSection(x, y, ctm);
            section.extendTo(x+w, y, ctm);
            section.extendTo(x, y+h, ctm);
            section.extendTo(x+w, y+h, ctm);
            path.add(section);
        case PathConstructionRenderInfo.CLOSE:
            subPath = null;
            break;
        default:
        }
    }

    /*
     * @see ExtRenderListener#renderPath(PathPaintingRenderInfo)
     */
    @Override
    public Path renderPath(PathPaintingRenderInfo renderInfo)
    {
        if (renderInfo.getOperation() != PathPaintingRenderInfo.NO_OP)
        {
            for (SubPathSection section : path)
                addVerticalUseSection(section.getFromY(), section.getToY());
        }

        path.clear();
        subPath = null;
        return null;
    }

    List<SubPathSection> path = new ArrayList<SubPathSection>();
    SubPathSection subPath = null;
    ...
}

(PageVerticalAnalyzer.java)

A simple test (VeryDenseMerging.java method testMergeOnlyGraphics) merges these files

circlesOnlyA.pdf

circlesOnlyB.pdf

circlesOnlyC.pdf

circlesOnlyD.pdf

into this:

circlesOnlyMerge-veryDense.pdf

But once again beware: this is a mere proof of concept. Especially modifyPath() needs to be improved, the implementation is not correct as it includes the control points of curves which may be far outside the actual curve.

Merging without whitespace conflicts

 git merge -Xignore-all-space

Or (more precise)

 git merge -Xignore-space-change

should be enough to ignore all space related conflicts during the merge.

See git diff:

--ignore-space-change

Ignore changes in amount of whitespace.

This ignores whitespace at line end, and considers all other sequences of one or more whitespace characters to be equivalent.

--ignore-all-space

Ignore whitespace when comparing lines.

This ignores differences even if one line has whitespace where the other line has none.

ks1322 adds in the comments a good advice:

It is worth to merge with --no-commit and review the merge before actual commit.

The OP Callum Macrae reports that, in that case, the merge proceed uninterrupted, and the trailing spaces contained in the pull request patches are applied to the local files.

However, the OP uses a pre-commit hook which takes care of said trailing spaces.

(I suppose a bit similar to this one, also referenced here).

The OP's pre-commit hook is referenced here:

In addition to removing trailing whitespace, it removes one to three spaces before tabs (I have tab width set to 4), and adds EOLs.

I've had reports that the code that adds the EOL deletes the file in windows, but haven't been able to replicate it.

Merge Multiple spaces to single space; remove trailing/leading spaces

This seems to meet your needs.

string <- "  Hi buddy   what's up   Bro "
library(stringr)
str_replace(gsub("\\s+", " ", str_trim(string)), "B", "b")
# [1] "Hi buddy what's up bro"

How to remove trailing whitespace changes from a sequence of git commits

In your situation I would ask PR submitter to fix whitespaces or try to resolve that problem when merging that pull request branch into main branch using git merge options, like git merge --ignore-space-change. Look for more info in man git merge

How to use git merge to ignore trailing whitespace?

I think you are misunderstanding what the idea behind "ignore-space-at-eol" is (of course, otherwise you wouldn't be asking :)).

ignore-space-at-eol is a merge strategy option for the recursive merge strategy. These options only apply to conflicts (this is the important bit!)

The MWE you provided is creating a branch with additional commits not on master, but master does not have any other commits (so in theory it could be fast-forwarded, but since --no-ff is passed, that's not happening).

So, in this case, a normal merge is being made. All changes not in master but only on the branch are merged into master. Since there are no conflicts, the -Xignore-space-at-eol does not take effect.

It would take effect, if you had eol-whitespace-only-changes in both branches to the same line and then try to merge it, i.e.:

$ git init
$ >file echo 'a'
$ git add file
$ git commit -m 'initial'
$ git branch feature # create a branch, but do not switch to it
$ >file echo 'a ' # add one blank
$ git add file ; git commit -m 'add one blank on master'
$ git checkout feature
$ >file echo 'a  ' # add two blank
$ git add file ; git commit -m 'add two blanks branch'
$ git checkout master ; git merge feature # you will get conflicts
$ git merge --abort # let's try that again:
$ git merge -Xignore-space-at-eol feature # no conflicts, hooray!

You see, ignore-space-at-eol is a strategy option how to handle conflicts. It does not mean that such changes will be ignored completely.

Use case: you want to merge a branch which does not match line endings or whitespace of your branch (think CRLF vs LF). With this strategy option you can automatically resolve such conflicts and cleanly merge the changes.

Deleting white space and merging numbers in a list - Python

Seeing the given data and the wanted result it's pretty safe to assume that the list elements with the spaces divide the numbers. If this is the case the following algorithm will work for other input data too.

Join the list to a full string. Then split the string at the whitespaces. Convert the values of the resulting list to integers.

list(map(int, ''.join(['1', ' ', '1', '0', ' ', '2', '\n']).split()))

Break it down in parts to see what happens

>>> data = ['1', ' ', '1', '0', ' ', '2', '\n']
>>> data = ''.join(data)
>>> data
'1 10 2\n'
>>> data = data.split()
>>> data
['1', '10', '2']
>>> data = list(map(int, data))
>>> data
[1, 10, 2]

An example for other input data:

>>> list(map(int, ''.join(['1', '2', ' ', '3', ' ', '4', '5', '6']).split()))
[12, 3, 456]

Ignore whitespace when doing a merge in mercurial

Ignoring whitespace or not is a choice your merge tool makes. You can configure all manner of merge tools for use with mercurial as shown here: MergeToolConfiguration

Mercurial's internal pre-merge won't/can't ignore whitespace, but if your external merge tool does and it finds only whitespace changes it will exit immediately, and if it finds other that whitespace changes it can hide the whitespace changes when it does load.

For example, with the popular kdiff3 merge tool you'd enable the "White space 2/3-file merge" setting and tell it whether to pick left or right.

Tl;Dr: turn this on in your merge tool, not in mercurial.

How to Remove Whitespace on Merge