Component Based Web Project Directory Layout with Git and Symlinks


As soon as you need to reuse some set of files elsewhere, that's when you should start thinking in terms of components or (in Git) submodules.

Instead of managing webroot, comp, and lib within the same repo (which is the SVN way, or more generally the "centralized way" of a CVCS), you define:

  • n repos, one per component you need to reuse (so 'img' would be a Git repo reused as a submodule within webroot, for instance)
  • a main project for referencing the exact revision of those submodules you need.

That is one of the advantages of submodules over symlinks: you reference one exact revision, and if that component has some evolutions of its own, you don't see them immediately (not until you update your submodule, anyway).

With a symlink, you always see the current state of the set of files at the other end of that link.
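That contrast can be sketched with a few commands (the repo URL and paths below are placeholders, not part of the original answer):

```shell
# Submodule: the parent repo records one exact commit of the component.
git submodule add https://example.com/img.git img   # placeholder URL
git commit -m "Pin img at its current revision"

# Changes pushed to the 'img' repo stay invisible to the parent until
# you explicitly move the pin forward:
git submodule update --remote img

# Symlink: you always see the live state of the target directory.
ln -s /shared/components/img img-live               # placeholder path
```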

Git strategy for a project with many independent modules

One repo per module is usually best, especially if the modules can evolve and be tagged independently of one another.

This is akin to a component-based approach, and similar to what I proposed in "Component based web project directory layout with git and symlinks".

But if you cannot make any change in one without having to modify another component (or several others), then and only then, one repo would make sense.

That would be called the "system approach" in a "Component-driven development".

The other argument for one repo would be if all those components have only a few files in them (and not tens or hundreds of files).

Structuring related components in git

The criterion for combining different sets of files into one "component" (here, one Git repo) is their respective development lifecycle (the way they evolve in terms of labelling and branching):

  • can you evolve the PHP or Java module without having to make any modifications to the other modules (like the Objective-C ones)?
  • can you isolate in a branch some evolutions/fixes which are only made for one of those modules and not the others?
  • can you reuse a specific version of one of those modules in several projects?

If yes, a component-based approach is best (i.e. one git repo per module), as opposed to one repo with everything in it (system approach).

See for instance "Component based web project directory layout with git and symlinks".

A component represents a "coherent set of files" and is best managed in its own Git repo.

Working with git in a web project for multiple customers


TL;DR

This is actually an architectural design problem, not a source code management problem. Nevertheless, it's a common and interesting problem, so I'm offering some general advice on how to address your architectural issues.

Not Really a Git Problem

The problem isn't really Git here. The issue is that you haven't adequately differentiated what remains the same vs. what will change between customers. Once you've determined the correct design pattern, the appropriate source control model will become more obvious.

Consider this quote from Russ Olsen:

[Separate] the things that are likely to change from the things that are likely to stay the same. If you can identify which aspects of your system design are likely to change, you can isolate those bits from the more stable parts.

Olsen, Russ (2007-12-10). Design Patterns in Ruby (Kindle Locations 586-588). Pearson Education (USA). Kindle Edition.

Some Refactoring Suggestions

I don't know your application well enough to offer concrete advice, but in general web projects can benefit from a couple of different design patterns. The template, composite, or prototype patterns might all be applicable, but sometimes discussing patterns confuses the issue more than it helps.

In no particular order, here's what I would personally do:

  1. At the view layer, rely heavily on templates. Make heavy use of layouts, includes, or partials, so that you can more easily compose presentation-layer objects.
  2. Make heavy use of customer-specific configuration files (I rather like YAML for this purpose) to allow easier customization without modifying core code.
  3. At the model and controller layers, choose some appropriate structural patterns to allow your objects to behave polymorphically based on your customer-specific configuration files. Duck-typing is your friend here!
  4. Use some introspection based on hostname or domain, enabling polymorphic behavior for each client.
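Point 4 can be sketched in shell terms (the hostname convention and file paths here are hypothetical, purely to illustrate the idea):

```shell
# Pick a per-customer config file based on the host we're serving from.
host=$(hostname -f)              # e.g. acme.example.com
customer=${host%%.*}             # hypothetical convention: first label = customer
config="config/customers/${customer}.yml"

# Fall back to a default when no customer-specific file exists.
[ -f "$config" ] || config="config/customers/default.yml"
echo "Loading $config"
```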

Next Steps with Git

Once you've refactored your application to minimize the changes between customers, you may find you don't even need to keep your code separate at all unless you're trying to hide polymorphic code from each client. If such is the case, you can certainly investigate submodules or separate branches at that point, but without the burden of heavy duplication between branches.

Symlinks are Your Friends, Too

Lastly, if you find that you can isolate changes into a few subdirectories, Git supports symlinks. You could simply have all your varied code in a per-client subdirectory on your development branch, and symlink the files into the right places on your per-client release branches. You can even automate this with some shell scripts or during automated deployments.

This keeps all your development code in one place for easy comparisons and refactoring (e.g. the development branch), but ensures that code that really does need to be different for each release is where it needs to be when you roll it out into production.
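As a sketch (the branch and file names are made up for illustration): all client-specific files live under clients/<name>/ on the development branch, and each release branch links them into place:

```shell
# On the per-client release branch, wire client files into place.
git checkout release-acme                           # hypothetical branch
ln -sf ../clients/acme/theme.css    public/theme.css
ln -sf ../clients/acme/settings.yml config/settings.yml

# Git stores the symlink itself, not the file it points to.
git add public/theme.css config/settings.yml
git commit -m "Wire acme-specific files into place via symlinks"
```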

Git: Maintaining third-party modules consisting of multiple directories

If your environment supports symlinks, I would recommend the submodule approach, combined with soft links in order to get the proper directory structure.

That way, you get the benefits from:

  • a component approach (like this other non-related example)
  • a custom directory structure compatible with your application deployment and execution environment.

From the comments:

Basically, you have two directory structures:

  • one resulting directly from a git checkout, with its submodule directories.
  • one created manually to fit the expected structure, populated with 'ln -s' (or mklink for Windows Vista/7) commands that link the correct directories from the first structure to the expected places in the second structure.

The OP raphinesse objects:

It would be really nice to be able to just clone the repository and have the required structure to run the app.

Having to recreate the whole hierarchy with symlinks in a separate directory every time someone clones the repository seems like a lot of work to me.

To which I reply:

It can be a script included in the repo itself, which the user would execute in order to generate the correct directory structure: a one-click solution in that case.

The OP raphinesse agrees:

Yeah, I just thought about adding a script which initializes and updates the submodules as well as setting up the relevant links "post-checkout" in the working directory of the main repository (In case the UAC-Prompt doesn't make any trouble). Of course the symlinks set up this way would have to be included in the .gitignore file.

Symlinks lifecycle in progress of application

Indeed, it was a rather naive question: if a file or folder (including one reached through a symlink) is open and still being read, Linux keeps serving the original content to the application reading it (or to anybody working in that folder).
Likewise, if I start a compilation in a folder reached through a symlink, it will finish against the old symlink target even if the link is changed mid-build.
If I run make with a path that goes through the symlink, I would say the behaviour is undefined, but in practice make appears to finish the compilation as intended.

Sharing git repo symlinks between Windows and WSL

This issue occurs because Windows and Linux (or at least the emulated version) disagree about the size of a symlink. On Windows, a symlink's size is in blocks, so a 6-character symlink will be 4096 bytes in size. On Linux, a symlink's size is the number of bytes it contains (in this example, 6).
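You can see the Linux side of this directly (GNU stat shown; the Windows side would report a block-sized value for the same link):

```shell
ln -s target mylink     # symlink whose target path is 6 characters
stat -c %s mylink       # prints 6 on Linux: size = length of the target path
```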

One of the things that Git writes into the index to keep track of whether a file has changed is the size. When you perform any sort of update of the index, such as with git reset --hard, Git writes all of this metadata into the index, including the size. When you run git status, Git checks this metadata to determine whether it matches, and if not, it marks the file as modified.

It is possible to control whether certain information is checked in the index, since some tools can produce bogus info (for example, JGit doesn't write device and inode numbers), but size is always checked, because it is seen as a good indicator of whether a file has changed.

Since this is a fundamental disagreement between how Windows and how WSL see the symlink, this really can't be fixed. You could try asking the Git for Windows project if they'd be willing to work around this issue in Git for Windows, but I suspect the answer is probably going to be no since changing it would likely have a performance impact for all Windows users.

git-svn and huge svn repository with many sub-repositories

The usual practice is to associate a component (a coherent set of files with its own development lifecycle) to a git repo.

That means you can make several git-svn clones, each one with an SVN address referencing a distinct component.

From there, you can reference those various repos within a single parent repo as submodules if you want.

But the idea remains: once you are within one of those submodules, you are actually within a git repo: that repo will support git-svn dcommit operations.
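A sketch of that workflow (all URLs and component names are placeholders):

```shell
# One git-svn clone per SVN component:
git svn clone http://svn.example.com/big-repo/trunk/componentA componentA
git svn clone http://svn.example.com/big-repo/trunk/componentB componentB

# A parent repo referencing both at exact revisions:
git init parent && cd parent
git submodule add ../componentA componentA
git submodule add ../componentB componentB
git commit -m "Reference componentA and componentB"

# Inside a submodule you are in a full git-svn repo:
cd componentA
git svn dcommit    # pushes local git commits back to SVN
```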

Copying inherited directories containing symlinks

It seems you should use tar and not cp.
Please try the following (I can't test it right now; watch out for what your version of tar does by default with symlinks: it should store them as symlinks, not follow them... but YMMV):

#step 1: create an archive of one of the directories_using_common_symlinks:
cd /some/directory/maps/
tar cvf /somewhere/wholeshebang.tar images/ css/
#ie, tar everything (directories, symlinks, etc).
#Please add whatever is missing
#note: it will also contain extra (specific) stuff, which we get rid of below

#step 2: create the real archive of JUST the common stuff
cd /some/temporary_dir/
tar xvf /somewhere/wholeshebang.tar #regurgitate all the content.
rm /somewhere/wholeshebang.tar #we won't need that one anymore, it contains too much
rm images/muffins.png css/mapStyles.css #delete all extra stuff
tar cvf /somewhere/just_commons.tar images/ css/ #recreate tar, this time with only common stuff.

#step 3: use the new archive to "kickstart" the other directories
cd /the/destination
tar xvf /somewhere/just_commons.tar #regurgitate the common stuff (dirs, symlinks)
#then add whatever is specific to /the/destination


