How to deploy: database, source and binary changes in 1 patch?


Today, was an exciting day for me, again. :)

I was working on several client’s projects and something seems to have crystalized in my brain today. This crystallization created a new question that I posted on stackoverflow: How to deploy: database, source and binary changes in 1 patch?

Amazingly, 6 hours went by and I only got 20 views and 0 answers. You know what this says to me? To me this says that people do not have an easy answer to this question. I know for myself, i usually look for questions that I have quick answers to, I think this is not one of those questions.

Well, this actually was a test to see if something like this already exists, because I think I have an answer to this question. I outlined it in my conversation with Trevor today. I put it to your for consideration, I would love to get feedback so please tell me what you think.

A transcript of this conversation is in this post:

Taras Mankovski

I have multiple developers checking out projects from repository, the repo contains db dump of the project, they import the db, make changes to code, dump db, commit changes and db back to repo, problem is that sql dump site is pretty much useless in terms of figuring out what has changed in db

diff, is a well established mechanism for comparing text files. It works pefectly. I can at any point, check what’s changed between revision of a project. This is especially nice with git because I can choose arbitrary commits and see their changes.

This brings us back to db. DB sql dump is pretty much useless in determining what has changed in the DB because of the nature of the file. It’s long it’s wordy, it plain sucks to read.

So here is the solution:

I think it would be possible to represent the database as a directory with files in it. Every table in the database would be a directory. Every record a text file that contains in it the data from the record, represented as JSON.

What this means is that it would be extremely easy to commit the database into the repository and see what has changed.

We can use existing established mechanisms to determine the changes in the database.

The information included in the file system can be quiet extensive. For example, every directory can contain a schema file, that represents the schema of the table as json file. Again, it would be very easy to compare schema changes from revision to revision by comparing the schema files.

Trevor Twining
I think it’s a really interesting solution to the problem, just trying to work out how implementation details would work with Drupal’s architecture and how it might support or conflict with other partial solutions to the problem.

There’s been a push in drupal lately to allow more config to be exported as code for just that reason, but a lot of it is happening at the object level. There’s a growing number of modules that are using OOP principles in their code even though Drupal is procedural, so they’re basically providing var exports for some of those objects some of that work is for reusability of code as well, so that configurations can be saved and implemented in different projects

Taras Mankovski
Ok, so there is more. The title of this wave is Unified Patching. The problem with what we have right now is that “It’s very difficult to deploy changes to the server.”

The problem actually is that deployment is easy, if you’re doing it wrong. Deployment is hard if you’re doing it right.

What do I mean by that?
It’s easy to make changes to client’s site by working directly on client site. That in many many many ways is very wrong. If you break something you’re essentially screwed, it looks bad and it’s just extremely unprofessional. But it’s so damn easy.

The hard part is to do it right – do all development offline, test the changes, show the changes to the client on demo server, stage the changes and then finally apply them to client’s site without impacting the live site. This is hard.

I’m finding that deployment takes as much time as development. This is what I want to change. This is where Unified Patching will come in.

I’m breaking up project changes into 4 categories.

  • Database Structure
  • Database Data
  • Source Code
  • Binary Files

Actually, when working with CMS, Database Data can be broken up into 2 categories: Content Changes and Structural Changes. In Drupal, this would be defining the content types and the actual content data. We will ignore this point for now.

I think it would be possible to systematically apply all 4 of these changes using a unified tool, that would apply them in a reliable and predictable manner.

What’s I’m talking about is unix ‘patch’ utitility on steroids.

I have an idea of how this could be done in a very simple and reliable way.

Ok, so I think it would be possible to use diff to generate Unified Patches.

Diff is a very simple format that can be generated from any VCS.

Git’s diff includes information about the changes that happened between version with source code, binary files and permissions. It has no capacity to handle database changes.

I think if we were to use the solution outlined above, then we could create a Unified Patch generator that would parse the diff to determine changes in the 4 data types that were outlined above.

For example:

Let’s assume that we have all of the database dumpted as files onto the files system. We are going to run DIFF between 2 revisions.

  1. Parse the diff to determine if any of the schema files have changed in the /db directory have changed. If changed then generate ALTER statements that correspond to the changes made to the schema.
  2. Parse the diff to determine if any files in the /db directory have changed, if they have then create a corresponding REPLACE or DELETE/INSERT statement for each changed file
  3. If binary files were added/removed or modified, then include the modifed files a temprorary location
  4. Include standard patch for source code or text files

Now that we have all of the data types processed, put them together into 1 directory and tar.gz the directory.

Now you have an Unified Patch.

  1. Upload the patch to the server.
  2. run upatch ourpatch.tar.gz

upatch will perform the following actions.

  1. dry-run database schema changes (possibly inside of a trasaction)
  2. dry-run database content import
  3. dry-run binary file move
  4. dry-run patch

I don’t know how to do step 1 and 2. Step 3 could be fairily simple, we just need to know if have permissions to overwrite these files.

If dry-run succeds, then it’s fairily safe to perform live update.

Trevor Twining
I don’t know if you could do a dry run, but you could make changes in a test version (makes a dump of current live db and changes that instead of live)

transactional support might allow you to do the rollback, but you’d need to be able to do several operations before the commit, shouldn’t be a problem, but dummy database might be more useful because you can connect to it and actually see if anything is messed up, which you couldn’t do with a transactional approach.

Taras Mankovski
We really just want to know if the query succeeded. The other approach would be use to test, like you said, but actual make it a staging environment where you can push the changes to and test if the site is working before you apply the same change to the live site

Trevor Twining
Here’s a question though, what about the cases where the database is storing some serialized version of a code-object for persistance? Not looking for an immediate answer, but it’s an important question

Taras Mankovski
Is the data stored as php serialized object?

Trevor Twining
a portion, yes, and it’s how code is shared/exported, basically via a var_export call. code for views can then be moved out of the database and stored as modules

Taras Mankovski
ok, let’s take a few example:
Multiple developers are working on same site
Developer A checked out changes from repo, made changes in the view and commited them.
At the same time, Developer B also made changes to the db and try to commit them. Now, for Developer B, his commit would fail because his dbfiles are out of date. He needs to perform merge. I do not know how you would merge 2 view exports or if it’s at all possible, but atleast the developers know that there is a conflict in the views right away.

That’s it for today, not bad for 1 days, work.

Ok, what do you guys think about this?

Information and Links

Join the fray by commenting, tracking what others have to say, or linking to it from your blog.


Other Posts
Announcing a new project called: Social Photography

Write a Comment

Take a moment to comment and tell us what you think. Some basic HTML is allowed for formatting.




Reader Comments

Be the first to leave a comment!