Josef “Jeff” Sipek

Making Version Control Systems Really Go Boom

This is part 2 of my adventures in making version control systems go boom.

As I described before, I need to version some reasonably large files. After trying Mercurial and Git, I decided to go with git as it presented me with fewer problems.

To make matters worse than before, I now need to version 3 files which are about 2.7GB in size each. I tried to git-add the directory, but I got this wonderful message:

$ git-add dir/
The following paths are ignored by one of your .gitignore files:
dir/ (directory)
Use -f if you really want to add them.
$ git-add -f dir/
fatal: dir/: can only add regular files or symbolic links

Wha?

  1. I don’t have any .gitignore files in this repository
  2. Adding a directory like that worked (and still works!) on other directories

Really painful. Time to experiment, but first I ran git-status to see what other files I had not committed yet, and I saw everything listed except the directory!…So, I moved one of the files to the top directory of the repo, ran git-status — the file did not show up — and tried to add it anyway:

$ git-add file
fatal: pathspec 'file' did not match any files

Ok, this time around, I at least get an error message which I’ve seen before. It is still wrong, but oh well. Thankfully, the program that uses these files has been made in such a way that it can handle filesystems which don’t support files larger than 2GB. I regenerate the file; now I have 2 files, the first one 2GB and the other 667MB. git-status displays both — great! git-add on the smaller file works flawlessly, but…you guessed it! Adding the larger file dies. Which error message?

fatal: Out of memory, malloc failed

Yep, great. My laptop’s 1GB of RAM just isn’t good enough, eh? I’m not quite sure what I’ll do, I’ll probably scp everything over to a box with 2+GB RAM, and commit things there. This really sucks :-/

Update: I asked around on IRC (#git), where I got a few pointers, and the code confirms things…it would seem that git-hash-object tries to mmap the entire file. This explains the out-of-memory error. The other problem is the fact that the file size is stored in an unsigned long variable, which is 32 bits on my laptop. Oh well, so much for files over 4GB. I think — but I’m not sure, I’m too lazy to check — that the stat structure may return a signed int, which would limit things to 2GB — which is what I see.

3 Comments »

  1. I think the issue about adding directories has been fixed per commit e96980ef8164f266308ea5fec536863a629866dc ("builtin-add: simplify (and increase accuracy of) exclude handling") in Git's Git repository. So git-1.5.3 and higher should contain this fix.

    Comment by [unknown] — January 1, 1970 @ 00:00

  2. I'm using git to track the files I push to my webserver and ran into this same problem when committing a large flac file (413MB). My dev server with 1GB of RAM was able to add and commit, but while compressing objects locally during a push to the main server, I also saw the malloc failed error. And this was with git version 1.6.5.GIT... It seems they still need to fix how it does object compression before a transfer. Pretty irritating.

    Comment by [unknown] — January 1, 1970 @ 00:00

  3. That's just sad...I haven't tested it recently, but I hoped that things had gotten better. I guess the problem lies with git's (lack of) architecture. The source code is just a gigantic pile that abuses global variables, and does the bare minimum to get by in a number of cases (e.g., this case).

    Comment by [unknown] — January 1, 1970 @ 00:00
