Pull of a large repository (more than 1GB) over HTTP fails

The repository is over 1GB in size. When the pull gets to about 50%, this error occurs:

> remote: Counting objects: 23891, done.
> remote: Compressing objects: 100% (19980/19980), done.
> fatal: The remote end hung up unexpectedly
> 3.61 MiB | 333 KiB/s
> fatal: early EOF
> fatal: recursion detected in die handler

Can anybody help me, please?

It is most likely failing because of the size of your repository.

If you have access to the remote repository, try this:

  1. Get a copy of the remote repository files. You can tar.gz the remote repository directory and download the archive to your local machine (a combined sketch of these steps follows the list).
  2. Extract the archive somewhere on your local machine.
  3. Clone the repository from that local copy (no file downloading happens here, so it should work):

    git clone /path/to/where/you/unzipped/the/remote/repository your_local_copy

  4. Edit the .git/config file inside the 'your_local_copy' directory.

  5. Edit the value of the 'url' key just below the [remote "origin"] line.

    [remote "origin"]

    url = git+ssh://[email protected]/srv/git/yourrepository.git

  6. Your local repository will now point to the remote repository and should work as expected.

  7. Remove the copy of the remote repository you made in Step 2.
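Putting the steps together, here is a minimal sketch of the whole workflow. All host names, paths, and the final URL below are placeholders for illustration; substitute your own values (in particular, the real remote URL from your old .git/config).

    # Step 1: archive the remote repository and download it (placeholder host and paths)
    ssh user@remotehost 'tar czf /tmp/yourrepository.tar.gz -C /srv/git yourrepository.git'
    scp user@remotehost:/tmp/yourrepository.tar.gz .

    # Step 2: extract the archive locally
    tar xzf yourrepository.tar.gz

    # Step 3: clone from the local copy (no network transfer involved)
    git clone yourrepository.git your_local_copy

    # Steps 4-6: point 'origin' back at the real remote
    # (equivalent to editing .git/config by hand)
    cd your_local_copy
    git remote set-url origin git+ssh://user@remotehost/srv/git/yourrepository.git
    git remote -v

    # Step 7: remove the downloaded copy
    cd ..
    rm -rf yourrepository.git yourrepository.tar.gz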

It should be pretty obvious from your question that you're actually just asking about the difference between git merge and git rebase.

So let's suppose you're in the common case: you've done some work on your master branch, and you pull from origin's master, which has also had some work done on it. After the fetch, things look like this:

- o - o - o - h - a - b - c (master)
               \
                p - q - r (origin/master)

If you merge at this point (the default behavior of git pull), assuming there aren't any conflicts, you end up with this:

- o - o - o - h - a - b - c - x (master)
               \             /
                p - q - r --- (origin/master)

If, on the other hand, you did the appropriate rebase, you'd end up with this:

- o - o - o - h - p - q - r - a' - b' - c' (master)
                          |
                          (origin/master)

The content of your work tree should end up the same in both cases; you've just created a different history leading up to it. The rebase rewrites your history, making it look as if you had committed on top of origin's new master branch (r), instead of where you originally committed (h). You should never use the rebase approach if someone else has already pulled from your master branch.
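In terms of explicit commands, the two pictures above correspond to something like this (a sketch, assuming the remote is origin and the branch is master):

    git fetch origin

    # merge: creates the merge commit x
    git merge origin/master

    # rebase: replays a, b, c on top of r as a', b', c'
    git rebase origin/master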

Finally, note that you can actually set up git pull for a given branch to use rebase instead of merge by setting the config parameter branch.<name>.rebase to true. You can also do this for a single pull using git pull --rebase.
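For example (a sketch, again assuming the branch is master and the remote is origin):

    # Make `git pull` rebase instead of merge for the master branch
    git config branch.master.rebase true

    # Or rebase just this once, without changing any configuration
    git pull --rebase origin master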

The limitation on the size of HTTP POST requests is usually not on the HTML side at all. The limitation is more on the server side: the web server needs to be configured to accept such large POST requests. The default limit is indeed often around 2GB, and the server will usually return an HTTP 500 error when it is exceeded. The limit can often be increased to 4GB, but anything beyond that will hit the boundary on 32-bit systems. On 64-bit systems with a 64-bit OS, the theoretical boundary is much higher: 16EB.

If configuring the web server to accept such large POST requests is not an option, or when you want to go beyond the web server's limit, then you have no other option than splitting the file on the client side and reassembling the parts on the server side.
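Just to illustrate the split-and-reassemble idea (this is only a sketch of the concept with standard Unix tools, not the browser-side code discussed below):

    # client side: split a big file into 100MB chunks (bigfile.part.aa, .ab, ...)
    split -b 100m bigfile bigfile.part.

    # server side: once all chunks have arrived, reassemble them in order
    cat bigfile.part.* > bigfile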

Since HTML is just a markup language, it offers no facilities for splitting the file. You really have to use a normal programming language like C# (Silverlight) or Java (applet) in the form of a small application that you serve from your web page. It may also be possible with Flash or Flex, but don't pin me down on that, since I use neither.

That said, FTP is a much better choice than HTTP for transferring (large) files over a network. I'd reconsider the choice of using HTTP for that.

Cherry-pick is implemented as a merge, with the merge base being the parent of the commit you're bringing in. In cases where there are no merge conflicts, this should have exactly the same effect as generating and applying the patch as you have done (but see torek's answer for a bit of a caveat, where am could, in theory, do the wrong thing).

But by doing a merge, cherry-pick can try to handle cases where changes would conflict more gracefully. (In fact, the -3 option you gave to am tells it that, if need be, it should do the same thing, provided it has enough context in the patch to be able to do so. I'll come back to that point at the end.)

When you apply a patch, by default, if it changes a hunk of code that is not the same in the commit where you apply it as it was in the parent commit from which it was generated, then the apply will fail. The cherry-pick/merge approach, however, will look at what those differences are and generate a merge conflict from them, so you have the chance to resolve the conflict and carry on.

As part of conflict detection, cherry-pick does rename detection. So, for example, say you have

o -- x -- x -- a <--(master)
      \
       b -- c -- d <--(feature)

and you cherry-pick commit c onto master. Suppose at o you created file.txt, and in a you have modifications to file.txt; but commit b moves file.txt to my-old-file.txt, and commit c modifies my-old-file.txt.

The change to my-old-file.txt in c could conflict with the change to file.txt in a; but to see that possibility, git has to do rename detection so it can figure out that file.txt and my-old-file.txt are "the same thing".

You may know that you don't have that situation, but git doesn't know until it tries to detect renames. I'm not sure why that would be time-consuming in this instance; in my experience it usually isn't, but in a repo with lots of paths added and deleted (between b and either c or a in our example) it could be.

When you generate and apply a patch instead, it tries to apply the patch on the assumption that there is no conflict. Only if this runs into a problem (and then, only because you gave the -3 option) will it fall back to doing a merge, with conflict detection. It gets to skip all of that, including any potential rename detection, as long as its first attempt applies cleanly.
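For reference, the two approaches being compared look roughly like this (a sketch; c stands for the example commit from the diagram above, so substitute a real hash or ref):

    # patch-based approach: generate a patch for a single commit and apply it,
    # falling back to a 3-way merge only if the plain apply fails (-3)
    git format-patch -1 c --stdout > c.patch
    git am -3 c.patch

    # merge-based approach: cherry-pick does a 3-way merge
    # (including rename detection) up front
    git cherry-pick c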


Update: as noted in the comments on the question, you can also turn rename detection off if it's not helping and is running slowly. If you use this when there are, in fact, renames that "matter" to the merge, it may cause conflicts where rename detection would have resolved them. Although I don't think it should, I can't rule out that it might also just calculate an incorrect merge result and quietly apply it, which is why I rarely use this option.

For the default merge strategy, the -X no-renames strategy option will turn off rename detection. You can pass this option to cherry-pick.
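For example (picking the same hypothetical commit c as above):

    # disable rename detection for the underlying merge that cherry-pick performs
    git cherry-pick -X no-renames c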

Per torek's comment, it seems rename detection should be a non-issue with am. That said, I can confirm that it is able to properly handle a case where the merge only works with rename detection. I'm going to return to trying to understand the ins and outs of this sometime when it's not Friday afternoon.

Please see https://github.com/akka/akka-http/issues/745#issuecomment-271571342

In short, if you need to unmarshal your entity twice, you should call toStrict first to make sure the entire entity is buffered in memory; otherwise it will be drained by the first unmarshalling pass and will not be available for the second.

It only happens to work without toStrict by accident, when the entity is small enough to fit in Akka's internal buffer; in that case there's no draining involved.


Tags: Git