unix command line: split large files

From time to time, you will end up with a file so large that it is hard to manage. A common example is a large backup file you are trying to store on a medium (e.g. a DVD) with a smaller capacity. Or you may want to upload it to a server over an unreliable connection that may go down before the transfer is over, forcing you to start all over again.

A nice solution is the split command.

split -b 10M LargeFile LargeFileParts.

will split LargeFile into a number of files named LargeFileParts.xx, where xx is “aa”, “ab”, etc, and each file is 10M or less.
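
For example, splitting a (hypothetical) 25M file this way would leave you with three parts:

LargeFileParts.aa
LargeFileParts.ab
LargeFileParts.ac

where the first two are 10M each and the last holds the remaining 5M.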

If you prefer numeric suffixes and you are using the GNU version of split, you can use the -d option (see “man split” for this and other options!). OS X comes with a different version of split that doesn’t support “-d”; there you may want to check out lxsplit, which is also available through Homebrew.
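
With GNU split, for example,

split -b 10M -d LargeFile LargeFileParts.

produces parts named LargeFileParts.00, LargeFileParts.01, and so on.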

In order to join the chunks (the shell expands the wildcard in alphabetical order, so the parts are concatenated in the right sequence), you just

cat LargeFileParts.* > LargeFile

It’s always a good idea to create (and keep!!!) a hash of the original file, and compare it with the hash of the “joined” file to make sure the parts were joined correctly.

To create the hash, use something like

shasum LargeFile > LargeFile.sha

and to verify it,

shasum -c LargeFile.sha
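
If the joined file matches the original, you will see something like

LargeFile: OK

Note that shasum defaults to SHA-1; add -a 256 when creating the hash if you prefer SHA-256.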

Improvements/alternatives/thoughts? Please share here.