Tuesday 27 August 2013

Avamar and large datasets with millions of files

Here is a gem I found in the Avamar forums, from an "Ask the Expert" session answered by Ian Anderson, concerning large file systems and Avamar. Please share your experiences if you have any.


lmorris99 wrote:

We have one server with 25 million files, scattered through directories six levels deep.
We'd like to throw it at our test Avamar grid; any tuning I should look at on the client (or server) side before we set it up for its first backup?

The most important thing to do on a client with so many files is to make sure that the file cache is sized appropriately. The file cache is responsible for the vast majority (>90%) of the performance of the Avamar client. If there's a file cache miss, the client has to go and thrash your disk for a while chunking up a file that may already be on the server.

So how do you tune the file cache size?

The file cache starts at 22MB in size and doubles in size each time it grows. Each file on a client will use 44 bytes of space in the file cache (two SHA-1 hashes consuming 20 bytes each and 4 bytes of metadata). For 25 million files, the client will generate just over 1GB of cache data.

Doubling from 22MB, we get a minimum required cache size of:
22MB => 44MB => 88MB => 176MB => 352MB => 704MB => 1408MB
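
To sanity-check that arithmetic, here is a rough sketch in Python. The 22MB starting size, the doubling behaviour, and the 44 bytes per file are simply the figures quoted above, not values read from any Avamar configuration.

    # Rough file cache sizing estimate, using the figures quoted above:
    # the cache starts at 22MB, doubles each time it grows, and each file
    # costs 44 bytes (two 20-byte SHA-1 hashes plus 4 bytes of metadata).
    def required_file_cache_mb(num_files, start_mb=22, bytes_per_file=44):
        needed_mb = num_files * bytes_per_file / (1024 * 1024)
        cache_mb = start_mb
        while cache_mb < needed_mb:
            cache_mb *= 2
        return cache_mb

    files = 25_000_000
    print(f"cache data: ~{files * 44 / 1024 ** 2:.0f}MB")                    # ~1049MB
    print(f"cache size after doubling: {required_file_cache_mb(files)}MB")   # 1408MB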

The naive approach would be to set the filecachemax in the dataset to 1500. However, unless you have an awful lot of memory, you probably don't want to do that since the file cache must stay loaded in memory for the entire run of the backup.

Fortunately there is a feature called "cache prefixing" that can be used to set up a unique pair of cache files for a specific dataset. Since there are so many files, you will likely want to work with support to set up cache prefixing for this client and break the dataset up into more manageable pieces.
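
For what it's worth, here is a hypothetical sketch of how one might decide where to split. It only counts the files under each top-level directory and packs them into a few roughly even groups; it does not configure cache prefixes itself, which (as noted above) is something to set up with support.

    # Hypothetical helper for splitting a large filesystem into a few
    # smaller datasets, one per cache prefix. It counts files under each
    # top-level directory and greedily assigns each directory to the
    # currently smallest group.
    import os

    def count_files(path):
        return sum(len(names) for _, _, names in os.walk(path))

    def split_into_groups(root, num_groups=4):
        tops = [os.path.join(root, d) for d in os.listdir(root)
                if os.path.isdir(os.path.join(root, d))]
        groups = [{"paths": [], "files": 0} for _ in range(num_groups)]
        for size, path in sorted(((count_files(t), t) for t in tops), reverse=True):
            target = min(groups, key=lambda g: g["files"])
            target["paths"].append(path)
            target["files"] += size
        return groups

    # e.g. split_into_groups("/data", num_groups=4) -> four lists of
    # top-level directories with roughly the same number of files each.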

One quick word of warning -- as the saying goes, if you have a hammer, everything starts to look like a nail. Cache prefixing is the right tool for this job because of the large dataset but it shouldn't be the first thing you reach for whenever there is client performance tuning to be done.

On to the initial backup.

If you plan to have this client run overtime during its initial backup, you will have to make sure that there is enough free capacity on the server to allow garbage collection to be skipped for a few days while the initial backup completes.

If there is not enough free space on the server, the client will have to be allowed to time out each day and create partials. Make sure the backup schedule associated with the client is configured to end no later than the start of the blackout window. If a running backup is killed by garbage collection, no partial will be created.

You will probably want to start with a small dataset (one that will complete within a few days) and gradually increase the size of the dataset (or add more datasets if using cache prefixing) to get more new data written to the server each day. The reason for this is that partial backups are only retained on the server for 7 days. Unless a backup completes successfully within 7 days of the first partial, any progress made by the backup will be lost when the first partial expires.
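
A quick back-of-the-envelope check of that constraint (the dataset size and daily ingest below are made-up placeholders; only the 7-day retention figure comes from the post):

    # Will the initial backup finish before the first partial expires?
    # The numbers here are hypothetical placeholders for illustration.
    import math

    total_new_gb = 2000   # unique new data this client will send (assumed)
    gb_per_day = 400      # new data ingested per day before the window closes (assumed)

    days = math.ceil(total_new_gb / gb_per_day)
    if days > 7:
        print(f"~{days} days needed: split the dataset further, or progress is lost")
    else:
        print(f"~{days} days needed: fits within the 7-day partial retention")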

After the initial backup completes, typical filesystem backup performance for an Avamar client is about 1 million files per hour. You will likely have to do some tuning to get this client to complete on a regular basis, even doing incrementals. The speed of an incremental Avamar backup is generally limited by the disk performance of the client itself but it's important to run some performance testing to isolate the bottleneck before taking corrective action. If we're being limited by the network performance, obviously we don't want to try to tweak disk performance first.
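
Using that rough 1-million-files-per-hour figure, the arithmetic for this client looks like this (a ballpark estimate, not a measurement):

    # Ballpark walk time at the ~1 million files/hour figure quoted above
    # for a typical Avamar filesystem client.
    files = 25_000_000
    files_per_hour = 1_000_000
    print(f"~{files // files_per_hour} hours just to walk the filesystem")
    # ~25 hours, which is why tuning matters even for incrementals here.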

The L2 engineers on the client support teams have a good deal of experience with performance tuning and can work with you to run some testing. The tests that are normally run are:
  • An iperf test to measure raw network throughput between client and server
  • A "randchunk" test, which generates a set of random chunks and sends them to the grid in order to test network backup performance
  • A "degenerate" test which, as I mentioned previously, processes the filesystem and discards the results in order to measure disk I/O performance (a rough stand-in for this idea is sketched after this list)
  • OS performance monitoring to ensure we are not being bottlenecked by system resource availability (CPU cycles, memory, etc.)
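
As a very crude stand-in for the idea behind the degenerate test (it is not the Avamar tool, just a rough Python probe), you can walk a representative subtree, read everything, throw the data away, and see what the disks sustain:

    # Crude approximation of the "degenerate" idea: read the filesystem and
    # discard the data to measure raw read throughput. NOT the Avamar tool.
    import os, time

    def read_throughput(root, block=1024 * 1024):
        files = bytes_read = 0
        start = time.time()
        for dirpath, _, names in os.walk(root):
            for name in names:
                try:
                    with open(os.path.join(dirpath, name), "rb") as fh:
                        while True:
                            chunk = fh.read(block)
                            if not chunk:
                                break
                            bytes_read += len(chunk)
                    files += 1
                except OSError:
                    continue  # skip unreadable files
        elapsed = max(time.time() - start, 1e-9)
        print(f"{files} files, {bytes_read / 1024 ** 2:.0f}MB in {elapsed:.0f}s "
              f"-> {bytes_read / 1024 ** 2 / elapsed:.1f}MB/s, "
              f"{files * 3600 / elapsed:.0f} files/hour")

    # e.g. read_throughput("/data/some_representative_subtree")
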
Edit -- 2013-08-06: The behaviour of partial backups changed in Avamar 6.1. More information is available in the following forum thread:
Re: Garbage collection does not reclaim expected amount of space
