Fab's AutoBackup Out of memory error

fabs (Fabs AutoBackup Developer)
Hi there, the Out of memory error in Fab's AutoBackup 6 is still alive. For people who have some programming knowledge: I am fairly sure the problem comes from the TStringList class I use in Delphi. Choosing it was not a good idea since it is very memory hungry (I was not aware of this, still learning) and the program crashes when the list reaches 2 GB of used memory. That is enough in most cases, but with large lists it is a pain.

I am now changing my source code, adding a feature that writes the less important lists, like the log that stores every action made (used for debugging), to a text file on the drive. For the file copy lists, I will have to find something more effective, since writing them to disk has a direct impact on disk I/O and therefore on overall performance while searching for files. What would you use for this, knowing that I use a Delphi Starter edition that does not handle databases?
 
Yep, every string is a TObject, expensive. Which version of Delphi are you using? Do you do much sorting/manipulation/lookups of the list? If not, you might look into linked lists. Very cheap and fast provided you don't need random access to the items. You don't want to use a database.
 
There are a few other options depending on the version of Delphi you're using, including TList, TDictionary and a linked list (as mentioned already), singly or doubly linked. Some people mention an array of strings, but this is a bad idea because reallocating dynamic arrays is slow and requires twice the memory.

Here are a few discussions:

http://stackoverflow.com/questions/...objects-taking-up-tons-of-memory-in-delphi-xe
http://stackoverflow.com/questions/2680635/tstringlist-dynamic-array-or-linked-list-in-delphi

Tomorrow I'll search the archives to see what jogs the memory...
 
Yep, every string is a TObject, expensive. Which version of Delphi are you using? Do you do much sorting/manipulation/lookups of the list? If not, you might look into linked lists. Very cheap and fast provided you don't need random access to the items. You don't want to use a database.
Thanks for the input. I am using the latest Delphi 10.1 Berlin Starter; I grabbed a free license (if anyone is interested, the offer runs until September 9th: https://www.embarcadero.com/fr/products/delphi/starter/promotional-download ). This was a good reason to jump. There are two drawbacks anyway: it does not handle databases (no matter, I do not use them) and it can only compile 32-bit programs, so I am stuck with that since I cannot afford a Professional version.

About TStringList, I do not sort the lists; I just use them to store the source file's path, its destination and some other data. Here is an example line:
Code:
0|Desktop|C:\Users\UserName\Desktop\MyFile.ext|D:\Backup_Path\2016-08-26\PC_NAME\UserName\archive\Desktop\MyFile.ext|("C:\Users\UserName\Desktop")
During the process, each line is parsed using the | character as separator: the number is the user account being processed, then comes the item being processed, the first path is the source, the second is the destination, and the last part is added to the log. When there are thousands of files, this makes the list too heavy and the program crashes with that Out of memory error. That is my theory, and I am sure this is what is happening.
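For reference, a line like this could be parsed with SplitString from System.StrUtils; this is just a minimal sketch with made-up names, not the exact code I use:
Code:
uses
  System.SysUtils, System.StrUtils, System.Types;

type
  // Illustrative container for the five fields of one copy-list line
  TCopyLine = record
    UserNumber: Integer;
    Item, SourceFile, TargetFile, Description: string;
  end;

// Split one line on the | separator; assumes a well-formed five-field line
function ParseCopyLine(const Line: string): TCopyLine;
var
  Fields: TStringDynArray;
begin
  Fields := SplitString(Line, '|');
  Result.UserNumber  := StrToInt(Fields[0]);
  Result.Item        := Fields[1];
  Result.SourceFile  := Fields[2];
  Result.TargetFile  := Fields[3];
  Result.Description := Fields[4];
end;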

I do not feel comfortable with arrays of strings, so a linked list may be worth a shot. I have to understand how to use them and go on with it if the results are good.
 
I have to understand how to use them and go on with it if the results are good.
Easy, once you get your head around pointers, which are quite straightforward in Delphi. I recommend you create a little test project that builds a big linked list of strings (how many files does it take to kill AutoBackup?).

I found the following but Google is your friend.

http://pascal-programming.info/articles/linkedlists.php
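To give an idea, here is a minimal sketch of a singly linked list of strings; the type and procedure names are made up, not from any library:
Code:
type
  PStringNode = ^TStringNode;
  TStringNode = record
    Value: string;
    Next: PStringNode;
  end;

var
  Head, Tail: PStringNode; // both nil while the list is empty

// Append a string at the end in O(1) thanks to the tail pointer
procedure AppendString(const S: string);
var
  Node: PStringNode;
begin
  New(Node);          // New initializes the managed string field
  Node^.Value := S;
  Node^.Next := nil;
  if Tail = nil then
    Head := Node      // first element
  else
    Tail^.Next := Node;
  Tail := Node;
end;

// Walk the list and release every node (Dispose finalizes the string)
procedure FreeList;
var
  Node, NextNode: PStringNode;
begin
  Node := Head;
  while Node <> nil do
  begin
    NextNode := Node^.Next;
    Dispose(Node);
    Node := NextNode;
  end;
  Head := nil;
  Tail := nil;
end;
The drawback is that you only get sequential access; if you process the copy list from start to finish, that is all you need.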
 
You can consider switching to http://www.freepascal.org/.
It can compile 64-bit applications and hopefully can generate more memory-efficient code.

You might have to make some changes to the source code and temporarily disable most of the stuff to compile your application, but it may be worth it in the long run.
 
32 bit only kinda sux! How much is the pro version?

Edit: I just looked it up. Ouch! Gonna need to sell a few more copies of AutoBackup! :eek:
Yes, that hurts. No problem with 32-bit once these memory issues are sorted. One way or another, I will succeed.
 
You can consider switching to http://www.freepascal.org/.
It can compile 64-bit applications and hopefully can generate more memory-efficient code.

You might have to make some changes to the source code and temporarily disable most of the stuff to compile your application, but it may be worth it in the long run.
I have already played with Lazarus. The problem is that I rely a lot on JVCL components and a few others, and they do not exist at all for those environments.
 
Some suggestions on optimization:
  • Store strings in UTF-8 instead of UTF-16 (Delphi's default Unicode encoding).
  • Split full filenames into folder path and file name ("long_path\filename.txt" -> "long_path", "filename.txt"). Since many files are located in the same folders, you can reuse the folder strings for many files by using shared pointers (a sketch of this idea follows after the list).
  • If you don't have to keep the whole list in memory, you can implement a wrapper for TStringList that dynamically loads/unloads data to a file, similar to a swap file.
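As a rough illustration of the folder-sharing point (the pool and function names are made up; it relies on Delphi's reference-counted strings, so every path going through the pool shares one copy of the folder text):
Code:
uses
  System.SysUtils, System.Generics.Collections;

var
  // Created once at startup: FolderPool := TDictionary<string, string>.Create;
  FolderPool: TDictionary<string, string>;

// Return a shared instance of a folder path; identical folders end up
// referencing the same heap block instead of separate copies
function SharedFolder(const Folder: string): string;
begin
  if not FolderPool.TryGetValue(Folder, Result) then
  begin
    FolderPool.Add(Folder, Folder);
    Result := Folder;
  end;
end;

// Usage when adding a file:
//   Folder   := SharedFolder(ExtractFilePath(FullName));
//   FileName := ExtractFileName(FullName);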
 
I have another idea. The most memory-consuming TStringList I have here is a debug log that contains everything the program does, every button click, and it can become huge. I will start by disabling this since it is not that helpful for debugging purposes. That should make the program much more robust.
 
Disabling the action log should do the trick. In a debug report, there was a 6.7 MB copy list and a 28 MB action log. If an action log can take so much space compared to the copy list, it is easy to imagine what it can do with many more files.
Just for testing, I flooded the copy TStringList with fake data, and when the Out of memory error popped up there were 12 million lines in it and it was using 3.8 GB of RAM. Not that bad for a 32-bit process ;)
On the left is a label where I display the lines added to the list, and the right part is a piece of the Windows Task Manager.
[Screenshot attachment: FabsTest.PNG]

A huge number of files to copy should not be a big deal now, even if I keep the list in RAM. I prefer to keep doing it that way for performance reasons; nothing works as fast as RAM.
An update will be released soon. Thanks guys!
 
I have another idea. The most memory-consuming TStringList I have here is a debug log that contains everything the program does, every button click, and it can become huge. I will start by disabling this since it is not that helpful for debugging purposes. That should make the program much more robust.

If you ever decide you need the "debug" log, couldn't you have it write to a text file every few thousand lines?
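Something like this, perhaps; a minimal sketch assuming a plain TStringList buffer and a hypothetical debug.log file name:
Code:
uses
  System.SysUtils, System.Classes;

const
  FlushThreshold = 5000; // arbitrary batch size

var
  // Created once at startup: LogBuffer := TStringList.Create;
  LogBuffer: TStringList;

// Buffer one line in memory and append the whole batch to disk once the
// buffer grows past FlushThreshold lines, so memory usage stays bounded
procedure AddLogLine(const Line: string);
var
  Writer: TStreamWriter;
  i: Integer;
begin
  LogBuffer.Add(Line);
  if LogBuffer.Count >= FlushThreshold then
  begin
    Writer := TStreamWriter.Create('debug.log', True, TEncoding.UTF8); // True = append
    try
      for i := 0 to LogBuffer.Count - 1 do
        Writer.WriteLine(LogBuffer[i]);
    finally
      Writer.Free;
    end;
    LogBuffer.Clear;
  end;
end;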
 
Depending on how your debugging is implemented (e.g. do you have your own "DebugLog("This")" type of thing), could you implement a ring buffer of strings instead and decide that in case of an error you're only interested in the most recent 10,000 lines of errors? Perhaps with an optional "Critical" flag on the call, used for logging initial environment information in a separate buffer.
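For example, a minimal ring buffer sketch, assuming a DebugLog call like the one mentioned (capacity and names are arbitrary):
Code:
const
  RingCapacity = 10000; // keep only the most recent 10,000 lines

var
  Ring: array[0..RingCapacity - 1] of string;
  RingNext: Integer = 0;   // slot that will be overwritten next
  RingCount: Integer = 0;  // number of valid entries, capped at RingCapacity

// Store a line, silently discarding the oldest one once the buffer is full
procedure DebugLog(const Line: string);
begin
  Ring[RingNext] := Line;
  RingNext := (RingNext + 1) mod RingCapacity;
  if RingCount < RingCapacity then
    Inc(RingCount);
end;
On an error you would dump the RingCount stored entries, starting at the oldest slot (RingNext when the buffer is full, 0 otherwise).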
 
Each string in Delphi has at least 12 bytes of overhead and is encoded in UTF-16 (2 bytes per character).
I suggest replacing strings with null-terminated, UTF-8 encoded strings based on PAnsiChar (or UTF8String).

I also suggest using an array list, because it has smaller overhead than a linked list (linked lists need additional pointers to the adjacent items) and can be faster in your case.
  • In the first step, you create a list and allocate space for N strings.
  • Whenever all slots are occupied, you reallocate the array with double the size (see the sketch after this list). The operation is not very expensive, because your array holds pointers to the strings, not the strings themselves: reallocating 1 million strings involves copying only about 4 MB of data (4 bytes per pointer), no matter how long the strings are.
  • Optionally, once you have populated the list, you can reallocate it again to shrink it to its actual size.
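A rough sketch combining the doubling growth and the UTF-8 idea, assuming a plain dynamic array of UTF8String (names are made up):
Code:
var
  CopyListArray: array of UTF8String;
  ItemCount: Integer = 0;

// Append one line; the capacity is doubled whenever the array is full, so
// only the string pointers (4 bytes each on 32-bit) get moved around
procedure AppendCopyLine(const Line: string);
begin
  if ItemCount = Length(CopyListArray) then
  begin
    if Length(CopyListArray) = 0 then
      SetLength(CopyListArray, 1024)                        // arbitrary initial capacity
    else
      SetLength(CopyListArray, Length(CopyListArray) * 2);  // doubling keeps reallocations rare
  end;
  CopyListArray[ItemCount] := UTF8String(Line); // UTF-8 halves the size of ASCII-only paths
  Inc(ItemCount);
end;

// Optionally, once the list is complete:
//   SetLength(CopyListArray, ItemCount);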

This string contains a lot of redundant information.
0|Desktop|C:\Users\UserName\Desktop\MyFile.ext|D:\Backup_Path\2016-08-26\PC_NAME\UserName\archive\Desktop\MyFile.ext|("C:\Users\UserName\Desktop")
Let's split it into smaller items.

0|Desktop|C:\Users\UserName|Desktop|MyFile.ext|D:\Backup_Path\2016-08-26\PC_NAME\UserName\archive|Desktop|MyFile.ext|C:\Users\UserName|Desktop
Let's remove duplicate items.

0|Desktop|C:\Users\UserName|MyFile.ext|D:\Backup_Path\2016-08-26\PC_NAME\UserName\archive

You probably don't need D:\Backup_Path\2016-08-26\PC_NAME\UserName\archive inside each string, because all strings contain it; you might as well make it a global shared variable.
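As a sketch of what that could look like (field names are only illustrative), you would store the pieces in a record and keep the constant destination root in one shared variable, rebuilding the full paths only at copy time:
Code:
type
  TCopyItem = record
    UserNumber: Integer; // e.g. 0
    Item: string;        // e.g. 'Desktop'
    Folder: string;      // e.g. 'C:\Users\UserName\Desktop\' (can be a shared string)
    FileName: string;    // e.g. 'MyFile.ext'
  end;

var
  CopyItems: array of TCopyItem;
  // Stored once for the whole run instead of repeated in every line:
  DestinationRoot: string; // e.g. 'D:\Backup_Path\2016-08-26\PC_NAME\UserName\archive\'

// At copy time the full paths are rebuilt on the fly, for example:
//   Source := CopyItems[i].Folder + CopyItems[i].FileName;
//   Target := DestinationRoot + CopyItems[i].Item + '\' + CopyItems[i].FileName;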
 
I'm no Pascal programmer (just Java, C#, and assembly), but could you set the current path as a variable and then use pointers / shallow copies of the path where needed? You could also encode / compress the data (let me explain).

So, Huffman encoding (https://en.wikipedia.org/wiki/Huffman_coding) can be used to create a unique encoding of the traversal of a tree (as long as it is a binary tree). That algorithm won't work for file system trees as such, but it has me thinking that you could do one of two things:
  • Encode common strings in the path as short binary codes via a prefab dictionary, thus reducing the total amount of data in RAM
  • Store the paths not as plain strings but compress their values with TCompressionStream/TDecompressionStream
In school I was taught to avoid I/O like the plague (because it is slow), so I think compressing the values and storing them in a linked list, as mentioned by others, would be beneficial (although an array of compressed strings *might* be OK, depending on how tightly you can compress the data).
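Again, I'm no Pascal programmer, but the prefab-dictionary idea might look roughly like this (the fragments and tokens below are made up; a real table would be built from the paths you actually see):
Code:
uses
  System.SysUtils;

const
  // Hypothetical prefab dictionary: common fragments and their short tokens
  Fragments: array[0..1] of string = ('C:\Users\', 'D:\Backup_Path\');
  Tokens:    array[0..1] of string = ('@U\', '@B\');

// Replace well-known fragments with short tokens before storing a path
function EncodePath(const Path: string): string;
var
  i: Integer;
begin
  Result := Path;
  for i := Low(Fragments) to High(Fragments) do
    Result := StringReplace(Result, Fragments[i], Tokens[i], [rfIgnoreCase]);
end;

// Expand the tokens back into the original fragments when the path is needed
function DecodePath(const Encoded: string): string;
var
  i: Integer;
begin
  Result := Encoded;
  for i := Low(Fragments) to High(Fragments) do
    Result := StringReplace(Result, Tokens[i], Fragments[i], []);
end;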
 
  • In the first step, you create a list and allocate space for N strings.
  • Whenever all slots are occupied, you reallocate the array with double the size. The operation is not very expensive, because your array holds pointers to the strings, not the strings themselves: reallocating 1 million strings involves copying only about 4 MB of data (4 bytes per pointer), no matter how long the strings are.
  • Optionally, once you have populated the list, you can reallocate it again to shrink it to its actual size.
I know Google is my friend but I am not familiar with arrays at all. Do you have an example?

Currently, I use this code to add an item to the TStringList:
Code:
CopyList.Add(IntToStr(SourceUserNumber)+'|'+CurrentItem+'|'+SourceFile+'|'+TargetFile+'|'+ItemDescription);
I guess I have to use a multi-dimensional array, right?
 
I know Google is my friend but I am not familiar with arrays at all. Do you have an example?
I guess I have found the trick by myself:
I am testing an array of string with a default length of 100,000, growing it by the same increment when there are more lines to store in it.
Here is the sample code I am trying:
Code:
procedure TForm1.Button1Click(Sender: TObject);
var
  LargeInt, ArraySize: Integer;
  CopyListArray: array of string;
  SourceUserNumber: Integer;
  CurrentItem, SourceFile, TargetFile, ItemDescription: string;
begin
  // Fake data taken from the example line earlier in the thread
  SourceUserNumber := 0;
  CurrentItem := 'Desktop';
  SourceFile := 'C:\Users\UserName\Desktop\MyFile.ext';
  TargetFile := 'D:\Backup_Path\2016-08-26\PC_NAME\UserName\archive\Desktop\MyFile.ext';
  ItemDescription := '("C:\Users\UserName\Desktop")';

  ArraySize := 100000;
  SetLength(CopyListArray, ArraySize);
  for LargeInt := 0 to 500000000 do
  begin
    // Grow the array by another 100000 slots when the current block is full
    if LargeInt = Length(CopyListArray) - 1 then
      SetLength(CopyListArray, Length(CopyListArray) + ArraySize);
    CopyListArray[LargeInt] := IntToStr(SourceUserNumber) + '|' + CurrentItem + '|' + SourceFile + '|' + TargetFile + '|' + ItemDescription;
    JobTypeLabel.Caption := IntToStr(LargeInt); // to check how many strings have been added to the array
    Application.ProcessMessages; // to avoid GUI freezing while filling up the RAM ;)
  end;
end;
 