Have been working the NAS hard for a few days… stability has improved a lot, but the machine is still somewhat unstable when moving large amounts of data into it.

As I am migrating all my data from all sorts of portable drives onto the server, I am often copying 100+GB at a time.  I can’t find any specific pattern yet, but it seems that while the gani driver is hugely better than the default Realtek driver, it can still have stability issues.  The good thing is that the gani driver never totally locks up; I have yet to need a reboot to regain the network connection.  However, I think that while copying huge amounts of data (how much, I haven’t quantified yet), the gani Realtek driver can still become temporarily unresponsive.

In order to avoid having to recopy everything, I use Microsoft’s SyncToy 2.0 to verify that the directories are in sync.  I could have used rsync, but most of my client machines run Windows, so this looks to be the best free tool.  However, I found a strange behaviour with SyncToy 2.  When I am syncing a large directory that requires copying a lot of data to the destination drive and the target path becomes unresponsive, SyncToy spits out an error saying that the target path is not found.  Worse, rerunning it produces the same error, even though the network share is working just fine.

I suspect there is a bug in the way SyncToy maintains its ‘work file’ data (I believe it saves some checksums in a file in the synced directory).
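If SyncToy’s own bookkeeping is what goes wrong, an independent check can at least confirm whether a copy is really complete. Below is a minimal Python sketch (not how SyncToy works internally) that walks a source tree and compares each file against the destination, first by size and then by SHA-256 hash. The function names are hypothetical, and it assumes the NAS share is reachable as an ordinary path on the client.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading 1 MiB at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(src, dst):
    """Compare every file under src with its counterpart under dst.

    Returns (missing, mismatched): sorted relative paths that exist under
    src but not under dst, and relative paths whose size or content differ.
    Files that exist only under dst are not reported.
    """
    missing, mismatched = [], []
    for root, _dirs, files in os.walk(src):
        for name in files:
            src_path = os.path.join(root, name)
            rel = os.path.relpath(src_path, src)
            dst_path = os.path.join(dst, rel)
            if not os.path.isfile(dst_path):
                missing.append(rel)
            # Cheap size check first; only hash when the sizes match.
            elif (os.path.getsize(src_path) != os.path.getsize(dst_path)
                  or sha256_of(src_path) != sha256_of(dst_path)):
                mismatched.append(rel)
    return sorted(missing), sorted(mismatched)
```

Usage would be something like `compare_trees(r"E:\photos", r"\\nas\photos")` (hypothetical paths). Hashing means reading every destination file back over the network, which is slow for 100+GB, so the size comparison is there to catch obvious mismatches without hashing.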

Maybe it is time to find another sync program…  Can’t wait for OpenSolaris’s ZFS dedupe functionality to arrive…

I have been googling around about the instability of my OpenSolaris NAS box when copying large amounts of data, to see if this is a common issue, and finally found the following link:

http://sigtar.com/2009/02/12/opensolaris-rtl81118168b-issues/#comments

After reading the post, a lot of the symptoms that I saw started to make sense.  The Realtek 8111c/8168 driver is definitely the culprit for the system instability.  At first, I thought it was the SMB/CIFS service’s fault.  In fact, I think that in snv_111b (2009.06), SMB/CIFS is actually not quite robust.  As reported before, after reading and writing to it for a while, the directory shares would disappear, and executing “sharemgr show -vp” would hang.

After upgrading to snv_117, the hanging did not recur, even after extensive reading/writing.  However, the NAS would drop off the network after random lengths of time.  I suspected that network load might be a cause.  However, on one occasion, I was simply copying a large amount of data within the server and the NAS box went offline.  That would point to some sort of CPU or disk workload causing the instability.

So I followed the advice above, plus some more detailed info from the page below:

http://schlaepfer.nine.ch/twiki/bin/view/Schlaepfer/SelfMadeNas2

and upgraded the network driver to the gani driver…  So far, it has survived half a day with quite a bit of activity without issue.

Will keep testing it.  One thing to note: the gani + CIFS write speed (to a RAIDZ volume with six 1TB disks) fluctuates quite a bit… it ranges up to 11MB/s, with an average of about 3.5MB/s.  I think the performance is on par with the regular rge driver… hopefully stability will be much better.
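To put those rates in perspective for the 100+GB batches I am migrating, here is a quick back-of-the-envelope calculation in Python. It assumes decimal units (1 GB = 1000 MB) and a constant rate for the whole copy, which real transfers won’t hold to; the function name is just for illustration.

```python
def copy_hours(size_gb, rate_mb_per_s):
    """Hours to copy size_gb gigabytes at a constant rate_mb_per_s.

    Assumes decimal units (1 GB = 1000 MB) and a steady transfer rate.
    """
    return size_gb * 1000 / rate_mb_per_s / 3600


for rate in (11.0, 3.5):
    print(f"100 GB at {rate} MB/s: about {copy_hours(100, rate):.1f} hours")
```

That works out to roughly 2.5 hours per 100GB at the top speed versus almost 8 hours at the average, which is why the long copies matter so much for stability.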

Stay tuned…

I am checking out this new storage server with much interest:

http://www.amazon.com/Acer-Aspire-AH340-UA230N-Home-Server/dp/B001WGX15W/ref=sr_1_1?ie=UTF8&s=electronics&qid=1247022235&sr=8-1

Heard a lot about Windows Home Server already from Paul Thurrott’s “Windows Weekly”.  Windows Home Server is also capable of providing redundancy for data.  It is almost like a Drobo, except it is even more expandable.  You can hook up any storage media to it and it becomes part of the storage pool.  For folders that you want extra protection for, Windows Home Server will make sure that the data is replicated to a different physical drive.

The above unit comes with a 1TB drive already installed and has 3 more drive bays for internal drives.  You can add more drives via USB and eSATA; with a USB hub, you can add many, many more.  Not sure if the eSATA port supports port multipliers or not… but I guess I can always use the CoolDrive eSATA to USB2 converter.

I guess you can never have too much storage….

07.07.2009

Installed the development build of OpenSolaris, so I am now at snv_117.  Overall, the SMB service seems slightly more stable, but barely.

After installation, copying files to the NAS started okay, but then a set of about 40 data files, for some reason, just refused to go through.  The client would complain about losing the network connection.  However, unlike snv_111b, the SMB service never became completely hung.  The encouraging thing was that “sharemgr show -vp” continued to list the shares without issue.  I tried to break the 40 files into batches of 10, but got stuck immediately on the first 10.  The copying process just hung (I use TeraCopy… it is a must-have utility).

I then started a scrub on the volume to make sure that there wasn’t any issue with it… and guess what, copying while the scrub was running in the background actually seemed to make the copying process more stable.  I was able to copy a lot of data that had yet to be migrated to the NAS… go figure.
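The batches-of-10 approach was done by hand in TeraCopy. For illustration only, the same idea can be scripted; this hypothetical Python sketch copies a list of files in fixed-size batches with `shutil.copy2` and reports the first batch that raises an error. Note that it only catches errors that are actually raised: a copy that simply hangs, as happened here, would still block the script.

```python
import os
import shutil


def batches(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def copy_in_batches(paths, dest_dir, batch_size=10):
    """Copy each file in paths into dest_dir, batch_size files at a time.

    Returns the index of the first batch in which a copy raised OSError,
    or None if every file was copied. A copy that hangs is not detected.
    """
    for index, batch in enumerate(batches(paths, batch_size)):
        try:
            for path in batch:
                target = os.path.join(dest_dir, os.path.basename(path))
                shutil.copy2(path, target)  # copies data plus timestamps
        except OSError as exc:
            print(f"batch {index} failed: {exc}")
            return index
    return None
```

Knowing which batch failed narrows the problem down to at most `batch_size` files, which is the same reasoning behind splitting the 40 files by hand.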

Will report back on the stability of the release with regard to CIFS/SMB.

07.07.2009

Have been using the OpenSolaris NAS box for a while now.  I have about 2TB of stuff already on it.  Mostly photos and videos that I have collected over the years.

OpenSolaris 2009.06, based on snv_111b, had this weird issue with the CIFS (SMB) service.  When copying a large amount of data, after a while (a random length of time), the SMB share would disappear (all my clients are Windows) and could no longer be accessed.

On certain occasions, the host became completely unreachable, but when I could still rsh into the machine, the command <code>sharemgr show -vp</code> would hang.  Nothing showed up in dmesg or the svc logs.

Restarting the service didn’t do anything; each time, only a reboot would fix the issue.  This is highly frustrating.  Will have to upgrade to a DEV build to see if more recent builds have solved this issue.
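Because the symptom is a command that hangs rather than one that errors out, any automated check for it needs a timeout. Here is a minimal sketch, assuming Python 3.5 or later (for `subprocess.run` with `timeout`); the function name and the 30-second default are my own choices, not anything from OpenSolaris. It runs a command such as `sharemgr show -vp` and reports whether it came back in time.

```python
import subprocess


def command_responds(cmd, timeout_s=30):
    """Return True if cmd exits within timeout_s seconds, else False.

    The exit status is ignored: only "did it come back" matters here.
    On timeout, subprocess.run kills the child before raising.
    """
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # the command hung past the timeout
    except OSError:
        return False  # the command could not be started at all
    return True


if __name__ == "__main__":
    ok = command_responds(["sharemgr", "show", "-vp"])
    print("shares OK" if ok else "sharemgr did not respond")
```

One caveat: a process wedged inside the kernel may not exit even when killed, in which case the check itself can block, so this is a rough early-warning tool rather than a guarantee.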