Crazy scheme of the week: melding heterogeneous filesystems



At work, we have an interesting problem. The primary storage system at MSI comes from Panasas, a vendor specializing in high aggregate I/O performance from a single filesystem to thousands of Linux compute nodes. Besides a stunningly high CPU-to-disk ratio (read: expense), a big part of Panasas’ performance strategy is removing the bottleneck that filer heads in traditional NAS become when thousands of compute nodes are all requesting I/O ops at once. They do this by outsourcing much of the NAS head’s role to the compute nodes themselves, leaving only a thin metadata server that puts compute nodes into direct contact with the storage blades containing the files being accessed.

The problem is that a large amount of the data housed on our Panasas system doesn’t really need the privilege of residing on such a high-speed system, either because it is owned by non-HPC users who only access it from one node at a time, or (the bigger problem) because nobody has looked at it in two years and isn’t likely to again anytime soon. And unfortunately, for all the things Panasas does, hierarchical storage management (HSM) is not currently one of them.

We could easily whip up a separate filesystem on a denser SAN or something, but offloading all the data at uber-rest onto it would not be transparent to the users who currently have data resident on the Panasas. That doesn’t mean it can’t happen, but it’s not ideal.

An alternative I’ve been kicking around is to create a pseudo-filesystem that stores only metadata and delegates actual I/O to one of several other filesystems. The idea rests on the fact that the Linux kernel presents filesystems with a well-defined API for file-level operations, and every mountable filesystem in Linux conforms to it. It is a common denominator whether your filesystem is a FAT16 floppy disk, an ext4 local hard drive, or even a Panasas DirectFlow filesystem. So it ought to be theoretically possible to write another piece of code that knows how to consume that same API, but isn’t the kernel itself… and in fact also implements the API in turn.
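Concretely, the contract every filesystem signs is a table of function pointers handed to the kernel. Here’s an abridged view of struct file_operations from the kernel’s include/linux/fs.h (a 3.x-era snapshot; the full struct has a couple dozen members):

    /* Abridged from include/linux/fs.h: each mounted filesystem supplies
     * its own implementations behind these pointers, and the kernel never
     * needs to know what's on the other side. */
    struct file_operations {
        struct module *owner;
        loff_t (*llseek) (struct file *, loff_t, int);
        ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
        ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
        int (*mmap) (struct file *, struct vm_area_struct *);
        int (*open) (struct inode *, struct file *);
        int (*release) (struct inode *, struct file *);
        /* ...plus fsync, ioctl, and many more... */
    };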

This pseudo-filesystem would be mounted by the kernel, and it in turn would be configured to mount any other filesystems of the administrator’s choice. Then, when asked to open a file, the pseudo-fs would query a metadata server of its own to determine which underlying filesystem the file resides on, and from there simply pass every further interaction on that file handle through to the API methods of the appropriate underlying filesystem.
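To make that concrete, here’s a minimal sketch of how the pass-through could look inside such a module. Everything here is illustrative: metafs_resolve() is a hypothetical stand-in for the metadata-server lookup, the names are invented, and error handling is pared down.

    #include <linux/module.h>
    #include <linux/fs.h>      /* struct file_operations, filp_open, vfs_read */
    #include <linux/file.h>    /* fput() */
    #include <linux/err.h>     /* IS_ERR(), PTR_ERR() */

    /* Hypothetical helper: ask our metadata server which underlying
     * filesystem currently holds this file, returning a path on that
     * filesystem's mount. Pure invention for illustration. */
    extern const char *metafs_resolve(const struct file *file);

    /* open(): resolve the file's real location once, open it on the
     * underlying filesystem, and stash the lower handle for later ops. */
    static int metafs_open(struct inode *inode, struct file *file)
    {
        struct file *lower;

        lower = filp_open(metafs_resolve(file), file->f_flags, 0);
        if (IS_ERR(lower))
            return PTR_ERR(lower);
        file->private_data = lower;  /* all further I/O goes through this */
        return 0;
    }

    /* read(): pure pass-through; the lower filesystem's own read path
     * (DirectFlow, ext4, whatever) does the actual work at native speed. */
    static ssize_t metafs_read(struct file *file, char __user *buf,
                               size_t count, loff_t *ppos)
    {
        return vfs_read(file->private_data, buf, count, ppos);
    }

    static int metafs_release(struct inode *inode, struct file *file)
    {
        fput(file->private_data);
        return 0;
    }

    static const struct file_operations metafs_fops = {
        .owner   = THIS_MODULE,
        .open    = metafs_open,
        .read    = metafs_read,
        .release = metafs_release,
    };

Opening the lower file at open() time and stashing it in private_data is the same stacking trick used by existing layered filesystems like eCryptfs, which is reassuring for feasibility.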

With a scheme like this, one additional step is added when opening a file, but (importantly) the pseudo-fs would not be expected to introduce a measurable performance impact on subsequent I/O to opened files, since the native performance characteristics of the underlying filesystems would be preserved. In my case, data transfer from storage to compute node would still happen over Panasas’ proprietary DirectFlow protocol whenever the data lives on the high-speed Panasas system.

Clearly, completing this would be a very ambitious undertaking, but so far I haven’t discovered any fundamental reason why the system wouldn’t work, and if such a thing existed, it might prove to be a unique open-source solution to HSM across any combination of heterogeneous storage systems.

Fortunately, it feels like a project with a pretty manageable, progressive roadmap. A feasible and likely very instructive first step would be to simply implement a kernel module providing a mountable filesystem that in fact passes all calls through to some single underlying filesystem.
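For a sense of scale, the scaffolding for that first step is pleasantly small. This is a sketch against the 3.x-era kernel API; “passfs” and everything in it are invented names, and the interesting part (wiring the root inode to pass-through operations) is stubbed out.

    #include <linux/module.h>
    #include <linux/fs.h>

    static int passfs_fill_super(struct super_block *sb, void *data, int silent)
    {
        /* The real work: build a root inode whose file and inode ops
         * delegate to the single underlying filesystem. Stubbed here. */
        return -ENOSYS;
    }

    static struct dentry *passfs_mount(struct file_system_type *fs_type,
                                       int flags, const char *dev_name,
                                       void *data)
    {
        /* dev_name would name the underlying mount to wrap, e.g.
         *   mount -t passfs /panfs /mnt/unified */
        return mount_nodev(fs_type, flags, data, passfs_fill_super);
    }

    static struct file_system_type passfs_type = {
        .owner   = THIS_MODULE,
        .name    = "passfs",
        .mount   = passfs_mount,
        .kill_sb = kill_anon_super,
    };

    static int __init passfs_init(void)
    {
        return register_filesystem(&passfs_type);
    }

    static void __exit passfs_exit(void)
    {
        unregister_filesystem(&passfs_type);
    }

    module_init(passfs_init);
    module_exit(passfs_exit);
    MODULE_LICENSE("GPL");

From there, the roadmap would be filling in passfs_fill_super with real pass-through ops, then swapping the hard-coded single backing filesystem for the metadata-server lookup.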

Now we just have to see if this ever becomes more than my latest crazy scheme of the week.
