Posts in category Computing tips

Running nodes against multiple puppetmasters as an upgrade strategy



At work, we’re way out of date in our devops, having not upgraded Puppet since version 3.8. As of this writing, version 5 is available.

This has finally created sufficiently many problems that I’ve been helping prep for an upgrade to puppet 5 — but with some 3,200 .pp puppet manifest files in our existing puppet deployment, and a puppet language that doesn’t retain backwards compatibility, the prospect of upgrading is incredibly onerous.

Instead of trying to convert everything in one massive action, a strategy that people will hate me for but that I’m finding really helps to get the job done is to write new puppetization against a puppet 5 server, slowly removing equivalent declarations / resources / etc from the old puppet server, and running your puppetized nodes against both masters during this period. As long as you ensure the puppet masters don’t both try to set up the same service / file / resource in different ways, there’s no real reason you can’t do this.

This turns out to be fairly easy, because Puppet’s developers threw us a bone and made sure the latest 5.x Puppet server can drive very old (3.8) puppet agents, so you don’t need more than one puppet binary installed on the puppetized nodes. All the shiniest puppet 5 features are available for use in your new puppet code if it is compiled by a puppet 5 server, and the resulting state can be set by agents all the way back to 3.8 (maybe even slightly older.) Also, it’s really helpful that the puppet agent can be told at invocation to use a nonstandard config file.

There are a few potential gotchas with getting the agent to trust both masters’ self-signed certs, with pluginsync, and with puppet masters that enforce a particular puppet agent configuration. Here’s a setup that avoids all of that.

  1. Leave your legacy puppet configuration alone.
    We’ll do puppet runs against the new server via a foreground puppet run in a cronjob.
  2. Make a copy of puppet.conf.
    I’ll call the copy puppet5.conf, but you’ll just be referencing this new file in a command-line argument, so may name it as you like.
  3. Edit puppet5.conf:
    • Change the server line to your new puppet 5 server, of course.
    • Change vardir, logdir, and rundir to new locations. This is key, as it makes puppet agent runs against your new server completely isolated from puppet agent runs against your old server with respect to ssl trust and pluginsync.
    • Unrelated to a multi-master setup, but I also found that most modern puppet modules on the forge assume you’ve set stringify_facts = false.

    Here’s my complete puppet5.conf, for reference:

    [main]
        server = puppet5-experimental.msi.umn.edu
        vardir = /var/lib/puppet5
        logdir = /var/log/puppet5
        rundir = /var/run/puppet5
        ssldir = $vardir/ssl
        pluginsync = true
        factpath = $vardir/lib/facter
        always_retry_plugins = false
        stringify_facts = false
    
    [agent]
        # run once an hour, stagger runs                                                             
        runinterval = 3600
        splay = true
        configtimeout = 360
        report = true
    
  4. Do a test run manually:
    # puppet agent --config /etc/puppet/puppet5.conf -t

    This should perform like a first-time puppet run. A new client certificate will be generated, the agent will retrieve and in future trust the server’s certificate and CRL, and depending on your server’s configuration you’ll likely need to puppet cert sign mynode.mydomain on the master.

  5. Do a standard test run against your legacy server manually.
    # puppet agent -t

    Watch it proceed happily, as confirmation that your existing puppet infrastructure is unaffected.

  6. If desired, create a cronjob to cause periodic convergence runs against your new puppet server.
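
For reference, a minimal sketch of such a cronjob (the schedule, user, and paths are illustrative; adjust for your environment):

# /etc/cron.d/puppet5-agent -- hourly foreground run against the new puppet 5 master
20 * * * * root /usr/bin/puppet agent --config /etc/puppet/puppet5.conf --onetime --no-daemonize --logdest syslog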

Now you’re free to start using puppet 5 features, and porting legacy puppet code, to your heart’s content.

Posted in dev, devops, Linux

The easiest way to (re)start MySQL replication



I’ve run mirrored MySQL instances with asynchronous replication from time to time for almost 15 years. The fundamentals haven’t changed much over that time, but one thing that has drastically improved is mysqldump’s support for properly initializing slaves.

In order for your replication slave to have a complete and uncorrupted database, you need to arrange for it to use exactly the same data as was on the master at some particular instant in time as a starting point. In practice, this means taking out some kind of lock or transaction on the master while a complete copy of the database is made. Then, you need to tell the slave from what point in the master’s binary log to start applying incremental updates.

It used to be that doing all this required a lot of refreshing one’s memory by reading the MySQL manual in order to issue a variety of queries manually. But of course, there’s no reason the steps can’t be scripted, and I was pleased to discover this automation is now nicely packaged as part of mysqldump.

By including --master-data in the following command (to be run on the master), mysqldump will take out locks as necessary to ensure a consistent dump is generated, and automatically write a CHANGE MASTER TO statement into the dump file with the associated master binary log coordinates:

$ mysqldump --add-drop-database --master-data -u root -p --databases list your replicated database_names here > /tmp/master-resync.sql
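
The statement mysqldump writes into the dump looks something like this (the binlog file name and position shown are placeholders; yours will reflect the master’s state at the moment of the dump):

CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=1337;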

That way, you can simply apply this dump file to a slave server whose replication has broken (for example, due to an extended loss of connectivity) and restart the slave process to be back in business. On the slave:

$ cat /tmp/master-resync.sql | mysql -u root -p
$ mysql -u root -p
mysql> START SLAVE;
Query OK, 0 rows affected (0.01 sec)

Tadaa! Clean restart of MySQL replication without any FLUSH TABLES WITH READ LOCKs or manual copying of binlog coordinates in sight!

Posted in Linux

Keeping up on one’s OpenSSL cipher configurations without being a fulltime sysadmin



As you probably already know if you’re the type to be reading my blog, https is able to stay secure over time because it is not reliant on a single encryption scheme. A negotiation process takes place between the two parties at the start of any TLS-encrypted TCP session in which the parties figure out which cipher suites each is willing and able to use. So, as cipher suites fall out of favor, alternative ones can be seamlessly put to use instead.

Of course, this requires that as a server operator, you keep your systems in the know about the latest and greatest trends in that arena. And unfortunately, in order to do that, the reality is that you have to keep yourself in the know as well. It pretty much comes down to plugging “the right” value into a parameter or two used by the OpenSSL library, but those parameters are long and obtuse, and there’s a balance to be struck between optimal security and support for visitors with older web browsers.
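
In nginx, for example, the knobs in question look roughly like this; the cipher string below is purely a placeholder, since the whole problem is that the “right” value drifts over time:

ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_prefer_server_ciphers on;
ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:...';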

It’s a nuisance I’d been aware of for years, but had been letting sit on the back burner because frankly I didn’t have any solutions that were sufficiently easy for me to actually bother keeping up with it over time. This post by Hynek Schlawack, for example, professes to be among the more concise explanations for a quality OpenSSL configuration, but it still weighs in at 11 printed pages. More than I am a systems operator, I’m a developer with many active interests to pursue. The reality is I’m not going to be rereading something like that periodically as the post suggests.

Recently, with the help of a link Jeff Geerling dropped on his excellent blog, I found out that CloudFlare, one of the major CDN providers, makes their current SSL configuration available publicly on github -> cloudflare/sslconfig. As a commercial entity that serves a huge volume of content to a diverse client base, they have the resources and motivation to figure all this stuff out, and they’re providing a valuable public service by keeping their findings updated and public.

Checking their github repo periodically is probably an improvement over diff’ing an 11-page blog post, but I still would need to remember to do it. I wanted proactive automated notifications when I needed to update my SSL configuration. Maybe I missed something obvious, but I didn’t find any options on github that would notify me of new commits in a repository I’m not a member of, at least none that didn’t also spam me with every comment on every issue.

So, project! The github API is easy to poll for new commits on a repository, so I coded up this little script to do that, and email me when it sees a change. I have it cronned to watch only cloudflare/sslconfig for now, but you can configure it to watch any repository(ies) you desire. You can also configure the email recipients/subject/message easily.
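
If you’d rather roll your own, the gist of it is no more than something like this (a rough sketch, not the actual script; the repository, state file, and recipient are placeholders):

#!/bin/bash
# Poll the GitHub API for the newest commit on a repo; email if it changed since the last run.
REPO="cloudflare/sslconfig"
STATE_FILE="$HOME/.watch-sslconfig-last-sha"
NOTIFY="you@example.com"

latest=$(curl -s "https://api.github.com/repos/$REPO/commits?per_page=1" \
         | grep -m1 '"sha"' | cut -d'"' -f4)

if [ -n "$latest" ] && [ "$latest" != "$(cat "$STATE_FILE" 2>/dev/null)" ]; then
    echo "New commit $latest in $REPO" | mail -s "$REPO has been updated" "$NOTIFY"
    echo "$latest" > "$STATE_FILE"
fi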

Grab my script and give it a try if this is a problem you can relate to!

Posted in devops, Linux

Top gotchas: Creating virtual machines under KVM with virt-install



After a stint building Stacy’s new blog whilst contributing to Drupal 8, I’ve been directing my free development time to more ops-like concerns for the past month or so. I moved to a cheaper parking facility at work and plan to direct the savings to bankroll the expansion of my own little hosting platform. And, I’m going virtualized.

This is a big shift for me, as I’ve gotten many years of trusty service without much trouble from the “single dedicated server with OS on bare metal” model, only experiencing a handful of minutes of downtime a month for major security updates, while watching several different employers flail around keeping all the pieces of their more complex virtualized environments operational.

On paper, the platform I’ll be moving to should be more resilient than a single server — the cost savings of virtualization plus infusion of extra cash will enable me to run two redundant webserver instances behind a third load balancing VM, with another (fourth) load balancing VM that can take over the public IP addresses from the first load balancer at a moment’s notice (thanks to the “floating IP” or “reserved IP” features that the better VPS providers have been rolling out recently). The provider I’m going with, Vultr, has great customer service and assures me that my instances will not share any single points of failure. Thus, I should have full redundancy even for load balancing, and I’ll be able to take instances down one at a time to perform updates and maintenance with zero downtime.

Alas, even on VMs, full redundancy is expensive. I could have just bought sufficiently small instances to host everything at Vultr and stay within my budget, but I’m betting I’ll get better performance overall by directing all traffic to a beefed-up instance at Vultr (mostly superior to the old hardware I’m currently on) under nominal conditions, and have the load balancer fail over to a backend server on hardware in my home office for the short periods where my main Vultr instance is unavailable. My home office isn’t the Tier 4 DuPont Fabros facility in Piscataway, NJ that the rest of the instances will reside in, but it is server grade components with a battery backup that hasn’t seen a power failure yet, and the probability of a simultaneous outage of both webservers seems very low.

The two backend webservers need to be essentially identical for sanity’s sake. However, as hinted in other posts such as the one where I installed an Intel Atom SoC board, I’m a bit obsessive about power efficiency, and there was no way in hell I was running a separate machine at home to serve up a few minutes of hosting a month. So, I needed to figure out KVM.

As near as I can ascertain having completed this experience, there are one or two defaults you always need to override if you want a VM that can write to its block device and communicate on a network.

  1. You probably need to create an ethernet bridge on the host system, onto which you attach your guests’ virtualized NICs. Examples abound for configuring the bridge itself in various linux distros, but none seem to mention with clarity that the host kernel needs to be told not to pump all the ethernet packets traversing the (layer 2!) bridge through the (layer 3!) IPTables. I’m puzzled whose idea this was, but someone made this thing called bridge-nf and apparently talked their way into it being default ‘on’ in the Linux kernel. Maybe that would be nice if you wanted to run suricata on a machine separate from your main router, or something, but yeah otherwise I’d recommend turning this off:
    cd /proc/sys/net/bridge
    for f in bridge-nf-call-*; do echo 0 > $f; done
    

    …and if the bridge should pass tagged vlan traffic, snag bridge-nf-filter-vlan-tagged too. See http://unix.stackexchange.com/questions/136918/why-does-my-firewall-iptables-interfere-in-my-bridge-brctl/148082#comment-218527 for more detail. If you have strict iptables rules on your host, your guests won’t get any network connectivity. On the flipside, while troubleshooting that problem I inadvertently made packets on the theoretically isolated bridge start appearing on a totally different network segment when I tested a blanket-accept rule on the host iptables’ FORWARD chain. I’m trying to isolate my hosting-related vlan and subnet from my plain-old home Internet connection sharing vlan and subnet, so getting that result was just plain wrong. Bridge-nf is scary stuff; turn it off (a sketch for making that persistent across reboots follows this list).

  2. If you opt to use lvm volumes as the backing for your VM’s disks, virt-install provides a handy syntax to cause it to make and manage the logical volume along with the VM:
    virt-install --disk pool=vm-guest-vg,size=80,bus=virtio ...

    where vm-guest-vg is a volume group you’ve set aside for VM disks. This says, make me a new 80G logical volume in the vm-guest-vg volume group, associate it to this instance, and attach it as a disk on the guest through the optimized virtio bus. If you do only this, though, your VM will experience many I/O errors. You must also add “sparse=false”:

    virt-install --disk pool=vm-guest-vg,size=80,bus=virtio,sparse=false ...

    and then, finally, your VM hosting environment will be reasonably usable to a guest operating system.
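
As promised above, here’s a sketch of making the bridge-nf change persist across reboots via sysctl (this assumes your distro reads /etc/sysctl.d and that the bridge netfilter module is loaded by the time the settings are applied):

# /etc/sysctl.d/99-disable-bridge-nf.conf
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-arptables = 0
net.bridge.bridge-nf-filter-vlan-tagged = 0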

For completeness/my own reference later on, an entire virt-install command:

virt-install -n 'the-instances-name' --virt-type kvm --hvm --autostart -r 4096 --vcpus=4 --os-type=linux --os-variant=ubuntutrusty --cpu host --disk pool=vm-guest-vg,size=80,bus=virtio,sparse=false --network bridge=hostingbr,model=virtio --graphics vnc -c "/data/public/Program Installers/ubuntu-14.04.3-server-amd64.iso"

There’s still much work ahead of me to build this entire infrastructure out — some custom scripting will likely be needed to pull off automated IP address transfers between the load balancers, and I’m shooting to use Apache Traffic Server as the HTTPS terminator/caching proxy, because it seems to have been doing the HTTP/2 thing for longer than alternatives like nginx, and HTTP/2 is cool. I’ll do a follow-up post after it’s all been up and running for awhile and we’ll see if going virtualized was even a good idea…

And, stay tuned, once I’m done with that, I have plans to further reduce my home infrastructure’s power consumption by losing the bulky full ATX PSU in my main server, which burns about 10 watts minimum, and go to a little laptop-sized DC transformer. If I put the hot data on an SSD and move the big PSU and spinning disks (which are the only things that need that much power, and only when spinning up) to a second machine that just acts as an iscsi target with wake-on-mac, I think I can keep the big disks and their PSU in standby the vast majority of the time. Fun times, fun times.

Posted in Linux

More super ubuntu tips: udev-based NIC naming & lvm silicon_medley_raid_member



I had a pretty booked weekend, but decided to see if I could slam an Intel Atom SoC-based board into my home server anyway. Plan of attack: take a snapshot of the OS lvm volume in case things went south, mount the snapshot to double-check it, shut down, swap out system boards, resume business as usual.

Actual result: Still on the old board, after taking the root volume snapshot, I was unable to mount the thing, because mount detected the volume as being of type “silicon_medley_raid_member” rather than the ext4 filesystem it was. I’d never seen that before, but mount -t ext4 fixed it; I examined the mounted snapshot which seemed fine and ignored that strange hiccup.

Upon power-up on the new board, the kernel failed to mount the real root filesystem, so the boot sequence halted before pivot_root with an error (mount: cannot mount /dev/mapper/ssd–vg-root on /root: no such device) and dumped me in busybox on the initramfs.

Now, given that a mount without explicit fs type had failed on the snapshot, I thought it possible that somehow the type bits of the root filesystem itself had gotten corrupted, and that was why the boot-time mount was failing as well. How to confirm and/or fix? Thanks to this blog post, I got answers to both. Saved me a bundle of trouble! Since it seems to always involve LVM, and neither I nor anyone else who has been bit by this has a clue what happened, I’m calling software bug.

With my machine booting again (35 watts lower!) I just had to tackle the NIC names, so that the new NIC plugged into my lan was still eth0 etc, making all my configurations and firewall rules and such “just work.” I happily pulled up the MAC addresses with lshw -class network, and swapped out the old values for the new in /etc/udev/rules.d/70-persistent-net.rules. Aaand it didn’t work. So, what the heck.

Searching through dmesg, I discovered the igb driver sanely brought the NICs up as eth0 through eth3, and then udev came along and arbitrarily renamed them em1 through em4. No idea where it got that from. After quite a bit of prodding and searching, I found this suggestion, which I am not sufficiently esteemed to upvote on that site. The default contents of the Ubuntu udev network interface rules file require that the interface being renamed already be named eth*. That corresponds to

KERNEL=="eth*",

in the file. Seems udev selects em* as the name and sets it before it bothers to read the rules file, and thus it never renames them again to the desired values. Removing that constraint, I was again fully up and running.
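
For illustration, one line of the rules file with that constraint dropped might look like this (the MAC address is a placeholder for one of the values lshw reported):

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:1b:21:aa:bb:cc", NAME="eth0"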

I’m grateful these folks documented the respective magic incantations, leaving substantial portions of my weekend for more fun things. The whole thing was maybe 4 or 5 hours, not as bad as I’d feared. Next time I build up enthusiasm to break stuff, I’ll tackle the link-aggregating switch I picked up…

Posted in Linux

Extreme Home Network Makeover: IPv6 & hostapd



When constant.com made IPv6 available on its dedicated servers and shortly thereafter Comcast finally got around to lighting up IPv6 on my home connection, I inevitably had to begin playing with it. I had these main goals:

  • Distribute globally unique public addresses to every device on my home network
  • Keep using a home Linux server as my Internet gateway, configure it to do regular routing and forwarding without NATting for IPv6 traffic, and run a stateful firewall in place of hiding behind a NAT.
  • Serve websites on my dedicated server over IPv6
  • Bonus – see if I can rig up public DNS to return AAAA’s for hostnames on my home LAN.

I mostly prodded at it from time to time for months, with slow progress. It took quite a while to sort out some really basic stuff: the public IPv6 address my home gateway acquired over DHCP was showing up as a /64, so I imagined that /64 subnet was available to me – that’s how it had worked with the /64 my dedicated server received from constant.com’s router advertisements. Many hours were spent sending traffic out with other source addresses in the /64 from my home gateway and into the void before I stumbled on DHCP-PD and figured out comcast was using it. After that discovery, it was pretty quick to get the ISC dhclient (the default on Ubuntu) to send out a request for prefix delegation, packet-sniff the offer that came back to figure out what actual /64 I could use for the other devices on my home network, and configure up dnsmasq to RA that. That netted my internal network devices a brief glimpse of IPv6 connectivity, until I restarted the gateway, or something. But, it was long enough for me to figure out that the nothing-special and no longer supported netgear wireless router I’d been using as my wifi access point was dropping more than 90% of IPv6 traffic, for some reason. So, I had to add “pick out a new wifi AP” to the list of things that needed doing. Boy, was that a rabbit-hole.

I’ve had a good 15 years of experience with all kinds of residential wireless routers not being very good. From basic models in the early days of 802.11b to the latest, (priced as) premium Apple AirPort Extreme, I’ve only ever gotten truly solid reliability from certain hardware revisions of the now obsolete Linksys WRT-54G. It seemed shortsighted to buy a new AP in late 2014 that didn’t do 802.11AC, but I wasn’t terribly enthused about picking up another device priced in the multiple-hundred$ that likely still would need the good old occasional power cycling, either. What if I could just have the Linux gateway machine itself be the AP, I wondered?

I did a respectable amount of research on Linux as a wifi access point. Hostapd was the obvious choice for running WPA/sending out beacon frame type stuff, but selecting a wireless card wasn’t as clear. There are many wifi radio chipsets with Linux drivers that also have some level of support for access point mode, although the chipsets served by the ath10k Linux driver seemed to be the only as-yet available line supporting both 802.11AC and access point mode. I found the Compex mini-PCIe card that seemed to be the only as-yet available consumer device using said chipset, found that you can basically only get it on eBay from some guy in Germany, realized that I’d still have to work out a scheme for external antennas, found a post from only a few months earlier on the ath10k developer’s mailing list mentioning someone had even made it run long enough to run a benchmarking suite before it crashed, and decided I’d be better off bailing on the 802.11AC idea.

Ath9k devices seemed a reasonable consolation, except for a note on ath9k’s wireless.kernel.org page that says it only supports seven simultaneous associated devices. We don’t have that many, but it’s not much room to grow. So, I picked up the Intel Centrino Advanced-N 6205 PCIe card from Microcenter. I fidgeted with hostapd and other configurations for a few days trying to get it to at least approximately resemble 802.11n performance, but didn’t succeed before its driver started crashing and needing to be re-insmodded. Back to Microcenter with that one (thanks for the no questions asked refund!)

Ultimately, I ended up with a wonderfully inexpensive Rosewill RNX-900PCE using the ath9k-compatible Atheros AR9380 chipset. This card’s been running now for a month or so with no issues, performing like an 802.11n access point, and, back to the main point, delivering IPv6 packets to my wireless devices.

But there was more trouble. My gateway now had an additional LAN interface – the wireless card. When a packet comes in for a particular host, how will the kernel know which of those two interfaces to forward it on?

It looked like the options were either to “bridge” the ethernet and wifi LAN interfaces, which I think involves layer 3 routing sending packets to the bridged interface and then the bridge acting like a layer 2 switch implemented in software. Or, I could put the wifi card on its own, new subnet, and just have the routing rule be “destination address in new subnet range -> forward on wlan0.” Opting to not have yet another virtual device (the bridge) to keep running and configured, I went with the latter.

But there was more trouble. In order to have more than one IPv6 subnet and follow IETF recommendations, you need an IPv6 address space bigger than the /64 I’d been able to eke out of Comcast through the ISC dhcp client (since each subnet must be no smaller than a /64 in order to work correctly with stateless address autoconfiguration and stuff.)

A tremendously helpful Comcast employee, surely considered way too smart by their management to ever be directly answering customer support requests through normal channels, took the time to offer the handful of subscribers like me who’re nerdy enough to have these problems some information about how to get a large enough delegated address space to run multiple subnets. There are a number of posts on comcast’s forums covering basically this subject matter, all seeming to have responses by “ComcastTusca”, but here’s the original and most comprehensive thread: http://forums.comcast.com/t5/Home-Networking-Router-WiFi/IPv6-prefix-size-and-home-routing/td-p/1495933. By switching from the ISC dhclient to dhcp6c (“WIDE dhcp”), and basing my configuration for that off of brodieb’s example, I was able to get my /60, and have my two lan interfaces be configured with their own /64’s each, automatically, when the /60 is delegated or re-delegated by Comcast.
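
For the curious, the shape of the WIDE dhcp6c configuration ends up being roughly this (interface names and sla-id values are illustrative; see brodieb’s example linked from that thread for a complete, working config):

# /etc/wide-dhcpv6/dhcp6c.conf (sketch)
interface eth1 {                # WAN interface facing Comcast
    send ia-pd 0;               # request a delegated prefix
};
id-assoc pd 0 {
    prefix-interface eth0 {     # wired LAN
        sla-id 0;
        sla-len 4;              # /60 delegation + 4 bits = one /64 per subnet
    };
    prefix-interface wlan0 {    # wifi LAN
        sla-id 1;
        sla-len 4;
    };
};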

That was about the end of the trouble. Another key element worth noting is dnsmasq’s RA “constructor” feature, which tells it to author appropriate router advertisements on each LAN interface based on that interface’s public IPv6 address information. Here are the actual lines from my dnsmasq.conf:

dhcp-range=::,constructor:eth0,ra-stateless,ra-names
dhcp-range=::,constructor:wlan0,ra-stateless,ra-names

The “ra-names” bit is pretty cool too – read the manpage for the detailed description of the magic, but it tells it to go ahead and apply some trickery and heuristics to try to determine the ipv6 address that hosts running dual-stack IP have selected for themselves, ping that, and add it to the dns server side of dnsmasq as an AAAA if everything works. This is the basis for how you can look up the IPv6 address of internal hosts in my home network from anywhere in the world, for example mike-pc.lan.mbaynton.com, when they happen to be up. (Details I’m happy to publish on my blog, because setting up iptables on the gateway box so pinging it is about all you can do was mercifully not conceptually any different than with good old IPv4.)

Posted in Linux

Remote drush with only FTP and database access



Necessity

I manage a Drupal site as part of my job, which has taught me to appreciate the treat that is drush. With Drupal,  you really want ssh access to the webserver because it gives you the ability to use drush, the command-line administration utility for Drupal sites. Uninteresting tasks like module and core upgrades become much less tedious.

I also babysit the website for the Minnesota Valley Unitarian Universalist fellowship, a real simple Drupal site that lives on a basic shared hosting server. No ssh access here. Because doing drupal core updates solely over FTP sucks so much, finally today I sat down in earnest to see if I could somehow leverage the power of drush with such shared hosting sites.

Invention

There didn’t seem to be much accurate information about how to do it already, with top Google hits saying things like you need ssh and drush installed both locally and on the server (false and false), or that you simply can’t, and there was really nobody suggesting anything to the contrary, especially when your local workstation is running Linux (here’s a workable method for Windows.) I stubbornly refused to believe them and came up with a method that has proven to work for upgrading modules and core successfully, and for every other drush subcommand I have tried to date. This is a guide for Linux, and I’ll include the package names for Debian/Ubuntu, but other than that it should be generic to other distros too.

The basic idea is to get the site’s source code mounted as a filesystem on your local machine backed by your ftp server, install php on your local machine if needed, and get connected to the live site’s database server from your local machine. You then run drush directly on your local machine, but its changes impact the actual site. It’s a bit of work to get everything going the 1st time, but simplifies site management tons in the long run.

Warning

Before the step-by-step, two words of warning: 1. This method is for people that want to be lazy, but don’t mind if the computers take forever once you tell them what to do. This method’s performance is slow, due to curlftpfs. For my purposes I don’t care, but some might. 2. A somewhat faster and more reliable way involves copying your site’s source locally and synchronizing the real site to it after running commands that modify code, not a big deal with a few rsync commands. Even when you do it this way, you don’t need to get another instance of your site actually up and running locally, and you don’t need a clone of the database. I discuss this more in step 5.

Step-by-step Guide

1. Pick a Linux computer to run drush commands on, likely your desktop/laptop. This computer will need php with mysql and gd extensions installed, as well as curlftpfs (or something else that makes ftp servers mountable, if you have another preference.) On Debian/Ubuntu, run these commands to ensure you have all needed packages:

sudo apt-get install php5
sudo apt-get install php5-mysql
sudo apt-get install php5-gd
sudo apt-get install curlftpfs

Failure to install php5 ultimately produces a clear error message from drush that it failed to find php, but failure to install php5-mysql produces an error with inaccurate troubleshooting information when you run Drush, so make sure you have it. Lack of php5-gd will generate frequent warnings from drush, but can mostly be done without if you really don’t want to install it.

2. Install drush locally on the Linux computer you’ve chosen. When I put drush on a system I usually just untar the latest stable release from Drush’s github releases to somewhere like /usr/local/lib and symlink the drush shell script from the release to /usr/bin/drush, but drush is just php and shell scripts so you can untar and run it in your home directory or wherever as well if you like. I’ll skip the line-by-line commands for how to do these things; few readers will have never extracted a .tgz or .zip.

3. Mount your site somewhere on your local machine’s filesystem with curlftpfs. In this example I’ll mount to the path ‘/mnt/mvuuf_inmotion’; choose something appropriate to your site and create an empty directory there.

sudo mkdir -p /mnt/mvuuf_inmotion

Modifying the below command for your ftp server, username, password, and local mountpoint, run (as the regular user you plan to run drush commands as, not as root):

curlftpfs -o user="your-ftp-username:your-ftp-password,uid=$UID,gid=${GROUPS[0]}"  "ftp://your-ftp-server/" /mnt/mvuuf_inmotion

(ps, if you get an error about “fusermount: failed to open /etc/fuse.conf: Permission denied”, fix with ‘sudo chmod a+r /etc/fuse.conf’; if you get an error about “fusermount: user has no write access to mountpoint /mnt/mvuuf_inmotion”, fix with ‘sudo chown $UID:${GROUPS[0]} /mnt/mvuuf_inmotion’.)

You should now be able to cd into your local mountpoint, /mnt/mvuuf_inmotion in this example, and ls your site’s contents.

4. Remote database connectivity. You need to be able to connect to your site’s database  from your local computer using the credentials stored in sites/default/settings.php.

You should be confident that your hosting company supports such remote database connections before spending too much time on this; consult with them if you need to. Even when it is supported, many webhosts won’t actually allow connections from hosts other than the webserver unless you take special steps to allow it – again, consult with them if you need to. If your hosting provider gives you MySQL managed with cPanel, this is quite easy, just go to “Remote MySQL” under Databases and add your public IP address that your local machine makes connections out to the Internet with (as displayed at whatismyip.com.)

You should probably do a manual connection to the database to ensure this part works before proceeding. Assuming the database server software is MySQL and you’ve installed the mysql client locally, you can do that with

mysql -h [whatever.your.host.is] -u [username_from_settings.php] -p

where “whatever.your.host.is” would generally be a hostname given in your sites/default/settings.php, unless settings.php says “localhost” in which case you should try the domain name of your website. You’ll be prompted for a password, which should also be in the database settings in settings.php.

5. Try it out. From a terminal on your local machine, cd to the root directory of your site (where Drupal’s index.php file is) and try a simple drush command like drush status. (Note that when this runs successfully, drush status likes to list the site’s url as “http://default” even when a url is explicitly given in your settings.php. This seems to be a drush bug but doesn’t seem to affect anything else; you can fix it anyway if you like with drush’s --uri option.)

Great! You now have drush set up for use with your remote site.

Optional: Working with a local copy of the code. If you’ve gotten this far, you have two choices: If everything works and you don’t mind how darned slow it is, you can choose to be done. However, if you can’t stand the slowness or if drush is even aborting during some operations with messages like “the MySQL server has gone away,” you might want to make a local copy of the site’s code on your computer, switch to that to run drush, then rsync the local copy back over to your curlftpfs mount after you do updates. (I needed to do this because the shared host had a quite short timeout on idle mysql connections, so doing everything over curlftpfs slowed drush runs down enough to break it.)

Here’s some sample rsync commands to create a local copy of your site’s code, and to sync the remote site with your local copy. Note that you should place your site in maintenance mode before beginning upgrades with this method, and exit maintenance mode only after the remote sync is complete.
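
If the site is on Drupal 7, drush can handle the maintenance mode toggling too; something along these lines should work (verify the variable name for your Drupal version):

drush vset maintenance_mode 1    # take the site offline before upgrading
# ...run your updates locally, rsync the code back to the ftp mount...
drush vset maintenance_mode 0    # back online once the remote sync has finished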

Example command to create/sync existing local copy:

rsync -aP --exclude="sites/default/files" /mnt/mvuuf_inmotion ~/mvuuf_local

Example command to sync the remote site up with your local copy after it has changed:

rsync -aP --temp-dir=/var/tmp/rsync --exclude="sites/default/files" --delete ~/mvuuf_local /mnt/mvuuf_inmotion

(these commands assume you have your curlftpfs-mounted copy of the site at /mnt/mvuuf_inmotion and you want to store the local copy under your home directory in a folder called mvuuf_local; adapt to suit.) The specific options to rsync here are composed with some care, don’t change them unless you’ve given it careful thought.

The obligatory notice accompanying all drupal upgrade etc writeups: whenever you do things that make considerable changes to your site, whether with this remote drush method or any other, make backups first. Drush itself can do this for you if you like.

Posted in Drupal, Linux

ZFS it is



Quite a few months back, in a post about the lack of ZFS block pointer rewrite, I mentioned that I’d begun investigations on whether I should migrate my home file server to a more modern filesystem. At that time, I already knew a few things about ZFS, but said that I wasn’t prepared to actually use it at home because it lacked the ability to add disks to a raid set. Well, the move to a new filesystem is now underway, and despite the lack of raidset reconfiguration, ZFS it is anyway.

So how did that happen?

When it came down to it, I basically had three options: ZFS, btrfs, or punt and do nothing for now, in hopes that either ZFS would support adding drives in future or btrfs would become more stable in future. I set up a test server with five old/shelved 320gb disks, and installed Ubuntu 14.04 with ZFS on Linux and btrfs tools. Then I set about evaluating both of them live & in person.

Btrfs Evaluation

I was really hoping my evaluations with btrfs would be positive – sure, it’s not “officially stable” yet, but that’ll come with time, and it pretty much does everything feature-wise. So I created a btrfs filesystem in its raid5-like layout on my 5 320gb drives, and queried both df and btrfs-tools for space available. 1.5TiB, it said. Ok, so that’s a little odd, since there can’t be more than 4x320gb available, but moving on…
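
For reference, that test amounts to roughly the following (device names are illustrative, not my actual ones):

mkfs.btrfs -d raid5 -m raid5 /dev/sd[b-f]   # data and metadata both in the raid5-like profile
mount /dev/sdb /mnt/btrfs-test
btrfs filesystem df /mnt/btrfs-test         # btrfs-tools' own view of allocation
df -h /mnt/btrfs-test                       # what plain df reports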

I copied 20 gb or so over to it fine, then pulled one of the five disks from the online system and checksummed the 20 gb against the original. All ok. Then I went hunting for some kind of indication from btrfs tools that my filesystem was in a degraded state and action was needed. There were plenty of indications of trouble in syslog, but coming from an unhappy disk controller driver that’d lost contact with a disk. The closest I could come in btrfs was through ‘btrfs device stats,’ which showed a handful of read and (strangely, since I was reading the data to checksum it) more write errors on the device removed. Given that btrfs is designed to detect and correct such errors on disks, if I saw a small number of read/write errors to a device in a real-world scenario where I believed all disks were present, it wouldn’t be clear to me what if any action I should take. So strike one, screwy space accounting, strike two, lack of clear reporting of degraded state.

Next, I went ahead and filled the filesystem up to see what those df numbers would do as the filesystem approached the sensible, 4x320gb definition of full. What I ended up getting was, as I wrote more data to it, the Available value in df decreased faster than the Used value increased, so that by the time I’d written ~4x320gb, the Used value showed 1.2T, but the Avail value showed just a few GiBs. I decided I could live with that; I know how much actual space I have, so just remember not to rely on df / btrfs tools for accurate space available values on half-full filesystems basically.

Then, I pulled a different disk from the one I had originally removed, and wrote another GB or two. I forget if the first kernel panics happened during the writes or when I started trying to read it back, but either way I ended up with an unstable system that didn’t even shutdown without hanging and hung on reads to some of the btrfs files. In a last-ditch chance for btrfs to redeem itself, I hard reset the system and let it do its offline repair operation before mounting the filesystem again. This resulted in my last file disappearing completely, but didn’t prevent the system from panicking / becoming unstable again. Strike 3.

This left either ZFS or punt.

ZFS Evaluation

ZFS’s administration tools are a treat in comparison to btrfs’s. I easily created a raidz zpool of my five disks and then a couple of ZFS filesystems with different options enabled. Space reporting through any tool you like (df, zfs list, zfs get all, zpool list, zpool get all) made sense, and I saw some real potential in being able to do things like say “dedupe in this filesystem which I would like mounted where my photos directory used to be” (not so much data that the dedupe table will swamp my RAM, and I have a bad habit of being messy and having the same photo copied off cameras more than once), or say “run transparent gzip compression on this filesystem which I would like mounted where my log files go”, or “use lz4 compression and the ARC for my coding/music/wife’s stuff, but don’t compress and never cache data coming from the videos filesystem,” since video data will almost never be accessed more than once in time proximity and my tests proved it hardly compresses at all. Since filesystems all share the space of the single zpool they reside on as needed, there’s no administrative overhead to resizing filesystems or volumes to make all this happen (unless you want…there’s a filesystem quota you can use.)
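
A sketch of what that per-filesystem tuning looks like in practice (the pool and dataset names here are made up for illustration; the properties are stock ZFS):

zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
zfs create -o dedup=on tank/photos                   # small enough that the dedupe table won't swamp RAM
zfs create -o compression=gzip tank/logs             # transparent gzip for highly compressible logs
zfs create -o compression=lz4 tank/home              # cheap compression + ARC caching for everyday data
zfs create -o compression=off -o primarycache=metadata tank/videos   # don't compress or cache video data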

I put ZFS through the same paces as btrfs in terms of disk-pulling. The difference was it handled it without issue, and not only did it give me detailed information about the pool’s degraded status (through zpool status) after yanking a drive, it even gave me in the output suggested actions to take.

Basically I concluded ZFS kicks butt.

I still have one major conceptual beef with ZFS, but it doesn’t apply to installations of my size, so whatever.

“Wait & See” Evaluation

The third choice was to keep adding more data to my md-raid/ext4 configuration for, realistically, at least a few years, until ZFS added the one missing feature or btrfs’ multiple device support matured. The rub with that is, there’s surely never going to be a way to transform in-place an md-raid to a btrfs or ZFS raid, so the migration further down the road would involve buying my entire storage utilization over again in blank disks and copying. But, that’s the same thing I’d end up doing if I moved to ZFS now and added a second RAID set to the zpool years down the road. So, I concluded, why not start getting the feature and integrity assurance benefits of ZFS today.

Posted in Big Storage @ Home

ZFS block pointer rewrite status, 2013



Through one part idle internet window shopping and one part good luck, I came across a  supply (now sold out) of the disks I use in my main RAID at home, nearly 50% off the next lowest price I’d ever seen for them–open-box (but not refurb) disks from a reputable online retailer with only a few months already consumed on Seagate’s 5 year warranty.  This seemed to me by far the best deal I’d seen on these disks in 5 years of looking, so I immediately bought some.

So now I am at one of those rarely encountered times where I have sufficient unused hard disk capacity to create a whole second copy of my current stash of bits if I wanted, and that gives me the flexibility to switch to another disk/filesystem/volume management arrangement if I want.

I always assume that at some point the amount of data I’ve accumulated will grow too large to allow for another opportunity to start from scratch like this, and so even though I’ve tried to choose wisely in selecting a storage arrangement in the past, and as a result could feasibly grow my current MD-based RAID out to 22 TB or so, I still want to make sure there isn’t something better out there that I could switch to since I do have the chance to do it now. Plus, the last time I investigated storage options was three years ago (which put me on my current 2TB-per-member MD RAID with ext4), so I took another look.

There are lots of things I like in ZFS, such as the added data integrity safeguards and snapshots to protect me from the fat-fingered rm -rf *, and with its support for remote mirroring, it would definitely be worth another look at nearly-realtime backups to my hosted server (something I previously rejected due to the lack of snapshots.) A less unique but still important feature for long-term scalability is logical volumes, as I think a single RAID with > 12 drives would be a stretch even for the relatively light load my home array experiences.

But, as many a home user before me can tell you, ZFS has one huge feature hole for nerd@home scale operations: it is not possible to drop in another disk or two onto an existing RAID set. Enterprise users will drop another rack of raid sets into an existing zpool, which is possible with ZFS and makes sense. But do that after buying a handful of disks, and you’ll have a bunch of little RAID sets with wasted parity disks everywhere. RAID reconfiguration is something that’s been possible with MD RAID since I fired up my first array in 2003, and became possible as an online operation some years afterwards. It’s a feature I’ve used several times that I’m not comfortable giving up.

So, I dug into the state of development of this for ZFS, and was pleased to find some pretty current and comprehensible information from Matt Ahrens, co-founder of ZFS way back when it was at Sun (awesomely, he seems to still be involved in ZFS through employment elsewhere.) The short summary is that achieving this via “block pointer rewrite” will almost surely never happen, because correctly updating all the necessary data structures would, in Matt’s opinion, introduce code that is too confusing/ugly/unmaintainable to encourage further ZFS development of other kinds. The YouTube video from October 2013 that I found this on is a great watch, if you’re interested in more detail. He also says, however, that there may be less ambitious approaches than block location rewrite to achieve the similar task of removing a device from a pool, so perhaps the more targeted task of adding a disk to a RAID-Z could also be tackled another way. Something I might try to learn more about?

First though, I need to research ZFS basics a bit more to find out why I shouldn’t just build a zpool of one or two vdevs (configured from ZFS’s point of view as plain disks) that actually happen to be MD RAID devices. Would it work to just let MD RAID do what it does best, and let ZFS do what it does best on top of it?

(Update 1/2014: the above md/zfs scheme would be ill-advised, because ZFS implements a nontraditional “RAID” that is coupled to the filesystem level, providing advantages such as self-healing data in the event of a parity/data disagreement: the “RAID” subsystem and the filesystem can compare notes to see whether the data matches ZFS’s filesystem-level checksum (meaning the parity must be bad) or does not match it (meaning the data should be reconstructed from parity).)

Posted in Big Storage @ Home

Sendmail’s unwanted insistence on local delivery



Here’s another quick post to record one of those 2-line solutions to a problem that took considerable searching to find. This one affects outgoing mail passing through sendmail when the recipient’s email address matches the machine’s hostname, but the machine is not the mail server for the domain. For example, my dedicated server is configured as mbaynton.com, and sends logs and such to my @mbaynton.com email. Sendmail on this machine works fine for emails to other domains, but the trouble is, as every other email server in the entire world besides mine knows, mail destined for addresses @mbaynton.com should be sent to the smtp server at aspmx.l.google.com. Instead of looking up the mx record for mbaynton.com to see if it actually is the mbaynton.com mail server, sendmail just assumes it is, looks for the recipient’s name in its local user database, and either delivers the message locally or doesn’t find a local mailbox and gives up.
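
(A quick way to see what the rest of the world knows is an MX lookup, e.g.:

dig +short mx mbaynton.com

which for this domain lists Google’s mail exchangers rather than my own machine.)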

The fix: add the following two lines to the end of /etc/mail/sendmail.mc:

define(`MAIL_HUB', `mbaynton.com.')dnl
define(`LOCAL_RELAY', `mbaynton.com.')dnl

Then rebuild the sendmail configuration, which on Ubuntu can be accomplished by running

sendmailconfig

Since sendmail is going to be with us for the foreseeable future, I’m sure I’ll need to refer back to this tip someday. Thanks to http://serverfault.com/questions/65365/disable-local-delivery-in-sendmail/128450#128450 for the solution.

Posted in Linux