eworldproblems

Posts in category dev

Running nodes against multiple puppetmasters as an upgrade strategy



At work, we’re way out of date in our devops, having not upgraded Puppet since version 3.8. As of this writing, version 5 is available.

This has finally created sufficiently many problems that I’ve been helping prep for an upgrade to puppet 5 — but with some 3,200 .pp puppet manifest files in our existing puppet deployment, and a puppet language that doesn’t retain backwards compatibility, the prospect of upgrading is incredibly onerous.

Instead of trying to convert everything in one massive action, a strategy that people will hate me for but that I’m finding really helps to get the job done is to write new puppetization against a puppet 5 server, slowly removing equivalent declarations / resources / etc from the old puppet server, and running your puppetized nodes against both masters during this period. As long as you ensure the puppet masters don’t both try to set up the same service / file / resource in different ways, there’s no real reason you can’t do this.

This turns out to be fairly easy, because Puppet’s developers threw us a bone and made sure the latest 5.x Puppet server can drive very old (3.8) puppet agents, so you don’t need more than one puppet binary installed on the puppetized nodes. All the shiniest puppet 5 features are available for use in your new puppet code if it is compiled by a puppet 5 server, and the resulting state can be set by agents all the way back to 3.8 (maybe even slightly older.) Also, it’s really helpful that the puppet agent can be told at invocation to use a nonstandard config file.

There’s some potential gotchas with getting the agent to trust both masters’ self-signed certs, pluginsync, and the case of puppet masters that enforce particular puppet agent configuration. Here’s a setup that avoids all that.

  1. Leave your legacy puppet configuration alone.
    We’ll do puppet runs against the new server via a foreground puppet run in a cronjob.
  2. Make a copy of puppet.conf.
    I’ll call the copy puppet5.conf, but you’ll just be referencing this new file in a command-line argument, so may name it as you like.
  3. Edit puppet5.conf:
    • Change the server line to your new puppet 5 server, of course.
    • Change vardir, logdir, and rundir to new locations. This is key, as it makes puppet agent runs against your new server completely isolated from puppet agent runs against your old server with respect to ssl trust and pluginsync.
    • Unrelated to a multi-master setup, but I also found that most modern puppet modules on the forge assume you’ve set stringify_facts = false.

    Here’s my complete puppet5.conf, for reference:

    [main]
        server = puppet5-experimental.msi.umn.edu
        vardir = /var/lib/puppet5
        logdir = /var/log/puppet5
        rundir = /var/run/puppet5
        ssldir = $vardir/ssl
        pluginsync = true
        factpath = $vardir/lib/facter
        always_retry_plugins = false
        stringify_facts = false
    
    [agent]
        # run once an hour, stagger runs                                                             
        runinterval = 3600
        splay = true
        configtimeout = 360
        report = true
    
  4. Do a test run manually:
    # puppet agent --config /etc/puppet/puppet5.conf -t

    This should perform like a first-time puppet run. A new client certificate will be generated, the agent will retrieve and in future trust the server’s certificate and CRL, and depending on your server’s configuration you’ll likely need to puppet cert sign mynode.mydomain on the master.

  5. Do a standard test run against your legacy server manually.
    # puppet agent -t

    Watch it proceed happily, as confirmation that your existing puppet infrastructure is unaffected.

  6. If desired, create a cronjob to cause periodic convergence runs against your new puppet server; a sample crontab entry is sketched below.
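
    For example, an entry along these lines in /etc/cron.d/ would do it (the file name, schedule, and log destination are only illustrative; adjust to taste):

    # Hypothetical /etc/cron.d/puppet5-agent: a one-shot agent run against the new
    # puppet 5 master once an hour, logging to syslog. Add --no-splay if you don't
    # want the random delay implied by splay = true in puppet5.conf above.
    17 * * * * root /usr/bin/puppet agent --config /etc/puppet/puppet5.conf --onetime --no-daemonize --logdest syslog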

Now you’re free to start using puppet 5 features, and porting legacy puppet code, to your heart’s content.

Posted in devops, Linux

Keeping up on one’s OpenSSL cipher configurations without being a fulltime sysadmin



As you probably already know if you’re the type to be reading my blog, https is able to stay secure over time because it is not reliant on a single encryption scheme. A negotiation process takes place between the two parties at the start of any TLS-encrypted TCP session in which the parties figure out which cipher suites each are willing and able to use. So, as cipher suites fall out of favor, alternative ones can be seamlessly put to use instead.

Of course, this requires that, as a server operator, you keep your systems in the know about the latest and greatest trends in that arena. And unfortunately, doing that requires keeping yourself in the know as well. It pretty much comes down to plugging “the right” value into a parameter or two used by the OpenSSL library, but those parameters are long and obtuse, and there’s a balance to be struck between optimal security and support for visitors with older web browsers.

It’s a nuisance I’d been aware of for years, but had been letting sit on the back burner because frankly I didn’t have any solutions that were sufficiently easy for me to actually bother keeping up with it over time. This post by Hynek Schlawack, for example, professes to be among the more concise explanations for a quality OpenSSL configuration, but it still weighs in at 11 printed pages. I’m more a developer with many active interests to pursue than a systems operator, and the reality is I’m not going to be rereading something like that periodically as the post suggests.

Recently, with the help of a link Jeff Geerling dropped on his excellent blog, I found out that CloudFlare, one of the major CDN providers, makes their current SSL configuration available publicly on github -> cloudflare/sslconfig. As a commercial entity that serves a huge volume of content to a diverse client base, they have the resources and motivation to figure all this stuff out, and they’re providing a valuable public service by keeping their findings updated and public.

Checking their github repo periodically is probably an improvement over diff’ing an 11-page blog post, but I still would need to remember to do it. I wanted proactive automated notifications when I needed to update my SSL configuration. Maybe I missed something obvious, but I didn’t find any option on github to notify me of new commits in a repository I’m not a member of, at least not one that didn’t also spam me with every comment on every issue.

So, project! The github API is easy to poll for new commits on a repository, so I coded up this little script to do that, and email me when it sees a change. I have it cronned to watch only cloudflare/sslconfig for now, but you can configure it to watch any repository(ies) you desire. You can also configure the email recipients/subject/message easily.
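
The core of such a check is small. A stripped-down sketch of the idea (not the actual script; it assumes curl, jq, and a working mail command are available, and the recipient address and state-file path are placeholders):

#!/bin/bash
# Poll a repository's most recent commit and send mail when it changes.
REPO="cloudflare/sslconfig"
STATE="/var/tmp/last_commit_${REPO//\//_}"

latest=$(curl -s "https://api.github.com/repos/${REPO}/commits?per_page=1" | jq -r '.[0].sha')
if [ -n "$latest" ] && [ "$latest" != "null" ] && [ "$latest" != "$(cat "$STATE" 2>/dev/null)" ]; then
  echo "New commit $latest in $REPO" | mail -s "Watched repo $REPO has new commits" you@example.com
  echo "$latest" > "$STATE"
fi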

Grab my script and give it a try if this is a problem you can relate to!

Posted in devops, Linux

Introducing Prophusion: Test complex applications in any version of PHP



Putting together testing infrastructure for Curator has been an interesting project unto itself. On my wishlist was:

  • Support for a wide range of PHP interpreter versions, at least 5.4 – current
  • In addition to the unit test suite, be able to run full integration testing including reads/writes to actual FTP servers.
  • Keep the test environment easy enough to replicate that it is feasible for developers to run all tests locally, before submitting a pull request.

By building on Docker and phpenv, I was able to meet these requirements, and create something with more general applicability. I call it Prophusion, because it provides ready access to over 140 PHP releases.

For a quick introduction to Prophusion including a YouTube of it in action, check out this slide deck.

I’ve since fully integrated Prophusion into the testing pipeline for Curator where it happily performs my unit and in-depth system tests in the cloud, but I also make a habit of running it on my development laptop and workstation at home as I develop. You can even run xdebug from within the Prophusion container to debug surprise test failures from an xdebug client on your docker host system…I’m currently doing that by setting the right environment variables when docker starts up in my curator-specific test environment. I’ll port those back to the prophusion-base entrypoint in the next release of the prophusion-base docker image.
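
For a sense of what that looks like, an invocation along these lines points Xdebug back at an IDE listening on the docker host. This is a rough sketch rather than the official Prophusion entrypoint: the image name, host IP, port, and idekey are illustrative, and it assumes xdebug.remote_enable is already switched on in the container’s php.ini.

# Xdebug 2-style settings passed through the environment; adjust to your setup.
docker run --rm -it \
  -e XDEBUG_CONFIG="remote_host=172.17.0.1 remote_port=9000 idekey=PHPSTORM" \
  prophusion-base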

Prophusion includes a base image for testing in the CLI (or FPM), and one with Apache integrated for your in-browser testing.

Posted in devops, PHP

Automatic security updates and a support lifecycle: the missing links to Backdrop affordability



I saw Nate Haug and Jen Lampton give their Backdrop CMS intro talk last weekend at the Twin Cities Drupal Camp. They have already done much to identify and remedy some big functionality and UX issues that cause organizations to leave Drupal for WordPress or other platforms, but you can read more about that elsewhere or experience it yourself on Pantheon.

Their stated target audience is organizations that need more from their website than WordPress’s primary use case covers, but still don’t need all the capability that Drupal 8’s software engineering may theoretically enable — not everyone needs a CMS that supports BigPipe or compatibility with multiple heterogeneous data backends.

That positions them squarely in the same target market as Squarespace — but whereas Squarespace is a for-profit business, Backdrop is open-source software, and affordability to the site owner is so important to the project that it gets a mention in Backdrop’s single-sentence mission statement.

Simplified site building for less-than-enterprise organizations is a crowded space already. The Backdrop philosophy identifies a small area of this market where a better solution is conceivable, but I think a big reason so much emphasis is given to goals and philosophy on the Backdrop website and in their conference session is that Jen and Nate recognize their late entry in this crowded market leaves little room for Backdrop to miss in achieving its goals. Backdrop’s decision makers definitely need to keep a clear view of the principles the project was founded for in order for Backdrop to positively differentiate itself.

Affordability is one potential differentiator, and I am personally happy to see it’s one already embodied by Backdrop’s promises of backwards compatibility and low server requirements. But, frankly, WordPress has got those things already. An objective evaluation of Backdrop’s affordability in its current form would put it on par with, but not appreciably better than, its more established competition. But there is hope, because:

Current CMSs don’t give site owners what’s best for them

To make converts, Backdrop could truly differentiate itself with a pledge to offer two new things:

  1. Security backports for previous releases, with a clearly published end-of-support date on each release, so site owners can address security issues with confidence that nothing else about their site will change or stop working.
  2. A reference implementation for fully automated updates, so that installed sites stay secure with zero effort.

Here’s why.

Let’s assume there’s a certain total effort required to create and maintain that highly customized website described by Backdrop’s mission statement. Who exerts that effort is more or less transferable between the site builder and the CMS contributors. On one extreme, the site owner could eschew a CMS entirely and take on all the effort themselves. Nobody does this, because of the colossal duplication of exertions that would result from all the sites re-solving many of the same problems. Whenever a CMS takes over even a small task from site builders, a mind-boggling savings in total human hours results.

Consider this fact in the context of CMS security. From a site owner’s perspective, here’s the simplest-case current model in Drupal, Backdrop, WordPress, and other popular open-source CMSs for keeping the sites they run secured over time. This model assumes a rather optimistic site owner who isn’t worried enough about the risk of code changes breaking their site to maintain a parallel development environment; organizations choosing to do this incur even more steps and costs.

  1. A responsible individual keeps an eye out for security updates to be released. There are myriad ways to be notified of them, but somebody has to be there to receive the notifications.
  2. When the individual and/or organization is ready, an action is performed to apply the latest version of the code, containing the security update as well as non-security changes the developers have made since this site was last updated. Sometimes this action is as simple as clicking a button, but it cannot be responsibly automated beyond this point due to the need to promptly perform step 3.
  3. The site owner browses around on their site, checking key pages and functions, to find any unintended consequences of the update before their customers do. (For non-enterprise, smaller sites, let’s assume an automated testing suite providing coverage of the highly customized features does not exist. Alternatively, if it did exist, there would be a cost to creating it that needs to be duplicated for each instance of the CMS in use.)
  4. In practice, the developers usually did their job correctly, and the update does not have unintended consequences. But sometimes, something unexpected happens, and a cost is incurred to return the site to its past level of operation, assuming funds are available.

Once a site has been developed, unsolicited non-security changes to the code rarely add business value to the organization operating the site. In the current model, however, changes are forced on organizations anyway as a necessary component of maintaining a secure site, merely because they are packaged along with the security update. In my opinion, the observation that opens this paragraph ought to be recognized as one of the principles guiding Backdrop’s philosophy. In the classic model, the CMS avoids the small task of backporting the security fix to past releases and the work is transferred to site owners in the form of the above steps. That’s expense for the site owner, and in total it is multiplied by each site the CMS runs — a much larger figure than offering a backport would have amounted to.

This is a clear shortcoming of current offerings, and Backdrop’s focus on affordability makes it a ripe candidate for breaking this mold. Not to mention the value proposition it would create for organizations evaluating their CMS options. Heck, make a badge and stick it on backdropcms.org/releases:

[Badge mockup: “Supported: 3 years” / “Stable”]

Backdrop could guarantee security updates will not disrupt the sites they run; the competition could only say “A security update is available. You should update immediately. It is not advisable to update your production site without testing the update first, because we introduced lots of other moving parts and can’t make any guarantees. Good luck.”

That’s something I think developers and non-technical decision makers alike can appreciate, and that would make Backdrop look more attractive. Don’t want to pay monthly dues to Squarespace? Don’t want to pay periodic, possibly unpredictable support fees to a developer? Now you don’t have to, with Backdrop.

The above case that a software support lifecycle would make site maintenance more affordable to site owners does not even begin to take into consideration the reality that many sites simply are not updated in a timely fashion because the updates aren’t automated. If you are not an enterprise with in-house IT staff, and you are not paying monthly premiums to an outfit like Squarespace or a high-end web host with custom application-layer firewalls, history shows a bot is pretty much guaranteed to own that site well before you get around to fixing it. Exploited sites are in turn used to spread malware to their visitors, so adding automated updates to the CMS that can be safely applied, rapidly, without intervention would have a far-reaching overall impact on Internet security.

But how achievable is this?

Isn’t extended support boring to developers volunteering their time?

Yes, probably. Top contributors might not be too psyched to do backports. But just as in a for-profit development firm, developers with a range of abilities are all involved in creating open-source software. Have top contributors or members of the security team write the fix and corresponding tests against the latest release, and let others merge them back. The number of patches written for Drupal core that have never been merged has, I think, demonstrated that developer hours are eminently available, even when the chance of those hours having any impact is low. Propose to someone that their efforts will be reflected on hundreds or thousands of sites across the Internet in a matter of days, and you’ll get some volunteers. Novice contributors show up weekly in #drupal-contribute happy to learn how to reroll patches as it is. Security issues might be slightly more tricky in that care needs to be taken to limit their exposure to individuals whose trust has been earned, but this is totally doable. Given the frequency of core security releases in Drupal 7, a smaller pool of individuals known personally by more established community members could be maintained on a simple invite-only basis.

Update, April 2017
I discussed how achievable backports within major versions of Drupal could be in a core conversation at DrupalCon Baltimore. The focus related especially to the possibility of extending the model where official vendors have access to confidential security information to support Drupal 6 LTS. Participants included many members of the security team and a few release managers; the youtube is here.

Some interesting possibilities exist around automating attempts to auto-merge fixes through the past releases and invoke the test suite, but rigging up this infrastructure wouldn’t even be an immediate necessity.
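
For illustration only, the heart of such automation could be little more than a shell loop like this (the tag list, patch file, and test command are placeholders, not working release infrastructure):

# Try to land a security fix on each still-supported older release and see
# whether the test suite survives; anything that fails gets flagged for a human.
for tag in 7.31 7.30 7.29; do
  git checkout -q "$tag"
  if git apply -3 security-fix.patch; then
    ./run-tests.sh || echo "tests failed on $tag, needs human attention"
  else
    echo "patch does not apply cleanly to $tag, needs human attention"
  fi
  git reset --hard -q   # discard this attempt before trying the next tag
done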

Also, other projects in the wider FOSS world, and even in the PHP FOSS world, show us it can be done. Ubuntu made software lifecycles with overlapping supported versions famous for an entire Linux distribution (though most of the other distros also pull it off with less fanfare, chalking it up as an implicit necessity), and the practice has even been embraced by Symfony, the PHP framework deeply integrated into Drupal 8. While Drupal adopted Symfony’s software design patterns and frequently cites this as one of Drupal 8’s strengths, it didn’t adopt Symfony’s software lifecycle practices. In this regard, Drupal is increasingly finding itself “on the island.” Hey, if Backdrop started doing it, maybe Drupal would give in eventually too.

What about contributed modules?

I would argue that a primary strategy to handle the issue of contrib code should be to reduce the amount of contrib code.  This fits well with a Backdrop initiative to put the things most people want in the core distribution of the product.  A significant number of sites — importantly, the simplest ones that are probably least inclined to think about ongoing security — would receive complete update coverage were backports provided only for core. The fact that contrib is there in CMS ecosystems is sometimes cited as a reason security support lifecycles would not be possible, but it’s no excuse not to tackle it in the CMS’s core.

Contrib will always be a major part of any CMS’s ecosystem though, and shouldn’t be left out of the opportunity to participate in support lifecycles.  I would propose that infrastructure be provided for contrib developers to clearly publish their own support intentions on their project pages. Then, when a security issue is disclosed in a contrib module, the developer would identify, by means of simple checkboxes, the versions that are affected. There would be no obligation to actually produce a patched version of old releases identified as vulnerable, however, regardless of previously published intentions. This would have two effects: A) the developer would be reminded that they committed to do something, and therefore might be more likely to do it, and B) sufficient data would be available to inform site owners of a security issue requiring their attention if the contrib module chose not to provide a backported fix. Eventually, the data might also be used as a statistic on project pages to aid site builders in selecting modules with a good support track record.

Aren’t automated updates inherently insecure?

No, although some web developers may conflate performing an automatic update with the risks of allowing code to modify other code when it can be invoked by untrusted users. A reference implementation of an automatic updater would be a separate component from the CMS, capable of running with a different set of permissions from the CMS itself.

Brief case study: “Drupalgeddon”

Here’s the patch, including its test coverage, that fixed what probably proved to be the most impactful security vulnerability in Drupal 7’s history to date:

[Screenshot: the Drupalgeddon patch, a one-line change to database.inc plus its test coverage]

The fix itself is a one-line change in database.inc. Security patches are, as in this case, often very small and only have any impact on a site’s behavior in the face of malicious inputs. That’s why there’s value in packaging them separately.

Drupal 7.32, the version that fixed this vulnerability, was released in October 2014. A

git apply -3 drupalgeddon.patch

is able to automatically apply both the database.inc and database_test.test changes all the way back to Drupal 7.0, which was released almost four years earlier in January 2011. Had the infrastructure been in place, this fix could have been automatically generated for all earlier Drupal versions, automatically verified with the test suite, and automatically distributed to every Drupal 7 website with no real added effort on the part of the CMS, and no effort on the part of site owners. Instead, in the aftermath, tremendous time, energy, and money were expended by the site owners that were affected or compromised by it, with those that didn’t patch in a matter of hours facing the highest expenses to forensically determine if they were compromised and rectify all resulting damages.

You better believe botnet operators maintain databases of sites by CMS, and are poised to use them to effectively launch automated exploits against the bulk of the sites running any given CMS within hours of the next major disclosure. So, unless the CMSs catch up, it is not a matter of if this will happen again, but when.

The only way to beat the automation employed by hackers is for the good guys to employ some automation of their own to get all those sites patched. And look how easy it is. Why are we not doing this?

Final thoughts

A CMS that offered an extended support lifecycle on their releases would make site ownership more affordable and simpler, and would improve overall Internet security. Besides being the right thing to do, if it made these promises to its users, Backdrop would be able to boast of a measurable affordability and simplicity advantage. And advantages are critical for the new CMS in town vying for market share in a crowded and established space.

Posted in Drupal, PHP

HAProxy “backup” and “option redispatch”



I’m testing out my new infrastructure-in-progress for redundant web hosting, described in my last post. The quick outline of the components again, is a total of four VMs:

  • Two small & inexpensive instances in a tier 4 datacenter running Apache Traffic Server + HAProxy.
  • One bigger, rather spendy backend VM in a tier 4 datacenter running Apache, webapp code, databases, etc. Call it hosting1.
  • One VM roughly equivalent in specs to the backend VM above, but running on a host machine in my home office so its marginal cost to me is $0. Call it hosting2.

The thinking is that hosting1 is going to be available 99%+ of the time, so “free” is a nice price for the backup VM relative to the bandwidth / latency hit I’ll very occasionally take by serving requests over my home Internet connection. Apache Traffic Server will return cached copies of lots of the big static media anyway. But this plan requires getting HAProxy to get with the program – don’t EVER proxy to hosting2 unless hosting1 is down.

HAProxy does have a configuration for that – you simply mark the server(s) you want to use as backups with, sensibly enough, “backup,” and they’ll only see traffic if all other servers are failed. However, when testing this, at least under HAProxy 1.4.24, there’s a little problem: option redispatch doesn’t work. Option redispatch is supposed to smooth things over for requests that happen to come in during the interval after the backend has failed but before the health checks have declared it down, by reissuing the proxied request to another server in the pool. Instead, when you lose the last (and for me, only) backend in the proper load balance pool, requests received during this interval wait until the health checks establish the non-backup backend as down, and then return a 503, “No servers are available to handle your request.”

I did a quick test and reconfigured hosting1 and hosting2 to both be regular load balance backends. With this more typical configuration, option redispatch worked as advertised, but I now ran the risk of traffic being directed at hosting2 when it didn’t need to be.

Through some experimentation, I’ve come up with a configuration that my testing indicates gives the effect of “backup” but puts both servers in the regular load balance pool, so option redispatch works. The secret, I think, is a “stick match” based on destination port — so matching all incoming requests — combined with a very imbalanced weight favoring hosting1.

Here’s a complete config that is working out for me:

global
  chroot  /var/lib/haproxy
  daemon
  group  haproxy
  log  10.2.3.4 local0
  maxconn  4000
  pidfile  /var/run/haproxy.pid
  stats  socket /var/lib/haproxy/stats
  user  haproxy

defaults
  log  global
  maxconn  8000
  option  redispatch
  retries  3
  stats  enable
  timeout  http-request 10s
  timeout  queue 1m
  timeout  connect 10s
  timeout  client 1m
  timeout  server 1m
  timeout  check 10s

listen puppet00
  bind 127.0.0.1:8100
  mode http
  balance static-rr
  option redispatch
  retries 2
  stick match dst_port
  stick-table type integer size 100 expire 96h
  server hosting1 10.0.1.100:80 check weight 100
  server hosting2 10.0.2.2:80 check weight 1

I tested this by putting different index.html’s on my backend servers and hitting HAProxy with cURL every 0.5 seconds for about an hour, using the `watch` command to highlight any difference in the data returned:

watch -x --d=permanent -n 0.5 curl -H 'Host: www.mysite.com' http://104.156.201.58  --stderr /dev/null

It didn’t return the index.html from hosting2 a single time, until I shut down apache on hosting1 after about an hour.

There’s a small amount of magic here that I unfortunately don’t understand, but will resign myself to being happy that it works as desired. Once failed over to hosting2, I would kind of expect the stick table to update and cause everything to stay with hosting2 even after hosting1 comes back. In actuality, it returns to hosting1. So cool, I guess.
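
If you want to peek at what the stick table actually holds during a failover test, the stats socket configured above can dump it. This assumes socat is installed and an HAProxy new enough to support the `show table` runtime command (1.5+; it may not be available on a 1.4 install):

echo "show table puppet00" | socat stdio /var/lib/haproxy/stats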

Posted in devops

Reset a crashed migration in Drupal 8



I’ve been writing migration code to move my wife’s WordPress blog to Drupal 8, using the migrate framework. This framework attempts to track whether a particular migration is currently running, but does not always unset that flag if it crashed with a PHP fatal error (which will happen when you are writing new code.) Subsequent attempts to run the migration will produce “Migration your_migration_name is busy with another operation: Importing”.

To unstick it in this case, you can run the following in drush:

drush php-eval 'var_dump(Drupal::keyValue("migrate_status")->set("your_migration_name", 0))'

Posted in Drupal

Handling GET-method submissions with Drupal 7’s Forms API



Remarkably, Drupal 7’s FAPI treats form submissions using the GET method as something of an afterthought. “Just register a URL with wildcards for that!” is the usual refrain from the Drupal devout when the less indoctrinated express their shock over this state of affairs. That approach, however, has several limitations over the W3C standard Way To Do It, such as if your form /has/twenty/optional/fields/that/all/need/a/position/in/the/url, or if your goal is to have Drupal serve response pages for requests formed by standards-compliant tools.

Fortunately though, it is possible. Getting it to work at all requires a multi-line recipe; things become even trickier if you want Drupal to handle submissions coming from another web framework or a static html file. I was faced with this prospect when I needed to reimplement a single form in a more complex legacy application. The frontend form needed to stay as-is because that’s where users look for it, but Drupal had all the backend pieces to make the magic happen in < 100 lines, so I figured I would just point the existing form’s method to a custom module in Drupal. Here’s a generalized sample of what all is needed in such cases, derived from combined googling and interactive debugging through core:

  1. Build the form
    $form_state = array(
      'method' => 'get',
      'always_process' => true, // if support for submissions originating outside drupal is needed
    );
    $form = drupal_build_form('name_of_my_form', $form_state);
    

    Returning $form at this point would suffice as the entirety of a page callback. (Note that because of the needed $form_state, you have to drupal_build_form instead of drupal_get_form, and the standard trick of just registering drupal_get_form as your page callback in hook_menu isn’t an option.)

  2. Write the form callback
    Despite the fact that you already told Drupal you want a GET form in the $form_state, you’ve gotta do so in $form in your form callback as well:

    function name_of_my_form($form, &$form_state) {
      $form['#method'] = 'get';
      $form['#token'] = FALSE; // disable the CSRF token; it has no place in a GET form
      // ... describe your form as usual
      return $form;
    }
    
  3. Break the redirect loop and otherwise handle the submission in the form submission callback
    So, you’ve made a form that trips the submission callback when you access it and loads up the values specified via GET into Drupal’s usual form_state structures. But wait! Drupal’s standard thing is to redirect somewhere following a form submission, and your GET url looks like a form submission to it now, so endless redirect loop ahoy. This can be killed off in a variety of ways, but I’ll mention two that I think should cover most cases.

    If you examine the submitted values and find the submission to be incomplete, such as if the user visited the registered path without providing any GET parameters, you can simply set $form_state['no_redirect'] = true in your submit handler.

    If you wish to actually process a submission and display results on the page, here’s how I did it: instead of setting no_redirect, set $form_state['rebuild'] = true. This implicitly stops a redirect from occurring, and also results in two calls to your form callback during the request – one to provide Drupal with your form information for validation/submission processing, and a second after the submission handler has been run. The results of the second invocation are what is rendered to the page. So, it’s easy enough to stick some results data in your $form_state in the submit callback, then look for it in the form callback and add some additional elements to the form’s render array to show them.

Of course, 'always_process' => true explicitly lowers Drupal’s CSRF shields, so be sure to confirm for yourself that your GET form really isn’t modifying the state of your application.

Also, beware the magic q: Drupal uses the URL param ‘q’ when pretty URLs are disabled, and treats it as the string to be routed whenever it is present as a parameter, whether pretty URLs are on or off. So, it’s unavailable for use as the name of an input on your form.

Posted in Drupal

Remote drush with only FTP and database access



Necessity

I manage a Drupal site as part of my job, which has taught me to appreciate the treat that is drush, the command-line administration utility for Drupal sites. With Drupal, you really want ssh access to the webserver because it gives you the ability to use drush. Uninteresting tasks like module and core upgrades become much less tedious.

I also babysit the website for the Minnesota Valley Unitarian Universalist fellowship, a real simple Drupal site that lives on a basic shared hosting server. No ssh access here. Because doing drupal core updates solely over FTP sucks so much, finally today I sat down in earnest to see if I could somehow leverage the power of drush with such shared hosting sites.

Invention

There didn’t seem to be much accurate information about how to do it already: top Google hits said things like you need ssh and drush installed both locally and on the server (false and false), or that you simply can’t, and there was really nobody suggesting anything to the contrary, especially when your local workstation is running Linux (here’s a workable method for Windows.) I stubbornly refused to believe them and came up with a method that has proven to work for upgrading modules and core successfully, and for every other drush subcommand I have tried to date. This is a guide for Linux, and I’ll include the package names for Debian/Ubuntu, but other than that it should be generic to other distros too.

The basic idea is to get the site’s source code mounted as a filesystem on your local machine backed by your ftp server, install php on your local machine if needed, and get connected to the live site’s database server from your local machine. You then run drush directly on your local machine, but its changes impact the actual site. It’s a bit of work to get everything going the first time, but it simplifies site management tons in the long run.

Warning

Before the step-by-step, two words of warning: 1. This method is for people that want to be lazy, but don’t mind if the computers take forever once you tell them what to do. This method’s performance is slow, due to curlftpfs. For my purposes I don’t care, but some might. 2. A somewhat faster and more reliable way involves copying your site’s source locally and synchronizing the real site to it after running commands that modify code, not a big deal with a few rsync commands. Even when you do it this way, you don’t need to get another instance of your site actually up and running locally, and you don’t need a clone of the database. I discuss this more in step 5.

Step-by-step Guide

1. Pick a Linux computer to run drush commands on, likely your desktop/laptop. This computer will need php with mysql and gd extensions installed, as well as curlftpfs (or something else that makes ftp servers mountable, if you have another preference.) On Debian/Ubuntu, run these commands to ensure you have all needed packages:

sudo apt-get install php5
sudo apt-get install php5-mysql
sudo apt-get install php5-gd
sudo apt-get install curlftpfs

Failure to install php5 ultimately produces a clear error message from drush that it failed to find php, but failure to install php5-mysql produces an error with inaccurate troubleshooting information when you run Drush, so make sure you have it. Lack of php5-gd will generate frequent warnings from drush, but can mostly be done without if you really don’t want to install it.

2. Install drush locally on the Linux computer you’ve chosen. When I put drush on a system I usually just untar the latest stable release from Drush’s github releases to somewhere like /usr/local/lib and symlink the drush shell script from the release to /usr/bin/drush, but drush is just php and shell scripts so you can untar and run it in your home directory or wherever as well if you like. I’ll skip the line-by-line commands for how to do these things; few readers will have never extracted a .tgz or .zip.

3. Mount your site somewhere on your local machine’s filesystem with curlftpfs. In this example I’ll mount to the path ‘/mnt/mvuuf_inmotion’; choose something appropriate to your site and create an empty directory there.

sudo mkdir -p /mnt/mvuuf_inmotion

Modifying the below command for your ftp server, username, password, and local mountpoint, run (as the regular user you plan to run drush commands as, not as root):

curlftpfs -o user="your-ftp-username:your-ftp-password,uid=$UID,gid=${GROUPS[0]}"  "ftp://your-ftp-server/" /mnt/mvuuf_inmotion

(ps, if you get an error about “fusermount: failed to open /etc/fuse.conf: Permission denied”, fix with ‘sudo chmod a+r /etc/fuse.conf’; if you get an error about “fusermount: user has no write access to mountpoint /mnt/mvuuf_inmotion”, fix with ‘sudo chown $UID:${GROUPS[0]} /mnt/mvuuf_inmotion’.)

You should now be able to cd into your local mountpoint, /mnt/mvuuf_inmotion in this example, and ls your site’s contents.

4. Remote database connectivity. You need to be able to connect to your site’s database  from your local computer using the credentials stored in sites/default/settings.php.

You should be confident that your hosting company supports such remote database connections before spending too much time on this; consult with them if you need to. Even when it is supported, many webhosts won’t actually allow connections from hosts other than the webserver unless you take special steps to allow it – again, consult with them if you need to. If your hosting provider gives you MySQL managed with cPanel, this is quite easy, just go to “Remote MySQL” under Databases and add your public IP address that your local machine makes connections out to the Internet with (as displayed at whatismyip.com.)
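
If you’d rather grab that public address from a terminal than from a website, something like this works (icanhazip.com is just one of several third-party services that echo your IP back at you):

curl -4 https://icanhazip.com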

You should probably do a manual connection to the database to ensure this part works before proceeding. Assuming the database server software is MySQL and you’ve installed the mysql client locally, you can do that with

mysql -h [whatever.your.host.is] -u [username_from_settings.php] -p

where “whatever.your.host.is” would generally be a hostname given in your sites/default/settings.php, unless settings.php says “localhost” in which case you should try the domain name of your website. You’ll be prompted for a password, which should also be in the database settings in settings.php.

5. Try it out. From a terminal on your local machine, cd to the root directory of your site (where Drupal’s index.php file is) and try a simple drush command like drush status. (Note that when this runs successfully, drush status likes to list the site’s url as “http://default” even when a url is explicitly given in your settings.php. This seems to be a drush bug but doesn’t seem to affect anything else; you can fix it anyway if you like with drush’s --uri option, as in the example below.)
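
For example (the uri value here is a placeholder for your site’s real address):

drush --uri=http://www.example.org status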

Great! You now have drush set up for use with your remote site.

Optional: Working with a local copy of the code. If you’ve gotten this far, you have two choices: If everything works and you don’t mind how darned slow it is, you can choose to be done. However, if you can’t stand the slowness or if drush is even aborting during some operations with messages like “the MySQL server has gone away,” you might want to make a local copy of the site’s code on your computer, switch to that to run drush, then rsync the local copy back over to your curlftpfs mount after you do updates. (I needed to do this because the shared host had a quite short timeout on idle mysql connections, so doing everything over curlftpfs slowed drush runs down enough to break it.)

Here’s some sample rsync commands to create a local copy of your site’s code, and to sync the remote site with your local copy. Note that you should place your site in maintenance mode before beginning upgrades with this method, and exit maintenance mode only after the remote sync is complete.

Example command to create/sync existing local copy:

rsync -aP --exclude="sites/default/files" /mnt/mvuuf_inmotion/ ~/mvuuf_local

Example command to sync the remote site up with your local copy after it has changed:

rsync -aP --temp-dir=/var/tmp/rsync --exclude="sites/default/files" --delete ~/mvuuf_local/ /mnt/mvuuf_inmotion

(these commands assume you have your curlftpfs-mounted copy of the site at /mnt/mvuuf_inmotion and you want to store the local copy under your home directory in a folder called mvuuf_local; adapt to suit.) The specific rsync options and the trailing slashes on the source paths are composed with some care; don’t change them unless you’ve given it careful thought.

The obligatory notice accompanying all drupal upgrade etc writeups: whenever you do things that make considerable changes to your site, whether with this remote drush method or any other, make backups first. Drush itself can do this for you if you like.

Posted in Drupal, Linux

Drupal views join on multiple columns



Drupal views UI is nice from time to time, once you’ve spent lots of time poking at it and know its quirks. The rest of the time, what you need can’t be done via the UI, and you are plunged into the realms of programmatic customization whose APIs may or may not be properly documented. Today, I had to fix the query behind a view I’d inherited which had been constructed mostly through the UI and had a bad join, because the UI only lets you specify a single column from the left and right tables as defining the relationship. Determining and writing the correct SQL manually took 5 minutes, but getting views to duplicate it took reading the docs – and then the views code – for the rest of the afternoon. Here’s what I finally figured out is the simplest, most direct way to specify joins against multiple columns with views:

1. Add a join_handler to your relationship in hook_views_data

Chances are if you need to be joining on multiple columns, you are using a custom data model and have described it to views with hook_views_data already. (If not, you’ve probably got something different going on and this post may not help you. see my addendum at the bottom.) A basic “relationship” aka SQL join on only one column is defined something like this in hook_views_data:

  $data['tablea']['some_field_from_tableb'] = array(
    'title' => t('Some field from table B'),
    'relationship' => array(
      'base' => 'tableb',
      'base field' => 'a_id',
      'field' => 'id',
      'handler' => 'views_handler_relationship',
    ),
  );

This’ll get you SQL along the lines of

... FROM tablea LEFT JOIN tableb ON tablea.id = tableb.a_id

But if you want SQL along the lines of

... FROM tablea LEFT JOIN tableb ON tablea.id = tableb.a_id AND tablea.instance = tableb.instance

then try the following php:

  $data['tablea']['some_field_from_tableb'] = array(
    'title' => t('Some field from table B'),
    'relationship' => array(
      'base' => 'tableb',
      'base field' => 'a_id',
      'field' => 'id',
      'handler' => 'views_handler_relationship',
      'join_handler' => 'my_fixed_up_join',
    ),
  );

2. Code my_fixed_up_join

Then, code up my_fixed_up_join, which should be a class extending views_join. You can get quite carried away with that if you want, but in the unlikely event that you’re of the mindset that we are simply trying to add an extra condition to a join in an SQL query here, and defining a new class at all in order to achieve this is already a bit much, here’s a quick and dirty version that’s “good enough” if you only have a few views that use it:

class my_fixed_up_join extends views_join
{
  public function build_join($select_query, $table, $view_query) {
    $select_query->addJoin('INNER', $this->table, $table['alias'], "table_alias_a.id = {$table['alias']}.a_id AND table_alias_a.instance = {$table['alias']}.instance");
  }
}

This class needs to live in a file that drupal’s auto-including tendrils have already sucked in; it won’t happen automatically by virtue of any file naming conventions. I suggest sticking it in the same file the related hook_views_data lives in.

You’ll also need to clear drupal’s caches so it reruns your hook_views_data.

Addendum – altering field joins

If your view is of content or other fieldable entities, and you need to tweak a join for one of the fields, this can also be done by writing an extension to the views_join class — you just have to tell Views about it in a different way.

In hook_views_query_alter:

function MYMODULE_views_query_alter(&$view, &$query) {
  // Logic to filter down to just the view we want to be modifying, such as:
  if ($view->name == 'whatever-my-view-is') {
    if (!empty($query->table_queue['table_alias_views_assigned_to_the_table_whose_join_needs_altering']['join'])) {
      $query->table_queue['table_alias_views_assigned_to_the_table_whose_join_needs_altering']['join'] = new my_fixed_up_join();
    }
  }
}

In my case, it was sufficient to pass no arguments to the my_fixed_up_join constructor, and have it just kick out a hardcoded string of SQL conditions:

 public function build_join($select_query, $table, $view_query) {
    $join_sql = <<<JOINSQL
   (field_collection_item_field_data_field_sessions.item_id = field_collection_item_field_data_field_sessions__field_data_field_event_date.entity_id  AND field_collection_item_field_data_field_sessions__field_data_field_event_date.entity_type = 'field_collection_item')
OR (
    node.nid = field_collection_item_field_data_field_sessions__field_data_field_event_date.entity_id AND field_collection_item_field_data_field_sessions__field_data_field_event_date.entity_type = 'node'
  )
  AND (
    field_collection_item_field_data_field_sessions__field_data_field_event_date.deleted = '0'
  )
JOINSQL;

    $select_query->addJoin('LEFT', $table['table'], $table['alias'], $join_sql);
  }

Why would you need to do this? In the example above, I was displaying several types of content in the view, and some of them attach the field directly while others are associated to the field through a “relationship” to another entity. The views UI will let you add the field as a member of the content, and will let you add a different field as a member of the related entity, but under the hood it joins to the field data table twice, creates two result columns, and you won’t be able to do things like sort on that field. What you really want is one join to the field data table with an OR condition, so the field comes up as a single column.

Posted in Drupal

Drupal webforms submissions by form_key



With its ease of entry for basic use and an API for extensibility, the Drupal webforms module is an indispensable tool. One snag with it, though, is that wherever it exposes the results of a submission in code, the submitted values are just expressed in a big 0-based numerically indexed array. Using this $submission->data array directly would make for hard-to-read and fragile code, and there doesn’t seem to be a function provided by webforms to give you the submitted data in an associative array.

Creating my own function to generate an associative array of results fortunately wasn’t that bad though, and seems like a valuable enough thing to make note of here.

/**
 * Returns an associative array of the submission data, instead of the
 * numerically indexed $submission->data
 *
 *  Inspired by http://drupal.stackexchange.com/questions/23607/how-do-i-access-webform-components-labels-on-the-congratulations-page
 *
 * @param int $nid The node id of the webform
 * @param int $sid The submission id.
 */
function webform_submission_data_keyed($nid, $sid) {
  $data = array();
  $node = node_load($nid);

  module_load_include('inc', 'webform', 'includes/webform.submissions');
  $submission = webform_get_submission($nid, $sid);

  foreach($node->webform['components'] AS $key => $component) {
    if(isset($submission->data[$key])) {
      $data[$component['form_key']] = $submission->data[$key];
    }
  }

  return $data;
}

This’ll give you a multidimensional associative array with the “machine name” of each field as keys, and arrays as values. The subarrays are 0-based numerically indexed lists of the one or more response values the user selected.

If you want the human names as keys, use $data[$component['name']] = $submission->data[$key] instead.

Posted in Drupal