Posts in category dev

JavaScript’s Object.keys performance with inheritance chains



Backstory: (You can cut to the chase) I’m working on a JavaScript library in which I want to identify any changes a method has made to properties of the object it is applied to, preferably in a more intelligent way than cloning all the properties and comparing them before and after invocation. This can be done by applying the method in question to an empty object x that inherits from the true/intended object, and then looking for any own properties on x after the method has run.

But, how does one efficiently look for the properties in x? In JS implementations supporting ECMAScript 5, Object.keys looks like a promising candidate. The function is necessarily O(n), but I wanted to be sure that with ES5, “n” counts only my object’s own properties, and not the properties of every object in the inheritance chain, as in the ES3 polyfill (which walks the whole chain with for...in and filters with hasOwnProperty).

(The chase:) My test on jsperf.com shows that yes, in browsers with ES5, Object.keys doesn’t waste time iterating properties of the inheritance chain and discarding them.
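
The trick from the backstory can then look something like this minimal sketch (changedKeys is my hypothetical name here, not part of any library):

function changedKeys(target, method) {
  // x inherits everything from target, but starts with no own properties
  var x = Object.create(target); // ES5
  method.apply(x);
  // whatever own properties x has now are the changes the method made
  return Object.keys(x);
}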

See also a simple html page with some primitive timings, which basically says the same thing:

<html>
<head>
<script>
function BaseClass() {}
// put 100,000 properties on the prototype (10e4 === 100000)
for (var i = 0; i < 10e4; i++) {
  BaseClass.prototype["p" + i] = i;
}

function SubClass() {
  this.thing = 'x'; // the only own property
}
SubClass.prototype = BaseClass.prototype;

var objInTest = new SubClass();

// the ES3-style fallback: for...in visits inherited properties too,
// so it has to walk and discard all 100,000 of them
function polyfillKeys(obj) {
  var output = [];
  for (var key in obj) {
    if (obj.hasOwnProperty(key)) {
      output.push(key);
    }
  }
  return output;
}
// could do Object.keys = polyfillKeys

var start = new Date();
var keys = Object.keys(objInTest);
console.log("Native Object.keys: " + (new Date() - start));

start = new Date();
keys = polyfillKeys(objInTest);
console.log("Polyfill way: " + (new Date() - start));
</script>
</head>
<body></body>
</html>
Posted in JavaScript

PHP’s mysqli::reap_async_query blocks



Just a quick note about mysqli::reap_async_query, since the official documentation is surprisingly lacking, and I’ve never had any luck getting user-contributed comments posted to the PHP manual.

The manual has this to say about mysqli::query(“…”, MYSQLI_ASYNC): “With MYSQLI_ASYNC (available with mysqlnd), it is possible to perform query asynchronously. mysqli_poll() is then used to get results from such queries.”

Does this mean that the only safe way to call reap_async_query is to first poll for connections with complete queries, and only call reap_async_query on connections that you know have results ready? No. Here’s a quick sample script to show what happens if you skip mysqli_poll() –

<?php
$dbhost = "localhost";
$dbuser = "someuser";
$dbpass = "somepass";
$dbschema = "db_name";

$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $dbschema);
$mysqli->query("SELECT SLEEP(5) AS sleep, 'query returned' AS result", MYSQLI_ASYNC);

echo 'Output while query is running...<br>';

$result = $mysqli->reap_async_query();
$resultArray = $result->fetch_assoc();

echo 'Got back "' . $resultArray["result"] . '" from query.';

outputs (after 5 seconds):

Output while query is running...<br>Got back "query returned" from query.

So reap_async_query simply blocks until the query running on the mysqli instance completes, if its results are not ready yet. That means that in many cases, there is no need to use mysqli_poll() at all.
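
Where mysqli_poll() does earn its keep is when several connections have queries in flight and you want to reap whichever finishes first. A rough sketch of that, reusing the $db* variables from above:

<?php
$links = array();
foreach (array("SELECT SLEEP(3) AS s, 1 AS n", "SELECT SLEEP(1) AS s, 2 AS n") as $sql) {
    $link = new mysqli($dbhost, $dbuser, $dbpass, $dbschema);
    $link->query($sql, MYSQLI_ASYNC);
    $links[] = $link;
}

$pending = $links;
while (count($pending) > 0) {
    $read = $error = $reject = $pending;
    // blocks for up to 1 second waiting for any connection to have results
    if (mysqli_poll($read, $error, $reject, 1) < 1) {
        continue;
    }
    foreach ($read as $link) {
        if ($result = $link->reap_async_query()) {
            $row = $result->fetch_assoc();
            echo 'Query ' . $row["n"] . ' finished.<br>';
            $result->free();
        }
        // this connection has been reaped; stop polling it
        $pending = array_filter($pending, function ($l) use ($link) { return $l !== $link; });
    }
}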

Posted in PHP

Securing SignalR to your site’s users



As you know if you read my last post on hub authorization, when a new hub connection is created, the transport connection stays open whether or not the hub connection(s) are authorized. In my case, I am using SignalR in a carefully controlled, secure authenticated environment, and as a common-sense security measure, I wanted to just close down any connections to SignalR from random clients that had no business connecting to my servers. Unfortunately, as of 1.0 RC1 there’s no built-in way to do this, but with some quick pointers from David Fowler I was able to put together something with Owin that causes my code to be run on every request first, before handing it off to SignalR. The result seemed like it might be useful enough to other people to be worthy of a blog post.

Disclaimers: This code was written for SignalR 1.0 RC1 and the version of Owin that was around at that time. A visitor has posted some updates in the comments on this post’s page for SignalR 1.1.3. This applies to SignalR on top of IIS; I don’t know how it would be different if you’re hosting in some other way. It applies to requests that are routed to SignalR hubs (hub transport connections, method invocations, the hubs auto-javascript); you may need to do a similar thing for Persistent Connections too if you use them. Also, most of the calls and callbacks to and from Owin are just what “seemed to work,” which doesn’t mean this is the best or most correct possible way to do it. We’ll just have to wait for Owin to be documented for that.

Step 1: Ditch RouteTable.Routes.MapHubs();

You should recognize this call from your application startup code (if you added SignalR via nuget, it’s in App_Start/RegisterHubs.cs). Internally, MapHubs tells Owin to route requests with certain urls right into SignalR code. This isn’t how we want to route. What we want to do is route requests to our code first, where we’ll decide whether to let each one through to SignalR or close it. So, we have to do some routing calls ourselves:

// in addition to the usings you already have,
using Microsoft.AspNet.SignalR.SystemWeb.Infrastructure;
using Microsoft.Owin.Host.SystemWeb;
using Owin; // actually defined in a SignalR assembly and probably going to change soon...

public static void Start()
{
    // RouteTable.Routes.MapHubs(); not doing this anymore
    MapHubsWithSecurityInspector(RouteTable.Routes, "~/signalr", GlobalHost.DependencyResolver);
}

private static RouteBase MapHubsWithSecurityInspector(RouteCollection routes, string url, IDependencyResolver resolver)
{
    // replace any hub route that may already have been registered
    var existing = routes["signalr.hubs"];
    if (existing != null)
    {
        routes.Remove(existing);
    }

    var routeUrl = url.TrimStart('~').TrimStart('/');

    var locator = new Lazy<IAssemblyLocator>(() => new BuildManagerAssemblyLocator());
    resolver.Register(typeof(IAssemblyLocator), () => locator.Value);

    // route the url to Owin, which will call our inspector before SignalR
    return routes.MapOwinRoute("signalr.hubs", routeUrl, map => OwinSetup(map, resolver));
}

private static void OwinSetup(IAppBuilder builder, IDependencyResolver resolver)
{
    builder.Use(typeof(MySecurityInspectionHandler));

    builder.MapHubs(resolver);
}

Step 2: Write MySecurityInspectionHandler

After you’ve replaced MapHubs with the above, Owin will need you to define a MySecurityInspectionHandler class, because it will try to call it on every incoming request to routeUrl. The signature of MySecurityInspectionHandler is a bit magical, as the constructor and callback Owin actually looks for aren’t expressed anywhere in docs, or as an interface, or anything. Here’s what seems to work:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// you might want to put the rest in your app's namespace...
using AppFunc = Func<IDictionary<string, object>, Task>;

public class MySecurityInspectionHandler
{
    // standard Owin environment keys
    private const string ResponseStatusCode = "owin.ResponseStatusCode";
    private const string ResponseBody = "owin.ResponseBody";

    private AppFunc _app;

    // Owin finds and calls this constructor, handing us the next handler in the chain
    public MySecurityInspectionHandler(AppFunc app)
    {
        _app = app;
    }

    // ...and calls Invoke for each request routed to us
    public Task Invoke(IDictionary<string, object> environment)
    {
        // Finally, here's where we can examine the incoming request and allow or reject it
        if (RequestHasRightCookiesOrSomething(environment))
        {
            // continue processing the request
            return _app.Invoke(environment);
        }
        else
        {
            return StopRequestProcessing(environment);
        }
    }

    private Task StopRequestProcessing(IDictionary<string, object> environment)
    {
        // indexer rather than Add, in case the key is already present
        environment[ResponseStatusCode] = 403;

        object responseStreamObj;
        environment.TryGetValue(ResponseBody, out responseStreamObj);
        Stream responseStream = (Stream)responseStreamObj;

        var streamWriter = new StreamWriter(responseStream);
        streamWriter.Write("403 Forbidden.");
        streamWriter.Close();

        // return an already-completed Task so processing stops here
        var tcs = new TaskCompletionSource<bool>();
        tcs.SetResult(false);
        return tcs.Task;
    }
}

Then all you have to do is implement your actual logic to detect whether this is a request from an authorized source or not, in a RequestHasRightCookiesOrSomething method. Just about everything you could ever want to know about the incoming request can be found in that environment IDictionary, unfortunately as objects with string keys that Intellisense can’t help you with. When you get to this stage just shove a breakpoint in and take a look at the Keys / Values inside environment to find what you need.
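
For illustration, here’s roughly what such a method might look like, pulling the raw Cookie header out of the standard owin.RequestHeaders entry. The cookie name and the Contains check are placeholders for your app’s real validation, and you’ll also want using System.Linq; for .Any():

private bool RequestHasRightCookiesOrSomething(IDictionary<string, object> environment)
{
    object headersObj;
    if (!environment.TryGetValue("owin.RequestHeaders", out headersObj))
    {
        return false;
    }
    var headers = (IDictionary<string, string[]>)headersObj;

    string[] cookieHeader;
    if (!headers.TryGetValue("Cookie", out cookieHeader))
    {
        return false;
    }

    // placeholder: validate your app's real session/auth cookie here
    return cookieHeader.Any(c => c.Contains("MyAppAuthCookie="));
}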

Step 3: Test

Visit some URLs to your SignalR hubs manually in a browser, both with and without the expected authentication features sent along. When you do not send the authentication information, you should get a 403 Forbidden back from URLs like

  • http://localhost/myproj/signalr
  • http://localhost/myproj/signalr/hubs
  • http://localhost/myproj/signalr/anything_else

When you do send authentication, /signalr visited manually currently produces an unknown transport exception, and /signalr/hubs produces the auto-javascript.

Hope this helps. Happy routing!

Posted in SignalR - Tagged Security, SignalR

SignalR Hub Authorization



I set about making use of the new hub authorization feature in SignalR 1.0 today. It was a bit difficult to obtain answers about what it actually does and how it works, so I studied the revision that introduced this feature, wrote some test code of my own, and thought I would post my findings. This applies to the current version of SignalR as of this post, which is 1.0 RC1.

Some important Hub Authorization bullet-points:

  • Authorization to connect to a hub shouldn’t be confused with authorizing the underlying persistent connection. This means if a client (I happen to use the .net one) connects to two hubs with 
    // Create a proxy to the chat service
    var hub1 = hubConnection.CreateHubProxy("HubTypeOne");
    var hub2 = hubConnection.CreateHubProxy("HubTypeTwo");
    
    // Start the connection
    hubConnection.Start().Wait();

    and neither HubTypeOne nor HubTypeTwo authorizes the hub connection, the underlying transport connection still stays open. This does make sense if you think about it, for WebSocket transport at least, since being authorized to connect to a hub is not a prerequisite to being authorized to invoke methods on the hub. But an important consequence of this is that hub authorizers offer no protection against random people/bots that want to open up zillions of TCP sessions to your server, or attempt other shenanigans. I’ll have another post soon discussing how to authenticate all requests to SignalR before they even hit SignalR code – I’ve done it today, but have to write up the post…

  • Code samples using the Authorize attribute are in the SignalR source at samples/SignalR.Hosting.AspNet.Samples/Hubs/Auth/. See also the hub authorization feature introduction in David Fowler’s blog.
  • If you don’t implement any SignalR interfaces and just use the [Authorize] attribute, then the actual authorization logic that runs (code from Microsoft.AspNet.SignalR.Hubs.AuthorizeAttribute) is
    private bool UserAuthorized(IPrincipal user)
    {
    	if (!user.Identity.IsAuthenticated)
    	{
    		return false;
    	}
    
    	if (_usersSplit.Length > 0 && !_usersSplit.Contains(user.Identity.Name, StringComparer.OrdinalIgnoreCase))
    	{
    		return false;
    	}
    
    	if (_rolesSplit.Length > 0 && !_rolesSplit.Any(user.IsInRole))
    	{
    		return false;
    	}
    
    	return true;
    }

    where

    • user is the System.Security.Principal.IPrincipal associated with the connection or hub invocation. If you know what this is, great; otherwise do what I did and write your own authorizers that tie into your app’s user management (see below).
    • _usersSplit and _rolesSplit correspond to the attribute’s Users and Roles properties, allowing you to tailor the authorization to specific usernames or roles:
      [Authorize(Roles="Admin, Poweruser")]
      public class MySuperHub : Hub
  • There’s a shortcut extension method to apply authorization to all hubs (any authenticated user, can’t specify usernames/roles):
    GlobalHost.HubPipeline.RequireAuthentication();

    This should be called in your application startup code before creating your hub routing (MapHubs()).
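
    In context, that looks something like this sketch:

    public static void Start()
    {
        // must run before the hub routes are registered
        GlobalHost.HubPipeline.RequireAuthentication();
        RouteTable.Routes.MapHubs();
    }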

Implementing your own authorizers

If you don’t like the UserAuthorized method that is the heart of the [Authorize] attribute, you can write your own authorizers. To do this, create a class that implements at least one of Microsoft.AspNet.SignalR.Hubs.IAuthorizeHubConnection or Microsoft.AspNet.SignalR.Hubs.IAuthorizeHubMethodInvocation. The parameters to these interfaces’ methods are very sensible and provide all sorts of information you might want in making an authorization decision – hub, method, user, cookies, and many others. If you want to apply your authorizer to hubs or hub methods by decorating them in code, you’ll of course need to subclass Attribute too. Here’s a class declaration that does all three of these things to get you started, with a filled-in sketch below it:

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, Inherited = true, AllowMultiple = true)]
public class CustomAuthorizeAttribute : Attribute, IAuthorizeHubConnection, IAuthorizeHubMethodInvocation
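
Filled in, it might look something like this – a sketch only, with the method signatures as best I can determine for 1.0 RC1 (check the interface definitions in your SignalR version):

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, Inherited = true, AllowMultiple = true)]
public class CustomAuthorizeAttribute : Attribute, IAuthorizeHubConnection, IAuthorizeHubMethodInvocation
{
    public bool AuthorizeHubConnection(HubDescriptor hubDescriptor, IRequest request)
    {
        // e.g. check request.Cookies or request.User against your app's user management
        return request.User != null && request.User.Identity.IsAuthenticated;
    }

    public bool AuthorizeHubMethodInvocation(IHubIncomingInvokerContext hubIncomingInvokerContext)
    {
        // e.g. inspect the hub, method, and user behind this invocation
        var user = hubIncomingInvokerContext.Hub.Context.User;
        return user != null && user.Identity.IsAuthenticated;
    }
}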

Applying your own authorizers to all hubs and method invocations

var globalAuthorizer = new CustomAuthorizeAttribute();
GlobalHost.HubPipeline.AddModule(new AuthorizeModule(globalAuthorizer, globalAuthorizer));

Again, this should be called in your application startup code before creating your hub routing (MapHubs()).

Posted in SignalR

JSON and data: URLs can show CSS Sprites a thing or two



Concept

I’m not usually as much into web performance as some, but I recently had a project in mind that included a requirement to load a large number of small photos onto a single page. On this project, I wanted to nail the technical approach and attend to every detail, from using optimizing PHP compilers & memcached to cooperating with the network for optimized transfer times. So, I put some thought into how best to deal with the high image counts. The solution I came up with was a little different from anything I’d seen before, and routinely delivers better than a 3x speedup in complete page load time over no optimization. It’s got a different set of benefits and drawbacks from other optimizations out there, but for typical use cases involving lots of image tags on one page it pretty much puts the standard css sprites tactic to shame.

The basic concept is similar to CSS sprites: since the major http server and browser manufacturers failed to provide the world with bug-free and interoperable http pipelining, page authors can attain better performance by reducing the total number of requests their pages require. This minimizes the dead time between requests when no data is actually being transferred. In order to retain the multimedia richness of the page yet reduce requests for separate image resources, you need to find clever ways of packaging multiple resources into a single http response.

My twist on this was to package the multiple images by base64-encoding their byte streams and framing them with JSON, instead of packing many distinct images into one bigger image file and taking scissors to it on the client. I call it “JSON packaging.” It might sound impractical and unlikely at first, but the downsides that may initially come to mind (base64 space and decoding overhead) turn out to have surprising silver linings, and on top of that it has some nice additional advantages over css sprites.

First, to retain this post’s credibility and your interest, I’ll try to lower the first big red flag that may be in your mind at this point – when you base64 encode binary streams with 8-bit character encodings, you make them 33% bigger, and the JSON notation adds a little more overhead still. How can any attempt at optimization succeed if it requires 33% more data to be transmitted? On many networks it wouldn’t, but thanks to gzip compression, that’s not what ends up happening. Applying gzip compression to the completed JSON object produces an unexpected result: the compressed base64 data actually becomes smaller than the sum of the sizes of the original images, even if the original images are also gzipped. Why is this? Because gzip has more than one image’s worth of data to work with, so it can identify and compress more occurrences of more patterns. For the same reason you would expect better compression from concatenating the original images into one big file (css sprites benefit from this phenomenon too), but achieving it from a starting point of 33% base64 overhead defies conventional wisdom.

Okay, so what are the advantages of JSON packaging over css sprites?

  • Unlike with css sprites and usual front-end performance rules, you don’t have to keep track of the dimensions of each image. With css sprites, this is a must for everything to get cropped and displayed as intended. (though I’ve seen IE fail and show edges of other sprites on tightly-packed backing images when page zoom isn’t 100% anyway.) In addition, because large batches of images are presented to the renderer in one shot, page reflow counts are kept in check when you don’t know or choose not to state the image’s size in markup. This helps to counteract the cost of the base64-decode.
  • It’s easier to dynamically generate these puppies than CSS sprites. You almost never see CSS sprites constructed for anything beyond a set of static icons and branding of the particular site. One reason is that you would need to set up webservers that could collect the images of interest, render them all onto a new canvas, and recompress them into a new image file, all in a performant fashion. With JSON packaging, the load on the servers for generating customized packed resources is reduced.
  • It’s much better suited to photography, since it does not require an additional lossy recompress – the original image file’s bytes are what is rendered.
  • You can use appropriate color palettes, bit depths, and compression formats on an image-by-image basis, yet still achieve the enhanced gzip performance of css sprites.

Proof-of-concept

I put together a few proof-of-concepts which you can check out for yourself, and measured some performance data from them. Each example consists of two static html pages, each with 100 img tags. One page is unoptimized – URLs to each individual image are included in the img tags. The other page references four external <script> resources and includes no src in the img tags.
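
Roughly, the two variants’ markup differs like this (illustrative, not the exact test files):

<!-- unoptimized: one http request per image -->
<img src="images/photo-001.jpg">
<img src="images/photo-002.jpg">
<!-- ...98 more img tags... -->

<!-- JSON-packaged: src-less img tags, filled in by the four scripts
     (see Implementation Notes below) -->
<img id="img0"><img id="img1"><!-- ...98 more... -->
<script type="text/javascript" src="package-1.js" defer></script>
<!-- ...package-2.js through package-4.js... -->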

Example 1

100 small jpg’s. 4kb to 8kb per file. Unoptimized | JSON-Packaged

Example 1 is designed to approximate a typical real-world scenario. In particular, when you’re showing a bunch of images on one page, typically each individual image isn’t all that big (in this example the long dimension is 200px.) In nearly every permutation of network type, browser, and client CPU speed that I have tried, the JSON-packaged case performs better during initial page loads, and usually at least halves total page time. Here are some specific figures from some of my test runs to give you an idea. These are approximate averages of multiple runs – not highly scientific, but intended to illustrate what is typical:

Initial load times

Browser   3 GHz Core 2 Quad (Q6600)                1.6 GHz Atom (N270)
IE 9      508 ms packaged / 1,127 ms unpackaged    540 ms packaged / 1,672 ms unpackaged
FF 13     466 ms packaged / 1,117 ms unpackaged    723 ms packaged / 1,590 ms unpackaged

This table shows time to render the complete page when the files are cached and not modified, but the browser revalidates each resource with the server to see whether it has changed:

Cached load times (revalidated)

Browser   3 GHz Core 2 Quad (Q6600)                1.6 GHz Atom (N270)
IE 9      172 ms packaged / 1,197 ms unpackaged    203 ms packaged / 1,200 ms unpackaged
FF 13     180 ms packaged / 1,062 ms unpackaged    180 ms packaged / 1,079 ms unpackaged

This table shows time to render the complete page when techniques like expire times have been used to inform the browser’s caching system that it’s not necessary to revalidate the resource, and no roundtrips to the server occur:

Cached load times (not revalidated)

Browser   3 GHz Core 2 Quad (Q6600)              1.6 GHz Atom (N270)
IE 9      120 ms packaged / 16 ms unpackaged     145 ms packaged / 64 ms unpackaged
FF 13     62 ms packaged / 42 ms unpackaged      375 ms packaged / 320 ms unpackaged

I obtained these figures over the public Internet, with 54 ms of latency to the server and about 15 Mbps of bandwidth. Results favor packaging even more on higher-latency links. On very low latency networks like a LAN this is not an optimization, but in that environment the whole thing is sufficiently fast either way. Similarly, when resources are cached on the client such that it never revalidates them with the server, packaging is not an optimization either; the inconsistent results above show that at those speeds other factors play a bigger role in overall load time, and it is sufficiently fast no matter what.

Finally, here’s a table of total bytes that need to be transferred with the packaged/unpackaged samples under various compression scenarios. As you can see, packaging also results in a little bit less data transfer:

Bytes: base64 in JSON vs. separate files (sample: small jpg, mostly 200×133 px)

                             best gzip         fastest gzip      no compression
100 separate files           509,883 (-2.3%)   510,630 (-2.1%)   521,763 (reference)
Base64 in four JSON objects  460,281 (-11.8%)  473,169 (-9.3%)   696,372 (+33.5%)

Example 2

100 medium jpg’s. 40kb to 90kb per file. Unoptimized | JSON-Packaged

Example 2 is designed to experiment with how far you can take this – it uses much larger image files just to see what happens. The dimensions of these images are large enough that it doesn’t work well to show them all on one page, so this is probably not a typical real-world use case. Here the results I obtained favor the “unoptimized” version due to the increased base64 decoding overhead. Thus, I’ll leave it at that – you’re welcome to test it out yourself for some specific figures.

Implementation Notes

Once you have your image data as base64-encoded JavaScript strings, some very light JavaScript coding is all that is needed to produce images on your page. Thanks to the often-ignored data: protocol handler supported by all browsers nowadays (detail here), all you need to do is set your image’s src attribute to the base64-encoded string, with a little static header indicating how the data is encoded. In my example, I pass a JS array of base64 strings to the function unpack_images, which simply assigns each string to an img already on the document. In an application you would invent a more complex scheme to map the base64 data to a particular img in the DOM, such as creating the DOM images on-the-fly or including image names in the JSON.

function unpack_images(Packages) {
  for (var i = 0; i < Packages.length; i++)
  {
    // image/jpeg is the proper MIME type here, though browsers tolerate image/jpg
    document.getElementById('img' + i).src = "data:image/jpeg;base64," + Packages[i];
  }
}

Using four separate js files to package the images wasn’t an arbitrary decision – this allows the browsers to fetch the data over four concurrent TCP streams, which results in faster transfers overall due to the nature of the beast. (This is what makes this approach superior to simply stuffing all your data into one massive integrated html file.) Also, I tweaked my initial version of Example 1 a little bit to enable the base64-decoding to commence immediately when the first js file has completed transferring, while the remaining files still finish up. To do this, place your unpack_images function in a separate <script> tag, and somewhere below that in your html page add script tags to your js files with the defer attribute:

 <!-- image data package scripts -->
<script type="text/javascript" src="package-1.js" defer></script>
<script type="text/javascript" src="package-2.js" defer></script>
<script type="text/javascript" src="package-3.js" defer></script>
<script type="text/javascript" src="package-4.js" defer></script>

Then, just wrap your JSON data in a call to unpack_images directly in your package.js files (yes, it’s not a pure JSON object anymore):

 unpack_images(["base64data_for_image_1", "base64data_for_image_2", ...]);
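
For what it’s worth, generating such a package file server-side can be as simple as this hypothetical PHP sketch (the generator script linked below does more):

<?php
// read a directory of images and emit one package file
// wrapping the base64 data in an unpack_images() call
$encoded = array();
foreach (glob("images/*.jpg") as $path) {
    $encoded[] = base64_encode(file_get_contents($path));
}
file_put_contents("package-1.js", "unpack_images(" . json_encode($encoded) . ");");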

The defer tweak saves 80-100 ms in my Example 1, compared to loading all the data first and then decoding it.

All the content in these examples except the individual image files was generated using this script, if you want to pick it up and run with it.

Conclusion

By my analysis, this technique seems to put css sprites to shame in just about any use case. As a word of caution, though, neither css sprites nor JSON packaging plays very nicely with browser caches, since the cache can store only one entity per http request. Consider the common case where a summary page shows dozens of product images, each linking to a product details page. The first time the user visits your site’s summary page, you are probably better off delivering the images in packages. On the other hand, you want to avoid referencing the packaged set on the product details page in case the user entered your site directly on the details page; yet it would be nice if you could fetch the particular product’s image from the already-cached package when you do have it in cache. It’d be nice if there were a JavaScript API that allowed you to save script-generated resources to the browser cache under any url within the window’s domain, but until that happens this is the ugly side of both css sprites and JSON packaging.


Posted in HTML - Tagged performance