Sunday, 27 July 2014

How to improve the speed of your web page, the basics

Is saving a few seconds really important?

Image by FracFx
While practically any change, no matter how sure you are that it is a positive improvement, can be met with anger from the user base, performance improvements are rarely among them.

In an ideal world all web pages would appear the instant we demanded them; however, achieving good performance has costs, and those costs need to be offset against the benefits. Features and complexity sell, while performance is one of those features that is easily ignored and tough to sell, especially if it is, say, the difference between 10 seconds and 5 seconds.

The benefits of speed are often underrated and can be difficult to appreciate. For example, a fast application is easier to use, not just because of the reduced frustration of waiting, but because it is easier to explore. If you click in the wrong place, the next piece of information appears instantly, so you can easily explore the options on the page. If every option takes ages, not only are you more careful in your decisions, taking longer to complete any interaction, you also will not explore and perhaps find new, better pathways to complete your task.

Another aspect of speed is that it not only allows you to explore, because it does not take long to get back to a known pathway, but it also gives you confidence that the application is well made. Confidence in the application has unseen benefits: the user is more likely to assume that a problem they encounter is of their own making rather than the application's, reducing support calls as they try to resolve issues themselves.

People will also adopt more optimised working practices to save themselves time; often what the developer sees as the "correct" pathway is ignored because the "incorrect" pathway is faster.

Saving time, even seconds, can add up to a lot more than the sum of its parts.

What do I do next?

Well done, you have managed to secure resource to improve performance, but where do you start? There is only one place to start: measuring. Always begin by monitoring and measuring how long the page takes to display. Logging the time taken does itself slow the page down slightly, which might be a concern in a live environment. However, it is important to remember that the performance bottlenecks spotted in a test environment may not fully reflect your live platform.

For the remainder of this article I am going to focus purely on the HTML, CSS and JavaScript and the page rendering performance. To truly improve performance you need to consider all factors, and in fact some of the best gains can be achieved through a more holistic approach. For example, a great performance pattern is to anticipate the next request and cache that pathway. This works extremely well with a string-find UI: before the user clicks "Find Next", the system has already calculated where the next match is and jumps to it immediately. Simple approaches like this can get around the need for massive amounts of power or super-efficient algorithms. Of course most of my best gains have come from working on the "back end" processes, but there is plenty that can be done at the "front end" too.


It is not exactly rocket surgery to realise that reducing the amount of data sent to the browser will improve performance. This is often the simplest change to make and most of the time gives you the biggest boost in performance.

A new TCP connection can initially send roughly 14 kb of data before it must wait for an acknowledgement from the client. If you are accessing the page via cellular data, or perhaps through a satellite connection, then the latency of the connection becomes an issue. Latency can be in the hundreds of milliseconds, so if you are aiming for great responsiveness then that initial 14 kb becomes important. If at all possible, it is optimal to provide all of the data necessary to display the visible portion of the page in the first 14 kb. To achieve this it is vital to minimise everything that is being sent.
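One way to make those first 14 kb count is to inline only the CSS needed for the visible portion of the page. A sketch, with illustrative rules rather than any real site's styles:

```html
<head>
  <style>
    /* critical, above-the-fold rules only, inlined so no extra
       round trip is needed before the first paint */
    body { margin: 0; font-family: sans-serif; }
    .masthead { height: 80px; background: #090; }
  </style>
  <!-- the full stylesheet can be fetched afterwards -->
  <link rel="stylesheet" href="main.css">
</head>
```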

Let's take a look at a corporate website example. City Link is a large UK logistics company. On a decent broadband connection their website pops up pretty quickly, but it could be faster.


The home page comes in at a total of 567 kb, of which 256 kb are the images. Most of these are not compressed optimally: it is possible to compress them without any loss of image quality and achieve a reduction of around 153 kb. Now of course you could argue that some of the images could be replaced with HTML and CSS for a much larger reduction in size; if there is opportunity for this then it would obviously be a good place to investigate next.


The HTML for the page is 29 kb; just removing carriage returns and unnecessary spaces drops this to 22 kb. The results are not quite so impressive once you consider that the page is sent over the line gzip-compressed, but the compressed transfer still drops from 7 kb to 6 kb.
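The whitespace stripping can be sketched with a naive minifier. This is a toy to show where the bytes go; real minification tools are far more careful (they must preserve pre elements, inline scripts, significant whitespace and so on):

```javascript
// Naive whitespace "minifier" - a sketch only, not a production tool.
function naiveMinify(html) {
  return html
    .replace(/\r?\n/g, "")   // drop carriage returns and line feeds
    .replace(/>\s+</g, "><") // collapse whitespace between tags
    .trim();
}

const page = "<ul>\n  <li>Home</li>\n  <li>Contact</li>\n</ul>\n";
const small = naiveMinify(page);
console.log(page.length, "->", small.length); // fewer bytes on the wire
```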


Again, none of the CSS files are minified. While the built-in server compression hides some of the benefit, it is still possible to save 3 kb off the 30.6 kb total without needing to combine CSS rules or cull any unnecessary ones.


The page loads 17 JavaScript resources totalling 245 kb of data. Again, a lot of this is not minified, and it is easy to save 8 kb of data just by "minifying" the JS.


The page makes a total of 52 requests to fetch all of the required files. Each request carries overhead, affecting both how quickly the data reaches the user and the amount of traffic required. It is possible to combine a number of the images into a single sprite sheet and slice out the individual images with CSS. As the images are almost entirely green, yellow and white, the palette is not diverse, so the combined image will compress well and can substantially reduce the page's overall image weight.
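As a sketch, a sprite sheet works by serving one combined image and selecting each icon with background-position. The file name and offsets here are illustrative, not City Link's actual assets:

```css
/* One combined image, fetched once, instead of one request per icon. */
.icon       { background-image: url(sprites.png); width: 16px; height: 16px; }
.icon-phone { background-position: 0 0; }
.icon-email { background-position: -16px 0; }
```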

Combining the CSS will have a similar effect. There are four CSS files; although one is served from Google, the other three can be combined quite easily to reduce the number of requests.
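Combining files can be as simple as concatenation at build time. A sketch, with illustrative file names rather than City Link's real stylesheets:

```shell
# Create some stand-in stylesheets (illustrative content only).
printf 'body{margin:0}\n'   > reset.css
printf '.nav{float:left}\n' > layout.css
printf 'h1{color:#090}\n'   > theme.css

# One combined file means one HTTP request instead of three.
cat reset.css layout.css theme.css > combined.css
wc -c combined.css
```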

Load the JavaScript as late as possible

JavaScript blocks the rendering of the page, so moving the JavaScript from the header to the end of the body can often get the visuals up and running before the event handling in the JavaScript is required.
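A generic sketch (not City Link's actual markup): scripts at the end of the body, or marked with the defer attribute, no longer block the initial render:

```html
<body>
  <!-- visible content first, so it can render before any script runs -->
  <h1>Welcome</h1>

  <!-- scripts at the end of the body (or marked defer) do not block
       the initial render; they run once the document is parsed -->
  <script src="site.js" defer></script>
</body>
```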

Do not load what you do not need

I might be missing something in the code, but I cannot see where Jquery.mousewheel.min.js is used. If it is indeed unused, then removing it gets rid of an unnecessary HTTP request and 1 kb of data, and also stops the browser delaying render while it parses the file. Try to load JavaScript, or any other resource, as conditionally as possible. Yes, the browser will cache resources so the overall effect might not be too bad, but removing unnecessary parsing will improve responsiveness, even if only a little.
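One hedged sketch of conditional loading: decide server-side (or at build time) which scripts the page actually needs before emitting any script tags. The feature names and file list are illustrative assumptions, not City Link's real build:

```javascript
// Map of optional features to the scripts they require (illustrative).
const featureScripts = {
  carousel: "jquery.carousel.min.js",
  mousewheel: "jquery.mousewheel.min.js",
};

// Return only the scripts the page's enabled features actually need,
// so unused files are never requested or parsed.
function scriptsFor(features) {
  return Object.keys(featureScripts)
    .filter((f) => features.includes(f))
    .map((f) => featureScripts[f]);
}

console.log(scriptsFor(["carousel"])); // mousewheel is never requested
```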

Fix any broken HTML

When the HTML contains errors the browser falls back to its error handling. This is only fractionally slower, but it may trigger various quirks behaviours, which in turn tends to lead to extra CSS and HTML to "fix" the problem. Running a validator such as HTML Tidy is invaluable in preventing this. City Link triggers only 7 warnings, far fewer than the majority of pages I have looked at, and these 7 could quickly be addressed for a perfect score.

Sadly I have used technologies which prevent you from getting a perfect score; for example, they force you to use lang or type='javascript' attributes when you are working in HTML5 because the source will not compile otherwise. Fixing broken HTML is not always possible, but when it is, use valid HTML and rid your page of those warnings.

Do not use XHTML

Some websites, such as City Link, use XHTML. Besides the fact that browser support for this version of HTML is poor and browsers tend to just parse it as if it were HTML, it is not compatible with the XHTML 2 draft and, perhaps more importantly, it requires more verbose markup than HTML 5. Save precious bandwidth and choose HTML 5.

Friday, 25 July 2014

The big constraint in Development and how it can bring down a team

I always consider the primary constraint in software development to be time. While technically this constraint could be considered a function of money, it is difficult for a developer to assert direct influence over budget. Most organisations require developers to go quite high up the chain of influence to affect the proportion of the budget that development has access to. Supply and demand for the product that development is building is also difficult for them to influence directly; in most corporate structures, sales and marketing are the real influencers of that factor. Yes, they rely on the quality of the product that development provides, but it is unlikely that an improvement in product quality will have an equivalent impact on sales. The ability of sales and marketing to reach new customers and alter demand is a much more influential factor.

It is one of the great features of software development that you can essentially do anything the customer asks for, but only if you have enough time. Developers realise how precious this commodity is. There will always be bugs in the product that can be fixed, there will always be ways to enhance the existing features, and there will always be brand new features which could benefit the customer. As your team is not infinite in size and your system resources are finite, you will never be able to write super-efficient, bug-free code covering every desired feature.

Whenever a developer is working on a given piece of work they are making cost-benefit decisions: how much time to spend optimising their changes for performance, and how much time to spend reviewing and testing for bugs, or coding defensively against them. They weigh these factors against the perceived risks of the code and the relative importance of the change.

Software developers have normally entered the field due to a natural aptitude for logic. Logical reasoning will commonly be very important to how they perceive the world. This is critical to understanding how best to manage a team of developers. A manager must be able to provide logical reasoning for the decisions they make and, more importantly, communicate these reasons effectively to their team.

The development team will often know which processes appear to be impeding their time, they will be aware of many of the issues in the product they are working on, and they will have a good idea of how long each will take to fix. They will probably even be able to provide you with an ordered list based on their cost-benefit analysis of what should be done. If that list tallies with the manager's scheduling plans then the team will function smoothly, and the developers will naturally respect the manager's decisions. However, if the lists do not tally then there is potential for a breakdown within the team dynamic.

The manager will need to explain their priority order clearly and provide rational explanations for the differences in opinion. I have not yet been in a team where a developer has not altered their opinion in the face of rational argument. It is vital that the manager is open to change in the same way: if the manager cannot provide a consistent rational argument to change the developer's opinion, they should strongly consider whether their ideas are the best way forward. If the decision making is particularly contentious then it is important for the manager to review their decisions at the end of the process; if they were wrong, admit it, and if they were right, present the evidence to the team. Everyone has some level of ego, so obviously this should not be done in a gloating manner. However, as I have stated, developers tend to work from a logical standpoint; if the manager adequately demonstrates that their decision-making process was effective, the developer will be able to maintain their respect and the team should continue to function well.

The more the developers see the decisions of their manager as irrational, the more respect they will lose, and the loss of respect will lead to reduced communication and a drop in the effectiveness of the team.

Developers normally want to release the best products possible and realise that product quality and breadth are constrained by the time available to code. Any seemingly irrational scheduling or processes eat into this time and are seen by the developer as barriers thrown up to stop them producing a great product. I have not yet spoken to a developer who does not agree with the Agile Manifesto, and I believe it is clear that this document was born out of experiencing frustratingly irrational management processes that developers saw as squandering the precious resource of time.

Sunday, 20 July 2014

Intersystems Cache - Write Performance

In previous blog posts I have mentioned that Cache has good write performance; in fact, in certain circumstances it is faster than many other databases. However, it is rarely the case that optimising performance is not desirable. Everyone loves a performance boost; it is the most universally loved product change. The closest I have ever experienced to a complaint about a performance boost is "it's so fast I didn't believe it had worked".

Anyway, I was working on improving query performance on a server when I noticed some intermittent slowdown. I traced the performance issue to a task which performs a huge burst of write operations. The writes were so heavy that the rest of the server was slowing down.

Obviously the code is under review to see if the amount of data being written is necessary, but I was curious about optimising the write operation itself. With read operations I have noticed that dynamic SQL is slower than compiled SQL, which is in turn slower than direct global access; in certain circumstances this can be a 10-100 times difference. I wanted to determine if the same holds for write operations.

The Test

I thought a simple test should be sufficient: just write 3 small strings to the database 10,000 times.

Test 1 Dynamic SQL

For better code reuse and readability it is sometimes preferable to use dynamic SQL; however, there is often quite a performance penalty. I assumed this would be the slowest method and it did not disappoint. My laptop managed to complete the operation in 2.173 seconds.

Test 2 Compiled SQL

If possible, use compiled SQL over the dynamic variety: not only is the syntax highlighting a benefit, the performance gain is substantial. The test ran through in 0.238 seconds, nearly a 10 times improvement, and the code is just as readable. The only downside of compiled SQL is that occasionally, in a production environment, I have found that queries can stop performing correctly and you need to clear the cached queries; this is relatively uncommon though.

Test 3 Direct Global Write

Unsurprisingly, writing directly to globals was dramatically faster at 0.042 seconds. Direct global access has many issues in terms of code re-use and clarity; additionally it requires a lot of extra code around it to make sure the data matches the requirements for the fields, and it's not exactly ACID compliant without substantial work. That being said, if the performance is insufficient for the task and a hardware upgrade is out of the question, it can become necessary to use global access.
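As a rough sketch, not the exact harness behind the timings above, the direct-global variant of the test looks something like this in Caché ObjectScript, using $ZHOROLOG for timing and an assumed scratch global ^SpeedTest:

```objectscript
 Set start=$ZHOROLOG
 For i=1:1:10000 {
     // three small strings per iteration, written straight to the global
     Set ^SpeedTest(i,1)="abc"
     Set ^SpeedTest(i,2)="def"
     Set ^SpeedTest(i,3)="ghi"
 }
 Write "Elapsed: ",$ZHOROLOG-start," seconds",!
```

The dynamic and compiled SQL variants replace the three Set commands with an INSERT via %SQL.Statement and an embedded &sql(INSERT ...) respectively.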


Avoid writing dynamic SQL if you can; compiled SQL will really boost the performance without any real cost. If you have to move lots of data about really quickly and are happy to deal with the limitations, then globals can really help get that application performing well.

Wednesday, 16 July 2014

When will the S5 get a working CyanogenMod 11?

As of today (17th July) CyanogenMod 11 is only available in a semi-functional version. With both GPS and the camera not working it could not be considered a daily driver, and of course there are plenty of other minor niggles.

Samsung has introduced lots of great features in its latest version of TouchWiz, and I have always liked the swipe left and right on contacts to send a message or initiate a phone call since it appeared on the original Galaxy S, but this single likeable feature does not compare to stock Android.

With its cleaner look, its instant responsiveness, and now with Android L the promise of better battery life, we all know that TouchWiz cannot compete with stock Android.

So far I have turned off most of Samsung's proprietary functionality, and found the finger print scanner and heart rate monitor to be next to useless.

The Galaxy S5 is still the best phone I have ever owned, but a Google Play edition or CyanogenMod would be a massive step up in usability, speed and enjoyment.

Dearth of Intersystems Code Examples

When I first started my current job I was told that the database was an Intersystems database. I had not heard of this before, and on researching it I found out a number of things:

1. It was ranked number 63 on DB-Engines most popular databases
2. Cache is not a good name for a piece of software if you want to find information in Google
3. There was a distinct lack of examples and people discussing real code

This 3rd issue is perhaps the most important. In programming, most of what you want to do has been done by thousands of people before you. Want to parse a CSV file? Well, if your language does not have a built-in function, someone will have posted a solution on the internet. Not only will you have a working solution, it will probably have been commented on by a number of programmers and improved to a level which would have taken you several live iterations to achieve.

The code examples in the documentation are basic, out of context, and ignore even simple advice on field filtering or error handling.

Cache is one of those languages which does not appear to offer native CSV handling. Additionally, when experimenting with the SQL Import/Export manager code, I found that it did not appear to allow more complex CSV files where, for example, there are carriage return line feeds within the delimited fields. Of course Excel also breaks with these files, but then it is not exactly the best thing for dealing with CSV files...

So rather than complaining I thought I would post an example. If you notice any bugs or have good ideas for improvement, please let me know so that I can improve the code.

This is a simple Parsed CSV File object. It takes a file and parses the data and stores the extracted data in a global. It also stores a total row count and a longest row. I found it useful to store the longest row when displaying the results in an HTML table so that I could set up the table with the appropriate width of empty cells.

Class User.CSVParser Extends %Persistent
{

Property Name As %String [ Required ];

Index Name On Name [ Unique ];

Property MaxColLength As %Integer;

Property TotalRows As %Integer;

ClassMethod StripDotsAndCommas(String As %String) As %String
{
 Set String=$Replace(String,",","")
 Set String=$Replace(String,".","")
 Quit String
}

ClassMethod CSVFileToGlobal(FileName, Output Name, FieldsEnclosedBy As %String = """", FieldsEscapedBy As %String = """", FieldsDelimitedBy As %String = ",") As %Status
{
 If '##class(%File).Exists(FileName) {
  Quit $SYSTEM.Status.Error(5001,"File "_FileName_" does not exist")
 }
 Set Stream=##class(%FileBinaryStream).%New()
 Set Stream.Filename=FileName
 Set lineNo=1
 Set ParsedCSV=..%New()
 //Strip the path and any characters that are unsafe in a global name
 Set FileName=$Piece(FileName,"/",$Length(FileName,"/"))
 Set FileName=$Piece(FileName,"\",$Length(FileName,"\"))
 Set FileName=..StripDotsAndCommas(FileName)
 Set Name=FileName_..StripDotsAndCommas($ZH)
 Set ParsedCSV.Name=Name
 Set Status=ParsedCSV.%Save()
 //Try up to 10 times to generate a unique name
 Set i=0
 While $$$ISERR(Status) {
  Set i=i+1
  If i=10 Quit
  Set Name=FileName_..StripDotsAndCommas($ZH)
  Set ParsedCSV.Name=Name
  Set Status=ParsedCSV.%Save()
 }
 If $$$ISERR(Status) {
  Quit Status
 }
 Set global="^"_Name
 Set maxColNo=0
 Set state=1
 Set Data=""
 Set colNo=1
 While 'Stream.AtEnd {
  //Do not assume CRLF marks the end of a line; state persists across reads
  Set line=Stream.Read()
  Set lineLength=$Length(line)
  //Loop through each character
  //State 1 - initial state (ready for the start of a new field)
  //State 2 - inside an enclosed string
  //State 3 - possibly at the end of an enclosed string
  //State 4/5 - next character should be an escaped character
  //State 6 - carriage return seen
  For i=1:1:lineLength {
   Set char=$Extract(line,i)
   If ((char=FieldsEnclosedBy) && (state=1)) {
    //Opening enclosing character
    Set state=2
   } ElseIf ((char=FieldsEnclosedBy) && (state=2)) {
    //Possible closing character
    Set state=3
   } ElseIf ((char=FieldsEnclosedBy) && (state=3)) {
    //Escape char and enclosing char are the same
    Set state=2
    Set Data=Data_char
   } ElseIf (char=FieldsEscapedBy) {
    If state=1 {
     Set state=4
    } ElseIf state=2 {
     Set state=5
    } ElseIf state=4 {
     Set Data=Data_char
     Set state=1
    } ElseIf state=5 {
     Set Data=Data_char
     Set state=2
    }
   } ElseIf ((char=FieldsDelimitedBy) && (state'=2)) {
    //Delimiter - store the completed field
    Set @global@(lineNo,colNo)=Data
    Set colNo=colNo+1
    Set state=1
    Set Data=""
   } ElseIf ((char=$Char(13)) && (state'=2)) {
    //Probably the start of a newline - start the new line on the LF
    Set state=6
   } ElseIf ((char=$Char(10)) && (state'=2)) {
    //New line - store the final field of the row
    Set @global@(lineNo,colNo)=Data
    If colNo>maxColNo Set maxColNo=colNo
    Set state=1
    Set Data=""
    Set colNo=1
    Set lineNo=lineNo+1
   } ElseIf ((char'=FieldsDelimitedBy) && (state=3)) {
    Set Data=Data_FieldsEnclosedBy_char
    Set state=2
   } Else {
    Set Data=Data_char
    If state=4 {
     Set state=1
    } ElseIf state=5 {
     Set state=2
    }
   }
  }
 }
 Set ParsedCSV.MaxColLength=maxColNo
 Set ParsedCSV.TotalRows=lineNo-1
 Set Status=ParsedCSV.%Save()
 Quit Status
}

}

Monday, 14 July 2014

Response to making better password fields

Photo credit: Flickr user FORMALFALLACY via Creative Commons

Paul Lewis wrote an article attempting to reduce friction on password fields. He even referenced the wonderful xkcd cartoon on the subject, but I feel concentrating on ways of helping users see how they meet the requirements misses the real source of the friction.

The friction is primarily caused by the requirements themselves, be they minimum requirements (symbols, numbers, etc.) or maximum requirements (like no more than 16 characters). Being forced to modify a password to add extra features is likely to cause you to forget it, in the same way that preventing a user from using passwords of a certain length, or containing specific symbols, will also annoy and lead to forgotten passwords. These restrictions add friction to the password entry screen, and anything that can be done to remove them will reduce friction more than attempting to improve the clarity of the restrictions.

The xkcd cartoon points out that the "Tr0b4dor&3" password is easier for a computer to hack than "correcthorsebatterystaple", yet the second is substantially easier for a human to memorise.

To reduce friction the password field should be made as simple as possible, but assist users in creating difficult to crack passwords.

I would remove all of the criteria such as upper case, lower case, digits and symbols. Perhaps more importantly, allow any character and very long passwords. Instead of restricting the input and forcing people to use l33t passwords, teach the user how to write and memorise longer passwords.

Using a "complex" password like "Tr0b4dor&3" does not prevent brute force attacks from succeeding; realistically, the only way to do that is to prevent multiple incorrect entries.

If the only requirement for a password was that it had to be 6 characters long, then even if you enter all lower case there are 308,915,776 possible combinations. Of course, if you were to run a dictionary attack you might restrict this down to, say, 50,000 realistic combinations.
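The combination count above is just 26 raised to the 6th power; as a quick check:

```javascript
// 26 lower-case letters, 6 independent positions.
const combos = Math.pow(26, 6);
console.log(combos); // 308915776
```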

Now a cluster of computers could chew through either of those numbers in no time, but if they are restricted in their number of attempts, well then even simple 6 character passwords can become pretty robust.

If you care about security it is vital to monitor the number of successful and unsuccessful attempts to access an account. Start by adding a locking timer after what you consider a reasonable number of attempts. Around 10 attempts should be sufficient; then lock the account for 30 minutes. After another 10 failures, lock it and notify the user that their account appears to be under attack.
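A minimal sketch of that escalating lockout policy; the thresholds are the suggestions above, and the state names are my own illustrative labels:

```javascript
// Map a running count of failed attempts to an account state.
// 10 failures -> temporary 30-minute lock; 20 -> lock and notify.
function lockState(failedAttempts) {
  if (failedAttempts >= 20) return "locked-notify-user";
  if (failedAttempts >= 10) return "locked-30-minutes";
  return "open";
}

console.log(lockState(3));  // "open"
console.log(lockState(12)); // "locked-30-minutes"
```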

The only password restriction I would keep is one that helps prevent the use of common passwords: check the entry against a list of the top 1000 passwords, advise the user that they have tried to use a common password, and recommend trying a different one, or adding a prefix or suffix to reduce the probability a hacker could guess it.
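The common-password check might be sketched like this; the three-entry list stands in for a real top-1000 list:

```javascript
// Tiny stand-in for a real list of the 1000 most common passwords.
const commonPasswords = new Set(["password", "123456", "qwerty"]);

// Return advice rather than rejecting outright, per the approach above.
function passwordAdvice(pw) {
  if (commonPasswords.has(pw.toLowerCase())) {
    return "This password is very common - try a different one, or add a prefix or suffix.";
  }
  return "ok";
}

console.log(passwordAdvice("qwerty"));
```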

You can try to add more advanced features like white lists and black lists if the simple lock out above is insufficient and causes too many accounts to be locked out through a DOS attack rather than an attempt to break into accounts.

When you are creating password fields please work towards greater usability, and remember that forcing a human to do something they do not want to do will inevitably lead to reduced security: they will write passwords down in notebooks, in saved documents on the PC, or on post-its, or save them in a program such as KeePass, creating a single attack vector for multiple passwords.

Wednesday, 9 July 2014