Sunday, 20 July 2014

Intersystems Cache - Write Performance

In previous blog posts I have mentioned that Cache has good write performance; in certain circumstances it is faster than many other databases. Even so, it is rarely the case that optimising performance is not desirable. Everyone loves a performance boost; it is the most universally welcomed product change. The closest I have ever come to a complaint about a performance boost is "it's so fast I didn't believe it had worked".

Anyway, I was working on improving query performance on a server when I noticed some intermittent slowdown. I traced the problem to a task which performs a huge burst of write operations, heavy enough that the rest of the server was suffering.

Obviously the code is under review to see if the amount of data being written is necessary, but I was curious about optimising the write operation itself. With read operations I have noticed that dynamic SQL is slower than compiled SQL, which is in turn slower than direct global access; in certain circumstances the difference can be 10-100 times. I wanted to determine whether the same is true of write operations.

The Test

I thought a simple test should be sufficient: write three small strings to the database 10,000 times.
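For reference, the three tests below all assume a trivial persistent class along these lines; the class and property names are my own illustration rather than the original test code.

Class Test.WriteTest Extends %Persistent
{

Property FieldA As %String;

Property FieldB As %String;

Property FieldC As %String;

}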

Test 1 Dynamic SQL

For code reuse and readability it is sometimes preferable to use dynamic SQL; however, there is often quite a performance penalty. I assumed this would be the slowest method and it did not disappoint: my laptop completed the operation in 2.173 seconds.
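A minimal sketch of the dynamic SQL version, assuming the hypothetical Test.WriteTest class above and timing with $ZHOROLOG:

 Set start=$ZHOROLOG
 Set stmt=##class(%SQL.Statement).%New()
 Set sc=stmt.%Prepare("INSERT INTO Test.WriteTest (FieldA,FieldB,FieldC) VALUES (?,?,?)")
 For i=1:1:10000
 {
  //Each execute goes back through the dynamic statement machinery
  Set result=stmt.%Execute("aaa","bbb","ccc")
 }
 Write "Elapsed: ",$ZHOROLOG-start," seconds",!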

Test 2 Compiled SQL

If possible, use compiled SQL over the dynamic variety: not only is the syntax highlighting a benefit, the performance gain is substantial. The test ran through in 0.238 seconds, nearly a 10 times improvement, and the code is just as readable. The only downside of compiled SQL is that occasionally, in a production environment, I have found that queries can stop performing correctly and you need to clear the cached queries; this is relatively uncommon though.
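The compiled (embedded) SQL equivalent is just as short; again the table is the hypothetical Test.WriteTest:

 Set start=$ZHOROLOG
 For i=1:1:10000
 {
  //Embedded SQL is compiled with the class/routine, so there is no prepare step at run time
  &sql(INSERT INTO Test.WriteTest (FieldA,FieldB,FieldC) VALUES ('aaa','bbb','ccc'))
 }
 Write "Elapsed: ",$ZHOROLOG-start," seconds",!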

Test 3 Direct Global Write

Unsurprisingly, writing globals directly was dramatically faster at 0.042 seconds. Direct global access has many issues in terms of code re-use and clarity; it also requires a lot of extra code around it to make sure the data matches the requirements for the fields, and it's not exactly ACID compliant without substantial work. That being said, if the performance is insufficient for the task and a hardware upgrade is out of the question, it can become necessary to use global access.
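And the direct global write, as a rough sketch; the global name and the $ListBuild layout are my own choice and skip all of the validation a persistent class would normally do for you:

 Set start=$ZHOROLOG
 For i=1:1:10000
 {
  //Straight to the storage global - no SQL layer, no validation, no callbacks
  Set ^WriteTestD(i)=$ListBuild("aaa","bbb","ccc")
 }
 Write "Elapsed: ",$ZHOROLOG-start," seconds",!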

Conclusion

Avoid writing dynamic SQL if you can; compiled SQL will really boost performance without any real cost. If you have to move lots of data about quickly and are happy to deal with the limitations, then globals can really help get that application performing well.


Wednesday, 16 July 2014

When will the S5 get a working CyanogenMod 11?

As of today (17th July) CyanogenMod 11 is only available in a semi-functional version. With both GPS and the camera not working it could not be considered a daily driver, and of course there are plenty of other minor niggles.

Samsung has introduced lots of great features in its latest version of TouchWiz, and I have liked the swipe left and right on contacts to send a message or initiate a phone call ever since the original Galaxy S, but this single likeable feature does not compare to stock Android.

The cleaner look, the instant responsiveness and, now with Android L, the promise of better battery life: we all know that TouchWiz cannot compete.

So far I have turned off most of Samsung's proprietary functionality, and found the fingerprint scanner and heart rate monitor to be next to useless.

The Galaxy S5 is still the best phone I have ever owned, but a Google Play edition or CyanogenMod would be a massive step up in usability, speed and enjoyment.

Dearth of Intersystems Code Examples

When I first started my current job I was told that the database was an Intersystems database. I had not heard of it before, and on researching it I found out a number of things.

1. It was ranked number 63 on DB-Engines most popular databases
2. Cache is not a good name for a piece of software if you want to find information in Google
3. There was a distinct lack of examples and people discussing real code

This third issue is perhaps the most important. In programming, most of what you want to do has been done by thousands of people before you. Want to parse a CSV file? Well, if your language does not have a built-in function, someone will have posted a solution on the internet. Not only will you have a working solution, it will probably have been commented on by a number of programmers and improved to a level which would otherwise have taken you several live iterations to achieve.

The code examples in the documentation are basic, out of context and ignore even simple advice on field filtering or error handling.

Cache is one of those languages which does not appear to offer native CSV handling. Additionally, when experimenting with the SQL Import/Export manager code, I found that it did not appear to allow more complex CSV files where, for example, there are carriage return line feeds inside the delimited fields. Of course Excel also breaks with these files, but then it is not exactly the best tool for dealing with CSV files...

So rather than complaining I thought I would post an example. If you notice any bugs or have good ideas for improvement, please advise so that I can improve the code.

This is a simple parsed CSV file object. It takes a file, parses the data and stores the extracted data in a global. It also stores a total row count and the length of the longest row. I found the longest row useful when displaying the results in an HTML table, so that I could set up the table with the appropriate number of empty cells.

Class User.CSVParser Extends %Persistent
{

Property Name As %String [ Required ];

Index Name On Name [ Unique ];

Property MaxColLength As %Integer;

Property TotalRows As %Integer;

ClassMethod StripDotsAndCommas(String As %String) As %String
{
 Set String=$Replace(String,",","")
 Set String=$Replace(String,".","")
 Quit String
}

ClassMethod CSVFileToGlobal(FileName, Output Name, FieldsEnclosedBy As %String = """", FieldsEscapedBy As %String = """", FieldsDelimitedBy As %String = ",") As %Status
{
 if '##class(%File).Exists(FileName)
 {
  Quit $SYSTEM.Status.Error(5001,"File "_FileName_" Does not exist")
 }
  
 Set Stream=##class(%FileBinaryStream).%New()
 Set Stream.Filename=FileName
 set lineNo=1
   
 Set ParsedCSV=..%New()
 Set FileName=$Piece(FileName,"/",$Length(FileName,"/"))
 Set FileName=$Piece(FileName,"\",$Length(FileName,"\"))
 Set FileName=..StripDotsAndCommas(FileName)
 Set Name=FileName_..StripDotsAndCommas($ZH)
 
 Set ParsedCSV.Name=Name
 Set Status=ParsedCSV.%Save()
 If $$$ISERR(Status)
 {
  Set i=0
  While $$$ISERR(Status)
  {
   Set i=i+1
   //Try 10 times to generate a unique name
   If i=10
   {
    Quit  
   }
   Set Name=FileName_..StripDotsAndCommas($ZH)
   Set ParsedCSV.Name=Name
   Set Status=ParsedCSV.%Save()
  }
  If $$$ISERR(Status)
  {
   Quit Status 
  }
 }
 Set global="^"_Name
 Set subscript="CSV"
  
 Set colLengthArray=##class(%Library.ArrayOfDataTypes).%New()
 Set maxColNo=0

 //Parser state is initialised once, outside the read loop, so that fields
 //and lines which span a Stream.Read() chunk boundary are parsed correctly
 Set state=1
 Set Data=""
 Set colNo=1

 While 'Stream.AtEnd
 {
  //Read the next chunk - do not assume a CRLF falls at the end of each chunk
  Set line=Stream.Read()
  Set lineLength=$Length(line)
  if lineLength>0
  {
   //Loop through each character
   //State 1 - initial state (ready for start of a new field)
   //State 2 - Inside enclosed string
   //State 3 - Possibly at end of enclosed string
   //State 4/5 - Next character should be an escaped character
   //State 6 - Carriage return

   for i=1:1:lineLength 
   {
    set char=$Extract(line,i)
    If ((char=FieldsEnclosedBy) && (state=1))
    {
     //opening encapsulating character
     Set state=2
    }
    ElseIf ((char=FieldsEnclosedBy) && (state=2))
    {
     //possible closing character
     Set state=3
    }
    ElseIf ((char=FieldsEnclosedBy) && (state=3))
    {
     //escape char / enclosing char are the same
     Set state=2
     Set Data=Data_char
    }
    ElseIf ((char=FieldsEscapedBy))
    {
     If state=1 
     {
      Set state=4
     }
     ElseIf  state=2
     {
      Set state=5
     } 
     ElseIf state=4
     {
      Set Data=Data_char
      Set state=1
     } 
     ElseIf state=5
     {
      Set Data=Data_char
      Set state=2
     } 
    }
    ElseIf ((char=FieldsDelimitedBy) && (state'=2))
    {
     //delimiter     
     Set @global@(lineNo,colNo)=Data
     
     Set colNo=colNo+1
     Set state=1
     Set Data=""
    }
    ElseIf ((char=$Char(13)) && (state'=2))
    {
     //Probably start of newline - set new line on LF
     Set state=6
    }
    ElseIf ((char=$Char(10)) && (state'=2))
    {
     //New line
     Set @global@(lineNo,colNo)=Data

     If colNo>maxColNo 
     {
      Set maxColNo=colNo
     }
     Set state=1
     Set Data=""
     Set colNo=1
     Set lineNo=lineNo+1
     Set fullLine=""
    }
    ElseIf ((char'=FieldsDelimitedBy) && (state=3))
    {
     Set Data=Data_FieldsEnclosedBy_char
     Set state=2
    }
    Else
    {
     Set Data=Data_char
     If state=4
     {
      Set state=1
     } 
     ElseIf state=5
     {
      Set state=2
     } 
    }

   }
   }
  }

 //Store any final field if the file does not end with a newline
 If ($Get(Data)'="")||($Get(colNo,1)>1)
 {
  Set @global@(lineNo,colNo)=Data
  If colNo>maxColNo Set maxColNo=colNo
  Set lineNo=lineNo+1
 }

 Set ParsedCSV.MaxColLength=maxColNo
 Set ParsedCSV.TotalRows=lineNo-1
 Set Status=ParsedCSV.%Save()
 Quit Status
}

}
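A quick usage sketch: call the class method and then walk the global it populated. The file path here is just an example.

 Set Status=##class(User.CSVParser).CSVFileToGlobal("C:\temp\example.csv",.Name)
 If $SYSTEM.Status.IsError(Status) Do $SYSTEM.Status.DisplayError(Status) Quit
 //The parsed values are stored as ^<Name>(row,column)=value
 Set global="^"_Name
 Set row=""
 For
 {
  Set row=$Order(@global@(row)) Quit:row=""
  Set col=""
  For
  {
   Set col=$Order(@global@(row,col),1,value) Quit:col=""
   Write "Row ",row," Column ",col,": ",value,!
  }
 }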

Monday, 14 July 2014

Response to making better password fields

Photo credit: Flickr user FORMALFALLACY via Creative Commons


Paul Lewis wrote an article attempting to reduce friction on password fields. He even referenced the wonderful xkcd cartoon on the subject, but I feel that concentrating on ways of helping users see how they meet the requirements misses where the friction really comes from.

The friction is primarily caused by the requirements, be they minimum requirements such as symbols and numbers, or maximum requirements like no more than 16 characters. Being forced to modify a password to add extra features is likely to cause you to forget it, just as preventing certain password lengths or the use of specific symbols will annoy users and lead to forgotten passwords. These restrictions add friction to the password entry screen, and anything that can be done to remove them will reduce friction more than attempting to improve the clarity of the restrictions.

The xkcd cartoon points out that the "Tr0ub4dor&3" password is easier for a computer to crack than "correcthorsebatterystaple", yet the second is substantially easier for a human to memorise.

To reduce friction the password field should be made as simple as possible, but assist users in creating difficult to crack passwords.

I would remove all of the criteria such as requiring upper case, lower case, digits and symbols. Perhaps more importantly, allow any character and very long passwords. Instead of restricting the input and forcing people into l33t passwords, teach the user how to write and memorise longer passwords.

Using a "complex" password like "Tr0b4dor&3" does not prevent brute force attacks succeeding, the only way realistically is to prevent multiple incorrect entry.

If the only requirement for a password was that it had to be 6 characters, then even all lower case gives 26^6 = 308,915,776 possible combinations. Of course, a dictionary attack might cut that down to, say, 50,000 realistic candidates.

Now a cluster of computers could chew through either of those numbers in no time, but if they are restricted in their number of attempts, well then even simple 6 character passwords can become pretty robust.

If you care about security, it is vital to monitor the number of successful and unsuccessful attempts to access an account. Start by adding a locking timer after what you consider a reasonable number of attempts; around 10 should be sufficient. Lock the account for 30 minutes, and after another 10 failures lock it again and notify the user that their account appears to be under attack.
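As a rough sketch of that policy in Cache ObjectScript - every class, global and method name here is hypothetical, and a real implementation would live wherever your authentication code does:

ClassMethod RecordFailedLogin(Account As %String)
{
 //Count this failure and start a lock when a threshold is reached
 Set failures=$Increment(^LoginFailures(Account))
 If failures=10
 {
  //First threshold - start a 30 minute lock
  Set ^LoginLock(Account)=$ZTimestamp
 }
 ElseIf failures>=20
 {
  //Second threshold - lock again and notify the user that their
  //account appears to be under attack (notification not shown here)
  Set ^LoginLock(Account)=$ZTimestamp
 }
 Quit
}

ClassMethod IsLocked(Account As %String) As %Boolean
{
 //Locked if a lock was set within the last 30 minutes (1800 seconds)
 If '$Data(^LoginLock(Account),lockTime) Quit 0
 Set now=$ZTimestamp
 Set days=$Piece(now,",")-$Piece(lockTime,",")
 Set secs=$Piece(now,",",2)-$Piece(lockTime,",",2)
 Quit ((days*86400)+secs)<1800
}

A successful login would of course reset the failure counter, and the common password check described below could sit in front of this.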

The only password restriction I would keep is one to help prevent the use of common passwords. Check new passwords against a list of the top 1000, advise the user that they have tried to use a common password, and recommend trying a different one or adding a prefix or suffix to reduce the probability that an attacker could guess it.

You can add more advanced features like whitelists and blacklists if the simple lockout above proves insufficient, or if it causes too many accounts to be locked through a denial-of-service attack rather than a genuine attempt to break into them.

When you are creating password fields, please work towards greater usability and remember that forcing a human to do something they do not want to do will inevitably reduce security, as they write passwords down in notebooks, in saved documents on the PC, on post-its, or store them in a program such as KeePass, creating a single attack vector for multiple passwords.

Thursday, 19 June 2014

Intersystems Cache Indexing Shocker - How to Fix Sluggish Queries

Photo: Markus Spiske / www.temporausch.com / CC-BY

When going through a performance review of a grid on the website, I was struggling to see why the query was taking so long. It was a single table query with a single where clause on an indexed field. This indexed field could only contain 5 different values and the query was returning fewer than 100 entries, yet it was taking over 1 second.

On other lower powered servers the query was taking less than a third of a second and often returning more data albeit on a smaller total data set.

Checking the query plan revealed something interesting: the query was not using the index but was instead performing a full table scan. Further testing revealed that only "=" queries use the index; "!=" ignores it, as do any queries with "OR" or "IN".

After further investigation I found that if you only return the ID then NOT, OR and IN queries do use indexes; however, if you return any other field, including the field your where clause is on, you trigger a full table scan.

Uses index

SELECT ID FROM TABLE WHERE STATUS != 'Complete'
SELECT count(*) FROM TABLE WHERE STATUS != 'Complete'
SELECT * FROM TABLE WHERE STATUS = 'Complete'

Full table scan

SELECT STATUS FROM TABLE WHERE STATUS != 'Complete'
SELECT * FROM TABLE WHERE STATUS != 'Complete'
SELECT * FROM TABLE WHERE STATUS = 'Active' OR STATUS = 'Error'  OR STATUS = 'Cancelled'

As far as I am concerned this is unbelievable behaviour, which is still in the product as of the latest 2014 version of Intersystems Cache database!

At least it revealed 3 different ways of substantially improving query performance:

1. Union Query 

If you know all of the data that can be contained in the field then you can perform a UNION query

SELECT * FROM TABLE WHERE STATUS = 'Active'
UNION
SELECT * FROM TABLE WHERE STATUS = 'Error'
UNION
SELECT * FROM TABLE WHERE STATUS = 'Cancelled'

This of course means more maintenance when adding a new status.

2. ID Only Select 

If you do not know the data then you can just return the ID and loop through the returned IDs to grab the data; a rough ObjectScript sketch follows the example below.
e.g.
SELECT ID FROM TABLE WHERE STATUS != 'Complete'
Loop...
SELECT * FROM TABLE WHERE ID=...
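In Cache ObjectScript that loop might look roughly like this; the User.Task class and its Status column are hypothetical stand-ins for the TABLE placeholder used above:

 //ID-only query so the index is used, then open each object by ID
 Set stmt=##class(%SQL.Statement).%New()
 Set sc=stmt.%Prepare("SELECT ID FROM User.Task WHERE Status != ?")
 Set rs=stmt.%Execute("Complete")
 While rs.%Next()
 {
  Set task=##class(User.Task).%OpenId(rs.%Get("ID"))
  //... work with the fully loaded object here ...
 }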

3. Create New Property

e.g.
SELECT * FROM TABLE WHERE Active = '1'

This requires a new property, which is probably easiest to create in the OnUpdate method (a sketch of one alternative follows below); again, maintenance of this would be an issue.
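For example, a SQL-computed property that is recalculated whenever Status changes can be indexed directly; the property and index names here are hypothetical, and existing rows would need the index rebuilding (for example with %BuildIndices) after it is added:

Property Active As %Boolean [ SqlComputeCode = {Set {Active}=({Status}'="Complete")}, SqlComputed, SqlComputeOnChange = Status ];

Index ActiveIndex On Active;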

I have not seen any documentation talking about these issues with queries. It does strike me this should be an easy thing for Intersystems to fix at the database level, but if they haven't considered it by now then I won't be holding my breath for it.

Sunday, 8 June 2014

MVC Frameworks - CakePHP a quick Evaluation

A real project always throws up more issues than following the tutorials; I find that you not only tend to learn faster, but to a deeper level.

To try and evaluate CakePHP I made up a simple application. This was a simple car database to allow direct comparison of different makes and models comparing the following characteristics:

Price, Seats, 0-60 MPH time, MPG, Age, Road tax.

Within an hour of downloading CakePHP I had a working create, read, update, delete (CRUD) application. After 2 hours I had added filtering of the available fields to restrict the list to various required maximums or minimums, and had manually entered the details of 20 cars. The application was essentially built, working and tested to a basic level in 2 hours with 200 lines of code.

The real advantage, though, was that those 2 hours were not hard-pressed programming; it was a relaxed pace, and I am unsure that without a framework I could get a CRUD application to the same level without a reasonable amount of planning time.

The framework does force you to work in a specific way, but brings in a number of advantages.

Less Code Required

Yes, the framework itself almost certainly contains more code than a basic CRUD application, but I only wrote 200 lines, which included all display code. I doubt that anyone could achieve similar functionality in the same amount of code (unless, obviously, they "cheat" with multi-statement lines, single-line loops etc.).

This brings the sub-benefits of:

  • Higher quality code - The less you need to write the easier it is to maintain concentration on higher quality easier to understand code.
  • Fewer bugs - Every extra line invites the possibility of error, either logical or even simple spelling mistakes.
  • Reduced maintenance - Fewer lines to read means more time spent fixing an issue and less time finding it.

More code reuse

The framework contains methods and helpers for common coding issues, saving you from re-writing these simple methods yourself or, possibly worse, writing lots of semi-duplicate methods throughout your code because you lack the appropriate planning / refactoring time.

Pattern Guidance

When you approach a new project you can tackle it in a multitude of ways; a framework helps you take a consistent and proven methodology to solving the issue and significantly reduces planning requirements.

Performance

While the raw performance of framework applications may be considered an issue, CakePHP has a number of built-in performance indicators and actively helps you spot performance problems before they impact your users. My application was perhaps too basic to demonstrate any potential scaling issues, but there is certainly plenty of control over the code, so performance is unlikely to be a blocker to using CakePHP in a project.