Wednesday, 16 July 2014

Dearth of Intersystems Code Examples

When I first started my current job I was told that the database was an Intersystems database, I had not heard of this before on researching it I found out a number of things.

1. It was ranked number 63 on DB-Engines most popular databases
2. Cache is not a good name for a piece of software if you want to find information in Google
3. There was a distinct lack of examples and people discussing real code

This 3rd issue is perhaps the most important. In programming most of what you want to do has been done by thousands of people before you. Want to parse a CSV file, well if your language does not have a built in function someone will have posted a solution on the internet. Not only will you have a working solution it will probably have been commented on by a number of programmers and improved to a level which would have probably taken you several live iterations to achieve.

The code examples in the documentation are basic, out of context and ignore even simple advise on field filtering or error handling.

Cache is one of those languages which does not appear to offer native CSV handling. Additionally when experimenting with the SQL Import/Export manager code I found that it did not appear to allow more complex CSV files where for example there are carriage return line feeds in the delimited fields. Of course Excel also breaks with these files, but then it is not exactly the best thing for dealing with CSV files...

So rather than complaining I thought I would post an example. Please if you notice any bugs or good ideas for improvement please advise so that I can improve the code.

This is a simple Parsed CSV File object. It takes a file and parses the data and stores the extracted data in a global. It also stores a total row count and a longest row. I found it useful to store the longest row when displaying the results in an HTML table so that I could set up the table with the appropriate width of empty cells.

Class User.CSVParser Extends %Persistent
{

Property Name As %String [ Required ];

Index Name On Name [ Unique ];

Property MaxColLength As %Integer;

Property TotalRows As %Integer;

ClassMethod StripDotsAndCommas(String As %String) As %String
{
 Set String=$Replace(String,",","")
 Set String=$Replace(String,".","")
 Quit String
}

ClassMethod CSVFileToGlobal(FileName, Output Name, FieldsEnclosedBy As %String = """", FieldsEscapedBy As %String = """", FieldsDelimitedBy As %String = ",") As %Status
{
 if '##class(%File).Exists(FileName)
 {
  Quit $SYSTEM.Status.Error(5001,"File "_FileName_" Does not exist")
 }
  
 Set Stream=##class(%FileBinaryStream).%New()
 Set Stream.Filename=FileName
 set lineNo=1
   
 Set ParsedCSV=..%New()
 Set FileName=$Piece(FileName,"/",$Length(FileName,"/"))
 Set FileName=$Piece(FileName,"\",$Length(FileName,"\"))
 Set FileName=..StripDotsAndCommas(FileName)
 Set Name=FileName_..StripDotsAndCommas($ZH)
 
 Set ParsedCSV.Name=Name
 Set Status=ParsedCSV.%Save()
 If $$$ISERR(Status)
 {
  Set i=0
  While $$$ISERR(Status)
  {
   Set i=i+1
   //Try 10 times to generate a unique name
   If i=10
   {
    Quit  
   }
   Set Name=FileName_..StripDotsAndCommas($ZH)
   Set ParsedCSV.Name=Name
   Set Status=ParsedCSV.%Save()
  }
  If $$$ISERR(Status)
  {
   Quit Status 
  }
 }
 Set global="^"_Name
 Set subscript="CSV"
  
 Set colLengthArray=##class(%Library.ArrayOfDataTypes).%New()
 Set maxColNo=0
 Set state=1
 Set Data=""
 Set colNo=1
 Set fullLine=""
While 'Stream.AtEnd { //Do not assume CRFL are end of lines Set line=Stream.Read() if lineLength=0 { Set Data="" } else { //Loop through each character //State 1 - initial state (ready for start of a new field) //State 2 - Inside enclosed string //State 3 - Possibly at end of enclosed string //State 4/5 - Next character should be an escaped character //State 6 - Carriage return for i=1:1:lineLength { set char=$Extract(line,i) If ((char=FieldsEnclosedBy) && (state=1)) { //opening encapsulating character Set state=2 } ElseIf ((char=FieldsEnclosedBy) && (state=2)) { //possible closing characer Set state=3 } ElseIf ((char=FieldsEnclosedBy) && (state=3)) { //escape char / enclosing char are the same Set state=2 Set Data=Data_char } ElseIf ((char=FieldsEscapedBy)) { If state=1 { Set state=4 } ElseIf state=2 { Set state=5 } ElseIf state=4 { Set Data=Data_char Set state=1 } ElseIf state=5 { Set Data=Data_char Set state=2 } } ElseIf ((char=FieldsDelimitedBy) && (state'=2)) { //delimiter Set @global@(lineNo,colNo)=Data Set colNo=colNo+1 Set state=1 Set Data="" } ElseIf ((char=$Char(13)) && (state'=2)) { //Probably start of newline - set new line on LF Set state=6 } ElseIf ((char=$Char(10)) && (state'=2)) { //New line Set @global@(lineNo,colNo)=Data If colNo>maxColNo { Set maxColNo=colNo } Set state=1 Set Data="" Set colNo=1 Set lineNo=lineNo+1 Set fullLine="" } ElseIf ((char'=FieldsDelimitedBy) && (state=3)) { Set Data=Data_FieldsEnclosedBy_char Set state=2 } Else { Set Data=Data_char If state=4 { Set state=1 } ElseIf state=5 { Set state=2 } } } }
 Set ParsedCSV.MaxColLength=maxColNo
 Set ParsedCSV.TotalRows=lineNo-1
 Set Status = ParsedCSV.%Save()
 Quit Status
} }

No comments: