Hasoft Software Engineering Introduces Regular Expression Sorting for Excel and VB/VBA
LAselect is a professional sorting framework for Microsoft Visual Basic and Excel. It is, to date, a one-of-a-kind solution. It does not rely on Excel's sorter and can therefore offer a myriad of additional sorting functionality. LAselect is fully programmable and will sort any standard or non-standard data types. It's also an indexer, list generator, data mining tool and anything in between.
Hasoft has added support for regular expression sorting to this second release of LAselect. Users can now directly sort mixed data without doing any pre- or postprocessing on the sheets. The sheets remain free from clutter and don't need any maintenance. Custom-made VBA templates are reusable, saving time in new projects. Templates developed for regular expressions usually only need a different RegEx to sort out a different problem, again saving time.
LAselect has a much broader spectrum than Excel's sorter and is indispensable when:
- nonstandard data formats have to be sorted
- the amount of rows, columns or sorting keys is exceeding Excels limits
- background sorting without changing the source is needed (index- or non-destructive sorts)
- lists or shortlists have to be generated
- Visual Basic needs to sort its arrays
- implied values have to be sorted
- data-mining unstructured data (where does it say data has to be uniform to be sorted?)
- data needs to be weighted instead of just compared
- data has a complex or fuzzy format
- anything else that needs a sorting solution that other programs have no good answer for
Most of the time workarounds can be found for any shortcomings of Excel's built-in sorter; but that invariably leads to complex spreadsheets that are hard to maintain and become rather awkward to use. For background sorting there's no workaround possible; Excel always rewrites the table it sorts.
LAselect, however, does all its work off-the-sheet in a VBA macro and never needs changes to sheet or data. Although LAselect understands and sorts Excel's valid data types, their interfaces are incompatible and changing sides changes how they're called. Luckily LAselect and Excel's sorter can peacefully coexist and old code does not need to be changed.
Some history
LAselect exists because Excel can only do so much while there are so many, many more things that could be sorted; even in a spreadsheet.
Years ago Hasoft developed additions for a delayed real-time Excel share portfolio. At each update the database therein had to be sorted in numerous non-standard ways. The highest rated information had to be exported in compressed form to several small destination lists, each with a format and meaning of their own. Since Excel was already in the picture Visual Basic for Applications as programming language seemed the logical choice.
At that time new data was coming in at 15 minute intervals. Over time those intervals became shorter and shorter, delayed real-time started to approach real-time, and the number of shares kept growing too. New ideas called for more shortlists, each with their own specific problems. This went on until Visual Basic for Applications could not handle the workload anymore and their (Hasoft's) system broke.
"Killed by evolution," said Arjen van Andel, CEO of Hasoft Software Engineering. "We decided to port the gist of the VBA macro code into a Dynamic Link Library, or add-in, in Excel speak. We also chose flexibility over performance. If our engine's core would be written entirely in C-Language --the fastest member of the C/C++ family-- we could sacrifice some speed to programming comfort. That effort resulted in LAselect, a flexible hybrid sorter that is much, much faster than the interpreted Visual Basic version of the sorter could ever be."
LAselect seamlessly integrates with VBA and is bound only by available memory, not by Excel. LAselect in its current shape can address 4 billion records and 2 billion sorting keys. However, these absolute figures compete for available memory and would never be satisfied on a 32-bit platform. LAselect has successfully sorted VBA arrays of 65 million 8-byte random number values; at which point the test machine ran out of memory because of VBA debugger overhead. This already compares to 65 times the row limit of Excel 2007.
LAselect's interface is quite simple; in-depth knowledge of Visual Basic for Applications certainly helps but is not required. If one can find the energy to make workarounds on the sheet than the same amount of effort would also create neat reusable code to drive LAselect.
LAselect's multistage approach
LAselect accepts up to three command lines where Excel only takes one:
- The compare command line selects (hence the name LAselect) the order the user wants the data to be in. While LAselect's hybrid engine takes care of all Excel's standard data formats, this is the place where non-standard data types end up for user-intervention.
- The move command line takes care of output. This stage is executed by an algorithm that eliminates roughly 90% of unnecessary data movements, greatly speeding up this CPU-intensive task. The user can intervene for special data types or circumstances (e.g. have cell formatting follow the data). This stage can be omitted and instead return a sorted index for the user to work with.
- The export command line is used when sorted data is to be exported to external lists or shortlists. The command line controls which data is harvested and where it goes on the external list. This list can also have some columns filled with cell formulas or data harvested from elsewhere. There's no intermediate storage on the sheet; this stage delivers a finished product, right off the bat. The move and export commands are mutual exclusive.
Regular Expressions and alphanumeric data; the crucial difference
An alphanumeric data type, in the sense that a processor can perform straight forward arithmetic on it, does not exist. What's called an alphanumeric value by the general public is simply a string, a line of text containing digits in the range 0-9 plus one or more characters in the range a-z or A-Z.
The consequences of this fact for sorting depend on the actual alphanumeric values and range from harmless to severe. As soon as digits and/or characters from the alphabet are stored as text values, all bets on correctly sorting them are off unless a rigorous formatting discipline assures all strings are of the same length. Mix numerical and non-numerical characters, call the string license plate, chemical nomenclature or product code and the result becomes untrustworthy; ABC100 is placed before ABC2. Even when the string contains of solely digits, the outcome is still highly unreliable as 120 is placed before 60.
One could argue that the numerical part of an alphanumeric value should be read from right to left. However, processors read strings from left to right and draw an, in this case, incorrect conclusion. Although LAselect and Excel are bound by the exact same restrictions, LAselect can call on regular expressions to become a full-fledged alphanumeric sorter that produces the only correct sequence.
Applications for LAselect
LAselect is fully programmable and consequently hard to put into a specific category. It offers new ways to look at information and puts that in the most useful order for a given task; not necessarily following alphabetical or numerical rules. LAselect has applications that range from from a simple address sort to real-time complex, fuzzy or weighted sorting of cost per click marketing campaigns, share portfolios and data-mining.
Testbench and Showcase
Hasoft provides an Excel testbench application on their website at no cost. It's templates are used as-is or serve as starting point for new developments. The testbench doubles as an interactive showcase and you're welcome to take it out for a spin for some hands-on experience.
The project was recently turned into a commercial product and is available for purchase from the website. At a promotional discount with this code: PRCOMLASMAR312013
About
Hasoft Software Engineering was founded in 1985 and develops software for International European Transport, systems-programming and office automation.
For more information or support, the free testbench, screenshots or to purchase a license please visit http://www.hasoft.com.
Arjen van Andel
+353857131475
www.hasoft.com
UK hours
Parser and alphanumeric sorting
The left image shows how textual and numerical numbers are correctly sorted. The image on the right shows the difference between the standard sorting that Excel does and LAselect's true alphanumerical sorting.
Hasoft spreadsheet with scalable images
Some images of possible output of LAselect in a spreadsheet. Spreadsheets can be scaled, as can the images.
Screenshot with regular expression sorting sample
This image is taken from the testbench/showcase and depicts 5 different cases where regular expressions are of value.