
Thursday 16 February 2012

Resurrecting Data Types.

The basic data types found in most languages are almost always the same. You'll find ints, strings, floats, doubles and not many more. Well, saying that something is an "int" isn't exactly a great leap in terms of describing the nature of the data you want to store in it.
That's why I like how common it is in C code to use typedef to create aliases of basic types. You can define "Height" as an alias of "int", which helps a lot with code readability.
If you need to add more "personality" to your data, the next step is classes (I know, C/C++ also have structs).

A typical piece of data on the web is a user login name. Looking at the options a programmer has at hand (basic data types), he will, in almost all cases, use a string as its type. Not many will use a class to model it, especially when, in the User table, the Login field is a varchar.
Also, ORMs that use the table's schema to gather information about a class will conclude that Login is a string too.

But "Login" isn't just a string. It has many properties. For starters, it has a maximum length and also a minimum length. While the first property can be gathered by an ORM from the "User" table schema, what about the second? A valid "Login" also has to match a certain regular expression. And, from the User table's perspective, a "Login" field must be unique and cannot be NULL.
Not only that. A "Login" field, when shown in a form, will be rendered using an input of type "text". And possibly, when shown anywhere else, it will have a typical action, such as visiting the user profile, or something else.

But the most we usually say about the "Login" field is: varchar(15).

Of course, in a web site, the "Login" field will be validated in one place, shown in another, and stored in the user table in yet another.
But none of those actions will be executed as inherent abilities of the Login data type (because it doesn't exist). They will be executed as some form validation code, or as some template rendering process.
There is no guarantee that, given two forms that use the Login field, both of them will restrict the user input to 15 characters. The responsibility for adding that restriction is in the programmer's hands. If it's later decided that Login fields should allow 20 characters, it's again in the programmer's hands to change those form inputs to allow the new size.

Every time I see a framework claiming to be DRY, I check for this feature: can I specify that my "login" field has a minimum of 4 characters and a maximum of 15, and uses this regular expression, so that it'll take care of drawing the right input, telling the user about those limits, and doing all the syntax checks before letting my code start using it? And will the Login field be represented in the DB as a varchar(15) too?

If that framework fails at just one of those things... sorry... it's not DRY. You'll be repeating something you should specify just once. You'll have to give the system the same information in the "Register" form and in the "Log in" form.

Am I saying, then, that the Login field should be a class? Yes. It's a class. More than that: it's a complete MVC system. It has storage, manipulation, and display properties.

Then, what about the other usual fields we can find in a "User" table, like "Name", "Surname", "Photo", "City"? Yep. All those are classes too.
And you want them to be classes. You'll be using them all the time in your life as a web developer, so you want them to be classes that you can reuse in your next projects.

Still, I don't want to call those entities "classes", as they are, from your application's point of view, just "data". That's why I prefer to call them "data types". Because on the WWW, a "Login" is a basic data type, in the same way an int is a basic data type for PHP.

Yes, I guess it's a bit boring to write classes for all those types, especially when your schema gets bigger. But, fortunately, most of those types can be described easily:
<?php
     // Declarative description of the "Login" data type.
     $datatypes["Login"]=array(
              "EXTENDS"=>"String",
              "MINLENGTH"=>5,
              "MAXLENGTH"=>15,
              "REGEX"=>"/^[a-zA-Z0-9]{5,15}$/",
              "DEFAULT_INPUT"=>"text",
              "DEFAULT_LABEL"=>"Login",
              "DEFAULT_ACTION"=>"http://..../?viewProfile=%value%"
              /*...*/);
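
To make the idea a bit more concrete, here is a minimal sketch of how one description like the array above could drive validation, form rendering and the column type at once. The helper names (validateValue, renderInput, columnType) are my own assumptions for illustration, not part of any existing framework, and the description is repeated (without the action URL) only so the snippet runs on its own.

```php
<?php
// Same description as above, repeated so this sketch is self-contained.
$datatypes = array();
$datatypes["Login"] = array(
    "EXTENDS"       => "String",
    "MINLENGTH"     => 5,
    "MAXLENGTH"     => 15,
    "REGEX"         => '/^[a-zA-Z0-9]{5,15}$/',
    "DEFAULT_INPUT" => "text",
    "DEFAULT_LABEL" => "Login",
);

// Hypothetical helper: returns a list of error messages (empty when valid).
function validateValue(array $def, $value)
{
    $errors = array();
    $length = strlen($value);
    if (isset($def["MINLENGTH"]) && $length < $def["MINLENGTH"]) {
        $errors[] = "Must be at least " . $def["MINLENGTH"] . " characters";
    }
    if (isset($def["MAXLENGTH"]) && $length > $def["MAXLENGTH"]) {
        $errors[] = "Must be at most " . $def["MAXLENGTH"] . " characters";
    }
    if (isset($def["REGEX"]) && preg_match($def["REGEX"], $value) !== 1) {
        $errors[] = "Does not match the expected format";
    }
    return $errors;
}

// Hypothetical helper: draws the form input, taking the limits from the type.
function renderInput(array $def, $name, $value = "")
{
    $html  = '<label>' . htmlspecialchars($def["DEFAULT_LABEL"]) . ' ';
    $html .= '<input type="' . htmlspecialchars($def["DEFAULT_INPUT"]) . '"';
    $html .= ' name="' . htmlspecialchars($name) . '"';
    $html .= ' value="' . htmlspecialchars($value) . '"';
    if (isset($def["MAXLENGTH"])) {
        $html .= ' maxlength="' . (int) $def["MAXLENGTH"] . '"';
    }
    $html .= '></label>';
    return $html;
}

// Hypothetical helper: derives the column type from the same description.
function columnType(array $def)
{
    return "VARCHAR(" . (int) $def["MAXLENGTH"] . ")";
}
```

With something like this, both the "Register" form and the "Log in" form would call renderInput($datatypes["Login"], "login"), and changing MAXLENGTH to 20 in one place would reach the input, the validation and the schema together.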


Yes, I accept I'm repeating myself here. The regex contains information already specified in the MINLENGTH and MAXLENGTH fields. But it's not worth the effort to avoid that repetition (a possible way would be some sort of syntax like "/^[a-zA-Z0-9]{###MINLENGTH###,###MAXLENGTH###}$/", but, again, not worth the effort).
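
For the curious, the substitution itself would be small. This is only a sketch of the ###NAME### idea mentioned above; expandPlaceholders is a hypothetical helper of mine, and it assumes every placeholder names another key of the same description array.

```php
<?php
// Hypothetical helper: replaces ###KEY### in one field of the description
// with the value stored under KEY in the same description.
function expandPlaceholders(array $def, $key)
{
    return preg_replace_callback(
        '/###([A-Z_]+)###/',
        function ($m) use ($def) {
            // Leave unknown placeholders untouched rather than guessing.
            return isset($def[$m[1]]) ? (string) $def[$m[1]] : $m[0];
        },
        $def[$key]
    );
}

$login = array(
    "MINLENGTH" => 5,
    "MAXLENGTH" => 15,
    "REGEX"     => '/^[a-zA-Z0-9]{###MINLENGTH###,###MAXLENGTH###}$/',
);

// Expands to the same pattern written by hand in the description.
$regex = expandPlaceholders($login, "REGEX");
```

Whether that indirection pays for itself is another matter; for a single type, writing the numbers twice is probably still the cheaper option.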

You may miss some information I mentioned before, such as whether this field is required or not. But that kind of restriction is not imposed by the type itself. Also, there are many more fields you could add here. For example, a default value. Or values considered to be "NULL": when using dates, you may choose to treat "1970-01-01" as a NULL value.
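
As a rough illustration of those extra fields, a date type could carry its own default and its own list of NULL-like values. The key names DEFAULT and NULL_VALUES, and the isNullValue helper, are assumptions made up for this sketch.

```php
<?php
$datatypes = array();
// Hypothetical "Date" description with a default and NULL-like values.
$datatypes["Date"] = array(
    "EXTENDS"     => "String",
    "DEFAULT"     => "1970-01-01",
    "NULL_VALUES" => array("", "1970-01-01"),
);

// Hypothetical helper: true when the value should be treated as NULL.
function isNullValue(array $def, $value)
{
    if ($value === null) {
        return true;
    }
    return isset($def["NULL_VALUES"])
        && in_array($value, $def["NULL_VALUES"], true);
}
```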

Of course, certain "abilities" of a data type, like validation, may require more than just an array to work. Then it's as simple as creating a class for it, with a custom validation method, and passing the description array to its base class.
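
A minimal sketch of that arrangement could look like this. BaseType and LoginType are hypothetical names, and the reserved-names rule is just an invented example of a check that a plain array can't express.

```php
<?php
// Hypothetical base class: keeps the description array and runs the generic checks.
class BaseType
{
    protected $def;

    public function __construct(array $def)
    {
        $this->def = $def;
    }

    public function validate($value)
    {
        $length = strlen($value);
        if (isset($this->def["MINLENGTH"]) && $length < $this->def["MINLENGTH"]) {
            return false;
        }
        if (isset($this->def["MAXLENGTH"]) && $length > $this->def["MAXLENGTH"]) {
            return false;
        }
        if (isset($this->def["REGEX"]) && preg_match($this->def["REGEX"], $value) !== 1) {
            return false;
        }
        return true;
    }
}

// Hypothetical custom type: passes its description to the base class and
// adds one rule of its own (an invented list of reserved names).
class LoginType extends BaseType
{
    private $reserved = array("admin", "system");

    public function __construct()
    {
        parent::__construct(array(
            "MINLENGTH" => 5,
            "MAXLENGTH" => 15,
            "REGEX"     => '/^[a-zA-Z0-9]{5,15}$/',
        ));
    }

    public function validate($value)
    {
        // Generic checks first, then the type-specific one.
        return parent::validate($value)
            && !in_array(strtolower($value), $this->reserved, true);
    }
}

$login = new LoginType();
```

The rest of the application would only ever call $login->validate(...), without caring whether the rule came from the array or from the custom method.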

So, I guess it'd be a good idea to resurrect data types. Even if they're modelled as classes, "Login", "Address", and so on really are, in nature, basic data types, just more abstract than "string", "int" or "long".

And types have more uses than just being the basic data types of the system's Model. That'll be explored in the next post!
