Monday, December 24, 2007

Happy holidays

Hopefully most readers will be enjoying these holidays with their families and friends, so I'd just like to send you all my best wishes for a Merry Christmas and a great 2008!

I have the feeling that 2008 will be a great year for all people involved with IBM Informix. 2007 was a remarkable step in the right direction, and we should see more in the next year.

I'll just leave here a few notes, but none of this is new stuff:

and I'll try to complete the second part of the PAM authentication article as soon as possible...

Wednesday, November 07, 2007

IDS 11 Developer Edition: Available

IDS 11 Developer Edition is already available for download from:

http://www.ibm.com/software/data/informix/downloads.html

Choose "Informix Dynamic Server Enterprise Edition Trials" and then choose the right download for your needs.
This version has no time limit. Only a few restrictions like maximum storage, memory and sessions. At the time of writing the file sizes are wrong (expressed in KB and not in MB) and there are possibly a few ports still unavailable.

Regards.

Saturday, November 03, 2007

Evento com Informix 11: 14 de Novembro no IBM Forum Lisboa

As you may have noticed from the post title, this article is about a local event: an IDS 11 session that IBM will promote in Lisbon.

IBM Portugal is organizing an Information Management event on November 14 at the IBM Forum Lisboa. Among other sessions, it will include one entirely dedicated to version 11 of IBM Informix Dynamic Server, launched (as 'Cheetah') last July.

All the information about the event, including the agenda, location and (required) registration page, can be found here.

Attendees will get lunch, free parking (if you come by car), a book about the new features and some additional information about the product. IBM Portugal is working to provide attendees with a copy of the recently announced IDS 11 Developer Edition.
It will be an excellent opportunity to get in touch with IBM's technical and sales teams and to learn about the improvements introduced in this version. You will also be able to exchange impressions with other IBM customers and partners.

The IBM Forum Lisboa is located at IBM's premises in Parque das Nações, a short distance from the Vasco da Gama shopping centre and the Oriente station.

There is also a possibility that IBM will run a workshop dedicated to this version for partners and customers. It would follow the model of similar workshops that have taken place across Europe and would be completely hands-on. If you are interested in taking part in an event of this kind, let the IBM team know; they can give you more details about the format, duration and possible dates.

I hope to meet customers and partners at this session and will be at your disposal for whatever you find necessary.

New Informix related software

IBM took the opportunity of having lots of customers and partners in the Information On Demand (IOD) conference in Las Vegas to make some important software announcements related to Informix:

  • IDS 11 Developer Edition
    This is an IDS edition aimed at software developers, with a few limitations.
    This edition will be available free of charge, but for development purposes only; it won't be usable for commercial purposes. The limits announced are reasonable for its intended purpose: 1GB of memory, 8GB of data, 1 CPUVP (I'd change this one) and 20 concurrent sessions. There is also a new site dedicated to development using IDS:
    http://www.ibm.com/software/data/informix/ids/ad/
    The downloads should be available around November 5th.
  • IDS for MacOS
    There was also an announcement of a port of IDS to Apple's OS. MacOS is a favorite platform for many people (especially in the US). I believe the fact that you can run MacOS, Windows and Linux natively on the same hardware box makes it appealing to a lot of people (particularly students, geeks and other computer-minded folks). They also look cool and have a reputation for being robust machines.
    All this has probably influenced IBM, and there are plans for a MacOS port of the Developer Edition
  • Data Studio
    This is another IBM tool based on the Eclipse framework, designed for DBAs and application developers. It's an ambitious project that aims to facilitate the development of Java applications, XML and Web Services related to database applications, and SQL development. It supports all the relational databases from IBM (Cloudscape/Derby, DB2 LUW, DB2 i5/OS, DB2 z/OS and Informix). You'll have more features available if you use the DRDA connections introduced in Informix version 11. The best way to understand this is to look at its Web page ( http://www.ibm.com/software/data/studio/ ) and give it a try. You can find its download link on the mentioned page.

You can see more on these topics in Guy Bowerman's blog, more specifically here.
I wish you a lot of free time to enjoy these new software releases, especially if you're a developer and use, or would like to use, IBM Informix Dynamic Server.

Informix user authentication: PAM for the rescue (part 1)

Traditionally, IDS authentication relied only on the base OS. This approach has a few advantages, like simplicity and security, but nowadays it can be considered too limited.

In today's environments most customers prefer to have a centralized user management facility, normally an LDAP server or an Active Directory server.
By the time IBM launched 9.4 this was already evident, so IBM had to do something to bring other forms of authentication to IDS. It could have followed other vendors' approach, like explicitly including code to authenticate against an LDAP server, or even integrating this authentication with other IBM products like Tivoli Identity Manager. This would probably lock customers into IBM products, as other vendors have done with theirs.
IBM's choice was better, in my opinion: it decided that the best option was to adopt an independent authentication mechanism that already existed and that lets users choose where and how they want their authentication handled. This mechanism is PAM, and it appeared in IDS 9.40.UC2. At first it was limited to a few platforms and to 32-bit IDS versions; currently it is supported in both 32-bit and 64-bit builds on all major platforms (AIX, HP-UX, Solaris and Linux - Windows allows AD authentication).

In this article I will try to show how simple it is to use, and especially the power and flexibility you'll gain by implementing it. There are still a few annoyances, but they should be fixed in the near future. I will also include a step-by-step tutorial on how to set up an IDS instance to use an AD or OpenLDAP server as its authentication infrastructure.

So, this is a long and somewhat complex article. I'd advise you to read it slowly and, if you're interested in the subject, try to replicate the tutorial so you can gain hands-on knowledge in these areas.

Let's start with a brief introduction to PAM. The acronym stands for Pluggable Authentication Modules. It's no more than a framework that provides independence between the applications that need authentication services (in our article, this will be IDS) and the infrastructures that provide them (OpenLDAP or Active Directory). The applications call a set of well-known functions that request authentication, and the PAM framework handles the communication between requesters and providers, using modules that implement those functions.
In practice the implementations are made in modules, where each one implements a set of the functions specified in the PAM framework. These modules must be specified in a configuration file by the administrator, and you can combine the modules you need to achieve the desired goal. A set of modules that you define to achieve your purposes (for authentication of IDS users, for example) must be identified by a unique service name.
This service name is then used to tell the application which modules it will use. In some OSes this service name is a field in a unique, system-wide text configuration file, while in others it is the name of a file located in a specific directory (usually /etc/pam.d). We'll see several examples further ahead...

These modules can be stacked, and each module will be called in sequence. Each module validates against a specific authentication system. For example, there are modules that talk to LDAP servers, modules that check whether the user is listed in a file, modules that check the time of day and decide whether the user can authenticate at that moment, modules that emulate the hosts.equiv/.rhosts mechanism, and there are even several modules that check users against different RDBMSs (including Informix).
When we specify the sequence of modules in a configuration file we have to specify the following (a sample service file is shown after the list):

  1. The area of functionality.
    Can be "authentication", "account", "session" or "password". This indicates the type of function that we'll request from the module:
    1. "authentication" deals with the operations that guarantee the user is who he says he is
    2. "account" defines if the user account is valid under the circumstances (the account can be blocked, refused at specific times etc.)
    3. "session" implements functionalities like logging etc
    4. "password" deals with credential management like password changing
  2. The module control flag. Can be "required", "sufficient", "optional" or "requisite". This defines the influence that this module will have on the stack of modules. In other words you can define one module as:
    1. "sufficient" for the authentication functionality
      a success in this module will imply a success for the authentication functionality
    2. "optional"
      its result won't influence the final functionality result
    3. "required"
      a failure in it will cause a failure in the functionality, but all other modules will be called
    4. "requisite"
      a failure in it will cause a failure in the functionality and the error will be reported immediately without calling the other modules in the stack
  3. The module name
    The modules are usually supplied in dynamic libraries. Some OS require the specification of the filename with extension (e.g.: pam_ldap.so) while others will require the name without extension (e.g.: pam_ldap)
  4. The module parameters
    These are optional and completely dependent on the module. The administrator will have to check in the module documentation in order to learn what parameters are available and what they mean
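
For illustration, here is a hypothetical service file that combines all four fields (module names, options and file paths are just examples and vary by OS; note that in the configuration files the "authentication" area is written as "auth"):

# /etc/pam.d/example_service - illustrative stack only
# area      control     module            parameters
auth        requisite   pam_listfile.so   onerr=fail item=user sense=deny file=/etc/denied_users
auth        required    pam_ldap.so       config=/etc/ldap.conf
account     required    pam_time.so
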
The whole purpose of using PAM is to gain flexibility and reduce management tasks. Let's see some things we can accomplish by using PAM authentication:

  • Avoid the usage of multiple user passwords and/or password synchronization between systems
  • Accomplish complex authentication requirements like "allow users from that server, but not from the other", "allow users only within normal business hours", "allow these users for instance A, and those users for instance B, assuming A and B are running on the same database server"
  • Authenticate users against data stored in a database table (possibly encrypted)
  • Avoid the usage of trusted relations between servers (more on this below)
  • Quickly stop a user from connecting without having DBA or system administrator privileges: usually you can do this by revoking connect or by blocking the user account. For the first you must be DBA on the specific database, and the second requires system administration privileges. If you're just DBSA (and not necessarily DBA in every database) you can set up PAM to refuse users that are listed in a specific file (which you can own)
  • Create your own PAM module that reacts to whatever you want. You may be surprised by the simplicity of the PAM API. Although there is a lot to consider (stability, performance, security) before you jump into writing your own modules, this is a real option. You can write modules that react to system load, that handle user priorities (only privileged users can connect if the system is overloaded), that check for too many connections from the same user, etc. I'd say that with PAM the limit is not the sky, but your needs. It's so flexible you can do whatever you want with it.


Let's move on to the tutorial. I think this is the best way to explain how this works. What I will show you is an implementation of IDS PAM authentication against an Active Directory structure. The steps are very similar (on the IDS side) if you want to authenticate against an LDAP server. Both the AD server and the IDS server will run in a virtual machine environment. The concerns here were mainly functional; in a real situation you should be much more worried about security.

First, we'll need an MS Windows server with Active Directory running and also Services for Unix running. For simplicity's sake, and given that this is for test purposes, I chose to run it in a virtual machine environment. You'll need virtual machine software like VMware or Virtual PC. Since the Windows Server image I used has a license that prevents converting it from the VHD format, I used MS Virtual PC. You can get a working environment for testing purposes (for 30 days) here: http://www.microsoft.com/downloads/details.aspx?familyid=77f24c9d-b4b8-4f73-99e3-c66f80e415b6&displaylang=en

After downloading the two files, run the exe to uncompress it to a folder. This will be your virtual server installation. Then, assuming you have Virtual PC installed, you can configure a new VM and start it. I'll assume you know how to do this or will be able to get help elsewhere.
The first time you launch it, Windows will configure several components. First you should add a role (a wizard will appear). Follow the steps as shown below. This will add Active Directory support on your new Windows 2003 server with a very loose configuration... The purpose here is just to create something that runs, not something efficient or very secure. Most of the time the AD infrastructure will already be there at your company or your customers' companies:
  • Start the wizard
  • Choose the server role: add the Domain Controller role
  • Choose to create a new domain...
  • ...in a new forest
  • Choose a DNS name for the domain...
  • ...and a Windows (NetBIOS) name
  • Database and log file locations: just accept the defaults
  • Shared system volume: again, use the default
  • If the wizard warns of possible DNS problems, choose to solve them later
  • Permissions: this shouldn't make a difference, but choose the most compatible option
  • A password for restore mode: you won't need to remember it
  • Final confirmation: all set...
  • Installation running...
  • Installation complete: hopefully you'll see this
  • And the typical reboot: this is Windows...

Important: Be sure to set up your network correctly between the host and the virtual Windows 2003 server. Because I was using Virtual PC I installed the MS loopback adapter. If you're following this tutorial, be sure to know (or find someone who knows) how to configure all this. It is beyond this article's scope, so I'll just assume you were able to set it up, or that you have access to a working Active Directory infrastructure.

Next we'll need an add-on for AD, Services For Unix. It's available here: http://www.microsoft.com/downloads/details.aspx?familyid=896c9688-601b-44f1-81a4-02878ff11778&displaylang=en

It's an exe file that you should run to unzip it to a temporary installation folder.
Be sure that you download it within your Windows VM or make it accessible to it. Then just run setup.exe (in the Windows 2003 VM) and follow the instructions below:


  • Step 1: choose the standard installation
  • Security settings: mark both options, although it should not make any difference
  • User name mapping: choose local user name mapping in local files
  • Local files: if you want to migrate user information from an existing system, put the passwd and group files on the Windows machine and give their full paths here; if not, just leave them blank


The installation should complete without any issues and the traditional reboot will be required.
By now you are ready to create a user for IDS authentication and to make the IDS configuration adjustments that provide PAM capability to one of the IDS instance's DBSERVERALIASes.
But let's complete our tasks on the Windows side. We should create a group and a user. Don't forget to add the UNIX attributes (provided by the SFU extension) or it won't work. First open "Active Directory Users and Computers" in the "Administrative Tools" option on the Start menu.
When the interface appears, expand the domain name you created. On the tree click on "Users", and use the right mouse button to select New -> Group. I chose to call it "linux", but it's your choice of course. What you must do is activate its UNIX attributes:

Group properties: create a group and fill in its UNIX attributes


Then create a user (right mouse button, New -> User):

User creation: create a user. Remember to uncheck the option to force a password change on next login
User UNIX attributes: fill in its UNIX attributes. pam_ldap will need this info



This should end our Windows-side tasks. Remember to check the network settings. I had to turn off the Windows firewall.
Let's now see what must be changed on the IDS side. I've used IDS 11 on a Linux VM, but it should not make any difference if you try with one of the latest (UC5+) version 10 releases.
Follow these steps:

  1. Alter $INFORMIXDIR/etc/$ONCONFIG to add a PAM enabled DBSERVERALIAS:

    DBSERVERALIASES cheetah_pam,cheetah_drda # List of alternate dbservernames


  2. Alter $INFORMIXSQLHOSTS to configure the service:

    cheetah_pam onsoctcp pacman.onlinedomus.net 1531 s=4,pam_serv=(ids_pam_service),pamauth=(password)


    Please note the changes:
    1. s=4
      means you'll use PAM on this service.
    2. pam_serv
      specifies the name of the PAM service you'll use. We have to configure this in the PAM files (done below)
    3. pamauth
      chooses the kind of authentication you'll use. It can be password or challenge response. For now, let's use the usual user/password method.
This is enough on the IDS side, but now comes the fun part: the configuration of the PAM system to answer requests from IDS.
My experience tells me that this is where most of the issues will happen. Anybody trying to set up IDS to use PAM should try to really understand how PAM works. The fact is, most of the problems will be around this, but the error message returned by IDS will almost always be the same. So, if you don't have some knowledge about PAM, or don't have a collaborative system administrator (where are they when we need one?), you could end up losing a lot of time here. And sometimes it will be difficult for IBM Informix support to help, because some details will vary with the OS you're using and especially with the PAM modules.

For our tutorial we'll be using the pam_ldap module running on Linux. This module allows authentication against an LDAP server (AD is an LDAP server). This version of Linux (Fedora) uses a file for each PAM service, and these files are located in /etc/pam.d.

Since we specified ids_pam_service in $INFORMIXSQLHOSTS, we must create a file called /etc/pam.d/ids_pam_service with the following lines:

auth required pam_ldap.so config=/etc/ids_ldap.conf
account required pam_permit.so


This means the authentication will be done by the pam_ldap.so module, using a configuration file called /etc/ids_ldap.conf, and that the account services will be handled by a very simple module called pam_permit.so. The use of this latter module means we will not put any restrictions on the account validation.
So, as you probably expect, we have to create a file called /etc/ids_ldap.conf.
Mine has the following (adjust to your environment):

#Host name or IP address of the LDAP/AD server. MUST CHANGE!
host 192.168.131.2
#Base structure in the tree where the users are located. MUST CHANGE!
base CN=Users,dc=onlinedomus,dc=net
#Name of the user that connects to the LDAP/AD. MUST CHANGE!
binddn CN=Administrator,CN=Users,dc=onlinedomus,dc=net
#Password for LDAP connection (should obviously be encrypted and the connection should be made through secure communication)
#MUST CHANGE!
bindpw Evaluation1

#Some more parameters. DON'T NEED TO CHANGE
nss_initgroups_ignoreusers root,ldap
pam_login_attribute sAMAccountName
pam_filter objectclass=User
nss_base_passwd cn=Users,dc=onlinedomus,dc=net
nss_base_shadow cn=Users,dc=onlinedomus,dc=net
nss_base_group cn=Users,dc=onlinedomus,dc=net
nss_map_attribute uidNumber msSFU30UidNumber
nss_map_attribute gidNumber msSFU30GidNumber
nss_map_attribute cn sAMAccountName
nss_map_attribute loginShell msSFU30LoginShell
nss_map_objectclass posixAccount User
nss_map_objectclass shadowAccount User
nss_map_attribute uid msSFU30Name
nss_map_attribute userPassword msSFU30Password
nss_map_attribute homeDirectory msSFU30HomeDirectory
nss_map_attribute homeDirectory msSFUHomeDirectory
nss_map_objectclass posixGroup Group
pam_login_attribute msSFU30Name
pam_filter objectclass=User
pam_password crypt


We will need one final step, and this leads us to the biggest inconvenience of IDS PAM usage:
Currently, IDS requires the user to be known by the underlying OS.
This apparently contradicts the whole idea of using PAM, but there are reasons for this, and ways to reduce the inconvenience. First let's understand why this is still a requirement:
  • Contrary to other databases, when you interact with the OS as a consequence of running a procedure (SYSTEM command) or a function (Some external function written in C or Java for example), IDS will use the real user identity. If you're connected as pamuser and execute a stored procedure with a SYSTEM command in it, the command specified as SYSTEM argument must be run as user pamuser. So, in order to accomplish this, the user must exist in the OS context.
  • When you activate the SET EXPLAIN in a remote (from another host) connection, IDS will try to create the sqexplain.out file in your user's HOME directory. This is impossible if the user does not exist.
So, these are at least two reasons for this inconvenient behavior. Does this mean that the use of PAM is worthless? By no means, in my opinion, and this is why:

  • Although you may need to create the user in the OS, the user can be locked, or have an unknown password, or have /bin/false as its shell, etc. So, although it logically exists, it won't be able to log in or do anything on the database server
  • Having an external authentication method (like AD) means you don't have to manage or synchronize different passwords.
  • Currently you can configure most operating systems to use external LDAP users. So, by creating the user in the LDAP server you can also make the OS know about it. This may avoid the extra step of creating the user on multiple machines
  • With PAM you can combine several modules to gain incredible flexibility in user authentication. You may refuse connections at certain times, or you may refuse connections based on certain files, or allow one user in an instance, but not in another running on the same machine etc. You can't have this power with simple user authentication.

Besides all these reasons, IBM is working on a solution for this issue. In the future, you'll be able to use PAM without having the user recognized by the OS. IBM will make some changes to avoid or control the issues above. I cannot give you any time frame, but this will happen. IBM's development and product management teams understand that this is a caveat that shouldn't exist.

So, for now, we have to create a local user called pamuser. Let's lock its account and give it /bin/false as its shell:

useradd pamuser
usermod -L -s /bin/false pamuser


Now everything is ready. Let's start dbaccess to test it. Make sure you choose connect from the menu. We will need to provide a user and password after choosing the correct INFORMIXSERVER (cheetah_pam).
If everything works correctly you should see the list of databases. Any problem will raise an error:

1809: Server rejected the connection.

And this is the second problem that exists in IDS PAM authentication: the error messages are almost useless. To be honest, there isn't much IDS can do to help... If an error happens in the PAM stack it's hard for the application to get the error. It may depend on how the module was coded, what kind of authentication is being done, etc.
Your best bet is to check whether the module(s) you're using have any kind of debug option you can use. There are even some modules that do nothing except log the state of the authentication at the time they're called. I'm repeating myself, but to use this technology you really must try to understand PAM itself. The possibilities are enormous, but this framework may be complex to use.
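
For example, most builds of pam_ldap accept a debug option (an assumption - check your module's documentation), so a temporary debugging variant of our service file could look like this, while you watch the system log (typically /var/log/secure on Fedora/RHEL) during a connection attempt:

# /etc/pam.d/ids_pam_service - temporary debugging variant
auth required pam_ldap.so config=/etc/ids_ldap.conf debug
account required pam_permit.so
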

This ends the first part of this article. But there is still a lot to talk about:
  • What is challenge/response and why do you need to know about it (or how can I use "implicit" connections like in 4GL and ESQL/C with PAM)
  • How do we configure the instances for distributed queries (or how do I get rid of /etc/hosts.equiv and .rhosts)
  • What more IBM needs to improve (it's great, but not yet perfect...)
So, there will be at least one more article about this, but possibly two.
Until then, if you're interested in this, try to replicate the above tutorial or set it up in your test environments. If you have any trouble or find any error in the steps above, please don't hesitate to contact me.

References:

Monday, October 01, 2007

IIUG's feature survey: Your time to speak!

The International Informix Users Group (IIUG) has just released its monthly newsletter, and one of the news items is the availability of a survey about features for the next IDS version.

This is one of the opportunities that the anonymous customer has to influence the future of the product, so I personally consider it a must-do. I already did mine and I urge you all to do the same.

Let's not think that IBM will implement every feature that is requested. In fact I'd say some of the features surveyed by this online form are already being thought about, and others would probably be too difficult to implement in the next version... Nevertheless, it's a unique opportunity to tell IBM what you and your business really value, and what your real needs are.

The latest IDS version, v11, was really a customer driven version. Most of the new features were direct requests from several customers. Obviously the bigger you are the more influence you'll have... But it's also true that many small customers can push some features.

Ok... too much writing... Hopefully you're just waiting to finish reading this article before going to fill in your survey. Go on... and thanks!

You can find the survey here: http://www.iiug.org/url/vNext_survey.html

Regards.

Tuesday, September 04, 2007

IDS and Linux: stronger together

It's been a while since my last post. Summer time in the northern hemisphere usually means vacation time...

There is news related to IDS on Linux. Let's start with the IIUG announcement of IDS support on Debian and Ubuntu. This is a great announcement for the growing Ubuntu and Debian communities. Many people were already complaining (in comp.databases.informix, for example) about the lack of official support. The external IBM page for IDS on Linux ( http://www-306.ibm.com/software/data/informix/linux/ids.html ) has not been updated yet, but I've been told it should be soon.

Another great announcement comes from Informix Zone and tells us about the recent development of an IDS resource agent for the Linux-based open source cluster manager Heartbeat. The author, Lars Daniel Forseth, developed it as part of his degree work in Applied Computer Science at a German university. It's very refreshing to see IDS getting attention in the academic environment. Anybody involved with Linux and IDS may want to consider testing it and giving feedback to the author.

Friday, July 06, 2007

Cheetah in the spotlight!

As announced, today July 6, IBM Informix Dynamic Server 11.10xC1 can be downloaded from the Passport Advantage site.

So, any customer with a support contract can download and start using it.
This is probably the most significant release in several years, and represents a lot of work from a lot of people. Hopefully, this release will keep in line with IDS 10's stability, performance and reliability. I also hope that the rhythm of improvements will continue and contribute to strengthening Informix's market position.

Media and documentation are planned for July 30, according to the US product announcement.

Don't hesitate to contact IBM and your user groups if you have any feedback you think is useful. Many of the new features are customer driven, and represent answers to issues or requests made by customers. Nobody knows better what should be put in there than the customers. Your feedback can really influence some future features and contribute to the continuous improvement of IDS.

By the way, IIUG has put up an open survey where you can vote on favorite topics for its 2008 Conference. Whether or not you're planning to be there, your feedback is interesting.

Now, stop browsing and start downloading :)

Sunday, June 17, 2007

Cheetah spot by spot: C style comments in dbaccess

Another quick note on another minor (from my point of view) feature of IDS 11.10.
Beginning with v11.10 dbaccess will accept C language style multi-line comments.

As such, you can write things like:

SELECT * /* this is the
select clause */
FROM customer /* This is the FROM clause */


This will allow for better compatibility with other RDBMS SQL scripts, and it's also SQL-99 compliant.

The other forms of comments recognized by dbaccess will still be available:
  • "-- comments" for single line comments
  • "{ comments }" for multi-line comments
The introduction of this commenting style also affects the way optimizer directives can be specified.
The /*+ ... */ syntax is also accepted in the new version. So, you could use this:

SELECT /*+ FULL ( customer ) */ *
FROM
customer
WHERE
customer_num = 101;


to force the query to make a full table scan.

Wednesday, June 13, 2007

Cheetah spot by spot: Multiple triggers for same event/table

Welcome to another IDS 11.10 (Cheetah) new feature close-up. This time we'll take a look at the possibility of creating multiple triggers on the same table and same event.
Prior to IDS 11 you could have only one INSERT/UPDATE/DELETE/SELECT trigger for each table. That means that if you wanted to do more in one of the events for which you already had a trigger, you would have to change the existing trigger definition.

Now, with Cheetah, you can simply create another trigger. When the event happens both triggers will be fired and their actions will be done. It's a simple concept, but you should be aware of one important aspect: The sequence of execution of the triggered actions.

There is no guarantee as to which trigger will execute first, but all the BEFORE actions will be done before all the FOR EACH ROW actions, and finally the AFTER actions will be executed. So, the actions have a pre-defined order of execution, but you can't tell which trigger will be fired first. I've set up a test case that tries to show these issues. Let's take a look:

1 drop procedure trigger_change;
2 drop table test;
3 drop table action_log;
4 drop procedure trigger_proc;
5 CREATE TABLE test
6 (
7 col1 INTEGER,
8 col2 CHAR(20)
9 ) LOCK MODE ROW;
10
11 CREATE TABLE action_log
12 (
13 trig_name CHAR(20),
14 trig_event CHAR(20),
15 trig_action CHAR(20),
16 trig_seq SERIAL
17 ) LOCK MODE ROW;
18
19 CREATE PROCEDURE trigger_change () REFERENCING OLD AS old NEW AS new FOR test
20 LET new.col1 = 0;
21 END PROCEDURE;
22 CREATE PROCEDURE trigger_proc (t_name CHAR(20), t_action CHAR(20))
23 DEFINE t_event CHAR(20);
24 LET t_event = NULL;
25 IF t_action = 'FOR EACH ROW'
26 THEN
27 SELECT
28 CASE
29 WHEN DELETING THEN "DELETING"
30 WHEN UPDATING THEN "UPDATING"
31 WHEN INSERTING THEN "INSERTING"
32 WHEN SELECTING THEN "SELECTING"
33 END
34 INTO t_event
35 FROM sysmaster:sysdual;
36 END IF
37 INSERT INTO action_log VALUES (t_name, t_event, t_action, 0);
38 END PROCEDURE;
39
40 CREATE TRIGGER ti_1_test INSERT ON test
41 BEFORE
42 (
43 EXECUTE PROCEDURE trigger_proc('ti_1_test','BEFORE')
44 )
45 FOR EACH ROW
46 (
47 EXECUTE PROCEDURE trigger_proc('ti_1_test','FOR EACH ROW'),
48 EXECUTE PROCEDURE trigger_change() WITH TRIGGER REFERENCES
49 )
50 AFTER
51 (
52 EXECUTE PROCEDURE trigger_proc('ti_1_test','AFTER')
53 );
54
55 CREATE TRIGGER ti_2_test INSERT ON test
56 BEFORE
57 (
58 EXECUTE PROCEDURE trigger_proc('ti_2_test','BEFORE')
59 )
60 FOR EACH ROW
61 (
62 EXECUTE PROCEDURE trigger_proc('ti_2_test','FOR EACH ROW')
63 )
64 AFTER
65 (
66 EXECUTE PROCEDURE trigger_proc('ti_2_test','AFTER')
67 );
68
69 INSERT INTO TEST VALUES(1,"one...");
70 SELECT * FROM test;
71 SELECT * FROM action_log;


The execution result is:

cheetah@PacMan.domus.online.pt:informix-> dbaccess stores_demo multiple_triggers.sql
Your evaluation license will expire on 2007-11-24 00:00:00

Database selected.


Routine dropped.


Table dropped.


Table dropped.


Routine dropped.


Table created.


Table created.


Routine created.


Routine created.


Trigger created.


Trigger created.


1 row(s) inserted.



col1 col2

0 one...

1 row(s) retrieved.



trig_name trig_event trig_action trig_seq

ti_1_test BEFORE 1
ti_2_test BEFORE 2
ti_1_test INSERTING FOR EACH ROW 3
ti_2_test INSERTING FOR EACH ROW 4
ti_1_test AFTER 5
ti_2_test AFTER 6

6 row(s) retrieved.


Database closed.

So, let's make a line by line examination of the script. Hopefully, at the end, and after looking at the results, you'll get a clear picture of what's new in this area.

Lines 1 to 18 just create two tables: the "test" table will be used as the triggers' target and "action_log" will save the name of the trigger (in a minute...), the event that fired it (INSERTING, DELETING...), the action being executed (BEFORE, FOR EACH ROW, AFTER) and the sequence in which they were called.

Lines 19 to 21 create what is called a triggered UDR. This UDR must be created in SPL, but it can call other UDRs in other languages (C, Java...). The UDR must contain the REFERENCING and FOR clauses to be considered a triggered UDR. The REFERENCING clause is similar to the same clause in triggers. The FOR clause defines which table's triggers can call this UDR in the FOR EACH ROW clause. I've included this to show that you can change the values that the user process supplied, for example as the VALUES of an INSERT statement. So you can actually replace the values given by the client process.

The next lines (22-38) contain another procedure that will be called by the triggers and receives the trigger name and action. This procedure contains two new things:
  1. It uses the new operators "DELETING", "INSERTING", "UPDATING" and "SELECTING". These new operators can be used in the context of a FOR EACH ROW action and return true if the trigger event is respectively DELETE, INSERT, UPDATE and SELECT
  2. It also uses a new IDS 11.10 feature, the sysmaster:sysdual table. This is an Oracle compatibility feature. You could create a "dual" table in any Informix version, so this isn't really a big feature...
This procedure logs the data in the action_log table, inserting a zero into the trig_seq field. This forces IDS to generate a value by incrementing the last one inserted, so we get the execution order of the triggered actions by looking at this field.

Lines 40-68 include the two trigger definitions. They are both INSERT triggers and they call the second procedure in each of their actions. The first one also calls the first procedure (trigger_change) to show that the inserted value can be changed.

Line 69 just does an INSERT on the test table with the values 1 and "one...". This will fire both triggers.

Line 70 shows the content of the test table after the previous INSERT, and line 71 shows the contents of the action_log table.

So, what do we see in the results?:
  1. The value inserted in the test table was actually "0" and not "1"
  2. The sequence column of the action_log table shows that the BEFORE actions of both triggers were executed first, then the FOR EACH ROW actions, and finally both AFTER actions.
In this case the triggers were executed in the creation order, but as stated in the manual we can't assume this. What is documented and can be assumed is that the order of execution of the actions is (it couldn't be any other way...):
  1. First, all the BEFORE actions of all the triggers for the event
  2. Then all the FOR EACH ROW actions of all the triggers for the event
  3. In the end, all the AFTER actions of all the triggers for the event
We should also keep in mind that we can write triggered UDRs, and in these we can actually change the new values. Finally, we can also use the new operators to let the procedure code know which event we are running (INSERT/UPDATE/DELETE/SELECT).

Why would we want to use this feature? Well, for a start, it can be handy in application conversions from other RDBMSs where this is available and the existing application already uses it. Besides this, DBAs can use it to implement some triggered behavior in "black box" applications. By this I mean applications where you don't have any influence on the development phase. You can create your own triggers without interfering with existing ones. This can be used for auditing purposes or for some data extraction, based on changes made to the data fields.

So, in conclusion, the usefulness of this feature will depend largely on your environment requirements, but in some scenarios this can be a very valuable tool.

I think that if you look at the example above you'll see the most important aspects of this feature. So, take your time, and if any doubts remain don't hesitate to check the URL references to the manual and other sources. You can also leave a comment of course.

References:

Thursday, June 07, 2007

Cheetah spot by spot: Read all about all the spots!

This is not exactly an article about another Cheetah feature. Instead of bothering to write about another one myself, I'll point you to someone who has already done it:

Carlton Doe, a very well-known IDS specialist and author of several books and user group presentations, has just written a new book about IDS 11.10.
The book should be around on time for IDS 11 GA. You can see a reference to it on the author's website.

I don't have any information about the book structure or contents, but given the author's history and background, I sure hope this will be a great and informative book where all the spots are covered.

Cheetah spot by spot: Derived tables in FROM clause

Here we are for a very quick note on another usability feature of IDS 11.10 (Cheetah).
This time we'll look at derived tables in the FROM clause of SELECT statements. This refers to the possibility of "creating" tables on the fly as subqueries in the FROM clause.

If you work in an environment where the developers also work with other RDBMS software you've probably had a few situations where they ask you: How can I do it with Informix?
There were two answers for this:

  1. If using IDS 7.31, we would have to use temporary tables
  2. If using IDS 9.40+ we could use the collection derived tables: TABLE(MULTISET(SELECT ... FROM ... WHERE ...))
Usually neither of these would be welcome. Sometimes we have to run the query in environments where the use of temp tables isn't acceptable or desirable (ETL tools, for example, where we might not have control over connection lifetime, so we cannot guarantee that the temporary tables will be there when needed). The second option looks strange to an Informix outsider, and in some situations it may force the query to be different depending on the RDBMS in use - this is not desirable for application developers who want to be RDBMS neutral.

So, once more, IBM decided to ease our lives, and implement what is also an ANSI standard: Derived tables. Let's see an example:

select
o1.*
from
orders o1,
(
select max(order_num) , c.customer_num
from orders o0, customer c where c.customer_num = o0.customer_num
group by 2
) as max_orders(order_id,customer_id)
where
o1.order_num = max_orders.order_id

This is possible in IDS 11.10. In IDS 10.00 we could use:

select
o1.*
from
orders o1,
table(multiset(
select max(order_num)::integer as order_id, c.customer_num::integer as customer_id
from orders o0, customer c
where c.customer_num = o0.customer_num group by 2
)) max_orders
where
o1.order_num = max_orders.order_id

So, it was equally possible, but the syntax was non-standard and looked more complex.
Note: The cast to integer (::integer) is required because the fields are of type SERIAL, which is not allowed in MULTISETs.

For those who use this form: when you upgrade to IDS 11 you may want to consider rewriting your queries. According to the SQL syntax guide, the performance of a query written in the new format is "at least as good as that of a query that uses the Informix-extension TABLE (MULTISET ( SELECT ... )) syntax to specify an equivalent derived table in the FROM clause". This suggests that there may be performance advantages to the new syntax. It is probably related to the fact that IDS does not materialize derived tables that are considered "simple"; clauses like aggregates, ORDER BY or unions mark these tables as complex.

So, this is a very simple feature with usability, ANSI compliance and application portability in mind.

Monday, May 28, 2007

Cheetah spot by spot: DIRECT_IO

Back to another article about the new features of IBM IDS 11.10 (code named Cheetah).
This will be a short article about a new feature that can end an old discussion. I'd bet that at least once, every Informix DBA has had an argument about the advantages and disadvantages of using raw devices vs. cooked files.
Raw devices are hard disk partitions, or at least represent a physical device somewhere (either a local storage disk or a NAS pseudo-device). They are not used by any OS filesystem. The access is as direct and fast as possible: there is no filesystem layer or cache involved, and the Informix instance requests the I/O operations directly from the underlying OS. Their advantages can be summarized as:

  • They're fast
  • You don't waste memory in the FS cache (normally useless since IDS uses its own cache)
  • They're the most reliable way to store your data. Since there is no FS layer, it's not possible to have FS corruption
  • They use Kernel Asynchronous I/O (KAIO) by default. This allows the I/O operations to be executed in the kernel context. Simple AIO is asynchronous, but requires a user context that needs to be set up. IDS takes further advantage of KAIO because it is done on the CPU VP threads, so you avoid process context switching (from CPU VP to AIO VP)
Cooked files are files you create on a filesystem and you use as Informix chunks. Their advantages are more or less the disadvantages of the raw devices:
  • They're easy to use. Just "touch chunk;chown informix:informix chunk;chmod 660 chunk" and you're ready
  • You can't accidentally create a filesystem over one of your raw devices
  • Nowadays several filesystems have options to bypass the memory cache
These arguments have been more or less persistent over time, so the decision has always depended largely on your own preferences. If you could live without top-of-the-line performance you might consider cooked files. If you needed maximum performance then you'd probably choose raw devices.

Once again, the R&D engineers have been alert... The latest features available in the newest filesystems and OSes led them to make IDS use a new capability. It's called DIRECT_IO, and this is also the name of a new $ONCONFIG parameter. If you set it to 1, IDS will attempt to use the feature if it's available for the OS and the filesystem you're using, for the page size you choose.
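
A minimal sketch of the $ONCONFIG entry (as far as I know the parameter is read at instance start-up, so plan for a restart):

# $INFORMIXDIR/etc/$ONCONFIG
DIRECT_IO 1      # 0 = disabled (default), 1 = use direct I/O on cooked-file chunks when the OS/FS supports it
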

Whenever this is possible, it means you'll get the following on your chunks I/O:
  • Buffer bypassing: Read/writes go directly to disk
  • KAIO is used: You'll get a serious performance boost
  • Concurrent I/O: Bypassing FS inode locking, gives you another performance boost
If you try to use it, be alert to any online.log messages you may get. I've tried this on a recently installed RHEL 4 system and it worked as expected: no messages in the online log, and I could see KAIO threads in the onstat -g ath output.
In a virtual machine running Fedora I got:

19:45:39 libaio.so.1: cannot open shared object file: No such file or directory
19:45:39 Please install libaio package. See machine notes.
19:45:39 Warning: Kernel Asynchronous I/O is disabled.


After a bit of yum'ing I installed libaio and tried again. This time with no errors and an onstat -g ath shows:


cheetah@PacMan.domus.online.pt:informix-> onstat -g ath
Your evaluation license will expire on 2007-11-24 00:00:00

IBM Informix Dynamic Server Version 11.10.UB7TL -- On-Line -- Up 00:00:59 -- 88872 Kbytes

Threads:
tid tcb rstcb prty status vp-class name
*2 47e1dd98 0 1 IO Idle 3lio lio vp 0
*3 47e32970 0 1 IO Idle 4pio pio vp 0
*4 47e47970 0 1 IO Idle 5aio aio vp 0
*5 47e5c970 0 1 IO Idle 6msc msc vp 0
6 47e89bd8 478b2018 3 sleeping secs: 1 1cpu main_loop()
*7 47e5cad0 0 1 running 1cpu soctcppoll
*8 47f76550 0 2 sleeping forever 1cpu soctcplst
9 47f769e8 478b25e0 1 sleeping secs: 1 1cpu flush_sub(0)
*10 47f76b48 0 3 IO Idle 1cpu kaio
11 4807d080 478b2ba8 2 sleeping secs: 1 1cpu aslogflush
12 4807d828 478b3170 1 sleeping secs: 4 1cpu btscanner_0
*28 4864b018 478b42c8 3 sleeping secs: 1 1cpu onmode_mon
*33 485c6080 478b3d00 1 sleeping secs: 14 1cpu dbScheduler
*34 485c58a0 478b4890 1 sleeping forever 1cpu dbWorker1
*35 482be7d0 478b4e58 1 sleeping forever 1cpu dbWorker2


So, this is good... What are the catches? There are a few, but most of them aren't related to IDS itself. They have more to do with the OS and filesystem:
  • This only works on Unix/Linux (Windows has intrinsic differences)
  • The FS must support this functionality
  • It may depend on the page size you choose (some may not be supported in the FS)
  • It can't be used for temporary dbspaces
If you decide to use it, you should check for any error messages and also look for KAIO threads. If the system starts to use KAIO for your normal chunks you'll probably want to reduce your AIO virtual processors. You may also want to use AUTO_AIOVPS (an ONCONFIG parameter that allows IDS to auto-configure the number of AIO virtual processors if it finds that the number you defined is not keeping up with the I/O needs)

So, this feature can really make a difference in the old argument of cooked vs. raw chunks. You may decide that it's possible to get the best of both worlds: performance and manageability.

References:

Tuesday, May 01, 2007

Cheetah spot by spot: Better update statistics

I think I will get many of you to agree with me if I say that the only thing we really know about statistics is that they're really important.

Everything else seems a bit unclear and remains more or less a mystery. This article won't clear your doubts about update statistics, since I'll focus only on Cheetah's new features.

If you want to learn more I advise you to check these resources:

I have to agree that something should be done about the documentation of the UPDATE STATISTICS command given its complexity and its importance for achieving maximum performance from the RDBMS.

The information gathered by the UPDATE STATISTICS is the fuel that drives the query optimizer. You can't expect good choices for query plans without having good and up to date statistics about your data.

I'll make a mental note for a future article about these issues. For now, and since this is an article in the Cheetah Spot by Spot series, let's just see what's new.

Most of you will certainly have some contact with your company's or clients' development teams, and if that's the case, you've probably found some situations where they are confident they've created the correct indexes, but nevertheless the query performance is poor.

Many times they simply don't know, or forgot, that they have to UPDATE STATISTICS before the optimizer picks up the index.

If you've lived through this situation you've surely wondered why the engine doesn't do this on index creation... Well, that's a good question, and it led the IBM Informix R&D team to change this behavior.

Statistical information is stored in systables, sysfragments, sysindices, sysdistrib and syscolumns. Cheetah brings improvements in the following situations:

  • More information about statistics (more columns in the above tables)
  • More automatic statistical data collected when manipulating Indexes and temporary tables
  • More options for the UPDATE STATISTICS MEDIUM to define the sampling size
  • More information when using the SET EXPLAIN instruction

Let's see these points in detail.


More information about statistics

Cheetah will have more columns for storing information about UPDATE STATISTICS, namely (a small query example follows the list):

  • systables.ustlowts : Time stamp (DATETIME YEAR TO FRACTION(5)) when the last UPDATE STATISTICS LOW was run for this table
  • sysdistrib.constr_time : Time stamp (DATETIME YEAR TO FRACTION(5)) when this distribution was collected
  • sysdistrib.smplsize : real sample size used for this distribution's calculation
  • sysdistrib.rowssmpld : real number of rows sampled for this distribution
  • sysdistrib.ustnrows : number of rows in the table when the distribution was calculated
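
As a quick illustration, the new timestamp can be queried directly (a minimal sketch; 'customer' is just an example table from stores_demo):

-- when was UPDATE STATISTICS LOW last run for this table?
SELECT tabname, ustlowts
FROM systables
WHERE tabname = 'customer';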

More automatic statistical data collected when manipulating Indexes and temporary tables

In Cheetah, the engine will do the equivalent of running UPDATE STATISTICS LOW when an index is created. It will also compute histograms for the first index column. The distributions will not be calculated (this does not affect the LOW mode part) if any of the following holds:

  • The index's leading column is a UDT (either built-in or not)
  • It's a functional Index
  • It's a VII (Virtual Index Interface) index

The distributions calculated are equivalent to what UPDATE STATISTICS generates in MEDIUM mode, with a resolution of 1% for tables of fewer than a million rows, and 0.5% for larger tables.

This means that without having to do anything else, after the index creation, the optimizer will have information about it, so the index can and will be used (if it's good for the query plans)

The following improvements were made on temporary table statistics:
  • The number of rows and pages will be maintained automatically
  • The creation of indexes will generate distributions and statistics for the table and index

More options for the UPDATE STATISTICS MEDIUM to define the sampling size

Another new feature is a new clause for UPDATE STATISTICS MEDIUM: SAMPLING SIZE. This clause can be used with the resolution and confidence clauses and allows you to establish a minimum number of rows to be sampled. For more information about the sampling size in MEDIUM mode please consult the above reference to John Miller's article on UPDATE STATISTICS. Many people don't know that the sample size is pre-defined given the parameters of resolution and confidence. From now on, we can change this with the new SAMPLING SIZE option. The number given has two meanings (see the example after the list):

  • When it's less or equal to 1: It's a percentage of the table rows
  • When it's greater than 1: It's the real number of rows to be used for sampling
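
A minimal sketch of the new clause (table names are from stores_demo; check the SQL syntax guide for the exact clause ordering in your release):

-- sample at least 50000 rows (a value greater than 1 is an absolute row count)
UPDATE STATISTICS MEDIUM FOR TABLE orders SAMPLING SIZE 50000 RESOLUTION 1.0;

-- a value of 1 or less is interpreted as a percentage of the table rows
UPDATE STATISTICS MEDIUM FOR TABLE customer SAMPLING SIZE 0.5 RESOLUTION 1.0 DISTRIBUTIONS ONLY;
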
More information when using the SET EXPLAIN instruction

Although this is not specific for the UPDATE STATISTICS statement, the SET EXPLAIN statement was also improved. In Cheetah you can define the pathname for the SQEXPLAIN file. This is achieved by the statement

SET EXPLAIN FILE TO "/your/favorite/path/with/fs/permissions"

Another improvement refers to a statement that already existed in v10 but was undocumented:

SET EXPLAIN STATISTICS

This will give you much more information about the engine's behavior when solving your queries. You can turn off this feature with an ONCONFIG parameter called EXPLAIN_STAT: if set to 0 it disables the enhanced EXPLAIN output. As with more and more ONCONFIG parameters, you can change it dynamically with

onmode -wm EXPLAIN_STAT=1

onmode -wf EXPLAIN_STAT=1

This also has implications for the onmode -Y command. You can now give 3 values:

  • onmode -Y 0 SESSION_ID
    disables EXPLAIN for session SESSION_ID
  • onmode -Y 1 SESSION_ID
    enables EXPLAIN with STATISTICS for session SESSION_ID
  • onmode -Y 2 SESSION_ID
    enables EXPLAIN without STATISTICS for session SESSION_ID

One final note: Although I've written above that the automatic statistics will be gathered for CREATE INDEX and temporary tables, in reality there are more instructions that will benefit from this:

CREATE INDEX ...

ALTER FRAGMENT FOR TABLE ... INIT

ALTER FRAGMENT FOR INDEX ... INIT

ALTER TABLE ADD UNIQUE CONSTRAINT

ALTER TABLE ADD PRIMARY KEY...

ALTER TABLE ADD FOREIGN KEY...

That's all for now. But this is an area that will probably see new features in the future.

I'd like to end with two references for utilities for handling update statistics.
Art S. Kagel, a distinguished participant in the Informix community, has developed an extremely useful utility called dostats. It will probably handle everything you want to do related to Informix statistics and works with versions 5 (Online), 6 (does it exist?!), 7, 9, 10 and 11. He recently updated it to handle the SAMPLING SIZE feature of IDS 11.10.

You can get it on IIUG's software archive in a package called utils2_ak. It's written in ESQL/C, so you'll have to compile it for your platform using CSDK or c4gl (7.20+).

I have also made two SHELL scripts (dbs_updstats and tbl_updstats) that handle statistics for a whole database and for single tables. These scripts are based on older scripts that were made publicly available and modified by the Informix technical team in Portugal (namely António Lima, myself and possibly others). The current versions are very different, but I based my work on their efforts, so a big "thanks!" is due.

These scripts don't have all the functionality of Art Kagel's dostats, but being simple SHELL scripts makes them easy to pass along and maintain. The current versions have a functionality you won't find in the current dostats. I called it "quick low mode": it makes the LOW mode collect only systables data, which prevents the cleaning of indexes that LOW mode normally does. This cleaning can take a very long time. Be aware that although this mode makes it run almost instantly, its speed is achieved by not doing the hardest part of the work.

If you're interested you can find these scripts on my homepage.

Tuesday, March 27, 2007

Cheetah spot by spot: ontape to directories

Welcome to the third article about Cheetah new features. This time I'll check ontape's ability to use directories for backup and restore.

First let's see why this was implemented and what we get from using it.
As most of us know, ontape is still widely used in Informix shops. Although OnBar is an advanced tool that allows integration with enterprise backup and restore systems (storage managers), customers like ontape simplicity and independence.

With ontape you don't need anything else to manage Informix backups, except the devices (disk files or tape drives in pre-Cheetah versions). You don't need more expensive licenses, complex storage systems or any non-Informix knowledge.
But of course you lose some features. ontape works in serial mode, so the performance is not optimal. Nevertheless, for small to medium sized systems it is a good (if not optimal) choice.

The limitations or annoyances of ontape usage can be summarized as:



  • If we use tape drives we need two drives: one for logical logs and one for normal backups
  • If we use tape drives and we have several instances in the same host we may need too many drives or we must carefully schedule the backups (system and logical logs)
  • If we use files we have to manage them and ensure they are renamed between different backups
  • The use of tape drives implies similar devices on different machines if we need to exchange backup images (you can use remote drives but you'll get into network bottlenecks)
  • ontape generally needs user input. It is possible, but complex and error prone, to script answers to ontape's questions

Well, the folks at R&D decided to give us new functionality that helps solve these problems. It's now possible to make ontape backups (logical logs and other backups) to directories. These backups are unattended, so we don't need complex scripts, and this keeps ontape's simplicity.

To use this feature we set LTAPEDEV and TAPEDEV to existing, valid directory paths. We can use the same path for both parameters, and we can use network mounted directories (if we can afford the bandwidth).
Then we just use the normal ontape commands. IDS will generate the filenames using the following structure:


<hostname>_<servernum>_<backup-type>



  • HOSTNAME is the server name where the Informix instance is running
  • SERVERNUM is the SERVERNUM parameter (unique within each host)
  • BACKUP TYPE can be: L0 (level 0), L1 (level 1), L2 (level 2) or Log<##########> (logical log number ##########)

This naming scheme ensures there are no conflicts, even when you use the same paths for different instances and even for different hosts (network paths).
The system also seems to be smart enough to rename an old file if a new backup needs to write a filename that already exists (from a previous backup command): it appends the original date/time stamp. This preserves old backup images, but it will of course consume space.
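As a rough sketch (the paths below are just examples), using the feature is a matter of pointing the parameters at directories and running the usual commands:

# in the instance's ONCONFIG file (example paths)
TAPEDEV   /backup/ids/full
LTAPEDEV  /backup/ids/logs

# level 0 backup, written to /backup/ids/full, without operator prompts
ontape -s -L 0

# continuous logical log backup to /backup/ids/logs
ontape -c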

So, with this feature, we can avoid reserving physical tape drives for our instances, if we have enough filesystem storage. The fact that we can use network mounted directories will also simplify restores on different hosts. This can be useful for HDR/RSS setups and also for using another new ontape feature: continuous logical log restore.
But hopefully, this will be covered in a future article.

Regards.

Thursday, March 01, 2007

Cheetah spot by spot: sysdbopen/sysdbclose

Continuing the series on Cheetah's new features, it's time to talk about sysdbopen and sysdbclose.
These are DBA-controlled procedures that are run by users when they connect to or disconnect from a database.

These procedures are regular SPL procedures but with a few particularities:

  • Only DBAs can create or drop them
  • The owner has more meaning than usual. If you create a procedure called myuser.sysdbopen, only "myuser" will run it when connecting to the database.
    If you create a procedure called public.sysdbopen, then all users that don't have a matching user.sysdbopen procedure will run public.sysdbopen
So, why is this useful? Well... I can point out several uses, but it will really depend on your environment:
  • Can be used to do some kind of logging for connects and disconnects
  • Can be used to change the isolation level when changing the application code is not an option
  • Can be used to change the LOCK MODE in the same situations as above
  • Can be used to stop the creation of new database sessions (inhibit connections during a maintenance period)
  • Can be used to restrict certain users from connecting at certain hours, or from certain hosts...
  • sysdbclose can be used to gather session statistics (and save them in some history table)
Some of the above are obviously very useful, some less so, and you can certainly find many more useful things to do with it.
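To make it concrete, here's a minimal public.sysdbopen sketch (the settings are just examples of the kind of thing you might do, and remember only a DBA can create it):

CREATE PROCEDURE public.sysdbopen()
  -- example: non-blocking committed reads for every session (Cheetah feature)
  SET ISOLATION TO COMMITTED READ LAST COMMITTED;
  -- example: wait up to 10 seconds for locks instead of failing immediately
  SET LOCK MODE TO WAIT 10;
END PROCEDURE;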

This mechanism was made available in XPS (the Informix DW engine) a long time ago, but it took far too long to be ported to IDS. Flexibility is the keyword here: it improves usability and adds functionality.

One final note. There is an environment variable (IFX_NODBPROC) that can be used to stop the server from executing these procedures, BUT it can only be used by DBAs. Since a DBA could simply drop the procedure anyway, this restriction keeps the mechanism safe.

It would be nice to have sysdbopen/sysdbclose procedures associated with database roles. Currently this is not possible, but you can use the generic public.sysdb* procedures and put some logic in them so they behave differently for different default roles (just an example).

For now, let's just appreciate the fact that this was finally implemented.

To summarize, this is really a simple feature, but one that really brings more flexibility and manageability for the DBAs.

Regards.

Monday, February 26, 2007

Cheetah spot by spot: LAST COMMITTED READ

This is the first of a series of articles about the announced features of IDS v11.10, code named Cheetah and currently available as Beta release. I'll try to show the features in random order. Being the first doesn't mean it's more or less important than other features.

LAST COMMITTED READ is a new isolation level that is available in Cheetah. As you probably know, the isolation levels define how "isolated" one session's transactions are from other sessions. If you're not familiar with the subject I suggest you search online. Wikipedia has a nice article on this.

If you've ever done any programming against Informix databases you surely understand the subject, and you know that the two most used isolation levels are COMMITTED READ (the default for logging databases) and DIRTY READ. In short, their definitions are:

  • COMMITTED READ
    In this isolation level, your session will "honour" other sessions' locks. The session's behaviour when it finds a locked resource (table row, index key) depends on the LOCK MODE you selected. If using the default (LOCK MODE NOT WAIT), you'll get an error immediately. If using a timed wait (LOCK MODE TO WAIT [x seconds]), it will wait up to that number of seconds for the lock to be released. If the lock remains after the timeout, you'll get the same error as with lock mode NOT WAIT.
    This is equivalent to the ANSI READ COMMITTED isolation level
  • DIRTY READ
    In this mode, your session won't "see" any locks. It will not give you errors or make you wait. It will read whatever value is available at the moment. It is called "dirty" because you may get values that aren't committed and possibly never will be committed in the database.
    This is equivalent to the ANSI READ UNCOMMITTED isolation level
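For reference, these traditional modes and the lock wait behaviour are chosen per session with plain SQL statements such as:

SET ISOLATION TO COMMITTED READ;  -- the default in a logged database
SET LOCK MODE TO WAIT 10;         -- wait up to 10 seconds on locked rows
SET ISOLATION TO DIRTY READ;      -- read uncommitted data, never block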
Most of the time you have probably used COMMITTED READ and we usually try to minimize the time we're holding locks. This way we minimize the number of lock waits and lock related errors. Sometimes, depending on the kind of data and what we're doing with it we can use DIRTY READ.

Note also that this lock waiting and/or lock ignoring happens when we're SELECTing data. Obviously, if you want to change the data, even in DIRTY READ you'll have to wait for any lock holder to finish his transaction and unlock the resource.

So, what does Cheetah bring to this issue? A new isolation level called "COMMITTED READ LAST COMMITTED".
This isolation level is similar to COMMITTED READ in the sense that it only returns committed data, but with the great advantage that your reads will never block. If your session hits a locked resource, the IDS engine will retrieve the value that was there before the current lock holder made any change. That's where the name "LAST COMMITTED" comes from...

This information is always available because if the lock holding transaction tries to ROLLBACK WORK, the original value will have to be restored.

Some of you with knowledge of other RDBMSs like Oracle or Postgres may be wondering what's new about this... Well, historically there have been two types of RDBMS: versioning (each transaction gets a "version" number that marks all the data it changes) and non-versioning. Versioning RDBMSs have always had this sort of isolation level, because each transaction sees a consistent "current image" of the database. The non-versioning RDBMSs usually didn't implement it: they have more lightweight reading primitives that read the data directly (not before-images marked with earlier version stamps), and this causes the locking conflicts.
Also, some versioning RDBMSs don't support ANSI READ UNCOMMITTED, because the way they read doesn't allow it.

So, it has been a kind of trade-off. But currently, most of the non-versioning databases implement this kind of COMMITTED isolation without blocking the readers.

Ok... But how do we use it?!

Well, the obvious answer is by issuing the "SET ISOLATION TO COMMITTED READ LAST COMMITTED". After this statement your session will be working in this new mode and it won't get blocked by any row or index locks while reading.

But this would force you to change your existing application code... and in some cases you may not have access to it. So what can be done? Ah!... The nice people at R&D did think about this, and you have three options:

  1. There is a new ONCONFIG parameter called USELASTCOMMITTED
    You can set it to several values and, depending on the value, the database server will automatically "promote" the specified isolation level to the new mode. The values are:
    1. 'NONE'
      This is the same as clearing it. No isolation level will be promoted. This is also the default value
    2. 'COMMITTED READ'
      The transactions with traditional COMMITTED READ will work in the new isolation level
    3. 'DIRTY READ'
      The transactions with traditional DIRTY READ will work in the new isolation level
    4. 'ALL'
      The transactions with traditional DIRTY READ and the transactions with traditional COMMITTED READ will work in the new isolation level

  2. You can also use USELASTCOMMITTED as a client side environment variable. The accepted values are the same (see the sketch after this list)

  3. If you want this promotion to happen only for specific users, you can use the new feature that allows you to force the execution of a stored procedure (sysdbopen) when a user connects to a database. Inside this procedure you could use the SET ISOLATION... statement
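As a sketch of the first two options (check the release notes for the exact value syntax and quoting), the promotion can be enabled server-wide in the ONCONFIG file or per client through the environment:

# server wide, in the ONCONFIG file
USELASTCOMMITTED COMMITTED READ

# or per client, before the application starts (bash syntax)
export USELASTCOMMITTED="COMMITTED READ"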
Some limitations or considerations:
  • This feature requires that tables use ROW level locking. Tables with PAGE level locking will inhibit the feature in that transaction.
  • If a session using this new isolation level makes a distributed query to a remote database server that doesn't support the feature (i.e. any pre-v11.10 server), it will fall back to COMMITTED READ
  • If the table is a RAW (non logged) table the feature will not work
  • If the table has some user-defined columns and these don't support the new mode, the feature will not work

Now, let's see an example. To test this you'll need two sessions. We'll use the customer table from the stores_demo database. Please make sure it's defined to use row level locking.
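If it isn't, row-level locking can be set with a simple ALTER TABLE before you start:

ALTER TABLE customer LOCK MODE (ROW);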

Session one:
  1. connect to the stores_demo
  2. show the customer data (fname, lname) for customer number 101
  3. update this data to "John Doe" inside a transaction and keep the transaction open



cheetah@PacMan.domus.online.pt:informix-> dbaccess stores_demo -

Database selected.

> select fname, lname from customer where customer_num = 101;


fname lname

Ludwig Pauli

1 row(s) retrieved.

> begin work; update customer set fname='John', lname='Doe' where customer_num=101;

Started transaction.


1 row(s) updated.

>





Now, in session 2:
  1. Set the isolation to committed read (the default) and select the data: an error is returned
  2. Set the isolation to dirty read: we get the "current" data, as changed by session 1
  3. Set the isolation to the new mode: we get the original data (from before session 1's update)



cheetah@PacMan.domus.online.pt:informix-> dbaccess stores_demo -

Database selected.
> set isolation to committed read; select fname, lname from customer where customer_num = 101;
Isolation level set.


fname lname


244: Could not do a physical-order read to fetch next row.

107: ISAM error: record is locked.
Error in line 1
Near character position 90
> set isolation to dirty read; select fname, lname from customer where customer_num = 101;

Isolation level set.



fname lname

John Doe

1 row(s) retrieved.

> set isolation to committed read last committed; select fname, lname from customer where customer_num = 101;

Isolation level set.



fname lname

Ludwig Pauli

1 row(s) retrieved.

>


One final note: recent versions of IDS have an ONCONFIG parameter (DEF_TABLE_LOCKMODE) where you can configure the default lock mode for tables created without the "LOCK MODE" clause. The current Cheetah Beta drop apparently doesn't honour it, so if you're counting on it for your tests, it won't work.

You can see additional info about this subject at:

Tuesday, February 13, 2007

Cheetah in the wild!

No chit-chat:


The Open Beta of Cheetah was officially announced. You can get it at:
http://www.ibm.com/software/data/informix/ids/new

IBM created a public web support forum at: Cheetah open Beta support
and there is a new InfoCenter at: Cheetah InfoCenter
As promised, IBM delivered a public open Beta product. This gives all of us the chance to test it and to see and work with the incredible new features that R&D has put into this release.

The feature set is really impressive. Some of these are by far the deepest changes in IDS I've seen since I started working with it in 1998. It doesn't matter what your role is (DBA, developer...): you'll find features specifically directed at some of your major pain points.

A quick list of the features:

  • Backup and Restore to Directories with ontape
    Adds flexibility, especially if used with the next feature
  • Continuous Logical Log Restore
    Besides HDR, you have another option for disaster recovery or for getting a quick copy of your instance. The second instance stays in roll-forward mode until you need it
  • Improved Parallelism during Backup and Restore
    OnBar got a lot smarter. It's able to make faster backups and especially restores. Full system backups can now be done in parallel (as they should have been from the start, if I may say so... :) )
  • Named Parameters in a Callable Statement
  • Indexable Binary Data Types
    Well... not really new. We got it in 10.00.xC6. Evidence of how easily IDS features travel through the product family
  • Encrypted Communications for HDR
    Enterprise Replication (ER) already had it. Now you can use encryption with High availability Data Replication (HDR). It's not a must-have on a LAN, but very useful if you replicate across public or semi-public networks.
  • RTO Policy to Manage Server Restart
    Killer... IDS can tune itself to honour a user-defined fast recovery interval. You specify that you want the server up again in, say, 2 minutes... IDS will manage checkpoint intervals to assure this. It can also auto-tune LRU cleaning and AIO VPs. The RTO policy is a big step in the direction of a self-managed system. Of course you may wonder why we need it, since the server is normally up and running without issues? Well, IDS tends to have greater uptimes than the hardware... at least in my experience.
  • Non-blocking Checkpoints
    Wow... Serial killer! Got your system tuned for the everyday load and then seen it 'hang' in a very long checkpoint after some special maintenance (index builds, table moving...)? No more! Checkpoint blocking will be a small (if even noticeable) fraction of a second. LRU flushing will be done concurrently with normal transaction processing.
    It's easy to imagine that this kind of feature takes very serious thinking, work and testing. But R&D did it. Besides this you'll get a lot of very useful info about checkpoints with the new onstat -g ckp option...
  • Trigger Enhancements
    One trigger for each type of operation is not enough? Now you can have several triggers on the same table for each type of operation (Insert, Delete, Update...)
  • Truncate Replicated Tables
    You can now use truncate on tables involved in ER
  • Derived tables in the FROM Clause of Queries
    Tired of writing "SELECT ... FROM TABLE (...)"? Ok... Then use the more general "SELECT ... FROM (SELECT ...)". A usability feature that will ease application porting from other RDBMSs (see the short example after this list)
  • Index Self-Join Query Plans
    Not new... Reviewed in the last article. Another example of code portability between product families
  • Optimizer Directives in ANSI-Compliant Joined Queries
  • Optimistic concurrency
    Serial killer part II... You don't like dirty reads (who does?!) and your application keeps waiting on locks because of bad programming or high concurrency? No problem! Just set an ONCONFIG parameter or an environment variable, and either DIRTY READ or COMMITTED READ or both (or neither) will work in the new mode: LAST COMMITTED READ. No more lock waiting: you'll get the last committed image. No need to change application code
  • Enhanced Data Types Support in Cross-Server Distributed Queries
    Some improvements in distributed queries. More data types supported
  • Improved Statistics Maintenance
    If, like me, you hate having to scan a table again to create statistics for columns on which you've just created an index (which already scans the data and does a sort...), then you can rest. Now, an index creation will automatically calculate column distributions and index and table statistics.
  • Deployment Wizard
    New installer features. Much more control over what you want installed. Your installations can become smaller (not that they were big...)
  • Performance Improvements for Enterprise Replication, like trigger firing during synchronization (already in 10.00.xC6) and the ability to dynamically change ER configuration parameters and environment variables
  • Schedule Administrative Tasks
    Database cron. You can schedule jobs inside the database
  • Query Drill-Down to Analyze Recent SQL Statements
    Tracing SQL. Choose who you want to trace, how much etc. Turn it on, off and change it dynamically
  • SQL Administration API
    Ever wanted to add a chunk through dbaccess? If you're like me, probably not, BUT... this will allow every maintenance to be done through client side tools. I think this will fuel a lot of innovation around IDS management environments.
  • LBAC
    Label Based Access Control. Just like in DB2. This is powerful and clean, but my first impression is that it will require a lot of thinking before use. You have to structure your access policies. The tooling is there, but it has to be used correctly.
  • XML
    XML publishing, XML extraction... The type of data-centric XML use that we need. DB2, on the other hand (Viper release, v9), has an XML engine inside it: the perfect choice for document-centric applications.
  • Node datablade
    Indexable hierarchical data. It was available before, but now it's part of the product and fully supported
  • sysdbopen/sysdbclose
    Ah!!! It was rumored to appear in 10.00.xC1, but it never did. Now it's here. Do you want to change the lock mode or isolation level for black-box applications? No problem. Create a sysdbopen procedure and the user(s) will run whatever you put in there right after connecting and before anything else the application sends.
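As mentioned in the derived-tables item above, here's a tiny illustration of the new syntax against stores_demo (just an example query, nothing more):

-- the subquery is used directly in the FROM clause, no TABLE(...) wrapper needed
SELECT t.state, t.cnt
FROM (SELECT state, COUNT(*) AS cnt
      FROM customer
      GROUP BY state) AS t
WHERE t.cnt > 1;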
So... I've already tested some of these features and I was really impressed. Now it's possible for everybody to test them. Bear in mind a few thoughts:
  1. This is a Beta product. It's not supposed to be perfect. There are issues and a few features aren't there yet (stay tuned for more Beta drops)
  2. Don't use it in production. The product should be able to migrate a supported version to the 11.10 release, but it's not guaranteed that you can migrate between Beta drops. Use it for testing purposes only and to explore the new features.
  3. The product has a time limitation... Again don't use it for serious work. Use it for serious testing
  4. Feedback, feedback... IBM is clearly making a big effort to push this. It's the first time I recall seeing an Informix product in open beta testing. The R&D and support personnel are very motivated by this effort. If you can spare a few moments, please report any issues you find, give suggestions and ask questions. Your response will be the first reward these people feel, and believe me... they deserve it.

A final non technical note:

Ever since the acquisition in 2001, we've been hearing FUD from competitors saying that IBM would kill Informix, because it wouldn't make sense to keep two competing RDBMS systems. According to this FUD, customers should move. Of course, the people spreading the FUD were expecting customers to migrate in their direction. Now... what do we all know about FUD? We all know it stands for Fear, Uncertainty and Doubt. It doesn't stand on facts, it's unfounded, and we must doubt it. Let's recall the facts:
  1. Since 2001, IBM released 3 major IDS releases:
    1. 9.40 with major changes in scalability (big chunks), usability (manageability) and security (PAM)
    2. 10.0 with major code changes (configurable page size, table level restore, ER enhancements, default roles, better partitioning....)
    3. 11.1 with all of the above
  2. IBM kept support for IDS 7.31. In fact 7.31 was enhanced with the new btree cleaner and other minor features. Some intermediate release levels were discontinued (9.21, 9.30)
  3. Meanwhile, since 2001 an RDBMS vendor named O.... has discontinued two products (all release levels): version 8 and version 9 (this year, more or less at the same time Cheetah will be released), forcing its customers to migrate. And let me remind you how easy it is to migrate IDS...
So, thinking as a customer, who do you think will protect your investment? If there was still any doubt about IBM's commitment to Informix, Cheetah should be enough to clear it.
But as we all know, FUD will always be around. Hopefully we'll all be too busy playing with this beloved "cat", exploring new ways to bring value to our organizations and customers, to pay any attention to it.

Regards.

Saturday, February 03, 2007

1st safari tour on cheetah territory: FULL

According to Informix-Zone, the first public customer workshop to show the upcoming IDS release is fully booked.

Our German friends will have the opportunity to see one of Cheetah's first appearances on February 15 in Munich. Munich has long been a technical center of expertise in Informix technology, so it's not surprising to see this happen there.

So, this city, well known for its beer festival, will also have the privilege of being one of the first to see our beloved animal outside its habitat (that being the IBM labs).

I really hope they'll enjoy it and that it fulfils our expectations. I would love to get my hands on this feline, but I'll have to wait...

Saturday, January 27, 2007

IDS 10.00.*C6 new features and some thoughts...

I finally found a bit of spare time and decided to look at IDS 10.00.*C6, which was recently released. I must confess that I don't do this as often as I should, but given the current rate of IDS improvements it is a rewarding exercise.
I must also admit I hadn't read the *C5 release notes, so I read those too.

If you're like me, and still haven't read them, you can find a shortcut here.

The first feature you'll notice is the Index Self Join. After reading a bit I recalled I had already seen a description of this feature in an Oracle article. At the time I thought that, although theoretically useful, it would be hard to find a situation where it would give real-life benefits. Well, after testing it I was surprised.
So, what is this Index Self Join feature? Putting it simply, it's a way to scan an index where the WHERE clause may or may not filter on the leading index column, and where the leading index column(s) have very few distinct values (low selectivity).
Previously, the optimizer would scan all the keys that fulfill the complete set of key conditions, or it would do a full table scan if the leading key had no conditions associated. With this feature it finds the distinct values of the leading key(s) and runs small index probes combining each of those values with the rest of the index key conditions from the WHERE clause of your query. Err... did I write "simply" above? Maybe an example will make it clearer:

Imagine you have a table "test" like this:

create table test(
a smallint,
b smallint,
c smallint,
t varchar( 255)
)
in dbs1 extent size 1000 next size 1000 lock mode row;
create index ix_test_1 on test ( a, b, c);


and you populate it with the output of this bash script:


#!/bin/bash

for ((a=1;a<=15;a++))
do
for ((b=1;b<=2000;b++))
do
for ((c=1;c<=100;c++))
do
echo "$a|$b|$c|dummyt"
done
done
done

If you make a query like this:


select t from test where b = 100 and c = 1;


You'll get a sequential table scan.
If you include a condition on column "a" you'll get an index scan, but the performance still won't be great...


Now, in *C6, if you run a query with an optimizer hint like the one mentioned in the release notes:


select --+ INDEX_SJ ( test ix_test_1 )
t from test where b = 100 and c = 1;


You will get what is called the Index Self Join query plan and, believe me, a much quicker response. If you don't believe me, and I suggest you don't, please try it yourself. You'll need the bash script above (if you're not using bash, adapt it to your favorite SHELL). Run the script, send its output to /tmp/test.unl and then execute the SQL below (these queries include a condition on column "a"):


# quote the here-document delimiter so $a, $b and $c are not expanded now
cat <<'eof' >/tmp/test.sh
#!/bin/bash

for ((a=1;a<=15;a++))
do
for ((b=1;b<=2000;b++))
do
for ((c=1;c<=100;c++))
do
echo "$a|$b|$c|dummyt|"
done
done
done
eof

# the generated file isn't executable, so run it through bash
bash /tmp/test.sh > /tmp/test.unl


dbaccess stores_demo <<eof

-- use a raw table to avoid long tx
create raw table test(
a smallint,
b smallint,
c smallint,
t varchar( 255)
)
-- choose the right dbspace for you
in dbs1 extent size 1000 next size 1000 lock mode row;

-- locking exclusively to avoid lock overflow or table expansion
begin work;
lock table test in exclusive mode;
load from /tmp/test.unl insert into test;
commit work;
create index ix_test_1 on test ( a, b, c);

-- distributions must be created for the optimizer to know about column selectivity
update statistics high for table test (a,b,c);

select "Start INDEX_SJ: ", current year to fraction(5) from systables where tabid = 1;
unload to result1.unl select --+ EXPLAIN, INDEX_SJ ( test ix_test_1 )
* from test where a>=1 and a<=15 and b = 100 and c = 1;
select "Start FULL: ", current year to fraction(5) from systables where tabid = 1;
unload to result2.unl select --+ EXPLAIN, FULL ( test )
* from test where a>=0 and a<=15 and b = 100 and c = 1;
select "Start INDEX: ", current year to fraction(5) from systables where tabid = 1;
unload to result3.unl select --+ EXPLAIN, AVOID_FULL ( test )
* from test where a>=0 and a<=15 and b = 100 and c = 1;
select "Stop INDEX: ", current year to fraction(5) from systables where tabid = 1;

eof



On my system (a VMware machine running Fedora Core 5), the results were (only useful for comparison between query plans):


Start INDEX_SJ: 2007-01-28 19:18:29.77990
Start FULL: 2007-01-28 19:18:29.88101
Start INDEX: 2007-01-28 19:18:34.67570
Stop INDEX: 2007-01-28 19:18:41.92104


So, from about 5 to 7 seconds down to about 0.1. Not bad, hmm?
Take a look at the query plan for more details:


QUERY:
------
select --+ EXPLAIN, INDEX_SJ ( test ix_test_1 )
* from test where a>=1 and a<=15 and b = 100 and c = 1

DIRECTIVES FOLLOWED:
EXPLAIN
INDEX_SJ ( test ix_test_1 )
DIRECTIVES NOT FOLLOWED:

Estimated Cost: 42
Estimated # of Rows Returned: 15

1) informix.test: INDEX PATH

(1) Index Keys: a b c (Serial, fragments: ALL)
Index Self Join Keys (a )
Lower bound: informix.test.a >= 1
Upper bound: informix.test.a <= 15

Lower Index Filter: informix.test.a = informix.test.a AND (informix.test.b = 100 AND informix.test.c = 1 )


QUERY:
------
select --+ EXPLAIN, FULL ( test )
* from test where a>=0 and a<=15 and b = 100 and c = 1

DIRECTIVES FOLLOWED:
EXPLAIN
FULL ( test )
DIRECTIVES NOT FOLLOWED:

Estimated Cost: 118847
Estimated # of Rows Returned: 15

1) informix.test: SEQUENTIAL SCAN

Filters: (((informix.test.b = 100 AND informix.test.c = 1 ) AND informix.test.a <= 15 ) AND informix.test.a >= 0 )


QUERY:
------
select --+ EXPLAIN, AVOID_FULL ( test )
* from test where a>=0 and a<=15 and b = 100 and c = 1

DIRECTIVES FOLLOWED:
EXPLAIN
AVOID_FULL ( test )
DIRECTIVES NOT FOLLOWED:

Estimated Cost: 114798
Estimated # of Rows Returned: 15

1) informix.test: INDEX PATH

(1) Index Keys: a b c (Key-First) (Serial, fragments: ALL)
Lower Index Filter: informix.test.a >= 0 AND (informix.test.b = 100 ) AND (informix.test.c = 1 )
Upper Index Filter: informix.test.a <= 15
Index Key Filters: (informix.test.b = 100 ) AND
(informix.test.c = 1 )




Is there a catch? Well, yes and no.
Currently this feature is disabled by default. To use it you'll need the optimizer directives, or you'll have to change a "hidden" ONCONFIG parameter.
The parameter in question is called INDEX_SELFJOIN. A value of 1 enables it, and 0 disables it.
You can also change this at any time using:


onmode -wm INDEX_SELFJOIN=<1|0>


This information is not clearly explained in the release notes, but you can find it in the Performance Guide for 10.00.*C6.
This is a feature planned for Cheetah that was backported to version 10. Probably in Cheetah (and in future 10.00 versions) it will be activated by default. If you plan to use it, be careful and monitor the results... it's still a fresh feature.

So, what other good news do we have in the latest versions? Well, one of them was used in the scripts above. Some of you may have noticed I created a raw (non-logged) table, and only after loading it did I create an index on it. Older versions wouldn't allow this, but we can do it since 10.00.xC5. There are also several enhancements that I won't review in detail:

  • Control trigger firing on replicated tables during synchronization
    This enables controlling whether triggers fire on replicated tables when we synchronize them

  • It's unnecessary to copy the oncfg file to the target server when doing an imported restore
    When doing an imported restore (a restore on a different server, not the one where the backups were made) we no longer need to copy the oncfg files as we used to.

  • New binary data types
    There are two new data types: BINARYVAR and BINARY18. These data types provide indexable binary-encoded strings and were created to improve compatibility with WebSphere. They are provided by a new free DataBlade (binaryudt). In fact this is a showcase of IDS extensibility. These data types support some new specific functions (bit_and(), bit_or(), bit_xor() and bit_complement()) as well as some standard functions like length(), octet_length(), COUNT DISTINCT, MAX() and MIN().

    When I read about this I imagined a scenario where it could be used to create functional indexes, based on table columns whose values could be represented in binary form by a function. I mean creating an index based on a function that, given a list of attributes, would generate a binary string. We could represent a true/false with just one bit. The attributes could be marketing fields about your customers (sex, married/single/divorced, has car, has children... etc.). Then you could create a bit representation of these fields and index your tables with it. A search could check all the fields with a bit comparison against the function-generated index.
    I couldn't prove to myself that this was a good idea, either with search-time comparisons or with flexibility comparisons against traditional index methods. But I leave the idea here. If someone manages to use it efficiently, please give me some feedback.

  • View folding
    This optimization means that in certain cases there is no need to materialize a view (by creating a temp table). Instead, the optimizer will run the join not against a resulting temp table but directly against the view's base tables (see the small example below).
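As a contrived illustration of what materializing means (a sketch only; the exact cases where folding applies are listed in the release notes), consider a view over stores_demo and a join against it:

CREATE VIEW ca_customers (customer_num, lname) AS
  SELECT customer_num, lname FROM customer WHERE state = 'CA';

-- without folding this kind of join may force the view into a temp table;
-- with folding the optimizer can join orders directly to the base table
SELECT o.order_num, v.lname
FROM orders o, ca_customers v
WHERE o.customer_num = v.customer_num;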


Well, that's it for now. The main message of this post is: keep up to date with the release notes of new versions... They contain precious information that can help you decide whether or not it's time to upgrade your systems.