Kaaes Codepage: How to generate a problem

Whenever an interface is described as generic and platform independent, one should never blindly trust it. This is a short story of why ...

Some time ago I was involved in a project to build and deploy a simple .NET webservice interface to a core library that we develop in-house. The need for the webservice interface was introduced by a very large international customer, so there was no way around it. Well, actually the customer just wanted a way to access the library from a server running a non-supported operating system, where "non-supported operating system" should be read as an operating system that was once supported and is now so old that our product managers decided to draw the line. Doing so has an immediate impact on the amount of maintenance :-)

So, this webservice was meant to convert back and forth between two military media, which shall remain nameless for no obvious reason. We, the developers, chose to implement the required feature as a .NET webservice using the standard SOAP protocol. Using standard tools and protocols should guarantee that the customer could interface with the webservice seamlessly.

After we released our product including the webservice, a customer reported an issue concerning non-US characters in the data. When the customer invoked the webservice with data containing non-US characters, the customer's application failed to read the output of the webservice call. Such an inquiry is normally regarded as a user fault, meaning that the bug is at their end. However, due to the importance of the customer we started debugging the issue. After several hours of debugging strange runtime environments I concluded that the fault was partly the customer's (and partly our own). The issue was caused by the nature of the transmitted data, which was in the local code page of the system (which was a requirement). However, transmitting code page data over a UTF-8 encoded medium is not a clever move, seeing that non-US characters are represented as a multi-byte sequence in UTF-8 and as a single byte in the local code page, so the raw code page bytes are generally not valid UTF-8.
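To make the mismatch concrete, here is a minimal Java sketch. The post does not say which code page was involved, so ISO-8859-1 stands in for it as an assumption, and the class and method names are made up for illustration. It encodes the character U+00E6 both ways and then reads the code page bytes as if they were UTF-8.

```java
import java.nio.charset.StandardCharsets;

class CodePageVsUtf8 {
    // Assumption: ISO-8859-1 stands in for the unnamed local code page.
    static byte[] asCodePage(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    static byte[] asUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode code page bytes as if they were UTF-8, which is roughly what a
    // UTF-8 consumer does when handed raw code page data.
    static String misreadAsUtf8(byte[] codePageBytes) {
        return new String(codePageBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u00e6"; // a single non-US character
        System.out.println("code page bytes: " + asCodePage(text).length); // 1
        System.out.println("UTF-8 bytes:     " + asUtf8(text).length);     // 2
        String misread = misreadAsUtf8(asCodePage(text));
        System.out.println("survives misreading: " + misread.equals(text));
    }
}
```

The lone code page byte is not a complete UTF-8 sequence, so the decoder cannot give back the original character.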

Our solution was to Base64-encode the transmitted data, which prevents any mangled or incompatible data in the UTF-8 encoded SOAP XML envelope. This fixes the problem and everything is jolly good - or is it?
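The idea can be sketched in Java roughly as follows. This is not our actual webservice code (which was .NET); the class name, the method names and the choice of ISO-8859-1 as the code page are assumptions for illustration. The point is that the Base64 text is plain ASCII, so it passes through a UTF-8 envelope untouched, provided both ends agree on the code page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Payload {
    // Assumption: both ends have agreed on this code page out of band.
    static final Charset CODE_PAGE = StandardCharsets.ISO_8859_1;

    // Sender side: code page bytes -> ASCII-only Base64 text for the envelope.
    static String wrap(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(CODE_PAGE));
    }

    // Receiver side: Base64 text -> code page bytes -> string.
    static String unwrap(String base64) {
        return new String(Base64.getDecoder().decode(base64), CODE_PAGE);
    }

    public static void main(String[] args) {
        String original = "bl\u00e5b\u00e6r"; // contains two non-US characters
        String wrapped = wrap(original);
        System.out.println(wrapped);
        System.out.println(unwrap(wrapped).equals(original)); // true
    }
}
```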

Of course not: the data also needed to be encoded on the user end. As it turns out, different Java SOAP integration kits give different results. Some encode the string flawlessly and let the communication between the user application and our webservice work correctly, while other kits fail to encode properly and thus trip up our webservice.

Another unforeseen complication was that the user application was written in Java, which uses a 16-bit string representation. That in itself is not a problem; it becomes one when the developer at the customer end believes that Unicode is the same whether it is UTF-7, UTF-8, UTF-16, etc.
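A small Java sketch shows why that belief hurts: the same character comes out as different bytes depending on which Unicode encoding is picked. The class and helper names are hypothetical, and UTF-7 is left out because the standard Java charsets do not include it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UnicodeEncodings {
    // Render the encoded bytes of a string as upper-case hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e6"; // one character, one 16-bit unit inside a Java String
        System.out.println("UTF-8:    " + hex(text, StandardCharsets.UTF_8));    // C3A6
        System.out.println("UTF-16BE: " + hex(text, StandardCharsets.UTF_16BE)); // 00E6
        System.out.println("UTF-16LE: " + hex(text, StandardCharsets.UTF_16LE)); // E600
    }
}
```

Three encodings, three different byte sequences, so the encoding has to be chosen explicitly whenever a string leaves the Java process.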

So what was the moral of today's story? Never trust that anyone else knows what you are doing. If you need to publish an interface, make sure it is well documented.

Wednesday, January 31, 2007

