How to efficiently define End-of-Transmission for XML-based text protocol?

Question

I want to develop a text protocol based on XML and transmitted via TCP/IP sockets. Let's say I have a simple request/response mechanism to be sent over a persistent TCP/IP connection between client and server like this:

<?xml version="1.0" encoding="UTF-8"?>
<request id="1" command="get.answer">
    <value type="string">Answer to the Ultimate Question of Life, the Universe, and Everything</value>
</request>

<?xml version="1.0" encoding="UTF-8"?>
<response id="1" command="get.answer">
    <value type="int32">42</value>
</response>

When should each side start processing the incoming data? In other words, how does the server know that the incoming client data has been fully transferred and can be processed to create a response?

Of course I did some research on the topic: I found this answer, which points in the right direction using an HTTP example. So using some kind of 'Transfer Protocol' on top of the XML messages would certainly help.

But I also looked at the purely XML-based XMPP protocol which doesn't use any 'Transfer Protocol' like HTTP at least as far as I have seen.

From RFC 6120 at "2.4. Structured Data" it reads:

The basic protocol data unit in XMPP is not an XML stream (which simply provides the transport for point-to-point communication) but an XML "stanza", which is essentially a fragment of XML that is sent over a stream. The root element of a stanza includes routing attributes (such as "from" and "to" addresses), and the child elements of the stanza contain a payload for delivery to the intended recipient.

So they basically send small XML chunks over TCP/IP without a 'Transfer Protocol', and from my Wireshark traces I can see that there is also no special End-Of-Transmission marker at the end of each XML stanza, such as a double \r\n or something like that. So how do they know where a message (stanza) ends?

Robin · Accepted Answer · 2012-04-18 13:52:16Z

2

Actually, XMPP uses an XML stream to transfer data. The data unit you are referring to is the actual exchange of individual messages, but they are all contained within an XML stream that defines the start and end point of the communication for an XMPP session.

This would be where the End Of Transmission occurs, as in end of all transmission. Within that stream, there are 3 defined packet types (IQ, Message and Presence) which would indicate the start and end of individual messages (for client to server comms).

Although the basic case is done over a TCP connection, there are extensions to support different wireline protocols as well, such as HTTP which is useful for allowing XMPP through a firewall.

If you want to do something similar, then you can follow the same approach, which is to start and end your XML stream when your connection is established and dropped. Then you simply need to define the individual message types, so your endpoints will know what constitutes a complete message.

Or you could just use XMPP which seems to fit your use case perfectly.

answered Apr 18, 2012 at 13:52

Robin



5 Comments

vidar Over a year ago

I'm interested in the end of every individual message from an implementation point-of-view. Does it really check for </iq>, </message> or </presence> as the first check? Maybe these tags appear not on a single line or maybe they didn't fit into the last chunk of TCP data, how to handle these cases? Maybe they just throw every incoming chunk into an XML validator and do not care about fragmentation?

Robin Over a year ago

You are thinking at much too low a level. The chunk of TCP data is irrelevant unless you are writing the TCP stack. Lines are also meaningless, in fact they shouldn't exist except as formatting within content, which would be within a message. You simply read the stream with an XML pull parser and look for begin and end tags for your known top level elements, these you process as individual messages. You continue reading until you get the endtag for the XML stream, then you close the connection.

MattJ Over a year ago

@vidar Most XMPP implementations use a streaming SAX parser (such as expat). Chunks are fed in as they arrive, and the parser calls callbacks for element start/end and other things. By tracking the depth you know an entire element has come in when an element closes at depth 1.
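The depth-tracking idea described above can be sketched with Python's built-in expat binding. This is only a minimal illustration under assumptions, not how any particular XMPP library is written; the class name StanzaSplitter and the on_stanza callback are made up for the example. The stream's root element counts as depth 0, so a stanza is an element that closes at depth 1.

```python
import xml.parsers.expat


class StanzaSplitter:
    """Hypothetical helper: reports each depth-1 element of an XML stream as it closes."""

    def __init__(self, on_stanza):
        self.on_stanza = on_stanza  # called with the tag name of each completed stanza
        self.depth = 0              # number of currently open elements
        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.StartElementHandler = self._start
        self.parser.EndElementHandler = self._end

    def _start(self, name, attrs):
        self.depth += 1

    def _end(self, name):
        self.depth -= 1
        # After the decrement, self.depth is the depth of the element that
        # just closed (root = 0), so 1 means a direct child of the root.
        if self.depth == 1:
            self.on_stanza(name)

    def feed(self, chunk):
        # Chunks may be cut anywhere, even in the middle of a tag;
        # expat keeps the unfinished part until more data arrives.
        self.parser.Parse(chunk, False)


completed = []
splitter = StanzaSplitter(completed.append)
data = "<stream><iq id='1'><bind/></iq><message><body>hi</body></message>"
for i in range(0, len(data), 5):  # simulate arbitrary TCP fragmentation
    splitter.feed(data[i:i + 5])
print(completed)
```

The 5-character slices stand in for TCP segments; the parser does not care where they fall.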

vidar Over a year ago

@MattJ: What about validation e.g. using XML Schema or building up a DOM, that would not be possible by using an XML pull parser, right? Just asking for my own clarification, because I could still wrap the things up with HTTP or something similar to really fetch and buffer whole XML documents before pushing them into some parser/validator.

nos Over a year ago

@vidar a pull parser can be (and often is) used to build up a DOM tree as it progresses, and it can perfectly well do validation as well. It's not trivial, and not something you want to write from scratch though.
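As a rough illustration of a pull parser building a tree while data is still arriving, here is a sketch using xml.etree.ElementTree.XMLPullParser from the Python standard library. The function name iter_stanzas is an assumption for the example. This parser is non-validating, so schema validation would need a separate library and is not shown.

```python
import xml.etree.ElementTree as ET


def iter_stanzas(chunks):
    """Hypothetical helper: yield each direct child of the stream root as a
    fully built Element, as soon as its end tag has been parsed."""
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    root = None
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                if depth == 0:
                    root = elem  # the long-lived stream element
                depth += 1
            else:
                depth -= 1
                if depth == 1:
                    yield elem  # complete subtree, children included
                    # Detach it from the root so memory does not grow
                    # for the lifetime of the connection.
                    root.remove(elem)


data = ("<stream><request id='1'><value type='string'>q</value></request>"
        "<request id='2'/>")
pieces = [data[i:i + 7] for i in range(0, len(data), 7)]
for stanza in iter_stanzas(pieces):
    print(stanza.tag, stanza.get("id"), len(stanza))
```

Each yielded element is an ordinary tree node, so it can be handed to whatever code would otherwise have received a whole parsed document.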

ggozad · Accepted Answer · 2012-04-18 18:42:59Z

0

XMPP has a transport over XML streams as said by @Robin. It also can use HTTP as a transport with BOSH.

In the second (HTTP) case, things are easy. With Strophe, for instance, a JavaScript library that uses BOSH, requests are essentially HTTP requests and thus carry a Content-Length header. It looks like this:

POST /webclient HTTP/1.1
Content-Type: text/xml; charset=utf-8
Content-Length: 240

<body rid='1573741825'
      sid='SomeSID'
      xmlns='http://jabber.org/protocol/httpbind'>
  <iq id='bind_1'
      type='set'
      xmlns='jabber:client'>
    <bind xmlns='urn:ietf:params:xml:ns:xmpp-bind'>
      <resource>httpclient</resource>
    </bind>
  </iq>
</body>
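To show why Content-Length makes the HTTP case easy, here is a minimal Python sketch of a receiver that reads the headers and then exactly that many body bytes. It is not a real HTTP parser (no chunked encoding, no header folding), and the function name read_http_body is invented for the example. On a real socket the stream could be obtained with sock.makefile("rb").

```python
import io


def read_http_body(stream):
    """Hypothetical helper: read one HTTP-style message from a binary
    file-like object and return its body as bytes."""
    length = None
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("connection closed before the headers ended")
        if line in (b"\r\n", b"\n"):
            break  # the blank line ends the header section
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("no Content-Length header")
    body = stream.read(length)
    if len(body) != length:
        raise EOFError("connection closed in the middle of the body")
    return body


xml_body = b"<body rid='1573741825' sid='SomeSID'/>"
raw = (b"POST /webclient HTTP/1.1\r\n"
       b"Content-Type: text/xml; charset=utf-8\r\n"
       b"Content-Length: " + str(len(xml_body)).encode("ascii") + b"\r\n"
       b"\r\n" + xml_body)
print(read_http_body(io.BytesIO(raw)))
```

The receiver never has to look inside the XML to find where the message ends.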

In the first case (XML streams), though, things are different. Twisted, a well-performing, long-established and well-tested Python library that I use, relies on a Python wrapper around the Expat XML parser. Expat is a fast, non-validating parser that fires useful events indicating, for instance, the start or end of "root" elements. The elements are then parsed one by one as they are received.

answered Apr 18, 2012 at 18:42

ggozad


1 Comment

vidar Over a year ago

I knew about expat parser and its advantages but Twisted seems to be a great event framework, thanks for this info. HTTP as transport protocol is a solution as I mentioned in my question but it is also a second protocol to be wrapped around the XML message which I would like to avoid, but I'm still thinking about the advantages, e.g. to send raw binary data in-band within the same socket connection.

Community · Accepted Answer · 2017-05-23 12:03:17Z

0

As mentioned here, there are mainly two methods: use a delimiter, or put the length in a header. Your delimiter could simply be the closing tag that matches your opening tag, and that's what XMPP does. This means that as long as each XML message is wrapped in an element that opens and closes properly, you are good to go. If you want some sort of validation on the chunk of data you receive, you need to make sure that every tag has a matching end tag. Most parser packages do this for you: if you pass them unparsable input, they throw some sort of exception. If you want to write your own parser, then you need to study parsers rather than the transfer/XML protocol.
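A small Python sketch of the "parser throws an exception" behaviour, using the standard library's ElementTree. The helper name is_complete_message is made up for the example. Note that this check cannot tell a merely incomplete message from a malformed one, and re-parsing a growing buffer on every read gets expensive, so a streaming parser is usually the better fit on a live connection.

```python
import xml.etree.ElementTree as ET


def is_complete_message(text):
    """Hypothetical helper: True if text parses as one well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised for truncated input as well as for genuinely broken markup.
        return False
    return True


whole = "<request id='1' command='get.answer'><value type='int32'>42</value></request>"
print(is_complete_message(whole))
print(is_complete_message(whole[:30]))
```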

edited May 23, 2017 at 12:03

CommunityBot


answered Apr 19, 2012 at 10:04

Jermin Bazazian



nos · Accepted Answer · 2012-04-19 10:14:00Z

An XMPP endpoint has to parse the XML. By doing so, it knows where the end is, as only one document (top-level) element is allowed (I'm unsure whether it can be preceded by XML processing instructions).

<?xml version="1.0" encoding="UTF-8"?>
<request id="1" command="get.answer">
    <value type="string">Answer to the Ultimate Question of Life, the Universe, and Everything</value>
</request>

This is self-delimiting: once you've parsed the <request start tag, you know that this XML document ends when you hit the matching </request>.

(Personally, I'd place a framing protocol at the protocol level below, instead of stuffing raw XML on top of a (TCP) stream, perhaps just preceding every message with a 4-byte big-endian length field.)
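The length-prefix framing suggested above could look roughly like this in Python. It is a sketch under assumptions: the function names are invented, and the stream is any binary file-like object (for a socket, sock.makefile("rwb") would be one way to get one).

```python
import io
import struct


def write_frame(stream, payload):
    """Write a 4-byte big-endian length followed by the payload bytes."""
    stream.write(struct.pack(">I", len(payload)) + payload)


def read_exactly(stream, n):
    """Read exactly n bytes, looping because a single read may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream closed in the middle of a frame")
        buf += chunk
    return buf


def read_frame(stream):
    """Read one length-prefixed frame and return its payload."""
    (length,) = struct.unpack(">I", read_exactly(stream, 4))
    return read_exactly(stream, length)


wire = io.BytesIO()  # stands in for the TCP connection
write_frame(wire, b"<request id='1'/>")
write_frame(wire, b"<request id='2'/>")
wire.seek(0)
print(read_frame(wire))
print(read_frame(wire))
```

With this framing the receiver can buffer a whole message before handing it to any parser or validator, and the payload does not even have to be XML, which also covers the in-band binary data case.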
