Electronic Data Interchange (EDI): Major Control Structures

EDI defines a class of files that are used to transmit information from one computer system to another. They are characterized by several features:

  • Each row, or segment, begins with a special tag that determines the structure of the record
  • Generally, each piece of data within the segment, called an element, is separated by some character (although there are a few fixed-width EDI formats)
  • Different data types are supported, such as number, string, fixed-decimal number, and/or date
  • Some standards body or consortium periodically publishes updates to the format

Some of the better-known variants include EDIFACT, HL7, and X12. This document describes their similarities and differences, and includes notes on how DataDirect XML Converters work with them.

The major dialects that the DataDirect XML Converters for EDI currently support are:

  • EANCOM for general commerce
  • EDIG@S for energy allocation
  • EDIFACT for general commerce and travel
  • HL7 for healthcare
  • IATA (PADIS) for the airline and travel industry
  • X12 for general business and health

Although EDI is very common, one of its biggest problems is that general-purpose EDI transformation tools are big and expensive. XML, on the other hand, carries content and structure together in the same document, which allows many general-purpose tools and languages to be used. The XML Converters are ideal in this regard, since they allow generalized XML development environments, such as Stylus Studio, to be used to transform EDI as if it were XML.

DataDirect XQuery has been designed with the XML Converters in mind, so that this powerful streaming XML processor with hooks into wire-protocol database drivers can be used to stream even huge EDI files into and out of applications and databases.

EDI Internal Message Structure

Within each individual message, or transaction set, there are various nested chunks of information. These include loops, segments, elements, and composite elements.

EDI Loops

An EDI loop is a group of segments that should be taken together as a unit. Individual segments may be mandatory, optional, or conditional based on the content of other segments. Each may be present once, or may be allowed to repeat up to some specified number of times.

The interesting part about loops is that they are not explicit in the EDI data. You must "just know" where they start and end by looking up each segment in the EDI structure repository.

The DataDirect XML Converters for EDI contain a wide variety of schemas for various editions and dialects of EDI, because of course these vary not only by type of EDI but even between versions of the same specification.
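Because loop boundaries are implied rather than marked, software has to rebuild them from a definition. The following is a minimal Python sketch of that idea, not the converters' actual method. The names `LOOP_TRIGGER`, `LOOP_MEMBERS`, and `group_loops` are hypothetical, and the loop definition is a simplified stand-in for what a real EDI repository would supply for a given dialect and version.

```python
# Hypothetical, simplified loop definition: the tag that opens a loop
# iteration and the tags allowed inside it. Real definitions come from
# the EDI repository for a specific dialect and version.
LOOP_TRIGGER = "LIN"
LOOP_MEMBERS = {"LIN", "QTY", "PRI"}


def group_loops(tags):
    """Group a flat list of segment tags into loop iterations.

    Segments outside the loop stay as plain strings; each loop
    iteration becomes a nested list.
    """
    result = []
    current = None  # the loop iteration being filled, if any
    for tag in tags:
        if tag == LOOP_TRIGGER:
            # The trigger segment always starts a new iteration.
            current = [tag]
            result.append(current)
        elif current is not None and tag in LOOP_MEMBERS:
            current.append(tag)
        else:
            # Any other segment implicitly ends the loop.
            current = None
            result.append(tag)
    return result


flat = ["BGM", "LIN", "QTY", "PRI", "LIN", "QTY", "UNS"]
print(group_loops(flat))
# ['BGM', ['LIN', 'QTY', 'PRI'], ['LIN', 'QTY'], 'UNS']
```

Nothing in the flat list says where an iteration ends. The grouping comes entirely from the definition, which is why the correct repository version matters.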

EDI Segments

Each segment starts with a prefix that tells the EDI software what kind of record format follows. Other than that, there is no descriptive information about the individual fields present in the file.

The XML Converters look up each segment in the EDI repository for the specific version of the EDI data stream and from there determine the individual fields that must follow so that the correctly validated and labeled XML can be produced. When converting back to EDI from XML, the reverse process is used so that the contents of each element within the segment are written properly.

EDI Composite Elements

Composite elements are like records within records. They have their own structure, which is defined in the EDI schema repository. Typically they contain only elements, but in some dialects like HL7, they can in turn contain other composites.

Here is an example of a DTM segment from EDIFACT whose single composite element contains three elements:

DTM+97:20011213:102'

What does this mean? Perhaps the expanded XML version will help:

<DTM>
    <DTM01>
        <DTM0101-DateTimePeriodQualifier>
            <!--2005-->97<!--Transaction creation date-->
        </DTM0101-DateTimePeriodQualifier>
        <DTM0102-DateTimePeriod>
            <!--2380-->20011213<!--2001-12-13-->
        </DTM0102-DateTimePeriod>
        <DTM0103-DateTimePeriodFormatQualifier>
            <!--2379-->102<!--CCYYMMDD-->
        </DTM0103-DateTimePeriodFormatQualifier>
    </DTM01>
</DTM>

In this case, the segment says that the purpose of the date (DTM) is the "Transaction creation date" (code 97 from list 2005). The value of the date is December 13, 2001 (20011213), and we know this because the date format is CCYYMMDD (code 102 from list 2379). How do we know that the composite element contains three elements, that the first and last come from lists 2005 and 2379, and that the middle one is a date? Not from the file, but from the EDI repository entry for segment DTM in EDIFACT version D97A.
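The mechanical part of reading this segment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the converters' implementation. It assumes the default EDIFACT separators ("+" between elements, ":" inside a composite, "'" as terminator) and ignores the release (escape) character. The function name `parse_segment` is hypothetical, and the only date format handled is code 102.

```python
from datetime import datetime


def parse_segment(raw, element_sep="+", component_sep=":", terminator="'"):
    """Split one EDIFACT segment into its tag and its elements.

    Each element is returned as a list of components, so a simple
    element becomes a one-item list. Escaped separators are not
    handled in this sketch.
    """
    body = raw.strip()
    if body.endswith(terminator):
        body = body[: -len(terminator)]
    tag, *elements = body.split(element_sep)
    return tag, [element.split(component_sep) for element in elements]


tag, elements = parse_segment("DTM+97:20011213:102'")
qualifier, value, format_code = elements[0]

# Only format code 102 (CCYYMMDD) is interpreted here; other codes
# would need their own patterns from code list 2379.
if format_code == "102":
    parsed_date = datetime.strptime(value, "%Y%m%d").date()
    print(tag, qualifier, parsed_date.isoformat())  # DTM 97 2001-12-13
```

The split yields only the raw values 97, 20011213, and 102. The meaning of each position still has to come from the repository.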

EDI Elements

An element is the basic unit of information in an EDI file. It can be a piece of text, a number or amount, a date or time, a piece of binary data like an image or embedded document, or a code from a codelist that indicates some value or action.

SEF Files

Why do SEF files exist? The EDI repository that comes with DataDirect XML Converters is quite extensive, but each company has its own way of doing business. Add to that a number of smaller dialects and local variations, and it quickly becomes clear that no single tool can contain all of the EDI definitions of the world. So SEF files are a way for you as our customer to describe your own variant of EDI to the converter.

SEF (Standard Exchange Format) is an open specification used by a number of tool vendors and EDI users. Many sites publish their standards in SEF format, and the DataDirect XML Converters for EDI are able to use those files to extend the set of EDI specifications understood.

Structure of EDIFACT, EANCOM, EDIG@S and IATA (PADIS) Files

An EDIFACT data stream (file) consists of one or more interchanges. Each interchange can be batch or interactive. The DataDirect XML Converters allow the mixture of both types within a single data stream, except that you cannot mix batch and interactive segments within a single interchange.

Batch interchanges have control segments that begin with "UN", as in UNA, UNB, UNG, UNH, UNT, UNE, and UNZ. Interactive interchanges use "UI" as in UIB, UIG, UIH, UIT, UIE and UIZ. There is no UIA segment corresponding to the batch UNA segment.

A UNB/UIB segment is mandatory in every interchange. The trailing UNZ/UIZ is sometimes omitted in practice, but including it is strongly recommended.

Within each interchange there can be zero or more groups. A group consists of a UNG/UNE or UIG/UIE pair of segments which wrap one or more messages. It is possible to have multiple messages in an interchange without them being contained in a group.

Each message starts with UNH/UIH, which tells the type of the message. That is followed by the content of the message, whose constituent segments are based on the pattern set in the message dictionary. The message is concluded by the UNT/UIT segment.

BATCH MODE SAMPLE
UNA (service string advice)
UNB (batch interchange start)
UNG (batch group start)
UNH (batch message header)
message #1 payload
UNT (batch message trailer)
UNE (batch group end)
UNG (batch group start)
UNH (batch message header)
message #2 payload
UNT (batch message trailer)
UNH (batch message header)
message #3 payload
UNT (batch message trailer)
UNE (batch group end)
UNZ (batch interchange end)
UNB (batch interchange start)
UNG (batch group start)
UNH (batch message header)
message #4 payload
UNT (batch message trailer)
UNE (batch group end)
UNH (batch message header)
message #5 payload
UNT (batch message trailer)
UNZ (batch interchange end)

INTERACTIVE MODE SAMPLE
UIB (interactive interchange start)
UIG (interactive group start)
UIH (interactive message header)
message #1 payload
UIT (interactive message trailer)
UIE (interactive group end)
UIG (interactive group start)
UIH (interactive message header)
message #2 payload
UIT (interactive message trailer)
UIH (interactive message header)
message #3 payload
UIT (interactive message trailer)
UIE (interactive group end)
UIZ (interactive interchange end)
UIB (interactive interchange start)
UIG (interactive group start)
UIH (interactive message header)
message #4 payload
UIT (interactive message trailer)
UIE (interactive group end)
UIH (interactive message header)
message #5 payload
UIT (interactive message trailer)
UIZ (interactive interchange end)

MIXED STREAM SAMPLE
UIB (interactive interchange start)
UIH (interactive message header)
message #4 payload
UIT (interactive message trailer)
UIH (interactive message header)
message #5 payload
UIT (interactive message trailer)
UIZ (interactive interchange end)
UNB (batch interchange start)
UNG (batch group start)
UNH (batch message header)
message #1 payload
UNT (batch message trailer)
UNE (batch group end)
UNG (batch group start)
UNH (batch message header)
message #2 payload
UNT (batch message trailer)
UNH (batch message header)
message #3 payload
UNT (batch message trailer)
UNE (batch group end)
UNZ (batch interchange end)

(DataDirect XML Converters will automatically create and populate the UNZ/UIZ segments if they are missing, as well as automatically perform the necessary calculations to fill in the counters and values for the UNT/UIT/UNE/UIE segments.)
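To give a sense of the kind of counter involved, here is a minimal Python sketch that computes the segment count a batch message trailer would carry. It assumes the UNT count covers every segment in the message, including the UNH and UNT segments themselves, and it works on a list of segment tags only. The function name `message_segment_counts` is hypothetical and the sketch is not how the converters are implemented.

```python
def message_segment_counts(tags):
    """Return the segment count for each UNH..UNT message.

    The count includes the UNH and UNT segments themselves.
    Segments outside a message (UNB, UNG, UNE, UNZ) are ignored.
    """
    counts = []
    start = None  # index of the UNH that opened the current message
    for index, tag in enumerate(tags):
        if tag == "UNH":
            start = index
        elif tag == "UNT" and start is not None:
            counts.append(index - start + 1)
            start = None
    return counts


stream_tags = ["UNB", "UNH", "BGM", "DTM", "UNT", "UNH", "BGM", "UNT", "UNZ"]
print(message_segment_counts(stream_tags))  # [4, 3]
```

The same walk over the tags could be extended to count messages per group or per interchange for the UNE and UNZ trailers.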

The various types of message payloads can be seen in the matrix of EDIFACT versions supported. EANCOM, EDIG@S, and IATA (PADIS) messages also share this same structure.

Structure of X12 Files

An X12 data stream is similar in some ways to one for EDIFACT. It also consists of one or more interchanges.

Each X12 interchange begins with an ISA segment and ends with an IEA segment. Inside are one or more GS/GE pairs, each of which starts and ends a message group. Within each GS/GE pair there are one or more messages, each starting with an ST segment and ending with an SE segment.

The message type, or "transaction set", is coded in the ST segment, but the version actually comes from the surrounding GS segment. The content of the message follows the rules from the transaction set repository, which specifies which segments are appropriate, in which order, in what quantity, and how they are grouped.

X12 SAMPLE STRUCTURE
ISA (interchange start)
GS (group start)
ST (message header)
message #1 payload
SE (message trailer)
GE (group end)
GS (group start)
ST (message header)
message #2 payload
SE (message trailer)
ST (message header)
message #3 payload
SE (message trailer)
GE (group end)
IEA (interchange end)
ISA (interchange start)
GS (group start)
ST (message header)
message #4 payload
SE (message trailer)
GE (group end)
IEA (interchange end)
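The way the type and version of a message are spread across two segments can be sketched in Python as follows. This is an illustration under assumptions, not the converters' implementation. It assumes "*" as the element separator and "~" as the segment terminator (a real reader would take these from the ISA segment), reads the transaction set ID from the first element of ST and the version from the eighth element of GS, and omits the ISA/IEA envelope for brevity. The function name `transaction_sets` and the sample data are made up for the example.

```python
def transaction_sets(stream, element_sep="*", segment_term="~"):
    """Return (transaction set ID, version) pairs, one per ST segment."""
    found = []
    version = None  # version of the group currently open, if any
    for raw in stream.split(segment_term):
        segment = raw.strip()
        if not segment:
            continue
        fields = segment.split(element_sep)
        if fields[0] == "GS":
            version = fields[8]  # GS08: version of the group
        elif fields[0] == "GE":
            version = None       # group closed
        elif fields[0] == "ST":
            found.append((fields[1], version))  # ST01: transaction set ID
    return found


# Illustrative data only: two groups with different versions.
sample = (
    "GS*PO*SENDER*RECEIVER*20011213*1200*1*X*004010~"
    "ST*850*0001~SE*2*0001~"
    "GE*1*1~"
    "GS*IN*SENDER*RECEIVER*20011213*1200*2*X*005010~"
    "ST*810*0002~SE*2*0002~"
    "GE*1*2~"
)
print(transaction_sets(sample))
# [('850', '004010'), ('810', '005010')]
```

The ST segment alone is not enough to pick the right definition. The enclosing group has to be tracked while reading the stream.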

Structure of HL7 Files

Typically an HL7 data stream contains one or more messages, each starting with an MSH segment. Unlike other EDI dialects, HL7 does not have a message end segment.

HL7 messages can also be sent in batches, and those batches can be grouped into logical files. The file header segment is FHS and the corresponding trailer is FTS. The batch within the file starts with BHS and ends with BTS.

The type and version information for the message is contained within the MSH segment. In addition to segments that are defined in the message dictionary, HL7 messages can have other customized segments. These all begin with a 'Z'.

HL7 SAMPLE STREAM
MSH (message header)
message #1 payload
MSH (message header)
message #2 payload
MSH (message header)
message #3 payload
MSH (message header)
message #4 payload

HL7 SAMPLE FILE
FHS (file start)
BHS (batch #1 start)
MSH (message header)
message #1 payload
MSH (message header)
message #2 payload
BTS (batch #1 end)
BHS (batch #2 start)
MSH (message header)
message #3 payload
MSH (message header)
message #4 payload
MSH (message header)
message #5 payload
BTS (batch #2 end)
FTS (file end)
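Since HL7 has no message trailer, a reader has to treat each MSH segment as the start of a new message. The Python sketch below illustrates that, along with reading the type and version from MSH. It is not the converters' implementation. It assumes HL7 version 2 conventions that are not spelled out above: segments separated by carriage returns, the field separator given by the fourth character of MSH, the message type in field MSH-9, and the version in MSH-12. The function names and sample data are hypothetical.

```python
ENVELOPE_TAGS = {"FHS", "BHS", "BTS", "FTS"}  # file and batch wrappers


def split_messages(stream):
    """Split an HL7 stream into messages, each a list of segment strings.

    A new message starts at every MSH segment. File and batch
    header/trailer segments are skipped.
    """
    messages = []
    # Accept CR, LF, or CRLF between segments.
    normalized = stream.replace("\r\n", "\r").replace("\n", "\r")
    for line in normalized.split("\r"):
        if not line or line[:3] in ENVELOPE_TAGS:
            continue
        if line.startswith("MSH"):
            messages.append([line])
        elif messages:
            messages[-1].append(line)
    return messages


def type_and_version(msh):
    """Return (MSH-9, MSH-12) from an MSH segment string."""
    separator = msh[3]  # MSH-1 is the separator character itself
    fields = msh.split(separator)
    # fields[1] is MSH-2, so MSH-n sits at index n - 1.
    return fields[8], fields[11]


# Illustrative data only.
sample = "\r".join([
    "BHS|^~\\&|APP",
    "MSH|^~\\&|APP|FAC|APP2|FAC2|20011213120000||ADT^A01|MSG1|P|2.3",
    "PID|1||12345",
    "MSH|^~\\&|APP|FAC|APP2|FAC2|20011213120500||ORU^R01|MSG2|P|2.4",
    "OBX|1",
    "BTS|2",
])
messages = split_messages(sample)
print([type_and_version(m[0]) for m in messages])
# [('ADT^A01', '2.3'), ('ORU^R01', '2.4')]
```

Custom segments beginning with "Z" need no special handling at this level. They are kept with the message they follow.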

Using EDI and XML Together

This overview was designed to help you see why EDI is serialized into XML the way it is. Knowing this should help in creating and executing mappings using the Stylus Studio XQuery and XSLT mapping tools as well as the DataDirect XQuery engine.

Although DataDirect XQuery and the DataDirect XML Converters can each be used separately, they are designed to work best together: the streaming and projection capabilities of DataDirect XQuery enable processing of very large EDI files with a low memory footprint for low-latency, high-bandwidth applications.