This is a summary of some features of a coding method I developed in July, 1999. It was designed to meet several requirements:
It must encode 16 bit data words into no more than 20 bit code words, each with exactly 10 zero bits and 10 one bits, No two consecutive code words should have the same value, resulting in a self clocking parallel encoding, The code words should be easily extracted from a serial bit stream with no more than one code word for synchronization, It must be possible to rapidly generate code words and extract data words with simple hardware, and Words should contain no more than 7 consecutive zeros or ones.
Lets start with a couple of definitions: a balanced code word is one with exactly the same number of zeros as ones. The simplest form of a balanced code word is the 2 bit word encoding a single data bit. A hardware implementation of this 2 bit balanced word is a differential signal. When one wire is high and the other low, a value of one is asserted, and if the reverse is true, a value of zero is asserted. Here we also use the term data word to refer to the collection of bits that must be transmitted across the communication channel from the source to the destination. In general, any possible bit pattern is allowed, so the number of codes required for an N-bit data word is exactly 2^N. Likewise, we use the term code word to refer to the set of bits associated with one instance of the data word. In general not all bit patterns possible in the communication channel will be used. Often this is done to implement forward error correction, other times it is used to enable extraction of a clock or to permit the use of narrower bandwidth amplifiers.
A code meeting these requirements will produce minimal electromagnetic emissions if the 20 bits are routed as a group across a circuit board, and if used within an integrated circuit the current flow between the source and the destination of the 20 bit connection is constant over time. Other coding methods also have these characteristics but this coding method can do so with the fewest possible bits in the code words. Since there are 184,756 balanced 20 bit codes and 48,620 balanced 18 bit code, the smallest set of balanced code words that can represent the 65,536 data words representable with 16 bits is 20 bits long. In fact, since the number of 20 bit codes is greater than or equal to 131,072, a balanced code word can actually be assigned to each possible 17 bit data word. In this way, if the 17th bit alternates between 0 and 1 with each word, resulting in a self clocking parallel code. So the first two conditions can certainly be met.
If we define valid code words to be the 20 bit code words that are encodings of 16 bit data words and the unique sync word about to be defined, then we can define a sync word that meets the third condition as follows. The synchronization word must be a word taken from the set of balanced 20 bit code words that cannot be constructed from the trailing bits of one valid code word and the leading bits of another valid code word. This is not possible if we use the naieve self-clocking approach I just described. But if we assign a small group of codes the function of repeating the previous data word value we can retain the self clocking nature of the code and have enough unused code values that we can insure uniqueness of the sync code at the expense of requiring a compare at the sender and one at the reciever and a memory element at each end as well.
Since July, 1999, I have fine tuned the generation logic and have reduced the logic to a simple AND/OR tree for each bit with each OR fed by 3 to 5 AND gates fed by 3 to 5 data word bits or their inverses. The extraction logic is similar but requires as many as 7 inputs on the AND gates and has a 6 input AND/XOR on the output bit.
In case you are curious, I cheated on the final requirement: it was arrived at by enumerating and examining the encodings that met the other requirements and selecting the shortest maximal length span of zeros or ones and making that a requirement!