OT: OO! XML!!!

Comments

farss wrote on 4/4/2008, 5:15 PM
Terje,
I could dig out the links to the XML specs and docs if you like. It's been a few years since I worked on this; however, the problem in the end is purely logical and nothing unique to XML.

Consider this
<xsd:element name="FreeText" type="xsd:string" minOccurs="0"/>

User types into a table: Hello World
Value from table is sent as:
<FreeText>Hello World</FreeText>

No problem so far

However when the user enters:
If you want to end free text use </FreeText>
Value from table is sent as
<FreeText>If you want to end free text use </FreeText></FreeText>

Do you see a problem?

There's no way to resolve this. You can fiddle around with it, but you cannot define a system that will not ultimately have this kind of flaw. The problem resolves itself immediately when you can use non-printable characters (i.e. control chars) to define the delimiters. Then and only then can anything be contained and the reader will not have an issue.

Bob.


DJPadre wrote on 4/4/2008, 6:38 PM
"A lot.

Video codecs are getting standardized (Microsoft's VC-1 comes to mind...), EDL formats are getting standardized, audio formats are getting standardized, and so on.

Standardization has many benefits, but if single companies can get away with controlling the voting process, we'll end up with standards that serve one manufacturer at the expense of everybody else.

End result? We have to pay higher prices for everything due to less competition.

"All it takes for evil to prevail is good people doing nothing."

Next time you see something like this (and the time will come, don't worry), please speak up wherever you see it!"

Sounds a lot like the argument against BluRay....

In regard to "standards" there won't ever be TRUE standards, as the industry itself is not regulated...
If there was a governing body making sure everyone kept to these standards (and not just anyone deciding that something is "good enough" to publish), then and only then would any element of a standard be of any benefit to anyone.
As it stands, DVDA is one of the only apps to meet and strictly adhere to any standards put forward.
Vegas, on the other hand, allows one to move across BD encoding standards and create hybrid encodes which BD doesn't "legally" support. Yes, they work, but they're not within the "standard" of BD encoding.
Hell, 25p isn't even "supported" but it works regardless...

So there's the catch...

I could ramble on but I won't; I think you get the meaning.
farss wrote on 4/4/2008, 6:51 PM
"Anything that can be expressed can be described in a human-readable format."

I'd forgotten about this. Probably the best example of what I'm getting at without getting into any technicalities:



Bob.
johnmeyer wrote on 4/4/2008, 8:18 PM
Ah, thanks for that Bob. Life is good again ...
Coursedesign wrote on 4/5/2008, 2:23 PM
"However when the user enters:"

So you mean that the helpful postings in this forum that show users how to input HTML into postings don't exist?

Even the fairly early markup language (not the first one by far) I used to compose newsletters in 1972, IBM Script, had solutions for inputting any text, without using control characters.
farss wrote on 4/5/2008, 2:39 PM
Of course they exist, the fact that they exist shows the problem.

Take your pick, HTML, XML, any human readable markup language will always expose the user to the markup language itself. Hardly a good solution for the masses creating documents in a word processor or inputting data that's to be transmitted to another system and read by that system. As you've noted, using something as simple as the Esc control character solves the problem entirely. Then the reader knows to start interpreting a command.

Bob.
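
For readers who want to see what control-character delimiting looks like in practice, here is a minimal Python sketch. It does not use Esc as mentioned above; as an assumption for illustration it swaps in the ASCII Unit Separator (0x1F) and Record Separator (0x1E). The function names encode and decode are hypothetical. The scheme only holds as long as the input can never contain the two reserved characters, so the sketch rejects such input rather than guessing.

```python
UNIT_SEP = "\x1f"    # ASCII Unit Separator, placed between fields
RECORD_SEP = "\x1e"  # ASCII Record Separator, placed between records


def encode(records):
    """Join records and fields using non-printable delimiters."""
    for record in records:
        for field in record:
            # The scheme breaks if a field contains a delimiter, so refuse it.
            if UNIT_SEP in field or RECORD_SEP in field:
                raise ValueError("field contains a reserved control character")
    return RECORD_SEP.join(UNIT_SEP.join(record) for record in records)


def decode(blob):
    """Split a blob produced by encode() back into records and fields."""
    return [record.split(UNIT_SEP) for record in blob.split(RECORD_SEP)]


records = [
    ["FreeText", "If you want to end free text use </FreeText>"],
    ["Note", "angle brackets like <b> need no special handling here"],
]
assert decode(encode(records)) == records
```

Printable markup passes through untouched because none of it is reserved; the reserved set has simply moved to two characters that a keyboard user is unlikely to type.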
Coursedesign wrote on 4/5/2008, 2:58 PM
There seems to be a fundamental misunderstanding about standards.

Standards are not about mandate.

A video codec standard for example doesn't mean that everyone has to use only this format for video.

It just means that there is one format that is likely to be exchangeable between Avid and FCP and Edius and Vegas, etc. and between Windows and Linux and OS X, etc.

Ditto with other standards.

As John Meyer reminded us, five years ago WMV was quite superior to/more efficient than other codecs for typical bit rates used online.

So it made sense to use that for most daily online use at the time, since there were so few Linux and OS X users in the consumer world.

WMV even got good free support in OS X (with Flip4Mac), and perhaps even something in Linux, I don't know.

Over the years, the standards-supporting codecs got better and better, and seem to have reached the point where they are optimal for most uses. There will always be exceptions like Cineform and Sheer, etc.

Heck, even the 10-year-old MPEG-2 standard lives on in every DVD sold, and provides amazing quality thanks to continuous improvement (the Intel way), even with the emergence of many codec alternatives that are each better in some way.

The lack of an agreed consensus standard for high definition disks led to consumers staying on the sidelines; vendors didn't make any money off of it and consumers had to make do with uprezzing SD DVD players.

A standard doesn't have to be the best alternative for every circumstance. It just has to be overall OK for most users in the field it covers, and this is sufficient for it to save significant money for users, and to successfully launch accepted products that much sooner.

While most users go with the standard, there will of course be geeks who say "MPEG-2 is obsolete" and insist on using DivX, Xvid, etc., possibly leading to successor standards, or complementary standards for niche users.
Coursedesign wrote on 4/5/2008, 3:28 PM
"Take your pick, HTML, XML, any human readable markup language will always expose the user to the markup language itself. Hardly a good solution for the masses creating documents in a word processor..."

HTML and XML were never intended for the masses.

Markup language was not "a good solution" for word processing on mainframes in 1971. It was the only solution, since there was no WYSIWYG at the time.

XML is not meant to be used as an input language for the masses ever.

It is a language designed to be only "relatively human-legible" to facilitate troubleshooting, communication over a variety of channels, and easy extension.

Human and application input and output is meant to be handled by different translators on different platforms, for many different purposes.

Don't overlook the troubleshooting part. This is likely the #1 reason why HTML is still with us, instead of having been replaced by a potentially more efficient binary code standard.

(There are much bigger fish to fry in making the serving of web pages more efficient, I believe #1 is the number of hits, i.e. server requests, required to paint a web page on your screen.)

I remember the days when every computer manufacturer had their own networking protocols, and there was no interoperability. SNA, DNA, etc.

After many years of "tough!", there came standardized interchange protocols with gateways, then some agreement on OSI (Physical, Data Link, Network, Transport, Session, Presentation, and Application) and the native use of ISO and CCITT standards, followed by the simplicity of the TCP/IP model that chose a simpler way to slice the pie (with a subset of the functionality).

Long live standards! But not each standard for too long, that's not the purpose.

Terje wrote on 4/6/2008, 4:11 AM
"However when the user enters:"

No, I don't see the problem at all. If "the user" here is the person who is developing the XML document in a text editor, the user has a serious problem in his head, namely that he doesn't understand XML. If the user is someone who is writing text in an editor where the text is eventually being stored as XML and the developer of the editor has not anticipated this possibility the developer of said editor should be fired for gross incompetence.

"There's no way to resolve this"

Of course there is. The XML standard deals with this problem directly. There are in fact multiple ways of dealing with it. It's not even that hard.
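
One of those ways is entity escaping. The following is a minimal Python sketch, not anything posted in this thread: it takes the FreeText example from earlier, shows that naive concatenation is rejected by a parser, and then shows two standard-library routes (xml.sax.saxutils.escape and xml.etree.ElementTree) that escape the user's text so it round-trips. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

user_text = "If you want to end free text use </FreeText>"


def is_well_formed(xml_text):
    """Return True if the standard-library parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Naive concatenation reproduces the broken message from the example.
naive = "<FreeText>" + user_text + "</FreeText>"

# Route 1: escape &, < and > by hand before embedding the text.
escaped = "<FreeText>" + escape(user_text) + "</FreeText>"

# Route 2: let a serializer do the escaping.
element = ET.Element("FreeText")
element.text = user_text
serialized = ET.tostring(element, encoding="unicode")

assert not is_well_formed(naive)
assert ET.fromstring(escaped).text == user_text
assert ET.fromstring(serialized).text == user_text
```

In both routes the user's closing tag travels as &lt;/FreeText&gt; and the reader turns it back into the original characters, so the person typing never sees the escaping.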
Terje wrote on 4/6/2008, 4:18 AM
"Take your pick, HTML, XML, any human readable markup language will always expose the user to the markup language itself."

BZZT! Wrong! All the new versions of Microsoft Office use XML as the standard document format, no user ever sees it or has to deal with it. The same goes for Open Office.

"Hardly a good solution for the masses creating documents"

It seems you are blaming the format for the lack of decent tools, which is a little bit the wrong way around, isn't it? Besides, aren't "the masses" able to work with Microsoft Office? Of course they are. Even though Office saves all data in XML.
farss wrote on 4/6/2008, 6:13 AM
From:
http://www.w3.org/TR/xml/#sec-cdata-sect

<![CDATA[<greeting>Hello, world!</greeting>]]>

Solves the problem, or maybe not: if the character data contains ]]> there's a problem. This isn't a bug or error in the specification, it's the problem of using human readable characters as markup.

Or maybe not. Maybe there is a way around this. Trying to think it through I find myself stuck in a loop of infinite recursion. If you can get my head out of my orifice much appreciated.

Bob.
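
A commonly used way out of that loop is to end the CDATA section in the middle of the ]]> sequence and immediately open a new one, so the terminator never appears whole inside any single section. Here is a minimal Python sketch of that idea, assuming the standard-library parser; the function name to_cdata and the sample payload are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA, splitting every ']]>' across two sections."""
    # ']]' closes out the first section, '>' opens the next one.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


payload = "<greeting>Hello, world!</greeting> and a literal ]]> as well"
document = "<FreeText>" + to_cdata(payload) + "</FreeText>"

# The parser joins the adjacent sections back into one text value.
recovered = ET.fromstring(document).text
assert recovered == payload
```

The replacement is applied once over the raw text, so there is no recursion: the writer never has to escape its own escapes, and the reader needs no special handling beyond ordinary CDATA parsing.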
DrLumen wrote on 4/6/2008, 10:14 AM
Here is a link to a listing posted by one member of a technical group. Supposedly this person works within one of the technical groups that studied the "standard" proposal. He works for IBM so it is a bit biased and he gets a little shrill at times. But, it is a good example of the issues with the standard.... (I can't say or write that with a straight face)

Also, keep in mind that M$ submitted 6000 pages for a quick route type vote process. Also, from my understanding, they use xml for their document structure and not ooxml.

6. Page 4387, Section 6.1.2.3: For the “class” attribute it says “Specifies a reference to the definition of a CSS style.” The example implies that some sort of mapping will occur between CSS attributes and DrawingML. But no such mapping is defined in OOXML. The "doubleclicknotify" attribute implies some sort of event model that is undefined in OOXML. How do you send a message for doubleclicknotify? Why do we describe organization chart layouts here when it is not applicable to a bezier curve? What happens if this shape is declared to be a horizontal rule or bullet or ole object? The text allows you label it as one of these, but assigns no meaning or behavior to this. Why do we have an spid as well as an id attribute? The "target" attribute refers to Microsoft-specific I.E. features such as "_media". Although the text says that control points have default values, the schema fragment does not show this.

9. Page 4492, Section 6.1.2.11: The "althref" attribute is described as "Defines an alternate reference for an image in Macintosh PICT format". Why is this necessary for only Mac PICT files? Why would "bilevel" necessarily lead to 8 colors? We're well beyond 8-bit color these days. "blacklevel" attribute is defined as "Specifies the image brightness. Default is 0." What is the scale here? This needs to be defined. Is it 0-1.0, 0-255 or what? And what is "image brightness" in terms of the art? Is this luminosity? Opacity? Is this setting the level of the black point? For "cropleft", etc. -- what units are allowed? (implies %) How does "detectmouseclick" work when no event model is defined? "emboss effect" is not defined. "gain" has the same problem as "blacklevel" -- no scale is defined. This element has two different id attributes in two different namespaces, with two different types. "movie" attribute is described as "Specifies a pointer to a movie image. This is a data block that contains a pointer to a pointer to movie data". Excuse me? "A pointer to a pointer to movie data"? This is useless. The "recolortarget" example appears to contradict the description. It shows blue recolored to red, not black. The "src" attribute is said to be a URL, yet is typed to xsd:string. This should be xsd:anyURI.

http://www.robweir.com/blog/2008/03/how-many-defects-remain-in-ooxml.html

I don't want to imply that I understand all of this but, like with the black levels, I get the idea that it would take some hacking to figure out what M$ had in mind (gets out the crystal ball).

apit34356 wrote on 4/6/2008, 11:38 AM
Well, the sad fact is ODF was approved by the main body over a year ago and is in current use with IBM office products. The MS OOXML failed to pass with one committee, so they approached a smaller subset. MS claimed they could not reach ODF performance in a timely manner and without major changes to the current Office products, which MS claimed is in 90% of the EU gov offices thru purchasing bundled OS packages. This reminds me of the old Java battle in some ways, MS insisting the world adopt their approach. If MS wants government contracts, then they should follow the same rules and eat the cost of upgrading their products.
farss wrote on 4/6/2008, 3:21 PM
In the middle of all this can someone explain to me what happened to Adobe and Acrobat?

Bob.

apit34356 wrote on 4/6/2008, 4:00 PM
"Adobe and Acrobat" WHO??? ;-)


They are one part of the parties interested in ODF being the only standard, if I remember correctly.


Here's a link: http://www.odfalliance.org/resources/TheCaseAgainstOOXML.pdf

This points out some of the serious flaws in OOXML.
farss wrote on 4/6/2008, 5:19 PM
Maybe I'm a bit of a simple sod but I get specs from the government all the time. They're either Acrobat or Word docs. Tabular data in Excel, large slabs of data in Access files or even comma delimited files, which are a doddle to suck into Access or Excel as needed.
On the video side I can read almost anything that clients send, if not I tell them what I can read and get them to do the conversion. If it's a big job and a client who insists or doesn't have the means to handle the conversion I go buy whatever software I need to read it and convert whatever they've sent me.

So I'm kind of left wondering why we need ODF or OOXML and why limit ourselves to only one standard. I've sort of been on the sidelines over the years whenever someone tried to mandate only one standard and it just never seems to have worked out in any field.

Bob.
Coursedesign wrote on 4/6/2008, 6:23 PM
I don't know of any field where there is only one standard.

The key is to not add new standards that have as their only purpose to force users to buy products from one single vendor.

Word doesn't cover everything that can be done with ODF, and CSV and PDF are quite limited.

Word has become something of a de facto standard because so many have it. As compatibles have popped up, Microsoft is trying to stay one step ahead by continuously changing the document format, so people are locked into buying upgrades and only buying genuine MS Word for hundreds of dollars even when all they need is a basic word processor to write simple memos.


johnmeyer wrote on 4/7/2008, 12:03 AM
"I don't know of any field where there is only one standard."

Generally, that is true, and is why worrying about this document standard doesn't make sense.

There are, however, a few areas where you have to have a standard. Fax machines. The CCITT Group 3 standard had to be adopted by everyone, or faxing wouldn't have taken off. Modem standards were similar, although at each stage during that technology's development various companies tried to get their approach adopted as the standard.

Anything involving broadcast (wireless, television, radio) or interoperability requires standards.

But file formats? Who cares?

I mean, just click on Open in Word or WordPerfect or any other word processor. The first thing you see is a list of about fifty different file formats that can be opened and, in many cases, can also be exported. Like Bob, I am totally indifferent as to what file format you send to me, as long as I can open it and go from there.
apit34356 wrote on 4/7/2008, 12:14 AM
"But file formats? Who cares?" Well, this is a big push for government docs between agencies of one government, then to other governments, then to their agencies, then to their contractors, so docs' structure and meanings can be the same without issues of language and layout design, etc. --- nothing is loss in translation.
apit34356 wrote on 4/7/2008, 12:22 AM
"But file formats? Who cares?" An example of this is the massive delay that the Airbus 380 suffered from cabling connectors being too short from different French and German manufacturers. The master CAD drawings and documents lost about 3" between translation into French and German worksheets. Of course, I credit the French manufacturers' short 3-4 day workweek ( 5 means 4) ;-)
farss wrote on 4/7/2008, 12:34 AM
"nothing is loss in translation"

So they'll all be written in Latin?

But seriously, closer to home, wouldn't it be nice if Vegas's project files were an open standard. I can't even get some of my V7.0e projects to open in V7.0d.

At a simpler level we can't even agree on what ONE frame rate really is.

Vegas says 29.96, DJ says 30.00. I wonder how many realise that.
I just read a few wishful posts on CML about having one standard film frame rate. Bit of wishful thinking but I agree it'd be nice if there weren't two standard film speeds, one for real film and one for film on video.

Bob.
Coursedesign wrote on 4/7/2008, 8:03 AM
"But file formats? Who cares?"

Well, that's the clincher. Not one on the "long list of file formats" covers all the advanced features that are used in word processing nowadays.

"I mean, just click on Open in Word or WordPerfect or any other word processor."

Open a Word 2003 or Word 2007 document in WordPerfect and see what you get.

Often a crippled document.

A comprehensive standard is needed that covers today's document requirements.
You could argue that "Word 2007" is it, but then everybody has to pay hundreds of dollars to MS.

And of course nobody will be able to use Linux, because MS not only doesn't support Linux, it appears to be trying to kill it off by surreptitiously funding SCO's harassing lawsuits through third party companies.

Terje wrote on 4/7/2008, 8:45 AM
<![CDATA[<greeting>Hello, world!</greeting>]]>

Interestingly this works in this forum. It works in Microsoft Office and Open Office. Why do you think that is? If it was a huge problem, would we not see problems here in the forum, in Open Office or Microsoft Office?