Zipping files for long-term storage

Comments

ushere wrote on 6/1/2014, 2:42 AM
i've never had a problem with a zip file, other than forgetting the password ;-(

however, i was under the (misguided?) impression that you could actually vary the degree of compression in some zipping software, from lossless downwards?
altarvic wrote on 6/1/2014, 3:09 AM
ZIP is lossless. Period.
farss wrote on 6/1/2014, 3:35 AM
[I]"however, i was under the (misguided?) impression that you could actually vary the degree of compression in some zipping software, from lossless downwards?"[/I]

WinZip offers a number of choices that always result in lossless compression. The choice is between speed and compression: if you don't mind waiting while the code spends more time looking for patterns, you might get a smaller file.

There's also the question of backwards compatibility. WinZip 7 (I think) has a new compression option which gives more compression and is slower but cannot be unzipped by previous versions.


Bob.
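As a rough illustration of what Bob describes, here is a sketch using Python's zlib module (the same DEFLATE family of compression ZIP uses): every level trades time for size, and every level decompresses back to the exact original bytes. The sample data is made up for the example.

    import time
    import zlib

    # Compressible sample data: repetitive text, much like a log file.
    data = b"2014-06-01 02:42 INFO render started on clip_0001.mov\n" * 50000

    for level in (1, 6, 9):                    # 1 = fastest, 9 = smallest
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Whatever the level, decompression restores the original exactly.
        assert zlib.decompress(compressed) == data
        print(f"level {level}: {len(compressed):>8} bytes in {elapsed:.3f}s")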
ushere wrote on 6/1/2014, 3:59 AM
thanks bob - so my memory isn't THAT bad.... just speed rather than compression.
deusx wrote on 6/1/2014, 9:30 AM
>>>>@ deusx: Where did you get that info from?<<<

I zipped and unzipped some files 3-4 times and they definitely did not look as good as the originals. Used 7Zip and its default settings.

Obviously you can set different levels of compression and different methods, and it may depend on the footage/original files, but no, it definitely did not look the same after 3 or 4 round trips through the zipper.

>>>ZIP is lossless. Period.<<<<

And Facebook cares about your privacy.

Just because somebody says something doesn't make it true. First of all, these guys are trying to sell us a product; secondly, they aren't nearly as smart as they think they are, and could be delusional too.

Finally, there is no such thing as lossless compression. You can't create something out of nothing. Once you throw bits away, they aren't coming back. On unzipping, it just guesses what the missing info is supposed to be, and the more you do it, the more wrong guesses you get. Think of it as replicating DNA: replicate it enough times and enough mistakes accumulate that it all goes to hell.

Just look up this quote from that Wikipedia article: "Some image file formats, like PNG or GIF, use only lossless compression"

Think about it for a few seconds and you may realize how idiotic that statement is.
If it's all lossless, why can I choose the level of compression with GIF files?
Wouldn't it all automatically compress to the smallest possible file size while retaining all of its glorious info? And do you know what a slider where you can set the file size is called ( in Adobe's Fireworks )? It's called Loss. It's right next to Dither%

Likewise, if it's all lossless with .zip files why so many compression options? Just compress the thing to the smallest file size if it's all lossless ( which it obviously is not ).

Former user wrote on 6/1/2014, 9:43 AM
Deusx,

You need to do more research on ZIP files and compression.

ZIP files use tokens and an index to compress; they are not throwing out any information. There is an index table with tokens assigned to represent groups of bits/data. If a pattern repeats, rather than storing the whole pattern again, it stores a token that references the pattern in the index.

There are also many image compression schemes that are lossless, using the same type of algorithm.


Like I said, search Google or Yahoo or whatever on ZIP files and you will learn that there is such a thing as LOSSLESS compression.
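A bare-bones sketch of that dictionary idea in Python, loosely following LZW (the algorithm linked later in the thread). Real ZIP files use DEFLATE rather than LZW, but the principle of replacing repeated patterns with references into a table is the same, and decoding rebuilds the input exactly.

    def lzw_encode(data: bytes) -> list[int]:
        # Start with a dictionary of all single-byte patterns (codes 0-255).
        table = {bytes([i]): i for i in range(256)}
        current, out = b"", []
        for byte in data:
            candidate = current + bytes([byte])
            if candidate in table:
                current = candidate              # keep growing the match
            else:
                out.append(table[current])       # emit index of longest known match
                table[candidate] = len(table)    # add the new pattern to the table
                current = bytes([byte])
        if current:
            out.append(table[current])
        return out

    def lzw_decode(codes: list[int]) -> bytes:
        table = {i: bytes([i]) for i in range(256)}
        prev = table[codes[0]]
        out = [prev]
        for code in codes[1:]:
            # The only code that can be missing is the one about to be added.
            entry = table[code] if code in table else prev + prev[:1]
            out.append(entry)
            table[len(table)] = prev + entry[:1]
            prev = entry
        return b"".join(out)

    original = b"TOBEORNOTTOBEORTOBEORNOT" * 100
    codes = lzw_encode(original)
    assert lzw_decode(codes) == original          # bit-for-bit identical
    print(len(original), "bytes ->", len(codes), "dictionary references")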
Chienworks wrote on 6/1/2014, 10:22 AM
deusx, i can state categorically that there was something wrong with your testing. You may have been using something other than a true zip program or had some other steps involved.

As we've pointed out before, if Zip compression altered or lost data in ANY way, even a single bit in a terabyte file, it would be useless for databases, spreadsheets, word processor documents, architectural drawings, email, and all the other etcetera it is used for billions of times a day.

I receive a feed of airline data that amounts to a couple of gigabytes a day, and we would exhaust all the drive space we could afford in short order. However, after the day's processing is done we zip the day's files together into one archive that is only several megabytes. The compression ratio is better than 500:1. It can achieve this because the message format is extremely repetitive and redundant, with a lot of empty space. At any time we can extract any of the messages from the .zip file and it is guaranteed accurate bit-for-bit with the original.
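A rough sketch of that kind of workflow, with made-up message data and Python's zipfile module: stuffing a pile of highly repetitive records into one archive gives an enormous ratio, and any record read back out matches the original byte for byte.

    import io
    import zipfile

    # Hypothetical, highly repetitive "messages": mostly boilerplate and padding.
    record = (b"FLIGHT UA0001 DEP KJFK ARR KLAX STATUS SCHEDULED" + b" " * 400 + b"\n") * 20
    messages = {f"msg_{i:05}.txt": record for i in range(500)}

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in messages.items():
            archive.writestr(name, payload)

    raw_size = sum(len(p) for p in messages.values())
    print(f"{raw_size} bytes of messages -> {buffer.tell()} bytes of archive")

    # Any message pulled back out is bit-for-bit what went in.
    with zipfile.ZipFile(buffer) as archive:
        assert archive.read("msg_00123.txt") == messages["msg_00123.txt"]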
Former user wrote on 6/1/2014, 10:25 AM
I gave a link to LZW compression that explains how it works. There are other links and pages available that explain how compression can be lossless.

I have reported your use of an obscenity as well to the forum administrators.

Good luck.
deusx wrote on 6/1/2014, 10:38 AM
>>>>deusx, i can state categorically that there was something wrong with your testing.<<<<

All I did was zip a roughly 4GB folder using 7 Zip and its default options. That is a very popular open source program.

After either 3 or 4 round trips ( this was about a year ago, so not sure ) the unzipped files looked less sharp than the originals. That was the first thing I noticed and it was obvious. That was it for me as far as storing videos in zipped files.

>>>>I have reported your use of an obscenity as well to the forum administrators.

Good luck.<<<<

Yeah, now they are gonna tell me to zip it ( see what I did there? )
altarvic wrote on 6/1/2014, 10:41 AM
@deusx:

> "Finally, there is no such thing as lossless compression"

The simplest example of lossless compression:
AAAAAAAAAABBBBBCCC -> 10A5B3C
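To spell that example out, here is a toy run-length encoder in Python (just an illustration of the principle; ZIP itself uses DEFLATE, not plain run-length encoding). The counts describe the runs completely, so decoding reproduces the input exactly, however many times the round trip is repeated.

    import re
    from itertools import groupby

    def rle_encode(text: str) -> str:
        # Replace each run of identical characters with "<count><char>".
        return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))

    def rle_decode(encoded: str) -> str:
        # Expand each "<count><char>" pair back into the original run.
        return "".join(char * int(count) for count, char in re.findall(r"(\d+)(\D)", encoded))

    original = "AAAAAAAAAABBBBBCCC"
    packed = rle_encode(original)          # -> "10A5B3C"
    assert rle_decode(packed) == original  # the round trip loses nothing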

> "If you zip and unzip once you may not notice a difference. Do it 3-4 times you will definitely notice a significant drop in quality, and I mean really significant with some file formats ( .mov for example )"

It would be very interesting to see your examples

> " And do you know what a slider where you can set the file size is called ( in Adobe's Fireworks )? It's called Loss"

It has nothing to do with the GIF format. LZW compression is lossless.



John_Cline wrote on 6/1/2014, 4:12 PM
deusx, I don't know of a nice way to put this but you are absolutely, positively, 100% wrong. Dead wrong. ZIP and RAR are lossless, what you put in is what you get out.
Steve Mann wrote on 6/1/2014, 5:05 PM
Zip is lossless

You can zip a zipped file, but the additional token tables in the second-pass file can make it bigger than the first-pass file.

When you get options for more compression, you are telling the zip program to look harder for repeat data that can be compressed, at the expense of compression time, and it's usually not worth the bother. I don't recall the default pattern size; I think it was four bytes. Higher compression looks for longer duplicate patterns of data to make into tokens.
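A small sketch of that point using Python's zlib (the same DEFLATE scheme ZIP uses): the first pass squeezes out the repeats, so a second pass finds almost nothing left and its bookkeeping can make the file slightly larger, yet both layers still undo perfectly.

    import zlib

    # Repetitive sample data compresses very well on the first pass.
    data = b"the quick brown fox jumps over the lazy dog\n" * 20000

    once = zlib.compress(data, 9)
    twice = zlib.compress(once, 9)     # compressing the already-compressed output

    # The second pass barely helps, or even grows slightly.
    print(len(data), "->", len(once), "->", len(twice))

    # Both layers still unwind to the exact original bytes.
    assert zlib.decompress(zlib.decompress(twice)) == data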
farss wrote on 6/2/2014, 2:31 AM
The overall statement being challenged is that lossless compression is possible.

Clearly it is possible.
Storing data on magnetic media requires it to be losslessly encoded, as magnetic media cannot store a sequence of ones or a sequence of zeros; there has to be a change of magnetic flux to record anything. To get around this problem, schemes such as Non Return to Zero (NRZ) and Run Length Limited (RLL) are, or at least were, used. I say "were" because that's from my old grey cells from decades ago, when silicon wasn't anywhere near as fast as it is today. Whatever systems are used today isn't that important; what is important to realise is that lossless encoding is being used without us even being aware of it, and clearly it works with mind-boggling reliability.

Bob.

riredale wrote on 6/2/2014, 7:36 AM
Compression is a fascinating topic, and one that has been pursued for a long time. Many techniques have been invented. There's lossy compression, where one throws away information that is deemed least important (i.e. you can't hear [mp3] or see [jpg] the difference), and lossless, where clever tricks are used to remove redundancies in the raw data. BY DEFINITION, lossless compression can be repeated forever and the original data can still be recovered. The "AAAAAABBBCCC" example above is a good one. Another is Huffman coding, where the encoder looks at the overall file and breaks it down into chunks. The chunks that are most common are assigned the shortest codes in a new "language." Then all you have to do is transmit the master file showing the chunk assignments, followed by the resulting code.

Reading what I just wrote, I don't think I've described the process very well, but in any event lossless encoding can achieve up to about 50% compression for many types of data. I remember using a zip compressor years ago that gave me several options for compression. The difference was that I could choose to compress something quickly and get pretty good compression, or tell the compressor to work really hard and use a number of techniques to get slightly better compression, at the cost of taking maybe ten times longer. The same option shows up in my Acronis backup program: compress quickly, or take longer to compress a bit better.
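For anyone curious, here is a compact sketch of the Huffman idea in Python (an illustration only, not the exact scheme any particular zip utility uses): common symbols get short codes, rare symbols get long codes, and because the code table travels with the data, the decoder recovers the input exactly.

    import heapq
    from collections import Counter

    def build_codes(data: str) -> dict[str, str]:
        # Heap entries are (frequency, tie_breaker, {symbol: code_so_far}).
        heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(Counter(data).items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)    # two least frequent subtrees
            f2, _, right = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in left.items()}
            merged.update({s: "1" + c for s, c in right.items()})
            heapq.heappush(heap, (f1 + f2, tie, merged))
            tie += 1
        return heap[0][2]

    def huffman_encode(data: str) -> tuple[str, dict[str, str]]:
        codes = build_codes(data)
        return "".join(codes[sym] for sym in data), codes

    def huffman_decode(bits: str, codes: dict[str, str]) -> str:
        reverse = {code: sym for sym, code in codes.items()}
        out, buf = [], ""
        for bit in bits:
            buf += bit
            if buf in reverse:          # codes are prefix-free, so greedy matching works
                out.append(reverse[buf])
                buf = ""
        return "".join(out)

    text = "this is an example of a huffman tree" * 100
    bits, codes = huffman_encode(text)
    assert huffman_decode(bits, codes) == text       # exact recovery
    print(len(text) * 8, "bits raw ->", len(bits), "bits encoded (plus the code table)")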
Terje wrote on 6/2/2014, 6:52 PM
@deusx >> I zipped and unzipped some files 3-4 times and they definitely did not look as good as the originals. Used 7Zip and its default settings

Sorry, you are wrong. What goes into a zip file comes back out, bit-for-bit identical. Unless something went wrong, that is.

>> First of all these guys are trying to sell us a product, secondly they aren't nearly as smart as they think they are and could be delusional too.

Sorry, you are wrong. Entirely, totally wrong. Zip is 100% lossless no matter what you think.

>> Likewise, if it's all lossless with .zip files why so many compression options?

Because different things compress better/worse depending on your algorithm. Text compresses very well with some, binary with others. No matter what, the data that comes back out of the zip file is, bit for bit, identical to what went in.
Terje wrote on 6/2/2014, 6:58 PM
@deusx >> the unzipped files looked less sharp than the originals

Probably what is known as confirmation bias. You were expecting the files to be of lower quality, so that is what you saw. Utterly normal. Still not true. The files, after a million zip/unzip cycles, would be bit-for-bit identical.

You can zip/unzip .exe files (programs). You can even set up your disk so that all files are always compressed using zip algorithms. If a single bit in the executable part of an executable file (it can also contain resources, which are often less problematic) is changed, the file will simply not run properly. Zip files are only altered in the zip/unzip process if there is something wrong with the zip utility you use. Otherwise compression is always 100% lossless.
deusx wrote on 6/3/2014, 2:50 AM
>>>You were expecting the files to be of lower quality<<<

No I wasn't. I was actually surprised at the difference. I wasn't even looking for a difference in quality. I was just testing how much space I could save ( not much ) and this one folder went through 3-4 zip/unzip round trips in the process ( this was over a period of 7-10 days ). The only reason I remember this is because it was a surprise.

Amp simulators use algorithms too. They duplicate a real amp's waveforms exactly, bit for bit, yet in the end they sound like crap compared to the real thing.

I don't know what the case may be, but just because somebody tells me it's lossless, that doesn't make it so in the real world. Lossless can mean any number of things: all of the bits are there, no bits lost, but are they in the right place, the right order, whatever?

If it were so simple amp simulators would sound exactly like real amps and we'd only have one zip utility that does exactly the same thing. I don't doubt that zipping and unzipping something will result in pretty much or exactly the same file coming out of it. But from my own experience zipping and unzipping a number of times I don't take it for granted. What it depends on I don't know. I tested a single file 5 times and could not see any difference, yet that huge folder ( it was 40GB, not 4 ) of .mov files did not end up looking the same after 3-4 round trips.

Sorry, I don't give a crap what Wikipedia says. I believe my own results, and they say lossless is just another marketing term. And nothing is wrong with 7 Zip.
Maybe it was the planetary alignment that week that threw a few bits for a spin.

>>>>When you get options for more compression, you are telling the zip program to look harder for repeat data<<<<

That is exactly my point. If it has to be "told" to look harder, that means it doesn't always "see" everything, and therefore it can make mistakes. It is just an algorithm, after all. Whatever other variable could have caused this has to be taken into account.
GeeBax wrote on 6/3/2014, 3:51 AM
I am going to weigh in on the argument and say the same thing everyone else is saying: ZIP compression absolutely does not lose data, or scramble it, or anything else.

It has nothing to do with amp simulators; it is a mathematical function. If ZIP did lose data for any reason, it would not have the standing it does today, because no one would trust it and it would be essentially useless for most purposes.

As has been stated several times, just one bit out of place in an executable, and that executable fails.
John_Cline wrote on 6/3/2014, 4:04 AM
deusx, as far as I'm concerned, your credibility on this forum has just reached zero. There is an incredibly simple way to prove you are 100% dead wrong.

Go to the following website and download the "gfsw.exe" utility; it requires no installation, just run it. It will calculate both the CRC-32 and MD5 signature of a file, and if even a single bit in a file is different, you will come up with different CRC-32 and MD5 values. Use this utility to calculate the signature of your source file, then run your .MOV file(s) through as many ZIP/UNZIP passes as you want and calculate the signature of the resulting file. You will find the file to be absolutely identical.

http://esrg.sourceforge.net/utils_win_up/md5sum/

I just took a 2GB .MOV file, got into a hex editor and changed ONE BIT in the middle of the file; the checksum of the file no longer matched the original. I then changed the single bit back to its original value and the checksums matched again.

By the way, all file compressors like ZIP, RAR and 7ZIP use CRC-32 checksum signatures to determine if a file has changed through the process and will warn you if the values don't match.
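Anyone can reproduce John's test; here is a rough equivalent in Python, using hashlib for the signatures and the standard zipfile module in place of gfsw.exe. The file name below is just a placeholder for whatever file you want to test. The MD5 never changes across the round trips, while flipping a single bit changes it immediately.

    import hashlib
    import zipfile

    def md5_of(data: bytes) -> str:
        return hashlib.md5(data).hexdigest()

    with open("clip.mov", "rb") as f:      # placeholder name; use any file you have
        original = f.read()

    data = original
    for _ in range(4):                     # four zip/unzip round trips
        with zipfile.ZipFile("roundtrip.zip", "w", zipfile.ZIP_DEFLATED) as archive:
            archive.writestr("clip.mov", data)
        with zipfile.ZipFile("roundtrip.zip") as archive:
            data = archive.read("clip.mov")

    assert md5_of(data) == md5_of(original)        # identical after every trip

    # Flip a single bit in the middle and the signature no longer matches.
    damaged = bytearray(original)
    damaged[len(damaged) // 2] ^= 0x01
    assert md5_of(bytes(damaged)) != md5_of(original)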
ushere wrote on 6/3/2014, 5:03 AM
interestingly enough, this was just posted over on gaotd:

http://www.giveawayoftheday.com/easy-archive-recovery/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+giveawayoftheday%2Ffeed+%28Giveaway+of+the+Day%29
John_Cline wrote on 6/3/2014, 6:21 AM
This GOTD program is useful when the archiving utility reports (through the CRC-32 checksum) that the archive is broken. It can recover only the undamaged files; it is unlikely to be effective in recovering the damaged ones.

However, the RAR archive format supports a special type of redundant data called the recovery record. The presence of a recovery record makes an archive larger, but allows it to be repaired even in case of physical data damage due to disk failure or data loss of any other kind, provided that the damage is not too severe. Such damage recovery can be done with the Repair archive command. The ZIP archive format does not support a recovery record.
farss wrote on 6/3/2014, 7:59 AM
[I]"That is exactly my point. If it has to be 'told" to look harder that means it doesn't always "see" everything therefore it can make mistakes."[/I]

Looking "harder" means to use more complex algorithms in an attempt to make the compressed file smaller. The more complex algorithms, by working harder, take longer so the user is given the option of getting the compression done quickly but not as efficiently or slower with more compression.

I've tried a couple of options with the paid-for version of WinZip, compressing my very bloated Outlook file. Around a 30% further reduction in file size takes way too long for me given how cheap disk space is, but if I were uploading the backup to the cloud I'd use it to save upload time.

Bob.
Former user wrote on 6/3/2014, 8:08 AM
Deusx

Maybe your quality loss would make sense to us if you explained, step by step, the process you used when you noticed the difference. What was the round-trip path? Were you zipping the original each time? Zipping an unzipped file? Or zipping a zipped file?

This has nothing to do with "Wikipedia crap". This is a fact that, as you see, is supported by everybody else on this thread, as well as many white papers that can be found on the web.