Re: DOS ideas with fast simple algorithms - was: BSUM BS

Karen Lewellen-2
From: Rugxulo <[hidden email]>

Hi, Eric, always good to hear from you,

On Mon, Apr 10, 2017 at 4:21 PM, Eric Auer <[hidden email]> wrote:

>> BSUM (by Mateusz Viste) :  6.0s (100%)
>> CRC32 (by Joe Forster)  :  8.5s  (70%)
>> MD5 (by Colin Plumb)    : 52.9s  (11%)
>> SHA1 (by Colin Plumb)   : 85.7s   (7%)
> Entertaining :-) Still, you need to find a good balance
> between speed and collision risk. If you want to find
> duplicate files, you can simply check the sizes first.
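Here is a minimal Python sketch of Eric's size-first suggestion. It assumes zlib.crc32 as the checksum, though BSUM, MD5 or SHA1 would slot in the same way. The names crc32_of_file and find_duplicate_candidates are hypothetical and are not part of any of the tools timed above. Files with a unique size are never read. Files that share both size and CRC32 are only candidates, and a byte-by-byte compare such as filecmp.cmp(a, b, shallow=False) would still be needed to confirm them.

```python
import os
import zlib
from collections import defaultdict


def crc32_of_file(path, chunk_size=64 * 1024):
    """Stream a file through zlib.crc32 so only chunk_size bytes sit in RAM."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc


def find_duplicate_candidates(paths):
    """Group by size first; checksum only files whose size is shared."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_crc = defaultdict(list)
        for p in same_size:
            by_crc[crc32_of_file(p)].append(p)
        # Matching size and CRC32 means "candidate duplicates", not proof.
        groups.extend(g for g in by_crc.values() if len(g) > 1)
    return groups
```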

Check sizes? Okay, but some files have junk data appended at the end
that is (largely) ignored, so the same content can show up in files of
different sizes.

Well, maybe .ZIP comments aren't quite the literal "end", but I did
recently find a .ZIP that had a bunch of 0x1A (EOF) markers appended
(for some obscure reason; yes, I know about CP/M's reasoning, but why
would that carry over to a DOS-only .ZIP???). And I've seen .ZIPs
containing the exact same files but using different internal
compression methods. The same goes for OS-specific "extra fields".

So even when the outer container is "slightly" different, the files
inside can be 100% the same. There is no guarantee of "byte exact"
copies, usually only "close enough".
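This Python sketch compares the "internals" rather than the container. It assumes the standard zipfile module and reduces each archive to member name, uncompressed size, and the CRC32 recorded in the central directory. Compression method, timestamps and extra fields never enter the comparison. Trailing bytes such as appended 0x1A markers should also be tolerated, on the assumption that zipfile's search for the end-of-central-directory record accepts them, which current CPython appears to do. zip_manifest and same_zip_contents are hypothetical helper names. The stored CRCs are trusted as-is; ZipFile.testzip() would actually decompress the members and check them.

```python
import zipfile


def zip_manifest(source):
    """Map each member name to (uncompressed size, stored CRC32).

    Compression method, timestamps and extra fields are not part of the
    manifest, so archives that differ only in those compare equal.
    Assumption: zipfile tolerates junk appended after the end record.
    """
    with zipfile.ZipFile(source) as zf:
        return {info.filename: (info.file_size, info.CRC)
                for info in zf.infolist()}


def same_zip_contents(a, b):
    """True if both archives hold the same names with the same data CRCs."""
    return zip_manifest(a) == zip_manifest(b)
```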

I am not a mathematician, and I'm out of the loop, but I feel that the
risk of (accidental) collision is still fairly low. Call me naive.
Besides, don't forget that .ZIP (and .ARJ, and who else? ZOO?) still
uses CRC32 internally, and .ZIP is still overwhelmingly used for
downloads (despite more efficient alternatives). Even .7z and .xz have
been criticized for flaws, so nothing is perfect.
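To put a rough number on "fairly low", the usual birthday-bound approximation gives the chance of at least one accidental collision among n files as about 1 - exp(-n(n-1)/2^33) for a 32-bit checksum. This assumes the checksum behaves like a uniformly random 32-bit value, which is an idealization for CRC32 rather than a design guarantee. collision_probability is a hypothetical helper name.

```python
import math


def collision_probability(n_files, hash_bits=32):
    """Birthday-bound estimate of P(at least one accidental collision).

    Assumes hash values are uniformly random, an idealization for CRC32.
    """
    pairs = n_files * (n_files - 1) / 2
    # 1 - exp(-pairs / 2**bits), via expm1 to stay accurate when tiny.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} files: {collision_probability(n):.4%}")
```

For 10,000 files this comes to roughly 1.2%, and for 100,000 files roughly 69%. A CRC32 match therefore reads best as "probably the same" when the collection is large. A 128-bit hash such as MD5 pushes the accidental-collision figure to negligible levels for any realistic file count.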

Similarly, it's not as easy as it sounds to reproduce 100% "byte
exact" executables. Even the slightest detail can alter the checksum,
even when the functionality is 100% equivalent and even when the exact
same tools are used. Honestly, most things (software, data, et al.)
just aren't meant to be "byte exact" (identical matches).

> [Skein] is even faster than Groestl, but only on modern 64-bit CPUs.

"Modern"? AMD64 (with mandatory SSE2) appeared in 2003, and Intel
cloned it in Xeons in 2004 and in Core 2 in 2006. It's been around for
quite a while, in various iterations. I think "modern" probably implies
AVX (and its successors) or the newer Haswell-era / Skylake instructions.

Heck, AMD's newfangled Ryzen supports the following (quoting from
Wikipedia):  AMD64/x86-64, MMX(+), SSE1, SSE2, SSE3, SSSE3, SSE4a,

(Note that CLMUL can also be used to compute CRCs very quickly; see
the relevant Intel PDF linked from Wikipedia.)
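For comparison, here is the classic table-driven, byte-at-a-time CRC-32. This is the kind of loop a 16-bit DOS tool would plausibly use, and it is not the CLMUL folding technique. The sketch is in Python and uses the reflected polynomial 0xEDB88320 that ZIP and zlib share. The function names are illustrative.

```python
def make_crc32_table(poly=0xEDB88320):
    """256-entry table for the reflected CRC-32 used by ZIP and zlib."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if a 1 fell out.
            c = (c >> 1) ^ poly if c & 1 else c >> 1
        table.append(c)
    return table


CRC32_TABLE = make_crc32_table()


def crc32(data, crc=0):
    """Byte-at-a-time CRC-32; `crc` lets a previous result be continued."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = CRC32_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

crc32(b"123456789") returns 0xCBF43926, the standard check value for this CRC, and it should agree with zlib.crc32 on any input.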

> If you feel like trying a new DOS project: It would be a
> very fancy thing to have a disk-backed TEA encrypted disk
> image based "disk" or a disk-backed COMPRESSED disk image
> based "disk" driver with some very minimalistic compression
> algorithm.
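To give a rough idea of how small the TEA side of such a driver could be, here is a Python sketch of the standard 32-cycle TEA block cipher plus a hypothetical per-sector mode. The sector mode is an assumption and was not part of the discussion. It runs TEA in counter mode with the counter block (sector number, block index), so each 512-byte sector can be encrypted or decrypted independently with the same function. Counter mode without authentication leaks the XOR of old and new contents whenever a sector is rewritten, so a real driver would want something stronger. TEA itself also has known weaknesses, which XTEA was designed to address.

```python
import struct

MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9          # TEA's key schedule constant
SECTOR_SIZE = 512


def tea_encrypt_block(v0, v1, key):
    """Standard 32-cycle TEA on one 64-bit block given as two 32-bit words."""
    k0, k1, k2, k3 = key
    total = 0
    for _ in range(32):
        total = (total + DELTA) & MASK32
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK32
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK32
    return v0, v1


def tea_decrypt_block(v0, v1, key):
    """Inverse of tea_encrypt_block: the same rounds, run backwards."""
    k0, k1, k2, k3 = key
    total = (DELTA * 32) & MASK32
    for _ in range(32):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK32
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK32
        total = (total - DELTA) & MASK32
    return v0, v1


def crypt_sector(data, sector_number, key):
    """Hypothetical sector mode: TEA in counter mode. The counter block is
    (sector_number, block_index); XOR with the keystream both encrypts
    and decrypts."""
    if len(data) != SECTOR_SIZE:
        raise ValueError("expected exactly one 512-byte sector")
    out = bytearray(data)
    for block in range(SECTOR_SIZE // 8):
        ks = struct.pack("<II", *tea_encrypt_block(sector_number & MASK32, block, key))
        for j in range(8):
            out[block * 8 + j] ^= ks[j]
    return bytes(out)
```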

Regrettably, there hasn't been a lot of interest in DOS file system
work. Not that I blame anyone; it's not easy on any OS.

I assume you vaguely remember (or are familiar with) an old DOS
compression program called "DIET", which had an optional TSR mode.
It's probably not quite what you meant, but I'm mentioning it anyway.

> A tiny-amount-of-RAM compression algorithm would be for
> example run length encoding. LZ variants such as LZO can
> decompress without needing extra RAM outside the unpack
> buffer itself.

miniLZO is very small and (allegedly) very easy to use / embed in new
projects. It was also updated last month.
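Here is a Python sketch of both ideas from Eric's paragraph. The RLE format (a count byte from 1 to 255 followed by the value byte) and the LZ token format are hypothetical and chosen for clarity. Real LZO and miniLZO use a packed, bit-level format that this does not reproduce. The decoder does illustrate the point about RAM: a back-reference copies from bytes already written to the output buffer, so no separate window or dictionary is needed.

```python
def rle_encode(data):
    """Encode bytes as (count, value) pairs, count 1..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded):
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out += bytes((value,)) * count
    return bytes(out)


def lz_decode(tokens, out_size):
    """Toy LZ77 decoder with a hypothetical token format (not LZO's):
    ("lit", bytes) copies literals, ("copy", distance, length) repeats
    earlier output. The only buffer is the output itself."""
    out = bytearray(out_size)
    pos = 0
    for token in tokens:
        if token[0] == "lit":
            literal = token[1]
            out[pos:pos + len(literal)] = literal
            pos += len(literal)
        else:
            _, distance, length = token
            # Byte by byte, so an overlapping copy (distance < length)
            # re-reads bytes it has just written; that is how LZ encodes runs.
            for _ in range(length):
                out[pos] = out[pos - distance]
                pos += 1
    return bytes(out[:pos])
```

For example, lz_decode([("lit", b"ab"), ("copy", 2, 6), ("lit", b"!")], 9) rebuilds b"abababab!" from two literal bytes and one overlapping back-reference.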

> LZO and LZ4 are simple enough to even be used in Linux zram,
> which can swap out RAM to a compressed RAM disk on the fly.

"zram was merged into the Linux kernel mainline in kernel version
3.14, released on March 30, 2014."
"Google uses zram in Chrome OS since 2013 and in Android since its
version 4.4. Lubuntu also started using zram in its version 13.10."

But I had read somewhere that it only saves a relatively small amount
of RAM (a dozen or so MB). Better than nothing, but not exactly
life-saving / earth-shattering.
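Here is a toy Python model of the zram idea: pages live compressed in a dictionary and are decompressed on access. zlib stands in for LZO/LZ4, neither of which is in Python's standard library, and CompressedPageStore is a hypothetical name. The model also suggests why the savings vary so much. A page of zeros shrinks dramatically, while random or already-compressed data does not shrink at all, and this sketch keeps such pages raw.

```python
import zlib

PAGE_SIZE = 4096


class CompressedPageStore:
    """Toy zram-like store: pages are kept compressed in memory and
    decompressed on access. zlib stands in for LZO/LZ4."""

    def __init__(self, level=1):
        self.level = level      # low zlib level: favour speed over ratio
        self.pages = {}         # page number -> (is_compressed, payload)

    def store(self, page_no, page):
        if len(page) != PAGE_SIZE:
            raise ValueError("pages must be exactly PAGE_SIZE bytes")
        packed = zlib.compress(page, self.level)
        if len(packed) < PAGE_SIZE:
            self.pages[page_no] = (True, packed)
        else:
            # Incompressible data: keep it raw rather than let it grow.
            self.pages[page_no] = (False, bytes(page))

    def load(self, page_no):
        is_compressed, payload = self.pages[page_no]
        return zlib.decompress(payload) if is_compressed else payload

    def bytes_saved(self):
        used = sum(len(payload) for _, payload in self.pages.values())
        return len(self.pages) * PAGE_SIZE - used
```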

Freedos-user mailing list
[hidden email]

--- Internet Rex 2.29
 * Origin: - 502/875-8938 (276:10/901)
--- Synchronet 3.15a-Linux ListGate 1.3
 *  Capitol City Online - Frankfort, KY - telnet://
