bsum - compute BSD checksums of your files

Mateusz Viste-5
Hello,

I needed to verify the integrity of a few files after transferring them
to/from my 8086 PC the other day. The obvious method for such a task is
computing a checksum of the file, like MD5, SHA1, etc... However, on an
8086 this may take ages (even on a fairly fast 386, computing the MD5 sum
of a 2 MiB file takes one minute).

Since I don't like waiting, I created an alternative tool over the
weekend: bsum.

bsum is a tiny DOS tool that computes the BSD checksum of a file. It's
very tiny: only 256 bytes (half of which is taken by the help screen), so
it will easily fit in a single disk sector. A BSD checksum is obviously
not as strong as MD5 or SHA1, but it's still more than enough for
verifying whether or not a file got corrupted during a transfer.
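
For reference, here is a minimal C sketch of the BSD checksum as it is commonly defined (the algorithm behind BSD `sum`): rotate the 16-bit running value right by one bit, then add the next byte, modulo 2^16. It is only an illustration, not bsum's actual 8086 code, and the function name `bsd_sum_update` is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* BSD checksum: 16-bit rotate-right-by-one, then add the next byte.
   'sum' is the running value (start with 0), so the function can be
   fed one buffer at a time. */
uint16_t bsd_sum_update(uint16_t sum, const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)((sum >> 1) | (sum << 15)); /* rotate right by 1 bit */
        sum = (uint16_t)(sum + data[i]);            /* add byte, wrap mod 2^16 */
    }
    return sum;
}
```

Worked by hand with this definition, the three bytes of "abc" give 16556, whether fed in one call or split across two.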

bsum is compatible with 8086 and requires only a few kilobytes of memory.
Also, it's very fast.

Homepage: http://bsum.sourceforge.net

Mateusz


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Freedos-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/freedos-user

Re: bsum - compute BSD checksums of your files

Rugxulo
Hi,

I haven't tried this (yet), but nice work!

On Sun, Apr 9, 2017 at 1:16 PM, Mateusz Viste <[hidden email]> wrote:
>
> I needed to verify the integrity of a few files after transferring them
> to/from my 8086 PC the other day. The obvious method for such task is
> computing a checksum of the file, like MD5, SHA1, etc...

These days, MD5 and SHA1 are normally considered broken and obsolete,
but they're still good for simple private checks against corruption.

Having said that, I normally prefer MD5 myself, but in DOS I often
(also) use CRC32, which is fairly universal (and used by archivers
like ZIP).
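
For comparison, a minimal bit-at-a-time CRC-32 in C: the reflected variant with polynomial 0xEDB88320 used by ZIP, whose standard check value for the ASCII string "123456789" is 0xCBF43926. Faster implementations usually replace the inner loop with a 256-entry lookup table; the name `crc32_bitwise` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320, initial value
   and final XOR 0xFFFFFFFF). Small but slow: 8 shift/XOR steps per byte. */
uint32_t crc32_bitwise(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return ~crc;
}
```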

> However, on an 8086 this may take ages (even on a fairly fast 386,
> computing the MD5 sum of a 2 MiB file takes one minute).

It would be interesting to see some benchmark numbers for that (for
various specific tools, 8086, 386, etc).

I know Blair's (16-bit) MD5SUM is usually half the speed of DOS386's
FBMD5 (32-bit). Of course that can vary by cpu family and other
factors. Also, like all things, I'm sure there's plenty of room for
improvement.

AFAIK, the 386 (often with little or no cache?) preferred much smaller
code (similar to 8086) vs. 486's pipelined way of preferring simpler
instructions. The 486 was also allegedly very sensitive to alignment.
I'm not sure many compilers truly took full advantage of those
specific cpus.

> Since I don't like waiting, I created an alternative tool over the weekend: bsum.
>
> bsum is a tiny DOS tool that computes the BSD checksum of a file. It's
> very tiny: only 256 bytes (half of which is taken by the help screen), so
> it will easily fit in a single disk sector. A BSD checksum is obviously
> not as strong as MD5 or SHA1, but it's still more than enough for
> verifying whether or not a file got corrupted during a transfer.
>
> bsum is compatible with 8086 and requires only a few kilobytes of memory.
> Also, it's very fast.

Splurge on the memory, give it 32 kb or so. It'll "probably" be faster
with a bigger buffer.

> Homepage: http://bsum.sourceforge.net

Sounds good. Although I admit to being mostly unfamiliar with BSD cksum.

I did recently try to mirror some CRC32 tools to iBiblio, just for completeness.

http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/util/file/crc32/

Charles Dye's CHKSUM is fairly feature-packed (albeit only CRC32 and
DR-DOS XDIR sums), roughly 5 kb.

The other CRC32 util is a very simplistic (but good) .COM of roughly 1
kb, which is what I often use in a pinch (mostly due to its small
size). And I think the author of that one still frequents FreeDOS
mailing lists.

Again, feel free to benchmark some of these, and tell us the results.

Re: bsum - compute BSD checksums of your files

Mateusz Viste-5
On Mon, 10 Apr 2017 00:56:17 -0500, Rugxulo wrote:
> It would be interesting to see some benchmark numbers for that (for
> various specific tools, 8086, 386, etc).

Just for the fun of it, I did some quick measurements on my 386SX PC,
computing various checksums of a 2 MiB file. Results below.

BSUM (by Mateusz Viste) :  6.0s (100%)
CRC32 (by Joe Forster)  :  8.5s  (70%)
CRC32 (by Colin Plumb)  : 26.7s  (22%)
MD5 (by Colin Plumb)    : 52.9s  (11%)
SHA1 (by Colin Plumb)   : 85.7s   (7%)

BSUM is the fastest, which is no surprise since the algorithm is
extremely simple (4 CPU instructions). The CRC32 computation by Joe
Forster is surprisingly fast as well. It's 30% slower than bsum and the
binary is 4x times larger (and I suppose the memory usage is also much
higher) but that's still quite impressive for a 32-bit checksum.
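
The percentages in the table appear to be BSUM's time divided by each tool's time, truncated, i.e. each tool's throughput relative to BSUM. This is inferred from the numbers, not stated in the post. A tiny C helper (hypothetical name) reproduces them:

```c
/* Throughput of a tool relative to a baseline, in percent, truncated:
   100 * baseline_time / tool_time. */
int relative_speed_pct(double baseline_s, double tool_s)
{
    return (int)(100.0 * baseline_s / tool_s);
}
```

Read that way, "30% slower" means 70% of BSUM's throughput; in elapsed time, 8.5 s versus 6.0 s is about 42% longer.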

> Splurge on the memory, give it 32 kb or so. It'll "probably" be faster
> with a bigger buffer.

At the cost of reducing the number of platforms it would be able to run on.
Currently bsum uses an 8K memory buffer to optimize disk reads. Using a
buffer of 64KB increases the overall speed by 10%. Not that much, for a
700% increase of memory usage.
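
A sketch of such a chunked read loop, in C rather than 8086 assembly, with the buffer size as a named constant so that the 8 KB vs. 64 KB experiment becomes a one-line change. `BUF_SIZE` and `bsd_sum_stream` are hypothetical names, and stdio `fread` stands in for whatever DOS read call bsum actually uses.

```c
#include <stdint.h>
#include <stdio.h>

#define BUF_SIZE (1024 * 8) /* 8 KB, the size mentioned above */

/* BSD checksum of an open stream, read BUF_SIZE bytes at a time.
   Returns 0 on success, -1 on a read error. */
int bsd_sum_stream(FILE *f, uint16_t *out)
{
    static unsigned char buf[BUF_SIZE]; /* static: keep it off a small stack */
    uint16_t sum = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t i = 0; i < n; i++) {
            sum = (uint16_t)((sum >> 1) | (sum << 15)); /* rotate right 1 */
            sum = (uint16_t)(sum + buf[i]);             /* add byte */
        }
    }
    if (ferror(f))
        return -1;
    *out = sum;
    return 0;
}
```

With 8 KB buffers, a 2 MiB file takes 256 read calls; with 64 KB it takes 32.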

Mateusz


Re: bsum - compute BSD checksums of your files

Dale E Sterner
In reply to this post by Mateusz Viste-5
Would you or anyone else know if there is an 802.11 client for DOS?
Never heard of one but you guys know a lot more than I ever will.

cheers
DS




On Mon, 10 Apr 2017 13:36:24 +0000 (UTC) Mateusz Viste
<[hidden email]> writes:

> On Mon, 10 Apr 2017 00:56:17 -0500, Rugxulo wrote:
> > It would be interesting to see some benchmark numbers for that (for
> > various specific tools, 8086, 386, etc).
>
> Just for the fun of it, I did some quick measures on my 386SX PC,
> computing various checksums of a 2 MiB file. Results below.
>
> BSUM (by Mateusz Viste) :  6.0s (100%)
> CRC32 (by Joe Forster)  :  8.5s  (70%)
> CRC32 (by Colin Plumb)  : 26.7s  (22%)
> MD5 (by Colin Plumb)    : 52.9s  (11%)
> SHA1 (by Colin Plumb)   : 85.7s   (7%)
>
> BSUM is the fastest, which is no surprise since the algorithm is
> extremely simple (4 CPU instructions). The CRC32 computation by Joe
> Forster is surprisingly fast as well. It's 30% slower than bsum and the
> binary is 4x times larger (and I suppose the memory usage is also much
> higher) but that's still quite impressive for a 32-bit checksum.
>
> > Splurge on the memory, give it 32 kb or so. It'll "probably" be faster
> > with a bigger buffer.
>
> At the cost of reducing the number of platforms it would be able to run on.
> Currently bsum uses an 8K memory buffer to optimize disk reads. Using a
> buffer of 64KB increases the overall speed by 10%. Not that much, for a
> 700% increase of memory usage.
>
> Mateusz
>


******************************************************>>>>
From Dale Sterner - MS organic chemistry
http://pubs.acs.org/doi/abs/10.1021/jo00975a052
*******************************************************>>>>

Re: bsum - compute BSD checksums of your files

Ralf Quint
In reply to this post by Mateusz Viste-5
On 4/10/2017 6:36 AM, Mateusz Viste wrote:

> On Mon, 10 Apr 2017 00:56:17 -0500, Rugxulo wrote:
>> It would be interesting to see some benchmark numbers for that (for
>> various specific tools, 8086, 386, etc).
> Just for the fun of it, I did some quick measures on my 386SX PC,
> computing various checksums of a 2 MiB file. Results below.
>
> BSUM (by Mateusz Viste) :  6.0s (100%)
> CRC32 (by Joe Forster)  :  8.5s  (70%)
> CRC32 (by Colin Plumb)  : 26.7s  (22%)
> MD5 (by Colin Plumb)    : 52.9s  (11%)
> SHA1 (by Colin Plumb)   : 85.7s   (7%)
>
> BSUM is the fastest, which is no surprise since the algorithm is
> extremely simple (4 CPU instructions). The CRC32 computation by Joe
> Forster is surprisingly fast as well. It's 30% slower than bsum and the
> binary is 4x times larger (and I suppose the memory usage is also much
> higher) but that's still quite impressive for a 32-bit checksum.
Well, most of all, it's kind of comparing apples and oranges. Those
benchmark tests mean nothing if you don't compare them with the number
of possible collisions you get for each of them.
Though that doesn't mean that there aren't use cases where "simple does
it"...
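
To put rough numbers on the collision point, here is an idealized birthday-bound calculation in C. It assumes checksum values are uniformly and independently distributed, which a simple checksum like BSD sum does not strictly satisfy, so treat the result as an optimistic estimate. `collision_probability` is a hypothetical helper.

```c
/* Probability that at least one pair among n items shares the same
   b-bit checksum, assuming uniformly distributed checksum values:
   1 - prod_{i=0}^{n-1} (1 - i / 2^b). Valid for bits < 64. */
double collision_probability(unsigned n, unsigned bits)
{
    double space = (double)(1ull << bits);
    double p_unique = 1.0;
    for (unsigned i = 0; i < n; i++)
        p_unique *= 1.0 - (double)i / space;
    return 1.0 - p_unique;
}
```

Under that assumption, about 300 distinct files already give roughly a 50% chance that some pair shares a 16-bit checksum, versus about 0.001% for 32 bits. That is the "many files compared against each other" case; checking one transferred file against its own original is a different question.
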
>> Splurge on the memory, give it 32 kb or so. It'll "probably" be faster
>> with a bigger buffer.
Nope, won't do a thing. Didn't do much good "back in the days" to use
anything over 16KB and it is even less relevant on modern hard drives
with MBs of cache. Or SSDs...

Ralf



Re: DOS ideas with fast simple algorithms - was: BSUM BSD checksum

Eric Auer-3
In reply to this post by Mateusz Viste-5

Hi Mateusz,

> BSUM (by Mateusz Viste) :  6.0s (100%)
> CRC32 (by Joe Forster)  :  8.5s  (70%)

...

> MD5 (by Colin Plumb)    : 52.9s  (11%)
> SHA1 (by Colin Plumb)   : 85.7s   (7%)

Entertaining :-) Still you need to find a good balance
between speed and collision risk. If you want to find
duplicate files, you can first check simply the sizes.

For the remaining candidates, I would say BSUM can be
useful if your disk is fast and your CPU is slow. If
it is the other way round, you feel the extra cost to
read the file a second time for a stronger checksum
after the quick BSUM says "possible" duplicate.
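
The staged approach could look like this in C. It is a sketch on in-memory buffers: the helper names are hypothetical, and `memcmp` stands in for the second, stronger pass over the file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cheap first-pass checksum (BSD sum, 16 bits). */
static uint16_t quick_sum(const unsigned char *p, size_t n)
{
    uint16_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s = (uint16_t)((s >> 1) | (s << 15)); /* rotate right 1 */
        s = (uint16_t)(s + p[i]);             /* add byte */
    }
    return s;
}

/* Staged duplicate test: sizes first (free), then the quick checksum,
   and only if both match, the expensive definitive comparison. */
int probably_same(const unsigned char *a, size_t na,
                  const unsigned char *b, size_t nb)
{
    if (na != nb)
        return 0;                   /* different sizes: certainly different */
    if (quick_sum(a, na) != quick_sum(b, nb))
        return 0;                   /* different checksums: certainly different */
    return memcmp(a, b, na) == 0;   /* stand-in for the stronger second pass */
}
```

The 2-byte inputs {0x02, 0x00} and {0x00, 0x01} show why the last stage is needed: both have BSD sum 1, so only the final comparison tells them apart.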

For checking if downloads worked without noise, I would
already want something "stronger" than BSUM, such as

https://en.wikipedia.org/wiki/Fletcher%27s_checksum

or Adler-32, CRC-32 or -64, but for prevention of faked
downloads even MD5 and SHA1 are actually too weak today.
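
For comparison, here are minimal C versions of Fletcher-16 and Adler-32. Both keep two running sums, which makes them sensitive to byte order, unlike a plain additive sum. These naive forms reduce modulo after every byte; real implementations (zlib's Adler-32, for instance) defer that for speed. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16: two running sums modulo 255. */
uint16_t fletcher16(const unsigned char *data, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + data[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

/* Adler-32: like Fletcher, but modulo the prime 65521, with sum A
   starting at 1. */
uint32_t adler32(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}
```

Worked by hand, "abcde" gives 0xC8F0 for Fletcher-16 and "Wikipedia" gives 0x11E60398 for Adler-32.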

You could check http://skein-hash.info/sha3-engineering
for candidates like groestl.info for fast and quite okay
hash/checksum tasks or use the official choice for that,
Keccak, which got selected as SHA-3 algorithm...

https://en.wikipedia.org/wiki/Skein_(hash_function)

is even faster than Groestl but only on modern 64-bit CPU.

> BSUM is the fastest, which is no surprise since the algorithm is
> extremely simple (4 CPU instructions). The CRC32 computation by Joe
> Forster is surprisingly fast as well...

If you feel like trying a new DOS project: It would be a
very fancy thing to have a disk-backed TEA encrypted disk
image based "disk" or a disk-backed COMPRESSED disk image
based "disk" driver with some very minimalistic compression
algorithm. Example abstraction layer: You could have some
array of CLUSTER offsets into the disk image, with units
in the order of SECTORS. The image could be pre-compressed
with a tool and the disk driver could open it read-only,
or you could store all changed clusters in a new offset as
soon as they no longer fit into their allocated compressed
file, using some offline re-compression process to "defrag"
that growth away later. Advantage of using sectors as units
of cluster offsets would be extremely fast seeking and the
ease of having "small" 16, 24 or 32 bit int as array items.
Which you could even store instead of one of the FATs and
then hide the change by showing users a copy of the other
when they try to access it ;-)
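
Here is a sketch of the cluster-offset lookup described above. It assumes one little-endian 24-bit entry per cluster holding the starting sector of that cluster's data in the image file. The layout, `SECTOR_SIZE`, and the function names are all hypothetical. The point is that finding a cluster takes one table read plus a multiply, with no scanning.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Hypothetical layout: entry 'cluster' is 3 bytes, little-endian,
   giving the sector where that cluster's data starts in the image. */
uint32_t cluster_sector(const unsigned char *table, uint32_t cluster)
{
    const unsigned char *e = table + (size_t)cluster * 3u;
    return (uint32_t)e[0] | ((uint32_t)e[1] << 8) | ((uint32_t)e[2] << 16);
}

/* Byte position to seek to in the image file (64-bit, since 2^24
   sectors of 512 bytes exceed 4 GB). */
uint64_t cluster_byte_offset(const unsigned char *table, uint32_t cluster)
{
    return (uint64_t)cluster_sector(table, cluster) * SECTOR_SIZE;
}
```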

https://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm
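
Here is TEA in C, following the reference routines on the linked page (64-bit block, 128-bit key, 32 cycles). How a driver would apply it to 512-byte sectors (64 blocks each, and whether to mix in the sector number) is left open here, as it is in the post. TEA also has known weaknesses, some of which its successor XTEA addresses.

```c
#include <stdint.h>

#define TEA_DELTA 0x9E3779B9u /* key schedule constant */

/* Encrypt one 64-bit block v[0..1] in place with the 128-bit key k. */
void tea_encrypt(uint32_t v[2], const uint32_t k[4])
{
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    for (int i = 0; i < 32; i++) {
        sum += TEA_DELTA;
        v0 += ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        v1 += ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
    }
    v[0] = v0;
    v[1] = v1;
}

/* Decrypt: run the rounds backwards, starting from sum = 32 * delta. */
void tea_decrypt(uint32_t v[2], const uint32_t k[4])
{
    uint32_t v0 = v[0], v1 = v[1], sum = TEA_DELTA * 32u; /* 0xC6EF3720 */
    for (int i = 0; i < 32; i++) {
        v1 -= ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
        v0 -= ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        sum -= TEA_DELTA;
    }
    v[0] = v0;
    v[1] = v1;
}
```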

A tiny-amount-of-RAM compression algorithm would be for
example run length encoding. LZ variants such as LZO can
decompress without needing extra RAM outside the unpack
buffer itself.
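
Here is run-length decoding in C, assuming the simplest possible format of (count, byte) pairs. Real RLE schemes differ, and this format is made up for illustration. As noted above, it needs no memory beyond the output buffer.

```c
#include <stddef.h>

/* Decode (count, byte) pairs into dst. Returns bytes written, or
   (size_t)-1 if the input is malformed or the output would overflow. */
size_t rle_decode(const unsigned char *src, size_t src_len,
                  unsigned char *dst, size_t dst_cap)
{
    size_t out = 0;
    if (src_len % 2 != 0)
        return (size_t)-1;              /* pairs only */
    for (size_t i = 0; i < src_len; i += 2) {
        size_t count = src[i];
        if (count > dst_cap - out)
            return (size_t)-1;          /* would overflow dst */
        for (size_t j = 0; j < count; j++)
            dst[out++] = src[i + 1];
    }
    return out;
}
```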

https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Oberhumer

Classic exe-packers used algorithms similar to LZ4, which
focus on easy data formats for easiest decompression:

https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)

Nibble-based always feels better than having to scrape up
individual bits of some Huffman coded stream even though a
decompressor for those is still reasonably small and fast.
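
To make the "nibble-based" remark concrete, here is a C sketch that decodes one LZ4 block, following the publicly documented LZ4 block format. Each sequence begins with a token whose high nibble is the literal count and whose low nibble is the match length minus 4, with 15 meaning extra length bytes follow. The sketch does not enforce the end-of-block rules that conforming encoders follow, so treat it as illustrative rather than a reference decoder. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add LZ4 length-extension bytes until one is not 255. */
static int lz4_read_ext(const uint8_t *src, size_t src_len, size_t *ip, size_t *len)
{
    uint8_t b;
    do {
        if (*ip >= src_len)
            return -1;
        b = src[(*ip)++];
        *len += b;
    } while (b == 255);
    return 0;
}

/* Decode one LZ4 block (no frame header). Returns bytes written or -1. */
long lz4_decode_block(const uint8_t *src, size_t src_len,
                      uint8_t *dst, size_t dst_cap)
{
    size_t ip = 0, op = 0;
    while (ip < src_len) {
        uint8_t token = src[ip++];
        size_t lit = token >> 4;                  /* literal count nibble */
        if (lit == 15 && lz4_read_ext(src, src_len, &ip, &lit) != 0)
            return -1;
        if (lit > src_len - ip || lit > dst_cap - op)
            return -1;
        memcpy(dst + op, src + ip, lit);          /* copy literals */
        ip += lit;
        op += lit;
        if (ip == src_len)
            break;                                /* last sequence: literals only */
        if (src_len - ip < 2)
            return -1;
        size_t offset = (size_t)src[ip] | ((size_t)src[ip + 1] << 8);
        ip += 2;
        if (offset == 0 || offset > op)
            return -1;
        size_t mlen = token & 0x0F;               /* match length nibble */
        if (mlen == 15 && lz4_read_ext(src, src_len, &ip, &mlen) != 0)
            return -1;
        mlen += 4;                                /* minimum match length */
        if (mlen > dst_cap - op)
            return -1;
        for (size_t i = 0; i < mlen; i++, op++)   /* byte copy: may overlap */
            dst[op] = dst[op - offset];
    }
    return (long)op;
}
```

The 12-byte block {0x35, 'a', 'b', 'c', 0x03, 0x00, 0x50, 'h', 'e', 'l', 'l', 'o'} decodes to the 17 bytes "abcabcabcabchello": three literals, a 9-byte match at distance 3 that overlaps itself, then five final literals.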

Various harddisk compressors also used methods in the same
style as LZ4 today, so you could say LZ is a real classic.
LZO and LZ4 are simple enough to even be used in Linux zram
which can swap out RAM to a compressed RAM disk on the fly.

> Currently bsum uses an 8K memory buffer to optimize disk reads. Using a
> buffer of 64KB increases the overall speed by 10%. Not that much, for a
> 700% increase of memory usage.

Interesting!

Cheers, Eric :-)



Re: bsum - compute BSD checksums of your files

dmccunney
In reply to this post by Mateusz Viste-5
On Mon, Apr 10, 2017 at 9:36 AM, Mateusz Viste <[hidden email]> wrote:
> On Mon, 10 Apr 2017 00:56:17 -0500, Rugxulo wrote:
>
>> Splurge on the memory, give it 32 kb or so. It'll "probably" be faster
>> with a bigger buffer.
>
> At the cost of reducing the number of platforms it would be able to run
> on.

I have to ask.  How many folks *have* platforms now it *wouldn't* run
on? I suspect the number is *very* small.

(Most folks now are trying to get FreeDOS to boot native on a machine
rather larger and more powerful than the machines DOS was used on, or
running it in a VM.  Even folks doing embedded development on IoT
devices are probably dealing with fast full 32bit CPUs with more than
enough RAM and external storage, and can run a Linux kernel or an RTOS
that bears no resemblance to DOS.)

> Mateusz
______
Dennis
https://plus.google.com/u/0/105128793974319004519

Re: bsum - compute BSD checksums of your files

Rugxulo
In reply to this post by Mateusz Viste-5
Hi again,

On Mon, Apr 10, 2017 at 8:36 AM, Mateusz Viste <[hidden email]> wrote:
> On Mon, 10 Apr 2017 00:56:17 -0500, Rugxulo wrote:
>>
>> It would be interesting to see some benchmark numbers for that (for
>> various specific tools, 8086, 386, etc).
>
> Just for the fun of it, I did some quick measures on my 386SX PC,
> computing various checksums of a 2 MiB file. Results below.

Very interesting ....

> CRC32 (by Colin Plumb)  : 26.7s  (22%)
> MD5 (by Colin Plumb)    : 52.9s  (11%)
> SHA1 (by Colin Plumb)   : 85.7s   (7%)

Blair's (16-bit, FD) MD5SUM can do all of those hashes as well. Not
sure if it'd be faster, though.

> BSUM is the fastest, which is no surprise since the algorithm is
> extremely simple (4 CPU instructions). The CRC32 computation by Joe
> Forster is surprisingly fast as well. It's 30% slower than bsum and the
> binary is 4x times larger (and I suppose the memory usage is also much
> higher) but that's still quite impressive for a 32-bit checksum.

"30% slower" is machine specific, and I'm quite sure it can be
improved. Although his tool does seem to use a fairly big (64512 byte)
buffer.

***
If extremely bored, check out these "modern" (CRC32C, aka Castagnoli)
implementations, which I don't grok:

http://stackoverflow.com/questions/17645167/implementing-sse-4-2s-crc32c-in-software
http://www.drdobbs.com/parallel/fast-parallelized-crc-computation-using/229401411
***
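
For contrast with those optimized versions, here is a plain table-driven CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) in C. It does one table lookup per byte instead of eight shift/XOR steps, and the standard check value for "123456789" is 0xE3069283. This is not the SSE4.2 or parallel technique from the links, only the baseline they improve on. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32c_table[256];

/* Precompute the CRC of every possible byte value. */
static void crc32c_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ ((c & 1u) ? 0x82F63B78u : 0u);
        crc32c_table[i] = c;
    }
}

uint32_t crc32c(const unsigned char *data, size_t len)
{
    static int ready = 0;
    if (!ready) {
        crc32c_init();
        ready = 1;
    }
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc32c_table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    return ~crc;
}
```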

Of course he also combines (unused) decimal output routine with (used)
hex output routine, which unnecessarily (in this case) always uses
slow DIV (which you don't need at all for converting to hex). Of
course he only needs to call that routine once at the end. It would be
a much worse result if it were called more often (e.g. hundreds of
times). I've made the same mistake in the past, too.

"4x times larger" is only in raw bytes, but in reality it uses a full
cluster (as you well know), so even a 256 byte .COM will still use
minimum one cluster (e.g. 512 bytes on 1.44 MB floppy). So 1024 isn't
really much worse than 512.   ;-)    Believe me, shrinking size is
fairly easy, but it's a tradeoff in accidental errors, readability,
and speed.
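
On the DIV point above: converting to hex needs no division, because each hex digit is exactly one 4-bit nibble. Here is a minimal C sketch using only shifts and masks. It is not Forster's code, and `put_hex16` is a hypothetical name.

```c
#include <stdint.h>

/* Write a 16-bit value as four uppercase hex digits plus a NUL,
   using only shifts and masks. 'out' needs room for 5 chars. */
void put_hex16(uint16_t value, char out[5])
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 4; i++)
        out[i] = digits[(value >> (12 - 4 * i)) & 0xF]; /* high nibble first */
    out[4] = '\0';
}
```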

>> Splurge on the memory, give it 32 kb or so. It'll "probably" be faster
>> with a bigger buffer.
>
> At the cost of reducing the number of platforms it would be able to run on.
> Currently bsum uses an 8K memory buffer to optimize disk reads. Using a
> buffer of 64KB increases the overall speed by 10%. Not that much, for a
> 700% increase of memory usage.

Don't you have an 8086 machine? How much RAM does it have? I had
thought most had at least 64 kb of RAM, but I guess that's not
accounting for the DOS + shell overhead. Honestly, I wrote several
simple hexdump variants in recent months, and the biggest slowdown was
my small buffer (only 16 bytes in the .ASM version). The C version is
larger but always well-buffered, so it's the fastest. I even got 2x
speedup (and noticeable size decrease) by avoiding printf entirely and
using my own outhex routine.

Okay, so let me break down your source and give some (trivial)
comments here. I assume that's okay with you!  ;-)

Irrelevant aesthetics:   lines too long (shouldn't be more than 80),
not enough indentation (instructions vs. labels), irrelevant "jz short
..." (when "short" conditional jump is always mandatory for "cpu
8086").

"section .data align=1" is probably what you intended here. (No need
to comment it out entirely. I think default is align=4 or some such,
that's probably what you didn't like.)

"buff resb 8192" and "mov cx, 8192" should be moved to EQU for clarity
(and, even better, as "1024 * 8" constant expression).

The program does not end in a CR+LF pair. Thus the output is an
incomplete line. Not a huge deal but still (sometimes) noticeable.

"int 21h // xchg ax, bp // int 21h" is repeated several times. If you
really want to save space, put "msgquit:" before the first one and
"jmp short msgquit" for the others (since this is quitting the program
anyways).

BTW, most asm devs actively avoid "loop" in favor of "dec // jnz". Not
sure if this would really be worth it, even for your 8086.

"shl bx, cl" (where CL=4) is also shunned, AFAIK, on 8086 machines, in
favor of the speedier "shl bx,1" repeated four times. But if it's only
done extremely rarely then it won't add up to much difference. Only when
done thousands of times would you barely even notice.

Converting hex nibble to ASCII shouldn't need a jump at all. On the
8086 all jumps are very slow. Best to avoid them entirely if possible.
Here you can easily use the old "cmp al, 0Ah // sbb al, 69h // das"
trick instead. But since you're only printing hex one time (instead of
thousands), you probably don't care.
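
To see why the cmp/sbb/das sequence works without a jump, here is a small C model of it for AL = 0..15. It follows the documented DAS behavior but keeps only what affects AL (the CF and AF flags), so it is a teaching aid, not a CPU emulator. The function name is hypothetical.

```c
#include <stdint.h>

/* Model of:  cmp al, 0Ah / sbb al, 69h / das   for AL = 0..15. */
char nibble_to_hex_das(uint8_t al)
{
    /* cmp al, 0Ah: CF = 1 if al < 10 (unsigned borrow) */
    unsigned cf = (al < 0x0A);

    /* sbb al, 69h: al = al - 69h - CF; AF = borrow out of bit 3 */
    uint8_t before = al;
    al = (uint8_t)(before - 0x69 - cf);
    unsigned af = ((before ^ 0x69 ^ al) & 0x10) != 0;
    cf = (unsigned)before < 0x69u + cf;   /* borrow out of bit 7 */

    /* das: decimal adjust AL after subtraction */
    uint8_t old_al = al;
    unsigned old_cf = cf;
    if ((al & 0x0F) > 9 || af)
        al = (uint8_t)(al - 0x06);
    if (old_al > 0x99 || old_cf)
        al = (uint8_t)(al - 0x60);
    return (char)al;
}
```

For AL below 10, CMP sets CF, SBB leaves a low nibble that DAS fixes up with -6, and the final -60h lands on 30h to 39h. For AL 10 to 15, CF is clear, no nibble fix is needed, and the same -60h lands on 41h to 46h.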

Okay, just wanted to add my $0.02 in case it was (accidentally) helpful.   :-)

Re: DOS ideas with fast simple algorithms - was: BSUM BSD checksum

Rugxulo
In reply to this post by Eric Auer-3
Hi, Eric, always good to hear from you,

On Mon, Apr 10, 2017 at 4:21 PM, Eric Auer <[hidden email]> wrote:

>
>> BSUM (by Mateusz Viste) :  6.0s (100%)
>> CRC32 (by Joe Forster)  :  8.5s  (70%)
>
>> MD5 (by Colin Plumb)    : 52.9s  (11%)
>> SHA1 (by Colin Plumb)   : 85.7s   (7%)
>
> Entertaining :-) Still you need to find a good balance
> between speed and collision risk. If you want to find
> duplicate files, you can first check simply the sizes.

Check sizes? Okay, but some files still have bogus data at the end
that is (largely) ignored.

Well, maybe .ZIP comments aren't quite the literal "end", but I did
find a .ZIP recently that had a bunch of 0x1A (EOF) markers appended
(for some obscure reason, yes I know about CP/M's reasoning, but why
would that carry over to a DOS-only .ZIP ???). And I've seen .ZIPs
with the same exact files but using different internal compression
methods. Same with OS-specific "extra fields".

So even if the outside container is "slightly" different, the
internals are 100% the same. There are no guarantees for 100% "byte
exact", usually only "close enough".

I am not a mathematician, and I'm out of the loop, but I feel like the
risk of (accidental) collision is still fairly low. Call me naive.
Besides, don't forget that .ZIP (and .ARJ and who else, ZOO ??) still
uses CRC32 internally, and .ZIP is still overwhelmingly used for
downloads (despite more efficient solutions). Even .7z and .xz have
been criticized for flaws, so nothing is perfect.

Similarly, it's not as easy as it sounds to replicate 100% "byte
exact" executables. Even the slightest detail can alter the checksum,
even if 100% equivalent functionality, even if using the exact same
tools. Honestly, most things (software, data, et al.) just aren't
meant to be "byte exact" (match identical).

> [Skein] is even faster than Groestl but only on modern 64-bit CPU.

"Modern"? AMD64 (with mandatory SSE2) appeared in 2003, Intel cloned
it in Xeons in 2004 and Core 2 in 2006. It's been around quite a
while, in various iterations. I think "modern" probably implies
AVX(es) or newer Haswell-era / Skylake instructions.

Heck, AMD's newfangled Ryzen supports the following (quoting from
Wikipedia):  AMD64/x86-64, MMX(+), SSE1, SSE2, SSE3, SSSE3, SSE4a,
SSE4.1, SSE4.2, AES, CLMUL, AVX, AVX2, FMA3, CVT16/F16C, ABM, BMI1,
BMI2, SHA

(Note that CLMUL also says it can do ultra-fast CRCs, see the relevant
Intel PDF linked from Wikipedia.)

> If you feel like trying a new DOS project: It would be a
> very fancy thing to have a disk-backed TEA encrypted disk
> image based "disk" or a disk-backed COMPRESSED disk image
> based "disk" driver with some very minimalistic compression
> algorithm.

Regrettably, there hasn't been a lot of interest in DOS file systems
work. Not that I blame them, it's not easy for any OS.

I assume you vaguely remember (or are familiar with) an old DOS
compression program called "DIET", which had an optional TSR mode.
Probably not quite what you meant, but I'm just reminding you anyways.
  ;-)

ftp://ftp.sac.sk/pub/sac/pack/diet145f.zip

> A tiny-amount-of-RAM compression algorithm would be for
> example run length encoding. LZ variants such as LZO can
> decompress without needing extra RAM outside the unpack
> buffer itself.

"mini LZO" is very small, (allegedly) very easy to use / embed in new
projects. It was also updated last month:

http://www.oberhumer.com/opensource/lzo/

> LZO and LZ4 are simple enough to even be used in Linux zram
> which can swap out RAM to a compresed RAM disk on the fly.

https://en.wikipedia.org/wiki/Zram

"zram was merged into the Linux kernel mainline in kernel version
3.14, released on March 30, 2014."
...
"Google uses zram in Chrome OS since 2013 and in Android since its
version 4.4. Lubuntu also started using zram in its version 13.10."

But I had read somewhere that it only saves a relatively small amount
of RAM (a dozen or so MB). Better than nothing, but not exactly
life-saving / earth-shattering.

Re: bsum - compute BSD checksums of your files

Mateusz Viste-5
In reply to this post by dmccunney
On Mon, 10 Apr 2017 17:57:59 -0400, dmccunney wrote:
> I have to ask.  How many folks *have* platforms now it *wouldn't* run
> on? I suspect the number is *very* small.

Surely, yes. Still, a 700% memory increase for a 10% performance boost
just doesn't feel right. I wrote bsum to cover an extreme case - in such
context I prefer keeping the memory footprint as small as possible.

> Most folks now are trying to get FreeDOS to boot native on a machine
> rather larger and more powerful than the machines DOS was used on, or
> running it in a VM

I'd say that for these machines bsum is irrelevant - they are much better
off using md5 or anything else.

Mateusz


Re: bsum - compute BSD checksums of your files

Mateusz Viste-5
In reply to this post by Rugxulo
On Mon, 10 Apr 2017 17:07:30 -0500, Rugxulo wrote:
> Blair's (16-bit, FD) MD5SUM can do all of those hashes as well. Not sure
> if it'd be faster, though.

I believe that's the one I used. If I understand correctly, the original
author is Colin Plumb, and Blair took over maintenance of it at some point.

> Believe me, shrinking size is fairly easy,

If you say so.

> but it's a tradeoff in accidental errors, readability,
> and speed.

Unless it's a goal in itself ("keep the whole thing in 256 bytes"), as is
the case of bsum.

> Irrelevant aesthetics:   lines too long (shouldn't be more than 80),

I'll skip all aesthetics remarks, since these are a rather personal thing.

> irrelevant "jz short ..." (when "short" conditional jump is always
> mandatory for "cpu 8086").

I don't think so.
Note that short means "8 bit jump" in this context, and NOT "16 bit jump".

> "section .data align=1" is probably what you intended here. (No need to
> comment it out entirely.

No need to have it either (not in tiny model).

> The program does not end in a CR+LF pair. Thus the output is an
> incomplete line. Not a huge deal but still (sometimes) noticeable.

True. I noticed that command.com adds a CR+LF pair whenever a program
doesn't end with those. This seems to be consistent with both FreeDOS and
MS-DOS, so I thought I'd exploit this to save a few bytes in the program.

> "int 21h // xchg ax, bp // int 21h" is repeated several times. If you
> really want to save space, put "msgquit:" before the first one and "jmp
> short msgquit" for the others (since this is quitting the program
> anyways).

Indeed, that would save 1 byte or 2. Good catch.

> BTW, most asm devs actively hate "loop" in lieu of "dec // jnz". Not
> sure if this would really be worth it, even for your 8086.

Actually my trunk version (svn) does avoid loop in favor of dec/jnz.
The former is shorter by one byte, but 3 times slower than the latter
(5/6 clks vs 2 clks).

> "shl bx, cl" (where CL=4) is also shunned, AFAIK, on 8086 machines, in
> lieu of speedier (times 4) "shl bx,1".

But repeated shl bx,1 is so much bigger. I definitely prefer shl bx,cl,
at least whenever not in performance-critical parts.

> Converting hex nibble to ASCII shouldn't need a jump at all. On the 8086
> all jumps are very slow. Best to avoid them entirely if possible.
> Here you can easily use the old "cmp al, 0Ah // sbb al, 69h // das"
> trick instead. But since you're only printing hex one time (instead of
> thousands), you probably don't care.

Indeed, I care little about jumps there, but still your version might be
shorter, which would make it interesting. Will compare.

Mateusz


Re: bsum - compute BSD checksums of your files

Mateusz Viste-5
On Tue, 11 Apr 2017 02:52:06 +0000, Mateusz Viste wrote:
> On Mon, 10 Apr 2017 17:07:30 -0500, Rugxulo wrote:
>> Converting hex nibble to ASCII shouldn't need a jump at all. On the
>> 8086 all jumps are very slow. Best to avoid them entirely if possible.
>> Here you can easily use the old "cmp al, 0Ah // sbb al, 69h // das"
>> trick instead. But since you're only printing hex one time (instead of
>> thousands), you probably don't care.
>
> Indeed, I care little about jumps there, but still your version might be
> shorter, which would make it interesting. Will compare.

Checked: your nibble-to-hex version is indeed smaller. Hence it's better
than mine both in terms of space (by 3 bytes) and speed (no jump). Nice!

Mateusz


Re: bsum - compute BSD checksums of your files

Mateusz Viste-5
In reply to this post by Ralf Quint
I have to clarify here that my intention was never to compete in any way
with the other algorithms out there. The BSD checksum is a well-known
and pretty weak (16-bit) checksum. The goal behind bsum was only to
obtain a checksum tool that would run on my 8086 fast enough for me to
not get frustrated, and just good enough to be reasonably sure that the
files I had just copied from a diskette and then over a network running
through the parallel port didn't get corrupted in the process.
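For reference, the BSD checksum itself fits in a few lines: rotate the 16-bit accumulator right by one bit, add the next byte, and keep the low 16 bits, so there are only 65,536 possible results. The Python sketch below follows that textbook definition (the algorithm behind the Unix "sum -r" command); it is not a port of bsum's assembly source, and the chunk size and output format of the hypothetical command-line part are arbitrary choices.

```python
import sys

def bsd_update(checksum: int, data: bytes) -> int:
    """Feed `data` into a running 16-bit BSD checksum."""
    for byte in data:
        checksum = (checksum >> 1) | ((checksum & 1) << 15)  # rotate right by one bit
        checksum = (checksum + byte) & 0xFFFF                # add the byte, wrap at 16 bits
    return checksum

def bsd_checksum(data: bytes) -> int:
    return bsd_update(0, data)

def bsd_checksum_file(path: str, chunk_size: int = 16384) -> int:
    """Checksum a file chunk by chunk instead of loading it all at once."""
    checksum = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            checksum = bsd_update(checksum, chunk)
    return checksum

if __name__ == "__main__":
    # Hypothetical CLI: print "checksum filename" for each argument.
    for name in sys.argv[1:]:
        print(f"{bsd_checksum_file(name):05d} {name}")
```

Because the state is a single 16-bit word, the file can be processed in any chunk size, which is what lets a DOS implementation get by with a small buffer.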

Mateusz



On Mon, 10 Apr 2017 09:48:41 -0700, Ralf Quint wrote:

> On 4/10/2017 6:36 AM, Mateusz Viste wrote:
>> On Mon, 10 Apr 2017 00:56:17 -0500, Rugxulo wrote:
>>> It would be interesting to see some benchmark numbers for that (for
>>> various specific tools, 8086, 386, etc).
>> Just for the fun of it, I did some quick measurements on my 386SX PC,
>> computing various checksums of a 2 MiB file. Results below.
>>
>> BSUM (by Mateusz Viste) :  6.0s (100%)
>> CRC32 (by Joe Forster)  :  8.5s  (70%)
>> CRC32 (by Colin Plumb)  : 26.7s  (22%)
>> MD5 (by Colin Plumb)    : 52.9s  (11%)
>> SHA1 (by Colin Plumb)   : 85.7s   (7%)
>>
>> BSUM is the fastest, which is no surprise since the algorithm is
>> extremely simple (4 CPU instructions). The CRC32 computation by Joe
>> Forster is surprisingly fast as well. It's 30% slower than bsum and the
>> binary is 4x larger (and I suppose the memory usage is also much
>> higher) but that's still quite impressive for a 32-bit checksum.
> Well, most of all, it's kind of comparing apples and oranges. Those
> benchmark tests mean nothing if you don't compare them with the number
> of possible collisions you get for each of them.
> Though that doesn't mean that there aren't use cases where "simple does
> it"...
>>> Splurge on the memory, give it 32 kb or so. It'll "probably" be faster
>>> with a bigger buffer.
> Nope, won't do a thing. Didn't do much good "back in the days" to use
> anything over 16KB and it is even less relevant on modern hard drives
> with MBs of cache. Or SSDs...
>
> Ralf
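The percentages in the benchmark table are bsum's time divided by each tool's time. The Python sketch below reworks the quoted 386SX timings (nothing new is measured) into throughput and relative speed, and also shows the other way of reading "30% slower": Joe Forster's CRC32 needs about 42% more wall-clock time than bsum. The last lines illustrate Ralf's collision point under an idealized assumption: an n-bit checksum with perfectly uniform output misses a random, unstructured corruption with a probability of about 2^-n. The BSD checksum is not that ideal, so for some error patterns it can do worse.

```python
# Rework the 386SX timings quoted in this thread (2 MiB file).
# Nothing is measured here; the numbers are copied from the post.

FILE_KIB = 2 * 1024

timings = {  # seconds per 2 MiB, as reported
    "BSUM (Mateusz Viste)": 6.0,
    "CRC32 (Joe Forster)": 8.5,
    "CRC32 (Colin Plumb)": 26.7,
    "MD5 (Colin Plumb)": 52.9,
    "SHA1 (Colin Plumb)": 85.7,
}
baseline = timings["BSUM (Mateusz Viste)"]

# Relative speed = bsum time / tool time, truncated like the table.
relative = {name: int(100 * baseline / s) for name, s in timings.items()}
# Extra wall-clock time compared with bsum, in percent.
extra_time = {name: 100 * (s / baseline - 1) for name, s in timings.items()}

for name, s in timings.items():
    print(f"{name:22} {FILE_KIB / s:6.1f} KiB/s {relative[name]:4d}% speed, "
          f"+{extra_time[name]:.0f}% time")

# Idealized n-bit checksum: a random corruption slips through with
# probability about 2**-n. Real checksums (BSD sum included) deviate
# from this for structured error patterns.
for bits in (16, 32):
    print(f"{bits}-bit ideal checksum: about 1 in {2 ** bits:,} random corruptions missed")
```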



Re: bsum - compute BSD checksums of your files

Rugxulo
In reply to this post by Mateusz Viste-5
Hi,

On Mon, Apr 10, 2017 at 9:52 PM, Mateusz Viste <[hidden email]> wrote:
> On Mon, 10 Apr 2017 17:07:30 -0500, Rugxulo wrote:
>
>> irrelevant "jz short ..." (when "short" conditional jump is always
>> mandatory for "cpu 8086").
>
> I don't think so.
> Note that short means "8 bit jump" in this context, and NOT "16 bit jump".

Unless I'm mistaken, conditional jumps on 8086 don't go beyond -128 ..
127 (signed) byte range. Hence the billions of workarounds (TASM
"jumps", MASM "option ljmp", etc).

>> "section .data align=1" is probably what you intended here. (No need to
>> comment it out entirely.)
>
> No need to have it either (not in tiny model).

But you still have it commented out, so I assume you at least wanted
it for descriptive purposes.

>> The program does not end in a CR+LF pair. Thus the output is an
>> incomplete line. Not a huge deal but still (sometimes) noticeable.
>
> True. I noticed that command.com adds a CR+LF pair whenever a program
> doesn't end with those. This seems to be consistent with both FreeDOS and
> MS-DOS, so I thought I'd exploit this to save a few bytes in the program.

Most (but not all) FreeCOM versions do this too. But ... that won't
work if you redirect the output to a file. Then the CR+LF is (still)
missing. Of course, if you really need a workaround, afterwards do
"echo. >>bsum.out" and don't worry about it. (I still have at least
one util with the same problem, but I haven't fixed it yet either.
Trivial but annoying. Some tools get confused by such incomplete
lines.)
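A slightly more careful variant of the "echo. >>bsum.out" workaround appends CR+LF only when the captured output does not already end with one, so a second run does not leave a stray blank line. A minimal Python sketch; the helper name ensure_crlf is hypothetical, and it ignores DOS-era files that end with a Ctrl-Z (1Ah) marker.

```python
import os

def ensure_crlf(path: str) -> bool:
    """Append CR+LF to `path` unless it is empty or already ends with one.

    Returns True when CR+LF was appended. (Hypothetical helper; it does
    not handle files terminated by a DOS Ctrl-Z marker.)
    """
    with open(path, "rb+") as f:
        size = f.seek(0, os.SEEK_END)   # seek() returns the new position
        if size == 0:
            return False                # empty file: nothing to terminate
        if size >= 2:
            f.seek(size - 2)
            if f.read(2) == b"\r\n":
                return False            # already a complete line
        f.seek(0, os.SEEK_END)          # reposition before switching to writing
        f.write(b"\r\n")
        return True

if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        ensure_crlf(name)
```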


Re: bsum - compute BSD checksums of your files

Mateusz Viste-5
On Mon, 10 Apr 2017 23:30:35 -0500, Rugxulo wrote:
> Unless I'm mistaken, conditional jumps on 8086 don't go beyond -128 ..
> 127 (signed) byte range. Hence the billions of workarounds (TASM
> "jumps", MASM "option ljmp", etc).

I won't argue about what opcode is or is not available on 8086, since I
did not bother decoding their exact meaning. I do see, however, that NASM
(at least) can assemble JZ and JZ SHORT in two very different forms, JZ
SHORT being significantly shorter.

  5 00000000 B80100          mov ax, 1
  6 00000003 48              dec ax
  7 00000004 746A            jz short gameover

  5 00000000 B80100          mov ax, 1
  6 00000003 48              dec ax
  7 00000004 7503E9DD01      jz gameover

Of course NASM always uses the short form whenever it's possible, but
when the jump is too far away it silently uses the longer form, hence the
need to always specify SHORT if one wants to be sure what's going on.

Mateusz



Re: bsum - compute BSD checksums of your files

Rugxulo
Hi,

On Tue, Apr 11, 2017 at 1:26 AM, Mateusz Viste <[hidden email]> wrote:

> On Mon, 10 Apr 2017 23:30:35 -0500, Rugxulo wrote:
>>
>> Unless I'm mistaken, conditional jumps on 8086 don't go beyond -128 ..
>> 127 (signed) byte range. Hence the billions of workarounds (TASM
>> "jumps", MASM "option ljmp", etc).
>
> I won't argue about what opcode is or is not available on 8086, since I
> did not bother decoding their exact meaning. I do see however that (NASM
> at least) can assemble JZ and JZ SHORT in two very different forms, JZ
> SHORT being significantly shorter.
>
>   5 00000000 B80100          mov ax, 1
>   6 00000003 48              dec ax
>   7 00000004 746A            jz short gameover
>
>   5 00000000 B80100          mov ax, 1
>   6 00000003 48              dec ax
>   7 00000004 7503E9DD01      jz gameover
>
> Of course NASM always uses the short form whenever it's possible, but
> when the jump is too far away it silently uses the longer form, hence the
> need to always specify SHORT if one wants to be sure what's going on.

AFAIK, the longer one is 386+ only, hence not available with "cpu
8086". Thus, if it still quietly assembles, that is a bug (but I
thought that was long-ago fixed/avoided).


Re: bsum - compute BSD checksums of your files

Mateusz Viste-5
On Tue, 11 Apr 2017 02:03:54 -0500, Rugxulo wrote:
> AFAIK, the longer one is 386+ only, hence not available with "cpu 8086".

The above code assembles with "cpu 8086" (NASM 2.12.02).

> Thus, if it still quietly assembles, that is a bug (but I thought that
> was long-ago fixed/avoided).

Perhaps a bug, didn't investigate. My point is - explicitly mentioning
SHORT is always a good idea. Better safe than sorry.

Mateusz



Re: DOS ideas with fast simple algorithms - was: BSUM BSD checksum

Mateusz Viste-5
In reply to this post by Eric Auer-3
On Mon, 10 Apr 2017 23:21:26 +0200, Eric Auer wrote:
> For checking if downloads worked without noise, I would already want
> something "stronger" than BSUM, such as
>
> https://en.wikipedia.org/wiki/Fletcher%27s_checksum

As already stated in this thread a few times, the BSD checksum is far
from perfect. Its major advantage is that it's extremely simple, hence
fast to compute even on an 8088, yet at the same time it is reasonable to
assume in normal conditions (i.e. no malicious intent) that a file that
shows the same BSD checksum before and after a transfer is indeed the
same file.

Still, since you mention "stronger" algorithms, it might be interesting
(if not entertaining) to see how the BSD checksum compares in terms of
collisions to other algorithms like CRC16, CRC32, Fletcher, etc. I did
try to google that, but unfortunately the BSD checksum is rarely
mentioned in this kind of research. Would you mind sharing some links
that explain how much stronger the other algorithms are, and what the
actual chances are of hitting a collision by accident with BSUM?
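Lacking published figures, one way to get a feel for the question is to measure it. The Python sketch below implements the BSD checksum and Fletcher-16 (the mod-255 variant described on the Wikipedia page linked above), then counts how many randomly corrupted copies of a buffer each checksum fails to notice. The corruption model (XORing a few random bytes with non-zero values) and the function names are assumptions made for illustration; no results are claimed here, and a serious comparison would also cover burst errors and structured data.

```python
import random

def bsd_checksum(data: bytes) -> int:
    """BSD checksum: rotate right by 1, add byte, keep 16 bits."""
    s = 0
    for b in data:
        s = (s >> 1) | ((s & 1) << 15)
        s = (s + b) & 0xFFFF
    return s

def fletcher16(data: bytes) -> int:
    """Fletcher-16 with both running sums taken modulo 255."""
    sum1 = sum2 = 0
    for b in data:
        sum1 = (sum1 + b) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1

def undetected(checksum, data: bytes, trials: int, nbytes: int = 2, seed: int = 1) -> int:
    """Corrupt `nbytes` random positions per trial; count corruptions the checksum misses."""
    rng = random.Random(seed)
    reference = checksum(data)
    misses = 0
    for _ in range(trials):
        buf = bytearray(data)
        for pos in rng.sample(range(len(buf)), nbytes):
            buf[pos] ^= rng.randrange(1, 256)   # non-zero XOR: the byte always changes
        if checksum(bytes(buf)) == reference:
            misses += 1
    return misses
```

With rates around 1 in 65,536, meaningful counts need hundreds of thousands of trials (for example undetected(bsd_checksum, data, trials=500_000) on a short buffer), which is slow in pure Python. One difference shows up without any statistics: because Fletcher-16 sums modulo 255, it cannot tell a 00h byte from an FFh byte, so the buffers 00 61 62 and FF 61 62 get the same Fletcher-16 value while their BSD checksums differ.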

Mateusz



Re: bsum - compute BSD checksums of your files

tom ehlert
In reply to this post by Rugxulo
>>> Unless I'm mistaken, conditional jumps on 8086 don't go beyond -128 ..
>>> 127 (signed) byte range. Hence the billions of workarounds (TASM
>>> "jumps", MASM "option ljmp", etc).
right.

>> I won't argue about what opcode is or is not available on 8086, since I
>> did not bother decoding their exact meaning.

meaning 'I am a lazy, clueless guy, but write anyway ...'

>> I do see however that (NASM
>> at least) can assemble JZ and JZ SHORT in two very different forms, JZ
>> SHORT being significantly shorter.
>>
>>   5 00000000 B80100          mov ax, 1
>>   6 00000003 48              dec ax
>>   7 00000004 746A            jz short gameover
>>
>>   5 00000000 B80100          mov ax, 1
>>   6 00000003 48              dec ax
>>   7 00000004 7503E9DD01      jz gameover
>>
>> Of course NASM always uses the short form whenever it's possible, but
>> when the jump is too far away it silently uses the longer form, hence the
>> need to always specify SHORT if one wants to be sure what's going on.

> AFAIK,
meaning 'I am completely clueless , but offer my unfounded opinion anyway ...'

> the longer one is 386+ only, hence not available with "cpu
> 8086". Thus, if it still quietly assembles, that is a bug (but I
> thought that was long-ago fixed/avoided).

the longer one is 2 instructions instead, automatically generated by NASM
because the intended jump goes farther than 127 bytes


c:\>debug
-e 100
1430:0100  00.75   00.03   00.e9   00.dd   00.01
-u 100
1430:0100 7503          JNZ     0105
1430:0102 E9DD01        JMP     02E2
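Tom's DEBUG dump can be reproduced with a few lines of Python that mimic the relaxation NASM performed. On the 8086, Jcc only takes a signed 8-bit displacement measured from the end of the 2-byte instruction; when the target is out of range, the condition is inverted (JNZ, opcode 75h) to hop over a 3-byte near JMP (E9h plus a little-endian 16-bit displacement). The function name encode_jz is hypothetical, and the target 0070h for the short case is inferred from the 6Ah displacement in the NASM listing.

```python
def encode_jz(addr: int, target: int) -> bytes:
    """Encode 'jz target' placed at `addr` (16-bit offsets), relaxing if needed."""
    rel8 = target - (addr + 2)                   # counted from the next instruction
    if -128 <= rel8 <= 127:
        return bytes([0x74, rel8 & 0xFF])        # jz short target
    rel16 = (target - (addr + 5)) & 0xFFFF       # the JMP ends 5 bytes after addr
    return bytes([0x75, 0x03,                    # jnz $+5: skip the 3-byte jmp
                  0xE9, rel16 & 0xFF, rel16 >> 8])  # jmp near target, little-endian

print(encode_jz(0x0004, 0x0070).hex(" "))   # short case from the NASM listing
print(encode_jz(0x0100, 0x02E2).hex(" "))   # long case Tom disassembled in DEBUG
```

The two calls print 74 6a and 75 03 e9 dd 01, matching the listing and the DEBUG dump.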


Tom




Re: bsum - compute BSD checksums of your files

Mateusz Viste-5
Hi Tom,

That's nice of you to provide the explanation. I didn't read it
completely (too lazy), nor understand it fully (too stupid), but the
other clueless guy might find it interesting that he was only half wrong.

At the end of the day, I will keep using "JZ SHORT" anyway, 'cause that
just works for me.

cheers,
Mateusz





On Tue, 11 Apr 2017 12:33:17 +0200, Tom Ehlert wrote:

>>>> Unless I'm mistaken, conditional jumps on 8086 don't go beyond -128 ..
>>>> 127 (signed) byte range. Hence the billions of workarounds (TASM
>>>> "jumps", MASM "option ljmp", etc).
> right.
>
>>> I won't argue about what opcode is or is not available on 8086, since
>>> I did not bother decoding their exact meaning.
>
> meaning 'I am a lazy, clueless guy, but write anyway ...'
>
>>> I do see however that (NASM at least) can assemble JZ and JZ SHORT in
>>> two very different forms, JZ SHORT being significantly shorter.
>>>
>>>   5 00000000 B80100          mov ax, 1
>>>   6 00000003 48              dec ax
>>>   7 00000004 746A            jz short gameover
>>>
>>>   5 00000000 B80100          mov ax, 1
>>>   6 00000003 48              dec ax
>>>   7 00000004 7503E9DD01      jz gameover
>>>
>>> Of course NASM always uses the short form whenever it's possible, but
>>> when the jump is too far away it silently uses the longer form, hence
>>> the need to always specify SHORT if one wants to be sure what's going
>>> on.
>
>> AFAIK,
> meaning 'I am completely clueless , but offer my unfounded opinion
> anyway ...'
>
>> the longer one is 386+ only, hence not available with "cpu 8086". Thus,
>> if it still quietly assembles, that is a bug (but I thought that was
>> long-ago fixed/avoided).
>
> the longer one is 2 instructions instead, automatically generated by
> NASM because the intended jump goes farther then 127 bte
>
>
> c:\>debug
> -e 100
> 1430:0100  00.75   00.03   00.e9   00.dd   00.01
> -u 100
> 1430:0100 7503          JNZ     0105
> 1430:0102 E9DD01        JMP     02E2
>
>
> Tom

