From: Nicholas Clark Date: 15:57 on 01 Aug 2007 Subject: xargs

$ echo | xargs uname

vs

$ echo | xargs uname
Linux

I'm not sure whether to hate the open group:

  The utility will be executed one or more times until the end-of-file is reached.

( http://opengroup.org/onlinepubs/007908799/xcu/xargs.html )

for having lame behaviour and no flag to give me the useful 'zero or more'

Or FreeBSD for lulling me into an assumption that its xargs is standards conformant:

STANDARDS
     The xargs utility is expected to be IEEE Std 1003.2 (``POSIX.2'') compliant.
     The -J, -o, -P and -R options are non-standard FreeBSD extensions which
     may not be available on other operating systems.

FreeBSD 6.2                     August 2, 2004                     FreeBSD 6.2

Nicholas Clark
From: Peter da Silva Date: 17:25 on 01 Aug 2007 Subject: Re: xargs I'm all for hating The Open Group. This isn't the only place they've specified bizarre non-standard behavior as the new standard.
From: David Landgren Date: 21:45 on 01 Aug 2007 Subject: Re: xargs Nicholas Clark wrote: > $ echo | xargs uname > > vs > > $ echo | xargs uname > Linux What purpose does the latter serve? EMWTK, David
From: Peter da Silva Date: 22:37 on 01 Aug 2007 Subject: Re: xargs Now I think about this, I think the real problem is that xargs is inherently hateful. % echo % echo -n % In a sane world, "echo | xargs cmd" would result in "cmd" being executed as "cmd ''". But xargs splits on whitespace, rather than newline. Which is probably historically unfixable, but still hateful. And, no, xargs -0 is only a partial solution. Though my fingers have learned the macro... ... | tr '\n' '\0' | xargs -0 Unfortunately, "xargs -0" still doesn't DTRT. Hateful trollop.
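The difference Peter describes between splitting on whitespace and splitting on a terminator can be sketched in C. This is only an illustration under simplifying assumptions: the function names are hypothetical, and real xargs also interprets quotes and backslashes, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Count arguments when any run of blanks, tabs or newlines separates them.
 * Simplified model of default xargs splitting; quoting is ignored. */
size_t count_ws_args(const char *buf, size_t n)
{
    size_t count = 0;
    int in_arg = 0;

    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int sep = (buf[i] == ' ' || buf[i] == '\t' || buf[i] == '\n');
        if (!sep && !in_arg)
            count++;                /* first byte of a new argument */
        in_arg = !sep;
    }
    return count;
}

/* Count arguments when each one ends with `term` ('\n' or '\0').
 * Empty arguments are kept, so a lone terminator is one empty argument. */
size_t count_terminated_args(const char *buf, size_t n, char term)
{
    size_t count = 0;

    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (buf[i] == term)
            count++;
    if (n > 0 && buf[n - 1] != term)
        count++;                    /* final argument with no terminator */
    return count;
}
```

Fed the single newline that "echo" writes, the whitespace splitter finds no arguments at all, while the terminator splitter finds one empty argument, which is the "cmd ''" case. Fed the empty output of "echo -n", both find nothing.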
From: David Cantrell Date: 23:02 on 01 Aug 2007 Subject: Re: xargs Peter da Silva wrote: > Now I think about this, I think the real problem is that xargs is > inherently hateful. What's hateful is the shell or OS (I can't be bothered to remember which) which makes it necessary - namely the limitation on how long your command line can be. Without that, xargs need not exist and so would be free of hate.
From: Peter da Silva Date: 01:13 on 02 Aug 2007 Subject: Re: xargs On Aug 1, 2007, at 17:02, David Cantrell wrote: > What's hateful is the shell or OS (I can't be bothered to remember > which) which makes it necessary - namely the limitation on how long > your > command line can be. Without that, xargs need not exist and so would > be > free of hate. An unlimited command line is impossible, and hideously inefficient. Eventually you'll run out of VM building the command line, no matter how much VM you have. You have to get away from the whole concept of the command line to resolve that problem.
From: Michael G Schwern Date: 05:51 on 02 Aug 2007 Subject: Re: xargs Peter da Silva wrote: > On Aug 1, 2007, at 17:02, David Cantrell wrote: >> What's hateful is the shell or OS (I can't be bothered to remember >> which) which makes it necessary - namely the limitation on how long your >> command line can be. Without that, xargs need not exist and so would be >> free of hate. > > An unlimited command line is impossible, and hideously inefficient. > Eventually you'll run out of VM building the command line, no matter how > much VM you have. You have to get away from the whole concept of the > command line to resolve that problem. I have an "unlimited" scrollback history option on my Terminal and somehow the universe has not yet imploded. Do we really have to suffix every use of "unlimited" with "up to the limitations of memory and hard drive space"? It's the 21st century. Dynamically allocated memory ain't exactly rocket science. C programs that still think it is, now that's hateful.
From: David Cantrell Date: 12:22 on 02 Aug 2007 Subject: Re: xargs On Wed, Aug 01, 2007 at 07:13:42PM -0500, Peter da Silva wrote: > On Aug 1, 2007, at 17:02, David Cantrell wrote: > >What's hateful is the shell or OS (I can't be bothered to remember > >which) which makes it necessary - namely the limitation on how long > >your > >command line can be. > An unlimited command line is impossible, and hideously inefficient. > Eventually you'll run out of VM building the command line, no matter > how much VM you have. You have to get away from the whole concept of > the command line to resolve that problem. Sure, but limiting me to a mere few thousand characters is just stupid. If I want to use a billion characters and kick the shit out of my swap slice, the computer should allow me, the administrator, to shoot myself in the foot. That it doesn't is a misfeature.
From: Andrew Black Date: 13:13 on 02 Aug 2007 Subject: Re: xargs On Thu, Aug 02, 2007 at 12:22:42PM +0100, David Cantrell wrote: > If I want to use a billion characters and kick the shit out of my swap > slice, the computer should allow me, the administrator, to shoot myself > in the foot. That it doesn't is a misfeature. I like the command grep somestring * > somefile which on Windows means "fill my disk up". Why is an exercise for the reader.
From: Peter da Silva Date: 15:24 on 02 Aug 2007 Subject: Re: xargs On Aug 2, 2007, at 6:22, David Cantrell wrote: > If I want to use a billion characters and kick the shit out of my swap > slice, the computer should allow me, the administrator, to shoot myself > in the foot. That it doesn't is a misfeature. The OS is not required to allow you, as a user or administrator, to shoot yourself in the foot* every time you want to. Especially when the system would get seriously broken long before your gigabyte command lines started showing up... just letting users use a few megabytes of command line routinely would have unfortunate consequences. The solution is to get rid of the command *line* as the only way to pass parameters to a program. One of the ancestors of CP/M, ISIS, passed command arguments as preloaded input... your program just read the command line as the first line. In UNIX, it would be better to have a "stdcmd" stream that let you read a *genuinely* unlimited stream of null-terminated parameters. In the absence of that, handcraft it yourself. * For example... you're not allowed, as administrator, to use "dd" to zero out a directory entry on a mounted file system any more. You used to be able to treat directories just like files if you were root, and there was much wailing and gnashing of teeth when that went away.
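A rough sketch of what reading from such a stream might look like in C follows. No "stdcmd" stream exists in UNIX, so the function below simply reads from whatever FILE * it is handed; the name read_param and the initial buffer size are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one NUL-terminated parameter from `in`.
 * Returns a malloc'd C string that the caller must free, or NULL when the
 * stream is exhausted (or when memory cannot be allocated).
 * A last parameter that lacks its terminator is still returned. */
char *read_param(FILE *in)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int c;

    assert(in != NULL);
    if (buf == NULL)
        return NULL;
    while ((c = fgetc(in)) != EOF && c != '\0') {
        if (len + 1 == cap) {           /* keep room for the terminator */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {         /* clean end of stream */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Because each parameter is consumed as it arrives, only the longest single parameter has to fit in memory, not the whole list. An empty parameter comes back as an empty string rather than vanishing, and the format matches what "xargs -0" already accepts.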
From: Jarkko Hietaniemi Date: 15:29 on 02 Aug 2007 Subject: Re: xargs > a "stdcmd" stream that let you read a *genuinely* unlimited stream of > null-terminated parameters. In the absence of that, handcraft it Your C is showing. Please have the length of strings upfront. That being said, a stdcmd would be cool. > yourself. > > * For example... you're not allowed, as administrator, to use "dd" to > zero out a directory entry on a mounted file system any more. You used > to be able to treat directories just like files if you were root, and > there was much wailing and gnashing of teeth when that went away. Oh, the times: "unlink .". (Usually followed by "1..2..3...SYSTEM HALTED".)
From: Peter da Silva Date: 20:32 on 02 Aug 2007 Subject: Re: xargs On Aug 2, 2007, at 9:29, Jarkko Hietaniemi wrote: >> a "stdcmd" stream that let you read a *genuinely* unlimited stream of >> null-terminated parameters. In the absence of that, handcraft it > Your C is showing. It's UNIX, there's so many places where the requirement for a null terminator is exposed in file names and other parameters that there's no point targeting this one. Plus, that much compatibility with "xargs -0" would be useful. > Please have the length of strings upfront. Length encoding isn't self-syncing, and it also limits the length of individual parameters by the size of the length. Which would sure as anything bring us back to this discussion eventually. Use UTF-8 encoding with an alternate null mapping.
From: Andy Armstrong Date: 20:42 on 02 Aug 2007 Subject: Re: xargs On 2 Aug 2007, at 20:32, Peter da Silva wrote: > Use UTF-8 encoding with an alternate null mapping. Chunked UTF-8 encoding with BER encoded chunk lengths :)
From: Jarkko Hietaniemi Date: 02:00 on 03 Aug 2007 Subject: Re: xargs Peter da Silva wrote: > On Aug 2, 2007, at 9:29, Jarkko Hietaniemi wrote: >>> a "stdcmd" stream that let you read a *genuinely* unlimited stream of >>> null-terminated parameters. In the absence of that, handcraft it > >> Your C is showing. > > It's UNIX, there's so many places where the requirement for a null > terminator is exposed in file names and other parameters that there's > no point targeting this one. Plus, that much compatibility with "xargs > -0" would be useful. I thought we were Fixing things and not just dinking with them. >> Please have the length of strings upfront. > > Length encoding isn't self-syncing, and it also limits the length of > individual parameters by the size of the length. Which would sure as > anything bring us back to this discussion eventually. > > Use UTF-8 encoding with an alternate null mapping. Java lover, are you? :-) I was about to suggest BER encoding but somebody beat me to it.
From: A. Pagaltzis Date: 06:59 on 03 Aug 2007 Subject: Zero-terminated strings suck (was: xargs) * Peter da Silva <peter@xxxxxxx.xxx> [2007-08-02 21:45]: > > Please have the length of strings upfront. > > Length encoding isn't self-syncing, Synching the length is O(1). Not doing it makes *everything* O(n) and leaves you to deal with the semi-predicate problem, source of lots of 'sploits and other fun for the whole family. Pick your poison; I know which one I prefer. > and it also limits the length of individual parameters by the > size of the length. So what? Just make sure the counter is wide enough for any string you can keep in memory. Wasteful? So use a variable-width encoding for the length. It can be brutally stupid as it only needs to be competitive with NUL termination on very short strings. Using a single byte for strings < 128 bytes and a machine word for anything longer will do just fine. If you use the lowest bit as a flag to signify the type, then massaging the counter takes under 2 cycles on average with optimal machine code: ridiculously fast -- a common theme with operations on length-prefixed strings. That C and UNIX immortalised zero-terminated strings amounts to a crime if you consider the amount of CPU cycles that have since been wasted scanning for NULs. Regards,
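The variable-width length prefix described above might be sketched in C as follows. The details are assumptions, not anything specified in the message: the "machine word" is taken to be 64 bits, the long form is laid out little-endian so the flag bit always sits in the first byte, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the length prefix for a string of `len` bytes into `out`, which
 * must have room for 8 bytes. Returns the prefix size: 1 or 8. */
size_t put_length(uint8_t *out, uint64_t len)
{
    assert(len <= (UINT64_MAX >> 1));   /* one bit is spent on the flag */
    if (len < 128) {
        out[0] = (uint8_t)(len << 1);   /* low bit 0: short form */
        return 1;
    }
    uint64_t word = (len << 1) | 1u;    /* low bit 1: long form */
    for (int i = 0; i < 8; i++)         /* explicit little-endian layout */
        out[i] = (uint8_t)(word >> (8 * i));
    return 8;
}

/* Decode a prefix written by put_length(). Stores the string length in
 * *len and returns how many prefix bytes were consumed: 1 or 8. */
size_t get_length(const uint8_t *in, uint64_t *len)
{
    if ((in[0] & 1u) == 0) {            /* short form: one byte */
        *len = in[0] >> 1;
        return 1;
    }
    uint64_t word = 0;
    for (int i = 0; i < 8; i++)
        word |= (uint64_t)in[i] << (8 * i);
    *len = word >> 1;
    return 8;
}
```

Decoding is one test and one shift in the short case, independent of the string's length. Note that this still caps a single string at 2^63 - 1 bytes, which is the limit Peter objects to for stream data.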
From: demerphq Date: 10:45 on 03 Aug 2007 Subject: Re: Zero-terminated strings suck (was: xargs) On 8/3/07, A. Pagaltzis <pagaltzis@xxx.xx> wrote: > * Peter da Silva <peter@xxxxxxx.xxx> [2007-08-02 21:45]: > > > Please have the length of strings upfront. > > > > Length encoding isn't self-syncing, > > Synching the length is O(1). > > Not doing it makes *everything* O(n) and leaves you to deal with > the semi-predicate problem, source of lots of 'sploits and other > fun for the whole family. > > Pick your poison; I know which one I prefer. > > > and it also limits the length of individual parameters by the > > size of the length. > > So what? Just make sure the counter is wide enough for any string > you can keep in memory. > > Wasteful? So use a variable-width encoding for the length. It can > be brutally stupid as it only needs to be competitive with NUL > termination on very short strings. Using a single byte for > strings < 128 bytes and a machine word for anything longer will > do just fine. > > If you use the lowest bit as a flag to signify the type, then > massaging the counter takes under 2 cycles on average with > optimal machine code: ridiculously fast -- a common theme with > operations on length-prefixed strings. > > That C and UNIX immortalised zero-terminated strings amounts to a > crime if you consider the amount of CPU cycles that have since > been wasted scanning for NULs. Not just that, also the amount of developer hours that have been wasted dealing with issues related to this subject. Length encoded strings (and related array support) are about the only thing that Wirth got right in Pascal that K&R got wrong in C. Yves
From: Peter da Silva Date: 11:57 on 03 Aug 2007 Subject: Re: Zero-terminated strings suck (was: xargs) On Aug 3, 2007, at 0:59, A. Pagaltzis wrote: > * Peter da Silva <peter@xxxxxxx.xxx> [2007-08-02 21:45]: >>> Please have the length of strings upfront. >> Length encoding isn't self-syncing, > Synching the length is O(1). Can you elaborate on that comment, because I don't see how you can in principle resync a stream of length-prefixed records without any sentinel values. And if you have sentinels you're back where you started. > So what? Just make sure the counter is wide enough for any string > you can keep in memory. Stream data is not limited to the size of memory. > That C and UNIX immortalised zero-terminated strings amounts to a > crime if you consider the amount of CPU cycles that have since > been wasted scanning for NULs. In this situation, you're reading a stream anyway. You have to touch every byte as it comes in.
From: A. Pagaltzis Date: 06:49 on 05 Aug 2007 Subject: Re: Zero-terminated strings suck (was: xargs) * Peter da Silva <peter@xxxxxxx.xxx> [2007-08-03 17:02]: > On Aug 3, 2007, at 0:59, A. Pagaltzis wrote: > >* Peter da Silva <peter@xxxxxxx.xxx> [2007-08-02 21:45]: > >>> Please have the length of strings upfront. > > >>Length encoding isn't self-syncing, > > >Synching the length is O(1). > > Can you elaborate on that comment, because I don't see how you > can in principle resync a stream of length-prefixed records > without any sentinel values. And if you have sentinels you're > back where you started. I didn't realise you were talking specifically about *lists* of strings represented by a succession of zero-terminated strings. I thought you were talking about single strings being self-syncing, which I understood to mean the same thing that Andy meant: you can futz around with the allocated length of a zero-terminated string without having to update any part of the string. Regards,
From: Peter da Silva Date: 13:29 on 06 Aug 2007 Subject: Re: Zero-terminated strings suck (was: xargs) > I didn't realise you were talking specifically about *lists* of > strings represented by a succession of zero-terminated strings. I could probably gen up a healthy rant about the hateful way people mix up wire formats, storage formats, and internal formats, but I don't feel up to that much hate this morning. :) Plus, I've kicked myself in the head over it too. Don't you hate coming across code you wrote years ago and wishing you had a temporal bitchslap machine so you could knock some sense into yourself?
From: A. Pagaltzis Date: 15:13 on 06 Aug 2007 Subject: Re: Zero-terminated strings suck (was: xargs) * Peter da Silva <peter@xxxxxxx.xxx> [2007-08-06 14:40]: > Don't you hate coming across code you wrote years ago and > wishing you had a temporal bitchslap machine so you could knock > some sense into yourself? That would lead us off-topic, because actually, not really. I'm probably still an idiot in the exact same ways as ever, because in old code of mine I see cluelessness but not much idiocy. Regards,
From: David Cantrell Date: 12:47 on 03 Aug 2007 Subject: Re: xargs On Thu, Aug 02, 2007 at 02:32:20PM -0500, Peter da Silva wrote: > > Please have the length of strings upfront. > Length encoding isn't self-syncing, Better that than not being able to easily pass arbitrary data, which might include NULLs. > and it also limits the length of > individual parameters by the size of the length. Which would sure as > anything bring us back to this discussion eventually. So have a sizeof_length field first :-) sizeof(sizeof_length) == 128 should work for at least a few years.
From: Peter da Silva Date: 13:23 on 03 Aug 2007 Subject: Re: xargs On Aug 3, 2007, at 6:47, David Cantrell wrote: > On Thu, Aug 02, 2007 at 02:32:20PM -0500, Peter da Silva wrote: >>> Please have the length of strings upfront. >> Length encoding isn't self-syncing, > Better that than not being able to easily pass arbitrary data, which > might include NULLs. As parameters to programs that are going to use them in UNIX system calls? If you want to redesign UNIX from scratch, be my guest... beware the second system syndrome (as someone already mentioned). I'm just trying to eliminate the hateful xargs.
From: David Cantrell Date: 14:28 on 03 Aug 2007 Subject: Re: xargs On Fri, Aug 03, 2007 at 07:23:55AM -0500, Peter da Silva wrote: > On Aug 3, 2007, at 6:47, David Cantrell wrote: > >On Thu, Aug 02, 2007 at 02:32:20PM -0500, Peter da Silva wrote: > >>> Please have the length of strings upfront. > >>Length encoding isn't self-syncing, > >Better that than not being able to easily pass arbitrary data, which > >might include NULLs. > As parameters to programs that are going to use them in UNIX system > calls? I want only one kind of string thankyouverymuch.
From: Andy Armstrong Date: 14:50 on 03 Aug 2007 Subject: Re: xargs On 3 Aug 2007, at 14:28, David Cantrell wrote: > On Fri, Aug 03, 2007 at 07:23:55AM -0500, Peter da Silva wrote: >> On Aug 3, 2007, at 6:47, David Cantrell wrote: >>> On Thu, Aug 02, 2007 at 02:32:20PM -0500, Peter da Silva wrote: >>>>> Please have the length of strings upfront. >>>> Length encoding isn't self-syncing, >>> Better that than not being able to easily pass arbitrary data, which >>> might include NULLs. >> As parameters to programs that are going to use them in UNIX system >> calls? > > I want only one kind of string thankyouverymuch. Sheesh :) Are we still arguing about program args or am I right in thinking we've moved on to strings in general? And if we're rehashing the old 'which string representation is better' argument do we genuinely believe it to be a burning issue - or are we just doing the dance for nostalgic reasons? I recall an editorial in either Byte or PCW about the time it first seemed likely that C would displace Pascal as the hot language du jour. The gist was that it'd never be possible to write a decent text editor in C because of the need to constantly scan to the end of the text to find out how big it was. Duh. Yes it's miserable to consider how many cycles have been wasted chasing down nulls over the years and to contemplate all the extant O(N^2) loops that call strlen() for every character processed - but C strings have advantages over Pascal style counted strings in other ways: the fact that characters can be dropped from the front of the string just by advancing the pointer is a huge win in a lot of situations. With Pascal strings a function that trims leading spaces has to return a new string - with C strings you just return a pointer that has advanced past the whitespace. Makes it much easier to write parsers &c. Counted strings make sense when you have plenty of memory and garbage collection; in more primitive languages they're a pain.
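Both halves of Andy's point can be shown in a few lines of C. This is a minimal sketch with hypothetical function names: the first function trims leading whitespace by advancing the pointer, and the second shows the usual way to avoid the O(N^2) pattern by calling strlen() once, outside the loop.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Skip leading whitespace. The result points into the caller's buffer;
 * nothing is copied or allocated. */
const char *skip_leading_space(const char *s)
{
    assert(s != NULL);
    while (isspace((unsigned char)*s))  /* stops at the NUL as well */
        s++;
    return s;
}

/* Count the spaces in a string. strlen() is called once up front;
 * putting it in the loop condition would rescan the string on every
 * iteration and make the loop quadratic. */
size_t count_spaces(const char *s)
{
    size_t n, count = 0;

    assert(s != NULL);
    n = strlen(s);
    for (size_t i = 0; i < n; i++)
        if (s[i] == ' ')
            count++;
    return count;
}
```

The pointer trick only works at the front of the string. Trimming trailing whitespace from a terminated string means writing a new NUL into the buffer or copying.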
From: Peter da Silva Date: 15:14 on 03 Aug 2007 Subject: Re: xargs

On 03-Aug-2007, at 08:50, Andy Armstrong wrote:
> Are we still arguing about program args or am I right in thinking
> we've moved on to strings in general?

The hate in question is still xargs. However...

> With Pascal strings a function that trims leading spaces has to
> return a new string - with C strings you just return a pointer that
> has advanced past the whitespace. Makes it much easier to write
> parsers &c.

Internally, address/length strings have a little more overhead than counted or terminated strings, but are more versatile and efficient than either. The "string pointer" is twice as large, but it avoids redundant copies and scanning.

The thing is, though, that this is irrelevant to xargs... because stdcmd parameters are not stored in an internal string format like a command line or argument vector... it's a wire protocol. Programs using either counted or terminated strings on the wire can efficiently convert them to address/length strings on read and generate either from address/length strings on output.

Internally, start/end strings are another possibility, though it could be argued that this is merely another representation of address/length strings.

Ultimately, though, something like this structure tends to raise its head:

    struct _buffer {
        int size;       // size of buffer
        int length;     // size of data in buffer
    #ifdef BUFFERGAP
        int gap;        // start of gap
    #endif
    #ifdef RESIZABLE
        char *data;     // start address of buffer
    #else
        char data[];    // buffer follows header
    #endif
    };

    struct _string {
        struct _buffer *buffer;
        int offset;
        int len;
    };
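The address/length idea can be sketched without the buffer header at all. The struct and function names below are hypothetical and deliberately simpler than the _buffer/_string pair in the message; the point is only that trimming at either end changes two fields and never touches or copies the underlying bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An address/length string: a view onto bytes owned by someone else. */
struct slice {
    const char *ptr;
    size_t len;
};

/* Wrap a terminated C string; the one strlen() scan happens here. */
struct slice slice_from_cstr(const char *s)
{
    struct slice v;

    assert(s != NULL);
    v.ptr = s;
    v.len = strlen(s);
    return v;
}

/* Trim spaces from both ends by adjusting the view only. */
struct slice slice_trim(struct slice v)
{
    while (v.len > 0 && v.ptr[0] == ' ') {
        v.ptr++;
        v.len--;
    }
    while (v.len > 0 && v.ptr[v.len - 1] == ' ')
        v.len--;
    return v;
}

/* Compare a slice with a terminated C string. */
int slice_equals(struct slice v, const char *s)
{
    size_t n = strlen(s);
    return v.len == n && memcmp(v.ptr, s, n) == 0;
}
```

The cost is the doubled "pointer" size mentioned above, plus the usual caveat that a slice is not NUL-terminated, so it has to be copied out before being handed to a system call that expects a C string.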
From: Andy Armstrong Date: 15:54 on 03 Aug 2007 Subject: Re: xargs

On 3 Aug 2007, at 15:14, Peter da Silva wrote:
> Ultimately, though, something like this structure tends to raise
> its head:
>
> struct _buffer {
>     int size;       // size of buffer
>     int length;     // size of data in buffer
> #ifdef BUFFERGAP
>     int gap;        // start of gap
> #endif
> #ifdef RESIZABLE
>     char *data;     // start address of buffer
> #else
>     char data[];    // buffer follows header
> #endif
> };

Oooh, gap-in-the-middle buffer - my favourite :) (anyone remember WordWise?)

If we're really bothered I *think* you tend to end up with slightly more compact code if you do

    struct _buffer {
        int size;
        int gap_lo;     // start of gap
        int gap_hi;     // end of gap
    };

If you want the ungapped variant you can just drop gap_hi and assume gap_hi == size. Then gap_lo == length.
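A small fixed-size gap buffer using the size/gap_lo/gap_hi layout might look like the sketch below. Everything beyond those three fields is an assumption added for illustration: the data pointer, the size_t types, the function names, and the decision not to grow the buffer when the gap fills up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct gapbuf {
    size_t size;    /* total bytes in data */
    size_t gap_lo;  /* start of gap; text before the cursor is data[0..gap_lo) */
    size_t gap_hi;  /* end of gap; text after the cursor is data[gap_hi..size) */
    char *data;
};

/* Allocate an empty buffer: the gap covers everything. Returns 0 or -1. */
int gapbuf_init(struct gapbuf *b, size_t size)
{
    b->data = malloc(size);
    if (b->data == NULL)
        return -1;
    b->size = size;
    b->gap_lo = 0;
    b->gap_hi = size;
    return 0;
}

/* Number of text bytes currently stored. */
size_t gapbuf_length(const struct gapbuf *b)
{
    return b->size - (b->gap_hi - b->gap_lo);
}

/* Move the cursor (the gap) to text position pos, 0 <= pos <= length. */
void gapbuf_move(struct gapbuf *b, size_t pos)
{
    assert(pos <= gapbuf_length(b));
    if (pos < b->gap_lo) {              /* shift text rightwards over the gap */
        size_t n = b->gap_lo - pos;
        memmove(b->data + b->gap_hi - n, b->data + pos, n);
        b->gap_lo -= n;
        b->gap_hi -= n;
    } else if (pos > b->gap_lo) {       /* shift text leftwards over the gap */
        size_t n = pos - b->gap_lo;
        memmove(b->data + b->gap_lo, b->data + b->gap_hi, n);
        b->gap_lo += n;
        b->gap_hi += n;
    }
}

/* Insert one byte at the cursor. Returns -1 if the gap is full. */
int gapbuf_insert(struct gapbuf *b, char c)
{
    if (b->gap_lo == b->gap_hi)
        return -1;
    b->data[b->gap_lo++] = c;
    return 0;
}

/* Copy the text into out, which must hold length + 1 bytes. */
void gapbuf_text(const struct gapbuf *b, char *out)
{
    memcpy(out, b->data, b->gap_lo);
    memcpy(out + b->gap_lo, b->data + b->gap_hi, b->size - b->gap_hi);
    out[gapbuf_length(b)] = '\0';
}
```

Insertion at the cursor is a single store. The copying cost is paid only when the cursor moves, and then only for the bytes between the old and new positions. With the gap parked at the end, gap_hi equals size and gap_lo equals the text length, which is the ungapped case described above.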
From: Michael G Schwern Date: 01:33 on 06 Aug 2007 Subject: Re: xargs Andy Armstrong wrote: > On 3 Aug 2007, at 14:28, David Cantrell wrote: >> I want only one kind of string thankyouverymuch. > > Sheesh :) > > Are we still arguing about program args or am I right in thinking we've > moved on to strings in general? > > And if we're rehashing the old 'which string representation is better' > argument do we genuinely believe it to be a burning issue - or are we > just doing the dance for nostalgic reasons? It is the burning dance of hate which burns within us like a flame of purest hate that cannot be quenched with all the software in this hateful world. Step higher, the bile is rising.
From: Peter da Silva Date: 14:53 on 03 Aug 2007 Subject: Re: xargs On 03-Aug-2007, at 08:28, David Cantrell wrote: > I want only one kind of string thankyouverymuch. That's precisely my point. If this was RSX-11 or the UCSD P-System, counted strings would make sense. Trying to improve part of a system and in the process making it inconsistent and incompatible with itself leads to the most remarkable sources of hate.
From: David Cantrell Date: 15:33 on 02 Aug 2007 Subject: Re: xargs On Thu, Aug 02, 2007 at 09:24:07AM -0500, Peter da Silva wrote: > The OS is not required to allow you, as a user or administrator, to > shoot yourself in the foot* every time you want to. Especially when the > system would get seriously broken long before your gigabyte command > lines started showing up... just letting users use a few megabytes of > command line routinely would have unfortunate consequences. No it wouldn't, because I can already restrict things like their process sizes and CPU time.
From: Peter da Silva Date: 20:41 on 02 Aug 2007 Subject: Re: xargs On Aug 2, 2007, at 9:33, David Cantrell wrote: > No it wouldn't, because I can already restrict things like their > process > sizes and CPU time. Depending on system restrictions to rein in the results of poor redesign is hateful. Command streams -> fork/exec remains fast and becomes less frequent, pipelines retain a high degree of concurrency, system performance remains good, xargs becomes irrelevant and can be deprecated. Unlimited command lines -> fork/exec can take minutes, becomes a pipeline bottleneck, system performance goes into the toilet, and restricting users leaves xargs in place.
From: Robert G. Werner Date: 01:49 on 03 Aug 2007 Subject: Re: xargs Peter da Silva wrote: [snip] > An unlimited command line is impossible, and hideously inefficient. > Eventually you'll run out of VM building the command line, no matter how > much VM you have. You have to get away from the whole concept of the > command line to resolve that problem. > > If something is impossible, seems like it would be pretty efficient too ("can't be done" doesn't take very long to print to the output buffer). Although, I guess if the code tried and tried and tried and only realized somewhere near the Heat Death of the Universe that the task was impossible, many would call that hideously inefficient. I mean just think of all the other things you could have computed during that time. So, I guess I'll have to grant you your point.
From: Peter da Silva Date: 15:13 on 02 Aug 2007 Subject: Re: xargs On Aug 1, 2007, at 23:51, Michael G Schwern wrote: > I have an "unlimited" scrollback history option on my Terminal and > somehow the > universe has not yet imploded. That's no such thing, and even in an environment where the relative overhead of having no *explicit* limits on scrollback is much much less important than it would be for command lines (fork/exec overhead already being the majority of the cost of many shell scripts as it is) you still use other programs (like less or tail) rather than simply doing "cat multi-megabyte-logfile" and then shift-page-up. Because that scrollback buffer is the wrong tool for that. > It's the 21st century. Dynamically allocated memory ain't exactly > rocket > science. C programs that still think it is, now that's hateful. Don't be silly, this has nothing to do with C. It has to do with NOT gratuitously abusing virtual memory for something that's better served by a stream mechanism. The command *line* itself is the *wrong* tool.
From: Michael Poole Date: 15:25 on 02 Aug 2007 Subject: Re: xargs Peter da Silva writes: > Don't be silly, this has nothing to do with C. > > It has to do with NOT gratuitously abusing virtual memory for > something that's better served by a stream mechanism. > > The command *line* itself is the *wrong* tool. Second-system syndrome ahoy! Michael
Generated at 10:26 on 16 Apr 2008 by mariachi