Bash VS Csh VS Others - Which Is Better for Application Maintenance

bash vs csh vs others - which is better for application maintenance?

These days, just about any non-embedded (or large embedded) operating system has a POSIX:2001 a.k.a. Single Unix v3 compatibility layer. This is native on unix platforms (Linux, Mac OS X, Solaris, *BSD, etc.) and installable on other platforms such as Windows and Android. POSIX specifies a shell language, usually known as POSIX sh. This language is derived from the Bourne shell.

Most unix systems have one of two implementations of POSIX sh: ksh or bash, which have additional useful features compared to POSIX. However some less mainstream systems (especially embedded ones) may have only POSIX-mandated features.

Given your objectives, I see three choices:

  • Restrict yourself to POSIX sh. Pro: you don't have to worry about differing variants, since there's a standard and compliant implementations are readily available. Con: you don't benefit from bash and ksh's extensions.
  • Use the intersection of ksh and bash. This is attractive in appearance, but it does mean you have to use two reference documents instead of just one — and even the features that bash and ksh have in common don't always use the same syntax. Figuring out which one you want to use on a given system is also a pain.
  • Choose one of ksh or bash. Both bash and ksh are available on all unix-like platforms and on Windows. Both have an open source implementation (the only one for bash, ATT ksh93 for ksh) that can be installed on most platforms. I'd go for bash over ksh for two reasons. First, it's the default on Linux, so you'll find more people who're used to it. Second, there are systems that come with an older, less-featured implementation of ksh; even if you can install ksh93, it's another thing you have to think about when deploying.

Forget about csh for scripting, and forget about zsh if you want common default availability.

See also What are the fundamental differences between the mainstream *NIX shells?, particularly the “for scripting” part of my answer.

Note that shell programming involves other utilities beyond the shell. POSIX specifies those other utilities. “Bash plus other POSIX utilities” is a reasonable choice, distinct from “POSIX utilities (including sh)”.

What is the best (portable and maintanable) Shell Scripting language today?

I would suggest plain old sh, which is available everywhere.

Also, it is worth noting that portability involves not only shell but also other commands used in a script such as awk, grep, ps or echo.

Bash or KornShell (ksh)?

Bash.

The various UNIX and Linux implementations have various different source level implementations of ksh, some of which are real ksh, some of which are pdksh implementations and some of which are just symlinks to some other shell that has a "ksh" personality. This can lead to weird differences in execution behavior.

At least with bash you can be sure that it's a single code base, and all you need worry about is what (usually minimum) version of bash is installed. Having done a lot of scripting on pretty much every modern (and not-so-modern) UNIX, programming to bash is more reliably consistent in my experience.

What is the benefit of using $() instead of backticks in shell scripts?

The major one is the ability to nest them, commands within commands, without losing your sanity trying to figure out if some form of escaping will work on the backticks.

An example, though somewhat contrived:

deps=$(find /dir -name $(ls -1tr 201112[0-9][0-9]*.txt | tail -1l) -print)

which will give you a list of all files in the /dir directory tree which have the same name as the earliest dated text file from December 2011 (a).

Another example would be something like getting the name (not the full path) of the parent directory:

pax> cd /home/pax/xyzzy/plugh
pax> parent=$(basename $(dirname $PWD))
pax> echo $parent
xyzzy

(a) Now that specific command may not actually work, I haven't tested the functionality. So, if you vote me down for it, you've lost sight of the intent :-) It's meant just as an illustration as to how you can nest, not as a bug-free production-ready snippet.

shell-script headers (#!/bin/sh vs #!/bin/csh)

This is known as a Shebang:

http://en.wikipedia.org/wiki/Shebang_(Unix)

#!interpreter [optional-arg]

A shebang is only relevant when a script has the execute permission (e.g. chmod u+x script.sh).

When a shell executes the script it will use the specified interpreter.

Example:

#!/bin/bash
# file: foo.sh
echo 1

$ chmod u+x foo.sh
$ ./foo.sh
1

Python vs Bash - In which kind of tasks each one outruns the other performance-wise?

Typical mainframe flow...

Input Disk/Tape/User (runtime) --> Job Control Language (JCL) --> Output Disk/Tape/Screen/Printer
| ^
v |
`--> COBOL Program --------'

Typical Linux flow...

Input Disk/SSD/User (runtime) --> sh/bash/ksh/zsh/... ----------> Output Disk/SSD/Screen/Printer
| ^
v |
`--> Python script --------'
| ^
v |
`--> awk script -----------'
| ^
v |
`--> sed script -----------'
| ^
v |
`--> C/C++ program --------'
| ^
v |
`--- Java program ---------'
| ^
v |
: :

Shells are the glue of Linux

Linux shells like sh/ksh/bash/... provide input/output/flow-control designation facilities much like the old mainframe Job Control Language... but on steroids! They are Turing complete languages in their own right while being optimized to efficiently pass data and control to and from other executing processes written in any language the O/S supports.

Most Linux applications, regardless what language the bulk of the program is written in, depend on shell scripts and Bash has become the most common. Clicking an icon on the desktop usually runs a short Bash script. That script, either directly or indirectly, knows where all the files needed are and sets variables and command line parameters, finally calling the program. That's a shell's simplest use.

Linux as we know it however would hardly be Linux without the thousands of shell scripts that startup the system, respond to events, control execution priorities and compile, configure and run programs. Many of these are quite large and complex.

Shells provide an infrastructure that lets us use pre-built components that are linked together at run time rather than compile time. Those components are free-standing programs in their own right that can be used alone or in other combinations without recompiling. The syntax for calling them is indistinguishable from that of a Bash builtin command, and there are in fact numerous builtin commands for which there is also a stand-alone executable on the system, often having additional options.

There is no language-wide difference between Python and Bash in performance. It entirely depends on how each is coded and which external tools are called.

Any of the well known tools like awk, sed, grep, bc, dc, tr, etc. will leave doing those operations in either language in the dust. Bash then is preferred for anything without a graphical user interface since it is easier and more efficient to call and pass data back from a tool like those with Bash than Python.

Performance

It depends on which programs the Bash shell script calls and their suitability for the subtask they are given whether the overall throughput and/or responsiveness will be better or worse than the equivalent Python. To complicate matters Python, like most languages, can also call other executables, though it is more cumbersome and thus not as often used.

User Interface

One area where Python is the clear winner is user interface. That makes it an excellent language for building local or client-server applications as it natively supports GTK graphics and is far more intuitive than Bash.

Bash only understands text. Other tools must be called for a GUI and data passed back from them. A Python script is one option. Faster but less flexible options are the binaries like YAD, Zenity, and GTKDialog.

While shells like Bash work well with GUIs like Yad, GtkDialog (embedded XML-like interface to GTK+ functions), dialog, and xmessage, Python is much more capable and so better for complex GUI windows.

Summary

Building with shell scripts is like assembling a computer with off-the-shelf components the way desktop PCs are.

Building with Python, C++ or most any other language is more like building a computer by soldering the chips (libraries) and other electronic parts together the way smartphones are.

The best results are usually obtained by using a combination of languages where each can do what they do best. One developer calls this "polyglot programming".

Why are scripting languages (e.g. Perl, Python, and Ruby) not suitable as shell languages?

There are a couple of differences that I can think of; just thoughtstreaming here, in no particular order:

  1. Python & Co. are designed to be good at scripting. Bash & Co. are designed to be only good at scripting, with absolutely no compromise. IOW: Python is designed to be good both at scripting and non-scripting, Bash cares only about scripting.

  2. Bash & Co. are untyped, Python & Co. are strongly typed, which means that the number 123, the string 123 and the file 123 are quite different. They are, however, not statically typed, which means they need to have different literals for those, in order to keep them apart.

    Example:

                    | Ruby             | Bash    
    -----------------------------------------
    number | 123 | 123
    string | '123' | 123
    regexp | /123/ | 123
    file | File.open('123') | 123
    file descriptor | IO.open('123') | 123
    URI | URI.parse('123') | 123
    command | `123` | 123
  3. Python & Co. are designed to scale up to 10000, 100000, maybe even 1000000 line programs, Bash & Co. are designed to scale down to 10 character programs.

  4. In Bash & Co., files, directories, file descriptors, processes are all first-class objects, in Python, only Python objects are first-class, if you want to manipulate files, directories etc., you have to wrap them in a Python object first.

  5. Shell programming is basically dataflow programming. Nobody realizes that, not even the people who write shells, but it turns out that shells are quite good at that, and general-purpose languages not so much. In the general-purpose programming world, dataflow seems to be mostly viewed as a concurrency model, not so much as a programming paradigm.

I have the feeling that trying to address these points by bolting features or DSLs onto a general-purpose programming language doesn't work. At least, I have yet to see a convincing implementation of it. There is RuSH (Ruby shell), which tries to implement a shell in Ruby, there is rush, which is an internal DSL for shell programming in Ruby, there is Hotwire, which is a Python shell, but IMO none of those come even close to competing with Bash, Zsh, fish and friends.

Actually, IMHO, the best current shell is Microsoft PowerShell, which is very surprising considering that for several decades now, Microsoft has continually had the worst shells evar. I mean, COMMAND.COM? Really? (Unfortunately, they still have a crappy terminal. It's still the "command prompt" that has been around since, what? Windows 3.0?)

PowerShell was basically created by ignoring everything Microsoft has ever done (COMMAND.COM, CMD.EXE, VBScript, JScript) and instead starting from the Unix shell, then removing all backwards-compatibility cruft (like backticks for command substitution) and massaging it a bit to make it more Windows-friendly (like using the now unused backtick as an escape character instead of the backslash which is the path component separator character in Windows). After that, is when the magic happens.

They address problem 1 and 3 from above, by basically making the opposite choice compared to Python. Python cares about large programs first, scripting second. Bash cares only about scripting. PowerShell cares about scripting first, large programs second. A defining moment for me was watching a video of an interview with Jeffrey Snover (PowerShell's lead designer), when the interviewer asked him how big of a program one could write with PowerShell and Snover answered without missing a beat: "80 characters." At that moment I realized that this is finally a guy at Microsoft who "gets" shell programming (probably related to the fact that PowerShell was neither developed by Microsoft's programming language group (i.e. lambda-calculus math nerds) nor the OS group (kernel nerds) but rather the server group (i.e. sysadmins who actually use shells)), and that I should probably take a serious look at PowerShell.

Number 2 is solved by having arguments be statically typed. So, you can write just 123 and PowerShell knows whether it is a string or a number or a file, because the cmdlet (which is what shell commands are called in PowerShell) declares the types of its arguments to the shell. This has pretty deep ramifications: unlike Unix, where each command is responsible for parsing its own arguments (the shell basically passes the arguments as an array of strings), argument parsing in PowerShell is done by the shell. The cmdlets specify all their options and flags and arguments, as well as their types and names and documentation(!) to the shell, which then can perform argument parsing, tab completion, IntelliSense, inline documentation popups etc. in one centralized place. (This is not revolutionary, and the PowerShell designers acknowledge shells like the DIGITAL Command Language (DCL) and the IBM OS/400 Command Language (CL) as prior art. For anyone who has ever used an AS/400, this should sound familiar. In OS/400, you can write a shell command and if you don't know the syntax of certain arguments, you can simply leave them out and hit F4, which will bring a menu (similar to an HTML form) with labelled fields, dropdown, help texts etc. This is only possible because the OS knows about all the possible arguments and their types.) In the Unix shell, this information is often duplicated three times: in the argument parsing code in the command itself, in the bash-completion script for tab-completion and in the manpage.

Number 4 is solved by the fact that PowerShell operates on strongly typed objects, which includes stuff like files, processes, folders and so on.

Number 5 is particularly interesting, because PowerShell is the only shell I know of, where the people who wrote it were actually aware of the fact that shells are essentially dataflow engines and deliberately implemented it as a dataflow engine.

Another nice thing about PowerShell are the naming conventions: all cmdlets are named Action-Object and moreover, there are also standardized names for specific actions and specific objects. (Again, this should sound familar to OS/400 users.) For example, everything which is related to receiving some information is called Get-Foo. And everything operating on (sub-)objects is called Bar-ChildItem. So, the equivalent to ls is Get-ChildItem (although PowerShell also provides builtin aliases ls and dir – in fact, whenever it makes sense, they provide both Unix and CMD.EXE aliases as well as abbreviations (gci in this case)).

But the killer feature IMO is the strongly typed object pipelines. While PowerShell is derived from the Unix shell, there is one very important distinction: in Unix, all communication (both via pipes and redirections as well as via command arguments) is done with untyped, unstructured strings. In PowerShell, it's all strongly typed, structured objects. This is so incredibly powerful that I seriously wonder why noone else has thought of it. (Well, they have, but they never became popular.) In my shell scripts, I estimate that up to one third of the commands is only there to act as an adapter between two other commands that don't agree on a common textual format. Many of those adapters go away in PowerShell, because the cmdlets exchange structured objects instead of unstructured text. And if you look inside the commands, then they pretty much consist of three stages: parse the textual input into an internal object representation, manipulate the objects, convert them back into text. Again, the first and third stage basically go away, because the data already comes in as objects.

However, the designers have taken great care to preserve the dynamicity and flexibility of shell scripting through what they call an Adaptive Type System.

Anyway, I don't want to turn this into a PowerShell commercial. There are plenty of things that are not so great about PowerShell, although most of those have to do either with Windows or with the specific implementation, and not so much with the concepts. (E.g. the fact that it is implemented in .NET means that the very first time you start up the shell can take up to several seconds if the .NET framework is not already in the filesystem cache due to some other application that needs it. Considering that you often use the shell for well under a second, that is completely unacceptable.)

The most important point I want to make is that if you want to look at existing work in scripting languages and shells, you shouldn't stop at Unix and the Ruby/Python/Perl/PHP family. For example, Tcl was already mentioned. Rexx would be another scripting language. Emacs Lisp would be yet another. And in the shell realm there are some of the already mentioned mainframe/midrange shells such as the OS/400 command line and DCL. Also, Plan9's rc.

How to run .sh on Windows Command Prompt?

The error message indicates that you have not installed bash, or it is not in your PATH.

The top Google hit is http://win-bash.sourceforge.net/ but you also need to understand that most Bash scripts expect a Unix-like environment; so just installing Bash is probably unlikely to allow you to run a script you found on the net, unless it was specifically designed for this particular usage scenario. The usual solution to that is https://www.cygwin.com/ but there are many possible alternatives, depending on what exactly it is that you want to accomplish.

If Windows is not central to your usage scenario, installing a free OS (perhaps virtualized) might be the simplest way forward.

The second error message is due to the fact that Windows nominally accepts forward slash as a directory separator, but in this context, it is being interpreted as a switch separator. In other words, Windows parses your command line as app /build /build.sh (or, to paraphrase with Unix option conventions, app --build --build.sh). You could try app\build\build.sh but it is unlikely to work, because of the circumstances outlined above.



Related Topics



Leave a reply



Submit