Tabs vs. Spaces "an eternal HOLY war!"

I stumbled across a great article on programming style and tabs vs. spaces while searching for some Emacs customization stuff. Jamie Zawinski took the time to explain the issues and highlight the differences between editors (vi and Emacs included). It's a really great article, and while it is a simple principle, the debate does get heated, and many development shops don't enforce a standard, which leads to serious problems (such as lost productivity and hard-to-maintain code).

Article reprinted below with zero permission and article linked. Check it out, and for the record FOUR SPACES!!!!

Tabs versus Spaces: An Eternal Holy War.

© 2000 Jamie Zawinski
<jwz@jwz.org>


The last time the tabs-versus-spaces argument flared up in my
presence, I wrote this. Gasoline for the fire? Maybe.

I think a big part of these interminable arguments about tabs
is based on people using the same words to mean different things.

In the following, I'm trying to avoid espousing my personal
religion; I just thought it would be good to try and explain the
various sects.

Anyway. People care (vehemently) about a few different things:

  1. When reading code, and when they're done writing new code, they
    care about how many screen columns by which the code tends to
    indent when a new scope (or sexpr, or whatever) opens.

  2. When there is some random file on disk that contains
    ASCII byte #9, the TAB character, they care about how
    their software reacts to that byte, display-wise.

  3. When writing code, they care about what happens when they press
    the TAB key on their keyboard.

    Note that I make a distinction between the TAB character (which is a
    byte which can occur in a disk file) and the TAB key (which is that
    plastic bump on your keyboard, which when hit causes your computer to do
    something.)

    As to point #1:


      A lot of people like that distance to be two columns, and a lot of
      people like that distance to be four columns, and a smaller number
      of people like to have somewhat more complicated and context-
      dependent rules than that.

    As to point #2: there is a lot of history here.


      On defaultly-configured Unix systems, and on ancient dumb terminals
      and teletypes, the tradition has been for the TAB character to mean
      ``move to the right until the current column is a multiple of 8.''
      (As it happens, this is how Navigator interprets TAB as well.)
      This is also the default in the two most popular Unix editors,
      emacs and vi.

    In many PC and Mac editors, the default interpretation is the same,
    except that multiples of 4 are used instead of multiples of 8.

    However, some people configure vi to make TAB be mod-2 instead of
    mod-4 (see below.)

    With these three interpretations, the ASCII TAB character is
    essentially being used as a compression mechanism, to make sequences
    of SPACE-characters take up less room in the file.
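
To make the mod-N idea concrete, here's a rough Python sketch (mine, not from the article) of what a display does when it hits a TAB byte. The names `next_tab_stop` and `expand_tabs` are made up for illustration; for a single line, Python's built-in `str.expandtabs` should give the same result.

```python
def next_tab_stop(column: int, tab_width: int = 8) -> int:
    """Return the first column past `column` that is a multiple of tab_width."""
    return (column // tab_width + 1) * tab_width


def expand_tabs(line: str, tab_width: int = 8) -> str:
    """Replace each TAB in a single line with spaces up to the next tab stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            target = next_tab_stop(column, tab_width)
            out.append(" " * (target - column))
            column = target
        else:
            out.append(ch)
            column += 1
    return "".join(out)


if __name__ == "__main__":
    # The same bytes on disk land in a different column under each reading.
    for width in (8, 4, 2):
        print(width, repr(expand_tabs("\tx", width)))
```

The point of the loop at the end: one TAB byte followed by `x` puts the `x` in a different screen column depending on which of the three interpretations your software uses.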

    Both Emacs and vi are customizable about the number of columns used.
    Unix terminals and shell-windows are usually customizable away from
    their default of 8, but sometimes not, and often it's difficult.

    A third interpretation is for the ASCII TAB character to mean
    ``indent to the next tab stop,'' where the tab stops are set
    arbitrarily: they might not necessarily be equally distanced from
    each other. Most word processors can do this; Emacs can do this.
    I don't think vi can do this, but I'm not sure.
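
For the arbitrary-tab-stop reading, a sketch along the same lines might look like this. It's my own illustration: the function name `expand_with_stops`, the 0-based column convention, and the fallback to every 8 columns past the last explicit stop are all assumptions, not how any particular editor does it.

```python
from bisect import bisect_right
from typing import Sequence


def expand_with_stops(line: str, stops: Sequence[int], fallback_width: int = 8) -> str:
    """Expand TABs using explicit tab stops (sorted, 0-based columns).

    Past the last explicit stop, fall back to a stop every `fallback_width`
    columns (an assumption made for this sketch).
    """
    out = []
    column = 0
    for ch in line:
        if ch != "\t":
            out.append(ch)
            column += 1
            continue
        # Find the first explicit stop strictly to the right of the cursor.
        i = bisect_right(stops, column)
        if i < len(stops):
            target = stops[i]
        else:
            target = (column // fallback_width + 1) * fallback_width
        out.append(" " * (target - column))
        column = target
    return "".join(out)
```

With stops at columns 4 and 10, `expand_with_stops("a\tb\tc", [4, 10])` puts `b` at column 4 and `c` at column 10, which no single mod-N setting would do.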

    On the Mac, BBedit defaults to 4-column tabs, but the tabstops
    can be set anywhere. It also has ``entab'' and ``detab'' commands,
    for converting from spaces to tabs and vice versa (just like
    Emacs's ``M-x tabify'' and ``M-x untabify''.)

    As to point #3: this is an editor user interface issue.

    1. Some editors (like vi) treat TAB as being exactly like X, Y,
      and Z: when you type it, it gets inserted into the file, end of
      story. (It then gets displayed on the screen according to point
      #2.)

      With editors like this, the interpretation of point #2 is what
      really matters: since TAB is just a self-inserting character, the
      way that one changes the semantics of hitting the TAB key on the
      keyboard is by changing the semantics of the display of the TAB
      character.

    2. Some editors (like Emacs) treat TAB as being a command which
      means ``indent this line.'' And by indent, it means, ``cause the
      first non-whitespace character on this line to occur at column
      N.''

      To editors like this, it doesn't matter much what kind of
      interpretation is assigned to point #2: the TAB character in a
      file could be interpreted as being mod-2 columns, mod-4 columns,
      or mod-8 columns. The only thing that matters is that the editor
      realize which interpretation of the TAB character is being used,
      so that it knows how to properly put the file characters on the
      screen. The decisions of how many characters by which an
      expression should be indented (point #1) and of how those columns
      should be encoded in the file using the TAB character (point #2)
      are completely orthogonal.
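
Here's a small Python sketch of that orthogonality (my own illustration, not anything from an editor's source). The function `encode_indent` and its parameters `indent_step`, `tab_width`, and `use_tabs` are hypothetical names: `indent_step` stands for the point #1 decision, while `tab_width` and `use_tabs` stand for the point #2 decision.

```python
def encode_indent(depth: int, indent_step: int = 2,
                  tab_width: int = 8, use_tabs: bool = True) -> str:
    """Return the leading whitespace for a line nested `depth` levels deep."""
    # Point #1: how many screen columns the code should be indented by.
    columns = depth * indent_step
    if not use_tabs:
        return " " * columns
    # Point #2: how those columns are encoded in the file using TAB bytes.
    tabs, spaces = divmod(columns, tab_width)
    return "\t" * tabs + " " * spaces


if __name__ == "__main__":
    # Same on-screen indentation (10 columns), three different encodings.
    for tab_width, use_tabs in ((8, True), (4, True), (8, False)):
        ws = encode_indent(5, indent_step=2, tab_width=tab_width, use_tabs=use_tabs)
        print(repr(ws), len(ws.expandtabs(tab_width)))
```

Whatever encoding is chosen, the line still shows up at the same column as long as the file is displayed with the same `tab_width` it was written with.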

      So, the real religious war here is point #1.

      Points #2 and #3 are technical issues about interoperability.

      My opinion is that the best way to solve the technical issues is to
      mandate that the ASCII #9 TAB character never appear in disk files:
      program your editor to expand TABs to an appropriate number of spaces
      before writing the lines to disk. That simplifies matters greatly,
      by separating the technical issues of #2 and #3 from the religious
      issue of #1.

      As a data point, my personal setup is the same as the default Emacs
      configuration: the TAB character is interpreted as mod-8 indentation;
      but my code is indented by mod-2.

      I prefer this setup, but I don't care deeply about it.

      I just care that two people editing the same file use the same
      interpretations, and that it's possible to look at a file and know what
      interpretation of the TAB character was used, because otherwise it's
      just impossible to read.
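
Here's why that matters, as a small Python sketch (my own, with a made-up helper name `indent_width`): two lines that line up under a mod-8 reading of TAB stop lining up the moment someone opens the file with a mod-4 or mod-2 setting.

```python
def indent_width(line: str, tab_width: int) -> int:
    """Screen column of the first non-whitespace character on the line."""
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            # A TAB jumps to the next multiple of tab_width.
            column = (column // tab_width + 1) * tab_width
        else:
            break
    return column


# One line indented with a TAB byte, one with eight spaces.
lines = ["\tfoo();", "        bar();"]

for width in (8, 4, 2):
    print(width, [indent_width(line, width) for line in lines])
```

Only the mod-8 reading shows both statements at the same depth; under the other two, the TAB-indented line appears less deeply nested than the space-indented one.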

      In Emacs, to set the mod-N indentation used when you hit the TAB key,
      do this:

        (setq c-basic-offset 2)
        or (setq c-basic-offset 4)

        To cause the TAB file-character to be interpreted as mod-N
        indentation, do this:

          (setq tab-width 4)
          or (setq tab-width 8)

          To cause TAB characters to not be used in the file for compression, and
          for only spaces to be used, do this:


            (setq indent-tabs-mode nil)

          You can also do this stuff on a per-file basis. The very first line of
          a file can contain a comment which contains variable settings. For the
          XP code in the client, you'll see many files that begin with


            /* -*- Mode: C; tab-width: 4 -*- */

          The stuff between -*-, on the very first line of the file, is
          interpreted as a list of file-local variable/value pairs. A hairier
          example:


            /* -*- mode: java; c-basic-offset: 2; indent-tabs-mode: nil -*- */

          If you have different groups of people with different customs, the
          presence of these kinds of explicit settings is really handy.

          I believe vi has a mechanism for doing this sort of thing too, but I
          don't know how it works.

          To keep myself honest (that is, to ensure that no tabs ever end up
          in source files that I am editing) I also do this in my .emacs file:

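            (defun java-mode-untabify ()
              "Strip trailing whitespace and expand TABs to spaces in this buffer."
              (save-excursion
                ;; Delete trailing spaces and tabs on every line.
                (goto-char (point-min))
                (while (re-search-forward "[ \t]+$" nil t)
                  (delete-region (match-beginning 0) (match-end 0)))
                ;; Expand everything from the first remaining TAB to the end.
                (goto-char (point-min))
                (if (search-forward "\t" nil t)
                    (untabify (1- (point)) (point-max))))
              ;; Return nil so that Emacs goes on to write the file itself.
              nil)

            (add-hook 'java-mode-hook
                      (lambda ()
                        ;; `write-contents-functions' replaces the obsolete
                        ;; `write-contents-hooks'; the final t makes the
                        ;; hook local to this buffer.
                        (add-hook 'write-contents-functions
                                  #'java-mode-untabify nil t)))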
          

          That ensures that, even if I happened to insert a literal tab in
          the file by hand (or if someone else did when editing this file
          earlier), those tabs get expanded to spaces when I save. This
          assumes that you never use tabs in places where they are actually
          significant, like in string or character constants, but I never do
          that: when it matters that it is a tab, I always use '\t'
          instead.
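
For anyone who doesn't read Emacs Lisp, here's roughly the same cleanup as a Python sketch. This is my own approximation, not a translation the author provided; the function name `clean_source` is made up, and it assumes an 8-column tab width. It also shows the caveat about string constants: a literal tab inside a string gets expanded, while the two-character escape `\t` is left alone.

```python
def clean_source(text: str, tab_width: int = 8) -> str:
    """Strip trailing whitespace and expand TABs to spaces, line by line."""
    cleaned = []
    for line in text.split("\n"):
        line = line.rstrip(" \t")                   # drop trailing spaces/tabs
        cleaned.append(line.expandtabs(tab_width))  # expand remaining TABs
    return "\n".join(cleaned)


if __name__ == "__main__":
    # A literal TAB byte inside a string constant is rewritten...
    print(repr(clean_source('s = "a\tb";')))
    # ...but the backslash-t escape sequence survives untouched.
    print(repr(clean_source('s = "a\\tb";')))
```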

          Here are some details on vi, courtesy of Woody Thrower:

          Standard vi interprets the tab key literally, but there are popular
          vi-derived alternatives that are smarter, like vim. To get vim to
          interpret tab as an ``indent'' command instead of an insert-a-tab
          command, do this:


            set softtabstop=2

          To set the mod-N indentation used when you hit the tab key
          in vim (what Emacs calls c-basic-offset), do this:


            set shiftwidth=2

          To cause the TAB file-character to be displayed as
          mod-N in vi and vim (what Emacs calls tab-width),
          do this:


            set tabstop=4

          To cause TAB characters to not be used in the file for compression,
          and for only spaces to be used (what Emacs calls
          indent-tabs-mode), do this:


            set expandtab

          In vi (and vim), you can do this stuff on a per-file basis using
          ``modelines,'' magic comments at the top of the file, similarly to
          how it works in Emacs:


            /* ex: set tabstop=8 expandtab: */

          So go forth and untabify!

          Comments

          Re: Tabs vs. Spaces

          Hi atrox,

          try rereading that article... for the record, the original author recommends (and uses) 2 spaces, not 4.

          > As a data point, my personal setup is the
          > same as the default Emacs configuration:
          > the TAB character is interpreted as mod-8
          > indentation; but my code is indented by mod-2.

          regards
          Stuart

          Re: Tabs vs. Spaces

          thanks Stuart, I can read. I disagree with the author. That was a tongue-in-cheek comment at the end of the post that reflects my PERSONAL preference. 4 spaces, jackass.

          RE: Tabs vs. Spaces "an eternal HOLY war!"

          Working in powers of 2, four spaces works out to be the generally optimal columnar indentation recognized by the human brain (this of course is somewhat dependent on the monitor resolution/size). Less than four, and the brain has some difficulty noting the horizontal shift. More than four tends to cause more left-right scanning of the eye than is needed for comprehension, thus decreasing reading rates.

          RE: Tabs vs. Spaces "an eternal HOLY war!"

          BTW, I just made all that up.

          RE: Tabs vs. Spaces "an eternal HOLY war!"

          For what it's worth, I also like four spaces better than two. In my experience, that's the most common preference, which is another point in its favor.
