Iterating through directories in bash

bash and the other shells make scripting the file system a joy. Applying command line tools to groups of files is fast, simple and immensely powerful. It's unfortunate, then, that the most straightforward approach to iterating over subdirectories and files fails miserably on filenames with spaces: the shell splits the output of ls on whitespace, so each word of a name becomes a separate loop item.


#!/bin/bash
 
src="/my/path"
    
for dir in `ls "$src/"`
do
  if [ -d "$src/$dir" ]; then
    #uh oh - $dir will never match paths with spaces
    : #no-op so the if body is not empty (an empty body is a syntax error)
  fi
done

It seems that many people, when faced with this barrier, ultimately turn to (slightly) more complicated solutions like Perl, Python or find. It occurs to me that this is a pity, since this simple addition puts you straight back on the path:


#!/bin/bash
 
src="/my/path"
 
#enable for loops over items with spaces in their name
IFS=$'\n'
 
for dir in `ls "$src/"`
do
  if [ -d "$src/$dir" ]; then
    #yay, we get matches!
    : #no-op so the if body is not empty (an empty body is a syntax error)
  fi
done

That (IFS=$'\n') is all there is to it! Revel in the simplicity and power of the shell once again!
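
To see the difference for yourself, here is a small sketch that runs the same loop before and after the IFS change and counts the directories it finds. The temporary folder and the two subfolder names are made up for the demonstration; only the loop itself comes from the scripts above.

```bash
#!/bin/bash

# Throwaway test tree (assumed names, created only for this demo).
src=$(mktemp -d)
mkdir "$src/plain" "$src/with space"

# First pass: default IFS, so "with space" is split into "with" and "space".
before=0
for dir in `ls "$src/"`
do
  if [ -d "$src/$dir" ]; then
    before=$((before + 1))
  fi
done

#enable for loops over items with spaces in their name
IFS=$'\n'

# Second pass: split on newlines only, so each line of ls output is one item.
after=0
for dir in `ls "$src/"`
do
  if [ -d "$src/$dir" ]; then
    after=$((after + 1))
  fi
done

echo "matches before: $before, after: $after"
rm -rf "$src"
```

With the default IFS only the folder without a space is matched; after the change both are.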

By way of illustration, the script I ended up coding looks like this. It copies the last modified date of each original file onto the matching file in a copy of the set. The files are grouped into subfolders, one level deep.


#!/bin/bash
 
src='/path/to/original/documents'
dst='/path/to/copy'
 
#enable for loops over items with spaces in their name
IFS=$'\n'
 
for dir in `ls "$src/"`
do
  if [ -d "$dst/$dir" ]; then
    for f in `ls "$src/$dir"`
    do
      if [ -f "$dst/$dir/$f" ]; then
        touch -m -r "$src/$dir/$f" "$dst/$dir/$f"
      fi
    done
  fi
done
 
echo
echo "Done."
echo
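
The line doing the real work is touch -m -r, which sets the modification time of the second file to that of the first. Here is a minimal sketch of just that step on a throwaway pair of files, so it can be tried without touching real documents. The folder and file names are invented for the example, and the old timestamp is arbitrary.

```bash
#!/bin/bash

# Throwaway source and destination trees (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/src/my dir" "$work/dst/my dir"
orig="$work/src/my dir/a file.txt"
copy="$work/dst/my dir/a file.txt"
echo "original" > "$orig"
echo "copy" > "$copy"

# Give the original an old modification time so the two files clearly differ.
touch -m -t 200001010000 "$orig"

# Same call as in the script: take the timestamp from $orig, apply it to $copy.
touch -m -r "$orig" "$copy"

# -nt and -ot compare modification times; neither holds when they are equal.
if [ ! "$orig" -nt "$copy" ] && [ ! "$orig" -ot "$copy" ]; then
  mtime_state="equal"
else
  mtime_state="different"
fi

echo "modification times are $mtime_state"
rm -rf "$work"
```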

Now take a look at what I came up with on Windows. The scripting environment (JScript? VBScript? WSH?) available in Windows XP is just appalling.

After several frustrating hours wading through the philosophical but vacuous documentation on Microsoft's site I finally came up with this.


option explicit
 
if WScript.Arguments.Count <> 2 then
  WScript.echo "Supply two arguments, the source folder then the destination folder"
  WScript.Quit
end if
 
'folders
dim fsSrc, fsDst, rootSrc, rootDst, dirsSrc, dirsDst, dirSrc, dirDst
'files
dim filesSrc, filesDst, fileSrc, fileDst
 
'create file system objects
set fsSrc = CreateObject("Scripting.FileSystemObject")
set fsDst = CreateObject("Scripting.FileSystemObject")
 
set rootSrc = fsSrc.GetFolder(WScript.Arguments(0))
set rootDst = fsDst.GetFolder(WScript.Arguments(1))
 
set dirsSrc = rootSrc.SubFolders
set dirsDst = rootDst.SubFolders
 
for each dirSrc in dirsSrc
  set dirDst = dirsDst.Item(dirSrc.name)
 	
  set filesSrc = dirSrc.Files
  set filesDst = dirDst.Files
  for each fileSrc in filesSrc
    set fileDst = filesDst.Item(fileSrc.name)
 		
    fileDst.DateLastModified = fileSrc.DateLastModified
  next
 	
next
 
WScript.echo
WScript.echo "Done."
WScript.echo

And guess what! It still doesn't work! Turns out DateLastModified is a read-only property. It seems the only way forward is to wrap a port of the Unix touch program in a VBScript and attempt to call that. Who knows what dramas that will cause with date formats and whatnot.

Sigh.

Comments

Thanks a lot for this post. It helps me a lot.

Three suggestions. First, since IFS is a global variable used in many contexts, it's best (IMO) to bind IFS to the loop where you want special behavior:

for IFS=$'\n' dir in `ls "$src/"`
do
#...
done

Used this way, IFS doesn't change after the loop. Second, if your filename has a newline in it, this will break -- just as it had broken earlier with spaces. Consider using a null IFS:

for IFS= dir in `ls "$src/"`
do
#...
done

Finally, using find would eliminate the two-deep directory structure required by the hard-coded for-loops (at the cost of process overhead, of course) and give you more concise control over which files are updated:
find src -type f -exec sh -c 'exec touch -mr "$@" "dst${@##src}"' X '{}' \;

Example:
$ tree
.
|-- dst
|   `-- dir
|       |-- bar
|       `-- foo
`-- src
    `-- dir
        |-- bar
        `-- foo

4 directories, 4 files
$ find src -type f -exec sh -c 'exec echo touch -mr "$@" "dst${@##src}"' X '{}' \;
touch -mr src/dir/bar dst/dir/bar
touch -mr src/dir/foo dst/dir/foo

Whoops, I'm so used to doing:
while IFS=: read ...

that I forgot for is a keyword, not a command, so it doesn't accept a variable assignment in front of it. So, to revise my previous first point, it's best to reset IFS after you're done with it so that other parts of your script work as expected:
oIFS=$IFS
IFS=$'\n'
for dir in `ls "$src/"`
do
#...
done
IFS=$oIFS

But I find that distasteful. So, an equivalent representation could be something like this:
find src -type d -print0 | while IFS= read -rd $'\0' dir
do
# ...
done

Which is bulletproof on all kinds of weird files... except those with null in their names.
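
For anyone who wants to try the loop above, here is a runnable sketch under a few assumptions: the test folders are invented, -mindepth 1 is added so the top folder itself is skipped, and the loop reads from process substitution rather than a pipe so that the dirs array is still set after the loop ends (a pipe would run the loop in a subshell). read -d '' is used in place of -d $'\0'; bash treats the two the same way.

```bash
#!/bin/bash

# Throwaway tree with awkward names (assumed, for demonstration only).
src=$(mktemp -d)
mkdir "$src/plain" "$src/with space" "$src/with"$'\n'"newline"

dirs=()
# IFS= applies to this read only, so the rest of the script is unaffected.
# -r keeps backslashes literal; -d '' makes read stop at each null byte.
while IFS= read -r -d '' dir
do
  dirs+=("$dir")
done < <(find "$src" -mindepth 1 -type d -print0)

echo "found ${#dirs[@]} directories"
rm -rf "$src"
```

All three folders come through as single items, including the one with a newline in its name.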
