Iterating through directories in bash
bash and the other shells make scripting the file system a joy. The speed and simplicity of applying command line tools to groups of files is immensely powerful. It's unfortunate, then, that the most straightforward approach to iterating over subdirectories and files fails miserably when it comes to filenames with spaces: the shell splits the output of ls on whitespace, so a name like "My Documents" arrives in the loop as two separate words.
#!/bin/bash
src="/my/path"
for dir in `ls "$src/"`
do
    if [ -d "$src/$dir" ]; then
        #uh oh - names with spaces are split into several words, so they never match here
        echo "$dir"
    fi
done
It seems that many people, when faced with this barrier, ultimately turn to (slightly) more complicated solutions like Perl, Python or find. It occurs to me that this is a pity, since this simple addition puts you straight back on the path:
#!/bin/bash
src="/my/path"
#enable for loops over items with spaces in their name
IFS=$'\n'
for dir in `ls "$src/"`
do
    if [ -d "$src/$dir" ]; then
        #yay, we get matches!
        echo "$dir"
    fi
done
That (IFS=$'\n') is all there is to it! Revel in the simplicity and power of the shell once again!
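To see the difference concretely, here is a minimal sketch that runs the same loop twice, once with the default IFS and once with a newline-only IFS. It assumes a throwaway directory created with mktemp; the count_dirs helper and the directory names are hypothetical and not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: counts how many entries of $1 the loop recognises
# as directories when ls output is split using the IFS value given in $2.
count_dirs() {
    local src=$1
    local IFS=$2    # local, so the caller's IFS is left untouched
    local count=0 dir
    for dir in $(ls "$src/"); do
        if [ -d "$src/$dir" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Throwaway test tree (illustrative names only).
demo=$(mktemp -d)
mkdir "$demo/plain" "$demo/with space"

# Default IFS (space, tab, newline): "with space" is split into two words.
default_count=$(count_dirs "$demo" $' \t\n')
# Newline-only IFS: each line of ls output stays in one piece.
newline_count=$(count_dirs "$demo" $'\n')

echo "default IFS: $default_count, newline IFS: $newline_count"
rm -rf "$demo"
```

With the default IFS only the directory without a space should be counted, while the newline-only IFS should find both. Declaring IFS as local inside a function is one way to keep the change from affecting the rest of a script.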
As a matter of illustration, the script I ended up coding looks like this. It resets the last modified dates on a copy of a set of files, which are grouped into subfolders.
#!/bin/bash
src='/path/to/original/documents'
dst='/path/to/copy'
#enable for loops over items with spaces in their name
IFS=$'\n'
for dir in `ls "$src/"`
do
    if [ -d "$dst/$dir" ]; then
        for f in `ls "$src/$dir"`
        do
            if [ -f "$dst/$dir/$f" ]; then
                touch -m -r "$src/$dir/$f" "$dst/$dir/$f"
            fi
        done
    fi
done
echo
echo "Done."
echo
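The key step in that script is touch -m -r, which sets the modification time of the second file to that of the first. A minimal sketch of just that step, assuming two hypothetical files in a temporary directory (the file names and the 2001 timestamp are made up for illustration):

```shell
#!/bin/bash
# Hypothetical demo files; nothing here touches real documents.
work=$(mktemp -d)
original="$work/original file.txt"
copy="$work/copied file.txt"

echo "content" > "$original"
# Give the original an old modification time (1 January 2001, 00:00).
touch -m -t 200101010000 "$original"

# A plain cp gives the copy the current time as its modification time.
cp "$original" "$copy"
if [ "$copy" -nt "$original" ]; then before="newer"; else before="same"; fi

# Reset the copy's modification time from the original (the reference file).
touch -m -r "$original" "$copy"
if [ "$copy" -nt "$original" ] || [ "$copy" -ot "$original" ]; then
    after="different"
else
    after="same"
fi

echo "before: copy is $before, after: copy is $after"
rm -rf "$work"
```

The -nt and -ot tests compare modification times, so they give a quick check that the copy's timestamp matches the original once touch has run.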
Now take a look at what I came up with on Windows. The scripting environment (JScript? VBScript? WSH?) available in Windows XP is just appalling.
After several frustrating hours wading through the philosophical but vacuous documentation on Microsoft's site I finally came up with this.
option explicit
if WScript.Arguments.Count <> 2 then
    WScript.echo "Supply two arguments, the source folder then the destination folder"
    WScript.Quit
end if
'folders
dim fsSrc, fsDst, rootSrc, rootDst, dirsSrc, dirsDst, dirSrc, dirDst
'files
dim filesSrc, filesDst, fileSrc, fileDst
'create file system objects
set fsSrc = CreateObject("Scripting.FileSystemObject")
set fsDst = CreateObject("Scripting.FileSystemObject")
set rootSrc = fsSrc.GetFolder(WScript.Arguments(0))
set rootDst = fsDst.GetFolder(WScript.Arguments(1))
set dirsSrc = rootSrc.SubFolders
set dirsDst = rootDst.SubFolders
for each dirSrc in dirsSrc
    set dirDst = dirsDst.Item(dirSrc.name)
    set filesSrc = dirSrc.Files
    set filesDst = dirDst.Files
    for each fileSrc in filesSrc
        set fileDst = filesDst.Item(fileSrc.name)
        fileDst.DateLastModified = fileSrc.DateLastModified
    next
next
WScript.echo
WScript.echo "Done."
WScript.echo
And guess what! It still doesn't work! Turns out DateLastModified is a read-only property. It seems the only way forward is to wrap a port of the Unix touch program in a VBScript and attempt to call that. Who knows what dramas that will cause with date formats and whatnot.
Sigh.
Comments
Thanks a lot for this post. It helps me a lot
Posted by: Pavlov Dmitry | February 24, 2010 1:55 AM
Three suggestions. First, since IFS is a global variable used in many contexts, it's best (IMO) to bind IFS to the loop where you want special behavior:
for IFS=$'\n' dir in `ls "$src/"`
do
#...
done
Used this way, IFS doesn't change after the loop. Second, if your filename has a newline in it, this will break -- just as it had broken earlier with spaces. Consider using a null IFS:
for IFS= dir in `ls "$src/"`
do
#...
done
Finally, using find would eliminate the two-deep directory structure required by the hard-coded for-loops (at the cost of process overhead, of course) and give you more concise flexibility over which files are updated:
find src -type f -exec sh -c 'exec touch -mr "$@" "dst${@##src}"' X '{}' \;
Example:
$ tree
.
|-- dst
|   `-- dir
|       |-- bar
|       `-- foo
`-- src
    `-- dir
        |-- bar
        `-- foo
4 directories, 4 files
$ find src -type f -exec sh -c 'exec echo touch -mr "$@" "dst${@##src}"' X '{}' \;
touch -mr src/dir/bar dst/dir/bar
touch -mr src/dir/foo dst/dir/foo
Posted by: bishop | April 22, 2011 3:40 AM
Whoops, I'm so used to doing:
while IFS=: read ...
that I forgot for doesn't take commands. So, to revise my previous first point, it's best to reset IFS after you're done with it so that other parts of your script work as expected:
oIFS=$IFS
IFS=$'\n'
for dir in `ls "$src/"`
do
#...
done
IFS=$oIFS
But I find that distasteful. So, an equivalent representation could be something like this:
find src -type d -print0 | while IFS= read -rd $'\0' dir
do
# ...
done
Which is bulletproof on all kinds of weird files... except those with null in their names.
Posted by: bishop | April 22, 2011 4:12 AM
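For anyone who wants to try the null-delimited loop from the last comment, here is a minimal self-contained sketch. It assumes bash and a find that supports -print0 and -mindepth (GNU and BSD find do); the temporary directory and its names are hypothetical. It feeds the loop through process substitution instead of a pipe, because a piped while loop runs in a subshell and variables set inside it are lost when the loop ends.

```shell
#!/bin/bash
# Hypothetical demo tree; the directory names are illustrative only.
demo=$(mktemp -d)
mkdir -p "$demo/with space" "$demo/with"$'\n'"newline"

found=0
# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal,
# -d '' makes read stop at each NUL byte emitted by -print0.
# -mindepth 1 skips the starting directory itself.
while IFS= read -r -d '' dir; do
    found=$((found + 1))
    printf 'found: %q\n' "$dir"
done < <(find "$demo" -mindepth 1 -type d -print0)

echo "directories found: $found"
rm -rf "$demo"
```

Both directories should be reported, including the one with a newline in its name, and IFS is only changed for the read command itself, so nothing needs to be restored afterwards.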