my solution:
first, make a list of the hashes of the files in each directory
find . -type f -print0 | xargs -0 shasum -a 256 > ~/tmp/list1.txt

run this in each directory, saving the output as list1.txt and list2.txt. edit the file lists if necessary.
take a set intersection of the hashes
cut -f 1 -d ' ' list1.txt | sort > hash1.txt
cut -f 1 -d ' ' list2.txt | sort > hash2.txt
comm -12 hash1.txt hash2.txt > clean.txt

the result is a list of just the hashes common to both lists (comm requires its inputs to be sorted, hence the sort).
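for illustration, the intersection step can be tried on two tiny, pre-sorted stand-in hash lists (the values here are made up):

```shell
# toy stand-ins for hash1.txt and hash2.txt: already sorted, one "hash" per line
tmp=$(mktemp -d)
printf 'aaa\nbbb\nccc\n' > "$tmp/hash1.txt"
printf 'bbb\nccc\nddd\n' > "$tmp/hash2.txt"
# -1 suppresses lines unique to file 1, -2 lines unique to file 2,
# leaving only the lines present in both
comm -12 "$tmp/hash1.txt" "$tmp/hash2.txt"
```

this prints bbb and ccc, the two entries present in both lists.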
list all the files with the selected hashes, using a perl script
#!/usr/bin/perl -w
use strict;

if (@ARGV != 2) {
    die "use: select hashes list\n";
}

my %hashes;
local *IN;
open(IN, "<", $ARGV[0]) or die "open: $!";
while (<IN>) {
    chomp;
    $hashes{$_} = 1;
}
close(IN);

open(IN, "<", $ARGV[1]) or die "open: $!";
while (<IN>) {
    my ($hash) = m,^(\S+), or next;
    $hashes{$hash} or next;
    print $_;
}
close(IN);
invoked like
perl select.pl clean.txt list1.txt | cut -c 67- > remove.txt

which results in a list of the file names from the first list for files that also exist in the second list.
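the cut -c 67- offset comes from shasum's output format: a 64-character SHA-256 hex digest, two separator characters, then the file name, so the name starts at column 67. a quick check on a made-up output line:

```shell
# a sample shasum -a 256 output line: 64 hex chars, two spaces, file name
line='f2ca1bb6c7e907d06dafe4687e579fce76b37e4e93b7605022da52e6ccc26fd2  ./some file.txt'
# columns 1-66 are the hash and separator; 67 onward is the name
printf '%s\n' "$line" | cut -c 67-
```

this prints ./some file.txt, the bare file name with the hash stripped off.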
finally, remove the files
while IFS= read -r x ; do rm -- "$x"; done < ~/tmp/remove.txt

(IFS= and -r keep leading whitespace and backslashes in file names intact; this still won't handle names containing newlines.)
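before deleting anything for real, it can be worth a dry run that only prints the candidate names. a sketch, using a made-up remove.txt:

```shell
# hypothetical remove.txt with one file name per line
tmp=$(mktemp -d)
printf 'a b.txt\nold notes.txt\n' > "$tmp/remove.txt"
# IFS= and read -r preserve leading spaces and backslashes in names
while IFS= read -r x; do
    printf 'would remove: %s\n' "$x"
done < "$tmp/remove.txt"
```

once the printed list looks right, swap the printf back for rm -- "$x".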