Smarter find duplicates in Apple Photos

I took the liberty of modifying Jonathan Birge’s code to retrieve data from the Apple Photos database, which AppleScript cannot seem to do on its own. The script also uses Phil Harvey’s ExifTool to read the attributes of every image. The file size comes from the Photos database, so that of two otherwise identical files the script flags the smaller one as the duplicate (some of mine had been downsampled). The creation date comes from the EXIF data and, together with the filename, serves as the identifier for matching photos. Nothing special, but perhaps a base for using AppleScript to do other things.

The script is below, and here is the file: Find Photo Duplicates. Rename it so that it has a .scpt extension.

property libPhotos : "/Users/kilka/Pictures/Photos Library.photoslibrary" --> change it to the path of your package  
property tempLibDB : missing value

-- check the path; this errors if it does not exist
libPhotos as POSIX file as alias
set libDB to quoted form of (libPhotos & "/database/Library.apdb")

-- copy the database to a temporary file, so the live library database is never queried directly
set tempLibDB to quoted form of "/Users/kilka/Pictures/tempLib.db" --> change it to a writable path on your Mac
do shell script "cp -f " & libDB & "  " & tempLibDB -- copy the database  

set searchsize to 5 -- compare each photo with the next 5 only, since true duplicates are usually near each other (0 = compare with all)

on min(x, y)
	if x < y then
		return x
	else
		return y
	end if
end min

tell application "Photos"
	set duplicatephotos to {}
	set filenames to {}
	set photoCollection to {}
	set dupcount to 0
	set selectedphotos to the selection
	set n to count of selectedphotos
	
	--gets the filename, and other attributes
	log "working with " & (count of selectedphotos) & " items..."
	repeat with k from 1 to count of selectedphotos
		if k mod 100 is 0 then
			log "pulling item " & k
		end if
		set thephoto to item k of selectedphotos
		set {this_id, this_filename, this_date} to {thephoto's id, thephoto's filename as text, thephoto's date}
		
		set r to do shell script "sqlite3  -separator $'\\n' " & tempLibDB & " 'select RKMaster.imagePath, RKMaster.fileIsReference,RKMaster.filesize,RKMaster.width,RKMaster.height from RKMaster, RKVersion  where RKVersion.uuid = \"" & this_id & "\" and RKMaster.modelid = RKVersion.modelid '"
		if r is not "" then
			set {this_filepath, this_filesize, this_width, this_height} to {paragraph 1 of r, paragraph 3 of r, paragraph 4 of r, paragraph 5 of r}
			try
				if paragraph 2 of r is "1" then
					-- 1 means a referenced file: imagePath is already the path of the original file
					set this_finalFilename to this_filepath as POSIX file as alias
				else
					-- otherwise the original is stored inside the library, in the Masters folder
					set this_finalFilename to (libPhotos & "/Masters/" & this_filepath) as POSIX file as alias
				end if
			end try
		end if
		
		--for some reason, iPhoto doesn't return the dimensions properly, so ExifTool is called instead.
		set exifpath to "/usr/local/bin/exiftool -CreateDate " & quoted form of (POSIX path of this_finalFilename) -- quoted form keeps spaces and quotes in the path safe
		set r to do shell script exifpath
		-- ExifTool prints the padded label "Create Date" followed by ": ", so the value starts at character 35
		set this_createdate to text 35 thru -1 of r
		set end of photoCollection to {this_id, this_filename, this_filesize, this_createdate}
	end repeat
	
	
	if searchsize is 0 then
		set searchsize to n
	end if
	
	repeat with k from 1 to (count of photoCollection) - 1
		
		set thisname to item k of photoCollection
		repeat with kcompare from k + 1 to my min(n, k + searchsize)
			set compname to item kcompare of photoCollection
			--if the files have the same name and creation date, they are probably identical
			if item 2 of thisname is equal to item 2 of compname and item 4 of thisname is equal to item 4 of compname then
				set size1 to (item 3 of thisname as number)
				set size2 to (item 3 of compname as number)
				if size1 > size2 then --flag the smaller file of the pair as the duplicate
					set end of duplicatephotos to item kcompare of selectedphotos
				else
					set end of duplicatephotos to item k of selectedphotos
				end if
				set dupcount to dupcount + 1
			end if
		end repeat
	end repeat
	
	if dupcount is greater than 0 then
		if not (exists folder "Duplicates") then
			make new folder named "Duplicates"
		end if
		set duplicatesalbum to make new album named "Duplicates" at folder "Duplicates"
		add duplicatephotos to duplicatesalbum
	end if
	set resultMessage to "Found " & dupcount & " duplicates out of " & (count of selectedphotos) & "."
end tell

do shell script "rm -f " & tempLibDB -- delete the temporary copy of the database
return resultMessage -- return last, so the cleanup above actually runs
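
The script reads the creation date by slicing ExifTool's output at a fixed character position. As a hedged alternative, here is a minimal shell sketch that takes everything after the first colon-space instead, assuming the usual "Label : value" line layout. The helper name exif_value is hypothetical, and the sample line is made up for illustration rather than taken from a real file.

```shell
#!/bin/sh
# Hypothetical helper: return the value part of an ExifTool-style "Label : value" line.
exif_value() {
    line=$1
    # Remove the shortest prefix ending in ": " (the padded label and the separator).
    printf '%s\n' "${line#*: }"
}

# Illustrative input only, not output captured from a real photo.
exif_value "Create Date                     : 2019:04:22 10:15:30"
```

ExifTool can also skip the label itself: with the -s3 option it prints only the value, so calling it as exiftool -s3 -CreateDate followed by the file path should make the slicing unnecessary.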
