{"id":74,"date":"2015-12-20T16:58:31","date_gmt":"2015-12-20T16:58:31","guid":{"rendered":"http:\/\/magebrew.com\/blog\/?p=74"},"modified":"2015-12-20T17:02:05","modified_gmt":"2015-12-20T17:02:05","slug":"delete-duplicated-images-in-magento","status":"publish","type":"post","link":"https:\/\/magebrew.com\/blog\/delete-duplicated-images-in-magento\/","title":{"rendered":"Delete duplicate product images in Magento"},"content":{"rendered":"<p>In this article I would like to share a script for deleting duplicate images in Magento.<\/p>\n<p><!--more--><\/p>\n<p>So you have used product import in Magento and now you have multiple identical images assigned to product. Images where imported using different name, so solution to this problem is more than\u00a0single database query. We will compare md5 hash of files to detect duplicates.\u00a0This is how I&#8217;m going to fix it:<\/p>\n<ul>\n<li>grab products that\u00a0have more then one image assigned<\/li>\n<li>remember which image is currently assigned as Base Image<\/li>\n<li>loop through images comparing md5 hash<\/li>\n<li>remove files from a disk and records from database<\/li>\n<\/ul>\n<p>And here is a script itself:<\/p>\n<pre class=\"lang:php decode:true \">$installer = new Mage_Catalog_Model_Resource_Setup('core_setup');\r\n        try {\r\n            \/\/get products that have more then one image assigned\r\n            $mediaSubSelect = $installer-&gt;getConnection()-&gt;select()\r\n                -&gt;from(array('gallery' =&gt; $installer-&gt;getTable('catalog_product_entity_media_gallery')), array('id' =&gt; 'entity_id'))\r\n                -&gt;group('entity_id')\r\n                -&gt;having('count(entity_id) &gt;1');\r\n\r\n            \/\/get id_to_all_assigned_images relation\r\n            $mediaSelect = $installer-&gt;getConnection()-&gt;select()\r\n                -&gt;from(array('gallery' =&gt; $installer-&gt;getTable('catalog_product_entity_media_gallery')), array('id' =&gt; 'entity_id', 'images' =&gt; 'GROUP_CONCAT(value)'))\r\n                -&gt;join(\r\n                    array('sub_media' =&gt; $mediaSubSelect),\r\n                    'sub_media.id = gallery.entity_id',\r\n                    array()\r\n                )\r\n                -&gt;group('id');\r\n\r\n            \/\/get id_to_base_image relation\r\n            $baseImageSelect = $installer-&gt;getConnection()-&gt;select()\r\n                -&gt;from(array('product' =&gt; $installer-&gt;getTable('catalog\/product')), array('id' =&gt; 'product.entity_id'))\r\n                -&gt;join(\r\n                    array('varchar_table' =&gt; $installer-&gt;getTable('catalog_product_entity_varchar')),\r\n                    'varchar_table.entity_id = product.entity_id',\r\n                    array('base_image' =&gt; 'value')\r\n                )\r\n                -&gt;join(\r\n                    array('eav_table' =&gt; $installer-&gt;getTable('eav\/attribute')),\r\n                    'varchar_table.attribute_id = eav_table.attribute_id',\r\n                    array()\r\n                )\r\n                -&gt;where('eav_table.attribute_code = \"image\" AND eav_table.entity_type_id = 4 AND varchar_table.value &lt;&gt; \"no_selection\" AND varchar_table.value &lt;&gt; \"\" AND varchar_table.store_id = 0');\r\n            $idToBase = $installer-&gt;getConnection()-&gt;fetchPairs($baseImageSelect);\r\n            $baseToId = array_flip($idToBase);\r\n            $imagesToDelete = array();\r\n            $csvFileHandle = fopen(Mage::getBaseDir('var') . DS . 'duplicated_images.csv', 'w');\r\n            $query = $installer-&gt;getConnection()-&gt;query($mediaSelect);\r\n            while ($row = $query-&gt;fetch()) {\r\n                $md5 = array();\r\n                foreach (explode(',', $row['images']) as $image) {\r\n                    $filepath = Mage::getBaseDir('media') . '\/catalog\/product' . $image;\r\n                    if (file_exists($filepath)) {\r\n                        $md5Hash = md5_file($filepath);\r\n                        if (!isset($md5[$md5Hash])) {\r\n                            $md5[$md5Hash] = $image;\r\n                        } else {\r\n                            if (!isset($baseToId[$image])) {\r\n                                $imagesToDelete[] = $image;\r\n                                fputcsv($csvFileHandle, array($row['id'], $image));\r\n                            } else {\r\n                                $imagesToDelete[] = $md5[$md5Hash];\r\n                                fputcsv($csvFileHandle, array($row['id'], $md5[$md5Hash]));\r\n                            }\r\n                        }\r\n                    }\r\n                }\r\n            }\r\n            fclose($csvFileHandle);\r\n            $where = array(\r\n                'value in(?)' =&gt; $imagesToDelete\r\n            );\r\n            echo 'executing of mass detele query ' . PHP_EOL;\r\n            $installer-&gt;getConnection()-&gt;delete($installer-&gt;getTable('catalog_product_entity_media_gallery'), $where);\r\n            foreach ($imagesToDelete as $image) {\r\n                Mage::log($image, null, 'image_duplicate.log', true);\r\n\r\n                if (unlink(Mage::getBaseDir('media') . '\/catalog\/product' . $image)) {\r\n                    echo 'Deleted ' . Mage::getBaseDir('media') . '\/catalog\/product' . $image . PHP_EOL;\r\n                } else {\r\n                    echo 'Can not delete ' . Mage::getBaseDir('media') . '\/catalog\/product' . $image . PHP_EOL;\r\n                }\r\n\r\n            }echo 'Finished' . PHP_EOL;\r\n        } catch (Exception $e) {\r\n            echo $e-&gt;getMessage() . \"\\n\";\r\n        }<\/pre>\n<p>You can clone this <a href=\"https:\/\/github.com\/magebrew\/magentoDuplicateImages\/\" target=\"_blank\">repository<\/a>, copy script to shell directory and run it from command line:<\/p>\n<pre class=\"lang:sh decode:true\">php shell\/removeDuplicatedImages.php<\/pre>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article I would like to share a script for deleting duplicate images in Magento.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[12,3,9,10],"_links":{"self":[{"href":"https:\/\/magebrew.com\/blog\/wp-json\/wp\/v2\/posts\/74"}],"collection":[{"href":"https:\/\/magebrew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/magebrew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/magebrew.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/magebrew.com\/blog\/wp-json\/wp\/v2\/comments?post=74"}],"version-history":[{"count":5,"href":"https:\/\/magebrew.com\/blog\/wp-json\/wp\/v2\/posts\/74\/revisions"}],"predecessor-version":[{"id":79,"href":"https:\/\/magebrew.com\/blog\/wp-json\/wp\/v2\/posts\/74\/revisions\/79"}],"wp:attachment":[{"href":"https:\/\/magebrew.com\/blog\/wp-json\/wp\/v2\/media?parent=74"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/magebrew.com\/blog\/wp-json\/wp\/v2\/categories?post=74"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/magebrew.com\/blog\/wp-json\/wp\/v2\/tags?post=74"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}