Logik Labs

Want to see what’s cooking over at Logik Systems? Check out Logik Labs to see the new eDiscovery features and software we are developing for our clients.

While you are at it, send us any eDiscovery-related idea you have. Most of our ideas come from our clients’ input, so please help us help you, and together we can make the world of eDiscovery a better place!

Near-Duplicate Detection

Regular de-duplication techniques are very unforgiving. If even one character differs between two documents whose content is otherwise identical, regular de-duplication will treat those documents as unique and not associate the two in any way.
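To illustrate why exact matching is so strict, here is a minimal sketch that assumes regular de-duplication is done by hashing each document's full text. That is a common approach, but this post does not say which method any particular tool uses, and the `fingerprint` helper and sample documents are made up for illustration.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the document text (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two sample documents that differ only in the final character.
doc_a = "Please review the attached contract by Friday."
doc_b = "Please review the attached contract by Friday!"

# Exact de-duplication only links documents whose fingerprints match,
# so a single differing character makes the two look unrelated.
is_exact_duplicate = fingerprint(doc_a) == fingerprint(doc_b)
print(is_exact_duplicate)
```

Because the digests share nothing in common once a single character changes, a hash comparison gives no hint that the two documents are almost identical.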

Near-duplication techniques solve this problem by comparing every word in a document against the words in other documents, thereby detecting word similarities. For example, let’s say a sales employee at XYZ Company likes to send template-based emails to all of his clients on a daily basis. Each email is almost identical, except that the recipient is different and some of the content has changed slightly. Regular de-duplication would never categorize these documents as duplicates, because technically they are not, but they are considered near-dupes since the content is almost exactly the same.
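The template-email example can be sketched with a simple word-overlap score. This is only an illustration, not the Logik Systems implementation: it assumes Jaccard similarity over sets of lowercased words, and the function names and sample emails are hypothetical.

```python
import re


def words(text: str) -> set:
    """Split text into a set of lowercased words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def word_similarity(a: str, b: str) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    words_a, words_b = words(a), words(b)
    if not words_a and not words_b:
        return 1.0  # two empty documents are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


# Two template-based emails that differ only in the recipient's name.
email_1 = "Hi Alice, our spring pricing is attached. Call me with any questions."
email_2 = "Hi Bob, our spring pricing is attached. Call me with any questions."

score = word_similarity(email_1, email_2)
print(round(score, 3))
```

The two emails share 11 of their 13 distinct words, so the score is 11/13, or about 0.85. A reviewer-facing tool would then pick a cutoff above which documents count as near-dupes; the right cutoff is a tuning decision this post does not specify.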

Grouping near-duplicate documents can drastically speed up and optimize document reviews, because it enables a reviewer to easily compare like documents. Much like our other proprietary technologies, we built our near-duplication software from the ground up. We can compare billions of documents per day and provide ongoing near-duplicate detection for the life of a project. This means that documents processed in March will be analyzed and compared against everything processed in January and vice versa, thus keeping the near-duplicate grouping consistent. Unlike competing near-duplicate technologies, ours does not identify false positives, because we do not use synonym look-ups for every word compared. This results in clean and accurate near-duplication in a fraction of the time.
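The idea of ongoing detection, where a document processed in March joins a group started in January, can be sketched as an index that keeps every earlier document and checks each new arrival against it. This is a naive illustration under stated assumptions: the `NearDuplicateIndex` class, the 0.8 threshold, and the document IDs are all hypothetical, the similarity measure is plain word-set overlap, and a linear scan like this would not scale to the volumes described above.

```python
import re


class NearDuplicateIndex:
    """Keeps all processed documents so later arrivals can join earlier groups."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed cutoff, not from the source
        self.word_sets = {}         # doc_id -> set of words
        self.group_of = {}          # doc_id -> group id

    @staticmethod
    def _words(text: str) -> set:
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    @staticmethod
    def _similarity(a: set, b: set) -> float:
        if not a and not b:
            return 1.0
        return len(a & b) / len(a | b)

    def add(self, doc_id: str, text: str) -> str:
        """Add a document and return the id of the group it was placed in."""
        new_words = self._words(text)
        group = None
        # Compare against everything processed so far, however long ago.
        for other_id, other_words in self.word_sets.items():
            if self._similarity(new_words, other_words) >= self.threshold:
                group = self.group_of[other_id]
                break
        if group is None:
            group = doc_id  # no match: start a new group named after this document
        self.word_sets[doc_id] = new_words
        self.group_of[doc_id] = group
        return group


index = NearDuplicateIndex()
index.add("jan-001", "Hi Alice, our spring pricing is attached. Call me with any questions.")
index.add("jan-002", "Minutes from the quarterly board meeting are ready for review.")
march_group = index.add("mar-001", "Hi Bob, our spring pricing is attached. Call me with any questions.")
print(march_group)
```

Here the March email lands in the group started by `jan-001`, while the unrelated board-meeting note stays in a group of its own. Note that no synonym expansion is applied: only literal word matches count toward the score.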

For more information about our near-duplicate technology, please send an email to questions@lksi.com.

Posted by Andrew Wilson on December 5, 2007
