Hello, can someone help me with the solution to this task?
Find all duplicate files
In a given folder, recursively find all files that are duplicates even if their names do not match, and print the duplicate files as groups in the output.
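The recursive collection step could be sketched like this (a minimal sketch assuming Java NIO; the class and method names here are my own, not part of the task):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WalkSketch {
    // Recursively collect all regular files under the given root,
    // skipping directories themselves.
    static List<Path> allFiles(Path root) throws IOException {
        try (Stream<Path> stream = Files.walk(root)) {
            return stream.filter(Files::isRegularFile)
                         .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny demo on a temporary directory with one nested file.
        Path dir = Files.createTempDirectory("walk-demo");
        Files.createFile(dir.resolve("a.txt"));
        Path sub = Files.createDirectories(dir.resolve("sub"));
        Files.createFile(sub.resolve("b.txt"));
        System.out.println(allFiles(dir).size()); // prints 2
    }
}
```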
Think about how to check whether two files are duplicates.
Keep in mind that some files are links, and some links are broken (they do not point anywhere).
Ignore OS metadata files: think about how to detect them, check the API, and be ready to explain what would happen if you did not ignore them.
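A possible filter for the two hints above might look like this. Note the file-name list is only an illustrative assumption (common OS metadata files such as `.DS_Store` on macOS and `Thumbs.db`/`desktop.ini` on Windows), and `Files.isRegularFile` follows symlinks by default, so a broken link resolves to "not a regular file" and is skipped:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

public class FileFilter {
    // Illustrative, non-exhaustive list of OS metadata file names.
    static final Set<String> OS_FILES =
            Set.of(".DS_Store", "Thumbs.db", "desktop.ini");

    // A path is worth hashing if it is a regular file (this check follows
    // symlinks, so a broken link fails it) and is not an OS metadata file.
    static boolean isCandidate(Path p) {
        return Files.isRegularFile(p)
                && !OS_FILES.contains(p.getFileName().toString());
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("filter-demo");
        Path real = Files.createFile(dir.resolve("data.txt"));
        Path ds = Files.createFile(dir.resolve(".DS_Store"));
        System.out.println(isCandidate(real)); // prints true
        System.out.println(isCandidate(ds));   // prints false
    }
}
```

If you did not filter out OS metadata files, identical copies of them in every folder (e.g. `.DS_Store`) would flood the output with uninteresting duplicate groups.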
Just think about how the task would work for large files (over a few MB); actually solve it only for files no larger than 1 MB.
If you run into performance problems, one optimization is not to read all the bytes at once, but to read and compare them in chunks.
If two files differ, the difference usually appears early in their bytes.
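The chunked comparison could be sketched as follows (the 8 KB chunk size and the names are my own assumptions). It stops at the first differing chunk instead of loading either file fully into memory:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ChunkCompare {
    static final int CHUNK = 8192; // read 8 KB at a time (assumed size)

    // Compare two files chunk by chunk; bail out on the first difference.
    static boolean sameContent(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b)) return false; // cheap pre-check
        try (InputStream inA = Files.newInputStream(a);
             InputStream inB = Files.newInputStream(b)) {
            byte[] bufA = new byte[CHUNK];
            byte[] bufB = new byte[CHUNK];
            int readA;
            while ((readA = inA.readNBytes(bufA, 0, CHUNK)) > 0) {
                int readB = inB.readNBytes(bufB, 0, CHUNK);
                if (readA != readB
                        || !Arrays.equals(bufA, 0, readA, bufB, 0, readB))
                    return false;
            }
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("cmp", null);
        Path b = Files.createTempFile("cmp", null);
        Files.writeString(a, "hello");
        Files.writeString(b, "hello");
        System.out.println(sameContent(a, b)); // prints true
    }
}
```

On Java 12+ the standard library offers `Files.mismatch(a, b)`, which performs essentially this comparison and returns `-1` when the files match.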
1) Group all files by the hashcode of their bytes.
Use a Map<Long, List<Path>> to represent them; the key is the hashCode of all the bytes from a file. Do not worry about collisions. How can you quickly detect collisions with very high probability, without reading and comparing the files?
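Step 1 could be sketched like this, following the hint literally (the bucket key is the `hashCode` of the file's bytes, widened to a `long`; class and method names are my own):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashGroups {
    // Group paths by a hash of their full contents: files with equal
    // content always land in the same bucket, so every duplicate group
    // is a bucket with more than one path.
    static Map<Long, List<Path>> groupByContentHash(List<Path> files)
            throws IOException {
        Map<Long, List<Path>> groups = new HashMap<>();
        for (Path p : files) {
            long key = Arrays.hashCode(Files.readAllBytes(p));
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(p);
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hash-demo");
        Path a = Files.writeString(dir.resolve("a.txt"), "same");
        Path b = Files.writeString(dir.resolve("b.txt"), "same");
        Path c = Files.writeString(dir.resolve("c.txt"), "other");
        for (List<Path> g : groupByContentHash(List.of(a, b, c)).values())
            if (g.size() > 1) System.out.println(g); // one group: a and b
    }
}
```

An `int`-sized hash can collide; pairing the key with the file size, or switching to a cryptographic digest such as SHA-256 via `MessageDigest`, makes an accidental collision astronomically unlikely without ever comparing the files byte by byte.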
2) Group all paths by file size. Two files cannot be duplicates if they have different sizes, right?
So once we form the groups, within each group any two files are either duplicates or have different content of the same size.
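The size pre-filter from step 2 might look like this (a sketch with names of my own; singleton groups are dropped because a file with a unique size cannot have a duplicate):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SizeGroups {
    // Cheap pre-filter: duplicates must have the same size, so group by
    // Files.size first and only hash/compare within each group.
    static Map<Long, List<Path>> groupBySize(List<Path> files)
            throws IOException {
        Map<Long, List<Path>> groups = new HashMap<>();
        for (Path p : files)
            groups.computeIfAbsent(Files.size(p), k -> new ArrayList<>()).add(p);
        // A group of one cannot contain duplicates; drop it.
        groups.values().removeIf(g -> g.size() < 2);
        return groups;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("size-demo");
        Path a = Files.writeString(dir.resolve("a"), "12345");
        Path b = Files.writeString(dir.resolve("b"), "abcde");
        Path c = Files.writeString(dir.resolve("c"), "x");
        // a and b are both 5 bytes; c's singleton group is dropped.
        System.out.println(groupBySize(List.of(a, b, c)).size()); // prints 1
    }
}
```

This is why the size check is so useful: `Files.size` only reads file metadata, so it is far cheaper than hashing, and most non-duplicates are eliminated before any bytes are read.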