Dear Jason,
J> I think that entirely depends on the format the file is distributed in.
J> You could take a zipfile and pad it in non critical areas to change the
J> MD5 without creating a substantial difference in the deliverable
J> content. You could do the same with gzip or bzip formatted files. You
J> could also pad any embedded jpeg images to engineer a collision. There
J> are quite a few opportunities where this method could be used to twiddle
J> the new MD5 without materially changing the content.
Clever approach; I hadn't thought of that before.
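To make the padding idea concrete for myself, here is a minimal Python sketch. It assumes the zip archive comment counts as one of the "non critical areas" you mention; the names build_zip, payload_of and payload.txt are made up for illustration. It only shows that padding changes the MD5 while the extracted content stays identical. It does not find a collision.

```python
import hashlib
import io
import zipfile


def build_zip(comment: bytes) -> bytes:
    """Build an in-memory zip with fixed content and the given archive comment."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Fixed timestamp so the two archives differ only in the comment.
        info = zipfile.ZipInfo("payload.txt", date_time=(2005, 1, 1, 0, 0, 0))
        zf.writestr(info, b"the deliverable content\n")
        # The archive comment is ignored when extracting files.
        zf.comment = comment
    return buf.getvalue()


def payload_of(archive: bytes) -> bytes:
    """Extract the single (hypothetical) payload file from the archive bytes."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read("payload.txt")


original = build_zip(b"")
padded = build_zip(b"padding-0001")

md5_original = hashlib.md5(original).hexdigest()
md5_padded = hashlib.md5(padded).hexdigest()
same_payload = payload_of(original) == payload_of(padded)

print("original:", md5_original)
print("padded:  ", md5_padded)
print("same extracted content:", same_payload)
```

An attacker would have to vary the padding bytes over and over until the digest lands on the wanted value, which is where the computational time goes.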
J> Software that is ~150M in size, it gets redistributed as a new file that
J> is 160M in size but has a collision with your software which is also
J> 160M in size. I imagine there would be some computational time involved
J> to find the appropriate collision but a lot less computational time than
J> finding a perfect match to the original.
If I understood your point correctly, and if my knowledge of hash algorithms is correct, then I believe the computational time needed to generate a collision is exactly the same whether you aim for a perfect match or pad an existing file to create a potential collision.
To be honest, I have not looked into it. I am just thinking aloud.
The only difference between theory and reality is implementation.