Silkenweb Example: Hackernews Clone

The Problem with Perceptual Hashes

705 points by rivo 3 years ago | 418 comments

ezoe 3 years ago
The problem of hash or NN based matching is, the authority can avoid explaining the mismatch.
Suppose the authority want to false-arrest you. They prepare a hash that matches to an innocent image they knew the target has in his Apple product. They hand that hash to the Apple, claiming it's a hash from a child abuse image and demand privacy-invasive searching for the greater good.
Then, Apple report you have a file that match the hash to the authority. The authority use that report for a convenient reason to false-arrest you.
Now what happens if you sue the authority for the intentional false-arrest? Demand the original intended file for the hash? "No. We won't reveal the original file because it's child abusing image, also we don't keep the original file for moral reason"
But come to think of it, we already have tons of such bogus pseudo-science technology like the dogs which conveniently bark at police's secret hand sign, polygraph, and the drug test kit which detect illegal drugs from thin air.
- delusional 3 years ago
  What about trolling. Assume 4chan figures out apples algorithm. What now happens when they start generating memes that happen to match known child pornography? Will anyone who saves those memes (or repost them to reddit/facebook) be flagged? What will apple do once flagged false positive photos go viral?
  - mirkules 3 years ago
    One way this hair-brained Apple program could end is to constantly generate an abundance of false positives, and try to render it useless.
    For those old enough to remember “Jam Echelon Day”, maybe it won’t have any effect. But what other recourse do we have other than to maliciously and intentionally subvert and break it?
    - summm 3 years ago
      What recourse? how about not buying apple devices that have always locked the user into a golden cage?
  - 3 years ago
  - sunshinerag 3 years ago
    >> Will anyone who saves those memes (or repost them to reddit/facebook) be flagged?
    Shouldn't they be?
    - paulryanrogers 3 years ago
      Umm, no? If someone happens upon some funny cat meme that 4chan users made with an intentional hash collision then they're not guilty of anything.
      A poor analogy could be trolls convincing a flash mob to dress like a suspect's description which they overheard with a police scanner. No one in the mob is guilty of anything more than poor fashion choice.
    - Frost1x 3 years ago
      The point made was that there are always flaws in these sorts of approaches that lead to false positives. If you can discover the flawed pattern(s) that leads to false positives and engineer them into seemingly harmless images, you can quite literally do what OP I'd suggesting. It's a big IFF but it's not theoretically impossible.
      The difference between this and hashes that require image data to be almost identical is that someone who accidently sees it can avoid and report it. If I can make cat photos that set off Apple's false positives, then there's a lot of people who will be falsely accused of propagating child abuse photos when they're really just sending cat memes.
- thaumasiotes 3 years ago
  > like the dogs which conveniently bark at police's secret hand sign
  This isn't necessary; the state of the art is for drug dogs to alert 100% of the time. They're graded on whether they ever miss drugs. It's easy to never miss.
  - exporectomy 3 years ago
    Airport baggage drug dogs must obviously have far fewer false positives than that. So alerting on everything can't be the state of the art.
    - thaumasiotes 3 years ago
      https://reason.com/2021/05/13/the-police-dog-who-cried-drugs...
      > Similar patterns abound nationwide, suggesting that Karma's career was not unusual. Lex, a drug detection dog in Illinois, alerted for narcotics 93 percent of the time during roadside sniffs, but was wrong in more than 40 percent of cases. Sella, a drug detection dog in Florida, gave false alerts 53 percent of the time. Bono, a drug detection dog in Virginia, incorrectly indicated the presence of drugs 74 percent of the time.
    - 3 years ago
  - jbuhbjlnjbn 3 years ago
    To be more precise, if the dog always barks and only missing positives are counted, it is inevitable to never miss. An obvious number cheat.
  - intricatedetail 3 years ago
    Dogs are used to protect police from accusations of racism and profiling.
    - pixl97 3 years ago
      Which is odd as dogs can be just as racist as their handlers want.
- fogof 3 years ago
  Well, presumably at that point, someone in that position would just reveal their own files with the hash an prove to the public that they weren't illegal. Sure, it would be shitty to be forced to reveal your private information that way, but you would expose a government agency as fabricating evidence and lying about the contents of the picture in question to falsely accuse someone. It seems like that would be a scandal of Snowden-level proportions.
  - BiteCode_dev 3 years ago
    Na they will ruin your life even if you are found innocent and pay no price for it.
    That's the problem: the terrible asymetry. The same one you find with TOS, or politicians working for lobbists.
    - sharken 3 years ago
      Who would a company hire: the candidate with a trial for CP due to a false positive or the candidate without ?
      And this is just to address the original concept of this scanning.
      As many others have pointed out there is too much evidence pointing to other uses in the future.
  - dannyw 3 years ago
    There are literally hundreds of cases of police fabricating evidence and getting caught in court, or on bodycam.
    This happens today. We must not build technology that makes it even more devastating.
  - nicce 3 years ago
    ”Sorry, but collisions happen with all hashing algorithms, and you can’t prove otherwise. It is just a matter of time. Nothing to see here.”
    - Frost1x 3 years ago
      Well, not all hashing algorithms but all interesting or considered useful hashing algorithms, probably.
      When dealing with say countable infinite sets you can certainly create a provable unique hash for each item in that set. The hash won't be interesting or useful. E.g. a hash that indexes all the integers n with a hashing function h(n+1)... so every integer you hash will be that value plus one. But this just being pedantic and wanting to walk down the thought.
    - nullc 3 years ago
      In the past the FBI used some cryptographic hash. Collisions with a secure cryptographic hash are functionally unobservant in practice (or else the hash is broken).
      The use of the perceptual hash is because some people might evade the cryptographic hash by making small modifications to the image. The fact that they'd discarded the protection of cryptographic hashing just to accommodate these extra matches is unsurprising because their behavior is largely unconstrained and unbalanced by competing factors like the public's right to privacy or your security against being subject to a false accusation.
  - gpm 3 years ago
    It wouldn't prove anything, because hash functions are many-to-one. It's entirely possible that it was just a coincidence.
  - visarga 3 years ago
    You can reveal your files and people can accuse you you deleted the incriminating ones.
    - kelnos 3 years ago
      Not if you show the file that matches the perceptual hash that "caught" you.
  - 3 years ago
- ATsch 3 years ago
  The way I see it, this is the only possible purpose this system could have. With the press after this announcement, almost every single person in posession of those materials knows it's not safe to store them on an iPhone. By it's construction, this system can only be effective against things that the owner is not aware their phones are being searched for.
- emodendroket 3 years ago
  Parallel construction is another way this is often pursued.
- nullc 3 years ago
  > Demand the original intended file for the hash?
  Even if they'd provide it-- the attacker need only perturb an image from an existing child abuse image database until it matches the target images.
  Step 1. Find images associated with the race or political ideology that you would like to genocide and compute their perceptual hashes.
  Step 2. Obtain a database of old widely circulated child porn. (Easy if you're a state actor, you already have it, otherwise presumably it's obtainable since if it wasn't none of this scanning would be needed).
  Step 3. Scan for the nearest perceptual matches for the target images in the CP database. Then perturb the child porn images until they match (e.g. using adversarial noise).
  Step 4. Put the modified child porn images into circulation.
  Step 5. When these in-circulation images are added to the database the addition is entirely plausibly denyable.
  Step 6. After rounding up the targets, even if they're allowed any due process at all you disallow them access to the images. If that dis-allowance fails, you can still cover by the images existing and their addition having been performed by someone totally ignorant of the scheme.
- jMyles 3 years ago
  I know this is a tough thing to consider, but:
  Isn't this a problem generally with laws against entire classes of media?
  Planting a child abuse image (or even simply claiming to have found one) is trivial. Even robust security measures like FDE don't prevent a criminal thumb-drive from appearing.
  I think we probably need to envision a future in which there is simply no such concept under law as an illegal number.
- some_random 3 years ago
  The police can arrest you for laws that don't exist but they think exist. They don't need to any of this stuff.
- 3 years ago
- jokoon 3 years ago
  > Suppose the authority want to false-arrest you.
  Why would they want that?
  - nicce 3 years ago
    Corruption. Lack of evidence on some other cases. Personal revenge. Who knows, but list is big.
    - jokoon 3 years ago
      Ok but but what ends?
  - ATsch 3 years ago
    This is a pretty weird question considering the mountains of documentation of authorities doing just that. This is not some kind of hypothetical that needs extraordinary justification.
  - awestroke 3 years ago
    Oh, sweet, naive child.
    - latexr 3 years ago
      Be kind[1]. Not everyone will have a life experience or knowledge similar to yours. Someone looking to fill the gaps in their knowledge in good faith should be encouraged, not ridiculed.
      [1]: https://news.ycombinator.com/newsguidelines.html
    - jokoon 3 years ago
      I'm not american, I'm just asking a simple question.
marcinzm 3 years ago
Given all the zero day exploits on iOS I wonder if it's now going to be viable to hack someone's phone and upload child porn to their account. Apple with happily flag the photos and then, likely, get those people arrested. Now they have to, in practice, prove they were hacked which might be impossible. Will either ruin their reputation or put them in jail for a long time. Given past witch hunts it could be decades before people get exonerated.
- dylan604 3 years ago
  >Given past witch hunts it could be decades before people get exonerated.
  Given how pedophiles are treated in prison, that might be longer than your expected lifespan if you are sent to prison because of this. Of course I'm taking it to the dark place, but you kinda gotta, you know?
- seph-reed 3 years ago
  Someone is going to figure out how to make false positives, and then an entire genre of meme will be born from putting regular memes through a false positive machine, just for the lulz.
  Someone else could find a way to make every single possible mutation of false positive Goatse/Lemonparty/TubGirl/etc. Then some poor Apple employee has to check those out.
  - 0x426577617265 3 years ago
    If the process of identifying the images is done on the device, then a jailbroken device will likely give an attacker access to the entire DB. I'm not sure how useful it would be, but if the attacker did have access to actual known CSAM images it probably wouldn't be hard for them to produce false positives and test it against the DB on the jailbroken device, without notifying the company.
  - mirker 3 years ago
    If Apple is indeed using CNNs, then I don’t see why any of the black-box adversarial attacks used today in ML wouldn’t work. It seems way easier than attacking file hashes, since there are many images in the image space that are viable (e.g., sending a photo of random noise to troll with such an attack seems passable).
    - brokenmachine 3 years ago
      What's a CNN?
- remram 3 years ago
  The "hack" might be very simple, since I'm sure it's possible to craft images that look like harmless memes but trigger the detection for CP.
  - hda2 3 years ago
    The new and improved swatting.
  - 0x426577617265 3 years ago
    Couldn't the hack just be as simple as sending someone an iMessage with the images attached? Or somehow identify/modify non-illegal images to match the perceptual hash -- since it's not a cryptographic hash.
    - barsonme 3 years ago
      Does iCloud automatically upload iMessage attachments?
- new_realist 3 years ago
  This is already possible using other services (Google Drive, gmail, Instagram, etc.) that already scan for CP.
  - t0mas88 3 years ago
    Does Google scan all files you upload to them with an algorithm like the one now proposed? Or do they have only a list of exact (not perceptual) SHA hashes of files to flag on? The latter I think is also used for pirated movies etc being removed under DMCA?
    - acdha 3 years ago
      Yes: it’s called PhotoDNA and is used by many, many services. See https://en.wikipedia.org/wiki/PhotoDNA
      SHA hashes aren’t suitable for this: you can change a single bit in the header to bypass a hash check. Perceptual hashes are designed to survive cropping, rotation, scaling, and embedding but all of those things mean that false-positives become a concern. The real risk would be if someone figured out how to many plausibly innocent collisions where you could send someone a picture which wasn’t obviously contraband or highly suspicious and attempt to convince them to save it.
    - bccdee 3 years ago
      I'm pretty sure they use perceptual hashes for matching CSAM. A lot of cloud services do this sort of thing.
- toxik 3 years ago
  This is really a difficult problem to solve I think. However, I think most people who are prosecuted for CP distribution are hoarding it by the terabyte. It’s hard to claim that you were unaware of that. A couple of gigabytes though? Plausible. And that’s what this CSAM scanner thing is going to find on phones.
  - 0x426577617265 3 years ago
    Why would they hoard it in the camera/iPhotos app? I assume that storage is mostly pictures taken with the device. Wouldn't this be the least likely place to find a hoard of known images?
  - emodendroket 3 years ago
    A couple gigabytes is a lot of photos... and they'd all be showing up in your camera roll. Maybe possible but stretching the bounds of plausibility.
    - giantrobot 3 years ago
      The camera roll's defaults display images chronologically based on the image's timestamp. I've got thousands of photos on my phone going back years.
      If you hack my phone and plant some photos with a sufficiently old timestamp I'd never notice them. I can't imagine my situation is all that uncommon either.
    - danachow 3 years ago
      A couple gigabytes is enough to ruin someone’s day but not a lot to surreptitiously transfer, it’s literally seconds. Just backdate them and they may very well go unnoticed.
    - runlevel1 3 years ago
      Gigs of software updates and podcast episodes are regularly downloaded to phones without being noticed.
      How frequently do most people look at their camera roll? I'd be surprised if it's more than a few times a week on average.
      Does an attacker even need access to the phone? If iCloud is syncing your photos, your phone will eventually see all your pictures. Unless I've misunderstood how this works, the attacker only needs access to your iCloud account.
    - MinusGix 3 years ago
      As others have said, people have a lot of photos. It wouldn't be too hard to hide them a bit from obvious view. As well, I rarely look at my gallery unless I need to. I just add a few photos occasionally. So maybe once every two weeks I look at my gallery, plenty of time to initiate that.
- gnopgnip 3 years ago
  Wouldn't this risk exist already, as long as it is uploaded to icloud?
- TeeMassive 3 years ago
  You don't even need hacking for this to be abused by malevolent actors. A wife in a bad marriage could simply take nude pictures of their child to falsely accuse her husband.
  This tech is just ripe for all kind of abuses.
  - amannm 3 years ago
    That picture wouldn't already be in the CSAM database...
    - TeeMassive 3 years ago
      There's no point for this tech then. There's no way that they will no try to expand to detect the presence of porn and children being both present at the same time.
avnigo 3 years ago
> These cases will be manually reviewed. That is, according to Apple, an Apple employee will then look at your (flagged) pictures.
I'm surprised this hasn't gotten enough traction outside of tech news media.
Remember the mass celebrity "hacking" of iCloud accounts a few years ago? I wonder how those celebrities would feel knowing that some of their photos may be falsely flagged and shown to other people. And that we expect those humans to act like robots and not sell or leak the photos, etc.
Again, I'm surprised we haven't seen a far bigger outcry in the general news media about this yet, but I'm glad to see a lot of articles shining light on how easy it is for false positives and hash collisions to occur, especially at the scale of all iCloud photos.
- judge2020 3 years ago
  They wouldn't be falsely flagged. It doesn't detect naked photos, it detects photos matching real confirmed CSAM based on the NCMEC's database.
  - avnigo 3 years ago
    The article posted, as well as many others we've seen recently, demonstrate that collisions are possible, and most likely inevitable with the number of photos to be scanned for iCloud, and Apple recognizes this themselves.
    It doesn't necessarily mean that all flagged photos would be of explicit content, but even if it's not, is Apple telling us that we should have no expectation of privacy for any photos uploaded to iCloud, after running so many marketing campaigns on privacy? The on-device scanning is also under the guise of privacy too, so they wouldn't have to decrypt the photos on their iCloud servers with the keys they hold (and also save some processing power, maybe).
    - spacedcowboy 3 years ago
      Apple already use the same algorithm on photos in email, because email is unencrypted. Last year Apple reported 265 cases according to the NYT. Facebook reported 20.3 million.
      Devolving the job to the phone is a step to making things more private, not less. Apple don’t need to look at the photos on the server (and all cloud companies in the US are required to inspect photos for CSAM) if it can be done on the phone, removing one more roadblock for why end-to-end encryption hasn’t happened yet.
  - nullc 3 years ago
    If there were no false positives there would be no legitimate reason for Apple to review-- they would just be needlessly exposing their employees to child abuse material.
    But the fact that there is no legitimate reason according to the system's design doesn't prevent there from being an illegitimate reason: Apple's "review" undermines your legal due process protection against warrantless search.
    See US v. Ackerman (2016): The appeals court ruled that when AOL forwarded an email with an attachment whos hash matched the NCMEC database to law enforcement without anyone looking at it, and law enforcement looked at the email without obtaining a warrant was an unlawful search and had AOL looked at it first (which they can do by virtue of your agreement with them) and gone "yep, thats child porn" and reported it, it wouldn't have been an unlawful search.
  - auggierose 3 years ago
    If that would always work, a manual review would not be necessary. Just send the flagged photo and its owner straight to the police.
  - wongarsu 3 years ago
    It will flag pictures that match a perceptual hash of pictures of child abuse. Now what legal kinds of pictures are most similar in composition, color, etc. to those offending pictures? What kinds of pictures would be hardest to distinguish from offending pictures if you were given only 16x16 thumbnails?
    I'm going to bet the algorithm will struggle the most with exactly the pictures you don't want reviewers or the public to see.
  - josefx 3 years ago
    Hashes, no false matches, pick one.
- lliamander 3 years ago
  That really alarmed me. I don't think a hosting provider like Apple should have a right to access private pictures, especially just to enforce copyright.
  Edit: I see now it's not about copyright, but still very disturbing.
- fortran77 3 years ago
  So we have an Apple employee, the type of person who gets extremely offended over such things as "Chaos Monkeys," deciding if someone is a criminal? No thanks!
at_a_remove 3 years ago
I do not know as much about perceptual hashing as I would like, but have considered it for a little project of my own.
Still, I know it has been floating around in the wild. I recently came across it on Discord when I attempted to push an ancient image, from the 4chan of old, to a friend, which mysteriously wouldn't send. Saved it as a PNG, no dice. This got me interested. I stripped the EXIF data off of the original JPEG. I resized it slightly. I trimmed some edges. I adjusted colors. I did a one degree rotation. Only after a reasonably complete combination of those factors would the image make it through. How interesting!
I just don't know how well this little venture of Apple's will scale, and I wonder if it won't even up being easy enough to bypass in a variety of ways. I think the tradeoff will do very little, as stated, but is probably a glorious apportunity for black-suited goons of state agencies across the globe.
We're going to find out in a big big way soon.
* The image is of the back half of a Sphynx cat atop a CRT. From the angle of the dangle, the presumably cold, man-made feline is draping his unexpectedly large testicles across the similarly man-made device to warm them, suggesting that people create problems and also their solutions, or that, in the Gibsonian sense, the street finds its own uses for things. I assume that the image was blacklisted, although I will allow for the somewhat baffling concept of a highly-specialized scrotal matching neural-net that overreached a bit or a byte on species, genus, family, and order.
- judge2020 3 years ago
  AFAIK Discord's NSFW filter is not a perceptual hash nor uses the NCMEC database (although that might indeed be in the pipeline elsewhere) but instead uses a ML classifier (I'm certain it doesn't use perceptual hashes as Discord doesn't have a catalogue of NSFW image hashes to compare against). I've guessed it's either open_nsfw[0] or Google's Cloud Vision since the rest of Discord's infrastructure uses Google Cloud VMs. There's a web demo available of this api[1], Discord probably pulls the safe search classifications for determining NSFW.
  0: https://github.com/yahoo/open_nsfw
  1: https://cloud.google.com/vision#section-2
- a_t48 3 years ago
  Adding your friend as a "friend" on discord should disable the filter.
  - J_tt 3 years ago
    Each user can adjust the settings for how incoming images are filtered, one of the options disables it for friends.
- noduerme 3 years ago
  I had to go search for that image. Love it.
  >> in the Gibsonian sense
  Nice turn of phrase. Can't wait to see what the street's use cases are going to be for this wonderful new spyware. Something nasty, no doubt.
mrtksn 3 years ago
The technical challenges aside, I’m very disturbed that my device will be reporting me to the authorities.
That’s very different from authorities taking a sneak peek into my stuff.
That’s like the theological concept of always being watched.
It starts with child pornography but the technology is indifferent towards it, it can be anything.
It’s always about the children because we all want to save the children. Soon they will start asking you start saving your country. Depending on your location they will start checking against sins against religion, race, family values, political activities.
I bet you, after the next election in the US your device will be reporting you for spreading far right or deep state lies, depending on who wins.
I’m big Apple fanboy, but I’m not going to carry a snitch in my pocket. That’s “U2 Album in everyone’s iTunes library” blunder level creepy with the only difference that it’s actually truly creepy.
In my case, my iPhone is going to be snitching me to Boris and Erdogan, in your case it could be Macron, Bolsonaro, Biden, Trump etc.
That’s no go for me, you can decide for yourself.
- asimpletune 3 years ago
  I have been a big Apple fan ever since my first computer. This is the first time I legitimately thought I need to start thinking about something else. It’s kind of sad.
  - jeromegv 3 years ago
    Genuinely curious, why? This scanning was already happening server-side in your iCloud photos, just like Google Photos, etc. Now they are removing it from server-side to client-side (which still require this photo to be hosted in iCloud)
    What changed, really?
    - asimpletune 3 years ago
      The problem for me is my device “informing” on me changes the fundamental nature of my device. Also the opaqueness of the whole system is concerning, along with the potential for false positives. And lastly is the “if this then what’s next?” And that is definitely a road I don’t want to go down on.
      For me it’s sad because I have literally always stood by them and they make amazing hardware and software. However at the end of the day I’d rather have the nature of my device be one where it is under my control, than all the wonderful apple tech.
      I just don’t know what to use instead.
    - zekrioca 3 years ago
      You answered your own question and still don’t get it.
  - voidnullnil 3 years ago
    Companies change. The sad part is, there is no next company to move to.
  - echelon 3 years ago
    Good! These assholes have been building a moat around all of computing, and now it's almost impossible to avoid the multi-trillion dollar monster.
    Think about all the startups that can't deploy software without being taxed most of our margin, the sign in with apple that prevents us from having a real customer relationship, and the horrible support, libraries, constant changes, etc. It's hostile! It's unfair that the DOJ hasn't done anything about it.
    A modern startup cannot succeed without Apple's blessing. To do so would be giving up 50% of the American market. When you're struggling to grow and find traction, you can't do that. It's so wildly unfair that they "own" 50+% of computer users.
    Think of all the device owners that don't have the money to pay Apple for new devices or upgrades. They can't repair them themselves. Apple's products are meant to go into the trash and be replaced with new models.
    We want to sidestep these shenanigans and use our own devices? Load our own cloud software? We can't! Apple, from the moment Jobs decreed, was fully owned property. No alternative browsers, no scripting or runtimes. No computing outside the lines. You're just renting.
    This company is so awful.
    Please call your representatives and ask them to break up the biggest and most dangerous monopoly in the world.
- gpm 3 years ago
  With you up to here, but this is jumping the shark
  > I bet you, after the next election in the US your device will be reporting you for spreading far right or deep state lies, depending on who wins.
  The US is becoming less stable, sure [1], but there is still a very strong culture of free speech, particularly political speech. I put the odds that your device will be reporting on that within 4 years as approximately 0. The extent that you see any interference with speech today is corporations choosing not to repeat certain speech to the public. Not them even looking to scan collections of files about it, not them reporting it to the government, and the government certainly wouldn't be interested if they tried.
  The odds that it's reporting other crimes than child porn though, say, copyright infringement. That strikes me as not-so-low.
  [1] I agree with this so much that it's part of why I just quit a job that would have required me to move to the US.
  - esyir 3 years ago
    >but there is still a very strong culture of free speech
    In my opinion, that culture has been rapidly dying, chipped away by a very sizable and growing chunk that doesn't value it at all, seeing it only as a legal technicality to be sidestepped.
    - bigyikes 3 years ago
      I find this varies greatly depending on location. Living in California, I was convinced of the same. Living in Texas now, I’m more optimistic.
  - dannyw 3 years ago
    Did you literally not see a former president effectively get silenced in the public sphere by 3 corporations?
    How can you seriously believe that these corporations (who are not subject to the first amendment, and cannot be challenged in court) won't extend and abuse this technology to tackle "domestic extremism" but broadly covering political views?
    - macintux 3 years ago
      > a former president effectively get silenced in the public sphere
      It's laughable that a man who can call a press conference at a moment's notice and get news coverage for anything he says can be "silenced" because private companies no longer choose to promote his garbage.
    - wpietri 3 years ago
      Those companies can definitely be challenged in courts. But they also have rights, including things like freedom of speech and freedom of association, which is why they win when challenged on this. Why do you think a former president and claimed billionaire should have special rights to their property?
  - feanaro 3 years ago
    > but there is still a very strong culture of free speech, particularly political speech.
    Free speech didn't seem so important recently when the SJW crowd started mandating to censor certain words because they're offensive.
    - wpietri 3 years ago
      Free speech doesn't mean the speaker is immune from criticism or social consequences. If I call you a bunch of offensive names here, I'll get downvoted for sure. The comment might be hidden from most. I might get shadowbanned or totally banned, too.
      That was true of private spaces long before HN existed. If you're a jerk at a party, you might get thrown out. I'm sure that's been true as long as there have been parties.
      The only thing "the SJW crowd" has changed is which words are now seen as offensive.
  - efitz 3 years ago
    Apple has a shitty record wrt free speech. Apple hates free speech. Apple likes “curation”. They canned Parler in a heartbeat; they also police the App Store for anything naughty.
    - gpm 3 years ago
      Canning Parler is Apple choosing not to advertise and send you an app they don't like, i.e. it's Apple exercising it's own right to free speech. Agree or disagree with it, it's categorically different from Apple spying on what the files you have are saying (not even to or via Apple) and reporting it to the government.
  - l33t2328 3 years ago
    I don’t see how the US is becoming “less stable” in any meaningful sense. Can you elaborate?
    - gpm 3 years ago
      Both sides of the political spectrum think the other side is stupid, and evil. The gap between the two sides is getting bigger. Politicians and people (especially on the right, but to some extent on the left) are increasingly willing to cheat to remain in power.
      If you want some concrete examples:
      - Trump's attempted coup, the range of support it received, the lack of condemnation it received.
      - Law's allowing things like running over protestors
      - Law's with the transparent goal of suppressing voters
      - Widespread support (not unjustified IMO) for stacking the supreme court
      - Police refusing to enforce certain laws as a political stance (not because they legitimately think they're unlawful, just that they don't like them)
      - (Justified) lack of trust in the police quickly trending higher
      - (Justified?) lack of trust in the military to responsibly use tools you give it, and support for a functional military
      - (Justified?) lack of faith in the border guards and the ability to pass reasonable immigration laws, to the point where many people are instead advocating for just not controlling the southern border.
      Generally these (and more) all speak towards the institutions that make the US a functional country failing. The institutions that make the rules for the country are losing credibility, the forces that enforce the rules are losing credibility. Neither of those are things that a country can survive forever.
    - jeromegv 3 years ago
      An attack on the capitol on January 6. A former president that spent weeks trying to delegitimize the election, trying to get people fired when they were just following the process to ratify the election, etc.
- Klonoar 3 years ago
  I would really like people to start answering this: what exactly do you think has changed? e.g,
  >That’s very different from authorities taking a sneak peek into my stuff.
  To be very blunt:
  - The opt out of this is to not use iCloud Photos.
  - If you _currently_ use iCloud Photos, your photos are _already_ hash compared.
  - Thus the existing opt out is to... not use iCloud Photos.
  The exact same outcome can happen regardless of whether it's done on or off device. iCloud has _always_ been a known vector for authorities to peek.
  >I’m big Apple fanboy, but I’m not going to carry a snitch in my pocket.
  If you use iCloud, you arguably already do.
  - Renaud 3 years ago
    What has changed is the inclusion of spyware technology on the device that can be weaponised to basically report on anything.
    Today it's only geared toward iCloud and CSAM. How many lines of codes do you think it will take before it scans all your local pictures?
    How hard do you think it will be for an authoritarian regime like China, that Apple bends over backwards to please, to start including other hashes that are not CSAM?
    iCloud is opt-out. They can scan server-side like everyone does. Your device is your device, and it now contains, deeply embedded into it, the ability to perform actions that are not under your control and can silently report you directly to the authorities.
    If you don't see a deep change there, I don't know what to say.
    I live in a country that is getting more authoritarian by the day, where people are sent to prison (some for life) for criticizing the government, sometime just for chanting or printing a slogan.
    This is the kind of crap that makes me extremely angry at Apple. Under the guise of something no-one can genuinely be against (think of the children!), they have now included a pretty generic snitch into your phone and made everyone accept it.
    - Klonoar 3 years ago
      >What has changed is the inclusion of spyware technology on the device that can be weaponised to basically report on anything.
      - You are running a closed source proprietary OS that you cannot verify is not already doing anything.
      - This could theoretically already be weaponized (with the existing server-side implementation) by getting someone to download a file to a folder that iCloud automatically syncs from.
      >iCloud is opt-out.
      Yes, and that's how you opt out of this scanning. It's the same opt-out as before.
      >Under the guise of something no-one can genuinely be against (think of the children!) they have now included a pretty generic snitch into your phone and made everyone accept it.
      I dunno what to tell you. I think the system as designed is actually pretty smart[1] and more transparent than before.
      If you used iCloud before, and you're putting photos up that'd be caught in a hash comparison, you've already got a snitch. Same with any other cloud storage, short of hosting your own.
      [1] I reserve the right for actual bona-fide cryptographers to dissect it and set the record straight, mind you.
    - wonnage 3 years ago
      We gotta stop with the China bogeyman every time a privacy issue comes up. This is a feature designed by an American company for American government surveillance purposes. China is perfectly capable of doing the same surveillance or worse on its own citizens, with or without Apple. China has nothing to do with why American tech is progressively implementing more authoritarian features in a supposedly democratic country.
  - coldtea 3 years ago
    >I would really like people to start answering this: what exactly do you think has changed? e.g,
    Apple has announced they'll be doing this check?
    What exactly do you think is the same as before?
    >The exact same outcome can happen regardless of whether it's done on or off device. iCloud has _always_ been a known vector for authorities to peek.
    That's neither here, nor there. It's another thing to peak selectively with a warrant of sorts, than to (a) peak automatically in everybody, (b) with a false-positive-prone technique, especially since the mere accusation on a false match can be disastrous for a person, even if they eventually are proven innocent...
    - Klonoar 3 years ago
      Responding in a separate comment since I either missed the second half, or it was edited in.
      >That's neither here, nor there. It's another thing to peak selectively with a warrant of sorts, than to (a) peak automatically in everybody, (b) with a false-positive-prone technique, especially since the mere accusation on a false match can be disastrous for a person, even if they eventually are proven innocent...
      I do not believe that iCloud CSAM server side matching ever required a warrant, and I'm not sure where you've gotten this idea. It quite literally is (a) peak automatically in everybody.
      Regarding (b), with this way - thanks to them publishing details on it - there's more transparency than if it was done server side.
      >especially since the mere accusation on a false match can be disastrous for a person
      As noted elsewhere in this very thread, this can happen whether client or server side. It's not unique in any way, shape or form to what Apple is doing here.
    - Klonoar 3 years ago
      >What exactly do you think is the same as before?
      The same checking when you synced things to iCloud. As has been repeated over and over again, this check happens for iCloud Photos. It's not running arbitrarily.
      Your photos were compared before and they're being compared now... if you're using iCloud Photos.
  - foerbert 3 years ago
    I think one of the major factors that changes how people perceive this is that it's happen on their own device. If you upload a thing to a server and the server does something... I mean sure. You gave a thing to somebody else, and they did something with it. That's a very understandable and largely accepted situation.
    This is different. This is your own device doing that thing, out of your control. Alright sure, it's doing the same thing as the other server did and under the same circumstances* so maybe functionally nothing has changed. But the philosophical difference is quite huge between somebody else's server watching over what you upload and your own device doing it.
    I'm struggling to come up with a good analogy. The closest I can really think of is the difference between a reasonably trusted work friend and your own family member reporting you to the authorities for suspicious behavior in your workplace and home respectively. The end result is the same, but I suspect few people would feel the same about those situations.
    * There is no inherent limitation for your own device to only be able to check photos you upload to iCloud. There is however such a limitation for the iCloud servers. A very reasonably and potentially functional difference is the ability for this surveillance to be easily expanded beyond iCloud uploads in the future.
  - matheusmoreira 3 years ago
    What changed is we are not the masters of our technology anymore. If I tell my computer to do something, it should do it without question. It doesn't matter if it's a crime. The computer is supposed to be my tool and obey my commands.
    Now what's going to happen instead is the computer will report me to its real masters: corporations, governments. How is this acceptable in any way?
    - brokenmachine 3 years ago
      Vote with your wallet.
  - xuki 3 years ago
    It makes even less sense, given that they are currently doing this with your iCloud photos. Now they have this tool that can match to a database of photos, how do we know they wouldn't use this to identify non-sexual photos? Maybe Tim Cook wouldn't, what about the next CEO? And the one after that?
    - tialaramex 3 years ago
      What makes you think that Apple has a database of actual child sex abuse images? Does that feel like a thing you'd be OK with? "Oh, this is Jim, he's the guy who keeps our archive of sex abuse photographs here at One Infinite Loop" ? If you feel OK with that at Apple, how about at Facebook? Tencent? What about the new ten-person SV start-up would-be Facebook killer whose main founder had a felony conviction in 1996 for violating the Mann Act. Still comfortable?
      Far more likely Apple takes a bunch of hashes from a third party in the law enforcement side of things (ie cops) and trust that the third party is definitely giving them hashes to protect against the Very Bad Thing that Apple's customers are worried about.
      Whereupon what you're actually trusting isn't Tim Cook, it's a cop. I'm told there are good cops. Maybe all this is done exclusively by good cops. For now.
      Now, I don't know about the USA, but around here we don't let cops just snoop about in our stuff, on the off-chance that by doing so they might find kiddie porn. So it should be striking that apparently Apple expects you to be OK with that.
    - Klonoar 3 years ago
      The questions re: what the CEO would sign off on here don't really matter, as the question could apply whether it's server side or client side.
      It _does_ make sense client side if you view it being done server side as a blocker for E2EE on iCloud. There is absolutely no world where Apple could implement that without keeping the ability to say "yes, we're blocking child porn".
  - bigiain 3 years ago
    > The opt out of this is to not use iCloud Photos.
    Wasn’t yesterday’s version of this sorry about how Apple is implementing this as a client side service on iPhones?
    https://news.ycombinator.com/item?id=28068741
    I don’t know if the implication there is “don’t use the stock Apple camera app and photo albums”, or “don’t store any images on yours Phone any more” if they are scanning files from other apps for perceptual hash matches as well…
    - Klonoar 3 years ago
      ...yes, and the client-side check is only run before syncing to iCloud Photos, which is basically just shifting the hash check from before upload (client side) to after upload (server side).
- dylan604 3 years ago
  >but I’m not going to carry a snitch in my pocket.
  I wonder how this will hold up against 5th ammendment (in the US) covering self-incrimination?
  - ssklash 3 years ago
    I assume the third party doctrine makes it so that the 5th amendment doesn't apply here.
    - dylan604 3 years ago
      "The third-party doctrine is a United States legal doctrine that holds that people who voluntarily give information to third parties—such as banks, phone companies, internet service providers, and e-mail servers—have "no reasonable expectation of privacy." A lack of privacy protection allows the United States government to obtain information from third parties without a legal warrant and without otherwise complying with the Fourth Amendment prohibition against search and seizure without probable cause and a judicial search warrant." --wiki
      Okay, but the users of said 3rd party are doing it under the assumption that it is encrypted on the 3rd party's system in a way that they cannot gain access to it. The unencrypted data is not what the user is giving to iCloud. So technically, the data this scan is providing to the authorities is not the same data that the user is giving to the 3rd parties.
      Definitely some wiggle room on both sides for some well versed lawyers to chew up some billing hours.
- baggy_trough 3 years ago
  Totally agree. This is very sinister indeed. Horrible idea, Apple.
  - zionic 3 years ago
    So what are we going to do about it?
    I have a large user base on iOS. Considering a blackout protest.
    - mrtksn 3 years ago
      IMHO, Unless everything being E2E encrypted becomes the law we can’t do anything about it because that’s not Apple’s initiative but comes from people whose job is to know things and they cannot resist keeping their hands out of these data collecting devices. They promise politicians that all the troubles will go away if we do that.
      Child pornography, Terrorism? Solve it the old way.
      I don’t know why citizens are obligated to make their jobs easier.
      We survived the times when phone calls were not moderated, we survived the times when signal intelligence was not a thing.
    - Blammar 3 years ago
      Write an iCloud photo frontend that uploads only encrypted images to iCloud and decrypts on your phone only?
- cyanydeez 3 years ago
  you have to realize though that the panopticon is limited only by the ability of "authority" to sift through it for whatever it is it is looking for.
  as this article points out, the positive matches will still need an observe to confirm what it is and is not.
  lastly, the very reason you have this device exposes you to the reality of either accepting a government that regulates these corporate overreaches or accepting private ownership thats profit motive is deeply personal.
  you basically have to reverse society or learn to be a hermit, or more realistically, buy into a improved democratic construct that opts into transparent regulation.
  but it sounds more like you want to live in a split brained world where your paranoia and antigovernment stance invites dark corporste policies to sell you out anyway
yellow_lead 3 years ago
Regarding false positives re:Apple, the Ars Technica article claims
> Apple offers technical details, claims 1-in-1 trillion chance of false positives.
There are two ways to read this, but I'm assuming it means, for each scan, there is a 1-in-1 trillion chance of a false positive.
Apple has over 1 billion devices. Assuming ten scans per device per day, you would reach one trillion scans in ~100 days. Okay, but not all the devices will be on the latest iOS, not all are active, etc, etc. But this is all under the assumption those numbers are accurate. I imagine reality will be much worse. And I don't think the police will be very understanding. Maybe you will get off, but you'll be in a huge debt from your legal defense. Or maybe, you'll be in jail, because the police threw the book at you.
- nanidin 3 years ago
  > Apple has over 1 billion devices. Assuming ten scans per device per day, you would reach one trillion scans in ~100 days.
  People like to complain about the energy wasted mining cryptocurrencies - I wonder how this works out in terms of energy waste? How many people will be caught and arrested by this? Hundreds or thousands? Does it make economic sense for the rest of us to pay an electric tax in the name of scanning other people's phones for this? Can we claim it as a deductible against other taxes?
  - FabHK 3 years ago
    > I wonder how this works out in terms of energy waste?
    Cryptocurrency waste is vastly greater. It doesn't compare at all. Crypto wastes as much electricity as a whole country. This will lead to a few more people being employed by Apple to verify flagged images, that's it.
    - nanidin 3 years ago
      In net terms, you're probably right. But at least the energy used for cryptocurrency is being used toward something that might benefit many (commerce, hoarding, what-have-you), vs against something that might result in the arrest of few.
      The economics I'm thinking of are along the lines of cryptocurrency energy usage per participant, vs image scanning energy per caught perpetrator. The number of caught perpetrators via this method over time will approach zero, but we'll keep using energy to enforce it forever.
      All this does is remove technology from the problem of child abuse, it doesn't stop child abuse.
- axaxs 3 years ago
  Eh...I don't think of it as one in a trillion scans...but one in a trillion chance per image. I have something like 2000 pics. My wife, at least 5x that number. If we split the difference, and assume the average device has 5000 pics, that's already hitting false positives multiple times. Feel sorry for the first 5 to get their account banned on day 1 because their pic of an odd piece of toast was reported to the govt as cp.
  - 3 years ago
- wilg 3 years ago
  Apple claims that metric for a false positive account flagging, not photo matching.
  > The threshold is set to provide an extremely high level of accuracy and ensures less than a one in one trillion chance per year of incorrectly flagging a given account.
  https://www.apple.com/child-safety/
  - yellow_lead 3 years ago
    Good find
- KarlKemp 3 years ago
  Do you really believe that if they scan your photo library at 10am and don't get any false positives, another scan five hours later, with no changes to the library, has the same chance of getting false positives as the first one, independent of that result?
  - NoNotTheDuo 3 years ago
    Even if the library doesn’t change, doesn’t the possibility of the list of “bad” hashes changing exist? I.e., in your example, a new hash is added to by Apple to the list at 11:30am, and then checked against your unchanged library.
    - IfOnlyYouKnew 3 years ago
      Oh god have mercy on whatever has happened to these people…
  - burnished 3 years ago
    If you take photos, then yes?
- dylan604 3 years ago
  knowing Apple, the initial scan of this will be done while the phone is on charge just like previous versions of scanning your library. However, according to Apple it is just the photos shared with iCloud. So since it's on a charger, it's minimal electron abuse.
  Once you start adding new content from camera to iCloud, I'd assume the new ML chips of Apple Silicone will be calculating the phashes as part-and-parcel to everything else it does. So unless you're trying to "recreate" known CP, then new photos from camera really shouldn't need this hashing done to them. Only files not originated from the user's iDevice should qualify. If a CP creator is using an iDevice, then their new content won't match existing hashes, so what's that going to do?
  So so many questions. It's similar yet different to mandatory metal detectors and other screening where 99.99% of people are innocent and "merely" inconvenienced vs the number of people any of that screening catches. Does the mere existence of that screening act as a deterent? That's like asking how many angels can stand on the head of a pin. It's a useless question. The answer can be whatever they want it to be.
stickfigure 3 years ago
I've also implemented perceptual hashing algorithms for use in the real world. Article is correct, there really is no way to eliminate false positives while still catching minor changes (say, resizing, cropping, or watermarking).
I'm sure I'm not the only person with naked pictures of my wife. Do you really want a false positive to result in your intimate moments getting shared around some outsourced boiler room for laughs?
- 7373737373 3 years ago
  I, too, have worked on similar detection technology using state of the art neural networks. There is no way there won't be false positives, I suspect many, many more than true positives.
  It is very likely that as a result of this, thousands of innocent people will have their most private of images viewed by unaccountable strangers, will be wrongly suspected or even tried and sentenced. This includes children, teenagers, transsexuals, parents and other groups this is allegedly supposed to protect.
  The willful ignorance and even pride by the politicians and managers who directed and voted for these measures to be taken disgusts me to the core. They have no idea what they are doing and if they do they are simply plain evil.
  It's a (in my mind entirely unconstitutional) slippery slope that can lead to further telecommunications privacy and human rights abuses and limits freedom of expression by its chilling effect.
  Devices should exclusively act in the interest of their owners.
  - nonbirithm 3 years ago
    Microsoft, Facebook, Google and Apple have scanned data stored on their servers for CSAM for over a decade already. The difference is that Apple is moving the scan on-device. Has there been any report of even a single person who's been a victim of a PhotoDNA false positive in those ten years? I'm not trying to wave away the concerns about on-device privacy, but I'd want evidence that a such significant scale of wrongful conviction is plausible as a result of Apple's change.
    I can believe that a couple of false positives would inevitably occur assuming Apple has good intentions (which is not a given), but I'm not seeing how thousands could be wrongfully prosecuted unless Apple weren't using the system like they state they will. At least in the US, I'm not seeing how a conviction can be made on the basis of a perceptual hash alone without the actual CSAM. The courts would still need the actual evidence to prosecute people. Getting people arrested on a doctored meme that causes a hash collision would at most waste the court's time, and it would only damage the credibility of perceptual hashing systems in future cases. Also, thousands of PhotoDNA false positives being reported in public court cases would only cause Apple's reputation to collapse. They seem to have enough confidence that such an extreme false positive rate is not possible to the point of implementing this change. And I don't see how just moving the hashing workload to the device fundamentally changes the actual hashing mechanism and increases the chance of wrongful conviction over the current status quo of serverside scanning (assuming that it only applies to images uploaded to iCloud, which could change of course). The proper time to be outraged at the wrongful conviction problem was ten years ago, when the major tech companies started to adopt PhotoDNA.
    On the other hand, if we're talking about what the CCP might do, I would completely agree.
    - 7373737373 3 years ago
      > I'm not seeing how a conviction can be made on the basis of a perceptual hash alone without the actual CSAM
      This is a good point, but it's not just about people getting wrongly convicted, this system even introducing a remote possibility of having strangers view your personal files is disturbing. In the US, it violates the 4th amendment against unreasonable search, a company being the middleman doesn't change that. Privacy is a shield of the individual, here the presumption of innocence is deposed even before the trial. An extremely low false positive rate or the perceived harmlessness of the current government don't matter, the systems' existence is inherently wrong. It's an extension of the warrantless surveillance culture modern nations are already so good at.
      "It is better that ten guilty persons escape than that one innocent suffer." - https://en.wikipedia.org/wiki/Blackstone%27s_ratio
      In a future with brain-computer interfaces, would you like such an algorithm to search your mind for illegal information too?
      Is it still your device if it acts against you?
  - FabHK 3 years ago
    > thousands of innocent people will have their most private of images viewed by unaccountable strangers, will be wrongly suspected or even tried and sentenced
    Apple says: "The threshold is set to provide an extremely high level of accuracy and ensures less than a one in one trillion chance per year of incorrectly flagging a given account."
    What evidence do you have against that statement?
    Next, flagged accounts are reviewed by humans. So, yes, there is a minuscule chance a human might see a derivative of some wrongly flagged images. But there is no reason to believe that they "will be wrongly suspected or even tried and sentenced".
    - 7373737373 3 years ago
      > Apple says: "The threshold is set to provide an extremely high level of accuracy and ensures less than a one in one trillion chance per year of incorrectly flagging a given account."
      I'd rather have evidence for that statement first, since these are just funny numbers. I couldn't find false-positive rates for PhotoDNA either. How many people have been legally affected by false positives so far, how many had their images viewed? The thing is, how exactly the system works has to be kept secret, because it can otherwise be circumvented. So these technical numbers will be unverifiable. The outcomes will not, and this might be a nice reason for a FOIA request.
      But who knows, it might not matter, since it's a closed source, effectively uncontrollable program running soon on millions of devices against the interest of their owners and no one is really accountable so false positives can be treated as 'collateral damage'.
- vineyardmike 3 years ago
  > Do you really want a false positive to result in your intimate moments getting shared around some outsourced boiler room for laughs?
  these people also have no incentive to find you innocent for innocent photos. If they err on the side of false-negative, they might find themselves at the wrong end of a criminal search ("why didn't you catch this"), but if they false-positive they at worse ruin a random person's life.
  - whakim 3 years ago
    This is mostly orthogonal to the author's original point (with which I concur, having also implemented image similarity via hashing and hamming distance). There just aren't a lot of knobs to tune using these algorithms so it's difficult if not impossible to make small changes to err on the side of reducing false positives.
  - TchoBeer 3 years ago
    Does claiming a false positive not run the risk of libel?
    - vineyardmike 3 years ago
      IANAL but i doubt it - they just forward to law enforcement
    - heavyset_go 3 years ago
      I doubt it. The claim isn't being published and you'd have a hard time proving damages.
  - jdavis703 3 years ago
    Even still this has to go to the FBI or other law enforcement agency, then it’s passed on to a prosecutor and finally a jury will evaluate. I have a tough time believing that false positives would slip through that many layers.
    That isn’t to say CASM scanning or any other type of drag net is OK. But I’m not concerned about a perceptual hash ruining someone’s life, just like I’m not concerned about a botched millimeter wave scan ruining someone’s life for weapons possession.
    - gambiting 3 years ago
      >>I have a tough time believing that false positives would slip through that many layers.
      I don't, not in the slightest. Back in the days when Geek Squad had to report any suspicious images found during routine computer repairs, a guy got reported to the police for having child porn, arrested, fired from his job, named in the local newspaper as a pedophile, all before the prosecutor was actually persuaded by the defense attorney to look at these "disgusting pictures".....which turned out to be his own grand children in a pool. Of course he was immediately released but not before the damage to his life was done.
      >>But I’m not concerned about a perceptual hash ruining someone’s life
      I'm incredibly concerned about this, I don't see how you can not be.
    - mattnewton 3 years ago
      By the time it has reached a jury you're already publicly accused of having CSAM which is a life ruining moment on its own, and no one before the jury has much incentive to halt the process on your behalf.
    - vineyardmike 3 years ago
      > But I’m not concerned about a perceptual hash ruining someone’s life
      I want ZERO computerized algorithms involved in any law enforcement process - especially the "criminal hunting" steps.
- mjlee 3 years ago
  > I'm sure I'm not the only person with naked pictures of my wife.
  I'm not completely convinced that says what you want it to.
  - enedil 3 years ago
    Didn't she possibly have previous partners?
    - websites2023 3 years ago
      Presumably she wasn’t his wife then. But also people have various arrangements so I’m not here to shame.
    - iratewizard 3 years ago
      I don't even have nude photos of my wife. The only person who might would be the NSA contractor assigned to watch her.
  - dwaltrip 3 years ago
    The reasonable interpretation is that GP is saying many people may have private pictures of their partner.
- zxcvbn4038 3 years ago
  Rookie mistake.
  Three rules to live by:
  1) Always pay your taxes
  2) Don’t talk to the police
  3) Don’t take photographs with your clothes off
  - jimmygrapes 3 years ago
    I might amend #2 a bit to read "Be friends with the police" as that has historically been more beneficial to those who are.
    - mattnewton 3 years ago
      Lots of people have believed that they were friends with the police and were actually being manipulated into metaphorically hanging themselves- some of them innocent.
      Counterargument, why you should not talk to the police (In the US): https://youtu.be/d-7o9xYp7eE
    - digi59404 3 years ago
      The point that being friends with police will be beneficial to you - Means there's a scenario where the inverse is also true. Not being friends with police is used to your detriment.
      Police Officers exist in a career field that is riddled with incidents of Tunnel Vision. The sibling comment posts a video about not talking to police from a law professor. I'd heed that advice.
  - slapfrog 3 years ago
    > 2) Don’t talk to the police
    2b) Don't buy phones that talk to the police.
- avnigo 3 years ago
  I would want absolute transparency as to which of my photos have been exposed to the human review process and found to be false positives.
  Somehow I doubt we would ever get such transparency, even though it would be the right thing to do in such a situation.
- planb 3 years ago
  I fully agree with you. But while scrolling to next comment, a question came to my mind: Would it really bother me if some person that does not known my name, has never met me in real life and never will is looking at my pictures without me ever knowing about it? To be honest, I'm not sure if I'd care. Because for all I know, that might be happening right now...
- nine_k 3 years ago
  Buy a subcompact camera. Never upload such photos to any cloud. Use your local NAS / external disk / your Linux laptop's encrypted hard drive.
  Unless you prefer to live dangerously, of course.
  - ohazi 3 years ago
    Consumer NAS boxes like the ones from Synology or QNAP have "we update your box at our whim" cloud software running on them and are effectively subject to the same risks, even if you try to turn off all of the cloud options. I probably wouldn't include a NAS on this list unless you built it yourself.
    It looks like you've updated your comment to clarify Linux laptop's encrypted hard drive, and I agree with your line of thinking. Modern Windows and Mac OS are effectively cloud operating systems where more or less anything can be pushed at you at any time.
    - derefr 3 years ago
      With Synology's DSM, at least, there's no "firmware" per se; it's just a regular Linux install that you have sudo(1) privileges on, so you can just SSH in and modify the OS as you please (e.g. removing/disabling the update service.)
    - cm2187 3 years ago
      At least you can deny the NAS access to the WAN by blocking it on the router or not configuring the right gateway.
    - moogly 3 years ago
      Synology [...] have "we update your box at our whim"
      You can turn off auto-updates on the Synology devices I own at least (1815+, 1817+).
- zimpenfish 3 years ago
  > Do you really want a false positive to result in your intimate moments getting shared around some outsourced boiler room for laughs?
  You'd have to have several positive matches against the specific hashes of CSAM from NCMEC before they'd be flagged up for human review, right? Which presumably lowers the threshold of accidental false positives quite a bit?
- jjtheblunt 3 years ago
  Why would other people have a naked picture of your wife?
  - pdpi 3 years ago
    GP’s wife presumably had a personal life before being in a relationship with GP. It’s just as reasonable that her prior partners have her photos as it is for GP to have them.
  - dwaltrip 3 years ago
    Others have pictures of their wife, not GP's wife.
  - jjtheblunt 3 years ago
    (joke)
  - giantrobot 3 years ago
    She's a comely lass. I can't recommend her pictures enough.
karmakaze 3 years ago
It really all comes down to if Apple has and is willing to maintain the effort of human evaluations prior to taking action on the potentially false positives:
> According to Apple, a low number of positives (false or not) will not trigger an account to be flagged. But again, at these numbers, I believe you will still get too many situations where an account has multiple photos triggered as a false positive. (Apple says that probability is “1 in 1 trillion” but it is unclear how they arrived at such an estimate.) These cases will be manually reviewed.
At scale, even human classification which ought to be clear will fail, accidentally clicking 'not ok' when they saw something they thought was 'ok'. It will be interesting to see what happens then.
- jdavis703 3 years ago
  Then law enforcement, a prosecutor and a jury would get involved. Hopefully law enforcement would be the first and final stage if it was merely the case that a person pressed “ok” by accident.
  - karmakaze 3 years ago
    This is exactly the kind of thing that is to be avoided: premature escalation, tying up resources, increasing costs, and raising the stakes and probability of bad outcomes.
  - gtyras2mrs 3 years ago
    Do you think once you are charged with possessing child porn - will you still have your job, your friends, your family, your life as you know it? Will a court decision - months or years later - restore what you have lost?
rustybolt 3 years ago
> an Apple employee will then look at your (flagged) pictures.
This means that there will be people paid to look at child pornography and probably a lot of private nude pictures as well.
- pkulak 3 years ago
  Apple, with all those Apple == Privacy billboards plastered everywhere, is going to have a full-time staff of people with the job of looking through it's customers' private photos.
  - arvinsim 3 years ago
    Sue them for false marketing.
- hnick 3 years ago
  Yes, private nude pictures of other people's children too, which do not necessarily constitute pornography. It was common when I was young for parents to take pictures of their kids doing things, clothes or not. Some still exist of me I'm sure.
  So far as I know some parents still do this. I bet they'd be thrilled having Apple employees look over these.
- emodendroket 3 years ago
  And what do you think the content moderation teams employed by Facebook, YouTube, et al. do all day?
  - josephcsible 3 years ago
    They look at content that people actively and explicitly chose to share with wider audiences.
    - emodendroket 3 years ago
      While that's a snappy response, it doesn't seem to have much to do with the concern about perverts getting jobs specifically to view child abuse footage, which is what I thought this thread was about.
  - mattnewton 3 years ago
    There's a big difference in the expectation of privacy between what someone posts on "Facebook, Youtube, et al" and what someone takes a picture of but doesn't share.
    - alkonaut 3 years ago
      Couldn’t they always avoid ever flagging pictures taken on the device itself (camera, rather than download) since if those match, it’s always a false positive?
    - spacedcowboy 3 years ago
      Odd, then, that Facebook reported 20.3 million photos to NCMEC last year, and Apple 265, according to the NYT that is.
    - emodendroket 3 years ago
      A fair point but, again, quite aside from the concern being raised about moderators having to view potentially illegal content.
  - mattigames 3 years ago
    Yeah, we obviously needed one more company doing it as well, and I'm sure having more positions in the job market which pretty much could be described as "Get paid to watch pedophilia all day long" will not backfire in any way.
    - emodendroket 3 years ago
      You could say there are harmful effects of these jobs but probably not in the sense you're thinking. https://www.wired.com/2014/10/content-moderation/
  - techbio 3 years ago
    Hopefully, in between the moral sponge work they do, occasionally gaze over a growing history of mugshots, years-left-in-sentence reminders, and death notices for the producers of this content, their enablers, and imitators.
  - 3 years ago
- Spivak 3 years ago
  Yep! I guess this announcement is when everyone is collectively finding out how this has, apparently quietly, worked for years.
  It’s a “killing floor” type job where you’re limited in how long you’re allowed to do it in a lifetime.
- varjag 3 years ago
  There are people who are paid to do that already, just generally not in corporate employment.
- mattigames 3 years ago
  I'm sure thats the dream position for most pedophiles, watching child porn fully legally and being paid for it, plus on the record being someone who helps destroy it; and given that CP will exist for as long as human beings do there will be no shortage no matter how much they help capturing other pedophiles.
siscia 3 years ago
What I am missing from all this story, is what triggered Apple to put in place, or even think about, this system.
It is clearly a no-trivial project, no other company is doing it, and it will be one of the rare case of a company doing something not for shareholders value but for "goodwill".
I am really not understanding the reasoning behind this choice.
- spacedcowboy 3 years ago
  Er, every US company that hosts images in the cloud scans them for CSAM if they have access to the photo, otherwise they’re opening themselves up to a lawsuit.
  US law requires any ESP (electronic service provider) to alert NCMEC if they become aware of CSAM on their servers. Apple used to comply with this by scanning images on the server in iCloud photos, and now they’re moving that to the device if that image is about to be uploaded to iCloud photos.
  FWIW, the NYT says Apple reported 265 cases last year to NCMEC, and say Facebook reported 20.3 million. Google [1] are on for 365,319 for July->Dec.
  I’m still struggling to see what has changed here, apart from people realising what’s been happening..
  - it’s the same algorithm that Apple has been using, comparing NCMEC-provided hashes against photos
  - it’s still only being done on photos that are uploaded to iCloud photos
  - it’s now done on-device rather than on-server, which removes a roadblock to future e2e encryption on the server.
  Seems the only real difference is perception.
  [1] https://transparencyreport.google.com/child-sexual-abuse-mat...
- jeromegv 3 years ago
  One theory is that they are getting ready for E2E encryption of iCloud photos. Apple will have zero access to your photos in the cloud. So the only way to get the authorities to accept this new scheme is that there is this backdoor where there is a check client-side for sexual predator photos. Once your photo pass that check locally, it gets encrypted, sent to the cloud, never to be decrypted by apple.
  Not saying it will happen, but that's a decent theory as of why https://daringfireball.net/2021/08/apple_child_safety_initia...
- MontagFTB 3 years ago
  Legally, I believe, they are responsible for distribution of CSAM that may wind up in their cloud, regardless of who put it there. Many cloud companies are under considerable legal pressure to find and report it.
BiteCode_dev 3 years ago
The problem is not perceptual hashes. The problem is the back door. Let's not focus on the defect of the train leading you to the concentration camp. The problem is that there is a camp at the end of the rail road.
klodolph 3 years ago
> Even at a Hamming Distance threshold of 0, that is, when both hashes are identical, I don’t see how Apple can avoid tons of collisions...
You'd want to look at the particular perceptual hash implementation. There is no reason to expect, without knowing the hash function, that you would end up with tons of collisions at distance 0.
- mirker 3 years ago
  If images have cardinality N and hashes M and N > M, then yes, by pigeonhole principle you will have collisions regardless of hash function, f: N -> M.
  N is usually much bigger than M, since you have the combinatorial pixel explosion. Say images are 8 bit RGB 256x256, then you have 2^(8x256x256x3) bit combinations. If you have a 256-bit hash, then that’s only 2^256. So there is a factor of 2^(8x256x3) difference between N and M if I did my math right, which is a factor I cannot even calculate without numeric overflow.
  - klodolph 3 years ago
    The number of possible different images doesn't matter, it's only the number of actually different images encountered in the world. This number cannot be anywhere near 2^256, that would be physically impossible.
    - mirker 3 years ago
      But you cannot know that a-priori so it’s either an attack vector for image manipulation or straight up false positives.
      Assume we had this perfect hash knowledge. I’d create a compression algorithm to uniquely map between images and the 256 bit hash space, which we probably agree is similarly improbable. It’s on the order of 1000x to 10000x more efficient than JPEG and isn’t even lossy.
drzoltar 3 years ago
The other issue with these hashes is non-robustness to adversarial attacks. Simply rotating the image by a few degrees, or slightly translating/shearing it will move the hash well outside the threshold. The only way to combat this would be to use a face bounding box algorithm to somehow manually realign the image.
- foobarrio 3 years ago
  In my admittedly limited experience in image hashing, typically you extract some basic feature and transform the image before hashing (eg darkest corner in the upper left or look for verticals/horizontals and align). You also take multiple hashes of the images to handle various crops, black and white vs color. This increases robustness a bit but overall yea you can always transform the image in such a way to come up with a different enough hash. One thing that would be hard to catch is if you do something like a swirl and then the consumers of that content will use a plugin or something to "deswirl" the image.
  There's also something like the Scale Invariant Feature Transform that would protect against all affine transformations (scale, rotate, translate, skew).
  I believe one thing that's done is whenever any CP is found, the hashes of all images in the "collection" is added to the DB whether or not they actually contain abuse. So if there are any common transforms of existing images then those also now have their hashes added to the db. The idea being that a high percent of hits from even the benign hashes means the presence of the same "collection".
  - megous 3 years ago
    Huh, or you can just use encryption if you'll be using some SW based transformation anyway.
Waterluvian 3 years ago
I’m rather fascinated by the false matches. Those two images are very different and yet beautifully similar.
I want to see a lot more pairs like this!
starkd 3 years ago
The method Apple is using looks more like a cryptographic hash. That's entirely different (and more secure) than a perceptual hash.
From https://www.apple.com/child-safety/
"Before an image is stored in iCloud Photos, an on-device matching process is performed for that image against the known CSAM hashes. This matching process is powered by a cryptographic technology called private set intersection, which determines if there is a match without revealing the result. The device creates a cryptographic safety voucher that encodes the match result along with additional encrypted data about the image. This voucher is uploaded to iCloud Photos along with the image."
Elsewhere, it does explain the use of neuralhashes which I take to be the perceptual hash part of it.
I did some work on a similar attempt awhile back. I also have a way to store hashes and find similar images. Here's my blog post. I'm currently working on a full site.
http://starkdg.github.io/posts/concise-image-descriptor
- cvwright 3 years ago
  The crypto here is for the private set intersection, not the hash.
  So your device has a list of perceptual (non-cryptographic) hashes of its images. Apple has a list of the hashes of known bad images.
  The protocol lets them learn which of your hashes are in the “bad” set, without you learning any of the other “bad” hashes, and without Apple learning any of the hashes of your other photos.
  - bastawhiz 3 years ago
    Well therein lies the problem: perceptual hashes don't produce an exact result. You need to compare something like the hamming distance (as the article mentions) of each hash to decide if it's a match.
    Is it possible to perform private set intersection where the comparison is inexact? I.e., if you have two cryptographic hashes, private set intersection is well understood. Can you do the same if the hashes are close, but not exactly equal?
    If the answer is yes, that could mean you would be able to derive the perceptual hashes of the CSAM, since you're able to find values close to the original and test how far you can drift from it before there's no longer a match.
    - cvwright 3 years ago
      From what I’ve read, part of the magic here is that Apple’s perceptual hash is an exact hash. Meaning, you don’t have to do the Hamming distance thing.
      Admittedly, I haven’t had a chance to read the original source material yet. It’s possible that the person I heard this from was wrong.
    - aix1 3 years ago
      Would love to learn more about actual algorithms that could be used to do something like this (private set intersection with approximate matching) if they exist.
- dogma1138 3 years ago
  The cryptography is most likely done at a higher level than the perception comparison and is quite likely done to protect the CSAM hashes than your privacy.
  My interpretation of this is that they still use some sort of a perception based matching algorithm they just encrypt the hashes and then use some “zero knowledge proof” when comparing the locally generated hashes against the list, the result of which would be just that X hashes marched but not which X.
  This way there would be no way to reverse engineer the CSAM hash list or bypass the process by altering key regions of the image.
  - visarga 3 years ago
    > the result of which would be just that X hashes marched but not which X
    That means you can't prove an incriminating file was not deleted even if you're the victim of a false positive. So they will suspect you and put you through the whole police investigation routine.
    - dogma1138 3 years ago
      Not necessarily it just means that you don’t know/prove until a certain threshold is reached, in guessing above a specific one that hashes and the photo is then uploaded to Apple for verification and preservation.
- 3 years ago
jiggawatts 3 years ago
The world in the 1900s:
Librarians: "It is unthinkable that we would ever share a patron's borrowing history!"
Post office employees: "Letters are private, only those commie countries open the mail their citizens send!"
Police officers: "A search warrant from a Judge or probable cause is required before we can search a premises or tap a single, specific phone line!"
The census: "Do you agree to share the full details of your record after 99 years have elapsed?"
The world in the 2000s:
FAANGs: "We know everything about you. Where you go. What you buy. What you read. What you say and to whom. What specific type of taboo pornography you prefer. We'll happily share it with used car salesmen and the hucksters that sell WiFi radiation blockers and healing magnets. Also: Cambridge Analytica, the government, foreign governments, and anyone who asks and can pony up the cash, really. Shh now, I have a quarterly earnings report to finish."
Device manufacturers: "We'll rifle through your photos on a weekly basis, just to see if you've got some banned propaganda. Did I say propaganda? I meant child porn, that's harder to argue with. The algorithm is the same though, and just how the Australian government put uncomfortable information leaks onto the banned CP list, so will your government. No, you can't check the list! You'll have to just trust us."
Search engines: "Tiananmen Square is located in Beijing China. Here's a cute tourist photo. No further information available."
Online Maps: "Tibet (China). Soon: Taiwan (China)."
Media distributors: "We'll go into your home, rifle through your albums, and take the ones we've stopped selling. Oh, not physically of course. No-no-no-no, nothing so barbaric! We'll simply remotely instruct your device to delete anything we no longer want you to watch or listen to. Even if you bought it from somewhere else and uploaded it yourself. It matches a hash, you see? It's got to go!"
Governments: "Scan a barcode so that we can keep a record of your every movement, for public health reasons. Sure, Google and Apple developed a secure, privacy-preserving method to track exposures. We prefer to use our method instead. Did we forget to mention the data retention period? Don't worry about that. Just assume... indefinite."
- bcrosby95 3 years ago
  Your view of the 1900s is very idyllic.
asimpletune 3 years ago
“ Even at a Hamming Distance threshold of 0, that is, when both hashes are identical, I don’t see how Apple can avoid tons of collisions, given the large number of pictures taken every year (1.4 trillion in 2021, now break this down by iPhone market share and country, the number for US iPhone users will still be extremely big).”
Is this true? I’d imagine you could generate billions a second without having a collision, although I don’t know much about how these hashes are produced.
It would be cool for an expert to weigh in here.
Wowfunhappy 3 years ago
> At my company, we use “perceptual hashes” to find copies of an image where each copy has been slightly altered.
Kind of off topic, does anyone happen to know of some good software for doing this on a local collection of images? A common sequence of events at my company:
1. We're designing a website for some client. They send us a collection of a zillion photos to pull from. For the page about elephants, we select the perfect elephant photo, which we crop, lightly recolor, compress, and upload.
2. Ten years later, this client sends us a screenshot of the elephant page, and asks if we still have a copy of the original photo.
Obviously, absolutely no one at this point remembers the name of the original photo, and we need to either spend hours searching for it or (depending on our current relationship) nicely explain that we can't help. It would be really great if we could do something like a reverse Google image search, but for a local collection. I know it's possible to license e.g. TinEye, but it's not practical for us as a tiny company. What I really want is an open source solution I can set up myself.
We used Digicam for a while, and there were a couple of times it was useful. However, for whatever reason it seemed to be extremely crash-prone, and it frequently couldn't find things it really should have been able to find.
- xioren00 3 years ago
  https://pypi.org/project/ImageHash/
  - Wowfunhappy 3 years ago
    Thank you!
brian_herman 3 years ago
Fortunately I have a cisco router and enough knowledge to block the 17.0.0.0/8 ip address range. This combined with an openvpn vpn will block all apple services from my devices. So basically my internet will look like this:
Internet <---> CISCO <---> ASUS ROUTER with openvpn <-> Network The cisco router will block the 17.0.0.0/8 ip address range and I will use spotify on all my computers.
- brian_herman 3 years ago
  Disregard comment I don't want to edit it because I am lazy. You can do all of this inside the asus router underneath the routes page just put this inside the asus router: Ip address 17.0.0.0 Subnet 255.0.0.0 Destination 127.0.0.1
  - procinct 3 years ago
    You don't plan to ever use 4G/5G again?
    - brian_herman 3 years ago
      I have openvpn so the block will remain in effect. I don't plan to use apple services ever again but the hard ware is pretty good.
- verygoodname 3 years ago
  And then they switch to using Akamai or AWS IP space (like Microsoft does), so you start blocking those as well?
lancemurdock 3 years ago
I am going to give this lineageOS on an android device a shot. This is one of the most egregious things Apple has ever done
read_if_gay_ 3 years ago
Big tech has been disintegrating the foundational principles on which our society is built in the name of our society. Every one of their moves is a deeper attack on personal freedom than the last. They need to be dealt with. Stop using their services, buying their products, defending them when they silence people.
jbmsf 3 years ago
I am fairly ignorant if this space. Do any of the standard methods use multiple hash functions vs just one?
- heavyset_go 3 years ago
  I've built products that utilize different phash algorithms at once, and it's entirely possible, and quite common, to get false positives across hashing algorithms.
- jdavis703 3 years ago
  Yes, I worked on such a product. Users had several hashing algorithms they could chose from, and the ability to create custom ones if they wanted.
alkonaut 3 years ago
The key here is scale. If the only trigger for action is having (say) a few hundred matching images, or a dozen from the same known set of offending pictures, then I can see how apples “one in a trillion” claim would work.
Also, Apple could ignore images from the device camera - since those will never match.
This is also in stark contrast to the task faced by photo copyright hunters. They don’t have the luxury of only focusing on those who handle tens of thousands of copyrighted photos. They need to find individual violations because that’s what they are paid to do.
altitudinous 3 years ago
This article focusses too much on the individual case, and not enough on the fact that Apple will need multiple matches to report someone. Images would normally be distributed in sets I suspect, so it is going to be easy to detect when someone is holding an offending set because of multiple matches. I don't think Apple are going to be concerned with a single hit. Here in the news offenders are reported as holding many thousands of images.
- trynumber9 3 years ago
  Does it scan files within archives?
  If it does, you could download the wrong zip and instantaneously be over their threshold.
  - altitudinous 3 years ago
    The scanning is to take place within iCloud Photos, which handles images / videos etc on an individual basis. It would be a pretty easy thing to do for Apple to calculate hashes on these. I'm not sure how iOS handles archives, but it doesn't matter - remember it isn't 100% or 0% with these things - say only 50% of those people store images in iCloud Photo, catching out only 50% of those folk is still a good result.
    - trynumber9 3 years ago
      Yeah, I'm not sure. Just is a bit worrying to me. On my device iCloud Drive synchronizes anything in my downloads folder. If images contained within zips are treated as individual images, then I'm always just one wrong click from triggering their threshold.
JacobiX 3 years ago
Given that Apple technology uses NN and triplet embedding loss, the exact same techniques used by neural networks for face recognition, so maybe the same shortcomings would apply here. For example a team of researchers found a 'Master Faces' that can bypass over 40% of Facial ID. Now suppose that you have such an image in your photo library, it would generate so many false positives …
SavantIdiot 3 years ago
This article covers three methods, all of which just look for alterations of a source image to find a fast match (in fact, that's the paper referenced). It is still a "squint to see if it is similar" test. I was under the impression there were more sophisticated methods that looked for types of images, not just altered known images. Am I misunderstanding?
- chipotle_coyote 3 years ago
  Apple's proposed system compares against a database of known images. I can't think of a way to "look for types of images" other than trying to do it with machine learning, which strikes me as fraught with incredible fiasco potential. (The compare-to-a-known-database approach has its own issues, including the ones the article talks about, of course.)
  - SavantIdiot 3 years ago
    Ok, that's what it is seeming like. Since a crypto hash by definition has to generate a huge hamming distance for a small change, everything i've read about perceptual hashes is just the opposite: they should be tolerant enough of a certain amount of difference.
chucklenorris 3 years ago
So, if there's code on the device that's computing these hashes then it can be extracted. Afterwards it should be possible to add changes to a inocent picture to make it produce a target hash. Getting a hash should pe possible too, just find a known pedo image and run the extracted algorithm. It's only a matter of time until someone makes this
cratermoon 3 years ago
If I'm reading this right? Apple is saying they are going to flag CSAM they find on their servers. This article talks about finding a match for photos by comparing a hash of a photo you're testing with a hash you have, from a photo you have.
Does this mean Apple had/has CSAM available to generate the hashes?
- aix1 3 years ago
  For the purposes of this they only have the hashes, which they receive from third parties.
  > on-device matching using a database of known CSAM image hashes provided by NCMEC and other child safety organizations
  https://www.apple.com/child-safety/
  (Now, I do wonder how secure those third parties are.)
- 3 years ago
ngneer 3 years ago
What is the ratio of consumers of child pornography to the population of iPhone users? In order of magnitude, is it 1%, 0.1%, 0.001%, 0.0001%? With all the press around the announcement, this is not exactly stealth technology. Wouldn't such consumers switch platforms, rendering the system pointless?
- aix1 3 years ago
  It's clearly a marketing exercise aimed to sell products to parents and other concerned citizens. It doesn't actually need to be effective to achieve this goal. (I am not saying whether it will or won't be, just that it doesn't need to be.)
3 years ago
ris 3 years ago
I agree with the article in general except part of the final conclusion
> The simple fact that image data is reduced to a small number of bits leads to collisions and therefore false positives
Our experience with regular hashes suggests this is not the underlying problem. SHA256 hashes have 256 bits and still there are no known collisions, even with people deliberately trying to find them. SHA-1 only has only 160 bits to play with and it's still hard enough to find collisions. MD5 is easier to find collisions but at 128 bits, still people don't come across them by chance.
I think the actual issue is that perceptual hashes tend to be used with this "nearest neighbour" comparison scheme which is clearly needed to compensate for the inexactness of the whole problem.
- dogma1138 3 years ago
  This isn’t due to the entropy of the hash but due to the entropy of the source data.
  These algos work by limiting the color space of the photo, usually to only black and white (not even grey scale) resizing it to a fraction of its original size and then chopping it into tiles using a fixed size grid.
  This increases the chances of collisions greatly because photos with a similar composition are likely to match on a sufficient number of tiles to flag the photo as a match.
  This is why the women image was matched to the butterfly image, if you turn the image to B&W resize it to something like 256x256 pixels and divide it into a grid of say 16 tiles all of a sudden a lot of these tiles can match.
- giantrobot 3 years ago
  Perceptual hashes don't involve diffusion and confusion steps like cryptographic hashes. Perceptual hashes don't want decorrelation like cryptographic hashes. In fact they want similar but not identical images to end up with similar hash values.
btheshoe 3 years ago
I'm not insane in thinking this stuff has to be super vulnerable to adversarial attacks, right? And it's not like adversarial attacks are a solved problem or anything.
- mkl 3 years ago
  Wouldn't you need a way to determine if an image you generate has a match in Apple's database?
  The way it's set up, that's not possible: "Given a user image, the general idea in PSI is to apply the same set of transformations on the image NeuralHash as in the database setup above and do a simple lookup against the blinded known CSAM database. However, the blinding step using the server-side secret is not possible on device because it is unknown to the device. The goal is to run the final step on the server and finish the process on server. This ensures the device doesn’t know the result of the match, but it can encode the result of the on-device match process before uploading to the server." -- https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni... (emphasis mine)
  - btheshoe 3 years ago
    I was thinking something along the lines of applying small transformations to all images before uploading, or even just images that are known to be problematic. Seems like something that people who traffic cp would be willing to do
- aix1 3 years ago
  Yes, I agree that this is a significant risk.
chucklenorris 3 years ago
This technology is a godsend for the government to catch wistleblowers before they're able to leak information. You wouldn't even hear about those poor souls.
3 years ago
lliamander 3 years ago
What about genuine duplicate photos? Say there is a stock picture of a landscape, and someone else goes and takes their own picture of the same landscape?
kazinator 3 years ago
Perceptual hashing was invented by the Chinese: four-corner code character lookup, that lumps together characters with similar features.
legulere 3 years ago
Which photos does Apple scan? Also of emails and messages? Could you swat somebody by sending them benign images that have the same hash?
madmax96 3 years ago
Why not make it so that I can see flagged images in my library? It would give me a lot more confidence that my photos stay private.
acidioxide 3 years ago
It's really disturbing that, in case of doubt, real person would check photos. That's a red flag.
bastawhiz 3 years ago
Correct me if I'm wrong, but nowhere in Apple's announcement do they mention "perceptual" hashing. I've searched through some of the PDFs they link as well, but those also don't seem to mention the word "perceptual". Can someone point out exactly where this is mentioned?
- rcarback 3 years ago
  "NeuralHash is a perceptual hashing function"
  https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...
ChrisMarshallNY 3 years ago
That’s a really useful explanation.
Thanks!
marcinzm 3 years ago
> an Apple employee will then look at your (flagged) pictures.
Always fun when unknown strangers get to look at your potentially sensitive photos with probably no notice given to you.
- judge2020 3 years ago
  They already do this for photodna-matched iCloud Photos (and Google Photos, Flickr, Imgur, etc), perceptual hashes do not change that.
  - version_five 3 years ago
    I'm not familiar with iPhone picture storage. Are the pictures automatically sync'ed with cloud storage? I would assume (even if I don't like it) that cloud providers may be scanning my data. But I would not expect anyone to be able to see or scan what is stored on my phone.
    Incidentally, I work in computer vision and handle proprietary images. I would be violating client agreements if I let anyone else have access to them. This is a concern I've had in the past e.g. with Office365 (the gold standard in disregarding privacy) that defaults to sending pictures in word documents to Microsoft servers for captioning, etc. I use a Mac now for work, but if somehow this snooping applies to computers as well I can't keep doing so while respecting the privacy of my clients.
    I echo the comment on another post, Apple is an entertainment company, I don't know why we all started using their products for business applications.
    - abawany 3 years ago
      By default it is enabled. One has to go through Settings to turn off the default iCloud upload, afaik.
    - Asdrubalini 3 years ago
      You can disable automatic backups, this way your photos won’t ever be uploaded to iCloud.
ajklsdhfniuwehf 3 years ago
whatsapp and other apps place pictures from groups chats in folders deep in your IOS gallery.
Swatting will be a problem all over again.... wait, did it ever stop being a problem?
lordnacho 3 years ago
Why wouldn't the algo check that one image has a face while the other doesn't? That would remove this particular false positive, though I'm not sure what it might cause of new ones.
- PUSH_AX 3 years ago
  Because where do you draw the line with classifying arbitrary features in the images? The concept is it should work with an image of anything.
ivalm 3 years ago
I am not exactly buying the premise here, if you train a CNN on useful semantic categories then the representations they generate will be semantically meaningful (so the error shown in blog wouldn’t occur).
I dislike the general idea of iCloud having back doors but I don’t think the criticism in this blog is entirely valid.
Edit: it was pointed out apple doesn’t have semantically meaningful classifier so the blog post’s criticism is valid.
- SpicyLemonZest 3 years ago
  Apple's description of the training process (https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...) sounds like they're just training it to recognize some representative perturbations, not useful semantic categories.
  - ivalm 3 years ago
    Ok, good point, thanks.
- jeffbee 3 years ago
  I agree the article is a straw-man argument and is not addressing the system that Apple actually describes.
IfOnlyYouKnew 3 years ago
Apple’s documents said they require multiple hits before anything happens, as the article notes. They can (and have) adjusted that number to any desired balance of false positive to negatives.
How can they say it’s 1 in a trillion? You test the algorithm on a bunch of random negatives, see how many positives you get, and do one division and one multiplication. This isn’t rocket science.
So, while there are many arguments against this program, this isn’t it. It’s also somewhat strange to believe the idea of collisions in hashes of far smaller size than the images they are run on somehow escaped Apple and/or really anyone mildly competent.
- fogof 3 years ago
  I was unhappy to find this comment so far down and even unhappier to see it downvoted. I'm not a fan of the decrease in privacy Apple is creating with this move but I think this forum has gotten its feelings for Apple caught up with its response to a completely valid criticism of an anti-Apple article.
  To explain things even further, let's say that the perceptual algorithm makes a false positive 1% of the time. That is, 1 in every 100 completely normal pictures are incorrectly matched with some picture in the child pornography database. There's no reason to think (at least none springs to mind, happy to hear suggestions) that a false positive in one image will make it any more likely to see a false positive in another image. Thus, if you have a phone with 1000 pictures on it, and it takes 40 trigger a match, there's less than a 1 in a trillion probability that this would happen if the pictures are all normal.
  - IfOnlyYouKnew 3 years ago
    At this point, the COVID vaccines seem to barely have majority support on HN, and “cancel culture” would win any survey on our times’ top problems, beating “women inventing stories of rape’ and “the black guy mentioning something borderline political at work, just because he’s paid 5/8th as much as others”.
    An inability to follow even the most elementary argument from statistics isn’t really surprising. Although I can’t quite say if it’s actual inability, or follows from the fact that it supports the wrong outcome.
- bt1a 3 years ago
  That would not be a good way to arrive at an accurate estimate. Would you not need dozens of trillions of photos to begin with in order to get an accurate estimate when the occurrence rate is so small?
  - KarlKemp 3 years ago
    What? No...
    Or, more accurately: if you need "dozens of trillions" that implies a false positive rate so low, it's practically of no concern.
    You'd want to look up the poisson distribution for this. But, to get at this intuitively: say you have a bunch of eggs, some of which may be spoiled. How many would you have to crack open, to get a meaningful idea of how many are still fine, and how many are not?
    The absolute number depends on the fraction that are off. But independent of that, you'd usually start trusting your sample when you've seen 5 to 10 spoiled ones.
    So Apple runs the hash algorithm on random photos. They find 20 false positives in the first ten million. Given that error rate, how many positives would it require for the average photo collection of 10,000 to be certain at at 1:a trillion level that it's not just coincidence?
    Throw it into, for example, https://keisan.casio.com/exec/system/1180573179 with lambda = 0.2 (you're expecting one false positive for every 50,000 at the error rate we assumed, or 0.2 for 10,000), and n = 10 (we've found 10 positives in this photo library) to see the chances of that, 2.35x10^-14, or 2.35 / 100 trillion.
ttul 3 years ago
Apple would not be so naive as to roll out a solution to child abuse images that has a high false positive rate. They do test things prior to release…
- smlss_sftwr 3 years ago
  ah yes, from the same company that shipped this: https://medium.com/hackernoon/new-macos-high-sierra-vulnerab...
  and this: https://www.theverge.com/2017/11/6/16611756/ios-11-bug-lette...
- celeritascelery 3 years ago
  Test it… how exactly? This is detecting illegal material that they can’t use to test against.
  - zimpenfish 3 years ago
    > This is detecting illegal material that they can’t use to test against.
    But they can because they're matching the hashes to the ones provided by NCMEC, not directly against CSAM itself (which presumably stays under some kind of lock and key at NCMEC.)
    Same as you can test whether you get false positives against a bunch of MD5 hashes that Fred provides without knowing the contents of his documents.
  - bryanrasmussen 3 years ago
    Not knowing anything about it but I suppose various governmental agencies maintain corpora of nasty stuff and that you can say to them - hey we want to roll out anti-nasty stuff functionality in our service therefore we need access to corpora to test at which point there is probably a pretty involved process that requires governmental access also to make sure things work and are not misused otherwise -
    how does anyone ever actually fight the nasty stuff? This problem structure of how do I catch examples of A if examples of A are illegal must apply in many places and ways.
    - vineyardmike 3 years ago
      Test it against innocent data sets, then in prod swap it for the opaque gov db of nasty stuff and hope the gov was honest about what is in it :)
      They don't need to train a model to detect the actual data set. They need to train a model to follow a pre-defined algo
  - ben_w 3 years ago
    While I don’t have any inside knowledge at all, I would expect a company as big as Apple to be able to ask law enforcement to run Apple’s algorithm on data sets Apple themselves don’t have access to and report the result.
    No idea if they did (or will), but I do expect it’s possible.
    - zimpenfish 3 years ago
      > ask law enforcement to run Apple’s algorithm on data sets Apple themselves don’t have access to
      Sounds like that's what they did since they say they're matching against hashes provided by NCMEC generated from their 200k CSAM corpus.
      [edit: Ah, in the PDF someone else linked, "First, Apple receives the NeuralHashes corresponding to known CSAM from the above child-safety organizations."]
  - IfOnlyYouKnew 3 years ago
    They want to avoid false powitives, so you would test for that by running it over innocuous photos, anyway.
- bjt 3 years ago
  I'm guessing you don't remember all the errors in the initial launch of Apple Maps.