Want to ensure financial documents can’t be parsed by automated systems

  • cannedtuna@lemmy.world
    10 days ago

    OCR cannot scan documents that have been certified or digitally signed.

    Note that once you certify a document, it can no longer be edited, combined with another PDF, or have pages inserted or extracted.

    Once a PDF has been digitally signed, it is locked, and you can no longer add pages, delete pages, or read it via OCR.

    • MystikIncarnate@lemmy.ca
      10 days ago

      This works, right up until you introduce PDF-compatible software that doesn’t give a shit about your rules, and there’s plenty of it.

      You can also print and rescan, or even just print to PDF, to get around such limitations. The original document cannot be altered, since that would invalidate its digital signature, but you can create a perfect digital copy that omits the signature and modify it however you want.

      Online systems that skim documents for their contents don’t give a shit what the signature is. They simply take a copy and OCR it to train an AI, or amalgamate the information for data harvesting or other purposes.

      I get what you’re saying, and in concept it should be fine. The problem is that it’s a software lock on a file type that isn’t closed source or secret, and the PDF format was never built to be secure from the ground up. So we’re applying security to a system that wasn’t built for it.
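The argument that a signature protects integrity but not confidentiality can be shown with a toy model. This is a minimal sketch under simplifying assumptions. Real PDF signatures use public-key cryptography (CMS/PKCS#7) computed over a byte range of the file. Here an HMAC with a hypothetical shared key stands in for the signature, and a short byte string stands in for the document. The sketch shows three things: signing leaves the content readable, editing the signed original fails verification, and an unsigned copy can be changed freely.

```python
import hashlib
import hmac

# Toy stand-in for a PDF signature: HMAC-SHA256 with a hypothetical key.
# Real PDF signing uses public-key crypto (CMS/PKCS#7) over a byte range.
SIGNING_KEY = b"hypothetical-signer-key"


def sign(document: bytes) -> bytes:
    """Return a tag computed over the document bytes."""
    return hmac.new(SIGNING_KEY, document, hashlib.sha256).digest()


def verify(document: bytes, tag: bytes) -> bool:
    """Check that the document bytes still match the tag."""
    return hmac.compare_digest(sign(document), tag)


original = b"Account: 12345\nBalance: $1,000\n"
tag = sign(original)

# 1. Signing hides nothing: the content is still plain bytes that any
#    reader (or text-extraction/OCR pipeline) can consume.
readable_text = original.decode("utf-8")

# 2. Editing the signed original makes verification fail.
tampered = original.replace(b"1,000", b"9,000")

# 3. Nothing stops someone from copying the content into a new,
#    unsigned document and editing that copy however they like.
unsigned_copy = bytearray(original)
unsigned_copy += b"Note: added to the unsigned copy\n"

print(verify(original, tag))          # True
print(verify(tampered, tag))          # False
print(readable_text.splitlines()[1])  # Balance: $1,000
```

The signature only tells a verifier whether the signed bytes changed. It does not prevent reading, copying, or re-creating the content elsewhere, which is the gap the reply above describes.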