Large language models (LLMs) like GPT-4 can identify a person’s age, location, gender and income with up to 85 per cent accuracy simply by analysing their posts on social media.

But the AIs also picked up on subtler cues, like location-specific slang, and could estimate a salary range from a user’s profession and location.

Reference:

arXiv DOI: 10.48550/arXiv.2310.07298

  • KptnAutismus
    English
    58
    8 months ago

    not really hard, because people will post anything on the internet. including me.

    • @AbouBenAdhem@lemmy.world
      English
      23
      edit-2
      8 months ago

      It sounds like the reason they used reddit was so they could easily find users who had expressly revealed the information in question, and use it to verify that the AI was accurately deducing the same info from style alone.

      • @imgprojts@lemmy.ml
        English
        2
        8 months ago

        They used reddit because it has corralled dumb users. Users are no longer around anywhere else on the Internet, just here on social media. And yes, what better place to find dumb users than on reddit!

    • 👍Maximum Derek👍
      English
      9
      8 months ago

      Yeah, even if I didn’t belong to a local community and a bunch of communities surrounding my profession, the amount of intrigue and fascination emanating from my comments would cause anyone to guess that I’m the Dos Equis guy.

    • @chatokun@lemmy.dbzer0.com
      English
      2
      8 months ago

      Same. I’m sure I’ve posted about my location, my job, my race, my history, my real first name, general details of my family makeup etc. I also have a pretty unique name so searching just my first and last name will find stuff about me anyway. I’m even listed by name in books (I was young and dumb and answered some questions about work life).

    • P03 Locke
      English
      2
      8 months ago

      People will just post their real name on Facebook. It’s crazy!

  • Kalash
    English
    44
    8 months ago

    You can also do that without AI. We’ve had metadata analysis for a while now.

      • lemmyvore
        English
        11
        8 months ago

        I think it’s overall a good thing if it helps laymen understand just how much privacy matters and how much can be gleaned from seemingly innocuous data online. If an “AI” label makes it hit home, cool. As long as they get it.

    • @pc486@reddthat.com
      English
      10
      8 months ago

      As is typical, this science reporting isn’t great. It’s not only that AI can do it effectively, but that it can do it at scale. To quote the paper:

      “Despite these models achieving near-expert human performance, they come at a fraction of the cost, requiring 100× less financial and 240× lower time investment than human labelers—making such privacy violations at scale possible for the first time.”

      They also demonstrate how interacting with an AI model can quickly extract more private info without looking like it is. A game of 20 questions, except you don’t realize you’re playing.

    • @phx@lemmy.ca
      English
      5
      8 months ago

      Yup, and plenty of people have no issues posting about local events or joining region/city specific groups, so it’s not exactly hard to put two and two together.

      I don’t have much issue posting about the city I grew up in or former jobs, but generally work at being fairly vague about anything current

    • @helenslunch@feddit.nl
      English
      2
      8 months ago

      Well the difference is that AI can process billions of accounts, assign those profiles to them, and use them to serve ads appropriately.

      • Kalash
        English
        7
        edit-2
        8 months ago

        That’s what facebook/google have been doing for years without AI.

  • @SatanicNotMessianic@lemmy.ml
    English
    27
    8 months ago

    Okay, I think I must absolutely be misreading this. They started with 1500 potential accounts, then picked 500 that, by hand, they could make guesses about based on people doing things like actually posting where they live or how much they make.

    And then they’re claiming their LLMs have 85% accuracy based on that subset of data? There has to be more than this. Were they 85% on the full 1500? How did they confirm that? Was it just on the 500? Then what’s the point?

    There was a study on Facebook that showed that they could predict with between 80-95% accuracy (or some crazy number like that) your gender, orientation, politics, and so on just based on your public likes. That was ten years ago at least. What is this even showing?

    • @cucumber_sandwich@lemmy.world
      English
      6
      8 months ago

      “There was a study on Facebook that showed that they could predict with between 80-95% accuracy (or some crazy number like that) your gender, orientation, politics, and so on just based on your public likes. That was ten years ago at least. What is this even showing?”

      Advocatus diaboli: that a large language model can do it without extra training, I guess. The Facebook study presented a statistical model on “like space” while this study relies on text alone, a much less structured type of input.

      I’m not saying it’s a good study. Just pointing out some differences.

    • P03 Locke
      English
      2
      8 months ago

      SnoopSnoo was able to pick out phrases from Reddit posters based on declarative statements they made in their posts, and that site has been down for years.

  • aviationeast
    English
    24
    8 months ago

    I’m just gonna put it out there that I live in the state of Georgia, I work for an office supply company as a coordinator making $153,000 a year working 30 hours a week.

  • @Infynis@midwest.social
    English
    12
    8 months ago

    My city’s subreddit did a thread a while back asking people what they were making in the area for what jobs, to try to crowd source salary transparency. So this is not very impressive lol

  • @jiberish@lemmy.world
    English
    4
    8 months ago

    Anyone can guess anything! Give it a try!

    I can guesstimate the number of turkeys it would take to fill any given space. It’s my superpower.

  • Rentlar
    English
    4
    8 months ago

    Well, if you look at the subreddits where a Redditor posts and there’s a lot of r/Seattle or Washington State then it’s not that hard to deduce.

    Although I try to leave a mild aura of mystery around my personal life, it wouldn’t be hard to snoop around a bit to find details here and there about me.