Hey everyone,

This isn’t an announcement; I just wanted people’s thoughts on this.

I think everyone knows that searching the fediverse could be better. Googling doesn’t work too well, etc. So I wanted to do my part and help out.

Indexing all posts, etc. is quite a lot to handle, so I wanted to start small and just focus on video search. I’ve started indexing videos from Peertube and other video websites. (Even YouTube, but this could be removed to just focus on independent sites.)

I know Peertube has their own search engine for videos. I will be reaching out to them. Compared to theirs, I’m planning for my site to have other video sources and be easier to use.

So that leads to feedback from you guys.

  • What do you think about indexing videos posted on the fediverse and other independent platforms?
  • Are there similar services?
  • Am I just wasting my time?
    • MHLoppy
      11
      6 months ago

      It’s worth noting that since FedSearch, Mastodon has actually natively implemented opt-in search on posts.

    • @lautan@lemmy.ca (OP)
      9
      6 months ago

      That’s a good point. But those people can be banned? I guess Reddit handles this by moderation and archiving old posts.

      • JustEnoughDucks
        4
        6 months ago

        Yes, but moderation teams on the fediverse are very small, and by the nature of it, trolls can make hundreds of accounts on different servers, all trolling, that would need to be individually sought out and banned.

        It is a game of cat & 100 mice

      • gabe [he/him]A
        1
        6 months ago

        People will take the harassment off site especially if they are dedicated enough or use it to scrape for potential personal info to publicly release.

        • deweydecibel
          11
          6 months ago

          How is that different from Reddit? If trolls want to search and scrape and find information on people, they’re going to. You can’t put your information on the open Internet and not appreciate there’s always a danger of that.

          • TimLovesTech (AuDHD)(he/him)
            -2
            6 months ago

            There is a higher effort barrier if the trolls have to do all the scraping and sorting themselves than if they can just pop a term that is a right wing lightning rod into search and get a list of targets.

    • @ggsu7@futurology.today
      -11
      6 months ago

      >muh trolls

      God shut up please. Why do you have to ruin something amazing like searching the entire fediverse with meaningless arguments about muh trolls.

      • gabe [he/him]A
        3
        edited
        6 months ago

        But they are correct. There are vulnerable groups of people who have a harassment risk against them. We share the fediverse with others, be mindful of that. Making a search engine or an archiver for lemmy is such a good idea with how it functions! But for the wider fediverse… that’s just directly contradictory to its culture unless it can be opted into by instance and users.

        • @rglullis@communick.news
          10
          6 months ago

          > There are vulnerable groups of people who have a harassment risk against them.

          People that are at risk for what they write on the public internet should be protected and empowered by having better privacy tools, not by pretending that they can have a “safe space” on the public internet.

          There is no such thing as privacy on the internet. The Fediverse makes it seem that it mitigates the surveillance problem by spreading the information around and not having it under the control of one single large entity, but the truth is that the Fediverse makes it actually easier for dedicated malicious actors to collect data and reach their targets.

          • @smeg@feddit.uk
            3
            6 months ago

            Exactly. If you’re worried about someone searching for your post then you should not be posting it online (or at least ensuring you use an account that’s anonymous enough that it can’t be associated with you). If you want private chats then set up a group in matrix or signal!

        • @ggsu7@futurology.today
          -15
          6 months ago

          Unpopular opinion: overly sensitive people should not be allowed to use the internet. Why should everything revolve around their insecurities? Grow a thicker skin or stop using the internet.

          • gabe [he/him]A
            8
            edited
            6 months ago

            There are instances on the fediverse that have harassed people to their deaths before. So…

          • @kaffiene@lemmy.world
            6
            6 months ago

            If people have set up communities that work with standards that you don’t like, you don’t have to be part of it. Equally you don’t get to dictate standards for them.

            • @Womble@lemmy.world
              1
              6 months ago

              But equally equally, if they set up their own communities in public but just in an obscure location, they shouldn’t complain that their public posts are public. Security by obscurity is no security. Frankly it’s the worst of all worlds to have a place like that, as it encourages feeling safe while having the possibility of having the rug pulled out from under you at any moment.

          • @spiderman@ani.social
            3
            6 months ago

            i don’t support this statement but if there is a safe space on some part of fediverse that is dedicated for people who suffer from insecurities or other problems, why can’t it be indexed so that more people can find a safe space for them too?

            most of the fediverse platforms have good moderation tools that can even ban a whole instance, so why are we still trying to gatekeep the safe spaces?

            • TimLovesTech (AuDHD)(he/him)
              2
              6 months ago

              And once you have this index of a vulnerable group of people how do you gatekeep that from the trolls? And moderation tools only work on this platform, the problem is the trolls that take that info to everything these people interact with and make it a game to give them no peace.

      • @spiderman@ani.social
        1
        edited
        6 months ago

        yes, anything you post on the internet can be indexed. if someone wants to post something in their little private garden there are options for that too. the fediverse has potential to grow, and if we try to stop everything that could help it grow with “no, only trolls will use it”, after some point nobody except the people who complain will use it. do you want the fediverse to be your own little echo chamber?

  • gabe [he/him]A
    26
    edited
    6 months ago

    Well, please make sure it respects post privacy at least, but also realize that on the microblogging side of the fediverse, they may not take kindly to this prospect at all. People who start these kinds of projects are often harassed or at least receive passive hostility. Making it opt in instead of opt out in some capacity is best.

    • Scrubbles
      39
      edited
      6 months ago

      I disagree. Post privacy sure, but the internet is by definition public. Anything you put out there can be used for pretty much everything, the original rules of the internet apply. I’d be happy to see an easy opt out on the engine to remove yourself, but if everything is opt in it’ll never get off the ground.

      • gabe [he/him]A
        8
        6 months ago

        That’s not how the fediverse functions and approaching it that way is a problem waiting to happen. I’m stating so as a warning to be mindful of the culture of the way the fediverse itself functions. This is not Reddit, we share the fediverse with other software with different uses and features and we need to be mindful of that especially when building these kinds of tools. Making it opt out not only places a burden on smaller instances but presents a potential harassment risk for instances with vulnerable people on other fediverse platforms. As well, it is contrary to the entire way specific other activitypub instances operate. The fediverse is like a city we share with others, if Lemmy is not mindful of that city’s culture then people will promptly give them the boot.

        I’m not saying user by user opt in either, but instance by instance. Lemmy needs an archiving tool especially. There are already cultural clashes I see occurring with the rest of the fediverse. Posts like these about potential tools, when it seems like the creator doesn’t know the messy history behind previous projects like them in the fediverse, make me fearful of the clashes coming to fruition.

        • @lautan@lemmy.ca (OP)
          13
          6 months ago

          Well that’s why I’m asking for input. And I won’t launch this on every instance without letting them know. Baby steps.

          • gabe [he/him]A
            2
            edited
            6 months ago

            My matrix is open if you want/are actually interested in doing this in a way that won’t make the rest of the fediverse flip shit. I support this tool’s creation, especially for lemmy, but if it isn’t done the right way it’ll be received poorly. Making it behave differently on lemmy compared to other software might be an idea too.

        • Scrubbles
          6
          edited
          6 months ago

          But ActivityPub already publishes all of the data out. I don’t think this is going out to servers asking for data, it’s listening to public data being broadcast out. If people are broadcasting over activitypub then they’re okay with it being shared.

          If they don’t want it shared then they don’t have to publish ActivityPub to anyone. They can defederate from the search federation. Those tools already exist.
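
A minimal sketch of the "only listen to public data" idea, assuming the indexer receives ActivityPub activities as already-parsed JSON dictionaries. In ActivityPub, public posts are addressed to the special collection `https://www.w3.org/ns/activitystreams#Public`; the function name and the decision to check only `to` and `cc` are assumptions for illustration, and a real receiver would also need to verify where each activity came from.

```python
from typing import Any

# The special "Public" collection defined by ActivityPub, plus its compact forms.
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
PUBLIC_ALIASES = {AS_PUBLIC, "as:Public", "Public"}


def _as_list(value: Any) -> list:
    """Addressing fields may hold a single value or a list; normalise to a list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def is_publicly_addressed(activity: dict) -> bool:
    """Return True if the activity lists the Public collection in `to` or `cc`.

    Hypothetical helper: anything not explicitly addressed to Public
    (followers-only posts, direct messages) is treated as not indexable.
    """
    for field in ("to", "cc"):
        for recipient in _as_list(activity.get(field)):
            if isinstance(recipient, str) and recipient in PUBLIC_ALIASES:
                return True
    return False
```

With this check, a followers-only post (addressed only to a followers collection) would be skipped, while a post that lists the Public collection in `to` or `cc` would be eligible for indexing.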

        • Scrubbles
          2
          6 months ago

          Again, it’s not going to servers and scraping data; it would be sitting somewhere receiving public data that is pushed out. There’s no malicious getting around privacy settings; if it’s pushed out then it’s fair game. I agree about post privacy, but again, activitypub already takes care of that.

      • TimLovesTech (AuDHD)(he/him)
        3
        6 months ago

        As the fediverse is almost exclusively run by volunteers that are paying server bills and being admins, I could see some larger instances not taking kindly to this, especially depending on how much stress it would be putting on some already at capacity servers.

          • TimLovesTech (AuDHD)(he/him)
            2
            6 months ago

            I was thinking more in terms of resources (number of spider threads X posts/communities/users being indexed) that would be now dedicated to a bot, not so much network traffic that is probably tiny if not downloading images.

            • @TrickDacy@lemmy.world
              1
              6 months ago

              Right, it would be an initial hit but if the bot was properly built it wouldn’t need to do full reindexing very often. I’m no expert but I think it could be done in a way that there is no noticeable spike in traffic or anything

              • TimLovesTech (AuDHD)(he/him)
                1
                6 months ago

                That’s the thing, it would need to be done in chunks and have its revisits scheduled if you want to do a complete indexing of an instance. And for a large instance that’s a lot of DB thrashing if you aren’t spacing that out, or just sampling like “top 10 posts” or something, but that kind of data is going to make a useless search engine depending on the goal of the search engine. If you wanted to just catalog the daily top posts of the fediverse that might work, but if you want to catalog everything it’s going to take a lot of resources and a long time to make sure you’re not hammering people’s servers.
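
A rough sketch of the "chunks, spaced out" approach described above, assuming each instance exposes its content as numbered pages. It only plans when each page would be fetched and performs no network requests; `plan_crawl`, `Fetch`, and all the parameters are hypothetical names, and sensible delay values would depend on the instance.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Fetch:
    at: float       # seconds after the crawl starts
    instance: str   # hostname of the instance
    page: int       # 1-based page number


def plan_crawl(pages_per_instance: dict[str, int], delay_s: float,
               chunk: int, pause_s: float) -> list[Fetch]:
    """Plan page fetches so that no single instance gets hammered.

    Each instance has its own clock: `delay_s` seconds between requests,
    plus an extra `pause_s` rest after every `chunk` pages.
    """
    plan: list[Fetch] = []
    for instance, pages in pages_per_instance.items():
        t = 0.0
        for page in range(1, pages + 1):
            plan.append(Fetch(t, instance, page))
            t += delay_s
            if page % chunk == 0:
                t += pause_s  # rest between chunks
    # Interleave instances by time so the crawler can work through the plan in order.
    plan.sort(key=lambda f: (f.at, f.instance, f.page))
    return plan
```

For example, five pages with a 2 second delay, chunks of two pages, and a 10 second pause would be fetched at 0, 2, 14, 16 and 28 seconds. The trade-off is the one raised above: the gentler the schedule, the longer a full index of a large instance takes.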

  • @Aopen@discuss.tchncs.de
    18
    6 months ago

    Why wouldn’t people want to have a search engine? Without it the Fediverse stands no chance against the non-free internet. Everything posted here would be much more valuable if it was searchable. Right now a comment, once posted, is viewed only until the post gets less popular. Any other site of this kind displays answers that are decades old. Privacy isn’t an issue, as everything posted here is available to everyone on the internet.

    • @lautan@lemmy.ca (OP)
      4
      6 months ago

      I mean they are posting on the public internet, they should know that it can be read by anyone. I like the idea of users opting out.

      • @KISSmyOS@lemmy.world
        12
        6 months ago

        If you give users the choice to opt out, all the privacy-focused communities won’t be searchable.
        What if someone who opts out posts a comment and someone who opts in answers?
        The Fediverse is public, a search engine doesn’t show anything that isn’t already open for anyone to see.

        • @lautan@lemmy.ca (OP)
          8
          6 months ago

          Good point. They should know they are making public comments. If you want it private then send a private message.

  • @stockRot@lemmy.world
    10
    6 months ago

    You should federate the search engine so that folks can defed from the search as desired.

    But then we would need a search engine for all the search engines…

    • @KISSmyOS@lemmy.world
      17
      edited
      6 months ago

      Everything you post on the Fediverse is public.
      If you don’t want to show up in an internet search, post your stuff on your private server and only give access to the people you want to invite.

    • @lautan@lemmy.ca (OP)
      5
      6 months ago

      It’s limited to only Peertube and it’s not the most intuitive. I want to work with them on expanding this.

  • @Valmond@lemmy.mindoki.com
    8
    6 months ago

    I love the idea, especially from a technical standpoint!

    How big is the fediverse today? How many posts are there? What kind of algorithms are you using to store the results? Do you scan sites and then their connected sites, or do you have a premade list?

    More technical information please 😊!

    • @lautan@lemmy.ca (OP)
      3
      6 months ago

      The fediverse is a few thousand servers, from Mastodon, Lemmy, etc. Can’t say the amount of posts but there are a lot.

      So on the more technical side, I plan on using a lightweight, fast search engine called Sonic (it’s written in Rust). I have already used it in other projects and it can handle billions of messages / posts. But it has a cost: it doesn’t have faceted search, for example if you want to exclude certain texts from the results. I think this is a fair trade-off. The other solution would be to use something more mature like ElasticSearch, but it’ll be expensive (I’m assuming not much money will be made from this, and I’m talking about donations).

      For scanning sites there are premade lists to start with and it’ll be possible to scan new sites from other instances if found. So a bit of both.
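
The "bit of both" approach above (a premade seed list plus discovery through other instances) can be sketched as a breadth-first walk. This is only an illustration: `discover_instances` and `get_peers` are hypothetical names, and how a crawler actually learns an instance's peers is left as an injected function rather than assumed.

```python
from collections import deque
from typing import Callable, Iterable


def discover_instances(seeds: Iterable[str],
                       get_peers: Callable[[str], Iterable[str]],
                       max_instances: int = 1000) -> list[str]:
    """Breadth-first discovery: start from the seed list, then follow peers.

    `get_peers(host)` is an assumed callback returning the hostnames that
    `host` knows about. Hostnames are lowercased and visited at most once.
    """
    seen: set[str] = set()
    queue: deque[str] = deque()
    for host in seeds:
        host = host.lower()
        if host not in seen:
            seen.add(host)
            queue.append(host)

    visited: list[str] = []
    while queue and len(visited) < max_instances:
        host = queue.popleft()
        visited.append(host)
        for peer in get_peers(host):
            peer = peer.lower()
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return visited
```

The `max_instances` cap is there so a first run stays small; an opt-in or opt-out list could be applied to `visited` before anything is actually indexed.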

  • @Sensitivezombie@lemmy.zip
    4
    6 months ago

    I support the bigger picture. Rather than an independent site, wouldn’t it be more practical to work with the current fediverse app developers for lemmy, mastodon, etc. to integrate a search engine within the app?

    • @lautan@lemmy.ca (OP)
      2
      6 months ago

      I’m reaching out to see their thoughts. But there are limitations to what they can index.

  • @JoBo@feddit.uk
    1
    6 months ago

    I don’t know anything about the technical side of this. But I would (possibly naively) think that it would be simpler to have a filter that you could automatically apply to sift bog-standard search engine results for Fediverse instances? Like adding “site:uk” to the end of a normal search, except that your filter term would check a list of Fediverse instances to return the relevant results.

    And make it an app/add-on so that people can use it with their usual search strategies.
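
A small sketch of the filter idea above, assuming an add-on that already has a list of Fediverse instance hostnames. Both helpers are hypothetical: `site_query` builds a query using the `site:` operator with `OR` (an assumption about what the underlying search engine accepts, and long instance lists would likely hit query length limits), while `filter_to_instances` filters result URLs after the fact.

```python
from urllib.parse import urlsplit


def site_query(terms: str, instances: list[str]) -> str:
    """Append a `site:` filter for each instance to an ordinary search query."""
    sites = " OR ".join(f"site:{host}" for host in instances)
    return f"{terms} ({sites})"


def filter_to_instances(result_urls: list[str], instances: list[str]) -> list[str]:
    """Keep only the result URLs whose hostname is a known instance."""
    allowed = {host.lower() for host in instances}
    kept = []
    for url in result_urls:
        host = (urlsplit(url).hostname or "").lower()
        if host in allowed:
            kept.append(url)
    return kept
```

For instance, `site_query("peertube tutorial", ["feddit.uk", "lemmy.world"])` yields `peertube tutorial (site:feddit.uk OR site:lemmy.world)`. Post-filtering avoids the query length problem but only sees whatever the general-purpose engine chose to return.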

    • @lautan@lemmy.ca (OP)
      1
      6 months ago

      People have been doing that, but it comes with limitations. Thanks for the input.

  • Cyber Yuki
    -5
    6 months ago

    Just, for the love of god, make sure to make it opt-in. Don’t scan people’s posts without their consent.

    Also, if you’re going with this, make sure to respect people’s requests for removal, e.g. deleting their posts from the engine when they delete them from their instances. Otherwise you’d get in trouble with the EU regarding GDPR…
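
The two requests above (opt-in indexing and honouring deletions) could look roughly like this in a toy in-memory index. Everything here is a hypothetical sketch: the class, its method names, and the choice of instance-level opt-in are assumptions, and a real engine would persist data and react to deletion notices sent by the origin instance.

```python
class OptInIndex:
    """Toy index that only stores posts from instances that opted in."""

    def __init__(self, opted_in_hosts):
        self.opted_in = {host.lower() for host in opted_in_hosts}
        self.posts = {}  # post_id -> text

    def add(self, post_id: str, host: str, text: str) -> bool:
        """Index a post only if its instance opted in; report whether it was stored."""
        if host.lower() not in self.opted_in:
            return False
        self.posts[post_id] = text
        return True

    def delete(self, post_id: str) -> None:
        """Drop a post when its author or instance deletes it (no error if unknown)."""
        self.posts.pop(post_id, None)

    def search(self, term: str) -> list:
        """Return the ids of stored posts containing `term`, case-insensitively."""
        needle = term.lower()
        return sorted(pid for pid, text in self.posts.items() if needle in text.lower())
```

Once `delete` has been called for a post, it no longer appears in `search` results, which is the behaviour the removal request asks for.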