Is anybody aware of any self-hosted alternatives to Parrot.ai or Otter.ai? I’ve tried these services and I’m finding them very useful, but the price tag is a little steep. It seems like something that the open source community could solve. Does anybody know of any projects, either existing or upcoming? Thanks!
I didn’t know about those services either! What are the “monthly recording hours”?
P.S. Since you weren’t rude, why didn’t the people who downvoted you just explain their thinking? I’ll never understand those people!!! Aren’t we here to share?
The free version of Otter.ai limits you to 30 minutes per conversation, 300 total monthly transcription minutes. “Pro” moves you up to 90 minutes per conversation with 1200 total minutes for $8.33/month (billed annually). “Business” is $20/month with 4 hours per conversation and 6000 total minutes. They have an “Enterprise” version, but it is one of those “call for a quote” things.
The Pro is somewhat reasonably priced, but the 90 minutes per meeting limit is a wall I would bounce up against pretty often. Hard to justify the $20/month for me when a couple years of service is about the same price as the GPU I’ve been wanting anyway. Plus, the GPU would be a business expense now, right? :)
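To put rough numbers on that comparison, here is a back-of-the-envelope sketch in Python. The plan prices and minute limits are the ones quoted above; the 24-month horizon is an assumption standing in for “a couple years”, and no GPU price is assumed.

```python
# Back-of-the-envelope cost comparison using the Otter.ai plan figures
# quoted in this thread. The 24-month horizon is an assumption
# ("a couple years"); nothing here talks to any Otter.ai API.

PLANS = {
    # name: (USD per month, minutes per conversation, total minutes per month)
    "Pro": (8.33, 90, 1200),
    "Business": (20.00, 240, 6000),
}


def total_cost(monthly_usd: float, months: int = 24) -> float:
    """Total subscription cost over the given number of months."""
    return round(monthly_usd * months, 2)


def cost_per_minute(monthly_usd: float, minutes_per_month: int) -> float:
    """Cost per transcription minute if the whole monthly quota is used."""
    return monthly_usd / minutes_per_month


if __name__ == "__main__":
    for name, (price, per_call, per_month) in PLANS.items():
        print(
            f"{name}: ${total_cost(price):.2f} over 24 months, "
            f"${cost_per_minute(price, per_month):.4f} per minute at full use, "
            f"{per_call} min per conversation"
        )
```

Two years of “Business” works out to $480, which is the figure being weighed against the GPU.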
Didn’t know something like this existed at all! (so no idea if there are alternatives)
I’m super curious about your experiences and use case though! Care to share some insights?
Sure!
For work I attend a lot of meetings, both in person and online. The service takes a recording of the meeting/phone call/etc, transcribes it, identifies the people who were talking and then feeds it into a “ChatGPT” style AI. It then gives meeting notes automatically and lists action items assigned to each attendee along with other pertinent information, like due dates. You can also continue to “chat” with the AI regarding anything to do with the meeting. I will often ask it to expound on various topics, write emails to participants following up on items, or give me pertinent information that was shared, like emails, phone numbers, etc. You are also able to go back and listen to the meeting along with the transcription. If it was a video meeting, it records the video so you can see what was being presented at the same time. (I think there’s some opportunity for OCRing PowerPoint slides too, but these services aren’t doing that yet.)
One specific example was a conversation I had with a customer regarding another company we worked with mutually. The customer went into great detail about their issue with the other company and asked if I could write an email to that company to try and help solve their problem. I fed the recording of the phone call into the AI and simply told it to “write the email referenced in the conversation” and it wrote out a pretty good email with a lot of detail that was shared by the customer in it. A couple of tweaks and I was able to copy and paste it right into my email software and send it.
There are some other features the software has that I personally don’t find as useful, like automatic sharing of meeting minutes/notes. My two biggest issues with these services are, first, that they are charging somewhere in the neighborhood of $20 US per month for an amount of “minutes” of meetings. Second, they are taking all of your meeting data and doing who knows what with it. They do meet all the European Union and California privacy standards according to their site, but we’re all here on a decentralized self-hostable community, so I probably don’t need to expand on my issues there :)
Even if there was just a good “ChatGPT” style AI I could self-host, I could probably transcribe the recordings somehow myself.
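The workflow described above (speaker-labelled transcript in, notes or an email out) mostly comes down to how the prompt is put together. Here is a minimal sketch in Python, assuming the transcript is already available as speaker/text pairs. The `Utterance` type, the `build_prompt` function, the prompt wording, and the sample conversation are all made up for illustration; this is not how Otter.ai or Parrot.ai actually do it, and the resulting string would still have to be sent to whichever self-hosted model you end up using.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # label from whatever speaker-identification step you use
    text: str


def build_prompt(utterances: list[Utterance], instruction: str) -> str:
    """Combine a speaker-labelled transcript and an instruction into one prompt."""
    transcript = "\n".join(f"{u.speaker}: {u.text}" for u in utterances)
    return (
        "Below is a meeting transcript.\n\n"
        f"{transcript}\n\n"
        f"Task: {instruction}\n"
    )


if __name__ == "__main__":
    # Invented two-line conversation, only to show the output shape.
    call = [
        Utterance("Customer", "Could you email them about the delayed order?"),
        Utterance("Me", "Sure, I will write to them today."),
    ]
    print(build_prompt(call, "Write the email referenced in the conversation."))
```

Keeping the instruction separate from the transcript means the same recording can be reused for meeting notes, action items, or follow-up emails just by changing the last line.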
Ah okay, wouldn’t have thought that it would be reliable enough to pull this off.
In that case, you might want to look at some kind of knowledge base AI, like danswer. There are others, which might be better suited, but I can’t seem to find them right now.
Wow, I’ve never heard of Danswer, it seems widely used, thanks!!!
Thanks for the heads up on Danswer!
You can run a transcription model and a language model (the AI you talk to) locally. However, you will need a beefy GPU, especially if you want to run the large models for better results.
OpenAI’s Whisper is open source and does transcription, and you can run inference on language models like LLaMA (+variants) or GPT4All locally. To store information long term (“AI memory”) you could find an open source vector database, but I don’t have experience with this.
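For the transcription half, a minimal sketch in Python using the open source `openai-whisper` package might look like the following. It assumes the package and `ffmpeg` are installed; the `"base"` model size, the helper names, and the file name in the usage comment are placeholders chosen for illustration. Whisper on its own does not label speakers, so that part of the commercial services would need a separate tool.

```python
def transcribe(audio_path: str, model_name: str = "base") -> list[dict]:
    """Transcribe an audio file with Whisper and return its timed segments."""
    # Imported lazily so the formatting helpers below work without the package.
    import whisper  # pip install openai-whisper (also needs ffmpeg on PATH)

    model = whisper.load_model(model_name)  # larger models: slower, more accurate
    result = model.transcribe(audio_path)   # dict with "text" and "segments"
    return result["segments"]


def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as HH:MM:SS."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def format_segments(segments: list[dict]) -> str:
    """Turn Whisper segments into one timestamped line per segment."""
    return "\n".join(
        f"[{format_timestamp(s['start'])}] {s['text'].strip()}" for s in segments
    )


# Usage (hypothetical file name):
#   print(format_segments(transcribe("meeting.mp3")))
```

The timestamped text is what you would then hand to the local language model, along with an instruction such as “list the action items”.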
Thank you! I’ll see if I can string together a few things to come up with my own homebrew version of these services. Honestly, for what they’re charging I think I can justify a new dedicated GPU. I’ve got a few other dockers/services which could take advantage of it anyway, so maybe this is the excuse I’ve been needing to pull the trigger on that purchase.
LLaMA-2 was just released and the fine-tunings people have made of it are topping the leaderboards right now in terms of performance for an open source language model. As for inference, don’t forget to look into quantization so you can run larger models on limited VRAM. For actually running the models, I’ve heard about vLLM and llama.cpp and its derivatives.
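To see why quantization matters, here is a rough estimate in Python of how much memory the weights alone need at different bit widths. It is only a sketch: it ignores the context cache, activations, and per-format overhead, so real usage will be higher, and the 2 GB headroom and the 12 GB card in the example are assumptions picked for illustration.

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


def fits(n_params_billion: float, bits_per_weight: int,
         vram_gb: float, headroom_gb: float = 2.0) -> bool:
    """Crude check: weights plus an assumed headroom must fit in VRAM."""
    return weight_memory_gb(n_params_billion, bits_per_weight) + headroom_gb <= vram_gb


if __name__ == "__main__":
    vram_gb = 12.0  # assumed card size, for illustration only
    for size in (7, 13, 70):   # the parameter counts LLaMA-2 was released in
        for bits in (16, 8, 4):
            gb = weight_memory_gb(size, bits)
            verdict = "fits" if fits(size, bits, vram_gb) else "too big"
            print(f"{size}B at {bits}-bit: ~{gb:.1f} GB of weights, {verdict}")
```

Going from 16-bit to 4-bit cuts the weight footprint to a quarter, which is the difference between a 13B model not fitting and fitting on a mid-range card under these assumptions.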
If you’re looking for a GPU around $300, I heard a used 3060 is better value than a 4060 right now on performance and memory throughput, but not on power efficiency (if you want an easy time with ML, unfortunately the only option is NVIDIA).
Good luck! Would be nice to get an update if you find a good solution, it seems others could share your use case.
Thanks for the tip on the GPU! I live in an area where power is relatively cheap, so I’ll probably go for the 3060. I really wish some of these would work better with AMD since their drivers seem to be more Linux-friendly these days.
If I get something going, I’ll share for sure!