What are your favorite models so far?

AsAnAILanguageModel@sh.itjust.works · edit-2 2 years ago

What are your favorite models so far?

noneabove1182@sh.itjust.works · 2 years ago

I’ve been impressed with Vicuna 1.5, seems quite competent and enjoyable. Unfortunately I’m only able to do 13B at any reasonable speed so that’s where I tend to stay, though funny enough I haven’t tried any 70Bs since llama.cpp added support, I’ll have to start some downloads…

rufus@discuss.tchncs.de · edit-2 2 years ago

I liked gpt4-x-alpaca for quite some time, mainly to talk to and to generate dialogue. Nowadays I’m not so sure. There are many promising ones out there; I’m currently trying a few llama2-based ones and switching models every few days.

Naked_Yoga@sh.itjust.works · 2 years ago

I really enjoy Wizard Coder for coding tasks. I use this specifically at work for code review and unit test writing. I also just generally like the Wizard LM.

Outside of that, I’ll say that I haven’t had a great experience with llama-2 yet, no matter how hard I’ve tried 🤣

micheal65536@lemmy.micheal65536.duckdns.org · 2 years ago

I’ve primarily used WizardLM as well but I’ve found that it tends to constantly try to follow the same format for every answer:

Not only is this repetitive, boring, and belittling to converse with, but it means that the model often won’t directly answer a question or give an actual argument/justification for something. It feels vaguely like it’s refusing to commit to a side and telling me off for trying to talk in absolutes rather than actually giving an answer.

Additionally, in cases where there isn’t a counterargument to be made, it will make up nonsense to fill the counterargument section. e.g. “Explain your reasoning for the above answer” tends to result in:

<“You can arrive at the above answer by doing …” followed by mostly sensible reasoning>

<“Alternatively, you could do …” followed by either a made up illogical reasoning or the exact same reasoning as before presented as if it was a different thing>

When I can get it to break out of this pattern, e.g. following the “thought action observation” loop script, it seems to perform marginally better than other models that I have tried.

Kerfuffle@sh.itjust.works · edit-2 2 years ago

Another one that made a good impression on me is Qwen-7B-Chat

Bit off-topic, but if I’m looking at this correctly, it uses a custom architecture which requires turning on trust_remote_code, and the code that would be embedded into the models and trusted is not included in the repo. In fact, there’s no real code in the repo: it’s just a bit of boilerplate to run inference and tests. If so, that’s kind of spooky and I suggest being careful not to run inference on those models outside of a locked-down environment like a container.
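For anyone who does decide to turn on trust_remote_code, here is a minimal sketch of one way to narrow what gets trusted: pin the download to a single commit that you have read, so the code cannot change between your review and the load. The Hub id `Qwen/Qwen-7B-Chat` and the helper names `remote_code_kwargs` and `load_pinned` are assumptions for illustration, not something from the repo. `revision` and `trust_remote_code` are existing `from_pretrained` parameters in Transformers. This does not replace running in a container.

```python
import re

# Assumed Hub id for the model discussed above; adjust to the repo you reviewed.
MODEL_ID = "Qwen/Qwen-7B-Chat"

_COMMIT_RE = re.compile(r"[0-9a-f]{40}")


def remote_code_kwargs(revision: str) -> dict:
    """Build from_pretrained kwargs that opt in to remote code at one fixed commit.

    Hypothetical helper: it only refuses branch or tag names, which can move.
    """
    if not _COMMIT_RE.fullmatch(revision):
        raise ValueError("pin a full 40-character commit hash, not a branch or tag")
    return {"trust_remote_code": True, "revision": revision}


def load_pinned(revision: str, model_id: str = MODEL_ID):
    """Load tokenizer and model at the pinned commit (needs transformers installed)."""
    # Imported lazily so remote_code_kwargs works without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = remote_code_kwargs(revision)
    tokenizer = AutoTokenizer.from_pretrained(model_id, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
    return tokenizer, model
```

Calling `load_pinned("main")` raises a `ValueError` instead of silently following whatever the branch points to today.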

AsAnAILanguageModel@sh.itjust.works · 2 years ago

I think that’s a very relevant comment, and I also got spooked by this before I ran it. But I noticed that the GitHub repo and the huggingface repo aren’t the same. You can find the remote code in the huggingface repo. I also briefly skimmed the code for potential causes of the memory leak, but it’s not clear to me what’s causing it. It could also be PyTorch or one of the huggingface libraries, since mps support is still very beta.

Kerfuffle@sh.itjust.works · 2 years ago

> You can find the remote code in the huggingface repo.

Ahh, interesting.

I mean, it’s published by a fairly reputable organization so the chances of a problem are fairly low but I’m not sure there’s any guarantee that the compiled Python in the pickle matches the source files there. I wrote my own pickle interpreter a while back and it’s an insane file format. I think it would be nearly impossible to verify something like that. Loading a pickle file with the safety stuff disabled is basically the same as running a .pyc file: it can do anything a Python script can.
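To make the "it can do anything a Python script can" point concrete, here is a small standard-library sketch. `Payload`, `make_payload`, `NoGlobalsUnpickler` and `safe_loads` are made-up names for illustration; the payload only calls `print`, but any importable callable could stand in its place. The restricted unpickler follows the `find_class` override pattern from the Python `pickle` documentation.

```python
import io
import pickle


class Payload:
    """Stand-in for a hostile object; it is harmless here."""

    def __reduce__(self):
        # On load, pickle imports the named callable and calls it with these args.
        return (print, ("code ran during unpickling",))


def make_payload() -> bytes:
    """Serialize the payload; the bytes only name builtins.print and its argument."""
    return pickle.dumps(Payload())


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import anything, so only plain data can load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global {module}.{name}")


def safe_loads(data: bytes):
    """Load a pickle made of basic containers and scalars, rejecting everything else."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

`pickle.loads(make_payload())` prints the message without any code from this file being involved at load time, while `safe_loads(make_payload())` raises `pickle.UnpicklingError`. The restricted loader is far too strict for real model checkpoints, which need some globals, so it illustrates the risk rather than providing a way to load them safely.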

So I think my caution still applies.

> It could also be PyTorch or one of the huggingface libraries, since mps support is still very beta.

From their description here: https://github.com/QwenLM/Qwen-7B/blob/main/tech_memo.md#model

It doesn’t seem like anything super crazy is going on. I doubt the issue would be in Transformers or PyTorch.

I’m not completely sure what you mean by “MPS”.

AsAnAILanguageModel@sh.itjust.works · 2 years ago

By MPS I mean “Metal Performance Shaders”; it’s the backend that lets PyTorch use Apple’s Metal API for Apple-silicon-specific optimizations. I actually think it’s not unlikely that the issue is with PyTorch. The MPS support is still in beta, and there was a bug that caused a lot of models to output gibberish when I used it. This bug was an open issue for a year and they only just fixed it in a recent nightly release, which is why I even bothered to give this model a try.

That being said, I think one should generally be cautious about what they run on their computers, so I appreciate that you started this discussion.

Kerfuffle@sh.itjust.works · 2 years ago

Ah, I see. Wouldn’t it be pretty easy to determine if MPS is actually the issue by trying to run the model with the non-MPS PyTorch version? Since it’s a 7B model, CPU inference should be reasonably fast. If you still get the memory leak, then you’ll know it’s not MPS at fault.
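A rough sketch of that A/B test, kept backend-agnostic: run the same generation step repeatedly and see whether a memory reading keeps climbing. `memory_growth`, `step` and `read_memory` are hypothetical names; `step` would be one inference call with the model on "cpu" or "mps", and `read_memory` would be whatever counter you trust (process RSS, or `torch.mps.current_allocated_memory()` if your PyTorch build has it).

```python
from typing import Callable


def memory_growth(
    step: Callable[[], object],
    read_memory: Callable[[], int],
    iterations: int = 20,
    warmup: int = 3,
) -> int:
    """Return how much the memory reading changed over `iterations` calls of `step`.

    Warm-up calls run first so one-time allocations (caches, lazy init)
    are not mistaken for a leak.
    """
    for _ in range(warmup):
        step()
    start = read_memory()
    for _ in range(iterations):
        step()
    return read_memory() - start
```

If the number keeps scaling with `iterations` on MPS but stays near zero on CPU, the backend is the likely suspect; if it grows on both, the model code or the way it is called is the better place to look.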

AsAnAILanguageModel@sh.itjust.works · 2 years ago

Without mps it uses a lot more memory, because fp16 is not supported on the cpu backend. However, I tried it and noticed that there was an update pushed to the repository that split the model into several parts. It seems like I’m not getting any memory leaks now, even with mps as backend. Not sure why, but maybe it needs less RAM if the weights can be converted part by part. Time to test this model more I guess!
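Back-of-the-envelope numbers for the fp16 versus fp32 difference, counting weights only and assuming roughly 7 billion parameters (the exact count for this model may differ, and activations and caches come on top):

```python
GIB = 2**30  # bytes per GiB


def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / GIB


N_PARAMS = 7e9  # assumption: a "7B" model has about 7 billion parameters

fp16 = weight_memory_gib(N_PARAMS, 2)  # 2 bytes per parameter
fp32 = weight_memory_gib(N_PARAMS, 4)  # 4 bytes per parameter
print(f"fp16: {fp16:.1f} GiB, fp32: {fp32:.1f} GiB")
# prints "fp16: 13.0 GiB, fp32: 26.1 GiB"
```

So falling back to fp32 roughly doubles the footprint of the weights, which is a lot on a machine with unified memory.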
