

What I am talking about is when layers are split across GPUs. I guess this is loading the full model into each GPU to parallelize layers and do batching
Agreed.
Because they were banned.
I didn’t make this claim, no
Because they would just be inactive rather than banned? You’re the one claiming that bans have occurred, which would be a major censorship issue for lemmy…
Can you try setting num_ctx and num_predict using a Modelfile with ollama? https://github.com/ollama/ollama/blob/main/docs/modelfile.md#parameter
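To make that concrete, here is a rough sketch of what such a Modelfile could look like, generated from Python so the values are easy to tweak. The FROM and PARAMETER lines follow the Modelfile docs linked above; the model name and the numbers are placeholders I made up, not recommendations.

```python
def render_modelfile(base_model: str, num_ctx: int, num_predict: int) -> str:
    """Return Modelfile text that sets the context window and the max tokens to generate."""
    lines = [
        f"FROM {base_model}",                   # base model to derive from
        f"PARAMETER num_ctx {num_ctx}",         # context window size in tokens
        f"PARAMETER num_predict {num_predict}", # cap on generated tokens
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "deepseek-r1:7b", 8192 and 1024 are illustrative placeholders only.
    print(render_modelfile("deepseek-r1:7b", 8192, 1024), end="")
```

Save the output as Modelfile and build it with something like ollama create my-model -f Modelfile (my-model is just an example name).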
Are you using a tiny model (1.5B-7B parameters)? ollama pulls a 4-bit quant by default. It looks like vllm does not use quantized models by default, so this is likely the difference. Tiny models are impacted more by quantization.
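As a toy illustration of why a quantized model can answer differently from the full-precision one, the sketch below rounds a few weights to 4 and 8 bits and measures how far they drift. It assumes plain symmetric uniform quantization with a single scale and made-up weights; real 4-bit formats are more sophisticated, and this does not by itself show that small models are hit harder.

```python
def quantize_roundtrip(weights, bits):
    """Symmetric uniform quantization: map to integer levels, round, map back.

    Assumes at least one weight is non-zero (otherwise the scale would be 0).
    """
    levels = 2 ** (bits - 1) - 1                  # e.g. 7 levels per sign for 4-bit
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]


def max_error(weights, bits):
    """Largest absolute difference between original and round-tripped weights."""
    restored = quantize_roundtrip(weights, bits)
    return max(abs(a - b) for a, b in zip(weights, restored))


if __name__ == "__main__":
    weights = [0.7, -0.31, 0.05, 0.002, -0.66]    # made-up example values
    for bits in (4, 8):
        print(bits, "bit max error:", max_error(weights, bits))
```

Fewer bits means coarser steps between representable values, so the rounding error per weight grows.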
I have no problems with changing num_ctx or num_predict
Um… modlog is public. Where’s your evidence?
Models are computed sequentially (the output of each layer is the input into the next layer in the sequence) so more GPUs do not offer any kind of performance benefit
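Here is a toy sketch of that point. The "layers" are stand-in functions and the device assignment is invented; it only models a single request flowing through layers split across two GPUs, not batching or tensor parallelism.

```python
def run_pipeline(layers, device_of_layer, x):
    """Apply layers in order and record which device is busy at each step."""
    timeline = []
    for i, layer in enumerate(layers):
        timeline.append(device_of_layer[i])  # only this device works right now
        x = layer(x)                         # output becomes the next layer's input
    return x, timeline


if __name__ == "__main__":
    # Four stand-in layers; the first two "live" on GPU 0, the last two on GPU 1.
    layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]
    device_of_layer = [0, 0, 1, 1]
    out, timeline = run_pipeline(layers, device_of_layer, 5)
    print(out, timeline)
```

The number of sequential steps equals the number of layers no matter how many devices they are spread over, and at each step only one device is doing work for that request.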
Ummm… did you try /set parameter num_ctx # and /set parameter num_predict #? Are you using a model that actually supports the context length that you desire…?
Notice how all those bot accounts that were so active leading up to the election have completely vanished from the internet now? Yeah.
It’s not. I can run the 2.51bit quant
Tell that to my home rig currently running the 671b model…
Hawley’s statement called DeepSeek “a data-harvesting, low-cost AI model that sparked international concern and sent American technology stocks plummeting.”
“data-harvesting”???
It runs offline… using open-source software that provably does not collect or transmit any data…
It is low-cost and out-competes American technology, though, true
this is deepseek-v3. deepseek-r1 is the model that got all the media hype: https://huggingface.co/deepseek-ai/DeepSeek-R1
That’s great! Hopefully it shows up on F-Droid sometime soon
We don’t, I already have a steam deck. Touchpads.
there’s still a whole software-side bubble to contend with
They’re ultimately linked together in some ways (not all). OpenAI has already been losing money on every GPT subscription, which they charged a premium for because they had the best product. Now that premium must evaporate, because there are equivalent AI products on the market that are much cheaper. This will shake things up on the software side too. They probably need more hype to stay afloat.
Yes, but old and “cheap” ones that were not part of the sanctions.
China really has nothing to do with it, it could have been anyone. It’s a reaction to realizing that GPT4-equivalent AI models are dramatically cheaper to train than previously thought.
It being China is a notable detail because it really drives the nail in the coffin for NVIDIA, since China has been fenced off from having access to NVIDIA’s most expensive AI GPUs that were thought to be required to pull this off.
It also makes the USA gov look extremely foolish for having made major foreign policy and relationship sacrifices in order to try to delay China by a few years. It’s January and China has already caught up. Those sacrifices did not pay off; in fact they backfired, benefiting China and allowing it to accelerate while hurting USA tech/AI companies.
You can overwrite the model by using the same name instead of creating one with a new name if it bothers you. Either way there is no duplication of the llm model file