I think I read this post wrong.
I was thinking the sentence “We could be saving the world!” meant ‘we’ as in humans only.
No need to be training AI. No need to do anything with AI at all. Humans simply start saving the world. Our Research Papers can train on Reddit. We cannot be training, we are saving the world. Let the Research Papers run a train on Reddit AI. Humanity Saves World.
No cynical replies please.
They already do that. You’re being a troglodyte.
Hmmm. Not sure if I’m being insulted. Is that one of those fish fossils that looks kind of like a horseshoe crab?
You’re thinking of a trilobite
Troglodyte, per Oxford Languages: noun. (especially in prehistoric times) a person who lived in a cave; a hermit; a person who is regarded as being deliberately ignorant or old-fashioned.
Training it on research papers wouldn’t make it smarter, it would just make it better at mimicking their writing style.
Don’t fall for the hype.
Because they are looking for conversations.
AI isn’t saving the world lol
Machine learning has some pretty cool potential in certain areas, especially in the medical field. Unfortunately the predominant use of it now is slop produced by copyright laundering shoved down our throats by every techbro hoping they’ll be the next big thing.
You could feed all the research papers in the world to an LLM and it will still have zero understanding of what you trained it on. It will still make shit up, it can’t save the world.
Tons of people already are. The following site is useful for searching papers using AI: https://consensus.app/
Thank you! That was thoughtful
Both are happening. Samples of casual writing are more valuable than research papers for generating an article, though.
Yeah. Scientific papers may teach an AI about science, but Reddit posts teach AI how to interact with people and “talk” to them. Both are valuable.
Hopefully not too pedantic, but no one is “teaching” AI anything. They’re just feeding it data in the hopes that it can learn probabilities for certain types of output. It “understands” neither the Reddit post nor the scientific paper.
Describe how you ‘learned’ to speak. How do you know which word comes next? Until you can describe this process in a way that doesn’t make it ‘human’ or ‘biological’ only, it’s no different. The only thing they can’t do is adjust their weights dynamically. But that’s a limitation we gave them, not one intrinsic to the system.
I inherited brain structures that are natural language processors. As well as the ability to understand and repeat any language sounds. Over time, my brain focused in on only the language sounds I heard the most and through trial and repetition learned how to understand and make those sounds.
AI - as it currently exists - is essentially a babbling infant with none of the structures necessary to do anything more than repeat sounds back without understanding any of them. Anyone who tells you different is selling you something.
Because AI needs a lot of training data to reliably generate something appropriate. It’s easier to get millions of reddit posts than millions of research papers.
Even then, LLMs simply generate text but have no idea what the text means. It just knows those words have a high probability of matching the expected response. It doesn’t check that what was generated is factual.
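To make the "high probability" point concrete, here is a toy sketch of next-word prediction. It is a simple bigram counter, not how a real LLM works internally, and the function names and the tiny corpus are made up for illustration. It only shows the idea of picking words by how often they followed the previous word in the training text, with no check on whether the result is true.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words followed it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Turn the raw follow-up counts for `word` into probabilities."""
    following = counts.get(word, Counter())
    total = sum(following.values())
    if total == 0:
        return {}  # the word was never followed by anything in training
    return {w: c / total for w, c in following.items()}


# Hypothetical toy corpus, invented for this example.
corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(probs)
```

In this corpus "cat" follows "the" two times out of four, so the model favors "cat" after "the". It has no notion of what a cat is, only of what tended to come next.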
Redditors are always right, peer reviewed papers always wrong. Pretty obvious really. :D
Dank memes > science
- tech bros, probably
We are. I just read an article yesterday about how Microsoft paid research publishers so they could use the papers to train AI, with or without the consent of the papers’ authors. The publishers also reduced the peer review window so they could publish papers faster and get more money from Microsoft. So… expect AI to be trained on a lot of sloppy, poorly-reviewed research papers because of corporate greed.
Nobody wants an AI that talks like that.
I kind of think my question is WHY ARE WE FOCUSING ON TALKING TO IT?
Because “AI” as we colloquially know it today means language models: they train on and produce language; that’s what they are designed for. Yes, they can produce images and also videos, but they don’t have any form of real knowledge or understanding. They only predict the next word or the next pixel based on their prompt and their vast examples of words and images. You can only talk to them because that’s what they are for.
Feeding it research papers will make it spit out research-sounding words, which will probably contain some correct information, but at best an LLM trained on that would be useful for searching through existing research. It would not be able to produce new research.
Papers are most importantly a documentation of exactly what and how a procedure was performed, adding a vagueness filter over that is only going to decrease its value infinitely.
Real question is why are we using generative ai at all (gets money out of idiot rich people)
The Ghost of Aaron Swartz
What he was fighting for was an awful lot more important than a tool to write your emails while causing a ginormous tech bubble.
Anyone running a webserver and looking at their logs will know AI is being trained on EVERYTHING. There are so many crawlers for AI that are literally ripping the internet wholesale. Reddit just got in on charging the AI companies for access to freely contributed content. For everyone else, they’re just outright stealing it.
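If you want to check your own logs, a rough sketch like this will tally hits from a few AI crawlers. The user-agent tokens in the list are ones some crawlers are known to announce, but the list is an assumption and far from complete, and plenty of scrapers don't identify themselves at all. The sample log lines are invented, and `count_ai_crawlers` is just an illustrative name.

```python
from collections import Counter

# Assumed, incomplete list of user-agent tokens announced by some AI crawlers.
AI_CRAWLER_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")


def count_ai_crawlers(log_lines):
    """Count access-log lines whose text contains a known crawler token."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1
                break  # count each request once
    return hits


# Invented sample lines in a common access-log layout.
sample_log = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2025:00:00:02 +0000] "GET /about HTTP/1.1" 200 734 "-" "CCBot/2.0"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:04 +0000] "GET /posts HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]
hits = count_ai_crawlers(sample_log)
print(hits)
```

On a real server you would pass the lines of your access log file instead of `sample_log`.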