Siri was introduced on iOS over ten years ago (and came to macOS later). In that time, little about it has changed. There are articles from 2016 complaining about how far it lagged behind Google and Cortana (RIP).
In the meantime, LLMs have changed the game when it comes to language understanding, context, and the ability to act on natural-language instructions. All the tools are here: speech-to-text through models like Whisper, and of course the LLMs themselves, which interpret language and context to a revolutionary degree. Text-to-speech is already solved.
So, ignoring the “AI hype” and looking at what these tools actually are and can do, it follows naturally that a better personal assistant could be built by combining these pieces and leveraging the existing iOS and macOS APIs, improving on Siri’s functionality by an order of magnitude or more.
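The "combining these pieces" idea is just a three-stage pipeline. A minimal sketch of what that looks like, with every stage stubbed out (the function names and canned responses here are invented for illustration; nothing is a real Whisper, LLM, or Apple API):

```python
# Hypothetical assistant pipeline: speech-to-text -> LLM -> text-to-speech.
# All names and behaviors below are illustrative stand-ins, not real APIs.

def transcribe(audio: bytes) -> str:
    """Stand-in for an STT model like Whisper; returns a canned transcript."""
    return "set a timer for ten minutes"

def interpret(text: str) -> str:
    """Stand-in for an LLM mapping a request to an action and reply."""
    if "timer" in text:
        return "Timer set for ten minutes."
    return "Sorry, I didn't catch that."

def speak(reply: str) -> bytes:
    """Stand-in for a TTS engine; here it just encodes the reply text."""
    return reply.encode("utf-8")

def assistant(audio: bytes) -> bytes:
    """Wire the three stages together."""
    return speak(interpret(transcribe(audio)))
```

The point is only that the plumbing is trivial; the interesting work lives inside the three stand-ins, and each already has strong off-the-shelf options.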
Apple remains the only big tech company not publicly diving head-first into the LLM race, but it’s relatively well known that they’re developing one internally. Since that was reported only last July, right around WWDC 2023, it obviously wasn’t ready for an announcement. Besides, that event was all about Vision Pro anyway.
That, coupled with the fact that Apple tends to be last on the trend train but to offer a more polished experience, would make WWDC 2024 the perfect time to reveal an “all new” or revamped Siri that could run locally on, say, newer iOS devices or Apple Silicon Macs. I use SuperWhisper on my M2 MacBook Air and it works amazingly well. It would also be a good carrot to get people to upgrade their phones, iPads, and Macs. Besides, what else would be important at WWDC next year? I don’t see much else on the horizon other than the real launch of Vision Pro.
Anyhoo, my random unsubstantiated thoughts. Talk among yourselves.
I don’t think this is true at all, nor do I think we’re any closer than we were several years ago. LLMs don’t understand anything at all. Given a prompt, they assemble portions of words into something that is likely to resemble what a desired response might look like, based on whatever corpus of text they’ve been fed.
They do not actually comprehend the question and then answer it.
Siri actually answers questions using a curated knowledge database. If it doesn’t have an answer, it doesn’t pretend to. LLMs don’t really have a concept of knowing or not knowing the answer, since they’re not based around a repository of facts in the first place. If they have enough training data to assemble something that looks like a response to the question, they’ll output that response. Whether it’s true or not isn’t even relevant to how they work.
If I ask Siri a question, I want the response to be reliable, or I want it to just tell me it doesn’t know. If I ask it to complete a specific task, it needs to have been programmed for that task anyway, so LLMs don’t add anything there. Either it recognizes a task it knows how to do (i.e., matches keywords against its database of functions) or it doesn’t.
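The keyword-matching model of task recognition described above can be sketched in a few lines. This is a toy illustration, not Siri's actual implementation; the intents and keyword sets are made up for the example:

```python
# Toy keyword-based intent matcher -- not Siri's real mechanism.
# Intents and keywords are invented for illustration.
from typing import Optional

INTENTS = {
    "set_timer": {"timer", "countdown"},
    "play_music": {"play", "music", "song"},
    "get_weather": {"weather", "forecast", "temperature"},
}

def recognize(utterance: str) -> Optional[str]:
    """Return the intent with the most keyword hits, or None if no match."""
    words = set(utterance.lower().split())
    best, best_hits = None, 0
    for intent, keywords in INTENTS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = intent, hits
    return best
```

Either an utterance overlaps a known intent's keywords or the matcher returns None; there is no in-between, which is exactly the "it recognizes it or it doesn't" behavior the comment describes.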
It can always gain new functions or new knowledge sources, but none of that involves adding a bullshit generator.