
A small team of AI researchers at Microsoft reports that the company’s Orca-Math small language model outperforms other, larger models on standardized math tests. The group has published a paper on the arXiv preprint server describing their testing of Orca-Math on the Grade School Math 8K (GSM8K) benchmark and how it fared compared to well-known LLMs.

Many popular LLMs such as ChatGPT are known for their impressive conversational skills; less well known is that most of them can also solve math word problems. AI researchers have tested this ability on GSM8K, a dataset of 8,500 grade-school math word problems, each requiring multistep reasoning to solve and paired with its correct answer.
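GSM8K results are typically reported as the fraction of problems whose final numeric answer is correct. The Python sketch below shows one way to score that, assuming each reference solution ends with GSM8K's `#### <answer>` line and that a model's answer is the last number in its output. That last-number heuristic and all function names here are illustrative assumptions, not the scoring code used for Orca-Math.

```python
import re
from typing import List, Optional

# GSM8K reference solutions end with a line such as "#### 72".
REFERENCE_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")
# Assumption (hypothetical heuristic): a model's final answer is the last number it outputs.
ANY_NUMBER = re.compile(r"-?[\d,]*\.?\d+")


def normalize(number: str) -> str:
    """Drop thousands separators and '.0' so that '1,000' and '1000.0' compare equal."""
    value = float(number.replace(",", ""))
    return str(int(value)) if value.is_integer() else str(value)


def extract_reference(solution: str) -> Optional[str]:
    """Read the answer after '####' in a reference solution, if present."""
    match = REFERENCE_ANSWER.search(solution)
    return normalize(match.group(1)) if match else None


def extract_prediction(output: str) -> Optional[str]:
    """Take the last number mentioned in a model's free-form output."""
    numbers = ANY_NUMBER.findall(output)
    return normalize(numbers[-1]) if numbers else None


def accuracy(outputs: List[str], solutions: List[str]) -> float:
    """Fraction of problems whose extracted answer equals the reference answer."""
    if len(outputs) != len(solutions) or not solutions:
        raise ValueError("need equal-length, non-empty lists")
    correct = 0
    for output, solution in zip(outputs, solutions):
        reference = extract_reference(solution)
        if reference is not None and extract_prediction(output) == reference:
            correct += 1
    return correct / len(solutions)


if __name__ == "__main__":
    solutions = ["Half of 48 is 24, and 48 + 24 = 72.\n#### 72", "#### 1,000"]
    outputs = ["She sold 48 + 24 = 72 clips.", "The total is 999."]
    print(accuracy(outputs, solutions))  # 0.5
```

Exact match on the final number ignores whether the intermediate reasoning is sound, which is why such a heuristic is only a rough stand-in for a benchmark's official scoring.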

In this new study, the research team at Microsoft tested Orca-Math, an AI application developed by another Microsoft team specifically to tackle math word problems, and compared its results with those of larger AI models.


Recent advances in artificial intelligence are making deepfake voices increasingly hard to detect, and the solution may come from AI itself.

Drawing on their clinical studies of vocal biomarkers for improving health outcomes, scientists at Klick Labs created an audio deepfake detection method that relies on signs of life such as breathing patterns and micropauses in speech.
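The article does not describe how Klick Labs extracts these cues. As a rough illustration of what a micropause feature could look like, the hypothetical Python sketch below splits audio into 10 ms frames, treats low-energy frames as silence, and summarizes runs of silence into pause statistics. The frame length, energy threshold, and minimum pause length are arbitrary assumptions, and breathing detection is not covered.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_pauses(energies, threshold, min_frames):
    """Return (start_frame, n_frames) for each run of at least min_frames
    consecutive frames whose energy is below threshold."""
    pauses, start = [], None
    for i, energy in enumerate(energies):
        if energy < threshold:
            if start is None:
                start = i  # a quiet run begins
        elif start is not None:
            if i - start >= min_frames:
                pauses.append((start, i - start))
            start = None
    if start is not None and len(energies) - start >= min_frames:
        pauses.append((start, len(energies) - start))  # quiet run reaches the end
    return pauses


def pause_features(samples, sample_rate, frame_ms=10, threshold=1e-4, min_pause_ms=50):
    """Summarize micropauses; every default value here is a hypothetical choice."""
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = frame_energies(samples, frame_len)
    pauses = find_pauses(energies, threshold, max(1, min_pause_ms // frame_ms))
    durations_ms = [n_frames * frame_ms for _, n_frames in pauses]
    return {
        "pause_count": len(pauses),
        "pauses_per_second": len(pauses) / (len(samples) / sample_rate),
        "mean_pause_ms": sum(durations_ms) / len(durations_ms) if durations_ms else 0.0,
    }


if __name__ == "__main__":
    rate = 8000
    # Synthetic "speech": 0.3 s tone, 0.1 s silence, 0.3 s tone.
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(3 * rate // 10)]
    silence = [0.0] * (rate // 10)
    print(pause_features(tone + silence + tone, rate))
```

On the synthetic signal this reports a single 100 ms pause. A real detector would need to cope with background noise and would likely learn its features rather than rely on a fixed energy threshold.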

An emerging research area in AI is developing multi-agent capabilities with collections of interacting AI systems. Andrea Soltoggio and colleagues develop a vision for combining such approaches with current edge computing technology and lifelong learning advances. The envisioned network of AI agents could quickly learn new tasks in open-ended applications, with each agent learning independently while both contributing to and benefiting from collective knowledge.

OpenAI has apparently been demonstrating GPT-5, the next generation of its notorious large language model (LLM), to prospective buyers — and they’re very impressed with the merchandise.

“It’s really good, like materially better,” one CEO told Business Insider of the LLM. That same CEO added that in the demo he previewed, OpenAI tailored use cases and data modeling unique to his firm — and teased previously unseen capabilities as well.

According to BI, OpenAI is looking at a summer launch — though its sources say it’s still being trained and in need of “red-teaming,” the tech industry term for hiring hackers to try to exploit one’s wares.

Bayesian neural networks (BNNs) combine the generalizability of deep neural networks (DNNs) with a rigorous quantification of predictive uncertainty, which mitigates overfitting and makes them valuable for high-reliability or safety-critical applications. However, the probabilistic nature of BNNs makes them more computationally intensive on digital hardware and, so far, less directly amenable than DNNs to acceleration by analog in-memory computing. This work exploits a novel spintronic bit cell that efficiently and compactly implements Gaussian-distributed BNN values. Specifically, the bit cell combines a tunable stochastic magnetic tunnel junction (MTJ) encoding the trained standard deviation with a multi-bit domain-wall MTJ device that independently encodes the trained mean. The two devices can be integrated within the same array, enabling highly efficient, fully analog, probabilistic matrix-vector multiplications. We use micromagnetic simulations as the basis of a system-level model of the spintronic BNN accelerator, demonstrating that our design yields accurate, well-calibrated uncertainty estimates for both classification and regression problems and matches software BNN performance. This result paves the way to spintronic in-memory computing systems that implement trusted neural networks on a modest energy budget.
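In software terms, each bit cell described above holds one Gaussian weight w_ij ~ N(mu_ij, sigma_ij^2), with the mean and the standard deviation stored separately. The minimal Python sketch below emulates a probabilistic matrix-vector multiplication by drawing a fresh weight matrix on every pass and then averaging over repeated passes. It models only the arithmetic, not the spintronic devices, and the function names are hypothetical. For independent Gaussian weights, each output y_i = sum_j w_ij x_j has mean sum_j mu_ij x_j and variance sum_j sigma_ij^2 x_j^2, which the Monte Carlo estimates should approach.

```python
import random
import statistics


def sample_weights(mean, std, rng):
    """Draw one weight matrix with W[i][j] ~ N(mean[i][j], std[i][j] ** 2)."""
    return [
        [rng.gauss(m, s) for m, s in zip(mean_row, std_row)]
        for mean_row, std_row in zip(mean, std)
    ]


def matvec(w, x):
    """Plain matrix-vector product y = W x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def probabilistic_matvec(mean, std, x, n_samples, seed=0):
    """Monte Carlo estimate of the mean and standard deviation of y = W x
    when every weight is an independent Gaussian (requires n_samples >= 2)."""
    rng = random.Random(seed)
    samples = [matvec(sample_weights(mean, std, rng), x) for _ in range(n_samples)]
    per_output = list(zip(*samples))  # one tuple of samples per output neuron
    means = [statistics.fmean(values) for values in per_output]
    stds = [statistics.stdev(values) for values in per_output]
    return means, stds


if __name__ == "__main__":
    mean = [[1.0, -2.0], [0.5, 0.5]]  # trained means (hypothetical values)
    std = [[0.1, 0.1], [0.0, 0.0]]    # trained standard deviations (hypothetical)
    x = [2.0, 1.0]
    print(probabilistic_matvec(mean, std, x, n_samples=10_000))
```

For these hypothetical values the analytic output means are 0.0 and 1.5, and the standard deviations are sqrt(0.05), about 0.224, and 0.0, so the printed estimates should land close to those numbers. The second row has zero variance, so it behaves like an ordinary deterministic layer.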

The powerful ability of deep neural networks (DNNs) to generalize has driven their proliferation across many applications over the last decade. However, particularly in applications where the cost of a wrong prediction is high, there is a strong desire for algorithms that can reliably quantify the confidence in their predictions (Jiang et al., 2018). Bayesian neural networks (BNNs) can provide the generalizability of DNNs while also enabling rigorous uncertainty estimates: they encode their parameters as probability distributions learned through Bayes’ theorem, so that each prediction samples from the trained distributions (MacKay, 1992). Probabilistic weights can also be viewed as an efficient form of model ensembling, which reduces overfitting (Jospin et al., 2022). Despite these advantages, the probabilistic nature of BNNs makes them slower and more power-intensive to deploy on conventional hardware, because of the large number of random number generation operations they require (Cai et al., 2018a).
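To see how quickly random number generation adds up, consider a mean-field Gaussian BNN that draws one independent sample per parameter on every forward pass. This is an assumption, since some BNN variants share or reuse samples. The hypothetical helper below counts the draws needed for a single prediction averaged over several Monte Carlo samples. The layer sizes in the usage line are illustrative only.

```python
def gaussian_draws_per_prediction(layer_sizes, n_samples, include_biases=True):
    """Count Gaussian random draws for n_samples Monte Carlo forward passes of a
    fully connected BNN, assuming one independent draw per parameter per pass."""
    params = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        params += fan_in * fan_out  # weight matrix of this layer
        if include_biases:
            params += fan_out  # one bias per output unit
    return params * n_samples


if __name__ == "__main__":
    # A small 784-256-10 classifier averaged over 100 samples.
    print(gaussian_draws_per_prediction([784, 256, 10], n_samples=100))  # 20353000
```

A deterministic DNN of the same shape needs no random draws at all. Here, more than twenty million draws go into a single averaged prediction, which is the overhead that motivates generating the randomness directly in hardware.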