Toggle light / dark theme

Benchmarks are a key driver of progress in AI. But they also have many shortcomings. The new GPT-Fathom benchmark suite aims to reduce some of these pitfalls.

Benchmarks allow AI developers to measure the performance of their models on a variety of tasks. In the case of language models, for example, answering knowledge questions or solving logic tasks. Depending on its performance, the model receives a score that can then be compared with the results of other models.

These benchmarking results form the basis for further research decisions and, ultimately, investments. They also provide information about the strengths and weaknesses of individual methods.

Robots are great specialists, but poor generalists. Typically, you have to train a model for each task, robot, and environment. Changing a single variable often requires starting from scratch. But what if we could combine the knowledge across robotics and create a way to train a general-purpose robot?

Today, we are launching a new set of resources for general-purpose robotics learning across different robot types, or embodiments. Together with partners from 33 academic labs we have pooled data from 22 different robot types to create the Open X-Embodiment dataset. We also release RT-1-X, a robotics transformer (RT) model derived from RT-1 and trained on our dataset, that shows skills transfer across many robot embodiments.

In this work, we show training a single model on data from multiple embodiments leads to significantly better performance across many robots than those trained on data from individual embodiments. We tested our RT-1-X model in five different research labs, demonstrating 50% success rate improvement on average across five different commonly used robots compared to methods developed independently and specifically for each robot. We also showed that training our visual language action model, RT-2, on data from multiple embodiments tripled its performance on real-world robotic skills.

This is a risky bet, given the limitations of the technology. Tech companies have not solved some of the persistent problems with AI language models, such as their propensity to make things up or “hallucinate.” But what concerns me the most is that they are a security and privacy disaster, as I wrote earlier this year. Tech companies are putting this deeply flawed tech in the hands of millions of people and allowing AI models access to sensitive information such as their emails, calendars, and private messages. In doing so, they are making us all vulnerable to scams, phishing, and hacks on a massive scale.

I’ve covered the significant security problems with AI language models before. Now that AI assistants have access to personal information and can simultaneously browse the web, they are particularly prone to a type of attack called indirect prompt injection. It’s ridiculously easy to execute, and there is no known fix.

In an indirect prompt injection attack, a third party “alters a website by adding hidden text that is meant to change the AI’s behavior,” as I wrote in April. “Attackers could use social media or email to direct users to websites with these secret prompts. Once that happens, the AI system could be manipulated to let the attacker try to extract people’s credit card information, for example.” With this new generation of AI models plugged into social media and emails, the opportunities for hackers are endless.

Sure, you could just stick a ChatGPT sidebar in your browser. But what do we really want AI to do for us as we use the web? That’s the much harder question.

At some point, if you’re a company doing pretty much anything in the year 2023, you have to have an AI strategy. It’s just business. You can make a ChatGPT plug-in. You can do a sidebar. You can bet your entire trillion-dollar company on AI being the future of how everyone does everything. But you have to do something.

The last one of these was crypto and the blockchain a couple of years ago, and Josh Miller, the CEO of The Browser Company, which makes the popular new Arc browser, says he’s… More.


AI is coming for your online life… but nobody’s exactly sure how it’s going to work.

Zoom’s selling a cheaper AI package than Microsoft 365 Copilot and Google Duet AI, and soon it can plug into a new ‘modular workspace.’

At Zoomtopia 2023 today, Zoom announced Zoom Docs, a collaboration-focused “modular workspace” that integrates the company’s Zoom AI Companion for generating new content or populating a doc from other sources — you know the drill by now.

Along with the Mail and Calendar offerings launched during last year’s event, Zoom Docs is another step toward a full office suite alternative to Google Workspace and Microsoft 365, which both have started to integrate AI-powered tools of their own, dubbed Duet AI and Copilot, respectively. The company says it will be widely… More.


Zoom’s new tool expands beyond the Zoom meeting.

Last week at its annual CloudWorld event in Las Vegas, Oracle showed that it, too, is going full throttle on generative AI–and that it has no plans to cower to its biggest rival Amazon Web Services (AWS.)Before we get into the CloudWorld event itself, it’s important to take a tiny step back to September 14 when the company announced a new partnership with Microsoft that puts Oracle database services on Oracle Cloud Infrastructure (OCI) in Microsoft Azure. The new Oracle Database@Azure makes Microsoft and Oracle the only two hyperscalers to offer OCI to help simplify cloud migration, deployment and management. Especially when you consider that the partners have achieved rate card and… More.


This year at Oracle CloudWorld, the company advanced its generative AI strategy across its cloud infrastructure, apps, and platforms. Exploring this year’s announcements.

Lung cancer screening is crucial for decreasing the death count from the disease but the government can’t scan everyone’s lungs. Here is an AI that identifies people who actually need screening.

Lung cancer is the deadliest cancer type, killing over a million people annually across the globe. The disease is responsible for the highest number of cancer deaths in both men and women in the US.

In fact, the death toll from lung cancer among women and men is nearly triple that of breast cancer and prostate cancer, respectively.

A product designer spotted prompts in Spotify’s codes.

Users may soon be able to create artificial intelligence-generated Spotify playlists using prompts. Speculations are rife ever since hashtag creator and product designer Chris Messina posted pictures of code from Spotify’s backend on Threads.

It would be something like OpenAI’s chatbot ChatGPT, but for creating a song playlist.


Spotify appears to be developing AI-powered playlists. References discovered in the app’s code indicate the company may be developing generative AI playlists users could create using prompts.

The race to develop AI chips continues as Samsung’s chip manufacturing department partnered with Canadian startup Tenstorrent to produce chips and intellectual property for data centers.

The Canadian startup Tenstorrent, which builds artificial intelligence (AI) processors, among other things, revealed a new partnership with Samsung’s chip manufacturing department.

On Oct. 2, the startup announced the partnership with Samsung, saying it will use it to bring the “next generation of AI chiplets to market.” Tenstorrent manufactures chips and intellectual property (IP) for data centers.


The development comes as dominance in the AI chip market is currently held by American tech manufacturer Nvidia.