Projection: a mechanism for human-like reasoning in Artificial Intelligence (2022). Journal of Experimental & Theoretical Artificial Intelligence. Ahead of Print.
AI has for decades attempted to encode commonsense concepts, e.g., in knowledge bases, but has struggled to generalise the coded concepts to all the situations a human would naturally generalise them to, and to grasp the natural and obvious consequences of what it has been told. This led to brittle systems that did not cope well with situations beyond what their designers envisaged. John McCarthy (1968) said ‘a program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows’; that problem has still not been solved. In an interview with Dreifus (1998), Minsky estimated that ‘Common sense is knowing maybe 30 or 50 million things about the world and having them represented so that when something happens, you can make analogies with others’. Minsky presciently noted that common sense would require the capability to make analogical matches between knowledge and events in the world, and furthermore that a special representation of knowledge would be required to facilitate those analogies. We can see the importance of analogies for common sense in the way that basic concepts are borrowed, e.g., the tail of an animal, the tail of a capital ‘Q’, or the tail-end of a temporally extended event (see also the examples of ‘contain’ and ‘on’ in Sec. 5.3.1). More than this, for known facts such as ‘a string can pull but not push an object’, an AI system needs to automatically deduce (by analogy) that a cloth, sheet, or ribbon can behave analogously to the string. For the fact ‘a stone can break a window’, the system must deduce that any similarly heavy and hard object is likely to break any similarly fragile material. Using the language of Sec. 5.2.1, each of these known facts needs to be treated as a schema and then applied by analogy to new cases.
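As a rough illustration of that last point, the sketch below encodes one such fact as a schema over roles with property constraints, and ‘projects’ it onto new objects by matching properties rather than the literal word ‘string’. The Schema and Entity structures, the property names, and the subset-matching rule are illustrative assumptions for this post, not the representation developed in the paper.

```python
# A minimal, hypothetical sketch of a "fact as schema" applied by analogy.
# The property names and the matching rule are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Schema:
    """A known fact expressed over roles with property constraints."""
    name: str
    roles: dict          # role -> set of properties the filler must have
    consequence: str

@dataclass
class Entity:
    name: str
    properties: set

STRING_SCHEMA = Schema(
    name="pull-but-not-push",
    roles={"connector": {"flexible", "tension-bearing"}},
    consequence="can pull but not push an attached object",
)

def apply_by_analogy(schema: Schema, candidate: Entity, role: str) -> bool:
    """The schema projects onto a new entity if that entity has the
    properties the role requires (an analogical match on properties,
    not on the word 'string')."""
    return schema.roles[role] <= candidate.properties

ribbon = Entity("ribbon", {"flexible", "tension-bearing", "thin"})
brick = Entity("brick", {"rigid", "heavy"})

for obj in (ribbon, brick):
    if apply_by_analogy(STRING_SCHEMA, obj, "connector"):
        print(f"A {obj.name} {STRING_SCHEMA.consequence}.")
    else:
        print(f"The schema does not project onto a {obj.name}.")
```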
Projection is a mechanism that can find analogies (see Sec. 5.3.1) and hence could bridge the gap between models of commonsense concepts (as distinct from the entangled knowledge in word embeddings learnt from language corpora) and text, visual, or sensorimotor input. To facilitate this, concepts should be represented by hierarchical compositional models, with higher levels describing relations among elements in the lower-level components (for reasons discussed in Sec. 6.1). There needs to be an explicit symbolic handle on these subcomponents; i.e., they cannot be entangled in a complex network. For visual object recognition, a concept can simply be a set of spatial relations among component features, but higher-level concepts require a complex model involving multiple types of relations, partial physics theories, and causality. Secs. 5.2 and 5.3 give a hint of what these concepts may look like, but a full example requires a further paper.
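To make the idea of a hierarchical compositional concept with explicit symbolic handles slightly more concrete, here is a minimal sketch for the simple visual case: a concept defined as named parts plus spatial relations among them, checked against detected feature positions. The ‘face’ concept, the relation set, and the matcher are assumptions made up for illustration; the concepts discussed in the paper involve richer relations, partial physics theories, and causality.

```python
# A minimal sketch of a compositional visual concept: named parts
# (explicit symbolic handles) plus spatial relations among them.
# The concept, relations, and matcher are illustrative assumptions.

FACE_CONCEPT = {
    "parts": ["left_eye", "right_eye", "nose", "mouth"],
    "relations": [
        ("left_eye", "left_of", "right_eye"),
        ("left_eye", "above", "nose"),
        ("right_eye", "above", "nose"),
        ("nose", "above", "mouth"),
    ],
}

def holds(relation, a, b):
    """Check one spatial relation between two detected features,
    each given as an (x, y) image coordinate (y grows downward)."""
    (ax, ay), (bx, by) = a, b
    if relation == "left_of":
        return ax < bx
    if relation == "above":
        return ay < by
    raise ValueError(f"unknown relation: {relation}")

def matches(concept, detections):
    """Project the concept onto detected features: every named part must
    be present and every higher-level relation must hold among them."""
    if not all(p in detections for p in concept["parts"]):
        return False
    return all(holds(r, detections[a], detections[b])
               for a, r, b in concept["relations"])

# Detected feature positions in a hypothetical image (x, y pixels).
detections = {"left_eye": (40, 30), "right_eye": (80, 31),
              "nose": (60, 55), "mouth": (60, 80)}
print(matches(FACE_CONCEPT, detections))  # True
```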