{"id":141251,"date":"2022-06-28T03:23:28","date_gmt":"2022-06-28T08:23:28","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2022\/06\/spotting-unfair-or-unsafe-ai-using-graphical-criteria"},"modified":"2022-06-28T03:23:28","modified_gmt":"2022-06-28T08:23:28","slug":"spotting-unfair-or-unsafe-ai-using-graphical-criteria","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2022\/06\/spotting-unfair-or-unsafe-ai-using-graphical-criteria","title":{"rendered":"Spotting Unfair or Unsafe AI using Graphical Criteria"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/spotting-unfair-or-unsafe-ai-using-graphical-criteria.jpg\"><\/a><\/p>\n<p>How to use causal influence diagrams to recognize the hidden incentives that shape an AI agent\u2019s behavior.<\/p>\n<hr>\n<p>There is rightfully a lot of concern about the fairness and safety of advanced Machine Learning systems. To attack the root of the problem, researchers can analyze the incentives posed by a learning algorithm using causal influence diagrams (CIDs). Among others, DeepMind Safety Research has written about <a class=\"\" href=\"https:\/\/deepmindsafetyresearch.medium.com\/progress-on-causal-influence-diagrams-a7a32180b0d1\" rel=\"noopener\">their research on CIDs<\/a>, and I have written before about how they can be used to <a class=\"\" rel=\"noopener\" target=\"_blank\" href=\"https:\/\/towardsdatascience.com\/how-to-stop-your-ai-agents-from-hacking-their-reward-function-5e26fc006e08\">avoid reward tampering<\/a>. However, while there is some writing on the <a class=\"\" rel=\"noopener\" target=\"_blank\" href=\"https:\/\/towardsdatascience.com\/new-paper-the-incentives-that-shape-behaviour-d6d8bb77d2e4\">types of incentives<\/a> that can be found using CIDs, I haven\u2019t seen a succinct write-up of the graphical criteria used to identify such incentives. 
To fill this gap, this post will summarize the incentive concepts and their corresponding graphical criteria, which were originally defined in the paper <a class=\"\" href=\"https:\/\/arxiv.org\/abs\/2102.01685\" rel=\"noopener ugc nofollow\" target=\"_blank\"><em class=\"\">Agent Incentives: A Causal Perspective<\/em><\/a><em class=\"\">.<\/em><\/p>\n<p>A causal influence diagram is a directed acyclic graph where different types of nodes represent different elements of an optimization problem. Decision nodes represent values that an agent can influence, utility nodes represent the optimization objective, and structural nodes (also called chance nodes) represent the remaining variables, such as the state. The arrows show how the nodes are causally related, with dotted arrows indicating the information that an agent uses to make a decision. Below is the CID of a Markov Decision Process, with decision nodes in blue and utility nodes in yellow:<\/p>\n<p>As a first example, consider a model that predicts a high school student\u2019s grades in order to evaluate their university application. The model uses the student\u2019s high school and gender as input and outputs the predicted GPA. In the CID below we see that <em class=\"\">predicted grade<\/em> is a decision node. As we train our model for accurate predictions, <em class=\"\">accuracy<\/em> is the utility node. The remaining structural nodes show how relevant facts about the world relate to each other. The arrows from <em class=\"\">gender<\/em> and <em class=\"\">high school<\/em> to <em class=\"\">predicted grade<\/em> show that those are inputs to the model. For our example we assume that a student\u2019s <em class=\"\">gender<\/em> doesn\u2019t affect their <em class=\"\">grade<\/em>, and so there is no arrow between them. 
On the other hand, a student\u2019s <em class=\"\">high school<\/em> is assumed to affect their <em class=\"\">education<\/em>, which in turn affects their <em class=\"\">grade<\/em>, which of course affects <em class=\"\">accuracy<\/em>. The example assumes that a student\u2019s <em class=\"\">race<\/em> influences the <em class=\"\">high school<\/em> they go to. Note that only <em class=\"\">high school<\/em> and <em class=\"\">gender<\/em> are known to the model.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to use causal influence diagrams to recognize the hidden incentives that shape an AI agent\u2019s behavior. There is rightfully a lot of concern about the fairness and safety of advanced Machine Learning systems. To attack the root of the problem, researchers can analyze the incentives posed by a learning algorithm using causal influence diagrams [\u2026]<\/p>\n","protected":false},"author":662,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32,41,6,1491],"tags":[],"class_list":["post-141251","post","type-post","status-publish","format-standard","hentry","category-education","category-information-science","category-robotics-ai","category-transportation"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/141251","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/662"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=141251"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/141251\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=141251"}],"wp:te
rm":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=141251"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=141251"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
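The grade-prediction CID described in the post can be sketched in a few lines of Python. This is a minimal illustration, not the paper's formal machinery: the snake_case node names are my own, and the directed-reachability check below is a simplified stand-in for the graphical criteria defined in <em>Agent Incentives: A Causal Perspective</em>.

```python
# A minimal sketch of the grade-prediction CID from the post, using plain
# Python dicts (no graph library). Node names are illustrative.
EDGES = {
    "race":            ["high_school"],
    "high_school":     ["education", "predicted_grade"],
    "gender":          ["predicted_grade"],
    "education":       ["grade"],
    "grade":           ["accuracy"],
    "predicted_grade": ["accuracy"],
    "accuracy":        [],
}
DECISIONS = {"predicted_grade"}   # blue node: the model's output
UTILITIES = {"accuracy"}          # yellow node: the training objective

def reachable(edges, start, blocked=frozenset()):
    """All nodes reachable from `start` along directed paths that avoid `blocked`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, []):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# High school influences the utility through the world itself
# (high_school -> education -> grade -> accuracy), even with the
# decision node cut out of the graph.
assert "accuracy" in reachable(EDGES, "high_school", blocked=DECISIONS)

# Gender, by the post's assumption, reaches the utility only *through*
# the decision: blocking the decision node leaves no directed path.
assert "accuracy" not in reachable(EDGES, "gender", blocked=DECISIONS)
```

With the edges written down this way, the post's assumptions become mechanical checks: deleting an arrow (say, adding one from <em>gender</em> to <em>grade</em>) immediately changes which inputs can affect the utility independently of the decision.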