October 1, 2018

AI Guidance Note (Dr. M Plante) to advance conversations on AI Policies & Ethics

To add your views, please make use of the Building Trust in AI Survey, a CATA partnership with Cathy Cobey, Technology Risk Partner, EY Canada, located at: http://cata.ca/2018/ai-trust/

You can also join 200 of your peers in the new AI (Artificial Intelligence) opt-in group at: https://www.linkedin.com/groups/12139393/

We have provided below a guidance note advanced by Distinguished Researcher Dr. M Plante and encourage you to add to the conversation.

Guidance Note:

Intro: I have decades of experience in artificial intelligence research and applications.  I applaud this fine initiative, and now add more guidance to help advance your mission. 

Human Responsibility – AI-driven machines are the creation of humans. Humans cannot ignore the laws that govern them, and must apply those laws whether as creators or as users. The laws aimed at securing human lives apply to machines.

There are subtle aspects of responsibility.  If a self-driving car hits another car, who is responsible for the civil damages or criminal code violation?  The owner, the programmer, or the person who sold it? If combining publicly available information with information gathered from the behaviour of individuals that have access to sensitive information helps to predict stock prices, even if no information is disclosed, who is guilty of insider trading?  Laws presume proximate causation while these algorithms make decisions based on factors that legal reasoning is not equipped to deal with.

In addition, laws themselves are approximate rules intended to achieve a purpose in a way that is simple enough for human agents to enforce.  AI algorithms can achieve those purposes, and enforce their achievement, without the same burden of rules and simplicity. Should we not change the laws and allow the devices to achieve the objectives better than the laws can?  For instance, there are simple rules governing traffic, intended to achieve traffic flow, safety, and some measure of equity, whose enforcement is very expensive. Intelligent devices can achieve these objectives better at lower cost, without the arbitrary rules and expensive human enforcement.  Should the laws be adapted?

Physical Integrity – AI Robots must not kill or harm.

I appreciate the shout out to Asimov’s laws of robotics, but this is not science fiction.  Let’s not anthropomorphize the integration of robots and AI algorithms.

Moral Integrity – AI Robots must not lie, or do harm or go against any entity endowed with the power of reason. In every situation human dignity must be the priority.

No intentionality can be ascribed to models and algorithms. Humans are the only entities endowed with the power of reason, and “go against” is a vague concept.  Most of this is covered in the first point.

However for the benefit of the points that follow, it is useful to examine two aspects that this point may have been trying to grasp at.  One is the additional moral responsibilities that the developers, sellers, and users of these technologies may be bound by that go beyond what is legal to what is good, and the other is possibly unintended consequences that stem from the ignorance of the sorcerers’  apprentices trying to wield these technologies without having a strong grasp of the mathematics.

AI algorithm development has roughly 3 phases, design, training, and application.  This applies particularly to machine learning, but extends to other model-based algorithms.  I will keep the detail of those phases for a different document, but essentially you start with some hypothesis about which measurable things can be predictive of some unknown property, a hypothesis that might be incorrect or unproductive.  Secondly a large amount of data is acquired and various combinations of models are tried and trained with the data to see whether this data can predict the unknown property. This training always has a choice of what to optimize for, whether the person doing the training realizes it or not.  Thirdly, the model that is generated is then frozen and integrated into larger systems that are hypothesized to do something useful. Along the way, many assumptions are made, and many biases are introduced, both biases in the sense of data distribution, and biases in the sense of human judgment.
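The three phases above can be made concrete with a minimal sketch. Everything here is illustrative: a toy one-parameter "model" fit to synthetic data, not any particular library or production technique.

```python
# A minimal sketch of the three phases: design, training, application.
# All names and data are illustrative.

import random

random.seed(0)

# Phase 1 -- design: hypothesize that a measurable feature x predicts
# an unknown binary property y. The hypothesis itself may be wrong.
def make_example():
    x = random.random()
    y = 1 if x > 0.6 else 0   # the hidden relationship we hope to recover
    return x, y

# Phase 2 -- training: acquire data and fit the model. Note that a
# choice of what to optimize for (here, plain accuracy) is made whether
# or not the person doing the training realizes it.
data = [make_example() for _ in range(1000)]

def accuracy(threshold):
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)

best_threshold = max((t / 100 for t in range(100)), key=accuracy)

# Phase 3 -- application: freeze the trained parameter and embed it in
# a larger system; its assumptions and biases travel with it, unexamined.
def frozen_model(x, threshold=best_threshold):
    return 1 if x > threshold else 0
```

Every line carries one of the choices the text describes: which feature to measure, which data to collect, which objective to optimize, and when to declare the frozen model "reflective of reality".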

The process involves many choices, some of them unconscious.  The choice of data is based on human choices as well as availability of data.  What are the consequences of choosing different data, or of choosing to continue working with the data that is there despite its flaws?  There is unconscious bias in the choice of algorithms or even consequences from not realizing the effects of making different choices in training the model.  There are choices made in accepting the model as being reflective of reality when in fact it is a flawed representation using proxies of proxies going through a black box.  There are consequences to the way the prediction is used. These include conscious choices, unconscious choices, and choices that developers and users don’t even realize are choices.  They all have consequences. They are choices that are not all made by a single person. Who is responsible for the consequences of those cascading choices? Are they also responsible for indirect consequences, for example reducing the quality and quantity of employment, allowing breaches of privacy not possible without them, perhaps making crime less detectable?

Privacy – AI Robots must never divulge personal data, information and inner thoughts entrusted to AI machines, nor any information related to humans, unless agreed by Humans. If need be, humans can at any point apply superior power over machines, such as robots.

Privacy is one of the more important issues that AI brings forward, although not in this laughable sci-fi formulation.   AI algorithms, combined with Big Data and IoT, both allow and require more collection of data about persons. To what degree do citizens consent to this data collection and to its use?  A camera in a mall directory can determine whether you are a man or a woman. A human could always do that, but when a machine does it, it is necessarily surreptitious. It is not a normal social interaction in a social context of norms that are socially enforced.  How much can be done with this information and to what degree is consent required and how is this consent to be communicated?

All interactions with businesses and machines generates data.  It can be as subtle as the time between keystrokes or how quickly you move your mouse, or more complex signals.  That is already a privacy issue. Using these data to make predictions about you generates a deeper privacy issue.  There is the well-known story of the algorithm that predicts when a woman has become pregnant before anyone else in her family knows, and sends her pregnancy-related offers.  That is an invasion of privacy, even though it is using data which she has consented to share.

There are privacy issues when data is collected, and these increase as the data is processed, and even more so when the result is used.  You can generate privacy or secrecy issues from innocuous data. To give an example from early in my career, my team collected signals that were unclassified.  We used an open published neural network algorithm to process this data. The predictions made on the basis of applying unclassified algorithms to unclassified data were startlingly accurate and the output was then classified secret.  All our hard drives had to be degaussed.

Neutrality – AI Robots must neither judge human behaviours, nor human values either explicit or implicit; robots must not be capable of creating, storing or formulating any such judgement.

AI algorithms are incapable of judgment; they have neither intentionality nor agency.  You are anthropomorphizing them by imputing this.

Wellbeing – AI Robots must ease human labour.

Ease of labour is not a moral good; we have labour standards for that.  Ease of labour translates into job loss. What happens if we ease the labour of cashiers?  Fewer cashiers. Do we want to increase productivity? That lowers prices but reduces jobs.  Would increasing the quality of the output be better? For instance, we do want to increase the accuracy of medical diagnosis.  Will that mean less or more work?

Education – AI Robots must help humans become better individuals and help humanity navigate its expanding landscape of knowledge and diversity.

I’m not sure what to make of this one.  It is irrelevant to most AI endeavours. And for the applications where it is relevant, very few decisions are ruled out.

Ethical Behaviour – AI Robots must respect human rights to privacy, in their role of collecting, reporting and analyzing the implication to owners and users, even at times of conflicting, self-contradictory situations.

Every time this document says “AI Robots must” I read it as “designers, trainers, sellers, and users of AI algorithms, whether embedded in robots or not, must”.   You cannot ascribe any of these things to the algorithms themselves. They cannot analyze implications, and the data they deal with is always conflicting and self-contradictory, that is the nature of real world data and the reason why AI algorithms are used rather than linear regressions.

However, ethical behaviour is a challenge, and more so in a data constrained world.  I can bring out generalities, but instead I will give one example I dealt with from the early 90s.  The application was credit risk for a large class of loans in the US. This particular financial institution for a variety of reasons had a disproportionately low number of good loans among minority-owned businesses, for several minorities.  The characteristics of businesses that are good credit risks can be different in different minorities. The law prohibits the use of data that can lead to bias in credit decisions. All businesses were therefore judged according to criteria that are typical of majority businesses, because that is where the data was.  If the prohibited data points had been used, the neural network model could have specialized: women-owned businesses are more likely to succeed with a particular set of factors, Latino-owned businesses on average have characteristics that don’t match successful majority businesses. The result of pre-AI statistical models was to decline loans to more non-majority businesses because they were atypical.  This in turn meant that less and less data from successful non-majority businesses was being collected since they were an increasingly small proportion of the customer base.

In this case, there was an economic reason to want to manipulate the data to better represent minorities.  Often there is not. There is an ethical imperative that is very difficult to regulate. Regulation made by those who don’t understand the math can have the opposite effect from what is desired.  In the case of these models, the algorithms themselves found proxies for the prohibited data. One simple example is the sum of the digits of your phone number, which correlates with your age. Older people who have been in their house since the 60s in the days of analog switching of pulse dialing would have had area codes and exchanges with small numbers.  This is because a 9 took longer to switch than a 1. Middle aged people who bought their houses later had exchanges with higher digits when the lower ones had run out, while young people from the days of digital switching and mobile phones got the higher numbers in new exchanges and area codes. Non-linear algorithms find a way, to paraphrase Jurassic Park.

Where do ethics come in?  Being inclusive and equitable requires a great deal of conscious effort and math.  Systemic biases are hidden throughout the data and are exacerbated by those who look for simple rules.  All the steps in developing and fielding these algorithms have to cooperate in actively countering the biases and the stereotyping that is in the very fabric of the data.

Skewing of Opinion – AI Robots organize data openly, without discrimination, and present facts and information in a fair, proportionate and transparent manner. However, it is recognized that AI Robots can detect and report cases where the information has been artificially overweighted.

No.  Training requires weighting data and discriminating in order to correct for data distribution.  This isn’t even just to correct for missing data, it is to optimize for the best outcome. For instance say that you are detecting cancer cells.  Any algorithm will have false positives and false negatives, and real data sets will have 95% non-cancerous cells. An algorithm that always says there is no cancer cell will be 95% accurate.  But accuracy is not what you should be optimizing for. You have to determine a cost function for the consequences of all 4 quadrants, particularly false positives and false negatives. Presumably, depending on distribution, you may intentionally slant the data to have more false positives than false negatives, reducing the accuracy and skewing results in order to achieve the optimal balance between survival rates and the financial and medical cost of unnecessary treatment.  One method is to bias the input data so that the algorithm sees more cancer cells than would normally occur.
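The accuracy trap and the four-quadrant cost function above can be sketched directly. The scores, class balance, and cost values below are made-up numbers chosen only to illustrate the argument.

```python
# Sketch of the point above: with 95% non-cancerous cells, "always say
# no cancer" is 95% accurate, yet a cost matrix over the four quadrants
# (all costs here are invented for illustration) favours a different rule.

import random

random.seed(2)

# Synthetic detector scores: cancerous cells (5% of the set) score higher.
cells = [(random.gauss(0.7 if cancer else 0.3, 0.15), cancer)
         for cancer in ([True] * 50 + [False] * 950)]

COST = {"tp": 1, "fp": 5, "fn": 100, "tn": 0}   # missed cancer costs most

def evaluate(threshold):
    """Return (accuracy, total cost) of flagging scores above threshold."""
    acc = cost = 0
    for score, cancer in cells:
        flagged = score > threshold
        key = ("tp" if cancer else "fp") if flagged else ("fn" if cancer else "tn")
        cost += COST[key]
        acc += (flagged == cancer)
    return acc / len(cells), cost

# The degenerate "never flag" rule: high accuracy, terrible cost.
print("always-negative:", evaluate(2.0))

# Minimizing cost instead of maximizing accuracy shifts the threshold
# toward more false positives, exactly as the text describes.
best = min((t / 100 for t in range(100)), key=lambda t: evaluate(t)[1])
print("cost-optimal threshold:", best, evaluate(best))
```

The cost-optimal threshold accepts lower accuracy in exchange for far fewer missed cancers, which is the trade the text argues you should be optimizing for.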

Don’t focus on skewing and proportionality, which are loaded words for routine and often necessary statistical data processing.  Focus instead on what the algorithms are optimizing for.

“it is recognized that AI Robots can detect and report cases where the information has been artificially overweighted”.  Don’t try to pre-specify the type of algorithm to be used. For most such instances you don’t need anything more than conventional statistical techniques and simple algorithms.  But more importantly why in the world are you using data that someone else has had the opportunity to tamper with? Deal with your data integrity before you start to make the work of data scientists more difficult.

Transparent?  I am still covered by non-expiring NDAs for work in fraud detection AI algorithms.  Transparency is not always a good thing.

We need to focus on how AI currently uses public data to manipulate or change individual or group actions, or fraudulently extracts data for someone's benefit, often to the detriment of individuals or groups.

At present, AI systems rely on mathematical data sets that must call text files to report or explain their results, and those text files can be manipulated to say anything one wants, regardless of whether it is right or wrong.

We need a gold standard for reporting that is grounded in our current academic and professional knowledge and can be adjusted as our knowledge base grows or changes.

One simple approach for now would be to say that all data sets based on public data lakes that use ethnicity, an individual's age, sex, location, estimated or projected political affiliation, or financial status should be in the public domain for anyone to use for interpretation, regardless of how the data has been gathered, previously interpreted, or is currently being used.

Don’t just let the media report who got killed or ticketed running a stop sign. They must also report how many people obey that specific stop sign; then we at least have some idea where to focus our efforts to change patterns of behaviour at that location. Find a pattern and use AI based on public data to fix the problem, or to report current behaviour and how it may be changing.

Open access to all raw data sets is critical for AI to work properly.

Good luck getting this agreed to by the likes of Facebook or Amazon or any political party or healthcare providers or the military.

Most people will find it easier to focus on robots rather than the very difficult current issues that need fixing now. Military drones and robots are a very real future threat, but at present traditional warfare kills far more.

AI seems to be the current replacement for the old Voodoo Spreadsheets. It was not that long ago that we were told to purchase a computer with a spreadsheet program so we could start our own businesses, and that HP claimed to know from its data sets how many people showed up at emergency with chest pain that was not heart-related, so that hospitals could reduce outpatient costs.

About CATAAlliance
Interact with your Innovation Peer Group Now (No Tech Firm Left Behind)

The Canadian Advanced Technology Alliance (CATAAlliance), Canada’s One Voice for Innovation Lobby Group, crowdsources ideas and guidance from thousands of opt in members in moderated social networks in Canada and key global markets. Supported by evidence-based research, CATAAlliance then mobilizes the community behind public policy recommendations designed to boost Canada’s innovation and competitiveness success.

Support CATA Advocacy Today through Crowdfunding
10 Ways to Advance Agendas with CATAAlliance

Contact: CATAAlliance CEO, John Reid at email jreid@cata.ca, tel: 613-699-8209, website: www.cata.ca, tags: Innovation, Leadership, Entrepreneurship, Advocacy