Professionalism/Data Ownership

In Correspondence with STS 4600 at the University of Virginia.

Introduction
Personal data refers to information that relates to an identified or identifiable individual. Identifiers range from a user account and personal details such as a name and address to technical markers such as an IP address or cookie identifier. While almost any kind of data can be collected about a user, data collection is often protected and limited. Louise Matsakis, a technology editor at Wired, writes that “health records, social security numbers, and banking details make up the most sensitive information stored online. Social media posts, location data, and search-engine queries may also be revealing but are also typically monetized in a way that, say, your credit card number is not.”

"Data ownership refers to both the possession of and responsibility for information. Ownership implies power as well as control. The control of information includes not just the ability to access, create, modify, package, derive benefit from, sell or remove data, but also the right to assign these access privileges to others."

Data is quickly becoming one of the most valuable resources on Earth. Because data is sourced by people, this introduces a host of ethical dilemmas regarding the ownership of personal data.

People often unwittingly agree to terms that consent to the collection and sale of their personal data. These terms are generally presented in the form of fine-print terms and conditions statements or unassuming notifications from mobile apps asking to access information from your device. Once a data broker obtains a person's data, they generally do not allow the person any control over it; this raises ethical questions, especially if this data is sensitive and in cases where this data is breached.

Legislation
In the United States, there is no general consumer privacy law at the federal level. Industry-focused laws exist, such as the Health Insurance Portability and Accountability Act (HIPAA) and the finance-related Gramm-Leach-Bliley Act (GLBA), but consumer data collected on the internet is largely left unregulated. California and Virginia are the only US states with any legislation requiring data brokers to provide people the option to opt-out of allowing the sale of their data.

California
In California, a bill called the California Consumer Privacy Act (CCPA) gives consumers rights over their collected data. The bill was signed into law on June 28, 2018, and gives Californians the rights to know what personal data is being collected and to whom it is being sold or disclosed. It also gives them the rights to delete their data and opt-out of the sale of their personal information, and provides protection from discrimination for exercising these rights.

Virginia
Virginia enacted the Consumer Data Protection Act (CDPA) on March 2, 2021, becoming the second state to enact data privacy legislation. Similar to the CCPA, it allows consumers to access their data, delete their data, and determine who possesses their data. It also gives consumers the right to opt-out of the processing and sale of their data. The CDPA further requires companies to implement "reasonable" data security practices to protect consumer data, and limits the collection and use of this data only to what is "reasonably necessary." However, it differs from the CCPA in that it also allows consumers to correct errors in their personal data.

Other States
Several other states have proposed similar data privacy bills, including New York, Maryland, Massachusetts, Hawaii, and North Dakota. These bills grant rights and protections similar to the CCPA and CDPA; however, they have yet to be passed.

European Union
The EU has been improving privacy policy since 1995, when it implemented the Data Protection Directive. More recently, it enacted the General Data Protection Regulation (GDPR) in May 2018 to implement protection and privacy with regard to personal data. It addresses transparency, constrains data usage and collection to only what is necessary, calls for reasonable security measures, and ensures that personal data can be corrected for accuracy, among numerous other principles. It also includes a limited version of the right to be forgotten, known as the right of erasure; this right allows individuals to request that their data be removed from search engines and other databases. However, it can only be exercised if one of several possible conditions is met. Two important court cases surround this right.

Court Cases
In 2014, the Google Spain case set precedent for the right to be forgotten in the EU. The case arose when a Spanish man complained about an old newspaper article announcing the auction of his repossessed property. The man felt that because his debts had since been resolved, it was unfair that this search result still appeared. The EU court ruled that Google must remove links to the article from its search results. Some claim that deleting information in this way is a form of censorship. A similar case was heard between Google and a French privacy regulator in 2019 over the jurisdiction of the right to be forgotten. The EU court ruled that search results do not have to be deleted outside of the EU. Google implements this using its geoblocking tool, which restricts access to search results based on the location from which the search was performed. For instance, while 'forgotten' search results would be hidden within the EU, they would still appear in the United States, since the U.S. does not legally grant the right to be forgotten.

Country Comparison
There is no single approach to data ownership and data privacy. Though the U.S. does not have a nationwide consumer privacy law analogous to the European Union's GDPR, individual states have started to take action, as seen in California and Virginia.

Some see this as a step in the right direction, but issues remain. For example, in California, one can only opt-out of data sale, but not data collection. Thus, companies can still gather and leverage personal data, unless a specific request for deletion is submitted. Furthermore, since only two states have enacted data privacy laws, personal data is left largely unregulated in the United States. While some companies have decided to roll out nationwide changes in response to the laws in California and Virginia, others have not.

China has a privacy law similar to the EU's GDPR, called the Personal Information Security Specification, which was enacted in March 2018. However, there is a tradeoff between privacy and surveillance: it is difficult to maintain government access to citizens' information while also protecting citizens from data usage by other parties.

Handling issues of data ownership is not straightforward as there are many uncertainties and tradeoffs.

General
In today's digital age, data has irreplaceable value. The largest consumers of user data include Google and Facebook, followed by Amazon, Apple, and Microsoft. These big tech companies use data for obvious purposes such as tailoring advertisements and understanding consumer behavior, but the primary application is providing input to artificial intelligence algorithms. When data is such a valuable commodity, the question arises of whether data producers should be paid for their contribution.

In tech, what users give away for free is transformed into a precious commodity that powers today's most profitable companies. But the consumers it is extracted from often know little about the extent to which their information is collected, who looks at it, and how much it is worth. In exchange for free use of products such as Google, Facebook, and YouTube, users are paying with their data.

Examples of How Data Became so Valuable
Popular music and video streaming platforms such as YouTube, Netflix, and Spotify rely on the data they collect about users' interests on their platforms. Netflix states, "our business is a subscription service model that offers personalized recommendations, to help you find shows and movies of interest to you. To do this we have created a proprietary, complex recommendations system." The data collected, however, goes well beyond viewing history. In addition to what you have watched, Netflix personalizes recommendations by looking at the time of day you watch, the devices you use, how long you watch, how other Netflix members with similar tastes use the service, and information about the titles themselves, such as genre, director, actors, and release year. Collecting as much data as possible about users' habits in order to retain them as customers longer is a common trend among streaming platforms.
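As a rough illustration of how such signals could feed a recommender, the toy sketch below scores unwatched titles by how many attributes (genre, actors, era, and so on) they share with a user's watch history. This is purely illustrative and uses a made-up catalog; Netflix's actual system is proprietary and far more complex.

```python
# Toy content-overlap recommender: ranks titles by shared attributes
# with what the user has already watched. Hypothetical data, not
# Netflix's real model, which blends many more behavioral signals.

catalog = {
    "Title A": {"drama", "director-x", "actor-1"},
    "Title B": {"comedy", "actor-2"},
    "Title C": {"drama", "actor-1", "2010s"},
}

def recommend(watched: list, catalog: dict) -> list:
    # Pool the attributes of everything the user has watched...
    profile = set().union(*(catalog[t] for t in watched))
    # ...then rank unwatched titles by attribute overlap with that profile.
    unwatched = [t for t in catalog if t not in watched]
    return sorted(unwatched, key=lambda t: len(catalog[t] & profile), reverse=True)

print(recommend(["Title A"], catalog))  # ['Title C', 'Title B']
```

A real system would add the temporal and device signals mentioned above as further features, but the principle of ranking by inferred taste is the same.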

Another common trend is then selling collected user data to third parties, done with user consent. However, users are often not fully informed about what they are truly consenting to. 23andMe is a genetics startup from San Francisco. As of 2018, it had 5 million customers who sent samples of their spit to be analyzed to identify genetic changes at 700,000 different locations in their genomes. 23andMe does as promised and delivers its analyses and findings about your DNA. However, in order to use its services, users have to give consent for 23andMe's use of their data in medical research, which is not a part of the product the user will ever see: "Giving consent by checking the appropriate box below means that you agree to let 23andMe researchers use your Genetic & Self-Reported Information for 23andMe Research." In 2018, the London-based drug giant GlaxoSmithKline partnered with 23andMe to develop new medicines. Part of the deal entailed GlaxoSmithKline making a $300 million investment in 23andMe.

These two examples show how personal data is the true commodity that companies are seeking. From a business ethics perspective, companies need to find more effective ways to inform customers about the types of data being collected on them. Likewise, customers need to be better educated about how valuable their personal data truly is and how it largely generates money for the companies. In both of the above examples, it is important to note that the user is not necessarily directly harmed by the data collection. However, there is a professional ethical dilemma in not being fully transparent about how personal data is used. Further, while this personal data is being collected, a data breach or attack can put users' sensitive information under threat. Later in this chapter, professional ethics perspectives on data ownership will be explored.

Data Brokers
Data brokers are entities that collect and sell data. Since the terrorist attacks of 9/11, there has been high demand for highly accurate identification of individuals through data, and data brokers such as LexisNexis, Acxiom, and Experian have filled this demand by collecting highly personal information from millions of people.

The ways in which data brokers collect data are numerous and include public records, internet scraping, and getting people to opt in to data collection through terms and conditions statements. Some of the data collected could be subject to federal law, but the lack of regulation of data brokers in the US allows them to buy and sell the information anyway.

Once a data broker has someone's information, it is nearly impossible for that person to regain control of their data. As mentioned previously, some states have begun legislating to fix this problem.

Data Protection
The possession of large quantities of consumer data has become increasingly valuable to those who know how to use it. For companies, big data can be processed and analyzed to inform smarter business moves, promote more efficient operations, and yield higher profits from satisfied customers. However, when sensitive data falls into the wrong hands through cyberattacks, it can be used maliciously. Managing data privacy protection has become increasingly complex due to the ubiquity of the information-intensive environment and the multidirectional demands of stakeholders and clients. Companies dealing with the use and distribution of personal data must implement plans to ensure compliance with data privacy policies, standards, guidelines, and processes. A systematic approach to data security involves understanding what kind of data the company has, tracking how that data is stored and transferred, and conducting regularly scheduled risk assessments. Some fear that the over-complexity of current data protection practices may expose vulnerabilities and weaknesses as data spreads across more platforms, both on-premise and in the cloud.
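The first step of that systematic approach, knowing what kind of data the company holds, can be pictured as a sensitivity classification over a data inventory. The field names and categories below are hypothetical; real classification schemes are dictated by policy and regulation.

```python
# Hypothetical sketch of a data-inventory classification step: label
# stored fields by sensitivity so protection effort can be prioritized.
SENSITIVE_FIELDS = {"ssn", "health_record", "bank_account"}   # highest risk
PERSONAL_FIELDS = {"name", "address", "email", "location"}    # identifiable

def classify(field_name: str) -> str:
    if field_name in SENSITIVE_FIELDS:
        return "sensitive"   # encrypt at rest, restrict access, audit reads
    if field_name in PERSONAL_FIELDS:
        return "personal"    # minimize retention, honor deletion requests
    return "general"

inventory = ["email", "ssn", "page_views"]
print({f: classify(f) for f in inventory})
# {'email': 'personal', 'ssn': 'sensitive', 'page_views': 'general'}
```

The subsequent steps, tracking storage and transfer and running risk assessments, build on exactly this kind of labeled inventory.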

Cyberattacks and Data Breaches
Data breaches occur from cyberattacks, the unauthorized access of a computer system or network, and involve the leak of confidential or sensitive company data. A data breach can have expensive short-term impacts on companies in all industries, costing businesses an average of $3.86 million per breach and $148 per lost or stolen record. On top of that, companies are obligated to perform forensic investigations to assess what information was stolen and where the vulnerability in the data security infrastructure lay. Long-term effects include the loss of customer trust as the company's reputation diminishes from the breach. Studies found that 85% of customers won't shop at businesses that have data security concerns and 69% of customers would avoid a company that suffered a data breach.

Data Collection Freedom for Data Brokers
The lack of legislation regulating data brokers means that there is little to dictate how they go about collecting data. Some data is free to collect, such as public webpages or public records. Other, often more personal, data is not, so data brokers have developed processes for getting users to unwittingly agree to its collection.

An example of this is X-Mode, a data broker that collects location data from people's smartphones through software embedded in mobile apps. When a user downloads an app running X-Mode's software, the app prompts the user to allow it to use the device's location data without any further information. Many users allow this because they want to use in-app features. However, unknown to the user, their location is now being tracked and that data is sold by X-Mode. This lack of transparency can be contrasted with 23andMe: where X-Mode's approach does not inform users about how their data is used, 23andMe tells users exactly who wants to use their data, and how and why they want to use it.

Facebook and Cambridge Analytica Scandal
In 2014, about 270,000 users were paid to take a personality survey through an app that scraped their Facebook profiles. They consented to the collection on the understanding that it was for academic use by Cambridge University's Psychometrics Centre. Aleksandr Kogan, a researcher at the university, was hired by Cambridge Analytica to create the app. The app used Facebook's Open Graph platform, which at the time also gave access to all of the participants' friends' information. The acquired data included names, birthdays, likes, and location information, all of which was both sensitive and valuable. In this way, approximately 87 million Facebook users' data was obtained by Cambridge Analytica, 99.7% of whom did not consent.

Within the next year, Facebook learned that the data was being used by Cambridge Analytica to aid Ted Cruz's presidential campaign. It removed Dr. Kogan's app from the site and demanded that Cambridge Analytica delete the data. According to Facebook, Cambridge Analytica confirmed that the data was deleted. However, in 2016 the company was hired by Donald Trump's presidential campaign to provide tools for identifying the personalities of American voters and targeting advertisements to influence their behavior.

The scandal exploded in 2018 when whistleblower Christopher Wylie exposed Cambridge Analytica's misuse of Facebook data, including the fact that it had not been deleted. In the words of Mark Zuckerberg when he testified to Congress: “When we heard back from Cambridge Analytica that they had told us that they weren’t using the data and deleted it, we considered it a closed case. In retrospect, that was clearly a mistake.” Facebook also suspended Cambridge Analytica from the site at this time.

Interestingly, Facebook claimed this was not a data breach. Facebook VP and Deputy General Counsel Paul Grewal stated in 2018: "The claim that this is a data breach is completely false... People knowingly provided their information, no systems were infiltrated, and no passwords or sensitive pieces of information were stolen or hacked." Facebook routinely allows researchers to collect user data for academic purposes, as Dr. Kogan's app did. However, Kogan broke the rules when he sold the data to Cambridge Analytica, a commercial third party.

Christopher Wylie
According to Wylie, as a Cambridge Analytica employee heavily involved in the project, he witnessed the company's "corruption and moral disregard" firsthand. He admitted that the project's massive scale was exciting at first: "We had done it. We had reconstructed tens of millions of Americans inside of a computer... I was proud that we had created something so powerful." However, after realizing the unethical nature in which the data was obtained and seeing how it was being applied, Wylie broke free from this state of acclimatization. He said "the office culture seemed to be clouding my judgment" and that it was easy to "lose sight of what I was actually involved in" while simply "staring at a screen." After this wake-up call, he came forward as a whistleblower.

Lessons
Christopher Wylie demonstrated the obligation of the professional to speak out against unethical practices. As an ex-employee, Wylie faced immense pressure when speaking out, but was ultimately motivated by professional responsibility and duty to bring these issues to light. Wylie's experiences demonstrate that it can be easy to fall victim to acclimatization with regard to malpractice in the workplace; however, by taking a step back and considering the impact on consumers' lives, newfound clarity can be achieved. Though the massive amount of personal data initially showed promise for advancing psychological profiling technology, Wylie eventually realized that the unethical procurement and usage of the data was too great a cost (see Data Ownership vs. Data Processing). The end does not always justify the means, in cases of data ownership and professional ethics as a whole.


Consent
In any study that uses participants to gather data, the participants must give informed consent. For a participant to give informed consent, they must be informed of all relevant information including how the data will be used, they must understand that information, they must participate voluntarily, and they must have the capacity to make a decision about whether to participate. The EU's GDPR is the most comprehensive regulator of online data in that whenever a company collects personal data from a citizen, it requires explicit and informed consent by that person in the form of opting in to the data collection.
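A minimal sketch of what explicit, opt-in consent could look like in code: collection for a given purpose is disallowed by default, becomes allowed only after an affirmative opt-in, and withdrawal restores the default. The `ConsentRecord` class and its purpose names are hypothetical illustrations, not any regulator's required implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentRecord:
    """Explicit opt-in: nothing is collected until the user affirmatively agrees."""
    purposes: dict = field(default_factory=dict)  # purpose -> timestamp of opt-in

    def opt_in(self, purpose: str):
        self.purposes[purpose] = datetime.now(timezone.utc)

    def withdraw(self, purpose: str):
        self.purposes.pop(purpose, None)

    def may_collect(self, purpose: str) -> bool:
        # Default is False: absence of a record means no consent
        # (no pre-ticked boxes or buried terms).
        return purpose in self.purposes

record = ConsentRecord()
print(record.may_collect("analytics"))   # False by default
record.opt_in("analytics")
print(record.may_collect("analytics"))   # True only after explicit opt-in
record.withdraw("analytics")
print(record.may_collect("analytics"))   # False again after withdrawal
```

The design choice that matters here is the default: consent must be granted, never assumed, and must be as easy to withdraw as to give.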

However, the case studies presented above show that this practice is nowhere near ubiquitous. Data collection policies hidden inside terms and conditions, and apps that do not disclose the true uses of collected data, do not satisfy the requirements for informed consent. The US and the rest of the world need stronger regulation of data harvesting in order to protect the individual rights of online users.

Data Ownership vs. Data Processing
Should companies focus on using data to advance technologies or should consumer data be kept private to avoid misuse?

Access to large amounts of data has allowed us to improve pre-existing technologies and develop new technological fields like machine learning (ML). However, the availability of all this information can lead to selfish and ill-intentioned actions. While Facebook collected data from its users in an effort to improve their experience on the platform, that data was eventually misused. The Cambridge Analytica scandal, referenced above, is a good example of how data was shared without consent and then used to influence the 2016 election. However, there is no reason that data like this could not also be used for benevolent purposes.

Companies have banks of their users' personal data. There is a professional expectation that a company use this data ethically and transparently. However, as in the Facebook example, data that was thought to be deleted instead lived on and was eventually used to manipulate political campaigns without the users' knowledge.

Individual Rights vs. The Common Good
At what point does serving the common good start limiting personal liberties? Data-driven solutions rely on utilizing as much data as possible to improve results, but what if the data is not easily accessible? Similarly, what if obtaining the data puts the consumer's data at risk?

A good example of this is medical data. Medical data could produce advancements in science and medicine, but how is that balanced with an individual’s right to privacy of personal information? We have seen this recently with the COVID-19 crisis. Contact tracing, tracking who has a disease and whom those people have been near, is a crucial tool for limiting the spread of an outbreak. The system uses your phone's Bluetooth to anonymously track who you have been in close proximity to, as long as they also use the system; it is built on data collection. At the start of the COVID-19 pandemic, Google and Apple worked together to add coronavirus exposure tracing to Android and iOS, the two dominant mobile operating systems.
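The anonymized Bluetooth matching described above can be sketched roughly as follows: each phone broadcasts random rolling identifiers, stores the identifiers it hears nearby, and later checks them against identifiers published by confirmed cases. This is a simplified, hypothetical model; the real Google/Apple Exposure Notification protocol derives identifiers cryptographically and adds time windows and signal-strength thresholds.

```python
import secrets

def new_rolling_id() -> str:
    """A fresh random identifier; real systems rotate these every ~15 minutes."""
    return secrets.token_hex(16)

class ContactTracer:
    """Simplified model of one phone participating in exposure notification."""
    def __init__(self):
        self.my_ids = set()     # identifiers this phone has broadcast
        self.heard_ids = set()  # identifiers heard from nearby phones

    def broadcast(self) -> str:
        rid = new_rolling_id()
        self.my_ids.add(rid)
        return rid

    def record_nearby(self, rid: str):
        self.heard_ids.add(rid)

    def check_exposure(self, published_positive_ids: set) -> bool:
        # Matching happens locally: only random IDs, not identities, are shared.
        return bool(self.heard_ids & published_positive_ids)

# Two phones come into proximity.
alice, bob = ContactTracer(), ContactTracer()
bob.record_nearby(alice.broadcast())

# Alice later tests positive and uploads her broadcast identifiers.
print(bob.check_exposure(alice.my_ids))  # True: Bob is notified of exposure
```

Note that privacy hinges on the matching being done on-device against random identifiers; the April 2021 Android flaw discussed below mattered precisely because it let other apps peek at this locally stored data.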

In April 2021, it was revealed that the Android version of the exposure notification system had a privacy flaw that let other preinstalled apps potentially see sensitive data, namely whether someone had been in contact with a person who tested positive for COVID-19. Google immediately worked on rolling out a fix for the bug; however, had the vulnerability never been found, users of the contact tracing system would have remained exposed. This contradicts the promise of contact tracing apps to keep data anonymized and secured. Serge Egelman, the CTO of AppCensus, which reported the vulnerability to Google, stated that "the lesson here is that getting privacy right is really hard, vulnerabilities will always be discovered in systems, but that it’s in everyone’s interest to work together to remediate these issues."

Transparency vs. Accessibility
Should companies be able to sell collected user data without being transparent about its collection?

In the section above, this is discussed in relation to the company X-Mode. Companies are able to keep the cost of their services low or free by selling user data (location data, in X-Mode's case) without being clear about doing so. Should users have a choice in this? For instance, should a user be able to choose to pay for an app rather than share location or other data?

In the previously mentioned 23andMe case, users are forced to consent to the use of their saliva samples and data for medical research. But has 23andMe considered selling its product as is while letting users opt into the medical research program by choice? Perhaps too few users would opt in, and 23andMe would lose revenue.

At the end of the day there are good arguments for handling data ownership problems a variety of different ways. These questions are important to consider, especially as new legislation begins to arise around data ownership and privacy.

Further Work
The subject of data ownership is broad and covers an extensive list of ethical dilemmas and relevant case studies. This chapter could also include ethical assessments of how personal data should be properly used in the tech fields of blockchain, artificial intelligence (AI), and machine learning (ML). Another topic to explore is the lifespan of personal data: how and when should it be properly disposed of, and do users know how long their data will be used?