Emerging Technologies in Transportation Casebook/Open Data in Transportation

Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents, or other mechanisms of control.1 One of the definitions of open data is as follows:


 * Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and share like.2

When open data is freely available and distributed by all parties, the entire data system can be interoperable. Interoperability gives governments, businesses, and citizens alike access to the data, with greater confidence that the information is accurate and useful. Open data also allows users to mix datasets and information in ways that existing sources have not imagined.

In transportation, not only are people observable data points, but so is the entire transportation network, including automobiles, buses, trains, airplanes, and all manner of mass transit options. Open data refers to information, such as data about transit routes, schedules, highway congestion, and pricing, that is made freely available for analysis and integration into other applications.

ANNOTATED LIST OF ACTORS

 * Policy and rule makers
 * National governments make policies to encourage providing and using open data. They also regulate accessibility and distribution of open data to protect personal data and privacy.
 * Governments make policies to encourage innovative usages of open data.
 * Governments and third parties publish licenses for open data.
 * Data providers
 * Governments and third parties distribute data under license.
 * Service providers
 * Governments and third parties provide services in which open data is used.
 * Users
 * Citizen/private entities download data, provide value, and receive services.

TIMELINE OF EVENTS in the United States3,4

 * 1942 – Robert King Merton explained the benefits of open scientific data, which is freely accessible to all.
 * 1995 – The term “open data” was used for the first time, in a document from an American scientific agency.
 * 1998 – Bay Area Rapid Transit released its schedule data in .csv format, becoming the first known release of transit data to the public.5
 * May 2000 – The Selective Availability (SA) of GPS is turned off.
 * December 2007 – Thirty Internet thinkers and activists held a meeting to define the eight principles of open government data.
 * Jan. 21, 2009 – President Barack Obama issued a memorandum titled “Transparency and Open Government”.6
 * May 20, 2009 – Data.gov7 launched.
 * December 8, 2009 – The White House issued the Open Government Directive.
 * May 23, 2012 – President Barack Obama’s memorandum titled “Building a 21st Century Digital Government”8
 * May 9, 2013 – Executive order titled “Making Open and Machine-Readable the New Default for Government Information”9
 * May 9, 2013 – “Open Data Policy: Managing Information as an Asset”
 * May 16, 2013 – Project Open Data launched
 * April 28, 2014 – Digital Accountability and Transparency Act (DATA Act).

MAPS OF LOCATIONS

 * National Governments’ portal sites
 * data.gov (United States) 7
 * data.gov.au (Australia) 10
 * open.canada.ca (Canada)11
 * data.gouv.fr (France)12
 * govdata.de (Germany)13
 * data.go.jp (Japan)14
 * data.go.kr (Republic of Korea)15
 * data.gov.sg (Singapore)16
 * data.gov.uk (United Kingdom)17
 * Other portal sites for open data
 * https://data.cityofnewyork.us/dashboard (New York City) 30
 * As of April 15, 2016, 519 open data portal sites were registered in DataPortals.org, a catalog of open data portal sites.18

POLICY ISSUES

 * Enhanced accessibility to data and personal data and privacy19
 * How can governments ensure that citizens have a legally enforceable right to access, re-use, and redistribute data easily?
 * Are there enough safeguards to protect personal data and privacy in an environment of enhanced accessibility to data?
 * Common standards for open data20: ideally, open data
 * Can be linked to, so that it can be easily shared and talked about;
 * Is available in a standard, structured format, so that it can be easily processed;
 * Has guaranteed availability and consistency over time, so that others can rely on it;
 * Is traceable, through any processing, right back to where it originates, so others can work out where the data originated and whether to trust it.
 * Government role to stimulate innovation in use of open data
 * As determined by review from TRB, the availability of open data encourages innovation that could not be accomplished solely by agency staff.
 * The added benefits include increased public awareness of services, empowered customers, an improved perception of government agencies, and opportunities for private businesses to utilize the data.21
 * The TRB further stated that in locations where open transit data is available, "[it is] making transit more competitive as a transportation option, providing better regional coordination of services, and providing a better transit experience."21
 * Government agencies are not geared towards innovation. Providing open data to the public allows outside innovators to find a better use for the data, sometimes one that the government agency did not intend or did not explore.21 An agency does not always have sufficient resources to develop applications and conduct additional analyses on the same scale as the open market.

NARRATIVE OF THE CASE
Open data is changing the world. Open data can improve governments by increasing transparency, enhancing public services, and improving resource allocation. Data collected and controlled by the government no longer needs to be useful only to government; it can also be useful to the citizens and businesses that are affected by it. Open data can improve communication and information access for citizens. Interactions between data collectors and data users are improved when both sides are aware of the data available. Information asymmetry occurs when data is closed to certain parties, and open data standards can provide a path towards correcting that asymmetry and improving information dissemination. Open data can also provide new opportunities for citizens, government, businesses, and organizations by fostering innovation. It gives users the opportunity to combine statistics from disparate sources into aggregate datasets. The aggregation of open data provides new sources of information that citizens and policy makers can analyze to find new ways of solving problems.

But open data can also have negative impacts. Legislation focused on transparency and accountability that was passed prior to the rise of the Internet may have insufficient safeguards for privacy. Just as open data can be used to provide innovative solutions to government and commercial problems alike, freely available public data is also in the hands of actors who may choose to exploit a weakness of society. For example, if open data includes any identifying information about a citizen that can be cross-referenced with other data providing mapped locations, privacy and safety could be endangered.
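The cross-referencing risk described above can be illustrated with a minimal sketch. All records, field names (`card_id`, `boarded_at`), and helper functions below are hypothetical and invented for illustration; they do not represent any real dataset or agency schema.

```python
# Hypothetical, made-up records: no real dataset is represented here.
trips = [
    {"card_id": "A17", "boarded_at": "Elm St", "time": "08:02"},
    {"card_id": "B42", "boarded_at": "Oak Ave", "time": "08:15"},
    {"card_id": "A17", "boarded_at": "Main St", "time": "17:40"},
]

# A second open dataset: stop names mapped to (latitude, longitude).
stops = {
    "Elm St": (45.52, -122.68),
    "Oak Ave": (45.53, -122.66),
    "Main St": (45.51, -122.67),
}

# A third dataset that happens to share the same identifier.
registrations = {"A17": "Resident 1"}


def cross_reference(trips, stops, registrations):
    """Join trips to names (via card_id) and to map coordinates (via stop)."""
    linked = []
    for trip in trips:
        name = registrations.get(trip["card_id"])
        if name is None:
            continue  # no matching identity for this trip
        linked.append((name, trip["time"], stops[trip["boarded_at"]]))
    return linked


def drop_identifiers(trips, fields=("card_id",)):
    """Return copies of the trip records without the identifying fields."""
    return [{k: v for k, v in t.items() if k not in fields} for t in trips]


linked = cross_reference(trips, stops, registrations)
published = drop_identifiers(trips)
```

Here `linked` places a named person at mapped locations at specific times, which is the kind of exposure the paragraph warns about. Removing the shared identifier before release, as `drop_identifiers` does, prevents this particular join, but it is only a first step, since combinations of time and place can sometimes identify a person on their own.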

How Open is “Open” 2

 * Legal openness
 * In many jurisdictions, intellectual property rights in data prevent users from using, reusing, and redistributing it without explicit permission. Open licenses clarify the openness of a dataset for use, reuse, and redistribution. Open data providers can set the degree of legal openness of their dataset by applying the appropriate license. Examples of licenses for open data are listed on the webpage of Open Definition.23 The end goal for open data proponents is to ensure that as much data as possible is legally free to any end user without fear of repercussions. Open data allows global access to data, which in turn spurs innovation from many different locations and environments. Data that is collected or controlled by government agencies would fall into this category.
 * Technical openness
 * Technically open data is provided at a reasonable cost; in particular, much of it can be freely downloaded from the internet. There are two common ways to distribute open data: providing datasets on websites and providing data through an Application Programming Interface (API). Ideally, a provided dataset is complete; however, a provider may distribute only a partial set of data via an API. Technically open data has an openly defined, machine-readable format, which allows the data to be reused. Examples of machine-readable formats are as follows:
 * Geographic Information System (GIS)
 * Extensible Markup Language (XML)
 * Computer-Aided Design (CAD)
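To show what "machine-readable" means in practice, the following minimal sketch reads the same small schedule from a CSV file and from an XML document using only the Python standard library. The schedule contents, column names, and XML element names are assumptions made for illustration and do not follow any particular agency's schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical schedule, expressed in two machine-readable formats.
CSV_TEXT = """route,stop,departure
10,Elm St,08:02
10,Oak Ave,08:15
"""

XML_TEXT = """<schedule>
  <departure route="10" stop="Elm St" time="08:02"/>
  <departure route="10" stop="Oak Ave" time="08:15"/>
</schedule>"""


def parse_csv(text):
    """Read (route, stop, departure) tuples from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["route"], row["stop"], row["departure"]) for row in reader]


def parse_xml(text):
    """Read the same tuples from <departure> elements in an XML document."""
    root = ET.fromstring(text)
    return [
        (d.get("route"), d.get("stop"), d.get("time"))
        for d in root.iter("departure")
    ]


csv_rows = parse_csv(CSV_TEXT)
xml_rows = parse_xml(XML_TEXT)
same_content = csv_rows == xml_rows
```

Because both formats are openly defined, a few lines of standard tooling recover identical records from either one, which is what makes such data straightforward to reuse.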

Data that is collected by a third party (Google, Apple, Uber, etc.) or under contract/funding from government sources would fall into this category. The controlled data has value to its current holder, and economic influences will govern its accessibility. In addition, the government may not have the right to distribute the data freely due to contractual obligations or unfamiliarity with the data, which places the third party in control.

Open Data in Transportation24
The NYU Rudin Center for Transportation Policy and Management sums up open transportation data for all actors: "The benefits of opening up data include more efficient travel (with an enhanced ability to find optimal routes on the go), a greater understanding of finance/administration (helping to promote improved funding structures), and crowd-sourced analysis (helping detect schedule improvements or errors)."

Open data in transportation allows government organizations to improve travel by drawing on multiple sources rather than relying directly on agency resources alone. The role of government is to serve citizens in the most efficient and useful manner. Open data offers a new avenue for determining what is most efficient and useful for the end user.

Encouraging multi-modal transportation is all about choices – choosing to drive, bike, ride, taxi, or carpool, and backing up those choices with solid information. Therefore, it is in every transportation agency's interest to help make that decision-making as seamless and effortless as possible, by making the information its users need easily and widely accessible. Open data is all about putting information in the hands of the people who need it and providing the data in a way that keeps costs to a minimum and maximizes flexibility.

Public Agencies & Open Data21
Public agencies are tasked with many directives, most of which purposefully or inadvertently collect data, and must determine whether their information is open to the public. Data collected by these agencies may not have been stored in a manner that allows review and preparation for public consumption.

The Transportation Research Board (TRB) surveyed public transit agencies and determined the following four reasons for not providing open data to the public:
 * Too much effort to produce the data/we do not have time or staff;
 * Too much effort to clean the data;
 * We cannot control what someone will do with the data; and
 * We do not know the accuracy of the data.

The research by TRB shows that public agencies are unsure how to efficiently provide data to the public that can be trusted, from all fronts. Accuracy of the data is an important issue because public agency budget projections or private innovative efforts are dependent upon the data. When data cannot be trusted, both public agencies and private users are harmed.

Further, the TRB survey revealed that 60% of the responding transit agencies determined which data to release openly to the public solely based on the ease of releasing the data. This creates an issue where the public agency has neither the expertise to determine the relevant data for public consumption nor the manpower to properly release the data. Certain data that may be contextually important to other open data is not made available to the public, limiting the possible uses for the open data. If data is only released when it is easy for the entity to do so, there will be a constant struggle between data providers and data consumers over the accuracy and usefulness of the data.

How does a government agency or locality operate a successful open data program?21
In order to provide open data to the public, a government entity needs to provide the ground rules for engagement. Open data can be the public's only interaction with a government entity, so it is important that the program is managed well. Practices associated with successful programs include:


 * Obtaining and maintaining management-level support for the program;
 * Recognizing the need for appropriate level of resources required to provide and maintain open data portals;
 * Establishing ways to monitor data accuracy, timeliness, reliability, quality, usage, and maintenance;
 * Creating and maintaining licensing and registration; and
 * Having an ongoing dialogue with both developers and customers, a practice shown to increase the value of the data and products that are based on the data.
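As one illustration of monitoring timeliness, the sketch below flags datasets that have not been refreshed within an agreed interval. The catalog entries, field names (`last_updated`, `max_age_days`), and thresholds are hypothetical; a real program would draw them from its own data portal and policies.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical catalog: each dataset records when it was last refreshed
# and how old it is allowed to become before it counts as stale.
CATALOG = [
    {
        "name": "bus_schedule",
        "last_updated": datetime(2016, 4, 1, tzinfo=timezone.utc),
        "max_age_days": 30,
    },
    {
        "name": "stop_locations",
        "last_updated": datetime(2015, 1, 10, tzinfo=timezone.utc),
        "max_age_days": 180,
    },
]


def stale_datasets(catalog, now):
    """Return (name, age_in_days) for every dataset older than its limit."""
    stale = []
    for entry in catalog:
        age = now - entry["last_updated"]
        if age > timedelta(days=entry["max_age_days"]):
            stale.append((entry["name"], age.days))
    return stale


# A fixed "now" keeps the example reproducible.
report = stale_datasets(CATALOG, now=datetime(2016, 4, 19, tzinfo=timezone.utc))
```

A check like this could run on a schedule and feed the dialogue with developers and customers, for example by publishing the list of overdue datasets alongside the data itself.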

The main point is that active management of open data is required. Data that is not maintained properly or kept up to date will be of no use to either the public or the entity that controls it. An agency needs to provide adequate resources to tackle the issue. Because the demands of technology are constantly changing, the system must be refreshed regularly to ensure that the goal of usable open data for all can be met.

Case Studies in Transportation

 * Cases in the U.S.
 * TriMet in Portland, Oregon25
 * In 2005, the GIS department at TriMet allowed all transit data to be freely distributed to all parties, in particular Google Transit, a website that allows users to plan transit trips in Google Maps.
 * Publishing data online in an open format and with standard terms allowed TriMet to spark an explosion in development for transit-related applications. Over 50 applications were developed to utilize the newly available data.
 * In 2007, TriMet replaced its online system map with open-source mapping tools rather than proprietary software. The use of open data methods helped TriMet provide additional options that were not contemplated by the original system. The goal of the original system was to plan transit-only trips. Through outside influences, the TriMet trip planning software was continually updated to provide the ability to plan multi-modal trips, with a combination of transit, walking, and biking options. The open-source nature of the project allowed more information to be aggregated in one central location and allowed commuters to make informed transportation decisions.
 * Mayor's Office of Data Analytics (MODA) - New York City30
 * Created in 2013 by Mayor Bloomberg, MODA was formed with the explicit task of collecting, aggregating, controlling, measuring, and analyzing all data and information within New York City.
 * The goal was to remove the requirements of data control from individual departments and agencies within the City and allow for a centralized data sharing and analysis agency. MODA allows the City "to analyze data from across City agencies and other sources to more effectively address crime, public safety, and quality-of-life issues by prioritizing risk more strategically, delivering services more efficiently, and enforcing laws more effectively."30
 * MODA operates the NYC Open Data Portal, which allows citizens to access information to leverage innovation in both public and private pursuits.
 * MODA has 5 strategic initiatives where open data is used:
 * Supporting Operations - Aiding City agencies to more effectively deliver services, including the NYPD, FDNY, Departments of Sanitation and Public Parks/Recreation, and many more.
 * Citywide Data Sharing - Provide a data sharing platform for aggregating and storing all city data, including merging the data onto geographic information.
 * Disaster Response and Resiliency - Provide resource management tools to protect citizens and effectively plan for emergencies.
 * Economic Development - Provide citizens and businesses with the data analytics tools needed to innovate and spur growth within the City.
 * Open Data - Encourage all data within the City be made freely and publicly available so that the City can benefit from innovation and opportunities.


 * Cases in the U.K.
 * In 2010, Transport for London (TfL) published “TfL Digital Strategy 2010-2013”.26 The strategy set out the following plans:
 * Release access to all key transport data published on the TfL website for re-use by third parties;
 * Use common licensing process for all data; and
 * Work closely with key strategic partners to encourage new services using TfL and others’ data.
 * TfL provides three types of open data on its website27:
 * Static data files – Data files which rarely change;
 * Feeds – Data files refreshed at regular intervals; and
 * API
 * Data is presented as XML wherever possible. TfL allows the data to be used freely, subject to a few rules that protect its brand. As of April 18, 2016, around 30 feeds and APIs were provided on its website and over 5,000 developers had registered to use its open data. One example is that ITO World Ltd and Google integrated real-time information on London Underground disruptions into Google Maps.28
 * In addition to distributing data, the U.K. government assessed the demand for open data. The assessment was carried out by the Department for Business Innovation & Skills in 2013. According to the assessment, transport data was clearly the most popular in terms of page views and number of applications developed.29

DISCUSSION QUESTIONS

 * How can governments encourage possible providers to distribute their data openly?
 * How can governments protect personal data and privacy, while at the same time encouraging the usage of open data?
 * How can governments ensure the accuracy of open data, especially real time data, such as operating information of public transportation?
 * Can data providers respond to the growing demand from users?
 * Is funding best spent on providing open data or providing better service?

RECOMMENDED READINGS

 * The United States Open Data Action Plan
 * https://www.whitehouse.gov/sites/default/files/microsites/ostp/us_open_data_action_plan.pdf
 * TRB Transit Cooperative Research Program Synthesis 115 - Open Data: Challenges and Opportunities for Transit Agencies
 * http://onlinepubs.trb.org/Onlinepubs/tcrp/tcrp_syn_115.pdf
 * TED Talk by Ben Wellington - How We Found the Worst Place to Park in New York City Using Big Data
 * https://www.ted.com/talks/ben_wellington_how_we_found_the_worst_place_to_park_in_new_york_city_using_big_data?language=en
 * Open Data Impacts - A repository of case studies maintained by NYU's Tandon School of Engineering
 * http://odimpact.org/
 * Market Assessment of Public Sector Information
 * https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/198905/bis-13-743-market-assessment-of-public-sector-information.pdf

COMPLETE REFERENCES OF CITED DOCUMENTS
1 Auer, S. R.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. (2007). "DBpedia: A Nucleus for a Web of Open Data". The Semantic Web. Lecture Notes in Computer Science 4825. p. 722. doi:10.1007/978-3-540-76298-0_52. ISBN 978-3-540-76297-3.

2 “The Open Data Handbook,” accessed April 10, 2016, http://opendatahandbook.org/guide/en/.

3 “A Brief History of Open Data,” accessed March 24, 2016, http://www.paristechreview.com/2013/03/29/brief-history-open-data/.

4 “A Brief History of Open Data -- FCW,” accessed March 24, 2016, https://fcw.com/articles/2014/06/09/exec-tech-brief-history-of-open-data.aspx.

5 Transit Cooperative Research Program (TCRP) - Synthesis 115. (2015). Open Data: Challenges and Opportunities for Transit Agencies. Washington, D.C.: Transportation Research Board (TRB).

6 “Transparency and Open Government | The White House,” accessed March 24, 2016, https://www.whitehouse.gov/the_press_office/TransparencyandOpenGovernment.

7 “Data.gov,” Data.gov, accessed April 6, 2016, https://www.data.gov/.

8 “Presidential Memorandum -- Building a 21st Century Digital Government,” Whitehouse.gov, May 23, 2012, https://www.whitehouse.gov/the-press-office/2012/05/23/presidential-memorandum-building-21st-century-digital-government.

9 “Executive Order -- Making Open and Machine Readable the New Default for Government Information | Whitehouse.gov,” accessed March 24, 2016, https://www.whitehouse.gov/the-press-office/2013/05/09/executive-order-making-open-and-machine-readable-new-default-government-.

10 “Data.gov.au,” accessed April 15, 2016, https://www.data.gov.au/.

11 “Open.canada.ca,” accessed April 19, 2016, http://open.canada.ca/en.

12 “Accueil - Data.gouv.fr,” accessed April 19, 2016, https://www.data.gouv.fr/fr/.

13 “GovData | Datenportal Für Deutschland - GovData,” accessed April 19, 2016, https://www.govdata.de/.

14 “Data.go.jp,” accessed April 15, 2016, http://www.data.go.jp/.

15 “Data.go.kr,” accessed April 19, 2016, https://www.data.go.kr/e_main.jsp#/L21haW4=.

16 “Data.gov.sg,” Data.gov.sg, accessed April 15, 2016, https://data.gov.sg/.

17 “Data.gov.uk,” accessed April 15, 2016, https://data.gov.uk/.

18 “Home - Data Portals,” accessed April 19, 2016, http://dataportals.org/.

19 “Making Open Data Real: A Public Consultation” (Government of the United Kingdom, August 2011), https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/78884/Open-Data-Consultation.pdf.

20 American Public Transportation Association (APTA). (August 2015). Public Transportation Embracing Open Data. Policy Development and Research.

21 Transit Cooperative Research Program (TCRP) - Synthesis 115. (2015). Open Data: Challenges and Opportunities for Transit Agencies. Washington, D.C.: Transportation Research Board (TRB).

22 TransitScreen.

23 “Conformant Licenses - Open Definition - Defining Open in Open Data, Open Content and Open Knowledge,” accessed April 17, 2016, http://opendefinition.org/licenses/.

24 Kaufman, S. M. (2012). Getting Started with Open Data: A Guide for Transportation Agencies. New York City: New York University.

25 Institute for Sustainable Communities. "TriMet: Pioneering the Field of Open Data". Snapshot: Portland, Oregon. Accessed: http://sustainablecommunitiesleadershipacademy.org/resource_files/documents/TriMet-Portland-OR.pdf

26 “TfL Digital Strategy 2010-2013” (Transport for London, October 2010).

27 Transport for London | Every Journey Matters, “Open Data Users,” Transport for London, accessed April 19, 2016, https://www.tfl.gov.uk/info-for/open-data-users/.

28 “Open Data White Paper: Unleashing the Potential” (Government of the United Kingdom, June 2012).

29 “Market Assessment of Public Sector Information” (Department for Business Innovation & Skills, Government of the United Kingdom, May 2013).

30 Mayor's Office of Data Analytics (MODA), New York City. Accessed at: http://www1.nyc.gov/site/analytics/index.page.