Assess data accessibility, quality, and security
Determine accessibility and quality of data
Once you have compiled your "wish list" of indicators, determine the accessibility and quality of the data behind each one. To keep things manageable, you may wish to prioritize relevant indicators for which you are already collecting data. Avoid the temptation to adopt a technology solution for collecting new data at this point, such as a photo survey tool or another app. For now, you are trying to determine which datasets are relevant, accessible, and high quality. Work with your key internal partners while you identify your measures: department staff who work on programs know which indicators will accurately track the intended outcomes, and database administrators will be valuable when discussing the accessibility and quality of the data.
A brief list of considerations for the accessibility of your data is below:
- Are we currently collecting this dataset or will we be collecting it for the first time?
- How far back does the record go (time horizon)?
- Would we need new technology to collect or analyze the data?
- Can the data be collected manually or can it be automated?
- How frequently is the data produced (continuously, hourly, daily, weekly, monthly, quarterly, yearly, etc.)?
- Is the data available for the entire City population or a subset of the population?
- How granular is the data (individual level, Census block, neighborhood, citywide, etc.)?
- Does the data contain sensitive information?
- Is there a proxy measure that we can use for now?
- Is the data machine-readable (for example, a CSV file or database table rather than a scanned PDF)?
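Two of the accessibility questions above, the time horizon and how frequently the data is produced, can often be answered directly from the records themselves. The Python sketch below assumes you can export the date each record was created from the source system (a hypothetical `record_dates` list). The function names `profile_dates` and `frequency_label` and the day cutoffs used for each frequency label are illustrative assumptions, not a standard.

```python
from datetime import date
from statistics import median
from typing import Optional


def profile_dates(record_dates: list[date]) -> dict:
    """Summarize the time horizon and typical spacing of a dataset's records."""
    if not record_dates:
        raise ValueError("dataset has no dated records")
    ordered = sorted(set(record_dates))  # distinct dates, oldest first
    # Gap in days between each pair of consecutive distinct dates.
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return {
        "earliest": ordered[0],
        "latest": ordered[-1],
        "horizon_days": (ordered[-1] - ordered[0]).days,
        # A single distinct date gives no gaps, so frequency is unknown.
        "median_gap_days": median(gaps) if gaps else None,
    }


def frequency_label(median_gap_days: Optional[float]) -> str:
    """Translate a median gap into a rough frequency label (assumed cutoffs)."""
    if median_gap_days is None:
        return "single snapshot"
    if median_gap_days <= 1:
        return "daily"
    if median_gap_days <= 7:
        return "weekly"
    if median_gap_days <= 31:
        return "monthly"
    if median_gap_days <= 92:
        return "quarterly"
    return "yearly or less often"
```

The median gap is used instead of the average so that a single long outage in the record does not make a monthly dataset look quarterly.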
As noted by the Wilson Sheehan Lab for Economic Opportunities (LEO) at the University of Notre Dame, leveraging administrative data can significantly reduce the cost and duration of rigorous evaluation. Cities can establish an internal data sharing agreement across departments that includes language allowing administrative data to be used for research purposes. Cities can also identify external holders of administrative data, such as a county, school district, police department, hospital, or state agency, that may have outcome data important for measuring program impact. Cities can leverage existing relationships and/or work with research institutions to develop streamlined data sharing processes and agreements with these external entities.
Evaluate the quality of your data
Quality is as important as accessibility. To evaluate data quality, you may wish to create a rubric like the one below:
|Score|Quality level|Description|
|---|---|---|
|3|Excellent Data Quality|The data is complete, accurate, and updated regularly.|
|2|Adequate Data Quality|The data is somewhat complete, accurate, and updated fairly regularly.|
|1|Poor Data Quality|The data is not complete, accurate, or updated regularly.|
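If you want rubric scores to be repeatable across datasets, you can tie them to simple measurements. The Python sketch below is one hypothetical way to do that: completeness is the share of required values that are present, `accuracy_share` is assumed to be the share of records passing whatever validation checks your team defines, and timeliness compares the days since the last update with the expected update interval. The 95% and 80% cutoffs and the "twice the expected interval" tolerance are assumptions to adjust, not part of the rubric itself.

```python
def completeness(rows: list[dict], required_fields: list[str]) -> float:
    """Share of required values that are present (not None or empty string)."""
    total = len(rows) * len(required_fields)
    if total == 0:
        return 0.0
    present = sum(
        1
        for row in rows
        for field in required_fields
        if row.get(field) not in (None, "")
    )
    return present / total


def quality_score(completeness_share: float, accuracy_share: float,
                  days_since_update: int, expected_interval_days: int) -> int:
    """Map measured indicators onto the 1-3 rubric using assumed cutoffs."""
    on_schedule = days_since_update <= expected_interval_days
    roughly_on_schedule = days_since_update <= 2 * expected_interval_days
    if completeness_share >= 0.95 and accuracy_share >= 0.95 and on_schedule:
        return 3  # Excellent Data Quality
    if completeness_share >= 0.80 and accuracy_share >= 0.80 and roughly_on_schedule:
        return 2  # Adequate Data Quality
    return 1  # Poor Data Quality
```

Under these cutoffs, a dataset that is 97% complete, 96% accurate, and last updated 20 days ago on a 30-day schedule scores 3, while one that is 90% complete and accurate but last updated 45 days ago on the same schedule scores 2.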
This is a great time to build on the data inventory that you started on the "gather information" page. Suggested additional fields to add to your data inventory template are below:
|Field|Description|
|---|---|
|Program|What is the name of the program?|
|Dataset title|Human-readable name of the dataset|
|Dataset description|A description that non-technical users can understand|
|Data source|The original system, application, or file that houses the dataset|
|Department or external partner|The department or external partner that maintains the data|
|Does the City already collect this data?|Whether this data is already being collected|
|Sensitivity ranking|Sensitivity score based on your city's rubric|
|Quality ranking|Quality score based on your city's rubric|
|Update frequency|How often the dataset is updated|
|Format|The file format of the dataset, typically csv, xlsx, shapefile, txt, etc.|
|Time range|The period of time covered by this particular dataset|
|Data use|How is this dataset commonly used?|
|Data users|Who typically uses this dataset?|
|Data owner's name|Name of the data owner|
|Data owner's position|The data owner's job title|
|Data owner's email address|The data owner's email address. The data owner manages the data and/or is responsible for granting permission to access it, understands what the dataset includes, and can answer questions about it.|
|Data quality concerns|Any concerns you may have about the quality of the dataset|
|Division|The division where the data is maintained. If more than one division is responsible, list the primary owner of the dataset.|
|Sensitive data comments|Issues that would prevent the City from sharing the data publicly. Common concerns include privacy violations, security issues, or the high cost or staff time required to publish the data.|
|Publishing status|Any comments on the current publishing status of the dataset|
|Published link|The link to the published dataset on the open data portal or City website|
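If the inventory is kept as a spreadsheet, a small data structure can help keep entries consistent. The Python sketch below models a subset of the suggested fields as a dataclass and writes entries to CSV. The class name `InventoryEntry`, the snake_case field names, the example values, and the check that `quality_ranking` is 1, 2, or 3 (matching the quality rubric) are all illustrative assumptions rather than a required template.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryEntry:
    """One row of a data inventory, covering a subset of the suggested fields."""
    program: str
    dataset_title: str
    dataset_description: str
    data_source: str
    department_or_partner: str
    already_collected: bool
    sensitivity_ranking: int  # score from your city's sensitivity rubric
    quality_ranking: int      # 1 (poor) to 3 (excellent), per the quality rubric
    update_frequency: str = ""
    file_format: str = ""
    time_range: str = ""
    data_owner_email: str = ""
    publishing_status: str = ""
    published_link: str = ""

    def __post_init__(self) -> None:
        # Reject scores that fall outside the three-level quality rubric.
        if self.quality_ranking not in (1, 2, 3):
            raise ValueError("quality_ranking must be 1, 2, or 3")


def write_inventory(entries: list[InventoryEntry], stream) -> None:
    """Write entries as CSV, one column per field, with a header row."""
    names = [f.name for f in fields(InventoryEntry)]
    writer = csv.DictWriter(stream, fieldnames=names)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


# Example with hypothetical values, written to an in-memory buffer.
example = InventoryEntry(
    program="Example program",
    dataset_title="Housing inspections",
    dataset_description="Results of rental housing inspections",
    data_source="Permitting system",
    department_or_partner="Building Department",
    already_collected=True,
    sensitivity_ranking=2,
    quality_ranking=3,
    update_frequency="Daily",
    file_format="csv",
)
buffer = io.StringIO()
write_inventory([example], buffer)
print(buffer.getvalue())
```

To save to disk instead, pass a file opened with `open("inventory.csv", "w", newline="")` (a hypothetical filename) in place of the in-memory buffer.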
Review data security and privacy protection protocols
Often, a policy change also involves a change in the data being collected, how that data is handled, who has access to it, and a host of other adjustments. As you take stock of the data assets available for your initiative, it is vital to also think through how you will protect that data and the privacy of the people it represents. Much of good data security rests on strong data governance practices that define and identify sensitive data and deliberately weigh the value of storing such data against its risks. One way of evaluating your data comes from case law related to the US Freedom of Information Act, which we have translated into the checklist below:
- Would disclosure result in a substantial invasion of privacy?
- Have we considered the extent or value of the public interest and the purpose or object of the individuals seeking disclosure?
- Is the information available from other sources?
- Was the information given with an expectation of confidentiality?
- Can the disclosure be tailored (for example, by releasing only part of the information) to limit the invasion of individual privacy?
These risks can be mitigated with a variety of strategies: aggregating data, obfuscating individuals' identities, redacting fields that may include Personally Identifiable Information (PII), or re-evaluating your data collection methods to find alternative ways of getting the information you need. If possible, these steps should be codified and managed by a data governance committee. For sensitive data deemed essential, create a data breach response plan so that you can respond quickly and confidently to emergencies.
Preparing proactively for these situations is often the best strategy for data security, but do not forget the technical side. Involve your IT administrators in conversations about data storage and access to ensure that the data will be protected by enterprise-grade cybersecurity systems.
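The mitigation strategies for sensitive data (redacting PII fields, obfuscating individuals, and aggregating data) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a substitute for your governance committee's rules: the `PII_FIELDS` set is hypothetical, pseudonymization uses a keyed hash (HMAC-SHA256) as one common way to obfuscate identifiers, and the default suppression threshold of 10 records per group is illustrative, since agencies use different minimum cell sizes.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical PII columns; replace with the fields your governance
# committee has classified as sensitive.
PII_FIELDS = {"name", "email", "phone", "street_address", "date_of_birth"}


def redact(records: list[dict], pii_fields: set[str] = PII_FIELDS) -> list[dict]:
    """Return copies of the records with PII columns removed."""
    return [{k: v for k, v in r.items() if k not in pii_fields} for r in records]


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex-encoded).

    The same value and key always produce the same token, so records can
    still be linked across datasets, but someone without the key cannot
    practically recompute tokens to look up known identifiers.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate_with_suppression(records: list[dict], group_field: str,
                               min_cell_size: int = 10) -> dict:
    """Count records per group; counts below min_cell_size become None."""
    counts = Counter(r[group_field] for r in records)
    return {group: (n if n >= min_cell_size else None) for group, n in counts.items()}
```

A keyed hash is used rather than a plain hash because identifiers such as ID numbers often have few enough possible values that an unkeyed hash could be reversed by hashing every candidate. The key itself then becomes sensitive and should be stored with the same care as the original data.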