8 – Available Data

Why Should I Care?

Many – if not most – researchers are producing reports based on data that was collected by someone else. This is convenient because it saves time and money. But it is also tricky because there may be gaps or errors in the data that the end-use researcher is not aware of. If you

Usefulness

The point of this method is to use data that already exists. Using Available Data is especially easy if you can download a data set from a website. Government statistical agencies, such as Statistics Canada, are obvious go-to sources of Available Data. However, there are many more sources, such as other government websites, corporate filings, and even police reports.

This method is especially useful because you can use many data sets together. If the data sets have a common variable, you can merge them together into one database.

Much of the Available Data on the internet or in libraries was collected for a pre-determined purpose, but researchers can use it for other purposes and merge data sets together, analyzing the information in ways that the original collectors did not necessarily intend.

Available Data is often quantitative in nature, but it can also include many qualitative elements. For example, an annual report published by an organization may be a good source of available qualitative data. It might even include facts or opinions that individuals would not feel comfortable sharing in an interview. In the world of business, much of this qualitative data is available in the legal filings that publicly owned corporations must disclose. These include risk assessments and management strategy, which are wholly qualitative in nature.

Merging Three Data Sets Into One

For example, a telephone listing such as a business phone book or directory usually contains several pieces of information, such as company name, address, telephone number, and postal code.

Using a geographical information system (GIS) database, you can associate postal codes with geographic coordinates, such as latitude and longitude.

Using a third database with company information, including sales, profits, and number of employees, you now have a complete data set with company data that you can map using computers.

To merge the data sets, you will need capable software, such as R, Python, MS Access, or SPSS. MS Excel can work, but only if the data fits within its limit of 1,048,576 rows per worksheet.

In this example, start by merging the phone book with the GIS database, using Postal Code as the common variable.

Then you can merge the financial information with the result, using Company Name as the common variable.

You will have to explain this in the methodology section of your paper, citing all three data sets.
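The two-step merge described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three small lists stand in for the phone book, the GIS database, and the financial database, and every record, field name, and the `left_merge` helper is hypothetical. Real data sets would be loaded from files and are rarely this clean.

```python
# Hypothetical sample records standing in for the three data sets.
phone_book = [
    {"company": "Acme Ltd", "phone": "613-555-0101", "postal_code": "K1A 0B1"},
    {"company": "Borealis Inc", "phone": "514-555-0102", "postal_code": "H2X 1Y4"},
    {"company": "Cedar Co", "phone": "416-555-0103", "postal_code": "M5V 2T6"},
]
gis = [
    {"postal_code": "K1A 0B1", "lat": 45.42, "lon": -75.70},
    {"postal_code": "H2X 1Y4", "lat": 45.51, "lon": -73.57},
]
financials = [
    {"company": "Acme Ltd", "sales": 1200000, "employees": 15},
    {"company": "Cedar Co", "sales": 800000, "employees": 9},
]


def left_merge(left, right, key):
    """Attach the fields of the matching `right` row to each `left` row.

    Rows of `left` with no match are kept unchanged, so gaps stay visible.
    Assumes each key value appears at most once in `right`.
    """
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]


# Step 1: phone book + GIS, using postal code as the common variable.
with_coordinates = left_merge(phone_book, gis, "postal_code")

# Step 2: add the financial information, using company name as the common variable.
combined = left_merge(with_coordinates, financials, "company")

for row in combined:
    print(row)
```

Rows without a match simply lack the extra fields, which makes it easy to count how many records failed to merge. In practice, a common variable such as Company Name may be spelled differently from one source to another, so it may need cleaning before the merge.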

 

Objects of Measurement

Type of Object                      Yes   No    Maybe   Data Source

Personal Characteristic             X                   StatsCan Age Figures

Socio-Demographic Characteristic    X                   StatsCan Language Figures

Opinion                             X                   Political Polls in Media

Motivations                               X             Do an Interview

Ideology                                                Do an Interview

Biases / Prejudice                              X       Frequency of Hate Crimes, Police Records

Preferences                               X             See results of prior natural experiment

Personal History / Background             X             Do an Interview, Unobtrusive Measurement

Family Dynamics                           X             Do an Interview

Cultural History                                        Do an Interview

Perception / Self-Perception                            Do an Interview

Aptitude / Ability                        X             School math tests / World rankings

Behaviour                           X                   StatsCan Sales Figures / Census

Level of Knowledge                  X                   School tests

Sources of Available Data

Source                          Example

Corporation’s Annual Report     Annual Sales ($), Profits, Expenditures

Statistical Agencies            Census Program, Labour Force Survey

Lobbies and Associations        Association of Canadian Petroleum Companies

Public Registers                Indian Registry, Gun Registry, Dangerous Criminals

Public Institutions             Annual Reports by Police, Schools, Hospitals, etc.

Sampling

Since data collection is already done, the researcher must live with the sampling technique, and its flaws, that the previous researcher or organization chose.

Operationalization of variables can be a challenge. Some phenomena are hard to measure, but can be observed indirectly. One should be careful in the interpretation.

For example, if you wish to measure innovation, your only source of data may be a patent data set. This is an issue because many innovations are kept secret, and we cannot know how many, since the companies behind them want them to stay confidential.

Instruments

No instrument is needed. The data has already been collected, usually through a survey.

Scientific Power

Exploratory: can be. If you don’t know what you are looking for, you can draw graphs for fun to look for stories.

Descriptive: most likely. You will associate variables, using tables and graphs.

Explanatory: can be done, but tread lightly. Variables that don’t correlate can be eliminated, and those that do correlate are kept. However, be sure to establish temporal order first: a cause must precede its effect, so temporal order indicates which way a cause-effect relationship could run. Correlation alone does not prove causation.
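The screening step described above, keeping the variables that correlate and dropping those that do not, can be sketched as follows. This is a minimal illustration, not a recommended procedure: the variable names, the values, and the 0.5 cut-off are all hypothetical, and the `pearson` helper is written out by hand so that the example needs no extra libraries.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: one outcome and two candidate explanatory variables.
outcome = [10, 12, 15, 19, 24]
candidates = {
    "employees": [1, 2, 3, 4, 5],
    "distance_km": [7, 3, 9, 2, 6],
}

THRESHOLD = 0.5  # arbitrary cut-off, chosen only for illustration

correlations = {name: pearson(values, outcome) for name, values in candidates.items()}

# Keep only the variables whose correlation with the outcome is strong enough.
kept = [name for name, r in correlations.items() if abs(r) >= THRESHOLD]

for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
print("kept:", kept)
```

A variable that survives this screening is only a candidate cause. The sketch says nothing about temporal order, which has to be checked separately against when each variable was measured.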

Steps
  1. Find data source
  2. Download
  3. Analyze, Interpret and Report
Advantages
  1. Fast
  2. Lots of quantitative data
  3. Allows more time for analysis of data
Disadvantages
  1. Methodology faults may be hidden
  2. Operationalization can be difficult
  3. Gaps in data may exist
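Because gaps in the data may be hidden, it can help to count missing values right after downloading a data set and before analyzing it. The sketch below assumes the download is a CSV file. The sample text, the column names, and the list of markers treated as missing are all hypothetical, and `count_gaps` is an illustrative helper, not part of any library.

```python
import csv
import io

# Hypothetical stand-in for the contents of a downloaded CSV file.
raw = """region,year,population
North,2020,1500
South,2020,
East,2020,2300
West,2020,n/a
"""

# Markers treated as "missing"; real sources document their own conventions.
MISSING = {"", "n/a", "na", "null"}


def count_gaps(text):
    """Return the number of rows and, for each column, the number of missing values."""
    reader = csv.DictReader(io.StringIO(text))
    gaps = {name: 0 for name in reader.fieldnames}
    total = 0
    for row in reader:
        total += 1
        for name in gaps:
            value = (row[name] or "").strip().lower()
            if value in MISSING:
                gaps[name] += 1
    return total, gaps


total, gaps = count_gaps(raw)
print("rows:", total)
for name, missing in gaps.items():
    print(f"{name}: {missing} missing")
```

A count like this only reveals gaps that are visibly marked. Values that were estimated or imputed by the original collector will not show up, so the source's methodology notes still need to be read.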
Reporting

Tables & Graphs

Spatial Maps

Descriptive Text

Synthesis Tables

Preferred Disciplines

 Economics, Sociology, Geography, Political Science

Other Non-scientific Disciplines

Applications in journalism, NGO reports and memoirs, lobby group analysis papers, marketing, geographic advertising, legal due diligence, and crime investigation

Not so useful for

Historians, Psychologists, Anthropologists