Capstone data projects are designed to create a more transferable experience for students in a beginning statistics course. Following the Guidelines for Assessment and Instruction in Statistical Education (GAISE) recommendations, these data projects use real data and require the use of technology to fully explore statistical concepts. Expand each of the sections below to see a brief description of each dataset/project and preliminary discussion questions that can be used with students at the beginning of the project.
In 1979 and 1983, two of the earliest studies in the US were conducted to determine the relationship between children’s lung (pulmonary) function and the absence or presence of cigarette smoke, whether passively or actively inhaled.
In particular, researchers from these two studies measured the forced expiratory volume (FEV) of children aged 3-19. Forced expiratory volume measures how much air (in liters) a person can exhale during a forced breath. To perform pulmonary function tests such as FEV, the patient is asked to take the deepest breath they can and then exhale into the sensor as hard as possible, for as long as possible, preferably at least 6 seconds. Sometimes the test is preceded by a period of quiet breathing in and out through the sensor (tidal volume).
The maneuver is highly dependent on patient cooperation and effort, and is normally repeated at least three times to ensure quality of results. Due to the effort required, pulmonary function tests can only be used on children old enough to comprehend and follow the instructions given. Other types of lung function tests are available for infants and unconscious persons.
Average values for FEV in healthy people depend on varying factors, as we will examine during the capstone. Values of between 80% and 120% of the average value are considered a normal range.
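The 80% to 120% rule above is simple arithmetic, and a short sketch may help students see it. The function names and the FEV values below are hypothetical and chosen only for illustration; they do not come from the study data.

```python
def percent_of_average(measured_fev: float, average_fev: float) -> float:
    """Return a measured FEV (liters) as a percentage of the average value."""
    if average_fev <= 0:
        raise ValueError("average_fev must be positive")
    return 100.0 * measured_fev / average_fev


def in_normal_range(measured_fev: float, average_fev: float) -> bool:
    """True when the measurement lies between 80% and 120% of the average."""
    return 80.0 <= percent_of_average(measured_fev, average_fev) <= 120.0


# Hypothetical values in liters, for illustration only.
print(in_normal_range(2.7, 3.0))  # 2.7 L against an average of 3.0 L
print(in_normal_range(2.0, 3.0))  # 2.0 L against an average of 3.0 L
```

Which average applies to a given child depends on the factors examined during the capstone, so `average_fev` is left as an input rather than fixed in the code.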
In 2003, the New Haven Fire Department had seven openings for Captain and eight openings for Lieutenant. The department gave civil service examinations to fill the open positions. The exams consisted of two parts: a written exam worth 60% and an oral exam worth 40%. A total score greater than or equal to 70% was considered a passing grade.
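The scoring rule can be written out as a weighted average. The sketch below assumes both parts are reported as percentages from 0 to 100; the function names and the sample scores are hypothetical, not actual exam results.

```python
WRITTEN_WEIGHT = 60   # percent of the total score
ORAL_WEIGHT = 40      # percent of the total score
PASSING_SCORE = 70.0  # minimum total score (percent) to pass


def total_score(written_pct: float, oral_pct: float) -> float:
    """Combine the written and oral percentages using the 60/40 weighting."""
    return (WRITTEN_WEIGHT * written_pct + ORAL_WEIGHT * oral_pct) / 100


def passed(written_pct: float, oral_pct: float) -> bool:
    """True when the weighted total is at least the passing score."""
    return total_score(written_pct, oral_pct) >= PASSING_SCORE


# Hypothetical candidates, for illustration only.
print(total_score(80, 60), passed(80, 60))
print(total_score(60, 80), passed(60, 80))
```

The two hypothetical candidates have the same pair of scores in opposite order, which shows how the heavier weight on the written exam can change the outcome.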
After reviewing (and publishing) the test results, the city of New Haven decided the test was discriminatory against black candidates. Because no black firefighters were eligible for advancement, the city threw out the results.
Ricci and sixteen other white test takers, plus one Hispanic test taker, all of whom would have qualified for consideration for a promotion, sued the city of New Haven, including Mayor John DeStefano, Jr. Their suit claimed that, by discarding the test results, the city discriminated against the plaintiffs based on their race. The city officials defended their actions, arguing that if they had certified the results, they could have faced liability for adopting a practice that had an adverse impact on the minority firefighters.
Adverse impact is defined as a substantially different rate of selection in hiring, promotion or other employment decision which works to the disadvantage of members of a race, sex, or ethnic group. Title VII of the Civil Rights Act of 1964 prohibits employment discrimination on the basis of race, color, religion, sex, or national origin (these groups are referred to as protected classes).
Adverse impact is generally the first step in establishing evidence of discrimination under Title VII. The burden is on the plaintiff to show that an employment decision adversely impacted a protected class. The finding of adverse impact shifts the burden of proof to the defendant and would require the employing organization to defend the employment decision in question by providing evidence that the process used to make the decision was valid.
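One way to make "a substantially different rate of selection" concrete is to compare selection rates between two groups. The sketch below computes each group's rate and the ratio of the lower rate to the higher one. The counts are invented for illustration and are not the New Haven results. The 0.8 cutoff reflects the "four-fifths" rule of thumb from the federal Uniform Guidelines on Employee Selection Procedures; it is included here as an assumption, since the project description does not say which threshold to use.

```python
FOUR_FIFTHS = 0.8  # assumed rule-of-thumb threshold, not from the project text


def selection_rate(selected: int, candidates: int) -> float:
    """Fraction of candidates in a group who were selected."""
    if candidates <= 0:
        raise ValueError("candidates must be positive")
    return selected / candidates


def impact_ratio(rate_a: float, rate_b: float) -> float:
    """Ratio of the lower selection rate to the higher one (0 to 1)."""
    higher = max(rate_a, rate_b)
    if higher == 0:
        return 1.0  # nobody was selected in either group, so the rates match
    return min(rate_a, rate_b) / higher


# Hypothetical counts, for illustration only.
rate_group_a = selection_rate(30, 50)
rate_group_b = selection_rate(12, 40)
ratio = impact_ratio(rate_group_a, rate_group_b)
print(rate_group_a, rate_group_b, ratio, ratio < FOUR_FIFTHS)
```

A ratio below the threshold is only a screening signal. As the text notes, it shifts the burden to the employer to show the selection process was valid; it does not settle the question by itself.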
According to the Appalachian Regional Commission, the Appalachian Region is a 205,000-square-mile region that follows the spine of the Appalachian Mountains from southern New York to northern Mississippi. It includes all of West Virginia and parts of 12 other states: Alabama, Georgia, Kentucky, Maryland, Mississippi, New York, North Carolina, Ohio, Pennsylvania, South Carolina, Tennessee, and Virginia. Forty-two percent of the Region's population is rural, compared with 20 percent of the national population.
The Appalachian Region's economy, once highly dependent on mining, forestry, agriculture, chemical industries, and heavy industry, has become more diversified in recent times, and now includes manufacturing and professional service industries. Appalachia has come a long way in the past five decades: its poverty rate, 31 percent in 1960, was 16.6 percent over the 2008–2012 period. The number of high-poverty counties in the Appalachian Region (those with poverty rates more than 1.5 times the U.S. average) declined from 295 in 1960 to 107 over the 2008–2012 period.
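The high-poverty definition above (a county poverty rate more than 1.5 times the U.S. average) is easy to apply in code. In this sketch the county names, their rates, and the national average are all hypothetical placeholders; the real values would come from the project dataset.

```python
HIGH_POVERTY_MULTIPLE = 1.5  # threshold stated in the definition above


def is_high_poverty(county_rate_pct: float, us_average_pct: float) -> bool:
    """True when the county rate is more than 1.5 times the U.S. average."""
    return county_rate_pct > HIGH_POVERTY_MULTIPLE * us_average_pct


# Hypothetical poverty rates (percent), for illustration only.
us_average = 15.0
counties = {"County A": 24.0, "County B": 16.0, "County C": 31.5}

high_poverty = [
    name for name, rate in counties.items()
    if is_high_poverty(rate, us_average)
]
print(high_poverty)
```

Because the threshold is relative to the national average, a county can move in or out of the high-poverty group when the national rate changes, even if its own rate stays the same.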
These gains have transformed the Region from one of widespread poverty to one of economic contrasts: some communities have successfully diversified their economies, while others still require basic infrastructure such as roads and water and sewer systems. The contrasts are not surprising in light of the Region's size and diversity. The Region includes 420 counties in 13 states. It extends more than 1,000 miles, from southern New York to northeastern Mississippi, and is home to more than 25 million people.
The dataset we will be working with this semester is based on US Census data analyzed and collated by the Appalachian Regional Commission. The data is arranged by both state and county so that regional differences can be examined.
Case 1 - From an article by Maria L. La Ganga and Tina Susman in the Los Angeles Times, Nov. 16, 2014:
James Boyd, a 38-year-old mentally ill homeless man who suffered from delusions, was camping illegally in the Sandia Foothills on March 16 of this year when Albuquerque police officers tried to arrest him. During a standoff, Boyd waved two knives, and 41 officers from various agencies surrounded him. “Finally, when Mr. Boyd appeared to be surrendering, officers threw a flash bang at him, released a dog to take him down, and shot him with a taser rifle,” according to a wrongful death suit filed against the city. “As Mr. Boyd turned away from the officers, two officers shot three rounds each, hitting him three times, twice in his side and back and once on his arm.” Boyd’s last words on the recording were, “Please don’t hurt me,” and “I can’t move.” Boyd was taken by ambulance to the University of New Mexico Hospital. His right arm was amputated and his spleen and intestine were removed. He died at 2:55 a.m. on March 17.
Case 2 - From an article by Richard Fausset in the Los Angeles Times, June 2, 2002:
A 4-year-old girl was killed Saturday morning when an auto-theft suspect being pursued by Los Angeles police ran a red light on a busy downtown street, causing a chain-reaction accident that knocked over a traffic light, crushing the girl, authorities said.
Perhaps surprisingly, both of these cases would be considered “police-involved” deaths. “Police-involved deaths” is a much broader category than the often-assumed “police shootings”: it includes any incident in which police were called and one or more deaths occurred. Understandably, concern about police-involved deaths has become a very emotional and divisive controversy in this country. Allegations of racial bias and targeting abound, as does the charge that police often default to lethal responses when less force would have been sufficient. In response, policing agencies remind us that while a lower-level response may seem more reasonable in hindsight, the choice of response “in the moment” is often informed by intangibles such as perceived threat level, adrenaline surge, and many other factors that are hard to measure after the fact.
In an attempt to examine the magnitude of the problem nationwide, many people have begun to search for a database of police-involved deaths of all types. It may be surprising to learn that no single, complete source of this information is being maintained.
Recently, a nonprofit group called Fatal Encounters has taken on the challenge of collecting this data from all over the US and placing it into a single searchable database. The organization chose to consider only incidents occurring on or after January 1, 2000. It is entirely self-funded and run by volunteers. The only paid employees are data-entry assistants. In this Capstone, we will be examining this database and trying to extract meaning from it.
In the current political climate it is difficult for Americans to dispassionately consider the impact of guns in our society. The issue of easy access to guns and what collateral problems this fact may, or may not, create has become a focus for many groups. Discourse about this subject has become rather divisive as the level of gun violence has risen. The rhetoric of the debate implies that conclusions about the effects of high gun ownership can be easily determined through the application of simple logic. If so, why are there so many diverse conclusions about guns in our society?
In this capstone project we will attempt to analyze data regarding guns, gun ownership and crimes. The data has been collected from 26 “developed” countries which have highly developed economies and advanced technological infrastructure. This group of countries has many traits in common and none are engaged in an active war on their own territory. When considering the domestic ownership and use of guns, nations engaged in organized, armed conflicts on their own soil must be eliminated from the comparison.
The data was collected from NationMaster and from the World Health Organization (WHO). NationMaster is a global team of statistical analysts who mine data from many sources, such as the CIA World Factbook and the United Nations data collections. In general, the data has been adjusted to show “per capita” rates, which makes comparisons between countries of different sizes more straightforward.
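To see why per capita rates matter, compare the same raw count in two countries of different sizes. The sketch below scales counts to a rate per 100,000 people, which is one common way to express a per capita figure; the scaling factor, function name, and numbers are assumptions for illustration and may not match how the project dataset is scaled.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Convert a raw count into a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Hypothetical countries with the same raw count, for illustration only.
small_country = rate_per_100k(250, 5_000_000)
large_country = rate_per_100k(250, 50_000_000)
print(small_country, large_country)
```

The raw counts are identical, but the rate in the smaller country is ten times higher, which is the comparison a per capita adjustment is meant to expose.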
The dataset for Global Health has been compiled from the massive dataset of the World Health Organization. The World Health Organization (WHO) is a specialized agency of the United Nations that is concerned with international public health. It was established on 7 April 1948, and is headquartered in Geneva, Switzerland.
Since its creation, WHO has played a leading role in the eradication of smallpox. Its current priorities include communicable diseases, in particular HIV/AIDS, Ebola, malaria and tuberculosis; the mitigation of the effects of non-communicable diseases; sexual and reproductive health, development, and aging; nutrition, food security and healthy eating; occupational health; substance abuse; and driving the development of reporting, publications, and networking.
The WHO is responsible for the World Health Report, the worldwide World Health Survey, and World Health Day. The Director-General of WHO is Tedros Adhanom who started his five-year term on 1 July 2017.
The dataset includes a wide range of countries but does not attempt to include data from all countries. The selected variables range from economic indicators (such as average per capita income) to measures of health care delivery, such as whether or not the country has universal health care.
The Global Terrorism Database (GTD) is maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism at the University of Maryland. Using many sources from governments and organizations worldwide, the GTD collects and vets incidents of terrorism for the database. To determine whether a violent incident should be included in the database, the GTD applies a definition of terrorism. The GTD defines a terrorist attack as the threatened or actual use of illegal force and violence by a non-state actor to attain a political, economic, religious, or social goal through fear, coercion, or intimidation. In practice, to be included as a terrorist event in the database, the incident must have all three of the following characteristics:
In addition, the event must have at least two of the following three additional criteria:
The dataset we will examine is extracted from the full GTD and includes only data from 2014. The variables that we will consider include global region as defined by the GTD, country where the incident occurred, date, type of attack (weapon), terrorist group responsible (if known), number of deaths, and number of wounded.
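The GTD inclusion rule described earlier (all three required characteristics, plus at least two of the three additional criteria) has a simple logical structure, sketched below. The function name is hypothetical, and the inputs are plain true/false flags standing in for the individual criteria rather than the criteria themselves.

```python
from typing import Sequence


def meets_inclusion_rule(required: Sequence[bool],
                         additional: Sequence[bool]) -> bool:
    """All three required flags must hold, plus at least two additional ones."""
    if len(required) != 3 or len(additional) != 3:
        raise ValueError("expected exactly three flags in each group")
    additional_met = sum(1 for flag in additional if flag)
    return all(required) and additional_met >= 2


# Placeholder flags, for illustration only.
print(meets_inclusion_rule([True, True, True], [True, True, False]))
print(meets_inclusion_rule([True, True, True], [True, False, False]))
print(meets_inclusion_rule([True, False, True], [True, True, True]))
```

The third call shows that meeting every additional criterion does not compensate for a missing required characteristic.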
For more information about these projects, including access to datasets and project tasks, contact SADobbyn 'at' pstcc.edu or BLMosby 'at' pstcc.edu.