MATH 189 CSU Stanislaus Exploratory Data Analysis and Inference Discussion Ques

Description

The HW questions are at the end of the casestudy4 file, and the data you need are provided. Use exactly the method given to solve the questions and meet all the format requirements given. A sample homework report is provided; check it for the HW format (the report needs an introduction, data, analysis, conclusion, and advanced analysis).


Introduction
The data
Background
Investigations
Math 189: Chapter 4 Data Analysis and Inference
Armin Schwartzman Professor Division of Biostatistics and Halıcıoǧlu Data Science Institute University of California, San Diego
Snow gauge
* The main source of water for Northern California is the
Sierra Nevada mountains.
* To help monitor the water supply, the Forest Service of the
United States Department of Agriculture (USDA) operates a
gamma transmission snow gauge in the Central Sierra Nevada
near Soda Springs, CA. The gauge is used to determine a
depth profile of snow density.
* Analysis of the snow pack profile helps with monitoring the
water supply and flood management. It is also a source of
data for the study of climate change.
Snow gauge (cont.)
* The gauge does not directly measure snow density. The
density reading is converted from a measurement of gamma
ray emissions.
* Due to instrument wear and radioactive source decay, there
may be changes over the seasons in the functions used to
convert the measured values into density readings.
* To adjust the conversion method, a calibration run is made
each year at the beginning of the winter season.
* In this case study we will develop a procedure to calibrate the
snow gauge from data.
Description
* The data are from a single calibration run of the snow gauge.
* The run consists of placing polyethylene blocks of known
densities between the two poles of the snow gauge and taking
readings on the blocks. The polyethylene blocks are used to
simulate snow.
* The measurements reported are amplified versions of the
gamma photon count made by the detector. We call the
gauge measurement the "gain".
* The data available here consist of 10 measurements for each
of 9 densities in grams per cubic centimeter of polyethylene.
The Data
The calibration process
To be used in practice, the snow gauge needs to map the measured
gamma ray intensity to snow density. However, the experiment is
done in reverse. The calibration process goes as follows.
1. The experiment measures gamma ray intensity as a function
of the density of the polyethylene blocks.
2. From the data, a function is determined that maps density to
gamma ray intensity.
3. The inverse of the above function is used to map gamma ray
intensity to density.
A Physical Model
The gamma rays that are emitted from the radioactive source
may be scattered or absorbed by the polyethylene molecules
between the source and the detector. With denser
polyethylene, fewer gamma rays will reach the detector.
A simplified version of the model that may be workable for the
calibration problem of interest is described here. A gamma ray
en route to the detector passes a number of polyethylene
molecules. The number of molecules depends on the density of
the polyethylene. A molecule may absorb the gamma photon,
bounce it out of the path to the detector, or allow it to pass.
If each molecule acts independently, the chance that a gamma
ray successfully arrives at the detector is $p^m$, where $p$ is the
chance that a single molecule will neither absorb nor bounce
the gamma ray, and $m$ is the number of molecules in a
straight line path from the source to the detector.
Let $d = Cm$ be the density, proportional to the number of
molecules $m$ by some unknown constant $C$.
Let $g = A p^m$ be the instrument gain, proportional to the
probability of detection $p^m$ by some unknown constant $A$.
The gamma ray measurement can be expressed as
$$g = A p^m = A e^{(\log p) m} = A e^{((\log p)/C) \cdot (Cm)} = A e^{\beta d},$$
where $\beta = (\log p)/C$, and $A > 0$ and $\beta < 0$ are unknown coefficients. In other words, the gamma ray measurement decays exponentially with the density. The purpose of the calibration is to estimate the unknown coefficients $A$ and $\beta$.

Linearization
The above model can be made linear in the density $d$ by taking a log transformation:
$$\log g = \log A + \beta d$$
If we observe the gain $g$, then $Y = \log g$ can be modeled as a linear function of the density $X = d$ as:
$$Y = \beta_0 + \beta X + \text{error}$$
Once $\beta_0$ and $\beta$ have been estimated, the model can be inverted to estimate a new density $d$ as a function of a new observed gain $g$.

Investigations
The aim of this HW is to provide a procedure for converting gain into density when the gauge is in operation. Keep in mind that the calibration experiment was conducted by varying density and measuring the response in gain, but when the gauge is ultimately in use, the density is to be estimated from the measured gain.

1. Raw data: Fit a regression line to the data and plot the fit. Examine the residual plot and explain why a transformation may be necessary.

2. Transformed data: Determine an appropriate transformation and fit the model to the transformed data. Plot the new fit and examine the residuals. Justify your final model using both theoretical and empirical arguments.

3. Robustness: Suppose the densities of the polyethylene blocks are not reported exactly. How might this affect the fit? Use a simulation to answer this question.

4. Forward prediction: Produce point estimates and uncertainty bands for predicting the gain (in the original scale) as a function of the measured density. Can some gains be predicted more accurately than others? Consider specific prediction intervals for densities of 0.508 and 0.001 and compare these intervals to the range of measured gains for those densities.

5. Reverse prediction: The average measured gains for the density values of 0.508 and 0.001 are 38.6 and 426.7, respectively. Invert the forward prediction line and uncertainty bands to produce point estimates and prediction intervals for the density that correspond to the gain measurements 38.6 and 426.7. How do the reverse predictions compare to the true density values? Are some densities harder to predict than other densities?

6. Cross-Validation: The reverse prediction may be influenced by the fact that the measurements corresponding to the densities 0.508 and 0.001 were included in the fitting. To avoid this, omit the set of measurements corresponding to the block of density 0.508, apply your estimation/calibration procedure to the remaining data, and provide an interval estimate for the density of a block with an average reading of 38.6. Where does the actual density fall in the interval? Try the same test for the set of measurements at the 0.001 density.
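The linearized model and its inversion can be illustrated with a short Python sketch. This is only one possible way to set up the calculation, not the course's reference solution. The `GAUGE` dictionary is transcribed from the flattened data listing at the end of this document (the grouping of readings by density is a reconstruction of that listing), and the names `fit_line` and `density_from_gain` are illustrative helpers introduced here. Only the standard library is used.

```python
import math

# Calibration readings transcribed from the data listing in this document:
# density (g/cm^3) -> ten gain readings. The grouping is a reconstruction
# of the flattened listing and should be checked against the data file.
GAUGE = {
    0.686: [17.6, 17.3, 16.9, 16.2, 18.5, 18.7, 17.4, 18.6, 16.8, 17.1],
    0.604: [25.9, 26.3, 24.8, 24.8, 27.6, 30.5, 28.4, 27.7, 24.8, 28.5],
    0.508: [39.4, 37.6, 37.7, 36.3, 38.7, 39.4, 38.8, 40.3, 38.1, 39.2],
    0.412: [60.0, 58.3, 59.6, 59.1, 55.0, 52.9, 54.1, 56.9, 56.0, 56.3],
    0.318: [92.7, 90.5, 85.8, 87.5, 88.3, 88.2, 88.6, 84.7, 87.0, 91.6],
    0.223: [128.0, 130.0, 129.0, 127.0, 129.0, 132.0, 133.0, 133.0, 131.0, 134.0],
    0.148: [199.0, 204.0, 199.0, 207.0, 200.0, 205.0, 202.0, 199.0, 199.0, 200.0],
    0.080: [298.0, 297.0, 288.0, 296.0, 293.0, 299.0, 298.0, 293.0, 298.0, 301.0],
    0.001: [423.0, 421.0, 428.0, 436.0, 427.0, 426.0, 428.0, 429.0, 422.0, 427.0],
}


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# One (density, log gain) pair per reading: X = d, Y = log g.
densities = [d for d, gains in GAUGE.items() for _ in gains]
log_gains = [math.log(g) for gains in GAUGE.values() for g in gains]

beta0, beta = fit_line(densities, log_gains)


def density_from_gain(gain):
    """Invert Y = beta0 + beta * X to estimate density from an observed gain."""
    return (math.log(gain) - beta0) / beta


print(f"intercept = {beta0:.4f}, slope = {beta:.4f}")
for gain, true_density in [(38.6, 0.508), (426.7, 0.001)]:
    print(f"gain {gain}: estimated density {density_from_gain(gain):.4f} "
          f"(block density {true_density})")
```

The sketch gives point estimates only. The uncertainty bands and prediction intervals asked for in questions 4 to 6 would additionally need the residual standard error and a t quantile.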
MATH 189: Exploratory Data Analysis and Inference
Spring 2021
HW Submission Format

Objective
One of the most important goals of this course is to learn how to write a data analysis report. The HW is to be submitted in a format similar to a data analysis report. The difference is that the HW will be more structured so that it can be more easily composed and graded.

Structure
The overall structure of the HW report should be as follows:
0. Header
1. Introduction
2. Analysis
3. Conclusion(s)/Discussion
4. Appendix/Appendices

Now let's consider the basic outline of the data analysis report in more detail:

0. Header. This includes important general information:
• Title: Choose a succinct but specific title that reflects the goals of the analysis.
• Author contributions: Include a brief description of the respective contribution of each of the team members.

1. Introduction. Good features for the Introduction include:
• Brief summary of the study and data, as well as any relevant substantive context, background, or framing issues.
• The "big questions" answered by your data analyses, and summaries of your conclusions about these questions. These questions should include: 1) the questions posed by the HW prompts; 2) other questions that you may propose.
• Brief outline of the remainder of the report.

2. Basic analysis. In this format, the analysis is organized by research questions. Devote a subsection to each question raised in the Introduction. These questions should be organized according to the HW prompts. Within each subsection, the statistical method, analyses, and conclusion would be described (for each question). For example:
2.1 Data processing and summaries
    Methods
    Analysis
    Conclusions
2.2 Comparison between males and females
    Methods
    Analysis
    Conclusions
2.3 Effect of Age
    Methods
    Analysis
    Conclusions
Etc.

3. Advanced analysis. This section contains analysis that goes beyond the HW prompts. It will display your own interest and creativity. It may include:
• An additional analysis question, e.g. estimating another parameter, considering the effect of another variable in the data, evaluating the validity of statistical assumptions not considered, etc.
• Using a more advanced analysis method to answer one of the HW questions or a new question.

4. Conclusion(s)/Discussion. This section closes the report:
• Conclusion summary: It should reprise the questions and goals of the analysis stated in the introduction. It should also summarize the findings and compare them to the original goals.
• Discussion: If relevant, include additional observations or details gleaned from the analysis section. If relevant, discuss relevance to the science and other studies. Discuss data limitations. New questions, future work, etc., can also be raised here.

5. Appendix/Appendices. This section is not mandatory but it may be necessary depending on what you do. This is the place to put details and ancillary materials, that is, materials that you want to include but that would disturb the reading flow if they were put in the main text. These might include such items as:
• Technical descriptions of (unusual) statistical procedures
• Detailed tables or computer output
• Figures and tables that were not central to the arguments presented in the body of the report

6. Computer code. In a general data analysis report, computer code may be included in the Appendix. In our course, code should be submitted as a separate file. Make sure to document your code by including appropriate section headers, text sentences, comments and annotations, to make it easier for the reader to follow what you are doing.

Formatting and length
A good data analysis report should present all the necessary information in a concise fashion. To exercise this and facilitate grading, please abide by the following constraints:
• Use 12-point font for the main text, with full space between lines.
• Start every section on a new page. This will make it easier for you to mark which pages correspond to each graded item in Gradescope.
• Length guidelines:
  o Header + Introduction: 1 page
  o Each question: 1 to 2 pages each, including tables and figures
  o Advanced analysis: 1 to 2 pages, including tables and figures
  o Summary/conclusions/discussion: 1 page
• The total length of the report should not exceed 10 pages (not including Appendix or code). Any additional material, if it is really necessary, should go in the Appendix.

Presentation style
Points will be given for good presentation style and abiding by the formatting constraints. As a guideline, a good data analysis report has several important features:
• It is organized in a way that makes it easy for different audiences to skim/fish through it to find the topics and the level of detail that are of interest to them.
• The writing is as invisible/unremarkable as possible, so that the content of the analysis is what the reader remembers, not distracting quirks or tics in the writing. Examples of distractions include:
  – Extra sentences, overly formal or flowery prose, or at the other extreme overly casual or overly brief prose.
  – Grammatical and spelling errors.
  – Placing the data analysis in too broad or too narrow a context for the questions of interest to your primary audience.
  – Focusing on process rather than reporting procedures and outcomes.
  – Getting bogged down in technical details, rather than presenting what is necessary to properly understand your conclusions on substantive questions of interest to the primary audience.
• Tables are well organized, with well labeled columns and rows. Do not make a table so large that the reader gets lost; it should be easy to follow.
• Figures are well composed, with well labeled axes and large enough fonts. If relevant, use colors and line types to distinguish between different results and include a legend. Do not make a figure so busy that it is hard to understand.
Smoking in Mothers Result in Decreasing Birth Weights
Benjamin Pham and Xinran Wang

Disclaimer
The report provided here IS NOT the definitive answer key for Homework 1. This is meant to serve as an example of what we think might be an adequate submission. There are multiple possible answers for these parts that can also get full marks.

(This is the header mentioned in the HW guidelines; it includes the Title, Authors, and Contribution Statement. This DOES NOT count towards the report's 10-page limit.)

0. Contribution Statement
Both Benjamin Pham and Xinran Wang wrote R code according to their written parts in this work. Both students discussed and implemented the data processing section. Benjamin Pham wrote the Numerical Analysis section and the Graphical Analysis section. Xinran wrote the Incidence section and Conclusion. Both students contributed equally to the Introduction, Advanced Analysis section, and Conclusion. In addition, both students reviewed and added changes to the whole report.

1. Introduction
(Background abbreviated, should include literature reviews + citations in motivating the analysis, 1 page)

Smoking has remained a highly addictive and destructive habit among adults in the past 50 years despite multitudes of public health advances. Addiction to nicotine cigarettes causes 80% of people who try to quit to fail and return to their habit despite knowing the dangers of doing so (1). It is no surprise that soon-to-be pregnant mothers, who may have educated themselves on the hazards of smoking while pregnant, start or continue to smoke. Smoking during pregnancy is known to cause adverse effects on fetal development. Small birth weights and early gestational periods due to smoking during pregnancy usually result in a lower survival rate for babies from various problems such as restricted oxygen and nutritional transfer during fetal development (2).
The main goal of this analysis is to investigate the differences in the distributions of babies' birth weights for smoking mothers versus non-smoking mothers. In this analysis, we use numerical summaries and graphical methods to describe the distribution of babies' birth weight, and we examine the robustness of our estimate of the low-birth-weight rate. Numerical summaries include the minimum, maximum, mean, median, standard deviation, kurtosis, skewness, and quantiles of the birth weights for babies born to women who smoked and did not smoke during their pregnancy. Graphical methods, including histograms and Q-Q plots, compare the distributions of the two groups. Incidence experiments, run with different classification standards for low-birth-weight babies, assess the robustness of our estimates. We then use the Chi-Squared Test of Independence, a hypothesis testing method, to determine if smoking status is associated with low birth weight. Combining all the evidence above, we determine whether the differences observed between groups are important.

Data
The data from babies.txt is part of the Child Health and Development Studies database, which details pregnancies occurring between 1960 and 1967 of women enrolled in the Kaiser Foundation Health plan in the Oakland area. The data include women of different races. The dataset consists of 1236 male babies who lived at least 28 days and were all single births (no twins). The two variables of interest are the baby's birth weight, which is a numerical, discrete variable measured in ounces, and smoking status, which is a categorical variable represented by an integer indicator: 1 if the mother smoked during her pregnancy and 0 if the mother did not smoke during her pregnancy.

2. Basic Analysis
(For each question, provide a methods, analysis, and conclusion section as shown in the guidelines. The method section describes what was conducted to yield the results.
The analysis section shows the results. The conclusion section shows the interpretation of the results. You are NOT limited to talking purely about a specific subsection in each conclusion. You can call back to stated results from prior sections. Each section should be 1-2 pages as needed.)

2.1. Data Processing

Methods
The data was loaded with R. Our basic analysis mainly focused on birth weight (bwt) and mother smoking status (smoke). The data was cleaned by removing observations with missing values in these columns from our analysis.

Analysis
The data originally had 1,236 observations. Removing the observations with missing smoke values reduced the dataset to 1,226 observations. The observations in the dataset are distributed unevenly: there are 484 Smoker mothers and 742 Non-Smoker mothers, consistent with the group sizes in Table 1.

Conclusion
Removing observations from the dataset results in a loss of data. However, there is still a large number of observations in the dataset, so the analysis should not be majorly affected by the loss of 10 observations. It is possible to trim even more data from the dataset if missing values in other columns are considered. However, this could result in a loss of potentially important data points.

2.2. Numerical Analysis for Birth Weight Distribution of Smoker vs. Non-Smoker Babies

Methods
A five-number summary of the birth weights for babies of both Smoker and Non-Smoker mothers was generated to initially examine the data. A five-number summary consists of the minimum, first quartile, median, third quartile, and maximum of the data. The skewness and kurtosis of birth weights with different smoke statuses were individually calculated. These calculations were then compared to determine the similarity of these distributions to both each other and the Normal Distribution. For reference, a normal distribution has a skewness coefficient of approximately 0 and a kurtosis coefficient of approximately 3.
To validate this, the kurtosis and skewness of the birth weights in each smoke category were compared to their respective expected normal distribution. Skewness is defined as:
$$\text{Skewness} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{s} \right)^3$$
Kurtosis is defined as:
$$\text{Kurtosis} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{s} \right)^4$$
where $n$ is the number of observations, $X_i$ is the $i$-th observation, $\bar{X}$ is the mean, and $s$ is the standard deviation.

Analysis

Table 1: Summary Statistics of Smoker and Non-Smoker Mothers. The five-number summary, mean, and number of observations in each smoker status group.

                         Smoker BWT   Non-Smoker BWT
Min                      58           55
1st QRT                  102          113
Median                   115          123
Mean                     114.1095     123.0472
3rd QRT                  126          134
Max                      163          176
Number of Observations   484          742

The Smoker and Non-Smoker birthweight distributions have different five-number summaries. It is notable that the mean birthweight of babies from Smoker mothers is smaller than that of Non-Smoker mothers.

Table 2: Kurtosis and Skewness of Smoker and Non-Smoker Mothers Compared to Expected Normal Distribution. The kurtosis and skewness of birthweight measurements of each smoke category are compared to those of the expected normal distribution of their respective sample size.

                                         Kurtosis   Skewness
Smokers BWT                              2.975698   -0.0334909
Non-Smokers BWT                          4.026186   -0.1866062
Expected Normal Distribution Smoke       2.965161   0.0962498
Expected Normal Distribution Non-Smoke   2.958478   0.1042179

From these calculations, the Smoker birthweight distribution has approximately the same kurtosis and skewness as a Normal distribution. The Non-Smoker birthweight distribution seems to deviate somewhat from the Smoker birthweight distribution and the Normal Distribution, with a kurtosis of 4.03. All distributions are approximately symmetric since their skewness values are close to 0.

Conclusion
From the kurtosis and skewness calculations, it can be seen that the Smoker baby birthweights are indeed different from the Non-Smoker birthweights.
The Smoker birthweight distribution seems to be more normal than the Non-Smoker birthweights since the Smoker birthweights have a very similar kurtosis to a random normal distribution. The Non-Smoker birthweight distribution has a more pronounced peak than the Normal Distribution and the Smoker birthweights since its kurtosis is larger. Even though the Non-Smoker birthweight distribution appears to be different from the Normal Distribution, it is considered weakly normally distributed due to the Law of Large Numbers (LLN) and Central Limit Theorem (CLT) since there is a sufficiently large number of observations (742, as shown in Section 2.1). This is not enough to confirm that the Smoking and Non-Smoking distributions are normally distributed. From the initial look at the five-number summaries of Smoker and Non-Smoker birth weights, there are some slight differences in the summary statistics, which can be indicative of different distributions.

2.3. Graphical Analysis for Birth Weight Distribution of Smoker vs. Non-Smoker Babies
To confirm that the Smoking and Non-Smoking distributions are normally distributed, graphical methods must be used to visualize each respective distribution.

Methods
Histograms of both Smoking and Non-Smoking birthweights were created because the birthweight is a continuous numeric variable. To compare to a normal distribution, an expected normal curve with the mean and standard deviation of the Smoking and Non-Smoking birthweights respectively was drawn in red over the respective histograms. Q-Q plots are then used to confirm whether the Smoking and Non-Smoking birthweights do indeed come from a Normal Distribution with their mean and sd parameters.
Analysis

[Figure 1 panels: Histogram of smoke$bwt; Histogram of non_smoke$bwt (y-axis: Density); Normal Q-Q Plot for each group (x-axis: Theoretical Quantiles, y-axis: Sample Quantiles).]

Figure 1: Histogram of Birthweights by Smoking Status (top) and Q-Q plot of Birthweights by Smoking Status (bottom)

The data in both Smoking and Non-Smoking birthweights seem to follow the general shape of their respective expected random Normal density curve, which is depicted in red. The dashed blue line represents the mean of each respective birthweight distribution. In each Q-Q plot, the red line represents the theoretical normal distribution. Because the data points are aligned on the red line in both Q-Q plots, the Smoking and Non-Smoking birthweights are very close to their expected theoretical normal distribution, with some minor deviations at the tails of the distribution. These plots highlight potential outliers, which are explored further with boxplots (see Figure 3 in the Appendix).

Conclusion
From the histograms and the Q-Q plots, we conclude that the Smoking and Non-Smoking birthweights are normally distributed. However, they do not share the same normal distribution since the Non-Smoker birthweight distribution is skinnier than the Smoker birthweight distribution. The mean of the Smoker birthweights is smaller than the mean of the Non-Smoker birthweights.

2.4. Incidence of Low Birth Weight Babies

Methods
We propose to use the number of babies classified as low-birth-weight to estimate the incidence. The incidence rate of low birth weight is defined as:
$$\frac{n_{\text{low-birth-weight}}}{n_{\text{total}}}$$
In order to understand how the incidence of low birth weight changes when the threshold of low birth weight classification is changed, a list of thresholds for the low birth weight standard is generated to examine the robustness of our estimates.
The pattern of the change in proportion estimates as the threshold changes will be examined in this section through a scatterplot of low birth weight proportions against possible classification thresholds. The standard deviation of the proportions is also calculated to assess estimate reliability. We will suggest that our estimate is reliable if this value does not vary much when the classification standard is changed slightly.

Analysis
Using the provided standard (birth weight less than 88.2 oz), there were 40 out of 484 (8.26%) low-birth-weight babies in the smoking mother group, and 23 out of 742 (3.10%) in the non-smoking mother group. Numerically, it is observed that the incidence rate for low-birth-weight babies is lower in the non-smoking mother group compared to the smoking mother group. In the scatterplot below, each point represents the low-birth-weight rate of each group using a sequence of potential classification thresholds. Since more babies will be classified as low-weight babies when the threshold is moved up, we expected a monotonically increasing trend, as shown in the scatterplot below. We visually observed that in the neighborhood of the threshold standard that we use (88.2 ounces), no substantial jumps in the incidence estimate are triggered by slight movements in the threshold.

[Figure 2 panels: "Low Birth Weight Rate vs Standard" (x-axis: Threshold for Low Birth Weight (ounces), y-axis: Proportion of Babies Lower than Threshold) and "SD of Incidence Rate vs Window Size" (x-axis: Window Size, y-axis: Standard Deviation of Incidence Rate). Legend: Smoking, Non-Smoking.]

Figure 2: Proportion of Babies Classified as Low Birth Weight vs Potential Low Birth Weight Baby Thresholds (left) and Standard Deviation of Incidence Rate vs Window Size (right).

The plot on the right illustrates the changes of standard deviation in incidence rate when changing the examining window around the 88.2 ounces standard.
We observed that the estimate is more robust for the non-smoking group, as the rise in standard deviation remains slow as the window enlarges. Compared to the non-smoking group, the rise in standard deviation is slightly steeper in the smoker group. However, this standard deviation does not look substantial even when the window size is as large as 20.

Conclusion
Based on the analysis above, we find that the incidence rate for low-birth-weight babies is higher in the smoking mother group than in the non-smoking one in our sample (8.26% vs 3.10%). From our experiments, we conclude that the estimate for the low-birth-weight babies is reliable and robust. The estimate does not vary much when a few more or fewer babies are classified as low birth weight.

3. Advanced Analysis
(Use methods to answer an additional question not asked. You can also use additional methods not covered in class. 1-2 pages.)

We have observed from the previous analysis that there are some groupwise differences in the distributions. We have also suggested that the incidence rate is a reliable estimate. We would then want to assess whether there are any statistically significant differences in the incidence rate between the smoking mother group and the non-smoking mother group, to analyze whether a mother's smoking status is associated with babies weighing under 88.2 ounces.

Methods
In assessing the importance of differences between the incidence rates, and further whether smoking status is associated with the low-birth-weight classification, we propose to use a Yates-Corrected Chi-Squared Test of Independence. The null hypothesis in this test is that there is no association between mother's smoking status and baby's birth weight. The alternative hypothesis is that there exists an association between smoking status and baby birthweight. Under a significance level of 0.05, we plan to reject the null hypothesis if our test statistic is above a critical value of 3.84.
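The test described in the Methods above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' R code: the function name `yates_chi_squared` is introduced here, the counts (40 of 484 smokers and 23 of 742 non-smokers below 88.2 ounces) are the ones reported in Section 2.4, and the p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), which holds only for one degree of freedom.

```python
import math


def yates_chi_squared(a, b, c, d):
    """Yates-corrected chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    statistic = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts from Section 2.4 of the sample report.
low_smoke, n_smoke = 40, 484
low_non, n_non = 23, 742

stat, p_value = yates_chi_squared(
    low_smoke, n_smoke - low_smoke,  # smokers: low weight, not low weight
    low_non, n_non - low_non,        # non-smokers: low weight, not low weight
)
print(f"chi-squared = {stat:.2f}, p = {p_value:.5f}")
print("reject H0 at 0.05" if stat > 3.84 else "fail to reject H0 at 0.05")
```

The printed statistic and p-value can be compared with the values quoted in the Analysis that follows.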
Analysis
From the result of the Chi-Squared Test of Independence, we observed a p-value less than our level of significance 0.05 (p = 0.00011). The test statistic 14.99 is also larger than our critical value of 3.84 (see Figure 4 for a visualization with the probability density function (pdf)). Therefore, we decided to reject the null hypothesis and conclude that there is a statistically significant association between mother's smoking status and the incidence of low-birth-weight babies, under the significance level of 0.05.

Conclusion
Using a chi-squared test of independence (α = 0.05), we conclude that a mother's smoking status is associated with the incidence of low-weight babies, defined by the 88.2-ounce threshold.

4. Discussion and Conclusion
(Summarize your main findings here. Compare and contrast the results from your separate analyses. Compare your overall findings to findings found in other studies - Does it match what others have found? If not, why? Are there limitations in the data? 1 page.)

The numerical analysis only confirms that the Smoker birthweights are normally distributed, although the Non-Smoker birthweights are weakly shown to be normally distributed as well due to the CLT and LLN. The graphical analysis confirms that both the Smoker and Non-Smoker birthweights are normally distributed. The incidence analysis shows that there is a higher proportion of low-birth-weight babies in the smoking group. From the experiments conducted, we also concluded that our estimate of the incidence is reliable and robust. Lastly, the chi-squared test of independence suggests that there is an association between mother's smoking status and the incidence of low-birth-weight babies when using the original threshold of 88.2 ounces.

Several confounders should be considered in the investigation process. This study must account for confounders since the data used in this research was a result of a retrospective observational study.
Since the data were not produced by a controlled experiment, we can only infer association and cannot establish a causal relationship. Another limitation is that the study sample is potentially not representative of all mothers: all of the babies were male single births who survived for at least 28 days. There may also be additional confounders that influence the conclusion. For instance, socio-economic factors could affect the mother’s health and, by extension, the baby’s health. Gestational age could also contribute to lower birth weight, since birth weight increases with gestational age(3). A future direction for this analysis is to investigate the effect of smoking on gestational age, to determine whether a shorter gestational age acts as a mediator in the relationship between smoking status and low birth weight.

Although we cannot establish a causal relationship between smoking during pregnancy and low birth weight, this report found a strong association between these two variables. (More writing here should tie back to the scientific question and assess whether this difference is important to the health of the baby; it is abbreviated here.) Furthermore, extensive studies emphasize birth weight as an indicator of the baby’s health, since low-birth-weight babies are more likely to develop complications such as cognitive deficits, motor delays, cerebral palsy, and psychological problems. In fact, low-birth-weight babies are 20 times more likely to develop fatal complications and die than normal-birth-weight babies(4). Therefore, although a causal relationship cannot be established, smoking during pregnancy should not be overlooked.

(The Work Cited and Appendix do not count towards the 10-page limit.)

5. Work Cited
(Abbreviated. I would recommend storing citations with a citation manager such as Mendeley (https://www.mendeley.com/download-desktop-new/) or Zotero (https://www.zotero.org/) so you can store your citations and easily make a work cited page. The studies cited can be found on popular scientific literature sites such as PubMed (https://www.ncbi.nlm.nih.gov/pmc/). For this report, I used the citation format commonly found in Nature, but you can choose whatever MLA, ALA citation format you would like to use.)

1. Benowitz, N. L. Nicotine addiction. N. Engl. J. Med. 362, 2295–2303 (2010).
2. Wickstrom, R. Effects of Nicotine During Pregnancy: Human and Experimental Evidence. CN 5, 213–222 (2007).
3. Topçu, H. O. et al. Birth weight for gestational age: A reference study in a tertiary referral hospital in the middle region of Turkey. Journal of the Chinese Medical Association 77, 578–582 (2014).
4. K. C., A., Basel, P. L. & Singh, S. Low birth weight and its associated risk factors: Health facility-based case-control study. PLoS ONE 15, e0234907 (2020).

6. Appendix
[Figure 3: side-by-side boxplots titled "Boxplot of Non−Smoker bwt" and "Boxplot of Smoker bwt".]
Figure 3: Boxplot of birthweights separated by Smoking Status. There are many observations with low birthweights in the non-smoker data that could skew the analysis.

[Figure 4: "Chi−Squared Density Plot df = 1"; x-axis: x (0 to 20); y-axis: dchisq(x, df = 1).]
Figure 4: Density of the Chi-Squared Distribution with df = 1. The red line represents the test statistic from the chi-squared test. The p-value is the area under the curve to the right of the red line. The blue line represents the critical value at the 0.05 significance level. The p-value is very small because the area under the curve to the right of the test statistic is extremely small.
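The tail-area reading of Figure 4 can be checked numerically. The following is a minimal sketch, not the report's own code: it assumes only the reported test statistic (14.99), one degree of freedom, and α = 0.05, and it uses the identity that a chi-squared variable with 1 degree of freedom is the square of a standard normal, so its upper-tail area equals erfc(sqrt(x / 2)). The function names `chi2_df1_tail` and `chi2_df1_critical` are hypothetical helpers introduced for illustration.

```python
import math


def chi2_df1_tail(x: float) -> float:
    """Upper-tail area P(X > x) for a chi-squared variable with 1 df.

    Uses X = Z^2 with Z standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def chi2_df1_critical(alpha: float, tol: float = 1e-10) -> float:
    """Critical value c with P(X > c) = alpha, found by bisection.

    The tail area decreases as x grows, so the search interval is
    halved until it is narrower than `tol`.
    """
    lo, hi = 0.0, 100.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_df1_tail(mid) > alpha:
            lo = mid  # tail still too large: move right
        else:
            hi = mid  # tail at or below alpha: move left
    return (lo + hi) / 2.0


if __name__ == "__main__":
    statistic = 14.99  # test statistic reported in the Analysis section
    alpha = 0.05       # significance level used in the report

    p_value = chi2_df1_tail(statistic)
    critical = chi2_df1_critical(alpha)

    print(f"p-value        = {p_value:.5f}")
    print(f"critical value = {critical:.2f}")
    print("reject H0" if statistic > critical else "fail to reject H0")
```

Comparing the statistic with the critical value and comparing the p-value with α are equivalent decisions, which is why both lines in Figure 4 lead to the same conclusion.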
density   gain (ten readings per density level)
0.6860    17.60  17.30  16.90  16.20  18.50  18.70  17.40  18.60  16.80  17.10
0.6040    25.90  26.30  24.80  24.80  27.60  30.50  28.40  27.70  24.80  28.50
0.5080    39.40  37.60  37.70  36.30  38.70  39.40  38.80  40.30  38.10  39.20
0.4120    60.00  58.30  59.60  59.10  55.00  52.90  54.10  56.90  56.00  56.30
0.3180    92.70  90.50  85.80  87.50  88.30  88.20  88.60  84.70  87.00  91.60
0.2230    128.0  130.0  129.0  127.0  129.0  132.0  133.0  133.0  131.0  134.0
0.1480    199.0  204.0  199.0  207.0  200.0  205.0  202.0  199.0  199.0  200.0
0.0800    298.0  297.0  288.0  296.0  293.0  299.0  298.0  293.0  298.0  301.0
0.0010    423.0  421.0  428.0  436.0  427.0  426.0  428.0  429.0  422.0  427.0

[Figure 2, left panel: "Low Birth Weight Rate vs Standard"; x-axis: Threshold for Low Birth Weight (ounces); y-axis: Proportion of Babies Lower than Threshold; series: Smoking, Non-Smoking. Right panel: "SD of Incidence Rate vs Window Size"; x-axis: Window Size; y-axis: Standard Deviation of Incidence Rate; series: Smoking, Non-Smoking.]
Figure 2: Proportion of Babies Classified as Low Birth Weight vs Potential Low Birth Weight Baby Thresholds (left) and Standard Deviation of Incidence Rate vs Window Size (right). The scatterplot illustrates how the standard deviation of the incidence rate changes as the examining window around the 88.2-ounce standard changes.
We observed that the
[Figure 1, top: "Histogram of smoke$bwt" and "Histogram of non_smoke$bwt"; y-axis: Density. Bottom: "Normal Q-Q Plot" for each group; x-axis: Theoretical Quantiles; y-axis: Sample Quantiles.]
Figure 1: Histogram of Birthweights by Smoking Status (top) and Q-Q plot of Birthweights by Smoking Status (bottom)
