A00-240 Exam Braindumps 2024

Killexams A00-240 Exam Braindumps include the latest syllabus of the SAS Statistical Business Analysis SAS9: Regression and Model exam with up-to-date exam content | Actual Questions - Mahfia.tv

A00-240 PDF Dump Detail

A00-240 Exam Braindumps and VCE


Our products include A00-240 PDF and VCE:

  • PDF Exam Questions and Answers: The A00-240 Exam Braindumps contain the complete pool of A00-240 questions and answers in PDF format. The PDF contains actual questions, updated in March 2024, for the SAS Statistical Business Analysis SAS9: Regression and Model exam that will help you get high marks in the actual test. You can open the PDF file on any operating system (Windows, macOS, Linux) and on any device, such as a computer, Android phone, iPad, iPhone, or other handheld device. You can print it and make your own book to read wherever you travel or stay. The PDF is suitable for high-quality printing and offline reading.
  • VCE Exam Simulator 3.0.9: The free A00-240 Exam Simulator is a full-screen Windows app that resembles the exam screen you see in an actual test center. This software provides a test environment where you can answer questions, take a test, review your incorrect answers, and monitor your performance. The VCE exam simulator uses actual exam questions and answers to run your test and mark your performance accordingly. When you start scoring 100% in the exam simulator, you are ready to take the real test in a test center. Our VCE Exam Simulator is updated regularly; the latest update is for March 2024.

SASInstitute A00-240 Exam Braindumps

We offer SASInstitute A00-240 Exam Braindumps containing actual A00-240 exam questions and answers. These Exam Braindumps are very useful for passing the A00-240 exam with high marks. They are backed by a money-back guarantee from killexams.com.

Real SASInstitute A00-240 Exam Questions and Answers

These A00-240 questions and answers are provided as PDF files and are taken from the actual A00-240 question pool that candidates face in the actual test. These real SASInstitute A00-240 exam Q&As are an exact copy of the A00-240 questions and answers you face in the exam.

SASInstitute A00-240 Practice Tests

The A00-240 Practice Test uses the same questions and answers that are provided in the actual A00-240 exam pool so that candidates can prepare for the real test environment. These A00-240 practice tests are very helpful for practicing for the A00-240 exam.

SASInstitute A00-240 Exam Braindumps update

A00-240 Exam Braindumps are updated on a regular basis to reflect the latest changes in the A00-240 exam. Whenever any change is made in the actual A00-240 test, we provide the changes in our A00-240 Exam Braindumps.

Complete SASInstitute A00-240 Exam Collection

Here you can find the complete SASInstitute exam collection, where Exam Braindumps are updated on a regular basis to reflect the latest changes in the A00-240 exam. All the sets of A00-240 Exam Braindumps are completely verified and up to date.

SAS Statistical Business Analysis SAS9: Regression and Model Exam Braindumps

Killexams.com A00-240 Exam Braindumps contain the complete question pool, updated in March 2024, including a VCE exam simulator that will help you get high marks in the exam. All these A00-240 exam questions are verified by killexams certified professionals and backed by a 100% money-back guarantee.


A00-240 - SAS Statistical Business Analysis SAS9: Regression and Model (updated 2024)

Simply memorize these A00-240 questions and pass the real test
Exam Code: A00-240 SAS Statistical Business Analysis SAS9: Regression and Model, January 2024, by the Killexams.com team

A00-240 SAS Statistical Business Analysis SAS9: Regression and Model

This test is administered by SAS and Pearson VUE.

60 scored multiple-choice and short-answer questions.

(Must achieve score of 68 percent correct to pass)

In addition to the 60 scored items, there may be up to five unscored items.

Two hours to complete exam.

Use test ID A00-240; required when registering with Pearson VUE.



ANOVA - 10%

Verify the assumptions of ANOVA

Analyze differences between population means using the GLM and TTEST procedures

Perform ANOVA post hoc test to evaluate treatment effect

Detect and analyze interactions between factors



Linear Regression - 20%

Fit a multiple linear regression model using the REG and GLM procedures

Analyze the output of the REG, PLM, and GLM procedures for multiple linear regression models

Use the REG or GLMSELECT procedure to perform model selection

Assess the validity of a given regression model through the use of diagnostic and residual analysis



Logistic Regression - 25%

Perform logistic regression with the LOGISTIC procedure

Optimize model performance through input selection

Interpret the output of the LOGISTIC procedure

Score new data sets using the LOGISTIC and PLM procedures



Prepare Inputs for Predictive Model Performance - 20%

Identify the potential challenges when preparing input data for a model

Use the DATA step to manipulate data with loops, arrays, conditional statements and functions

Improve the predictive power of categorical inputs

Screen variables for irrelevance and non-linear association using the CORR procedure

Screen variables for non-linearity using empirical logit plots



Measure Model Performance - 25%

Apply the principles of honest assessment to model performance measurement

Assess classifier performance using the confusion matrix

Model selection and validation using training and validation data

Create and interpret graphs (ROC, lift, and gains charts) for model comparison and selection

Establish effective decision cut-off values for scoring



Verify the assumptions of ANOVA

 Explain the central limit theorem and when it must be applied

 Examine the distribution of continuous variables (histogram, box-whisker, Q-Q plots)

 Describe the effect of skewness on the normal distribution

 Define H0, H1, Type I/II error, statistical power, p-value

 Describe the effect of sample size on p-value and power

 Interpret the results of hypothesis testing

 Interpret histograms and normal probability charts

 Draw conclusions about your data from histogram, box-whisker, and Q-Q plots

 Identify the kinds of problems that may be present in the data (biased sample, outliers, extreme values)

 For a given experiment, verify that the observations are independent

 For a given experiment, verify the errors are normally distributed

 Use the UNIVARIATE procedure to examine residuals

 For a given experiment, verify all groups have equal response variance

 Use the HOVTEST option of the MEANS statement in PROC GLM to assess response variance



Analyze differences between population means using the GLM and TTEST procedures

 Use the GLM Procedure to perform ANOVA

o CLASS statement

o MODEL statement

o MEANS statement

o OUTPUT statement

 Evaluate the null hypothesis using the output of the GLM procedure

 Interpret the statistical output of the GLM procedure (variance derived from MSE, F value, p-value, R², Levene's test)

 Interpret the graphical output of the GLM procedure

 Use the TTEST Procedure to compare means

Perform ANOVA post hoc test to evaluate treatment effect



Use the LSMEANS statement in the GLM or PLM procedure to perform pairwise comparisons

 Use PDIFF option of LSMEANS statement

 Use ADJUST option of the LSMEANS statement (TUKEY and DUNNETT)

 Interpret diffograms to evaluate pairwise comparisons

 Interpret control plots to evaluate pairwise comparisons

 Compare/Contrast use of pairwise T-Tests, Tukey and Dunnett comparison methods

Detect and analyze interactions between factors

 Use the GLM procedure to produce reports that will help determine the significance of the interaction between factors:

 MODEL statement

 LSMEANS with SLICE= option (also using PROC PLM)

 ODS SELECT

 Interpret the output of the GLM procedure to identify interaction between factors:

 p-value

 F Value

 R Squared

 TYPE I SS

 TYPE III SS



Linear Regression - 20%



Fit a multiple linear regression model using the REG and GLM procedures

 Use the REG procedure to fit a multiple linear regression model

 Use the GLM procedure to fit a multiple linear regression model



Analyze the output of the REG, PLM, and GLM procedures for multiple linear regression models

 Interpret REG or GLM procedure output for a multiple linear regression model:

 Convert models to algebraic expressions

 Identify missing degrees of freedom

 Identify variance due to model/error, and total variance

 Calculate a missing F value

 Identify variable with largest impact to model

 For output from two models, identify which model is better

 Identify how much of the variation in the dependent variable is explained by the model

 Conclusions that can be drawn from REG, GLM, or PLM output: (about H0, model quality, graphics)

Use the REG or GLMSELECT procedure to perform model selection



Use the SELECTION option of the model statement in the GLMSELECT procedure

 Compare the different model selection methods (STEPWISE, FORWARD, BACKWARD)

 Enable ODS graphics to display graphs from the REG or GLMSELECT procedure

 Identify best models by examining the graphical output (fit criterion from the REG or GLMSELECT procedure)

 Assign names to models in the REG procedure (multiple model statements)

Assess the validity of a given regression model through the use of diagnostic and residual analysis

 Explain the assumptions for linear regression

 From a set of residual plots, assess which assumption about the error terms has been violated

 Use REG procedure MODEL statement options to identify influential observations (Student Residuals, Cook's D, DFFITS, DFBETAS)

 Explain options for handling influential observations

 Identify collinearity problems by examining REG procedure output

 Use MODEL statement options to diagnose collinearity problems (VIF, COLLIN, COLLINOINT)
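
As a rough illustration of what the VIF option reports: the variance inflation factor for predictor j is 1 / (1 - R_j²), where R_j² comes from regressing that predictor on the other inputs. The Python sketch below is not SAS code, uses made-up data, and covers only the two-predictor case, where R_j² equals the squared Pearson correlation between the two inputs. The function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two inputs."""
    r2 = pearson_r(x1, x2) ** 2
    return 1.0 / (1.0 - r2)

# Hypothetical data: x2 tracks x1 closely, so the VIF is large.
x1 = [1, 2, 3]
x2 = [1, 2, 4]
print(round(vif_two_predictors(x1, x2), 2))
```

A VIF above roughly 10 is a commonly quoted warning sign of collinearity, though the threshold is a rule of thumb rather than a formal test.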



Logistic Regression - 25%

Perform logistic regression with the LOGISTIC procedure

 Identify experiments that require analysis via logistic regression

 Identify logistic regression assumptions

 logistic regression concepts (log odds, logit transformation, sigmoidal relationship between p and X)

 Use the LOGISTIC procedure to fit a binary logistic regression model (MODEL and CLASS statements)
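
The log odds, logit transformation, and sigmoidal relationship listed above can be sketched in a few lines. This is a Python illustration, not PROC LOGISTIC output; the coefficients b0 and b1 are hypothetical values chosen only to show the shape of the curve.

```python
import math

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Map a log-odds value back to a probability (the sigmoid curve)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fitted model: logit(p) = b0 + b1 * x
b0, b1 = -3.0, 0.5
for x in (0, 6, 12):
    eta = b0 + b1 * x
    print(f"x={x:2d}  log-odds={eta:+.1f}  p={inverse_logit(eta):.3f}")

# A one-unit increase in x multiplies the odds by exp(b1), the odds ratio.
print("odds ratio per unit of x:", round(math.exp(b1), 3))
```

The model is linear on the log-odds scale, which is why equal steps in x produce unequal steps in p.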



Optimize model performance through input selection

 Use the LOGISTIC procedure to fit a multiple logistic regression model

 LOGISTIC procedure SELECTION=SCORE option

 Perform Model Selection (STEPWISE, FORWARD, BACKWARD) within the LOGISTIC procedure



Interpret the output of the LOGISTIC procedure

 Interpret the output from the LOGISTIC procedure for binary logistic regression models:

 Model Convergence section

 Testing Global Null Hypothesis table

 Type 3 Analysis of Effects table

 Analysis of Maximum Likelihood Estimates table



Association of Predicted Probabilities and Observed Responses

Score new data sets using the LOGISTIC and PLM procedures

 Use the SCORE statement in the PLM procedure to score new cases

 Use the CODE statement in PROC LOGISTIC to score new data

 Describe when you would use the SCORE statement vs the CODE statement in PROC LOGISTIC

 Use the INMODEL/OUTMODEL options in PROC LOGISTIC

 Explain how to score new data when you have developed a model from a biased sample
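
When a model is developed on an oversampled (biased) sample, its predicted probabilities are usually corrected with the population prior before they are used for scoring. The sketch below shows that correction in Python rather than SAS. It assumes pi1 is the event proportion in the population and rho1 is the event proportion in the training sample; both names and the example figures are illustrative.

```python
def adjust_for_oversampling(p_sample, pi1, rho1):
    """Rescale a probability estimated on a biased sample to the population prior.

    p_sample: probability predicted by the model fit on the biased sample
    pi1:      event proportion in the population (assumed known)
    rho1:     event proportion in the training sample
    """
    w1 = pi1 / rho1                      # weight for the event class
    w0 = (1.0 - pi1) / (1.0 - rho1)      # weight for the non-event class
    return p_sample * w1 / (p_sample * w1 + (1.0 - p_sample) * w0)

# Hypothetical case: 50/50 training sample, 3% event rate in the population.
for p in (0.2, 0.5, 0.8):
    print(p, round(adjust_for_oversampling(p, pi1=0.03, rho1=0.5), 4))
```

A prediction equal to the sample event rate maps back to the population event rate, which is a quick sanity check on the formula.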

Prepare Inputs for Predictive Model Performance - 20%

Identify the potential challenges when preparing input data for a model

 Identify problems that missing values can cause in creating predictive models and scoring new data sets

 Identify limitations of Complete Case Analysis

 Explain problems caused by categorical variables with numerous levels

 Discuss the problem of redundant variables

 Discuss the problem of irrelevant and redundant variables

 Discuss the non-linearities and the problems they create in predictive models

 Discuss outliers and the problems they create in predictive models

 Describe quasi-complete separation

 Discuss the effect of interactions

 Determine when it is necessary to oversample data



Use the DATA step to manipulate data with loops, arrays, conditional statements and functions

 Use ARRAYs to create missing indicators

 Use ARRAYS, LOOP, IF, and explicit OUTPUT statements



Improve the predictive power of categorical inputs

 Reduce the number of levels of a categorical variable

 Explain thresholding

 Explain Greenacre's method

 Cluster the levels of a categorical variable via Greenacre's method using the CLUSTER procedure

o METHOD=WARD option

o FREQ, VAR, ID statement



Use of ODS output to create an output data set

 Convert categorical variables to continuous using smooth weight of evidence
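
A minimal sketch of smoothed weight of evidence, written in Python rather than SAS. It assumes the commonly taught form log((events + c·rho1) / (non-events + c·(1 - rho1))), where rho1 is the overall event proportion and c is a smoothing constant chosen by the analyst. The level names, counts, and the value of c are hypothetical.

```python
import math

def smoothed_woe(events, nonevents, rho1, c):
    """Smoothed weight of evidence for one level of a categorical input."""
    return math.log((events + c * rho1) / (nonevents + c * (1.0 - rho1)))

# Hypothetical levels: (events, non-events) per level.
levels = {"A": (30, 70), "B": (2, 1), "C": (0, 5)}
total_events = sum(e for e, _ in levels.values())
total_cases = sum(e + n for e, n in levels.values())
rho1 = total_events / total_cases

for name, (e, n) in levels.items():
    print(name, round(smoothed_woe(e, n, rho1, c=24.0), 3))
```

The smoothing term pulls sparse levels toward the overall log odds, so a level with zero events still receives a finite value.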



Screen variables for irrelevance and non-linear association using the CORR procedure

 Explain how Hoeffding's D and Spearman statistics can be used to find irrelevant variables and non-linear associations

 Produce Spearman and Hoeffding's D statistic using the CORR procedure (VAR, WITH statement)

 Interpret a scatter plot of Hoeffding's D and Spearman statistic to identify irrelevant variables and non-linear associations

Screen variables for non-linearity using empirical logit plots

 Use the RANK procedure to bin continuous input variables (GROUPS=, OUT= option; VAR, RANK statements)

 Interpret RANK procedure output

 Use the MEANS procedure to calculate the sum and means for the target cases and total events (NWAY option; CLASS, VAR, OUTPUT statements)

 Create empirical logit plots with the SGPLOT procedure

 Interpret empirical logit plots
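
The bin-then-summarize workflow above can be sketched outside SAS. The Python below assumes the empirical logit is log((events + √n/2) / (non-events + √n/2)) for a bin of n cases, and it assigns bins with floor(rank × groups / (n + 1)), ignoring ties for simplicity. The helper names and data are hypothetical.

```python
import math

def empirical_logit(events, n):
    """Empirical logit for a bin with n cases, `events` of which are events."""
    adj = math.sqrt(n) / 2.0  # keeps the logit finite when a bin has 0 or n events
    return math.log((events + adj) / (n - events + adj))

def bin_and_logit(xs, ys, groups):
    """Return (mean of x, empirical logit) per rank-based bin, in bin order."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    n = len(xs)
    bins = {}
    for rank, i in enumerate(order, start=1):
        b = rank * groups // (n + 1)          # bin index 0 .. groups-1
        cnt, ev, sx = bins.get(b, (0, 0, 0.0))
        bins[b] = (cnt + 1, ev + ys[i], sx + xs[i])
    return [(sx / cnt, empirical_logit(ev, cnt))
            for _, (cnt, ev, sx) in sorted(bins.items())]

# Hypothetical input x and binary target y.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
for mean_x, elogit in bin_and_logit(xs, ys, groups=2):
    print(round(mean_x, 2), round(elogit, 3))
```

Plotting the bin means against the empirical logits gives the picture used to judge whether the input is roughly linear in the logit.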



Measure Model Performance - 25%

Apply the principles of honest assessment to model performance measurement

 Explain techniques to honestly assess classifier performance

 Explain overfitting

 Explain differences between validation and test data

 Identify the impact of performing data preparation before data is split

Assess classifier performance using the confusion matrix

 Explain the confusion matrix

 Define: Accuracy, Error Rate, Sensitivity, Specificity, PV+, PV-

 Explain the effect of oversampling on the confusion matrix

 Adjust the confusion matrix for oversampling
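
The measures and the oversampling adjustment listed above can be illustrated with a short Python sketch (not SAS). It assumes the usual adjustment in which sensitivity and specificity are taken from the oversampled matrix and the cells are rebuilt as proportions using the population priors. The counts and the 3% prior are made up.

```python
def metrics(tp, fp, tn, fn):
    """Standard measures from the four cells of a confusion matrix."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "pv_plus": tp / (tp + fp),
        "pv_minus": tn / (tn + fn),
    }

def adjust_confusion(tp, fp, tn, fn, pi1):
    """Rebuild the matrix as population proportions, given the true event rate pi1."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    pi0 = 1.0 - pi1
    return {"tp": pi1 * se, "fn": pi1 * (1.0 - se),
            "tn": pi0 * sp, "fp": pi0 * (1.0 - sp)}

# Hypothetical 50/50 oversampled validation set, 3% events in the population.
raw = dict(tp=40, fp=15, tn=35, fn=10)
adj = adjust_confusion(pi1=0.03, **raw)
print(metrics(**raw))
print(metrics(**adj))
```

Sensitivity and specificity are unchanged by the adjustment, while accuracy and the predictive values shift with the prior.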



Model selection and validation using training and validation data

 Divide data into training and validation data sets using the SURVEYSELECT procedure

 Discuss the subset selection methods available in PROC LOGISTIC

 Discuss methods to determine interactions (forward selection, with bar and @ notation)



Create interaction plot with the results from PROC LOGISTIC

 Select the model with fit statistics (BIC, AIC, KS, Brier score)
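
Two of the fit statistics named above are simple enough to sketch directly. In this Python illustration (not SAS), the Brier score is taken as the mean squared difference between predicted probability and the 0/1 outcome, and KS is the largest vertical gap between the empirical distribution functions of the scores for events and non-events. The scores and outcomes are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def ks_statistic(probs, outcomes):
    """Maximum vertical distance between the EDFs of event and non-event scores."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]

    def edf(sample, x):
        return sum(s <= x for s in sample) / len(sample)

    return max(abs(edf(events, x) - edf(nonevents, x)) for x in set(probs))

# Hypothetical validation scores from two candidate models.
y = [1, 1, 0, 0, 1, 0]
model_a = [0.9, 0.8, 0.2, 0.1, 0.6, 0.4]
model_b = [0.6, 0.4, 0.5, 0.3, 0.7, 0.5]
for name, p in (("A", model_a), ("B", model_b)):
    print(name, round(brier_score(p, y), 4), round(ks_statistic(p, y), 4))
```

A lower Brier score and a higher KS both point to the better-separating model, so the two statistics are read in opposite directions.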

Create and interpret graphs (ROC, lift, and gains charts) for model comparison and selection

 Explain and interpret charts (ROC, Lift, Gains)

 Create a ROC curve (OUTROC option of the SCORE statement in the LOGISTIC procedure)

 Use the ROC and ROCCONTRAST statements to create an overlay plot of ROC curves for two or more models

 Explain the concept of depth as it relates to the gains chart



Establish effective decision cut-off values for scoring

 Illustrate a decision rule that maximizes the expected profit

 Explain the profit matrix and how to use it to estimate the profit per scored customer

 Calculate decision cutoffs using Bayes rule, given a profit matrix

 Determine optimum cutoff values from profit plots

 Given a profit matrix, and model results, determine the model with the highest average profit
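
A sketch of the cutoff logic in Python, not SAS. Classifying a case as an event is worthwhile when its expected profit exceeds that of classifying it as a non-event, which gives the cutoff p > 1 / (1 + (profit_TP - profit_FN) / (profit_TN - profit_FP)). The profit values below are hypothetical, and the formula assumes acting on a true event and leaving a true non-event alone are each the better choice.

```python
def bayes_cutoff(profit_tp, profit_fp, profit_tn=0.0, profit_fn=0.0):
    """Probability above which predicting 'event' maximizes expected profit."""
    gain_if_event = profit_tp - profit_fn        # benefit of acting on a true event
    loss_if_nonevent = profit_tn - profit_fp     # cost of acting on a non-event
    return loss_if_nonevent / (loss_if_nonevent + gain_if_event)

def average_profit(probs, outcomes, cutoff, profit_tp, profit_fp,
                   profit_tn=0.0, profit_fn=0.0):
    """Average profit per scored case under the rule 'act if p > cutoff'."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        if p > cutoff:
            total += profit_tp if y == 1 else profit_fp
        else:
            total += profit_fn if y == 1 else profit_tn
    return total / len(probs)

# Hypothetical profit matrix: a responder is worth 99, a wasted contact costs 1.
cutoff = bayes_cutoff(profit_tp=99.0, profit_fp=-1.0)
print("cutoff:", cutoff)
print(average_profit([0.5, 0.02, 0.005], [1, 0, 0], cutoff, 99.0, -1.0))
```

Running `average_profit` for each candidate model on the same validation data is one way to compare models by average profit.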

Other SASInstitute exams

A00-240 SAS Statistical Business Analysis SAS9: Regression and Model
A00-250 SAS Platform Administration for SAS9
A00-280 Clinical Trials Programming Using SAS 9

Hundreds of companies offer A00-240 dumps, but most of them are outdated. Killexams.com has a team of experts that keeps the A00-240 dumps updated with real test questions. They create a new A00-240 VCE test simulator on each update so that you can practice with the most up-to-date and valid A00-240 dumps questions and answers.
http://killexams.com/pass4sure/exam-detail/A00-240
Question #87
What is a benefit to performing data cleansing (imputation, transformations, etc.) on data after partitioning the data for honest assessment as opposed to performing the
data cleansing prior to partitioning the data?
A. It makes inference on the model possible.
B. It is computationally easier and requires less time.
C. It omits the training (and test) data sets from the benefits of the cleansing methods.
D. It allows for the determination of the effectiveness of the cleansing method.
Answer: D
Question #88
A researcher has several variables that could be possible predictors for the final model. There is interest in checking all 2-way interactions for possible entry to the
model. The researcher has decided to use forward selection within PROC LOGISTIC. Fill in the missing code option that will ensure that all 2-way interactions will be
considered for entry.
A. start = 5
B. include = 4
C. include = 5
D. start = 4
Answer: C
Question #89
FILL BLANK -
Refer to the confusion matrix:
An analyst determines that loan defaults occur at the rate of 3% in the overall population. The above confusion matrix is from an oversampled test set (1 = default).
What is the sensitivity adjusted for the population event probability?
Enter your answer in the space below. Round to three decimals (example: n.nnn).
Answer: 0.617
Question #90
Refer to the exhibit:
On the Gains Chart, what is the correct interpretation of the horizontal reference line?
A. the proportion of cases that cannot be classified
B. the probability of a false negative
C. the probability of a false positive
D. the prior event rate
Answer: D
Question #91
Refer to the confusion matrix:
Calculate the accuracy and error rate (0 - negative outcome, 1 - positive outcome)
A. Accuracy = 58/102, Error Rate = 23/48
B. Accuracy = 83/102, Error Rate = 67/102
C. Accuracy = 25/150, Error Rate = 44/150
D. Accuracy = 83/150, Error Rate = 67/150
Answer: D
Question #92
Which statistic is based on the maximum vertical distance between the primary event EDF and the secondary event EDF?
A. KS
B. SBC
C. Max EDF
D. Brier Score
Answer: A
Reference:
https://support.sas.com/documentation/onlinedoc/ets/132/severity.pdf
Question #93
DRAG DROP -
Drag the adjustment formulas for oversampling from the left and place them into the correct location in the confusion matrix shown on the right.
Select and Place:
Answer:
Question #94
An analyst knows that the categorical predictor, zip_code, is an important predictor of a binary target. However, zip_code has too many levels to be a feasible
predictor in a model. The analyst uses PROC CLUSTER to implement Greenacre's method to reduce the number of categorical levels.
What is the correct application of Greenacre's method in this situation?
A. Clustering the levels using the target proportion for each zip_code as input.
B. Clustering the levels using the zip_code values as input.
C. Clustering the levels using the number of cases in each zip_code as input.
D. Clustering the levels using dummy coded zip_code levels as inputs.
Answer: A
Reference:
https://support.sas.com/resources/papers/proceedings/proceedings/sugi31/079-31.pdf
Question #95
What does the Pearson product moment correlation coefficient measure?
A. nonlinear and nonmonotonic association between two variables
B. linear and monotonic association between two variables
C. linear and nonmonotonic association between two variables
D. nonlinear and monotonic association between two variables
Answer: B
Reference:
http://d-scholarship.pitt.edu/8056/1/Chokns_etd2010.pdf
Question #96
This question will ask you to provide a segment of missing code.
The following code is used to create missing value indicator variables for input variables, fred1 to fred7.
Which segment of code would complete the task?
A.
B.
C.
D.
Answer: C
Question #97
This question will ask you to provide a missing option.
Given the following SAS program:
What option must be added to the program to obtain a data set containing Spearman statistics?
A. OUTCORR=estimates
B. OUTS=estimates
C. OUT=estimates
D. OUTPUT=estimates
Answer: B
Question #98
This question will ask you to provide a missing option.
A business analyst is investigating the differences in sales figures across 8 sales regions. The analyst is interested in viewing the regression equation parameter
estimates for each of the design variables.
Which option completes the program to produce the regression equation parameter estimates?
A. Solve
B. Estimate
C. Solution
D. Est
Answer: C
Reference:
https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_ods_examples06.htm&docsetVersion=14.3&locale=en
Question #99
After performing an ANOVA test, an analyst has determined that a significant effect exists due to income. The analyst wants to compare each Income to all others and
wants to control for experimentwise error.
Which GLM procedure statement would provide the most appropriate output?
A. lsmeans Income / pdiff=control adjust=dunnett;
B. lsmeans Income / pdiff=control adjust=t;
C. lsmeans Income / pdiff=all adjust=tukey;
D. lsmeans Income / pdiff=all adjust=t;
Answer: C
Reference:
https://rpubs.com/JsoLab/Stat01_L02
Question #100
SIMULATION -
A linear model has the following characteristics:
*A dependent variable (y)
*One continuous variable (x1), including a quadratic term (x1²)
*One categorical (d with 3 levels) predictor variable and an interaction term (d by x1)
How many parameters, including the intercept, are associated with this model?
Enter your numeric answer in the space below. Do not add leading or trailing spaces to your answer.
Answer: 7

SAS Elementary Statistics Procedures: Statistical Background

The rest of this appendix provides text descriptions and SAS code examples that explain some of the statistical concepts and terminology that you may encounter when you interpret the output of SAS procedures for elementary statistics. For a more thorough discussion, consult an introductory statistics textbook such as Mendenhall and Beaver (1994); Ott and Mendenhall; or Snedecor and Cochran (1989).


Populations and Parameters

Usually, there is a clearly defined set of elements in which you are interested. This set of elements is called the universe, and a set of values associated with these elements is called a population of values. The statistical term population has nothing to do with people per se. A statistical population is a collection of values, not a collection of people. For example, a universe is all the students at a particular school, and there could be two populations of interest: one of height values and one of weight values. Or, a universe is the set of all widgets manufactured by a particular company, while the population of values could be the length of time each widget is used before it fails.

A population of values can be described in terms of its cumulative distribution function, which gives the proportion of the population less than or equal to each possible value. A discrete population can also be described by a probability function, which gives the proportion of the population equal to each possible value. A continuous population can often be described by a density function, which is the derivative of the cumulative distribution function. A density function can be approximated by a histogram that gives the proportion of the population lying within each of a series of intervals of values. A probability density function is like a histogram with an infinite number of infinitely small intervals.

In technical literature, when the term distribution is used without qualification, it generally refers to the cumulative distribution function. In informal writing, distribution sometimes means the density function instead. Often the word distribution is used simply to refer to an abstract population of values rather than some concrete population. Thus, the statistical literature refers to many types of abstract distributions, such as normal distributions, exponential distributions, Cauchy distributions, and so on. When a phrase such as normal distribution is used, it frequently does not matter whether the cumulative distribution function or the density function is intended.

It may be expedient to describe a population in terms of a few measures that summarize interesting features of the distribution. One such measure, computed from the population values, is called a parameter. Many different parameters can be defined to measure different aspects of a distribution.

The most commonly used parameter is the (arithmetic) mean. If the population contains a finite number of values, the population mean is computed as the sum of all the values in the population divided by the number of elements in the population. For an infinite population, the concept of the mean is similar but requires more complicated mathematics.

E(x) denotes the mean of a population of values symbolized by x, such as height, where E stands for expected value. You can also consider expected values of derived functions of the original values. For example, if x represents height, then E(x²) is the expected value of height squared, that is, the mean value of the population obtained by squaring every value in the population of heights.

It is often impossible to measure all of the values in a population. A collection of measured values is called a sample. A mathematical function of a sample of values is called a statistic. A statistic is to a sample as a parameter is to a population. It is customary to denote statistics by Roman letters and parameters by Greek letters. For example, the population mean is often written as μ, whereas the sample mean is written as x̄. The field of statistics is largely concerned with the study of the behavior of sample statistics.

Samples can be selected in a variety of ways. Most SAS procedures assume that the data constitute a simple random sample, which means that the sample was selected in such a way that all possible samples were equally likely to be selected.

Statistics from a sample can be used to make inferences, or reasonable guesses, about the parameters of a population. For example, if you take a random sample of 30 students from the high school, the mean height for those 30 students is a reasonable guess, or estimate, of the mean height of all the students in the high school. Other statistics, such as the standard error, can provide information about how good an estimate is likely to be.
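
The idea of estimating a population mean from a sample, and of using the standard error to gauge the estimate, can be sketched as follows. This is a Python illustration rather than SAS, the heights are invented, and the standard error is taken as the sample standard deviation divided by √n.

```python
import math
import statistics

def mean_and_standard_error(sample):
    """Return the sample mean and its standard error, s / sqrt(n)."""
    n = len(sample)
    m = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)
    return m, s / math.sqrt(n)

# Hypothetical heights (cm) of a small random sample of students.
heights = [158.0, 162.5, 171.0, 165.5, 169.0, 174.5, 160.0, 167.5]
m, se = mean_and_standard_error(heights)
print(f"estimated mean height: {m:.2f} cm, standard error: {se:.2f} cm")
```

A smaller standard error, which follows from a larger sample or less variable data, indicates that the sample mean is likely to sit closer to the population mean.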

For any population parameter, several statistics can estimate it. Often, however, there is one particular statistic that is customarily used to estimate a given parameter. For example, the sample mean is the usual estimator of the population mean. In the case of the mean, the formulas for the parameter and the statistic are the same. In other cases, the formula for a parameter may be different from that of the most commonly used estimator. The most commonly used estimator is not necessarily the best estimator in all applications.

Measures of location include the mean, the median, and the mode. These measures describe the center of a distribution. In the definitions that follow, notice that if the entire sample changes by adding a fixed amount to each observation, then these measures of location are shifted by the same fixed amount.

The Mean

The population mean $\mu$ is usually estimated by the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

The Median

The population median is the central value, lying above and below half of the population values. The sample median is the middle value when the data are arranged in ascending or descending order. For an even number of observations, the midpoint between the two middle values is usually reported as the median.

The Mode

The mode is the value at which the density of the population is at a maximum. Some densities have more than one local maximum (peak) and are said to be multimodal. The sample mode is the value that occurs most often in the sample. By default, PROC UNIVARIATE reports the lowest such value if there is a tie for the most-often-occurring sample value. PROC UNIVARIATE lists all possible modes when you specify the MODES option in the PROC statement. If the population is continuous, then all sample values occur once, and the sample mode has little use.

Percentiles, including quantiles, quartiles, and the median, are useful for a detailed study of a distribution. For a set of measurements arranged in order of magnitude, the pth percentile is the value that has p percent of the measurements below it and (100-p) percent above it. The median is the 50th percentile. Because it may not be possible to divide your data so that you get exactly the desired percentile, the UNIVARIATE procedure uses a more precise definition.

The upper quartile of a distribution is the value below which 75 percent of the measurements fall (the 75th percentile). Twenty-five percent of the measurements fall below the lower quartile value.

In the following example, SAS artificially generates the data with a pseudorandom number function. The UNIVARIATE procedure computes a variety of quantiles and measures of location, and outputs the values to a SAS data set. A DATA step then uses the SYMPUT routine to assign the values of the statistics to macro variables. The macro %FORMGEN uses these macro variables to produce value labels for the FORMAT procedure. PROC CHART uses the resulting format to display the values of the statistics on a histogram.

options nodate pageno=1 linesize=64 pagesize=52;

title 'Example of Quantiles and Measures of Location';

data random;
   drop n;
   do n=1 to 1000;
      X=floor(exp(rannor(314159)*.8+1.8));
      output;
   end;
run;

proc univariate data=random nextrobs=0;
   var x;
   output out=location
          mean=Mean mode=Mode median=Median
          q1=Q1 q3=Q3 p5=P5 p10=P10 p90=P90 p95=P95
          max=Max;
run;
proc print data=location noobs;
run;
data _null_;
   set location;
   call symput('MEAN',round(mean,1));
   call symput('MODE',mode);
   call symput('MEDIAN',round(median,1));
   call symput('Q1',round(q1,1));
   call symput('Q3',round(q3,1));
   call symput('P5',round(p5,1));
   call symput('P10',round(p10,1));
   call symput('P90',round(p90,1));
   call symput('P95',round(p95,1));
   call symput('MAX',min(50,max));
run;

%macro formgen;
%do i=1 %to &max;
   %let value=&i;
   %if &i=&p5     %then %let value=&value  P5;
   %if &i=&p10    %then %let value=&value  P10;
   %if &i=&q1     %then %let value=&value  Q1;
   %if &i=&mode   %then %let value=&value  Mode;
   %if &i=&median %then %let value=&value  Median;
   %if &i=&mean   %then %let value=&value  Mean;
   %if &i=&q3     %then %let value=&value  Q3;
   %if &i=&p90    %then %let value=&value  P90;
   %if &i=&p95    %then %let value=&value  P95;
   %if &i=&max    %then %let value=>=&value;
   &i="&value"
%end;
%mend;

proc format print;
   value stat %formgen;
run;
options pagesize=42 linesize=64;

proc chart data=random;
   vbar x / midpoints=1 to &max by 1;
   format x stat.;
   footnote  'P5  =  5TH PERCENTILE';
   footnote2 'P10 = 10TH PERCENTILE';
   footnote3 'P90 = 90TH PERCENTILE';
   footnote4 'P95 = 95TH PERCENTILE';
   footnote5 'Q1  =  1ST QUARTILE  ';
   footnote6 'Q3  =  3RD QUARTILE  ';
run;

Another group of statistics is important in studying the distribution of a population. These statistics measure the variability, also called the spread, of values. In the definitions given in the sections that follow, notice that if the entire sample is changed by the addition of a fixed amount to each observation, then the values of these statistics are unchanged. If each observation in the sample is multiplied by a constant, however, the values of these statistics are appropriately rescaled.

The Range

The sample range is the difference between the largest and smallest values in the sample. For many populations, at least in statistical theory, the range is infinite, so the sample range may not tell you much about the population. The sample range tends to increase as the sample size increases. If all sample values are multiplied by a constant, the sample range is multiplied by the same constant.

The Interquartile Range

The interquartile range is the difference between the upper and lower quartiles. If all sample values are multiplied by a constant, the sample interquartile range is multiplied by the same constant.

The Variance

The population variance, usually denoted by $\sigma^2$, is the expected value of the squared difference of the values from the population mean:

$$\sigma^2 = E\left[(x-\mu)^2\right]$$

The sample variance is denoted by $s^2$. The difference between a value and the mean is called a deviation from the mean. Thus, the variance approximates the mean of the squared deviations.

When all the values lie close to the mean, the variance is small but never less than zero. When values are more scattered, the variance is larger. If all sample values are multiplied by a constant, the sample variance is multiplied by the square of the constant.

Sometimes values other than $n-1$ are used in the denominator of the sample variance. The VARDEF= option controls what divisor the procedure uses.
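The effect of the divisor can be sketched with PROC MEANS. This is a minimal illustration, assuming the usual sample variance $s^2 = \sum (x_i - \bar{x})^2 / (n-1)$ and the VARDEF= values DF (divisor $n-1$, the default) and N (divisor $n$). The data set name VARDEMO is hypothetical; the nine values are the weight deviations used later in this section.

```sas
/* Hypothetical data set holding nine sample values */
data vardemo;
   input X @@;
   datalines;
-7 -2 1 3 6 10 15 21 30
;
run;

/* VARDEF=DF divides the sum of squared deviations by n-1 (the default) */
proc means data=vardemo n mean var std vardef=df;
   var x;
run;

/* VARDEF=N divides the same sum of squared deviations by n */
proc means data=vardemo n mean var std vardef=n;
   var x;
run;
```

For these nine values the sum of squared deviations from the mean is about 1106.22, so the two runs should report variances of roughly 138.28 and 122.91, respectively.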

The Standard Deviation

The standard deviation is the square root of the variance, or root-mean-square deviation from the mean, in either a population or a sample. The usual symbols are $\sigma$ for the population and $s$ for a sample. The standard deviation is expressed in the same units as the observations, rather than in squared units. If all sample values are multiplied by a constant, the sample standard deviation is multiplied by the same constant.

Coefficient of Variation

The coefficient of variation is a unitless measure of relative variability. It is defined as the ratio of the standard deviation to the mean expressed as a percentage. The coefficient of variation is meaningful only if the variable is measured on a ratio scale. If all sample values are multiplied by a constant, the sample coefficient of variation remains unchanged.

Skewness

The variance is a measure of the overall size of the deviations from the mean. Since the formula for the variance squares the deviations, both positive and negative deviations contribute to the variance in the same way. In many distributions, positive deviations may tend to be larger in magnitude than negative deviations, or vice versa. Skewness is a measure of the tendency of the deviations to be larger in one direction than in the other. For example, the data in the last example are skewed to the right.

Population skewness is defined as

$$\frac{E\left[(x-\mu)^3\right]}{\sigma^3}$$

Because the deviations are cubed rather than squared, the signs of the deviations are maintained. Cubing the deviations also emphasizes the effects of large deviations. The formula includes a divisor of $\sigma^3$ to remove the effect of scale, so multiplying all values by a constant does not change the skewness. Skewness can thus be interpreted as a tendency for one tail of the population to be heavier than the other. Skewness can be positive or negative and is unbounded.

Kurtosis

The heaviness of the tails of a distribution affects the behavior of many statistics. Hence it is useful to have a measure of tail heaviness. One such measure is kurtosis. The population kurtosis is usually defined as

$$\frac{E\left[(x-\mu)^4\right]}{\sigma^4} - 3$$

Note: Some statisticians omit the subtraction of 3.

Because the deviations are raised to the fourth power, positive and negative deviations make the same contribution, while large deviations are strongly emphasized. Because of the divisor $\sigma^4$, multiplying each value by a constant has no effect on kurtosis.

Population kurtosis must lie between $-2$ and $+\infty$, inclusive. If $M_3$ represents population skewness and $M_4$ represents population kurtosis, then

$$M_4 \ge M_3^2 - 2$$

Statistical literature sometimes reports that kurtosis measures the peakedness of a density. However, heavy tails have much more influence on kurtosis than does the shape of the distribution near the mean (Kaplansky 1945; Ali 1974; Johnson, et al. 1980).

Sample skewness and kurtosis are rather unreliable estimators of the corresponding parameters in small samples. They are better estimators when your sample is very large. However, large values of skewness or kurtosis may merit attention even in small samples because such values indicate that statistical methods that are based on normality assumptions may be inappropriate.

One especially important family of theoretical distributions is the normal or Gaussian distribution. A normal distribution is a smooth symmetric function often referred to as "bell-shaped." Its skewness and kurtosis are both zero. A normal distribution can be completely specified by only two parameters: the mean and the standard deviation. Approximately 68 percent of the values in a normal population are within one standard deviation of the population mean; approximately 95 percent of the values are within two standard deviations of the mean; and about 99.7 percent are within three standard deviations. Use of the term normal to describe this particular kind of distribution does not imply that other kinds of distributions are necessarily abnormal or pathological.
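The 68, 95, and 99.7 percent figures can be checked with the standard normal distribution function. The following DATA step is a small sketch, assuming the PROBNORM function (the standard normal cumulative probability); the variable names K and P are chosen only for illustration.

```sas
data _null_;
   file print;
   do k=1 to 3;
      /* P(|Z| <= k) for a standard normal variable Z */
      p=probnorm(k)-probnorm(-k);
      put 'Within ' k 1. ' standard deviation(s): ' p 6.4;
   end;
run;
```

The three probabilities come out near 0.6827, 0.9545, and 0.9973, which round to the percentages quoted above.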

Many statistical methods are designed under the assumption that the population being sampled is normally distributed. Nevertheless, most real-life populations do not have normal distributions. Before using any statistical method based on normality assumptions, you should consult the statistical literature to find out how sensitive the method is to nonnormality and, if necessary, check your sample for evidence of nonnormality.

In the following example, SAS generates a sample from a normal distribution with a mean of 50 and a standard deviation of 10. The UNIVARIATE procedure performs tests for location and normality. Because the data are from a normal distribution, all p-values from the tests for normality are greater than 0.15. The CHART procedure displays a histogram of the observations. The histogram has the bell-like shape of a normal density.

options nodate pageno=1 linesize=64 pagesize=52;

title '10000 Obs sample from a Normal Distribution';
title2 'with Mean=50 and Standard Deviation=10';

data normaldat;
   drop n;
   do n=1 to 10000;
      X=10*rannor(53124)+50;
      output;
   end;
run;

proc univariate data=normaldat nextrobs=0 normal
                          mu0=50 loccount;
   var x;
run;
proc format;
   picture msd
      20='20 3*Std' (noedit)
      30='30 2*Std' (noedit)
      40='40 1*Std' (noedit)
      50='50 Mean ' (noedit)
      60='60 1*Std' (noedit)
      70='70 2*Std' (noedit)
      80='80 3*Std' (noedit)
   other=' ';
run;
options linesize=64 pagesize=42;  

proc chart;
   vbar x / midpoints=20 to 80 by 2;
   format x msd.;
run;



Sampling Distribution of the Mean

If you repeatedly draw samples of size n from a population and compute the mean of each sample, then the sample means themselves have a distribution. Consider a new population consisting of the means of all the samples that could possibly be drawn from the original population. The distribution of this new population is called a sampling distribution.

It can be proven mathematically that if the original population has mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of the mean also has mean $\mu$, but its standard deviation is $\sigma/\sqrt{n}$. The standard deviation of the sampling distribution of the mean is called the standard error of the mean. The standard error of the mean provides an indication of the accuracy of a sample mean as an estimator of the population mean.
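To see how quickly the standard error shrinks as the sample size grows, the following sketch evaluates the standard deviation of the population divided by the square root of the sample size for a few sample sizes. The population standard deviation of 1 and the sample sizes are assumed values chosen only for illustration.

```sas
data _null_;
   file print;
   sigma=1;                    /* assumed population standard deviation */
   do n=10, 50, 1000;
      stderr=sigma/sqrt(n);    /* standard error of the mean */
      put n= stderr= 6.4;
   end;
run;
```

Under these assumptions the standard error is about 0.316 for samples of 10, 0.141 for samples of 50, and 0.032 for samples of 1000. Quadrupling the sample size halves the standard error.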

If the original population has a normal distribution, then the sampling distribution of the mean is also normal. If the original distribution is not normal but does not have excessively long tails, then the sampling distribution of the mean can be approximated by a normal distribution for large sample sizes.

The following example consists of three separate programs that show how the sampling distribution of the mean can be approximated by a normal distribution as the sample size increases. The first DATA step uses the RANEXP function to create a sample of 1000 observations from an exponential distribution. The theoretical population mean is 1.00, while the sample mean is 1.01, to two decimal places. The population standard deviation is 1.00; the sample standard deviation is 1.04.

This is an example of a nonnormal distribution. The population skewness is 2.00, which is close to the sample skewness of 1.97. The population kurtosis is 6.00, but the sample kurtosis is only 4.80.

options nodate pageno=1 linesize=64 pagesize=42;

title '1000 Observation Sample';
title2 'from an Exponential Distribution';

data expodat;
   drop n;
   do n=1 to 1000;
      X=ranexp(18746363);
      output;
   end;
run;
proc format;
    value axisfmt
      .05='0.05'
      .55='0.55'
     1.05='1.05'
     1.55='1.55'
     2.05='2.05'
     2.55='2.55'
     3.05='3.05'
     3.55='3.55'
     4.05='4.05'
     4.55='4.55'
     5.05='5.05'
     5.55='5.55'
     other=' ';
run;

proc chart data=expodat ;
   vbar x / axis=300 
            midpoints=0.05 to 5.55 by .1;
   format x axisfmt.;
run;
options pagesize=64;

proc univariate data=expodat nextrobs=0 normal
                mu0=1;
   var x;
run;

The next DATA step generates 1000 different samples from the same exponential distribution. Each sample contains ten observations. The MEANS procedure computes the mean of each sample. In the data set that is created by PROC MEANS, each observation represents the mean of a sample of ten observations from an exponential distribution. Thus, the data set is a sample from the sampling distribution of the mean for an exponential population.

PROC UNIVARIATE displays statistics for this sample of means. Notice that the mean of the sample of means is .99, almost the same as the mean of the original population. Theoretically, the standard deviation of the sampling distribution is $\sigma/\sqrt{n} = 1/\sqrt{10} \approx 0.32$, whereas the standard deviation of this sample from the sampling distribution is .30. The skewness (.55) and kurtosis (-.006) are closer to zero in the sample from the sampling distribution than in the original sample from the exponential distribution. This is so because the sampling distribution is closer to a normal distribution than is the original exponential distribution. The CHART procedure displays a histogram of the 1000 sample means. The shape of the histogram is much closer to a bell-like, normal density, but it is still distinctly lopsided.

options nodate pageno=1 linesize=64 pagesize=48;

title '1000 sample Means with 10 Obs per Sample';
title2 'Drawn from an Exponential Distribution';

data samp10;
   drop n;
   do Sample=1 to 1000;
      do n=1 to 10;
         X=ranexp(433879);
         output;
      end;
   end;

proc means data=samp10 noprint;
   output out=mean10 mean=Mean;
   var x;
   by sample;
run;
 proc format;
     value axisfmt
       .05='0.05'
       .55='0.55'
      1.05='1.05'
      1.55='1.55'
      2.05='2.05'
      other=' ';
 run;

proc chart data=mean10;
   vbar mean/axis=300
             midpoints=0.05 to 2.05 by .1;
   format mean axisfmt.;
run;
options pagesize=64;
proc univariate data=mean10 nextrobs=0 normal
                mu0=1;
   var mean;
run;

In the following DATA step, the size of each sample from the exponential distribution is increased to 50. The standard deviation of the sampling distribution is smaller than in the previous example because the size of each sample is larger. Also, the sampling distribution is even closer to a normal distribution, as can be seen from the histogram and the skewness.

options nodate pageno=1 linesize=64 pagesize=48;

title '1000 sample Means with 50 Obs per Sample';
title2 'Drawn from an Exponential Distribution';

data samp50;
   drop n;
   do sample=1 to 1000;
      do n=1 to 50;
         X=ranexp(72437213);
         output;
      end;
   end;

proc means data=samp50 noprint;
   output out=mean50 mean=Mean;
   var x;
   by sample;
run;
proc format;
   value axisfmt
       .05='0.05'
       .55='0.55'
      1.05='1.05'
      1.55='1.55'
      2.05='2.05'
      2.55='2.55'
      other=' ';
run;

proc chart data=mean50;
   vbar mean / axis=300
               midpoints=0.05 to 2.55 by .1;
   format mean axisfmt.;
run;
options pagesize=64;

proc univariate data=mean50 nextrobs=0 normal
                mu0=1;
   var mean;
run;

The purpose of the statistical methods that have been discussed so far is to estimate a population parameter by means of a sample statistic. Another class of statistical methods is used for testing hypotheses about population parameters or for measuring the amount of evidence against a hypothesis.

Consider the universe of students in a college. Let the variable X be the number of pounds by which a student's weight deviates from the ideal weight for a person of the same sex, height, and build. You want to find out whether the population of students is, on the average, underweight or overweight. To this end, you have taken a random sample of X values from nine students, with results as given in the following DATA step:

title 'Deviations from Normal Weight';

data x;
   input X @@;
   datalines;
-7 -2 1 3 6 10 15 21 30
;

You can define several hypotheses of interest. One hypothesis is that, on the average, the students are of exactly ideal weight. If $\mu$ represents the population mean of the X values, you can write this hypothesis, called the null hypothesis, as H0: $\mu = 0$. The other two hypotheses, called alternative hypotheses, are that the students are underweight on the average, H1: $\mu < 0$, and that the students are overweight on the average, H2: $\mu > 0$.

The null hypothesis is so called because in many situations it corresponds to the assumption of "no effect" or "no difference." However, this interpretation is not appropriate for all testing problems. The null hypothesis is like a straw man that can be toppled by statistical evidence. You decide between the alternative hypotheses according to which way the straw man falls.

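A naive way to approach this problem would be to look at the sample mean $\bar{x}$ and decide among the three hypotheses according to the following rule:

  • If $\bar{x} < 0$, decide on H1: $\mu < 0$.
  • If $\bar{x} = 0$, decide on H0: $\mu = 0$.
  • If $\bar{x} > 0$, decide on H2: $\mu > 0$.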

The trouble with this approach is that there may be a high probability of making an incorrect decision. If H0 is true, you are nearly certain to make a wrong decision because the chances of $\bar{x}$ being exactly zero are almost nil. If $\mu$ is slightly less than zero, so that H1 is true, there may be nearly a 50 percent chance that $\bar{x}$ will be greater than zero in repeated sampling, so the chances of incorrectly choosing H2 would also be nearly 50 percent. Thus, you have a high probability of making an error if $\bar{x}$ is near zero. In such cases, there is not enough evidence to make a confident decision, so the best response may be to reserve judgment until you can obtain more evidence.

The question is, how far from zero must $\bar{x}$ be for you to be able to make a confident decision? The answer can be obtained by considering the sampling distribution of $\bar{x}$. If X has a roughly normal distribution, then $\bar{x}$ has an approximately normal sampling distribution. The mean of the sampling distribution of $\bar{x}$ is $\mu$. Assume temporarily that $\sigma$, the standard deviation of X, is known to be 12. Then the standard error of $\bar{x}$ for samples of nine observations is $\sigma/\sqrt{n} = 12/\sqrt{9} = 4$.

You know that about 95 percent of the values from a normal distribution are within two standard deviations of the mean, so about 95 percent of the possible samples of nine X values have a sample mean $\bar{x}$ between $0 - 2(4)$ and $0 + 2(4)$, or between -8 and 8. Consider the chances of making an error with the following decision rule:

  • If $\bar{x} < -8$, decide on H1: $\mu < 0$.
  • If $-8 \le \bar{x} \le 8$, reserve judgment.
  • If $\bar{x} > 8$, decide on H2: $\mu > 0$.

If H0 is true, then in about 95 percent of the possible samples $\bar{x}$ will be between the critical values -8 and 8, so you will reserve judgment. In these cases the statistical evidence is not strong enough to fell the straw man. In the other 5 percent of the samples you will make an error; in 2.5 percent of the samples you will incorrectly choose H1, and in 2.5 percent you will incorrectly choose H2.

The price you pay for controlling the chances of making an error is the necessity of reserving judgment when there is not sufficient statistical evidence to reject the null hypothesis.

Significance and Power

The probability of rejecting the null hypothesis if it is true is called the Type I error rate of the statistical test and is typically denoted as $\alpha$. In this example, an $\bar{x}$ value less than -8 or greater than 8 is said to be statistically significant at the 5 percent level. You can adjust the type I error rate according to your needs by choosing different critical values. For example, critical values of -4 and 4 would produce a significance level of about 32 percent, while -12 and 12 would give a type I error rate of about 0.3 percent.
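The significance levels quoted for the different critical values can be reproduced from the normal distribution. The sketch below assumes, as in the example, a known standard deviation of 12 and samples of nine observations, so the standard error is 4; the variable names SE, CRIT, and ALPHA are hypothetical.

```sas
data _null_;
   file print;
   se=12/sqrt(9);   /* standard error of the mean: sigma=12, n=9 */
   do crit=4, 8, 12;
      /* two-tailed probability that the sample mean falls
         outside (-crit, crit) when the null hypothesis is true */
      alpha=2*(1-probnorm(crit/se));
      put crit= alpha= 6.4;
   end;
run;
```

The three rates are about 0.3173, 0.0455, and 0.0027. The "5 percent" level for critical values of -8 and 8 is therefore a rounded figure that comes from using two standard errors rather than the more exact 1.96.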

The decision rule is a two-tailed test because the alternative hypotheses allow for population means either smaller or larger than the value specified in the null hypothesis. If you were interested only in the possibility of the students being overweight on the average, you could use a one-tailed test:

  • If $\bar{x} \le 8$, reserve judgment.
  • If $\bar{x} > 8$, decide on H2: $\mu > 0$.

For this one-tailed test, the type I error rate is 2.5 percent, half that of the two-tailed test.

The probability of rejecting the null hypothesis if it is false is called the power of the statistical test and is typically denoted as $1-\beta$. $\beta$ is called the Type II error rate, which is the probability of not rejecting a false null hypothesis. The power depends on the true value of the parameter. In the example, assume the population mean is 4. The power for detecting H2 is the probability of getting a sample mean greater than 8. The critical value 8 is one standard error higher than the population mean 4. The chance of getting a value at least one standard deviation greater than the mean from a normal distribution is about 16 percent, so the power for detecting the alternative hypothesis H2 is about 16 percent. If the population mean were 8, the power for H2 would be 50 percent, whereas a population mean of 12 would yield a power of about 84 percent.
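The power figures above follow from the same normal calculation, shifted to the assumed true mean. The following DATA step is a sketch under the example's assumptions (standard error 4, critical value 8); the variable names are hypothetical.

```sas
data _null_;
   file print;
   se=12/sqrt(9);   /* standard error of the mean: sigma=12, n=9 */
   crit=8;          /* upper critical value of the decision rule */
   do mu=4, 8, 12;
      /* probability that the sample mean exceeds crit
         when the true population mean is mu */
      power=1-probnorm((crit-mu)/se);
      put mu= power= 6.4;
   end;
run;
```

The results are about 0.1587, 0.5000, and 0.8413, matching the 16, 50, and 84 percent quoted in the text.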

The smaller the type I error rate is, the less the chance of making an incorrect decision, but the higher the chance of having to reserve judgment. In choosing a type I error rate, you should consider the resulting power for various alternatives of interest.

Student's t Distribution

In practice, you usually cannot use any decision rule that uses a critical value based on $\sigma$ because you do not usually know the value of $\sigma$. You can, however, use $s$ as an estimate of $\sigma$. Consider the following statistic:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

This t statistic is the difference between the sample mean and the hypothesized mean $\mu_0$ divided by the estimated standard error of the mean.

If the null hypothesis is true and the population is normally distributed, then the t statistic has what is called a Student's t distribution with $n-1$ degrees of freedom. This distribution looks very similar to a normal distribution, but the tails of the Student's t distribution are heavier. As the sample size gets larger, the sample standard deviation becomes a better estimator of the population standard deviation, and the t distribution gets closer to a normal distribution.

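You can base a decision rule on the t statistic:

  • If $t < -2.3$, decide on H1: $\mu < 0$.
  • If $-2.3 \le t \le 2.3$, reserve judgment.
  • If $t > 2.3$, decide on H2: $\mu > 0$.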

The value 2.3 was obtained from a table of Student's t distribution to give a type I error rate of 5 percent for 8 (that is, $9-1=8$) degrees of freedom. Most common statistics texts contain a table of Student's t distribution. If you do not have a statistics text handy, you can use the DATA step and the TINV function to print any values from the t distribution.
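As a rough illustration of how the t critical value approaches the normal one as the degrees of freedom grow, the following DATA step prints two-tailed 5 percent critical values for several degrees of freedom. It assumes the TINV and PROBIT functions; the particular degrees of freedom are arbitrary choices for illustration.

```sas
data _null_;
   file print;
   /* critical value from the standard normal distribution */
   z=probit(.975);
   put 'Normal: ' z 6.3;
   do df=8, 30, 100, 1000;
      /* critical value from Student's t with df degrees of freedom */
      t=tinv(.975,df);
      put df= t= 6.3;
   end;
run;
```

The t value is about 2.306 for 8 degrees of freedom and moves toward the normal value of about 1.960 as the degrees of freedom increase.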

By default, PROC UNIVARIATE computes a t statistic for the null hypothesis that $\mu = 0$, along with related statistics. Use the MU0= option in the PROC statement to specify another value for the null hypothesis.

This example uses the data on deviations from normal weight, which consist of nine observations. First, PROC MEANS computes the t statistic for the null hypothesis that $\mu = 0$. Then, the TINV function in a DATA step computes the value of Student's t distribution for a two-tailed test at the 5 percent level of significance and 8 degrees of freedom.

data devnorm;
   title 'Deviations from Normal Weight';
   input X @@;
   datalines;
-7 -2 1 3 6 10 15 21 30
;

proc means data=devnorm maxdec=3 n mean
           std stderr t probt;
run;

title 'Student''s t Critical Value';

data _null_;
   file print;
   t=tinv(.975,8);
   put t 5.3;
run;

In the current example, the value of the t statistic is 2.18, which is less than the critical t value of 2.3 (for a 5 percent significance level and 8 degrees of freedom). Thus, at a 5 percent significance level you must reserve judgment. If you had elected to use a 10 percent significance level, the critical value of the t distribution would have been 1.86 and you could have rejected the null hypothesis. The sample size is so small, however, that the validity of your conclusion depends strongly on how close the distribution of the population is to a normal distribution.

Probability Values

Another way to report the results of a statistical test is to compute a probability value or p-value. A p-value gives the probability, in repeated sampling when the null hypothesis is true, of obtaining a statistic as far in the direction(s) specified by the alternative hypothesis as is the value actually observed. A two-tailed p-value for a t statistic is the probability of obtaining an absolute t value that is greater than the observed absolute t value. A one-tailed p-value for a t statistic for the alternative hypothesis $\mu > \mu_0$ is the probability of obtaining a t value greater than the observed t value. Once the p-value is computed, you can perform a hypothesis test by comparing the p-value with the desired significance level. If the p-value is less than or equal to the type I error rate of the test, the null hypothesis can be rejected. The two-tailed p-value, labeled Pr > |t| in the PROC MEANS output, is .0606, so the null hypothesis could be rejected at the 10 percent significance level but not at the 5 percent level.

A p-value is a measure of the strength of the evidence against the null hypothesis. The smaller the p-value, the stronger the evidence for rejecting the null hypothesis.
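A p-value for a t statistic can also be computed directly from the t distribution function. The sketch below assumes the PROBT function and uses the rounded t statistic of 2.18 with 8 degrees of freedom from the example; the variable names P_TWO and P_ONE are hypothetical.

```sas
data _null_;
   file print;
   t=2.18;   /* observed t statistic from the example, rounded */
   df=8;     /* degrees of freedom: n-1 for nine observations */
   /* two-tailed: probability of an absolute t value above |t| */
   p_two=2*(1-probt(abs(t),df));
   /* one-tailed: probability of a t value above t */
   p_one=1-probt(t,df);
   put p_two= 6.4 p_one= 6.4;
run;
```

Because the t statistic is rounded here, the two-tailed result is close to, but not exactly, the .0606 that PROC MEANS reports. The one-tailed value is half the two-tailed value.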

Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.

Tue, 04 Jan 2022 05:46:00 -0600 text/html https://www.sfu.ca/sasdoc/sashtml/proc/ztatback.htm
Surgical versus Nonsurgical Therapy for Lumbar Spinal Stenosis
Surgical candidates with a history of at least ... non-normal secondary outcomes in SAS software, version 9.1 (SAS Institute). Statistical significance was defined as P<0.05 on the basis of ...
Tue, 21 Nov 2023 10:00:00 -0600 en-US text/html https://www.nejm.org/doi/10.1056/NEJMoa0707136

The mystery of Vivek Ramaswamy’s rapid rise in the polls

Two new polls of Republican primary voters released on Thursday showed former President Donald Trump in first place by a wide margin. But what was startling was who came in second.

The first shows Florida Gov. Ron DeSantis in his usual spot far behind Trump. The other shows 38-year-old first-time political candidate Vivek Ramaswamy edging out DeSantis for second place.

Ramaswamy’s early rise represents the most significant movement in the still-nascent race for the GOP presidential nomination. Or does it?

There’s no question that Ramaswamy has come out of nowhere to become a surprisingly interesting candidate to the GOP electorate. But there are some methodological curiosities that raise questions about just where Ramaswamy fits within the tiers of Republican hopefuls below the dominant frontrunner.

Ascertaining Ramaswamy’s true standing isn’t just an academic exercise. The Republican National Committee says it will use polling to determine podium order at its first sanctioned debate later this month, so Ramaswamy will likely be at or near the center of the stage if Trump chooses not to participate.

Overall, polling averages put him in third place. In RealClearPolitics’ average, Ramaswamy is at 6.1 percent, behind only Trump (54.2 percent) and DeSantis (15.1 percent), but ahead of Mike Pence (5.2 percent), Nikki Haley (3.4 percent), Tim Scott (2.8 percent) and Chris Christie (2.6 percent). FiveThirtyEight’s polling average shows Ramaswamy even higher, at 7.5 percent, 2 points clear of Pence for third.

Ramaswamy’s strength comes almost entirely from polls conducted over the internet, according to a POLITICO analysis. In internet surveys over the past month — the vast majority of which are conducted among panels of people who sign up ahead of time to complete polls, often for financial incentives — Ramaswamy earns an average of 7.8 percent, a clear third behind Trump and DeSantis.

In polls conducted mostly or partially over the telephone, in which people are contacted randomly, not only does Ramaswamy lag his average score — he’s way back in seventh place, at just 2.6 percent.

There’s no singular, obvious explanation for the disparity, but there are some leading theories for it, namely the demographic characteristics and internet literacy of Ramaswamy’s supporters, along with the complications of an overly white audience trying to pronounce the name of a son of immigrants from India over the phone.

The two polls released on Thursday stand as examples of the discrepancy. A Fairleigh Dickinson University poll had Ramaswamy at 3 percent, tied with Haley for fifth place and behind both Christie and Pence.

Meanwhile, a web-panel poll from the Republican firm Cygnal had him at 11 percent, 1 point ahead of DeSantis. (Cygnal is also working for Ramaswamy’s campaign, though this poll was commissioned independently of that effort, and it isn’t the only one to show him cracking into the low double-digits.)

There’s no doubt Ramaswamy is rising. The only question is how much. While he’s at 6.1 percent in the RealClearPolitics average now, he was at 3.1 percent a month ago and 2.2 percent the month prior.

And unlike other candidates who have invested significant financial resources to gain traction — like Scott and North Dakota Gov. Doug Burgum — Ramaswamy has done it almost entirely on the back of earned media. According to AdImpact, Ramaswamy, who is mostly self-funding his campaign, has spent $1.8 million on TV and digital advertising, far less than the other contenders, some of whom have already cracked $20 million on ads when combined with supportive super PACs.

Instead, the attention Ramaswamy has garnered comes from TV and digital media catering to the highest-intensity voters paying the closest attention to the race thus far, even though the first vote is still more than five months away. And high-information, high-interest voters — those who might be called the most “online” — are also likely to be over-represented in opt-in internet polls.

This could change when he’s on the debate stage later this month, but the information those voters are receiving about Ramaswamy is also almost uniformly positive so far. According to Morning Consult’s weekly tracking of the race, 36 percent of GOP primary voters reported hearing something positive about him in the past week, a greater percentage than for any other candidate, even though a slim majority, 52 percent, said they hadn’t heard anything about him at all.

Polls also generally show that Ramaswamy is favored more by younger Republicans and voters with college degrees, two groups that are often better represented in online polls than in phone surveys.

Just because there are apparent methodological effects at play doesn’t mean one batch of polls is more likely to be closer to reality than the other. And it’s certainly possible that they’ll converge as the race goes along, particularly once larger numbers of voters get to know the candidates through televised debates and other more closely followed events.

On the other hand, there’s one reason why Ramaswamy’s support might actually be artificially lower on the phone than in point-and-click internet polls: his name.

Pollsters work hard to ensure that their interviewers pronounce the candidates’ names correctly. They provide pronunciation guides (like this one in a recent New York Times/Siena College poll: “Viv-AKE Rahm-uh-SWAM-ee”), and call centers monitor some of the interviews to make sure their employees are saying it right. (If you’ve ever heard that one of your phone calls was “being monitored for quality assurance,” this is how it works.)

And then, in order for a respondent to choose Ramaswamy in a phone poll, he or she will have to repeat the name back to the interviewer. And the national Republican electorate is definitely older and whiter than the country as a whole: In a recent New York Times/Siena College poll, more than 80 percent of likely GOP primary voters were white, and 38 percent were 65 or older.

“When your candidate is named Vivek Ramaswamy,” said one Republican pollster, granted anonymity to discuss the polling dynamics candidly, “that’s like DEFCON 1 for confusion and mispronunciation.”

Sat, 12 Aug 2023 00:50:00 -0500 en text/html https://www.politico.com/news/2023/08/12/vivek-ramaswamy-polls-rise-00110937
MATH.5760 Statistical Programming using SAS (Formerly 92.576)
Id: 008449 Credits Min: 3 Credits Max: 3

Description

An introduction to creation and manipulation of databases and statistical analysis using SAS software. SAS is widely used in the pharmaceutical industry, medical research and other areas. Cannot be used as a Math Elective.

View Current Offerings
Tue, 10 Oct 2023 14:50:00 -0500 en text/html https://www.uml.edu/catalog/courses/MATH/5760
New Hampshire Institute of Politics holds Lesser-Known Candidates Forum

Not all the people running for president are names that are widely known.

On Thursday night, the New Hampshire Institute of Politics at Saint Anselm College held its Lesser-Known Candidates Forum.

The tradition goes back to 1972.

“One man can't do the job, it's just too much,” said Mary Maxwell, a Republican presidential candidate.

“I’ve lived in 11 different cities and two countries, and in the wisdom I’ve gained from all of these places, I’ve learned that everyone in those communities values their family, their friends, and their community most of all,” said Gabriel Cornejo, a Democratic presidential candidate.

“Too many young people are being gunned down in our country every day, too many people are losing their children, and too many people are losing their parents in this country,” said Darius Mitchell, a Republican presidential candidate.

“Vermin Supreme will take away your guns and supply you better ones. These better guns will shoot marshmallows, but they will still be lethal,” said Vermin Supreme, a Democratic presidential candidate.

Twenty candidates took part: 14 Democrats and six Republicans.

The New Hampshire primary will take place on Jan. 23, 2024. WMUR-TV and ABC News will be hosting a debate days before the first-in-the-nation primary.

Thu, 07 Dec 2023 15:00:00 -0600 en text/html https://www.wmur.com/article/new-hampshire-iop-lesser-known-candidates-forum-23/46069998
El Salvador general election latest polls by candidate 2023

El Mundo. (November 13, 2023). Polls on the 2024 Salvadoran presidential election prospects in November 2023, by candidate [Graph]. In Statista. Retrieved January 05, 2024, from https://www.statista.com/statistics/1426582/el-salvador-election-poll-candidate/

Tue, 05 Dec 2023 10:00:00 -0600 en text/html https://www.statista.com/statistics/1426582/el-salvador-election-poll-candidate/
Mexico general election latest polls by candidate 2024

Poder360 & Facebook. (November 13, 2023). Polls on the 2024 Mexican presidential elections in November 2023, by candidate [Graph]. In Statista. Retrieved January 05, 2024, from https://www.statista.com/statistics/1424986/mexico-election-poll-candidate/

Tue, 05 Dec 2023 10:00:00 -0600 en text/html https://www.statista.com/statistics/1424986/mexico-election-poll-candidate/
Federal Statistical Office of Germany

4 January 2024 Inflation rate of +3.7% expected in December 2023

The inflation rate in Germany is expected to be +3.7% in December 2023. The inflation rate is measured as the change in the consumer price index (CPI) compared with the same month a year earlier. Based on the results available so far, the Federal Statistical Office (Destatis) also reports that consumer prices are expected to increase by 0.1% compared with November 2023. The annual average inflation rate is expected to stand at +5.9% in 2023.
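The definition above (change in the CPI compared with the same month a year earlier) can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the index values are hypothetical, chosen so the result matches the +3.7% headline figure, and they are not actual Destatis index levels.

```python
def year_over_year_inflation(cpi_now: float, cpi_year_ago: float) -> float:
    """Return the percentage change in the CPI versus the same month a year earlier."""
    if cpi_year_ago <= 0:
        raise ValueError("cpi_year_ago must be positive")
    return (cpi_now / cpi_year_ago - 1.0) * 100.0


# Hypothetical index values, chosen only to illustrate the arithmetic.
# They are NOT figures published by Destatis.
cpi_december_previous_year = 100.0
cpi_december_current_year = 103.7

rate = year_over_year_inflation(cpi_december_current_year, cpi_december_previous_year)
print(f"Inflation rate: {rate:+.1f}%")
```

The same function would apply to the month-on-month comparison if the previous month's index is passed as the second argument.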

Wed, 03 Jan 2024 10:00:00 -0600 en text/html https://www.destatis.de/EN/Home/_node.html




Killexams.com A00-240 Exam Simulator Screens


Exam Simulator 3.0.9 uses the actual SASInstitute A00-240 questions and answers that make up the Exam Braindumps. The A00-240 Exam Simulator is a full-screen Windows application that provides the same test environment you experience in a test center.

About Us


We are a group of Certified Professionals, working hard to provide up to date and 100% valid test questions and answers.

Who We Are

We help people pass their complicated and difficult SASInstitute A00-240 exams with shortcut SASInstitute A00-240 Exam Braindumps that we collect from the professional team at Killexams.com.

What We Do

We provide actual SASInstitute A00-240 questions and answers in Exam Braindumps that we obtain from killexams.com. These Exam Braindumps contain up-to-date SASInstitute A00-240 questions and answers that help you pass the exam on the first attempt. Killexams.com develops the Exam Simulator for a realistic exam experience. The Exam Simulator helps you memorize and practice questions and answers. We take premium exams from Killexams.com.

Why Choose Us

The Exam Braindumps that we provide are updated on a regular basis. All questions and answers are verified and corrected by certified professionals. Online test help is provided 24x7 by our certified professionals. Our source of exam questions is killexams.com, which is the best certification exam Braindumps provider in the market.

  • 97,860 happy clients
  • 245 vendors
  • 6,300 exams provided
  • 7,110 testimonials

Premium A00-240 Full Version


Our premium A00-240 (SAS Statistical Business Analysis SAS9: Regression and Model) full version contains a complete question bank of actual exam questions. Premium A00-240 braindumps are updated on a regular basis and verified by certified professionals. There is a one-time payment covering 3 months, with no auto-renewal and no hidden charges. During those 3 months, any change in the exam questions and answers will be available in your download section, and you will be notified by email to re-download the exam file after an update.

Contact Us


We provide Live Chat and Email Support 24x7. Our certification team is available by email only. Order and troubleshooting support is available 24x7.

4127 California St,
San Francisco, CA 22401

+1 218 180 22490