Today we would like to take a look at the benefits and limitations of a computer adaptive test. Not that we can persuade the GMAC to bring back the paper test; rather, we would like you to be acutely aware of the upsides and downsides of taking such a test so that you can best acclimate yourself to the test environment.
In general, a CAT greatly increases the flexibility of test management. The key benefits include:
Despite the above advantages, computer adaptive tests have numerous limitations, and they raise several technical and procedural issues. Here we just focus on the limitations for the GMAT.
Conclusion: Practice makes perfect! Prepare with more CATs. Read long articles on your computer screen. Take mock tests in a setting similar to your test center, at the same time of day as your actual test. Reduce your response time in the areas you are best at, for example by getting your Sentence Correction time down to less than 1 minute per question. That way you can save time for the question types that you are less confident about and achieve a higher total score!
Scores, Percentiles and Their Significance
On the traditional test, all questions were worth equal points. On the GMAT CAT each question is assigned points based on the level of difficulty. There are three factors that determine your final score. The first two factors have the most bearing on your score.
We discussed the third factor in our previous edition. In effect, it measures how well rounded you are across all areas. An individual who scores well in only a few areas will have a lower score than one who scores well in all areas. Your performance across the tested topics is quantified by standard deviation: the deviation between the different areas is calculated, and the more your results deviate between sections (i.e., the higher the standard deviation), the lower your GMAT score. This third factor doesn’t carry as much weight in the final score as the number and difficulty of the questions.
The initial (raw) score from the GMAT CAT is the ability level that corresponds to the response pattern on the administered questions. The two raw scores are combined (not the scaled scores) to form a total scaled score from 200-800. The raw score is also transformed to the Quantitative or Verbal scaled scores. For Verbal and Quantitative sections the scaled score ranges from 0 to 60.
In your score report you are also given a percentile score. This percentile is the percentage of test-takers you scored better than. Percentile rankings are based on the entire GMAT test-taking population during the three most recent years. So if your score is in the 85th percentile, you did better than 85% of all the people who took the test in the past three years. While GMAT scores are only a part of your overall application, a high percentile rank demonstrates that you are stronger than most test-takers in the Verbal and/or Quantitative areas.
For Verbal, you will be in the 99th percentile if your scaled score is at or above 46. Quantitative is much more competitive by comparison: the 99th percentile requires a scaled score of 51. This means fewer people score high on Verbal than on Quantitative. So make sure you don’t drag your total score down with a mediocre performance on Verbal.
There is a degree of error with the GMAT, as with all standardized tests. The standard error of difference for the total GMAT score is about 41, according to the Graduate Management Admission Council. This means that the difference between the total GMAT scores actually received by two test takers could be within 41 points above or below the difference between the test takers’ scores of true ability. The standard error of difference for the Verbal scaled score is 3.9, and for the Quantitative scaled score 4.3. Research also indicates that a test-taker will most likely earn a Total score within about 30 points of a score of true ability. Your Verbal and Quantitative scaled scores are probably within about 2.9 points of your true scores.
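To see how these figures fit together, here is a small sketch. It rests on two assumptions of ours that the figures above do not state: that the "within about 30 points" and "within about 2.9 points" figures behave like standard errors of measurement, and that two test takers' errors are independent, so the standard error of a difference is the square root of the sum of the squared errors. The function name is ours.

```python
import math

def standard_error_of_difference(sem_a, sem_b):
    """Standard error of the difference between two scores,
    assuming independent measurement errors (an assumption of this sketch)."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)

# Figures quoted above, treated here as standard errors of measurement.
TOTAL_SEM = 30.0
SECTION_SEM = 2.9

total_sed = standard_error_of_difference(TOTAL_SEM, TOTAL_SEM)
section_sed = standard_error_of_difference(SECTION_SEM, SECTION_SEM)

print(f"Total score: about {total_sed:.1f} points")
print(f"Section scaled score: about {section_sed:.1f} points")
```

Under these assumptions the results come out at roughly 42 and 4.1, in the same neighborhood as the quoted 41, 3.9 and 4.3.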
GMAT scores are a relatively reliable predictor of academic performance in the first year of a business school program. Studies have shown that the median correlation between GMAT scores and first-year grades was 0.41 (perfect correlation is 1.0). Comparing 0.41 to the median correlation of 0.28 between undergraduate grade point average and first-year grades, you can conclude that business schools do have a strong incentive to see good GMAT scores from applicants. Because there is a degree of error, we should all exercise caution in comparing two scores. That is why the other parts of your business school application are just as crucial to your admission.
Computer-Adaptive Test Taking Strategies
Unlike the old paper-and-pencil GMAT, the GMAT CAT is designed to measure your ability with fewer questions. On the old paper-and-pencil GMAT you would answer 61 questions of varying difficulty in each section. So an average test taker would breeze through the easy questions, get most of the difficult questions wrong, and get some of the medium-difficulty questions right and some wrong. With the CAT, you answer only 41 questions for the Verbal section and 37 for the Quantitative section, and they are tailored to match your level of ability. So the average test taker no longer wastes time answering the easy questions that he will most likely get right, or the really difficult questions that he will most likely get wrong.
You are given a question of moderate difficulty at the beginning of the test and as the first question of each question type. If you answer this question correctly, the difficulty level increases. If you answer incorrectly, the difficulty level decreases, and this up-down system continues for the duration of the exam. The size of the jump to a higher difficulty level, or of the drop to a lower one, shrinks as you move through the test.
The first few questions you answer will move you to a significantly more difficult or significantly easier level; the last few questions you answer, however, will only slightly increase or decrease in difficulty. Please also bear in mind that there is a penalty for not finishing a section. The details have not been released by the GMAC or Pearson, but for each unfinished section the penalty is about four times the points lost for an incorrectly answered question. If you run out of time, just answer the last questions randomly; at least you have a 20% chance of getting each one right. If these questions happen to be among the unscored trial questions, the impact on your score will most likely be small. (Roughly 37 out of 41 verbal questions and 33 out of 37 math questions are scored, so about 4 in each section are unscored.) We need to caution you against guessing in the early stage of the test. Since your chance of guessing correctly is only 20% for each question, an incorrect choice moves you down to a lower difficulty level very quickly at the beginning of the test. After a few wrong random guesses, the test settles on a level it considers appropriate for you, and it will be very hard to regain your momentum, because the CAT algorithm will not give you very difficult questions later on that would let you pile up some last-minute points.
In sum, at the beginning, in as few as four questions you can move up to the highest possible level by responding correctly to all four questions, or down to the lowest possible level by responding incorrectly to all of them. This system was developed to better “zero in” on your real skill level. Think of it as adjusting a lens. You first adjust the macro-focus to make sure you are in the right range of focus, and then you adjust the micro-focus to fine-tune until you reach the optimal focal point. The GMAT CAT uses a complex algorithm, which we explained earlier, to focus in on your real skill level.
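The lens analogy can be made concrete with a toy simulation. This is only a sketch under assumptions of our own, not the GMAT algorithm: the level moves up after a correct answer and down after an incorrect one, the step is halved after every question, and the level is kept on a -3 to +3 scale. The function name, starting step and scale are all hypothetical.

```python
def simulate_levels(answers, start=0.0, first_step=1.5, low=-3.0, high=3.0):
    """Return the difficulty level after each answer.

    Toy rule (an assumption, not the GMAT algorithm): move up after a
    correct answer and down after an incorrect one, halving the step
    size after every question and keeping the level within [low, high].
    """
    level, step, history = start, first_step, []
    for correct in answers:
        level += step if correct else -step
        level = max(low, min(high, level))
        history.append(level)
        step /= 2  # later questions move the level less
    return history

all_right = simulate_levels([True, True, True, True])
early_miss = simulate_levels([False, True, True, True])
print(all_right)   # [1.5, 2.25, 2.625, 2.8125]
print(early_miss)  # [-1.5, -0.75, -0.375, -0.1875]
```

In this toy version, missing only the first question and then answering three in a row correctly still leaves the level below where it started, which is the kind of effect that makes the early questions worth extra care.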
Therefore, please take particular care with the first few questions of each question type in both the Verbal and Quantitative sections. Sometimes you may be around the 10th question before you see a new type of verbal question. Whenever you see the first question of a new type, slow down and do your best, without spending more time on it than you need to. Otherwise, you will have to rush through later questions. It is essentially a balancing act in which you need to pace yourself from beginning to end in order to maximize your score.
Computer-Adaptive Testing Algorithm
As mentioned in yesterday’s post, there are three main types of statistics required of all items in an item bank –
Computer adaptive testing (“CAT”) can begin when such an item bank exists. However, two more steps are required. First, the test-maker needs to select a procedure for determining test-takers’ ability estimates based upon their performance on the tested items. Second, the test-maker needs to choose an algorithm for sequencing the set of test items to be administered to test-takers.
An ideal item pool for a computer adaptive test would be one with a large number of highly discriminating items well distributed at each ability level. The information functions for these items would appear as a series of peaked distributions across all levels of ability estimate.
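To make the idea of an information function concrete, here is a minimal sketch. It assumes the three-parameter logistic model described in our Item Response Theory entry (with its 1.7 scaling constant) and uses the standard information formula for that model. Picking the item with the most information at the current ability estimate is one common sequencing rule; we are not claiming it is the rule the GMAT uses. The item pool and function names are hypothetical.

```python
import math

D = 1.7  # scaling constant used in the logistic model

def p_correct(theta, a, b, c):
    """Probability of a correct response under the three-parameter model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c):
    """Information the item provides about a test-taker of ability theta."""
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def most_informative_item(theta, items):
    """Pick the item with the highest information at theta.

    Each item is a tuple (label, a, b, c).
    """
    return max(items, key=lambda item: item_information(theta, *item[1:]))

# Hypothetical pool: same discrimination and guessing, different difficulty.
pool = [
    ("easy", 1.0, -1.0, 0.2),
    ("medium", 1.0, 0.0, 0.2),
    ("hard", 1.0, 1.0, 0.2),
]

best = most_informative_item(0.9, pool)
print(best[0])  # hard
```

For a test-taker currently estimated at 0.9, the "hard" item (difficulty 1.0) is the most informative of the three, because each item's information peaks near its own difficulty.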
The CAT algorithm is usually an iterative process with the following steps:
We will continue with our analysis on the GMAT CAT scoring system tomorrow.
So you know what the GMAT is all about, but you’re unsure exactly how answering all of those questions results in a final score that could make or break your chances for admission into business school. In this and following entries we will break down for you the system behind your score and how the test is administered to obtain that score.
Item Response Theory
Item Response Theory (IRT) is the system used by computer-adaptive tests such as the GMAT CAT to determine which question is the “best” next question based on the demonstrated ability level of the test taker. It is a statistical model that relates the probability of a test-taker correctly answering a problem to the characteristics of the problem and the test-taker’s true ability. It was first introduced in 1968.
The IRT model states that the probability of a correct response to item i for test-taker X is a function of ai, bi, and ci and test-taker X’s true ability. A person’s estimated true score is denoted as theta (θ). True score is the score a test-taker would receive on a perfectly reliable test. Since all tests unavoidably contain error, true scores are a theoretical concept; in an actual testing program, we will never know an individual’s true score. However, we can compute an estimate of a test-taker’s true score, and we can estimate the amount of error in that estimate.
P(ui = 1 | ThetaX, ai, bi, ci) = ci + (1 - ci) / [1 + exp(-1.7 ai (ThetaX - bi))]
The model typically involves three parameters –
ai defines the ability of the item to discriminate between individual test-takers,
bi is the difficulty of the item, and
ci is the probability that the test-taker would get the question right solely by guessing.
On the GMAT, this model is used to determine your final score, i.e., where you stand on the ability scale, or, what your Theta is. For example, in the graph below, the horizontal axis is the ability scale, ranging from very low (-3.0) to very high (+3.0). When ability follows the normal curve, 68% of the test-takers will have ability between -1 and +1; 95% will be between -2.0 and +2.0. The vertical axis is the probability of responding correctly to this item.
The ai parameter defines the slope of the curve at its inflection point. The curve would be flatter with a lower value of ai and steeper with a higher value. Thus ai denotes how well the item is able to discriminate between test-takers of slightly different ability (within a narrow effective range).
The bi parameter defines the location of the curve’s inflection point along the theta scale. Lower values of bi shift the curve to the left; higher values shift it to the right. The value of bi does not affect the shape of the curve.
The lower asymptote is at ci = .25. (An asymptote is a line or curve that the curve being studied approaches more and more closely as one moves along it.) This is the probability of a correct response for test-takers with very little ability (e.g. theta = -2.0 or -2.6). The curve has an upper asymptote at 1.0; high-ability test-takers are very likely to respond correctly.
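The shape of this curve can be checked numerically with a short sketch of the three-parameter formula given above. The item used here is hypothetical: ci = .25 matches the example, while ai = 1.0 and bi = 0.0 are values we picked purely for illustration.

```python
import math

def p_correct(theta, a, b, c):
    """Probability of a correct response under the three-parameter model."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# Hypothetical item: a and b are illustrative; c = 0.25 matches the text.
A, B, C = 1.0, 0.0, 0.25

for theta in (-3.0, -2.0, 0.0, 2.0, 3.0):
    print(f"theta = {theta:+.1f}  P(correct) = {p_correct(theta, A, B, C):.3f}")
```

At very low ability the probability settles just above .25, at very high ability it approaches 1.0, and at theta equal to bi it is exactly halfway between the two asymptotes, (1 + ci) / 2 = 0.625.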
We will continue with our analysis on the GMAT CAT scoring system tomorrow.