

Today we would like to take a look at the benefits and limitations of a computer adaptive test (CAT). Not that we could persuade GMAC to bring back the paper test; rather, we would like you to be acutely aware of the upsides and downsides of such a test so that you can best acclimate yourself to the test environment.

In general, a CAT greatly increases the flexibility of test management. For test-takers, the key benefits include:

  • Tests can be taken year-round at any registered test center.
  • Unofficial scores are available immediately, expediting the B-school application process.
  • Tests are individually paced, so test-takers can choose a more suitable time of day, and special requirements for test-takers with disabilities can be better accommodated.
  • A CAT can provide more accurate scores than a traditional test over a wide range of abilities.

For test centers:

  • Minimal training of test administrators is required.
  • Test security is increased because hard-copy test booklets are not distributed.

For test-makers:

  • Question pools and scoring methods can be updated centrally and distributed all at once. Costs can decrease while quality and speed improve substantially.

Despite these advantages, computer adaptive tests have numerous limitations and raise several technical and procedural issues. Here we focus only on the limitations that matter for the GMAT.

  • Test-takers need to perform as well when reading a passage, question, or graph on a computer screen as they do on paper. The same applies to writing an essay.
  • Test-takers need to be comfortable taking a test on a computer, which means avoiding simple mistakes such as failing to select an answer before continuing. In addition, test-takers are usually not permitted to go back and change answers, so they must abandon long-standing paper test-taking habits. And as we all know, old habits die hard!
  • Because each examinee receives a different set of questions, there can be perceived inequities.
  • The pool of questions with the most desirable characteristics of a CAT item is limited. In the long run this affects test security, because people may try to memorize the harder questions and compare notes with others. Expanding the question pool addresses this issue; otherwise, either test quality degrades or a longer test is needed.

Conclusion: Practice makes perfect! Prepare with more CATs. Read long articles on your computer screen. Take mock tests in a setting similar to your test center at the same time of day as your actual test. Reduce your response time in the areas you are strongest in, for example, by getting your Sentence Correction time down to less than 1 minute per question. That way you save time for the question types you are less confident about and achieve a higher total score!

Posted on November 19, 2007 by Manhattan Review

This entry was posted in GMAT.

Scores, Percentiles and Their Significance

On the traditional test, all questions were worth the same number of points. On the GMAT CAT, each question is assigned points based on its difficulty level. Three factors determine your final score, and the first two have the most bearing on it:

  1. The number of questions you answered correctly.
  2. The difficulty level of each question you answered correctly.
  3. The range of cognitive abilities tested by the questions answered correctly.

We discussed the third factor in a previous post. In effect, it measures how well-rounded you are across all areas: an individual who scores well in only a few areas will receive a lower score than one who scores well in all areas. Your performance across the various tested topics is quantified by its standard deviation: the more your performance varies between areas (i.e., the higher the standard deviation), the lower your GMAT score. This third factor carries less weight in the final score than the number and difficulty of the questions you answer correctly.
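The post does not give GMAC's formula for this spread, and none is claimed here. The sketch below only illustrates what a "higher standard deviation" across areas means, using made-up per-area accuracy for two hypothetical test-takers with the same average. It computes the spread, not a GMAT score.

```python
from statistics import mean, pstdev

# Hypothetical per-area accuracy (fraction correct); both averages are 0.70.
generalist = {"algebra": 0.70, "geometry": 0.70,
              "data sufficiency": 0.70, "sentence correction": 0.70}
specialist = {"algebra": 0.95, "geometry": 0.95,
              "data sufficiency": 0.45, "sentence correction": 0.45}


def spread(area_scores):
    """Population standard deviation of per-area accuracy (0 means perfectly even)."""
    return pstdev(list(area_scores.values()))


for name, areas in [("generalist", generalist), ("specialist", specialist)]:
    print(f"{name}: mean={mean(list(areas.values())):.2f}, std dev={spread(areas):.2f}")
```

Same average, very different spread: under the post's description, the specialist's uneven profile is the one that pulls the score down.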

The initial (raw) score on each section of the GMAT CAT is the ability level that corresponds to your response pattern on the administered questions. Each raw score is transformed into a Verbal or Quantitative scaled score ranging from 0 to 60. The two raw scores (not the scaled scores) are combined to form a total scaled score ranging from 200 to 800.

Your score report also gives you a percentile rank: the percentage of test-takers you scored higher than. Percentile rankings are based on the entire GMAT test-taking population over the three most recent years. So if you scored in the 85th percentile, you did better than 85% of all the people who took the test in the past three years. While GMAT scores are only one part of your overall application, a high percentile rank shows that you outperformed most test-takers in the Verbal and/or Quantitative areas.
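The percentile definition above can be written as: count the reference scores strictly below yours and divide by the number of reference scores. This is a minimal sketch of that definition only; the reference scores are made up, and how GMAC actually handles ties and rounding is an assumption not covered here.

```python
from bisect import bisect_left


def percentile_rank(score, reference_scores):
    """Percentage of reference scores strictly below `score`."""
    ordered = sorted(reference_scores)
    below = bisect_left(ordered, score)  # number of reference scores < score
    return 100.0 * below / len(ordered)


# Hypothetical reference population of ten total scores
reference = [500, 550, 600, 650, 700, 720, 740, 760, 780, 800]
print(percentile_rank(750, reference))  # 7 of 10 scores are below 750
```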

For Verbal, a scaled score of 46 or above puts you in the 99th percentile. Quantitative is much more competitive: the 99th percentile requires a scaled score of 51. In other words, fewer people score high on Verbal than on Quantitative, so make sure a mediocre Verbal performance doesn't drag down your total score.

There is a degree of error in the GMAT, as in all standardized tests. According to the Graduate Management Admission Council, the standard error of difference for the total GMAT score is about 41. This means that the difference between the total scores two test-takers actually receive typically falls within about 41 points above or below the difference between their true-ability scores. The standard error of difference is 3.9 for the Verbal scaled score and 4.3 for the Quantitative scaled score. Research also indicates that a test-taker will most likely earn a Total score within about 30 points of their true-ability score, and Verbal and Quantitative scaled scores within about 2.9 points of their true scores.
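The two kinds of error figures above are linked. Assuming each score's measurement error is independent and of the same size (the standard error of measurement, SEM), the error in the difference of two scores is √(SEM² + SEM²) = SEM·√2. The sketch divides the published standard errors of difference by √2; the equal-and-independent-errors assumption is ours, not GMAC's stated method.

```python
import math


def sed_from_sem(sem_a, sem_b=None):
    """Standard error of the difference of two scores with independent errors."""
    if sem_b is None:
        sem_b = sem_a  # assume both scores share the same SEM
    return math.sqrt(sem_a ** 2 + sem_b ** 2)


def sem_from_sed(sed):
    """Per-score SEM implied by a standard error of difference (equal SEMs)."""
    return sed / math.sqrt(2)


for label, sed in [("Total", 41), ("Verbal", 3.9), ("Quantitative", 4.3)]:
    print(f"{label}: SEd = {sed}, implied SEM = {sem_from_sed(sed):.1f}")
```

This gives roughly 29 points for the Total score and about 2.8 and 3.0 points for Verbal and Quantitative, in line with the "about 30" and "about 2.9" figures quoted above.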

GMAT scores are a relatively reliable predictor of academic performance in the first year of a business school program. Studies have shown that the median correlation between GMAT scores and first-year grades was 0.41 (a perfect correlation is 1.0). Comparing this with the median correlation of 0.28 between undergraduate grade point average and first-year grades, you can conclude that business schools have a strong incentive to look for good GMAT scores from applicants. Because there is a degree of error, however, we should all exercise caution in comparing two scores. That is why the other parts of your business school application are just as crucial to your admission.
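One standard way to compare two correlation coefficients is to square them: r² is the share of the variance in first-year grades that a predictor accounts for. By that measure, 0.41 accounts for about 17% of the variance versus about 8% for 0.28, roughly twice as much, while still leaving most of the variance unexplained. The `pearson_r` helper is included only to show how a correlation is computed; the studies' own data are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Share of variance in first-year grades accounted for by each predictor
for predictor, r in [("GMAT score", 0.41), ("undergraduate GPA", 0.28)]:
    print(f"{predictor}: r = {r}, r^2 = {r * r:.3f}")
```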

Posted on November 14, 2007 by Manhattan Review

This entry was posted in Admissions, GMAT, MBA.

Computer-Adaptive Test Taking Strategies

Unlike the old paper-and-pencil GMAT, the GMAT CAT is better able to measure your ability with fewer questions. On the old paper-and-pencil GMAT, you would answer 61 questions of varying difficulty in each section. An average test-taker would breeze through the easy questions, get most of the difficult questions wrong, and get some of the medium-difficulty questions right and some wrong. With the CAT, you answer only 41 Verbal questions and 37 Quantitative questions, all tailored to your level of ability. So the average test-taker no longer wastes time on easy questions they will most likely get right, or on very difficult questions they will most likely get wrong.

At the beginning of the test, and for the first question of each question type, you are given a question of moderate difficulty. If you answer it correctly, the difficulty level increases; if you answer incorrectly, the difficulty level decreases. This up-down system continues for the duration of the exam, but the size of each jump up or drop down shrinks as you move through the test.

The first few questions you answer will move you to a significantly more difficult or easier level, whereas the last few questions will change the difficulty only slightly. Bear in mind, too, that there is a penalty for not finishing a section. GMAC and Pearson have not released the details, but the penalty for an unfinished section is estimated at about four times the point value of an incorrectly answered question. If you run out of time, randomly answer the remaining questions; at least you have a 20% chance of getting each one right. If some of these are unscored trial questions, the impact on your score is likely small. (Roughly 37 of the 41 Verbal questions and 33 of the 37 Quantitative questions are scored, so about 4 in each section are unscored.)

We do caution you against guessing early in the test. Because your chance of guessing correctly is only 20% per question, an incorrect choice moves you down to a lower difficulty level very quickly at the beginning of the test. After a few wrong guesses, the test settles on a lower level for you, and it will be very hard to regain momentum later, because the CAT algorithm will no longer give you the very difficult questions you would need to pile up last-minute points.
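The 20% figure cuts both ways, and a little arithmetic shows how much. Assuming five answer choices per question and purely random guessing, the sketch below computes the expected number of correct guesses and the chance of getting at least one right.

```python
def guessing_odds(k, choices=5):
    """Expected number correct, and P(at least one correct), for k random guesses."""
    p = 1 / choices
    expected = k * p
    at_least_one = 1 - (1 - p) ** k
    return expected, at_least_one


for k in (1, 3, 5):
    exp_correct, p_any = guessing_odds(k)
    print(f"{k} guesses: expect {exp_correct:.1f} correct, "
          f"P(at least one right) = {p_any:.2f}")
```

Guessing the last five questions is expected to earn about one correct answer, which is better than leaving them blank. Early in the test, though, each guess is four times more likely to be wrong than right, and those wrong answers come exactly when the difficulty level moves the most.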

In sum, at the beginning you can move up to the highest possible level in as few as four questions by answering all four correctly, or down to the lowest possible level by answering all of them incorrectly. This system was developed to better “zero in” on your real skill level. Think of it as adjusting a lens: you first adjust the macro-focus to get into the right range of focus, and then adjust the micro-focus to fine-tune to the optimal focal point. The GMAT CAT uses a complex algorithm, which we explained in an earlier post, to focus in on your real skill level.
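The up-down behavior described above can be illustrated with a simple "staircase". This is a toy model, not the GMAT's algorithm, which selects items using IRT as described in the posts on the CAT algorithm. The starting level, the first step of 1.2, the shrink factor of 0.75, and the -3 to +3 scale are all hypothetical values, chosen so that four straight correct (or wrong) answers reach the top (or bottom) of the scale.

```python
def staircase(responses, start=0.0, first_step=1.2, shrink=0.75,
              lowest=-3.0, highest=3.0):
    """Toy up-down difficulty path (hypothetical parameters, not GMAC's).

    responses: sequence of booleans (True = answered correctly).
    Returns the difficulty level before the first question and after each answer.
    Correct answers raise the level, wrong answers lower it, and the step
    shrinks geometrically, so later answers move the level less.
    """
    level, step = start, first_step
    path = [level]
    for correct in responses:
        level += step if correct else -step
        level = max(lowest, min(highest, level))  # stay on the scale
        path.append(level)
        step *= shrink
    return path


print(staircase([True, True, True, True]))  # hits the ceiling on the 4th answer
print(staircase([True, False] * 5))        # oscillates with shrinking swings
```

The shrinking step is the "macro-focus then micro-focus" idea: early answers make coarse adjustments, later ones only fine-tune.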

Therefore, take particular care with the first few questions of each question type in both the Verbal and Quantitative sections. Sometimes you may not see the first question of a given Verbal type until around the 10th question. Whenever you see the first question of a new type, slow down and do your best, but without spending excessive time on it; otherwise, you will have to rush through later questions. It is essentially a balancing act: you need to pace yourself from beginning to end to maximize your score.

Posted on November 13, 2007 by Manhattan Review

This entry was posted in Admissions, GMAT, MBA.

Computer-Adaptive Testing Algorithm

As mentioned in yesterday’s post, every item in an item bank requires three main statistics:

  1. ai – the ability of the item to discriminate between individual test-takers
  2. bi – difficulty level, and
  3. ci – the probability that the test-taker would get the question right solely by guessing.

Computer adaptive testing (“CAT”) can begin when such an item bank exists. However, two more steps are required. First, the test-maker needs to select a procedure for determining test-takers’ ability estimates based upon their performance on the tested items. Second, the test-maker needs to choose an algorithm for sequencing the set of test items to be administered to test-takers.
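One common choice for the first step, the ability-estimation procedure, is maximum likelihood: pick the ability θ that makes the observed right/wrong pattern most probable under the three-parameter model given in the Item Response Theory post. The sketch below does this by brute-force grid search. The grid range and step are arbitrary, and nothing here is claimed to be GMAC's actual procedure.

```python
import math

D = 1.7  # scaling constant from the 3PL formula


def p_correct(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


def mle_theta(responses, lo=-4.0, hi=4.0, step=0.01):
    """Grid-search maximum-likelihood ability estimate.

    responses: list of ((a, b, c), correct) pairs.
    If every answer is correct (or every answer wrong), the likelihood keeps
    rising toward the edge of the grid, so the estimate lands on hi (or lo).
    """
    best_theta, best_loglik = lo, -math.inf
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        theta = lo + i * step
        loglik = 0.0
        for (a, b, c), correct in responses:
            p = p_correct(theta, a, b, c)
            loglik += math.log(p if correct else 1 - p)
        if loglik > best_loglik:
            best_theta, best_loglik = theta, loglik
    return best_theta


items = [(1.0, -1.0, 0.2), (1.0, 0.0, 0.2), (1.0, 1.0, 0.2)]  # hypothetical
print(round(mle_theta(list(zip(items, [True, True, False]))), 2))
```

The edge-of-grid behavior for all-correct or all-wrong patterns is a known weakness of maximum likelihood early in a test, which is why Bayesian estimates with a prior are often used instead.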

An ideal item pool for a computer adaptive test would be one with a large number of highly discriminating items well distributed at each ability level. The information functions for these items would appear as a series of peaked distributions across all levels of ability estimate.
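For the three-parameter model, the information an item provides at ability θ has a standard closed form, I(θ) = (1.7·ai)²·(Q/P)·((P − ci)/(1 − ci))², where P is the probability of a correct answer and Q = 1 − P. The sketch evaluates it for two hypothetical items to show the "peaked" shape: the highly discriminating item concentrates its information near bi (slightly above it when ci > 0), while the weakly discriminating item spreads a small amount thinly.

```python
import math

D = 1.7  # scaling constant from the 3PL formula


def p_correct(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    q = 1 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1 - c)) ** 2


# A sharply discriminating item (a=2) versus a weak one (a=0.5), both with b=0
for a in (2.0, 0.5):
    row = [round(item_information(t, a, 0.0, 0.2), 3) for t in (-2, -1, 0, 1, 2)]
    print(f"a={a}: {row}")
```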

The CAT algorithm is usually an iterative process with the following steps:

  1. Given the test-taker’s currently estimated ability level (the first question usually starts at the mid ability level), the program evaluates all the items that have not yet been administered to determine which is the best one to administer next. In this approach, the “best” next item is the one that provides the most information about the test-taker. Typically, an item’s difficulty level is the most important parameter. However, to discriminate clearly among individual test-takers, the test-maker also incorporates other factors into item selection on a particular exam. These include question type (data sufficiency vs. problem solving; critical reasoning vs. sentence correction), content (e.g., algebra, ratios, combinatorics, or topic and inference questions on the same reading comprehension passage), and exposure (i.e., the number of times other test-takers have already seen the question during a given period).
     Demonstrating to the CAT that you can handle a variety of substantive areas in all question formats will increase your GMAT score. The greater the variance in your ability across the tested topics, the lower your score. In other words, the GMAT rewards generalists: test-takers who demonstrate a broad spectrum of competencies. This approach makes sense, since in the business world being well-rounded and knowledgeable can be positively correlated with a manager’s decision-making skills and managerial ability in general.
  2. The “best” next item is administered, and the test-taker answers it.
  3. The program computes a new ability estimate based on the answers to all of the previous items.
  4. Steps 1 through 3 are repeated until a stopping criterion is satisfied.
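The four steps above can be sketched as a small loop. GMAC has not published its exact rules, so this is only an illustration under stated assumptions: items are chosen by maximum information at the current estimate, ability is re-estimated as the posterior mean (EAP) under a standard-normal prior, and the stopping criterion is a fixed test length. The item bank and the "test-taker" (who answers correctly exactly when an item's difficulty is below a chosen ability) are hypothetical, and the content-balancing and exposure controls mentioned in step 1 are left out.

```python
import math
from dataclasses import dataclass

D = 1.7  # scaling constant from the 3PL formula


@dataclass(frozen=True)
class Item:
    a: float  # discrimination
    b: float  # difficulty
    c: float  # guessing parameter (lower asymptote)


def p_correct(theta, item):
    """3PL probability that a test-taker at `theta` answers `item` correctly."""
    return item.c + (1 - item.c) / (1 + math.exp(-D * item.a * (theta - item.b)))


def information(theta, item):
    """3PL item information at `theta`."""
    p = p_correct(theta, item)
    return (D * item.a) ** 2 * ((1 - p) / p) * ((p - item.c) / (1 - item.c)) ** 2


GRID = [i / 20 for i in range(-80, 81)]  # theta values from -4.0 to 4.0


def eap_estimate(responses):
    """Posterior-mean (EAP) ability estimate under a standard-normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-t * t / 2)  # unnormalized N(0, 1) prior density
        for item, correct in responses:
            p = p_correct(t, item)
            w *= p if correct else 1 - p
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def run_cat(bank, answer, test_length=15):
    """Run a fixed-length CAT; `answer(item)` returns True for a correct response."""
    theta = 0.0  # the first question targets the mid ability level
    remaining = list(bank)
    responses = []
    for _ in range(test_length):  # step 4: stop after a fixed number of items
        # Step 1: most informative unadministered item at the current estimate
        item = max(remaining, key=lambda it: information(theta, it))
        remaining.remove(item)
        # Step 2: administer it and record the answer
        responses.append((item, answer(item)))
        # Step 3: re-estimate ability from all answers so far
        theta = eap_estimate(responses)
    return theta, responses


# Hypothetical bank: difficulties -3.0 to +3.0 in steps of 0.25, two slopes each
bank = [Item(a, b / 4, 0.2) for b in range(-12, 13) for a in (0.8, 1.4)]
estimate, _ = run_cat(bank, lambda item: item.b < 1.5)  # hypothetical strong test-taker
print(round(estimate, 2))
```

EAP is used here rather than maximum likelihood because a maximum-likelihood estimate does not exist while every answer so far is correct (or every answer wrong), which is exactly the situation after the first question.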

We will continue with our analysis on the GMAT CAT scoring system tomorrow.

Posted on November 12, 2007 by Manhattan Review

This entry was posted in Admissions, GMAT, MBA.

So you know what the GMAT is all about, but you’re unsure exactly how answering all of those questions results in a final score that could make or break your chances of admission to business school. In this and the following entries, we will break down the system behind your score and how the test is administered to obtain that score.

Item Response Theory

Item Response Theory (IRT) is the system that computer-adaptive tests such as the GMAT CAT use to determine the “best” next question based on the test-taker’s demonstrated ability level. It is a statistical model that relates the probability of a test-taker correctly answering a problem to characteristics of the problem and the test-taker’s true ability. It was first introduced in 1968.

The IRT model states that the probability of a correct response to item i for test-taker X is a function of ai, bi, ci and test-taker X’s true ability. A test-taker’s true ability, or true score, is denoted theta (θ). The true score is the score a test-taker would receive on a perfectly reliable test. Since every test inevitably contains some error, true scores are a theoretical concept; in an actual testing program, we will never know an individual’s true score. However, we can compute an estimate of a test-taker’s true score, and we can estimate the amount of error in that estimate.

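$$P(u_i = 1 \mid \theta_X, a_i, b_i, c_i) = c_i + \frac{1 - c_i}{1 + \exp\left[-1.7\, a_i\, (\theta_X - b_i)\right]}$$

where ui = 1 denotes a correct response to item i and 1.7 is a scaling constant.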

The model typically involves three parameters:

ai defines the ability of the item to discriminate between individual test-takers,

bi is the difficulty of the item, and

ci is the probability that the test-taker would get the question right solely by guessing.
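The formula above translates directly into code. This minimal sketch uses the 1.7 scaling constant from the formula; the example item (ai = 1.0, bi = 0.0, ci = .25) is hypothetical, with ci chosen to match the lower asymptote discussed below.

```python
import math

D = 1.7  # scaling constant in the 3PL formula


def p_correct(theta, a, b, c):
    """Three-parameter logistic (3PL) probability of a correct answer."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


# Example item: a = 1.0, b = 0.0, c = .25 (hypothetical values)
for theta in (-2.6, -2.0, 0.0, 2.0, 3.0):
    print(f"theta={theta:+.1f}  P(correct)={p_correct(theta, 1.0, 0.0, 0.25):.3f}")
```

At θ = bi the probability is exactly 0.625, halfway between the 0.25 floor and 1.0; at θ = -2.6 it is already close to the 0.25 guessing floor, and at θ = +3.0 it is above 0.99.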

On the GMAT, this model is used to determine your final score, i.e., where you stand on the ability scale, or what your theta is. For example, in the graph below, the horizontal axis is the ability scale, ranging from very low (-3.0) to very high (+3.0). When ability follows a standard normal curve, about 68% of test-takers will have ability between -1.0 and +1.0, and about 95% will be between -2.0 and +2.0. The vertical axis is the probability of responding correctly to this item.
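The 68% and 95% figures are properties of the standard normal distribution and can be checked with the error function: the share of a normal population within k standard deviations of the mean is erf(k/√2). The sketch below assumes, as the text does, that ability is standard-normally distributed.

```python
import math


def share_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
```

The exact values are about 68.3% and 95.4%, and nearly everyone (about 99.7%) falls inside the -3.0 to +3.0 range shown on the horizontal axis.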


The ai parameter defines the slope of the curve at its inflection point. The curve is flatter with a lower value of ai and steeper with a higher value. Thus ai denotes how well the item discriminates between test-takers of slightly different ability (within a narrow effective range).
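Differentiating the formula shows why ai controls the steepness: at the inflection point θ = bi, the slope is 1.7·ai·(1 − ci)/4, so doubling ai doubles the slope. The sketch compares that closed form with a numerical derivative; the item values are hypothetical.

```python
import math

D = 1.7  # scaling constant in the 3PL formula


def p_correct(theta, a, b, c):
    """3PL probability of a correct answer."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


def slope_at_b(a, c):
    """Closed-form slope of the 3PL curve at its inflection point theta = b."""
    return D * a * (1 - c) / 4


def numeric_slope(theta, a, b, c, h=1e-5):
    """Central-difference estimate of dP/dtheta."""
    return (p_correct(theta + h, a, b, c) - p_correct(theta - h, a, b, c)) / (2 * h)


for a in (0.5, 1.0, 2.0):
    print(f"a={a}: slope at b = {slope_at_b(a, 0.25):.4f} "
          f"(numeric {numeric_slope(0.0, a, 0.0, 0.25):.4f})")
```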

The bi parameter defines the location of the curve’s inflection point along the theta scale. Lower values of bi shift the curve to the left; higher values shift it to the right. The bi parameter does not affect the shape of the curve.
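A quick way to see that bi only fixes where the curve sits: substituting θX = bi into the three-parameter formula gives the same height no matter what value bi takes.

$$P(u_i = 1 \mid \theta_X = b_i) = c_i + \frac{1 - c_i}{1 + e^{0}} = c_i + \frac{1 - c_i}{2} = \frac{1 + c_i}{2}$$

With ci = .25, for example, the curve passes through 0.625 at θ = bi wherever bi sits on the scale; changing bi slides that point, and the whole curve with it, left or right.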

The lower asymptote is at ci = .25. (An asymptote is a line or curve that another curve approaches ever more closely as one moves along it.) This is roughly the probability of a correct response for test-takers with very little ability (e.g., θ = -2.0 or -2.6). The curve has an upper asymptote at 1.0: high-ability test-takers are very likely to respond correctly.


Posted on November 11, 2007 by Manhattan Review

This entry was posted in Admissions, GMAT.