Visual perception starts with localized filters that subdivide the image into fragments that undergo separate analyses. The visual system has to reconstruct objects by grouping image fragments that belong to the same object. A widely held view is that perceptual grouping occurs in parallel across the visual scene and without attention. To test this idea, we measured the speed of grouping in pictures of animals and vehicles. In a classification task, these pictures were categorized efficiently. In an image-parsing task, participants reported whether two cues fell on the same or different objects, and we measured reaction times. Despite the participants' fast object classification, perceptual grouping required more time if the distance between cues was larger, and we observed an additional delay when the cues fell on different parts of a single object. Parsing was also slower for inverted than for upright objects. These results imply that perception starts with rapid object classification and that rapid classification is followed by a serial perceptual grouping phase, which is more efficient for objects in a familiar orientation than for objects in an unfamiliar orientation.