The gist of a scene is a summary of its semantic description, such as its category, its layout, and a few of the objects it contains together with their attributes. This article reviews recent progress in the study of scene perception and visual search in scenes. Humans can recognize the gist of a novel image in a single glance. Scenes are processed very efficiently, fast enough to influence object processing. Gist information activates the higher-order semantic network and facilitates object recognition. The visual system makes use of contextual gist information for object search in natural scenes. The gist guidance model of visual search, which combines scene context with top-down mechanisms, predicts the image regions that observers fixate while performing visual search.