We are developing a system that recommends sightseeing spots based on user preferences derived from landscape images. This study compares several color extraction methods, among which hierarchical clustering emerged as a promising technique. Applying this method to real tourist images raised practical challenges, including misaligned image orientations and processing times of up to 10 minutes for larger photos. For text extraction from images using OCR, Google Cloud Vision appeared to be more effective than Tesseract.
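To make the color extraction step concrete, the following is a minimal sketch of agglomerative hierarchical clustering over RGB pixels. It is an illustration only and not necessarily the implementation used in this study: the function name `dominant_colors`, the centroid linkage rule, and the squared Euclidean distance in RGB space are all assumptions chosen for brevity.

```python
from itertools import combinations


def dist2(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster):
    """Mean color of a cluster stored as (sum_r, sum_g, sum_b, count)."""
    return (cluster[0] / cluster[3], cluster[1] / cluster[3], cluster[2] / cluster[3])


def dominant_colors(pixels, k):
    """Hypothetical helper: merge the two closest clusters until k remain.

    Returns a list of (rounded_rgb, pixel_count), largest cluster first.
    """
    # Start with every pixel as its own cluster.
    clusters = [(r, g, b, 1) for r, g, b in pixels]
    while len(clusters) > k:
        # Naive search over all pairs for the closest two centroids.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist2(centroid(clusters[p[0]]), centroid(clusters[p[1]])),
        )
        # Merging adds the channel sums and the pixel counts.
        merged = tuple(x + y for x, y in zip(clusters[i], clusters[j]))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    result = [(tuple(round(v) for v in centroid(c)), c[3]) for c in clusters]
    return sorted(result, key=lambda item: -item[1])


# Toy input: three reddish pixels and two bluish pixels.
sample = [(250, 0, 0), (255, 5, 0), (245, 0, 5), (0, 0, 250), (0, 5, 255)]
for color, count in dominant_colors(sample, k=2):
    print(color, count)
```

Because this naive version re-scans every pair of clusters at each merge, its cost grows steeply with the number of pixels, which is consistent with long processing times on large photos. Downsampling the image before clustering is one common way to reduce that cost.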