To the general public, text-to-image generators such as Midjourney and DALL-E can seem to work by magic, and indeed their inner workings are often frustratingly opaque. This opacity stems partly from big tech companies' lack of transparency around aspects such as training data and the algorithms powering their generators, and partly from the deep technical knowledge of computer science and machine learning required to understand how these systems work. Acknowledging these barriers, this qualitative examination seeks to better understand the black box of algorithmic vision by first asking a large language model to describe two sets of visually distinct journalistic images. The resulting descriptions are then fed back into the same large language model to see how the AI tool remediates these images. In doing so, this study evaluates how machines process the images in each set and which specific visual style elements across three dimensions (representational, aesthetic, and technical) machine vision treats as important for the description and which it does not. Taken together, this exploration helps scholars better understand how computers process, describe, and render images, including the attributes they focus on and those they tend to ignore.