The SI4T jars had temporarily been removed from the deployer-web addon to test publishing stability. As a result, content was published to the broker DB without the SI4T tags being removed. The issue persisted after the SI4T extension was disabled, because the index data for the search tags was still included in the JSON object saved to the broker database. The content service would render content successfully at first, and then pages would randomly fail with a 404 or 500 error.
To make the pages viewable again and remove the SI4T tags from the broker DB, the developer added the SI4T jars back to the deployer-web extension and republished the offending pages.
To identify the pages that needed to be republished, the DBA ran the query below against the broker database.
select publication_id, page_id, content from PAGE_CONTENT where content like '%indexdata%'
Output such as the following is returned:
NAMESPACE_ID,PUBLICATION_ID,PAGE_ID,CONTENT,CHARSET,PAGE_CONTENT_KEY
1,31,2807,"<!-- INDEX-DATA-START:<indexdata><url>/en/legal/terms-of-use/index.html</url><title>Terms of Use</title>
...
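Because the CONTENT column holds the full page markup, the result set above can be hard to read when many pages are affected. A narrower query that returns only the identifiers may be easier to hand over for republishing. The following is a minimal sketch, not the query that was actually run: it assumes the PUBLICATION_ID and PAGE_ID column names shown in the output header above, and it reuses the same '%indexdata%' match.

```sql
-- Sketch only: list each affected page once, without the CONTENT column.
-- Column names are taken from the output header above (assumption).
select distinct publication_id, page_id
from PAGE_CONTENT
where content like '%indexdata%'
order by publication_id, page_id
```

Running the same query again after republishing should return no rows once the SI4T tags have been removed from the broker DB.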