Listen to My Heart – GameArchives

Release Year: 2020
Platforms: Windows
Publisher: Nazar Studios
Developer: Chengdu le shang wangluo keji youxian gongsi
Genre: Adventure
Perspective: 3rd-person (Other)
Game Mode: Single-player
Gameplay: Multiple endings, Visual novel
Setting: Contemporary
Average Score: 89/100

Description

Listen to My Heart is a romantic visual novel with a contemporary setting, presented through real photographs and voiced characters. The story follows Pei Ruotong, a radio station editor who has faced numerous personal hardships: failing her college exams, undergoing heart surgery, losing her parents, and being dumped by her boyfriend. Through her job, she meets three men, one of whom will profoundly change her life. The game presents its live-action visuals with dynamic camera techniques such as zooming and panning, includes a branching storyline map that tracks explored scenes, and offers multiple romance endings.
