- Release Year: 2020
- Platforms: Windows
- Publisher: Nazar Studios
- Developer: Chengdu le shang wangluo keji youxian gongsi
- Genre: Adventure
- Perspective: 3rd-person (Other)
- Game Mode: Single-player
- Gameplay: Multiple endings, Visual novel
- Setting: Contemporary
- Average Score: 89/100

Description
Listen to My Heart is a romantic visual novel with a contemporary setting, featuring real photographs and voiced characters. The story follows Pei Ruotong, a radio station editor who has faced numerous personal hardships, including failing college exams, undergoing heart surgery, losing her parents, and being dumped by her boyfriend. Through her job, she meets three men, and one of them will profoundly change her life. The game uses live-action visuals with dynamic camera techniques like zooming and panning, includes a branching storyline map to track explored scenes, and offers multiple romance endings.