Harald, thank you very much for your detailed explanation and for the continued trust and support you've extended to me. I am honored to contribute to the Chinese localization of your documentation, and I appreciate the vital role this work plays in serving your global user community.
I fully understand and respect the current workflow of keeping the documentation repository private until changes have stabilized before syncing translations. This approach is sensible and essential for maintaining content quality and avoiding the confusion that frequent, iterative edits can cause. I am sincerely grateful for the diffs and change summaries you have consistently provided; they have been invaluable for the accuracy and efficiency of my translations and have made it much easier to understand the context and intent behind each change.
To further streamline future translations, while fully respecting your existing process, I would like to offer two small, low-effort suggestions:
- Consistent Formatting Structure:
I've developed a lightweight Python script that splits the Spine API Reference page (https://esotericsoftware.com/spine-api-reference) into two components:
- A stable template preserving the overall structure and formatting, and
- A clean, isolated content file (JSON) containing only the translatable text.
This allows us to track changes more precisely and ensures formatting consistency across revisions.
(The script is attached below for your reference; if helpful, I'd be happy to adapt it to your workflow. I would also greatly appreciate it if you could review the output it generates, to confirm whether it aligns with your formatting standards or whether any adjustments are needed to better suit your documentation conventions.)
- Targeted Diff for Translatable Content:
For API documentation specifically, if feasible, could we consider generating diffs based on this isolated translatable text rather than the full Markdown source? This would allow me to merge changes into the complete API document with minimal friction, quickly identify context, understand the scope of updates, and keep terminology consistent. A small sketch of what such a diff could look like follows below.
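To illustrate what I have in mind, a change report over the extracted content could be as simple as comparing two content.json snapshots from the split step. The sketch below is only a rough illustration, not something you would need to adopt; diff_content.py and the file names are hypothetical.

import json
import sys

# diff_content.py - a rough sketch: compares two content.json snapshots produced
# by the split step and reports added, removed, and changed entries, so only the
# changed text needs re-translation. File names are illustrative.
def diff_content(old_path, new_path):
    with open(old_path, encoding="utf-8") as f:
        old = json.load(f)
    with open(new_path, encoding="utf-8") as f:
        new = json.load(f)
    for key in new:
        if key not in old:
            print(f"+ {key}: {new[key]}")
        elif new[key] != old[key]:
            print(f"~ {key}\n  - {old[key]}\n  + {new[key]}")
    for key in old:
        if key not in new:
            print(f"- {key}: {old[key]}")

if __name__ == "__main__":
    diff_content(sys.argv[1], sys.argv[2])

One caveat I am aware of: because my script assigns IDs sequentially, inserting new text near the top of a page shifts every later ID, so keying such a diff on the text itself (or a hash of it) would likely produce more stable reports. I am happy to adjust the script in that direction if you think the overall idea is useful.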
I remain fully committed to receiving translation tasks only after changes have been finalized and approved. Should you ever need assistance in harmonizing Chinese terminology, tone, or style across documents, please don’t hesitate to ask—I would be delighted to become a more integrated part of your localization workflow.
Thank you once again to the entire Spine team for your hard work, dedication, and thoughtful approach to documentation. I look forward to continuing our collaboration in delivering clear, consistent, and high-quality Chinese documentation to your users worldwide.
Here is the separation script, script.py:
import os
import json
import re
import argparse
from bs4 import BeautifulSoup, NavigableString
# -------------------------------
# CONF
# -------------------------------
INPUT_FILE = "input.html"
TEMPLATE_FILE = "template.html"
CONTENT_FILE = "content.json"
OUTPUT_FILE = "translated_output.html"
TRANSLATED_JSON = "translated_content.json"
HEADER_CONTENT = """NOMARKDOWN<style>
dl dt, .m, #content #ref table, #content #ref td { font-size: 90%; font-weight: normal; font-family:Consolas, DejaVu Sans Mono, Bitstream Vera Sans Mono, Menlo, Monaco, Lucida Console, Liberation Mono, Courier New, monospace; }
dl { border: 0; border-collapse: separate; border-radius: 10px; overflow: hidden; margin-top:1.66em; }
dt, dd { background-color:#fbfdfe; }
dt:nth-of-type(even), dd:nth-of-type(odd) { background-color:#fbf9f9; }
dd, dt.empty { border-bottom: 1px solid #ddd; margin: 0; }
dt.empty { padding-bottom: 0.7em !important; }
.enum { border-bottom: 1px solid #ddd; margin: 0; padding-bottom: 0.7em !important; }
dd:first-child { background-color:#c2d6e8; font-weight:bold; padding: 0.75em 0.7em 0.65em 0.7em; border:0; }
dd:first-child a { color:black; margin-top:-0.75em; padding-top:0.75em; }
dt { padding: 0.7em 0.7em 0 0.7em; }
dd { padding: 0.5em 0.7em 0.76em 2.7em; }
dd.x { display:none; }
.n { color:#1c2022 !important; font-weight:bold; margin-top:-0.5em; padding-top:0.5em;}
#content-body u { color:#8400bf; text-decoration:none; }
.br { height:0.7em; }
.p { font-weight:bold; }
.c { border-bottom:1px solid #ddd; margin-top:2em; }
.c:first-of-type { margin-top:0; }
.desc { margin-top: 0.85em; margin-bottom:-0.41em; line-height:1.25em; }
#ref i, #ref i a { color:#976738; font-style: italic; }
#ref i a:hover { color:#f40; }
#ref li { margin: 0.7em 0 0 0 !important; }
#ref li code:first-child { padding:2px 3px; margin-bottom:-1px; background:#f0f0f0; border:1px solid #ddd; }
#ref dd code { background:none; border:0; padding:0; margin:0; }
#content #ref table, #content #ref tbody, #content #ref td { border:0; background:none; padding:0; margin:0; }
#content #ref td:first-child { white-space:nowrap; }
#content #ref td:last-child { width:100%; }
#content #ref table, #content #ref td { font-size:100%; }
h1, h2, h3, h4 { font-size:2.142em !important; display:inline !important; border:0 !important; margin-bottom:0 !important;padding-top: 0.75em; margin-top: -0.55em; }
</style>
"""
def clean_text(text):
    # Collapse all runs of whitespace into single spaces and trim the ends.
    cleaned = re.sub(r'\s+', ' ', text)
    return cleaned.strip()
def split_html():
    print("🔄 Splitting HTML into template and content...")
    with open(INPUT_FILE, "r", encoding="utf-8") as f:
        html_content = f.read()
    # Strip the fixed NOMARKDOWN/style header so it is not parsed or translated.
    if html_content.startswith(HEADER_CONTENT):
        content_to_process = html_content[len(HEADER_CONTENT):]
    else:
        print("⚠️ Warning: No Style Section Found")
        content_to_process = html_content
    soup = BeautifulSoup(content_to_process, "html.parser")
    content_dict = {}
    counter = 0
    translatable_tags = ['h1', 'h2', 'h3', 'h4', 'div', 'dt', 'dd',
                         'li', 'td', 'th', 'code', 'a', 'i', 'u', 'span', 'ul', 'p']
    for tag in soup.find_all(translatable_tags):
        if not tag.get_text(strip=True):
            continue
        # Skip spans that hold only punctuation such as "(", ")" or ",".
        if tag.name == 'span' and re.fullmatch(r'[(),]', tag.get_text(strip=True)):
            continue
        # Skip <dd class="x"> entries: they are hidden via CSS (dd.x is display:none).
        if tag.name == 'dd' and tag.get('class') == ['x']:
            continue
        # Replace each direct text node with a numbered placeholder. Replacing the
        # text nodes in place (rather than clearing the whole tag) keeps nested
        # markup such as <code> or <a> intact; those nested tags are returned by
        # find_all() as well and are handled in their own iteration.
        for node in list(tag.contents):
            if isinstance(node, NavigableString) and str(node).strip():
                tag_id = f"text_{counter:05d}"
                content_dict[tag_id] = clean_text(str(node))
                node.replace_with(f"{{{{ {tag_id} }}}}")
                counter += 1
    with open(TEMPLATE_FILE, "w", encoding="utf-8") as f:
        f.write(str(soup))
    with open(CONTENT_FILE, "w", encoding="utf-8") as f:
        json.dump(content_dict, f, ensure_ascii=False, indent=2)
    print(f"✅ Template saved to: {TEMPLATE_FILE}")
    print(f"✅ Extracted content saved to: {CONTENT_FILE}")
def merge_html(add_header=True):
    print("🔄 Merging template with translated content...")
    if not os.path.exists(TEMPLATE_FILE):
        print(f"❌ Template file '{TEMPLATE_FILE}' not found.")
        return
    if not os.path.exists(TRANSLATED_JSON):
        print(f"❌ Translated content file '{TRANSLATED_JSON}' not found.")
        return
    with open(TEMPLATE_FILE, "r", encoding="utf-8") as f:
        template = f.read()
    with open(TRANSLATED_JSON, "r", encoding="utf-8") as f:
        content_dict = json.load(f)
    # Substitute every "{{ text_NNNNN }}" placeholder with its translated text.
    html = template
    for key, value in content_dict.items():
        placeholder = f"{{{{ {key} }}}}"
        html = html.replace(placeholder, value)
    # Re-attach the fixed NOMARKDOWN/style header that was stripped during the split.
    if add_header:
        html = HEADER_CONTENT + html
    with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
        f.write(html)
    print(f"✅ Final HTML generated at: {OUTPUT_FILE}")
if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Split or Merge HTML translations.")
    parser.add_argument("--mode", choices=["split", "merge"], required=True,
                        help="Mode of operation: 'split' to extract content or 'merge' to build final HTML.")
    args = parser.parse_args()
    if args.mode == "split":
        if not os.path.exists(INPUT_FILE):
            print(f"❌ Input file '{INPUT_FILE}' not found.")
        else:
            split_html()
    elif args.mode == "merge":
        merge_html()
To split: python script.py --mode split
To merge (requires a translated_content.json): python script.py --mode merge
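If it helps, translated_content.json can be bootstrapped directly from content.json with a few lines like the following (again just a sketch; the file names match the constants in script.py). It copies the keys and the original English values, so each value can then be replaced with its Chinese translation before running the merge step, and any entry left untouched simply falls back to the original text.

import json

# Copy content.json to translated_content.json as a starting point for translation.
with open("content.json", encoding="utf-8") as f:
    content = json.load(f)
with open("translated_content.json", "w", encoding="utf-8") as f:
    json.dump(content, f, ensure_ascii=False, indent=2)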