Writing documents has always been tedious, especially API documentation. The format of such documents is fixed, yet they still have to be written and are frequently revised, which is frustrating. I had long wanted to generate Word files automatically, and while studying AI recently I realized the two could be combined, so this post lays some technical groundwork.
Using Python to Generate Word Documents
To generate a Word document in Python, you can use the python-docx library. To install it, use pip.
```shell
pip install python-docx
```
python-docx can either create a brand-new document or open an existing one to use as a template. I recommend experimenting first and then building your own template.
python-docx
python-docx supports adding headings, paragraphs, pictures, and tables, which is enough to build titles, body text, images, and tables. Tables and figures can also be given captions so that they appear in a table of figures.
To simplify using python-docx, I wrapped its calls in small helper functions such as add_heading, add_paragraph, and add_list_bullet, kept in a separate file, so the calling code stays concise.
Download project
Download the demo project with the following command.
```shell
git clone https://gitlab.com/eagleein578/python-docx.git
```
Generate Word document
```shell
cd python-docx
pip install python-docx
python3 gen_word.py
```
The generated results
Open demo2.docx to see the generated content.
The Code
Main Program
```python
from docx import Document
from gen_wordtool import add_heading, add_paragraph, add_list_bullet
from gen_wordtool import add_list_number, add_caption, add_pic
from gen_wordtool import add_separate, add_table, set_table_border
import gen_wordtool

# Open demo.docx as the template; every helper writes into this document
gen_wordtool.doc = Document("demo.docx")

# Heading and a paragraph with a bold run
add_heading("Introduction", 1)
p = add_paragraph("This is a test article, ")
p.add_run("and bold text can go in the middle too").bold = True

# Bulleted list
add_heading('Chapter 1', 2)
add_paragraph('Here is a list')
add_list_bullet("Apple")
add_list_bullet("Banana")
add_list_bullet("Guava")
add_separate()

# Numbered list
add_paragraph("Here is the numbered version")
add_list_number("Apple")
add_list_number("Banana")
add_list_number("Fruit")
add_separate()

# Picture with a caption
p = add_pic("img/pic.jpg")
add_caption(p, "Auto-generated picture")
add_separate()

# Table: a header row plus one row per record
records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam'),
)

table = add_table(rows=1, cols=3)
table.style = "Light Shading Accent 1"  # the style must exist in the template
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
for qty, item_id, desc in records:
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = item_id
    row_cells[2].text = desc
set_table_border(table)
add_caption(table, "Auto-generated table")

# Finish with a page break and save under a new name
gen_wordtool.doc.add_page_break()
gen_wordtool.doc.save('demo2.docx')
```
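The main program builds the document with the helpers in gen_wordtool.py. There is nothing special about it in principle; it is simply more concise.
- The first imports load python-docx, notably `Document`.
- The `from gen_wordtool import ...` lines import the helper functions written in gen_wordtool.py.
- `gen_wordtool.doc = Document("demo.docx")` opens demo.docx as the template; every helper writes into this document.
- `add_heading()`, `add_paragraph()`, and `p.add_run(...).bold = True` add a heading and a paragraph containing bold text.
- The `add_list_bullet()` calls add a bulleted list.
- The `add_list_number()` calls add a numbered list.
- `add_pic()` followed by `add_caption(p, ...)` adds a picture and its caption.
- The table block creates a table, sets its style to "Light Shading Accent 1", fills in the header and data rows, and adds borders with `set_table_border()`. The official python-docx documentation lists the available styles.
- `add_caption(table, ...)` adds a numbered caption to the table.
- `add_page_break()` and `save('demo2.docx')` insert a page break and save the result.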
gen_wordtool.py
gen_wordtool.py wraps the python-docx API in a set of simpler functions.
```python
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from docx.shared import Inches
from docx.table import Table

# Target document. The caller must assign a python-docx Document before
# using the helpers, e.g. gen_wordtool.doc = Document("demo.docx")
doc = None


def add_separate():
    """Insert two empty paragraphs as spacing."""
    doc.add_paragraph('')
    doc.add_paragraph('')


def add_heading(desc, lvl):
    doc.add_heading(desc, level=lvl)


def add_paragraph(desc):
    return doc.add_paragraph(desc)


def add_list_bullet(desc):
    return doc.add_paragraph(desc, style='List Bullet')


def add_list_number(desc):
    return doc.add_paragraph(desc, style='List Number')


def _add_field(paragraph, instr):
    """Append a Word field (fldChar begin, instrText, fldChar end) in a new run."""
    run = paragraph.add_run()
    begin = OxmlElement('w:fldChar')
    begin.set(qn('w:fldCharType'), 'begin')
    run._r.append(begin)
    instr_text = OxmlElement('w:instrText')
    instr_text.text = instr
    run._r.append(instr_text)
    end = OxmlElement('w:fldChar')
    end.set(qn('w:fldCharType'), 'end')
    run._r.append(end)


def add_caption(obj, desc):
    """
    Add a numbered caption paragraph for a table or picture.
    Based on: https://github.com/python-openxml/python-docx/issues/359
    """
    # Caption label; it is also the SEQ identifier used for numbering
    target = 'Table' if isinstance(obj, Table) else 'Figure'
    paragraph = doc.add_paragraph(f'{target} ', style='Caption')
    # Chapter number taken from the nearest Heading 1
    _add_field(paragraph, r' STYLEREF 1 \s ')
    # Dash between the chapter number and the sequence number
    paragraph.add_run('-')
    # Running number for this label, restarted at every Heading 1
    _add_field(paragraph, rf'SEQ {target} \* ARABIC \s 1')
    # Caption text
    paragraph.add_run(f' {desc}')


def add_pic(file):
    return doc.add_picture(file, width=Inches(1.25))


def add_table(rows, cols):
    return doc.add_table(rows=rows, cols=cols)


def set_cell_border(cell, **kwargs):
    """
    Set a cell's borders.

    Usage:
        set_cell_border(
            cell,
            top={"sz": 12, "val": "single", "color": "FF0000"},
            bottom={"sz": 12, "color": "00FF00", "space": "0"},
        )
    Keys: top, start (left), bottom, end (right).
    sz is in eighths of a point; color is hex RGB (a leading '#' is stripped).
    """
    tcPr = cell._tc.get_or_add_tcPr()
    # All edges live in one <w:tcBorders> element inside <w:tcPr>
    tcBorders = tcPr.find(qn('w:tcBorders'))
    if tcBorders is None:
        tcBorders = OxmlElement('w:tcBorders')
        tcPr.append(tcBorders)
    # Keyword name -> edge element, in the order the schema expects
    edges = (("top", "top"), ("start", "left"), ("bottom", "bottom"), ("end", "right"))
    for key, tag in edges:
        attrs = kwargs.get(key)
        if not attrs:
            continue
        element = tcBorders.find(qn('w:' + tag))
        if element is None:
            element = OxmlElement('w:' + tag)
            tcBorders.append(element)
        # Border settings are attributes of the edge element, not child elements
        for name in ("val", "sz", "space", "color"):
            if name in attrs:
                value = str(attrs[name])
                if name == "color":
                    value = value.lstrip("#")
                element.set(qn('w:' + name), value)


def set_table_border(table):
    """Give every cell a 3 pt single border: red on top, black on the other edges."""
    black = {"sz": 24, "val": "single", "color": "000000"}  # sz 24 = 24/8 = 3 pt
    for row in table.rows:
        for cell in row.cells:
            set_cell_border(
                cell,
                top={"sz": 24, "val": "single", "color": "FF0000"},
                bottom=black,
                start=black,
                end=black,
            )
```
A few of the functions are more involved:
- add_caption(): adds a caption to a figure or table so that it can appear in a table of figures. It relies on Word field codes such as STYLEREF and SEQ (with the \* ARABIC number format), which are hard to write from scratch. The fastest way to learn them is to inspect the field codes in a document that already has captions: in Word, press Alt+F9 to toggle between field codes and their results.
- set_table_border(): adds borders to a table by calling set_cell_border() on every cell.
Conclusion
Word may not be a great format, but it is very widely used. Its advantage is that it follows a standard (Office Open XML), which lets us manipulate it from Python. This lays the groundwork for automated document generation in the future.