For work reasons, we have to submit a DBF file to our superior department every semester. Most of the information in this DBF file originally lives in Excel; it is entered into a specific piece of software, and the software then generates the DBF file.
Since it has to be done once a year and every colleague has to do it, the overall workload is about 2 * 5 = 10 person-days (with luck), so I decided to automate it.
Since the demand is small and the tool is only used on specific occasions, no UI is required. I decided to try Python 3 (life is short, I use Python) to automate it.
Install
On macOS, brew makes installing the relevant software fairly simple. Of course, a good network connection is needed.
panjie@panjies-iMac web % brew install python
==> Downloading
....
==> python@3.9
Python has been installed as
/usr/local/bin/python3
Unversioned symlinks `python`, `python-config`, `pip` etc. pointing to
`python3`, `python3-config`, `pip3` etc., respectively, have been installed into
/usr/local/opt/python@3.9/libexec/bin
You can install Python packages with
pip3 install <package>
They will install into the site-package directory
/usr/local/lib/python3.9/site-packages
tkinter is no longer included with this formula, but it is available separately:
brew install python-tk@3.9
See: https://docs.brew.sh/Homebrew-and-Python
Seeing the output above means that Python is installed. The messages printed after installation are important and worth reading carefully. For example, they tell us that the command above installed python3, that its package manager is called pip3, and so on.
If we forget to read the messages, we can go to brew's official website and look up the page of the corresponding package, which shows the messages printed after that package is successfully installed.
After the installation is complete, open the shell and run python3 --version
panjie@panjies-iMac web % python3 --version
Python 3.9.5
Successful installation.
Use python
Before using Python, I like to download an IDE such as PyCharm, install it, and run it.
Then create a new project, accept all the defaults, and click Finish.
Click Run. Hi PyCharm appears in the console, which means the Python environment works.
Read EXCEL
Python is so popular largely because of its very complete community. Basically, whatever we need is already supported by some package, so when we need to read an Excel sheet, the first thing to do is search for an available third-party package.
That search gives us the key message: use pandas to work with Excel.
Install pandas
Like many excellent package managers, pip3 provides very friendly package management. To install pandas, we only need to go to pip's official website and search for pandas.
Click User Guide to go to the instruction page, and follow the instructions there to install pandas.
According to the instructions, we run python3 -m pip install pandas to start the pandas installation:
panjie@panjies-iMac web % python3 -m pip install pandas
Collecting pandas
Downloading pandas-1.3.3-cp39-cp39-macosx_10_9_x86_64.whl (11.6 MB)
|███▍ | 1.2 MB 53 kB/s eta 0:03:15
In fact, the official site for pip packages is https://pypi.org/ , and we will use this site again below to look up packages.
Since this is the first installation, the package manager has almost no packages yet, so we need to wait patiently for the downloads. Once pandas and all of its dependencies are installed, the following success message appears:
Installing collected packages: six, pytz, python-dateutil, numpy, pandas
Successfully installed numpy-1.21.2 pandas-1.3.3 python-dateutil-2.8.2 pytz-2021.1 six-1.16.0
WARNING: You are using pip version 21.1.1; however, version 21.2.4 is available.
You should consider upgrading via the '/usr/local/opt/python@3.9/bin/python3.9 -m pip install --upgrade pip' command.
Test
Following the pandas introductory documentation, or some sample code found online, use pandas to read any Excel file as a test:
import pandas
def test():
    pandas.read_excel('/Users/panjie/sync/work/task.xlsx')
if __name__ == '__main__':
    test()
Result:
ImportError: Missing optional dependency 'openpyxl'. Use pip or conda to install openpyxl.
The message tells me that I need to install the openpyxl module.
Interlude: I did not restart PyCharm after installing pandas, and PyCharm reported that pandas could not be found. Using the install package action offered in PyCharm's prompt fixes this quickly. I did not look into why this happens.
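One way to look into the interlude above is to ask the running interpreter where it lives and which packages it can see. The following is only a minimal sketch; the helper name missing_packages is made up for illustration and is not part of pandas or PyCharm.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The interpreter path shows whether this is the system Python or a project venv.
    print(sys.executable)
    # Any name listed here still has to be installed for *this* interpreter.
    print(missing_packages(["pandas", "openpyxl"]))
```

Running it once from the shell and once from PyCharm should show whether the two are using different interpreters.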
Install openpyxl
With the experience of installing pandas, installing openpyxl is very simple.
panjie@panjies-iMac web % python3 -m pip install openpyxl
Collecting openpyxl
Downloading openpyxl-3.0.8-py2.py3-none-any.whl (244 kB)
|████████████████████████████████| 244 kB 926 kB/s
Collecting et-xmlfile
Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.8
WARNING: You are using pip version 21.1.1; however, version 21.2.4 is available.
You should consider upgrading via the '/usr/local/opt/python@3.9/bin/python3.9 -m pip install --upgrade pip' command.
After installing from the shell, PyCharm still reports the error. This is probably because PyCharm and the shell use two separate environments (the PyCharm project runs in its own venv). So open the Terminal in PyCharm and run python3 -m pip install openpyxl there as well. After the installation succeeds, run the sample code again, and it works.
Add some code:
import pandas
def test():
    xxx = pandas.read_excel('/Users/panjie/sync/work/task.xlsx')
    print(xxx)
if __name__ == '__main__':
    test()
Print it out:
/Users/panjie/PycharmProjects/pythonProject/venv/bin/python /Users/panjie/PycharmProjects/pythonProject/main.py
方案计划名 方案课组 课程号 ... 备注 备注.1 类型
0 软件 专业选修课 xxx ... 2~16 NaN 1
1 计算机 专业必修课 xxx ... 3~17 NaN 1
2 计算机 专业必修课 xxx ... 3~17 NaN 1
3 计算机 专业必修课 xxx ... 13*2+11*2 NaN 1
4 软件 学科平台课程 xxx ... 4~16 NaN 1
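The printed table hides some columns behind "...". A small sketch of how the full column list and the individual rows could be inspected is shown below. It uses a tiny in-memory stand-in instead of the real task.xlsx; the column names are copied from the output above and the cell values are placeholders.

```python
import pandas

# A tiny stand-in for the real sheet (assumption: only four of its columns are shown here).
frame = pandas.DataFrame({
    "方案计划名": ["软件", "计算机"],
    "课程号": ["xxx", "xxx"],
    "备注": ["2~16", "3~17"],
    "备注.1": [float("nan"), float("nan")],
})

# List every column name, including the ones hidden behind "..." when printing.
columns = list(frame.columns)
print(columns)

# Convert each row to a plain dict, which is convenient for later processing.
rows = frame.to_dict("records")
print(rows[0]["备注"])
```

The same two calls should work on the frame returned by pandas.read_excel.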
DBF read and write
To turn the data into a usable DBF file, we first need to analyze the DBF file generated by the software. For that we need a Python library that can read and write DBF files.
I found dbfpy, a library for reading and writing DBF files, but its latest release dates from 2015 (so it may only support Python 2). I installed it anyway, just to give it a try.
(venv) panjie@panjies-iMac pythonProject % python3 -m pip install dbfpy
Collecting dbfpy
Downloading dbfpy-2.3.1.tar.gz (19 kB)
...
Installing collected packages: dbfpy
Successfully installed dbfpy-2.3.1
Test
Similarly, we try the sample code from the library's official documentation:
import pandas
from dbfpy import dbf
def test():
    xxx = pandas.read_excel('/Users/panjie/sync/work/task.xlsx')
    print(xxx)
def testDbf():
    db = dbf.Dbf("/Users/panjie/sync/work/x_kck.DBF")
    # Print every record, followed by an empty line
    for rec in db:
        print(rec)
        print()
if __name__ == '__main__':
    testDbf()
Then an error was reported when running:
print repr(_rec)
^
SyntaxError: invalid syntax
After some searching, we learn that dbfpy supports Python 2.7 at most and cannot be used with Python 3.
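The error comes from a Python 2 style print statement, which Python 3 no longer accepts. A minimal sketch of that difference, using only the built-in compile function (the helper name compiles_on_python3 is made up for illustration):

```python
def compiles_on_python3(source):
    """Return True if the running (Python 3) interpreter can compile the source."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# Python 2 print statement, as in the traceback above: rejected by Python 3.
print(compiles_on_python3("print repr(_rec)"))   # False
# Python 3 print function: accepted.
print(compiles_on_python3("print(repr(_rec))"))  # True
```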
So I moved on to another DBF library, called dbf. Seeing that its latest release dates from 2021, I was finally relieved:
(venv) panjie@panjies-iMac pythonProject % python3 -m pip install dbf
After the installation succeeds, find its official documentation and test it:
import pandas
import dbf
def test():
    xxx = pandas.read_excel('/Users/panjie/sync/work/task.xlsx')
    print(xxx)
def testDbf():
    # Open the DBF file generated by the software and print every record
    db = dbf.Table(filename='/Users/panjie/sync/work/x_kck.DBF')
    db.open(dbf.READ_WRITE)
    for record in db:
        print(record)
    db.close()
if __name__ == '__main__':
    testDbf()
Test Results:
0 - 课程编号 : '123 '
1 - 课程名称 : '123 '
2 - 备注 : ' '
3 - 简码 : ' '
4 - 标志 : ' '
5 - 输入人 : ' '
6 - 输入日期 : None
7 - 审核 : ' '
8 - 审核人 : ' '
9 - 审核日期 : None
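To analyze the structure of a DBF file (field names, types, widths), the header can also be read directly with the standard library. The sketch below assumes the common dBASE III style layout: a 32-byte header followed by 32-byte field descriptors that end with a 0x0D byte. The names read_dbf_header and _demo_bytes are made up for illustration, and the sample bytes are synthetic, not taken from x_kck.DBF.

```python
import struct


def read_dbf_header(data, encoding="ascii"):
    """Parse the fixed 32-byte DBF header and the field descriptors after it."""
    # Byte 0: version; bytes 1-3: last update (YY, MM, DD);
    # bytes 4-7: record count; 8-9: header size; 10-11: record size.
    version, yy, mm, dd, n_records, header_len, record_len = struct.unpack_from(
        "<4BIHH", data, 0
    )
    fields = []
    offset = 32
    # Each descriptor is 32 bytes: name (11), type (1), reserved (4), length, decimals.
    while offset < header_len and data[offset] != 0x0D:
        raw_name, ftype, length, decimals = struct.unpack_from("<11sc4xBB", data, offset)
        name = raw_name.split(b"\x00", 1)[0].decode(encoding, errors="replace")
        fields.append((name, ftype.decode("ascii"), length, decimals))
        offset += 32
    return {"records": n_records, "record_len": record_len, "fields": fields}


def _demo_bytes():
    """Build a tiny synthetic DBF with one character field and one record."""
    descriptor = b"CODE".ljust(11, b"\x00") + b"C" + b"\x00" * 4 + bytes([10, 0]) + b"\x00" * 14
    header_len = 32 + len(descriptor) + 1
    header = struct.pack("<4BIHH", 0x03, 121, 9, 1, 1, header_len, 11) + b"\x00" * 20
    record = b" " + b"123".ljust(10)
    return header + descriptor + b"\x0D" + record + b"\x1A"


info = read_dbf_header(_demo_bytes())
print(info)
```

For a real file, the bytes would come from reading the file in binary mode, and the encoding argument would have to match the file's code page (for Chinese field names this is often GBK, but that is an assumption here).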
Test write DBF
Reading a file means going through it record by record, while writing only requires appending records:
def testDbf():
    db = dbf.Table(filename='/Users/panjie/sync/work/x_kck.DBF')
    db.open(dbf.READ_WRITE)
    db.append({'课程编号': '456'})
    for record in db:
        print(record)
    db.close()
In the code above, I append a record with the value 456 and then print all records again:
/Users/panjie/PycharmProjects/pythonProject/venv/bin/python /Users/panjie/PycharmProjects/pythonProject/main.py
0 - 课程编号 : '123 '
1 - 课程名称 : '123 '
2 - 备注 : ' '
3 - 简码 : ' '
4 - 标志 : ' '
5 - 输入人 : ' '
6 - 输入日期 : None
7 - 审核 : ' '
8 - 审核人 : ' '
9 - 审核日期 : None
0 - 课程编号 : '456 ' 😄😄😄😄😄😄
1 - 课程名称 : ' '
2 - 备注 : ' '
3 - 简码 : ' '
4 - 标志 : ' '
5 - 输入人 : ' '
6 - 输入日期 : None
7 - 审核 : ' '
8 - 审核人 : ' '
9 - 审核日期 : None
And this is exactly what I want.
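What remains is mapping each Excel row to a DBF record. The sketch below shows one possible shape for that step using plain dicts. The mapping from the Excel column 课程号 to the DBF field 课程编号 is an assumption made for illustration, and rows_to_records is a hypothetical helper name.

```python
# Hypothetical mapping from Excel column names to DBF field names.
COLUMN_TO_FIELD = {"课程号": "课程编号"}


def rows_to_records(rows, column_to_field=COLUMN_TO_FIELD):
    """Turn Excel-style rows (dicts) into dicts keyed by DBF field names."""
    records = []
    for row in rows:
        record = {}
        for column, field in column_to_field.items():
            value = row.get(column)
            # Skip empty cells; pandas reports them as NaN, which is not equal to itself.
            if value is None or value != value:
                continue
            record[field] = str(value).strip()
        if record:
            records.append(record)
    return records


sample_rows = [{"课程号": " 456 ", "备注": "2~16"}, {"课程号": float("nan")}]
print(rows_to_records(sample_rows))
```

With pandas, the rows could come from the frame's to_dict("records") call, and each resulting dict could be passed to db.append(...) as in testDbf above.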
Ending
With the technical points of reading Excel and writing DBF in Python solved, the remaining implementation is just a matter of processing logic. Once the code is finished, simply keep a copy. Next year, the 10 working days will shrink to 1 working day, and I believe writing the code should also take no more than 1 working day.
This is obviously a good deal.
Summary
A language is only a tool; what we should learn is the way of thinking. When a new challenge arrives, the technical difficulties in it should be identified first. From the moment those difficulties are solved, the whole challenge has an expected completion time.