How to install Python on macOS and process Excel and DBF files

For work reasons, a DBF file has to be submitted to the superior department every semester. Most of the information in this DBF file originally lives in Excel; it is entered into a specific piece of software, which finally generates the DBF file.
Since it has to be done every year and every colleague has to do it, the overall workload comes to 2 * 5 = 10 person-days (with luck), so I decided to automate it.

Since the demand is small and the tool is only used on specific occasions, no UI is required. I decided to try Python 3 (life is short, I use Python) for the automation.

Install

On macOS, Homebrew (brew) makes installing this kind of software relatively simple. Of course, a good network connection is needed.

panjie@panjies-iMac web % brew install python
==> Downloading 

....

==> python@3.9
Python has been installed as
  /usr/local/bin/python3

Unversioned symlinks `python`, `python-config`, `pip` etc. pointing to
`python3`, `python3-config`, `pip3` etc., respectively, have been installed into
  /usr/local/opt/python@3.9/libexec/bin

You can install Python packages with
  pip3 install <package>
They will install into the site-package directory
  /usr/local/lib/python3.9/site-packages

tkinter is no longer included with this formula, but it is available separately:
  brew install python-tk@3.9

See: https://docs.brew.sh/Homebrew-and-Python

Seeing the output above means that Python is installed. The messages printed after installation are important and worth reading carefully. For example, the command above installed Python 3, so the interpreter is invoked as python3 and its package manager as pip3.

If we forget to read the messages, we can go to Homebrew's official website and look up the page for the package, which shows the same notes that are printed after a successful installation.

After the installation is complete, open a shell and run python3 --version:

panjie@panjies-iMac web % python3 --version
Python 3.9.5

The installation succeeded.

Use Python

Before using Python, I like to download an IDE such as PyCharm. Download it, install it, and run it.

Then create a new project, accept all the defaults, and click Finish.

Click Run. Hi, PyCharm is displayed in the console, which shows that the Python environment works.

Read Excel

Python is so popular in large part because of its very complete community. Basically everything we need is already covered by some package, so when we need to read an Excel sheet, the first step is to search for an available third-party package.

The search gives us the key message: use pandas to work with Excel.

Install pandas

Like many good package managers, pip3 makes package management very friendly. To install pandas, we only need to go to the official package site and search for pandas.

Click User Guide to open the instruction page, and follow the instructions there to install pandas.

According to the instructions, we run python3 -m pip install pandas to start the installation:

panjie@panjies-iMac web % python3 -m pip install pandas
Collecting pandas
  Downloading pandas-1.3.3-cp39-cp39-macosx_10_9_x86_64.whl (11.6 MB)
     |███▍                            | 1.2 MB 53 kB/s eta 0:03:15

In fact, the official package index used by pip is https://pypi.org/ , and we will visit this site again below to look up packages.

Since this is the first installation, almost nothing is installed yet, so we need to wait patiently for the downloads. After pandas and all of its dependencies are installed, the following message confirms success:

Installing collected packages: six, pytz, python-dateutil, numpy, pandas
Successfully installed numpy-1.21.2 pandas-1.3.3 python-dateutil-2.8.2 pytz-2021.1 six-1.16.0
WARNING: You are using pip version 21.1.1; however, version 21.2.4 is available.
You should consider upgrading via the '/usr/local/opt/python@3.9/bin/python3.9 -m pip install --upgrade pip' command.

Test

Following the pandas getting-started documentation, or some sample code found on the Internet, use pandas to read any Excel file as a test:

import pandas
def test():
    pandas.read_excel('/Users/panjie/sync/work/task.xlsx')

if __name__ == '__main__':
    test()

Result:

ImportError: Missing optional dependency 'openpyxl'.  Use pip or conda to install openpyxl.

The message tells me that the openpyxl module needs to be installed.

Interlude: I did not restart PyCharm after installing pandas, and PyCharm reported that pandas could not be found. Using the install-package action offered in PyCharm's prompt solved this quickly. I have not looked into why this happened.

Install openpyxl

With the experience of installing pandas, installing openpyxl is very simple.

panjie@panjies-iMac web % python3 -m pip install openpyxl                                       
Collecting openpyxl
  Downloading openpyxl-3.0.8-py2.py3-none-any.whl (244 kB)
     |████████████████████████████████| 244 kB 926 kB/s 
Collecting et-xmlfile
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.8
WARNING: You are using pip version 21.1.1; however, version 21.2.4 is available.
You should consider upgrading via the '/usr/local/opt/python@3.9/bin/python3.9 -m pip install --upgrade pip' command.

After installing openpyxl from the shell, PyCharm still reported the error. This is probably because PyCharm and the shell use two separate environments: the project runs in its own venv. So open the Terminal inside PyCharm and run python3 -m pip install openpyxl there. After that installation succeeds, running the sample code again works.
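To see which environment a script actually runs in, a small check like the following can help. It is only a sketch: describe_environment is a hypothetical helper name, and it uses nothing but the standard library to report the interpreter path and whether each package can be found by that interpreter.

```python
import importlib.util
import sys


def describe_environment(packages):
    """Return the interpreter path and, per package, whether it can be imported."""
    found = {name: importlib.util.find_spec(name) is not None for name in packages}
    return sys.executable, found


if __name__ == "__main__":
    interpreter, found = describe_environment(["pandas", "openpyxl"])
    print("interpreter:", interpreter)
    for name, ok in found.items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

Running it once from the shell and once from PyCharm should show two different interpreter paths if the two really are separate environments.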

Add some code:

import pandas
def test():
    xxx = pandas.read_excel('/Users/panjie/sync/work/task.xlsx')
    print(xxx)

if __name__ == '__main__':
    test()

Print it out:

/Users/panjie/PycharmProjects/pythonProject/venv/bin/python /Users/panjie/PycharmProjects/pythonProject/main.py
                       方案计划名           方案课组           课程号  ...         备注 备注.1  类型
0                       软件         专业选修课  xxx  ...       2~16  NaN   1
1                    计算机         专业必修课  xxx  ...       3~17  NaN   1
2                    计算机         专业必修课  xxx  ...       3~17  NaN   1
3        计算机               专业必修课       xxx  ...  13*2+11*2  NaN   1
4                       软件       学科平台课程  xxx  ...       4~16  NaN   1
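The printed table hides some columns behind "...". As a rough sketch of how to inspect the data further, the example below lists every column name and turns each row into a plain dict. The in-memory DataFrame is a hypothetical stand-in for the result of pandas.read_excel, and the course numbers in it are made up.

```python
import pandas

# Hypothetical stand-in for pandas.read_excel('task.xlsx'): a tiny in-memory
# table with a few of the column names seen in the output above.
frame = pandas.DataFrame(
    {
        "方案计划名": ["软件", "计算机"],
        "课程号": ["0101", "0102"],  # made-up course numbers
        "备注": ["2~16", "3~17"],
    }
)

# List every column name, including those hidden behind "..." when printing.
columns = frame.columns.tolist()
print(columns)

# Turn each row into a plain dict, which is convenient for later processing.
rows = frame.to_dict(orient="records")
for row in rows:
    print(row["课程号"], row["方案计划名"])
```

When reading the real file, passing dtype=str to pandas.read_excel may be worth considering so that course numbers stay text instead of being converted to numbers.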

DBF read and write

To produce a usable DBF file, we first need to analyze the DBF file generated by the software, which requires a Python library that can read and write DBF files.

I found dbfpy, a library for reading and writing DBF files, but its latest release dates from 2015 (so it may only support Python 2). I installed it anyway, just to give it a try.

(venv) panjie@panjies-iMac pythonProject %  python3 -m pip install dbfpy
Collecting dbfpy
  Downloading dbfpy-2.3.1.tar.gz (19 kB)
...
Installing collected packages: dbfpy
Successfully installed dbfpy-2.3.1

Test

Similarly, we try the sample code from the library's official documentation:

import pandas
from dbfpy import dbf


def test():
    xxx = pandas.read_excel('/Users/panjie/sync/work/task.xlsx')
    print(xxx)


def testDbf():
    db = dbf.Dbf("/Users/panjie/sync/work/x_kck.DBF")
    for rec in db:
        # the documentation sample uses the Python 2 statement `print rec`
        print(rec)


if __name__ == '__main__':
    testDbf()

Then an error was reported when running:

    print repr(_rec)
          ^
SyntaxError: invalid syntax

A search shows that dbfpy supports Python 2.7 at most and cannot be used with Python 3.

So I moved on to another DBF library: dbf. Its latest release is from 2021, which was a relief:
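The error comes from the Python 2 print statement inside dbfpy itself, not from our own code. A minimal sketch to confirm this, using only the standard library (parses_in_this_python is a hypothetical helper name):

```python
import ast

PY2_LINE = "print repr(_rec)"   # the statement from the traceback above
PY3_LINE = "print(repr(_rec))"  # the Python 3 spelling of the same call


def parses_in_this_python(source):
    """Return True if the running interpreter accepts `source` as valid syntax."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(parses_in_this_python(PY2_LINE))
print(parses_in_this_python(PY3_LINE))
```

Under Python 3 the first line is rejected and the second is accepted, so a library written with Python 2 print statements fails as soon as it is imported.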

(venv) panjie@panjies-iMac pythonProject %  python3 -m pip install dbf

After the installation succeeds, find its official documentation and test it:

import pandas
import dbf


def test():
    xxx = pandas.read_excel('/Users/panjie/sync/work/task.xlsx')
    print(xxx)


def testDbf():
    db = dbf.Table(filename='/Users/panjie/sync/work/x_kck.DBF')
    db.open(dbf.READ_WRITE)
    for record in db:
        print(record)
    db.close()

if __name__ == '__main__':
    testDbf()

Test Results:

  0 - 课程编号      : '123         '
  1 - 课程名称      : '123                                     '
  2 - 备注        : '                                                            '
  3 - 简码        : '          '
  4 - 标志        : ' '
  5 - 输入人       : '                    '
  6 - 输入日期      : None
  7 - 审核        : ' '
  8 - 审核人       : '                    '
  9 - 审核日期      : None
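The field names and padded widths above are stored in the header of the DBF file. As a rough illustration of that structure, the sketch below parses a dBase III style header with the standard struct module. It assumes the field names are encoded in GBK, builds a made-up two-field header in memory instead of touching x_kck.DBF, and uses hypothetical helper names (read_dbf_fields, build_demo_header); the real file may differ in details.

```python
import struct


def read_dbf_fields(data, encoding="gbk"):
    """Parse a dBase III style header and return (record_count, fields).

    `fields` is a list of (name, type_char, length) tuples.
    The GBK default for field names is an assumption.
    """
    # Bytes 4-7: record count, 8-9: header size, 10-11: record size (little-endian).
    record_count, header_size, _record_size = struct.unpack_from("<IHH", data, 4)
    fields = []
    offset = 32  # field descriptors start right after the 32-byte file header
    while offset < header_size and data[offset] != 0x0D:  # 0x0D ends the list
        raw_name, type_code, length = struct.unpack_from("<11sc4xB", data, offset)
        name = raw_name.split(b"\x00")[0].decode(encoding)
        fields.append((name, type_code.decode("ascii"), length))
        offset += 32  # each descriptor is 32 bytes long
    return record_count, fields


def build_demo_header():
    """Build a tiny in-memory header with two character fields (made-up data)."""
    descriptors = b""
    for name, length in (("课程编号", 12), ("课程名称", 40)):
        raw = name.encode("gbk").ljust(11, b"\x00")
        descriptors += struct.pack("<11sc4xB15x", raw, b"C", length)
    header_size = 32 + len(descriptors) + 1  # +1 for the 0x0D terminator
    record_size = 1 + 12 + 40  # deletion flag plus the field widths
    header = struct.pack("<B3BIHH20x", 0x03, 21, 9, 1, 0, header_size, record_size)
    return header + descriptors + b"\x0d"


count, fields = read_dbf_fields(build_demo_header())
print(count, fields)
```

For real work the dbf library remains the simpler choice; the sketch is only meant to show where the field names and widths come from.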

Test writing DBF

Reading a file means iterating over it record by record, while writing only requires appending records:

def testDbf():
    db = dbf.Table(filename='/Users/panjie/sync/work/x_kck.DBF')
    db.open(dbf.READ_WRITE)
    db.append({'课程编号': '456'})
    for record in db:
        print(record)
    db.close()

In the code above, I append a record with course number 456 and then print all records again:

/Users/panjie/PycharmProjects/pythonProject/venv/bin/python /Users/panjie/PycharmProjects/pythonProject/main.py
  0 - 课程编号      : '123         '
  1 - 课程名称      : '123                                     '
  2 - 备注        : '                                                            '
  3 - 简码        : '          '
  4 - 标志        : ' '
  5 - 输入人       : '                    '
  6 - 输入日期      : None
  7 - 审核        : ' '
  8 - 审核人       : '                    '
  9 - 审核日期      : None
  0 - 课程编号      : '456         ' 😄😄😄😄😄😄
  1 - 课程名称      : '                                        '
  2 - 备注        : '                                                            '
  3 - 简码        : '          '
  4 - 标志        : ' '
  5 - 输入人       : '                    '
  6 - 输入日期      : None
  7 - 审核        : ' '
  8 - 审核人       : '                    '
  9 - 审核日期      : None

And this is exactly what I want.
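The remaining work is mostly mapping Excel rows onto DBF fields. The following is a minimal sketch of that step, not the final implementation: the mapping table, the Excel column name 课程名, the field widths, the GBK encoding, and the helper name to_dbf_record are all assumptions made for illustration.

```python
ENCODING = "gbk"  # assumed code page of the DBF file

# Hypothetical mapping: Excel column name -> (DBF field name, field width in bytes).
FIELD_MAP = {
    "课程号": ("课程编号", 12),
    "课程名": ("课程名称", 40),  # Excel column name assumed
}


def to_dbf_record(excel_row):
    """Build a dict suitable for db.append() from one Excel row given as a dict."""
    record = {}
    for excel_name, (dbf_name, width) in FIELD_MAP.items():
        value = excel_row.get(excel_name)
        # Skip missing cells; NaN is the only value that is not equal to itself.
        if value is None or value != value:
            continue
        text = str(value).strip()
        if len(text.encode(ENCODING)) > width:
            raise ValueError(f"{dbf_name}: {text!r} does not fit in {width} bytes")
        record[dbf_name] = text
    return record


row = {"课程号": " 456 ", "课程名": float("nan"), "备注": "2~16"}
print(to_dbf_record(row))
```

Each resulting dict could then be passed to db.append(...) in the same way as the record with 456 above.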

Ending

With the technical points of reading Excel and writing DBF in Python solved, the remaining implementation is just a matter of processing logic. Once the code is finished, simply keep a copy. Next year, the 10 working days will shrink to 1 working day, and I expect writing the code itself to take no more than 1 working day.

This is obviously a good deal.

Summary

A language is only a tool; what we should learn is the way of thinking. When a new task comes in, identify its technical difficulties first. Once those difficulties are solved, the whole task starts to have a predictable completion time.


潘杰
