Processing Tabular Data with Python

Technical Background

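Data processing is a very popular research direction right now. By modeling data collected from large real-world scenarios, we can try to predict what may happen next. For example, suppose we have gold price data covering the years 2002 to 2018.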

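The data comes from an open-source project on Gitee. It contains the following fields: date, opening price, closing price, highest price, lowest price, trading volume, and turnover. If we analyze this data with a machine learning model, we may be able to predict gold prices that are not present in the data. If the predictions fit well, that could matter a great deal for some people's investment strategies. Real-world data of this kind, however, is often very large. The file used here is only a little over 300 KB, but far more often we have to deal with 10 GB or even more than 1 TB of data. If we cannot even process such data, how can we model it?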

Handling Excel Spreadsheets in Python

Let us start with the simplest case and set performance aside for now. We can use xlrd to open and load an Excel spreadsheet in Python:

# table.py
import xlrd  # note: xlrd 2.0 and later only reads the legacy .xls format


def read_table_by_xlrd():
    # Load the whole workbook into memory.
    workbook = xlrd.open_workbook(r'data.xls')
    sheet_names = workbook.sheet_names()
    print('All sheets in the file data.xls are: {}'.format(sheet_names))
    # Work with the first sheet; rows and columns are indexed from 0.
    sheet = workbook.sheet_by_index(0)
    print('The cell value of row index 0 and col index 1 is: {}'.format(sheet.cell_value(0, 1)))
    print('The elements of row index 0 are: {}'.format(sheet.row_values(0)))
    print('The length of col index 1 are: {}'.format(len(sheet.col_values(1))))


if __name__ == '__main__':
    read_table_by_xlrd()

The code above produces the following output:

[dechin@dechin-manjaro gold]$ python3 table.py 
All sheets in the file data.xls are: ['Sheet1', 'Sheet2', 'Sheet3']
The cell value of row index 0 and col index 1 is: 開
The elements of row index 0 are: ['時間', '開', '高', '低', '收', '量', '額']
The length of col index 1 are: 3923

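Here we have successfully loaded an xls spreadsheet into Python's memory, where we can analyze the data. If the data needs to be modified, the openpyxl library can be used (it works with the newer .xlsx format), but we will not go into detail here.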

Python has another very popular and very powerful library for handling tabular data: pandas. Here we use ipython to briefly demonstrate how to process tabular data with pandas:

[dechin@dechin-manjaro gold]$ ipython
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas as pd

In [2]: !ls -l
總用量 368
-rw-r--r-- 1 dechin dechin 372736  3月 27 21:31 data.xls
-rw-r--r-- 1 dechin dechin    563  3月 27 21:42 table.py

In [3]: data = pd.read_excel('data.xls', 'Sheet1') # read the Excel file

In [4]: data.to_csv('data.csv', encoding='utf-8') # convert it to a csv file

In [7]: !ls -l
總用量 588
-rw-r--r-- 1 dechin dechin 221872  3月 27 21:52 data.csv
-rw-r--r-- 1 dechin dechin 372736  3月 27 21:31 data.xls
-rw-r--r-- 1 dechin dechin    563  3月 27 21:42 table.py

In [8]: !head -n 10 data.csv # show the first 10 lines of the csv file
,時間,開,高,低,收,量,額
0,2002-10-30,83.98,92.38,82.0,83.52,352,29373370
1,2002-10-31,83.9,83.92,83.9,83.91,66,5537480
2,2002-11-01,84.5,84.65,84.0,84.51,77,6502510
3,2002-11-04,84.9,85.06,84.9,84.99,95,8076330
4,2002-11-05,85.1,85.2,85.1,85.13,61,5193650
5,2002-11-06,84.9,84.9,84.9,84.9,1,84900
6,2002-11-07,85.0,85.15,85.0,85.14,26,2212310
7,2002-11-08,85.25,85.28,85.1,85.16,35,2981780
8,2002-11-11,85.18,85.19,85.18,85.19,65,5537050

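In ipython we can run not only Python statements but also system commands, simply by prefixing them with !, which is very convenient. A csv file is just a text file that separates fields with commas and records with newlines, instead of the commonly used \t (tab) character.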
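The difference between comma-separated and tab-separated text can be seen with Python's standard csv module. The following is only a minimal sketch: the helper names to_delimited and from_delimited and the English column labels are hypothetical, chosen for illustration, and the single data row reuses values from the listing above.

```python
import csv
import io

# Hypothetical English labels; the values come from the first data row above.
SAMPLE_ROWS = [
    ['date', 'open', 'close'],
    ['2002-10-30', '83.98', '83.52'],
]


def to_delimited(rows, delimiter):
    """Serialize rows to text, one record per line, fields split by delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator='\n')
    writer.writerows(rows)
    return buffer.getvalue()


def from_delimited(text, delimiter):
    """Parse delimited text back into a list of rows (all values are strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


if __name__ == '__main__':
    print(to_delimited(SAMPLE_ROWS, ','))   # comma-separated (csv)
    print(to_delimited(SAMPLE_ROWS, '\t'))  # tab-separated
```

Only the delimiter argument changes between the two forms; the record separator is a newline in both cases.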

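However, whether we use xlrd or pandas, we face the same problem: all of the data has to be loaded into memory for processing. A typical personal computer has only 8 GB to 16 GB of memory, and even with a fairly large 64 GB of memory we can only process files smaller than 64 GB in memory, which is far from enough for big-data scenarios. vaex, introduced in the next section, is a good solution to this problem. As an aside, memory size and usage can be checked on Linux as follows: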

[dechin@dechin-manjaro gold]$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b 交換 空閑 緩沖 緩存   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 35812168 328340 2904872    0    0    20    27  362  365  8  4 88  0  0
[dechin@dechin-manjaro gold]$ vmstat 2 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b 交換 空閑 緩沖 緩存   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 35810916 328356 2905844    0    0    20    27  362  365  8  4 88  0  0
 0  0      0 35811916 328364 2904952    0    0     0     6  613  688  1  1 99  0  0
 0  0      0 35812168 328364 2904856    0    0     0     0  672  642  0  1 99  0  0

We can see that about 36 GB of memory is free. This machine has 40 GB of memory in total, which is fairly large.
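For simple aggregations, the memory limit discussed above can also be avoided without any third-party library by reading the file one row at a time. The sketch below is an illustration only, not something the original workflow uses: streaming_mean is a hypothetical helper, and SAMPLE_TEXT repeats the first rows of data.csv shown earlier.

```python
import csv
import io

# First rows of data.csv as listed earlier in this article.
SAMPLE_TEXT = (
    ',時間,開,高,低,收,量,額\n'
    '0,2002-10-30,83.98,92.38,82.0,83.52,352,29373370\n'
    '1,2002-10-31,83.9,83.92,83.9,83.91,66,5537480\n'
    '2,2002-11-01,84.5,84.65,84.0,84.51,77,6502510\n'
)


def streaming_mean(lines, column):
    """Average one numeric column while holding only one row in memory."""
    total = 0.0
    count = 0
    for row in csv.DictReader(lines):  # rows are produced lazily
        total += float(row[column])
        count += 1
    if count == 0:
        raise ValueError('no data rows found')
    return total / count


if __name__ == '__main__':
    # Mean opening price of the three sample rows.
    print(streaming_mean(io.StringIO(SAMPLE_TEXT), '開'))
```

With a real file, an open file object can be passed instead of the StringIO, for example open('data.csv', newline='', encoding='utf-8'). This only helps for single-pass computations; it is not a replacement for the random access that a memory-mapped tool offers.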

Installing and Using vaex

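vaex provides a memory-mapped approach to data processing: instead of loading the entire data file into memory, we can operate directly on the data stored on disk. In other words, the size of the files we can process is no longer limited by the amount of memory; as long as the disk has room for a file, we can process it.
Even the smallest disks in today's personal computers are around 128 GB, far more than memory can hold. Of course, because of partitioning, there is no guarantee that all of the disk space is available to us. The free disk space of the partition containing the current directory can be checked as follows: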

[dechin@dechin-manjaro gold]$ df -hl .
文件系統        容量  已用  可用 已用% 掛載點
/dev/nvme0n1p9  144G   57G   80G   42% /

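Here we can see that 80 GB of disk space is still available. If we placed an 80 GB table file in the current directory, neither pandas nor xlrd could process it, because that far exceeds the available memory. With vaex, however, we could still process the file.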
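The general idea of memory mapping can be illustrated with Python's standard mmap module. This is not how vaex is implemented internally; it is only a minimal sketch under stated assumptions: the flat binary layout (one float64 per value), the helper names write_values and mapped_value_at, and the file name prices.bin are all hypothetical.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct('<d')  # assumed layout: one little-endian float64 per value


def write_values(path, values):
    """Store the values back to back in a flat binary file."""
    with open(path, 'wb') as handle:
        for value in values:
            handle.write(RECORD.pack(value))


def mapped_value_at(path, index):
    """Read a single value through a memory map instead of reading the whole file."""
    with open(path, 'rb') as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            count = len(mapped) // RECORD.size
            if not 0 <= index < count:
                raise IndexError('index out of range')
            # Only the pages that contain this record need to be touched.
            (value,) = RECORD.unpack_from(mapped, index * RECORD.size)
            return value


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'prices.bin')  # hypothetical file name
        write_values(path, [83.98, 83.9, 84.5])
        print(mapped_value_at(path, 2))
```

The operating system loads pages of the file on demand, which is why the amount of data that can be addressed this way is bounded by the file on disk rather than by the installed memory.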

The official vaex documentation also describes how vaex works and what its advantages are.

Installing vaex

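As with most third-party Python packages, we can use pip to download and manage vaex. Many files have to be downloaded, so the process can be rather slow; we just need to wait patiently: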

[dechin@dechin-manjaro gold]$ python3 -m pip install vaex
Collecting vaex
  Downloading vaex-4.1.0-py3-none-any.whl (4.5 kB)
Collecting vaex-ml<0.12,>=0.11.0
  Downloading vaex_ml-0.11.1-py3-none-any.whl (95 kB)
     |████████████████████████████████| 95 kB 81 kB/s 
Collecting vaex-core<5,>=4.1.0
  Downloading vaex_core-4.1.0-cp38-cp38-manylinux2010_x86_64.whl (2.5 MB)
     |████████████████████████████████| 2.5 MB 61 kB/s 
Collecting vaex-viz<0.6,>=0.5.0
  Downloading vaex_viz-0.5.0-py3-none-any.whl (19 kB)
Collecting vaex-astro<0.9,>=0.8.0
  Downloading vaex_astro-0.8.0-py3-none-any.whl (20 kB)
Collecting vaex-hdf5<0.8,>=0.7.0
  Downloading vaex_hdf5-0.7.0-py3-none-any.whl (15 kB)
Collecting vaex-server<0.5,>=0.4.0
  Downloading vaex_server-0.4.0-py3-none-any.whl (13 kB)
Collecting vaex-jupyter<0.7,>=0.6.0
  Downloading vaex_jupyter-0.6.0-py3-none-any.whl (42 kB)
     |████████████████████████████████| 42 kB 82 kB/s 
Requirement already satisfied: traitlets in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-ml0.12,>=0.11.0->vaex) (5.0.5)
Requirement already satisfied: numba in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-ml0.12,>=0.11.0->vaex) (0.51.2)
Requirement already satisfied: jinja2 in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-ml0.12,>=0.11.0->vaex) (2.11.2)
Requirement already satisfied: psutil>=1.2.1 in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-core5,>=4.1.0->vaex) (5.7.2)
Requirement already satisfied: six in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-core5,>=4.1.0->vaex) (1.15.0)
Requirement already satisfied: cloudpickle in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-core5,>=4.1.0->vaex) (1.6.0)
Requirement already satisfied: numpy>=1.16 in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-core5,>=4.1.0->vaex) (1.20.1)
Requirement already satisfied: dask[array] in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-core5,>=4.1.0->vaex) (2.30.0)
Collecting pyarrow>=3.0
  Downloading pyarrow-3.0.0-cp38-cp38-manylinux2014_x86_64.whl (20.7 MB)
     |████████████████████████████████| 20.7 MB 86 kB/s 
Requirement already satisfied: pandas in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-core5,>=4.1.0->vaex) (1.1.3)
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/tabulate/                                       
Collecting tabulate>=0.8.3
  Downloading tabulate-0.8.9-py3-none-any.whl (25 kB)
Requirement already satisfied: pyyaml in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-core5,>=4.1.0->vaex) (5.3.1)
Collecting frozendict
  Downloading frozendict-1.2.tar.gz (2.6 kB)
Collecting aplus
  Downloading aplus-0.11.0.tar.gz (3.7 kB)
Requirement already satisfied: requests in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-core5,>=4.1.0->vaex) (2.24.0)
Requirement already satisfied: nest-asyncio>=1.3.3 in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-core5,>=4.1.0->vaex) (1.4.2)
Collecting progressbar2
  Downloading progressbar2-3.53.1-py2.py3-none-any.whl (25 kB)
Requirement already satisfied: future>=0.15.2 in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-core5,>=4.1.0->vaex) (0.18.2)
Requirement already satisfied: matplotlib>=1.3.1 in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-viz0.6,>=0.5.0->vaex) (3.3.4)
Requirement already satisfied: pillow in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-viz0.6,>=0.5.0->vaex) (8.0.1)
Requirement already satisfied: astropy in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-astro0.9,>=0.8.0->vaex) (4.0.2)
Requirement already satisfied: h5py>=2.9 in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-hdf50.8,>=0.7.0->vaex) (2.10.0)
Collecting cachetools
  Downloading cachetools-4.2.1-py3-none-any.whl (12 kB)
Requirement already satisfied: tornado>4.1 in /home/dechin/anaconda3/lib/python3.8/site-packages (from vaex-server0.5,>=0.4.0->vaex) (6.0.4)
Collecting xarray
  Downloading xarray-0.17.0-py3-none-any.whl (759 kB)
     |████████████████████████████████| 759 kB 28 kB/s 
Collecting ipympl
  Downloading ipympl-0.7.0-py2.py3-none-any.whl (106 kB)
     |████████████████████████████████| 106 kB 39 kB/s 
Collecting ipyleaflet
  Downloading ipyleaflet-0.13.6-py2.py3-none-any.whl (3.3 MB)
     |████████████████████████████████| 3.3 MB 75 kB/s 
Collecting ipyvuetify<2,>=1.2.2
  Downloading ipyvuetify-1.6.2-py2.py3-none-any.whl (11.7 MB)
     |████████████████████████████████| 11.7 MB 173 kB/s 
Collecting ipyvolume>=0.4
  Downloading ipyvolume-0.5.2-py2.py3-none-any.whl (2.9 MB)
     |████████████████████████████████| 2.9 MB 66 kB/s 
Collecting bqplot>=0.10.1
  Downloading bqplot-0.12.23-py2.py3-none-any.whl (1.2 MB)
     |████████████████████████████████| 1.2 MB 175 kB/s 
Requirement already satisfied: ipython-genutils in /home/dechin/anaconda3/lib/python3.8/site-packages (from traitlets->vaex-ml0.12,>=0.11.0->vaex) (0.2.0)
Requirement already satisfied: setuptools in /home/dechin/anaconda3/lib/python3.8/site-packages (from numba->vaex-ml0.12,>=0.11.0->vaex) (50.3.1.post20201107)
Requirement already satisfied: llvmlite0.35,>=0.34.0.dev0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from numba->vaex-ml0.12,>=0.11.0->vaex) (0.34.0)
Requirement already satisfied: MarkupSafe>=0.23 in /home/dechin/anaconda3/lib/python3.8/site-packages (from jinja2->vaex-ml0.12,>=0.11.0->vaex) (1.1.1)
Requirement already satisfied: toolz>=0.8.2; extra == "array" in /home/dechin/anaconda3/lib/python3.8/site-packages (from dask[array]->vaex-core5,>=4.1.0->vaex) (0.11.1)
Requirement already satisfied: pytz>=2017.2 in /home/dechin/anaconda3/lib/python3.8/site-packages (from pandas->vaex-core5,>=4.1.0->vaex) (2020.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /home/dechin/anaconda3/lib/python3.8/site-packages (from pandas->vaex-core5,>=4.1.0->vaex) (2.8.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/dechin/anaconda3/lib/python3.8/site-packages (from requests->vaex-core5,>=4.1.0->vaex) (2020.6.20)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,1.26,>=1.21.1 in /home/dechin/anaconda3/lib/python3.8/site-packages (from requests->vaex-core5,>=4.1.0->vaex) (1.25.11)
Requirement already satisfied: idna3,>=2.5 in /home/dechin/anaconda3/lib/python3.8/site-packages (from requests->vaex-core5,>=4.1.0->vaex) (2.10)
Requirement already satisfied: chardet4,>=3.0.2 in /home/dechin/anaconda3/lib/python3.8/site-packages (from requests->vaex-core5,>=4.1.0->vaex) (3.0.4)
Collecting python-utils>=2.3.0
  Downloading python_utils-2.5.6-py2.py3-none-any.whl (12 kB)
Requirement already satisfied: cycler>=0.10 in /home/dechin/anaconda3/lib/python3.8/site-packages (from matplotlib>=1.3.1->vaex-viz0.6,>=0.5.0->vaex) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/dechin/anaconda3/lib/python3.8/site-packages (from matplotlib>=1.3.1->vaex-viz0.6,>=0.5.0->vaex) (1.3.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /home/dechin/anaconda3/lib/python3.8/site-packages (from matplotlib>=1.3.1->vaex-viz0.6,>=0.5.0->vaex) (2.4.7)
Collecting ipywidgets>=7.6.0
  Downloading ipywidgets-7.6.3-py2.py3-none-any.whl (121 kB)
     |████████████████████████████████| 121 kB 175 kB/s 
Requirement already satisfied: ipykernel>=4.7 in /home/dechin/anaconda3/lib/python3.8/site-packages (from ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (5.3.4)
Collecting branca<0.5,>=0.3.1
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Collecting shapely
  Downloading Shapely-1.7.1-cp38-cp38-manylinux1_x86_64.whl (1.0 MB)
     |████████████████████████████████| 1.0 MB 98 kB/s 
Collecting traittypes<3,>=0.2.1
  Downloading traittypes-0.2.1-py2.py3-none-any.whl (8.6 kB)
Collecting ipyvue<2,>=1.5
  Downloading ipyvue-1.5.0-py2.py3-none-any.whl (2.7 MB)
     |████████████████████████████████| 2.7 MB 80 kB/s 
Collecting ipywebrtc
  Downloading ipywebrtc-0.5.0-py2.py3-none-any.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 99 kB/s 
Collecting pythreejs>=1.0.0
  Downloading pythreejs-2.3.0-py2.py3-none-any.whl (3.4 MB)
     |████████████████████████████████| 3.4 MB 30 kB/s 
Requirement already satisfied: widgetsnbextension~=3.5.0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (3.5.1)
Requirement already satisfied: nbformat>=4.2.0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (5.0.8)
Requirement already satisfied: ipython>=4.0.0; python_version >= "3.3" in /home/dechin/anaconda3/lib/python3.8/site-packages (from ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (7.19.0)
Collecting jupyterlab-widgets>=1.0.0; python_version >= "3.6"
  Downloading jupyterlab_widgets-1.0.0-py3-none-any.whl (243 kB)
     |████████████████████████████████| 243 kB 115 kB/s 
Requirement already satisfied: jupyter-client in /home/dechin/anaconda3/lib/python3.8/site-packages (from ipykernel>=4.7->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (6.1.7)
Collecting ipydatawidgets>=1.1.1
  Downloading ipydatawidgets-4.2.0-py2.py3-none-any.whl (275 kB)
     |████████████████████████████████| 275 kB 73 kB/s 
Requirement already satisfied: notebook>=4.4.1 in /home/dechin/anaconda3/lib/python3.8/site-packages (from widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (6.1.4)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /home/dechin/anaconda3/lib/python3.8/site-packages (from nbformat>=4.2.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (3.2.0)
Requirement already satisfied: jupyter-core in /home/dechin/anaconda3/lib/python3.8/site-packages (from nbformat>=4.2.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (4.6.3)
Requirement already satisfied: backcall in /home/dechin/anaconda3/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.2.0)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,3.1.0,>=2.0.0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (3.0.8)
Requirement already satisfied: pickleshare in /home/dechin/anaconda3/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.7.5)
Requirement already satisfied: pexpect>4.3; sys_platform != "win32" in /home/dechin/anaconda3/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (4.8.0)
Requirement already satisfied: pygments in /home/dechin/anaconda3/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (2.7.2)
Requirement already satisfied: jedi>=0.10 in /home/dechin/anaconda3/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.17.1)
Requirement already satisfied: decorator in /home/dechin/anaconda3/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (4.4.2)
Requirement already satisfied: pyzmq>=13 in /home/dechin/anaconda3/lib/python3.8/site-packages (from jupyter-client->ipykernel>=4.7->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (19.0.2)
Requirement already satisfied: terminado>=0.8.3 in /home/dechin/anaconda3/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.9.1)
Requirement already satisfied: argon2-cffi in /home/dechin/anaconda3/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (20.1.0)
Requirement already satisfied: Send2Trash in /home/dechin/anaconda3/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (1.5.0)
Requirement already satisfied: nbconvert in /home/dechin/anaconda3/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (6.0.7)
Requirement already satisfied: prometheus-client in /home/dechin/anaconda3/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.8.0)
Requirement already satisfied: pyrsistent>=0.14.0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.17.3)
Requirement already satisfied: attrs>=17.4.0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (20.3.0)
Requirement already satisfied: wcwidth in /home/dechin/anaconda3/lib/python3.8/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,3.1.0,>=2.0.0->ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.2.5)
Requirement already satisfied: ptyprocess>=0.5 in /home/dechin/anaconda3/lib/python3.8/site-packages (from pexpect>4.3; sys_platform != "win32"->ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.6.0)
Requirement already satisfied: parso0.8.0,>=0.7.0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from jedi>=0.10->ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.7.0)
Requirement already satisfied: cffi>=1.0.0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (1.14.3)
Requirement already satisfied: mistune2,>=0.8.1 in /home/dechin/anaconda3/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.8.4)
Requirement already satisfied: testpath in /home/dechin/anaconda3/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.4.4)
Requirement already satisfied: pandocfilters>=1.4.1 in /home/dechin/anaconda3/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (1.4.3)
Requirement already satisfied: jupyterlab-pygments in /home/dechin/anaconda3/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.1.2)
Requirement already satisfied: bleach in /home/dechin/anaconda3/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (3.2.1)
Requirement already satisfied: entrypoints>=0.2.2 in /home/dechin/anaconda3/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.3)
Requirement already satisfied: defusedxml in /home/dechin/anaconda3/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.6.0)
Requirement already satisfied: nbclient0.6.0,>=0.5.0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.5.1)
Requirement already satisfied: pycparser in /home/dechin/anaconda3/lib/python3.8/site-packages (from cffi>=1.0.0->argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (2.20)
Requirement already satisfied: webencodings in /home/dechin/anaconda3/lib/python3.8/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (0.5.1)
Requirement already satisfied: packaging in /home/dechin/anaconda3/lib/python3.8/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (20.4)
Requirement already satisfied: async-generator in /home/dechin/anaconda3/lib/python3.8/site-packages (from nbclient0.6.0,>=0.5.0->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.6.0->ipympl->vaex-jupyter0.7,>=0.6.0->vaex) (1.10)
Building wheels for collected packages: frozendict, aplus
  Building wheel for frozendict (setup.py) ... done
  Created wheel for frozendict: filename=frozendict-1.2-py3-none-any.whl size=3148 sha256=1ae5d8fe0d670f73bf3ee88453978246919197a616f0e08e601c84cc244cb238
  Stored in directory: /home/dechin/.cache/pip/wheels/9b/9b/56/5713233cf7226423ab6c58c08081551a301b5863e343ba053c
  Building wheel for aplus (setup.py) ... done
  Created wheel for aplus: filename=aplus-0.11.0-py3-none-any.whl size=4412 sha256=9762d51c5ece813b0c5a27ff6ebc1a86e709d55edb7003dcc11272c954dd39c7
  Stored in directory: /home/dechin/.cache/pip/wheels/de/93/23/3db69e1003030a764c9827dc02137119ec5e6e439afd64eebb
Successfully built frozendict aplus
Installing collected packages: pyarrow, tabulate, frozendict, aplus, python-utils, progressbar2, vaex-core, vaex-ml, vaex-viz, vaex-astro, vaex-hdf5, cachetools, vaex-server, xarray, jupyterlab-widgets, ipywidgets, ipympl, branca, shapely, traittypes, ipyleaflet, ipyvue, ipyvuetify, ipywebrtc, ipydatawidgets, pythreejs, ipyvolume, bqplot, vaex-jupyter, vaex
  Attempting uninstall: ipywidgets
    Found existing installation: ipywidgets 7.5.1
    Uninstalling ipywidgets-7.5.1:
      Successfully uninstalled ipywidgets-7.5.1
Successfully installed aplus-0.11.0 bqplot-0.12.23 branca-0.4.2 cachetools-4.2.1 frozendict-1.2 ipydatawidgets-4.2.0 ipyleaflet-0.13.6 ipympl-0.7.0 ipyvolume-0.5.2 ipyvue-1.5.0 ipyvuetify-1.6.2 ipywebrtc-0.5.0 ipywidgets-7.6.3 jupyterlab-widgets-1.0.0 progressbar2-3.53.1 pyarrow-3.0.0 python-utils-2.5.6 pythreejs-2.3.0 shapely-1.7.1 tabulate-0.8.9 traittypes-0.2.1 vaex-4.1.0 vaex-astro-0.8.0 vaex-core-4.1.0 vaex-hdf5-0.7.0 vaex-jupyter-0.6.0 vaex-ml-0.11.1 vaex-server-0.4.0 vaex-viz-0.5.0 xarray-0.17.0

Once the words Successfully installed appear, the installation has succeeded and we can start using vaex.
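Besides reading the pip log, the installation can be checked from Python itself. The sketch below uses the standard importlib.util.find_spec function; the helper name is_installed is hypothetical, and the list of module names is only an example based on the libraries used in this article.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == '__main__':
    for name in ('vaex', 'pandas', 'xlrd'):
        print(name, 'installed' if is_installed(name) else 'missing')
```

This only confirms that the package can be found on the import path; it does not verify that every optional dependency works.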

Performance Comparison

Since other tools can also open and read table files without trouble, we use ipython to compare opening times directly in order to show the advantage of vaex:

[dechin@dechin-manjaro gold]$ ipython
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import vaex

In [2]: import xlrd

In [3]: %timeit xlrd.open_workbook(r'data.xls')
46.4 ms ± 76.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [4]: %timeit vaex.open('data.csv')
4.95 ms ± 48.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %timeit vaex.open('data.hdf5')
1.34 ms ± 1.84 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

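The results show that opening the xls file with xlrd takes nearly 50 ms, while vaex needs as little as about 1 ms to open the same data in hdf5 form. Note that the measurements use different file formats (xls for xlrd, csv and hdf5 for vaex), so they compare whole workflows rather than identical inputs. Even so, such a large gap is a good reason to pay more attention to vaex. Comparisons with other libraries have already been published by others; according to those results, vaex reads data more than 1000 times faster than pandas and performs computations several times faster, so overall it performs very well.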
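Outside ipython, measurements in the style of %timeit can be approximated with the standard timeit module. The following is a minimal sketch: time_call_ms is a hypothetical helper, and sample_workload is a stand-in that would be replaced by the real call being measured, such as opening a file.

```python
import timeit


def time_call_ms(func, repeat=7, number=10):
    """Return (mean, best) time per call in milliseconds."""
    # Each entry of totals is the time for `number` consecutive calls.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call_ms = [total / number * 1000.0 for total in totals]
    return sum(per_call_ms) / len(per_call_ms), min(per_call_ms)


def sample_workload():
    # Stand-in workload; replace with the call that should be timed.
    return sum(i * i for i in range(1000))


if __name__ == '__main__':
    mean_ms, best_ms = time_call_ms(sample_workload)
    print('mean {:.3f} ms, best {:.3f} ms per call'.format(mean_ms, best_ms))
```

The defaults of 7 runs and 10 loops mirror the xlrd measurement shown above; ipython picks the loop count automatically, so the numbers will not match exactly.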

Data Format Conversion

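The tests in the previous section used a file that had not been mentioned before: data.hdf5. This file was in fact converted from data.csv. This section mainly explains how to convert data into a format that vaex can open and recognize. The first option is to use pandas to convert the csv file directly to hdf5, much as the xls file was converted to csv in the section on handling Excel spreadsheets in Python: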

[dechin@dechin-manjaro gold]$ ipython
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas as pd

In [4]: data = pd.read_csv('data.csv')

In [10]: data.to_hdf('data.hdf5','data',mode='w',format='table')

In [11]: !ls -l
總用量 932
-rw-r--r-- 1 dechin dechin 221872  3月 27 21:52 data.csv
-rw-r--r-- 1 dechin dechin 348524  3月 27 22:17 data.hdf5
-rw-r--r-- 1 dechin dechin 372736  3月 27 21:31 data.xls
-rw-r--r-- 1 dechin dechin    563  3月 27 21:42 table.py

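When this finishes, an hdf5 file has been created in the current directory. This approach has a drawback, though: the resulting hdf5 file is not laid out in a way that vaex reads directly. If we load it with df = vaex.open('data.hdf5'), the output looks like this: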

In [3]: df
Out[3]: 
#      table
0      '(0, [83.98, 92.38, 82.  , 83.52], [       0,   ...
1      '(1, [83.9 , 83.92, 83.9 , 83.91], [      1,    ...
2      '(2, [84.5 , 84.65, 84.  , 84.51], [      2,    ...
3      '(3, [84.9 , 85.06, 84.9 , 84.99], [      3,    ...
4      '(4, [85.1 , 85.2 , 85.1 , 85.13], [      4,    ...
...    ...
3,917  '(3917, [274.65, 275.35, 274.6 , 274.61], [     ...
3,918  '(3918, [274.4, 275.2, 274.1, 275. ], [      391...
3,919  '(3919, [275.  , 275.01, 274.  , 274.19], [     ...
3,920  '(3920, [275.2, 275.2, 272.6, 272.9], [      392...
3,921  '(3921, [272.96, 273.73, 272.5 , 272.93], [     ...

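In this data the most important piece of information, the column (index) names, has been lost. The values themselves are all preserved correctly, but they are very inconvenient to read. We therefore recommend the second conversion method, which uses vaex itself to convert the data format: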

[dechin@dechin-manjaro gold]$ ipython
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import vaex

In [2]: df = vaex.from_csv('data.csv')

In [3]: df.export_hdf5('vaex_data.hdf5')

In [4]: !ls -l
總用量 1220
-rw-r--r-- 1 dechin dechin 221856  3月 27 22:34 data.csv
-rw-r--r-- 1 dechin dechin 348436  3月 27 22:34 data.hdf5
-rw-r--r-- 1 dechin dechin 372736  3月 27 21:31 data.xls
-rw-r--r-- 1 dechin dechin    563  3月 27 21:42 table.py
-rw-r--r-- 1 dechin dechin 293512  3月 27 22:52 vaex_data.hdf5

After this runs, a file named vaex_data.hdf5 has been created in the current directory. Let us try reading this new hdf5 file:

[dechin@dechin-manjaro gold]$ ipython
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import vaex

In [2]: df = vaex.open('vaex_data.hdf5')

In [3]: df
Out[3]: 
#      i     t             s       h       l      e       n      a
0      0     '2002-10-30'  83.98   92.38   82.0   83.52   352    29373370
1      1     '2002-10-31'  83.9    83.92   83.9   83.91   66     5537480
2      2     '2002-11-01'  84.5    84.65   84.0   84.51   77     6502510
3      3     '2002-11-04'  84.9    85.06   84.9   84.99   95     8076330
4      4     '2002-11-05'  85.1    85.2    85.1   85.13   61     5193650
...    ...   ...           ...     ...     ...    ...     ...    ...
3,917  3917  '2018-11-23'  274.65  275.35  274.6  274.61  13478  3708580608
3,918  3918  '2018-11-26'  274.4   275.2   274.1  275.0   13738  3773763584
3,919  3919  '2018-11-27'  275.0   275.01  274.0  274.19  13984  3836845568
3,920  3920  '2018-11-28'  275.2   275.2   272.6  272.9   15592  4258130688
3,921  3921  '2018-11-28'  272.96  273.73  272.5  272.93  592    161576336

In [4]: df.s
Out[4]: 
Expression = s
Length: 3,922 dtype: float64 (column)
-------------------------------------
   0   83.98
   1    83.9
   2    84.5
   3    84.9
   4    85.1
    ...     
3917  274.65
3918   274.4
3919     275
3920   275.2
3921  272.96

In [11]: df.plot(df.i, df.s, show=True) # plot
/home/dechin/anaconda3/lib/python3.8/site-packages/vaex/viz/mpl.py:311: UserWarning: `plot` is deprecated and it will be removed in version 5.x. Please `df.viz.heatmap` instead.
  warnings.warn('`plot` is deprecated and it will be removed in version 5.x. Please `df.viz.heatmap` instead.')

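Note that in the new hdf5 file the column names have changed from Chinese names such as 高 and 低 to English letters such as h and l. To make the data easier to work with, we manually renamed the columns in the csv file to English before converting it to hdf5. Finally, we used vaex's built-in plotting to draw the change in the gold price over this period of more than a decade.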
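The header rename described above was done by hand, but the same step could be scripted with the standard csv module. The sketch below is an assumption-laden illustration: rename_header is a hypothetical helper, and HEADER_MAP is inferred from the column order of the two listings in this article (the unnamed index column becomes i), not taken from the original workflow.

```python
import csv
import io

# Inferred from the column order shown above; treat this mapping as an assumption.
HEADER_MAP = {
    '': 'i', '時間': 't', '開': 's', '高': 'h',
    '低': 'l', '收': 'e', '量': 'n', '額': 'a',
}


def rename_header(source, target, mapping):
    """Copy csv data from source to target, renaming only the header row."""
    reader = csv.reader(source)
    writer = csv.writer(target, lineterminator='\n')
    header = next(reader)  # raises StopIteration if the input is empty
    writer.writerow([mapping.get(name, name) for name in header])
    writer.writerows(reader)  # data rows are streamed through unchanged


if __name__ == '__main__':
    source = io.StringIO(
        ',時間,開,高,低,收,量,額\n'
        '0,2002-10-30,83.98,92.38,82.0,83.52,352,29373370\n'
    )
    target = io.StringIO()
    rename_header(source, target, HEADER_MAP)
    print(target.getvalue())
```

For real files, source and target would be file objects opened with newline='' and encoding='utf-8'.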

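vaex ships with only a small number of built-in plotting methods, and the most commonly used one is the heatmap, so the gold price plot drawn here is also rendered as a heatmap. Still, the functionality is basically complete, and the performance is exceptionally strong.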

Summary

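In this article we introduced three Python libraries for processing tabular data: xlrd, pandas, and vaex, with particular emphasis on the performance of vaex and its value for big-data applications. With a few simple examples we got a first impression of what each library offers, so that the right one can be chosen for a given real-world scenario.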
