Some thoughts & notes for PyCon APAC 2015

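This was my second time attending PyCon.
Rather than being a useless extra on the Web team like last year,
I decided to just be a regular attendee this time,
though I still felt obliged to help out with a few simple tasks.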

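Compared with last time,
I've come to understand Python a bit better over the past year.
There's still plenty I'm lacking,
but at least this year
I could follow more of the talks.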

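This year's schedule
seemed to have more hardcore technical talks,
which looks like a good thing.
Otherwise, when people mention Python, they tend to think only of web frameworks like Django, Flask, Bottle, Tornado, ...
or scientific computing with NumPy and SciPy.
In security, too, more and more exploits are written in Python;
those talks are probably just less likely to show up at PyCon,
and end up at HITCON instead.
Still, this year's talks were definitely more diverse than last year's.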

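The talk that impressed me most this year was Scott Tsai's on debugging with Python hooks in GDB.
To my ear his English sounded just like a native speaker's: very fluent.
His live demo on stage was also very polished. Really impressive.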

Apart from the talks, what impressed me most was definitely the night-market party on the evening of the second day.

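The dinner buffet had a huge spread, so you could eat your fill.
On top of that there was a great wind band performance (Space Battleship Yamato, My Neighbor Totoro, ...),
other musical performances, and all sorts of activities run by the individual booths.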

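I got dragged to the Code Fight booth right at the start,
where dv and I were picked as guinea pigs for the first problem (and got photographed),
and then I stayed there playing the whole time, so Code Fight is pretty much all I remember.
Embarrassingly, neither dv nor I solved the first problem within the time limit, and we spent ages on it.
Using someone else's computer is really awkward!
OK, that's just a lame excuse; I need to go home and practice more.
But the event was a lot of fun: a crowd of geeky engineers solving problems on the spot and having a blast. I hope it comes back next year.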

code fight
source: https://www.facebook.com/photo.php?fbid=10206661768324121&set=a.1209323314476.2032355.1270526425&type=1

Here is the Code Fight GitHub repo (if you're interested, go contribute some problems XD)

ccwang002/pyapac_code_fighter · GitHub


The Badge System & Bingo impressed me second only to the Night Party.

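This year's badge was an RFID card,
and there were RFID readers built with Raspberry Pis at every sponsor booth and at the door of each meeting room.
Attendees could check in at sponsor booths.
On the one hand, this gives the sponsor your email, so they can send you ads or job openings later;
on the other hand, the organizers built a Badge System
in which checking in at each sponsor booth unlocks an achievement.
You could also check in at every talk,
and the number of achievements you unlocked determined how many squares you started with in the Bingo game on the last day.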

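Since I had unlocked quite a few achievements,
I was lucky enough to win a one-year PyCharm license key in the Bingo game on the last day,
even though I still use Vim day to day.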

pycharm

By the way, the PyCharm booth was giving away yo-yos, which brought back childhood memories.


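PyCon really has something different every year.
Besides the Night Party and Bingo mentioned above,
the venue decoration was great too.
There was a very long horizontal timeline poster
recording many major events related to Python,
with sticky notes and pens so attendees could leave comments on it.
I only took a photo of the PyCon 2015 part.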

poster

There was also this year's signboard of the time-traveling flying car,
for people to take photos with XD

signboard

The food was really great, too.

meal

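Since almost all of the staff have day jobs,
they volunteered in their evenings and on weekends.
Huge thanks to them for all the hard work of organizing it!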

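Oh, one more thing: the IRC channel was really dead this year,
so the organizers opened a Gitter room to replace IRC for chat.
If you want to read the chat log, have a look: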
https://gitter.im/pycontw/pyconapac2015

That's it for my impressions.
Below are the notes I took during the three days of talks, along with links to the Hackpad collaborative notes:



Day 1 - 2015/06/05 - Fri

PyCon APAC 2015 - Day 1 (June 5) - hackpad.com


Ecosystem Threats to Python - Van Lindberg

Python is 25 years old.

  • The Python Ecosystem
    • 2014 - programming language rankings, taken from Stack Overflow and GitHub
  • Perl vs Python - from 2004 - 2013
    • Perl down, Python up.
  • The transition from Python 2 to Python 3
  • Threats
    • Java
    • nodejs
    • go
  • I cannot interoperate
    • I see dead code.
    • Fortran, C, Cpp
  • Why Django sucks in DjangoCon
  • Let's talk about Java, Why Java?
    • The corporate machine
    • The JVM
    • The Apache Software Foundation
    • Android
    • Big Data
    • ..., Acceptable hackerness
  • Why JavaScript?
    • Ubiquity
    • Gmail
    • Chrome and Virtual Machine race
      • V8
    • JavaScript: The Good Parts
    • Evented / Async from the ground up
    • JSON
    • ..., Acceptable hackerness
  • Go
    • static binary
    • bottom up for multi-core use
  • Why Go?
    • Rob Pike and Google
    • Pragmatic, small, "fits your brain"
    • Channels and Goroutines
      • make use of all CPUs
      • right now in Python we don't have this feature.
        • Pypy
        • libsvm
      • Performance
        • Python was designed for ease of use, whereas Go focuses on multi-core and performance
    • High level datatypes
    • gofmt
    • go run
    • Deployment
    • Speed
  • Python
    • Let's talk about what we are doing wrong
    • Multi-core, package management
  • Why Python?
    • Broad Ecosystem
      • Ruby - tied to Rails
      • go - small group of UNIX servers
      • Python - hard to say where Python is, because it's everywhere.
        • Every movie, every single media stream today, uses Python.
        • Huge strike in the scientific computing
    • Pragmatic, "fits your brain"
    • Teaching language
    • The PyPy R&D division
    • Python 3
  • Python is not just a language, Python is the Python Community.

When programming functionally in Python - Apua Juan

  • Theory in Functional Programming
  • Generator Failure?
    • Range object vs Generator
    • These are two different things: a generator is used up once you have iterated over it
  • Coroutine
    • Example in Python Official Document
  • List comprehension from Haskell
    • lambda function
  • Python doesn't provide...
    • Algebraic Data Type? Recursive Data Type? Parametric or generic
      • Classes don't help in building ADTs
      • Metaclasses can, but only partially; you still have to define which classes are involved
  • Pattern Matching
    • _, a = abc(*data)
    • Support for this is not good enough
  • Type Class
    • Type Class is NOT "Class"
    • Another form of data abstraction
    • More abstract than ADT
    • It is the abc (Abstract Base Class) in Python
      • Django uses abc heavily
  • Monad
    • the Maybe type in Haskell
    • Container, Function for Container
    • Monadic
      • If you open a file in read mode and then write to that fd, the file should not be modified
  • Type System
    • Python use Duck Typing, so we basically don't use specific type.
    • Annotations for type checking in future Python versions
  • Generic Function
def fcn(a, b=None, *args, **kwargs): ...  
---  
def fcn(a, *, b=None, **kwargs): ...  
  • What Python lacks for Functional Programming
    • imperative vs declarative
    • interpreted vs compiled
    • dynamic typed vs static typed
    • Python basically leans toward the left side of each pair
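The "Generator Failure?" point (a `range` object is different from a generator) can be checked in a few lines of plain Python. This is a minimal sketch; the helper name `squares` is made up for illustration.

```python
def squares(n):
    """Hypothetical helper: a generator yielding 0, 1, 4, ..., (n-1)**2."""
    for i in range(n):
        yield i * i

r = range(4)
g = squares(4)

# A range object is a re-iterable sequence: every pass starts from the beginning.
first_r, second_r = list(r), list(r)

# A generator is a one-shot iterator: the second pass finds it already exhausted.
first_g, second_g = list(g), list(g)

print(first_r, second_r)  # [0, 1, 2, 3] [0, 1, 2, 3]
print(first_g, second_g)  # [0, 1, 4, 9] []
```

If you need to iterate twice, either materialize the values with `list()` or call the generator function again to get a fresh generator.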

Python Debugger Uncovered - Dmitry Trofimov

  • about this talk
    • how to trace Python programs
    • show implementation of a Python Debugger
    • lots of code
    • PyDev
  • Python debuggers
    • Implemented in Python
      • pdb, PyCharm, Pydev
      • platform independent: CPython, Jython, PyPy, IronPython
      • Can be broken by user code (can be prevented by tricky fixes)
    • Implemented in C
      • winpdb, Wing, gdb(with Python mappings)
  • Tracing Python code with Python
    • sys.settrace(tracefunc)
      • events: call, line, return, exception (c_call, c_return and c_exception are delivered to sys.setprofile instead)
    • Simple Trace Function
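import sys

def tracefunc(frame, event, arg):
    # Called for each trace event: print the event type and the current line number
    print("%s on #%d" % (event, frame.f_lineno))
    return tracefunc  # returning itself keeps tracing inside this frame

sys.settrace(tracefunc)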
  • Let's make simple Python debugger
    • Console Debugger
    • Visual Debugger
  • Protocol
    • every message is a line
  • Command Types
    • Set Breakpoint
    • Resume
    • Get Threads
    • Get Frame
    • Evaluate Expression
  • IDE
    • Creates server socket
    • Launches a script being debugged with a command
  • Debugger Main Code
  • Demo
  • Important Features
    • Conditional Breakpoints
    • Exception Breakpoint
    • Step Over / Step Into / Run to Line
    • Python 2.4 to Python 3.4
  • https://wiki.python.org/moin/PythonDebuggers
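Building on `sys.settrace`, the core of a line breakpoint is a trace function that compares each `line` event against a set of (filename, line) pairs. The sketch below only illustrates that idea and is not how PyDev or PyCharm actually implement it: `BreakpointTracer`, `on_hit` and `accumulate` are hypothetical names, and instead of pausing and waiting for IDE commands it just records the local variables at each hit.

```python
import sys

class BreakpointTracer:
    """Hypothetical minimal tracer: records locals whenever a breakpoint line runs."""

    def __init__(self, breakpoints):
        # breakpoints: set of (co_filename, line_number) pairs
        self.breakpoints = set(breakpoints)
        self.hits = []

    def trace(self, frame, event, arg):
        # Global trace function: called on 'call'; returning self.trace installs it
        # as the local trace function, so we also receive 'line' events.
        if event == "line":
            if (frame.f_code.co_filename, frame.f_lineno) in self.breakpoints:
                self.on_hit(frame)
        return self.trace

    def on_hit(self, frame):
        # A real debugger would block here and serve commands (resume, step,
        # evaluate expression); this sketch only snapshots the locals.
        self.hits.append((frame.f_lineno, dict(frame.f_locals)))

    def run(self, func, *args):
        sys.settrace(self.trace)
        try:
            return func(*args)
        finally:
            sys.settrace(None)


def accumulate(values):
    total = 0
    for v in values:
        total += v  # breakpoint target: 3 lines below the def
    return total

bp_line = accumulate.__code__.co_firstlineno + 3
tracer = BreakpointTracer({(accumulate.__code__.co_filename, bp_line)})
result = tracer.run(accumulate, [1, 2, 3])
for line, local_vars in tracer.hits:
    print(line, local_vars["total"], local_vars["v"])
```

Conditional breakpoints from the feature list would fit naturally into `trace`: evaluate a condition against `frame.f_locals` before calling `on_hit`.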

Programmatic Debugging with GDB and Python - Scott Tsai

  • debug C / C++ code with Python
  • GDB
    • set a Conditional Breakpoint
    • set a Breakpoint that only triggers for a specific thread
    • Debug multiple processes
  • Getting a Python Prompt in GDB
$ gdb  
(gdb) python-interactive  
  • GDB Embedding IPython

(In some .py file)

import IPython  
IPython.embed_kernel()  

(In some shell)

$ gdb -x gdb-ipython.py  
$ ipython3 console kernel-xxx.json  
$ gdb -q /bin/true  
(gdb) start  
(gdb) python-interactive  
  • How Source Level Debugging works?
    • gcc -g
    • eu-strip -f xxx.debug xxx
      • eu-strip - split debug info
  • Do my binaries have debug info?
    • Look for the .debug_info section
    • .gnu_debuglink
  • DWARF ELF Sections
    • .debug_abbrev
    • .debug_info
    • pyelftools
      • pip install pyelftools
  • Linux Distros Provide Debug Info
    • (Fedora, Red Hat)
      • debuginfo-install $PACKAGENAME
  • CPython and Numpy have debug info, but libz.so and other external libraries don't.
  • Debug multiprocess with gdb
set detach-on-fork off  
set target-async on
set pagination off  
add-inferior  
...  
  • Debug Optimized Code
    • use gdb to alter the control flow
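The "Do my binaries have debug info?" check comes down to listing ELF section names and looking for `.debug_info` (or a `.gnu_debuglink` pointing at a separate debug file). The talk used pyelftools; the sketch below does the same with only the standard `struct` module so it has no dependencies. It is a simplified parser under stated assumptions: it reads the section header table directly and ignores edge cases such as extended section numbering. The function names are hypothetical.

```python
import struct

def elf_section_names(data: bytes) -> list:
    """Return the section names of an ELF image given as raw bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big endian

    # e_shoff, e_shentsize, e_shnum, e_shstrndx live at class-specific offsets
    if is_64:
        (shoff,) = struct.unpack_from(end + "Q", data, 0x28)
        shentsize, shnum, shstrndx = struct.unpack_from(end + "HHH", data, 0x3A)
    else:
        (shoff,) = struct.unpack_from(end + "I", data, 0x20)
        shentsize, shnum, shstrndx = struct.unpack_from(end + "HHH", data, 0x2E)
    if shoff == 0 or shnum == 0:
        return []  # no section header table at all

    def offset_and_size(index):
        # sh_offset and sh_size are adjacent fields in each section header
        base = shoff + index * shentsize
        if is_64:
            return struct.unpack_from(end + "QQ", data, base + 0x18)
        return struct.unpack_from(end + "II", data, base + 0x10)

    # The section-name string table is itself a section (index e_shstrndx).
    str_off, str_size = offset_and_size(shstrndx)
    strtab = data[str_off:str_off + str_size]

    names = []
    for i in range(shnum):
        (name_off,) = struct.unpack_from(end + "I", data, shoff + i * shentsize)
        name_end = strtab.index(b"\x00", name_off)
        names.append(strtab[name_off:name_end].decode("ascii", "replace"))
    return names

def has_debug_info(path: str) -> bool:
    """True if the file carries an embedded .debug_info section."""
    with open(path, "rb") as f:
        return ".debug_info" in elf_section_names(f.read())
```

For example, `has_debug_info("/usr/bin/python3")` on a typical distribution will likely return `False`, since packaged binaries are stripped and their debug info ships separately (hence `debuginfo-install`).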

The Future of GUI Programming with Python - Tzu-ping Chung

$ pip install toga-demo  
$ toga-demo  
  • Difficulties
    • API Design
    • Platform
  • Mobile Problem
  • Mobile Support
    • C API Availability
  • Activity Stack / Fragment

RPyScan

A DIY 3D body scanner built with Raspberry Pi + Python, because buying an off-the-shelf one is too expensive.


MMO Server Design with Twisted.py - Dan Maas

// This foreign speaker's Mandarin was remarkably standard

  • SPINPUNCH
  • THUNDER RUN
  • Topics
    • System Architecture
    • How to write asynchronous HTTP server with Twisted
    • How to profile an asynchronous server
  • Game = Engine + Game Data + Art
    • Engine: Server, Client, Analytics
    • Game Data: Units, buildings, items
    • Art: Images, Sounds
  • Engine
    • Client / Server "web app"
    • Server: Python
    • Client: JavaScript / HTML5 Canvas
  • Won't be mentioned today
    • Analytics system (SQL, map/reduce)
    • Gamedata build pipeline
    • Art build pipeline
  • Server
    • Client sends requests (by HTTP) to run game actions
      • "Upgrade this building"
      • "Produce this unit"
      • "Buy this thing in the Store"
    • Check requirements, if OK, then mutate player state, send reply
  • Server Design Requirements
    • High Scale
      • 20,000+ daily players
      • 2,000+ concurrent players
    • Low latency
      • must not exceed 1xx ms
  • Server Implementation
    • Python
    • Twisted Asynchronous HTTP server
    • Cluster of processes (on Amazon EC2)
    • Support ~ 100 online players per CPU
      • Scaling by adding cores
  • What is Twisted?
    • Network library
    • Asynchronous event loop, like NGINX
    • Supports many internet protocols
      • HTTP
      • SSH
      • FTP
      • SMTP
    • Consistent Python API, not very easy but really consistent
    • Easy to extend with custom classes
  • Asynchronous server
  • use both synchronous and asynchronous code
    • not every computation is suited to asynchronous code
    • synchronous code is easier to write
    • synchronous (99%)
      • fast: 1 - 100 ms
        • attack
    • asynchronous (1%)
      • slow: 100ms - 10 seconds
        • Reading/writing Amazon S3 on login/logout
        • Querying Facebook API
        • Top scores database query
  • Write an asynchronous http server by using Twisted
    • reactor
    • twisted.web.server.NOT_DONE_YET
    • request.write("Hello")
    • request.finish()
  • How to connect "before" and "after"
  • inlineCallbacks (decorator) in Twisted
  • Collect data on each request
    • Average latency (performance hotspot)
  • Watch total "unhalted" time
    • What % of the time the CPU is waiting for the next request?
    • 50% = danger
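The split between fast synchronous game actions and the rare slow calls that must not block the event loop can be sketched with the standard library's `asyncio` standing in for Twisted. In Twisted itself, the slow path would be a resource whose `render_GET` returns `NOT_DONE_YET` (from `twisted.web.server`) and later calls `request.write()` and `request.finish()`, or a generator decorated with `inlineCallbacks` that yields Deferreds. Everything below is hypothetical: the handler names, the 50 ms `asyncio.sleep` standing in for an S3 read, and the per-handler latency bookkeeping that mirrors the "collect data on each request" idea.

```python
import asyncio
import functools
import time
from collections import defaultdict

# Per-handler latency samples in seconds ("collect data on each request").
latencies = defaultdict(list)

def timed(func):
    """Record how long each call of a synchronous handler takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            latencies[func.__name__].append(time.perf_counter() - start)
    return wrapper

@timed
def handle_attack(state, damage):
    # Fast path (the ~99% case): check requirements, mutate state, reply at once.
    if state["hp"] <= 0:
        return {"ok": False, "reason": "already defeated"}
    state["hp"] -= damage
    return {"ok": True, "hp": state["hp"]}

async def load_profile(player_id):
    # Slow-path stand-in for a remote read that takes ~50 ms.
    await asyncio.sleep(0.05)
    return {"player": player_id}

async def handle_login(player_id):
    # The coroutine suspends at "await", so the event loop keeps serving
    # fast requests from other players in the meantime.
    profile = await load_profile(player_id)
    return {"ok": True, "profile": profile}

async def main():
    state = {"hp": 100}
    login = asyncio.create_task(handle_login("alice"))
    await asyncio.sleep(0)  # let the login task start and reach its first await
    replies = [handle_attack(state, 10) for _ in range(3)]
    login_reply = await login
    return replies, login_reply

replies, login_reply = asyncio.run(main())
samples = latencies["handle_attack"]
avg_ms = 1000 * sum(samples) / len(samples)
print(f"average handle_attack latency: {avg_ms:.3f} ms")
```

The three attacks finish while the login is still waiting on its simulated I/O, which is the whole point of keeping slow calls asynchronous in a single-threaded server.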

Day 2 - 2015/06/06 - Sat

PyCon APAC 2015 - Day 2 (June 6) - hackpad.com


GIL - Tzung-Bi Shih

https://github.com/penvirus/gil1

  • Introduction
    • Global Interpreter Lock
      • Giant Lock
    • GIL in CPython protects
      • Interpreter state, thread state, ...
      • reference count
      • "a Guarantee"
    • Other implementations
      • fine-grained Lock
        • split the lock into smaller locks
      • lock-free
        • No lock needed, in two ways:
          • algorithm
          • atomic low-level operations
        • offload the lock to the layer below
    • The GIL is easy to implement
  • GIL over multi-processor
    • Want to produce efficient program.
  • Should the GIL exist? Does it need to be removed?
    • Problems
      • Too much legacy stuff to deal with
      • "A man's promise can't be changed!"
  • Brainless Solution for multi-process

    • Embarrassingly parallel
      • no dependency between those parallel tasks
    • IPC-required parallel tasks
      • share states with other peers
      • the most costly overhead of the GIL battle
    • Example
      • multiprocessing
        • process pool
        • nondeterministic
          • the same input, different output.
        • further observations
          • workers are forked when initializing the pool, they share the same memory copy
      • pp (parallel python) remote node
        • ppserver.py -v 1 -p 10000 &
  • Release the GIL

    • Examples
      • ctypes
        • thread for GIL battle
      • Python / C extension
        • linking to the busy.so extension
        • When it comes to C thread vs Python thread battle, C thread always win.
  • Cooperative Multitasking
    • Only applicable to IO-bound tasks
    • Single process, single thread
      • no other thread, no GIL battle
    • Executing the code when exactly needed
    • Examples:
      • generator (too geeky, easy to get wrong)
      • pyev (recommended)
        • links to libev
        • in practice you use an io watcher
        • further observations
          • not assigning the watcher to any variable, or reusing the same variable, causes a segmentation fault
      • gevent (recommended)
  • Interpreter as an Instance (rough idea)
    • C program, single process, multi-thread
      • still can share states with relatively low penalty
    • Allocate memory space for interpreter context
      • that is, accept an address to put instance context in Py_Initialize()
  • Conclusion
    • How to live along with GIL well?
      • Multi-process
      • Release the GIL
      • Cooperative Multitasking
        • for IO-bound solution
      • Perhaps, Interpreter as an Instance
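The "Cooperative Multitasking" option (single process, single thread, each task hands control back when it would otherwise wait) can be illustrated with the generator approach the speaker called too geeky. This hypothetical round-robin scheduler simply rotates through generators; real libraries such as gevent or pyev instead resume a task when its I/O becomes ready.

```python
from collections import deque

def task(name, steps, log):
    """Hypothetical task: does one unit of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # hand control back to the scheduler

def run_round_robin(tasks):
    """Resume each generator in turn until all of them are exhausted."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # task finished; drop it from the queue
        ready.append(current)

log = []
run_round_robin([task("a", 3, log), task("b", 2, log)])
print(log)  # ['a0', 'b0', 'a1', 'b1', 'a2']
```

Because only one thread ever runs, there is no GIL contention at all; the price is that a task which never yields starves everyone else.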

Python & LLVM - 李楓


Python Lets Your Eyes See - Yu-Chi Lin

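  • McGurk effect
  • Where does the data come from?
    • English corpora: AVleter, CUAVE, OuluVS, IBMSR, IBMIH
    • Chinese corpora: none could be found, so we built our own database
  • Building our own database
    • Subjects read prearranged material aloud in random order while being video- and audio-recorded.
  • OpenCV
  • Research framework
    • Face detection => mouth detection => compute the change between adjacent frames to segment syllables
    • Audio signal analysis => segment syllables by waveform energy
    • Combine the two, with audio as the primary signal and video as secondary, to segment syllables correctly
  • Detection
  • What if you want to train your own classifier?
  • How do you judge whether the computer segmented syllables well? => it still takes human judgment
    • Audio-based segmentation + manual segmentation => positions closest to the true syllable boundaries
    • Other syllable segmentation methods
      • Compute from the area of the mouth opening
        • Distinguish lips from skin by color and compute the opening size from that
        • HSV colorspace
        • Outline the outer contour of the lips and compute its area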
  • OpenCV in Machine Learning
    • scikit-learn
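The audio half of the pipeline, segmenting syllables by waveform energy, can be sketched as short-time energy thresholding: split the signal into fixed-length frames, compute each frame's mean squared amplitude, and report runs of frames above a threshold. This is a generic technique, not necessarily the speaker's exact method; the frame length and threshold are assumptions you would tune (or replace with an adaptive threshold) for real recordings.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]

def energy_segments(samples, frame_len, threshold):
    """Return (start, end) sample indices of runs of frames above threshold."""
    segments, start = [], None
    energies = frame_energies(samples, frame_len)
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i * frame_len                    # a loud run begins
        elif energy <= threshold and start is not None:
            segments.append((start, i * frame_len))  # the run ends
            start = None
    if start is not None:                            # signal ended mid-run
        segments.append((start, len(energies) * frame_len))
    return segments

# Synthetic example: silence, a loud "syllable", silence, a quieter one.
signal = [0.0] * 100 + [1.0] * 50 + [0.0] * 100 + [0.5] * 50
print(energy_segments(signal, frame_len=10, threshold=0.1))  # [(100, 150), (250, 300)]
```

In the talk's setup, boundaries found this way would then be cross-checked against the mouth-movement signal from the video.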

A System for Building Text-Audio Synchronized Audiobooks Based on Cloud Speech Synthesis - Chao-Ka Chang

  • google TTS
  • Python MTK

Turtle Graphics


LT

  • Web crawlers
    • urllib2
    • pycurl
    • selenium
    • virtkey, pytesser
    • use thread
    • Getting banned
      • Sleep is king: sleep between requests
      • proxy
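The two anti-ban tips (sleep between requests, and route traffic through a proxy) might look like this with Python 3's `urllib.request`, the successor of `urllib2`. The helpers `make_opener` and `polite_fetch_all` are hypothetical; `fetch` and `sleep` are injectable so the throttling logic can be checked without network access, and the proxy URL in the commented usage line is a placeholder.

```python
import time
import urllib.request

def make_opener(proxy_url=None):
    """Build an opener that optionally routes HTTP(S) traffic through a proxy."""
    handlers = []
    if proxy_url:
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}))
    return urllib.request.build_opener(*handlers)

def polite_fetch_all(urls, fetch, delay=1.0, sleep=time.sleep):
    """Fetch URLs one at a time, sleeping `delay` seconds between requests."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # back off so the site is less likely to ban us
        results[url] = fetch(url)
    return results

# Real use (needs network access), for example:
# opener = make_opener("http://proxy.example:8080")
# pages = polite_fetch_all(urls, lambda u: opener.open(u, timeout=10).read(), delay=2.0)
```

Keeping the delay logic separate from the HTTP client also makes it easy to swap in pycurl or selenium without touching the throttling.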

Day 3 - 2015/06/07 - Sun

PyCon APAC 2015 - Day 3 (June 7) - hackpad.com


- Andy Terrel

  • The Fundamental Physics
    • Moving / Copying data is more expensive than computation
  • Business Data Processing
  • Scientific Data Processing
  • "Data Has Mass"
  • Data Gravity
  • Memory Matters
    • 1980s
    • 90s - 00s
      • L1, L2
    • 2010s
      • L3
      • SSD
  • Speed Matters
  • "Data Scientist" Dilemma
    • Massive data to deal with (must bring code to data)
    • Cacophony of tools, data-bases, and products to integrate
    • Modern hardware is tempting to use but mostly sits idle (GPUs, data-center clusters)
    • Huge
  • Why Python --- Spectrum
    • Occasional
      • Cut and Paste
    • Scientist Developer
      • Extend frameworks
    • Developer
      • Create frameworks
    • Unique aspect of Python
  • Architecting for Data
    • Data exploration as the central task.
    • Data visualization as a first-
  • Building Exploratory Data Platforms
    • Environments
      • Wakari
      • Anaconda
    • Analytics
      • Blaze
      • Numba
    • Visualization
      • Bokeh
  • Our Position
    • No one-size-fits-all point-and-click application is enough to solve business problems.
    • A language-based platform is needed. ...
  • Data Science Discovery Process
    • Data Acquisition - Blaze
    • Data Preparation - Anaconda server
    • Data Analysis - IPython Notebook
    • Data Interpretation - Bokeh
  • Important Pieces of the Platform
    • Anaconda - easy to install, plus lots of libraries
  • Building a better PyData Ecosystem
  • Open Source Technology
    • Blaze
      • Array URLs and compute servers for breaking down data-silos.
    • Bokeh
      • Interactive Visualization in the Browser for Python (and other languages) of large data.
    • Numba
      • Optimizing compiler for a subset of Python which allows multi-core, multi-process and basic CPU support
  • Data Pain
    • Dealing with data applications has numerous pain points
    • Hundreds of data formats
    • Basic programs expect all data to fit in memory
    • Data analysis pipelines constantly changing from one form to another
    • ...
    • ...
  • Blaze
    • NEED, TOOL, CAPABILITY
    • Composition
      • Distributed Systems
        • spark
      • Scientific Computing
        • HDFS
        • bcolz
      • BI - DB
        • mongo
      • DM/Stats/ML
    • Abstract expressions, Data Storage, Computational Backend
    • Architecture
      • Flexible
      • Use compilation of deferred expressions to optimize data interactions
    • Dask
    • ODO
      • Shapeshifting for your data
        • odo(source, target)
  • Numba
    • JIT, Dynamic compiler for Python
    • Optimize data-parallel computations at call time, to take advantage of local hardware configuration
    • Compatible with C, C++, Fortran
    • C++, C, Fortran, Python => LLVM IR => x86, ARM, PTX
  • Data Visualization
    • Bokeh
      • https://github.com/bokeh/bokeh
      • Interactive
      • Novel graphics
      • Streaming, dynamic, large data
      • For the browser, with or without a server
      • Matplotlib compatibility
      • No need to write Jade
      • No JavaScript
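On the "basic programs expect all data to fit in memory" pain point: tools like Blaze and Dask push computation into chunked or out-of-core backends. As a much smaller illustration of the same idea (this is not Blaze's API), a streaming aggregate consumes one row at a time so memory use stays constant regardless of file size. The CSV layout, column name and sample values are made up.

```python
import csv
import io

def streaming_mean(rows, column):
    """Mean of one numeric column, consuming rows lazily (constant memory)."""
    count, total = 0, 0.0
    for row in rows:
        total += float(row[column])
        count += 1
    if count == 0:
        raise ValueError("no rows")
    return total / count

def column_mean_from_csv(fileobj, column):
    # csv.DictReader is an iterator: rows are parsed one at a time,
    # so the whole file is never loaded into memory.
    return streaming_mean(csv.DictReader(fileobj), column)

data = io.StringIO("city,temp\nTaipei,30\nTainan,32\nKeelung,25\n")
print(column_mean_from_csv(data, "temp"))  # 29.0
```

With a real file you would pass `open(path, newline="")` instead of the `StringIO` object.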

Machine learning in Finance using Python - Eric Tham

http://www.slideshare.net/erictham/machine-learning-in-finance-using-python

  • Introduction
    • Pattern recognition, algorithm, data, prediction
  • What is Machine Learning
  • Machine Learning in Finance
    • Sentiment Analysis (Behavioural finance)
    • Credit analytics
    • Financial forecasting
      • Technical transformation on Data
    • Portfolio allocation
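"Technical transformation on data" usually means deriving features such as returns and moving averages from a price series before feeding them to a model. The talk's slides are the authoritative source; the sketch below only shows two common transformations on a made-up price list, with hypothetical function names.

```python
def simple_returns(prices):
    """Period-over-period returns: r_t = p_t / p_{t-1} - 1."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]

def moving_average(values, window):
    """Trailing simple moving average; one value per complete window."""
    if window <= 0:
        raise ValueError("window must be positive")
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out

prices = [100.0, 102.0, 101.0, 105.0, 107.0]  # made-up closing prices
# Align features on the same days: the 3-day MA starts at index 2, returns at index 1.
features = list(zip(moving_average(prices, 3), simple_returns(prices)[1:]))
print(features)
```

Each tuple in `features` is one training row that a scikit-learn style model could consume.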



Donation


If this blog post happens to be helpful to you, besides leaving a reply, you may consider buying me a cup of coffee to support me. Whatever the amount, it would help me write more articles that are helpful to you in the future, and I would really appreciate it.

