WordPress error 500 & the WordPress path

usually this just means a plugin crashed

go to the WordPress install path

/opt/bitnami/apps/wordpress/htdocs   (Bitnami; plugins live under wp-content/plugins)

/var/www/html/wp-content/plugins   (standard install)

find the plugins folder and rename the folder of any plugin you suspect; renaming its directory deactivates it
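
for example (the plugin folder name here is hypothetical):

$ cd /var/www/html/wp-content/plugins
$ mv some-plugin some-plugin.disabled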

go back to your website; if the error is gone, rename the folders back one at a time to find the culprit


change WordPress files deployed by Bitnami on AWS

find the path of WordPress:

/opt/bitnami/apps/wordpress/htdocs

examine the file structure (apt-get install tree first, then run tree -L 1):

├── index.php
├── license.txt
├── readme.html
├── wp-activate.php
├── wp-admin
├── wp-blog-header.php
├── wp-comments-post.php
├── wp-config.php
├── wp-config-sample.php
├── wp-content
├── wp-cron.php
├── wp-includes
├── wp-links-opml.php
├── wp-load.php
├── wp-login.php
├── wp-mail.php
├── wp-settings.php
├── wp-signup.php
├── wp-trackback.php
└── xmlrpc.php

the wp-content folder contains the themes and plugins that can be modified.


Bitnami AWS server MySQL config

if you launch a Bitnami WordPress server on AWS,

its MySQL, Apache, and PHP-FPM are stored in different locations compared to an apt-get install,

and they are monitored and managed by Bitnami's control script.

so, go to

/opt/bitnami

./ctlscript.sh          (run with no arguments to print usage)

./ctlscript.sh status   (show the state of each service)

if you want to restart mysql

./ctlscript.sh restart mysql

Or

use the mysqladmin command

$ mysqladmin
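
e.g. check the server status (you will be prompted for the root password):

$ mysqladmin -u root -p status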

MySQL add user privilege

Linux MySQL config: allow remote connections

go to /etc/mysql/mysql.conf.d/

$ vim mysqld.cnf

comment out

bind-address = 127.0.0.1

to

# bind-address = 127.0.0.1
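
then restart MySQL so the change takes effect:

$ sudo service mysql restart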

 

log in to mysql

$ mysql -u root -p        (add -h localhost to be explicit about the host)
Enter password:

if you want to create a user fun with password love who can access the database from anywhere:

> create user 'fun'@'%' identified by 'love';

then

> grant all privileges on *.* to 'fun'@'%' with grant option;

>flush privileges;

 

mysql> GRANT ALL ON *.* TO 'root'@'192.168.1.4' IDENTIFIED BY 'your-root-password';

(the same grant, but restricted to a single client IP)

if you forget the MySQL root password

go to MySQL's config directory,

e.g.

/etc/mysql/mysql.conf.d/

or if you are using bitnami

/opt/bitnami/mysql/

vim mysqld.cnf   (my.cnf on Bitnami)

under [mysqld]

add

skip-grant-tables

Now you can log in to MySQL without a password and change the root password, e.g.

mysql> use mysql;

mysql>update user set password=password('newpassword') where user= 'root';

mysql>flush privileges;

difficulty encountered

in Bitnami's MySQL, the user table does NOT have a 'password' column; instead it has 'authentication_string' for the password, so:

mysql>update user set authentication_string=password('newpassword') where user= 'root';

mysql>flush privileges;

password('string') is a function that hashes the string into a safer representation.

then go back to the MySQL config file, delete skip-grant-tables,

and restart MySQL

 

pandas read_csv() parameters explained (translated from Chinese)

pandas.read_csv parameter summary
Reads a CSV (comma-separated) file into a DataFrame.
Also supports partial file loading and selective iteration.
Parameters:
filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)
Can be a URL; valid URL schemes include http, ftp, s3, and file. Support for multiple files is in the works.
Local file example: file://localhost/path/to/table.csv
sep : str, default ','
Delimiter to use. If not specified, comma is tried. Separators longer than one character (and different from '\s+') are interpreted as regular expressions and force the Python parser; note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'
delimiter : str, default None
Alternative delimiter; if specified, the sep parameter is ignored.
delim_whitespace : boolean, default False
Whether to use whitespace (e.g. ' ' or '\t') as the separator, equivalent to setting sep='\s+'. If set to True, the delimiter parameter is ignored.
New in version 0.18.1.
header : int or list of ints, default 'infer'
Row number to use as the column names and the start of the data. Defaults to 0 if the file has column names; set it to None if it does not. Explicitly passing header=0 replaces any existing column names. header can also be a list, e.g. [0,1,3]: those rows become the column headers (meaning each column has multiple header levels) and the rows in between are skipped (in this example row 2; file lines 1, 2, and 4 become multi-level headers, line 3 is discarded, and the DataFrame's data starts at line 5).
Note: if skip_blank_lines=True, header ignores comment lines and blank lines, so header=0 means the first line of data rather than the first line of the file.
names : array-like, default None
List of column names to use for the result; if the data file has no header row, you also need header=None. Duplicates in this list are not allowed unless mangle_dupe_cols=True.
index_col : int or sequence or False, default None
Column number or name to use as the row index; a sequence gives multiple index levels. If the file is malformed with a delimiter at the end of each line, set index_col=False so pandas does not use the first column as the index.
usecols : array-like, default None
Return a subset of the columns. Values must correspond to positions in the file (ints) or to column names (strings), e.g. [0,1,2] or ['foo', 'bar', 'baz']. Using this parameter speeds up loading and reduces memory use.
as_recarray : boolean, default False
Deprecated: this parameter will be removed in a future version; use pd.read_csv(...).to_records() instead.
Returns a NumPy recarray instead of a DataFrame. If True, it takes precedence over the squeeze parameter; the row index is unavailable and index columns are ignored.
squeeze : boolean, default False
If the file contains only one column, return a Series.
prefix : str, default None
Prefix to add to column numbers when there is no header, e.g. 'X' becomes X0, X1, ...
mangle_dupe_cols : boolean, default True
Duplicate columns 'X'...'X' become 'X.0'...'X.N'. If set to False, duplicate names overwrite each other.
dtype : Type name or dict of column -> type, default None
Data type for each column, e.g. {'a': np.float64, 'b': np.int32}
engine : {'c', 'python'}, optional
Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.
converters : dict, default None
Dict of functions for converting values in certain columns; keys can be column names or column indices.
true_values : list, default None
Values to consider as True
false_values : list, default None
Values to consider as False
skipinitialspace : boolean, default False
Skip whitespace after the delimiter (default False, i.e. do not skip).
skiprows : list-like or integer, default None
Number of lines to skip at the start of the file, or a list of line numbers to skip (0-indexed).
skipfooter : int, default 0
Number of lines to skip at the bottom of the file. (Not supported by the C engine.)
skip_footer : int, default 0
Deprecated: use skipfooter instead; same behavior.
nrows : int, default None
Number of rows to read (counted from the start of the file).
na_values : scalar, str, list-like, or dict, default None
Additional values to recognize as NA/NaN. If a dict is passed, it specifies per-column NA values. By default the following are treated as NaN: '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan'.
keep_default_na : bool, default True
If na_values is specified and keep_default_na=False, the default NaN values are overridden; otherwise they are extended.
na_filter : boolean, default True
Whether to detect missing values (empty strings or NA values). For a large file with no missing values, setting na_filter=False can speed up reading.
verbose : boolean, default False
Print extra parser output, e.g. the number of NA values in non-numeric columns.
skip_blank_lines : boolean, default True
If True, skip blank lines rather than recording them as NaN.
parse_dates : boolean or list of ints or names or list of lists or dict, default False
  • boolean. True -> parse the index
  • list of ints or names, e.g. [1, 2, 3] -> parse columns 1, 2, 3 each as a separate date column
  • list of lists, e.g. [[1, 3]] -> combine columns 1 and 3 and parse as a single date column
  • dict, e.g. {'foo' : [1, 3]} -> combine columns 1 and 3 and name the resulting column "foo"
infer_datetime_format : boolean, default False
If True and parse_dates is enabled, pandas attempts to infer the datetime format and, if it can, switches to a faster parsing method. In some cases this is 5-10x faster.
keep_date_col : boolean, default False
If combining multiple columns to parse dates, keep the original columns. Default False.
date_parser : function, default None
Function to use for parsing dates; defaults to dateutil.parser.parser. Pandas tries three different ways of calling it, advancing to the next on failure:
1. pass one or more arrays (as specified by parse_dates) as arguments;
2. concatenate the string values of the specified columns into a single array and pass that;
3. call date_parser once per row with one or more strings (as specified by parse_dates) as arguments.
dayfirst : boolean, default False
DD/MM format dates.
iterator : boolean, default False
Return a TextFileReader object for iterating through the file in chunks.
chunksize : int, default None
Size of file chunks. See the IO Tools docs for more information on iterator and chunksize.
compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
Use a compressed file on disk directly. With 'infer', files whose names end in '.gz', '.bz2', '.zip', or '.xz' are decompressed with gzip, bz2, zip, or xz respectively; otherwise no decompression. A ZIP archive must contain exactly one data file. Set to None for no decompression.
New in version 0.18.1: support for zip and xz decompression.
thousands : str, default None
Thousands separator, e.g. ',' or '.'
decimal : str, default '.'
Character for the decimal point (e.g. ',' for European data).
float_precision : string, default None
Specifies which converter the C engine should use for floating-point values. The options are None for the ordinary converter, high for the high-precision converter, and round_trip for the round-trip converter.
lineterminator : str (length 1), default None
Line separator; C parser only.
quotechar : str (length 1), optional
Character used to mark the start and end of a quoted item; separators inside quotes are ignored.
quoting : int or csv.QUOTE_* instance, default 0
Controls the csv quoting behavior. One of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3)
doublequote : boolean, default True
When quotechar is specified and quoting is not QUOTE_NONE, two consecutive quotechar characters inside a field are interpreted as a single quotechar element.
escapechar : str (length 1), default None
Character used to escape the delimiter when quoting is QUOTE_NONE.
comment : str, default None
Indicates that the rest of a line should not be parsed. If the character appears at the start of a line, the whole line is ignored. This parameter must be a single character. Fully commented lines, like blank lines (with skip_blank_lines=True), are ignored by header and skiprows. For example, with comment='#', parsing '#empty\na,b,c\n1,2,3' with header=0 returns 'a,b,c' as the header.
encoding : str, default None
Character encoding, usually 'utf-8'. See the list of Python standard encodings.
dialect : str or csv.Dialect instance, default None
csv dialect to use; ignored if sep is longer than one character. See the csv.Dialect documentation for details.
tupleize_cols : boolean, default False
Leave a list of tuples on columns as is (default is to convert to a MultiIndex on the columns)
error_bad_lines : boolean, default True
Lines with too many fields raise an error by default and no DataFrame is returned; if set to False, these "bad lines" are dropped instead. (C parser only.)
warn_bad_lines : boolean, default True
If error_bad_lines=False and warn_bad_lines=True, a warning is printed for each "bad line". (C parser only.)
low_memory : boolean, default True
Internally process the file in chunks to lower memory use while parsing, at the risk of mixed-type inference. To ensure no mixed types, set low_memory=False, or specify the types with the dtype parameter. Note that the whole file is read into a single DataFrame regardless; use chunksize or iterator to actually get the data back in chunks. (C parser only.)
buffer_lines : int, default None
Deprecated: this parameter will be removed in a future version because its value is not respected by the parser.
compact_ints : boolean, default False
Deprecated: this parameter will be removed in a future version.
If compact_ints=True, any column consisting of integers is stored using the smallest integer dtype; whether it is signed depends on the use_unsigned parameter.
use_unsigned : boolean, default False
Deprecated: this parameter will be removed in a future version.
If integer columns are being compacted (i.e. compact_ints=True), specify whether the compacted columns are signed or unsigned.
memory_map : boolean, default False
If a file path is given, memory-map the file and access the data directly from there, avoiding any further I/O.
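
a quick sketch combining a few of these parameters (the file name and the 'date' column are hypothetical):

import pandas as pd

# read only the first 1000 rows, skip two known-bad lines,
# treat '?' as missing, and parse the 'date' column as datetimes
df = pd.read_csv('data.csv', sep=',', header=0, nrows=1000,
                 skiprows=[5, 6], na_values=['?'], parse_dates=['date'])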

python format print example

In [1]:
print 'before: {:.2f} after'.format(1.5555)
before: 1.56 after
In [2]:
print '{1},{0},{1},{2},{0}'.format('pos',777,True) 
777,pos,777,True,pos
In [3]:
print '{name},{age}'.format(age=18,name='cutie')  
cutie,18
In [4]:
has=['first', 2.00, 'third']
print '1st {0[0]} all: {0} last {0[2]} end'.format(has)
1st first all: ['first', 2.0, 'third'] last third end
In [5]:
print 'start--- {:,} ---end'.format(9876543210)
start--- 9,876,543,210 ---end
In [6]:
print 'start:{:>8}'.format(123)
start:     123
In [7]:
print 'start:{:0>8}'.format(123)
start:00000123
In [8]:
print 'start:{:A>8}'.format(123)
start:AAAAA123

understand axes in matplotlib

an axes is a sub-area within a figure() in matplotlib

figure()

plt.axes([0.3, 0.5, 0.4, 0.2])


the figure area is normalized to a 1 × 1 base; plt.axes takes [left, bottom, width, height] as fractions of it

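a minimal sketch of that placement (the numbers are [left, bottom, width, height] fractions of the figure):

import matplotlib.pyplot as plt

plt.figure()
# axes starting 30% from the left and 50% from the bottom,
# spanning 40% of the figure's width and 20% of its height
plt.axes([0.3, 0.5, 0.4, 0.2])
plt.plot([0, 1], [0, 1])
plt.show()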

In [2]:
import matplotlib.pyplot as plt
import numpy as np
In [6]:
physical_sciences=[ 13.8,  14.9,  14.8,  16.5,  18.2,  19.1,  20. ,  21.3,  22.5,
        23.7,  24.6,  25.7,  27.3,  27.6,  28. ,  27.5,  28.4,  30.4,
        29.7,  31.3,  31.6,  32.6,  32.6,  33.6,  34.8,  35.9,  37.3,
        38.3,  39.7,  40.2,  41. ,  42.2,  41.1,  41.7,  42.1,  41.6,
        40.8,  40.7,  40.7,  40.7,  40.2,  40.1];
computer_science=[ 13.6,  13.6,  14.9,  16.4,  18.9,  19.8,  23.9,  25.7,  28.1,
        30.2,  32.5,  34.8,  36.3,  37.1,  36.8,  35.7,  34.7,  32.4,
        30.8,  29.9,  29.4,  28.7,  28.2,  28.5,  28.5,  27.5,  27.1,
        26.8,  27. ,  28.1,  27.7,  27.6,  27. ,  25.1,  22.2,  20.6,
        18.6,  17.6,  17.8,  18.1,  17.6,  18.2]
year=[1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980,
       1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991,
       1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
       2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011]

two methods to give each line its own plot: plt.axes() and plt.subplot()

In [10]:
# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences, color='blue')

# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year, computer_science, color='red')

# Display the plot
plt.show()
In [8]:
# Create plot axes for the first line plot
plt.axes([0.05,0.05,0.425,0.9])

# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year,physical_sciences, color='blue')

# Create plot axes for the second line plot
plt.axes([.525,0.05,0.425,0.9])


# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year,computer_science, color='red')


# Display the plot
plt.show()
In [9]:
# Create a figure with 1x2 subplot and make the left subplot active
plt.subplot(1,2,1)

# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences, color='blue')
plt.title('Physical Sciences')

# Make the right subplot active in the current 1x2 subplot grid
plt.subplot(1,2,2)


# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year, computer_science, color='red')
plt.title('Computer Science')

# Use plt.tight_layout() to improve the spacing between subplots
plt.tight_layout()
plt.show()

Jupyter notebook into post

download the Jupyter notebook as HTML,

add that HTML source code to XYZ Html,

and embed it into the post, like:
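
downloading as HTML works from the notebook menu (File → Download as → HTML) or from the command line (notebook name hypothetical):

$ jupyter nbconvert --to html mynotebook.ipynb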

In [1]:
import numpy as np

from bokeh.plotting import figure, show, output_file
from bokeh.io import output_notebook

N = 4000
x = np.random.random(size=N) * 100
y = np.random.random(size=N) * 100
radii = np.random.random(size=N) * 1.5
colors = ["#%02x%02x%02x" % (int(r), int(g), 150) for r, g in zip(50+2*x, 30+2*y)]
In [2]:
output_notebook()
Loading BokehJS ...
In [3]:
p = figure()

p.scatter(x, y, radius=radii,
          fill_color=colors, fill_alpha=0.6,
          line_color=None)

# output_file("color_scatter.html", title="color_scatter.py example")

show(p)  # open a browser
In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from time import time

%matplotlib inline
plt.rcParams['figure.figsize'] = (1.5, 1.5) # set default size of plots
# plt.rcParams['image.interpolation'] = 'nearest'
# plt.rcParams['image.cmap'] = 'gray'
In [2]:
from sklearn.decomposition import PCA, RandomizedPCA, randomized_svd
from sklearn.cluster import KMeans
from sklearn.manifold import Isomap
from sklearn.model_selection import train_test_split, KFold
In [3]:
train=pd.read_csv('train.csv')
test=pd.read_csv('test.csv')
train.shape,test.shape
Out[3]:
((42000, 785), (28000, 784))
In [4]:
train.head(1)
Out[4]:
label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

1 rows × 785 columns

In [5]:
test.head(1)
Out[5]:
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

1 rows × 784 columns

In [9]:
label=train.pop('label')
In [7]:
def fuckpca(train,test,n):
    """Fit an n-component whitened PCA on train, then transform train and test."""
    start=time()
    pca=PCA(n_components=n,whiten=True)
    train=pca.fit_transform(train)
    test=pca.transform(test)
    print 'used {:.2f}s'.format(time()-start)
    return train,test
In [8]:
train_pca,test_pca=fuckpca(train,test,36)
used 4.97s
In [9]:
train_pca.shape,test_pca.shape
Out[9]:
((42000L, 36L), (28000L, 36L))
In [10]:
plt.imshow(train_pca[3].reshape(6,-1))
Out[10]:
<matplotlib.image.AxesImage at 0x1b99b358>
In [11]:
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
In [12]:
model=GradientBoostingClassifier(verbose=1,n_estimators=300)
model
Out[12]:
GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.1, loss='deviance', max_depth=3,
              max_features=None, max_leaf_nodes=None,
              min_impurity_split=1e-07, min_samples_leaf=1,
              min_samples_split=2, min_weight_fraction_leaf=0.0,
              n_estimators=300, presort='auto', random_state=None,
              subsample=1.0, verbose=1, warm_start=False)
In [99]:
start=time()
model.fit(train_pca,label)
print 'used {:.2f}s'.format(time()-start)
      Iter       Train Loss   Remaining Time 
         1       79940.9682            6.62m
         2       70830.7802            6.62m
         3       64173.7553            6.61m
         4       59037.7864            6.58m
         5       54599.3908            6.58m
         6       50886.2372            6.57m
         7       47761.6971            6.55m
         8       45040.9071            6.54m
         9       42543.2014            6.55m
        10       40319.8099            6.56m
        20       27069.1592            6.42m
        30       20786.5203            6.28m
        40       17051.2103            6.03m
        50       14576.1295            5.82m
        60       12778.5881            5.59m
        70       11447.6600            5.35m
        80       10375.5170            5.12m
        90        9518.4201            4.88m
       100        8816.3706            4.64m
       200        5027.3031            2.26m
       300        3355.2391            0.00s
used 405.63s
In [100]:
result=model.predict(test_pca)
In [13]:
def save():
    # note: relies on the global `result` predicted in an earlier cell
    import numpy as np
    submit=pd.DataFrame({'ImageId':np.arange(1,len(result)+1),'Label':result})
    submit.to_csv('gbc.csv',index=False)
In [14]:
# sub=pd.concat([pd.Series(np.arange(1,len(result)+1)),pd.Series(result)],axis=1)
# sub.columns=['ImageId','Label']
In [108]:
model.score(after,label)
Out[108]:
0.9878095238095238
In [115]:
tr=model.predict(after)
In [128]:
(tr==label).sum()/float(label.shape[0])
Out[128]:
0.9878095238095238
In [129]:
from sklearn.metrics import confusion_matrix
In [4]:
from sklearn.svm import SVC
In [5]:
svc=SVC(verbose=1)
svc
Out[5]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=1)
In [7]:
train_pca=train
test_pca=test
In [10]:
start=time()
svc.fit(train_pca,label)
print 'used {:.2f}s'.format(time()-start)
[LibSVM]used 4429.63s
In [19]:
result=svc.predict(test_pca)
In [21]:
save()
In [17]:
from sklearn import svm,datasets
from sklearn.model_selection import GridSearchCV
In [13]:
iris = datasets.load_iris()
iris.data[0:4]
Out[13]:
array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2]])
In [14]:
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 5, 10]}
In [15]:
model = svm.SVC()
In [18]:
classifier =GridSearchCV(model, parameters)
In [19]:
classifier.fit(iris.data, iris.target)
Out[19]:
GridSearchCV(cv=None, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'kernel': ('linear', 'rbf'), 'C': [1, 5, 10]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=0)
In [20]:
classifier.best_params_
Out[20]:
{'C': 1, 'kernel': 'linear'}
In [37]:
# classifier.cv_results_
In [38]:
classifier.best_estimator_
Out[38]:
SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
In [24]:
import scipy
In [33]:
print scipy.stats.expon(scale=100)
<scipy.stats._distn_infrastructure.rv_frozen object at 0x0000000008E57E10>
In [39]:
parameter_dist = {
  'C': scipy.stats.expon(scale=100),
  'kernel': ['linear'],
  'gamma': scipy.stats.expon(scale=.1),
}
In [119]:
from sklearn.model_selection import RandomizedSearchCV  # grid_search was never imported above
classifier = RandomizedSearchCV(model, parameter_dist)
classifier.fit(iris.data, iris.target)
Out[119]:
RandomizedSearchCV(cv=None, error_score='raise',
          estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
          fit_params={}, iid=True, n_iter=10, n_jobs=1,
          param_distributions={'kernel': ['linear'], 'C': <scipy.stats._distn_infrastructure.rv_frozen object at 0x0000000008E79FD0>, 'gamma': <scipy.stats._distn_infrastructure.rv_frozen object at 0x0000000008E70080>},
          pre_dispatch='2*n_jobs', random_state=None, refit=True,
          scoring=None, verbose=0)
In [120]:
classifier.best_params_, classifier.best_score_
Out[120]:
({'C': 1.3991944739478859, 'gamma': 0.0022802232812657304, 'kernel': 'linear'},
 0.9933333333333333)
In [63]:
wtf=scipy.stats.expon(scale=10)
In [66]:
print wtf.rvs(5)
[ 10.29516445   1.53962143   2.67578885   0.21101641   1.31133069]

Python odds and ends: scope, filter, reduce

description                 code                            comments
quickly assign values       a,b,c = (3,7,12)                unpack
nested functions            outer func returns inner func

Python's built-in scope

  • check out Python's built-in scope, which is really just a built-in module called builtins
  • to query builtins, you'll need to import builtins
In [16]:
import builtins
print dir(builtins)
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BufferError', 'BytesWarning', 'DeprecationWarning', 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'False', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'NameError', 'None', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'ReferenceError', 'RuntimeError', 'RuntimeWarning', 'StandardError', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'WindowsError', 'ZeroDivisionError', '__builtins__', '__doc__', '__file__', '__future_module__', '__name__', '__package__', '__path__', 'abs', 'absolute_import', 'all', 'any', 'apply', 'ascii', 'basestring', 'bin', 'bool', 'buffer', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'cmp', 'coerce', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'dreload', 'enumerate', 'eval', 'execfile', 'file', 'filter', 'float', 'format', 'frozenset', 'get_ipython', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'intern', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'long', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'range', 'raw_input', 'reduce', 'reload', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'sys', 'tuple', 'type', 'unichr', 'unicode', 'vars', 'xrange', 'zip']

nested functions

  • returns the inner function
In [22]:
def raise_val(n):
    
    """Return the inner function."""
    def inner(x):
        """Raise x to the power of n."""
        raised = x ** n
        return raised
    
    return inner
In [25]:
square = raise_val(2)
cube =  raise_val(3)

print square(6), cube(6)
36 216

pass parameters

In [30]:
raise_val(4)(3)
Out[30]:
81

scopes searched, in order (LEGB)

  • Local scope
  • Enclosing functions
  • Global
  • Built-in
In [44]:
n=3

def outer():
    """Prints the value of n."""
    n = 1
    def inner():
        n = 2
        print(n)
        
    inner()
    print(n)
In [45]:
outer()
2
1

nested functions and closures

  • nesting functions gives you a closure
  • this means that the nested (inner) function remembers the state of its enclosing scope when called
  • thus, anything defined locally in the enclosing scope is available to the inner function even after the outer function has finished executing
In [46]:
# Define echo
def echo(n):
    """Return the inner_echo function."""

    # Define inner_echo
    def inner_echo(word1):
        """Concatenate n copies of word1."""
        echo_word = word1 * n
        return echo_word

    # Return inner_echo
    return inner_echo

# Call echo: twice
twice = echo(2)

# Call echo: thrice
thrice = echo(3)

# Call twice() and thrice() then print
print(twice('hello'), thrice('hello'))
('hellohello', 'hellohellohello')
In [48]:
echo(7)('wtf ')
Out[48]:
'wtf wtf wtf wtf wtf wtf wtf '

flexible arguments

Function with variable-length arguments (*args)

In [51]:
# Define gibberish
def gibberish(*notmatter):
    """Concatenate strings in *args together."""

    # Initialize an empty string: hodgepodge
    hodgepodge = ''

    # Concatenate the strings in args
    for word in notmatter:
        hodgepodge += word+ ' '

    # Return hodgepodge
    return hodgepodge

# Call gibberish() with one string: one_word
one_word = gibberish("luke")

# Call gibberish() with five strings: many_words
many_words = gibberish("luke", "leia", "han", "obi", "darth")

# Print one_word and many_words
print(one_word)
print(many_words)
luke 
luke leia han obi darth 

Function with variable-length keyword arguments (**kwargs)

In [55]:
# Define report_status
def report_status(**whatevername):
    """Print out the status of a movie character."""

    print("\nBEGIN: REPORT\n")

    print whatevername
    print 
    # Print a formatted status report
    for key, value in whatevername.items():
        print(key + ": " + value)

    print("\nEND REPORT")

# First call to report_status()
report_status(name="luke", affiliation="jedi", status="missing")

# Second call to report_status()
report_status(name="anakin", affiliation="sith lord", status="deceased")
BEGIN: REPORT

{'status': 'missing', 'affiliation': 'jedi', 'name': 'luke'}

status: missing
affiliation: jedi
name: luke

END REPORT

BEGIN: REPORT

{'status': 'deceased', 'affiliation': 'sith lord', 'name': 'anakin'}

status: deceased
affiliation: sith lord
name: anakin

END REPORT

Map() and lambda functions

In [56]:
# Create a list of strings: spells
spells = ['protego', 'accio', 'expecto patronum', 'legilimens']

# Use map() to apply a lambda function over spells: shout_spells
shout_spells = map(lambda item: item + '!!!', spells)

# Convert shout_spells to a list: shout_spells_list
shout_spells_list = list(shout_spells)

# Convert shout_spells into a list and print it
print(shout_spells_list)
['protego!!!', 'accio!!!', 'expecto patronum!!!', 'legilimens!!!']

Filter() and lambda functions

The filter() function offers a way to filter out the elements of a list that don't satisfy certain criteria.

  • filter(function or None, sequence) -> list, tuple, or string

  • Return those items of sequence for which function(item) is true. If function is None, return the items that are true. If sequence is a tuple or string, return the same type, else return a list.

In [59]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Use filter() to apply a lambda function over fellowship: result
result = filter(lambda member: len(member) > 6, fellowship)


# Convert result to a list: result_list
result_list = list(result)

# Convert result into a list and print it
print(result_list)
['samwise', 'aragorn', 'legolas', 'boromir']
In [69]:
filter(lambda member: len(member) >3, ['1234','234','34567'])
Out[69]:
['1234', '34567']
In [83]:
filter(None, [12>1, 'wtf' if 2>1 else 0, 'aiya' if 3>2 else 7, 'momomo' if 4>5 else -44])
Out[83]:
[True, 'wtf', 'aiya', -44]

Reduce() and lambda functions

The reduce() function is useful for performing some computation on a list and, unlike map() and filter(), returns a single value as a result.

To use reduce(), you must import it from the functools module.

  • reduce(function, sequence[, initial]) -> value
  • Apply a function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value.
  • For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5).
  • If initial is present, it is placed before the items of the sequence in the calculation, and serves as a default when the sequence is empty.
In [86]:
# Import reduce from functools
from functools import reduce

# Create a list of strings: stark
stark = ['robb', 'sansa', 'arya', 'eddard', 'jon']

# Use reduce() to apply a lambda function over stark: result
result = reduce(lambda item1, item2: item1 +' '+ item2, stark)

# Print the result
print(result)
robb sansa arya eddard jon

error handling

  • raise error
In [88]:
try: '3'+3
except Exception, e: print e
cannot concatenate 'str' and 'int' objects
In [91]:
# Define shout_echo
def shout_echo(word1, echo=1):
    """Concatenate echo copies of word1 and three
    exclamation marks at the end of the string."""

    # Raise an error with raise
    if echo < 0:
        raise ValueError('echo must be greater than 0')

    # Concatenate echo copies of word1 using *: echo_word
    echo_word = word1 * echo

    # Concatenate '!!!' to echo_word: shout_word
    shout_word = echo_word + '!!!'

    # Return shout_word
    return shout_word

# Call shout_echo
try:
    shout_echo("particle", echo=-3)
except Exception, e: 
    print e
echo must be greater than 0
In [96]:
shout_echo("123", echo=-1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-96-44c64dc8c830> in <module>()
----> 1 shout_echo("123", echo=-1)

<ipython-input-91-d6d66ed4753d> in shout_echo(word1, echo)
      6     # Raise an error with raise
      7     if echo < 0:
----> 8         raise ValueError('echo must be greater than 0')
      9 
     10     # Concatenate echo copies of word1 using *: echo_word

ValueError: echo must be greater than 0

use filter(lambda x: ...) in pandas

In [98]:
# Select retweets from the Twitter dataframe: result

result = filter(lambda x: x[0:2] == 'RT', df['text'])
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-98-ac4d1cb6d465> in <module>()
      1 # Select retweets from the Twitter dataframe: result
      2 
----> 3 result = filter(lambda x: x[0:2] == 'RT', df['text'])

NameError: name 'df' is not defined
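The NameError above is just because df was never defined in that session. A self-contained sketch, with a toy DataFrame standing in for the Twitter data (the 'text' column name follows the original):

import pandas as pd

df = pd.DataFrame({'text': ['RT hello', 'just a tweet', 'RT again']})

# keep only the tweets whose text starts with 'RT'
result = filter(lambda x: x[0:2] == 'RT', df['text'])
print(list(result))
# ['RT hello', 'RT again']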