Pandas basics

 

Question  Answer  Explanation
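How do you find out how much memory a DataFrame uses?  df.info()  the last line of the report shows memory usage; df.memory_usage(deep=True) gives a per-column breakdown in bytes
What is the usual representation of a missing value in a pandas object?  np.nan  requires import numpy as np
How do you slice a DataFrame by integer position?  df.iloc[-5:, 2:]  use the iloc indexer; this selects the last five rows and every column from the third onward (use df.loc to slice by label)
How do you convert a DataFrame (excluding the index) to a NumPy ndarray?  df.values  it is an attribute, not a method, so it takes no parentheses; df.to_numpy() is the method equivalent
What is the most basic way to create a DataFrame?  pd.DataFrame(dict)  pass a dictionary; its keys become the column names
What is broadcasting?  df['new'] = 7  the scalar is broadcast to every row, so all values in the new column are 7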
How do you change a DataFrame's column names or index labels?  df.columns = ['a', 'b', …]

df.index = ['c', 'd', …]

assign the new labels directly; the list length must match the number of columns or rows
When reading a CSV, how do you specify the column names?  pd.read_csv(path, names=['a', 'b', …])  alternatively, header=None stops pandas from using the first data row as column names and labels the columns 0, 1, 2, 3, … instead
When reading a CSV, how do you make pandas turn specific values into NaN?  pd.read_csv(path, na_values='-1')

pd.read_csv(path, na_values={'column3': [' -2', 'wtf', …]})

every field whose text is '-1' is read as NaN; the dictionary form applies the listed markers only to the named column
How do you parse dates while reading a CSV?  pd.read_csv(path, parse_dates=[[0, 1, 2]])  pandas combines the first three columns (positions 0, 1, 2) into a single datetime column; note that recent pandas releases deprecate this nested-list form
Does the index of a DataFrame have a name?  df.index.name = 'xxx'  yes, and assigning to this attribute names the index
How do you save a DataFrame to a file with a delimiter other than ','?  df.to_csv(path, sep='\t')  writes a file whose fields are separated by tabs
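
Several of the rows above can be tried together in one short session. The following is only a sketch: the CSV text, the column names and the "demo" value are made up for illustration, and an in-memory buffer stands in for a real file path. Because the nested-list form of parse_dates is deprecated in recent releases, the date column is assembled with pd.to_datetime instead.

```python
import io

import pandas as pd

# Hypothetical CSV text with no header row; -1 marks a missing reading.
raw = "2020,1,15,3.5\n2020,1,16,-1\n2020,1,17,4.0\n"

# names= supplies the column labels; na_values= turns the '-1' fields into NaN.
df = pd.read_csv(
    io.StringIO(raw),
    names=["year", "month", "day", "reading"],
    na_values="-1",
)

# Build one datetime column from the year/month/day columns.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Broadcasting: the scalar fills every row of the new column.
df["source"] = "demo"

# Name the index, then look at memory use, a positional slice and an ndarray.
df.index.name = "row"
memory_bytes = df.memory_usage(deep=True).sum()
last_two = df.iloc[-2:, 3:]  # last two rows, columns from position 3 onward
as_array = df[["year", "reading"]].to_numpy()

# Write tab-separated text to an in-memory buffer instead of a real path.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t")
tsv_text = buffer.getvalue()
```

Printing df, last_two and tsv_text after running this shows the NaN in the reading column, the positional slice, and the index name appearing as the first header field of the tab-separated output.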

How do you convert a whole column of strings to datetime values in one call?

df['datestring'] = pd.to_datetime(df['datestring'])
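
Here is a small sketch of that conversion on made-up data. The sample strings, the explicit format and the errors="coerce" option are my own additions for illustration; they are not part of the original one-liner.

```python
import pandas as pd

# Hypothetical frame; 'datestring' follows the column name used above.
df = pd.DataFrame({"datestring": ["2021-03-01", "2021-03-02", "not a date"]})

# errors="coerce" turns strings that do not match the format into NaT
# instead of raising an exception.
df["datestring"] = pd.to_datetime(
    df["datestring"], format="%Y-%m-%d", errors="coerce"
)

# Once the column is datetime, the .dt accessor exposes date parts.
df["weekday"] = df["datestring"].dt.day_name()
```

Without errors="coerce", the unparseable third value would raise an error, which is often what you want while cleaning data for the first time.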

How do you combine two DataFrames, or append one DataFrame to another?
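
No answer is recorded for this one, so what follows is only one possible approach, assuming pd.concat is the tool wanted. The frames and column names are invented for the example. DataFrame.append was removed in pandas 2.0, which is why it is not used here.

```python
import pandas as pd

# Hypothetical frames for illustration.
top = pd.DataFrame({"id": [1, 2], "score": [0.5, 0.7]})
bottom = pd.DataFrame({"id": [3], "score": [0.9]})
extra = pd.DataFrame({"label": ["a", "b", "c"]})

# Append rows: stack one frame under the other.
# ignore_index=True renumbers the result 0..n-1.
stacked = pd.concat([top, bottom], ignore_index=True)

# Put frames side by side: axis=1 aligns them on the index.
wide = pd.concat([stacked, extra], axis=1)
```

For joining on a key column rather than on the index, pd.merge is the usual alternative.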

 

How does .all() work?  default parameter: axis=0  checks each column of the DataFrame and returns True for a column only if every value in it is True (or truthy, such as a nonzero number)
How does .any() work?  same default  returns True for a column if at least one value in it is True
  with axis=1, the check runs across each row instead, giving one result per row
some_any
In [14]:
import pandas as pd
df = pd.DataFrame({
    'col_1': [True, True, True, True], 'col_2': [False, False, False, False],
    'col_3': [True, False, True, False], 'col_4': [0, 0, 0, 1],
    'col_5': [0, 0, 0, 0], 'col_6': [1, 1, 1, 1],
    'col_7': [0, 1, 2, 3], 'col_8': [7, 6, 5, 4],
})
df
Out[14]:
   col_1  col_2  col_3  col_4  col_5  col_6  col_7  col_8
0   True  False   True      0      0      1      0      7
1   True  False  False      0      0      1      1      6
2   True  False   True      0      0      1      2      5
3   True  False  False      1      0      1      3      4
In [15]:
df.all()
Out[15]:
col_1     True
col_2    False
col_3    False
col_4    False
col_5    False
col_6     True
col_7    False
col_8     True
dtype: bool
In [16]:
df.any()
Out[16]:
col_1     True
col_2    False
col_3     True
col_4     True
col_5    False
col_6     True
col_7     True
col_8     True
dtype: bool
In [17]:
df.all(axis=1)
Out[17]:
0    False
1    False
2    False
3    False
dtype: bool
In [18]:
df.any(axis=1)
Out[18]:
0    True
1    True
2    True
3    True
dtype: bool