4.1. DataFrame Create

  • pd.DataFrame(list[dict])

  • pd.DataFrame(dict[str,list])

4.1.1. SetUp

>>> import pandas as pd
>>> import numpy as np

4.1.2. Create from List of Dicts

>>> pd.DataFrame([
...     {'A': 1.0, 'B': 2.0},
...     {'A': 3.0, 'B': 4.0},
... ])
     A    B
0  1.0  2.0
1  3.0  4.0
>>> pd.DataFrame([
...     {'A': 1.0, 'B': 2.0},
...     {'B': 3.0, 'C': 4.0},
... ])
     A    B    C
0  1.0  2.0  NaN
1  NaN  3.0  4.0
>>> pd.DataFrame([
...     {'firstname': 'Mark', 'lastname': 'Watney'},
...     {'firstname': 'Melissa', 'lastname': 'Lewis'},
...     {'firstname': 'Rick', 'lastname': 'Martinez'},
...     {'firstname': 'Alex', 'lastname': 'Vogel'},
... ])
  firstname  lastname
0      Mark    Watney
1   Melissa     Lewis
2      Rick  Martinez
3      Alex     Vogel

4.1.3. Create from Dict

>>> pd.DataFrame({
...     'A': ['a', 'b', 'c'],
...     'B': [1.0, 2.0, 3.0],
...     'C': [1, 2, 3],
... })
   A    B  C
0  a  1.0  1
1  b  2.0  2
2  c  3.0  3
>>> pd.DataFrame({
...     'firstname': ['Mark', 'Melissa', 'Rick', 'Alex'],
...     'lastname': ['Watney', 'Lewis', 'Martinez', 'Vogel'],
... })
  firstname  lastname
0      Mark    Watney
1   Melissa     Lewis
2      Rick  Martinez
3      Alex     Vogel

4.1.4. Create from NDArray

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>>
>>>
>>> df = pd.DataFrame(np.random.randn(7, 4))
>>>
>>> df
          0         1         2         3
0  1.764052  0.400157  0.978738  2.240893
1  1.867558 -0.977278  0.950088 -0.151357
2 -0.103219  0.410599  0.144044  1.454274
3  0.761038  0.121675  0.443863  0.333674
4  1.494079 -0.205158  0.313068 -0.854096
5 -2.552990  0.653619  0.864436 -0.742165
6  2.269755 -1.454366  0.045759 -0.187184

4.1.5. Use Case - 0x01

>>> import pandas as pd
>>> import numpy as np
>>>
>>>
>>> pd.DataFrame({
...     'A': 1.,
...     'B': pd.Timestamp('1961-04-12'),
...     'C': pd.Series(1, index=list(range(4)), dtype='float32'),
...     'D': np.array([3] * 4, dtype='int32'),
...     'E': pd.Categorical(["test", "train", "test", "train"]),
...     'F': 'foo',
...     'G': [1,2,3,4],
... })
     A          B    C  D      E    F  G
0  1.0 1961-04-12  1.0  3   test  foo  1
1  1.0 1961-04-12  1.0  3  train  foo  2
2  1.0 1961-04-12  1.0  3   test  foo  3
3  1.0 1961-04-12  1.0  3  train  foo  4

4.1.6. Use Case - 0x02

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>>
>>>
>>> df = pd.DataFrame(
...     columns = ['Morning', 'Noon', 'Evening', 'Midnight'],
...     index = pd.date_range('1999-12-30', periods=7),
...     data = np.random.randn(7, 4))
...
>>> df
             Morning      Noon   Evening  Midnight
1999-12-30  1.764052  0.400157  0.978738  2.240893
1999-12-31  1.867558 -0.977278  0.950088 -0.151357
2000-01-01 -0.103219  0.410599  0.144044  1.454274
2000-01-02  0.761038  0.121675  0.443863  0.333674
2000-01-03  1.494079 -0.205158  0.313068 -0.854096
2000-01-04 -2.552990  0.653619  0.864436 -0.742165
2000-01-05  2.269755 -1.454366  0.045759 -0.187184

4.1.7. Assignments

Code 4.60. Solution
"""
* Assignment: DataFrame Create
* Complexity: easy
* Lines of code: 10 lines
* Time: 5 min

English:
    1. Create `result: pd.DataFrame` for input data
    2. Run doctests - all must succeed

Polish:
    1. Stwórz `result: pd.DataFrame` dla danych wejściowych
    2. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * Use selection with `alt` key in your IDE

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> pd.set_option('display.width', 500)
    >>> pd.set_option('display.max_columns', 10)
    >>> pd.set_option('display.max_rows', 10)

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is pd.DataFrame, \
    'Variable `result` must be a `pd.DataFrame` type'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
         Crew Role        Astronaut
    0   Prime  CDR   Neil Armstrong
    1   Prime  LMP      Buzz Aldrin
    2   Prime  CMP  Michael Collins
    3  Backup  CDR     James Lovell
    4  Backup  LMP   William Anders
    5  Backup  CMP       Fred Haise
"""

import pandas as pd

"""
"Prime", "CDR", "Neil Armstrong"
"Prime", "LMP", "Buzz Aldrin"
"Prime", "CMP", "Michael Collins"
"Backup", "CDR", "James Lovell"
"Backup", "LMP", "William Anders"
"Backup", "CMP", "Fred Haise"
"""


# type: pd.DataFrame
result = ...