Monday, December 30, 2019

TypeError: 'int' object is not subscriptable, Pandas DateIndex

#python #pandas #dataframe #datetime


I wrote a function that adds columns to a DataFrame, assuming that the index of
df holds dates and that those dates are unique. I get this error:


  TypeError: 'int' object is not subscriptable 


on the line M = date.month[0]. I have no idea why.

def AddWeekends_DateIndex(df):
    Weekday_Name = {1: 'Monday', 2: 'Tuesday', 3: 'Wednesday', 4: 'Thursday', 5: 'Friday',
                    6: 'Saturday', 7: 'Sunday'}
    Month = []
    Weekday = []
    days_in_month = [31, 59, 90, 120, 151, 181]  # cumulative number of days since the start of January
    for date in df.index:
        M = date.month[0]
        Month.append(M)

        if M == 1:
            A = date.day[0] % 7 + 1
        else:
            A = (days_in_month[M - 2] + date.day[0]) % 7 + 1

        Weekday.append(Weekday_Name[A])

    df['Weekday'] = Weekday
    df['Month'] = Month

    Weekend = []
    for date in df.index:
        if (df.loc[date, 'Weekday'] == 'Saturday' or df.loc[date, 'Weekday'] == 'Sunday'):
            Weekend.append(1)
        else:
            Weekend.append(0)
    df['Weekend'] = Weekend

    January = [1, 2, 3, 4, 5, 6, 7, 8]
    March = [8]
    May = [1, 2, 3, 9, 10]
    June = [12]

    for date in df[df['Month'] == 1].index:
        if date.day[0] in January:
            df.loc[date, 'Weekend'] = 1

    for date in df[df['Month'] == 3].index:
        if date.day[0] in March:
            df.loc[date, 'Weekend'] = 1

    for i in df[df['Month'] == 5].index:
        if date.day[0] in May:
            df.loc[date, 'Weekend'] = 1

    for i in df[df['Month'] == 6].index:
        if date.day[0] in June:
            df.loc[date, 'Weekend'] = 1

    b = {1: 'January', 2: 'February', 3: 'March', 4: 'April', 5: 'May', 6: 'June'}
    df['Month'] = df['Month'].map(b)
    return df


Sample input:

+-------------------------+
|  Date          Value    |
+-------------------------+
| 2019-01-01     908.2640 |
| 2019-01-02    1814.3060 |
| 2019-01-03    2354.2990 |
| 2019-01-04    2238.6185 |
| 2019-01-05    2440.3580 |
| 2019-01-06    2966.7020 |
| 2019-01-07    3037.1810 |
| 2019-01-08    3018.9515 |
| 2019-01-09    3258.6010 |
| 2019-01-10    2700.2050 |
+-------------------------+


Expected output:

+------------+-----------+---------+-----------+---------+
|   Date     |  Value    |  Month  |  Weekday  | Weekend |
+------------+-----------+---------+-----------+---------+
| 2019-01-01 | 908.2640  | January | Tuesday   |       1 |
| 2019-01-02 | 1814.3060 | January | Wednesday |       1 |
| 2019-01-03 | 2354.2990 | January | Thursday  |       1 |
| 2019-01-04 | 2238.6185 | January | Friday    |       1 |
| 2019-01-05 | 2440.3580 | January | Saturday  |       1 |
| 2019-01-06 | 2966.7020 | January | Sunday    |       1 |
| 2019-01-07 | 3037.1810 | January | Monday    |       1 |
| 2019-01-08 | 3018.9515 | January | Tuesday   |       1 |
| 2019-01-09 | 3258.6010 | January | Wednesday |       0 |
| 2019-01-10 | 2700.2050 | January | Thursday  |       0 |
+------------+-----------+---------+-----------+---------+

    


Answers

Answer 1



I would solve your problem the Pandas way:

import holidays  # pip install holidays
import numpy as np
import pandas as pd


def get_holidays(col, holidays_country_class=holidays.RU,
                 extra_holidays=None, dtype=np.int8):
    # Accept a DatetimeIndex as well as a datetime Series
    if isinstance(col, pd.DatetimeIndex):
        col = col.to_series()
    min_yr = col.min().year
    max_yr = col.max().year
    years = list(range(min_yr, max_yr + 1))
    # Public holidays of the given country for every year covered by the data
    hol = pd.to_datetime(
        [tup[0] for tup in sorted(holidays_country_class(years=years).items())])
    if extra_holidays is not None:
        extra_holidays = pd.to_datetime(extra_holidays)
        hol = hol.union(extra_holidays).unique()
    # 1 if the date (truncated to the day) is a holiday, otherwise 0
    return col.dt.floor("D").isin(hol).astype(dtype)


def gen_dt_features(dt_col, extra_holidays=None):
    if isinstance(dt_col, pd.DatetimeIndex):
        dt_col = dt_col.to_series()
    return pd.DataFrame({
        "Weekend": get_holidays(dt_col, extra_holidays=extra_holidays),
        "DayOfWeek": dt_col.dt.day_name(),
        "Month": dt_col.dt.month_name()
    }, index=dt_col.index)

Source DF:

In [67]: df
Out[67]:
                Value
Date
2019-01-01   908.2640
2019-01-02  1814.3060
2019-01-03  2354.2990
2019-01-04  2238.6185
2019-01-05  2440.3580
2019-01-06  2966.7020
2019-01-07  3037.1810
2019-01-08  3018.9515
2019-01-09  3258.6010
2019-01-10  2700.2050

Solution:

In [68]: df = df.join(gen_dt_features(df.index))

Result:

In [69]: df
Out[69]:
                Value  Weekend  DayOfWeek    Month
Date
2019-01-01   908.2640        1    Tuesday  January
2019-01-02  1814.3060        1  Wednesday  January
2019-01-03  2354.2990        1   Thursday  January
2019-01-04  2238.6185        1     Friday  January
2019-01-05  2440.3580        1   Saturday  January
2019-01-06  2966.7020        1     Sunday  January
2019-01-07  3037.1810        1     Monday  January
2019-01-08  3018.9515        1    Tuesday  January
2019-01-09  3258.6010        0  Wednesday  January
2019-01-10  2700.2050        0   Thursday  January

UPDATE: specifying additional days off (applied to the source DF again, since joining onto a DF that already has these columns would fail with overlapping column names):

In [91]: df = df.join(gen_dt_features(df.index, extra_holidays=['2019-01-10']))

In [92]: df
Out[92]:
                Value  Weekend  DayOfWeek    Month
Date
2019-01-01   908.2640        1    Tuesday  January
2019-01-02  1814.3060        1  Wednesday  January
2019-01-03  2354.2990        1   Thursday  January
2019-01-04  2238.6185        1     Friday  January
2019-01-05  2440.3580        1   Saturday  January
2019-01-06  2966.7020        1     Sunday  January
2019-01-07  3037.1810        1     Monday  January
2019-01-08  3018.9515        1    Tuesday  January
2019-01-09  3258.6010        0  Wednesday  January
2019-01-10  2700.2050        1   Thursday  January  # <-- NOTE: extra holiday
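If installing the holidays package is not an option, the same three columns can probably be built with pandas alone. The following is only a minimal sketch under stated assumptions: the names add_date_features, extra_days_off and days_off_2019 are hypothetical, the list of non-working days is taken from the question's January list rather than from any official calendar, and, like the question's code, it also flags every Saturday and Sunday.

```python
import pandas as pd


def add_date_features(df, extra_days_off=()):
    """Return a copy of df with Month, Weekday and Weekend columns.

    Hypothetical helper: assumes df.index is a DatetimeIndex.
    """
    idx = df.index
    out = df.copy()
    out["Month"] = idx.month_name()    # e.g. 'January'
    out["Weekday"] = idx.day_name()    # e.g. 'Tuesday'
    # dayofweek: Monday is 0, so Saturday and Sunday are 5 and 6
    is_weekend = idx.dayofweek >= 5
    # Dates explicitly listed by the caller as non-working days
    is_extra = idx.normalize().isin(pd.to_datetime(list(extra_days_off)))
    out["Weekend"] = (is_weekend | is_extra).astype("int8")
    return out


# Sample data shaped like the question's input
idx = pd.date_range("2019-01-01", periods=10, freq="D", name="Date")
df = pd.DataFrame({"Value": [908.2640, 1814.3060, 2354.2990, 2238.6185,
                             2440.3580, 2966.7020, 3037.1810, 3018.9515,
                             3258.6010, 2700.2050]}, index=idx)

# Assumed list of days off: January 1-8, as in the question's code
days_off_2019 = [f"2019-01-0{d}" for d in range(1, 9)]
result = add_date_features(df, extra_days_off=days_off_2019)
print(result)
```

Passing the days off as an argument keeps the function independent of a particular year, whereas the hard-coded day numbers in the question only make sense for 2019.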

Answer 2



You are looping over the values of the index:

for date in df.index:
    ...

so the variable date holds a scalar object of type pandas._libs.tslibs.timestamps.Timestamp. Its .month attribute returns an integer (the number of the month). The [] operator is not defined for integers, which is exactly what Python tells you in the error message.

Reproducing the error:

In [21]: d = pd.to_datetime("2019-01-01")

In [22]: d
Out[22]: Timestamp('2019-01-01 00:00:00')

In [23]: d.month
Out[23]: 1

In [24]: d.month[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 d.month[0]

TypeError: 'int' object is not subscriptable

This is the same as:

In [25]: 1[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 1[0]

TypeError: 'int' object is not subscriptable
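Following this explanation, the smallest fix is to drop the [0] and use the scalar attributes directly. The snippet below is a rough sketch on made-up data (the Value column is a placeholder, not the question's numbers); it also uses Timestamp.day_name() in place of the question's hand-written weekday arithmetic, which is a substitution on my part, not something the answer above proposes.

```python
import pandas as pd

# Placeholder frame with a daily DatetimeIndex, as in the question
idx = pd.date_range("2019-01-01", periods=10, freq="D", name="Date")
df = pd.DataFrame({"Value": range(10)}, index=idx)

months = []
weekdays = []
for date in df.index:
    # date is a single Timestamp, so .month and .day are plain ints
    months.append(date.month)
    # day_name() returns the English weekday name for that date
    weekdays.append(date.day_name())

df["Month"] = months
df["Weekday"] = weekdays
print(df.head())
```

Because day_name() works from the actual calendar, it does not depend on the year, unlike an offset table built for 2019.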
