#python #pandas #dataframe #datetime
I wrote a function that adds columns to a DataFrame, assuming the df index holds dates, and that they are unique. I get this error: TypeError: 'int' object is not subscriptable on the line M = date.month[0]. I have no idea why.

    def AddWeekends_DateIndex(df):
        Weekday_Name = {1: 'Monday', 2: 'Tuesday', 3: 'Wednesday', 4: 'Thursday',
                        5: 'Friday', 6: 'Saturday', 7: 'Sunday'}
        Month = []
        Weekday = []
        days_in_month = [31, 59, 90, 120, 151, 181]  # cumulative number of days since January
        for date in df.index:
            M = date.month[0]
            Month.append(M)
            if M == 1:
                A = date.day[0] % 7 + 1
            else:
                A = (days_in_month[M - 2] + date.day[0]) % 7 + 1
            Weekday.append(Weekday_Name[A])
        df['Weekday'] = Weekday
        df['Month'] = Month
        Weekend = []
        for date in df.index:
            if (df.loc[date, 'Weekday'] == 'Saturday' or df.loc[date, 'Weekday'] == 'Sunday'):
                Weekend.append(1)
            else:
                Weekend.append(0)
        df['Weekend'] = Weekend
        January = [1, 2, 3, 4, 5, 6, 7, 8]
        March = [8]
        May = [1, 2, 3, 9, 10]
        June = [12]
        for date in df[df['Month'] == 1].index:
            if date.day[0] in January:
                df.loc[date, 'Weekend'] = 1
        for date in df[df['Month'] == 3].index:
            if date.day[0] in March:
                df.loc[date, 'Weekend'] = 1
        for i in df[df['Month'] == 5].index:
            if date.day[0] in May:
                df.loc[date, 'Weekend'] = 1
        for i in df[df['Month'] == 6].index:
            if date.day[0] in June:
                df.loc[date, 'Weekend'] = 1
        b = {1: 'January', 2: 'February', 3: 'March', 4: 'April', 5: 'May', 6: 'June'}
        df['Month'] = df['Month'].map(b)
        return df

Sample input:

    +------------+-----------+
    | Date       | Value     |
    +------------+-----------+
    | 2019-01-01 |  908.2640 |
    | 2019-01-02 | 1814.3060 |
    | 2019-01-03 | 2354.2990 |
    | 2019-01-04 | 2238.6185 |
    | 2019-01-05 | 2440.3580 |
    | 2019-01-06 | 2966.7020 |
    | 2019-01-07 | 3037.1810 |
    | 2019-01-08 | 3018.9515 |
    | 2019-01-09 | 3258.6010 |
    | 2019-01-10 | 2700.2050 |
    +------------+-----------+

Sample output:

    +------------+-----------+---------+-----------+---------+
    | Date       | Value     | Month   | Weekday   | Weekend |
    +------------+-----------+---------+-----------+---------+
    | 2019-01-01 |  908.2640 | January | Tuesday   |       1 |
    | 2019-01-02 | 1814.3060 | January | Wednesday |       1 |
    | 2019-01-03 | 2354.2990 | January | Thursday  |       1 |
    | 2019-01-04 | 2238.6185 | January | Friday    |       1 |
    | 2019-01-05 | 2440.3580 | January | Saturday  |       1 |
    | 2019-01-06 | 2966.7020 | January | Sunday    |       1 |
    | 2019-01-07 | 3037.1810 | January | Monday    |       1 |
    | 2019-01-08 | 3018.9515 | January | Tuesday   |       1 |
    | 2019-01-09 | 3258.6010 | January | Wednesday |       0 |
    | 2019-01-10 | 2700.2050 | January | Thursday  |       0 |
    +------------+-----------+---------+-----------+---------+
Answers
Answer 1
I would solve your problem in Pandas style:

    import numpy as np
    import pandas as pd
    import holidays  # pip install holidays

    def get_holidays(col, holidays_country_class=holidays.RU,
                     extra_holidays=None, dtype=np.int8):
        # accept either a datetime Series or a DatetimeIndex
        if isinstance(col, pd.DatetimeIndex):
            col = col.to_series()
        min_yr = col.min().year
        max_yr = col.max().year
        years = list(range(min_yr, max_yr+1))
        # all public holidays for the years covered by the data
        hol = pd.to_datetime(
            [tup[0] for tup in sorted(holidays_country_class(years=years).items())])
        if extra_holidays is not None:
            extra_holidays = pd.to_datetime(extra_holidays)
            hol = hol.union(extra_holidays).unique()
        # 1 if the date (truncated to the day) is a holiday, otherwise 0
        return col.dt.floor("D").isin(hol).astype(dtype)

    def gen_dt_features(dt_col, extra_holidays=None):
        if isinstance(dt_col, pd.DatetimeIndex):
            dt_col = dt_col.to_series()
        return pd.DataFrame({
            "Weekend": get_holidays(dt_col, extra_holidays=extra_holidays),
            # .dt.weekday_name was removed in pandas 1.0; day_name() replaces it
            "DayOfWeek": dt_col.dt.day_name(),
            "Month": dt_col.dt.month_name()
        }, index=dt_col.index)

The original DF:

    In [67]: df
    Out[67]:
                    Value
    Date
    2019-01-01   908.2640
    2019-01-02  1814.3060
    2019-01-03  2354.2990
    2019-01-04  2238.6185
    2019-01-05  2440.3580
    2019-01-06  2966.7020
    2019-01-07  3037.1810
    2019-01-08  3018.9515
    2019-01-09  3258.6010
    2019-01-10  2700.2050

Solution:

    In [68]: df = df.join(gen_dt_features(df.index))

Result:

    In [69]: df
    Out[69]:
                    Value  Weekend  DayOfWeek    Month
    Date
    2019-01-01   908.2640        1    Tuesday  January
    2019-01-02  1814.3060        1  Wednesday  January
    2019-01-03  2354.2990        1   Thursday  January
    2019-01-04  2238.6185        1     Friday  January
    2019-01-05  2440.3580        1   Saturday  January
    2019-01-06  2966.7020        1     Sunday  January
    2019-01-07  3037.1810        1     Monday  January
    2019-01-08  3018.9515        1    Tuesday  January
    2019-01-09  3258.6010        0  Wednesday  January
    2019-01-10  2700.2050        0   Thursday  January

UPDATE: with additional days off specified:

    In [91]: df = df.join(gen_dt_features(df.index, extra_holidays=['2019-01-10']))

    In [92]: df
    Out[92]:
                    Value  Weekend  DayOfWeek    Month
    Date
    2019-01-01   908.2640        1    Tuesday  January
    2019-01-02  1814.3060        1  Wednesday  January
    2019-01-03  2354.2990        1   Thursday  January
    2019-01-04  2238.6185        1     Friday  January
    2019-01-05  2440.3580        1   Saturday  January
    2019-01-06  2966.7020        1     Sunday  January
    2019-01-07  3037.1810        1     Monday  January
    2019-01-08  3018.9515        1    Tuesday  January
    2019-01-09  3258.6010        0  Wednesday  January
    2019-01-10  2700.2050        1   Thursday  January  # <-- NOTE: extra holiday

Answer 2
You are looping over the index values:

    for date in df.index:
        ...

so the variable date holds a scalar object of type pandas._libs.tslibs.timestamps.Timestamp. Its .month attribute returns an integer (the month number). The [] operator is not defined for integers, which is what Python tells you in the error. The same applies to date.day, so the [0] has to be dropped in both places.

Reproducing the error:

    In [21]: d = pd.to_datetime("2019-01-01")

    In [22]: d
    Out[22]: Timestamp('2019-01-01 00:00:00')

    In [23]: d.month
    Out[23]: 1

    In [24]: d.month[0]
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    ----> 1 d.month[0]

    TypeError: 'int' object is not subscriptable

This is the same as:

    In [25]: 1[0]
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    ----> 1 1[0]

    TypeError: 'int' object is not subscriptable
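To make the point about scalar Timestamp attributes concrete, here is a minimal sketch of the same loop with the subscripting removed. The helper name add_weekday_columns and the three-row sample frame are hypothetical and used only for illustration. It flags only Saturdays and Sundays and does not attempt the public-holiday lists from the question.

```python
import pandas as pd

def add_weekday_columns(df):
    """Hypothetical helper: derive Month, Weekday and Weekend from a DatetimeIndex."""
    out = df.copy()
    # Each element of the index is a scalar Timestamp, so its attributes
    # and methods return plain values and need no [0].
    out["Month"] = [date.month_name() for date in out.index]
    out["Weekday"] = [date.day_name() for date in out.index]
    # isoweekday(): Monday is 1, Sunday is 7, so 6 and 7 are the weekend
    out["Weekend"] = [int(date.isoweekday() >= 6) for date in out.index]
    return out

# Small sample frame (assumed data, a subset of the dates in the question)
df = pd.DataFrame(
    {"Value": [908.2640, 1814.3060, 2440.3580]},
    index=pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-05"]).rename("Date"),
)
result = add_weekday_columns(df)
print(result)
```

Using the built-in day_name() and isoweekday() also avoids the hand-written day-offset arithmetic from the question, which only works for a single year.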