AutoTrain Documentation

Tabular Parameters


--batch-size BATCH_SIZE
                    Training batch size to use
--seed SEED           Random seed for reproducibility
--target-columns TARGET_COLUMNS
                    Specify the names of the target or label columns separated by commas if multiple. These columns are what the model will
                    predict. Required for defining the output of the model.
--categorical-columns CATEGORICAL_COLUMNS
                    List the names of columns that contain categorical data, useful for models that need explicit handling of such data.
                    Categorical data is typically processed differently from numerical data, such as through encoding. If not specified, the
                    model will infer the data type.
--numerical-columns NUMERICAL_COLUMNS
                    Identify columns that contain numerical data. Proper specification helps in applying appropriate scaling and normalization
                    techniques, which can significantly impact model performance. If not specified, the model will infer the data type.
--id-column ID_COLUMN
                    Specify the column name that uniquely identifies each row in the dataset. This is critical for tracking samples through the
                    model pipeline and is often excluded from model training. Required field.
--task {classification,regression}
                    Define the type of machine learning task: 'classification' or 'regression'. This parameter determines the model's
                    architecture and the loss function to use. Required to properly configure the model.
--num-trials NUM_TRIALS
                    Set the number of trials for hyperparameter tuning or model experimentation. More trials can lead to better model
                    configurations but require more computational resources. Default is 100 trials.
--time-limit TIME_LIMIT
                    Impose a time limit (in seconds) for training or searching for the best model configuration. This helps manage resource
                    allocation and ensures the process does not exceed available computational budgets. The default is 3600 seconds (1 hour).
--categorical-imputer {most_frequent,None}
                    Select the strategy used to impute missing values in categorical columns. Options are 'most_frequent' and
                    'None'. Correct imputation can prevent biases and improve model accuracy.
--numerical-imputer {mean,median,None}
                    Choose the imputation strategy for missing values in numerical columns. Options are 'mean', 'median', and 'None'.
                    Accurate imputation is vital for maintaining the integrity of numerical data.
--numeric-scaler {standard,minmax,normal,robust}
                    Determine the type of scaling to apply to numerical data. Examples include 'standard' (zero mean and unit variance) and
                    'minmax' (scaled to a given range). Scaling is essential for many algorithms to perform optimally.
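
The imputation and scaling options above can be illustrated with a small standard-library sketch. This is not AutoTrain's implementation. The helper names `impute` and `scale` are hypothetical, the 'standard' branch assumes the population standard deviation, and only the 'standard' and 'minmax' scalers are shown because the help text does not describe 'normal' or 'robust'.

```python
import statistics


def impute(values, strategy):
    """Replace None entries using 'mean', 'median', or 'most_frequent'."""
    present = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    elif strategy == "most_frequent":
        fill = statistics.mode(present)
    else:
        raise ValueError(f"unknown imputation strategy: {strategy}")
    return [fill if v is None else v for v in values]


def scale(values, method):
    """Scale a numeric column with 'standard' or 'minmax' (hypothetical helper)."""
    if method == "standard":
        mean = statistics.mean(values)
        std = statistics.pstdev(values)  # assumption: population standard deviation
        if std == 0:
            return [0.0 for _ in values]  # constant column: nothing to scale
        return [(v - mean) / std for v in values]
    if method == "minmax":
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0.0 for _ in values]  # constant column: map everything to 0
        return [(v - lo) / (hi - lo) for v in values]
    raise ValueError(f"unknown scaler: {method}")


# Numerical column with one missing value, filled with the median.
ages = impute([20, None, 40, 60], "median")
# Categorical column with one missing value, filled with the most frequent label.
colors = impute(["red", None, "blue", "red"], "most_frequent")

ages_minmax = scale(ages, "minmax")
ages_standard = scale(ages, "standard")
print(ages, colors, ages_minmax)
```

Imputation runs before scaling here so that the scaler never sees missing values. Whether AutoTrain applies the steps in that order is an assumption of the sketch.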
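
To make the interaction between `--num-trials` and `--time-limit` concrete, here is a minimal sketch of a search loop that stops when either budget runs out. The help text does not say which search strategy AutoTrain uses, so plain random search stands in for it. `budgeted_search`, `sample_params`, and `objective` are hypothetical names, and the toy objective is only there to make the example runnable.

```python
import random
import time


def budgeted_search(objective, sample_params, num_trials=100, time_limit=3600.0):
    """Run trials until the trial count or the time budget (seconds) is exhausted."""
    deadline = time.monotonic() + time_limit
    best_score, best_params = None, None
    trials_run = 0
    for _ in range(num_trials):
        if time.monotonic() >= deadline:
            break  # time budget used up before all trials ran
        params = sample_params()
        score = objective(params)
        trials_run += 1
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_params, best_score, trials_run


rng = random.Random(42)  # fixed seed, in the spirit of --seed


def sample_params():
    # Hypothetical one-dimensional search space.
    return {"x": rng.uniform(-1.0, 1.0)}


def objective(params):
    # Toy score that peaks at x = 0.3; higher is better.
    return -((params["x"] - 0.3) ** 2)


best_params, best_score, trials_run = budgeted_search(
    objective, sample_params, num_trials=20, time_limit=5.0
)
print(trials_run, best_params, best_score)
```

With the defaults from the help text (100 trials, 3600 seconds), whichever limit is reached first ends the search, so a slow model may finish well short of 100 trials.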
