Cleaning Your Dataset
In [1]:
from active_vision import ActiveLearner
al = ActiveLearner(name="cycle-1")
In [2]:
al.load_model(model="timm/convnext_tiny.fb_in22k", pretrained=True)
2025-02-08 00:44:04.661 | INFO | active_vision.core:_detect_optimal_device:87 - CUDA GPU detected - will load model on GPU
2025-02-08 00:44:04.662 | INFO | active_vision.core:load_model:73 - Loading a pretrained timm model `timm/convnext_tiny.fb_in22k` on `cuda`
In [3]:
import pandas as pd
train_set = pd.read_parquet("data/training_samples.parquet")
evaluation_set = pd.read_parquet("data/evaluation_samples.parquet")
In [4]:
from fastai.vision.all import aug_transforms
al.load_dataset(
    train_set,
    filepath_col="filepath",
    label_col="label",
    image_size=320,
    batch_tfms=aug_transforms(size=224),
)
2025-02-08 00:44:04.688 | INFO | active_vision.core:load_dataset:125 - Loading dataset from `filepath` and `label` columns
2025-02-08 00:44:05.136 | INFO | active_vision.core:load_dataset:159 - Creating new learner
2025-02-08 00:44:06.146 | INFO | active_vision.core:_optimize_learner:100 - Enabled mixed precision training
2025-02-08 00:44:06.146 | INFO | active_vision.core:_finalize_setup:109 - Training set size: 3871
2025-02-08 00:44:06.147 | INFO | active_vision.core:_finalize_setup:110 - Validation set size: 967
2025-02-08 00:44:06.147 | INFO | active_vision.core:_finalize_setup:111 - Done. Ready to train.
In [5]:
al.show_batch()
In [6]:
al.lr_find()
2025-02-08 00:44:06.695 | INFO | active_vision.core:lr_find:200 - Finding optimal learning rate
2025-02-08 00:44:11.021 | INFO | active_vision.core:lr_find:202 - Optimal learning rate: 0.0020892962347716093
In [7]:
al.train(epochs=3, lr=5e-3)
2025-02-08 00:44:11.324 | INFO | active_vision.core:train:213 - Training head for 1 epochs
2025-02-08 00:44:11.325 | INFO | active_vision.core:train:214 - Training model end-to-end for 3 epochs
2025-02-08 00:44:11.326 | INFO | active_vision.core:train:215 - Learning rate: 0.005 with one-cycle learning rate scheduler
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.622257 | 0.316801 | 0.920372 | 00:10 |
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.347197 | 0.274804 | 0.930714 | 00:15 |
1 | 0.257916 | 0.192745 | 0.949328 | 00:16 |
2 | 0.124891 | 0.180057 | 0.955533 | 00:15 |
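Top Loss
Run inference on the loaded dataset to compute a loss for every sample. The samples with the highest loss are the ones where the model disagrees most strongly with the given label, which makes them the most likely candidates for mislabeled or ambiguous images.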
In [8]:
loss_df = al.evaluate(al.dataset, filepath_col="filepath", label_col="label", batch_size=128)
2025-02-08 00:45:15.026 | INFO | active_vision.core:evaluate:318 - Accuracy: 98.70%
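To make the `loss` column concrete, here is a minimal sketch of a per-sample loss, assuming it is the standard cross-entropy (the negative log of the probability assigned to the true class). This is an assumption about what `al.evaluate` reports, not a description of its internals; the helper name `per_sample_cross_entropy` and the toy probabilities are hypothetical.

```python
import numpy as np


def per_sample_cross_entropy(probs: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Return -log(p[true class]) for each row of `probs` (hypothetical helper)."""
    eps = 1e-12  # guard against log(0)
    true_class_prob = probs[np.arange(len(targets)), targets]
    return -np.log(np.clip(true_class_prob, eps, 1.0))


# Toy class probabilities for three samples over three classes.
probs = np.array(
    [
        [0.90, 0.05, 0.05],  # confident and correct
        [0.10, 0.10, 0.80],  # fairly confident and correct
        [0.01, 0.98, 0.01],  # confident in a class other than the label
    ]
)
targets = np.array([0, 2, 0])  # integer index of the labeled class

losses = per_sample_cross_entropy(probs, targets)
ranking = np.argsort(-losses)  # sample indices, highest loss first

print(losses.round(3))
print(ranking)
```

The third sample gets the largest loss because the model puts almost no probability on its label, which is the pattern a mislabeled image tends to produce.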
In [9]:
loss_df = loss_df.sort_values(by="loss", ascending=False)
loss_df.head(50)
Out[9]:
index | filepath | label | pred_label | pred_conf | probs | loss |
---|---|---|---|---|---|---|
2449 | data/ice cream/2449.jpg | ice cream | muffin | 0.125068 | [7.219683993753279e-06, 1.86670949915424e-05, ... | 8.416824 |
3139 | data/orange/3139.jpg | orange | juice | 0.125118 | [1.96086375581217e-07, 1.578678734404093e-07, ... | 7.961312 |
1410 | data/cookie/1410.jpg | cookie | muffin | 0.118468 | [1.5076525414770003e-06, 5.773781595053151e-06... | 7.762379 |
4129 | data/strawberry/4129.jpg | strawberry | orange | 0.101275 | [1.3527422879633377e-06, 0.0021084246691316366... | 6.982127 |
3279 | data/pineapple/3279.jpg | pineapple | juice | 0.124922 | [1.1066877192433822e-07, 1.8016018543676182e-0... | 6.181763 |
718 | data/cake/718.jpg | cake | muffin | 0.124878 | [1.6235714994650152e-08, 1.506924292016265e-07... | 6.010271 |
804 | data/candy/804.jpg | candy | juice | 0.124827 | [4.0165673453884665e-08, 2.0321322153904475e-0... | 5.948222 |
3323 | data/pineapple/3323.jpg | pineapple | ice cream | 0.124297 | [0.0014718767488375306, 0.00030552386306226254... | 5.786842 |
2940 | data/muffin/2940.jpg | muffin | cake | 0.081927 | [0.0008330147247761488, 0.005612582433968782, ... | 5.598393 |
1398 | data/cookie/1398.jpg | cookie | salad | 0.120781 | [6.813405343564227e-05, 0.0008821748779155314,... | 5.410439 |
2416 | data/ice cream/2416.jpg | ice cream | muffin | 0.121203 | [1.2168256034783553e-05, 5.793822856503539e-05... | 5.328399 |
146 | data/apple/146.jpg | apple | orange | 0.124323 | [0.005311989225447178, 4.849363904213533e-05, ... | 5.237789 |
3470 | data/pineapple/3470.jpg | pineapple | grape | 0.123451 | [7.048515726637561e-06, 1.1374029782018624e-05... | 5.042381 |
3744 | data/pretzel/3744.jpg | pretzel | cookie | 0.109596 | [0.005268890876322985, 0.0027670366689562798, ... | 4.744959 |
2931 | data/muffin/2931.jpg | muffin | ice cream | 0.124095 | [9.993943450581355e-08, 4.951255050400505e-06,... | 4.743701 |
510 | data/cake/510.jpg | cake | muffin | 0.123084 | [6.624666752941266e-07, 2.0249230146873742e-05... | 4.373780 |
2941 | data/muffin/2941.jpg | muffin | cake | 0.094058 | [9.178587788483128e-06, 8.656232239445671e-06,... | 4.122075 |
503 | data/cake/503.jpg | cake | ice cream | 0.077482 | [0.001400662586092949, 0.009885282255709171, 0... | 4.075693 |
3147 | data/orange/3147.jpg | orange | cookie | 0.092384 | [0.000380437180865556, 0.1505107432603836, 0.0... | 3.896651 |
4126 | data/strawberry/4126.jpg | strawberry | cake | 0.087539 | [4.575606726575643e-05, 9.810316987568513e-05,... | 3.746092 |
1475 | data/cookie/1475.jpg | cookie | hot dog | 0.069148 | [0.011740331538021564, 0.22006836533546448, 0.... | 3.709618 |
3274 | data/pineapple/3274.jpg | pineapple | banana | 0.104612 | [0.0006821539718657732, 0.8080698847770691, 0.... | 3.560763 |
1443 | data/cookie/1443.jpg | cookie | pretzel | 0.069663 | [4.017434548586607e-05, 0.00025615113554522395... | 3.401579 |
3863 | data/salad/3863.jpg | salad | hot dog | 0.121117 | [6.072193173167761e-06, 5.311800123308785e-06,... | 3.372142 |
1125 | data/carrot/1125.jpg | carrot | salad | 0.092965 | [0.013872077688574791, 0.00562127074226737, 0.... | 3.241745 |
3870 | data/salad/3870.jpg | salad | ice cream | 0.084644 | [0.0006992585840635002, 0.003124691778793931, ... | 3.013049 |
4049 | data/salad/4049.jpg | salad | strawberry | 0.119296 | [7.46435034670867e-07, 1.9026534573640674e-05,... | 2.959520 |
483 | data/banana/483.jpg | banana | salad | 0.073871 | [0.050402410328388214, 0.05611823499202728, 0.... | 2.880294 |
4075 | data/salad/4075.jpg | salad | juice | 0.100825 | [0.0007437673048116267, 7.71394552430138e-05, ... | 2.632005 |
2432 | data/ice cream/2432.jpg | ice cream | juice | 0.074639 | [0.0019310544012114406, 0.047707922756671906, ... | 2.609064 |
1460 | data/cookie/1460.jpg | cookie | popcorn | 0.101674 | [0.0005879048840142787, 0.002070356858894229, ... | 2.548320 |
1545 | data/doughnut/1545.jpg | doughnut | strawberry | 0.095723 | [0.00016763946041464806, 0.0005572352092713118... | 2.433929 |
3260 | data/pineapple/3260.jpg | pineapple | strawberry | 0.106199 | [0.004481835290789604, 0.018164092674851418, 0... | 2.381356 |
1915 | data/grape/1915.jpg | grape | watermelon | 0.107361 | [4.231215643812902e-05, 0.00010472358553670347... | 2.207155 |
4042 | data/salad/4042.jpg | salad | apple | 0.111082 | [0.8718231916427612, 1.546978637634311e-05, 2.... | 2.109825 |
4406 | data/waffle/4406.jpg | waffle | cookie | 0.082152 | [0.006578177213668823, 0.015786968171596527, 0... | 1.994762 |
2870 | data/muffin/2870.jpg | muffin | cake | 0.106637 | [1.299502088158988e-07, 3.769630711758509e-05,... | 1.899083 |
1839 | data/grape/1839.jpg | grape | cake | 0.075246 | [0.0005387595156207681, 8.553598308935761e-05,... | 1.606808 |
3252 | data/pineapple/3252.jpg | pineapple | juice | 0.103028 | [3.6325209293863736e-06, 3.335514702484943e-05... | 1.580271 |
3229 | data/orange/3229.jpg | orange | apple | 0.100117 | [0.7622272372245789, 0.0002534612431190908, 6.... | 1.548854 |
4033 | data/salad/4033.jpg | salad | hot dog | 0.082868 | [0.00040137593168765306, 0.0013634845381602645... | 1.534968 |
2962 | data/muffin/2962.jpg | muffin | cake | 0.100465 | [8.514558430761099e-05, 0.0001889959821710363,... | 1.524468 |
1490 | data/cookie/1490.jpg | cookie | ice cream | 0.080233 | [0.00023257000430021435, 0.003296308685094118,... | 1.514757 |
2282 | data/ice cream/2282.jpg | ice cream | salad | 0.087893 | [0.0004607433220371604, 0.00013206964649725705... | 1.417533 |
3405 | data/pineapple/3405.jpg | pineapple | orange | 0.083389 | [0.0008283390779979527, 0.0004509652790147811,... | 1.375507 |
774 | data/candy/774.jpg | candy | candy | 0.061292 | [0.006661119405180216, 0.0013752710074186325, ... | 1.356683 |
2742 | data/juice/2742.jpg | juice | apple | 0.096247 | [0.7215490341186523, 2.8318027034401894e-06, 1... | 1.279480 |
3150 | data/orange/3150.jpg | orange | apple | 0.094106 | [0.6982393860816956, 0.0002591143420431763, 4.... | 1.210756 |
3374 | data/pineapple/3374.jpg | pineapple | popcorn | 0.081653 | [0.0025348954368382692, 0.01960543356835842, 0... | 1.186831 |
1390 | data/cookie/1390.jpg | cookie | cake | 0.093362 | [1.0888876431636163e-06, 1.9755755147343734e-0... | 1.183479 |
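Scanning 50 rows by eye is slow, so it can help to summarize which predicted classes show up most often among the disagreements. The sketch below rebuilds a few rows of the table above by hand (only the `label`, `pred_label`, and `loss` columns) rather than calling the library, so the variable names `sample`, `disagreements`, and `confusions` are illustrative only.

```python
import pandas as pd

# A handful of rows copied from the table above.
sample = pd.DataFrame(
    {
        "label": ["ice cream", "orange", "cookie", "cake", "muffin", "candy"],
        "pred_label": ["muffin", "juice", "muffin", "muffin", "cake", "candy"],
        "loss": [8.416824, 7.961312, 7.762379, 6.010271, 5.598393, 1.356683],
    }
)

# Keep only rows where the prediction disagrees with the given label.
disagreements = sample[sample["label"] != sample["pred_label"]]

# Count how often each predicted class appears among the disagreements.
confusions = disagreements["pred_label"].value_counts()

print(disagreements)
print(confusions)
```

A class that dominates this count (here `muffin`) is a reasonable place to start reviewing images, since it may point to a systematic labeling confusion rather than isolated mistakes.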
In [10]:
from active_vision.utils import show_interactive_table
show_interactive_table(loss_df.head(50))
[Interactive table (ITables): the 50 highest-loss samples, with columns image, filepath, label, pred_label, pred_conf, probs, loss]
Label the top loss samples
In [ ]:
loss_df["strategy"] = "Loss"
loss_df["score"] = loss_df["loss"]
al.label(loss_df.head(50), output_filename="top_loss_samples.parquet")
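As an alternative to mutating `loss_df` in place, the candidate set can be built as a separate frame. This is a minimal pandas-only sketch on a hypothetical three-row frame named `scores`; it uses `nlargest` and `assign`, and whether `al.label` needs any columns beyond those shown in this notebook is not something the sketch assumes.

```python
import pandas as pd

# Hypothetical stand-in for loss_df with three rows from the table above.
scores = pd.DataFrame(
    {
        "filepath": [
            "data/ice cream/2449.jpg",
            "data/cookie/1390.jpg",
            "data/orange/3139.jpg",
        ],
        "loss": [8.416824, 1.183479, 7.961312],
    }
)

# Select the k highest-loss rows and tag them without modifying `scores`.
top_k = scores.nlargest(2, "loss").assign(
    strategy="Loss",
    score=lambda frame: frame["loss"],
)

print(top_k)
```

Keeping the original frame untouched makes it easier to try a different selection strategy later on the same evaluation results.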