Preparation

Manage the Python environment with uv

cd /opt/qcom/aistack/tutorials/
uv venv py10 -p /usr/bin/python3.10
source py10/bin/activate

uv run python --version
cd /opt/qcom/aistack/aimet/1.34.0.44

uv pip install torch==2.1.2+cu121 torchvision==0.16.2+cu121 --extra-index-url https://download.pytorch.org/whl/cu121
uv pip install *.whl

Key package versions after installation (uv pip list):
accelerate                0.33.0
aimet                     1.34.0.0.207.0.44+torch.gpu
aimetcommon               1.34.0.0.207.0.44+torch.gpu
aimettorch                1.34.0.0.207.0.44+torch.gpu
numpy                     1.23.5
peft                      0.15.0
tokenizers                0.19.0
torch                     2.1.2+cu121
torchvision               0.16.2+cu121
transformers              4.43.2

1. Model adaptation and quantization

├── onnx
│   ├── llama32_1b.encodings
│   ├── llama32_1b.onnx
│   ├── llama32_1b.pth
│   ├── llama32_1b_torch.encodings
│   ├── lm_head_conv_Conv.weight
│   ├── model_embed_tokens_Gather.weight
│   ├── model_layers_0_mlp_down_proj_conv_Conv.weight
│   ├── model_layers_0_mlp_gate_proj_conv_Conv.weight
│   ├── model_layers_0_mlp_up_proj_conv_Conv.weight
│   ├── model_layers_0_self_attn_k_proj_conv_Conv.weight
│   ├── model_layers_0_self_attn_o_proj_conv_Conv.weight
│   ├── model_layers_0_self_attn_q_proj_conv_Conv.weight
...
├── output
│   └── tokenizer
│       ├── special_tokens_map.json
│       ├── tokenizer_config.json
│       └── tokenizer.json
└── test_vectors
    ├── fp_0.pkl
    ├── layer_output_name_order.json
    └── qt_0.pkl
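test_vectors pairs floating-point layer outputs (fp_0.pkl) with quantized ones (qt_0.pkl), and layer_output_name_order.json fixes the layer ordering. A common way to judge per-layer quantization quality is SQNR. This is a minimal sketch under the assumption that each output unpickles to a flat sequence of floats; the actual pkl layout is not shown in these notes:

```python
import math

def sqnr_db(fp, qt):
    """Signal-to-quantization-noise ratio in dB between a floating-point
    reference output and its quantized counterpart."""
    signal = sum(x * x for x in fp)
    noise = sum((x - y) ** 2 for x, y in zip(fp, qt))
    if noise == 0:
        return math.inf
    return 10.0 * math.log10(signal / noise)

# Toy example: a small error relative to the signal gives a high SQNR.
print(round(sqnr_db([1.0, 2.0, 3.0], [1.1, 2.0, 2.9]), 2))  # → 28.45
```

A low SQNR on a specific layer is a hint to raise that layer's bitwidth or adjust its encodings.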

2. Generate the QNN model

3. Run on the phone

├── to_device
│   ├── genie-t2t-run
│   ├── htp_backend_ext_config.json
│   ├── htp-model-config-llama32-1b-gqa.json
│   ├── libGenie.so
│   ├── libQnnHtpNetRunExtensions.so
│   ├── libQnnHtp.so
│   ├── libQnnHtpV79Skel.so
│   ├── libQnnHtpV79Stub.so
│   ├── libQnnSystem.so
│   ├── lprompt_1024.txt
│   ├── lprompt_4096.txt
│   ├── models
│   │   └── weight_sharing_model_1_of_1.serialized.bin
│   └── tokenizer.json
adb push to_device /data/local/tmp/llama3_2_assets
adb shell
cd /data/local/tmp/llama3_2_assets
chmod +x ./genie-t2t-run
export LD_LIBRARY_PATH=$PWD
export ADSP_LIBRARY_PATH=$PWD
./genie-t2t-run -c ./htp-model-config-llama32-1b-gqa.json -p '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nPlan a 5 day trip to London for 4 people.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'

./genie-t2t-run -c ./htp-model-config-llama32-1b-gqa.json --prompt_file lprompt_1024.txt
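The inline prompt passed with -p follows the Llama 3 chat template. A small helper for generating such prompts (e.g. to build files like lprompt_1024.txt) is sketched below; the real newlines here correspond to the \n escapes in the shell command, and whether genie expects escapes or real newlines in a prompt file is not confirmed in these notes:

```python
# Build a Llama 3 chat-template prompt string like the one passed to
# genie-t2t-run with -p above.
def format_llama3_prompt(system, user):
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama3_prompt(
    "You are a helpful assistant",
    "Plan a 5 day trip to London for 4 people.",
)
print(prompt)
```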

Configuration changes

Performance

./genie-t2t-run -c ./htp-model-config-llama32-1b-gqa.json --prompt_file lprompt_1024.txt --profile perf.json

The resulting perf.json:
{
  "header": {
    "header_version": {
      "major": 0,
      "minor": 1,
      "patch": 0
    },
    "version": {
      "major": 0,
      "minor": 1,
      "patch": 0
    },
    "artifact_type": "GENIE_PROFILE"
  },
  "metadata": {
    "timestamp": 391862341312
  },
  "components": [
    {
      "name": "dialog0",
      "type": "dialog",
      "events": [
        {
          "type": "GenieDialog_create",
          "duration": 1124864,
          "start": 391862341614,
          "stop": 391863466478,
          "init-time": {
            "value": 1124713,
            "unit": "us"
          }
        },
        {
          "type": "GenieDialog_query",
          "duration": 5726945,
          "start": 391863466531,
          "stop": 391869193476,
          "num-prompt-tokens": {
            "value": 823,
            "unit": ""
          },
          "prompt-processing-rate": {
            "value": 1547.9876708984375,
            "unit": "toks/sec"
          },
          "time-to-first-token": {
            "value": 531665,
            "unit": "us"
          },
          "num-generated-tokens": {
            "value": 200,
            "unit": ""
          },
          "token-generation-rate": {
            "value": 38.52525329589844,
            "unit": "toks/sec"
          },
          "token-generation-time": {
            "value": 5191579,
            "unit": "us"
          }
        },
        {
          "type": "GenieDialog_free",
          "duration": 32347,
          "start": 391869193478,
          "stop": 391869225825
        }
      ]
    }
  ]
}
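The reported rates can be cross-checked from the raw counters: prompt-processing-rate is num-prompt-tokens over time-to-first-token, and token-generation-rate is num-generated-tokens over token-generation-time. A stdlib sketch using the values from the perf.json above:

```python
# Profile excerpt with the fields genie-t2t-run writes under --profile
# (values copied from the perf.json shown above).
profile = {
    "components": [{
        "name": "dialog0",
        "type": "dialog",
        "events": [{
            "type": "GenieDialog_query",
            "num-prompt-tokens": {"value": 823},
            "time-to-first-token": {"value": 531665, "unit": "us"},
            "num-generated-tokens": {"value": 200},
            "token-generation-time": {"value": 5191579, "unit": "us"},
        }],
    }]
}

def query_rates(profile):
    """Recompute prefill and decode throughput (toks/sec) from the raw
    counters in a GENIE_PROFILE dict."""
    ev = next(e for c in profile["components"] for e in c["events"]
              if e["type"] == "GenieDialog_query")
    prefill = ev["num-prompt-tokens"]["value"] / (ev["time-to-first-token"]["value"] / 1e6)
    decode = ev["num-generated-tokens"]["value"] / (ev["token-generation-time"]["value"] / 1e6)
    return prefill, decode

prefill, decode = query_rates(profile)
print(f"prefill {prefill:.1f} toks/sec, decode {decode:.1f} toks/sec")
```

This reproduces the reported 1547.99 toks/sec prompt-processing-rate and 38.53 toks/sec token-generation-rate to within rounding.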

Pitfalls hit on AutoDL

Notes

pip.txt
example1_tree.log
example2_tree.log

References