The closest example I could find is in this issue: https://github.com/tensorflow/tensorflow/issues/899

With this minimal reproducible code:

import tensorflow as tf
import tensorflow.python.framework.ops as ops
g = tf.Graph()
with g.as_default():
  A = tf.Variable(tf.random_normal( [25,16] ))
  B = tf.Variable(tf.random_normal( [16,9] ))
  C = tf.matmul(A,B) # shape=[25,9]
for op in g.get_operations():
  flops = ops.get_stats_for_node_def(g, op.node_def, 'flops').value
  if flops is not None:
    print('Flops should be ~', 2*25*16*9)
    print('25 x 25 x 9 would be', 2*25*25*9) # ignores internal dim, repeats first
    print('TF stats gives', flops)

However, the returned FLOPS is always None. Is there a way to measure FLOPS concretely, especially for a PB file?

Originally posted by kwotsin; translation licensed under CC BY-SA 4.0

python tensorflow

2 Answers

Community wiki

Posted on 2023-01-10

✓ Accepted

This is a bit late, but it may help future visitors. For your example, I successfully tested the following snippet:

import tensorflow as tf  # TensorFlow 1.x API

g = tf.Graph()
run_meta = tf.RunMetadata()
with g.as_default():
    A = tf.Variable(tf.random_normal( [25,16] ))
    B = tf.Variable(tf.random_normal( [16,9] ))
    C = tf.matmul(A,B) # shape=[25,9]

    opts = tf.profiler.ProfileOptionBuilder.float_operation()
    flops = tf.profiler.profile(g, run_meta=run_meta, cmd='op', options=opts)
    if flops is not None:
        print('Flops should be ~',2*25*16*9)
        print('25 x 25 x 9 would be',2*25*25*9) # ignores internal dim, repeats first
        print('TF stats gives',flops.total_float_ops)

The profiler can also be used together with Keras, as in the following snippet:

import tensorflow as tf
import keras.backend as K
from keras.applications.mobilenet import MobileNet

run_meta = tf.RunMetadata()
with tf.Session(graph=tf.Graph()) as sess:
    K.set_session(sess)
    net = MobileNet(alpha=.75, input_tensor=tf.placeholder('float32', shape=(1,32,32,3)))

    opts = tf.profiler.ProfileOptionBuilder.float_operation()
    flops = tf.profiler.profile(sess.graph, run_meta=run_meta, cmd='op', options=opts)

    opts = tf.profiler.ProfileOptionBuilder.trainable_variables_parameter()
    params = tf.profiler.profile(sess.graph, run_meta=run_meta, cmd='op', options=opts)

    print("{:,} --- {:,}".format(flops.total_float_ops, params.total_parameters))

I hope this helps!

Originally posted by Tobias Scheck; translation licensed under CC BY-SA 3.0

Community wiki

Posted on 2023-01-10

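I would like to build on Tobias Scheck's answer and also answer the original question: how to get the FLOP count from a pb file.

Running the first snippet from Tobias's answer with TensorFlow 1.6.0: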

g = tf.Graph()
run_meta = tf.RunMetadata()
with g.as_default():
    A = tf.Variable(tf.random_normal([25,16]))
    B = tf.Variable(tf.random_normal([16,9]))
    C = tf.matmul(A,B)

    opts = tf.profiler.ProfileOptionBuilder.float_operation()
    flops = tf.profiler.profile(g, run_meta=run_meta, cmd='op', options=opts)
    if flops is not None:
        print('Flops should be ~',2*25*16*9)
        print('TF stats gives',flops.total_float_ops)

we get the following output:

Flops should be ~ 7200
TF stats gives 8288

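So why do we get 8288 instead of the expected result 7200 = 2*25*16*9 [a]? The answer lies in how the tensors A and B are initialized. Initializing with a Gaussian distribution costs some FLOP. Changing the definition of A and B to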

    A = tf.Variable(initial_value=tf.zeros([25, 16]))
    B = tf.Variable(initial_value=tf.zeros([16, 9]))

gives the expected output, 7200.
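The 1088-FLOP gap between 8288 and 7200 can be reproduced with plain arithmetic. The sketch below is pure Python and does not call TensorFlow. It assumes that each tf.random_normal initializer scales and shifts a standard-normal sample (one multiply by stddev and one add of mean per element) and that the profiler counts one FLOP per element for each of those two ops. That is an assumption about TF 1.x internals, not something the answer states; the helper names are hypothetical.

```python
# Back-of-the-envelope accounting for the 8288 figure (no TensorFlow needed).
# ASSUMPTION: a random_normal initializer costs one multiply (by stddev)
# and one add (of mean) per element, each counted as 1 FLOP.

def matmul_flops(m, p, q):
    """FLOP count for A[m, p] @ B[p, q] using the 2*m*p*q convention."""
    return 2 * m * p * q

def random_normal_init_flops(shape):
    """Assumed initializer cost: 2 FLOP (multiply + add) per element."""
    n = 1
    for dim in shape:
        n *= dim
    return 2 * n

matmul = matmul_flops(25, 16, 9)
init = random_normal_init_flops([25, 16]) + random_normal_init_flops([16, 9])

print('matmul FLOP:', matmul)
print('assumed initializer FLOP:', init)
print('total:', matmul + init)
```

Under that assumption the total matches the profiler's 8288, and switching to tf.zeros removes the initializer term entirely.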

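Usually, a network's variables are initialized with a Gaussian distribution, among other schemes. Most of the time we are not interested in the initialization FLOP, since they are performed once during initialization and do not occur during training or inference. So how can we get the exact FLOP count without the initialization FLOP?

Freeze the graph to a pb file. Computing the FLOP from a pb file was, in fact, the OP's use case.

The following snippet illustrates this: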

import tensorflow as tf
from tensorflow.python.framework import graph_util

def load_pb(pb):
    with tf.gfile.GFile(pb, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')
        return graph

# ***** (1) Create Graph *****
g = tf.Graph()
sess = tf.Session(graph=g)
with g.as_default():
    A = tf.Variable(initial_value=tf.random_normal([25, 16]))
    B = tf.Variable(initial_value=tf.random_normal([16, 9]))
    C = tf.matmul(A, B, name='output')
    sess.run(tf.global_variables_initializer())
    flops = tf.profiler.profile(g, options = tf.profiler.ProfileOptionBuilder.float_operation())
    print('FLOP before freezing', flops.total_float_ops)
# *****************************

# ***** (2) freeze graph *****
output_graph_def = graph_util.convert_variables_to_constants(sess, g.as_graph_def(), ['output'])

with tf.gfile.GFile('graph.pb', "wb") as f:
    f.write(output_graph_def.SerializeToString())
# *****************************

# ***** (3) Load frozen graph *****
g2 = load_pb('./graph.pb')
with g2.as_default():
    flops = tf.profiler.profile(g2, options = tf.profiler.ProfileOptionBuilder.float_operation())
    print('FLOP after freezing', flops.total_float_ops)

Output:

FLOP before freezing 8288
FLOP after freezing 7200

[a] Usually the FLOP count of a matrix multiplication AB is mq(2p - 1), where A is [m, p] and B is [p, q], but TensorFlow returns 2mpq for some reason. An issue has been opened to understand why.
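To make footnote [a] concrete, here is a small pure-Python comparison of the two counting conventions for the shapes used in this thread. The function names are hypothetical and nothing here calls TensorFlow. The gap between the two formulas is always m*q, that is, one extra add per output element; this would be consistent with counting p multiply-add pairs per output, though that reading is only a guess.

```python
# Compare the two matmul FLOP conventions from footnote [a].

def matmul_flops_textbook(m, p, q):
    """Each of the m*q outputs needs p multiplies and p - 1 adds."""
    return m * q * (2 * p - 1)

def matmul_flops_2mpq(m, p, q):
    """Convention reported by the TF profiler in this thread: 2*m*p*q."""
    return 2 * m * p * q

m, p, q = 25, 16, 9
textbook = matmul_flops_textbook(m, p, q)
reported = matmul_flops_2mpq(m, p, q)

print('mq(2p - 1):', textbook)
print('2mpq:', reported)
print('difference:', reported - textbook)
```

For large inner dimensions p the relative difference shrinks, so either convention is usually fine for comparing models.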

Originally posted by BiBi; translation licensed under CC BY-SA 4.0
