WEBVTT

1
00:00:00.000 --> 00:00:04.892
This is the Transkribera app
where you can locally and securely

2
00:00:04.892 --> 00:00:10.000
transcribe audio and process text
on your computer using local models.

3
00:00:10.000 --> 00:00:13.122
The first thing I'll do
is show that you can simply

4
00:00:13.122 --> 00:00:16.000
drag and drop an audio file
directly onto the app.

5
00:00:16.000 --> 00:00:19.000
Then the transcription starts.

6
00:00:19.000 --> 00:00:22.000
So, what is it transcribing with?

7
00:00:22.000 --> 00:00:26.000
It uses a model called
KB Whisper in this case.

8
00:00:26.000 --> 00:00:28.000
You can see that at the top right here.

9
00:00:28.000 --> 00:00:30.000
Here you can choose between
different types of models

10
00:00:30.000 --> 00:00:34.000
and now I'm using one called
KB Whisper Small.

11
00:00:34.000 --> 00:00:36.000
And KB stands for Kungliga biblioteket,
the National Library of Sweden.

12
00:00:36.000 --> 00:00:39.000
So the National Library has
fine-tuned

13
00:00:39.000 --> 00:00:42.000
or adjusted an already existing model

14
00:00:42.000 --> 00:00:47.000
and made it even better with
50,000 extra hours of Swedish speech.

15
00:00:47.000 --> 00:00:50.000
The first time you click a model
that's not downloaded,

16
00:00:50.000 --> 00:00:53.000
and also the first time you start
the app, it will download.

17
00:00:53.000 --> 00:00:57.000
That takes anywhere from 5
to 15 minutes.

18
00:00:57.000 --> 00:01:00.000
But if you click here, you'll see

19
00:01:00.000 --> 00:01:02.000
how it starts spinning up here.

20
00:01:02.000 --> 00:01:07.000
When it stops spinning and
the computer icon appears again,

21
00:01:07.000 --> 00:01:09.000
you're good to go.

22
00:01:09.000 --> 00:01:13.000
You can also check how far
the download has progressed.

23
00:01:13.000 --> 00:01:16.000
Otherwise, just wait
and grab a coffee.

24
00:01:16.000 --> 00:01:19.000
We can also choose how we
want to view our transcription.

25
00:01:19.000 --> 00:01:23.000
Either as segments
or all in one block here.

26
00:01:23.000 --> 00:01:26.000
And of course, we can
save our transcription.

27
00:01:26.000 --> 00:01:28.000
You see it says transcription.

28
00:01:28.000 --> 00:01:30.000
We can save it as a text file
or as a Markdown file.

29
00:01:30.000 --> 00:01:33.000
We can also choose to copy the text.

30
00:01:33.000 --> 00:01:36.000
Then we can open any app
and just paste it in.

31
00:01:36.000 --> 00:01:39.000
Then, of course, we have
text size and similar options.

32
00:01:39.000 --> 00:01:42.000
And if we run into any problems,
we can click here

33
00:01:42.000 --> 00:01:45.000
and it will transcribe again.

34
00:01:45.000 --> 00:01:49.000
If we have local language models
installed on our computer

35
00:01:49.000 --> 00:01:51.000
using the Ollama software,

36
00:01:51.000 --> 00:01:55.000
we can also choose to
process this transcription.

37
00:01:55.000 --> 00:01:57.000
So we click on prompt here.

38
00:01:57.000 --> 00:02:00.000
Then we can pick one of these three.

39
00:02:00.000 --> 00:02:03.000
They're included when you install the app.

40
00:02:03.000 --> 00:02:07.000
You can edit these and add
your own prompts too.

41
00:02:07.000 --> 00:02:09.000
Now we've chosen summary.

42
00:02:09.000 --> 00:02:11.000
If you want to add more
info to this

43
00:02:11.000 --> 00:02:13.000
prompt, you can do that here too.

44
00:02:13.000 --> 00:02:18.000
So I can, for example, write
that Micke from RISE is speaking.

45
00:02:18.000 --> 00:02:21.000
And click save.

46
00:02:21.000 --> 00:02:23.000
Then I can choose the language model here.

47
00:02:23.000 --> 00:02:25.000
And now it says Ollama here.

48
00:02:25.000 --> 00:02:29.000
And I have these models
installed locally on my computer.

49
00:02:29.000 --> 00:02:30.000
So I think I'll use my own.

50
00:02:30.000 --> 00:02:33.000
Gemma 3 27B.

51
00:02:33.000 --> 00:02:36.000
Which is a good model from Google.

52
00:02:36.000 --> 00:02:39.000
And as mentioned, it runs
completely locally on my computer.

53
00:02:39.000 --> 00:02:42.000
If you have an API key for Berget,
you can use their models as well.

54
00:02:42.000 --> 00:02:47.000
Berget is a company
running language models in Sweden.

55
00:02:49.000 --> 00:02:52.000
When we're done, we click process.

56
00:02:52.000 --> 00:02:54.000
Now you can see the window splits up

57
00:02:54.000 --> 00:02:56.000
so we see our transcription here.

58
00:02:56.000 --> 00:03:00.000
And now the summary
starts being created.

59
00:03:00.000 --> 00:03:03.000
And here you see how it
included Micke from RISE

60
00:03:03.000 --> 00:03:05.000
describing two central challenges.

61
00:03:05.000 --> 00:03:09.000
Which was the extra info
I added here.

62
00:03:09.000 --> 00:03:12.000
So here is our processed text.

63
00:03:12.000 --> 00:03:16.000
If I want to save this, you can see
processed text has appeared.

64
00:03:16.000 --> 00:03:20.000
Which I can save as a
text file or Markdown file.

65
00:03:20.000 --> 00:03:22.000
Let's go back.

66
00:03:22.000 --> 00:03:25.000
And then I choose to go
back without saving.

67
00:03:25.000 --> 00:03:28.000
At the top right, we have our settings.

68
00:03:28.000 --> 00:03:30.000
And we also have

69
00:03:30.000 --> 00:03:34.000
the option to switch between
light mode and dark mode.

70
00:03:34.000 --> 00:03:38.000
So you can pick what's most
comfortable for your eyes.

71
00:03:38.000 --> 00:03:42.000
Looking at settings here,
we can go to general.

72
00:03:42.000 --> 00:03:46.000
Here we can choose if we want
the app in English or Swedish.

73
00:03:46.000 --> 00:03:49.000
We have API keys.

74
00:03:49.000 --> 00:03:53.000
And that's if you have an
API key for Berget.

75
00:03:53.000 --> 00:03:56.351
You can save it here,
and then you can

76
00:03:56.351 --> 00:04:00.000
send your audio and text
to their AI models.

77
00:04:00.000 --> 00:04:04.000
And if you have Ollama
installed on your computer,

78
00:04:04.000 --> 00:04:07.000
you'll see the models
available to you,

79
00:04:07.000 --> 00:04:10.000
which are all local on your own computer.

80
00:04:10.000 --> 00:04:12.343
If we look at downloaded
models, we see those

81
00:04:12.343 --> 00:04:15.000
models that are actually
installed locally.

82
00:04:15.000 --> 00:04:18.000
And here we can also
choose to remove them.

83
00:04:18.000 --> 00:04:21.909
And if we've added
an API key for

84
00:04:21.909 --> 00:04:26.000
Berget, we can see which
cloud models we have.

85
00:04:26.000 --> 00:04:30.000
For transcription, we have
KB Whisper Large on Berget.

86
00:04:30.000 --> 00:04:33.000
It's the same for language models.

87
00:04:33.000 --> 00:04:35.000
These are from Ollama.

88
00:04:35.000 --> 00:04:38.000
And these models are from Berget.

89
00:04:38.000 --> 00:04:41.000
And at the bottom, we have prompts.

90
00:04:41.000 --> 00:04:45.000
That means we can choose
to summarize text in different ways.

91
00:04:45.000 --> 00:04:49.000
And you can either edit
these existing ones,

92
00:04:49.000 --> 00:04:52.000
or you can choose to
add another prompt.

93
00:04:52.000 --> 00:04:54.000
So let's create a test prompt.

94
00:04:54.000 --> 00:04:58.000
Then you can write your
prompt there and click save.

95
00:04:58.000 --> 00:05:00.000
And now it's

96
00:05:00.000 --> 00:05:02.000
available in the app.

97
00:05:02.000 --> 00:05:06.000
Another thing you can do is
click on new recording.

98
00:05:06.000 --> 00:05:08.000
Then you can select
your audio input.

99
00:05:08.000 --> 00:05:11.000
I'm using the MacBook Pro microphone.

100
00:05:11.000 --> 00:05:14.000
Then you just need to
press record.

101
00:05:14.000 --> 00:05:18.000
Hi there, I'm Micke,
hope all is well, bye.

102
00:05:18.000 --> 00:05:22.000
Then we press stop
and click transcribe.

103
00:05:22.000 --> 00:05:25.756
The first time you download
a new Whisper model

104
00:05:25.756 --> 00:05:30.000
to your computer, it takes a few minutes
for the app to prepare

105
00:05:30.000 --> 00:05:34.000
itself before it can be used.

106
00:05:34.000 --> 00:05:40.000
So don't worry, just grab a coffee,
check your mail, or do something else meanwhile.

107
00:05:42.000 --> 00:05:46.000
And now we have the text
I just recorded.

108
00:05:46.000 --> 00:05:50.000
If you have any questions,
just get in touch as usual.

109
00:05:50.000 --> 00:05:52.000
Take care!
