Discussion:
[Arm-netbook] Libre RISC-V SoC
Luke Kenneth Casson Leighton
2018-01-19 16:06:51 UTC
Permalink
ok so, from the anonymous benefactor (independent of the shakti team),
the one who sponsored me with the zc706 FPGA developer board, he has
just had an interesting meeting with MOSIS, and has confirmed that
they have an LPDDR3 PHY, so in combination with the DDR3 *controller*
that was developed and released by someone working at CERN, this would
be the last major piece of the "interfaces" side of the puzzle that, until
then, blocked progress.

so he asked, also, what would it take to get things ready within a
year, to hand over the ASIC-based design to MOSIS, for them to turn it
into an ASIC, and i said "a team of engineers would need to be paid
for". he asked - and please note the question very carefully - "would
USD 250,000 be enough?" to which i replied (genuinely) yes... if done
carefully. software also has to be taken care of.

please note: *that's as far as the conversation has gone so far*.

it is still exciting, and the next phase would be to get a strong
commitment and then i can start finding people to do software
bring-up and also develop the VLSI / VHDL which will put all this
together - mostly that's glue logic for the interfaces (putting them
onto a "Tile" interface if using the rocket-chip or BOOM architecture)
but also designing a multiplexer GPIO bank.

i'm also talking to jeff from nyuzi, he designed a software-driven
"compute engine that happens to be reasonably good at 3D", the
interesting bit being that he's focussed on working out which areas
need performance improvements. this is something that's almost
completely lacking in the published academic world... *because nobody
in the academic world has designed and published a GPU!*

we worked out that nyuzi is approximately 1/16th the speed of MALI400.
roughly. although area-for-area it's quite hard to tell whether
that's a fair assessment because you can't *get* die areas for MALI400
(anyone know anything better than these estimates?
https://forum.beyond3d.com/posts/1176110/ ) and it's the performance /
mm^2 / watt that's critical, we worked out that if you put in 2 nyuzi
cores and managed to halve the number of instructions / pixel by
replacing critical path blocks with hardware-rendering ones, you'd end
up at about 25% the performance of MALI400 and that i feel would be
"good enough" for a first version. i'd be interested to hear what
people think, here.
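
a minimal sketch of the arithmetic behind that 25% figure. it assumes performance scales linearly with the number of cores and inversely with instructions per pixel; both are simplifying assumptions, not measurements, and the function name is made up for illustration.

```python
from fractions import Fraction

# Figure taken from the estimate above: one nyuzi core relative to MALI400.
NYUZI_VS_MALI400 = Fraction(1, 16)


def relative_performance(cores: int, instruction_reduction: int) -> Fraction:
    """Estimate performance relative to MALI400 (hypothetical model).

    Assumes perfect linear scaling with core count, and that cutting
    instructions/pixel by a factor N gives an N-fold speedup.
    """
    return NYUZI_VS_MALI400 * cores * instruction_reduction


estimate = relative_performance(cores=2, instruction_reduction=2)
print(f"{float(estimate):.0%} of MALI400")  # 25% of MALI400
```

real scaling will be worse than linear once memory bandwidth becomes the limit, so this is best read as an upper bound.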

the *general-purpose compute* performance of nyuzi on the other hand
is really good.

anyway lots of planning to do.

l.

---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

_______________________________________________
arm-netbook mailing list arm-***@lists.phcomp.co.uk
http://lists.phcomp.co.uk/mailman/listinfo/arm-netbook
Send la
Christopher Havel
2018-01-19 16:12:40 UTC
Permalink
Whoo, excitement. I *really* wish I could help but I'm kind of a
perpetually budding hobbyist here. Let me know if you need something strung
up in 7400- or 4000-series logic, tho -- *that* is a language I can speak ;)
Luke Kenneth Casson Leighton
2018-01-19 16:16:22 UTC
Permalink
On Fri, Jan 19, 2018 at 4:12 PM, Christopher Havel
Post by Christopher Havel
Whoo, excitement. I *really* wish I could help but I'm kind of a
perpetually budding hobbyist here. Let me know if you need something strung
up in 7400- or 4000-series logic, tho -- *that* is a language I can speak ;)
:)

Lauri Kasanen
2018-01-19 16:22:45 UTC
Permalink
On Fri, 19 Jan 2018 16:06:51 +0000
Post by Luke Kenneth Casson Leighton
so he asked, also, what would it take to get things ready within a
year, to hand over the ASIC-based design to MOSIS, for them to turn it
into an ASIC, and i said "a team of engineers would need to be paid
for". he asked - and please note the question very carefully - "would
USD 250,000 be enough?" to which i replied (genuinely) yes... if done
carefully. software also has to be taken care of.
A libre risc-v soc would indeed be great.
Post by Luke Kenneth Casson Leighton
we worked out that if you put in 2 nyuzi
cores and managed to halve the number of instructions / pixel by
replacing critical path blocks with hardware-rendering ones, you'd end
up at about 25% the performance of MALI400 and that i feel would be
"good enough" for a first version. i'd be interested to hear what
people think, here.
25% of MALI400 is not enough for even mobile games: it is enough for
spinning windows ala Compiz, but that can also be done with much less.
So if the goal is something usable, rather than a platform to kickstart
nyuzi development on, targeting lower perf with lower power usage would
be better.

Now, I don't remember if video decode blocks would be in the RISC parts
or in nyuzi. GPUs are usually good at colorspace conversion and
scaling, so if there are no dedicated blocks for those, then the nyuzi
core should be targeted/sized at that usage. Bicubic upscaling + color
conversion to fullhd - the texture units, memory bandwidth and compute
parts need to have enough oomph.
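
As a rough sizing sketch for that workload: the frame rate (30 fps) and output format (4 bytes per pixel) below are assumptions for illustration, not figures from this thread. Bicubic interpolation reads a 4x4 neighbourhood of source samples per output sample, which is where the 16 comes from.

```python
# Rough throughput estimate for bicubic upscaling to full HD.
# Assumed (not from the thread): 30 frames/s and 4 bytes per output pixel.
WIDTH, HEIGHT = 1920, 1080
FPS = 30                 # assumption
BYTES_PER_PIXEL = 4      # assumption: 32-bit RGBx output
BICUBIC_TAPS = 4 * 4     # bicubic interpolation reads a 4x4 neighbourhood


def fullhd_budget(fps: int = FPS) -> dict:
    """Return per-second work for scaling every frame to 1920x1080."""
    pixels_per_second = WIDTH * HEIGHT * fps
    return {
        "output_pixels_per_s": pixels_per_second,
        "sample_reads_per_s": pixels_per_second * BICUBIC_TAPS,
        "output_bytes_per_s": pixels_per_second * BYTES_PER_PIXEL,
    }


for name, value in fullhd_budget().items():
    print(f"{name}: {value:,}")
```

The sample-read count is per colour plane and ignores texture caching, so actual memory traffic would be lower than that number suggests.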

Radeon R300 parts are capable of that. I'm not sure how that maps to %
of Mali.

- Lauri

Luke Kenneth Casson Leighton
2018-01-19 18:04:40 UTC
Permalink
Post by Lauri Kasanen
On Fri, 19 Jan 2018 16:06:51 +0000
25% of MALI400 is not enough for even mobile games: it is enough for
spinning windows ala Compiz, but that can also be done with much less.
So if the goal is something usable, rather than a platform to kickstart
nyuzi development on, targeting lower perf with lower power usage would
be better.
general-purpose software-based rendering engines are not
particularly good at 3D, luckily jeff's work shows exactly where the
prime focus would be needed. he did however point out that due to the
sheer quantity of hardware needed to get that optimised
hardware-accelerated "function" it would turn nyuzi into a totally
different engine.

still thinking about it.
Post by Lauri Kasanen
Now, I don't remember if video decode blocks would be in the RISC parts
or in nyuzi.
neither. opencores has a series of video "blocks" that could go in a
separate engine, DMA-based etc. etc. these will need to be added as
well.

l.

Richard Wilbur
2018-01-20 05:47:55 UTC
Permalink
That's awesome news on so many fronts! (libre LPDDR3 PHY, working
relationship with MOSIS, possibility of funding, VHDL glue logic for
interfaces, multiplexer GPIO bank, performance tuning a libre GPU,
video coprocessors with DMA)
Sounds like lots of fun!

Luke Kenneth Casson Leighton
2018-01-20 06:56:05 UTC
Permalink
On Sat, Jan 20, 2018 at 5:47 AM, Richard Wilbur
Post by Richard Wilbur
That's awesome news on so many fronts! (libre LPDDR3 PHY, working
relationship with MOSIS, possibility of funding, VHDL glue logic for
interfaces, multiplexer GPIO bank, performance tuning a libre GPU,
video coprocessors with DMA)
Sounds like lots of fun!
yehyeh! don't forget unlimited resources at an indian university to
create an 8-stage pipeline harvard architecture superscalar 16-core
SMP processor that uses only 120mW per core in 28nm and can run at a
2.5ghz clock rate, mustn't forget that, y'know - i mean it's _just_
the processor bit...

l.

Bill Kontos
2018-01-22 20:38:26 UTC
Permalink
On Fri, Jan 19, 2018 at 6:06 PM, Luke Kenneth Casson Leighton
Post by Luke Kenneth Casson Leighton
i'm also talking to jeff from nyuzi, he designed a software-driven
"compute engine that happens to be reasonably good at 3D", the
interesting bit being that he's focussed on working out which areas
need performance improvements. this is something that's almost
completely lacking in the published academic world... *because nobody
in the academic world has designed and published a GPU!*
we worked out that nyuzi is approximately 1/16th the speed of MALI400.
roughly. although area-for-area it's quite hard to tell whether
that's a fair assessment because you can't *get* die areas for MALI400
(anyone know anything better than these estimates?
https://forum.beyond3d.com/posts/1176110/ ) and it's the performance /
mm^2 / watt that's critical, we worked out that if you put in 2 nyuzi
cores and managed to halve the number of instructions / pixel by
replacing critical path blocks with hardware-rendering ones, you'd end
up at about 25% the performance of MALI400 and that i feel would be
"good enough" for a first version. i'd be interested to hear what
people think, here.
That would by no means be enough. We need video decoding blocks,
without those the SoC is as good as an rpi for e.g. the education
market that you have mentioned and I'm sure not interested personally
in using a system as a desktop/laptop that can't play videos without
stuttering while potentially doing something else at the same time.
And I'm not even talking about games here. I don't know if there is a
simd extension on the cpu cores that someone can write a real time
decoder for and hook it up to whatever's needed for firefox etc to
automatically redirect to (not that that's an easy thing), but 25% of
an outdated low performance gpu sounds really low to me. What's the
point of having such an awesome low power core if we can't use the
thermal envelope for pushing graphics?
Post by Luke Kenneth Casson Leighton
the *general-purpose compute* performance of nyuzi on the other hand
is really good.
What is gpgpu used on desktops right now for?

Luke Kenneth Casson Leighton
2018-01-22 21:12:08 UTC
Permalink
Post by Bill Kontos
That would by no means be enough. We need video decoding blocks,
... you mean like this?
https://opencores.org/project,video_systems
Post by Bill Kontos
Post by Luke Kenneth Casson Leighton
the *general-purpose compute* performance of nyuzi on the other hand
is really good.
What is gpgpu used on desktops right now for?
something that usually uses four to ten times more power than the
target power budget for the entire SoC.

l.

Bill Kontos
2018-01-23 12:23:23 UTC
Permalink
On Mon, Jan 22, 2018 at 11:12 PM, Luke Kenneth Casson Leighton
Post by Luke Kenneth Casson Leighton
... you mean like this?
https://opencores.org/project,video_systems
Yes, maybe with the addition of hevc. That would be ideal.

Luke Kenneth Casson Leighton
2018-01-23 12:45:07 UTC
Permalink
Post by Bill Kontos
On Mon, Jan 22, 2018 at 11:12 PM, Luke Kenneth Casson Leighton
Post by Luke Kenneth Casson Leighton
... you mean like this?
https://opencores.org/project,video_systems
Yes, maybe with the addition of hevc. That would be ideal.
do you happen to know if the building blocks - the key high-cpu-load
parts - of HEVC (aka H.265) _happen_ to be the same or near-identical
to MPEG or H.264 and so on?

also critical will be a YUV->RGB converter plus scaler... and oh
look! https://opencores.org/project,video_stream_scaler

if anyone remembers the National Semi Geode LX800 (bought by AMD),
that, staggeringly, could actually do 720p video displayed on
1600x1200 (with a bit of a tear at times), and could easily do
1280x720 (without tearing) @ 30fps.... *ENTIRELY IN SOFTWARE*...
because it had a YUV->RGB converter hard macro that took care of the
most expensive bit.

... and that was a 500mhz 486 with DDR2 RAM! absolutely incredible.
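
to show what that hard macro saves per pixel, here is a minimal sketch of a YUV->RGB conversion. it assumes full-range BT.601 (JFIF) coefficients; which matrix the Geode block actually implemented is not stated here, and the function names are made up for illustration.

```python
from typing import Tuple


def clamp8(x: float) -> int:
    """Round to the nearest integer and clamp to the 0..255 range."""
    return max(0, min(255, int(round(x))))


def yuv_to_rgb(y: int, cb: int, cr: int) -> Tuple[int, int, int]:
    """Convert one full-range BT.601 (JFIF) YCbCr pixel to 8-bit RGB."""
    d = cb - 128  # blue-difference chroma, centred on zero
    e = cr - 128  # red-difference chroma, centred on zero
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp8(r), clamp8(g), clamp8(b)


# Conversions per second for the 1280x720 @ 30fps case mentioned above.
conversions_per_second = 1280 * 720 * 30
print(yuv_to_rgb(128, 128, 128))        # (128, 128, 128): mid grey stays grey
print(f"{conversions_per_second:,}")    # 27,648,000
```

that is four multiplies, several adds and three clamps for every pixel of every frame, which is why moving it out of software frees so many cycles.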

so, anyway, yes: each little piece of the puzzle will be needed,
saving big chunks of CPU cycles.

l.

Bill Kontos
2018-01-24 01:42:01 UTC
Permalink
On Tue, Jan 23, 2018 at 2:45 PM, Luke Kenneth Casson Leighton
Post by Luke Kenneth Casson Leighton
do you happen to know if the building blocks - the key high-cpu-load
parts - of HEVC (aka H.265) _happen_ to be the same or near-identical
to MPEG or H.264 and so on?
I don't know. But youtube is pushing vp9 and its successor av1 now.
These are royalty free, while the situation with h.265 is a bit
unclear to me in regards to what products need royalties or not. One
thing I do know is that h.265 uses blocks of 64x64 pixels for
compression vs 16x16 of h.264.
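
To put the block sizes in perspective, here is a small sketch that counts how many blocks cover one frame. The 1920x1080 frame size is an assumption for illustration, and 64x64 is the largest block size h.265 allows rather than the only one; the helper name is hypothetical.

```python
import math


def blocks_per_frame(width: int, height: int, block: int) -> int:
    """Count the block x block tiles needed to cover a frame (edges rounded up)."""
    return math.ceil(width / block) * math.ceil(height / block)


WIDTH, HEIGHT = 1920, 1080  # assumed frame size, not from the thread

h264_blocks = blocks_per_frame(WIDTH, HEIGHT, 16)   # 16x16 macroblocks
h265_blocks = blocks_per_frame(WIDTH, HEIGHT, 64)   # 64x64 largest blocks

print(h264_blocks, h265_blocks)  # 8160 510
```

Fewer, larger top-level blocks do not by themselves mean less decoding work, since each large block can be split further, so this says nothing yet about whether the hot loops are shared between the two codecs.
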
Post by Luke Kenneth Casson Leighton
also critical will be a YUV->RGB converter plus scaler... and oh
look! https://opencores.org/project,video_stream_scaler
if anyone remembers the National Semi Geode LX800 (bought by AMD),
that, staggeringly, could actually do 720p video displayed on
1600x1200 (with a bit of a tear at times), and could easily do
1280x720 (without tearing) @ 30fps.... *ENTIRELY IN SOFTWARE*...
because it had a YUV->RGB converter hard macro that took care of the
most expensive bit.
... and that was a 500mhz 486 with DDR2 RAM! absolutely incredible.
That sounds impressive indeed.
Post by Luke Kenneth Casson Leighton
so, anyway, yes: each little piece of the puzzle will be needed,
saving big chunks of CPU cycles.
Christopher Havel
2018-01-24 01:48:13 UTC
Permalink
I have a thin client with a 366MHz AMD Geode. YouTube anything (even @
240p) almost literally sets it on fire, even with an extremely lightweight
Linux distro on it. It doesn't so much skip frames as it does entire 10+sec
chunks... and that's with 512MB RAM. I can put a gig in there, sort of...
system has a low-level timing issue, I found out from an insider guy --
there is ONE make and model of 1gb PC2700 out there that will work. It's an
APacer brand stick and it's absolutely hen's teeth because I've never found
it. I've been looking for multiple years now...
Luke Kenneth Casson Leighton
2018-01-24 05:44:47 UTC
Permalink
On Wed, Jan 24, 2018 at 1:48 AM, Christopher Havel
Post by Christopher Havel
I have a thin client with a 366MHz AMD Geode. YouTube anything (even @
240p) almost literally sets it on fire,
you need to find and compile up the accelerated video extension.
last time i did that was 10 years ago. without it the processor will
have to do its own YUV-to-RGB conversion and yes it will melt.

l.

Lauri Kasanen
2018-01-24 07:26:16 UTC
Permalink
On Tue, 23 Jan 2018 20:48:13 -0500
Post by Christopher Havel
I have a thin client with a 366MHz AMD Geode. YouTube anything (even @
240p) almost literally sets it on fire, even with an extremely lightweight
Linux distro on it. It doesn't so much skip frames as it does entire 10+sec
chunks... and that's with 512MB RAM. I can put a gig in there, sort of...
system has a low-level timing issue, I found out from an insider guy --
there is ONE make and model of 1gb PC2700 out there that will work. It's an
APacer brand stick and it's absolutely hen's teeth because I've never found
it. I've been looking for multiple years now...
..plus you need to use proper software, not Firefox/Chrome/whatever,
since those use inefficient methods to be able to overlay and transform
the content. Something like mplayer is both fast and likely to support
the accelerator(s).

- Lauri
